Our REU site focuses on the unifying theme of "Safe and Reliable AI and the Internet" spanning three interconnected sub-themes: (1) networking, (2) cybersecurity, (3) data science and AI.
We focus on research to address: (1) user safety, (2) trustworthiness of online information and online communication, and (3) reliability of online communication.
User safety projects will focus on protecting users from online misinformation and scams, as well as protecting user privacy. Trustworthiness projects will focus on establishing trust in online communication in social networks, understanding how misinformation spreads online, and defending against targeted attacks on online services and networks. Reliability projects will focus on efficient and robust online communication.
Faculty and staff at USC-ISI lead vibrant research in many areas of computer science, and are world-renowned experts in the sub-themes we have chosen for our REU site.
This REU site intentionally offers a wide spectrum of research topics to attract a diverse student population.
Jointly, our sub-themes offer a broad learning experience, ranging from the network and protocol layers of the Internet all the way to how people use the Internet today, and how we can understand broader contemporary societal and environmental changes via machine learning on big data.
Below are the projects we expect to offer in summer 2025.
Evaluation of Internet Peninsulas of Partial Reachability in Network Routing (Supervisor: John Heidemann)
Growing out of our work studying Internet reliability, we identified
peninsulas as cases where parts of the Internet are reachable from
some places but not from others. Peninsulas are important because
they can prevent users from reaching what they want, and users have no
way to resolve the problem. In prior REU [Saluja22a] and PhD [Baltra23a] work we identified peninsulas by
sending pings, thereby testing data-plane connectivity. In this project,
the student will look for peninsulas in BGP routing data, testing
control-plane connectivity, and compare the two. Research questions
include: how closely do data-plane and control-plane peninsulas
compare? Do control-plane BGP data give us more vantage points (VPs)
than data-plane measurements? How does the number of VPs affect how many
peninsulas we see?
Expected research outcomes: Publications and poster presentations, algorithms that can be applied to track future peninsulas combining data plane and control plane information.
Expected learning outcomes: Students will learn how to process large datasets using Hadoop, how to perform statistical hypothesis testing and how to correlate measurements, how to present their research and how to write research papers.
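To make the peninsula comparison in the project above concrete, here is a minimal Python sketch. It assumes a simplified definition taken from the description (a destination is a peninsula when some vantage points reach it and others do not) and is not the detection algorithm from [Saluja22a] or [Baltra23a]. The function names, the input layout (destination mapped to the set of VPs that reach it), and the use of Jaccard similarity to compare the two views are all illustrative assumptions.

```python
from typing import Dict, Set


def classify_destinations(
    reached_by: Dict[str, Set[str]], vantage_points: Set[str]
) -> Dict[str, str]:
    """Label each destination by how many vantage points (VPs) reach it.

    Simplified, assumed definition: reachable from all VPs -> "reachable",
    from none -> "unreachable", from only some -> "peninsula".
    """
    if not vantage_points:
        raise ValueError("at least one vantage point is required")
    labels = {}
    for dest, vps in reached_by.items():
        seen = vps & vantage_points  # ignore VPs outside the study set
        if seen == vantage_points:
            labels[dest] = "reachable"
        elif not seen:
            labels[dest] = "unreachable"
        else:
            labels[dest] = "peninsula"
    return labels


def peninsulas(labels: Dict[str, str]) -> Set[str]:
    """Return the destinations labeled as peninsulas."""
    return {dest for dest, label in labels.items() if label == "peninsula"}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

One possible use is to run `classify_destinations` once on ping results and once on BGP-derived visibility, each with its own VP set, and then call `jaccard(peninsulas(data_plane), peninsulas(control_plane))` to get a single agreement score between the two views.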
Understanding Non-productive DNS Root Traffic (Supervisor: Jelena Mirkovic)
DNS roots are key infrastructure for the Internet. They receive billions of queries per day,
but the majority of these are non-productive: repetitions of queries whose replies should have been cached, ill-formed queries that will produce no useful reply, and queries from private IP addresses that cannot receive a reply. This project aims to quantify different non-productive query categories and to understand the generative processes behind them, building on past REU student work [Ginesin22b]. Research questions include: Do different classes of non-productive queries increase over time? Are queries in each class a byproduct of malicious activity, or do they result from end-system misconfiguration? The student will leverage and enhance the existing data analysis tools and the existing Hadoop cluster to parse 10 years of DNS data at USC-ISI's B-root, and quantify non-productive query categories. The student will use clustering and data mining to test various hypotheses about the processes that generate non-productive DNS queries.
Expected research outcomes: Publications and poster presentations, tool improvements for non-productive DNS query measurement.
Expected learning outcomes: Students will learn how to write and deploy code on Hadoop, how to process large datasets, how to reason about DNS traffic and its sources, how to present their research and how to write research papers.
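The three non-productive categories named in the project above (repeated queries, ill-formed queries, and queries from private addresses) could be separated with rules like the ones in this Python sketch. It is only an illustration and not the project's existing analysis tooling: the `Query` record, the label-syntax check, the category names, and the `assumed_ttl` window used to decide what counts as a repeat are all assumptions made for the example.

```python
import ipaddress
import re
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Query(NamedTuple):
    """Hypothetical minimal record for one DNS query."""
    timestamp: float  # seconds
    src: str          # source IP address
    qname: str        # queried name, e.g. "com."
    qtype: str        # e.g. "NS"


# Assumed syntax rule: labels of 1-63 letters, digits, "_" or "-",
# not starting or ending with "-".
LABEL_RE = re.compile(r"^[A-Za-z0-9_]([A-Za-z0-9_-]{0,61}[A-Za-z0-9_])?$")


def is_ill_formed(qname: str) -> bool:
    """Return True if the name breaks the assumed syntax rule."""
    name = qname[:-1] if qname.endswith(".") else qname
    if not name:
        return False  # a query for the root itself
    if len(name) > 253:
        return True
    return any(not LABEL_RE.match(label) for label in name.split("."))


def classify(queries: Iterable[Query], assumed_ttl: float = 86400.0) -> List[str]:
    """Label queries in timestamp order; one label per query."""
    last_seen: Dict[Tuple[str, str, str], float] = {}
    labels = []
    for q in sorted(queries, key=lambda item: item.timestamp):
        key = (q.src, q.qname.lower(), q.qtype)
        # Note: is_private covers RFC 1918 space and other special-purpose
        # ranges that Python treats as not globally reachable.
        if ipaddress.ip_address(q.src).is_private:
            label = "private_source"
        elif is_ill_formed(q.qname):
            label = "ill_formed"
        elif key in last_seen and q.timestamp - last_seen[key] < assumed_ttl:
            label = "repeat"  # same question again inside the assumed cache window
        else:
            label = "other"
        last_seen[key] = q.timestamp
        labels.append(label)
    return labels


def tally(queries: Iterable[Query], assumed_ttl: float = 86400.0) -> Counter:
    """Count how many queries fall into each category."""
    return Counter(classify(queries, assumed_ttl))
```

The rules are checked in a fixed order, so a query is counted in exactly one category. On a cluster, the same per-query logic would likely be applied per source address so that the `last_seen` state stays small.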
Wireless and Traditional Network Measurements from Mobile Platforms (Supervisor: Eric Kline)
5G/6G wireless networks are poorly understood and few
measurement mechanisms exist to understand them. This project aims to
pierce opaque wireless networks through measurements conducted
from the phones themselves. The student will leverage existing
measurement tools, such as scamper, executed on the mobile platform
(i.e., Android and iOS). Implementations of these tools exist on the
platforms, but the student may need to modify them to be fully functional.
Research questions include determining the effectiveness of
measurement tools on mobile networks, learning about the structure of mobile
networks, and how they interconnect with the broader Internet.
Expected research outcomes: Publications and poster presentations, tools for further measurement of mobile networks.
Expected learning outcomes: Students will learn how to write and deploy code on phones and phone emulators, how to measure wireless network structure, how to present their research and how to write research papers.
Privacy-preserving Record Linkage (Supervisor: Srivatsan Ravi)
Performing Record Linkage (RL) over data from varied and distributed data sources is challenging due to different formats of record identifiers (e.g., a person's full name). The privacy-preserving record linkage (PPRL) task extends RL with the additional requirement to not reveal sensitive information to any party present in the computation or to an adversary. In this project, the student will leverage multi-party computation cryptographic (MPCC) tools for solving the PPRL task efficiently over large data sets and will benchmark the costs of enforcing strong privacy. Research questions include: What is the cost of using MPCC over large datasets? What optimizations reduce the cost?
Expected research outcomes: Publications and poster presentations, measurements for PPRL efficiency over large datasets.
Expected learning outcomes: Students will learn how to use MPCC tools, how to work with large datasets using distributed computing, how to present their research and write research papers.
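To illustrate why differing identifier formats make linkage hard, and what an encoded comparison can look like, here is a small Python sketch. It deliberately swaps in a different technique from the one the project targets: a keyed Bloom-filter encoding of character bigrams compared with the Dice coefficient, a commonly used PPRL baseline that is generally considered to give weaker privacy than multi-party computation. The function names, the shared `key`, and the `num_bits` and `num_hashes` values are assumptions chosen for the example.

```python
import hashlib
import hmac
from typing import Set


def bigrams(name: str) -> Set[str]:
    """Split a name into padded character bigrams, ignoring case and spaces."""
    text = "_" + "".join(name.lower().split()) + "_"
    return {text[i:i + 2] for i in range(len(text) - 1)}


def encode(name: str, key: bytes, num_bits: int = 256, num_hashes: int = 4) -> Set[int]:
    """Return the set of Bloom-filter bit positions for a name.

    Each bigram is hashed num_hashes times with HMAC-SHA256 under a key
    that the data holders are assumed to share out of band.
    """
    positions = set()
    for gram in bigrams(name):
        for i in range(num_hashes):
            digest = hmac.new(key, f"{i}:{gram}".encode(), hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % num_bits)
    return positions


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice coefficient of two bit-position sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))
```

With this encoding, `"Jane Doe"` and `"jane  doe"` map to the same bit positions, and a near-miss such as `"Jane Do"` scores higher against `"Jane Doe"` than an unrelated name does. An MPC-based approach would instead compute the comparison without either party seeing the other's encodings, which is where the cost questions above arise.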
I am going to find you! Using Causal Inference to detect hidden flows (Supervisor: Alefiya Hussain)
Understanding how a cyberattack evolves is key to defending against
it. In the early stages of an attack, malicious flows
are interleaved with benign ones for reconnaissance and infiltration
of the victim site. This project aims to detect such hidden flows,
leveraging causal inference with observational data, to build models
capable of detecting early threats. The student will prototype several
detection algorithms and evaluate them by generating synthetic traffic on a network testbed.
Research questions include: identification of
characteristics that can be used for hidden flow identification, and the
effectiveness of hidden flow detection when hidden flows are mixed with
various types of network application traffic.
Expected research outcomes: Publications and poster presentations, algorithms for detection of stealthy flows.
Expected learning outcomes: Students will learn how to parse network traffic to extract data, how to orchestrate experiments on a network testbed, how to present their research and write research papers.
Assessing Security Posture of Cyberinfrastructures through Semantic Data Modeling and Analysis (Supervisor: Jelena Mirkovic)
Public scientific cyberinfrastructures (CIs) must be protected against
accidental leaks or intentional attacks. These CIs must
remain open to facilitate scientific use, but they may become targets of attacks that seek to misuse data or resources. This project will investigate the use
of semantic data models to represent the systems, interfaces, data,
and workflows involved in the design, implementation, operation and
use of CI resources, to explore
the impacts of various threat models on the CI. The
student will
develop a knowledge graph of
SPHERE, a new security and privacy CI developed by USC-ISI through an
NSF mid-scale research infrastructure award.
The student will explore existing security frameworks
including MITRE, SPARTA, NIST and the NSF TrustedCI framework to
describe SPHERE's design. Research questions include: What level
of detail is needed for a knowledge graph (KG) to support actionable decisions about the CI's security posture? Can we automate KG construction?
Expected research outcomes: Publications and poster presentations, a prototype knowledge graph for SPHERE CI.
Expected learning outcomes: Students will learn about various security frameworks, how to build knowledge graphs, how to present their research, and how to write research papers.
Analyzing the Impact of Misinformation on Voter Behavior in Online Platforms (Supervisor: Emilio Ferrara)
This project aims to explore the influence of
misinformation on voter behavior within social media platforms. Given
the increasing prevalence of online misinformation and its potential
impact on democratic processes, as studied in prior work [Ferrara2016],
including by prior REU students [Allen2021] and
[Ko2021], this research is both timely and
significant. Research questions will
address the effectiveness of different misinformation types and the
role of platform algorithms. The student will utilize a mixed-methods approach,
combining quantitative data analysis with qualitative case studies to
understand how misinformation spreads and influences voter
decision-making.
Expected research outcomes: Publications and poster presentations, models that quantify how misinformation influences voter behaviors.
Expected learning outcomes: Students will learn how to work with large-scale social network data, how to identify and track misinformation online, how to present their research and how to write research papers.
AI-driven Simulation of Human Decision Making (Supervisor: Luca Luceri)
This project aims to develop an AI-driven messaging testbed to simulate human decision-making by incorporating key psychological mechanisms. By addressing limitations in current large language models (LLMs), such as their inability to replicate deeper social and psychographic dynamics, the testbed will enable realistic simulations of behaviors like influence dynamics, opinion diffusion, and social approval. Integrating methodologies like Agent-Based Modeling, Reinforcement Learning, and Neuro-Symbolic AI, the testbed will provide a powerful tool for predicting and countering cyber threats, such as influence campaigns and foreign malign operations. Beyond defense, it will advance research in psychology, cognitive science, and AI, offering transformative capabilities for understanding and modeling human behavior at scale.
Expected research outcomes: Publications and poster presentations, conversational agents that seek to replicate deep social and psychographic dynamics.
Expected learning outcomes: Students will learn how to work with large language models, write conversational agents, present their research, and write research papers.
Analyzing Healthy Restaurant Access and Its Effects on Diet and Well-Being Using Causal Machine Learning in the Gallup National Sample (Supervisor: Abigail Horn)
This project aims to harness large-scale data and methods from causal machine learning to explore associative and causal relationships between individuals’ access and exposure to healthy restaurants and their diet, diet-related health, and well-being. The student will link restaurant nutrition density measures to individuals from a 100,000-person survey sample drawn from Gallup's National Sample, and will explore how survey participants' access to healthy restaurants relates to their self-reported diet and food preparation behaviors, health outcomes, and factors related to well-being. Research questions include: How do restaurant vicinity and nutrition density impact people's diet choices? Do impacts vary by sub-population?
Expected research outcomes: Publications and poster presentations, models of how healthy restaurant access correlates with resident diet choices.
Expected learning outcomes: Students will learn how to work with predictive modeling, deep learning, NLP, multi-level statistical modeling, causal modeling, and spatial computation, how to present their research and how to write research papers.
Boosting Simulation Techniques for Quantum Many-Body Physics (Supervisor: Itay Hen)
Simulations of quantum many-body physics systems are notoriously challenging for today’s computers, and even for supercomputers. The Hen group devotes a significant portion of its resources to developing numerical simulation codes designed to advance the current state of the art in the field, thereby allowing the physics community to gain insights into the properties of complex large-scale quantum materials. In this project, the student will inspect various sub-routines of these simulation algorithms, develop ideas to improve them, and write code that implements those improvements.
Expected research outcomes: Publications and poster presentations, improved simulation algorithms.
Expected learning outcomes: Students will learn how to work with quantum simulators, how to evaluate simulation algorithms and how to test them, how to present their research and how to write research papers.