Research Projects in SRAII 2026

+ Federated Learning for Neuroimaging of Alzheimer's Disease (mentor: Jose-Luis Ambite)

Federated Learning (FL) allows training a neural network over distributed sites without sharing data. Each site (learner) trains the neural network on its private local dataset for a given time or number of epochs, encrypts the network parameters, and sends this encrypted model to the federation controller. The federation controller securely aggregates the encrypted local learners' models into a community model using an optimized CKKS fully homomorphic encryption (FHE) scheme, and sends the community model back to the sites. This cycle repeats for several federation rounds. Using FHE ensures that adversaries outside the federation cannot access any information about the learned model, and adding gradient noise during local training thwarts insider attacks. This project will explore FL method development and applications to neuroimaging of Alzheimer's disease.
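
As a rough illustration of the training loop described above, the sketch below runs a few federation rounds of weighted model averaging in plain NumPy; encryption and gradient noise are omitted, and the linear model and site data are toy stand-ins.

    # One federation round with federated averaging (FedAvg-style), in plain NumPy.
    # Encryption (CKKS) and gradient noise are omitted; data and model are toy stand-ins.
    import numpy as np

    def local_update(weights, X, y, lr=0.1, epochs=5):
        """Train a linear model locally with plain gradient descent."""
        w = weights.copy()
        for _ in range(epochs):
            grad = X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
            w -= lr * grad
        return w

    def federated_round(community_w, site_data):
        """Each site trains on its private data; the controller averages the models."""
        local_models = [local_update(community_w, X, y) for X, y in site_data]
        sizes = np.array([len(y) for _, y in site_data], dtype=float)
        # Weighted average, with weights proportional to local sample counts.
        return np.average(local_models, axis=0, weights=sizes)

    rng = np.random.default_rng(0)
    sites = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
    w = np.zeros(3)
    for _ in range(10):                          # several federation rounds
        w = federated_round(w, sites)
    print("community model:", w)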

Expected research outcomes: Publications and poster presentations, improved FL algorithms.

Expected learning outcomes: Students will learn how to work with federated learning simulators, how to evaluate ML models for suitability in identifying patterns in brain images, how to present their research and how to write research papers.

+ Classifying Scientific Activity for National Policy Analysis (mentor: Andrea Belz)

National scientific research strategies are of great importance, but to date artificial intelligence (AI) has been used only minimally to study the output of research funding. The Management of Innovation, Entrepreneurial Research, and Venture Analysis (MINERVA) group at USC uses AI text analysis techniques to study scientific funding trends in order to make policy recommendations. Scientific documents are notoriously difficult to classify because their language is highly specialized and requires training to understand. In this project, a student will advance an AI system that classifies scientific documents, apply it to a set of federally funded research projects, and relate the results to national funding strategies.
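
For orientation, a minimal text-classification baseline of the kind a student might start from is sketched below (TF-IDF features plus a linear classifier in scikit-learn); the abstracts and labels are toy placeholders, not MINERVA data.

    # TF-IDF features plus a linear classifier; abstracts and labels are toy placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    abstracts = ["quantum error correction for superconducting qubits",
                 "randomized trial of a new oncology drug",
                 "deep learning for protein structure prediction",
                 "survey of coral reef biodiversity"]
    labels = ["physics", "medicine", "computer science", "biology"]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression(max_iter=1000))
    model.fit(abstracts, labels)
    print(model.predict(["graph neural networks for drug discovery"]))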

Expected research outcomes: Publications and poster presentations.

Expected learning outcomes: Students will learn how to develop AI classification models, statistical techniques to assess performance and methods to use newly developed models on novel datasets. They will also learn how to present their research and how to write research papers.

+ Understanding UDP Internet Background Radiation (mentor: Michael Collins)

The Internet is awash in Internet Background Radiation (IBR), traffic which is not aimed at any particular server or user, but instead aimed randomly at everything on the Internet due to misconfiguration, opportunistic attacks, or network glitches. The majority of IBR analysis has focused on traffic as a whole or on Transmission Control Protocol (TCP) traffic; relatively little work has examined User Datagram Protocol (UDP) traffic. In this project, a student will analyze IBR data collected by ISI in order to characterize the UDP traffic, model its normal behavior over time, and develop techniques for identifying anomalies.
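
A possible first pass over a capture is sketched below: count UDP packets per destination port and distinct senders using Scapy. The capture file name is a placeholder for data collected by ISI.

    # Count UDP packets per destination port and distinct senders in a capture file.
    from collections import Counter
    from scapy.all import rdpcap, IP, UDP

    packets = rdpcap("darknet_sample.pcap")      # placeholder capture from the darkspace
    port_counts = Counter()
    sources = set()

    for pkt in packets:
        if pkt.haslayer(IP) and pkt.haslayer(UDP):
            port_counts[pkt[UDP].dport] += 1     # which services are being probed?
            sources.add(pkt[IP].src)             # how many distinct senders?

    print("distinct UDP sources:", len(sources))
    for port, count in port_counts.most_common(10):
        print(f"dport {port}: {count} packets")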

Expected research outcomes: Publications and poster presentations.

Expected learning outcomes: Students will learn how to analyze network traffic using tools such as Wireshark, tcpdump, pcapy, and SiLK, and how to understand traffic on darkspaces. They will learn how to identify unknown network traffic, research new protocols, and apply statistical and text analysis techniques to analyze unknown payloads. Students will also learn how to present their work and write research reports.

+ Auditing Information Suppression in Large Language Models (mentor: Emilio Ferrara)

This project builds tools to measure when and how LLMs with safety filters suppress, reframe, or omit requested information. Students will create a reproducible evaluation pipeline (prompt sets, ground-truth labeling, and metrics), run controlled experiments across APIs/models, and visualize results in an interactive dashboard. The goal is to advance safe and reliable AI by distinguishing appropriate safety refusals from unintended censorship and by proposing testable mitigations.
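
One way the evaluation loop could look is sketched below: labeled prompts are sent to each model and refusals are split into appropriate refusals and over-refusals. The query_model callable and the keyword-based refusal heuristic are placeholders, not a fixed design.

    # Send labeled prompts to each model; split refusals into appropriate vs. over-refusals.
    # `query_model(model_name, prompt) -> str` is a placeholder for the API client in use.
    import re

    REFUSAL_PATTERN = re.compile(r"I can('|no)t help|I'm unable|cannot assist", re.I)

    def is_refusal(response: str) -> bool:
        """Crude keyword heuristic; a real pipeline would use labeled ground truth."""
        return bool(REFUSAL_PATTERN.search(response))

    def audit(models, labeled_prompts, query_model):
        """labeled_prompts: list of (prompt, should_answer) pairs."""
        benign = sum(1 for _, should_answer in labeled_prompts if should_answer)
        harmful = len(labeled_prompts) - benign
        results = {}
        for name in models:
            over = appropriate = 0
            for prompt, should_answer in labeled_prompts:
                if is_refusal(query_model(name, prompt)):
                    if should_answer:
                        over += 1                # benign request suppressed
                    else:
                        appropriate += 1         # safety filter working as intended
            results[name] = {"over_refusal_rate": over / max(benign, 1),
                             "appropriate_refusal_rate": appropriate / max(harmful, 1)}
        return results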

Expected research outcomes: Publications and poster presentations, datasets that quantify how information suppression occurs in LLMs.

Expected learning outcomes: Students will learn how to work with various large language model APIs, how to test for information suppression, how to present their research and how to write research papers.

+ Early Detection of Coordinated Influence on Social Networks (mentor: Emilio Ferrara)

Students will develop graph- and language-based methods to spot early “conversation drivers” and coordinated accounts steering online narratives. Using public social data, they will implement streaming detectors, evaluate them against historical events, and/or build a small visualization tool. The work supports a safe and reliable Internet by improving robustness against manipulation and amplifying trustworthy information.
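
As one illustration of a coordination signal, the sketch below builds a co-sharing graph: accounts that repeatedly post the same URL within a short window are linked, and dense components become candidate coordinated clusters. The data format and thresholds are assumptions.

    # Link accounts that share the same URL within a short window; connected components
    # of the thresholded graph are candidate coordinated clusters.
    from collections import defaultdict
    from itertools import combinations
    import networkx as nx

    def co_share_components(posts, window_seconds=300, min_co_shares=3):
        """posts: iterable of (account, url, unix_timestamp) tuples."""
        by_url = defaultdict(list)
        for account, url, ts in posts:
            by_url[url].append((ts, account))

        weights = defaultdict(int)
        for shares in by_url.values():
            shares.sort()
            for (t1, a1), (t2, a2) in combinations(shares, 2):
                if a1 != a2 and abs(t2 - t1) <= window_seconds:
                    weights[tuple(sorted((a1, a2)))] += 1

        g = nx.Graph()
        g.add_weighted_edges_from((a, b, w) for (a, b), w in weights.items()
                                  if w >= min_co_shares)
        return list(nx.connected_components(g))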

Expected research outcomes: Publications and poster presentations, datasets that quantify coordinated influence campaigns on social networks.

Expected learning outcomes: Students will learn how to work with various social network APIs, how to test for coordinated influence, how to present their research and how to write research papers.

+ Robustness-by-Design: Stress-Testing AI Systems with Social Simulations (mentor: Emilio Ferrara)

This project creates lightweight agent-based simulations to generate realistic adversarial scenarios (e.g., bursts of low-credibility content or bot-like posting patterns) and uses them to stress-test AI moderation/ranking models. Students will prototype agents, define reliability metrics, and evaluate how design choices affect failure modes.
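
A toy version of such a stress test is sketched below: ordinary posters plus a burst of bot-like agents are run against a stand-in moderation rule, and the number of benign and bot flags is recorded. All parameters and the rule itself are illustrative.

    # Ordinary posters plus a burst of bot-like agents, run against a stand-in moderation rule.
    import random

    def moderation_rule(posting_rate, threshold=0.7):
        """Toy stand-in for the AI model under test: flag unusually high posting rates."""
        return posting_rate > threshold

    def simulate(n_users=200, n_bots=20, steps=100, seed=0):
        rng = random.Random(seed)
        flagged_benign = flagged_bots = 0
        for t in range(steps):
            burst = 50 <= t < 60                         # bots post in a coordinated burst
            for _ in range(n_users):
                if moderation_rule(rng.uniform(0.0, 0.3)):
                    flagged_benign += 1                  # false positive on an organic user
            for _ in range(n_bots):
                rate = rng.uniform(0.8, 1.0) if burst else rng.uniform(0.0, 0.3)
                if moderation_rule(rate):
                    flagged_bots += 1                    # bot post correctly flagged
        return flagged_benign, flagged_bots

    benign, bots = simulate()
    print(f"benign flags: {benign}, bot flags: {bots}")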

Expected research outcomes: Publications and poster presentations, reusable simulation code and a report with recommendations for safer AI-in-the-loop platforms.

Expected learning outcomes: Students will learn how to work with social simulation tools and with various AI agent architectures, how to design and execute stress tests, how to present their research and how to write research papers.

+ Evaluation of Internet Peninsulas of Partial Reachability in Network Routing (mentor: John Heidemann)

Growing out of our work studying Internet reliability, we identified peninsulas as cases where parts of the Internet are reachable from some places but not from others. Peninsulas are important because they can prevent users from reaching what they want, and users have no way to resolve the problem. In prior REU [Saluja22a] and PhD [Baltra23a] work we identified peninsulas by sending pings, thereby testing data-plane connectivity. In this project, the student will look for peninsulas in BGP routing data, testing control-plane connectivity, and compare the two. Research questions include: how closely do data-plane and control-plane peninsulas compare? Does the control plane (BGP) give us more vantage points (VPs) than the data plane? How does the number of VPs affect how many peninsulas we see?
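
A simplified version of the comparison is sketched below: for each /24 block, check whether it responds to pings (data plane) and whether it is covered by a BGP announcement (control plane), then look at blocks that appear in only one plane. The input files are placeholders.

    # Compare data-plane (ping) and control-plane (BGP) coverage per /24 block.
    import ipaddress

    def load_blocks(path):
        """One prefix per line, e.g. 192.0.2.0/24; input files are placeholders."""
        with open(path) as f:
            return {ipaddress.ip_network(line.strip()) for line in f if line.strip()}

    pinged = load_blocks("responsive_blocks.txt")    # blocks that answer ping-based surveys
    routed = load_blocks("bgp_covered_blocks.txt")   # blocks covered by BGP announcements

    data_only = pinged - routed     # responds to pings but has no visible route
    control_only = routed - pinged  # routed but never responds
    print(f"in both planes: {len(pinged & routed)}")
    print(f"data-plane only: {len(data_only)}, control-plane only: {len(control_only)}")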

Expected research outcomes: Publications and poster presentations, algorithms that can be applied to track future peninsulas combining data plane and control plane information.

Expected learning outcomes: Students will learn how to process large datasets using Hadoop, how to perform statistical hypothesis testing and how to correlate measurements, how to present their research and how to write research papers.

+ Visualizing and Understanding Internet Outages in ISPs (mentor: John Heidemann)

The ANT lab has studied Internet outages for many years, and REU students have contributed to our outage website outage.ant.isi.edu. This website tracks Internet outages, globally, in near-real time. While we currently visualize outages geographically (we show circles on a latitude/longitude grid over the world), outages are often not geographic but instead depend on ISPs' actions. For this project the student will develop tools to show outages by ISP, over time. Because ISPs have different footprints, we expect this visualization will be a table listing ISPs, grouped by country, with sparkline graphs showing outages over the last day, week, or quarter.
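
A minimal sketch of the sparkline idea, using text block characters and made-up outage counts, is shown below; the real tool would read from our outage datasets and render on the website.

    # Map each day's outage count onto block characters and group rows by country.
    SPARK_CHARS = "▁▂▃▄▅▆▇█"

    def sparkline(counts):
        top = max(counts) or 1
        return "".join(SPARK_CHARS[min(int(c / top * 7), 7)] for c in counts)

    outages_per_day = {                      # ISP -> (country, daily outage counts); made up
        "ExampleNet": ("US", [0, 1, 0, 0, 5, 9, 2]),
        "SampleTel":  ("BR", [2, 2, 3, 1, 0, 0, 1]),
    }

    for isp, (country, counts) in sorted(outages_per_day.items(), key=lambda kv: kv[1][0]):
        print(f"{country}  {isp:12s} {sparkline(counts)}")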

Expected research outcomes: A poster presentation and research report that summarizes different strategies to visualize per-ISP outages, which visualizations we think are most effective, methods to generate visualizations in websites, and the interesting events we found over one quarter.

Expected learning outcomes: Students will learn how to process large datasets using parallel tools like Hadoop, how to visualize big data, about Internet reliability, and how to present their research and write research papers.

+ Boosting Simulation Techniques for Quantum Many-Body Physics (mentor: Itay Hen)

Simulations of quantum many-body systems are notoriously challenging even for today's supercomputers. The Hen group devotes a significant portion of its resources to developing numerical simulation codes designed to advance the current state of the art in the field, thereby allowing the physics community to gain insights into the properties of complex large-scale quantum materials. In this project, the student will inspect various subroutines of these simulation algorithms and develop ideas and write code to improve them.
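
Assuming the starting point is an existing simulation code, one generic way to find candidate subroutines to improve is to profile a run and rank functions by cumulative time, as sketched below; the simulated workload here is a placeholder.

    # Profile one run of a simulation and rank functions by cumulative time.
    import cProfile
    import pstats

    def run_simulation():
        """Placeholder for one sweep of the existing simulation code."""
        total = 0.0
        for i in range(1, 200_000):
            total += 1.0 / (i * i)
        return total

    profiler = cProfile.Profile()
    profiler.enable()
    run_simulation()
    profiler.disable()

    stats = pstats.Stats(profiler).sort_stats("cumulative")
    stats.print_stats(10)     # the top entries are candidate subroutines to optimize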

Expected research outcomes: Publications and poster presentations, improved simulation algorithms.

Expected learning outcomes: Students will learn how to work with quantum simulators, how to evaluate simulation algorithms and how to test them, how to present their research and how to write research papers.

+ Analyzing Healthy Restaurant Access and Its Effects on Diet and Well-Being Using Causal Machine Learning in the Gallup National Sample (mentor: Abigail Horn)

This project aims to harness large-scale data and methods from causal machine learning to explore associative and causal relationships between individuals' access and exposure to healthy restaurants and their diet, diet-related health, and well-being. The student will link restaurant nutrition-density measures to individuals from a 100,000-person survey sample drawn from the Gallup National Sample and will explore the relationships between survey participants' access to healthy restaurants and their self-reported diet and food-preparation behaviors, health outcomes, and factors related to well-being. Research questions include: How do restaurant proximity and nutrition density affect people's diet choices? Do impacts vary by sub-population?
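
One simple analysis skeleton, shown below, estimates the association between access and a diet outcome with inverse propensity weighting; the file, column names, and confounders are assumptions, not the project's actual pipeline.

    # Inverse-propensity-weighted comparison of a diet outcome by restaurant access.
    # File, column names, and confounders are assumptions, not the project's pipeline.
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    df = pd.read_csv("gallup_linked_sample.csv")           # hypothetical linked file
    covariates = ["age", "income", "urban", "education"]   # hypothetical confounders
    treated = df["high_access"].values                     # 1 = good access to healthy restaurants
    outcome = df["healthy_eating_index"].values            # hypothetical diet outcome

    # Propensity of high access given covariates, then inverse-propensity weights.
    propensity = LogisticRegression(max_iter=1000).fit(df[covariates], treated)
    p = propensity.predict_proba(df[covariates])[:, 1]
    weights = np.where(treated == 1, 1 / p, 1 / (1 - p))

    effect = (np.average(outcome[treated == 1], weights=weights[treated == 1])
              - np.average(outcome[treated == 0], weights=weights[treated == 0]))
    print("weighted difference in diet outcome by access:", effect)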

Expected research outcomes: Publications and poster presentations.

Expected learning outcomes: Students will learn how to link and analyze large datasets, and how to perform statistical hypothesis testing.

+ AI-Driven Analysis of Scientific Philanthropy: Uncovering Social Drivers in Grant Making (mentor: Mayank Kejriwal)

This project explores the sociology of science by applying computational social science techniques to analyze large-scale public grant datasets from organizations such as the MIT Sloan Foundation. The student will use advanced AI and text analytics tools, including Large Language Models (LLMs) and natural language processing pipelines, to examine the content of funded proposal abstracts, identifying latent themes, funding trends, and the characteristics of awardees. By mapping these "social drivers" of grant making, the project aims to better understand how scientific priorities are shaped and how AI can be reliably deployed to uncover patterns in research funding.
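
As a small illustration of latent-theme extraction, the sketch below applies TF-IDF and non-negative matrix factorization to a handful of toy abstracts; the real project would use LLM-based pipelines on full grant datasets.

    # TF-IDF plus non-negative matrix factorization over toy abstracts.
    from sklearn.decomposition import NMF
    from sklearn.feature_extraction.text import TfidfVectorizer

    abstracts = [
        "training early-career researchers in data science for astronomy",
        "open-source software infrastructure for economic research",
        "public understanding of science through museum partnerships",
        "new statistical methods for large astronomical surveys",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(abstracts)
    nmf = NMF(n_components=2, random_state=0).fit(X)

    terms = vectorizer.get_feature_names_out()
    for k, topic in enumerate(nmf.components_):
        top = [terms[i] for i in topic.argsort()[-5:][::-1]]   # top words per theme
        print(f"theme {k}: {', '.join(top)}")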

Expected research outcomes: Publications and poster presentations, annotated datasets about research funding solicitations.

Expected learning outcomes: Students will learn how to work with various large language model APIs, how to identify latent themes, how to analyze large datasets, how to present their research and how to write research papers.

+ FAIR Attribution of Paleoclimate Data and Software Using AI Assistants (mentor: Deborah Khider)

Many scientists use Jupyter notebooks to analyze paleoclimate data, but it is often difficult to keep track of all the datasets and software that should be properly cited. In this project, the student will help develop an AI assistant that automatically detects which datasets and software are used in a notebook, finds their citation information, and creates a ready-to-use citation file. This work will be part of PaleoPAL, an AI tool designed to support paleoclimate research and make scientific workflows more transparent, reproducible, and FAIR (findable, accessible, interoperable, and reusable).
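
A minimal sketch of the detection step is shown below: scan a notebook's code cells for imported packages and look them up in a citation table. The citation table, notebook path, and package-to-citation mapping are placeholders; PaleoPAL would draw on real metadata sources.

    # Scan a notebook's code cells for imported packages; look them up in a citation table.
    import json
    import re

    IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+(\w+)", re.M)

    CITATIONS = {                      # hypothetical package -> citation mapping
        "numpy": "Harris et al. (2020), Array programming with NumPy, Nature.",
        "pyleoclim": "placeholder citation for the Pyleoclim package",
    }

    def packages_used(notebook_path):
        with open(notebook_path) as f:
            nb = json.load(f)
        pkgs = set()
        for cell in nb.get("cells", []):
            if cell.get("cell_type") == "code":
                pkgs.update(IMPORT_RE.findall("".join(cell.get("source", []))))
        return pkgs

    def citation_file(notebook_path):
        entries = [CITATIONS.get(p, f"{p}: citation not found")
                   for p in sorted(packages_used(notebook_path))]
        return "\n".join(entries)

    print(citation_file("paleo_analysis.ipynb"))   # hypothetical notebook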

Expected research outcomes: Poster presentation, peer-reviewed publications, and a functional AI agent integrated into PaleoPAL.

Expected learning outcomes: Students will learn how to work with scientific knowledge bases, design and implement AI agents using Retrieval-Augmented Generation, and evaluate AI systems for scientific workflows. They will also gain experience presenting research results and writing scientific papers.

+ Wireless and Traditional Network Measurements from Mobile Platforms (mentor: Erik Kline)

Current radio tools that allow monitoring of 5G networks are expensive and difficult to use. However, a number of inexpensive software-defined radios (SDRs), such as the Flipper Zero or LimeSDR, have been created that could potentially fill this gap. The goal of this project is to use one of these devices to capture traffic in the 5G bands and see whether we can leverage large language models (LLMs) or other machine learning to decode these messages. This may include using a small computer such as a Raspberry Pi to control the inexpensive SDR and process the data.
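
Assuming the SDR has already written raw complex (IQ) samples to a file, a first processing step might be an averaged power spectrum so that occupied channels stand out, as sketched below; the file name, sample format, and sample rate are assumptions about the capture setup.

    # Averaged power spectrum of raw complex (IQ) samples, so occupied channels stand out.
    import numpy as np

    SAMPLE_RATE = 10e6                                    # assumed capture rate, samples/second
    iq = np.fromfile("capture.iq", dtype=np.complex64)    # hypothetical capture file

    chunk = 4096
    n_chunks = len(iq) // chunk
    spectrum = np.zeros(chunk)
    for i in range(n_chunks):                             # average FFT power over fixed chunks
        seg = iq[i * chunk:(i + 1) * chunk]
        spectrum += np.abs(np.fft.fftshift(np.fft.fft(seg))) ** 2
    spectrum /= max(n_chunks, 1)

    freqs = np.fft.fftshift(np.fft.fftfreq(chunk, d=1 / SAMPLE_RATE))
    peak = freqs[np.argmax(spectrum)]
    print(f"strongest energy at {peak / 1e6:.2f} MHz offset from the tuned center")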

Expected research outcomes: Publications and poster presentations, software for operating cheap SDRs in or near 5G.

Expected learning outcomes: Students will learn how to write and deploy code on SDRs, how 5G messaging works, how to present their research and how to write research papers.

+ Predicting Emergent Collective Behavior in Multi-Agent AI Systems (mentor: Luca Luceri)

This project develops a novel framework for predicting emergent collective behaviors in multi-agent AI systems by measuring micro-scale incentive transformations through inverse reinforcement learning. Autonomous AI agents increasingly populate critical systems, yet we lack methods to predict where, when, or what type of collective behaviors will emerge as these agents interact. The framework infers latent reward functions governing agent decision-making from observed behavioral trajectories in simulated and deployed multi-agent environments, then uses graph neural architectures to forecast macro-scale phenomena including coordination patterns, system fragmentation, and consensus formation. This project advances fundamental understanding of emergence in artificial collectives, offering transformative capabilities for designing resilient multi-agent systems, detecting coordination failures before they cascade, and ensuring deployed AI systems behave as intended at scale.
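
A toy illustration of the forecasting half of this pipeline is sketched below: per-agent feature vectors (standing in for IRL-inferred rewards) are mixed over an interaction graph by mean-neighbor message passing, and the spread of the resulting embeddings serves as a crude fragmentation signal. Everything here is a simplified stand-in for the actual framework.

    # Mean-neighbor message passing over an interaction graph; embedding spread is a
    # crude fragmentation signal. Features stand in for IRL-inferred reward weights.
    import numpy as np

    def message_pass(features, adjacency):
        """Each agent mixes its own features with its neighbors' (mean aggregation)."""
        degree = adjacency.sum(axis=1, keepdims=True) + 1      # +1 for the self-loop
        return (features + adjacency @ features) / degree

    adjacency = np.array([[0, 1, 1, 0, 0, 0],
                          [1, 0, 1, 0, 0, 0],
                          [1, 1, 0, 0, 0, 0],
                          [0, 0, 0, 0, 1, 1],
                          [0, 0, 0, 1, 0, 1],
                          [0, 0, 0, 1, 1, 0]], dtype=float)    # two cliques: two potential factions
    features = np.random.default_rng(0).normal(size=(6, 4))    # stand-in reward weights per agent

    h = message_pass(message_pass(features, adjacency), adjacency)
    spread = np.linalg.norm(h - h.mean(axis=0), axis=1).mean()
    print("embedding spread (higher suggests more fragmentation):", round(float(spread), 3))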

Expected research outcomes: Validated predictive framework (more than 80% accuracy) for agentic systems; taxonomy of emergence mechanisms; open-source multi-agent simulation environments and emergence monitoring tools; benchmark datasets; publications in top-tier AI venues; demonstrations for autonomous systems applications.

Expected learning outcomes: Students will learn inverse reinforcement learning for multi-agent systems, graph neural architectures for modeling agent interactions, and methods for predicting emergent collective behaviors in artificial systems. They will gain hands-on experience designing controlled agentic environments, analyzing emergence patterns, and developing tools for safe autonomous system deployment.

+ Sycophancy in LLMs (mentor: Jonathan May)

This project aims to address the sycophancy issue in Large Language Models (LLMs), the tendency of AI to prioritize user agreement over objective truth, by developing agents that promote critical thinking through constructive disagreement. The student researcher will first process a large-scale social media dataset and establish a robust validation pipeline to prepare the data for public research use. Using the Hugging Face TRL library, the student will then run controlled experiments to optimize hyperparameters for generating plausible, well-reasoned counter-arguments. This role requires strong Python programming skills and a foundational understanding of Natural Language Processing (NLP) via the Hugging Face ecosystem.
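
A sketch of the kind of validation step that precedes any training is shown below: deduplicate posts, drop degenerate ones, and pseudonymize user handles with pandas. Column names and thresholds are assumptions about the dataset.

    # Deduplicate, drop degenerate posts, and pseudonymize user handles before model work.
    import pandas as pd

    def validate(df: pd.DataFrame) -> pd.DataFrame:
        df = df.dropna(subset=["text"]).drop_duplicates(subset="text")
        df = df[df["text"].str.len().between(20, 2000)].copy()   # drop empty/degenerate posts
        df["text"] = df["text"].str.replace(r"@\w+", "@user", regex=True)  # pseudonymize handles
        assert df["text"].notna().all(), "null text slipped through"
        return df.reset_index(drop=True)

    raw = pd.DataFrame({"text": ["@alice I completely agree with everything you said!",
                                 "Actually, the evidence points the other way; here is why...",
                                 "ok", None]})
    print(validate(raw))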

Expected research outcomes: Publications and poster presentations.

Expected learning outcomes: Students will learn how to use different LLM engines, how to evaluate them, how to use the Transformers library, how to present their research and how to write research papers.

+ Bridging the Instruction-Perception Gap (mentor: Shrikanth Narayanan)

This project addresses three core challenges in the reliability of AI-generated speech: mitigating hallucinations in automated judges, identifying missing safety and stylistic dimensions, and achieving cost-effective scalability. The student will work on developing a reliable audio-judge framework. They will categorize failures in current state-of-the-art (SOTA) models (such as VoiceSculptor), specifically looking for the "Instruction-Perception Gap": instances where the model claims to follow an instruction but human listeners disagree. The student will then experiment with different LLM prompts to improve reliability.
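
One way to quantify the gap is sketched below: compare the automated judge's verdicts with human listener labels on the same samples and report agreement with Cohen's kappa; the labels are toy placeholders.

    # Compare automated judge verdicts with human listener labels and report agreement.
    from sklearn.metrics import cohen_kappa_score, confusion_matrix

    # 1 = "instruction was followed", 0 = "instruction was not followed" (toy labels)
    judge_says = [1, 1, 1, 0, 1, 1, 0, 1]
    humans_say = [1, 0, 1, 0, 0, 1, 0, 1]

    kappa = cohen_kappa_score(judge_says, humans_say)
    print("judge-human agreement (Cohen's kappa):", round(kappa, 2))
    print(confusion_matrix(humans_say, judge_says))

    # Samples where the judge claims compliance but listeners disagree are the gap.
    gap = [i for i, (j, h) in enumerate(zip(judge_says, humans_say)) if j == 1 and h == 0]
    print("instruction-perception gap on samples:", gap)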

Expected research outcomes: Publications and poster presentations, an open-source evaluation toolkit, and a curated dataset of speech samples where standard AI judges fail.

Expected learning outcomes: Students will learn how to work with spoken language models and instruction-guided text to speech, how to present their research and how to write research papers.

+ AI Agents for Testbed Experimentation (mentor: Alexey Tregubov)

Testbed experimentation requires multiple complex steps to allocate resources, initialize them, and connect to them. Different testbeds have their own constraints on the ordering and the details that the steps require, and their own interfaces for users to carry out the steps. This complexity and diversity make for a steep learning curve for new users. This project will build AI agents for testbed experimentation. Agents will use LLMs and testbed-specific documentation to learn how to execute basic tasks (e.g., allocate resources, open terminal access to a given resource, install software packages on allocated resources) and will offer a chat-based UI that lets users specify their experimental needs. Agents will then either ask for clarification, when needed, or translate user wishes into the necessary testbed actions and execute them. Research questions include: which basic testbed operations agents should support; how to design an AI agent architecture with safety and security safeguards that protect user resources and ensure fair use of testbed resources; and how portable AI agents are across testbeds.
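
One of the safeguards mentioned above could take the form sketched below: the agent may only execute operations from an explicit whitelist, and every action is logged before it runs. The planner output, testbed client, and operation names are placeholders.

    # The agent may only execute whitelisted operations, and every action is logged first.
    ALLOWED_OPERATIONS = {"allocate_resources", "open_terminal", "install_package"}

    def execute(plan, testbed, log):
        """plan: list of (operation, kwargs) pairs produced by the LLM planner (placeholder)."""
        for operation, kwargs in plan:
            if operation not in ALLOWED_OPERATIONS:
                log.append(("refused", operation, kwargs))
                raise PermissionError(f"operation '{operation}' is not whitelisted")
            log.append(("executing", operation, kwargs))
            getattr(testbed, operation)(**kwargs)        # testbed client is a placeholder

    class FakeTestbed:                                   # stand-in for a real testbed API
        def allocate_resources(self, nodes): print(f"allocated {nodes} nodes")
        def install_package(self, name): print(f"installed {name}")

    log = []
    execute([("allocate_resources", {"nodes": 2}),
             ("install_package", {"name": "tcpdump"})], FakeTestbed(), log)
    print(log)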

Expected research outcomes: Publications and poster presentations, AI agent design and prototype deployment on SPHERE testbed.

Expected learning outcomes: Students will learn how to design and build AI agents and how to evaluate their performance. They will learn how to implement safeguards for AI agents.