Theses & Reports

Instructions for submitting a technical report or thesis.

You can find technical reports published prior to 1990 archived here.

Title

Authors

Year

Ph.D. Thesis 2025 Designing Efficient and Equitable Networked Systems for Mobile Users in Emerging Regions Asim, Rohail

Title: Designing Efficient and Equitable Networked Systems for Mobile Users in Emerging Regions

Candidate: Asim, Rohail

Advisor(s): Yasir Zaki

Abstract:

Global improvements in network infrastructure have enabled the development of exciting applications across a broad spectrum. These developments range from research powering lightweight educational, informative, and community-building services in rural and developing regions with poor internet accessibility and hardware to Collaborative Extended Reality applications that push the limits of today's state-of-the-art network infrastructure with the aim of realizing ideas that, prior to recent advancements, existed only in fiction. Across this spectrum, significant challenges restrict the development and deployment of new applications. In emerging regions, the obstacles are poor connectivity, limited access to high-performance devices, and unaffordable service costs. In regions with state-of-the-art network infrastructure, next-generation networked applications such as immersive reality and large-scale AI systems introduce unprecedented demands on bandwidth, latency, and sustainability. This thesis addresses these twin challenges of digital inequality and network inefficiency by developing new systems and methodologies that operate across both the application and transport layers of the Internet stack.

In the first part of this dissertation, we present a series of lightweight web access systems designed for low-end phones, offline environments, and bandwidth-constrained regions. Through a global measurement study of 56 cities, we quantify disparities in page load times, web complexity, and mobile affordability. We then introduce Lite-Web, a browser-level rewriting system that accelerates existing websites on low-end devices, and MAML, a markup abstraction for building simplified and visually consistent pages. These systems are deployed in GAIUS, a hyperlocal, offline-first web ecosystem adopted in communities across Kenya, Bangladesh, and India. These web simplification efforts enabled internet access in regions with poor connectivity and hardware constraints. However, many emerging regions lack the network infrastructure needed to deliver even lightweight, simplified webpages to the people living there. To address this, we also design Sonic, a novel hybrid system that broadcasts pre-rendered web content over FM radio and supports interaction through SMS, enabling access in disconnected regions such as rural Cameroon.

In the second part of the dissertation, we turn our focus to the transport layer, where emerging applications face severe limitations from current congestion control protocols. Using a new benchmarking framework, we evaluate the performance of state-of-the-art CCAs across synthetic and real 5G networks. Our analysis reveals significant mismatches between protocol behavior and the requirements of next-generation collaborative and immersive applications. To address this, we design Hera, a QoE-aware modular framework for next-generation immersive applications. By bridging the gap between application-level responsiveness and network-level adaptability, Hera lays the foundation for more scalable, robust, and high-fidelity multi-user immersive experiences.

Together, these contributions demonstrate how cross-layer design, from simplified content to smarter transport, can dramatically improve web accessibility, application quality of experience (QoE), and sustainability in both high-demand and underserved settings. The work advances a broader vision for an inclusive and efficient Internet: one that adapts to user constraints, application demands, and the infrastructural realities of the global majority.
Ph.D. Thesis 2025 On The Applications of Coarse Network Geometry to Personalized Immuno-Oncology Bannon, James

Title: On The Applications of Coarse Network Geometry to Personalized Immuno-Oncology

Candidate: Bannon, James

Advisor(s): Bud Mishra

Abstract:

Immune checkpoint inhibitors (ICIs), also called immune checkpoint blockers, are a promising category of targeted therapy for solid tumors. Predicting which patients will respond to ICI therapy remains an open problem under active investigation. This thesis aims to improve the precision with which immune checkpoint inhibitors are prescribed. By focusing on one type of biological measurement (whole-tumor shotgun RNA sequencing data, which we call bulk RNA-seq), we are able to deeply explore the potential and limits of predictors built from this kind of measurement. Two of the algorithms presented here are based on a notion of graph curvature, which we believe has extensive promise in bioinformatic inquiry.

The first part of this thesis performs a rigorous permutation testing evaluation of machine learning models for the task of predicting therapy response, which we cast as a binary classification problem. We show that bulk RNA-seq data contains predictive signal but that there is an upper limit to ML model efficacy, which could potentially be remedied by curating larger data sets or augmenting RNA-seq data with other biological measurements.
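
As a minimal sketch of what a permutation test for a binary classifier can look like (an illustration only, not the evaluation protocol used in the thesis), the Python below shuffles the true labels many times and asks how often the shuffled labels agree with the predictions at least as well as the real labels do. The function names, the toy labels, and the add-one correction are all assumptions made for this example.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Return (observed accuracy, permutation p-value).

    The null hypothesis is that predictions are unrelated to the labels,
    so shuffling the labels should not change accuracy systematically.
    """
    rng = random.Random(seed)
    observed = accuracy(y_true, y_pred)
    shuffled = list(y_true)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if accuracy(shuffled, y_pred) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value


# Toy data (hypothetical): 20 patients, predictions wrong on two of them.
labels = [1, 0] * 10
preds = list(labels)
preds[0], preds[1] = 0, 1

observed, p_value = permutation_p_value(labels, preds)
print(f"accuracy={observed:.2f}, p-value={p_value:.4f}")
```

A small p-value suggests the predictions carry signal beyond chance; with real cohorts the same idea is usually applied to cross-validated scores rather than a single set of predictions.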

The next part presents a modular pipeline for the discovery of biomarkers from bulk RNA-seq data. We contextualize gene expression measurements using a protein-protein interaction (PPI) network and then use a notion of graph curvature to find (pairs of) genes in the PPI network that could serve as potential biomarkers. Our candidate biomarkers are evaluated using an extensive literature search and transfer learning experiments. We also provide a harmonized collection of drug-specific candidate markers found through rank aggregation that we believe merit further study.

Lastly, we cluster patients in an unsupervised manner using discrete Ollivier-Ricci Flow (ORF). Our method surfaces populations with distinct survival curves, which in turn allows us to find many potential biomarkers, including gene expression modules. We believe the algorithm may be of independent interest for clustering other datasets in a diverse set of research areas.

As a result of the work here we have provided novel algorithmic techniques for analyzing (biological) data and advanced the state of the art in finding biomarkers for ICI therapy.
Ph.D. Thesis 2025 Modern machine learning methods for protein design Berenberg, Daniel

Title: Modern machine learning methods for protein design

Candidate: Berenberg, Daniel

Advisor(s): Richard Bonneau, Kyunghyun Cho

Abstract:

Designing biosynthetic molecules such as proteins is critical for applications in therapeutics and agriculture, yet the vast sequence space and complex functional landscape pose significant challenges.

Previous design workflows rely on clustering, mechanistic modeling, or directed evolution and are often constrained by hand-crafted heuristics and domain-specific biases. Advances in deep generative modeling and protein databases of unprecedented size present an opportunity to apply modern machine learning techniques.
In this work, we develop methods to generate and score protein sequences. We propose several steering and guidance techniques that balance data-driven exploration with expert-guided refinement.

Leveraging established classifications of antibodies, we enable targeted redesign of designated regions for applications such as affinity maturation and framework optimization. Expanding the scope to general sequence design, we show effective classifier-guided generation of protein sequences using a novel sequence denoising autoencoder. Finally, we investigate the utility of natural language text embeddings in classifier-free generation and show the capabilities of text conditioned models on downstream generative modeling tasks.

Our work provides a spectrum of methods that transition from bespoke, domain-specific approaches toward a generalized, human-centric framework for modern protein engineering and molecular programming.
Ph.D. Thesis 2025 Fair and Explainable Machine Learning: Estimating Bias, Detecting Disparities, and Designing for Algorithmic Recourse Boxer, Kate

Title: Fair and Explainable Machine Learning: Estimating Bias, Detecting Disparities, and Designing for Algorithmic Recourse

Candidate: Boxer, Kate

Advisor(s): Daniel Neill

Abstract:

This dissertation investigates algorithmic bias and explainability from the perspective of an individual's interactions with computational models that have an impact on their circumstances, including those influencing their environmental conditions and those used during institutional decision-making. Accordingly, this dissertation focuses on three subtopics within this broad field: estimating data bias in datasets that inform policy decisions, auditing for predictive bias, and multi-objective formulations for systems that provide algorithmic recourse.

In relation to estimating data bias in datasets utilized to inform governmental resource allocation, we introduce two methods—a novel grouping algorithm for statistical significance testing and a custom latent variable model—to detect under-reporting in citizen-generated data. This introduces a domain-specific framework that is instrumental for practitioners interested in making data-informed policy decisions using self-reported data collected from populations located in urban settings. To audit for predictive bias, we introduce a domain- and model-agnostic framework for detecting statistically significant predictive biases in model outputs affecting both marginal and intersectional subpopulations of a target population through novel pattern detection methods for subgroup scanning, where predictive biases take the form of group-fairness violations.

Lastly, we propose a set of principles aimed at ensuring that systems that provide algorithmic recourse materially increase individual agency. Based on these principles, we endorse specific design choices to ensure the reliability of recommendations, develop burden-based measurements to assess the accessibility and fairness of these systems, and train algorithmic decision-makers that uphold these principles when used in systems that provide algorithmic recourse.

Collectively, these works represent key methodologies to detect data bias and predictive bias, spanning both context-specific and domain-agnostic settings, and also contribute to an effort to fundamentally shift institutional decision-making to ensure that algorithmic decision-makers are designed in such a way that individuals have means to achieve favorable outcomes.
Ph.D. Thesis 2025 Simple Structures in Neural Networks: On Expressiveness, Optimization and Data Distribution Chen, Lei

Title: Simple Structures in Neural Networks: On Expressiveness, Optimization and Data Distribution

Candidate: Chen, Lei

Advisor(s): Joan Bruna

Abstract:

In this era of Large Language Models (LLMs) and other giant neural networks, we aim to analyze simplified settings from scratch, as foundational steps towards understanding the functionality of the giant models. We present our understanding from three aspects. On expressive power, we investigate the function class of simplified graph networks, i.e., Graph-Augmented Multi-layer Perceptrons (GA-MLPs), against the classic Graph Neural Networks (GNNs) using measurements of graph isomorphism testing and counting attributed walks. On optimization, we theoretically study instabilities from large learning rates in training neural networks, i.e., Edge of Stability. We investigate the conditions under which the loss landscape contains such unstable training trajectories, especially those oscillating in a low-dimensional subspace. Then we leverage this property in simple, yet representative, learning problems in a teacher-student style. On data distribution of reasoning tasks, we propose a decomposition of next-token prediction into two parts: in-context reasoning and distributional association. We study this decomposition empirically and theoretically in a controlled synthetic setting, and find that feed-forward layers tend to learn simple distributional associations such as bigrams, while attention layers focus on in-context reasoning. Finally, we discuss how such an understanding of next-token predictions and feed-forward layers could be applied to some recent developments of LLMs.
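
For readers unfamiliar with why large learning rates cause instability, the sketch below shows the classical one-dimensional picture that is commonly used as background for Edge of Stability. It is not the analysis from the thesis. On the quadratic loss f(x) = (sharpness / 2) * x^2, each gradient descent step multiplies x by (1 - lr * sharpness), so the iterates shrink only when lr is below 2 / sharpness. The function name and the numeric values are assumptions chosen for illustration.

```python
def gradient_descent(sharpness, lr, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2 and return the final x."""
    x = x0
    for _ in range(steps):
        gradient = sharpness * x
        x -= lr * gradient  # equivalent to x *= (1 - lr * sharpness)
    return x


sharpness = 4.0               # curvature (second derivative) of the toy loss
threshold = 2.0 / sharpness   # classical stability threshold: 0.5

stable = gradient_descent(sharpness, lr=0.45)    # per-step factor is -0.8
unstable = gradient_descent(sharpness, lr=0.55)  # per-step factor is -1.2

print(f"threshold lr = {threshold}")
print(f"lr=0.45 -> |x| = {abs(stable):.2e}")
print(f"lr=0.55 -> |x| = {abs(unstable):.2e}")
```

In both runs the iterate flips sign at every step; below the threshold the oscillation decays, and above it the oscillation grows.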
Ph.D. Thesis 2025 Distributed Randomness in Adversarial Settings Choi, Kevin

Title: Distributed Randomness in Adversarial Settings

Candidate: Choi, Kevin

Advisor(s): Joseph Bonneau

Abstract:

Distributed randomness in adversarial settings concerns the problem of jointly computing a random output in a network of mutually untrusting participants such that the output is not predictable or biasable by any participant or any coalition of participants. A distributed randomness beacon (DRB) is a service that periodically emits random outputs through such distributed randomness protocols and has found applications in cryptographically verifiable lotteries and gaming as well as leader election in distributed systems and consensus algorithms. In the past decade, the landscape of DRBs has evolved, with many DRB protocols relying on ad hoc heuristics rather than structured design principles. While this bottom-up approach has led to interesting integrations of cryptographic techniques, establishing a unifying framework of DRBs has remained open prior to this work. Similarly, the consideration of security properties of DRBs, such as unbiasability and unpredictability, has typically been restricted to specific settings.
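
To make the problem concrete, here is a minimal sketch of the textbook commit-reveal approach to jointly generating randomness. It is offered only as a baseline illustration and is not one of the protocols developed in the thesis. Each participant first publishes a hash commitment to a secret value, then reveals the value, and the output is the XOR of all revealed values. The function names, the choice of SHA-256, and the three-participant setup are assumptions for this example.

```python
import hashlib
import secrets


def commit(value: bytes, nonce: bytes) -> bytes:
    """Hash commitment: hides the value until the nonce and value are revealed."""
    return hashlib.sha256(nonce + value).digest()


def verify(commitment: bytes, value: bytes, nonce: bytes) -> bool:
    """Check that a revealed (value, nonce) pair matches the earlier commitment."""
    return secrets.compare_digest(commitment, commit(value, nonce))


def combine(values) -> bytes:
    """XOR all 32-byte contributions into a single 32-byte output."""
    out = bytes(32)
    for value in values:
        out = bytes(a ^ b for a, b in zip(out, value))
    return out


# Phase 1: every participant picks a secret contribution and publishes a commitment.
participants = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(3)]
commitments = [commit(value, nonce) for value, nonce in participants]

# Phase 2: participants reveal; everyone checks the reveals against the commitments.
all_valid = all(verify(c, v, n) for c, (v, n) in zip(commitments, participants))

# The output is unpredictable as long as at least one contribution was random,
# provided every participant actually reveals.
beacon_output = combine(value for value, _ in participants)
print(all_valid, beacon_output.hex())
```

A known weakness of this baseline is that the last participant to reveal can see the would-be output and choose to withhold, which biases the result. Weaknesses of this kind are part of what motivates more robust designs.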

This dissertation seeks to address these gaps by adopting a top-down approach to realizing a distributed randomness beacon. We conceptualize the broader design space of DRBs, introduce comprehensive security definitions applicable to all DRBs, and consider a variety of practical deployment scenarios. Simultaneously, we compare protocols based on their communication and computational efficiency and also highlight the functionality of various cryptographic building blocks in light of DRBs rather than solely focusing on their technical details.

Furthermore, we shed light on the security gap that exists between theoretical models and real-world scenarios, where most theoretical DRBs rely on the honest majority assumption (the network assumption that more than half of the nodes are honest), which has been shown to break down in practice (e.g., the $625 million hack of Axie Infinity's Ronin network in 2022). Recognizing this issue, we propose two new optimized DRB protocols, Bicorn and Cornucopia, that offer robustness even in the presence of a dishonest majority.
Ph.D. Thesis 2025 Computational Design through Differentiable Elastodynamic Simulation, Parametrized by Geometric Techniques, for Applications in Soft Robotics Gjoka, Arvi

Title: Computational Design through Differentiable Elastodynamic Simulation, Parametrized by Geometric Techniques, for Applications in Soft Robotics

Candidate: Gjoka, Arvi

Advisor(s): Daniele Panozzo, Denis Zorin

Abstract:

Traditionally, design of physical objects is a tedious task that involves many time-consuming cycles of design and experimentation, often done by area experts who specialize in one or the other. This is especially true for soft objects whose large displacements are central to their form and function, because it is hard to predict during the design phase how the object will behave. In this talk, we will explore viewing computational design through the lens of differentiable simulation. First, we will outline a framework for differentiable FEM simulation with robust contact handling, which allows us to extract gradients with respect to shape, material parameters, boundary conditions, etc. Next, we discuss how this framework allows us to explore computational design of highly deformable objects, starting with pneumatic soft robots and then looking at the modeling and design of deformable capacitive thin-film sensors that can be draped over objects (such as soft robots or human body parts). For each, we validate our simulation and optimization results on fabricated objects.
Ph.D. Thesis 2025 Noise and Games in Distribution Estimation: From Survival Analysis to Generative Models Goldstein, Mark

Title: Noise and Games in Distribution Estimation: From Survival Analysis to Generative Models

Candidate: Goldstein, Mark

Advisor(s): Rajesh Ranganath

Abstract:

This thesis presents new machine learning methodologies for generative modeling and survival analysis. We introduce novel approaches for specifying, training, and sampling from diffusion-based generative models, including auxiliary-variable and nonlinear noising processes, data-dependent base distributions, and hybrid strategies that combine deterministic and stochastic sampling. We validate these methods across applications such as images, videos, partial differential equations, and active matter systems, showing that simple choices at training and inference time can significantly impact both efficiency and performance.

Shifting focus to a different kind of distribution estimation problem, survival analysis (time-to-event modeling), we propose Inverse-Weighted Survival Games, an optimization framework that handles censoring (i.e., missing data) through the simultaneous estimation of failure and censoring distributions. This approach improves both discriminative performance and calibration on real-world medical datasets.

Returning to diffusion models with new optimization tools, we introduce GameFlow, a novel training method for consistency models (i.e., flow maps that directly and quickly solve the diffusion sampling process). GameFlow uses a game-like formulation to efficiently train consistency models from scratch via Jacobian-vector products, avoiding the need for adversarial objectives, model inverses, or pre-trained models.

Collectively, these works leverage latent variables, differential equations, estimating equations, Monte Carlo gradient methods, and stop-gradient games—highlighting how a shared set of computational tools can be broadly useful for tackling distribution estimation across diverse domains.
Ph.D. Thesis 2025 Understanding Inductive Bias in the Era of Large-Scale Pretraining with Scientific Data Gruver, Nathaniel

Title: Understanding Inductive Bias in the Era of Large-Scale Pretraining with Scientific Data

Candidate: Gruver, Nathaniel

Advisor(s): Andrew Wilson

Abstract:

Inductive biases are crucial for machine learning in data-scarce settings, but their optimal role in data-rich regimes remains poorly understood. This thesis challenges the conventional wisdom that strict architectural constraints are necessary for modeling numerical data, particularly in physics and chemistry. Through systematic empirical studies, I demonstrate that data-driven approaches can effectively learn both physical symmetries and broader numerical patterns without explicit architectural constraints. First, I show that transformer models trained with data augmentation can acquire stronger equivariance properties than convolutional neural networks, despite lacking built-in symmetry constraints. Building on this insight, I investigate whether pretrained language models can learn generalizable numerical capabilities from text alone. By studying the behavior of language models in many settings, I demonstrate that text pretraining induces a preference for simple functions that serves as a powerful inductive bias across numerical domains. This emergent bias enables large language models to outperform specialized architectures on benchmark tasks in time series forecasting and 3D structure prediction, achieving state-of-the-art results with minimal task-specific adaptation. However, these benefits do not extend universally. I identify molecular property prediction as a key limitation and trace this failure to fundamental constraints in discrete token representations. This work provides a comprehensive framework for understanding when learned biases can replace architectural constraints in numerical domains, with important implications for model design in scientific machine learning.
Ph.D. Thesis 2025 Computational Shape Design through Robust Physics Simulations Huang, Zizhou

Title: Computational Shape Design through Robust Physics Simulations

Candidate: Huang, Zizhou

Advisor(s): Denis Zorin, Daniele Panozzo

Abstract:

Additive manufacturing enables the fabrication of complex geometric structures tailored to specific material properties, with diverse applications ranging from lightweight yet strong aerospace components to customized shoe soles, prosthetic devices, and flexible robotic parts. However, due to the complexity of the geometry, novel techniques for engineering analysis and optimization are needed. Our research seeks to address these problems by developing robust and accurate physics simulation methods that can enhance the design process of complex structures.

This thesis introduces a physics-based simulation method for elastodynamics, incorporating collisions and friction, that resolves the artifacts in the state-of-the-art method and provides better robustness and efficiency. Further, the simulator is extended to support differentiability with respect to input physics parameters, enabling gradient-based inverse optimization applications such as optimal shape design and material inference. Specifically, we investigate the desired force response of shock-absorbing materials and leverage our differentiable simulator for shape optimization to achieve the desired behavior. The resulting microstructures are fabricated and validated through real-world experiments, demonstrating the accuracy and practical applicability of the proposed simulation framework.
Ph.D. Thesis 2025 Understanding and Mitigating Goal Misgeneralization in Language Models Joshi, Nitish

Title: Understanding and Mitigating Goal Misgeneralization in Language Models

Candidate: Joshi, Nitish

Abstract:

As Large Language Models (LLMs) are being widely used in various applications, it is critical that they are robust and generalize well. One reason LLMs might perform poorly after deployment is goal misgeneralization. Goal misgeneralization refers to the issue where an LLM performs well on the training distribution (e.g., high accuracy or reward), but performs poorly on the test distribution due to misgeneralization. Specifically, misgeneralization implies that the model has a systematic failure on the test distribution due to learning unintended functions, as opposed to performing randomly or lacking the capability to do well on the test distribution. This encapsulates various problems that the machine learning community has worked on, including spurious correlations, underspecification, and reward hacking.

This dissertation focuses on goal misgeneralization in language models and consists of the following components.

(1) For finetuning language models, if explicit knowledge of the spurious correlation that the model relies on is available, mitigating it is not too hard. We propose a new method to mitigate spurious correlations when such knowledge is not available; our method relies on complementary knowledge based on semantic corruptions. We empirically demonstrate that our method outperforms standard training methods.

(2) For methods that do rely on knowledge of semantics to mitigate spurious correlations, scalably discovering robust semantic features can be done through crowdsourcing, such as in counterfactual data augmentation. We critically analyze the discrepancy between theory and practice for this training method, where in practice it seems to give marginal to no benefits. We show that this occurs due to the difficulty in obtaining diversity in counterfactuals, and that this lack of diversity could even exacerbate spurious correlations.

(3) We take a step back and ask: Can we use a mitigation method for any spurious correlation encountered in language data? We argue that there are two main sources of spurious correlations in language data, and methods to mitigate and evaluate spurious correlations might not work well for both. One is when the feature is irrelevant to the label (e.g., extra spaces), and the other is when the feature's effect on the label depends on the context (e.g., negation). We formalize this distinction using causal models and demonstrate empirically why the distinction is necessary.

(4) We discuss other goal misgeneralization issues beyond spurious correlations in finetuning. First, we demonstrate how goal misgeneralization could occur during pretraining. Specifically, focusing on causal reasoning, we show that language models have learned unintended position bias and post hoc fallacy from the pretraining data. We also show that scaling language models alone does not address this misgeneralization. Next, we show that underspecification in in-context learning is also an instance of goal misgeneralization, and study the feature preferences of language models in this setting.

Finally, we discuss future directions focusing on other goal misgeneralization issues in language models. We briefly mention goal misgeneralization in the context of safety for LLM-agents, and reward hacking during reinforcement learning in language models.
Ph.D. Thesis 2025 Decision Problems for Global Protocol Specifications Li, Elaine

Title: Decision Problems for Global Protocol Specifications

Candidate: Li, Elaine

Advisor(s): Thomas Wies

Abstract:

Concurrency is ubiquitous in modern computing, message passing is a major concurrency paradigm, and communication protocols are therefore a key target for formal verification. Writing implementations for each protocol participant individually, such that their composition is free from communication errors and deadlocks, is challenging and error-prone. In response, various verification methodologies center on the construct of a global protocol. Global protocol specifications synchronously describe the message-passing behaviors of all protocol participants from a bird’s-eye view, and thus rule out large classes of communication errors by construction. Global protocols are adopted in industry by the ITU standard and UML, and are widely studied in academia in the form of high-level message sequence charts, session types and choreographic programs. Application domains for this top-down verification methodology include cryptographic security, cyber-physical systems, and web services.

This thesis contributes decision procedures for three problems central to global protocol verification: implementability, synthesis, and subtyping. Implementability asks whether a protocol admits a distributed implementation, synthesis in turn computes one, and subtyping asks whether an admissible implementation can be substituted in whole or part to yield fewer behaviors. This thesis additionally contributes a Rocq mechanization of a precise implementability characterization for infinite-state protocols, and the SPROUT tool for automatically verifying such protocols.
Ph.D. Thesis 2025 Governing the Scientific Journals: What Big Data and Computational Modeling Tell Us about the Policies That Shape Editorial Boards Liu, Fengyuan "Michael"

Title: Governing the Scientific Journals: What Big Data and Computational Modeling Tell Us about the Policies That Shape Editorial Boards

Candidate: Liu, Fengyuan "Michael"

Advisor(s): Talal Rahwan

Abstract:

Academic journal editors are the gatekeepers of science, collectively shaping the content of scientific publications and setting standards for their fields of research. Yet, most editors take on this role as a form of community service while maintaining their primary careers as research-active scientists. This dual role raises two key questions at the heart of this thesis: (1) To what extent are editors representative of scientists at large in terms of their demographic composition? (2) How prevalent are conflicts of interest among academic editors? To address these questions, I construct two large, novel longitudinal datasets of academic editors and provide quantitative evidence on both fronts. Furthermore, these datasets enable me to evaluate the impact of policy interventions designed to (1) increase editorial board diversity and (2) mitigate conflicts of interest. By leveraging natural experiments identified in historical archives of journal policy documents, I analyze cases where such policies have been implemented and evaluate their effectiveness. Finally, I discuss the broader implications of big data and computational modeling for quantitative policy research.
Ph.D. Thesis 2025 Machine Learning for Simulations Otness, Karl

Title: Machine Learning for Simulations

Candidate: Otness, Karl

Advisor(s): Joan Bruna, Benjamin Peherstorfer

Abstract:

Computational modeling of physical systems is a core task of scientific computing. Machine learning methods can extend traditional approaches to modeling partial differential equations and hold the potential to simplify the modeling process and improve simulation accuracy and performance. In this thesis we explore the use of neural networks to learn the behavior of systems from data. We evaluate the performance-accuracy tradeoffs involved in their use as emulators, and use the insights gained to explore a specific application: learning subgrid parameterizations for climate models. For this task we propose two novel techniques to improve the accuracy and stability of the learned parameterizations: tailoring architectures to incorporate favorable inductive biases, and augmenting training data to encourage stability.
Ph.D. Thesis 2025 Language Models at the Scale of Evolution Rives, Alexander

Title: Language Models at the Scale of Evolution

Candidate: Rives, Alexander

Advisor(s): Rob Fergus, Yann LeCun

Abstract:

I will describe the development of the evolutionary scale modeling (ESM) program, which proposes to solve an inverse problem across evolution to learn the biology of proteins from their sequences at the scale of life. Beginning from the idea that the sequences of proteins contain an image of biology in their patterns, this thesis shows that language models trained on protein sequences spanning the natural diversity of the Earth, by learning to predict which amino acids evolution chooses, develop feature spaces that reflect the immense scope and complexity of protein biology, containing both known and unknown biology. Biological structure and function emerge in the representations of the models. This emergence is shown to occur in a direct linkage with improvements in the language modeling of sequences. The representation space has an ordered structure in which proteins are organized according to their underlying biology, and directions correspond to meaningful biological variations. Attention patterns materialize in the neural network that correspond to the folded three-dimensional structure of proteins. The probabilities assigned to amino acids within a given sequence context reflect protein function and predict the effects of mutations. The representations learned by protein language models constitute a general and transferable feature space which supports the discovery and generation of new biology. This has enabled an effort to reveal the structures of hundreds of millions of metagenomic proteins for the first time. The thesis concludes with experimental characterizations of proteins created by language models, which demonstrate that the feature space learned from natural proteins supports generating proteins beyond those in nature.
Ph.D. Thesis 2025 Static Analysis Tools For Network-Device Stacks Ruffy, Fabian Abstract | PDF

Title: Static Analysis Tools For Network-Device Stacks

Candidate: Ruffy, Fabian

Advisor(s): Anirudh Sivaraman

Abstract:

Networking devices are becoming more programmable. With this trend, network-device software, dedicated to forwarding packets and interpreting instructions from the network control plane, now covers more functionality and also increases in complexity. Faults in network-device software can have an outsized impact on a network. Hence, network operators and device manufacturers are reaching for static analysis to ensure that this code is both functionally correct and well-optimized. Network-device software is extensive and often written in general-purpose languages such as Python or C++. These languages contain loops, aliasing, or indirection, which can make developing effective static analysis techniques challenging.

In this dissertation, we explore an opportunity to build better static analysis tools for network-device software. We use P4, a domain-specific language for network programming, as our foundation. We develop an execution model for P4 which describes the behavior of a network device, and we reify this execution model using satisfiability modulo theories (SMT), expressed in quantifier-free bit vectors. We refine this execution model through three distinct projects and show its utility by adopting techniques from software engineering research that are theoretically powerful but were considered practically limited for general-purpose languages. Applying our specialized techniques, we were able to find over 50 bugs in network-device software which cause incorrect packet-processing. Furthermore, we reuse our model to optimize network programs based on their control-plane configuration, which can reduce resource usage and increase packet-processing performance.

Our SMT-based execution model for packet processing is protocol-independent, device-agnostic, and precise enough for bug-finding and program optimization. We attribute these successes to tailoring our model to a DSL specialized in packet processing while also appropriately exploiting the restrictions of this DSL.
Ph.D. Thesis 2025 Towards Generally Intelligent Robots that Simply Work Everywhere Shafiullah, Nur Muhammad "Mahi" Abstract | PDF

Title: Towards Generally Intelligent Robots that Simply Work Everywhere

Candidate: Shafiullah, Nur Muhammad "Mahi"

Advisor(s): Lerrel Pinto

Abstract:

Applications of machine learning have touched the lives of common people in innumerable novel ways. Robotics today seems poised to make such an impact, too. Yet the current state-of-the-art systems in robotics, whether a parkouring humanoid from Boston Dynamics or a T-shirt-folding robot from Google DeepMind, are specialists in their own environments, relying either on instrumenting and extensively modeling the scene or on collecting weeks or months of data on the exact same setup.

In this thesis, we focus on building generally intelligent robots that simply work everywhere by studying the interplay of representation, data, and memory in robotics. To create robots that can address the broad and diverse challenges of operating in messy and unstructured environments everywhere, this thesis investigates three fundamental directions. We first look into algorithms that optimize the use of data in robot learning since data, as fuel, plays a critical role in creating broadly capable ML systems. We not only create efficient, self-supervised representations of the robots' perception, but also develop action representations that enable scaling to large, uncurated demonstration datasets. Then, we take a deep dive into creating systems – bridging algorithms and hardware – that can create and learn from robot data in the wild. Such systems enable few-shot and then zero-shot behavior generalization in novel homes in New York City and beyond. Finally, to enable generally intelligent robot behavior that extends over time and space, we construct neural data structures called spatio-semantic memory for robots. These memory modules enable scaling in-the-wild autonomous robot behavior from seconds to hours, and beyond.
Ph.D. Thesis 2025 Shape Design, Repair and Optimization Wang, Siqi Abstract | PDF

Title: Shape Design, Repair and Optimization

Candidate: Wang, Siqi

Advisor(s): Denis Zorin, Daniele Panozzo

Abstract:

Digital geometric models are fundamental to modern engineering, media, and manufacturing. However, models created by artists in the wild often contain ambiguities that preclude their use in simulation and manufacturing, while complex designs may need to be simplified for efficiency or functionally optimized to meet competing aesthetic and performance goals. This necessity for robust, useful, and high-performing geometry creates a critical need for advanced computational techniques that can automatically repair, simplify, and optimize digital shapes. Our research addresses these challenges by developing a suite of shape processing and optimization methods designed to enhance the quality and functionality of geometric models for a range of applications.

This thesis delivers solutions across three key areas. First, we present a Bézier curve simplification framework that simplifies complex vector graphics while preserving visual fidelity by defining a curve-to-curve distance metric and repeatedly conducting local segment removal operations. Second, we propose a solid or shell labeling technique for artist-created surface meshes that lack a well-defined interior, guided by a sparse set of user inputs. These labels reduce ambiguity and enable the construction of valid volumetric meshes for downstream applications. Finally, we introduce two powerful shape optimization frameworks: one that leverages neural network-based models to independently control the tactile properties and visual appearance of a texture, and another that optimizes the geometry and position of radiofrequency (RF) receive coil arrays to increase signal-to-noise ratio (SNR) in magnetic resonance imaging (MRI).
Ph.D. Thesis 2025 Mechanisms to Advance the Adoption of Programmable High-speed Packet-Processing Pipelines Wang, Tao Abstract | PDF

Title: Mechanisms to Advance the Adoption of Programmable High-speed Packet-Processing Pipelines

Candidate: Wang, Tao

Advisor(s): Anirudh Sivaraman, Aurojit Panda

Abstract:

Today's programmable high-speed packet-processing pipelines have enabled a wide range of network offloads, e.g., in-network telemetry and parameter aggregation in machine learning. However, these pipelines are not yet ready to serve a larger number of people and applications.

This dissertation looks into this problem from two specific aspects, i.e., multitenancy and general L7 processing, and argues that new hardware primitives together with software toolchains are necessary to drive wider adoption of high-speed packet-processing pipelines among application developers. Specifically, in this dissertation, we propose two systems: (1) Menshen designs isolation mechanisms to support multiple programs running atop a single pipeline without interfering with each other; (2) QingNiao targets L7 dispatch, a type of L7 processing that is pervasive in the networking infrastructure layer, and presents a holistic solution based on new hardware primitives and a programming model to support running such L7 processing on programmable pipelines.
Ph.D. Thesis 2025 Enhancing Computational Music Intelligence via Concept Alignment Wang, Ziyu Abstract | PDF

Title: Enhancing Computational Music Intelligence via Concept Alignment

Candidate: Wang, Ziyu

Advisor(s): Gus Xia

Abstract:

Recent advances in generative AI have led to impressive achievements in music generation. Yet, a fundamental challenge remains: how can these black-box models move beyond imitating music data to truly understand human creative intent and collaborate meaningfully with humans? We argue that the missing piece is a deeper alignment between humans and AI. This thesis introduces concept alignment as a framework to bridge the human creative process and machine behavior through various ways of concept manipulation. I explore this through three core directions: (1) concept representation, using disentangled latent codes to control musical attributes like pitch contour and texture; (2) concept organization, designing hierarchical models that structure musical ideas and abstractions; and (3) concept emergence, guiding models to discover symbolic representations directly from raw data in an unsupervised way. These contributions demonstrate how models can learn, organize, and reveal human-like concepts, opening a path toward more interpretable, controllable, and collaborative music AI.
Ph.D. Thesis 2025 Better Incentives: Performant and Private Machine Learning Xu, Mimee Abstract | PDF

Title: Better Incentives: Performant and Private Machine Learning

Candidate: Xu, Mimee

Advisor(s): Leon Bottou

Abstract:

Machine learning algorithms benefit from large and diverse datasets. However, business needs and research workflows are potentially at odds with the ownership of private data. Without sharing private data in their respective contexts, current privacy-enhancing solutions tend to, instead, compromise on performance or privacy.

This thesis addresses gaps between machine learning and data ownership by modeling a system of three parties: model owners, data owners, and overseers. Incentive issues between the parties are addressed with secure and confidential computation, consisting of Secure Multiparty Computation (S-MPC) and Fully Homomorphic Encryption (FHE). Though lesser-known in machine learning, these techniques can help support data rights.
- First, as data used for training tends to be owned by disparate parties, the first sub-problem pertains to whether unshared training data's utility can be evaluated without sharing it. We implemented influence-based appraisal functions that are compatible with efficient S-MPC computation, achieving 92.3% correlation with plain-text ground truth ranking for 100 datasets under induced class imbalance, and 96.0% under label-flipping, without the usability challenge of sensitive hyperparameters of training a joint model under S-MPC.
- Second, given the trend of deploying proprietary ML models where the inputs and outputs to those models are hidden, can the public audit privately-held data, especially in domains where encryption is the default? Using FHE for auditing triaging fairness in hospitals' emergency departments as an example, my prior work provided a qualitative description of the setup that can be applied to ease the tension between regulators and private data parties, without the need to decrypt private data.
- Finally, is it necessary to trade off data utility and privacy in low data domains? Our practical framework, Secure-KL (SKL), incurs no privacy leakage while enabling robust evaluation of additional data to combine with. Without making assumptions about the final downstream model, our dataset-divergence approximation, in secure computation, is consistent with plaintext divergence values by over 90%. We show it successfully identifies beneficial data partnerships for intensive care unit (ICU) mortality prediction, thereby improving downstream classifier performance for the source hospital. We also show that secure methods are more robust and reliable than alternatives of sharing a subset of data (medium leakage), using demographic information (low leakage), or selecting blind (high variance). With zero leakage, SKL allows all parties' data to remain private while entire datasets are utilized, eliminating a key roadblock towards orchestrating broader collaborations in healthcare with limited resources.
Ph.D. Thesis 2025 Diagnosing AI Misbehavior: Why Do Models Fail? Zhang, Anqi Abstract | PDF

Title: Diagnosing AI Misbehavior: Why Do Models Fail?

Candidate: Zhang, Anqi

Advisor(s): Jinyang Li, Aurojit Panda

Abstract:

As AI models become increasingly pervasive across critical domains, understanding and diagnosing their failures has become paramount for ensuring safety, reliability, and trust. This dissertation addresses the importance of diagnosing AI misbehavior across shifting deep learning paradigms — from classifiers to Graph Neural Networks (GNNs) to Large Reasoning Models (LRMs) — each exhibiting distinct failure modes that demand specialized diagnostic approaches.

In this thesis, we focus on different models to explore and diagnose model misbehaviors: (a) for classifiers, we introduce the Average Marginal Effect (AME), a scalable data attribution method that traces prediction errors back to problematic training data, achieving efficient attribution under the sparsity assumption; (b) for GNNs, we develop a novel long-distance targeted poisoning attack that reveals critical blind spots in GNN explanation tools, and adapt our AME method to locate poisoned subgraphs; (c) for large reasoning models, we design self-verification probes, which reveal that intermediate answer correctness signals are encoded in a reasoning model’s hidden states, and enable confidence-based early-exit strategies that reduce inference tokens without compromising accuracy. Our work advances both the understanding of AI misbehavior and the development of practical tools for building more trustworthy, efficient, and interpretable AI systems.
Ph.D. Thesis 2025 An Explicit Certified Method for Path Planning Problem of an SE(3) Robot Zhang, Zhaoqi Abstract | PDF

Title: An Explicit Certified Method for Path Planning Problem of an SE(3) Robot

Candidate: Zhang, Zhaoqi

Advisor(s): Chee Yap

Abstract:

The design and implementation of theoretically-sound robot motion planning algorithms is challenging, especially for robots with high degrees of freedom (DOF). This thesis presents an explicit, practical and certified path planner for a rigid spatial robot with 6 DOFs. The robot is a spatial triangle moving amidst polyhedral obstacles. A correct, complete and practical path planner for such a robot has never been achieved, and building one is widely recognized as a key challenge in robotics. We design such a planner by using the Soft Subdivision Search (SSS) framework, based on the twin foundations of ε-exactness and soft predicates. This SSS planner is a theoretical alternative to the standard exact algorithms, and provides much stronger guarantees than probabilistic or sampling algorithms.

In this thesis, we address technical challenges for the SE(3) robot. First, we establish the foundational theory of the SSS framework by proving a general form of the Fundamental Theorem of SSS. Second, we introduce a topologically correct data structure for non-Euclidean path planning in the SE(3) space. Third, we analyze the distortion bound of the SE(3) representation. Fourth, we design an approximate footprint and combine it with the highly efficient feature set technique, which leads to its soft predicate. Finally, we explicitly design the geometric primitives to avoid using a general solver of a polynomial system. This allows a direct implementation. These contributions represent a robust, practical, and adaptable solution to robot motion planning.
Ph.D. Thesis 2025 On the Diversity and Stability of Internal Representations in Deep Neural Networks Zhu, Jiachen Abstract | PDF

Title: On the Diversity and Stability of Internal Representations in Deep Neural Networks

Candidate: Zhu, Jiachen

Advisor(s): Yann LeCun

Abstract:

The quality of internal representations is fundamental to the performance and generalization capabilities of deep neural networks. However, standard training paradigms often produce representations that are suboptimal; they can suffer from feature redundancy and dimensional collapse, which harms transferability, and they rely on complex normalization layers to ensure stable training dynamics. This thesis addresses these critical challenges through a comprehensive investigation into methods that directly shape and control the properties of learned representations. First, we tackle the problem of feature diversity by introducing Variance-Covariance Regularization (VCReg), an explicit regularization method that encourages the network to learn high-variance and low-covariance features. By applying this principle to intermediate representations, we show that VCReg effectively mitigates neural collapse and gradient starvation. This leads to significant improvements in transfer learning performance across a wide range of tasks and modalities, including image classification, video action recognition, and long-tail learning scenarios.
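As a rough illustration of the "high-variance and low-covariance" idea, the sketch below computes two penalties on a batch of feature vectors in plain Python. The abstract does not give the loss, so everything here is an assumption: the hinge on the per-feature standard deviation, the squared off-diagonal covariance term, the `eps` constant, and the function name `variance_covariance_penalty` are all hypothetical choices, not the thesis's formulation.

```python
import math


def variance_covariance_penalty(features, eps=1e-4):
    """Return (variance_term, covariance_term) for a batch of feature vectors.

    Assumed formulation (not taken from the thesis):
    - variance_term penalizes features whose standard deviation is below 1
    - covariance_term penalizes squared off-diagonal covariance entries
    """
    n = len(features)       # batch size, must be at least 2
    d = len(features[0])    # feature dimension

    # Center each feature over the batch.
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]

    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

    # Hinge on the standard deviation of every feature.
    variance_term = sum(
        max(0.0, 1.0 - math.sqrt(cov[j][j] + eps)) for j in range(d)
    ) / d

    # Squared off-diagonal covariance, normalized by the dimension.
    covariance_term = sum(
        cov[i][j] ** 2 for i in range(d) for j in range(d) if i != j
    ) / d

    return variance_term, covariance_term


# Two perfectly redundant features: the covariance term is positive.
redundant = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
# Two uncorrelated features with enough spread: both terms vanish.
diverse = [[1.0, -1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]]

print(variance_covariance_penalty(redundant))
print(variance_covariance_penalty(diverse))
```

In this sketch, redundant features are penalized through the covariance term and collapsed (near-constant) features through the variance term, which is one way to read the regularizer's stated goal.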

Second, we study the challenge of training stability. Motivated by an empirical analysis of how normalization layers shape activation distributions, we introduce Dynamic Tanh (DyT), a simple, element-wise function designed to replace normalization layers entirely. We demonstrate that Transformers equipped with DyT can be trained stably without any normalization, matching or exceeding the performance of their conventional counterparts on benchmarks spanning computer vision, language modeling, and generative modeling. Taken together, the contributions in this thesis demonstrate that by controlling the statistical properties of internal representations—through both explicit regularization and principled architectural design—we can build deep learning models that are more robust, generalizable, and efficient.
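To make "a simple, element-wise function" concrete, here is a minimal sketch of a tanh-based squashing applied independently to each activation. The abstract does not state the formula, so the specific form `gamma * tanh(alpha * x) + beta`, the parameter names, and their default values are assumptions made for illustration. In a real model these would be learnable parameters rather than fixed constants.

```python
import math


def dyt(x, alpha=0.5, gamma=1.0, beta=0.0):
    """Apply an assumed Dynamic-Tanh-style map to each element of x.

    alpha scales the input before squashing; gamma and beta rescale and
    shift the output. All three are hypothetical fixed values here.
    """
    return [gamma * math.tanh(alpha * v) + beta for v in x]


activations = [-100.0, -1.0, 0.0, 1.0, 100.0]
squashed = dyt(activations)

# Unlike a normalization layer, no batch or token statistics are computed:
# each output depends only on its own input value.
for a, s in zip(activations, squashed):
    print(f"{a:8.1f} -> {s:+.4f}")
```

Extreme activations are squashed into a bounded range while values near zero pass through almost linearly, which is the kind of behavior the abstract attributes to normalization layers.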