Theses & Reports
Ph.D. Thesis
2024
On Efficient Instantiations of Secure Multi-Party Computation in Practice
Bienstock, Alexander
Title: On Efficient Instantiations of Secure Multi-Party Computation in Practice
Candidate: Bienstock, Alexander
Advisor(s): Yevgeniy Dodis, Marshall Ball
Abstract:
Secure Multi-Party Computation (MPC) is an area of cryptography that has been studied extensively since the 1980s. In full generality, MPC allows a set of mutually distrusting parties to privately compute a function of their inputs. That is, the parties interact in some protocol, and at the end obtain the output of the function, and nothing else. In the decades since the inception of MPC, great strides have been made towards making it more efficient. However, despite this progress, the use of MPC in practice still faces some shortcomings.
In this thesis, we take steps to mitigate two such shortcomings. The first deficiency we study relates to the communication networks in which MPC protocols operate. MPC protocols are usually designed assuming that all parties have pairwise secure communication channels which are stable; i.e., nodes never crash, messages always arrive on time, and so on. However, in the real world this is rarely the case: it is hard to sustain a stable connection between parties over long periods of time. One model introduced to address this deficiency is Fluid MPC (Choudhuri et al., CRYPTO 2021). In this model, parties are not required to stay online for long periods of time. Instead, parties come online for short stretches and work together in committees to compute some function. The benefit is that individual committees are much more likely to sustain stable connections over these shorter interactions. However, existing protocols in this model do not match the efficiency of traditional MPC protocols. In the first part of this thesis, we study Fluid MPC and, in particular, introduce Fluid MPC protocols whose efficiency matches that of traditional MPC.
The second deficiency of MPC which we study in this thesis is that general-purpose protocols are often still not efficient enough to be used in practice. One way to resolve this is to use protocols that are tailor-made for specific applications. One such application that has gained recent attention is Private Join and Compute (PJC). In this application, two parties come together with input sets and associated values for each item in their sets. The goal is to privately compute a function over the associated values of the intersection of the two sets. In practice, the size of the intersection is quite small, and therefore the private computation of the intersection is actually much more expensive than whatever computation needs to be done over it. In the second part of this thesis, we improve the efficiency of the tailor-made state-of-the-art protocols used to privately compute the intersection, thus improving the efficiency of prior PJC protocols.
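For reference, the functionality a PJC protocol realizes can be stated in plain (non-private) code. The sketch below is our own illustration of one common instance, an intersection-sum over both parties' associated values; an actual PJC protocol computes the same output while revealing nothing about the sets beyond it.

def intersection_sum(party_a: dict, party_b: dict) -> tuple:
    # party_a, party_b map identifiers to associated values (e.g., user id -> spend)
    shared = party_a.keys() & party_b.keys()
    # only aggregates over the intersection are revealed, never the identifiers themselves
    return len(shared), sum(party_a[k] + party_b[k] for k in shared)

print(intersection_sum({"u1": 3, "u2": 5}, {"u2": 7, "u3": 1}))  # (1, 12)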
Ph.D. Thesis
2024
Generative modeling and Stochastic Control as Dynamics on Probability Distributions
Domingo i Enrich, Carles
Title: Generative modeling and Stochastic Control as Dynamics on Probability Distributions
Candidate: Domingo i Enrich, Carles
Advisor(s): Joan Bruna
Abstract:
Several modern machine learning algorithms can be studied from the perspective of evolution dynamics on the space of probability measures. Gradient descent-ascent algorithms that are used to solve minimax problems such as the ones arising in generative adversarial networks (GANs) can be interpreted as a joint evolution of two measures: one over the space of parameters of the generator, and one over the space of parameters of the discriminator.
In the first chapter of the thesis, I study systems of this form, and provide convergence guarantees when possible. Diffusion models, which are another generative modeling technique, are also based on dynamics on probability measures, in this case over the space of samples. The dynamics are simulated at inference time; the starting distribution is a Gaussian, and the final distribution is meant to be the target data distribution. Diffusion models were generalized by the Flow Matching framework, which makes it possible to construct different paths between the Gaussian noise distribution and the data distribution.
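For orientation, the Flow Matching objective underlying this framework regresses a parametric velocity field onto a conditional target field along a chosen probability path; a standard simplified form (notation ours, not taken from the thesis) is

L_CFM(\theta) = \mathbb{E}_{t \sim U[0,1],\, x_1 \sim q,\, x_t \sim p_t(\cdot \mid x_1)} \| v_\theta(t, x_t) - u_t(x_t \mid x_1) \|^2,

where q is the data distribution, p_t(\cdot \mid x_1) is the chosen path from Gaussian noise to the sample x_1, and u_t is the corresponding conditional velocity field.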
In the second part, I introduce Multisample Flow Matching, which is a generalization of Flow Matching with intimate connections to optimal transport. Stochastic optimal control is a third problem where dynamics on measures play a critical role. The goal is to learn a vector field (the control) in order to drive the behavior of the solutions of a stochastic differential equation.
In the third chapter, I present Stochastic Optimal Control Matching, a least-squares loss that is based on the same principles used to formulate diffusion model losses, and which achieves errors that are an order of magnitude lower than those of existing methods.
The talk will cover the second and third chapters.
Ph.D. Thesis
2024
Solver-Aided Compiler Design for Programmable Network Devices
Gao, Xiangyu
Title: Solver-Aided Compiler Design for Programmable Network Devices
Candidate: Gao, Xiangyu
Advisor(s): Anirudh Sivaraman, Srinivas Narayana
Abstract:
Historically, network devices were mostly fixed-function. They could run at a line rate of one network packet per nanosecond, but it was impossible to support newly developed network algorithms without upgrading the device. The emergence of programmable network devices remedies this drawback. These devices use the reconfigurable match table model to enable programmability and give developers the flexibility to keep updating and adding new algorithms to the device. Several programming languages have been developed for writing programs for these devices. Even though it is not hard to get started with writing packet-processing code, writing programs that fit within the target devices' various resource constraints is not easy. The root cause is the lack of optimizing compilers in this domain. Hence, this thesis focuses on optimizing compiler design for domain-specific network accelerators using solver-aided techniques that can generate better compilation results than state-of-the-art compilers.
First, we build the Chipmunk compiler, which performs code generation of stateful transactions for programmable switches using program synthesis. We frame compilation as a solution-searching problem and use a program synthesis engine, SKETCH, to find a semantically equivalent compilation outcome. We also develop a series of algorithms to speed up the compilation process. We find that the Chipmunk compiler can generate better compilation results, in terms of hardware resource usage, within a reasonable time period.
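To make the solution-searching framing concrete, here is a deliberately tiny sketch in the same spirit, using the Z3 solver as a stand-in (Chipmunk itself uses the SKETCH engine and a far richer hardware model; the program, template, and constant below are hypothetical): the solver is asked for a constant that makes a one-slot ALU template behave like the source program's stateful update on every input.

from z3 import Int, Solver, ForAll, sat

x = Int('x')   # register value read by the stateful ALU
c = Int('c')   # a "hole" in the ALU template that the solver must fill

spec = x + 2        # source program: e.g., "counter += 2"
template = x + c    # hardware template: an ALU that can only add a configurable constant

s = Solver()
s.add(ForAll([x], template == spec))  # the configured template must match the spec on all inputs
if s.check() == sat:
    print('synthesized configuration: c =', s.model()[c])  # expect c = 2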
Second, we build the CaT compiler, which performs both code generation and resource allocation for the packet-processing pipeline using solver-aided technologies. We decompose the compilation problem for such pipelines into three phases, making extensive use of solver engines to simplify the development of these phases. We also incorporate heuristics to further optimize resource usage. We observe that the CaT compiler can generate near-optimal compilation results much faster than Chipmunk.
Third, we build the Polyglotter compiler, which translates the parser portion of programs written for a source hardware device into programs for a target hardware device. This compiler unifies features across different programming languages and reduces the effort required to write algorithms across platforms. We find that the Polyglotter compiler can generate correct transpilation results with better hardware resource usage.
To the best of our knowledge, we are the first to propose incorporating solver-aided techniques into compiler design for programmable network devices. In this domain, these compilers can outperform traditional compilers that rely on program rewrite rules. Our contributions go beyond building solver-aided compilers and include domain-specific algorithms that speed up the whole compilation process. Based on these compilers, we explore several useful aspects of solver-aided techniques and hope to extend them to more applications in future work.
Ph.D. Thesis
2024
Predictive and Generative Models of Protein Sequence and Structure
Lin, Zeming
Title: Predictive and Generative Models of Protein Sequence and Structure
Candidate: Lin, Zeming
Advisor(s): Yann LeCun
Abstract:
Historically, protein engineering has predominantly followed a bottom-up strategy, utilizing naturally occurring components as building blocks. However, designing arbitrary protein sequences and structures for specific problems presents significant challenges due to the complexity of biological systems. In this work, we tackle the problem of developing models of protein sequences and structures for prediction and generation. We show that neural networks can learn the patterns inherent to these systems, and we provide results for modeling proteins by predicting protein structure from a given sequence and vice versa. Generative models can also model the unconditional distributions of protein sequence and structure.
To model protein structures, we present an autoencoder architecture that can produce a wide array of protein backbones. These structures exhibit both local and global coherence in terms of secondary and tertiary structure. Using classical techniques to design sequences that fold to the generated backbones, we show that the model can generate novel sequences which are validated in silico. To generate better sequences for these backbones, we then present ESM-IF1, a model for fixed-backbone protein design. We designed a large-scale system to predict millions of structures using AlphaFold. By training on this synthetic data, we obtain state-of-the-art results and over 50% sequence recovery.
We then scale large protein language models to 15 billion parameters (ESM-2) as an unconditional model of protein sequences. ESM-2 is capable of replacing multiple sequence alignment (MSA) features to obtain nearly state-of-the-art structure prediction results from a single sequence. Removing MSA features gives a 60x speed-up, allowing us to catalog the largest database of predicted protein structures. We open-sourced the ESM Metagenomic Atlas, a database of over 225 million high-confidence predicted structures, giving us an unprecedented view into the vast breadth and diversity of natural proteins. Finally, the speed and single-sequence nature of our model allow us to directly optimize the protein sequence with respect to the protein structure. We show that black-box optimization techniques can enable the design of proteins with structural constraints such as symmetry, scaffolding, and binding. In sum, we present a series of models that are able to model the conditional and unconditional distributions of protein sequence and structure.
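As a point of reference, sequence recovery here is the fraction of positions at which a designed sequence reproduces the native residue; a minimal sketch of the metric (our own illustration, not code from the thesis):

def sequence_recovery(designed: str, native: str) -> float:
    # fraction of aligned positions where the designed residue matches the native one
    assert len(designed) == len(native)
    return sum(d == n for d, n in zip(designed, native)) / len(native)

print(sequence_recovery("MKTAYIAK", "MKTAYLAK"))  # 0.875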
Ph.D. Thesis
2024
Towards Responsible AI: Safeguarding Privacy, Integrity, and Fairness
Mirza, Muhammad Shujaat
Title: Towards Responsible AI: Safeguarding Privacy, Integrity, and Fairness
Candidate: Mirza, Muhammad Shujaat
Advisor(s): Prof. Christina Pöpper
Abstract:
The widespread adoption of Artificial Intelligence (AI) into digital platforms, spanning general-purpose applications such as chatbots, professional tools like code generation, and high-risk domains like healthcare, has profoundly transformed user experiences. However, this rapid integration has also brought to the forefront critical concerns surrounding privacy, integrity, and fairness. This thesis systematically investigates these three interconnected challenges, revealing vulnerabilities and proposing approaches to address them, thereby contributing to the responsible development of AI technologies.
In addressing privacy concerns, we focus on managing personal information exposure in an era where digital data persists indefinitely. We begin with a global longitudinal analysis of privacy narratives to contextualize the evolving landscape of privacy concerns. Next, we develop a semi-automated pipeline to assess the risks of training data extraction from large language models (LLMs), particularly those used for code generation such as GitHub Copilot. We demonstrate the feasibility of leaking various types of sensitive personal information, including email addresses, medical records, and passwords. Finally, we undertake a comprehensive systematization of privacy-enhancing technologies for exposure management, bridging gaps between technical solutions and user needs. We identify key discrepancies and propose actionable strategies for aligning technical solutions with user expectations. These findings lay the groundwork for user-centric privacy solutions that effectively address data persistence challenges.
To tackle threats to information integrity, we focus on the potential misuse of generative AI tools and coordinated disinformation campaigns. We conduct a detailed evaluation of the factual accuracy of frontier LLMs, such as the GPT series, in the zero-shot classification setting. By comparing different model versions, we uncover inconsistencies in performance improvements, with GPT-4's March release outperforming its June counterpart. Next, we develop a novel cybersecurity-inspired framework for characterizing disinformation threats, profiling threat actors, attack patterns, targets, and channels. We validate the framework's effectiveness through case studies of real-world disinformation campaigns, highlighting its potential to strengthen the integrity of online information ecosystems and laying the groundwork for automated threat-scoring systems.
Lastly, we address fairness in machine learning systems by identifying biases that reinforce inequalities. We introduce Global-Liar, a novel dataset uniquely balanced in terms of geographic representation, facilitating a more nuanced factuality evaluation of LLM biases across different regions. Using this dataset, we conduct a rigorous evaluation of general-purpose LLMs, revealing significant disadvantages faced by the Global South. Next, we conduct a thorough investigation into fairness in high-risk computer vision models used for medical diagnosis in healthcare. Our assessment reveals significant racial and sex biases in kidney and tumor segmentation tasks. We investigate a range of bias mitigation approaches, from pre-processing techniques, like stratified batch sampling, to algorithmic interventions, like fair meta-learning. Notably, our findings suggest that architectural choices play a significant role in bias reduction, emphasizing the necessity of careful design and thorough evaluation of model architectures.
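One of the pre-processing techniques mentioned above, stratified batch sampling, can be sketched generically as drawing each training batch with equal representation from every demographic group. The code below is our own illustration of that idea (the group labels and batch size are hypothetical), not the thesis's implementation.

import random
from collections import defaultdict

def stratified_batches(samples, group_of, batch_size, seed=0):
    # samples: training examples; group_of(s): the group label of example s (e.g., race or sex)
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for s in samples:
        by_group[group_of(s)].append(s)
    for members in by_group.values():
        rng.shuffle(members)
    per_group = max(1, batch_size // len(by_group))
    while all(by_group.values()):              # stop once any group is exhausted
        batch = []
        for members in by_group.values():
            batch.extend(members[:per_group])  # equal share of each group in every batch
            del members[:per_group]
        yield batch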
In summary, our findings and proposed solutions in privacy, integrity, and fairness contribute to responsible AI development, aiming to democratize its benefits across all constituencies.
Ph.D. Thesis
2024
Learning from Rewards in Text Generation
Pang, Richard Yuanzhe
Title: Learning from Rewards in Text Generation
Candidate: Pang, Richard Yuanzhe
Advisor(s): He He, Kyunghyun Cho
Abstract:
The progress in text generation comes from every stage in the pipeline: problem definition, data curation, learning, decoding, and evaluation. This dissertation focuses on learning. There is a mismatch between traditional training objectives and evaluation objectives: regular maximum likelihood estimation tries to minimize the cross-entropy loss with respect to each sample in the dataset, but the downstream evaluation is often based on a reward that scores the compatibility of the input-output pair (e.g., human judgments of the output). I aim to bridge this gap by optimizing for the reward of the generated text directly.
The talk is composed of the following components. (1) Rewards can be expensive to obtain. To tackle this challenge in the social dialogue setting, we extract implicit signals from deployment data without extra human annotations. (2) The model may make slow or no progress in learning, and one remedy is to obtain denser, higher-quality rewards. In neural machine translation, we define a reward inspired by noisy channel decoding, which has a long history, and we are able to increase decoding speed significantly while maintaining similar translation quality. (3) Another way to make progress in learning is to innovate on the training algorithm instead. We set the rewards to be based on the simple exact match between generations and references, but algorithmically we explore the extreme case where we do not deviate too far from the references by framing text generation as an offline reinforcement learning (RL) problem. We propose generation by off-policy learning from demonstrations (GOLD) using importance weighting. Models trained with GOLD outperform those trained with MLE and policy gradient on a range of tasks. (4) We show that we do not need to rely on RL, using a few reasoning tasks (e.g., math, science, commonsense) as the testbed. We develop an approach called iterative reasoning preference optimization (IRPO) that optimizes for winning versus losing reasoning chains-of-thought, using a modified direct preference optimization objective as the criterion. IRPO results in markedly increased accuracies compared to a range of baselines.
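For orientation, the direct preference optimization objective that IRPO builds on scores a winning chain-of-thought y_w against a losing one y_l for the same prompt x; in its standard form (notation ours, simplified from the DPO literature rather than copied from the thesis),

L_DPO(\theta) = -\mathbb{E}_{(x, y_w, y_l)} \log \sigma\Big( \beta \log \tfrac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \tfrac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \Big),

where \pi_{\mathrm{ref}} is a frozen reference model and \beta a temperature-like hyperparameter; IRPO applies a modified objective of this pairwise form iteratively over generated reasoning chains.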
To conclude the talk, I will discuss future directions for using large language models as rewards. I will briefly mention the initial promise shown by our work on self-rewarding language models, which uses LLM-based rewards with a learning algorithm connected to that of IRPO; the discussion is then followed by the corresponding challenges and next steps. I will also touch on human-AI collaboration, an additional way to improve LLM evaluation capabilities.
Ph.D. Thesis
2024
Verification of Concurrent Search Structures
Patel, Nisarg
Title: Verification of Concurrent Search Structures
Candidate: Patel, Nisarg
Advisor(s): Prof. Thomas Wies
Abstract:
Concurrent search structures are a class of concurrent data structures that implement a key-value store. They are integral components of modern software systems, yet they are notoriously difficult to design and implement. In the context of concurrency, linearizability is the accepted notion of correctness for a data structure. Verifying linearizability of concurrent search structures remains a formidable challenge due to the inherent complexity of the underlying algorithms. So far, verification of these data structures has often led to large, intricate proofs that are hard to comprehend and reuse.
The concrete contribution of the thesis is developing and verifying new template algorithms that cover several variants of lock-free skiplists and lock-based log-structured merge (LSM) trees. The template algorithms capture the concurrency mechanism but abstract away node-level details and maintenance operations.
The generalizable contribution of the thesis is the advancement in the verification technology required to prove the new template algorithms correct. There are two key contributions here, the first relating to hindsight reasoning and the second to keyset reasoning. Hindsight reasoning has been shown to be useful for proving linearizability, but it has not been explored in the context of a foundational program logic. The thesis addresses this challenge by embedding hindsight reasoning in the concurrent separation logic Iris via prophecy variables. Keyset reasoning is useful for lifting assertions on a node's contents to the global contents held by the structure. The thesis develops a keyset resource algebra, an Iris resource algebra that enables keyset reasoning in Iris.
All of the techniques and proofs are mechanized in Iris/Coq. The verified search structures include, in particular, the Michael set, the Harris list, the Herlihy-Shavit skiplist, and an LSM-tree implementation based on LevelDB. The verification effort represents a significant contribution, as it provides the first mechanized proof of linearizability for concurrent skiplists and LSM-trees.
Ph.D. Thesis
2024
Neural Language Representations and Scaling Semi-Supervised Learning for Speech Recognition
Peyser, Cal
Title: Neural Language Representations and Scaling Semi-Supervised Learning for Speech Recognition
Candidate: Peyser, Cal
Advisor(s): Prof. Kyunghyun Cho, Prof. Michael Picheny
Abstract:
Speech recognition research has been focused for several years on the incorporation of unpaired speech and text data alongside conventional supervised datasets. Dominant methods have emphasized auxiliary tasks for refining speech and/or text representations during model training. These methods have generally performed strongly when paired with very small supervised datasets, but do not yield the same improvements against strong, supervised baselines. We argue in this thesis that the path to scaling these methods lies in the speech and text representations themselves. We investigate statistical properties of these representations, and show that downstream ASR performance corresponds to a model's ability to jointly represent speech and text. We analyze existing methods for semisupervised ASR, and develop an algorithm to improve them at scale by aligning speech and text in representation space.
Ph.D. Thesis
2024
Unlocking AI outside the training distribution: Generalization, Causality, and Coronary Risk Modeling
Puli, Aahlad Manas
Title: Unlocking AI outside the training distribution: Generalization, Causality, and Coronary Risk Modeling
Candidate: Puli, Aahlad Manas
Advisor(s): Prof. Rajesh Ranganath
Abstract:
Modern AI models make it easy to exploit the correlations in a dataset to predict a target of interest from a given set of inputs. However, the primary use of these models often lies outside the training data. For example, while one can train a Transformer to correlate a patient's medical history to their chances of developing coronary heart disease (CHD), the goal would be to estimate risks on populations elsewhere or in the future. Challenges arise if the model relies on correlations that shift between training and test times or capture non-causal relationships. Predictions based on unstable relationships can degrade outside the training distribution, and basing treatment decisions on non-causal relationships can result in harm. This thesis first develops a methodology for generalizing out-of-distribution (OOD) and estimating causal effects. It closes with an empirical study of building and transporting CHD risk models at two large hospital systems.
The first part begins by defining a class of distribution shifts under which standard training, or balancing the data, yields models that can perform worse than random guessing. We characterize representations that generalize across such shifts and derive an algorithm to build models with such representations. Next, we develop an approach to encode knowledge of the features humans use into the construction of robust models. The last work in this part identifies biases implicit in the standard way of training, gradient-based optimization of cross-entropy, that force models to depend more on unstable features than on the more informative stable ones. We develop a class of loss functions to encourage dependence on the more informative features.
The second part of this thesis studies cases where common assumptions that enable causal estimation are violated. We provide an algorithm to estimate causal effects with deep models from confounded data where instrumental variables are available. This algorithm generalizes the control function method and works without the separability assumptions required by popular algorithms like two-stage least squares and the generalized method of moments. Then, we consider tasks where the confounders are known to equal a function of the variables whose effects we want to estimate; this setup violates an assumption known as overlap or positivity, which is commonly made to uniquely determine (identify) causal effects from non-randomized data. In this setting, we derive nonparametric conditions for identifiability and develop an estimator that solves a gradient flow equation to answer general causal queries from data without overlap.
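For context, the two-stage least-squares baseline mentioned above first fits the treatment from the instrument and then fits the outcome from the predicted treatment; in its textbook linear form (notation ours),

Stage 1: \hat{T} = \hat{\alpha}_0 + \hat{\alpha}_1 Z (regress treatment T on instrument Z),
Stage 2: Y \approx \beta_0 + \beta_1 \hat{T} (regress outcome Y on the predicted treatment),

so that \hat{\beta}_1 estimates the effect of T on Y. The separability built into this two-stage decomposition is the kind of assumption the thesis's control-function generalization avoids.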
The last part of this thesis performs an empirical study of building and transporting CHD risk models between two large hospitals. Departing from the standard approach of constructing risk scores from carefully chosen features, we use the broad feature sets available in electronic health records (EHRs). We train AI models on minimally curated EHR data to predict time-to-CHD; these models outperform existing risk scores both at the institution where they were trained and when transported externally.
Ph.D. Thesis
2024
DrawTalking: Building Interactive Worlds by Sketching and Speaking
Rosenberg, Karl Toby
Title: DrawTalking: Building Interactive Worlds by Sketching and Speaking
Candidate: Rosenberg, Karl Toby
Advisor(s): Ken Perlin
Abstract:
This thesis introduces the design and implementation of an interaction concept called DrawTalking. Through simple combinations of sketching and speaking, the user can improvisationally build an interactive world of graphics, animations, diagrams, and dynamic mechanisms with behavior and rules, as if narrating a story or explaining a concept to an audience. The interface demonstrates a possible step towards designing future interfaces more closely in tune with how we naturally communicate and think.
For context, sketching while speaking has played a major part in innovation across disciplines. The combination of visuals and spoken language enables us to make-believe: think about, describe, communicate, and interact with anything that we can think of, including things that do not or cannot exist in the real world. Evolving technology creates opportunities to move beyond sketching and speech alone. Human-computer interactions of the future, drawing inspiration from our process of make-believe, can add interactive computation to the combination of sketching and speech, allowing us to work with explorable worlds, simulations, and mechanics. By enabling such interactions, we might think, learn, design, play, and tell stories in increasingly expressive ways.
Towards this idea, what makes for a good interface for computation-mediated sketching and speaking? This touches upon several fundamental questions in interaction design, human-AI interaction, and human-centered interfaces, chief among them: how to balance human control and machine automation?
Inspired by real-world speaking and sketching interactions, and seminal works in dynamic sketching, interactive visual programming, and language interfaces, we designed interaction techniques that draw on the way people describe objects and phenomena when telling stories and explaining processes at a whiteboard.
How does it work? The user speaks to label hand-drawn sketches with names and properties, and to define rules for how their world should behave. This communicates semantic intent to the computer, while giving the user the flexibility to choose how to represent and change their drawings. Now the user can interact with a simulated world simply by narrating stories or describing mechanics, which dynamically creates running interactive programs from built-in primitives and user-customized rules.
To gauge understanding of the mechanics of DrawTalking and to derive use cases, we invited participants to an open-ended one-on-one user-study session with the researcher to discover and explore the features in DrawTalking. Each user improvised and prototyped interactive sketch-based animations and gameplay scenarios by collaborating with the researcher. The resulting artifacts and discussion were oriented around each participant's specific experiences and background.
Feedback suggests that our approach is promising and intuitive: it prioritizes user control; it is flexible and supports improvisation; the workflow is fluid; the features are extensible and adaptable to other application domains and contexts beyond sketching; the design demonstrates how multiple applications can use similar language-based interaction techniques and behaviors predictably alongside other language-based technologies; it enables programming-like capability without code.
Through the research and design process of DrawTalking, we learned that it could represent an approach to designing complex interoperating systems for human-AI collaboration. We hope it can serve as a useful example for the research and design of future machine-mediated interfaces, interactions, and computer systems.
Ph.D. Thesis
2024
Algorithmic enhancements to causal inference problems
Shen, Bingran
Title: Algorithmic enhancements to causal inference problems
Candidate: Shen, Bingran
Advisor(s): Prof. Dennis Shasha
Abstract:
This thesis explores novel approaches to inferring and representing causal relationships in biological networks. We introduce EnsInfer, an ensemble method that combines state-of-the-art inference algorithms using a Naive Bayes classifier, outperforming individual methods and providing a flexible framework for integrating diverse data types. Our research then challenges the conventional representation of gene regulatory networks (GRNs) by demonstrating that nonlinear machine learning models achieve better predictive performance than models based solely on "gold standard" regulatory edges. To address this limitation, we propose a bipartite network representation that better captures the synergistic regulatory effects of multiple transcription factors on target genes. This framework focuses on four key goals: predictive accuracy, parsimonious enumeration of predictive regulatory genes, identification of disjoint sets of predictive regulatory genes, and construction of a bipartite network representation of causality. Our work provides an actionable and interpretable paradigm for investigating causal gene regulation, with potential applications across diverse domains of causality research.
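A minimal sketch of the ensemble idea behind EnsInfer, as described above (our own illustration using scikit-learn; the data, base methods, and thresholds are hypothetical): the per-edge confidence scores produced by several base inference algorithms become features, and a Naive Bayes classifier trained on known regulatory edges predicts whether a candidate edge is real.

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data: each row is a candidate regulator->target edge; each column is the
# confidence score assigned to that edge by one of three hypothetical base methods.
rng = np.random.default_rng(0)
scores = rng.random((200, 3))
labels = (scores.mean(axis=1) > 0.5).astype(int)          # stand-in for gold-standard edge labels

ensemble = GaussianNB()
ensemble.fit(scores[:150], labels[:150])                   # train on edges with known labels
edge_prob = ensemble.predict_proba(scores[150:])[:, 1]     # P(edge is real) for held-out edges
print(edge_prob[:5])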
Ph.D. Thesis
2024
Olympiad-level Geometry Theorem Proving without Human Demonstrations
Trinh, Trieu
Title: Olympiad-level Geometry Theorem Proving without Human Demonstrations
Candidate: Trinh, Trieu
Advisor(s): He He
Abstract:
Proving mathematical theorems at the Olympiad level represents a significant milestone in human-level automated reasoning, owing to their reputed difficulty among the world's best talents in pre-university mathematics. Current machine learning approaches, however, are not applicable to most mathematical domains due to the high cost of translating human proofs into a machine-verifiable format. The problem is even worse for geometry due to its unique translation challenges, resulting in severe scarcity of training data. We propose G0, a theorem prover for Euclidean plane geometry that sidesteps the need for human demonstrations by synthesizing millions of theorems and proofs across different levels of complexity. G0 is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through the infinite branching points of challenging problems. On a test set of 30 recent Olympiad problems, G0 solves 25, outperforming the previous best method, which solves only 10, and approaching the performance of an average International Mathematical Olympiad (IMO) gold medalist. Notably, G0 produces human-readable proofs, solves all geometry problems in the IMO 2000 and 2015 under human expert evaluation, and discovers a generalized version of a translated 2004 IMO theorem.
Ph.D. Thesis
2024
Improve Language Model Serving Efficiency with Fine-grained and Stateful Scheduling
Yu, Lingfan
Title: Improve Language Model Serving Efficiency with Fine-grained and Stateful Scheduling
Candidate: Yu, Lingfan
Advisor(s): Jinyang Li
Abstract:
The world has witnessed the remarkable success of large language models (LLMs), led by the fast-growing popularity of ChatGPT. However, it is challenging to serve these models with both high throughput and low latency due to their iterative nature. This thesis identifies two key issues impacting the performance of existing systems: (1) coarse-grained batching at the request level results in wasteful computation for requests with variable input and output lengths; (2) the lack of stateful context management results in duplicate computation for applications that engage in multi-turn interactions with the LLM. Two systems, BatchMaker and Pensieve, are then presented to address these issues.
BatchMaker proposes a technique called cellular batching to improve the latency and throughput of language model inference. Existing systems use batch execution of the dataflow graphs of a fixed set of requests. By contrast, BatchMaker makes finer-grained batching decisions at each token processing step, and dynamically assembles a batch for execution as requests join and leave the system.
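A toy sketch of this token-level batching idea (our own simplification, not BatchMaker's scheduler): the batch is re-formed at every decoding step, so a finished request frees its slot immediately and a waiting request can join mid-stream.

from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    generated: list = field(default_factory=list)

def decode_step(batch):
    # stand-in for one fused forward pass that emits one token per active request
    return [42 for _ in batch]

def serve(requests, batch_size=4):
    waiting, active, done = deque(requests), [], []
    while waiting or active:
        # re-form the batch at every token step: admit new requests as soon as slots free up
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        for req, tok in zip(active, decode_step(active)):
            req.generated.append(tok)
        # requests that hit their length budget leave the batch immediately
        done += [r for r in active if len(r.generated) >= r.max_new_tokens]
        active = [r for r in active if len(r.generated) < r.max_new_tokens]
    return done

print(len(serve([Request("hi", 3), Request("hello", 8), Request("hey", 1)])))  # 3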
Pensieve is a system optimized for multi-turn conversation LLM serving. It maintains the conversation state across requests from the same conversation by caching previously processed history to avoid duplicate processing. Pensieve's multi-tier caching strategy utilizes both GPU and CPU memory to store and retrieve cached data efficiently. Pensieve also generalizes the recent PagedAttention kernel to support attention between multiple input tokens whose KV cache is spread over non-contiguous GPU memory.
Experiments on various workloads show that BatchMaker improves throughput by 25-80% while reducing latency by 18-90%, and that Pensieve improves throughput by 33-100% and reduces latency by 40-77%.
Ph.D. Thesis
2024
Theory of Symmetric Neural Networks
Zweig, Aaron
Title: Theory of Symmetric Neural Networks
Candidate: Zweig, Aaron
Advisor(s): Joan Bruna
Abstract:
Symmetric functions, which take as input an unordered, fixed-size set, find practical application in myriad physical settings based on indistinguishable points or particles, and are also used as intermediate building blocks to construct networks with other invariances. Symmetric functions are known to be universally representable by neural networks that enforce permutation invariance. However, the theoretical tools that characterize the approximation, optimization, and generalization of typical networks fail to adequately characterize architectures that enforce invariance.
This thesis explores when these tools can be adapted to symmetric architectures, and when the invariance properties lead to new theoretical findings altogether. We study and prove approximation limitations on the extension of symmetric neural networks to infinite-sized inputs, the approximation capabilities of symmetric and antisymmetric networks relative to the interaction between set elements, and the learnability of simple symmetric functions with gradient methods.
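For concreteness, the permutation-invariant networks referred to above are typically of the sum-decomposition (DeepSets-style) form rho(sum_i phi(x_i)); a minimal sketch of that form (our own illustration, not a construction from the thesis):

import numpy as np

def phi(x, W1, b1):
    # per-element encoder: one hidden layer with a ReLU nonlinearity
    return np.maximum(W1 @ x + b1, 0.0)

def rho(z, W2, b2):
    # readout applied to the pooled set representation
    return W2 @ z + b2

def symmetric_net(X, params):
    # X: (n, d) array holding an unordered set of n points in R^d;
    # sum-pooling over elements makes the output invariant to row permutations
    W1, b1, W2, b2 = params
    pooled = sum(phi(x, W1, b1) for x in X)
    return rho(pooled, W2, b2)

rng = np.random.default_rng(0)
d, h = 3, 16
params = (rng.normal(size=(h, d)), np.zeros(h), rng.normal(size=(1, h)), np.zeros(1))
X = rng.normal(size=(5, d))
assert np.allclose(symmetric_net(X, params), symmetric_net(X[::-1], params))  # permutation invariance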