Theses & Reports

Instructions for submitting a technical report or thesis.

You can find technical reports published prior to 1990 archived here.

Title

Authors

Year

Ph.D. Thesis 2019 From 2.5G To 5G: Enhancing Access And Performance For Mobile Users Ahmad, Talal

Title: From 2.5G To 5G: Enhancing Access And Performance For Mobile Users

Candidate: Ahmad, Talal

Advisor(s): Subramanian, Lakshminarayanan

Abstract:

This dissertation has two overarching themes: i) enhancing connectivity access for mobile users in rural contexts and ii) enhancing transport layer performance for mobile users.

More than half of the world’s population faces barriers in accessing the Internet. A recent ITU study estimates that 2.6 billion people cannot afford connectivity and that 3.8 billion do not have access to it. To enhance access I have worked on two projects: Wi-Fly and GreenApps. Wi-Fly is a new connectivity paradigm designed for regions without Internet coverage that enables communication between a lightweight Wi-Fi device on commercial planes and ground stations. Through empirical experiments with test flights and simulation, we show that Wi-Fly and its extensions have the potential to provide connectivity in the most remote regions of the world. In GreenApps, we look at how localized cellular applications can be built for rural communities on top of software-defined cellular base stations. We deployed the GreenApps platform on rural base stations for communities in Ghana and Nicaragua and supported multiple localized applications for rural communities.

Enhancing transport layer performance over cellular networks is critical to improving end-to-end application performance for mobile users. Cellular networks have unique challenges that make conventional transport protocols unsuitable for these environments. In the past few years, several new delay-based congestion-control algorithms have been developed with complex nonlinear control loops for cellular contexts. While these protocols have shown promise, it has been extremely challenging to analyze and interpret the behavior of these algorithms, especially under highly variable network conditions (e.g., cellular links). In the Model-Driven Interpretable (MDI) congestion control work, we provide a model-driven framework to reason about the behavior of such congestion control algorithms. Our modeling approach simplifies a congestion control algorithm’s behavior into a guided random walk over a two-dimensional Markov model. We show that the model of a congestion-control algorithm can give key insights into its convergence and performance. More recently, we also looked at how to learn early signals of congestion in highly varying 5G channels. In particular, we worked with Wi-Gig traces collected at 60 GHz and showed that it is possible to learn highly accurate early congestion signals using delay features observed at end-hosts.
M.S. Thesis 2019 End-to-End Hierarchical Clustering with Graph Neural Networks Choma, Nicholas

Title: End-to-End Hierarchical Clustering with Graph Neural Networks

Candidate: Choma, Nicholas

Advisor(s): Bruna, Joan

Abstract:

The objective of this thesis is to develop a data-driven, hierarchical clustering method which is capable of operating on large point cloud datasets, necessitating a runtime which is sub-quadratic. Hierarchical clustering is noteworthy for its ability to produce multiscale views of data, allowing for rich and interpretable representations, and for its ability to cluster when the number of clusters is not specified a priori. To date, deep learning methods for clustering have primarily focused on a narrower class of models which cluster using partitioning strategies and require as input the number of clusters to produce. In this work, we introduce the clustering graph neural network, extending previous research into graph neural networks to handle large clustering tasks where the number of clusters is variable and not pre-specified. Our architecture is fast, operating with O(n log n) time complexity, and we note its amenability to high levels of parallelization. Because each stage is differentiable, we emphasize that our architecture is capable of end-to-end training, leveraging signal throughout the learning pipeline as part of a multi-objective loss function. Finally, we demonstrate the clustering graph neural network on a challenging particle tracking task, which, while unable to outperform highly-tuned and domain-specific baselines, nevertheless achieves high performance while remaining flexible to a wide array of clustering tasks.
Ph.D. Thesis 2019 Co-Located Augmented and Virtual Reality Systems DeFanti, Connor

Title: Co-Located Augmented and Virtual Reality Systems

Candidate: DeFanti, Connor

Advisor(s): Perlin, Ken

Abstract:

Augmented and Virtual Reality (AVR) systems have become increasingly popular in the worlds of entertainment and industry. However, many current systems are limited in scope to experiences that isolate a single user within a given physical space. While many such experiences allow for interactions between remotely located users, very few experiences allow for multiple users to coexist in the same physical space while interacting with a consistent world-view of shared virtual objects. Our research has found that by enabling this co-located paradigm, users are able to have rich interactions that are otherwise impossible. This thesis presents a series of experiments that demonstrate the importance of the social aspects of co-located AVR, a set of solutions that overcome the difficulties often encountered in such experiences, and directions for future scalability using forthcoming hardware and technologies.
TR2019-993 2019 Vertex-Based Preconditioners for the Coarse Problems of BDDC Dohrmann, Clark R.; Pierson, Kendall H.; Widlund, Olof B.

Title: Vertex-Based Preconditioners for the Coarse Problems of BDDC

Author(s): Dohrmann, Clark R.; Pierson, Kendall H.; Widlund, Olof B.

Abstract:

We present a family of approximate BDDC preconditioners based on inexact solvers for the coarse problem. The basic idea is to replace the direct solver for a standard BDDC coarse problem by a preconditioner which requires much less computation and memory. The focus in this study is on scalar elliptic and linear elasticity problems in three dimensions. The preconditioner for the coarse problem employs a standard two-level additive Schwarz approach in which the coarse problem dimension is either one or six times the number of subdomain vertices. We show, under certain assumptions on the coefficients, that favorable BDDC condition number estimates also hold for the approximate preconditioners. Numerical examples are presented to confirm the theory and to demonstrate the computational advantages of the approach.
Ph.D. Thesis 2019 Design for Customized Manufacturing Gil-Ureta, Francisca T.

Title: Design for Customized Manufacturing

Candidate: Gil-Ureta, Francisca T.

Advisor(s): Zorin, Denis

Abstract:

Over the past few years, 3D printing technology has captivated business and consumers alike with its promise of affordable custom manufacturing. The expectation is, in the future, people will be able to easily customize and manufacture objects to fit individual needs. To make this a reality, we need new methods that support the creative process of makers, from conception to fabrication.

In this thesis, I present three projects where we reexamine the tools and workflows used for customized design. The core idea behind these projects is that, compared with traditional methods, we design for an unknown or changeable manufacturing process, which affects the life-cycles of design. Our goal is to create tools that simplify the modification, optimization, and evaluation of designs such that they can be easily altered to fit manufacturing and personal constraints.

Although fabrication constraints are unlimited, we can study specific domains to learn the most common ones. In the first project, we present an interactive modeling tool for designing mechanical objects, which are determined mostly by kinematic constraints. In the second project, we study the structural efficiency of shells and introduce an efficient method for designing shell reinforcements of minimal weight. Finally, in the third project, we develop a robust collision resolution algorithm, crucial for the design and optimization of models subject to dynamic impulses.
M.S. Thesis 2019 On Zero-Shot Transfer Learning for Event Extraction Haroon, Shaheer

Title: On Zero-Shot Transfer Learning for Event Extraction

Candidate: Haroon, Shaheer

Advisor(s): Grishman, Ralph

Abstract:

Event extraction normally requires large amounts of annotated data for each event type. Each event consists of trigger words and arguments that fulfill certain roles. This requirement limits the ability to add new event types to an existing ontology, or to build a new one, because of the massive effort involved in manually annotating a corpus. Recent methods have proposed using zero-shot transfer learning to minimize the amount of annotated data required for a classifier to predict new event types. To be successful, the zero-shot classifier relies on several components, including a preexisting event ontology. Our goal was to explore factors that could influence the results of a zero-shot classifier, including the choice of role names, event type names, and definitions of event mention and event type structures. We found that the use of paradigmatic role names and characteristic event type names in an event ontology has an especially significant impact on the success of the classifier. As a result, a fair amount of effort is still required when adding new event types to an ontology in order to promote the success of a zero-shot approach.
Ph.D. Thesis 2019 Scalable Machine Learning using Dataflow Graph Analysis Huang, Chien-Chin

Title: Scalable Machine Learning using Dataflow Graph Analysis

Candidate: Huang, Chien-Chin

Advisor(s): Li, Jinyang

Abstract:

In the past decade, the abundance of computing resources and the growth of data have boosted the development of machine learning applications. Many computation frameworks, e.g., Hadoop, Spark, TensorFlow, and PyTorch, have been proposed and have become widely used in industry. However, programming large-scale machine learning applications is still challenging and requires manual effort from developers to achieve good performance.

For example, when parallelizing arrays across hundreds of CPU machines, it is critical to choose a good partition strategy that co-locates the computation arrays to reduce network communication. Unfortunately, existing distributed array frameworks usually use a default partition scheme and require manual partitioning if another parallel strategy is used, making it harder to develop a distributed array program. Another example is running deep learning applications on GPUs. A modern GPU can be orders of magnitude faster than a CPU, making it an attractive computation resource. Unfortunately, the limited memory size of a GPU restricts the scale of the DNN models that can be run. A computation framework that allows users to explore deeper and wider DNN models is therefore desirable.

Modern distributed frameworks generally adopt a dataflow-style programming paradigm. The dataflow graph of an application exposes valuable information to optimize the application. In this thesis, we present two techniques to address the above issues via dataflow graph analysis.

We first design Spartan to help users parallelize distributed arrays on a CPU cluster. Spartan is a distributed array framework, built on top of a set of higher-order dataflow operators. Based on the operators, Spartan provides a collection of NumPy-like array APIs. Developers can choose the built-in array APIs or directly use the operators to construct machine learning applications. To achieve good performance for the distributed application, Spartan analyzes the communication pattern of the dataflow graph captured through the operators and applies a greedy strategy to find a good partition scheme to minimize the communication cost.

To support memory-intensive deep learning applications on a single GPU, we develop SwapAdvisor, a swapping system that automatically swaps temporarily unused tensors from GPU memory to CPU memory. To minimize the communication overhead, SwapAdvisor analyzes the dataflow graph of the given DNN model and uses a custom-designed genetic algorithm to optimize the operator scheduling and memory allocation. Based on the optimized operator schedule and memory allocation, SwapAdvisor can determine what to swap and when to swap it to achieve good performance.
M.S. Thesis 2019 Leveraging Communication for Efficient Sampling Kapoor, Sanyam

Title: Leveraging Communication for Efficient Sampling

Candidate: Kapoor, Sanyam

Advisor(s): Bruna, Joan

Abstract:

Machine Learning has shown promising success on tasks like classification, regression and, more recently, generation. However, long-term planning still remains a challenge for real-world deployment, and one of the key components of long-term planning is exploration. In this work, we discuss how communication can be leveraged to improve space exploration. We study this problem from the perspective of sampling from un-normalized density functions.

Hamiltonian Monte Carlo (HMC) struggles to sample from highly separated multimodal distributions, and parallel chains can be wasteful by the nature of Markov chain sampling. We see how replica exchange induces a weak form of communication. This is contrasted with a particle-based approach called Stein Variational Gradient Descent (SVGD), which induces a stronger form of communication via kernel evaluations. The quality of samples from both HMC and SVGD is evaluated with Maximum Mean Discrepancy. We finally propose Graph Neural Networks with stronger inductive biases to amortize the dynamics of SVGD for fast generation of representative samples.
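
As an illustration of how kernel evaluations couple the particles in SVGD, the following is a minimal NumPy sketch of the standard SVGD update. It is not code from the thesis: the RBF kernel with a fixed bandwidth `h`, the step size, the Gaussian target, and all function names are assumptions chosen for the example.

```python
import numpy as np


def rbf_kernel_terms(x, h):
    """Return K with K[j, i] = k(x_j, x_i), and for each particle i the
    sum over j of grad_{x_j} k(x_j, x_i) (the repulsive term)."""
    diff = x[:, None, :] - x[None, :, :]         # diff[j, i] = x_j - x_i
    K = np.exp(-np.sum(diff ** 2, axis=-1) / h)  # RBF kernel, assumed bandwidth h
    grad_K = -(2.0 / h) * diff * K[:, :, None]   # gradient w.r.t. x_j of k(x_j, x_i)
    return K, grad_K.sum(axis=0)


def svgd_step(x, grad_log_p, step_size=0.1, h=1.0):
    """One SVGD update: every particle moves using information from all others."""
    n = x.shape[0]
    K, repulsion = rbf_kernel_terms(x, h)
    # phi[i] = (1/n) * sum_j [ k(x_j, x_i) * grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    phi = (K.T @ grad_log_p(x) + repulsion) / n
    return x + step_size * phi


def sample_gaussian(mu, n_particles=50, n_steps=1000, seed=0):
    """Move particles from N(0, 1) toward a unit-variance Gaussian centred at mu."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_particles, 1))
    grad_log_p = lambda pts: -(pts - mu)  # score of N(mu, 1), up to the normalizer
    for _ in range(n_steps):
        x = svgd_step(x, grad_log_p)
    return x
```

The kernel-weighted score term pulls particles toward high-density regions, while the summed kernel gradient pushes nearby particles apart; both terms depend on every other particle, which is the sense in which the particles communicate.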
Ph.D. Thesis 2019 Compositional Abstractions for Verifying Concurrent Data Structures Krishna, Siddharth

Title: Compositional Abstractions for Verifying Concurrent Data Structures

Candidate: Krishna, Siddharth

Advisor(s): Wies, Thomas

Abstract:

Formal verification has had great success in improving the reliability of real-world software, with projects such as ASTREE, CompCert, and Infer showing that rigorous mathematical analysis can handle the scale of today's cyber-infrastructure. However, despite these successes, many core software components are yet to be verified formally. Concurrent data structures are a class of algorithms that are becoming ubiquitous, as software systems seek to make use of the increasingly parallel design of computers and servers. These data structures use sophisticated algorithms to perform fine-grained synchronization between threads, making them notoriously difficult to design correctly, with bugs being found both in actual implementations and in the designs proposed by experts in peer-reviewed publications. The rapid development and deployment of these concurrent algorithms has resulted in a rift between the algorithms that can be verified by the state-of-the-art techniques and those being developed and used today. The goal of this dissertation is to bridge this gap and bring the certified safety of formal verification to the concurrent data structures used in practice.

Permission-based program logics such as separation logic have been established as the standard technique for verifying programs that manipulate complex heap-based data structures. These logics build on so-called separation algebras, which allow expressing properties of heap regions such that modifications to a region do not invalidate properties stated about the remainder of the heap. This concept is key to enabling modular reasoning and also extends to concurrency. However, certain data structure idioms prevalent in real-world programs, especially concurrent programs, are notoriously difficult to reason about, even in these advanced logics (e.g., random access into inductively defined structures, data structure overlays). The underlying issue is that while heaps are naturally related to mathematical graphs, many ubiquitous graph properties are non-local in character. Examples of such properties include reachability between nodes, path lengths, acyclicity and other structural invariants, as well as data invariants which combine with these notions. Reasoning modularly about such global graph properties remains a hard problem, since a local modification can have side-effects on a global property that cannot be easily confined to a small region.

This dissertation addresses the question: What separation algebra can be used to prove that a program maintains a global graph property by reasoning only about the local region modified by the program? We propose a general class of global graph properties, that we call flows, that can be expressed as fixpoints of algebraic equations over graphs. Flows can encode structural properties of the heap (e.g. the reachable nodes from the root form a tree), data invariants (e.g. sortedness), as well as combinations of both shape and data constraints of overlaid structures in a uniform manner. We then introduce the notion of a flow interface, an abstraction of a region in the heap, which expresses the constraints and guarantees between the region and its context with respect to the flow. Under a suitable notion of composition that preserves the flow values, we show that flow interfaces form the desired separation algebra.

Building on our theory of flows, we develop the flow framework, a general proof technique for modular reasoning about global graph properties over program heaps that can be integrated with existing separation logics. We further devise a strategy for automating this technique using SMT-based verification tools. We have implemented this strategy on top of the verification tool Viper and applied it successfully to a variety of challenging benchmarks including 1) algorithms involving general graphs such as Dijkstra's algorithm and a priority inheritance protocol, 2) inductive data structures such as linked lists and B trees, 3) overlaid data structures such as the Harris list and threaded trees, and 4) OO design patterns such as Composite and Subject/Observer. We are not aware of any single other approach that can handle these examples with the same degree of simplicity or automation.

While the flow framework is applicable to any data structure, its features give rise to a new form of modular reasoning for certain concurrent data structures. Concurrent separation logics already apply modularity on multiple levels to simplify correctness proofs, decomposing them according to program structure, program state, and individual threads. Despite these advances, it remains difficult to achieve proof reuse across different data structure implementations. For the large class of concurrent search structures, we demonstrate how one can achieve further proof modularity by decoupling the proof of thread safety from the proof of structural integrity. We base our work on the template algorithms of Shasha and Goodman that dictate how threads interact but abstract from the concrete layout of nodes in memory. By using the flow framework of compositional abstractions in the separation logic Iris, we show how to prove correctness of template algorithms, and how to instantiate them to obtain multiple verified implementations. We demonstrate our approach by formalizing three concurrent search structure templates, based on link, give-up, and lock-coupling synchronization, and deriving implementations based on B-trees, hash tables, and linked lists. These case studies represent algorithms used in real-world file systems and databases, which have so far been beyond the capability of automated or mechanized state-of-the-art verification techniques. Our verification is split between the Coq proof assistant and the deductive verification tool GRASShopper in order to demonstrate that our proof technique and framework can be applied both in fully mechanized proof assistants as well as automated program verifiers. In addition, our approach reduces proof complexity and is able to achieve significant proof reuse.
Ph.D. Thesis 2019 Parallel Contact-Aware Algorithms for Large-Scale Direct Blood Flow Simulations Lu, Libin

Title: Parallel Contact-Aware Algorithms for Large-Scale Direct Blood Flow Simulations

Candidate: Lu, Libin

Advisor(s): Zorin, Denis

Abstract:

Experimental and theoretical evidence suggests that blood flow can be well approximated by a mixture model of a Newtonian fluid and deformable particles representing the red blood cells. We use a well-established boundary integral formulation for the problem as the foundation of our approach. This type of formulation, with a high-order spatial discretization and an implicit and adaptive time discretization, has been shown to be able to handle complex interactions between particles with high accuracy. Yet, for dense suspensions, very small time-steps or expensive implicit solves as well as a large number of discretization points are required to avoid non-physical contact and intersections between particles, leading to infinite forces and numerical instability. Given the importance of vesicle flows, in this thesis we focus on efficient numerical methods for such problems: we present computationally parallel-scalable algorithms for the simulation of dense deformable vesicles in two and three dimensions, in both unbounded and bounded domains.

Our method maintains the accuracy of previous methods at a significantly lower cost for dense suspensions and the time step size is independent from the volume fraction. The key idea is to ensure interference-free configuration by introducing explicit contact constraints into the system. While such constraints are unnecessary in the formulation, in the discrete form of the problem, they make it possible to eliminate catastrophic loss of accuracy by preventing contact explicitly.

Introducing contact constraints results in a significant increase in stable time-step size for locally-implicit time-stepping, and a reduction in the number of points adequate for stability. Our method permits simulations with high volume fractions; we report results with up to 60% volume fraction. We demonstrated the parallel scaling of the algorithms on up to 35K CPU cores.
TR2019-994 2019 The Block FETI--DP/BDDC Preconditioners for Mixed Isogeometric Discretizations of Three-Dimensional Almost Incompressible Elasticity Pavarino, Luca F.; Scacchi, Simone; Widlund, Olof B.; Zampini, Stefano

Title: The Block FETI--DP/BDDC Preconditioners for Mixed Isogeometric Discretizations of Three-Dimensional Almost Incompressible Elasticity

Author(s): Pavarino, Luca F.; Scacchi, Simone; Widlund, Olof B.; Zampini, Stefano

Abstract:

A block FETI--DP/BDDC preconditioner for mixed formulations of almost incompressible elasticity is constructed and analyzed; FETI--DP (dual primal finite element tearing and interconnection) and BDDC (balancing domain decomposition by constraints) are two very successful domain decomposition algorithms for a variety of elliptic problems. The saddle point problems of the mixed problems are discretized with mixed isogeometric analysis with continuous pressure fields. As in previous work by Tu and Li (2015), for finite element discretizations of the incompressible Stokes system, the proposed preconditioner is applied to a reduced positive definite system involving only the pressure interface variable and the Lagrange multipliers of the FETI--DP algorithm. The novelty of this preconditioner consists in using BDDC with deluxe scaling for the interface pressure block as well as deluxe scaling for the FETI--DP preconditioner for the Lagrange multiplier block. A convergence rate analysis is presented with a condition number bound for the preconditioned operator which depends on the inf-sup parameter of the fully assembled problem and the condition number of a closely related BDDC algorithm for compressible elasticity. This bound is scalable in the number of subdomains, poly-logarithmic in the ratio of subdomain and element sizes, and robust with respect to material incompressibility and presence of discontinuities of the Lamé parameters across subdomain interfaces. Parallel numerical experiments validate the theory and indicate how the rate of convergence varies with respect to the spline polynomial degree and regularity and the deformation of the domain. Of particular interest is the development of variants of the algorithm with a coarse component of small dimension.
Ph.D. Thesis 2019 Leveraging Program Analysis for Type Inference Pavlinovic, Zvonimir

Title: Leveraging Program Analysis for Type Inference

Candidate: Pavlinovic, Zvonimir

Advisor(s): Wies, Thomas

Abstract:

Type inference is a popular feature of programming languages used to automatically guarantee the absence of certain execution errors in programs at compile time. The convenience of type inference, unfortunately, comes with a cost. Developing type inference algorithms is a challenging task that currently lacks a systematic approach. Moreover, programmers often have problems interpreting error reports produced by type inference. The overarching goal of this thesis is to provide a mathematically rigorous framework for the systematic development of sophisticated type inference algorithms that are convenient for programmers to use. To this end, we focus on two specific problems in this thesis: (1) how to constructively design type inference algorithms that improve over the state-of-the-art and (2) how to automatically debug type errors that arise during inference. We base our approach on the observation that, similar to type inference, program analysis algorithms automatically discover various program properties that can be used to show program correctness. Type inference and program analysis techniques, although similar, have traditionally been developed independently of each other. In contrast, this thesis further explores the recent path of leveraging program analysis for type inference.

As our first contribution, we use abstract interpretation to constructively design type inference algorithms. We specifically focus on Liquid types, an advanced family of algorithms that combine classical typing disciplines and known static analyses to prove various safety properties of functional programs. By using abstract interpretation, we make the design space of Liquid type inference explicit. We also unveil the general type inference framework underlying Liquid types. By properly instantiating this general framework, one obtains novel type inference algorithms that are sound by construction.

Our second contribution is a framework for automatically debugging type errors for languages that deploy type inference in the style of Hindley-Milner, such as OCaml and Haskell. Such languages are notorious for producing cryptic type error reports that are often not helpful in fixing the actual bug. We formulate the problem of finding the root cause of type errors as an optimization problem expressed in a formal logic. We then show how to solve this problem using automated theorem provers. We experimentally illustrate how our framework can efficiently produce type error reports that outperform the state-of-the-art solutions in identifying the true cause of type errors.

In summary, this thesis introduces a mathematical framework for the systematic design of sophisticated type inference algorithms that are sound by construction. Our results further enable automatic generation of more meaningful type error diagnostics, ultimately making type inference more usable by the programmers.
Ph.D. Thesis 2019 Concentration and Anti-concentration for Markov Chains Rao, Shravas

Title: Concentration and Anti-concentration for Markov Chains

Candidate: Rao, Shravas

Advisor(s): Regev, Oded

Abstract:

We study tail bounds and small ball probabilities for sums of random variables obtained from a Markov chain. In particular, we consider the following sum \(S_n = f_1(Y_1)+\cdots+f_n(Y_n)\) where \(\{Y_i\}_{i=1}^{\infty}\) is a Markov chain with state space \([N]\), transition matrix \(A\), and stationary distribution \(\mu\) such that \(Y_1\) is distributed as \(\mu\), and \(f_i: [N] \rightarrow \mathbb{R}\). We also consider settings in which \(f_i(Y_i)\) is vector-valued.

In all results, the bounds are in terms of the spectral gap of the Markov chain. In almost all of the results in this thesis, when the transitions are independent and the spectral gap is \(1\), the bounds match the corresponding bounds for independent random variables up to constant factors.
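
To make the role of the spectral gap concrete, here is a small NumPy sketch under an assumed definition: one minus the second-largest eigenvalue modulus of the transition matrix. The thesis may use a different but related definition, and the function name and example matrices are illustrative only. When every row of \(A\) equals \(\mu\), the next state does not depend on the current one and the gap is \(1\), which is the independent case mentioned above.

```python
import numpy as np


def spectral_gap(A):
    """Assumed definition: 1 minus the second-largest eigenvalue modulus of A."""
    moduli = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]
    return 1.0 - moduli[1]


mu = np.array([0.2, 0.3, 0.5])

# Independent transitions: every row is the stationary distribution mu.
A_independent = np.tile(mu, (3, 1))

# A "lazy" chain that stays put with probability 0.9 mixes much more slowly.
A_lazy = 0.9 * np.eye(3) + 0.1 * A_independent

gap_independent = spectral_gap(A_independent)  # eigenvalues are 1, 0, 0
gap_lazy = spectral_gap(A_lazy)                # eigenvalues are 1, 0.9, 0.9
```

Both matrices have \(\mu\) as their stationary distribution, so they differ only in how strongly consecutive states are correlated.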

We first obtain tail bounds in the case that only the \(p\)th moment of the random variable \(f_i(Y_i)\) is bounded. This is a Markov chain version of a corollary of the Marcinkiewicz–Zygmund inequality. Using this, we also obtain tail bounds for \(S_n\) when the \(f_i(Y_i)\) are elements of an \(\ell_q\) space.

Next, we obtain sharp tail bounds when the random variables \(f_i(Y_i)\) are bounded and the expected value of \(S_n\) is small. This is a Markov chain version of a Poisson approximation to sums of independent random variables. As an application, we explain how such tail bounds can be used to construct simple and explicit resilient functions that match the non-constructive functions shown to exist due to the work of Ajtai and Linial.

Next, we obtain tail bounds in the case that the \(f_i(Y_i)\) are bounded in the range \([-a_i, a_i]\) for each \(i\). This is a Markov chain version of the Hoeffding inequality. This improves upon previously known bounds in that the dependence is on \(\sqrt{a_1^2+\cdots+a_n^2}\) rather than \(\max_{i}\{a_i\}\sqrt{n}.\) Using this, we obtain tail bounds for certain types of random variables in which the \(f_i(Y_i)\) are elements of any Banach space.
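
For comparison, the classical Hoeffding inequality for independent, mean-zero random variables \(X_i \in [-a_i, a_i]\) is recalled below as standard background; the constants and the spectral-gap dependence of the Markov chain version are given in the thesis and are not reproduced here.

\[
\Pr\bigl[\,|X_1+\cdots+X_n| \ge t\,\bigr] \;\le\; 2\exp\!\left(-\frac{t^2}{2\,(a_1^2+\cdots+a_n^2)}\right),
\qquad
\sqrt{a_1^2+\cdots+a_n^2} \;\le\; \max_{i}\{a_i\}\sqrt{n}.
\]

The second inequality shows why a dependence on \(\sqrt{a_1^2+\cdots+a_n^2}\) is never worse than one on \(\max_{i}\{a_i\}\sqrt{n}\), and it can be much better when a few of the \(a_i\) are far larger than the rest.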

Finally, we show that if the \(f_i(Y_i)\) take on values \(\{-a_i, a_i\}\) with equal probability and the \(a_i\) are Euclidean vectors with norm at least \(1\), the probability that \(S_n\) lies in a ball of volume \(1\) is small. This is a Markov chain version of the Littlewood-Offord inequality. We also construct a new pseudorandom generator for the Littlewood-Offord problem.
M.S. Thesis 2019 Machine Learning Applications to Protein Variant Effect Prediction Soules, Jeffrey Abstract | PDF

Title: Machine Learning Applications to Protein Variant Effect Prediction

Candidate: Soules, Jeffrey

Advisor(s): Bonneau, Richard

Abstract:

Proteins are microscopic machines whose activity forms the basis of all life processes. If a mutation causes variation in the typical amino acid sequence of a protein, the protein’s normal biological function may be compromised. Variant Interpretation and Prediction Using Rosetta (VIPUR) uses sequence and structural data features to predict whether a mutation is deleterious to the protein’s function. VIPUR was originally released with a curated set of protein variants as its training data. As released, it achieved 80% accuracy on this data set. However, the original design was tightly coupled to a logistic regression classifier, so other machine learning techniques could not be easily tested. The reimplementation of VIPUR presented in this work offers a modular design that can be extended with classifiers built on any machine learning approach. It establishes a methodologically sound basis for experimentation with new classifiers, data features, and data sets. This work examines the predictive power of the data features in the original VIPUR training set, and establishes a high baseline for classification performance based on one strongly predictive feature category. The present work includes classifier modules built with four different machine learning approaches: logistic regression, support vector machines, gradient-boosted forests, and neural networks. These represent the two model types considered in the original VIPUR work and two more recent classifier types. The modules are trained with automated hyperparameter cross-validation and rigorously evaluated with k-fold cross-validation, establishing a baseline of performance for future experiments. Results show very slight improvement over the original logistic regression method, consistent with the dominance of a handful of features in determining classification results.
Potential new data features and sources are discussed, which can be used in the new VIPUR design without modification while maintaining backwards compatibility with previously trained classifiers.
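
As a generic illustration of the k-fold cross-validation mentioned in this abstract, the sketch below splits sample indices into k contiguous folds and yields each train/test partition. It is a minimal sketch, not the VIPUR code: the function name `k_fold_indices` is hypothetical, and shuffling and stratification are omitted for brevity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k-fold cross-validation.

    Folds are contiguous and unshuffled; the first n_samples % k folds
    receive one extra sample so that every index is tested exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Usage: 10 samples split into 3 folds of sizes 4, 3 and 3.
for train, test in k_fold_indices(10, 3):
    print(len(train), test)
```

A classifier would be fit on each `train` list and scored on the matching `test` list, with the k scores averaged to give the reported estimate.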
Ph.D. Thesis 2019 Approximation algorithms, Hardness, and PCPs Thiruvenkatachari, Devanathan Abstract | PDF

Title: Approximation algorithms, Hardness, and PCPs

Candidate: Thiruvenkatachari, Devanathan

Advisor(s): Khot, Subhash

Abstract:

This thesis is a collection of theoretical results on the topic of approximation algorithms and hardness of approximation. The results presented here use a combination of classical and modern techniques to achieve better approximation algorithms and hardness results for some pivotal NP-hard problems and their variants. We study CSPs from a multi-objective point of view, with the goal of simultaneous optimization of multiple instances over the same set of variables, with MAX-CUT as the central focus. We provide an approximation algorithm that is near optimal assuming the unique games conjecture. We also study PCPs and their role in hardness of approximation, and present a hardness result for 3-LIN in the sub-constant soundness regime. Lastly, dictatorship testing is a property testing problem with significant applications in proving hardness results, and we present an improvement on the soundness of the k-bit dictatorship test with perfect completeness.
Ph.D. Thesis 2019 Tactile Perception Design for Fabrication Tymms, Chelsea Abstract | PDF

Title: Tactile Perception Design for Fabrication

Candidate: Tymms, Chelsea

Advisor(s): Zorin, Denis

Abstract:

High-resolution 3D printing technology provides the ability to manufacture shapes with precise geometry. Controlling this fine-scale geometry to confer haptic qualities is a growing area of research in fabrication. In this thesis, I will present three projects addressing the question of how to fabricate surface textures with controlled tactile properties and exploring how tactile textures can be used in custom manufacturing and to expand the understanding of the human sense of touch.

Surface roughness is one of the most significant qualities in haptic perception, essential to material identification, comfort, and usability. Past perceptual studies on roughness have typically used stimuli drawn from existing materials or from a narrow range of custom-made materials. In the first project presented in this thesis, we explore the use of 3D printing to manufacture stimuli. We used modeling and 3D printing to manufacture a set of fine parametric bump textures, and we used these texture stimuli in a psychophysical study of human roughness perception. We investigated the contribution of the texton spacing, size, and arrangement to the texture's perceived tactile roughness.

In the second project, we quantitatively address the problem of mapping arbitrary texture geometry to tactile roughness. Drawing from insights in past neurophysiology research, we developed a model that simulates human touch to predict a texture's tactile roughness from its surface geometry. We fabricated a set of 46 parametric and real-life textures, and we used psychophysical experiments with human subjects to place them in the perceptual space for tactile roughness using non-metric multidimensional scaling. We closely match this space with our quantitative model, obtained from strain fields derived from the elasticity simulations of the human skin contacting texture geometry. We demonstrate how this model can be applied to predict and alter surface roughness, and we show several applications in the context of fabrication.

The third project extends these ideas by developing a method to control a texture's haptic qualities and visual appearance at the same time. The tactile feeling and visual appearance of objects often interact in unpredictable ways, and both serve important purposes for identification and usability. In this project, we develop an optimization method to maintain a texture's visual appearance while altering its perceived tactile roughness or tactile temperature. Our optimization method, which is enabled by neural network-based models, allows us to change a texture to a different desired tactile feeling while preserving the visual appearance, at a relatively low computational cost.
M.S. Thesis 2019 Cold Case: The Lost MNIST Digits Yadav, Chhavi Abstract | PDF

Title: Cold Case: The Lost MNIST Digits

Candidate: Yadav, Chhavi

Advisor(s): Fergus, Rob

Abstract:

Although the popular MNIST dataset (LeCun, Cortes, and Burges 1994) is derived from the NIST database (Grother and Hanaoka 1995), the precise processing steps for this derivation have been lost to time. We propose a reconstruction that is accurate enough to serve as a replacement for the MNIST dataset, with insignificant changes in accuracy. We trace each MNIST digit to its NIST source and its rich metadata, such as writer identifier and partition identifier. We also reconstruct the complete MNIST test set with 60,000 samples instead of the usual 10,000. Since the remaining 50,000 were never distributed, they enable us to investigate the impact of twenty-five years of MNIST experiments on the reported testing performances. Our results unambiguously confirm the trends observed by Recht et al. (2018, 2019): although the misclassification rates are slightly off, classifier ordering and model selection remain broadly reliable. We attribute this phenomenon to the pairing benefits of comparing classifiers on the same digits.
Ph.D. Thesis 2019 End-to-End Learning for Autonomous Driving Zhang, Jiakai Abstract | PDF

Title: End-to-End Learning for Autonomous Driving

Candidate: Zhang, Jiakai

Advisor(s): Cho, Kyunghyun

Abstract:

The end-to-end learning approach for autonomous driving has sparked great interest in both academia and industry in recent years. The approach can be defined as learning a model that maps sensory input, such as image frames from a camera, to driving actions for controlling the autonomous vehicle, such as steering. Compared to the traditional autonomous driving system, which often includes perception, localization, mapping, and path planning, the end-to-end learning approach offers a more efficient method of utilizing large amounts of expert driver demonstrations to achieve fully autonomous driving without acquiring expensive labeled data such as bounding boxes for objects.

End-to-end learning for autonomous driving can be done by supervised learning, where a model is tuned to minimize the difference between predicted actions and ground-truth actions. The ground truth of driving actions is usually obtained from driver demonstrations. A model trained in this way, however, suffers from unexpected behaviors due to the mismatch between the samples visited by a learned model and the samples collected by an expert driver. To address this issue, we first introduce an end-to-end supervised learning approach with data augmentation to train a model to keep a vehicle driving at the center of a lane. The data augmentation is done by synthetically generating new samples through rotating and translating input images captured from a front-facing camera and calculating compensatory steering. We show that, using such automatically augmented data, a trained model can drive a car to follow a lane in various conditions on highways and on local and residential roads.

Instead of generating augmented data, we can also collect new samples while trying out the learned model. To reduce the number of times an expert must be queried for labels, we propose the SafeDAgger algorithm, a query-efficient imitation learning approach. We show that our method significantly reduces the number of expert queries and trains a driving model more efficiently. A model trained by our proposed SafeDAgger algorithm can successfully drive a racing car in a simulator to do lane following and overtaking.

The expert demonstrations provided by humans and used for training models often show significant variability due to latent factors. Given such expert demonstrations, a model trained by minimizing the difference between the expert driving actions and predicted driving actions can output dangerous driving actions that may cause serious accidents. We address this issue by introducing a variational mixture density network to model the variability using a discrete latent variable. The experimental results in a racing car simulator show that the model trained using our proposed method can learn the variability of driving signals from expert demonstrations and successfully distinguish certain driving behaviors such as changing lanes and following lanes.

We introduce a simulator to support the development, training, and evaluation of autonomous driving systems using end-to-end learning approaches. Leveraging this simulator, we demonstrate how to train and evaluate models to drive a truck that follows a navigation map in a video game.

In summary, this thesis introduces end-to-end learning approaches for autonomous driving that address the data mismatch issue and learn the variability of expert driving actions. Our results show that the trained model can drive the vehicle to follow a lane, change lanes, and make turns in simulated driving environments.
Ph.D. Thesis 2019 Text Representation using Convolutional Networks Zhang, Xiang Abstract | PDF

Title: Text Representation using Convolutional Networks

Candidate: Zhang, Xiang

Advisor(s): LeCun, Yann

Abstract:

This dissertation applies convolutional networks to learning representations of text, and it consists of several parts. The first part offers an empirical exploration of the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TF-IDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks. These results indicate that using low-level inputs (in this case, characters) for convolutional networks could be feasible for text representation learning.

The second part concerns which text encoding method might work for convolutional networks. We include a comprehensive comparison of different encoding methods for the task of text classification using 14 large-scale datasets in 4 languages: Chinese, English, Japanese and Korean. Different encoding levels are studied, including UTF-8 bytes, characters, words, romanized characters and romanized words. For all encoding levels, whenever applicable, we provide comparisons with linear models, fastText and convolutional networks. For convolutional networks, we compare encoding mechanisms using character glyph images, one-hot (or one-of-n) encoding, and embedding. From these 473 models, one of the conclusions is that byte-level one-hot encoding works consistently best for convolutional networks.
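
To make the byte-level one-hot encoding concrete, here is a minimal Python sketch that is not taken from the dissertation. It maps a string to its UTF-8 bytes and represents each byte as a 256-dimensional indicator vector. The function name `one_hot_bytes`, the fixed length `max_len`, and the use of all-zero rows as padding are assumptions made for illustration.

```python
def one_hot_bytes(text, max_len):
    """Encode text as max_len one-hot rows over the 256 possible byte values.

    The text is converted to UTF-8 and truncated to max_len bytes (this may
    cut a multi-byte character in half); shorter inputs are padded with
    all-zero rows.
    """
    data = text.encode("utf-8")[:max_len]
    rows = []
    for byte in data:
        row = [0] * 256
        row[byte] = 1  # exactly one active position per byte
        rows.append(row)
    # Pad with all-zero rows up to the fixed length.
    rows.extend([0] * 256 for _ in range(max_len - len(data)))
    return rows


# Usage: "é" occupies two UTF-8 bytes (0xC3, 0xA9), so two rows are active.
matrix = one_hot_bytes("é", 4)
print([row.index(1) if 1 in row else None for row in matrix])
```

Because the encoding operates on bytes, the same 256-symbol input space serves Chinese, English, Japanese and Korean text without a language-specific vocabulary.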

Based on this, in the third part of the dissertation we develop a convolutional network at the level of bytes for learning representations through the task of auto-encoding. The proposed model is a multi-stage deep convolutional encoder-decoder framework using residual connections, containing up to 160 parameterized layers. Each encoder or decoder contains a shared group of modules that consists of either pooling or up-sampling layers, making the network recursive in terms of abstraction levels in representation. The decoding process is non-sequential. Results for 6 large-scale paragraph datasets are reported, in 3 languages including Arabic, Chinese and English. Analyses are conducted to study several properties of the proposed model. Experiments are presented to verify that the auto-encoder can learn useful representations.

In the fourth part of the dissertation, we apply the improved design from the previous auto-encoding model to text classification, adding comparisons between residual and dense connections. This further validates the choice of architecture we made for the auto-encoding model, and the effectiveness of the recursive architecture with residual or dense connections.
Ph.D. Thesis 2019 Unsupervised Learning with Regularized Autoencoders Zhao, Junbo Abstract | PDF

Title: Unsupervised Learning with Regularized Autoencoders

Candidate: Zhao, Junbo

Advisor(s): LeCun, Yann

Abstract:

Deep learning has enjoyed remarkable successes in a variety of domains. These successes often come at the cost of large annotated datasets and computationally heavy neural network models. The learning paradigm behind them is called supervised learning. However, reducing the sample complexity while improving the universality of the trained models is a crucial next step that may lead to artificial intelligence. Unsupervised learning, in contrast to supervised learning, aims to build neural network models with more generic loss objectives requiring little or no labelling effort, and is therefore not tied to any specific domain or task. In spite of the brevity of its goal, unsupervised learning is a broad topic that relates to or includes several sub-fields, such as density estimation, generative modeling, world models, etc. In this thesis, we primarily adopt an energy-based view unifying these different fields (LeCun et al. 2006). A desired energy function reflects the data manifold by differentiating the energy assigned to points on the data manifold from that assigned to points off the manifold. With this foundation, we first cast the popular autoencoder and adversarial learning frameworks into an energy-based perspective, and then propose several techniques and architectures motivated by learning a better-shaped energy function. We also show that the techniques proposed in this thesis cover a wide spectrum of applications, including image and text generative modeling, text summarization, style transfer without aligned data, and transfer and semi-supervised learning in both computer vision and natural language processing. The thesis is organized as follows. First, we assess the validity and the main challenges of energy-based learning. We then introduce two frameworks focusing on strengthening autoencoders by building unit connection hierarchies via either hard-coded pooling or self-learned graphs.
Finally, we propose several systematic regularization techniques, based on adversarial training and vector discretization.