Theses & Reports
Instructions for submitting a technical report or thesis.
You can find technical reports published prior to 1990 archived here.
-
Ph.D. Thesis
2020
Out of Distribution Generalization in Machine Learning
Arjovsky, Martin
Abstract
|
PDF
Title: Out of Distribution Generalization in Machine Learning
Candidate: Arjovsky, Martin
Advisor(s): Leon Bottou
Abstract:
Machine learning has achieved tremendous success in a variety of
domains in recent years. However, many of these success stories have
come in settings where the training and test distributions are
extremely similar to each other. In everyday situations, when models
are tested on data even slightly different from the data they were trained on, ML
algorithms can fail spectacularly. This research attempts to formally
define this problem, identify which assumptions are reasonable to make
about our data, and determine what kinds of guarantees we can hope to obtain from them.
Then, we focus on a certain class of out-of-distribution problems and
their assumptions, and introduce simple algorithms that follow from
these assumptions and that are able to provide more reliable
generalization. A central topic of the thesis is the strong link
between discovering the causal structure of the data, finding features
that are reliable predictors regardless of their
context, and out-of-distribution generalization.
-
Ph.D. Thesis
2020
Behavior of the Limited-Memory BFGS Method on Nonsmooth Optimization Problems
Asl, Azam
Abstract
|
PDF
Title: Behavior of the Limited-Memory BFGS Method on Nonsmooth Optimization Problems
Candidate: Asl, Azam
Advisor(s): Michael Overton
Abstract:
The limited-memory BFGS (Broyden-Fletcher-Goldfarb-Shanno) method, abbreviated L-BFGS, is widely used for large-scale unconstrained optimization, but its behavior on nonsmooth problems has received little attention. In this thesis we give the first convergence analysis of the L-BFGS method applied to nonsmooth functions. We focus on the simplest version of the method, sometimes known as memoryless BFGS, which uses just one update. L-BFGS can be used with or without “scaling”; the use of scaling is normally recommended. We consider a simple class of convex piecewise linear nonsmooth functions f that are unbounded below. On this class of problems, we show that memoryless BFGS with scaling, using any Armijo-Wolfe line search and initialized at any point where f is differentiable, generates iterates that converge to a non-optimal point, if a certain condition relating the Lipschitz constant of f to the Armijo parameter of the line search holds. We also present an analysis of the ordinary gradient method with the same line search applied to the same class of functions, giving conditions under which it also fails. However, scaled memoryless BFGS fails under a weaker condition relating the Lipschitz constant of the function to the Armijo parameter than the condition implying failure of the gradient method. Furthermore, in sharp contrast to the gradient method, if a specific standard Armijo-Wolfe bracketing line search is used, scaled memoryless BFGS fails whenever the Lipschitz constant is sufficiently large, regardless of the Armijo parameter. Our experimental results demonstrate that our analysis is tight on this class of functions, and that similar results likely hold for L-BFGS with any fixed number of updates. In contrast, the “full” BFGS method is remarkably effective for minimizing nonsmooth functions, but it is not a practical approach when the number of variables is large.
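For readers unfamiliar with the line search at the center of this analysis, here is a minimal sketch of an Armijo-Wolfe bracketing line search of the general kind discussed above, applied to the convex piecewise linear function f(x) = |x|. The parameter names and the bisection/doubling scheme are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

def armijo_wolfe(f, grad, x, d, c1=1e-4, c2=0.9, max_iter=100):
    """Bracketing line search: return a step t along descent direction d
    satisfying both the Armijo (sufficient decrease) and Wolfe
    (directional derivative) conditions."""
    f0 = f(x)
    g0 = grad(x) @ d                          # directional derivative; must be < 0
    lo, hi, t = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        if f(x + t * d) > f0 + c1 * t * g0:   # Armijo fails: step too long
            hi = t
        elif grad(x + t * d) @ d < c2 * g0:   # Wolfe fails: step too short
            lo = t
        else:
            return t                          # both conditions hold
        t = (lo + hi) / 2 if np.isfinite(hi) else 2 * t
    return t

# f(x) = |x| is convex and piecewise linear; its gradient is sign(x) away
# from the kink at 0, where f is not differentiable.
f = lambda x: abs(x[0])
grad = lambda x: np.array([np.sign(x[0])])
t = armijo_wolfe(f, grad, np.array([2.0]), np.array([-1.0]))
```

On this bounded-below example the search happens to step exactly to the minimizer; the thesis studies functions unbounded below, where the interplay of these two conditions can instead drive scaled memoryless BFGS to non-optimal points.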
We also conduct extensive experiments applying L-BFGS, both scaled and unscaled, with various choices for the number of updates, on many other classes of convex nonsmooth functions, ranging from artificially devised, highly ill-conditioned nonsmooth problems to eigenvalue optimization problems that are equivalent to semidefinite programming problems arising in applications. We also apply L-BFGS to smoothed versions of these problems. We find that although L-BFGS is usually a reliable method for minimizing ill-conditioned smooth problems, when the condition number is so large that the function is effectively nonsmooth, L-BFGS consistently fails. This behavior stands in sharp contrast to that of full BFGS, which is consistently reliable for nonsmooth optimization problems. We conclude that, for large-scale nonsmooth optimization problems for which BFGS and other methods are not practical, it is far preferable to apply L-BFGS to a smoothed variant of a nonsmooth problem than to apply it directly to the nonsmooth problem.
-
M.S. Thesis
2020
Cooperation and Deception in multi-agent signaling
Enaganti, Inavamsi
Abstract
|
PDF
Title: Cooperation and Deception in multi-agent signaling
Candidate: Enaganti, Inavamsi
Advisor(s): Bhubaneswar Mishra
Abstract:
We aim to study cooperation and deception in a system with multiple agents through utility and signaling. We start with the classic standard for cooperation, namely the ‘Prisoner’s Dilemma’, and then move on to the ‘Iterated Prisoner’s Dilemma’, which we treat as an iterated version of a signaling game, since an agent’s previous actions are a signal to the opponent about the agent’s type. We then move on to bio-mimicry and deception, where we study dynamics and interesting phenomena that arise due to signaling between predator and prey. Cooperation and deception are two sides of the same coin, and it is imperative to understand both as we develop better and more efficient artificial intelligence systems.
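As a concrete anchor for the games mentioned above, the one-shot Prisoner's Dilemma and the signaling view of its iterated version fit in a few lines. The payoff values are the textbook T=5, R=3, P=1, S=0, not numbers from the thesis:

```python
# One-shot Prisoner's Dilemma payoffs for the row player:
# (my_action, opponent_action) -> payoff, with C = cooperate, D = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def best_response(opponent_action):
    # Defection strictly dominates: it maximizes payoff against either action.
    return max("CD", key=lambda a: PAYOFF[(a, opponent_action)])

def tit_for_tat(history):
    # In the iterated game, an agent's past actions signal its type;
    # tit-for-tat simply mirrors the opponent's last observed action.
    return "C" if not history else history[-1]
```

In the one-shot game `best_response` returns "D" regardless of the opponent, which is what makes mutual cooperation a dilemma; in the iterated game, history-dependent strategies such as `tit_for_tat` use past actions as a signal and can sustain cooperation.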
-
Ph.D. Thesis
2020
Enhanced Representations for Relations by Multi-task Learning
Fu, Lisheng
Abstract
|
PDF
Title: Enhanced Representations for Relations by Multi-task Learning
Candidate: Fu, Lisheng
Advisor(s): Grishman, Ralph
Abstract:
A relation describes the relationship between a pair of entities. Relation Extraction is the process of extracting relations from free text and converting them to structured machine-readable knowledge. This process can facilitate building and extending knowledge bases, and therefore can benefit a variety of natural language processing applications such as Question Answering and Summarization.
Typical relation extraction projects start by defining a relation schema: a set of mutually exclusive relation types. Based on these definitions, all instances of these relations in a text corpus are labeled by hand, producing a dataset which can be used to train a statistical model. Labeling relations in text is difficult and time-consuming, and only a limited number of relation datasets have been developed in this way. New applications will give rise to new schemas, so a lack of high-quality labeled data is almost inevitable for Relation Extraction.
Despite limited labeled samples in relation datasets, neural net models have been shown to be more effective than traditional methods in learning feature representations with pre-trained word embeddings. In the context of representation learning, this thesis presents multi-task learning frameworks to learn enhanced representations for relations. It shows how to learn better feature representations in both unsupervised and supervised ways. First, the dissertation shows how to learn domain invariant representations using unlabeled entity pairs. Then it shows how to learn a unified encoder by combining multiple annotated datasets. Finally, it shows how to learn the relatedness between relation types across different relation schemas. These techniques improve the relation models without requiring more annotation from the target dataset. The multi-task learning frameworks could be an efficient toolkit for relation extraction in general.
-
Ph.D. Thesis
2020
Scaling Multi-user Virtual and Augmented Reality
Herscher, Sebastian
Abstract
|
PDF
Title: Scaling Multi-user Virtual and Augmented Reality
Candidate: Herscher, Sebastian
Advisor(s): Perlin, Ken
Abstract:
The Virtual and Augmented Reality (XR) ecosystems have been gaining substantial momentum and traction within the gaming, entertainment, enterprise, and training markets in the past half-decade, but have been hampered by limitations in concurrent user count, throughput, and accessibility to mass audiences. Although a litany of XR devices have been made available for public purchase, most XR experiences have been developed for either a single user or a small set of users at a time. Few systems or experiments in co-located XR environments have expanded past a small set of users, leaving the paradigm of being part of a larger virtual audience relatively untested. This thesis presents a set of components, systems, and experiments that assist in the creation, deployment, and scaling of multi-user virtual and augmented reality experiences, and outlines the strengths of techniques found in traditional co-located media for the design space of scaled co-located XR.
-
M.S. Thesis
2020
Static Responsibility Analysis of Floating-Point Programs
Saatcioglu, Goktug
Abstract
|
PDF
Title: Static Responsibility Analysis of Floating-Point Programs
Candidate: Saatcioglu, Goktug
Advisor(s): Thomas Wies
Abstract:
The last decade has seen considerable progress in the analysis of floating-point programs. There now exist frameworks to verify both the total round-off error a program accrues and the robustness of floating-point programs. However, there is a lack of static analysis frameworks for identifying the causes of erroneous behaviors that arise from the use of floating-point arithmetic. Such errors are sporadic and are triggered by specific inputs or by specific numbers computed by programs. In this work, we introduce a new static analysis by abstract interpretation to define and detect the entities responsible for such behaviors in finite-precision implementations. Our focus is on identifying causes of test discontinuity, where small differences in inputs may lead to large differences in the control flow of a program, causing the computed finite-precision path to differ from the path of the same ideal computation carried out in real numbers. The analysis is not limited to discontinuity, however: any type of error cause can be identified by the framework.

We propose to carry out the analysis by combining an over-approximating forward partitioning semantics with an under-approximating backward semantics of programs, which leads to a forward-backward static analysis with iterated intermediate reduction. This paves the way for the design of a tool that helps programmers identify and fix numerical bugs caused by the use of finite-precision numbers. Implementing this tool is the next step for this work.
-
TR2020-995
2020
FireRevit: Using Revit Files to Identify the Room Locations of Fires and Escape Routes
Sheng, Luhan;
Dennis Shasha
Abstract
|
PDF
Title: FireRevit: Using Revit Files to Identify the Room Locations of Fires and Escape Routes
Author(s): Sheng, Luhan; Dennis Shasha
Abstract:
A Revit file is a proprietary format used by Autodesk Revit to store a building model.
It contains all the information that describes the model, such as element and
entity data, project location, etc. Since 2010, to enable advanced users and
third-party developers to integrate their applications into the Autodesk Revit family of
products, Autodesk has permitted developers to use the API provided by Revit to obtain
building data. In fact, one can now process large quantities of Revit files
and extract building information automatically.

Building on this, FireRevit parses all the building model files for a given
city to obtain the location of every window in every building, together with its corresponding room, and
creates a database to persist these data.

This way, when a fire breaks out in the city and a drone sighting of the fire gives its
latitude, longitude, and height, FireRevit can help firefighters determine the building and
room where the fire occurred by retrieving records from the database. FireRevit also combines the Revit file with information about which rooms are inaccessible
due to fire to guide residents to the nearest exit.
-
M.S. Thesis
2020
Pointer-Generator Transformers for Morphological Inflection
Singer, Assaf
Abstract
|
PDF
Title: Pointer-Generator Transformers for Morphological Inflection
Candidate: Singer, Assaf
Advisor(s): Kyunghyun Cho
Abstract:
In morphologically rich languages, a word's surface form reflects syntactic and semantic properties such as gender, tense or number. For example, most English nouns have both singular and plural forms (e.g., robot/robots, process/processes), which are known as the inflected forms of the noun. The vocabularies of morphologically rich languages, e.g., German or Spanish, are larger than those of morphologically poor languages, e.g., Chinese, if every surface form is considered an independent token. This motivates the development of models that can deal with inflections by either analyzing or generating them and, thus, alleviate the sparsity problem.
This thesis presents approaches to generate morphological inflections. We cast morphological inflection as a sequence-to-sequence problem and apply different versions of the transformer, a state-of-the-art deep learning model, to the task. However, for many languages the availability of morphological lexicons, and thus of training data for the task, is a big challenge. In our work, we explore different ways to overcome this:
1. We propose a pointer-generator transformer model to allow easy copying of input characters, which is known to improve the performance of neural models in the low-resource setting.
2. We implement a system for the task of unsupervised morphological paradigm completion, where systems produce inflections from raw text alone, without relying on morphological information.
3. We explore multitask training and data hallucination pretraining, two methods which yield more training examples.

With our formulated models and data augmentation methods, we participate in the SIGMORPHON 2020 shared task, and describe the NYU-CUBoulder systems for Task 0 on typologically diverse morphological inflection and Task 2 on unsupervised morphological paradigm completion. Finally, we design a low-resource experiment to show the effectiveness of our proposed approaches for low-resource languages.
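The copy mechanism behind pointer-generator models can be illustrated numerically: the output distribution mixes a generation distribution over the vocabulary with attention mass copied from source-character positions. The function name and shapes below are illustrative, not the NYU-CUBoulder code:

```python
import numpy as np

def pointer_generator_dist(vocab_probs, attention, src_ids, p_gen):
    """Final output distribution of a pointer-generator model.

    vocab_probs: (V,) softmax over the output vocabulary
    attention:   (S,) attention weights over source positions (sums to 1)
    src_ids:     (S,) vocabulary id of each source character
    p_gen:       probability of generating rather than copying
    """
    out = p_gen * vocab_probs
    # Scatter-add copy probability onto the ids of the source characters;
    # repeated source characters accumulate their attention mass.
    np.add.at(out, src_ids, (1.0 - p_gen) * attention)
    return out

dist = pointer_generator_dist(
    vocab_probs=np.full(5, 0.2),      # uniform generation distribution
    attention=np.array([0.5, 0.5]),   # two source positions
    src_ids=np.array([2, 2]),         # both carry vocabulary id 2
    p_gen=0.5,
)
```

Because copying reuses input characters directly, it is especially helpful for inflection, where the output usually shares most of its characters with the lemma.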
-
M.S. Thesis
2020
Data Flow Refinement Type Inference Tool Drift²
Su, Yusen
Abstract
|
PDF
Title: Data Flow Refinement Type Inference Tool Drift²
Candidate: Su, Yusen
Advisor(s): Thomas Wies
Abstract:
Refinement types use logical predicates to capture run-time properties of programs, which can then be used for program verification. Traditionally, SMT-based checking tools for refinement types, such as the implementation of Liquid Types [1], rely on heuristics or on random sampling of logical qualifiers to find the relevant logical predicates.
In this thesis, we describe the implementation of a novel algorithm proposed in Zvonimir Pavlinovic's PhD thesis "Leveraging Program Analysis for Type Inference" [2], based on the framework of abstract interpretation, for inferring refinement types in functional programs. The analysis generalizes Liquid type inference and is parametric in the abstract domain used to express type refinements. The main contribution of this thesis is to carry out the instantiation of this parametric type analysis and to evaluate the algorithm's precision and efficiency. Moreover, we describe a tool, called DRIFT², which allows users to select an abstract domain for expressing type refinements and to control the degree to which context-sensitive information is tracked by the analysis.
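To make the role of numerical abstract domains and widening concrete, here is a toy interval analysis of the loop `x = 0; while x < 100: x = x + 1`. This is a generic abstract-interpretation example, not DRIFT²'s code: widening jumps unstable bounds to infinity so the fixpoint iteration terminates, and a single narrowing pass recovers the precise bound.

```python
INF = float("inf")

def join(a, b):
    # Least upper bound of two intervals.
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(old, new):
    # Keep stable bounds; push unstable ones to +/- infinity.
    return (old[0] if new[0] >= old[0] else -INF,
            old[1] if new[1] <= old[1] else INF)

def loop_body(x):
    # Abstract transformer: filter by the guard x < 100, then x = x + 1.
    guarded = (x[0], min(x[1], 99))
    return (guarded[0] + 1, guarded[1] + 1)

init = (0, 0)
x = init
while True:                 # widening iteration: terminates at (0, inf)
    nxt = widen(x, join(init, loop_body(x)))
    if nxt == x:
        break
    x = nxt
widened = x
narrowed = join(init, loop_body(widened))   # one narrowing pass: (0, 100)
```

Without widening the iteration would enumerate (0,0), (0,1), (0,2), ... and never terminate; the widening/narrowing trade-off sketched here is exactly the kind of configuration choice the thesis evaluates.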
Finally, our work compares the precision and efficiency of DRIFT² under different configurations of numerical abstract domains and widening operations [3]. In addition, we compare DRIFT² with existing refinement type inference tools. The experimental results show that our method is both effective and efficient in automatically inferring refinement types.
-
Ph.D. Thesis
2020
Auditing Outsourced Services
Tan, Cheng
Abstract
|
PDF
Title: Auditing Outsourced Services
Candidate: Tan, Cheng
Advisor(s): Michael Walfish
Abstract:
Outsourcing to the cloud is based on assuming that remote servers behave as expected, even under failures, bugs, misconfigurations, operational mistakes, insider threats, and external attacks. Can we instead verify their behavior? There have been various attempts at such verification, but these attempts have had to choose: comprehensive guarantees or good performance? This dissertation studies how to get both.
This dissertation focuses on two essential services: outsourced computation and outsourced databases. Verifying them correspondingly introduces two new abstract problems. We call the first problem the Efficient Server Audit Problem, which examines how to efficiently verify a concurrent and untrusted server. The second problem is verifying a core correctness contract of black-box databases while scaling to real-world online workloads.
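One core correctness contract for databases is serializability. Stripped of everything that makes auditing hard at scale (unknown operation ordering, pruning, solver encodings), the underlying check can be caricatured as cycle detection on a graph of dependencies between transactions. This sketch is illustrative, not the dissertation's actual algorithm:

```python
def serializable(graph):
    """Toy serializability check: a history whose transaction dependency
    graph is fully known is serializable iff that graph is acyclic.

    graph: dict mapping each transaction id to the set of transactions
    that must be ordered after it (e.g., write-read dependencies).
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in graph}

    def has_cycle(t):
        color[t] = GRAY
        for u in graph[t]:
            if color[u] == GRAY:      # back edge: dependency cycle
                return True
            if color[u] == WHITE and has_cycle(u):
                return True
        color[t] = BLACK
        return False

    return not any(color[t] == WHITE and has_cycle(t) for t in graph)

# T1 -> T2 -> T3 is serializable; adding T3 -> T1 creates a cycle.
ok = serializable({1: {2}, 2: {3}, 3: set()})
bad = serializable({1: {2}, 2: {3}, 3: {1}})
```

The hard part, which this toy omits, is that a real auditor cannot observe the dependency graph directly and must reason over all graphs consistent with what clients saw.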
To address these two problems, this dissertation introduces two systems, respectively: orochi and cobra. Both systems tolerate arbitrary failures in the service provider and have good performance: in our experiments, orochi’s verifier achieves a 5.6–10.9x speedup versus simply re-executing inputs, with less than 10% CPU overhead on the server side; cobra improves over baselines by 10x in verification cost, with modest overhead on clients (less than 5% throughput degradation and about a 7% increase in 90th-percentile latency).
-
Ph.D. Thesis
2020
Market Efficiency and Dynamics
Tao, Yixin
Abstract
|
PDF
Title: Market Efficiency and Dynamics
Candidate: Tao, Yixin
Advisor(s): Richard Cole
Abstract:
General equilibrium theory, initiated by Walras over a century ago, explains the interaction between supply and demand in an economy. In this dissertation, we look at Fisher Markets, which are a particular case of the general equilibrium theory. We consider two issues in Fisher Markets: strategic behavior and dynamics.
Strategic behavior is usually considered in a game, such as an auction, where participants may choose not to report their real preferences in order to improve their payoffs. In general equilibrium theory, buyers are usually considered to be non-strategic: given the prices, buyers maximize their true utilities by appropriately dividing their money among the different goods. In this case, the market equilibrium should be efficient. However, the prices at the market equilibrium are influenced by the buyers' demands. In principle, buyers can affect prices by changing their demands, which may improve their final utilities; this can result in inefficient outcomes. In this thesis, we investigate this possibility in large Fisher markets. We show that the market approaches full efficiency as it grows larger and larger. We also show a similar result for the Walrasian mechanism in large settings.
We also study two dynamics in Fisher markets in this dissertation:

- Proportional response is a buyer-oriented dynamics: each round, buyers update their spending in proportion to the utilities they received in the previous round, and the price of each good is the sum of the spending on it. This dissertation establishes new convergence results for two generalizations of proportional response in Fisher markets with buyers having CES utility functions. The starting points are, respectively, a new convex and a new convex-concave formulation of such markets; the two generalizations of proportional response correspond to suitable mirror descent algorithms applied to these formulations. Among other results, we analyze a damped generalized proportional response and show a linear rate of convergence in a Fisher market with buyers whose utility functions cover the full spectrum of CES utilities aside from the extremes of linear and Leontief utilities; when these utilities are included, we obtain an empirical O(1/T) rate of convergence.
- Tatonnement is considered the most natural dynamics in Fisher markets: the price of a good is raised if demand exceeds its supply, and decreased if demand falls short of it. Implicitly, buyers' demands are assumed to be a best response to the current prices. This dissertation addresses a lack of robustness in existing convergence results for discrete forms of tatonnement, including the fact that it need not converge when buyers have linear utility functions. We show that for Fisher markets with buyers having CES utility functions, including linear utility functions, tatonnement converges quickly to an approximate equilibrium (i.e., at a linear rate), modulo a suitable large market assumption. The quality of the approximation is a function of the parameters of the large market assumption.
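Both dynamics admit compact sketches. Below, proportional response is shown for linear utilities (spending shifts toward goods that delivered utility, prices are column sums of spending) and tatonnement for Cobb-Douglas buyers, whose demand for good j is a_ij * b_i / p_j. These are textbook forms of the dynamics, assumed here for illustration, not the dissertation's generalized variants:

```python
import numpy as np

def proportional_response(u, budgets, rounds=60):
    """Proportional response in a linear Fisher market with unit supplies.
    u[i, j] is buyer i's utility per unit of good j; budgets[i] is i's money."""
    n, m = u.shape
    spend = np.tile(budgets[:, None] / m, (1, m))    # start by spending evenly
    for _ in range(rounds):
        prices = spend.sum(axis=0)                   # price = total spending on good
        util = u * spend / prices                    # utility from allocation spend/price
        # Next round: split the budget in proportion to utility received.
        spend = budgets[:, None] * util / util.sum(axis=1, keepdims=True)
    return spend.sum(axis=0)                         # final prices

def tatonnement(a, budgets, step=0.5, rounds=200):
    """Tatonnement with Cobb-Douglas buyers (each row of a sums to 1) and
    unit supplies: raise a price on excess demand, lower it on excess supply."""
    p = np.ones(a.shape[1])
    for _ in range(rounds):
        demand = (a * budgets[:, None] / p).sum(axis=0)
        excess = demand - 1.0
        p *= 1.0 + step * np.clip(excess, -0.5, 0.5)  # clipped multiplicative update
    return p
```

For Cobb-Douglas buyers the spending on each good is price-independent, which is why this clipped multiplicative tatonnement contracts geometrically toward the equilibrium prices; the dissertation's contribution is establishing such linear rates in far less forgiving CES settings.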
-
Ph.D. Thesis
2020
Flexible and Efficient Systems for Training Emerging Deep Neural Networks
Wang, Minjie
Abstract
|
PDF
Title: Flexible and Efficient Systems for Training Emerging Deep Neural Networks
Candidate: Wang, Minjie
Advisor(s): Li, Jinyang
Abstract:
The success of deep neural networks (DNNs) stems from their strong capability to learn from data. Leveraging more data requires larger models that may exceed the capacity of a single computing device; leveraging graph-structured data demands models with sparse computation patterns. Unfortunately, current deep learning systems limit the exploration of such models, resulting in a frustrating user experience. This thesis proposes a system design to guide the development of new deep learning systems, with the goal of enabling efficient training of these emerging DNNs with little user effort.
We then realize the design in two systems, Tofu and DGL. Tofu partitions very large DNNs across multiple GPUs to reduce the per-GPU memory footprint. To partition each operator automatically, we propose a description language for annotating operator semantics; to partition the whole training computation optimally, Tofu uses an algorithm that minimizes the total communication cost. We evaluate Tofu's capability to train very large models, demonstrating the substantial gains from applying our design. We then implement DGL, a new framework for training DNNs on graph-structured data. DGL provides an intuitive and expressive interface that can cover a wide range of graph DNN models. We introduce batching and kernel fusion techniques that enable training GNNs on large graphs, achieving significant performance improvements relative to existing systems.
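To illustrate the sparse computation pattern such systems target, a single graph-convolution layer is just neighbor aggregation followed by a dense transform. The aggregation is written densely with NumPy here for clarity; this is an illustrative sketch, not DGL's API, and the sparse version of the first matmul is precisely where batching and kernel fusion pay off:

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: mean over neighbors, project, ReLU.

    adj:    (N, N) adjacency matrix (dense here; real systems use sparse kernels)
    feats:  (N, F) input node features
    weight: (F, G) learned projection
    """
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)   # guard isolated nodes
    neighbor_mean = (adj @ feats) / deg                # sparse matmul in practice
    return np.maximum(neighbor_mean @ weight, 0.0)     # ReLU activation

# Two nodes joined by one edge exchange (and then project) each other's features.
adj = np.array([[0.0, 1.0], [1.0, 0.0]])
feats = np.array([[2.0, 0.0], [0.0, 4.0]])
out = gcn_layer(adj, feats, np.eye(2))
```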
-
M.S. Thesis
2020
Are the proposed similarity metrics also a measure of functional similarity?
Yellapragada, Manikanta Srikar
Abstract
|
PDF
Title: Are the proposed similarity metrics also a measure of functional similarity?
Candidate: Yellapragada, Manikanta Srikar
Advisor(s): Kyunghyun Cho
Abstract:
A recent body of work attempts to understand the behavior and training dynamics of neural networks by analyzing intermediate representations and designing metrics to define the similarity between those representations. We observe that the representations of the last layer can be thought of as the functional output of the model up to that point. In this work, we investigate whether the similarity between these representations can be considered a stand-in for the similarity of the networks' output functions. This can have an impact on many downstream tasks, but we specifically analyze it in the context of transfer learning. Consequently, we perform a series of experiments to understand the relationship between the representational similarity and the functional similarity of neural networks. We show in two ways that the leading metric for representational similarity, CKA, does not bear a strict relationship with functional similarity.
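For reference, the linear variant of CKA reduces to a few lines; this follows the standard published formula for linear CKA, not code from the thesis:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representation matrices.

    X: (n, d1) and Y: (n, d2) hold activations of the same n examples from
    two networks (or two layers). The score is invariant to orthogonal
    transforms and isotropic scaling of either representation.
    """
    X = X - X.mean(axis=0)                              # center features
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
same = linear_cka(X, 3.0 * X)    # scaling invariance: score is 1.0
```

These invariances are exactly what makes CKA attractive as a representational metric, and exactly why a high CKA score need not constrain the networks' output functions.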