Theses & Reports
Instructions for submitting a technical report or thesis.
You can find technical reports published prior to 1990 archived here.
-
TR2018-990
2018
Title: Platform Migrator
Author(s): Contractor, Munir; Pradal, Christophe; Shasha, Dennis
Abstract:
Currently, one of the major problems in software development and maintenance, especially in academia, is managing packages across time and systems. An application developed under a particular package manager using a certain set of packages does not always work reliably when ported to a different system or when abandoned for a period of time and picked up again with newer versions of the packages. In this report, we present Platform Migrator, software that makes it easy to test applications across systems by identifying the various packages in the base system, figuring out their corresponding equivalents in the new system, and testing whether the software works as expected on the new system. Platform Migrator can migrate software written and set up inside a conda environment to any Linux-based system with conda or some other package manager. The philosophy of Platform Migrator is to identify a closure of the required dependencies for the software being migrated, using the conda environment metadata, and then use that closure to install the various dependencies on the target system. This documentation provides comprehensive details on how to use Platform Migrator and what it does internally to migrate software from one system to another. It also contains tutorials and case studies that can be replicated for better understanding of the process.
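To make the closure idea concrete, here is a minimal sketch (not the actual Platform Migrator code) of reading conda environment metadata and collecting the set of pinned dependencies; the helper names are ours, and PyYAML is assumed for parsing the exported metadata.

```python
# Minimal sketch of the dependency-closure idea; not Platform Migrator itself.
import subprocess
import yaml  # PyYAML, assumed available

def read_environment(env_name):
    """Export a conda environment's metadata as a parsed YAML dict."""
    out = subprocess.run(
        ["conda", "env", "export", "-n", env_name],
        capture_output=True, text=True, check=True,
    )
    return yaml.safe_load(out.stdout)

def dependency_closure(env):
    """Collect every pinned dependency (conda and pip) from the metadata."""
    closure = set()
    for dep in env.get("dependencies", []):
        if isinstance(dep, str):        # e.g. "numpy=1.14.0=py36h3dfced4_1"
            closure.add(dep.split("=")[0])
        elif isinstance(dep, dict):     # nested {"pip": [...]} section
            closure.update(p.split("==")[0] for p in dep.get("pip", []))
    return closure
```

On the target system, each package in the closure would then be looked up in the target package manager and installed, and the software's tests re-run to confirm the migration.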
-
Ph.D. Thesis
2018
Title: Deep Generative Models of Images and Video
Candidate: Denton, Emily Lynn
Advisor(s): Fergus, Rob
Abstract:
Deep neural networks have seen wide success in the supervised setting in recent years. Many of these successes rely heavily on large training sets of manually annotated data. Given the difficulty of obtaining enough labeled data to scale many deep learning approaches, it is increasingly important to look for better methods of utilizing large amounts of unlabeled data. Building generative models of images and video is a fundamental paradigm of learning from unlabeled data. Unsupervised criteria based on generating or reconstructing images drive many representation learning frameworks. Video is a particularly appealing domain for unsupervised learning due to the inherent temporal structure of the data. This structure lends itself to representation learning approaches based on extracting invariances and predicting future frames given the past.
Additionally, building accurate models of the world that facilitate future prediction can be useful for model based reinforcement learning, planning, and more generally, endowing an agent with the capacity to reason about its environment. Incorporating predictive models can potentially help alleviate the sample inefficiency of many reinforcement learning systems.
In this thesis, we review the challenges associated with generating images and videos. We then introduce a multi-scale image generation framework that demonstrates impressive performance on real-world image datasets. This method was the first to demonstrate empirically the potential of generative adversarial networks. We also address two challenging aspects of video generation: learning a latent space that affords easier prediction, and modeling the uncertainty in video sequences.
-
M.S. Thesis
2018
Title: Detecting Dead Weights and Units in Neural Networks
Candidate: Evci, Utku
Advisor(s): Fergus, Rob
Abstract:
Deep neural networks are highly over-parameterized, and their size can be reduced significantly after training without any decrease in performance. One can clearly see this phenomenon in a wide range of architectures trained for various problems. Weight/channel pruning, distillation, quantization, and matrix factorization are some of the main methods one can use to remove this redundancy and obtain smaller and faster models.
This work starts with a short informative chapter, where we motivate the pruning idea and provide the necessary notation. In the second chapter, we compare various saliency scores in the context of parameter pruning. Using the insights obtained from this comparison, and noting the problems that parameter pruning brings, we motivate why pruning units instead of individual parameters might be a better idea. We propose a set of definitions to quantify and analyze units that don't learn or create any useful information. We propose an efficient way of detecting dead units and use it to select which units to prune, obtaining a 5x reduction in model size through unit-wise pruning on MNIST.
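As a concrete illustration of the dead-unit idea, the following sketch flags units of a toy ReLU network whose activations are (near-)constant across a batch of inputs; the criterion here is a simple stand-in, not the thesis's exact definitions.

```python
# Illustrative stand-in criterion: a unit whose activation never varies
# across the data carries no information and is a pruning candidate.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
inputs = torch.randn(1024, 784)           # stand-in for MNIST images

with torch.no_grad():
    hidden = model[1](model[0](inputs))   # post-ReLU activations
    per_unit_std = hidden.std(dim=0)      # variation across the batch
    dead = per_unit_std < 1e-6            # units that never vary
print(f"{int(dead.sum())} of {hidden.shape[1]} units look dead")
```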
-
Ph.D. Thesis
2018
Title: Deep Networks for Forward Prediction and Planning
Candidate: Henaff, Mikael Bruce
Advisor(s): LeCun, Yann
Abstract:
Learning to predict how an environment will evolve and the consequences of one’s actions is an important ability for autonomous agents, and can enable planning with relatively few interactions with the environment which may be slow or costly. However, learning an accurate forward model is often difficult in practice due to several features often present in complex environments. First, many environments exhibit long-term dependencies which require the system to learn to record and maintain relevant information in its memory over long timescales. Second, the environment may only be partially observed, and the aspects of the environment which are observed may depend on parts of the environment which are hidden. Third, many observed processes contain some form of apparent or inherent stochasticity, which makes the task of predicting future states ill-defined. In this thesis, we propose approaches to tackle and better understand these different challenges associated with learning predictive models of the environment and using them for planning. We first provide an analysis of recurrent neural network (RNN) memory, which sheds light on the mechanisms by which RNNs are able to store different types of information in their memory over long timescales through the analysis of two synthetic benchmark tasks. We then introduce a new neural network architecture which keeps an estimate of the state of the environment in its memory, and can deal with partial observability by reasoning based on what is observed. We next present a new method for performing planning using a learned model of the environment with both discrete and continuous actions. Finally, we propose an approach for model-based planning in the presence of both environment uncertainty and model uncertainty, and evaluate it on a new real-world dataset and environment with applications to autonomous driving.
-
Ph.D. Thesis
2018
Title: Learning Representations of Text through Language and Discourse Modeling: From Characters to Sentences
Candidate: Jernite, Yacine
Advisor(s): Sontag, David
Abstract:
In this thesis, we consider the problem of obtaining a representation of the meaning expressed in a text. How to do so correctly remains a largely open problem, combining a number of inter-related questions (e.g., what is the role of context in interpreting text? How should language understanding models handle compositionality?). In this work, after reflecting on some of these questions and describing the most common sequence modeling paradigms in use in recent work, we focus on two in particular: at what level of granularity text should be read, and what training objectives can lead models to learn useful representations of a text's meaning.
In the first part, we argue for the use of sub-word information for that purpose, and present new neural network architectures which can either process words in a way that takes advantage of morphological information, or do away with word separations altogether while still being able to identify relevant units of meaning.
The second part starts by arguing for the use of language modeling as a learning objective, provides algorithms which can help with its scalability issues, and proposes a globally rather than locally normalized probability distribution. It then explores the question of what makes a good language learning objective, and introduces discriminative objectives inspired by the notion of discourse coherence which help learn a representation of the meaning of sentences.
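As one hedged illustration of a coherence-inspired discriminative objective, the sketch below builds binary training pairs in which a sentence is followed either by its true successor or by a randomly drawn distractor; the actual objectives in the thesis differ.

```python
# Illustrative data construction for a binary coherence objective.
import random

def coherence_pairs(sentences, seed=0):
    """Label 1: a sentence followed by its true successor.
    Label 0: the successor replaced by a randomly drawn sentence."""
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        pairs.append((sentences[i], sentences[i + 1], 1))
        distractor = rng.choice(sentences)
        if distractor != sentences[i + 1]:   # avoid accidental positives
            pairs.append((sentences[i], distractor, 0))
    return pairs
```

A sentence encoder trained to tell these pairs apart is pushed to represent the aspects of meaning that make adjacent sentences cohere.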
-
Ph.D. Thesis
2018
Title: Deep Learning for Information Extraction
Candidate: Nguyen, Thien Huu
Advisor(s): Grishman, Ralph
Abstract:
The explosion of data has made it crucial to analyze the data and distill important information effectively and efficiently. A significant part of such data is presented in unstructured, free-text documents. This has prompted the development of techniques for information extraction that allow computers to automatically extract structured information from natural free-text data. Information extraction is a branch of natural language processing in artificial intelligence with a wide range of applications, including question answering, knowledge base population, information retrieval, etc. The traditional approach to information extraction has mainly involved hand-designing large feature sets (feature engineering) for different information extraction problems, i.e., entity mention detection, relation extraction, coreference resolution, event extraction, and entity linking. This approach is limited by the laborious and expensive effort required for feature engineering across domains, and suffers from the unseen word/feature problem of natural languages.
This dissertation explores a different approach for information extraction that uses deep learning to automate the representation learning process and generate more effective features. Deep learning is a subfield of machine learning that uses multiple layers of connections to reveal the underlying representations of data. I develop the fundamental deep learning models for information extraction problems and demonstrate their benefits through systematic experiments.
First, I examine word embeddings, a general word representation that is produced by training a deep learning model on a large unlabelled dataset. I introduce methods to use word embeddings to obtain new features that generalize well across domains for relation extraction. This is done for both the feature-based method and the kernel-based method of relation extraction.
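A toy sketch of the underlying intuition, with hypothetical three-dimensional vectors in place of embeddings trained on a large unlabelled corpus: averaging the embeddings of context words yields features under which unseen but semantically similar words look alike to a relation classifier.

```python
# Toy illustration only; real systems load embeddings trained on large corpora.
import numpy as np

embedding = {"acquired": np.array([0.20, -0.10, 0.70]),
             "bought":   np.array([0.25, -0.05, 0.65])}  # hypothetical vectors

def context_feature(words_between_mentions):
    """Average the embeddings of the words between two entity mentions."""
    vecs = [embedding[w] for w in words_between_mentions if w in embedding]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

# "bought" lands near "acquired" in embedding space, so a relation
# classifier trained on one surface form can generalize to the other.
print(context_feature(["bought"]))
```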
Second, I investigate deep learning models for different problems, including entity mention detection, relation extraction and event detection. I develop new mechanisms and network architectures that allow deep learning to model the structures of information extraction problems more effectively. Extensive experiments are conducted in the domain adaptation and transfer learning settings to highlight the generalization advantage of the deep learning models for information extraction.
Finally, I investigate joint frameworks to simultaneously solve several information extraction problems and benefit from the inter-dependencies among these problems. I design a novel memory-augmented network for deep learning to properly exploit such inter-dependencies. I demonstrate the effectiveness of this network on two important problems of information extraction, i.e., event extraction and entity linking.
-
M.S. Thesis
2018
Title: Classifying the Quality of Movement via Motion Capture and Machine Learning
Candidate: Saxe, Ryan
Advisor(s): Shasha, Dennis
Abstract:
With the recent surge of machine vision technology and available video data, computational methods that utilize this data are becoming increasingly important. This thesis shows that, with the proper application of skeletal tracking, it is possible to discern whether or not a physical task (a squat) is performed well. The skeletal tracking software employed is provided by OptiTrack's motion capture client, Motive:Body. The data generated from OptiTrack was used to extract features related to the proper execution of a squat. This thesis uses a variety of machine learning techniques to evaluate the quality of physical performance. Support vector machines, random forests, and decision tree algorithms were tested with ten-fold cross validation and compared to a baseline of logistic regression, given the binary nature of the problem. While logistic regression performed at 66% accuracy, all three other algorithms performed substantially better, with decision trees performing best at 80%.
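The evaluation protocol is easy to reproduce in outline. The sketch below runs the same four classifiers with ten-fold cross validation on synthetic stand-in data (the motion-capture features from the thesis are not reproduced here):

```python
# Evaluation-protocol sketch on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=12, random_state=0)
models = {
    "logistic regression (baseline)": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "random forest": RandomForestClassifier(random_state=0),
    "decision tree": DecisionTreeClassifier(random_state=0),
}
for name, clf in models.items():
    scores = cross_val_score(clf, X, y, cv=10)  # ten-fold cross validation
    print(f"{name}: {scores.mean():.2f} accuracy")
```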
-
Ph.D. Thesis
2018
Title: Accelerating Approximate Simulation with Deep Learning
Candidate: Schlachter, Kristofer
Advisor(s): Perlin, Ken
Abstract:
Once a simulation resorts to an approximate numerical solution, one is faced with tradeoffs between accuracy and computation time. We propose that, for two chosen simulations, an alternative approximate solution can be learned that is just as useful but faster to compute. The two problems addressed in this thesis are fluid simulation and the simulation of diffuse inter-reflection in computer graphics.
Real-time simulation of fluid and smoke is a long-standing problem in computer graphics, where state-of-the-art approaches require large compute resources, often making real-time applications impractical. In this work, we propose a data-driven approach that leverages the approximation power of deep learning methods with the precision of standard fluid solvers to obtain both fast and highly realistic simulations. The proposed method solves the incompressible Euler equations following the standard operator splitting method, in which a large, often ill-conditioned linear system must be solved. We propose replacing this system by learning a Convolutional Network (ConvNet) from a training set of simulations using a semi-supervised learning method to minimize long-term velocity divergence.
ConvNets are amenable to efficient GPU implementations and, unlike exact iterative solvers, have fixed computational complexity and latency. The proposed hybrid approach restricts the learning task to a linear projection without modeling the well-understood advection and body forces. We present real-time 2D and 3D simulations of fluids and smoke; the obtained results are realistic and show good generalization properties to unseen geometry.
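A minimal sketch of the learned projection step, with an illustrative architecture and field sizes rather than those used in the thesis: a small ConvNet maps the velocity field to a pressure estimate, whose finite-difference gradient is then subtracted to push the field toward zero divergence.

```python
# Illustrative learned pressure-projection step; architecture is ours.
import torch
import torch.nn as nn

projection_net = nn.Sequential(
    nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),   # predicted pressure field
)

velocity = torch.randn(1, 2, 64, 64)   # (batch, u/v components, H, W)
with torch.no_grad():
    pressure = projection_net(velocity)
    # Subtract a finite-difference pressure gradient from each component.
    velocity[:, 0:1, :, 1:] -= pressure[..., 1:] - pressure[..., :-1]
    velocity[:, 1:2, 1:, :] -= pressure[..., 1:, :] - pressure[..., :-1, :]
```

Because the network is a fixed stack of convolutions, each projection costs the same regardless of how ill-conditioned the underlying linear system is.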
The next simulation that we address is the synthesis of images for training ConvNets. A challenge with training deep learning models is that they commonly require a large corpus of training data, and collecting sufficient real-world data may be unachievable. A solution to this problem can be found in the use of synthetic or simulated training data. However, for simulated photographs or renderings, there has not been a systematic approach to comparing the relative benefits of different techniques in image synthesis.
We compare multiple synthesis techniques to one another, as well as to the real data that they seek to replicate. We also introduce learned synthesis techniques that either train models better than the most realistic graphical methods used by standard rendering packages or else approach their fidelity using far less computation. We accomplish this by learning the shading of geometry as well as denoising the results of low-sample Monte Carlo image synthesis. Our major contributions are (i) a dataset that allows comparison of real and synthetic versions of the same scene, (ii) an augmented data representation that boosts the stability of learning, and (iii) three different partially differentiable rendering techniques where lighting, denoising and shading are learned. Finally, we are able to generate datasets that can outperform full global illumination rendering and approach the performance of training on real data.
-
TR2018-989
2018
Title: On the Solution of Elliptic Partial Differential Equations on Regions with Corners III: Curved Boundaries
Author(s): Serkh, Kirill
Abstract:
In this report we investigate the solution of boundary value problems for elliptic partial differential equations on domains with corners. Previously, we observed that when, in the case of polygonal domains, the boundary value problems are formulated as boundary integral equations of classical potential theory, the solutions are representable by series of certain elementary functions. Here, we extend this observation to the general case of regions with boundaries consisting of analytic curves meeting at corners. We show that the solutions near the corners have the same leading terms as in the polygonal case, plus a series of corrections involving products of the leading terms with integer powers and powers of logarithms. Furthermore, we show that if the curve in the vicinity of a corner approximates a polygon to order \(k\), then the correction added to the leading terms will vanish like \(O(t^k)\), where \(t\) is the distance from the corner.
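Schematically, the result stated above can be rendered as follows, with \(t\) the distance from the corner and \(t^{\mu_j}\) standing in for the leading terms of the polygonal case; this display is our illustrative paraphrase, not a formula from the report:

```latex
% Illustrative paraphrase only; coefficients and index ranges are schematic.
u(t) \;\approx\; \sum_{j} c_j\, t^{\mu_j}
  \;+\; \underbrace{\sum_{j,m,n} d_{j,m,n}\, t^{\mu_j + m}\,(\log t)^{n}}_{\text{curvature corrections}},
\qquad \text{corrections} = O(t^{k}).
```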
-
TR2018-991
2018
Title: Robotic Room Traversal using Optical Range Finding
Author(s): Smith, Cole; Lin, Eric; Shasha, Dennis
Abstract:
Consider the goal of visiting every part of a room that is not blocked by obstacles. Doing so efficiently requires both sensors and planning. Our findings suggest a method of inexpensive optical range finding for robotic room traversal. Our room traversal algorithm relies upon the approximate distance from the robot to the nearest obstacle in 360 degrees; we then choose the path with the greatest approximate distance. Since millimeter precision is not required for our problem, we have opted to develop our own laser range-finding solution in lieu of more common but expensive solutions such as light detection and ranging (LIDAR). Our solution uses a laser that casts a visible dot on the target and a common camera (an iPhone, for example). Based upon where in the camera frame the laser dot is detected, we may calculate the angle between the target and the laser aperture. Using this angle and the known distance between the camera eye and the laser aperture, we may solve all sides of a trigonometric model, which provides the distance between the robot and the target.
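The trigonometric model admits a compact sketch, assuming the laser is mounted parallel to the camera's optical axis at a known baseline; the calibration constants below are hypothetical placeholders, not values from the report.

```python
# Laser-triangulation sketch; constants are hypothetical placeholders.
import math

BASELINE_M = 0.06           # camera eye to laser aperture, in meters
RADIANS_PER_PIXEL = 0.0015  # from camera calibration
ANGLE_OFFSET = 0.0005       # residual alignment error, also calibrated

def distance_to_target(dot_pixel_offset):
    """Distance to the target given the laser dot's offset (in pixels)
    from the image center along the baseline direction."""
    theta = dot_pixel_offset * RADIANS_PER_PIXEL + ANGLE_OFFSET
    return BASELINE_M / math.tan(theta)

print(f"{distance_to_target(40):.2f} m")  # e.g., dot 40 px from center
```

The nearer the target, the farther the dot drifts from the image center, so the pixel offset alone fixes the range once the baseline is known.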
-
Ph.D. Thesis
2018
Title: Elements of Intelligence: Memory, Communication and Intrinsic Motivation
Candidate: Sukhbaatar, Sainbayar
Advisor(s): Fergus, Rob
Abstract:
Building an intelligent agent that can learn and adapt to its environment has always been a challenging task. This is because intelligence consists of many different elements such as recognition, memory, and planning. In recent years, deep learning has shown impressive results in recognition tasks. The aim of this thesis is to advance the deep learning techniques to other elements of intelligence.
We start our investigation with memory, an integral part of intelligence that bridges past experience with current decision making. In particular, we focus on episodic memory, which is responsible for storing our past experiences and recalling them. An agent without such memory will struggle at many tasks, such as holding a coherent conversation. We show that a neural network with an external memory is better at such tasks, outperforming traditional recurrent networks with an internal memory.
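As one minimal sketch of reading from an external memory, in the spirit of soft attention over stored experiences (sizes are illustrative, and this is not the thesis's architecture):

```python
# Soft, differentiable read from an external memory.
import torch
import torch.nn.functional as F

d, n_slots = 16, 10
memory = torch.randn(n_slots, d)   # embeddings of stored past experiences
query = torch.randn(d)             # embedding of the current state/question

weights = F.softmax(memory @ query, dim=0)   # relevance of each slot
readout = weights @ memory                   # weighted recall of the past
print(readout.shape)                         # torch.Size([16])
```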
Another crucial ingredient of intelligence is the capability to communicate with others. In particular, communication is essential for cooperative tasks, enabling agents to better collaborate and improve their division of labor. We investigate whether agents can learn to communicate from scratch without any external supervision. Our finding is that communication through a continuous vector facilitates faster learning by allowing gradients to flow between agents.
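The following sketch shows why a continuous communication channel lets gradients flow between agents, using a simple mean over the other agents' hidden states; dimensions and layer choices are our illustration, not the thesis's model.

```python
# Differentiable inter-agent communication via a continuous vector.
import torch
import torch.nn as nn

n_agents, obs_dim, hid_dim = 3, 8, 16
encoder = nn.Linear(obs_dim, hid_dim)
mixer = nn.Linear(2 * hid_dim, hid_dim)

obs = torch.randn(n_agents, obs_dim)
h = torch.tanh(encoder(obs))                               # per-agent state
comm = (h.sum(dim=0, keepdim=True) - h) / (n_agents - 1)   # mean of the others
h = torch.tanh(mixer(torch.cat([h, comm], dim=1)))
# comm is an ordinary differentiable tensor, so a loss on one agent's
# output backpropagates into every other agent's encoder weights.
```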
Lastly, an intelligent agent must have an intrinsic motivation to learn about its environment on its own, without any external supervision or rewards. Our investigation led to one such learning strategy, in which an agent plays a two-role game with itself. The first role proposes a task, and the second role tries to execute it. Since each role's goal is to make the other fail, their adversarial interplay pushes them to explore increasingly complex tasks, which results in a better understanding of the environment.
-
Ph.D. Thesis
2018
Title: Rethinking Customer Segmentation and Demand Learning in the Presence of Sparse, Diverse, and Large-scale Data
Candidate: Venkataraman, Ashwin
Advisor(s): Jagabathula, Srikanth; Subramanian, Lakshminarayanan
Abstract:
Firms are now able to collect unprecedented amounts of data. This wealth of data provides new opportunities and capabilities for the firm to better solve classical problems within operational and marketing contexts, such as customer segmentation and demand learning. At the same time, the data imposes new challenges. In addition to its large-scale nature, which creates computational issues, the data comes from a diversity of sources, varying in their respective measurement scales (e.g., clicks, ratings, purchase signals, etc.), and is typically sparse, containing a large fraction of missing observations. The diversity in the data makes it hard to directly compare different observations (clicks vs. purchases, for instance), and the severe sparsity precludes any meaningful imputation of unobserved entries. The data also comes from unreliable sources, which introduce both unintentional and deliberate errors, and the identities of such sources are very often unknown, which makes it difficult to determine which sources to trust.
These data challenges require a rethink of traditional techniques for customer segmentation and demand learning. Given their importance and widespread use, this dissertation revisits the classical problems of customer segmentation and demand learning but in the presence of sparse, diverse, and large-scale data. The key contribution of the dissertation is a suite of novel methodologies to deal with the challenges described above.
Part I of the dissertation focuses on the problem of customer segmentation. In Chapter 1, we consider the problem of segmenting (or clustering) a large population of customers based on their preferences, when the preference signals (e.g., clicks, ratings, etc.) come from a multitude of diverse data sources and each customer provides only a few observations. These data characteristics preclude the applicability of traditional marketing techniques as well as standard clustering approaches in machine learning. We propose a model-based embedding technique which takes the customer observations and a probabilistic model class generating the observations as inputs, and outputs an embedding—a low-dimensional vector representation in Euclidean space—for each customer. We then cluster the embeddings to obtain the segments. We show that our segmentation technique can be used to generate highly accurate personalized recommendations in two real-world case studies, including up to 8% improvement over the existing approach on an eBay dataset consisting of millions of customers and items. In addition, it outperforms (both in speed and accuracy) standard techniques in marketing and machine learning.
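The final segmentation step can be sketched compactly: given per-customer embeddings (replaced by random vectors below, since the model-based embedding construction is the chapter's contribution), standard clustering yields the segments.

```python
# Clustering step only; the embeddings here are random stand-ins.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(10_000, 20))  # one low-dim vector per customer
segments = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(embeddings)
print(np.bincount(segments))                # customers per segment
```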
In Chapter 2, we turn our attention to the domain of crowdsourced labeling, which provides a low-cost, easy and scalable way to collect labels from the crowd—composed of "workers"—which are then aggregated and used as inputs for training machine learning applications. The main challenge is that workers are often unreliable, and therefore can introduce unintentional or even intentional errors into the labels. The reliabilities of the workers are a priori unknown, so correctly aggregating the labels becomes difficult. We propose algorithms to separate the worker population into two segments, what we call "honest" and "adversarial" workers. Honest workers can provide incorrect labels, but their errors are probabilistic and therefore, can be corrected. Adversarial workers, on the other hand, adopt arbitrary labeling strategies (whether deterministic or probabilistic) and therefore, their labels cannot be trusted. We demonstrate that discarding the labels provided by even a few adversarial workers can significantly improve the accuracy of several existing approaches for aggregating the labels in real-world crowdsourcing datasets.
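As a deliberately simplified stand-in for the segmentation idea (not the thesis's algorithm), one can score each worker by agreement with the per-item majority label, treat the lowest-agreement workers as suspect, and re-aggregate without them:

```python
# Simplified honest/adversarial split via majority agreement; toy data.
from collections import Counter, defaultdict

labels = {("w1", "a"): 1, ("w2", "a"): 1, ("w3", "a"): 0,
          ("w1", "b"): 0, ("w2", "b"): 0, ("w3", "b"): 1}

def majority(item, exclude=()):
    votes = [l for (w, i), l in labels.items() if i == item and w not in exclude]
    return Counter(votes).most_common(1)[0][0]

agreement = defaultdict(list)
for (worker, item), label in labels.items():
    agreement[worker].append(label == majority(item))
scores = {w: sum(v) / len(v) for w, v in agreement.items()}
suspects = {w for w, s in scores.items() if s < 0.5}  # threshold is arbitrary
print(suspects, {i: majority(i, exclude=suspects) for i in ("a", "b")})
```

The thesis's point is sharper than this heuristic: truly adversarial workers can follow arbitrary strategies, so identifying and discarding them, rather than down-weighting everyone uniformly, is what improves aggregation.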
Part II is devoted to demand learning. In Chapter 3, we consider the problem of learning customer demand for a set of substitutable products. Within operations, the customer demand is typically modeled using a mixture of logit models, which can capture heterogeneity as well as rich substitution patterns in customer preferences. The mixture model is fit to historical sales transactions and inventory data, and the fitted model is used to inform pricing and assortment decisions. We propose a novel nonparametric estimator for the mixture of logit models, providing the ability to make effective use of the large amounts of transaction data that firms have access to. By contrast, most existing techniques impose parametric assumptions—usually driven by tractability considerations—on the mixing distribution, and consequently can suffer from model misspecification issues. We show that our estimator is able to recover good approximations of different ground-truth mixing distributions—despite having no knowledge of their underlying structure—and outperforms the standard expectation-maximization (EM) benchmark in predictive and decision accuracies, while being an order of magnitude faster.
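For reference, the demand model in question can be written in a few lines: each customer type has its own logit taste vector, and market-level choice probabilities mix over the types. The numbers below are hypothetical.

```python
# Mixture-of-logits choice probabilities; all values are hypothetical.
import numpy as np

def logit_probs(beta, X):
    """Choice probabilities for one logit customer type;
    X holds one row of features per product."""
    u = X @ beta
    e = np.exp(u - u.max())   # numerically stable softmax over products
    return e / e.sum()

X = np.array([[1.0, 0.5], [0.8, 1.2], [0.3, 0.9]])      # product features
betas = [np.array([1.0, -0.5]), np.array([-0.2, 1.0])]  # two customer types
weights = np.array([0.6, 0.4])                          # mixing distribution
mixture = sum(w * logit_probs(b, X) for w, b in zip(weights, betas))
print(mixture)   # market-level choice probabilities, sums to 1
```

The estimator proposed in the chapter learns the mixing distribution (the weights and taste vectors above) nonparametrically from sales and inventory data, rather than assuming its parametric form.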