Title: Learning Representations of Text through Language and Discourse Modeling: From Characters to Sentences
Candidate: Jernite, Yacine
Advisor(s): Sontag, David
In this thesis, we consider the problem of obtaining a representation of the meaning expressed in a text. How to do so correctly remains a largely open problem involving a number of interrelated questions (e.g., what is the role of context in interpreting text? How should language understanding models handle compositionality?). In this work, after reflecting on some of these questions and describing the most common sequence modeling paradigms used in recent work, we focus on two of them specifically: at what level of granularity text should be read, and which training objectives can lead models to learn useful representations of a text's meaning.
In the first part, we argue for the use of sub-word information for that purpose and present new neural network architectures that can either process words in a way that takes advantage of morphological information, or do away with word separations altogether while still identifying relevant units of meaning.
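One common way to let a model exploit morphology is to break each word into character n-grams, so that words sharing a stem or affix share sub-word units. The sketch below assumes a fastText-style scheme with `<` and `>` as word-boundary markers; the function `char_ngrams` and the n-gram range are illustrative choices, not the architectures proposed in this thesis.

```python
def char_ngrams(word, n_min=3, n_max=4):
    """Return the character n-grams of `word`, with '<' and '>' marking its boundaries."""
    padded = "<" + word + ">"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams

# Two inflections of the same stem share several sub-word units,
# so a model built on these units can relate them even if one is rare.
a = set(char_ngrams("playing"))
b = set(char_ngrams("played"))
print(sorted(a & b))
```

The shared units (`<pl`, `<pla`, `lay`, `pla`, `play`) come from the common stem, while the suffixes `ing` and `ed` yield distinct units; a word-level model would instead treat the two words as unrelated symbols.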
The second part starts by arguing for the use of language modeling as a learning objective, provides algorithms that help address its scalability issues, and proposes a globally rather than locally normalized probability distribution. It then explores what makes a good language learning objective, and introduces discriminative objectives inspired by the notion of discourse coherence that help learn representations of the meaning of sentences.
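The difference between local and global normalization can be seen on a toy example. Assuming a two-token vocabulary and sequences of length two, a locally normalized model applies a softmax at every step and multiplies the results, while a globally normalized model exponentiates the score of the whole sequence and divides by one partition function over all sequences. The score table `step_score` holds arbitrary illustrative values, not a trained model.

```python
import itertools
import math

VOCAB = ["a", "b"]
LENGTH = 2

# Hypothetical unnormalized scores step_score[prev][tok];
# prev is None at the first position.
step_score = {
    None: {"a": 1.0, "b": 0.0},
    "a": {"a": 0.5, "b": 1.5},
    "b": {"a": 2.0, "b": 0.0},
}

def local_prob(seq):
    """Product of per-step softmax probabilities (locally normalized)."""
    prob, prev = 1.0, None
    for tok in seq:
        z = sum(math.exp(step_score[prev][v]) for v in VOCAB)
        prob *= math.exp(step_score[prev][tok]) / z
        prev = tok
    return prob

def seq_score(seq):
    """Sum of the step scores along a whole sequence."""
    total, prev = 0.0, None
    for tok in seq:
        total += step_score[prev][tok]
        prev = tok
    return total

def global_prob(seq):
    """exp(score) over a single partition function (globally normalized)."""
    z = sum(math.exp(seq_score(s)) for s in itertools.product(VOCAB, repeat=LENGTH))
    return math.exp(seq_score(seq)) / z

for seq in itertools.product(VOCAB, repeat=LENGTH):
    print("".join(seq), round(local_prob(seq), 4), round(global_prob(seq), 4))
```

Both models define valid distributions from the same scores, yet they assign different probabilities. Note that the global partition function sums over |V|^T sequences, which is why scaling such models to realistic vocabularies and lengths calls for dedicated algorithms.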
Title: Deep Learning for Information Extraction
Candidate: Nguyen, Thien Huu
Advisor(s): Grishman, Ralph
The explosion of data has made it crucial to analyze data and distill important information from it effectively and efficiently. A significant part of such data is presented in unstructured, free-text documents. This has prompted the development of information extraction techniques that allow computers to automatically extract structured information from natural free text. Information extraction is a branch of natural language processing in artificial intelligence with a wide range of applications, including question answering, knowledge base population, and information retrieval. The traditional approach to information extraction has mainly involved hand-designing large feature sets (feature engineering) for different information extraction problems, such as entity mention detection, relation extraction, coreference resolution, event extraction, and entity linking. This approach is limited by the laborious and expensive feature engineering required for each domain, and suffers from the unseen word/feature problem of natural languages.
This dissertation explores a different approach to information extraction that uses deep learning to automate the representation learning process and generate more effective features. Deep learning is a subfield of machine learning that uses multiple layers of connections to reveal the underlying representations of data. I develop fundamental deep learning models for information extraction problems and demonstrate their benefits through systematic experiments.
First, I examine word embeddings, a general word representation that is produced by training a deep learning model on a large unlabelled dataset. I introduce methods to use word embeddings to obtain new features that generalize well across domains for relation extraction. This is done for both the feature-based method and the kernel-based method of relation extraction.
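As a minimal sketch of how embeddings can yield features that generalize across domains, assume each entity mention is represented by the embedding of its head word; the feature vector for a candidate relation can then concatenate the two head-word embeddings and append their cosine similarity. The toy 3-dimensional vectors in `EMB` and the helper `relation_features` are hypothetical, not the feature sets used in the dissertation.

```python
import math

# Hypothetical toy embeddings; real ones would come from a model
# trained on a large unlabelled corpus.
EMB = {
    "company": [0.9, 0.1, 0.0],
    "firm":    [0.85, 0.15, 0.05],
    "ceo":     [0.1, 0.9, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def relation_features(head1, head2):
    """Concatenate the two head-word embeddings and append their cosine similarity."""
    e1, e2 = EMB[head1], EMB[head2]
    return e1 + e2 + [cosine(e1, e2)]

feats = relation_features("ceo", "company")
print(len(feats), feats)
```

Because `firm` and `company` have nearby vectors, a classifier trained on mentions headed by `company` receives similar inputs for `firm`, which is the kind of generalization to unseen words that purely lexical hand-built features lack.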
Second, I investigate deep learning models for different problems, including entity mention detection, relation extraction, and event detection. I develop new mechanisms and network architectures that allow deep learning to model the structures of information extraction problems more effectively. Extensive experiments in domain adaptation and transfer learning settings highlight the generalization advantage of deep learning models for information extraction.
Finally, I investigate joint frameworks that simultaneously solve several information extraction problems and benefit from the inter-dependencies among them. I design a novel memory-augmented network that allows deep learning to properly exploit such inter-dependencies. I demonstrate the effectiveness of this network on two important information extraction problems, namely event extraction and entity linking.
Title: Accelerating Approximate Simulation with Deep Learning
Candidate: Schlachter, Kristofer
Advisor(s): Perlin, Ken
Once a simulation resorts to an approximate numerical solution, one faces various tradeoffs between accuracy and computation time. We propose that, for two chosen simulations, another approximate solution can be learned that is just as useful but faster to compute. The two problems addressed in this thesis are fluid simulation and the simulation of diffuse inter-reflection in computer graphics.
Real-time simulation of fluid and smoke is a long-standing problem in computer graphics, where state-of-the-art approaches require large compute resources, often making real-time applications impractical. In this work, we propose a data-driven approach that combines the approximation power of deep learning methods with the precision of standard fluid solvers to obtain simulations that are both fast and highly realistic. The proposed method solves the incompressible Euler equations following the standard operator splitting method, in which a large, often ill-conditioned linear system must be solved. We propose replacing the solution of this system with a Convolutional Network (ConvNet), learned from a training set of simulations using a semi-supervised learning method that minimizes long-term velocity divergence.
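The linear system in question is the pressure Poisson equation of the projection step, which makes the velocity field divergence-free. The sketch below shows the kind of classical iterative solve that the learned ConvNet stands in for, assuming a 2D staggered (MAC) grid with unit spacing, unit time step and density, solid walls on all sides, and Gauss-Seidel relaxation; none of these choices are taken from the thesis.

```python
import random

def divergence(u, v, n):
    """Cell-centred divergence on an n x n MAC grid with unit spacing.

    u has (n + 1) x n x-faces, v has n x (n + 1) y-faces.
    """
    return [[u[i + 1][j] - u[i][j] + v[i][j + 1] - v[i][j] for j in range(n)]
            for i in range(n)]

def project(u, v, n, iters=300):
    """Make (u, v) approximately divergence-free in place.

    Solves sum_nb(p_nb - p_ij) = div_ij by Gauss-Seidel, skipping neighbours
    outside the domain (zero normal velocity at the walls), then subtracts
    the pressure gradient on interior faces.
    """
    div = divergence(u, v, n)
    p = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                total, count = 0.0, 0
                for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if 0 <= a < n and 0 <= b < n:
                        total += p[a][b]
                        count += 1
                p[i][j] = (total - div[i][j]) / count
    for i in range(1, n):          # interior x-faces; wall faces stay at 0
        for j in range(n):
            u[i][j] -= p[i][j] - p[i - 1][j]
    for i in range(n):             # interior y-faces
        for j in range(1, n):
            v[i][j] -= p[i][j] - p[i][j - 1]
    return p

n = 8
rng = random.Random(0)
u = [[0.0] * n for _ in range(n + 1)]
v = [[0.0] * (n + 1) for _ in range(n)]
for i in range(1, n):
    for j in range(n):
        u[i][j] = rng.uniform(-1, 1)
for i in range(n):
    for j in range(1, n):
        v[i][j] = rng.uniform(-1, 1)

before = max(abs(d) for row in divergence(u, v, n) for d in row)
project(u, v, n)
after = max(abs(d) for row in divergence(u, v, n) for d in row)
print(before, after)
```

Each sweep costs O(n²) work, but the number of sweeps needed to reach a given tolerance grows with grid size and conditioning; a network of fixed depth instead has a cost that does not depend on how hard a particular system is to solve.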
ConvNets are amenable to efficient GPU implementations and, unlike exact iterative solvers, have fixed computational complexity and latency. The proposed hybrid approach restricts the learning task to the linear projection step, without modeling the well-understood advection and body forces. We present real-time 2D and 3D simulations of fluids and smoke; the results are realistic and generalize well to unseen geometry.
The next simulation that we address is the synthesis of images for training ConvNets. A challenge in training deep learning models is that they commonly require a large corpus of training data, and collecting sufficient real-world data may be infeasible. One solution to this problem is to use synthetic or simulated training data. However, for simulated photographs or renderings, there has been no systematic approach to comparing the relative benefits of different image synthesis techniques.
We compare multiple synthesis techniques to one another as well as to the real data they seek to replicate. We also introduce learned synthesis techniques that either train models better than the most realistic graphical methods used by standard rendering packages or approach their fidelity using far less computation. We accomplish this by learning the shading of geometry as well as denoising the results of low-sample Monte Carlo image synthesis. Our major contributions are (i) a dataset that allows comparison of real and synthetic versions of the same scene, (ii) an augmented data representation that improves the stability of learning, and (iii) three different partially differentiable rendering techniques in which lighting, denoising, and shading are learned. Finally, we are able to generate training datasets that outperform full global illumination rendering and approach the performance of training on real data.
Title: Elements of Intelligence: Memory, Communication and Intrinsic Motivation
Candidate: Sukhbaatar, Sainbayar
Advisor(s): Fergus, Rob
Building an intelligent agent that can learn and adapt to its environment has always been a challenging task, because intelligence consists of many different elements such as recognition, memory, and planning. In recent years, deep learning has shown impressive results in recognition tasks. The aim of this thesis is to extend deep learning techniques to other elements of intelligence.
We start our investigation with memory, an integral part of intelligence that bridges past experience with current decision making. In particular, we focus on episodic memory, which is responsible for storing our past experiences and recalling them. An agent without such memory will struggle at many tasks, such as holding a coherent conversation. We show that a neural network with an external memory is better at such tasks, outperforming traditional recurrent networks with an internal memory.
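A common mechanism for reading from an external memory, assumed here for illustration, is soft attention: the query is compared with every stored memory vector, the scores are normalized with a softmax, and the output is the weighted sum of the memories. The sketch assumes a single read with the same vectors serving as keys and values; the memory contents are made up.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def memory_read(query, memory):
    """One soft-attention read: w_i = softmax(query . m_i), output = sum_i w_i * m_i."""
    scores = [sum(q * x for q, x in zip(query, m)) for m in memory]
    weights = softmax(scores)
    output = [sum(w * m[k] for w, m in zip(weights, memory)) for k in range(len(query))]
    return weights, output

# Hypothetical stored items (e.g. embeddings of earlier sentences in a dialogue).
memory = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
weights, out = memory_read([2.0, 0.0], memory)
print(weights, out)
```

Every operation is differentiable, so the read can be trained end to end with backpropagation, and the memory can grow with the number of stored items without changing the size of a recurrent hidden state.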
Another crucial ingredient of intelligence is the capability to communicate with others. In particular, communication is essential for cooperative tasks, enabling agents to better collaborate and improve their division of labor. We investigate whether agents can learn to communicate from scratch without any external supervision. Our finding is that communication through a continuous vector facilitates faster learning by allowing gradients to flow between agents.
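One way to realize communication through a continuous vector, assumed here for illustration, is for each agent to receive the mean of the other agents' hidden states and mix it into its own update. Because the message is a differentiable function of the other agents' states, gradients of one agent's loss can flow back into the others. Scalar hidden states and the weights `w_self` and `w_comm` are hypothetical simplifications.

```python
import math

def comm_step(hidden, w_self=0.8, w_comm=0.5):
    """One communication round with scalar hidden states (requires >= 2 agents).

    Agent j receives c_j = mean of the other agents' states and updates
    h_j <- tanh(w_self * h_j + w_comm * c_j).
    """
    n = len(hidden)
    total = sum(hidden)
    new = []
    for h in hidden:
        c = (total - h) / (n - 1)  # mean over the *other* agents
        new.append(math.tanh(w_self * h + w_comm * c))
    return new

print(comm_step([1.0, 0.0, 0.0]))
print(comm_step([1.0, 0.0, 0.0], w_comm=0.0))
```

With `w_comm` set to zero the channel is cut: agent 1's output no longer depends on agent 0's state, so no gradient could pass between them. With a nonzero weight, changing agent 0's state changes agent 1's output smoothly, which is what lets learning signals cross agent boundaries.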
Lastly, an intelligent agent must have an intrinsic motivation to learn about its environment on its own, without any external supervision or rewards. Our investigation led to one such learning strategy, in which an agent plays a two-role game with itself. The first role proposes a task, and the second role tries to execute it. Since their goal is to make the other fail, their adversarial interplay pushes them to explore increasingly complex tasks, which results in a better understanding of the environment.
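One reward scheme consistent with this description, assumed here rather than taken from the thesis, pays the executor for finishing quickly and pays the proposer only when the executor needs more steps than the proposer took to set up the task. The function name `self_play_rewards` and the `scale` constant are hypothetical.

```python
def self_play_rewards(t_propose, t_execute, scale=0.01):
    """Hypothetical rewards for one propose/execute self-play episode.

    t_propose: steps the proposing role took to set the task.
    t_execute: steps the executing role took to complete it.
    """
    # The executor is penalized for every step it needs.
    reward_execute = -scale * t_execute
    # The proposer gains only if the executor is slower than the proposer was,
    # favouring tasks just beyond the executor's current ability.
    reward_propose = scale * max(0, t_execute - t_propose)
    return reward_propose, reward_execute

print(self_play_rewards(5, 20))   # hard task for the executor: proposer rewarded
print(self_play_rewards(10, 4))   # easy task: proposer gets nothing
```

Under this scheme, tasks that are too easy earn the proposer nothing and tasks the executor masters stop paying off, so the proposer is pushed toward progressively harder tasks, producing the curriculum described above.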
Title: Predictive Analytics from Noisy, Diverse, Incomplete and Heterogeneous Data
Candidate: Venkataraman, Ashwin
Advisor(s): Jagabathula, Srikanth; Subramanian, Lakshminarayanan
The unique characteristics of big data complicate the extraction of critical information needed for better decision-making. For instance, data comes in diverse types (e.g., clicks, ratings, purchase signals) that vary in their levels of noise and in their measurement scales. The data also originates from heterogeneous sources and may be incomplete, with many missing observations.
The above data characteristics require us to revisit and rethink many traditional problems because existing techniques fall short. This dissertation revisits two classical problems: (1) segmenting (or clustering) heterogeneous data sources, and (2) learning mixture models. It provides novel, principled, and scalable solutions to deal with the challenges described above.
The first part considers the problem of segmenting customers based on their preferences when the preference signals (e.g., clicks, ratings) come from a multitude of "big" and "complicated" data sources. We propose a model-based embedding technique that takes as inputs the customer preference signals and a probabilistic model class generating them, and outputs an embedding (a low-dimensional representation in Euclidean space) for each customer. We then cluster the embeddings to obtain the segments. We derive precise necessary and sufficient conditions that guarantee asymptotic recovery of the true segments. Using two case studies, including a real-world implementation on eBay data, we show that our method outperforms standard latent class, empirical Bayesian, and demographic-based techniques.
We also apply similar ideas to propose algorithms for clustering workers based on their "quality" or "reputation" in crowdsourced labeling tasks, where the noisy collected labels are aggregated to infer the unknown true labels. Theoretically, we show that our algorithms successfully identify unreliable workers, workers adopting deterministic strategies, and worst-case sophisticated adversaries who can adopt arbitrary labeling strategies to degrade the accuracy of the inferred task labels. Empirically, we show that filtering out low-quality workers identified by our algorithms can significantly improve the accuracy of several state-of-the-art label aggregation algorithms on real-world crowdsourcing datasets.
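To make the filtering step concrete, the sketch below uses a much simpler heuristic than the algorithms described above, assumed here purely for illustration: each worker is scored by how often their labels agree with the majority vote of the *other* workers on the same task, and workers below a threshold are dropped before aggregation. The toy answers and the 0.5 threshold are made up.

```python
from collections import Counter

def majority(labels):
    """Most common label; ties are broken toward the smallest label."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)

def agreement_scores(answers):
    """answers: dict worker -> dict task -> label.

    Score = fraction of a worker's labels matching the majority vote of the
    other workers on the same task (tasks nobody else labeled are skipped).
    """
    scores = {}
    for w, labels in answers.items():
        hits = total = 0
        for task, label in labels.items():
            others = [a[task] for o, a in answers.items() if o != w and task in a]
            if not others:
                continue
            total += 1
            hits += label == majority(others)
        scores[w] = hits / total if total else 0.0
    return scores

# Hypothetical data: A, B, C label consistently; D flips every label.
answers = {
    "A": {"t1": 1, "t2": 0, "t3": 1, "t4": 1},
    "B": {"t1": 1, "t2": 0, "t3": 1, "t4": 1},
    "C": {"t1": 1, "t2": 0, "t3": 1, "t4": 1},
    "D": {"t1": 0, "t2": 1, "t3": 0, "t4": 0},
}
scores = agreement_scores(answers)
keep = sorted(w for w, s in scores.items() if s >= 0.5)
print(scores, keep)
```

A heuristic like this breaks down when adversaries are numerous or coordinated, since the majority itself becomes unreliable; handling such worst-case behavior is what the theoretical guarantees above address.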
The second part considers the problem of learning mixture models and proposes a new methodology for nonparametric estimation of the mixing distribution. Our main contributions are twofold: (a) formulating the likelihood-based estimation problem as a constrained convex program, and (b) applying the conditional gradient (also known as Frank-Wolfe) algorithm to solve this convex program. We show that our method iteratively generates the support of the mixing distribution, so that the algorithm can be terminated once the desired number of mixture components is reached. For mixtures of logit models, we establish that our method converges to the optimal solution of the convex program at a sublinear rate. We also characterize the structure of the estimated mixing distribution and show that it closely resembles the notion of consideration sets in the existing literature. We test our approach in two case studies on real data and show that it significantly outperforms the standard expectation-maximization (EM) benchmark in terms of in-sample fit, predictive accuracy, and decision accuracy, while being an order of magnitude faster.
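The convex formulation and the conditional gradient iterations can be illustrated on a deliberately tiny instance, which is a sketch under stated assumptions and not the dissertation's implementation: each customer contributes k successes in m binary choices, each mixture component is a binary logit with a single intercept θ, the candidate components are restricted to a fixed grid, and the step size is chosen by line search. The optimization variable is the vector of per-customer mixture likelihoods L, constrained to the convex hull of the component likelihood vectors, and the objective is the negative log-likelihood.

```python
import math
import random

def bernoulli_lik(theta, k, m):
    """Likelihood (up to a constant) of k successes in m trials under logit parameter theta."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return p ** k * (1.0 - p) ** (m - k)

def neg_loglik(mix):
    return -sum(math.log(x) for x in mix)

def frank_wolfe_mixture(counts, m, grid, iters=30):
    """Minimize -sum_i log L_i over L in the convex hull of {f(theta): theta in grid}."""
    f = {t: [bernoulli_lik(t, k, m) for k in counts] for t in grid}
    start = min(grid, key=lambda t: neg_loglik(f[t]))  # best single component
    weights = {start: 1.0}
    mix = list(f[start])
    history = [-neg_loglik(mix)]
    for _ in range(iters):
        # Linear minimization: the gradient is -1/L_i, so pick the component
        # maximizing sum_i f_i(theta) / L_i. At most one new support point per step.
        best = max(grid, key=lambda t: sum(fi / li for fi, li in zip(f[t], mix)))
        direction = f[best]

        def phi(a):
            return neg_loglik([(1 - a) * li + a * fi for li, fi in zip(mix, direction)])

        lo, hi = 0.0, 1.0
        for _ in range(60):  # ternary search: phi is convex in a
            a1, a2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
            if phi(a1) <= phi(a2):
                hi = a2
            else:
                lo = a1
        alpha = (lo + hi) / 2
        if phi(alpha) > phi(0.0):  # never accept a worse point
            alpha = 0.0
        weights = {t: (1 - alpha) * w for t, w in weights.items()}
        weights[best] = weights.get(best, 0.0) + alpha
        mix = [(1 - alpha) * li + alpha * fi for li, fi in zip(mix, direction)]
        history.append(-neg_loglik(mix))
    return weights, history

# Synthetic data from two customer types (success probability 0.2 or 0.8).
rng = random.Random(1)
m = 10
counts = [sum(rng.random() < (0.2 if i % 2 == 0 else 0.8) for _ in range(m))
          for i in range(200)]
grid = [x / 4 for x in range(-12, 13)]
weights, history = frank_wolfe_mixture(counts, m, grid)
print({t: round(w, 3) for t, w in weights.items() if w > 1e-3})
```

Because each iteration adds at most one grid point to the support, stopping after K iterations bounds the number of components by K + 1, which mirrors the early-termination property described above.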