Colloquium Details

Compositional Models for Information Extraction

Speaker: Mark Dredze, Johns Hopkins University

Location: 60 Fifth Avenue 150

Date: February 27, 2017, 2 p.m.

Host: Subhash Khot


Advances in machine learning have led to new neural models for learning effective representations directly from data. Yet for many tasks, years of research have created hand-engineered features that yield state-of-the-art performance. This is the case in relation extraction, a task in the field of information extraction in which a system consumes natural language and produces a structured, machine-readable representation of relationships between entities. Relation extraction systems are the backbone of many end-user applications, including question answering, web search and clinical text analysis.

I will present feature-rich compositional models that combine hand-engineered features with learned text representations to achieve new state-of-the-art results for relation extraction. These models are widely applicable to problems within natural language processing and beyond. Additionally, I will survey how these models fit into my broader research program by highlighting work by my group on developing new machine learning methods for extracting public health information from clinical and social media text.

Speaker Bio:

Mark Dredze is an Assistant Research Professor in Computer Science at Johns Hopkins University and a research scientist at the Human Language Technology Center of Excellence. He is also affiliated with the Center for Language and Speech Processing, the Center for Population Health Information Technology, and holds a secondary appointment in the Department of Health Sciences Informatics in the School of Medicine. He obtained his PhD from the University of Pennsylvania in 2009.

Prof. Dredze has wide-ranging research interests in developing machine learning models for natural language processing (NLP) applications. Within machine learning, he develops new methods for graphical models, deep neural networks, topic models and online learning, and has worked in a variety of learning settings, such as semi-supervised learning, transfer learning, domain adaptation and large-scale learning. Within NLP he focuses on information extraction but has considered a wide range of NLP tasks, including syntax, semantics, sentiment and spoken language processing.

Beyond his work in core areas of computer science, Prof. Dredze has pioneered new applications of these technologies in public health informatics, including work with social media data, biomedical articles and clinical texts. He has published widely in health journals including the Journal of the American Medical Association (JAMA), the American Journal of Preventive Medicine (AJPM), Vaccine, and the Journal of the American Medical Informatics Association (JAMIA). His work is regularly covered by major media outlets, including NPR, the New York Times and CNN.


In-person attendance is available only to those with active NYU ID cards.
