Authors: Daniel Galron


Title: Information Extraction on High-School Level Chemistry Labs

Abstract:
In this report we present a feasibility study on automatically interpreting
instructions found in a set of high school chemistry labs, and discuss the
role of deep domain knowledge in the interpretation.  We define the task of
sentence-level interpretation as the extraction of symbolic representations
of the sentence semantics.  In the broader scope, the sentence-level
semantics of a particular sentence will be resolved with semantics from
other sentences in the lab along with domain knowledge to disambiguate and
reason about a physical system.  General automatic sentence-level
interpretation is difficult: the problem is not well defined in the
natural language processing research community, and few researchers have
studied it.  The common practice is to
decompose the problem into subtasks, such as resolving coreferences of noun
phrases, labeling the semantic roles of arguments to predicates, and
identifying word categories.  We describe a pipeline that combines these
subtasks with parsing to create a system capable of extracting
sentence-level semantics.  All the systems used for the subtasks are
off-the-shelf, and we stress that such a pipeline will be highly
error-prone, for reasons we discuss.  Finally, we closely study the chemistry
lab corpus, and analyze each instruction to determine the feasibility of its
automatic interpretation and the role of deep domain knowledge in its
disambiguation and understanding.