Authors: Daniel Galron Title: Information Extraction on High-School Level Chemistry Labs Abstract: In this report we present a feasibility study on automatically interpreting instructions found in a set of high school chemistry labs, and discuss the role of deep domain knowledge in the interpretation. We define the task of sentence-level interpretation as the extraction of symbolic representations of the sentence semantics. In the broader scope, the sentence-level semantics of a particular sentence will be resolved with semantics from other sentences in the lab along with domain knowledge to disambiguate and reason about a physical system. The task of general automatic sentence-level interpretation is a difficult one. The general problem is not very well defined in the natural language processing research community, and few researchers have studied the problem. The common practice is to decompose the problem into subtasks, such as resolving coreferences of noun phrases, labeling the semantic roles of arguments to predicates, and identifying word categories. We describe a pipeline combining the subtasks described, along with parsing, to create a system capable of extracting sentence-level semantics. All the systems used for the subtask are found off-the-shelf, and we should stress that such a system will be highly-error prone for reasons we discuss. Finally, we do a close study of the chemistry lab corpus, and analyze each instruction to determine the feasibility of its automatic interpretation and the role of deep domain knowledge in its disambiguation and understanding.