Natural Language Processing
CSCI-GA.2590-001

Spring 2012
Adam Meyers
Research Assistant Professor

General Information:

Time and Place:
    Warren Weaver Hall, 251 Mercer Street, Room 202
    Mondays, 7:10PM--9:00PM

Instructor Contact Info:
    719 Broadway, Room 702
    meyers at cs dot nyu dot edu
    212 998 3482
    http://nlp.cs.nyu.edu/people/meyers.html

Teaching Assistant Contact Info:
    715 Broadway, Room 709
    asun at cs dot nyu dot edu
    212 998 3491
    http://cs.nyu.edu/~asun/

Instructor Office Hours:
    Thursday 11:00AM--12:00PM or by appointment

Required Text Books:
    Speech and Language Processing, 2nd Edition
    By Daniel Jurafsky and James H. Martin
    http://www.cs.colorado.edu/~martin/slp.html

    Natural Language Processing with Python
    By Steven Bird, Ewan Klein, and Edward Loper
    Available online or for purchase: http://www.nltk.org/book

Class Mailing List:
    To subscribe, go to: http://www.cs.nyu.edu/mailman/listinfo/csci_ga_2590_001_sp12
    Mailing list address: csci_ga_2590_001_sp12 at cs dot nyu dot edu

Course Description and Syllabus:

Natural Language Processing (aka Computational Linguistics) is an interdisciplinary field that applies the methodology of computer science and linguistics to the processing of natural languages (English, Chinese, Spanish, Japanese, etc.). Typical applications include:

Much of the best work in the field combines two methodologies: (1) automatically acquiring statistical information from one set of (training) documents to use as the basis for probabilistically predicting the distribution of similar information in new documents; and (2) using manually encoded linguistic knowledge. For example, many supervised methods of machine learning require: a corpus of text with manually encoded linguistic knowledge, a set of procedures for acquiring statistical patterns from this data, and a transducer for predicting these same distinctions in new text.
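As a minimal sketch of these three ingredients, the Python fragment below uses a tiny hand-tagged corpus, a counting procedure, and a tagger that applies the counts to new text. The corpus, the tag names, and the function names train and tag are invented for illustration; they are not course materials, and real systems use far larger corpora and richer models.

```python
from collections import Counter, defaultdict

# (1) A toy corpus with manually encoded knowledge: (word, part-of-speech) pairs.
# These sentences and tags are made up for illustration.
TRAINING_CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]


def train(corpus):
    """(2) Acquire statistics: count how often each word carries each tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, pos in sentence:
            counts[word.lower()][pos] += 1
    return counts


def tag(words, counts, default_tag="NN"):
    """(3) Predict tags for new text: most frequent tag seen in training.

    Words never seen in training receive default_tag (an assumption).
    """
    tagged = []
    for word in words:
        seen = counts.get(word.lower())
        best = seen.most_common(1)[0][0] if seen else default_tag
        tagged.append((word, best))
    return tagged


model = train(TRAINING_CORPUS)
print(tag(["A", "cat", "barks", "loudly"], model))
# [('A', 'DT'), ('cat', 'NN'), ('barks', 'VBZ'), ('loudly', 'NN')]
```

The unseen word "loudly" falls back to the default tag, which shows why purely statistical predictions are often combined with manually encoded linguistic knowledge.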

This class will cover linguistic, statistical and computational aspects of this exciting field.

We will use the two textbooks for substantially different purposes. We will cover approximately 1/2 of the Jurafsky and Martin book, which provides a detailed description of most of the major subareas of natural language processing. The NLTK book (Natural Language Processing with Python), on the other hand, provides access to actual NLP tools implemented in Python and will be used to try out different NLP components. As NLTK is open source, students can look at the actual code and figure out for themselves how things are implemented.

I expect to cover: linguistic annotation, regular grammars, finite state machines, part of speech tagging, chunking,  named entity tagging, parsing, semantic role labeling, feature structures, information extraction, anaphor resolution and other topics.

Prerequisites:

Previous Class Website:

    Similar to Ralph Grishman's class: http://cs.nyu.edu/courses/spring10/G22.2590-001/index.html

Final Exam:  May 14, 2012, 7:10-9PM at Meyer 102
                       Meyer Hall is at 4 Washington Place
                       This is not our normal classroom -- it is in a different building

Homework:

    1. Homework 1: Due January 30, 2012
    2. Homework 2: Due February 10, 2012
    3. Homework 3: Due February 17, 2012
    4. Homework 4: Due March 5, 2012
    5. Homework 5: Due March 19, 2012
    6. Homework 6: Due March 26, 2012
    7. Homework 7: Due April 9, 2012
    8. Homework 8 (Readings; Final Project Proposal): Due April 23, 2012
    9. Readings for last 2 lectures
    10. Final Project: Due May 7, 2012

Homework Submission Information: Homework file names should be in the following format: 

    LastFirst_Assignment_X.extension

For example:  MeyersAdam_Assignment_3.zip would be acceptable for a zip file.  I suggest submitting homework in any of the following formats: zip, tgz, txt or a program file with comments (depending on the nature of the homework).
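The naming convention can be checked before uploading. The sketch below is one possible check, not an official one: the pattern assumes that X is the assignment number, that LastFirst contains only letters, and that a "program file" means a .py file. The name is_valid_name is hypothetical.

```python
import re

# Assumed reading of LastFirst_Assignment_X.extension:
#   LastFirst = letters only, X = assignment number,
#   extension = zip, tgz, txt, or py (py is an assumption for "program file").
FILENAME_PATTERN = re.compile(r"[A-Za-z]+_Assignment_\d+\.(zip|tgz|txt|py)")


def is_valid_name(filename):
    """Return True if the whole filename matches the assumed convention."""
    return FILENAME_PATTERN.fullmatch(filename) is not None


print(is_valid_name("MeyersAdam_Assignment_3.zip"))  # True
print(is_valid_name("hw3.zip"))                      # False
```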

Please submit homework through Blackboard. Please remember to click on "Submit" and not just "Save" or we won't be able to see it.  Unfortunately, Blackboard does not allow for re-submission. If you need to resubmit the homework, please email it to Ang as an attachment, along with some identifying information.

Class Schedule with Materials from Class Attached 

        This will be constantly updated throughout the semester


Class 1: January 23 Introduction: lecture 1
Class 2: January 30 Formal Languages and Transducers: lecture 2
    Python Session Print Out January 30, 2012
Class 3: February 6 Natural Language Syntax and Parsing: lecture 3
Class 4: February 13 Class Canceled due to Flood in Warren Weaver Hall
Class 5: February 27 POS Tagging and Hidden Markov Models: lecture 4
    Ralph Grishman's Viterbi Slides
Class 6: March 5 Named Entities and Machine Learning: lecture 5
Class 7: March 19 Lexical Semantics and Semantic Role Labeling: lecture 6
Class 8: March 26 Information Extraction: Entities, Relations, Events, Time: lecture 7
    Discussion about selecting features for relation extraction: NLP HW7 specs (pdf, powerpoint)
Class 9: April 2 Discussion about Final Project and Final Exam
    Anaphora: Coreference and Similar Phenomena: lecture 8
Class 10: April 9
Class 11: April 16 Feature Structures and Representing Multiple Phenomena: lecture 9
    GLARF talk presented at CUNY in March, 2011
Class 12: April 23 Machine Translation: 1) lecture 10; 2) Birch and Koehn slides about Statistical MT; 3) Notes about HW 6
Class 13: April 30
Class 14: May 7 Review Session
    Sample Final Exam: The sample test approximates what the actual final will be like, although the actual topics may vary somewhat from the sample (not all the above topics are on the sample). Also, the actual final may be slightly shorter than the sample.
May 14 Final Exam
    Time: 7:10-9PM
    Location: Meyer Hall, Room 102 (Meyer Hall is at 4 Washington Place)


Installing Software:

  1. Installing NLTK

    1. Follow the instructions at: http://www.nltk.org/download

    2. Installing for Linux is the easiest -- it basically just works

    3. Installing for Windows is OK, but some modules (the malt parser, for example) won't work at all

    4. Installing for Apple is a little harder than Linux, but not terrible

      1. You must make sure that your tcl/tk is compatible with the Python you install

      2. You have to identify the current OCAML path for some of the later modules (which you might not need anyway)

  2. Other Software: TBA
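After following the NLTK installation steps above, a quick check like the following sketch can confirm that Python can locate the relevant modules. It assumes Python 3, where the Tk binding is named tkinter, and it only tests that the modules can be found; it does not verify that the installed tcl/tk version is actually compatible. The helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the top-level module (no import is run)."""
    return importlib.util.find_spec(module_name) is not None


# "nltk" is the toolkit itself; "tkinter" is the Tk binding (Python 3 name).
for name in ("nltk", "tkinter"):
    status = "found" if is_installed(name) else "NOT found"
    print(f"{name}: {status}")
```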

Grading: 1/3 Homework + 1/3 Final Exam + 1/3 Final Project
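As a worked illustration of the equal weighting, the sketch below averages the three components. It assumes each component has already been scored on the same 0-100 scale; how each component is scored is not specified here, and the function name course_grade is hypothetical.

```python
def course_grade(homework, final_exam, final_project):
    """Equal-weight average: 1/3 homework + 1/3 final exam + 1/3 final project.

    Assumes all three scores are on the same scale (for example, 0-100).
    """
    return (homework + final_exam + final_project) / 3


print(course_grade(90, 80, 100))  # 90.0
```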

Please keep in mind the Computer Science Department's Statement on Academic Integrity

Final Project

    1. System Project
    2. Annotation Project
    3. Term Paper

Due May 7 (the last class of the Semester)

Other Information:  TBA

Additional Resources: