CSCI-UA.0480-006

Special Topics: Natural Language Processing

Adam Meyers

Research Assistant Professor &
Visiting Clinical Associate Professor

Undergraduate Division

Computer Science




General Information

Time and Place

Warren Weaver Hall
251 Mercer Street
Room 317
Tuesday and Thursday
8:00--9:15AM

University Holidays/Legislative Days: Tuesday October 13, Thursday November 26

Midterm: Thursday, October 22, 2015, 8:00-9:15AM
Final Exam: Thursday, December 17, 2015, 8:00-9:50AM

Instructor Contact Info

715 Broadway, Room 702
meyers at cs dot nyu dot edu
212 998 3482
http://nlp.cs.nyu.edu/people/meyers.html

Instructor Office Hours

Monday: 1:30-3PM or Thursday: 10:30-12PM or by appointment






Required Text Books

SPEECH and LANGUAGE PROCESSING 2nd Edition
By Daniel Jurafsky and James H. Martin
http://www.cs.colorado.edu/~martin/slp.html

Natural Language Processing with Python
By Steven Bird, Ewan Klein, and Edward Loper

Available online or for purchase
http://www.nltk.org/book

Note that the book is being revised, so the online version may be more up to date than the print version.



Description and Syllabus: 

Natural Language Processing (also known as Computational Linguistics) is an interdisciplinary field that applies the methods of computer science and linguistics to the processing of natural languages (English, Chinese, Spanish, Japanese, etc.). Typical applications include:

Much of the best work in the field combines two methodologies: (1) automatically acquiring statistical information from one set of (training) documents to use as the basis for probabilistically predicting the distribution of similar information in new documents; and (2) using manually encoded linguistic knowledge. For example, many supervised methods of machine learning require a corpus of text with manually encoded linguistic knowledge, a set of procedures for acquiring statistical patterns from this data, and a transducer for predicting these same distinctions in new text.
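As a rough illustration of the supervised setup just described, here is a minimal sketch in Python. It is not course material: the toy corpus, the tag names, and the function names `train` and `predict_tags` are all made up for this example. The "annotated corpus" is a handful of hand-tagged sentences, the "procedure for acquiring statistical patterns" simply counts how often each word receives each tag, and the "transducer" labels new text with each word's most frequent tag.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: sentences of (word, tag) pairs annotated by hand.
TRAINING = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("bark", "NOUN"), ("is", "VERB"), ("loud", "ADJ")],
    [("dogs", "NOUN"), ("bark", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB"), ("loudly", "ADV")],
]


def train(tagged_sentences):
    """Count how often each word carries each tag in the annotated corpus."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, label in sentence:
            counts[word.lower()][label] += 1
    return counts


def predict_tags(words, counts, default="NOUN"):
    """Label each word with its most frequent training tag.

    Words never seen in training receive the (assumed) default tag.
    """
    result = []
    for word in words:
        seen = counts.get(word.lower())
        best = seen.most_common(1)[0][0] if seen else default
        result.append((word, best))
    return result


model = train(TRAINING)
tagged = predict_tags(["the", "dogs", "bark"], model)
print(tagged)
```

In the toy corpus "bark" is tagged VERB twice and NOUN once, so the sketch labels it VERB in new text. A most-frequent-tag baseline like this ignores context entirely; it is only meant to show the corpus, training, and prediction pieces in one place.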

This class will cover linguistic, statistical and computational aspects of this exciting field.

We will use the two textbooks for substantially different purposes. We will cover approximately half of the Jurafsky and Martin book, which provides a detailed description of most of the major subareas of natural language processing. NLTK, on the other hand, provides access to actual NLP tools implemented in Python and will be used to try out different NLP components. As NLTK is open source, students can look at the actual code and figure out for themselves how things are implemented.

I expect to cover a subset of the following topics: linguistic annotation, regular grammars, finite state machines, part of speech tagging, chunking, named entity tagging, parsing, semantic role labeling, feature structures, information extraction, anaphor resolution and other topics.

I taught a similar class at Montclair State University in 2014. The class website is: http://nlp.cs.nyu.edu/meyers/montclair-class/.  Examining this website could give the prospective student an idea about what this class will be like.

This website will be updated over the next few weeks and will include most of the lecture slides for the first few classes before the semester begins.  It will then be continuously updated throughout the semester.

Test Dates

| Test | Dates | Sample Problems | Tests/Answers/Grades |
|------|-------|-----------------|----------------------|
| Midterm | Thurs, Oct 22, 2015, 8AM-9:15 | sample midterm | TBA |
| Final | Thurs, Dec 17, 2015, 8AM-9:50 | TBA (See Previous Classes) | TBA |

Reading and Programming Assignments:

| Assignment Number | Assignment Description | Due Date |
|-------------------|------------------------|----------|
| 1 | Introduction and Simple Annotation Task | September 9, 2015 |
| 2 | Regular Expressions | September 16, 2015 |
| 3 | Grammars and Parsing | September 23, 2015 |
| 4 | HMM and POS Tagging | October 6, 2015 |
| 5 | Inverse Document Frequency | October 25, 2015 |
| 6 | MEMM | November 10, 2015 |
| 7 | Final Project Proposal | November 19, 2015 |
| 8 | Percent Task (Extra Credit) | December 8, 2015 |

Final Project Information

See: Final Project Proposal

Class Schedule, Lecture Slides and other Materials from Class Attached:

This table will be continuously updated during the semester: documents will be updated, errors will be corrected, and additional material will be added. The schedule may also be modified to provide more time to explain the material. All materials originated by me will be freely downloadable from this site. Proprietary material will be distributed using links to NYUClasses and will require an NYUClasses login to access.

Class

Date

Slides (.pdf)

Other Documents


1

Thurs, Sep 3, 2015

Lecture 1: Introduction



2

Tues, Sep 8, 2015



3

Thurs, Sep 10, 2015

Lecture 2: Formal Languages, Regular Expressions, Automata, Transducers
annotation_agreement_report

Annotation Spreadsheet


4

Tues, Sep 15, 2015

OANC Text
simple-grep-regexp-time-expression
python-idle-dump-90615


5

Thurs, Sep 17, 2015

Lecture 3: English Syntax and Parsing Algorithms

Random Sentence Generator Files


6

Tues, Sep 22, 2015

Code Illustrating CKY algorithm


7

Thurs, Sep 24, 2015

Lecture 4: HMM and Part of Speech Tagging
Ralph's Viterbi Slides

homework4 materials


8

Tues, Sep 29, 2015



9

Thurs, Oct 1, 2015

Lecture 5: Corpus Linguistics for NLP

Lecture 6: Information Retrieval and Terminology Extraction
Terminology Extraction: The Termolator
Lecture 7
Final project possibilities



10

Tues, Oct 6, 2015



11

Thurs, Oct 8, 2015

stop_list.py


12

Thurs, Oct 15, 2015



13

Tues, Oct 20, 2015

Pre-Midterm Review

sample midterm


14

Thurs, Oct 22, 2015

Midterm Exam



15

Tues, Oct 27, 2015

Post-Midterm Review
Lecture 8: Reference Resolution



16

Thurs, Oct 29, 2015

Finish Lecture 8
Lecture 9: Lexical Semantics including Sense Disambiguation and Semantic Role Labeling

17

Tues, Nov 3, 2015

Lecture 9 Continued


18

Thurs, Nov 5, 2015

Lecture 9 Continued, Lecture 10: Information Extraction: Beyond Named Entities


19

Tues, Nov 10, 2015

Completed Lecture 10, Started Lecture 11: Feature Structures and How to Represent Multiple Phenomena Simultaneously



20

Thurs, Nov 12, 2015

Lecture 11 Continued, GLARF Talk



21

Tues, Nov 17, 2015

Finish GLARF Talk, partitive task description, Statistical NLP: A Machine Learning Perspective by Miao Fan



22

Thurs, Nov 19, 2015

Statistical NLP continued



23

Tues, Nov 24, 2015

TBA



24

Tues, Dec 1, 2015

Machine Translation
with reference to:
Birch and Koehn 2010
Discussion about Student Presentations



25

Thurs, Dec 3, 2015



26

Tues, Dec 8, 2015

Student Presentations: 3 minutes plus 1 minute for questions



27

Thurs, Dec 10, 2015



28

Tues, Dec 15, 2015

Pre-Final Review

Sample Final


Final

Thurs, Dec 17, 2015

Final Exam