CSCI-UA.0480-011

Special Topics: Natural Language Processing

Adam Meyers

Research Assistant Professor &
Visiting Clinical Associate Professor

Undergraduate Division

Computer Science




General Information

Time and Place

Warren Weaver Hall
251 Mercer Street
Room 102
Monday and Wednesday
11:00AM to 12:15PM

University Holidays: February 15 (Presidents' Day), March 14 & 16 (Spring Break)

Midterm: Wed March 9, 11:00AM to 12:15PM
Final Exam: Monday May 16, 10:00 to 11:50

Instructor Contact Info

715 Broadway, Room 702
meyers at cs dot nyu dot edu
212 998 3482
http://nlp.cs.nyu.edu/people/meyers.html

Instructor Office Hours

Monday: 1:30-3PM or Thursday: 10:30AM-12PM or by appointment






Required Text Books

Speech and Language Processing, 2nd Edition
By Daniel Jurafsky and James H. Martin
http://www.cs.colorado.edu/~martin/slp.html

Natural Language Processing with Python
By Steven Bird, Ewan Klein, and Edward Loper

Available online or in print:
http://www.nltk.org/book

Note that the book is being revised, so the online version may be more up to date than the print version.



Description and Syllabus: 

Natural Language Processing (aka Computational Linguistics) is an interdisciplinary field that applies the methodologies of computer science and linguistics to the processing of natural languages (English, Chinese, Spanish, Japanese, etc.). Typical applications include:

Much of the best work in the field combines two methodologies: (1) automatically acquiring statistical information from one set of (training) documents and using it as the basis for probabilistically predicting the distribution of similar information in new documents; and (2) using manually encoded linguistic knowledge. For example, many supervised machine learning methods require a corpus of text with manually encoded linguistic knowledge, a set of procedures for acquiring statistical patterns from this data, and a transducer for predicting the same distinctions in new text.
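The supervised recipe described in the preceding paragraph can be illustrated with a very small Python sketch. It is not course material: the hand-tagged toy corpus, the Penn Treebank-style tags, the function names, and the choice to tag unseen words as NN are all assumptions made for illustration. The corpus plays the role of text with manually encoded linguistic knowledge, `train_most_frequent_tagger` acquires a statistical pattern (how often each word occurs with each tag), and `tag_words` acts as a simple predictor for new text. A most-frequent-tag baseline is used only because it is the simplest instance of this pattern; real part-of-speech taggers, such as HMM taggers, also use the surrounding context.

```python
from collections import Counter, defaultdict

# Hypothetical toy training corpus: each sentence is a list of (word, tag) pairs.
# The tags stand in for manually encoded linguistic knowledge.
TRAINING_CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("dogs", "NNS"), ("walk", "VBP")],
    [("we", "PRP"), ("walk", "VBP"), ("the", "DT"), ("dog", "NN")],
    [("a", "DT"), ("walk", "NN"), ("helps", "VBZ")],
]

# Assumption: words never seen in training are tagged as singular nouns.
UNKNOWN_WORD_TAG = "NN"


def train_most_frequent_tagger(corpus):
    """Acquire statistics: map each training word to the tag it occurs with most often."""
    tag_counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            tag_counts[word][tag] += 1
    return {word: counts.most_common(1)[0][0] for word, counts in tag_counts.items()}


def tag_words(words, model):
    """Predict a tag for each word of new, unannotated text."""
    return [(word, model.get(word, UNKNOWN_WORD_TAG)) for word in words]


model = train_most_frequent_tagger(TRAINING_CORPUS)
# "walk" was tagged VBP twice and NN once in training, so VBP wins;
# "cat" never occurred in training, so it gets the fallback tag.
print(tag_words(["the", "dog", "walk", "cat"], model))
# [('the', 'DT'), ('dog', 'NN'), ('walk', 'VBP'), ('cat', 'NN')]
```

Because this model ignores context, it would also tag "walk" as VBP in "a long walk", which is exactly the kind of error that context-sensitive statistical methods aim to reduce.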

This class will cover linguistic, statistical and computational aspects of this exciting field.

We will use the two textbooks for substantially different purposes. We will cover approximately half of the Jurafsky and Martin book, which provides a detailed description of most of the major subareas of natural language processing. NLTK, on the other hand, provides access to actual NLP tools implemented in Python and will be used to try out different NLP components. Because NLTK is open source, students can look at the actual code and figure out for themselves how things are implemented.

I expect to cover a subset of the following topics: linguistic annotation, regular grammars, finite state machines, part of speech tagging, chunking, named entity tagging, parsing, semantic role labeling, feature structures, information extraction, anaphor resolution and others.

This website will be updated continuously throughout the semester.

Test Dates

Test | Date | Sample Tests and Problems
Midterm | Wednesday, March 9, 2016 | TBA
Final | Monday, May 16, 2016, 10:00 to 11:50 | TBA

Reading and Programming Assignments:

Assignment Number | Assignment Description | Due Date
1 | homework 1 | 2/1/2016
2 | homework 2 | 2/10/2016
3 | homework 3 | 2/17/2016
4 | homework 4 | 3/2/2016
5 | homework 5 | 3/21/2016
6 | homework 6 | 4/7/2016
7 | homework 7 (Final Project Proposal) | 4/14/2016
8 | Reading 8: J & M: Sections 21.3-21.8 and 21.9, and Lappin and Leass (1994) (other optional reading in lecture 8) | 4/21/2016
9 | Reading 9: J & M: Chapter 19, Section 20.9, and the first 2 WordNet papers at this link (other optional reading in lecture 9) | 4/21/2016
10 | Reading 10: J & M: Sections 22.2 to 22.4 (other optional reading in lecture 10) | 4/28/2016
11 | Reading 11: J & M: Section 13.4.2 and Chapter 15 (other optional reading in lecture 11) | 4/28/2016
12 | Reading 12: J & M: Section 13.4.2 and Chapter 15 | 5/2/2016

Class Schedule, Lecture Slides, and Other Class Materials:

This table will be updated continuously during the semester: documents will be revised, errors will be corrected, and additional material will be added. The schedule may also be modified to provide more time to explain the material. All materials I originated will be freely downloadable from this site. Proprietary material will be distributed via links to NYUClasses and will require an NYUClasses login to access. The end result will probably be similar to last semester's version of this class: http://cs.nyu.edu/courses/fall15/CSCI-UA.0480-006/

Class

Date

Slides (.pdf)

Other Documents


1

Mon, Jan 25, 2016

Lecture 1: Introduction



2

Wed, Jan 27, 2016

jointly annotated medical file begun in class

same file annotated by professor


3

Mon Feb 1, 2016

Lecture 2: Formal Languages, Regular Expressions, Automata, Transducers



4

Wed Feb 3, 2016

OANC Text


5

Mon Feb 8, 2016

Lecture 3: English Syntax and Parsing Algorithms

Random Sentence Generator Files


6

Wed Feb 10, 2016

Code Illustrating CKY algorithm


7

Wed Feb 17, 2016

Lecture 4: HMM and Part of Speech Tagging

Ralph's Viterbi



8

Mon Feb 22, 2016



9

Wed Feb 24, 2016

Lecture 5: Corpus Linguistics for NLP

Lecture 6: Information Retrieval and Term Extraction
The Termolator
Discussion of Final Projects



10

Mon Feb 29, 2016



11

Wed Mar 2, 2016



12

Mon Mar 7, 2016

Pre-Midterm Review

Sample Midterm


13

Wed Mar 9, 2016

Midterm Exam



14

Mon Mar 21, 2016

Post-Midterm Review

Lecture 7: Named Entities



15

Wed Mar 23, 2016



16

Mon Mar 28, 2016

Lecture 8: Reference Resolution
Lecture 9: Lexical Semantics




17

Wed Mar 30, 2016



18

Mon Apr 4, 2016



19

Wed Apr 6, 2016

Lecture 10: Information Extraction Beyond Named Entities
Paper on annotation in Scientific Documents




20

Mon Apr 11, 2016



21

Wed Apr 13, 2016

Lecture 11: Feature Structures: How to Represent Multiple Phenomena Simultaneously

CUNY talk about GLARF

Talk about Final Project Talks (including preliminary schedule)



22

Mon Apr 18, 2016



23

Wed Apr 20, 2016

Machine Translation




24

Mon Apr 25, 2016



25

Wed Apr 27, 2016

Student Presentations: 3 minutes plus 1 minute for questions



26

Mon May 2, 2016



27

Wed May 4, 2016



28

Mon May 9, 2016

Pre-Final Review

sample final exam
python notes from class (cosine similarity)

question7_answer

All practice Final answers


Final

Mon May 16, 2016

Final Exam 10:00 to 11:50




