CSCI-UA.0480-009


Time and Place

Room: Meyer Hall, 4 Washington Place, Room 122 (formerly Silver 414)
Time: Monday and Wednesday, 12:30–1:45PM

 

Instructor Contact Information and Office Hours

Email: meyers at cs dot nyu dot edu
Telephone: 212-998-3482
Office: 60 Fifth Avenue, Room 301
Office Hours: Monday 10:30AM–12:00PM, Thursday 2:30PM–4:00PM, or by appointment
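Required Textbooks

  • Daniel Jurafsky and James H. Martin, Speech and Language Processing
  • Steven Bird, Ewan Klein and Edward Loper, Natural Language Processing with Python (the NLTK book)
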
 

Description and Syllabus

Natural Language Processing (aka Computational Linguistics) is an interdisciplinary field that applies the methodology of computer science and linguistics to the processing of natural languages (English, Chinese, Spanish, Japanese, etc.). Applications include the following (among others):

Much of the best work in the field combines two methodologies: (1) automatically acquiring statistical information from one set of (training) documents and using it as the basis for probabilistically predicting the distribution of similar information in new documents; and (2) using manually encoded linguistic knowledge. For example, many supervised machine learning methods require a corpus of text with manually encoded linguistic knowledge, a set of procedures for acquiring statistical patterns from this data and a transducer for predicting these same distinctions in new text. This class will cover the linguistic, statistical and computational aspects of this exciting field.

We will use the two textbooks for substantially different purposes. We will cover approximately half of the Jurafsky and Martin book, which provides a detailed description of most of the major subareas of natural language processing. NLTK, on the other hand, provides access to actual NLP tools implemented in Python and will be used to try out different NLP components. Because NLTK is open source, students can look at the actual code and figure out for themselves how things are implemented. I expect to cover a subset of the following topics: linguistic annotation, regular grammars, finite state machines, part of speech tagging, chunking, named entity tagging, parsing, semantic role labeling, feature structures, information extraction, anaphora resolution and other topics.
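The three components of a supervised method named above (a manually annotated corpus, a procedure for acquiring statistical patterns from it, and a predictor for new text) can be illustrated with a minimal Python sketch. It assumes a tiny hypothetical training corpus invented for illustration and implements the simple most-frequent-tag baseline for part of speech tagging; it is not a method prescribed by this course or by NLTK.

```python
from collections import Counter, defaultdict

# Hypothetical toy training corpus: sentences of (word, tag) pairs, standing in
# for a manually annotated corpus (Penn Treebank-style tags).
TRAINING = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("dogs", "NNS"), ("run", "VBP")],
]


def train(corpus):
    """Acquire statistics: count how often each word receives each tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    # Keep only the most frequent tag observed for each word.
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(model, words, default="NN"):
    """Predict tags for new text; words unseen in training get a default tag."""
    return [(w, model.get(w.lower(), default)) for w in words]


model = train(TRAINING)
print(tag(model, ["The", "cat", "runs"]))
# → [('The', 'DT'), ('cat', 'NN'), ('runs', 'VBZ')]
```

Unseen words fall back to a hypothetical default tag (NN here). The HMM taggers covered later in the course improve on this kind of baseline by also modeling the tags of neighboring words.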

This semester's class draws extensively on the lectures from the Fall 2019 class. However, the material is being revised as the semester progresses. The schedule below includes materials for several upcoming classes and indicates the topics of later ones; many of these materials are close to the ones found on last semester's website, but they are under revision. Therefore, this website will be updated many times during the semester.

Midterm and Final Project Due Dates: Final

Test or Deadline | Date
Midterm | Mon Mar 23 (Class 14)
Final Project Proposal | Wed Apr 1 (Class 17)
Final Project 30-Second Progress Report | Mon Apr 13 (Class 20)
Final Project First Draft | Wed Apr 22 (Class 23)
Final Project Final Version | Mon May 11 (Last Class)

Homework to Hand In or Present

Provisional list of programming assignments, annotation assignments, writing assignments and in-class presentations (subject to change until posted).

Assignment Number | Date and Time Due | Assignment
Assignment 1 | Mon Feb 3 (Date of Class 3) | Adjective Annotation
Assignment 2 | Mon Feb 10 (Date of Class 5) | Regular Expressions
Assignment 3 | Mon Feb 24 (Date of Class 8) | HMM and POS Tagging
Assignment 4 | Wed Mar 4 (Date of Class 11) | Information Retrieval
Assignment 5 | Wed Mar 25 (Date of Class 15) | Sequence Labeling (Noun Groups)
Assignment 6 | Wed Apr 1 (Date of Class 17) | Final Project Proposal
Short Homework 1 | Wed Apr 15 (Date of Class 21) | Short Homework about Coreference

To submit this homework:

  • Print out the linked .pdf file.
  • Type or write the answers in the appropriate boxes.
  • Scan the completed document as a PDF file.
  • Upload the PDF to Gradescope.

Final Project 1st Draft | Mon Apr 27 (Date of Class 24) | Final Project 1st Draft
Short Homework 2 | Wed Apr 29 (Date of Class 25) | Homework about Feature Structures

To submit this homework:

  • Print out the linked .pdf file.
  • Type or write the answers in the appropriate boxes.
  • Scan the completed document as a PDF file.
  • Upload the PDF to Gradescope.

Student Presentations | Mon May 4 and Wed May 6 (Dates of Classes 26 and 27) | Student Presentations
Short Homework 3 | Wed May 6 (Date of Class 27) | Homework about Machine Translation

To submit this homework:

  • Print out the linked .pdf file.
  • Type or write the answers in the appropriate boxes.
  • Scan the completed document as a PDF file.
  • Upload the PDF to Gradescope.

Final Project | Mon May 11 (Day of Last Class) | Final Project: Final Version

 

Downloadable Resources available from NYUClasses

The materials listed in this section are subject to licensing agreements. However, I have made them available to members of this class via the Resources section of NYUClasses, since NYU has licenses that allow students (and others at NYU) to use these materials. Additional material subject to licensing restrictions can also be made available (e.g., NYU has licenses with the Linguistic Data Consortium for their materials). Other materials used in this class that are not subject to licensing restrictions are distributed elsewhere on this website. All of these materials may be used for both homework assignments and final projects.

 

Class Schedule, Lecture Slides and other Materials from Class

This table will be continuously updated during the semester: documents will be updated, errors will be corrected and additional material will be added. The schedule may also be modified to add or remove material. All materials originated by me will be freely downloadable from this site. I will assume a Creative Commons NonCommercial License for all my personal materials unless we reach an agreement otherwise. Proprietary material will be distributed using links to NYUClasses and will require an NYUClasses login to access. The end result will probably be similar to last semester's version of this class. Reading assignments are listed next to the corresponding lectures and should be completed at approximately the same time as the lectures. Please assume that a reading assignment is definite once the slides for that lecture have been posted and tentative before that, since I am revising the lectures as the term progresses. Still, I do not expect to make many additional changes to the reading assignments.

Class | Date | Slides | Other Documents | Reading Assignments

1

2

Mon Jan 27

Wed Jan 29

Lecture 1: Introduction

  • Chapter 1 in Jurafsky and Martin
  • Install NLTK, read Chapter 1 of the NLTK book and follow the examples.
  • Optional: read through the full description of the Penn Treebank part of speech tagset.

3

4

Mon Feb 3

Wed Feb 5

Lecture 2: Formal Languages

  • Chapters 2 and 3 in Jurafsky and Martin
  • Chapters 2 and 3 in NLTK

5

6

Mon Feb 10

Wed Feb 12

Lecture 3: HMM and Part of Speech Tagging

Ralph's Viterbi Slides

Discussion of Adjective Annotation (HW 1) results.

  • Chapter 5 in J & M
  • Chapter 5 in NLTK

Holiday: Mon Feb 17

No class

7

8

Wed Feb 19

Mon Feb 24

Lecture 4: Information Retrieval and Terminology Extraction

9

10

Wed Feb 26

Mon Mar 2

Lecture 5: Models of Word Distribution within the Sentence
  • Sections 4.1–4.4 and Chapters 12 and 13 in J & M
  • Chapter 8 in NLTK
11

Wed Mar 4

Lecture 6: Shallow Parsing, Named Entities and Machine Learning

A simple DTD (for annotating named entities with the MAE annotation tool)

Sample Corpora for annotating names:

  • Chapter 6 in J & M
  • Chapter 6 and Section 7.5 in NLTK
  • ACE Named Entity Specifications: first 3 sections only (optional)
  • Bikel et al. (1997). Nymble: a High-Performance Learning Name-finder. In Proceedings of the 5th Conference on Applied Natural Language Processing.

12

Mon Mar 9

Discussion about Final Projects

13

Wed Mar 11

Review for Midterm Exam
  • sample midterm
  • sample midterm answers
  • last term's midterm
  • last term's midterm answers

Spring Break: Mar 16 and Mar 18

No class

14

Mon Mar 23

Midterm Exam
15

Wed Mar 25

Lecture 8: Lexical Semantics: Semantic Role Labeling

Sample WSJ semantic role annotation:
  • Text
  • Syntactic Trees
  • PropBank
  • NomBank
J & M: Section 20.9

16

17

Mon Mar 30

Wed Apr 1

  • Post-Midterm Review
  • Lecture 9: Information Extraction
  • Midterm and Answers
    • Midterm
    • Midterm answers
  • Sample Timex Rules from R. Grishman's Proteus system
J & M Sections 22.2–22.4
18

Mon Apr 6

Reference Resolution
  • J & M: Sections 21.3–21.8 and 21.9
  • Lappin and Leass (1994)
19

Wed Apr 8

Lecture 11: Lexical Semantics: Word Similarity
20

21

Mon Apr 13

Wed Apr 15

30-Second Progress Reports from Students (see instructions)

Lecture 12: Feature Structures

Talk about GLARF

J & M Section 13.4.2 and Chapter 15
22

Mon Apr 20

Talk about 3–4 minute student presentations (plus 1 minute for questions):

This includes a preliminary schedule of talks, arranged by topic. This schedule can be modified if needed, but I will not change a time without consulting everyone involved. Note that each class lasts only 75 minutes, so there are limits to how much I can change the schedule.

23

Wed Apr 22

To Be Announced

24

25

Mon Apr 27

Wed Apr 29

Lecture 13: Machine Translation

Birch and Koehn slides

SelecT paper slides (published version in these proceedings, pages 209–218)

J & M Chapter 25

26

27

Mon May 4

Wed May 6

Student Presentations
28

Mon May 11

Final Lecture