CSCI-UA.0480-006


Time and Place

Room: Bobst Library, Room LL150 (lower level, room 150)
Time: Tuesday and Thursday, 8:00AM–9:15AM

 

Instructor Contact Information and Office Hours

Instructor Contact Info:
  Email: meyers at cs dot nyu dot edu
  Telephone: 212-998-3482
  Office: 60 Fifth Avenue, Room 301
Instructor Office Hours:
  Monday 1:30PM–3:00PM
  Thursday 10:30AM–12:00PM
  Or by appointment

Required Textbooks

 

Description and Syllabus.

Natural Language Processing (aka Computational Linguistics) is an inter-disciplinary field applying methodology of computer science and linguistics to the processing of natural languages (English, Chinese, Spanish, Japanese, etc.). Applications include the following (among others):

Much of the best work in the field combines two methodologies: (1) automatically acquiring statistical information from one set of (training) documents to use as the basis for probabilistically predicting the distribution of similar information in new documents; and (2) using manually encoded linguistic knowledge. For example, many supervised methods of machine learning require a corpus of text with manually encoded linguistic knowledge, a set of procedures for acquiring statistical patterns from this data, and a transducer for predicting these same distinctions in new text. This class will cover linguistic, statistical and computational aspects of this exciting field.

We will use the two textbooks for substantially different purposes. We will cover approximately half of the Jurafsky and Martin book, which provides a detailed description of most of the major subareas of natural language processing. NLTK, on the other hand, provides access to actual NLP tools implemented in Python and will be used to try out different NLP components. As NLTK is open source, students can look at the actual code and figure out for themselves how things are implemented.

I expect to cover a subset of the following topics: linguistic annotation, regular grammars, finite state machines, part of speech tagging, chunking, named entity tagging, parsing, semantic role labeling, feature structures, information extraction, anaphor resolution and other topics.
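The supervised recipe described above (an annotated corpus, a procedure for acquiring statistical patterns, and a predictor for new text) can be illustrated with a minimal sketch. This is only an illustration, not a method taken from the textbooks: the tiny corpus, the function names `train` and `tag`, and the default tag are all assumptions made up for this example, and the "statistics" are simply the most frequent tag per word.

```python
from collections import Counter, defaultdict

# Hypothetical hand-annotated training corpus: (word, part-of-speech) pairs.
TRAINING_CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]


def train(corpus):
    """Count how often each word carries each tag, then keep the top tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, pos in sentence:
            counts[word.lower()][pos] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(model, words, default="NN"):
    """Predict a tag for each word; unseen words get the (assumed) default."""
    return [(word, model.get(word.lower(), default)) for word in words]


model = train(TRAINING_CORPUS)
tagged = tag(model, ["A", "cat", "barks"])
print(tagged)
```

A baseline like this ignores context entirely; sequence models such as HMMs, covered later in the class, also use the surrounding tags.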

This semester draws extensively on the lectures from the Spring 2018 class. However, the material is being reordered and revised as the semester progresses. The schedule below includes materials for several classes into the future and indicates the topic of future materials, many of which are close to ones found on last semester's website, but are under revision. Therefore, this website will be updated many times during the semester.

Midterm and Final Project Due Dates: Final

Test or Deadline                           Date
Midterm                                    Thursday October 25 (Class 15)
Final Project Proposal                     Thursday November 8 (Class 19)
Final Project 30 second Progress Report    Tuesday November 27 (Class 23)
Final Project Initial Version              Tuesday December 4 (1 day before Class 25)
Final Project Final Version                Tuesday December 18 (During Final Exam Week)

Homework to Hand In or Present

Provisional List of programming assignments, annotation assignments, writing assignments and in-class presentations (subject to change until posted).

Assignment Number   Date and Time Due                                                 Assignment
Assignment 1        Thursday Sept. 13 (Date of Class 4)                               Adjective Annotation
Assignment 2        Tuesday Sept. 18 (Date of Class 5)                                Regular Expressions
Assignment 3        Tuesday Oct. 2 (Date of Class 9)                                  HMM and POS tagging
Assignment 4        Thursday Oct. 11 (Date of Class 11)                               Information Retrieval
Assignment 5        Tuesday Oct. 30 (Date of Class 16)                                Sequence Labeling (Noun Groups)
Assignment 6        Thursday Nov. 8 (Date of Class 19)                                Final Project Proposal
Short Homework 1    Thursday Nov. 8 (Date of Class 19)                                Short Homework about Coreference
Short Homework 2    Tuesday Nov. 13 (Date of Class 20)                                Short Homework about Sense Similarity
Short Homework 3    Tuesday Nov. 27 (Date of Class 23)                                Homework about Feature Structures
                    Tuesday Dec. 4 (Date of Class 25)                                 Final Project 1st Draft
                    Thursday Dec. 6 and Tuesday Dec. 11 (Date of Classes 26 and 27)   Student Presentations
Short Homework 4    Tuesday Dec. 11 (Date of Class 27)                                Homework about Machine Translation
                    Approximately Dec. 18 (During Final Exam Week)                    Final Project -- Final Version

 

Downloadable Resources available from NYUClasses

The materials listed in this section are subject to licensing agreements. However, I have made them available to members of this class via the resources section of NYUClasses, since NYU has licenses for students (and others at NYU) to use these materials. Additional material subject to licensing restrictions can also be made available (e.g., NYU has licenses with the Linguistic Data Consortium for their materials). Other materials used in this class that are not subject to licensing restrictions are distributed elsewhere on this website. These materials are usable for both homework assignments and final projects.

 

Class Schedule, Lecture Slides and other Materials from Class

This table will be continuously updated during the semester. Documents will be updated, errors will be corrected and additional material will be added. The schedule may also be modified, either to add material or to remove it. All materials originated by me will be freely downloadable from this site. I will assume a Creative Commons NonCommercial License for all my personal materials unless we reach an agreement otherwise. Proprietary material will be distributed using links to NYUClasses and will require an NYUClasses login to access. The end result will probably be similar to last semester's version of this class. Reading assignments are listed next to the corresponding lectures and are expected to be completed at approximately the same time as the lectures. Please assume that reading assignments are definite once the slides are included for that lecture and tentative before that, as I am in the process of revising the lectures as the term progresses. Still, I do not expect to make very many additional changes to the reading assignments.

Class Date Slides Other Documents Reading Assignments

1

2

Tuesday September 4

Thursday September 6

Lecture 1: Introduction

Adjective Task Error Analysis
  • Chapter 1 in Jurafsky and Martin
  • Install NLTK, read Chapter 1 and follow the examples.
  • Optional: Read through the full Penn Treebank Part of Speech tagset description.

3

4

Tuesday September 11

Thursday September 13

Lecture 2: Formal Languages

This script (developed cooperatively in class) uses grep with a regular expression to identify some (not all) time expressions and pipes the output of grep to the program less. You can also search for regexps within less by typing "/" followed by the regexp. The script runs in a UNIX shell (Linux, macOS, etc.). I suspect that there is an equivalent for Windows (perhaps using PowerShell). The script assumes that the file "all-OANC.txt" from homework assignment 2 is in the same directory.
  • Chapters 2 and 3 in Jurafsky and Martin
  • Chapters 2 and 3 in NLTK
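For those without a UNIX shell, the grep-based search for time expressions described above can be approximated in Python. The sketch below is not the script from class: the pattern `TIME_RE` and the helper `grep_times` are hypothetical, and the pattern covers only clock times such as "8:00AM" or "3:00 pm", so (like the class script) it finds some time expressions but not all.

```python
import re

# Hypothetical pattern: 12-hour clock times with an optional AM/PM marker.
TIME_RE = re.compile(
    r"\b(?:1[0-2]|0?[1-9]):[0-5][0-9]\s?(?:[AaPp]\.?[Mm]\.?)?"
)


def grep_times(lines):
    """Yield (line_number, line) for each line with a match, like grep -n."""
    for number, line in enumerate(lines, start=1):
        if TIME_RE.search(line):
            yield number, line.rstrip("\n")


# Made-up sample lines standing in for the contents of a corpus file.
sample = [
    "Class meets at 8:00AM on Tuesdays.",
    "No time expression here.",
    "Office hours end at 3:00 pm.",
]
matches = list(grep_times(sample))
for number, line in matches:
    print(f"{number}:{line}")
```

To search a real file such as "all-OANC.txt", pass an open file object to `grep_times` instead of the sample list, since iterating over a file yields its lines.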

5

6

Tuesday September 18

Thursday September 20

Lecture 3: HMM and Part of Speech Tagging

Ralph's Viterbi Slides

  • Chapter 5 in J & M
  • Section 5 in NLTK

7

8

Tuesday September 25

Thursday September 27

Lecture 4: Information Retrieval and Terminology Extraction

  • Chapter 23.1 in J & M
  • Optional: Meyers, et al. 2018 paper on Termolator

9

10

Tuesday October 2

Thursday October 4

Lecture 5 -- Models of Word Distribution within the Sentence
  • Chapters 4.1–4.4, 12 and 13 in J & M
  • Chapter 8 in NLTK
Legislative Day: Monday Classes Scheduled for Tuesday October 9
11

Thursday October 11

Lecture 6 -- Shallow Parsing, Named Entities and Machine Learning

A simple DTD (for annotating Named Entities for use with the MAE annotation tool)

Sample Corpora for annotating names:

  • Chapter 6 in J & M
  • Sections 6 and 7.5 in NLTK
  • ACE Named Entity Specifications: First 3 sections only (Optional)
  • Bikel, et al. (1997). Nymble: a High-Performance Learning Name-finder. In 5th Conference on Applied NLP

12

13

Tuesday October 16

Thursday October 18

Lecture 7 -- Corpus Linguistics

Discussion about Final Projects

Sample Annotation on the Penn Treebank:
14

Tuesday October 23

Review for Midterm Exam
15

Thursday October 25

Midterm Exam
16

Tuesday October 30

Reference Resolution
  • J & M: Chapter 21:3-8, 21:9
  • Lappin and Leass (1994)
17

Thursday November 1

Post-Midterm Review

18

19

Tuesday November 6

Thursday November 8

Lecture 9a: Lexical Semantics: Word Similarity

Lecture 9b: Lexical Semantics: Semantic Role Labeling

20

Tuesday November 13

Lecture 10 -- Information Extraction

Sample Timex Rules from R. Grishman's Proteus system

  • J & M Chapters 22.2 to 22.4
21

22

Thursday November 15

Tuesday November 20

Lecture 11 -- Feature Structures

Talk about GLARF

  • J & M Chapters 13.4.2 and 15
Holiday: Thursday November 22
23

Tuesday November 27

30 second Progress Reports from Students (See Instructions)

Talk about 3-4 minute Student Presentations:

This will include a preliminary schedule of talks, arranged by topic (will be filled in). If a project has been incorrectly grouped or if additional information is provided (for the miscellaneous category), I will change the schedule. Note that each class lasts only 75 minutes, so there are some limits to how much I can change the schedule.

24

25

Thursday November 29

Tuesday December 4

Lecture 12 -- Machine Translation

Birch and Koehn slides

SelecT paper slides (Published version in these proceedings on pages 209–218)

  • J & M Chapter 25

26

27

Thursday December 6

Tuesday December 11

Student Presentations
28

Thursday December 13

Final Lecture
Tuesday December 18 Final Project Due