CSCI-UA.0480-009


Time and Place

Room: Silver Building, 100 Washington Sq East, Room 414
Time: Monday and Wednesday, 12:30PM–1:45PM

 

Instructor Contact Information and Office Hours

Instructor Contact Info
Email: meyers at cs dot nyu dot edu
Telephone: 212-998-3482
Office: 60 Fifth Avenue, Room 301

Instructor Office Hours
Monday 2:30PM–4:00PM
Thursday 10:30AM–12:00PM
Or by appointment
Required Textbooks

 

Description and Syllabus.

Natural Language Processing (aka Computational Linguistics) is an interdisciplinary field that applies the methodologies of computer science and linguistics to the processing of natural languages (English, Chinese, Spanish, Japanese, etc.). Applications include the following (among others):

Much of the best work in the field combines two methodologies: (1) automatically acquiring statistical information from one set of (training) documents and using it as the basis for probabilistically predicting the distribution of similar information in new documents; and (2) using manually encoded linguistic knowledge. For example, many supervised machine learning methods require a corpus of text with manually encoded linguistic knowledge, a set of procedures for acquiring statistical patterns from this data, and a transducer for predicting the same distinctions in new text. This class will cover linguistic, statistical and computational aspects of this exciting field.

We will use the two textbooks for substantially different purposes. We will cover approximately half of the Jurafsky and Martin book, which provides a detailed description of most of the major subareas of natural language processing. NLTK, on the other hand, provides access to actual NLP tools implemented in Python and will be used to try out different NLP components. Because NLTK is open source, students can look at the actual code and figure out for themselves how things are implemented.

I expect to cover a subset of the following topics: linguistic annotation, regular grammars, finite state machines, part of speech tagging, chunking, named entity tagging, parsing, semantic role labeling, feature structures, information extraction and anaphor resolution, among others.
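
As a rough illustration of the supervised setup described above (a manually annotated corpus, a procedure for acquiring statistics from it, and a component that predicts the same distinctions in new text), here is a minimal Python sketch of a most-frequent-tag part of speech tagger. It is not taken from the course materials or from NLTK: the tiny corpus, the function names train and predict_tags, and the choice of NN as the default tag for unseen words are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Tiny hand-annotated "training corpus" of (word, part-of-speech tag) pairs.
# The sentences and tags are invented for illustration only.
TRAINING_CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
    [("dogs", "NNS"), ("bark", "VBP")],
]


def train(corpus):
    """Acquire statistics: count how often each word occurs with each tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, pos in sentence:
            counts[word.lower()][pos] += 1
    return counts


def predict_tags(words, counts, default_tag="NN"):
    """Label each word with its most frequent training tag.

    Words never seen in training receive default_tag (an assumption
    made for this sketch, not a recommendation).
    """
    tagged = []
    for word in words:
        tag_counts = counts.get(word.lower())
        if tag_counts:
            best = tag_counts.most_common(1)[0][0]
        else:
            best = default_tag
        tagged.append((word, best))
    return tagged


model = train(TRAINING_CORPUS)
print(predict_tags(["A", "cat", "barks"], model))
```

A baseline like this ignores context entirely, which is one reason sequence models such as the HMMs covered later in the schedule are used for tagging.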

This semester draws extensively on the lectures from the Fall 2018 class. However, the material is being reordered and revised as the semester progresses. The schedule below includes materials for several classes into the future and indicates the topic of future materials, many of which are close to ones found on last semester's website, but are under revision. Therefore, this website will be updated many times during the semester.

Midterm and Final Project Due Dates: Final

Test or Deadline | Date
Midterm | Wednesday March 27 (Class 15)
Final Project Proposal | Wednesday April 10 (Class 19)
Final Project 30-second Progress Report | Wednesday April 24 (Class 23)
Final Project Initial Version | Wednesday May 1 (1 day before Class 25)
Final Project Final Version | Friday May 17 (Final Exam Time)

Homework to Hand In or Present

Provisional List of programming assignments, annotation assignments, writing assignments and in-class presentations (subject to change until posted).

Assignment Number | Date and Time Due | Assignment
Assignment 1 | Monday February 4 (Date of Class 3) | Adjective Annotation
Assignment 2 | Monday February 11 (Date of Class 5) | Regular Expressions
Assignment 3 | Monday February 25 (Date of Class 8) | HMM and POS tagging
Assignment 4 | Wednesday March 6 (Date of Class 11) | Information Retrieval
Assignment 5 | Monday March 25 (Date of Class 14) | Sequence Labeling (Noun Groups)
Assignment 6 | Monday April 8 (Date of Class 18) | Final Project Proposal
Short Homework 1 | Wednesday April 10 (Date of Class 19) | Short Homework about Coreference
Short Homework 2 | Monday April 15 (Date of Class 20) | Short Homework about Sense Similarity
Short Homework 3 | Wednesday April 24 (Date of Class 23) | Homework about Feature Structures
- | Wednesday May 1 (Date of Class 25) | Final Project 1st Draft
- | Monday May 6 and Wednesday May 8 (Date of Classes 26 and 27) | Student Presentations
Short Homework 4 | Wednesday May 8 (Date of Class 27) | Homework about Machine Translation
- | Friday May 17 (Final Exam Time) | Final Project -- Final Version

 

Downloadable Resources available from NYUClasses

The materials listed in this section are subject to licensing agreements. However, I have made them available to members of this class via the resources section of NYUClasses, since NYU has licenses for students (and others at NYU) to use these materials. Additional material subject to licensing restrictions can also be made available (e.g., NYU has licenses with the Linguistic Data Consortium for their materials). Other materials used in this class that are not subject to licensing restrictions are distributed elsewhere in this website. These materials are usable for both homework assignments and final projects.

 

Class Schedule, Lecture Slides and other Materials from Class

This table will be continuously updated during the semester. Documents will be updated, errors will be corrected and additional material will be added. The schedule may also be modified, either to add material or to remove it. All materials originated by me will be freely downloadable from this site. I will assume a Creative Commons NonCommercial License for all my personal materials unless we reach an agreement otherwise. Proprietary material will be distributed using links to NYUClasses and will require an NYUClasses login to access. The end result will probably be similar to last semester's version of this class. Reading assignments are listed next to the corresponding lectures and are expected to be completed at approximately the same time as the lectures. Please assume that reading assignments are definite once the slides are included for that lecture and tentative before that, as I am in the process of revising the lectures as the term progresses. Still, I do not expect to make very many additional changes to the reading assignments.

Class Date Slides Other Documents Reading Assignments

1

2

Monday January 28

Wednesday January 30

Lecture 1: Introduction

  • Chapter 1 in Jurafsky and Martin
  • Install NLTK, Read Chapter 1 and follow examples.
  • Optional: Read through the full Penn Treebank Part of Speech tagset description.

3

4

Monday February 4

Wednesday February 6

Lecture 2: Formal Languages

  • Chapters 2 and 3 in Jurafsky and Martin
  • Chapters 2 and 3 in NLTK

5

6

Monday February 11

Wednesday February 13

Lecture 3: HMM and Part of Speech Tagging

Ralph's Viterbi Slides

  • Chapter 5 in J & M
  • Section 5 in NLTK
Holiday: Monday February 18

7

8

Wednesday February 20

Monday February 25

Lecture 4: Information Retrieval and Terminology Extraction

  • Chapter 23.1 in J & M
  • Optional: Meyers, et al. 2018 paper on Termolator

9

10

Wednesday February 27

Monday March 4

Lecture 5 -- Models of Word Distribution within the Sentence
  • Chapters 4.1–4.4, 12 and 13 in J & M
  • Chapter 8 in NLTK
11

Wednesday March 6

Lecture 6 -- Shallow Parsing, Named Entities and Machine Learning

A simple DTD (for annotating Named Entities for use with the MAE annotation tool)

Sample Corpora for annotating names:

  • Chapter 6 in J & M
  • Sections 6 and 7.5 in NLTK
  • ACE Named Entity Specifications: First 3 sections only (Optional)
  • Bikel, et al. (1997). Nymble: a High-Performance Learning Name-finder. In 5th Conference on Applied NLP

12

13

Monday March 11

Wednesday March 13

Lecture 7 -- Corpus Linguistics

Discussion about Final Projects

Sample Annotation on the Penn Treebank:
Spring Break: Monday March 18–Sunday March 24
14

Monday March 25

Review for Midterm Exam
15

Wednesday March 27

Midterm Exam
16

Monday April 1

Reference Resolution
  • J & M: Chapters 21.3–21.8 and 21.9
  • Lappin and Leass (1994)
17

Wednesday April 3

Post-Midterm Review

18

19

Monday April 8

Wednesday April 10

Lecture 9a: Lexical Semantics: Word Similarity

Lecture 9b: Lexical Semantics: Semantic Role Labeling

20

Monday April 15

Lecture 10 -- Information Extraction

Sample Timex Rules from R. Grishman's Proteus system

J & M Chapters 22.2 to 22.4
21

22

Wednesday April 17

Monday April 22

Lecture 11 -- Feature Structures

Talk about GLARF

J & M Chapters 13.4.2 and 15
23

Wednesday April 24

30 second Progress Reports from Students (See Instructions)

Talk about 3-4 minute Student Presentations:

This will include a preliminary schedule of talks, arranged by topic (will be filled in). If a project has been incorrectly grouped or if additional information is provided (for the miscellaneous category), I will change the schedule. Note that each class lasts only 75 minutes, so there are some limits to how much I can change the schedule.

24

25

Monday April 29

Wednesday May 1

Lecture 12 -- Machine Translation

Birch and Koehn slides

SelecT paper slides (Published version in these proceedings on pages 209–218)

J & M Chapter 25

26

27

Monday May 6

Wednesday May 8

Student Presentations
28

Monday May 13

Final Lecture
Friday May 17 Final Project Due