CSCI-UA.0480-006


Time and Place

Room: 60 Fifth Avenue, Room C10
Time: Tuesday and Thursday, 8:00–9:15AM

Instructor Contact Information and Office Hours

Email: meyers at cs dot nyu dot edu
Telephone: 212-998-3482
Office: 60 Fifth Avenue, Room 301
Office Hours: Monday 10:30AM–12:00PM, Thursday 2:30PM–4:00PM, or by appointment

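Required Textbooks

  • Jurafsky and Martin, Speech and Language Processing
  • Bird, Klein and Loper, Natural Language Processing with Python (the NLTK book)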

Description and Syllabus

Natural Language Processing (aka Computational Linguistics) is an interdisciplinary field that applies the methodology of computer science and linguistics to the processing of natural languages (English, Chinese, Spanish, Japanese, etc.). Applications include the following (among others):

Much of the best work in the field combines two methodologies: (1) automatically acquiring statistical information from one set of (training) documents and using it to probabilistically predict the distribution of similar information in new documents; and (2) using manually encoded linguistic knowledge. For example, many supervised machine learning methods require three components: a corpus of text with manually encoded linguistic knowledge, a set of procedures for acquiring statistical patterns from this data, and a transducer for predicting the same distinctions in new text. This class will cover the linguistic, statistical and computational aspects of this exciting field.

We will use the two textbooks for substantially different purposes. We will cover approximately half of the Jurafsky and Martin book, which provides a detailed description of most of the major subareas of natural language processing. NLTK, on the other hand, provides access to actual NLP tools implemented in Python and will be used to try out different NLP components. Because NLTK is open source, students can look at the actual code and figure out for themselves how things are implemented.

I expect to cover a subset of the following topics: linguistic annotation, regular grammars, finite state machines, part-of-speech tagging, chunking, named entity tagging, parsing, semantic role labeling, feature structures, information extraction, anaphora resolution and other topics.
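To make the three-part supervised recipe concrete, here is a minimal sketch in Python (the language NLTK uses), assuming a tiny, hypothetical hand-tagged corpus. It trains a most-frequent-tag part-of-speech tagger: the annotated sentences stand in for the manually encoded corpus, the hypothetical `train` function acquires the statistical patterns, and the hypothetical `tag` function plays the role of the predictor for new text. This is an illustration only, not the course's own code; NLTK's taggers (e.g., `nltk.UnigramTagger`) are considerably more complete.

```python
from collections import Counter, defaultdict

# (1) A tiny, hypothetical hand-annotated corpus: (word, Penn Treebank tag) pairs.
TRAINING_CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("old", "JJ"), ("dog", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("chases", "VBZ"), ("the", "DT"), ("ball", "NN")],
    [("dogs", "NNS"), ("bark", "VBP"), ("at", "IN"), ("cats", "NNS")],
    [("they", "PRP"), ("bark", "VBP")],
    [("tree", "NN"), ("bark", "NN"), ("is", "VBZ"), ("rough", "JJ")],
]

def train(corpus):
    """(2) Acquire statistics: count how often each word occurs with each tag."""
    word_tag_counts = defaultdict(Counter)
    tag_totals = Counter()
    for sentence in corpus:
        for word, pos in sentence:
            word_tag_counts[word.lower()][pos] += 1
            tag_totals[pos] += 1
    # Keep, for each word, the tag it was most frequently annotated with.
    model = {w: c.most_common(1)[0][0] for w, c in word_tag_counts.items()}
    # Unseen words fall back to the most frequent tag overall.
    default_tag = tag_totals.most_common(1)[0][0]
    return model, default_tag

def tag(model, default_tag, tokens):
    """(3) Predict: label new text using the acquired statistics."""
    return [(tok, model.get(tok.lower(), default_tag)) for tok in tokens]

model, default_tag = train(TRAINING_CORPUS)
print(tag(model, default_tag, ["The", "cat", "sleeps"]))
# prints [('The', 'DT'), ('cat', 'NN'), ('sleeps', 'VBZ')]
```

Because each word always receives its single most frequent tag regardless of context, "bark" is tagged VBP here even in "tree bark"; using the surrounding context to resolve such ambiguities is what HMM-based tagging (Lecture 3) adds.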

This semester draws extensively on the lectures from the Spring 2019 class. However, the material is being revised as the semester progresses. The schedule below includes materials for several upcoming classes and indicates the topics of future materials, many of which are similar to those on last semester's website but are under revision. Therefore, this website will be updated many times during the semester.

Midterm and Final Project Due Dates

Midterm: Thurs Oct 24 (Class 15)

Final Project Proposal: Tues Nov 5 (Class 18)

Final Project 30-Second Progress Report: Thurs Nov 14 (Class 21)

Final Project First Draft: Tues Nov 26 (Class 24)

Final Project Final Version: Tuesday, Dec 17 (Final Exam Time)

Homework to Hand In or Present

Provisional list of programming assignments, annotation assignments, writing assignments and in-class presentations (subject to change until posted).

Assignment 1, due Tues Sept 10 (date of Class 3): Adjective Annotation

Assignment 2, due Tues Sept 17 (date of Class 5): Regular Expressions

Assignment 3, due Tues Oct 1 (date of Class 9): HMM and POS Tagging

Assignment 4, due Thurs Oct 10 (date of Class 12): Information Retrieval

Assignment 5, due Tues Oct 22 (date of Class 14): Sequence Labeling (Noun Groups)

Assignment 6, due Tues Nov 5 (date of Class 18): Final Project Proposal

Short Homework 1, due Thurs Nov 14 (date of Class 21): Short Homework about Coreference

To submit this homework:

  • Print out the linked .pdf file.
  • Type or write the answers in the appropriate boxes.
  • Scan the completed document as a PDF file.
  • Upload it to Gradescope.

Short Homework 2, due Tues Nov 19 (date of Class 22): Short Homework about Sense Similarity

To submit this homework:

  • Print out the linked .pdf file.
  • Type or write the answers in the appropriate boxes.
  • Scan the completed document as a PDF file.
  • Upload it to Gradescope.

Final Project 1st Draft, due Tues Nov 26 (date of Class 24)

Short Homework 3, due Tues Dec 3 (date of Class 25): Homework about Feature Structures

To submit this homework:

  • Print out the linked .pdf file.
  • Type or write the answers in the appropriate boxes.
  • Scan the completed document as a PDF file.
  • Upload it to Gradescope.

Student Presentations: Thurs Dec 5 and Tues Dec 10 (dates of Classes 26 and 27)

Short Homework 4, due Tues Dec 10 (date of Class 27): Homework about Machine Translation

To submit this homework:

  • Print out the linked .pdf file.
  • Type or write the answers in the appropriate boxes.
  • Scan the completed document as a PDF file.
  • Upload it to Gradescope.

Final Project, due Tuesday, Dec 17 (Final Exam Time): Final Version

Downloadable Resources available from NYUClasses

The materials listed in this section are subject to licensing agreements. However, I have made them available to members of this class via the Resources section of NYUClasses, since NYU has licenses that allow students (and others at NYU) to use these materials. Additional material subject to licensing restrictions can also be made available (e.g., NYU has licenses with the Linguistic Data Consortium for their materials). Other materials used in this class that are not subject to licensing restrictions are distributed elsewhere on this website. These materials can be used for both homework assignments and final projects.

Class Schedule, Lecture Slides and other Materials from Class

This table will be continuously updated during the semester: documents will be updated, errors will be corrected and additional material will be added. The schedule may also be modified to add or remove material. All materials originated by me will be freely downloadable from this site. I will assume a Creative Commons NonCommercial License for all my personal materials unless we reach an agreement otherwise. Proprietary material will be distributed using links to NYUClasses and will require an NYUClasses login to access. The end result will probably be similar to last semester's version of this class. Reading assignments are listed with the corresponding lectures and are expected to be completed at approximately the same time as the lectures. Please assume that reading assignments are definite once the slides for that lecture have been posted and tentative before that, as I am revising the lectures as the term progresses. Still, I do not expect to make many additional changes to the reading assignments.

Classes 1 and 2 (Tues Sept 3 and Thurs Sept 5)
Lecture 1: Introduction
Readings:
  • Chapter 1 in Jurafsky and Martin
  • Install NLTK, read Chapter 1 and follow the examples.
  • Optional: Read through the full Penn Treebank part-of-speech tagset description.

Classes 3 and 4 (Tues Sept 10 and Thurs Sept 12)
Lecture 2: Formal Languages
Readings:
  • Chapters 2 and 3 in Jurafsky and Martin
  • Chapters 2 and 3 in NLTK

Classes 5 and 6 (Tues Sept 17 and Thurs Sept 19)
Lecture 3: HMM and Part of Speech Tagging
Other documents: Ralph's Viterbi Slides; discussion of Adjective Annotation (HW 1) results
Readings:
  • Chapter 5 in J & M
  • Chapter 5 in NLTK

Classes 7 and 8 (Tues Sept 24 and Thurs Sept 26)
Lecture 4: Information Retrieval and Terminology Extraction
Readings:
  • Section 23.1 in J & M
  • Optional: Meyers et al. 2018 paper on Termolator

Classes 9 and 10 (Tues Oct 1 and Thurs Oct 3)
Lecture 5: Models of Word Distribution within the Sentence
Readings:
  • Sections 4.1–4.4 and Chapters 12 and 13 in J & M
  • Chapter 8 in NLTK

Class 11 (Tues Oct 8)
Lecture 6: Shallow Parsing, Named Entities and Machine Learning
Other documents: a simple DTD (for annotating named entities with the MAE annotation tool); sample corpora for annotating names
Readings:
  • Chapter 6 in J & M
  • Chapter 6 and Section 7.5 in NLTK
  • ACE Named Entity Specifications: first 3 sections only (optional)
  • Bikel et al. (1997). Nymble: a High-Performance Learning Name-finder. In 5th Conference on Applied NLP.

Class 12 (Thurs Oct 10)
Lecture 7: TBA

Legislative Day: Tues Oct 15 (Tuesday with Monday schedule; no class)

Class 13 (Thurs Oct 17)
Discussion about Final Projects

Class 14 (Tues Oct 22)
Review for Midterm Exam

Class 15 (Thurs Oct 24)
Midterm Exam

Class 16 (Tues Oct 29)
Lecture 8: Lexical Semantics: Semantic Role Labeling
Other documents: Sample WSJ semantic role annotation
Reading: J & M Section 20.9

Classes 17 and 18 (Thurs Oct 31 and Tues Nov 5)
Reading: J & M Sections 22.2–22.4

Class 19 (Thurs Nov 7)
Reference Resolution
Readings:
  • J & M Sections 21.3–21.8 and 21.9
  • Lappin and Leass (1994)

Class 20 (Tues Nov 12)
Lecture 11: Lexical Semantics: Word Similarity
Readings:
  • J & M Chapter 19
  • The first two WordNet papers at this link

Classes 21 and 22 (Thurs Nov 14 and Tues Nov 19)
30-second progress reports from students (see instructions)
Lecture 12: Feature Structures
Other documents: Talk about GLARF
Reading: J & M Section 13.4.2 and Chapter 15

Class 23 (Thurs Nov 21)
Talk about 2–3 minute student presentations. This includes a preliminary schedule of talks, arranged by topic. The schedule can be modified if needed, but I will not change a time slot without consulting everyone involved. Note that each class lasts only 75 minutes, so there are limits to how much I can change the schedule.

Class 24 (Tues Nov 26)
Lecture 13: Machine Translation
Other documents: Birch and Koehn slides; SelecT paper slides (published version in these proceedings, pages 209–218)
Reading: J & M Chapter 25

Holiday: Thurs Nov 28

Class 25 (Tues Dec 3)
Machine Translation Continued
Reading: J & M Chapter 25

Classes 26 and 27 (Thurs Dec 5 and Tues Dec 10)
Student Presentations

Class 28 (Thurs Dec 12)
Final Lecture

Tues Dec 17: Final Project due