Web Search Engines

Monday 5:00-7:00
Room WWH 102
Professor Ernest Davis

Reaching Me

Prerequisites: None.

Required Textbook:
Introduction to Information Retrieval by Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze (MRS), Cambridge University Press, 2008.
Available online.

Other useful books:
Search Engines: Information Retrieval in Practice by W. Bruce Croft, Donald Metzler, and Trevor Strohman. In particular, it covers web search engines specifically (as opposed to information retrieval generally) in more depth than MRS.

Information Retrieval: Implementing and Evaluating Search Engines by Stefan Büttcher, Charles L.A. Clarke, and Gordon V. Cormack. Includes an in-depth discussion of data structures and algorithms for indexing.

Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper, O'Reilly Media, 2009. Online version. Note: The online version is current. The print version (2009) is out of date: it uses an outmoded form of Python, discusses functionality (babelize) that stopped working when BabelFish was discontinued, etc.

List of course topics:

We will discuss the design of a Web search engine and the extraction of information from the Web. Topics include:


Grading:

Problem sets (20%)
Programming assignments (15%)
Project (25%)
Final exam (40%)


Assignment due dates:

Problem set 1 Due Feb. 6
Programming Assignment 1 Due Feb. 13
Problem set 2 Due Feb. 27
Programming Assignment 2 Due Mar. 6
Problem set 3 Due Mar. 20
Programming Assignment 3 Due Apr. 3
Problem set 4 Due Apr. 3
Problem set 5 Due Apr. 24
Course Project Due May 1

Submitting problem sets: Problem sets should be uploaded to the NYU Classes site as Word or PDF files. Please do not submit handwritten assignments that have been scanned.

Submitting programming assignments: Source code should be uploaded to the NYU Classes site. If there is anything non-obvious about how to run it, submit a README file as well.

Late policy: Problem sets and programming assignments are due at the start of class on their due date. Problem sets will be accepted up to 1 week late, with a penalty of 1 point out of 10. Programming assignments will be accepted up to 2 weeks late, with a penalty of 1 point out of 10 for each week or part of a week late; for example, an assignment submitted 3 days late loses 1 point, and one submitted 9 days late loses 2 points.

Class email list

You should be automatically subscribed to the class email list.

Course Schedule

Final Exam

The final exam will be held on Mon. May 15, 5:00-6:50, WWH 102. It will be open book and open notes.
Notes on Final Exam

Students with Disabilities

Academic accommodations are available for students with disabilities. Please contact the Moses Center for Students with Disabilities (212-998-4980 or mosescsd@nyu.edu) for further information. Students who are requesting academic accommodations are advised to reach out to the Moses Center as early as possible in the semester for assistance.


You may discuss any of the assignments with your classmates (or anyone else), but all work for all assignments must be entirely your own. Any sharing or copying of assignments will be considered cheating. By the rules of the Graduate School of Arts and Science, I am required to report any incidents of cheating to the department. My policy is that the first incident of cheating will result in a grade of F for the course. The second incident, by GSAS rules, will result in expulsion from the University.