Web Search Engines

Wednesday 5:00-7:00
Room WWH 102
Professor Ernest Davis

Reaching Me

Prerequisites: None.

Required Textbook: Introduction to Information Retrieval by Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge U. Press, 2008.
Available online.

Recommended Textbook: Mining the Web: Discovering Knowledge from Hypertext Data by Soumen Chakrabarti

Tentative list of course topics:

We will discuss the design of a Web search engine and the extraction of information from the Web. Topics include

Lecture notes

Note: These are not complete lecture notes. They contain cross-references and material that I preferred to project rather than write on the blackboard.

Lecture 1: Jan. 26
Lecture 2: Relevance and PageRank Feb. 2
Lecture 3: Index, Query answering, Hardware Feb. 9
Lecture 4: Near Duplicates; Evaluation Feb. 16
Lecture 5: Clustering Feb. 23
Lecture 6: Sponsored Links. Collaborative Filtering Mar. 2
Slides of Wei Xu's talk on NLP tools. Mar. 9
Lecture 7: Specialized Search Engines Mar. 9
Lecture 8: Invisible Web; Table Search; Sentiment Analysis. Mar. 23
Lecture 9: Result Diversity and Query Log Mining Mar. 30
Lecture 10: Images and Music April 6
Lecture 11: The Multi-Lingual Web April 13
Lecture 12: Information Extraction April 20
Lecture 13: Software Search April 27
Lecture 14: The Politics and Ethics of Search Engines May 4

Instructional Assistants

The instructional assistants for this course are
Daniel Galron    galron@cs.nyu.edu    x8-3127    704 715 Bway    Monday 4-6     Students with last names A-P.
Wei Xu           xuwei@cs.nyu.edu     x8-3365    713 719 Bway    Tuesday 5-7    Students with last names Q-Z.


A course project (60%).
Final exam (40%).

Class email list

Link to the class email web page and follow the instructions there for subscribing.

Web Hosting

Information about hosting web pages on the CIMS server can be found at The Web at Courant.

Final Exam

The final exam will be given Wednesday, May 11, 5:00-7:00, WWH 102.
Format of the Final Exam
Sample Exam, Part I
Sample Exam, Part I: Solutions