Web Search Engines
Room WWH 102
Professor Ernest Davis
- phone: (212) 998-3123
- office: 329 Warren Weaver Hall
- office hours: Wednesday 10:00-12:00, Thursday 3:00-4:00, or by appointment.
Textbook:
Introduction to Information Retrieval by Christopher Manning,
Prabhakar Raghavan, and Hinrich Schütze (MRS),
Cambridge U. Press, 2008.
Available online.
Other useful books:
Search Engines: Information Retrieval in Practice
by W. Bruce Croft, Donald Metzler, and Trevor Strohman. This book has more
about web search engines specifically (as opposed to information retrieval
generally) than MRS does.
Information Retrieval: Implementing and Evaluating Search
Engines by Stefan Büttcher, Charles L.A. Clarke, and Gordon V.
Cormack. Includes a very in-depth discussion of data structures and algorithms.
Natural Language Processing with Python: Analyzing Text by
Steven Bird, Ewan Klein, and Edward Loper, O'Reilly Pubs., 2009.
Note: The online version is current. The print version (2009) is out of date:
it uses an outmoded form of Python, discusses functionality (babelize) that
stopped working when BabelFish was discontinued, etc.
List of course topics:
We will discuss the design of a Web search engine and the extraction of
information from the Web. Topics include:
- Web crawlers
- Relevance ranking
- Document similarity and clustering
- The "invisible" Web
- Natural language processing
- Web content mining
- Web usage mining
- Business model: pricing advertising
- Multimedia retrieval
- Multilingual retrieval
Grading:
- Problem sets (20%)
- Programming assignments (15%)
- Final exam (40%)
Schedule:
- Problem set 1: due Feb. 6
- Programming assignment 1: due Feb. 13
- Problem set 2: due Feb. 27
- Programming assignment 2: due Mar. 6
- Problem set 3: due Mar. 20
- Programming assignment 3: due Apr. 3
- Problem set 4: due Apr. 3
- Course project: due May 1
Submitting problem sets: Problem sets should be uploaded to the NYU
Classes site as Word or PDF files. Please do not handwrite your assignments
and then scan and upload them.
Submitting programming assignments: Source code should be uploaded
to the NYU Classes site. If there is anything at all non-obvious about how to
run it, then a README file should be submitted as well.
Late policy: Problem sets and programming assignments are due
at the start of class on their due date. Problem sets will be accepted up
to 1 week late, with a penalty of 1 point out of 10. Programming assignments
will be accepted up to 2 weeks late, with a penalty of 1 point out of 10 for
each week late (rounding up, so a partial week counts as a full week).
Class email list
You should be automatically subscribed to the class email list.
Students with Disabilities
Academic accommodations are available for students with disabilities.
Please contact the Moses Center for Students with Disabilities (212-998-4980
or email@example.com) for further information. Students who are requesting
academic accommodations are advised to reach out to the Moses Center
as early as possible in the semester for assistance.
You may discuss any of the assignments with your classmates (or anyone else),
but all work for all assignments must be entirely your own. Any sharing or
copying of assignments will be
considered cheating. By the rules of the Graduate School of Arts and Science,
I am required to report any incidents of cheating to the department.
My policy is that the first incident of cheating will result in the
student getting a grade of F for the course.
The second incident, by GSAS rules, will result
in expulsion from the University.