Web Search Engines
G22.2580
Monday 5:00-7:00
Room 109, Warren Weaver Hall
Professor Ernest Davis
Reaching Me
- phone: (212) 998-3123
- office: 429 Warren Weaver Hall
- office hours: Tuesday 4:00-6:30, Wednesday 1:00-3:00.
Prerequisites: None.
Textbook:
Mining the Web: Discovering Knowledge from Hypertext Data
Soumen Chakrabarti. Morgan Kaufmann Publishers, 2002.
Course topics:
We will discuss all aspects of designing a Web search engine, including:
- Web crawlers.
- Database design.
- Query language.
- Relevance ranking.
- Document similarity and clustering.
- The "invisible" Web.
- Specialized search engines.
- Evaluation.
- Natural language processing.
- Intelligent retrieval and the semantic Web.
- Web mining.
- Multi-media retrieval.
- Multilingual retrieval.
Requirements
Three projects (programming and experimental) (60%).
Final exam (40%).
Class email list
Go to the class email web page and follow the instructions there to subscribe.
TA
The TA for this class is Zhongshan Zhang: zhongsha@cs, x8-3319, 801 WWH.
His office hours will be Tuesday 2:00-4:00.
Lecture Notes
Lecture 1 (Sept. 13)
Lecture 2 (Sept. 20)
Lecture 3 (Sept. 27)
Lecture 4 (Oct. 4)
Lecture 5 (Oct. 11) was a review of linear algebra.
Lecture 6 (Oct. 18)
Lecture 7 (Oct. 25)
Lecture 8 (Nov. 1)
Lecture 9 (Nov. 8)
Lecture 10 (Nov. 15)
Lecture 11 (Nov. 22)
Lecture 12 (Nov. 29)
Lecture 13 (Dec. 6)
Lecture 14 (Dec. 6)
Projects
Project 1: Subject-Specific Crawler. Due Oct. 4.
NOTE: I have changed this slightly since the original handout.
Project 2: PageRank, HITS, and Random Graphs. Available in PostScript and in PDF.
Revised version of Project 2. Available in PostScript and in PDF.
Test sets: Test set 1, Test set 2, Test set 3.
Project 3: Clustering
Reading list for final exam
Solutions to Sample Final Exam