Web Search Engines (CSCI-GA 2580)

Fall 2014, Department of Computer Science, NYU


Current News:

2014/10/27: Midterm available here.
2014/09/28: Sign up for project demo slots here.
2014/09/15: Sign up for project group here.
2014/04/27: Course website is up.

Brief Description:

Search engines have become a core part of our daily lives. In this course, we will study the foundations of information retrieval and the technical aspects of modern Web search engines. We will also explore advanced and special topics that have become highly influential in Web search.

You are expected to study the course material (textbook and research papers), participate in class discussion, and work on a class project that involves system design and implementation.

Instructors and Logistics:

Dr. Fernando Diaz (Microsoft Research), first_initial_and_last_name [AT] cs dot nyu dot edu
Dr. Cong Yu (Google Research), full_name [AT] cs dot nyu dot edu

Teaching Assistants (Questions regarding homeworks should be sent to TAs first):
  Si Liu, sl4072 [at] nyu [dot] edu
  Weicheng Ma, hikaritgpass [at] nyu [dot] edu

Prerequisite: You are expected to have a good knowledge of algorithms and of at least one major programming language.
Although it is not a strict prerequisite, having taken UA.0310 is a good proxy.

Time and Location: Mondays 5:10p - 7:00p, CIWW 1302.
Office Hours: Mondays 4:00p - 5:00p, WWH 328 (starting 9/15)
Mailing List: csci_ga_2580_001_fa14 [AT] cs dot nyu dot edu

Textbook:
Search Engines - Information Retrieval in Practice, by W. Bruce Croft, Donald Metzler, Trevor Strohman. Addison Wesley. 2009.

Grading:
Participation 10%;
Exams 40%: Midterm 15%, Final 25%;
Project 50%: 3 Homeworks 30% (10% each), Project Report 10%, Project Demo 10%.

Course Schedule (tentative)

Notations: FD = Fernando Diaz; CY = Cong Yu.
Reading materials will be provided on the web site approximately one week before the lecture date.

Date                  | Topic (Instructor)                       | Reading Material | Deadlines
Lec 00 (a, b) (09/08) | Introduction (FD)                        | Chapter 1–2      | HW0 out.
Lec 01 (09/15)        | Evaluation (FD)                          | Chapter 8        |
Lec 02 (09/22)        | Ranking (FD)                             | Chapter 7        | HW0 due; HW1 out.
Lec 03 (09/29)        | Indexing (CY)                            | Chapter 5        |
Lec 04 (10/06)        | Document Processing (CY)                 | Chapter 4        | HW1 due; HW2 out.
10/13                 | Fall Recess (no class)                   |                  |
Lec 05 (10/20)        | Web Crawl (CY)                           | Chapter 3        | HW2 due.
Lec 06 (10/27)        | Query Mining (FD)                        | Chapter 6        | Midterm out. [1]
Lec 07 (11/03)        | Personalization (FD)                     |                  |
Lec 08 (11/10)        | Big Data (CY)                            | [2], [3]         | Midterm due; HW3 out.
Lec 09 (11/17)        | Internet Advertising (by Nitish Korula)  |                  |
Lec 10 (11/24)        | Realtime (FD)                            |                  | HW3 due.
Lec 11 (12/01)        | Knowledge (CY)                           | [4-9]            |
Lec 12 (12/08)        | Standards and Ethics (FD)                |                  |
12/10                 | Final Exam (CY+FD)                       |                  |
12/15-17              | Project Demo Days (WWH 805, 5pm to 7pm)  |                  | Project Report due at 12/15 9am; Project Code due at 12/18 9am.

[1] Each group is encouraged to send us a short project proposal via email for a quick check on whether the work is on the right track. If we receive your proposal by 11/10, we will respond via email. After that, you will need to stop by office hours.
[2] MapReduce: Simplified Data Processing on Large Clusters, by Jeffrey Dean and Sanjay Ghemawat, OSDI 2004.
[3] Distributed Cube Materialization on Holistic Measures, by Arnab Nandi et al., ICDE 2011.
[4] Automatic Acquisition of Hyponyms from Large Text Corpora, by Marti Hearst, COLING 1992.
[5] Open Information Extraction from the Web, by Banko et al., IJCAI 2007.
[6] Named Entity Recognition in Query, by Guo et al., SIGIR 2009.
[7] Improving Recommendation for Long-tail Queries via Templates, by Szpektor et al., WWW 2011.
[8] Peekaboom: A Game for Locating Objects in Images, by von Ahn et al., CHI 2006.
[9] Quizz: Targeted Crowdsourcing with a Billion (Potential) Users, by Panagiotis Ipeirotis and Evgeniy Gabrilovich, WWW 2014.


Course Projects

The group project is a major component of the course. In the first part of the project, each group will design and implement a mini search engine through a series of homeworks. In the second part, each group will build an advanced component on top of that search engine.


Groups:
Group ID | Group Members              | Group ID | Group Members
G01      | wl1002, fg742, jz1371      | G02      | yy1112, zx339, ks3226
G03      | xh499, sw2507, cz764       | G04      | ak4533, sm5119, aut204
G05      | sl3760, dz720, kc2180      | G06      | yl1949, ws951, sy1288
G07      | ww738, hk1642, hy821       | G08      | arpit.jain, ss7359, maw627
G09      | kbp247, rap450, aa3793     | G10      | rsj259, sh3309, ans486
G11      | bmd296, chc490, yg706      | G12      | cp1425, tsp261, sx238
G13      | kh1715, sz1288, ycy247     | G14      | tt1161, aly233, ff648, adi225
G15      | jd3011, jd3007, jl6589     | G16      | ytl264, saa567, hx364

Project Demo Slot Assignments:
Time (pm ET) | December 15 | December 16 | December 17
5:00         | G08         | G12         | G03
5:15         | G09         | G15         | G05
5:30         | G01         |             | G06
5:45         | G13         | G11         | G14
6:00         | G04         |             | last resort slot
6:15         | G07         | G02 (moved) | last resort slot
6:30         | G16         |             | last resort slot
6:45         | G10         |             | last resort slot