|
2013/05/13: The project report is due at 9am on May 15th and the project code is due at 9am on May 18th, both via NYU Classes.
2013/05/13: HW3 grades are released. You have until 7pm on May 17 to check with the TAs if you have any issues with the grading. As usual, the 50% penalty applies.
2013/05/06: Please note that the project demo location is CIWW605, not the classroom.
2013/05/02: Instructions for project report submission and project code submission will be available in NYU Classes on May 8th. The late submission policy of 1 hour (20% penalty) / 3 hours (50% penalty) applies to both.
2013/04/29: HW2 grades are released.
2013/04/02: Nitish Korula will give a guest lecture on Internet Advertising for the May 1st class! |
Brief Description: Search engines have become a core part of our daily lives. In this course, we will study the foundations of information retrieval and the technical aspects of modern Web search engines. We will also explore a few advanced topics that have emerged to become highly influential in relation to Web search. You are expected to study the course material (textbook and research papers), participate in class discussion, and work on a class project that involves system design and implementation. |
|
Notations:
FD = Fernando Diaz; CY = Cong Yu.
Reading materials will be provided on the web site approximately one week before the lecture date. |
| Date | Topic (Instructor) | Reading Material | Deadlines |
|---|---|---|---|
| Lec 00 (a, b) (01/30) | Introduction (CY + FD) | Chapter 1–2 | HW0 out. |
| Lec 01 (02/06) | Evaluation (FD) | Chapter 8 | |
| Lec 02 (02/13) | Ranking (FD) | Chapter 7 | HW0 due; HW1 out. HW1 FAQ |
| Lec 03 (02/20) | Indexing (CY) | Chapter 5 | |
| Lec 04 (02/27) | Document Processing (CY) | Chapter 4 | HW1 due; HW2 out. |
| Lec 05 (03/06) | Crawl (CY) | Chapter 3 | |
| Lec 06 (03/13) | Query Mining (FD) | Chapter 6.1, 6.2, [4], [5], [6] | HW2 due; Midterm out. |
| 03/20 | Spring Break | no class | |
| Lec 07 (03/27) | Big Data (CY) | [2], [3] | Midterm due; HW3 out. |
| Lec 08 (04/03) | Search Personalization (FD) | | |
| Lec 09 (04/10) | Realtime Search 1 (FD) | | HW3 due. |
| Lec 10 (04/17) | Realtime Search 2 (FD) | | |
| Lec 11 (04/24) | Knowledge Search (CY) | [7], [8] | |
| Lec 12 (05/01) | Internet Advertising (Nitish Korula) | article 1; article 2 | |
| 05/08 | Final Exam (CY + FD) | | |
| 05/15-18 | Project Demo Days | CIWW605 | (via NYU Classes) Project report due 5/15 at 9am. Project code due 5/18 at 9am. |
|
[1] Data-Intensive Text Processing with MapReduce, by Lin and Dyer. (Supplemental reading on Big Data)
[2] MapReduce: Simplified Data Processing on Large Clusters, by Jeffrey Dean and Sanjay Ghemawat, OSDI 2004.
[3] Distributed Cube Materialization on Holistic Measures, by Arnab Nandi et al., ICDE 2011.
[4] Donald Metzler, Susan Dumais, and Christopher Meek. 2007. Similarity measures for short segments of text. In Proceedings of the 29th European Conference on IR Research (ECIR'07), Giambattista Amati, Claudio Carpineto, and Giovanni Romano (Eds.). Springer-Verlag, Berlin, Heidelberg, 16-27.
[5] Rosie Jones and Kristina Lisa Klinkner. 2008. Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM '08). ACM, New York, NY, USA, 699-708. DOI=10.1145/1458082.1458176 http://doi.acm.org/10.1145/1458082.1458176
[6] Marius Pasca and Benjamin Van Durme. 2007. What you seek is what you get: extraction of class attributes from query logs. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI'07), Rajeev Sangal, Harish Mehta, and R. K. Bagga (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2832-2837.
[7] A Web of Concepts, by Dalvi et al., PODS 2009.
[8] Web 3.0: The Dawn of Semantic Search, by James Hendler, IEEE Computer, 43(1), 2010. | |||
|
A major component of the course is a group project. In the first part of the project, each group
will design and implement a mini search engine through a series of homework assignments;
in the second part, the group will build an advanced component on top of it. |
| Groups: | |||
| Group ID | Group Members | Group ID | Group Members |
| G01 | xc432, xh379, zz477 | G02 | ssb402, pvb221, nav237 |
| G03 | rc1972, alg489, ssw288 | G04 | jj1233, jyh300, ss6321 |
| G05 | hm1021, ka1042, hj601 | G06 | ql337, yl1258, yl1404 |
| G07 | cc3263, sj1167, hl1115 | G08 | ly544, zz491, mg3658 |
| G09 | fw454, bz465, jl4550 | G10 | zj285, ys1024, qh237 |
| G11 | td859, jl4527, dx262 | G12 | cl1934, rz557, zc440 |
| G13 | al3096, ao925, ys1155 | G14 | sl3268, hz575, yl1766 |
| G15 | ml3329, zl527, wx277 | G16 | aps398, sdb359, pk1094 |
| G17 | kb1573, mkv218, sp2619 | G18 | bs1781, qt224, tyw239 |
| G19 | am5156, kp1264, yg657 | | |
| Project Demo Slot Assignments: | |||
| Time (pm ET) | May 15 | May 16 | May 17 |
|---|---|---|---|
| 5:00 | G01 (CY) | G02 (FD) | |
| 5:15 | G15 (CY) | G09 (CY) | |
| 5:30 | G08 (FD) | G11 (FD) | |
| 5:45 | reserved | G10 (CY) | G18 (CY) |
| 6:00 | G04 (CY) | G16 (CY) | G03 (FD) |
| 6:15 | G19 (FD) | G06 (FD) | G17 (FD) |
| 6:30 | G14 (CY) | G12 (FD) | G14 (CY) - 2nd |
| 6:45 | G13 (CY) | G05 (FD) | G07 (FD) |