Search Engine Architecture

CSCI-GA 3033-006 - Spring 2017

NYU Courant Institute of Mathematical Sciences
Department of Computer Science, Graduate Division


A new class of system architectures is needed to address the challenges posed by massively data-intensive problems. The goal of this course is to develop a fast, highly scalable, and highly available search engine. We will combine elements of information retrieval, natural language processing, machine learning, and distributed systems, with a focus on practical implementation. While web search will be addressed specifically, we will see that the principles studied in this course are common to a variety of data-intensive applications.


Prerequisites: CSCI-GA 1170 Fundamental Algorithms, CSCI-GA 2110 Programming Languages, CSCI-GA 2250 Operating Systems, and working proficiency in Python.


Instructor: Matt Doherty


Time and Location: Wednesday 5:10pm - 7:00pm in CIWW 1302

Office Hours: Wednesday 7:00pm - 9:00pm


Class Participation: 10%

Assignments: 50%

Final Project: 40%

Late Policy

Grades for assignments received up to 24 hours late will be multiplied by 0.75. Between 24 and 48 hours, the multiplier will be 0.5. Between 48 and 72 hours, the multiplier will be 0.25. After 72 hours, no credit will be given for the assignment.
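
As an illustration only, the late-policy multipliers above can be sketched as a small Python function. The function name `late_multiplier` and the sample score are hypothetical, and the handling of submissions at exactly 24, 48, or 72 hours (counted here in the earlier, more lenient bracket) is an assumption that the policy text does not spell out.

```python
def late_multiplier(hours_late: float) -> float:
    """Return the grade multiplier for a submission `hours_late` hours past the deadline.

    Assumption: exact boundaries (24, 48, 72 hours) fall in the earlier bracket.
    """
    if hours_late <= 0:
        return 1.0   # on time: full credit
    if hours_late <= 24:
        return 0.75  # up to 24 hours late
    if hours_late <= 48:
        return 0.5   # between 24 and 48 hours late
    if hours_late <= 72:
        return 0.25  # between 48 and 72 hours late
    return 0.0       # after 72 hours: no credit


# Hypothetical example: a raw score of 90 submitted 30 hours late.
adjusted = 90 * late_multiplier(30)
print(adjusted)  # 45.0
```

Under these assumptions, an assignment with a raw score of 90 turned in 30 hours late would receive 45 points.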


Portions of this course were made possible with generous support from Amazon Web Services.