Realtime and Big Data Analytics

CSCI-GA.3033-016

 

NYU Courant Institute of Mathematical Sciences

Computer Science Department, Graduate Division

Fall 2013

 


 

General Information

 

Lecturer: Suzanne McIntosh (mcintosh@cs.nyu.edu)

 

Office Hours: Thursday, 6:30-7:00 pm, and by appointment, WWH 328

 

Semester: Fall 2013

 

Room: CIWW (Courant Institute, Warren Weaver Hall) 1302

 

Day and Time: Thursday, 7:10-9:00 pm

 


Prerequisites

 

CSCI-GA 2250 or an equivalent Operating Systems course; programming experience in Java, Python, or C/C++ for the assignments and final project; CSCI-GA 2262, CSCI-GA 2620, or an undergraduate course in networks. Familiarity with databases will be useful.

 


Text 

 

Hadoop: The Definitive Guide, by Tom White

Hadoop Operations, by Eric Sammer (optional)

Programming Pig, by Alan Gates (optional)

 


Description

 

This course introduces the technologies at the foundation of the Big Data movement, which have made it possible to manage, at scale, the vast quantities of data collected through realtime and near-realtime sensing. We will also explore tools for acquiring near-realtime data in the social domain, fusing those data both in flight and at rest, and representing them meaningfully in graphical visualizations.

 

Students are required to complete weekly reading and programming assignments, and to demonstrate mastery of course topics by developing and presenting an analytics project of their own design. Class time will be set aside for project proposals and final demos.

 


Grading

 

Grades are based on the following approximate weighting:

 

Readings, lab assignments, class participation: 30%
Midterm: 20%
Final: 20%
Project: 30%

 


Syllabus

 

        

Class     Date           Topic
1         Sep. 5, 2013   Introduction to Hadoop and Big Data
2         Sep. 12, 2013  Distributed File Systems, Pig Programming Language
3         Sep. 19, 2013  Realtime Data Collection and Analytics
4         Sep. 26, 2013  New Alternatives to Traditional Database Systems and Access Methods
5         Oct. 3, 2013   Project Proposals Day
6         Oct. 10, 2013  Managing Big Data
7         Oct. 17, 2013  Midterm Exam
8         Oct. 24, 2013  Project Team Meetings
9         Oct. 31, 2013  Big Data Visualization
10        Nov. 7, 2013   Realtime and Big Data in The Cloud, part one
11        Nov. 14, 2013  Realtime and Big Data in The Cloud, part two
12        Nov. 21, 2013  Realtime and Big Data - Performance
No class  Nov. 28, 2013  Thanksgiving Break
13        Dec. 5, 2013   Project Demo Day!
14        Dec. 12, 2013  Final Exam Review
15        Dec. 19, 2013  Final Exam