Realtime and Big Data Analytics

CSCI-GA.3033-008

 

NYU Courant Institute of Mathematical Sciences

Computer Science Department, Graduate Division

Spring 2013

 


 

General Information

 

Lecturer: Suzanne McIntosh (mcintosh@cs.nyu.edu)

 

Office Hours: Wed. 5:00-6:00 pm and by appointment, WWH 328

 

Semester: Spring 2013

 

Room: WWH 201

 

Day and Time: Wednesday, 7:10-9:00 pm

 


Prerequisites

 

CSCI-GA 2250 or an equivalent operating systems course; CSCI-GA 2262, CSCI-GA 2620, or an undergraduate course in networks; and programming experience in Java, Python, or C/C++ for the assignments and final project. Familiarity with databases will be useful.

 


Text 

 

Hadoop: The Definitive Guide, by Tom White

Optional text: Programming Pig, by Alan Gates

 


Description

 

This course will introduce technologies at the foundation of the Big Data movement that have facilitated scalable management of vast quantities of data collected through realtime and near realtime sensing. We will also explore the tools enabling the acquisition of near realtime data in the social domain, the fusion of those data when in flight and at rest, and their meaningful representation in graphical visualizations.

 

Students are required to complete weekly reading and programming assignments and to demonstrate mastery of course topics by developing and presenting an analytics project of their own design. Class time will be set aside for project proposals and final demos.

 


Grading

 

Grades are based on the following approximate weighting:

 

Readings, lab assignments, class participation - 30%

Midterm - 20%

Final - 20%

Project - 30%

 


Syllabus

 

Class     Date            Topic
1         Jan. 30, 2013   Introduction to Hadoop and Big Data
2         Feb. 6, 2013    Distributed File Systems, Pig Programming Language
3         Feb. 13, 2013   Realtime Data Collection and Analytics
4         Feb. 20, 2013   Project Proposals Day
5         Feb. 27, 2013   Realtime Data, New Alternatives to Traditional Database Systems and Access Methods
6         Mar. 6, 2013    Managing Big Data
7         Mar. 13, 2013   Midterm Exam
No class  Mar. 20, 2013   Spring Break
8         Mar. 27, 2013   Project Breakout
9         Apr. 3, 2013    Realtime and Big Data in The Cloud I
10        Apr. 10, 2013   Realtime and Big Data in The Cloud II
11        Apr. 17, 2013   Realtime and Big Data in The Cloud III, Zookeeper
12        Apr. 24, 2013   MapReduce 1.0 Architecture
13        May 1, 2013     Project Demo Day!
14        May 8, 2013     Project Demos Part 2, Final Exam Review
15        May 15, 2013    Final Exam