Big Data Science

Syllabus

  • Agenda and Topics

    Part#0: Introduction

            Big Data Science Use Cases
            The Lifecycle of a Data Science Project

    Part#1: Brief Revision of Analytics Algorithms and Applications

    Data Pre-processing Techniques

    Dimension Reduction Algorithms

    Principal Component Analysis

    Singular Value Decomposition

    Feature Selection and Feature Extraction

    Forward Selection Algorithms

    Feature Ranking Algorithms

    Finding Similarity in Data

        Similarity Measures

        K-means

        Hierarchical Clustering

        DBSCAN

        MinHash

        Bioinspired Algorithms

    Finding Frequent Itemsets

    Data Classification Algorithms

        Decision Trees

        Neural Networks

        KNN

        Support Vector Machines

    Ensemble Methods

    Data Analytics Model Validation

    Recommender Systems

        Content Based Recommenders

        Collaborative Filtering Recommenders

        Trust Based Recommenders

    Sentiment Analysis

    Mining Data Streams

    Advertising on the Web

    Part#2: Large-Scale File Systems and Map-Reduce

    Relational Database Management Systems

    A brief history of Apache Hadoop

    MapReduce Software Paradigm

    Mahout (MapReduce Version of Part#1)

    Analyzing Datasets with Hadoop

    Java MapReduce

    Hadoop Streaming

    HDFS

    Introduction to YARN

    Developing a MapReduce Application

    Intro to Flume, Sqoop, Pig, Hive, Crunch, HBase, and ZooKeeper


    Part#3: Introduction to Spark



  • Course Work

    Final grades for the course will be determined using the following weights:

                  25% Assignments
                  20% Project
                  25% Midterm
                  25% Final Exam
                  5% Quizzes and Participation

  • Office Hours

    Tuesdays and Thursdays, 4:00-5:30 pm

  • Course Mailing List & Other Business

    Late Submission of Assignments

    Programming assignments must be uploaded on or before the due date. Each day of lateness incurs a 10% penalty.
    Assignments submitted three days after the original due date will NOT be accepted.
    If an emergency prevents you from submitting your homework on time, please notify the instructor of the course; otherwise the penalty will apply to the homework's grade.

    Course Mailing List and NYU Classes

    NYU Classes is available for this class. You will need to check NYU Classes regularly for class notes. You will also receive regular emails from the instructor about course notes, grades, and guidelines.

    Academic Integrity

    Every student must submit her or his own work.
    All references used in an assignment must be cited.
    Please review the department policy, which also applies to this course.