Students study the principles of relational database design and learn to build, populate, manipulate and query databases using SQL on datasets relevant to their interests. Students will also explore data presentation through data visualization. Not open to Graduate Computer Science, Information Systems, Mathematics or Scientific Computing students.
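For a flavor of the build–populate–query cycle described above, here is a minimal sketch using Python's standard-library sqlite3 module (the table and data are illustrative, not part of the course):

```python
import sqlite3

# Build and populate a small table, then query it.
# The schema and rows here are an illustrative example only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT, credits INTEGER)"
)
conn.executemany(
    "INSERT INTO courses (title, credits) VALUES (?, ?)",
    [("Databases", 4), ("Visualization", 3)],
)
rows = conn.execute("SELECT title FROM courses WHERE credits >= 4").fetchall()
print(rows)  # [('Databases',)]
conn.close()
```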
An accelerated introduction to the fundamental concepts of computer science for students who lack a formal background in the field. Topics include algorithm design and program development; data types; control structures; subprograms and parameter passing; recursion; data structures; searching and sorting; dynamic storage allocation and pointers; abstract data types, such as stacks, queues, lists, and tree structures; generic packages; and an introduction to the principles of object-oriented programming. The primary programming language used in the course will be Java. Students should expect an average of 12-16 hours of programming and related course work per week.
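One of the abstract data types listed above, the stack, can be sketched in a few lines (shown here in Python rather than Java for brevity; a course implementation would typically build it atop an array or linked nodes):

```python
# A minimal stack abstract data type: last in, first out (LIFO).
class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def is_empty(self):
        return not self._items

# Pushing a, b, c and popping everything yields them in reverse order.
s = Stack()
for ch in "abc":
    s.push(ch)
out = ""
while not s.is_empty():
    out += s.pop()
print(out)  # cba
```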
Reviews a number of important algorithms, with emphasis on correctness and efficiency. The topics covered include solution of recurrence equations, sorting algorithms, selection, binary search trees and balanced-tree strategies, tree traversal, partitioning, graphs, spanning trees, shortest paths, connectivity, depth-first and breadth-first search, dynamic programming, and divide-and-conquer techniques.
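As one small example of the efficiency theme above, binary search halves the remaining range with each comparison, giving O(log n) time on a sorted array (a sketch; course treatment would include a correctness argument via loop invariants):

```python
# Binary search over a sorted list: returns the index of target,
# or -1 if it is absent. Each iteration halves the search range.
def binary_search(xs, target):
    lo, hi = 0, len(xs) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if xs[mid] == target:
            return mid
        elif xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

data = [2, 3, 5, 7, 11, 13]
print(binary_search(data, 7))   # 3
print(binary_search(data, 4))   # -1
```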
This is a capstone course based on computer graphics tools. The course covers a selection of topics that may include computer animation, gaming, geometric modeling, motion capture, computational photography, physically based simulation, scientific visualization, and user interfaces. Not all areas are available every semester; the choice of areas is determined by the instructor. The capstone project involves some or all of the following elements: formation of a small team, project proposal, literature review, interim report, project presentation, and final report.
This course introduces the fundamental concepts and methods of machine learning, including the description and analysis of several modern algorithms, their theoretical basis, and the illustration of their applications. Many of the algorithms described have been successfully used in text and speech processing, bioinformatics, and other areas in real-world products and services. The main topics covered are probability and general bounds; PAC model; VC dimension; perceptron, Winnow; support vector machines (SVMs); kernel methods; decision trees; boosting; regression problems and algorithms; ranking problems and algorithms; halving algorithm, weighted majority algorithm, mistake bounds; learning automata, Angluin-type algorithms; and reinforcement learning, Markov decision processes (MDPs).
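The perceptron, one of the algorithms listed above, fits in a few lines: on each misclassified point it nudges the weight vector toward the correct side. The toy dataset, learning rate, and epoch count below are illustrative choices, not course material:

```python
# Perceptron training: update weights only on misclassified points.
# Labels y are +1 or -1; the data below is linearly separable.
def perceptron_train(samples, epochs=10, lr=1.0):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in samples:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:  # misclassified
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
    return w, b

# Toy data: label +1 iff x1 + x2 > 1.
data = [((0, 0), -1), ((1, 0), -1), ((0, 1), -1), ((1, 1), 1), ((2, 1), 1)]
w, b = perceptron_train(data)
preds = [1 if w[0] * x1 + w[1] * x2 + b > 0 else -1 for (x1, x2), _ in data]
print(preds)  # [-1, -1, -1, 1, 1]
```

For separable data like this, the perceptron convergence theorem guarantees a finite number of updates; the mistake-bound perspective mentioned above makes that precise.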
Gaussian elimination, Random functions and random differential equations, Chebyshev series, Rational functions, Quadrature, and ODEs
This six-week course will be structured in an unusual way. Each of our six meetings will be independent. At each meeting, the first hour will be a lecture aimed at anyone interested in numerical analysis at a high level, organized around a well-known topic and mixing historical perspectives, recent developments, and always some new mathematics. The second hour will be for enrolled students only, a hands-on work session making use of Chebfun.
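Chebfun itself is MATLAB-based, but the Chebyshev machinery it rests on can be sketched in any language. Below is a minimal Python illustration of the three-term recurrence T_0(x) = 1, T_1(x) = x, T_n(x) = 2x·T_{n-1}(x) − T_{n-2}(x), checked against the closed form T_3(x) = 4x³ − 3x:

```python
import math

# Evaluate the Chebyshev polynomial T_n at x via the recurrence
# T_0 = 1, T_1 = x, T_n = 2*x*T_{n-1} - T_{n-2}.
def cheb_T(n, x):
    t_prev, t = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t

# Sanity check against the closed form T_3(x) = 4x^3 - 3x.
x = 0.3
print(abs(cheb_T(3, x) - (4 * x**3 - 3 * x)) < 1e-12)  # True

# Chebyshev points cos(k*pi/n) cluster near the endpoints +-1,
# which is what makes interpolation in them well-conditioned.
nodes = [math.cos(k * math.pi / 4) for k in range(5)]
```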
In this course, we will cover the architectural aspects and capabilities of modern GPUs (graphics processing units) and learn to program them to solve different types of problems. Many computations can be performed much faster on the GPU than on a traditional processor. This is why GPUs are now present in almost all computers, and why the majority of Top 500 supercomputers in the world are built around GPUs. GPUs are no longer restricted to graphics applications but are now used for a diverse set of applications and domains. This gives rise to the concept of general-purpose GPUs, or GPGPUs, which are the main subject of this course. This course serves as a capstone for the MSCS program.

In this course we will examine some of the core tasks in natural language processing, starting with simple word-based models for text classification and building up to rich, structured models for syntactic parsing and machine translation. In each case we will discuss recent research progress in the area and how to design efficient systems for practical user applications. There will be a focus on corpus-driven methods that make use of supervised and unsupervised machine learning methods and algorithms. We will explore statistical approaches based on graphical models and neural networks. In the course assignments, which will be updated this year to include more neural network modeling, you will construct basic systems and then improve them through a cycle of error analysis and model redesign. This course assumes a good background in basic probability and a strong ability and interest in building real systems.
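The "simple word-based models" the course starts from can be sketched concretely: a Naive Bayes text classifier over bag-of-words counts with add-one smoothing. The tiny "corpus" below is illustrative only:

```python
import math
from collections import Counter

# Naive Bayes over bag-of-words counts with add-one smoothing.
# The training "corpus" here is a toy illustration.
train = [
    ("great fun great plot", "pos"),
    ("fun and moving", "pos"),
    ("dull plot", "neg"),
    ("boring and dull", "neg"),
]

counts = {"pos": Counter(), "neg": Counter()}
docs = Counter()
for text, label in train:
    docs[label] += 1
    counts[label].update(text.split())

vocab = {w for c in counts.values() for w in c}

def predict(text):
    scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        # log prior + sum of smoothed log likelihoods
        score = math.log(docs[label] / sum(docs.values()))
        for w in text.split():
            score += math.log((c[w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("great plot"))   # pos
print(predict("dull boring"))  # neg
```

The error-analysis-and-redesign cycle mentioned above typically starts from exactly this kind of baseline before moving to richer structured or neural models.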
This class aims to cover introductory and recent concepts in big data and machine learning systems. The class will focus on a broad spectrum of big data computational problems, algorithms, and platforms. The class will be centered on several big data case studies in commercial use, in order to understand big data and machine learning tasks across different domains and provide an in-depth understanding of the problems and algorithms used in each domain. From the computational platform perspective, this class covers popular and widely used commercial platforms including Hadoop and MapReduce variants, Spark, streaming platforms, TensorFlow, ML platforms (MLbase, MXNet), and GPU platforms. The goal of this class is to educate students about the algorithms and systems techniques used to build scalable big data and ML platforms, and how these platforms are used in the real world.
This class covers the following broad topics:
a. Cloud Infrastructure Blocks: AWS, MapReduce, Hadoop, YARN
b. Big Data Algorithms: searching and indexing large data sets, implementing standard statistical algorithms, similar items, nearest neighbors, graph mining, network analytics, dimensionality reduction
c. Big Data Computation Platforms: Spark, TensorFlow, MLbase, MLlib, MXNet, GPU computing
d. Scalable ML Algorithms: Implementing deep learning algorithms, graph and network analytics, scalable vision algorithms, scalable NLP algorithms
e. Domain Specific Applications and Case Studies: vision analytics engines, NLP analytics engines, search and graph analytics
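The MapReduce model named in topics (a) and (c) can be sketched in plain Python: map emits (key, value) pairs, a shuffle groups them by key, and reduce aggregates each group. Word count is the classic example (this is an illustration of the model, not of the Hadoop API):

```python
from itertools import groupby

# Map phase: emit (word, 1) for every word in every document.
def map_phase(docs):
    for doc in docs:
        for word in doc.split():
            yield (word, 1)

# Shuffle: group the emitted pairs by key (groupby needs sorted input).
def shuffle(pairs):
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, [v for _, v in group]

# Reduce phase: aggregate each key's values.
def reduce_phase(grouped):
    return {key: sum(values) for key, values in grouped}

docs = ["big data big systems", "data platforms"]
print(reduce_phase(shuffle(map_phase(docs))))
# {'big': 2, 'data': 2, 'platforms': 1, 'systems': 1}
```

In a real cluster, the map and reduce calls run on many machines and the shuffle moves data over the network; the programming model, however, is exactly this.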
Bitcoin, Ethereum, and other systems using decentralized ledgers (blockchains) have quickly grown to valuations of tens of billions of dollars. Their future potential has captured the imagination of researchers working on applications as diverse as banking and finance, voting, law, corporate governance, gambling and online gaming. This course will cover the technical concepts underlying these systems: append-only ledgers, decentralized consensus, smart contracts and zero-knowledge proof systems. Students will gain working knowledge of both Bitcoin and Ethereum through practical assignments. The course will also survey the wide variety of potential future applications.
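The append-only ledger idea mentioned above can be shown in miniature: each block commits to its predecessor's hash, so editing history invalidates every later block. This sketch covers only the chaining; real systems add consensus, signatures, and much more:

```python
import hashlib
import json

# Hash a block deterministically (sort_keys makes the JSON canonical).
def block_hash(block):
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

# Build a tiny chain: each block stores the hash of the previous one.
chain = [{"prev": "0" * 64, "data": "genesis"}]
for data in ["alice pays bob", "bob pays carol"]:
    chain.append({"prev": block_hash(chain[-1]), "data": data})

# A chain is valid iff every block's "prev" matches its predecessor's hash.
def verify(chain):
    return all(chain[i]["prev"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

print(verify(chain))                      # True
chain[1]["data"] = "alice pays mallory"   # tamper with history
print(verify(chain))                      # False
```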
A practical introduction to creating modern web applications. Covers full-stack web development, including topics such as database/data model design, MVC architecture, templating, handling user input, asynchronous processing, and client-side interactivity. Students will use current server- and client-side web frameworks to build dynamic, data-driven sites. Various tools to support development will also be introduced, such as version control and build systems. Basic knowledge of HTML and CSS and familiarity with command-line tools are recommended.
Social networks are a specific example of the many forms of networks that have become ubiquitous in modern society. Their utility has been enhanced by their ability to generate massive amounts of personal data that need to be analyzed and disseminated quickly. The World Wide Web enables information to flow among vast numbers of humans; Facebook, LinkedIn, and similar services connect small groups of friends; Amazon, eBay, and their peers provide opportunities for trading. These networks determine our information, influence our opinions, and shape our political attitudes. They also link us, often through important but weak ties, to other humans. Their origin is biological, going back to quorum sensing, swarming, flocking, social grooming, and gossip. Yet, as we have connected our social networks to traditional human institutions (markets, justice systems, education, etc.) through new technologies, the underlying biology has become obscured, but not dormant. This course will introduce the tools, analytics, and algorithms for the study of networks and their data. It will show how certain common principles permeate the functioning of these diverse networks: for example, issues related to robustness, fragility, and interlinkage.
This class will examine the challenges of scaling up a data-centric web application to serve millions of users. Students will learn and apply distributed computing concepts as they convert a simple monolithic web service to a scalable distributed architecture, with an emphasis on back-end components. Students will be introduced to the practice of scaling by distribution, and the exercises will have students apply a combination of popular open source technologies and newly written code of their own as they continue to augment the service. Lectures will cover topics such as data models (relational databases vs. NoSQL); data partitioning; caching; the RPC abstraction and RESTful APIs; distributed data processing pipelines; logging and monitoring; replication; and managed cloud technologies. We will also review, as case studies, existing popular systems that apply these concepts. Topics: distributed computing, distributed storage systems, replication and consistency, open source technologies.
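Data partitioning, one of the lecture topics above, is often introduced with hash partitioning: each key maps deterministically to one shard, so any node can route a request without coordination. A minimal sketch (the shard count and keys are illustrative):

```python
import hashlib

NUM_SHARDS = 4  # illustrative; real deployments choose this carefully

# Route a key to a shard by hashing it (md5 used here only for a
# stable, evenly distributed digest, not for security).
def shard_for(key):
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

shards = {i: [] for i in range(NUM_SHARDS)}
for user_id in ["u1", "u2", "u3", "u4", "u5"]:
    shards[shard_for(user_id)].append(user_id)

# Every key lands on exactly one shard, and routing is stateless.
print(sum(len(v) for v in shards.values()))  # 5
```

The known drawback of this naive modulo scheme is that changing NUM_SHARDS remaps nearly every key, which is what motivates consistent hashing in larger systems.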
Natural Language Processing (aka Computational Linguistics) is an interdisciplinary field applying the methodology of computer science and linguistics to the processing of natural languages (English, Chinese, Spanish, Japanese, etc.). Typical applications include: information extraction (automatically finding information in text); information retrieval (web searches and other applications involving the automatic selection of "relevant" documents); sentiment analysis (automatic extraction of opinions about a set of issues); and machine translation (automatically translating one natural language to another). Much of the best work in the field combines two methodologies: (1) automatically acquiring statistical information from a set of "training" documents to use as the basis for probabilistically predicting the distribution of similar information in new documents; and (2) using manually encoded linguistic knowledge. For example, many supervised methods of machine learning require a corpus of text with manually encoded linguistic knowledge, a set of procedures for acquiring statistical patterns from this data, and a transducer for predicting these same distinctions in new text. This class will cover the linguistic, statistical, and computational aspects of this exciting field.
This course provides an introduction to the field of computer graphics. The course will cover the basic mathematical concepts, study the interaction of light with geometry, and implement basic rendering algorithms such as ray tracing and rasterization. At the end of the semester, students will have built their own ray tracer and developed a real-time 3D computer graphics system using the OpenGL 4 graphics API.
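The core geometric step of a ray tracer, ray-sphere intersection, reduces to a quadratic equation; a full renderer adds cameras, shading, and recursion on top. A minimal sketch (the scene values below are illustrative):

```python
import math

# Intersect a ray o + t*d with a sphere of given center and radius:
# solve |o + t*d - c|^2 = r^2 for the nearest t > 0, if any.
def intersect_sphere(origin, direction, center, radius):
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4 * a * c
    if disc < 0:
        return None                      # ray misses the sphere
    t = (-b - math.sqrt(disc)) / (2 * a)  # nearer of the two roots
    return t if t > 0 else None

# A ray from the origin along +z toward a unit sphere centered at z = 5
# hits the near surface at t = 4.
print(intersect_sphere((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0))  # 4.0
```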