Fall 2012 Graduate Special Topics in
NOTE: for descriptions of standard graduate computer science
courses, see Graduate Course
CSCI-GA 3033-001 Statistical Natural Language Processing
In this course we will explore statistical, model-based approaches to natural language processing.
There will be a focus on corpus-driven methods that make use of supervised and unsupervised machine
learning methods and algorithms. We will examine some of the core tasks in natural language
processing, starting with simple word-based models for text classification and building up to rich,
structured models for syntactic parsing and machine translation. In each case we will discuss recent
research progress in the area and how to design efficient systems for practical user applications. In
the course assignments you will construct basic systems and then improve them through a cycle of error
analysis and model redesign. This course assumes a good background in basic probability and a strong
ability and interest in programming in Java. The class is open to graduate as well as undergraduate
CSCI-GA 3033-002 Financial Software Projects
The theme of this course is an "applied case study" and focuses on fixed income markets.
Topics covered include an overview of the markets, the inner workings of an investment bank, the
market players, and where software engineers fit in. Students will be grouped into small teams to
build a financial application using practical software engineering principles. Each team will build a
risk management framework, starting with basic components. Prerequisites: It is assumed that the
students can code in C++. No prior experience in the financial sector domain is required.
CSCI-GA 3033-003 Production Quality Software
In this course, students learn to develop production quality software. Lectures present real-world
development practices that maximize software correctness and minimize development time. A special
emphasis is placed on increasing proficiency in a particular programming language by doing weekly
development projects and participating in code reviews. Assignments become more sophisticated as the
semester progresses, eventually incorporating unit tests, build scripts, design patterns, and other
CSCI-GA 3033-004 Open Source Tools
This course covers a brief history and philosophy of open source software, followed by an in-depth
look at open source tools intended for developers. In particular, we will present an overview of the
Linux operating system, command line tools (find, grep, sed), programming tools (Git, trace), web and
database tools (Apache, MySQL, App Engine), and system administration tools. We will also cover
scripting languages such as shell and Python, and use them to write web applications.
CSCI-GA 3033-005 Distributed Systems
Distributed systems help programmers aggregate the resources of many networked computers to
construct highly available and scalable services. This class teaches the abstractions, design, and
implementation techniques that allow one to build fast, scalable, fault-tolerant distributed systems.
Topics include multithreading, network programming, consistency, naming, fault tolerance, security and
several case studies of distributed systems.
CSCI-GA 3033-006 Motion Capture for Gaming & Urban Sensing
CSCI-GA 3033-007 Music Software Projects
Did you ever wonder why there are 12 notes in the western music scale? Or how the intervals between
notes came to be? When were the first musical scales developed or "discovered" and how (and
why) have they been modified since? Who were the key innovators of western music theory over the last
It is not uncommon for software developers to have an affinity for music. After all, the creation
of both software and music is part art and part science. Further, music and computing are built upon
fundamental mathematical principles. While it is not required to understand music theory to be a good
player, understanding why we are constrained to a certain set of notes is an enlightening topic - for
musicians and non-musicians alike.
This course is for students interested in how both music and software are constructed. Student
teams will build software in phases which will demonstrate the underlying rules in modern western
music theory. The beauty of software is that it can be applied in just about any domain.
Music students are encouraged to apply even though this course is primarily a software development
class. The interdisciplinary product development teams will be composed of at least one engineer and
one subject domain expert who will work together on the assignments. The software the teams build will
be used to demonstrate how music theory developed as well as give students an intuitive grasp of some
fascinating underlying universal truths...
CSCI-GA 3033-008 Cancelled
CSCI-GA 3033-009 Speech Recognition
This course gives a computer science presentation of automatic speech recognition, the problem of
accurately transcribing spoken utterances. The presentation covers the essential algorithms for
creating large-scale speech recognition systems. The algorithms and techniques presented are now used
in most research and industrial systems.
Many of the learning and search algorithms and techniques currently used in natural language
processing, computational biology, and other areas of application of machine learning were originally
designed for tackling speech recognition problems. Speech recognition continues to feed computer
science with challenging problems, in particular because of the size of the learning and search
problems it generates.
The objective of the course is thus not just to familiarize students with particular algorithms
used in speech recognition, but also to use them as a basis for exploring general text, speech, and
machine learning algorithms relevant to a variety of other areas in computer science. The course will
make use of several software libraries and will study recent research and publications in this
CSCI-GA 3033-010 Computer Games
CSCI-GA 3033-011 Cloud Computing: Concepts & Practice
This is a graduate-level course on cloud computing with emphasis on
hands-on design and implementation. Both Infrastructure as a Service
(IaaS) and Platform as a Service (PaaS) cloud technologies and
concepts will be covered. By the end of the course, students should
have a fair amount of knowledge about how to use a cloud, write
applications for the cloud, and build their own private cloud.
The first part of the course covers basic building blocks such as
virtualization technologies, virtual appliances, automated
provisioning, elasticity, and cloud monitoring. We shall learn these
concepts by using and extending capabilities available in real clouds
such as Amazon AWS, Google App Engine and OpenStack.
The second part of the course will cover more advanced topics with
emphasis on ultra-large-scale systems, computation models, and storage
clouds for big data. Example topics are storage clouds, cloud
security, Hadoop for big data, network virtualization (SDNs), and new
services leveraging cloud migration. Several real-world applications
will be covered to illustrate these concepts and research innovations,
including Facebook Cassandra, Amazon Dynamo, Google Bigtable, Hadoop
HDFS, and Yahoo ZooKeeper.
Students will benefit from a background in operating systems and in an
object-oriented programming language such as Java. Students are expected
to participate in class discussions, present research papers, and
conduct a significant course project.
CSCI-GA 3033-012 Multicore Processors: Architecture & Programming
The tremendous advances in process technology have created a revolution both in hardware and in
software. On the hardware side, we moved from single core processors to multicore/manycore processors.
Multicore chips are now everywhere. You can find them in smartphones, PlayStations, notebooks, all the
way up to supercomputers. To benefit from these chips, software must be parallelized, which starts
another revolution in software.
The purpose of this course is to introduce students to both the hardware advances and parallel
programming techniques targeting multicore and manycore processors. Students will learn how to make
the best use of the underlying hardware to build applications that can take advantage of the on-chip
CSCI-GA 3033-013/MATH-GA 2011-003 Analytical Methods in Computer Science
In this course we will explore some of the most exciting developments in theoretical computer
science over the last decade or two, emphasizing the common use of analytic techniques such as
Fourier analysis. The main areas we will touch upon include:
* Property testing: can you test that a certain program does what it is supposed to do using only a
small number of invocations of the program?
* Hardness of approximation: how does one prove that a certain problem is hard to approximate?
* Computational learning: how can a computer learn an unknown concept?
* Voting: is there a way to conduct a vote leading to a ranking of three candidates?
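One way to see why the voting question above is not trivial is the classic Condorcet paradox: with three candidates, pairwise majority votes can form a cycle, so they do not yield a consistent ranking. The sketch below is only an illustration and is not taken from the course materials; the ballots and the helper name `pairwise_winner` are assumptions chosen for the example.

```python
from itertools import combinations

# Hypothetical ballots: each voter ranks the three candidates, best first.
ballots = [
    ("A", "B", "C"),
    ("B", "C", "A"),
    ("C", "A", "B"),
]

def pairwise_winner(x, y, ballots):
    """Return whichever of x and y is ranked higher by a majority of voters."""
    x_votes = sum(1 for b in ballots if b.index(x) < b.index(y))
    return x if x_votes > len(ballots) - x_votes else y

# Run every head-to-head contest between two candidates.
results = {
    (x, y): pairwise_winner(x, y, ballots)
    for x, y in combinations("ABC", 2)
}

for (x, y), winner in results.items():
    print(f"{x} vs {y}: majority prefers {winner}")
```

With these ballots, A beats B, B beats C, and C beats A, so no candidate can be placed consistently at the top of a ranking.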
Underlying all these topics is the theory of Fourier analysis of Boolean functions, which will be
the common thread throughout the course. We will see some of the key concepts in this theory,
including the hypercontractive inequality and the "majority is stablest" theorem.
Depending on time constraints and interest we will also get to see very recent topics such as the
use of quantum algorithms to construct low-degree functions, or linear programs for the traveling
Although there are no specific prerequisites, this course is rather mathematical in nature, and so
mathematical maturity is a must. In addition, familiarity with the basics of probability,
the probabilistic method, algebra (especially finite fields), analysis of algorithms, and computational
complexity would be helpful, but not necessary.
CSCI-GA 3033-014 Principles of Software Security
Modern societies are increasingly dependent upon the proper functioning of their computing
infrastructure. Yet that infrastructure is riddled with flaws that at best cause systems to fail and at
worst allow a malicious attacker to take control. Broadly speaking, this course will address two
1. What are common security problems and what are their underlying causes?
2. What are programming techniques, guidelines, principles, and tools that can help to detect and
Traditionally, computer security is enforced by the operating system, which uses special hardware
support to ensure security properties at application boundaries. However, the proliferation of
successful attacks, such as viruses, worms, SQL injection, and cross-site scripting, shows that
traditional approaches to security are insufficient. Adversaries exploit weaknesses both in the
operating system itself, bypassing any protection mechanisms, and more and more frequently at the
application level, where the operating system provides very limited guarantees. In this class we
consider how programming language techniques can be used to fill the security gap by defending against
Prerequisites: The course is open to Master's and PhD students. Students are assumed to have
previously taken a course in programming languages, to have solid experience programming in a
high-level programming language, and to have basic knowledge of formal methods.