Graduate Special Topics in Computer Science
NOTE: for descriptions
of standard graduate computer science courses, see Graduate Course Descriptions.
Topics Computational Biology: Cell Informatics
Presently, there is no clear
way to determine if the current body of biological facts is sufficient to
explain phenomenology. In the biological community, it is not uncommon to
assume certain biological problems to have achieved a cognitive finality
without rigorous justification. In these particular cases, rigorous mathematical
models with automated tools for reasoning, simulation, and computation can
be of enormous help to uncover cognitive flaws, qualitative simplifications,
or overly generalized assumptions. Some ideal candidates for such study would
include: prion hypothesis, cell cycle machinery (DNA replication and repair,
chromosome segregation, cell-cycle period control, spindle pole duplication,
etc.), muscle contractility, processes involved in cancer (cell cycle regulation,
angiogenesis, DNA repair, apoptosis, cellular senescence, tissue space modeling
enzymes, etc.), signal transduction pathways, circadian rhythms (especially
the effect of small molecular concentration on its robustness), and many
others. We believe that the difficulty of biological modeling will become
acute as biologists prepare to understand even more complex systems.
Fortunately, similar
issues have been faced by other disciplines: for instance, the design of complex
microprocessors involving many millions of transistors, building and controlling
configurable robots involving very high degree-of-freedom actuators, implementing
hybrid controllers for highway traffic or air traffic, or even reasoning
about data traffic on a computer network. The approaches developed by control
theorists analyzing stability of a system with feedback, physicists studying
asymptotic properties of dynamical systems, computer scientists reasoning
about discrete or hybrid (combining discrete events with continuous events)
reactive systems---all have tried to address some aspects of the same problem
in a very concrete manner. We believe that biological processes could be
studied in a similar manner, once the appropriate tools are made available.
The goal of this course is to
understand, design and create a large-scale computational system centered
on the biology of individual cells, populations of cells, intra-cellular processes,
and realistic simulation and visualization of these processes at multiple
spatio-temporal scales. Such a reasoning system, in the hands of a working
biologist, can then be used to gain insight into the underlying biology,
design refutable biological experiments, and ultimately, discover intervention
schemes to suitably modify the biological processes for therapeutic purposes.
The course will focus primarily on two biological processes: genome-evolution
and cell-to-cell communication.
Internet & Intranet Protocols & Applications
Internet and Intranet Protocols
and Applications studies the world's most widely used application level network
protocols and software systems.
We study protocols, such as
HTTP for the Web, SMTP and POP3 for email, FTP for file transfer, and SSL
for security. We consider protocol design issues, especially as they influence
functionality, reliability and performance. We carefully read protocol specifications,
such as the HTTP specification, RFC 2068. We study the systems which use
these protocols, clients and servers. We also study intermediate systems
which enhance performance, such as caching proxies and content delivery services.
We will examine complex functionality and performance issues, such as time-out
management and high-performance concurrent servers.
Programming assignments ask
students to write clients and servers to the sockets interface. Students
are expected to have taken Data Communications and Networks or equivalent.
Students will write several small programming assignments and one large
project. The large programming project will ask students to design and implement
a load balancing manager as used by content serving companies such as Akamai.
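As a rough illustration of what writing a client and a server to the sockets interface can look like, here is a minimal sketch in Python. It is not taken from the course materials: the function names `serve_once` and `echo_roundtrip`, the echo behavior, and the choice of Python are all assumptions made for this example.

```python
import socket
import threading


def serve_once(server_sock: socket.socket) -> None:
    """Accept one connection and echo back whatever the client sends."""
    conn, _addr = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:  # the client has closed its sending side
                break
            conn.sendall(data)


def echo_roundtrip(message: bytes) -> bytes:
    """Start a one-shot echo server on localhost and send it one message.

    Hypothetical helper for illustration only; returns the bytes echoed back.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server_sock.listen(1)
        port = server_sock.getsockname()[1]

        worker = threading.Thread(target=serve_once, args=(server_sock,))
        worker.start()

        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)  # signal the end of the request
            chunks = []
            while True:
                data = client.recv(1024)
                if not data:  # the server has closed the connection
                    break
                chunks.append(data)

        worker.join()
        return b"".join(chunks)


if __name__ == "__main__":
    print(echo_roundtrip(b"hello"))
```

The server here handles a single connection and then exits; a concurrent server of the kind the course description mentions would instead keep accepting connections and hand each one to a thread, process, or event loop.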
Guest lecturers will present
current research and practice on some of the following issues: the design
and operation of an Internet EDI Service, the design and operation of a
high volume Web-based branding system, performance issues in WWW servers,
and Internet security.
The last quarter of the course
examines research that enhances Internet and Web performance.
G22.3033-005 The Design
and Programming of Embedded Systems
Prerequisites: Programming Languages (G22.2110), Compilers (G22.2130)
The vast majority of computers
today are not general-purpose desktop or laptop machines; they are embedded
as components of other electronic devices: cell phones, microwave ovens,
automobiles, etc. Often, the primary concern when designing and programming
these embedded systems is not speed of execution, but rather power consumption,
memory requirements, and reliability. In this course, we will discuss the
issues faced by embedded system designers, both at the hardware and software
levels. In addition, there will be programming assignments for microprocessors
commonly used in embedded systems.
Prerequisites: G22.2110 and
basic familiarity with Java (or another object-oriented language).
The goal of this course is to
familiarize students with several advanced object-oriented techniques that
are currently widely used in industry. After a brief review of object-oriented
terminology (subtyping, dynamic dispatch, inheritance, delegation, etc.),
the following topics will be presented in detail:
- UML diagrams, and how to
use them for designing object-oriented programs
- design patterns: an overview
of design idioms that are useful for creating flexible and extensible designs
- techniques for testing of object-oriented applications
- performance analysis of object-oriented applications
- refactoring: techniques
for restructuring programs in order to accommodate changed requirements
The objective of the course
is to make students sufficiently proficient with the use of these techniques
so that they can apply them in practice. To achieve this goal, the course
has a substantial practical component, in the form of a series of programming
assignments that are performed in groups.
The following is a very preliminary
outline of the course. Please be aware that it may be subject to change.
Overview of course and project:
- Review of object-oriented
terminology, concepts, and of object-oriented language constructs in
- Introduction to UML. Overview of the 9
types of diagrams. Use cases, use case diagrams, classes, attributes,
operations, class diagrams, and relationships: associations, generalization,
aggregation, and composition.
- Relationships (in detail), association,
generalization, multiplicities, navigability, notes, stereotypes, constraints,
interfaces, realization, roles, package diagrams, and object diagrams.
- Interaction diagrams, sequence diagrams,
collaboration diagrams, modeling events, signals, and exceptions, activity
diagrams, statechart diagrams, component diagrams, and deployment diagrams.
- Introduction to design patterns. Creational
- Design patterns continued: Structural
- Design patterns continued: Behavioral
- Designing with patterns (guest lecture
by John Vlissides).
Midterm (may be split up into 2 separate
Testing of object-oriented applications.
Advanced topics: multiple inheritance,
Advanced topics, to be determined.
Performance analysis of object-oriented
applications (guest lecture by Gary Sevitsky).
Analysis of object-oriented programs.
Application extraction techniques.
The Unified Modeling Language User Guide
by Grady Booch, James Rumbaugh, Ivar Jacobson. Hardcover - 482 pages
(October 30, 1998). Addison-Wesley Pub Co; ISBN: 0201571684
Design Patterns by Erich Gamma, Richard
Helm, Ralph Johnson, John Vlissides. Hardcover - 395 pages 1st edition
(January 15, 1995). Addison-Wesley Pub Co; ISBN: 0201633612
Refactoring: Improving the Design of
Existing Code by Martin Fowler et al. Hardcover - 431 pages 1st edition
(August 1999) Addison-Wesley Pub Co; ISBN: 0201485672
A good textbook on Java is recommended. Two
examples of good textbooks are given below:
Java in a Nutshell: A Desktop Quick Reference
(3rd Edition) by David Flanagan. Paperback - 648 pages 3rd edition
(November 1999). O'Reilly & Associates; ISBN: 1565924878
The Java Programming Language by Ken
Arnold, James Gosling, David Holmes. Paperback - 704 pages 3rd edition
(June 15, 2000) Addison-Wesley Pub Co; ISBN: 0201704331
The course will have a large practical component,
in the form of a project in which a simulation of a web-based book-selling
system (or something similar) is built. The project consists of several steps:
Creation of an initial design using UML.
This initial design will require the use of several design patterns.
Implementation of the design in Java, and
testing it (possibly using an automated testing framework such as
Refactoring of the system after the requirements
have changed (e.g., addition/deletion of features, and requirements
to make the design more flexible in several respects).
Students will work on projects in groups of
2 or 3 people.
Details to be announced.
G22.3033-009 Empirical Natural Language Processing
Prerequisite: G22.2245-001 (Unix Tools) or
An introductory course in the analysis, design,
and implementation of NLP systems, focusing on data-driven techniques. The
course will start with a hands-on introduction to working with large text
corpora. We will then cover rudimentary machine learning, including basic
information theory, and parameter estimation. The rest of the course will
explore strategies for building NLP applications, such as:
- automatic text classification by topic and/or
- spam filtering
- gazetteer construction via automatic word
- context-sensitive spelling correction for
- language modeling for information retrieval
- induction of monolingual and/or bilingual
- discourse segmentation and automatic mark-up
(text to sgml)
- automatic hyperlinking
- bitext detection and language ID on the
G22.3033-010 Information Visualization
Prerequisites (required): Substantial background
in any one of: Cognitive or perceptual science, computational geometry, computer
graphics, graphic or media design, scientific visualization.
Recommended: Substantial background in two
or more of the above topics. "Substantial background" is taken to mean a
graduate-level course in the subject or substantial undergraduate work (i.e.,
This course will introduce the cross-disciplinary
field of information visualization: the process of creating pictures from
data as an aid for human comprehension and decision-making. Bar graphs are
a simple example of a visualization; this course will move well beyond that
into scientific data, multivariate and time-varying information, and complex,
abstract data structures. Information visualizations are often not simple,
two-dimensional static pictures, so the course will deal with the role of
animation and direct manipulation, methods of handling extremely large data
sets of arbitrary dimension, and tools for filtering data to provide useful
subsets. As the goal of a successful information visualization is to aid
human thought, all of these approaches will be presented in the context of
an understanding of human perceptual and cognitive processes.
Course work will include: readings from current
scientific literature (journal papers, conference proceedings); written analyses;
final project of either an implementation of an existing technique in a practical
setting or development of an effective new technique.
Students will undertake a course project to
build a non-trivial NLP application
G22.3033-012 Molecular Modeling
Prerequisites: basic knowledge of calculus
and programming required; some biology/chemistry recommended.
Content: Introduction to biomolecular modeling
and simulation, including:
- Protein and Nucleic Acid Structure and
Dynamics - minitutorials;
- Modeling Approaches - quantum and molecular
mechanics, molecular dynamics, Monte Carlo;
- Force Fields - functional construction,
variability, evaluation tricks of the trade;
- Molecular Visualization & Simulation
- introduction to the INSIGHT package;
- Selected Topics - protein folding, RNA
folding, DNA dynamics, structural and functional genomics.
Intended Audience: Advanced undergraduates
and graduate students, from all Washington Square science and math departments
(chemistry, biology, physics, mathematics, computer science and
neuroscience), as well as graduate students from the Sackler Institute
of Graduate Biomedical Sciences
Textbook: "Molecular Modeling: An Interdisciplinary
Guide" by Tamar Schlick (Springer-Verlag, to appear in 2002).
Text will be supplemented by articles and additional reference
Format: Class lectures (instructor and guests),
student presentations, videos, and computer labs (homework)
G22.3033-013 Data Mining
We live in the Age of Information. The importance
of collecting data that reflects a business or scientific activity to achieve
competitive advantage is widely recognized now. Advanced systems for collecting
data and managing it in large databases are in place in most large and mid-range
companies. However, the bottleneck in turning this data into success
is the difficulty of extracting knowledge about the system from the collected data.
What goods should be promoted to this customer?
What is the probability that a certain customer will respond to a planned
Can one predict the most profitable securities to buy/sell during the next
Will this customer default on a loan or pay back on schedule?
What medical diagnosis should be assigned to this patient?
How large are the peak loads of a telephone or energy network going to be?
Why does the manufacturing facility suddenly start to produce defective goods?
These are all questions that can be answered
if the information hidden in a database can be found explicitly and utilized.
Modeling the investigated system and discovering relations that connect variables
are the subject of data mining.
The course will introduce the concepts and techniques
of data mining and data warehousing, including the principles, architecture,
design, implementation, and applications of data warehousing and data mining.
Data warehousing and OLAP technology for data mining
Descriptive data mining: characterization and comparison
Classification and prediction
Mining complex types of data
Applications and trends in data mining