Theses & Reports
Instructions for submitting a technical report or thesis.
You can find technical reports published prior to 1990 archived here.
-
TR2005-867
2005
Title: Infrastructure for Automatic Dynamic Deployment of J2EE Applications in Distributed Environments
Author(s): Akkerman, Anatoly; Totok, Alexander; Karamcheti, Vijay
Abstract:
Recent studies have shown the potential of using component frameworks for building flexible, adaptable applications for deployment in distributed environments. However, this approach is hindered by the complexity of deploying component-based applications, which usually involves a great deal of configuration of both the application components and the system services they depend on. In this paper we propose an infrastructure for automatic dynamic deployment of J2EE applications that specifically addresses the problems of (1) inter-component connectivity specification and its effects on component configuration and deployment; and (2) application component dependencies on application server services, their configuration and deployment. The proposed infrastructure provides simple yet expressive abstractions for potential application adaptation through dynamic deployment and undeployment of components. We implement the infrastructure as a part of the JBoss J2EE application server and test it on several sample J2EE applications.
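To make the deployment-ordering aspect of this problem concrete, here is a minimal Python sketch (not the paper's infrastructure; the component and service names are invented) that derives a valid deployment order from declared dependencies between components and server services:

```python
from graphlib import TopologicalSorter

# Hypothetical component -> dependencies map: each J2EE component or
# application server service must be deployed after everything it depends on.
dependencies = {
    "OrderEJB":           {"JMSQueue", "OrderDB-DataSource"},
    "CartServlet":        {"OrderEJB"},
    "JMSQueue":           {"JMSService"},
    "OrderDB-DataSource": set(),
    "JMSService":         set(),
}

# A valid dynamic-deployment order: dependencies always come first, e.g.
# ['OrderDB-DataSource', 'JMSService', 'JMSQueue', 'OrderEJB', 'CartServlet']
print(list(TopologicalSorter(dependencies).static_order()))
```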
-
TR2005-858
2005
Title: Remembrance of Experiments Past: Analyzing Time Course Datasets to Discover Complex Temporal Invariants
Author(s): Antoniotti, Marco; Ramakrishnan, Naren; Kumar, Deept; Spivak, Marina; Mishra, Bud
Abstract:
Motivation: Current microarray data analysis techniques draw the biologist's attention to targeted sets of genes but do not otherwise present global and dynamic perspectives (e.g., invariants) inferred collectively over a dataset. Such perspectives are important in order to obtain a process-level understanding of the underlying cellular machinery, especially how cells react, respond, and recover from stresses.
Results: We present GOALIE, a novel computational approach and software system that uncovers formal temporal logic models of biological processes from time course microarray datasets. GOALIE `redescribes' data into the vocabulary of biological processes and then pieces together these redescriptions into a Kripke-structure model, where possible worlds encode transcriptional states and are connected to future possible worlds. This model then supports various query, inference, and comparative assessment tasks, besides providing descriptive process-level summaries. An application of GOALIE to characterizing the yeast (S. cerevisiae) cell cycle is described.
Availability: GOALIE runs on Windows XP platforms and is available on request from the authors.
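For illustration only (the states and labels below are invented, and GOALIE's actual models are far richer), a Kripke-structure model of the kind described above can be encoded as labeled states plus a transition relation, with a simple temporal query reduced to graph search:

```python
from collections import deque

# Toy Kripke structure: possible worlds are transcriptional states, labeled
# with active biological processes (states and labels invented).
transitions = {"G1": ["S"], "S": ["G2"], "G2": ["M"], "M": ["G1"]}
labels = {
    "G1": {"cell growth"}, "S": {"DNA replication"},
    "G2": {"growth", "repair"}, "M": {"mitosis"},
}

def ef(start, prop):
    """EF prop: is some state labeled with `prop` reachable from `start`?"""
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        if prop in labels[s]:
            return True
        for t in transitions[s]:
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return False

print(ef("G1", "mitosis"))  # True: mitosis is eventually reachable from G1
```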
-
TR2005-878
2005
Title: An Abstract Decision Procedure for Satisfiability in the Theory of Recursive Data Types
Author(s): Barrett, Clark; Shikanian, Igor; Tinelli, Cesare
Abstract:
The theory of recursive data types is a valuable modeling tool for software verification. In the past, decision procedures have been proposed for both the full theory and its universal fragment. However, previous work has been limited in various ways, including an inability to deal with multiple constructors, multi-sorted logic, and mutually recursive data types. More significantly, previous algorithms for the universal case have been based on inefficient nondeterministic guesses and have been described in fairly complex procedural terms.
We present an algorithm which addresses these issues for the universal theory. The algorithm is presented declaratively as a set of abstract rules which are terminating, sound, and complete. We also describe strategies for applying the rules and explain why our recommended strategy is more efficient than those used by previous algorithms. Finally, we discuss how the algorithm can be used within a broader framework of cooperating decision procedures.
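As a rough illustration of the rule-based flavor of such procedures (this is not the paper's calculus; acyclicity and selector rules, among others, are omitted), the sketch below applies constructor injectivity (Decompose) and detects constructor clashes over ground list terms using a union-find:

```python
# Toy fragment of a rule-based procedure for the data type of lists built
# from cons/nil: a union-find maintains equalities; Decompose propagates
# constructor injectivity, and Clash rejects equations between distinct
# constructors. Terms are tuples: ("cons", h, t), ("nil",), ("var", name).
parent = {}

def find(t):
    parent.setdefault(t, t)
    while parent[t] != t:
        parent[t] = parent[parent[t]]   # path halving
        t = parent[t]
    return t

def union(a, b):
    ra, rb = find(a), find(b)
    if ra == rb:
        return
    if ra[0] == "cons" and rb[0] == "cons":          # Decompose (injectivity)
        parent[ra] = rb
        union(ra[1], rb[1])
        union(ra[2], rb[2])
    elif ra[0] != rb[0] and "var" not in (ra[0], rb[0]):
        raise ValueError(f"Clash: {ra} = {rb} is unsatisfiable")
    else:
        parent[ra] = rb

x, y, nil = ("var", "x"), ("var", "y"), ("nil",)
# Assert cons(x, cons(y, nil)) = cons(nil, cons(x, nil)).
union(("cons", x, ("cons", y, nil)), ("cons", nil, ("cons", x, nil)))
print(find(x), find(y))   # both nil: the equation forces x = y = nil
```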
-
TR2005-869
2005
Title: Squidball: An Experiment in Large-Scale Motion Capture and Game Design
Author(s): Bregler, Christoph; Castiglia, Clothilde; DeVincenzo, Jessica; DuBois, Roger Luke; Feeley, Kevin; Igoe, Tom; Meyer, Jonathan; Naimark, Michael; Postelnicu, Alexandru; Rabinovich, Michael; Rosenthal, Sally; Salen, Katie; Sudol, Jeremi; Wright, Bo
Abstract:
This paper describes Squidball, a new large-scale motion-capture-based game. It was tested on audiences of up to 4000 players at SIGGRAPH 2004, and required building the world's largest motion capture space and the largest motion capture markers (balls), as well as meeting many other challenges in technology, production, game play, and social studies. Our aim was to entertain the SIGGRAPH Electronic Theater audience with a cooperative and energetic game played by everybody together, controlling real-time graphics and audio by bouncing and batting multiple large helium-filled balloons across the entire theater space. We detail in this paper the lessons learned in producing such a system and game, and argue why we believe Squidball was a great success.
-
TR2005-860
2005
Title: A Domain Decomposition Discretization of Parabolic Problems
Author(s): Dryja, Maksymilian; Tu, Xuemin
Abstract:
In recent years, domain decomposition methods have attracted much attention due to their successful application to many elliptic and parabolic problems. Domain decomposition methods treat problems based on a domain substructuring, which is attractive for parallel computation due to the independence among the subdomains. In principle, domain decomposition methods may be applied to the system resulting from a standard discretization of a parabolic problem or be carried out directly through a domain decomposition discretization of the problem itself. In this paper, a direct domain decomposition method is introduced to discretize parabolic problems. The stability and convergence of this algorithm are analyzed, and an $O(\tau+h)$ error bound is provided.
-
Ph.D. Thesis
2005
Title: Translation Validation of Optimizing Compilers
Candidate: Fang, Yi
Advisor(s): Pnueli, Amir; Zuck, Lenore
Abstract:
There is a growing awareness, both in industry and academia, of the crucial role of formally verifying the translation from high-level source code into low-level object code that is typically performed by an optimizing compiler. Formally verifying an optimizing compiler, as one would verify any other large program, is not feasible due to its size, ongoing evolution and modification, and, possibly, proprietary considerations. Translation validation is a novel approach that offers an alternative to the verification of translators in general and compilers in particular: rather than verifying the compiler itself, one constructs a validation tool which, after every run of the compiler, formally confirms that the target code produced in the run is a correct translation of the source program. This thesis work takes an important step towards ensuring an extremely high level of confidence in compilers targeted at EPIC architectures.
In this thesis, we focus on the translation validation of structure-preserving optimizations, i.e., transformations that do not modify a program's structure in a major way. This category covers most of the global optimizations performed by compilers. The thesis has two main parts. One develops a proof rule that formally establishes the correctness of structure-preserving transformations based on computational induction. The other is the development of a tool that applies the proof rule to the automatic validation of global optimizations performed by Intel's ORC compiler for the IA-64 architecture. With minimal instrumentation from the compiler, the tool constructs ``verification conditions'' -- formal theorems that, if valid, establish the correctness of a translation. The verification conditions are then transferred to an automatic theorem prover that checks their validity. Together, these components offer a fully automatic method to formally establish the correctness of each translation.
-
TR2005-875
2005
Title: Nonlinear Image Representation via Local Multiscale Orientation
Author(s): Hammond, David K.; Simoncelli, Eero P.
Abstract:
We present a nonlinear image representation based on multiscale local orientation measurements. Specifically, an image is first decomposed using a two-orientation steerable pyramid, a tight-frame representation in which the basis functions are directional derivatives of a radially symmetric blurring operator. The pair of subbands at each scale are thus gradients of progressively blurred copies of the original image. We then discard the magnitude information and retain only the orientation of each gradient vector. We develop a method for reconstructing the original image from this orientation information using an algorithm based on projection onto convex sets, and demonstrate its robustness to quantization.
-
TR2005-866
2005
Title: An Analysis of Usage Locality for Data-Centric Web Services
Author(s): He, Congchun; Karamcheti, Vijay
Abstract:
The growing popularity of XML Web Services is resulting in a significant increase in the proportion of Internet traffic that involves requests to and responses from Web Services. Unfortunately, web service responses, because they are generated dynamically, are considered ``uncacheable'' by traditional caching infrastructures. One way of remedying this situation is by developing alternative caching infrastructures, which improve performance using on-demand service replication, data offloading, and request redirection. These infrastructures benefit from two characteristics of web service traffic --- (1) the open nature of the underlying protocols, SOAP, WSDL, UDDI, which results in service requests and responses adhering to a well-formatted, widely known structure; and (2) the observation that for a large number of currently deployed data-centric services, requests can be interpreted as structured accesses against a physical or virtual database --- but require that there be sufficient locality in service usage to offset replication and redirection costs.
This paper investigates whether such locality does in fact exist in current web service workloads. We examine access logs from two large data-centric web service sites, SkyServer and TerraServer, to characterize workload locality across several dimensions: data space, network regions, and different time epochs. Our results show that both workloads exhibit a high degree of spatial and network locality: 10\% of the client IP addresses in the SkyServer trace contribute to about 99.95\% of the requests, and 99.94\% of the requests in the TerraServer trace are directed towards regions that represent less than 10\% of the overall data space accessible through the service. Our results point to the substantial opportunity for improving Web Services scalability by on-demand service replication.
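A minimal sketch of the kind of locality measurement reported above, assuming a parsed list of client IPs from an access log (the sample data below is invented; real logs would be parsed from the server):

```python
from collections import Counter

# Measure what fraction of requests the busiest 10% of client IPs account
# for (the paper reports ~99.95% for the SkyServer trace).
access_log = ["128.1.2.3", "128.1.2.3", "66.7.8.9", "128.1.2.3", "10.0.0.1"]

counts = Counter(access_log)
top_k = max(1, len(counts) // 10)            # top 10% of distinct client IPs
busiest = counts.most_common(top_k)
share = sum(n for _, n in busiest) / len(access_log)
print(f"top {top_k} IP(s) account for {share:.1%} of requests")
```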
-
TR2005-877
2005
Title: Oriented Overlays For Clustering Client Requests To Data-Centric Network Services
Author(s): He, Congchun; Karamcheti, Vijay
Abstract:
Many of the data-centric network services deployed today hold massive volumes of data at their origin websites, accessing the data to dynamically generate responses. Such dynamic responses are poorly supported by traditional caching infrastructures and result in poor performance and scalability for such services. One way of remedying this situation is to develop alternative caching infrastructures, which can dynamically detect the often large degree of service usage locality and leverage such information to on-demand replicate and redirect requests to service portions at appropriate network locations. Key to building such infrastructures is the ability to cluster and inspect client requests, at various points across a wide-area network.
This paper presents a zone-based scheme for constructing oriented overlays, which provide such an ability. Oriented overlays differ from previously proposed unstructured overlays in supporting network traffic flows from many sources towards one (or a small number) of destinations, and vice versa. A good oriented overlay offers sufficient clustering ability without adversely affecting path latencies. Our overlay construction scheme organizes participating nodes into different zones according to their latencies from the origin server(s), and has each node associate with one or more parents in another zone closer to the origin. Extensive experiments with a PlanetLab-based implementation of our scheme show that it produces overlays that (1) are robust to network dynamics; (2) offer good clustering ability; and (3) have minimal impact on the end-to-end network latencies seen by clients.
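A toy rendering of the zone-based construction (all parameters and latencies are invented; the real scheme measures node-to-node latencies when choosing parents, whereas this sketch ranks candidates by their latency to the origin for simplicity):

```python
# Group nodes into latency bands ("zones") around the origin server; each
# node picks a parent in a strictly closer zone, zone 0 attaches to origin.
ZONE_WIDTH_MS = 50

nodes = {"a": 12, "b": 38, "c": 61, "d": 95, "e": 140}  # latency to origin

zone = {n: int(lat // ZONE_WIDTH_MS) for n, lat in nodes.items()}

def parent(n):
    cands = [m for m in nodes if zone[m] < zone[n]]
    return min(cands, key=lambda m: nodes[m]) if cands else "origin"

for n in sorted(nodes):
    print(n, "zone", zone[n], "-> parent", parent(n))
```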
-
Ph.D. Thesis
2005
Title: Translation Validation of Loop Optimizations
Candidate: Hu, Ying
Advisor(s): Goldberg, Benjamin; Barrett, Clark
Abstract:
Formal verification is important in designing reliable computer systems. For a critical software system, it is not enough to have a proof of correctness for the source code; there must also be an assurance that the compiler produces a correct translation of the source code into the target machine code. Verifying the correctness of modern optimizing compilers is a challenging task because of their size, their complexity, and their evolution over time.
In this thesis, we focus on the translation validation of loop optimizations. In order to validate the optimizations performed by the compiler, we try to prove the equivalence of the intermediate code before and after the optimizations. Previous work established a set of proof rules for building the equivalence relation between two programs; however, those rules cannot validate some legal loop optimizations. We propose new proof rules that consider the conditions of loops and the possible elimination of some loops, so that those cases can also be handled. Based on these new proof rules, we design algorithms that apply them in an automatic validation process.
Based on the above proof rules, we implement an automatic validation tool for loop optimizations which analyzes the loops, guesses what kinds of loop optimizations occur, proves the validity of a combination of loop optimizations, and synthesizes a series of intermediate codes. We integrate this new loop tool into our translation validation tool TVOC, so that TVOC handles not only optimizations which do not significantly change the structure of the code, but also loop optimizations which do change the structure greatly. With this new part, TVOC has succeeded in validating many examples with loop optimizations.
Speculative optimizations are the aggressive optimizations that are only correct under certain conditions that cannot be known at compile time. In this thesis, we present the theory and algorithms for validating speculative optimizations and generating the runtime tests necessary for speculative optimizations. We also provide several examples and the results of the algorithms for speculative optimizations.
-
TR2005-870
2005
Title: Two-Level Schwarz Algorithms, Using Overlapping Subregions, for Mortar Finite Element Methods
Author(s): Kim, Hyea Hyun; Widlund, Olof B.
Abstract:
Preconditioned conjugate gradient methods based on two-level overlapping Schwarz methods often perform quite well. Such a preconditioner combines a coarse space solver with local components which are defined in terms of subregions which form an overlapping covering of the region on which the elliptic problem is defined. Precise bounds on the rate of convergence of such iterative methods have previously been obtained in the case of conforming lower order and spectral finite elements as well as in a number of other cases. In this paper, this domain decomposition algorithm and analysis are extended to mortar finite elements. It is established that the condition number of the relevant iteration operator is independent of the number of subregions and varies with the relative overlap between neighboring subregions linearly as in the conforming cases previously considered.
-
Ph.D. Thesis
2005
Title: Construction of Component-Based Applications by Planning
Candidate: Kichkaylo, Tatiana
Advisor(s): Karamcheti, Vijay; Davis, Ernest
Abstract:
Many modern wide-area distributed systems are component-based. This approach provides great flexibility in adapting applications to the changing state of the environment and user requirements, but increases the complexity of configuring the applications. Because of the scale and heterogeneity of modern wide-area environments, manual configuration is hard, inefficient, suboptimal, and error-prone. Automated application configuration is desired.
Constructing distributed applications requires choosing a set of components that will constitute the application instance and assigning network resources to component executions and data transfers. Stated this way, the application configuration problem (ACP) is similar to the planning (action selection) and scheduling (resource allocation) problems studied by the Artificial Intelligence (AI) community.
This thesis investigates the problem of solving the ACP using AI planning techniques. However, the ACP poses several challenges not usually encountered and addressed by the traditional AI solutions. The problem specification for the ACP can be much larger than the solution, with the relevant portions only identified during the search. Additionally, the interactions between planning operators are numeric rather than logical. Finally, it is desirable to be able to trade off quality of the solution versus search time.
We show that the ACP is undecidable in general. Therefore, instead of a single algorithm, we propose a set of techniques that can be used to compose an algorithm for a particular variety of the ACP that can exploit natural restrictions exhibited by that variety. These techniques address the challenges above by dynamically obtaining portions of the problem specification as necessary during the search, using envelope hierarchies based on numeric information for pruning and search guidance, and discretizing continuous variables to approximate numeric parameters without restricting the form of supported numeric functions.
We illustrate these techniques by describing their use in algorithms tailored for two specific varieties of the ACP --- snapshot configurations for dynamic component-based frameworks, and scheduling of grid workflows with replica selection and explicit resource reservations. Experimental evaluation of the performance of these two algorithms shows that the techniques successfully achieve their goals, with acceptable run-time overhead.
-
TR2005-873
2005
Title: A BDDC algorithm for problems with mortar discretization
Author(s): Kim, Hyea Hyun; Dryja, Maksymilian; Widlund, Olof B.
Abstract:
A BDDC (balancing domain decomposition by constraints) algorithm is developed for elliptic problems with mortar discretizations for geometrically non-conforming partitions in both two and three spatial dimensions. The coarse component of the preconditioner is defined in terms of one mortar constraint for each edge/face which is an intersection of the boundaries of a pair of subdomains. A condition number bound of the form $C \max_i \left\{ (1+\log (H_i/h_i))^3 \right\}$ is established. In geometrically conforming cases, the bound can be improved to $C \max_i \left\{ (1+\log (H_i/h_i))^2 \right\}$. This estimate is also valid in the geometrically nonconforming case under an additional assumption on the ratio of mesh sizes and jumps of the coefficients. This BDDC preconditioner is also shown to be closely related to the Neumann-Dirichlet preconditioner for the FETI--DP algorithms of \cite{K-04-3d,KL-02} and it is shown that the eigenvalues of the BDDC and FETI--DP methods are the same except possibly for an eigenvalue equal to 1.
-
TR2005-863
2005
Title: A FETI-DP formulation of three dimensional elasticity problems with mortar discretization
Author(s): Kim, Hyea Hyun
Abstract:
In this paper, a FETI-DP formulation for the three dimensional elasticity problem on non-matching grids over a geometrically conforming subdomain partition is considered. To resolve the nonconformity of the finite elements, a mortar matching condition is imposed on the subdomain interfaces (faces). By introducing Lagrange multipliers for the mortar matching constraints, the resulting linear system becomes similar to that of a FETI-DP method. In order to make the FETI-DP method efficient for solving this linear system, a relatively large set of primal constraints, which include average and momentum constraints over interfaces (faces) as well as vertex constraints, is introduced. A condition number bound $C(1+\log(H/h))^2$ for the FETI-DP formulation with a Neumann-Dirichlet preconditioner is then proved for elasticity problems with discontinuous material parameters when only some faces are chosen as primal faces on which the average and momentum constraints are imposed. An algorithm which selects quite a small number of primal faces is also discussed.
-
TR2005-861
2005
Title: BDDC Algorithms for Incompressible Stokes Equations
Author(s): Li, Jing; Widlund, Olof B.
Abstract:
The purpose of this paper is to extend the BDDC (balancing domain decomposition by constraints) algorithm to saddle-point problems that arise when mixed finite element methods are used to approximate the system of incompressible Stokes equations. The BDDC algorithms are iterative substructuring methods, which form a class of domain decomposition methods based on the decomposition of the domain of the differential equations into nonoverlapping subdomains. They are defined in terms of a set of primal continuity constraints, which are enforced across the interface between the subdomains and which provide a coarse space component of the preconditioner. Sets of such constraints are identified for which bounds on the rate of convergence can be established that are just as strong as previously known bounds for the elliptic case. In fact, the preconditioned operator is effectively positive definite, which makes the use of a conjugate gradient method possible. A close connection is also established between the BDDC and FETI-DP algorithms for the Stokes case.
-
TR2004-857
2005
Title: FETI--DP, BDDC, and Block Cholesky Methods
Author(s): Li, Jing; Widlund, Olof B.
Abstract:
Two popular non-overlapping domain decomposition methods, the FETI--DP and BDDC algorithms, are reformulated using block Cholesky factorizations, an approach which can provide a useful framework for the design of domain decomposition algorithms for solving symmetric positive definite linear systems of equations. Instead of introducing Lagrange multipliers to enforce the coarse-level primal continuity constraints in these algorithms, a change of variables is used such that each primal constraint corresponds to an explicit degree of freedom. With the new formulations of these algorithms, a simplified proof is provided that the spectra of a pair of FETI--DP and BDDC algorithms, with the same set of primal constraints, are the same. Results of numerical experiments also confirm this equality.
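The following numpy sketch shows the block elimination that underlies such reformulations, with interior unknowns ordered first and the interface Schur complement handled last (a generic illustration of block Cholesky substructuring, not the paper's algorithms; the matrix is a random SPD example):

```python
import numpy as np

# Partition an SPD matrix K = [[K_II, K_IG], [K_GI, K_GG]] with interior (I)
# unknowns first and interface (G) unknowns last; the Schur complement S
# is the operator that the interface iteration effectively sees.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
K = A @ A.T + 6 * np.eye(6)          # a random SPD test matrix
nI = 4                               # 4 interior, 2 interface unknowns

K_II, K_IG = K[:nI, :nI], K[:nI, nI:]
K_GI, K_GG = K[nI:, :nI], K[nI:, nI:]

S = K_GG - K_GI @ np.linalg.solve(K_II, K_IG)   # Schur complement

# Solve K x = b by block forward/back substitution.
b = rng.standard_normal(6)
y = np.linalg.solve(K_II, b[:nI])
x_G = np.linalg.solve(S, b[nI:] - K_GI @ y)
x_I = y - np.linalg.solve(K_II, K_IG @ x_G)
print(np.allclose(K @ np.concatenate([x_I, x_G]), b))   # True
```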
-
TR2005-871
2005
Title: On the Use of Inexact Subdomain Solvers for BDDC Algorithms
Author(s): Li, Jing; Widlund, Olof B.
Abstract:
The standard BDDC (balancing domain decomposition by constraints) preconditioner is shown to be equivalent to a preconditioner built from a partially subassembled finite element model. This results in a system of linear algebraic equations which is much easier to solve in parallel than the fully assembled model; the cost is then often dominated by that of the problems on the subdomains. An important role is also played, both in theory and practice, by an average operator; in addition, exact Dirichlet solvers are used on the subdomains in order to eliminate the residual in the interior of the subdomains. The use of inexact solvers for these problems, and even the replacement of the Dirichlet solvers by a trivial extension, is considered. It is established that one of the resulting algorithms has the same eigenvalues as the standard BDDC algorithm, and the connection of another with the FETI-DP algorithm with a lumped preconditioner is also considered. Multigrid methods are used in the experimental work, and under certain assumptions it can be established that the iteration count remains essentially the same as when exact solvers are used. Considerable gains in speed can nevertheless be realized, since the cost of the exact solvers grows superlinearly with the size of the subdomain problems while the cost of the multigrid methods is linear.
-
TR2005-872
2005
Title: Real-time rendering of normal maps with discontinuities
Author(s): Parilov, Evgueni; Rosenberg, Ilya; Zorin, Denis
Abstract:
Normal mapping uses normal perturbations stored in a texture to give objects a more geometrically complex appearance without increasing the number of geometric primitives. Standard bi- and trilinear interpolation of normal maps works well if the normal field is continuous, but may result in visible artifacts in the areas where the field is discontinuous, which is common for surfaces with creases and dents.
In this paper we describe a real-time rendering technique which preserves the discontinuity curves of the normal field at sub-pixel level, together with its GPU implementation. Our representation of the piecewise-continuous normal field is based on approximations of the distance function to the discontinuity set and its gradient. Using these approximations, we can efficiently reconstruct discontinuities at arbitrary resolution and ensure that no normals are interpolated across a discontinuity. We also describe a method for updating the normal field along the discontinuities in real time, based on blending the original field with one calculated from a user-defined surface profile.
-
TR2005-859
2005
Title: Algorithmic Algebraic Model Checking I: The Case of Biochemical Systems and their Reachability Analysis
Author(s): Piazza, C.; Antoniotti, M.; Mysore, V.; Policriti, A.; Winkler, F.; Mishra, B.
Abstract:
Presently, there is no clear way to determine whether the current body of biological facts is sufficient to explain phenomenology. Rigorous mathematical models with automated tools for reasoning, simulation, and computation can be of enormous help in uncovering cognitive flaws, qualitative simplifications, or overly generalized assumptions. The approaches developed by control theorists analyzing the stability of systems with feedback, physicists studying asymptotic properties of dynamical systems, and computer scientists reasoning about discrete or hybrid (combining discrete events with continuous events) reactive systems have all tried to address aspects of the same problem in a very concrete manner. We explore here how biological processes could be studied in a similar manner, and how the appropriate tools for this purpose can be created.
In this paper, we suggest a possible confluence of the theory of hybrid automata and the techniques of algorithmic algebra to create a computational basis for systems biology. We start by discussing the basis for this choice -- semi-algebraic hybrid systems -- while also recognizing its power and limitations. We explore solutions to the bounded-reachability problem through symbolic computation methods applied to descriptions of the traces of the hybrid automaton. Because the description of the automaton is through semi-algebraic sets, the evolution of the automaton can be described even in cases where system parameters and initial conditions are unspecified. Nonetheless, semi-algebraic decision procedures provide a succinct description of the algebraic constraints over the initial values and parameters for which proper behavior of the system can be expected. In addition, by keeping track of conservation principles in terms of constraint or invariant manifolds on which the system must evolve, we avoid many of the obvious pitfalls of numerical approaches.
-
Ph.D. Thesis
2005
Title: Extensible MultiModal Environment Toolkit (EMMET): A Toolkit for Prototyping and Remotely Testing Speech and Gesture Based Multimodal Interfaces
Candidate: Robbins, Christopher A.
Advisor(s): Perlin, Ken
Abstract:
Ongoing improvements to the performance and accessibility of less conventional input modalities such as speech and gesture recognition now provide new dimensions for interface designers to explore. Yet there is a scarcity of commercial applications which utilize these modalities either independently or multimodally. This scarcity partially results from a lack of development tools and design guidelines to facilitate the use of speech and gesture.
An integral aspect of the user interface design process is the ability to easily evaluate various design solutions through an iterative process of prototyping and testing. Through this process, guidelines emerge that aid in the design of future interfaces. Today there is no shortage of tools supporting the development of conventional interfaces. However, no comparable resources allow interface designers to easily prototype and quickly test, via remote distribution, interface designs utilizing speech and gesture.
The thesis work for this dissertation explores the development of an Extensible MultiModal Environment Toolkit (EMMET) for prototyping and remotely testing speech and gesture based multimodal interfaces to three-dimensional environments. The overarching goals for this toolkit are to allow its users to: explore speech and gesture based interface design without requiring an understanding of the details involved in the low-level implementation of speech or gesture recognition, quickly distribute their multimodal interface prototypes via the Web, and receive multimodal usage statistics collected remotely after each use of their application.
EMMET ultimately contributes to the field of multimodal user interface design by providing an environment to existing user interface developers in which speech and gesture recognition have been seamlessly integrated into their palette of user input options. Such seamless integration serves to increase the utilization within applications of speech and gesture modalities by removing any actual or perceived deterrents to the use of these modalities versus the use of conventional modalities. EMMET additionally strives to improve the quality of speech and gesture based interfaces by supporting the prototype-and-test development cycle through its Web distribution and usage statistics collection capabilities. These capabilities also allow developers to realize new design guidelines specific to the use of speech and gesture.
-
TR2005-874
2005
Title: Ranking with a P-norm Push
Author(s): Rudin, Cynthia
Abstract:
We are interested in supervised ranking with the following twist: our goal is to design algorithms that perform especially well near the top of the ranked list, and are only required to perform sufficiently well on the rest of the list. Towards this goal, we provide a general form of convex objective that gives high-scoring examples more importance. This ``push'' near the top of the list can be chosen arbitrarily large or small. We choose $\ell_p$-norms to provide a specific type of push; as $p$ becomes large, the algorithm concentrates harder near the top of the list.
We derive a generalization bound based on the $p$-norm objective. We then derive a corresponding boosting-style algorithm, and illustrate the usefulness of the algorithm through experiments on UCI data.
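A plausible numpy rendering of such a push objective with an exponential loss (hedged: the paper's exact form may differ) is given below; note how increasing $p$ concentrates the penalty on the highest-scoring negatives:

```python
import numpy as np

# Each negative example k pays sum_i exp(-(s_pos_i - s_neg_k)); we report
# the l_p norm of the per-negative losses, so large p emphasizes the worst
# (highest-scoring) negatives, pushing them down from the top of the list.
def pnorm_push(pos_scores, neg_scores, p):
    margins = pos_scores[:, None] - neg_scores[None, :]   # pairwise margins
    per_negative = np.exp(-margins).sum(axis=0)           # loss per negative
    return (per_negative ** p).sum() ** (1.0 / p)

pos = np.array([2.0, 1.5, 1.0])
neg = np.array([0.0, 0.2, 1.8])      # the 1.8 negative sits near the top
for p in (1, 4, 16):
    print(p, round(pnorm_push(pos, neg, p), 2))
```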
-
TR2005-876
2005
Title: Better Burst Detection
Author(s): Shasha, Dennis; Zhang, Xin
Abstract:
A burst is a large number of events occurring within a certain time window. As an unusual level of activity, it is a noteworthy phenomenon in many natural and social processes. Many data stream applications require the detection of bursts across a variety of window sizes. For example, stock traders may be interested in bursts having to do with institutional purchases or sales that are spread out over minutes or hours. Detecting a burst over any of $k$ window sizes, a problem we call {\em elastic burst detection}, in a stream of length $N$ naively requires $O(kN)$ time. Previous work \cite{DiscoveryBook03} showed that a simple Shifted Binary Tree structure can reduce this time substantially (in very favorable cases, to near $O(N)$) by filtering away obvious non-bursts. Unfortunately, for certain data distributions, the filter marks many windows of events as possible bursts even though a detailed check shows them to be non-bursts.
In this paper, we present a new algorithmic framework for elastic burst detection: a family of data structures that generalizes the Shifted Binary Tree. We then present a heuristic search algorithm to find an efficient structure among the many offered by the framework, given the input. We study how different inputs affect the desired structures. Experiments on both synthetic and real world data show a factor of up to 35 times improvement compared with the Shifted Binary Tree over a wide variety of inputs, depending on the data distribution. We show an example application that identifies interesting correlations between bursts of activity in different stocks.
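The sketch below implements a one-level simplification of the filtering idea (the paper's framework generalizes the Shifted Binary Tree much further): any window of length at most B lies inside two consecutive size-B buckets, so the sum of each adjacent bucket pair is a cheap upper bound that rules out most windows before a detailed check. All parameters and data are invented:

```python
import numpy as np

def find_bursts(events, w, threshold, B=64):
    assert w <= B
    n = len(events) // B * B
    buckets = events[:n].reshape(-1, B).sum(axis=1)
    prefix = np.concatenate([[0], np.cumsum(events)])
    bursts = []
    for i in range(len(buckets) - 1):       # final bucket omitted for brevity
        if buckets[i] + buckets[i + 1] < threshold:
            continue                         # cheap filter: no burst possible
        for s in range(i * B, (i + 1) * B):  # detailed check via prefix sums
            if prefix[s + w] - prefix[s] >= threshold:
                bursts.append(s)
    return bursts

rng = np.random.default_rng(1)
data = rng.poisson(0.1, 10_000).astype(float)
data[5_000:5_020] += 3                       # inject a burst
print(find_bursts(data, w=20, threshold=30)[:3])   # starts inside the burst
```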
-
TR2005-868
2005
Title: Modeling Of Concurrent Web Sessions With Bounded Inconsistency In Shared Data
Author(s): Totok, Alexander; Karamcheti, Vijay
Abstract:
Client interactions with modern web-accessible network services are typically organized into sessions involving multiple requests that read and write shared application data. Therefore when executed concurrently, web sessions may invalidate each other's data. Depending on the nature of the business represented by the service, allowing the session with invalid data to progress might lead to financial penalties for the service provider, while blocking the session's progress and deferring its execution (e.g., by relaying its handling to the customer service) will most probably result in user dissatisfaction. A compromise would be to tolerate some bounded data inconsistency, which would allow most of the sessions to progress, while limiting the potential financial loss incurred by the service. In order to quantitatively reason about these tradeoffs, the service provider can benefit from models that predict metrics, such as the percentage of successfully completed sessions, for a certain degree of tolerable data inconsistency.
This paper develops such analytical models of concurrent web sessions with bounded inconsistency in shared data for three popular concurrency control algorithms. We illustrate our models using the sample buyer scenario from the TPC-W e-Commerce benchmark, and validate them by showing their close correspondence to measured results of concurrent session execution in both a simulated and a real web server environment. Our models take as input parameters of service usage, which can be obtained through profiling of incoming client requests. We augment our web application server environment with a profiling and automated decision making infrastructure which is shown to successfully choose, based on the specified performance metric, the best concurrency control algorithm in real time in response to changing service usage patterns.
-
Ph.D. Thesis
2005
Title: Pattern Discovery for Hypotheses Generation in Biology
Candidate: Tsirigos, Aristotelis
Advisor(s): Shasha, Dennis
Abstract:
In recent years, the increase in the amounts of available genomic as well as gene expression data has provided researchers with the necessary information to train and test various models of gene origin, evolution, function, and regulation. In this thesis, we present novel solutions to key problems in computational biology that deal with nucleotide sequences (horizontal gene transfer detection), amino-acid sequences (protein sub-cellular localization prediction), and gene expression data (discovery of transcription factor/binding site pairs). Different pattern discovery techniques, such as maximal sequence motif discovery and maximal itemset discovery, are utilized and combined with support vector machines to achieve significant improvements over previously proposed methods.
-
TR2005-865
2005
Title: A BDDC algorithm for flow in porous media with a hybrid finite element discretization
Author(s): Tu, Xuemin
Abstract:
The BDDC (balancing domain decomposition by constraints) methods have been applied successfully to solve the large sparse linear algebraic systems arising from conforming finite element discretizations of elliptic boundary value problems. In this paper, the scalar elliptic problems for flow in porous media are discretized by a hybrid finite element method which is equivalent to a nonconforming finite element method. The BDDC algorithm is extended to these problems which originate as saddle point problems.
Edge/face average constraints are enforced across the interface and the same rate of convergence is obtained as in conforming cases. The condition number of the preconditioned system is estimated and numerical experiments are discussed.
-
TR2005-864
2005
Title: A BDDC Algorithm for Mixed Formulation of Flow in Porous Media
Author(s): Tu, Xuemin
Abstract:
The BDDC (balancing domain decomposition by constraints) algorithms are similar to the balancing Neumann-Neumann methods, with a small number of continuity constraints enforced across the interface throughout the iterations. These constraints form a coarse, global component of the preconditioner. The BDDC methods are powerful for solving large sparse linear algebraic systems arising from discretizations of elliptic boundary value problems. In this paper, the BDDC algorithm is extended to saddle point problems generated from the mixed finite element methods used to approximate the scalar elliptic problems for flow in porous media.
Edge/face average constraints are enforced and the same rate of convergence is obtained as for simple elliptic cases. The condition number bound is estimated and numerical experiments are discussed. In addition, a comparison of the BDDC method with an edge/face-based iterative substructuring method is provided.
-
TR2005-879
2005
Title: BDDC Domain Decomposition Algorithms: Methods with Three Levels and for Flow in Porous Media
Author(s): Tu, Xuemin
Abstract:
Two inexact coarse solvers for Balancing Domain Decomposition by Constraints (BDDC) algorithms are introduced and analyzed. These solvers help remove a bottleneck for the two-level BDDC algorithms related to the cost of the coarse problem when the number of subdomains is large. At the same time, a good convergence rate is maintained.
BDDC algorithms are also developed for the linear systems arising from flow in porous media discretized with mixed and hybrid finite elements. Our methods are proven to be scalable and the condition numbers of the operators with our BDDC preconditioners grow only polylogarithmically with the size of the subdomain problems.
-
TR2005-862
2005
Title: Three-Level BDDC in Three Dimensions
Author(s): Tu, Xuemin
Abstract:
BDDC methods are nonoverlapping iterative substructuring domain decomposition methods for the solution of large sparse linear algebraic systems arising from the discretization of elliptic boundary value problems. Their coarse problem is given by a small number of continuity constraints which are enforced across the interface. The coarse problem matrix is generated and factored by direct solvers at the beginning of the computation, and it can ultimately become a bottleneck if the number of subdomains is very large.
In this paper, two three-level BDDC methods are introduced for solving the coarse problem approximately in three dimensions. This is an extension of previous work on the two-dimensional case. Since vertex constraints alone do not suffice to obtain a polylogarithmic condition number bound, edge constraints are considered in this paper. Some new technical tools are then needed in the analysis, which makes the three-dimensional case more complicated than the two-dimensional one.
Estimates of the condition numbers are provided for two three-level BDDC methods and numerical experiments are also discussed.
-
Ph.D. Thesis
2005
Title: Automatic Verification of Parameterized Systems
Candidate: Xu, Jiazhao
Advisor(s): Pnueli, Amir
Abstract:
Verification plays an indispensable role in designing reliable computer hardware and software systems. With the fast growth in design complexity and the quick turnaround in design time, formal verification has become an increasingly important technology for establishing correctness as well as for finding difficult bugs. Since there is no ``silver bullet'' that solves all verification problems, a spectrum of powerful techniques in formal verification has been developed to tackle different verification problems and complexity issues. Depending on the nature of the problem, whose most salient components are the system implementation and the property specification, a proper methodology or a combination of different techniques is applied to solve the problem.
In this thesis, we focus on the research and development of formal methods to uniformly verify parameterized systems. A parameterized system is a class of systems obtained by instantiating the system parameters. Parameterized verification seeks a single correctness proof of a property for the entire class. Although the general parameterized verification problem is undecidable [AK86], it is possible to solve special classes by applying a repertoire of techniques and heuristics. Many methods in parameterized verification require a great deal of human interaction. This makes the application of these methods to real world problems infeasible. Thus, the main focus of this research is to develop techniques that can be automated to deliver proofs of safety and liveness properties.
Our research combines various formal techniques such as deductive methods, abstraction, and model checking. One main result in this thesis is an automatic deductive method for parameterized verification. We apply small model properties of Bounded Data Systems (a special type of parameterized system) to help prove deductive inference rules for the safety properties of BDS systems. Another methodology we developed enables us to prove liveness properties of parameterized systems via an automatic abstraction method called counter abstraction. There are several useful by-products of our research: a set of heuristics for the automatic generation of program invariants, which can benefit deductive verification in general, and methodologies for the automatic abstraction of the fairness conditions that are crucial for proving liveness properties.
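As a toy illustration of counter abstraction (not the thesis machinery), the sketch below abstracts N identical processes by counters capped at "many" and model checks mutual exclusion on the resulting finite abstract state space; the protocol and cap are invented:

```python
# Processes with local states {idle, try, crit} are abstracted by a counter
# per state, capped at 2 ("many"); the abstract space is finite and
# independent of N, so safety can be checked by exhaustive search.
CAP = 2

def dec(c):
    # Soundness: decrementing "many" may leave "many" or drop to CAP-1.
    return {CAP, CAP - 1} if c == CAP else {c - 1}

def step(state):
    idle, try_, crit = state
    if idle:                              # idle -> try
        for i in dec(idle):
            yield (i, min(CAP, try_ + 1), crit)
    if try_ and crit == 0:                # try -> crit, guarded by crit == 0
        for t in dec(try_):
            yield (idle, t, 1)
    if crit:                              # crit -> idle
        for c in dec(crit):
            yield (min(CAP, idle + 1), try_, c)

seen, frontier = set(), {(CAP, 0, 0)}     # initially: "many" idle processes
while frontier:
    s = frontier.pop()
    if s not in seen:
        seen.add(s)
        frontier.update(step(s))

# Mutual exclusion holds iff no reachable abstract state has crit >= 2.
print(all(s[2] < 2 for s in seen))        # expect: True
```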
-
Ph.D. Thesis
2005
Title: Mobility, Route Caching, and TCP Performance in Mobile Ad Hoc Networks
Candidate: Yu, Xin
Advisor(s): Johnson, David B.
Abstract:
In a mobile ad hoc network, mobile nodes communicate with each other through wireless links. Mobility causes frequent topology changes. This thesis addresses the fundamental challenges mobility presents to on-demand routing protocols and to TCP.
On-demand routing protocols use route caches to make routing decisions. Due to mobility, cached routes easily become stale. To address the cache staleness issue, prior work used adaptive timeout mechanisms. However, heuristics cannot accurately estimate timeouts because topology changes are unpredictable. I propose to proactively disseminate broken link information to the nodes that have cached the link. I define a new cache structure called a cache table to maintain the information necessary for cache updates, and design a distributed cache update algorithm. This algorithm is the first to proactively update route caches in an adaptive manner. Simulation results show that proactive cache updating is more efficient than adaptive timeout mechanisms. I conclude that proactive cache updating is key to the adaptation of on-demand routing protocols to mobility.
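A dict-based sketch of the cache-table idea (illustrative only; the thesis defines the actual structure and the distributed update algorithm): indexing cached routes by the links they traverse lets a broken-link notification evict exactly the stale routes rather than waiting for a timeout:

```python
from collections import defaultdict

class CacheTable:
    def __init__(self):
        self.routes = set()                  # cached routes (tuples of nodes)
        self.by_link = defaultdict(set)      # link -> routes that use it

    def add_route(self, route):
        self.routes.add(route)
        for link in zip(route, route[1:]):
            self.by_link[link].add(route)

    def link_broken(self, u, v):
        # Proactive update: remove every cached route crossing link (u, v).
        for route in self.by_link.pop((u, v), set()):
            self.routes.discard(route)

table = CacheTable()
table.add_route(("A", "B", "C", "D"))
table.add_route(("A", "B", "E"))
table.link_broken("B", "C")
print(table.routes)   # only ('A', 'B', 'E') survives
```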
TCP does not perform well in mobile ad hoc networks. Prior work provided link failure feedback to TCP so that it can avoid invoking congestion control mechanisms for packet losses caused by route failures. Simulation results show that my cache update algorithm significantly improves TCP throughput since it reduces the effect of mobility on TCP. TCP still suffers from frequent data and ACK losses. I propose to make routing protocols aware of lost TCP packets and help reduce TCP timeouts. I design two mechanisms that exploit cross-layer information awareness: early packet loss notification (EPLN) and best-effort ACK delivery (BEAD). EPLN notifies TCP senders about lost data. BEAD retransmits ACKs at intermediate nodes or at TCP receivers. Simulation results show that the two mechanisms significantly improve TCP throughput. I conclude that cross-layer information awareness is key to making TCP efficient in the presence of mobility.
I also study the impact of route caching strategies on the scalability of on-demand routing protocols with mobility. I show that making route caches adapt quickly and efficiently to topology changes is key to the scalability of on-demand routing protocols with mobility.
-
Ph.D. Thesis
2005
Title: Information Extraction from Multiple Syntactic Sources
Candidate: Zhao, Shubin
Advisor(s): Grishman, Ralph
Abstract:
Information Extraction is the automatic extraction of facts from text, including the detection of named entities, entity relations, and events. Conventional approaches to Information Extraction try to find syntactic patterns based on deep processing of text, such as partial or full parsing. The problem these solutions face is that as deeper analysis is used, the accuracy of the result decreases, and one cannot recover from the induced errors. Lower-level processing, on the other hand, is more accurate and can also provide useful information; however, within the framework of conventional approaches, this kind of information cannot be efficiently incorporated.
This thesis describes a novel supervised approach based on kernel methods to address these issues. In this approach, customized kernels are used to match syntactic structures produced by different preprocessing phases. Using closure properties of kernels, individual kernels are combined into composite kernels that integrate and extend all the information. The composite kernels can be used with various classifiers, such as Nearest Neighbor or Support Vector Machines (SVM). The main classifier we propose to use is SVM, due to its ability to generalize in high-dimensional feature spaces. We will show that each level of syntactic information can contribute to IE tasks, and that low-level information can help to recover from errors in deep processing.
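A small Python sketch of kernel composition (illustrative; the thesis designs custom kernels over n-grams and dependency trees, which differ from the toy kernels below): since sums of valid kernels with positive weights are valid kernels, evidence from different syntactic sources can be merged into one composite kernel.

```python
import numpy as np

def ngram_kernel(s, t, n=2):
    # Word n-gram overlap: an inner product of indicator vectors, hence PSD.
    grams = lambda x: {tuple(x.split()[i:i + n])
                       for i in range(len(x.split()) - n + 1)}
    return len(grams(s) & grams(t))

def length_kernel(s, t):
    return min(len(s), len(t))       # a second, cruder source (also PSD)

def composite(s, t, a=1.0, b=0.5):
    # Positive combination of kernels is again a kernel.
    return a * ngram_kernel(s, t) + b * length_kernel(s, t)

docs = ["george was named chairman", "smith was named ceo", "stocks fell"]
K = np.array([[composite(s, t) for t in docs] for s in docs])
print(K)
print(np.all(np.linalg.eigvalsh(K) > -1e-9))    # PSD check: expect True
```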
The new approach has demonstrated state-of-the-art performance on two benchmark tasks. The first task is detecting slot fillers for management succession events (MUC-6). For this task two types of kernels were designed, a surface kernel based on word n-grams and a kernel built on sentence dependency trees; the second task is the ACE RDR evaluation, which is to recognize relations between entities in text from newswire and broadcast news transcript. For this task, five kernels were built to represent information from sentence tokenization, syntactic parsing and dependency parsing. Experimental results for the two tasks will be shown and discussed.