Title: Persistent LINDA: Design and implementation of a system to add transactions to LINDA
Candidate: Anderson, Brian
Advisor(s): Shasha, Dennis
Abstract:
Persistent Linda (PLinda hereafter) is based on the shared tuple space model of Linda. PLinda extends the model to facilitate the manipulation of sets and to implement transactional persistence. Its operations are upward compatible with Linda's. We have chosen Linda as a basis for the following reasons:
1) Much of the work on parallel algorithms is expressed in a shared memory model, so implementing a parallel algorithm is easiest in that model. At the same time, a persistent data store is most useful as a shared resource.
2) In a distributed system, the cost of sending a message is often dominated by the cost of setting up the message. By encapsulating accesses into large semantic units (i.e. Linda tuples) as opposed to machine-dependent units such as words, Linda reduces the number of data transfers, thereby reducing set-up overhead. Shared data stores are also best accessed in large chunks for the same reason.
3) Associative retrieval of tuples is a convenient abstraction to the parallel programmer and is a good target of optimization for the database implementer.
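The tuple-space operations and the associative retrieval the abstract refers to can be illustrated with a minimal in-memory sketch (a hypothetical toy: matching is simplified to wildcard fields, and PLinda's persistence and transactions are not modeled):

```python
import threading

class TupleSpace:
    """Toy Linda-style tuple space.

    out() deposits a tuple, rd() reads a matching tuple without removing
    it, and inp() removes one.  A None field in the template acts as a
    wildcard, giving the associative retrieval described above.
    """
    def __init__(self):
        self._tuples = []
        self._lock = threading.Lock()

    def out(self, tup):
        with self._lock:
            self._tuples.append(tup)

    def _match(self, template, tup):
        return len(template) == len(tup) and all(
            t is None or t == f for t, f in zip(template, tup))

    def rd(self, template):
        with self._lock:
            for tup in self._tuples:
                if self._match(template, tup):
                    return tup
        return None

    def inp(self, template):
        with self._lock:
            for i, tup in enumerate(self._tuples):
                if self._match(template, tup):
                    return self._tuples.pop(i)
        return None

ts = TupleSpace()
ts.out(("point", 3, 4))
ts.out(("point", 5, 12))
first = ts.rd(("point", 3, None))    # associative lookup, non-destructive
taken = ts.inp(("point", None, 12))  # destructive retrieval
```

Note how a whole tuple is the unit of transfer, matching point 2) above: data moves in large semantic units rather than word by word.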
Title: A Theory of Natural Learning
Candidate: Botta, Alexander
Advisor(s): Davis, Ernest
Abstract:
Unsupervised learning is based on capturing regularities in data. We formalize the vague notion of regularity, using the concept of algorithmic information (Solomonoff, Chaitin, Koppel). We present a theory of how regularities are induced and accumulated. A generative model captures a regularity if it achieves compression. A basic regularity is a building block for hierarchical structures. We prove that a basic regularity may be identified as a local maximum in compressibility. Stepwise induction is a polynomial-time approach to structures whose basic components have bounded complexity.
Agents exploring a universe engage in active learning. The regularities of their sensory-motor streams are similar to Piaget's schemes and are constituents of an induced ontology. We illustrate these ideas on three microworlds. First are Moore automata. State representations are constructed incrementally from the results of tests performed in a state and from outputs perceived on the way to that state. The second world contains loosely coupled geometric objects. They are basic regularities identifiable by stepwise induction. In the third world the agent has an elaborate eye and can move objects on a tiled surface. Statistical correlations between sets of stimuli are induced, then models are constructed to generate instances of new correlations from already known ones.
Algorithmic information theory allows a unified perspective on many areas of learning research. We define analysis as the separation of novelty in data from the already known. We present explanation-based generalization as a well-formalized instance of analysis, and constructive induction as an ill-defined instance. We show that EBG specializes a theory through positive examples, and we prove that it is a language-independent method, valid beyond predicate calculus representations.
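The link between regularity and compression can be illustrated with an off-the-shelf compressor as a crude, computable stand-in for algorithmic information (an assumption of this sketch; the thesis's formal notion, Kolmogorov-style complexity, is uncomputable):

```python
import hashlib
import zlib

def compressed_size(data: bytes) -> int:
    """Proxy for algorithmic information: length of a zlib encoding."""
    return len(zlib.compress(data, 9))

# A highly regular string: a two-symbol generative model, repeated.
regular = b"ab" * 500
# A pseudorandom string with practically no exploitable regularity.
irregular = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(32))

ratio_regular = compressed_size(regular) / len(regular)
ratio_irregular = compressed_size(irregular) / len(irregular)
# The generative model "repeat 'ab'" captures a regularity: it compresses;
# the pseudorandom string does not.
```

A generative model "captures a regularity" exactly when, as here, the encoded form is shorter than the data it generates.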
Title: Differential Properties of Eigenvalues
Author(s): Burke, J.; Overton, M.
Abstract:
We define and study a directional derivative for two functions of the spectrum of an analytic matrix-valued function: the maximum real part and the maximum modulus of the spectrum. Results are first obtained for the roots of polynomials with analytic coefficients by way of Puiseux-Newton series. In this regard, the primary analytic tool is the so-called Puiseux-Newton diagram. These results are then translated into the context of matrices. Precise results are obtained when the eigenvalues that achieve the maximum value for the function under consideration are all either nondefective or nonderogatory. In the defective, derogatory cases a general lower bound for the directional derivative is given which, in particular, describes those directions in which the directional derivative attains an infinite value.
Title: A Practical Method for Constructing Efficient LALR(k) Parsers with Automatic Error Recovery
Candidate: Charles, Phillipe
Advisor(s): Schonberg, Edmond
Abstract:
LR parsing is used for a wide range of applications, including compiler construction, automatic code generation, language-specific editors, and natural language processing. Currently, however, practical solutions have not been developed for multiple-lookahead parsing, fully automatic error recovery, and space- and time-efficient LR parsing across this wide range of applications.
We present a practical framework for LR(k) parsing, for k > 1. We give an efficient algorithm that incrementally constructs an LALR(k) parser with varying-length lookahead strings, whose symbols are consulted during parsing only when necessary.
Currently, effective LR error recovery systems require some user intervention. We describe an effective and fully automated syntactic error recovery method for LR(k) parsers. Finally, we present a generally effective method for compressing LR(k) parsing tables.
We have incorporated these innovations into a parser generator system that automatically constructs a production-quality parser with built-in error diagnostics and recovery. We will show examples of its performance on several programming languages.
Title: Statistical Techniques for Parsing Messages
Candidate: Chitrao, Mahesh
Advisor(s): Grishman, Ralph
Abstract:
Message processing is the extraction of information about key events described in brief narratives concerning a narrow domain. This is a suitable task for natural language understanding, since the amount of world knowledge required is limited. However, the messages are often ill-formed and therefore require the grammar which parses them to be quite forgiving. This often results in a proliferation of parses. This problem is compounded by one's inability to construct a complete domain model which would resolve all the semantic ambiguity. Thus, selection of the correct parse becomes an important goal for such systems.
Structural preference is a technique which aids disambiguation by assigning a higher preference to certain syntactic structures. The idea of statistical parsing evolved from the desire to prefer certain structures over others on the basis of empirical observation rather than ad hoc judgement. In the framework of statistical parsing, every production of the grammar is assigned a priority, computed from a statistical analysis of a corpus.
There are two distinct methodologies that can be used for assigning these priorities. In Supervised Training, only the correct parses are used for training the grammar. In Unsupervised Training, on the other hand, parses are used independent of their semantic validity. After the priorities have been assigned, the parser searches for parses in the best-first order dictated by these priorities.
When this scheme was incorporated into the PROTEUS message understanding system while processing OPREP (U.S. Navy Operational) messages, a two-fold advantage was observed. Firstly, the speed of the parsing increased, because rare productions tended not to get used at all. Secondly, since the parses were generated in the best-first order, the parses generated earlier on tended to be more likely and semantically more acceptable.
The performance of the modified parsing algorithm was evaluated with and without several refinements such as the use of context sensitive statistics and the use of heuristic penalties. The relative performances of the grammars trained by Supervised Training and Unsupervised Training were also compared.
Title: Randomized Parallel Algorithms for Trapezoidal Diagrams
Author(s): Clarkson, K. L.; Cole, R.; Tarjan, R. E.
Abstract:
We describe randomized parallel CREW PRAM algorithms for building trapezoidal diagrams of line segments in the plane. For general segments, we give an algorithm requiring optimal O(A + n log n) expected work and optimal O(log n) time, where A is the number of intersecting pairs of segments. If the segments form a simple chain, we give an algorithm requiring optimal O(n) expected work and O(log n log log n log* n) expected time, and a simpler algorithm requiring O(n log* n) expected work. The serial algorithm corresponding to the latter is the simplest known algorithm requiring O(n log* n) expected operations. For a set of segments forming K chains, we give an algorithm requiring O(A + n log* n + K log n) expected work and O(log n log log n log* n) expected time. The parallel time bounds require the assumption that enough processors are available, with processor allocations every log n steps.
Title: The APRAM - The Rounds Complexity Measure and the Explicit Costs of Synchronization
Author(s): Cole, R.; Zajicek, O.
Abstract:
This paper studies the explicit costs of synchronization by examining an asynchronous generalization of the PRAM model called the APRAM model. The APRAM model and its associated complexity measure, the rounds complexity, are defined and then illustrated by designing and analyzing two algorithms: a parallel summation algorithm which proceeds along an implicit complete binary tree and a recursive doubling algorithm which proceeds along a linked list. In both cases replacing global synchronization with local synchronization yields algorithms with reduced complexity.
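The summation example can be sketched as follows: each combining node waits only on its own two children (local synchronization) rather than on a global barrier per tree level. This toy uses one Python thread per internal node of the implicit complete binary tree; it illustrates the synchronization pattern, not the APRAM cost model:

```python
import threading

def tree_sum(values):
    """Sum `values` along an implicit complete binary tree.

    Node i has children 2i and 2i+1; leaves occupy indices n..2n-1.
    Each combine step waits only for its two children's events, so no
    global synchronization is needed.
    """
    n = len(values)
    assert n & (n - 1) == 0, "power-of-two input for simplicity"
    val = [0] * (2 * n)
    ready = [threading.Event() for _ in range(2 * n)]
    for i, v in enumerate(values):
        val[n + i] = v
        ready[n + i].set()          # leaves are available immediately

    def combine(node):
        left, right = 2 * node, 2 * node + 1
        ready[left].wait()          # local synchronization only:
        ready[right].wait()         # wait for this node's children
        val[node] = val[left] + val[right]
        ready[node].set()

    threads = [threading.Thread(target=combine, args=(node,))
               for node in range(1, n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return val[1]

total = tree_sum([1, 2, 3, 4, 5, 6, 7, 8])
```

A node deep in the tree can fire as soon as its own subtree is done, even if distant parts of the tree lag behind; that is the advantage over a barrier per level.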
Title: On the Detection of Robust Curves
Author(s): Cole, R.; Vishkin, U.
Abstract:
Given m points in the plane and a threshold t, a curve is defined to be robust if at least t points lie on it. Efficient algorithms for detecting robust curves are given; the key contribution is to use randomized sampling. In addition, an approximate version of the problem is introduced. A geometric solution to this problem is given; it too can be enhanced by randomization.
These algorithms are readily generalized to solve the problem of robust curve detection in a scene of curve fragments: given a set of curve segments, a curve oe is defined to be robust if curve segments of total length at least l lie on oe. Again, both an exact and an approximate version of the problem are considered.
The problems and solutions are closely related to the well-investigated Hough Transform technique.
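For the special case of lines and exact incidence, the randomized-sampling idea can be sketched in a RANSAC-like form: repeatedly sample a pair of points and report the line through them if at least t input points lie on it (a hypothetical simplification of the paper's algorithms, which handle general curves and give precise bounds):

```python
import random

def robust_lines(points, t, trials=200, eps=1e-9):
    """Randomized detection of robust lines: lines on which at least
    t of the given points lie.  Each trial samples two points and
    counts the points collinear with them (cross-product test)."""
    random.seed(0)                       # deterministic demo
    pts = list(points)
    found = set()
    for _ in range(trials):
        (x1, y1), (x2, y2) = random.sample(pts, 2)
        on_line = [(x, y) for (x, y) in pts
                   if abs((x2 - x1) * (y - y1)
                          - (y2 - y1) * (x - x1)) <= eps]
        if len(on_line) >= t:
            found.add(tuple(sorted(on_line)))
    return found

# Ten points on y = 2x + 1 plus three outliers: only one robust line.
pts = [(i, 2 * i + 1) for i in range(10)] + [(0, 5), (3, -2), (7, 4)]
lines = robust_lines(pts, t=8)
```

The key property exploited by sampling is that a robust line contributes many point pairs, so a random pair is likely to land on it; non-robust lines are rarely proposed.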
Title: An Asynchronous Parallel Algorithm for Undirected Graph Connectivity
Author(s): Cole, R.; Zajicek, O.
Abstract:
An algorithm for computing the components of an undirected graph in the (asynchronous) APRAM model is given; the algorithm uses O(n + e) processes and O(log n) rounds.
Title: The Expected Advantage of Asynchrony
Author(s): Cole, R.; Zajicek, O.
Abstract:
This paper studies the implicit costs of synchronization and the advantage that may be gained by avoiding synchronization in asynchronous environments. An asynchronous generalization of the PRAM model called the APRAM model is used and appropriate complexity measures are defined. The advantage asynchrony provides is illustrated by analyzing two algorithms: a parallel summation algorithm which proceeds along an implicit complete binary tree and a recursive doubling algorithm which proceeds along a linked list.
Title: On the satisfiability problem for unquantified classes of formulae involving set-theoretical and topological constructs
Candidate: Cutello, Vincenzo
Advisor(s): Schwartz, Jacob T.
Abstract:
In this thesis we prove the solvability of the satisfiability problem for various classes of unquantified set-theoretical formulae. In particular, we will provide satisfiability tests that given a formula as input produce a model for it, if any exists. We will also show how the decidability of certain fragments of set theory can be used to prove the solvability of the satisfiability problem for some unquantified languages involving topological notions. In particular, a list of topological statements whose validity can be checked by our algorithms is given. The underlying motivation for this study is to enrich the class of theoretical results that can be used for a set-theoretic proof verifier; we also provide lower bounds for what is undecidable in set theory and topology.
Title: Lucid Representations
Author(s): Davis, E.
Abstract:
This paper criticizes the widespread idea that knowledge bases in AI systems should be complete and that representations should be "model-like." The arguments in favor of such representations are less cogent and more ambiguous than they appear at first. Levesque's suggestion that representations should be "vivid" is extremely restrictive, particularly in its uniform imposition of a closed-world assumption. Spatial representations that are adequate for reasoning about a wide range of physical phenomena must ultimately either use complex symbolic reasoning or deal with partial and imperfect approximations. Requiring that temporal representations be fully detailed simulations will often be extremely inefficient. Finally, a flexible intelligent system must necessarily deal with partial information of all kinds, and the techniques for carrying out reasoning about partial information using complete representations are very limited in their application.
Title: The Kinematics of Cutting Solid Objects
Author(s): Davis, E.
Abstract:
This paper studies how the cutting of one solid object by another can be described in a formal theory. We present two alternative first-order representations for this domain. The first views an object as gradually changing its shape until it is split, at which time the original object ceases to exist and two (or more) new objects come into existence. The second focusses instead on chunks of material which are part of the overall object. A chunk persists with constant shape until some piece of it is cut away, when the chunk ceases to exist. We prove that the two theories are equivalent under ordinary circumstances, and we show that they are sufficient to support some simple commonsense inferences and algorithms.
Title: Additive Schwarz Methods for Elliptic Finite Element Problems in Three Dimensions
Author(s): Dryja, M.; Widlund, O.
Abstract:
Many domain decomposition algorithms and certain multigrid methods can be described and analyzed as additive Schwarz methods. When designing and analyzing domain decomposition methods, we encounter special difficulties in the case of three dimensions and if the coefficients are discontinuous and vary over a large range. In this paper, we first introduce a general framework for Schwarz methods. Three classes of applications are then considered: certain wire basket based iterative substructuring methods, Neumann-Neumann algorithms with low dimensional, global subspaces and a modified form of a multilevel algorithm introduced by Bramble, Pasciak and Xu.
Title: Efficient Algorithms for Cyclic Scheduling
Author(s): Gasperoni, F.; Schwiegelshohn, U.
Abstract:
This work addresses the problem of non-preemptively scheduling a cyclic set of interdependent operations, representing for instance a program loop, when p identical processors are available. For p = 1 we give a simple, efficient, polynomial-time algorithm producing optimum results. When p > 1 the problem becomes NP-hard, and a slight modification of our algorithm generates provably close-to-optimum results.
We consider real-time systems in which the value of a task is proportional to its computation time. The system obtains the value of a given task if the task completes by its deadline. Otherwise, the system obtains no value for the task.
When such a system is underloaded (i.e. there exists a schedule for which all tasks meet their deadlines), Dertouzos showed that the earliest deadline first algorithm will achieve 100% of the possible value. We consider the case of a possibly overloaded system and present an algorithm which: 1. behaves like the earliest deadline first algorithm when the system is underloaded. 2. obtains at least 1/4 of the maximum value that an optimal clairvoyant algorithm could obtain even when the system is overloaded.
We implement our algorithm with an amortized cost of O(log n) time per task, where n bounds the number of tasks in the system at any instant.
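A plain earliest-deadline-first scheduler under this value model can be sketched with a unit-time simulation (the 1/4-competitive overload mechanism described above is not reproduced; task aborting and the amortized O(log n) structures are omitted):

```python
import heapq

def edf_value(tasks):
    """Run earliest-deadline-first on one processor in unit time steps.

    tasks: list of (release, computation, deadline) with integer times.
    Returns the total value obtained, where a task's value equals its
    computation time and is earned only if it completes by its deadline.
    """
    tasks = sorted(tasks)                 # by release time
    time, value, i = 0, 0, 0
    pending = []                          # heap of (deadline, remaining, total)
    while i < len(tasks) or pending:
        if not pending and tasks[i][0] > time:
            time = tasks[i][0]            # idle until the next release
        while i < len(tasks) and tasks[i][0] <= time:
            release, comp, deadline = tasks[i]
            heapq.heappush(pending, (deadline, comp, comp))
            i += 1
        deadline, rem, total = heapq.heappop(pending)
        time += 1                         # run the earliest-deadline task
        if rem > 1:
            heapq.heappush(pending, (deadline, rem - 1, total))
        elif time <= deadline:
            value += total                # completed in time: earn its value
    return value
```

In the underloaded case every task meets its deadline and the returned value is the sum of all computation times, matching Dertouzos's 100% result; under overload this naive version can waste time on tasks that will miss their deadlines, which is precisely what the abstract's algorithm guards against.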
Title: Scheduling for Horizontal Systems: The VLIW Paradigm in Perspective
Candidate: Gasperoni, Franco
Advisor(s): Schonberg, Edmond
Abstract:
This work focuses on the automatic extraction of operation-level parallelism from programs originally intended to be sequential. Optimality issues in the framework of very long instruction word (VLIW) architectures and compilers are investigated. Possible advantages of an idealized dynamic scheduler over a purely static one are also explored. More specifically, the model and the results of scheduling theory are extended to account for the cyclicity and branching capabilities present in sequential programs. The existence of inherent bottlenecks in the VLIW paradigm is substantiated, and the advantage of dynamic over static scheduling is demonstrated for certain types of loops. A novel technique for efficient parallelization of straight-line loops is presented. A simple scheduling heuristic for arbitrary programs is proven to perform between a constant and a logarithmic factor from appropriately defined optimality criteria. Finally, we prove the existence of loops containing branches for which no parallel program can achieve time-optimal performance on VLIWs with unlimited resources. The overall aim of the thesis is to identify the family of sequential programs for which the VLIW model of parallel computation is viable.
Title: On Shape Optimizing the Ratio of the First Two Eigenvalues of the Laplacian
Author(s): Haeberly, J.
Abstract:
We investigate numerically a 1956 conjecture of Payne, Polya, and Weinberger. The conjecture asserts that the ratio of the first two eigenvalues of the Laplacian on a bounded domain Ω of the plane with Dirichlet boundary conditions reaches its minimum value precisely when Ω is a disk. A crucial feature of this problem is the loss of smoothness of the objective function at the solution. The following results form the core of our numerical treatment. First, we construct finite dimensional families of deformations of a disk equipped with a uniform triangulation. This permits the formulation of a discrete model of the problem via finite element techniques. Second, we build on the work of M. Overton to derive optimality conditions in terms of Clarke's generalized gradients for nonsmooth functions. These ideas are then combined into an algorithm and implemented in Fortran.
Title: Programming with Structures, Functions, and Objects
Author(s): Henglein, F.; Laufer, K.
Abstract:
We describe program structuring mechanisms for integrating algebraic, functional and object-oriented programming in a single framework. Our language is a statically typed higher-order language with specifications, structures, types, and values, and with universal and existential abstraction over structures, types, and values.
We show that existential types over structures generalize both the necessarily homogeneous type classes of Haskell and the necessarily heterogeneous object classes of object-oriented programming languages such as C++ or Eiffel. Following recent work on ML, we provide separate linguistic mechanisms for reusing specifications and structures. Subtyping is provided in the form of explicit type conversions.
The language mechanisms are introduced by examples to emphasize their pragmatic aspects. We compare them with the mechanisms of XML+, Haskell and Eiffel and give a type-theoretic perspective. These mechanisms have been developed within a larger, ongoing prototyping language design project.
Title: Efficient Loop-Level Parallelism in Ada
Author(s): Hind, M.
Abstract:
Parallelism in scientific applications can most often be found at the loop level. Although Ada supports parallelism via the task construct, its coarseness renders it unsuitable for this light-weight parallelism. In this work, we propose Ada constructs to achieve efficient loop-level parallelism in ANSI-Ada. This is accomplished in two steps. First, we present an idiom that allows the specification of light-weight tasks. Second, we give an implementation of this idiom that is considerably more efficient than a standard Ada task.
In addition, we present an idiom that makes the fetch and add synchronization primitive available at the Ada level. Our implementation of this idiom is more efficient in both time and space than previous results. In addition to providing universal synchronization, using fetch and add simplifies program analysis (e.g. proving the absence of race conditions in the implementation of a parallel algorithm). Since all these idioms are written in standard Ada, they maintain the portability that is central to the mandated uses of the language.
Title: Efficient Loop-Level Parallelism in Ada
Candidate: Hind, Michael
Advisor(s): Schonberg, Edmond
Abstract:
Parallelism in scientific applications can most often be found at the loop level. Although Ada supports parallelism via the task construct, its coarseness renders it unsuitable for this light-weight parallelism. In this work, we propose Ada constructs to achieve efficient loop-level parallelism in ANSI-Ada. This is accomplished in two steps. First, we present an idiom that allows the specification of light-weight tasks. Second, we give an implementation of this idiom (for a variety of shared memory machines) that is considerably more efficient than a standard Ada task.
In addition, we present an idiom that makes the fetch_and_add synchronization primitive available at the Ada level. Our implementation of this idiom is more efficient in both time and space than previous results. In addition to providing universal synchronization, using fetch_and_add simplifies program analysis (e.g. proving the absence of race conditions in the implementation of a parallel algorithm). Since all these idioms are written in standard Ada, they maintain the portability that is central to the mandated uses of the language.
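The self-scheduling style of loop that fetch_and_add enables can be sketched as follows: each worker atomically claims the next undone iteration, so no two workers duplicate work and no barrier is needed. The FetchAndAdd class below is a lock-based software stand-in for the hardware primitive, and all names are illustrative rather than the thesis's Ada idiom:

```python
import threading

class FetchAndAdd:
    """Software model of fetch_and_add: atomically return the old
    value and add a delta."""
    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, delta=1):
        with self._lock:
            old = self.value
            self.value += delta
            return old

def parallel_loop(n, body, workers=4):
    """Self-scheduling loop: workers repeatedly claim the next
    iteration index via fetch_and_add until all n are done."""
    counter = FetchAndAdd(0)

    def worker():
        while True:
            i = counter.fetch_and_add(1)   # claim one iteration atomically
            if i >= n:
                break
            body(i)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

results = [0] * 100
parallel_loop(100, lambda i: results.__setitem__(i, i * i))
```

Because every index is handed out exactly once, this also illustrates the race-freedom argument mentioned above: the analysis reduces to reasoning about a single atomic counter.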
Title: Segmentation and Surface-Based Modeling of Objects in Three-Dimensional Biomedical Images
Candidate: Kalvin, Alan
Advisor(s): Hummel, Robert
Abstract:
The rapid development of technologies for imaging the human body has led to a growing interest in the extraction and analysis of objects in 3D biomedical images for applications in fields such as clinical medicine, biomedical research, and physical anthropology.
This dissertation examines the problem of creating surface-based geometric models of biomedical objects that are suitable for analysis through visualization, mensuration, and manipulation. This is a two-stage problem. First the objects are identified by segmenting the 3D image into regions of interest, and then surface-based models of the objects are created.
We discuss the issues of segmentation and surface construction and introduce the following new methods for solving these problems.
First, we present the MLO algorithm, a general-purpose, domain-independent segmentation algorithm that has been applied successfully to identify skulls in CT images, the ventricle walls of the heart in MR images, brain ventricles in CT images, and carotid arteries in MR angiography images. It uses an iterative, cooperative procedure to segment an image by optimizing a cost function. To achieve a fast segmentation, a coarse-to-fine strategy is employed, using a multiresolution pyramid.
The GRG algorithm is a model-driven, special-purpose algorithm for identifying thin bone in CT head images. The algorithm, developed specifically for craniofacial surgical planning, uses anatomical knowledge in the segmentation process and can handle the abnormal anatomy of craniofacial patients. It successfully finds most of the thin bone that cannot be found using previous methods.
ALLIGATOR is a surface construction algorithm that creates models using the ``winged-edge'' data structure of Baumgart, enabling efficient access to the topological and geometric information of the surfaces, and permitting efficient, topologically consistent modifications to the representations.
Unlike previous surface construction algorithms, ALLIGATOR is suitable not just for visualizing biomedical objects, but for measuring and manipulating them as well. Another important feature of ALLIGATOR is that it uses an adaptive face-merging process to create surface models that are significantly more concise, in terms of vertices, edges, and faces, than the models produced by other surface construction algorithms.
Title: The Development of Parallel Image Algorithms by Prototyping
Candidate: Kelly, Robert
Advisor(s): Hummel, Robert
Abstract:
We examine the process of parallel algorithm development for a class of image synthesis and image processing problems. Algorithms are developed for a class of parallel machines characterized by shared memory multiprocessors, such as is exhibited by the Ultracomputer model. The new algorithms are asynchronous in nature, and many employ the ``pool of tasks'' paradigm. These algorithms are prototyped using the sequential specification language SETL that has been adapted to function as a parallel specification tool. The issue of refinement of the high-level specification is illustrated with a number of examples of machine-specific implementations.
Parallel algorithms are proposed for the connected components problem and for hidden surface removal in surface rendering, and parallel algorithms for ray tracing are discussed. Within the investigation of connected components algorithms, new algorithms are suggested for four classes of approaches to the problem: (1) adjacency matrix methods, (2) pointer graph methods based on the vertex collapse algorithm of Hirschberg, (3) pointer graph methods based on the Shiloach/Vishkin connected components algorithm, and (4) image scan algorithms, based on the sequential raster-scan ``blob coloring'' algorithm. For the third area, the Shiloach/Vishkin-type connected components algorithm, we show how a stronger model of computation (one that permits constant-time concurrent additive writes) allows the elimination of one of the steps of the algorithm. Although this modification does not improve the asymptotic time complexity of the algorithm, the MIMD version of the Shiloach/Vishkin algorithm is considerably simplified, contains fewer synchronization points, and has improved expected execution-time performance.
All algorithms are given in the parallel-adapted SETL language. The final versions of all proposed parallel connected components algorithms are further refined into EPEX/Fortran, suitable for execution on an RP3 simulator system. Empirical results are obtained for various algorithms by instrumenting either the SETL version or the EPEX/Fortran version, thereby providing estimates of expected performance times by means of examining the average lengths of the queues of tasks. In particular, queue activity patterns are examined for executions of the parallel adjacency matrix connected components algorithm and the MIMD version of the Shiloach/Vishkin connected components algorithm. For the latter, run-time performance estimates are made demonstrating the utility of the modifications made to the MIMD version of the algorithm. For the image scan algorithms, estimates are obtained comparing the sizes of the subimages assigned to processors against the sizes of the reduced-graph connected components problems that result, based on runs of the EPEX/Fortran version. Finally, the shared-memory access patterns of the parallel ray-tracing algorithm are examined, suggesting that the algorithm is viable in terms of memory contention rates.
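The sequential raster-scan ``blob coloring'' baseline mentioned above can be sketched with union-find (a standard formulation used here for illustration; the dissertation's parallel variants build on this idea):

```python
def blob_color(image):
    """Sequential raster-scan connected components of a binary image.

    image: list of rows of 0/1 values.  Returns a grid in which every
    4-connected group of 1-pixels shares one label (its root pixel).
    Uses union-find with path halving.
    """
    rows, cols = len(image), len(image[0])
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    labels = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            parent[(r, c)] = (r, c)
            # merge with already-scanned neighbors (above and to the left)
            if r > 0 and image[r - 1][c]:
                union((r - 1, c), (r, c))
            if c > 0 and image[r][c - 1]:
                union((r, c - 1), (r, c))
    for r in range(rows):
        for c in range(cols):
            if image[r][c]:
                labels[r][c] = find((r, c))
    return labels

img = [[1, 1, 0, 1],
       [0, 1, 0, 1],
       [0, 0, 0, 1]]
lab = blob_color(img)
```

The image-scan parallel approach in the text partitions the image into subimages, runs this kind of scan locally, and then solves a reduced connected components problem on the inter-subimage adjacencies.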
Title: Semantically Based Concurrent Data Structure Algorithms
Candidate: Lanin, Vladimir
Advisor(s): Shasha, Dennis
Abstract:
A computational environment is called concurrent when it allows several threads of sequential control, or processes, to overlap in time and to communicate with each other. Such an environment is called synchronous when the length of time it takes any process to execute any sequence of steps can be determined in advance. When such a calculation is impossible (at least to the precision required), the environment is called asynchronous.
Algorithms designed to work in the asynchronous concurrent environment have appeared in the literature for such data structures as B-trees, hash tables, and queues. The most common standard of correctness for a concurrent algorithm is serializability, which requires that the effects of a concurrent computation be equivalent to some serial composition of the same actions. However, several notions of ``equivalence'' exist, depending on whether they take into account the semantics of the data structure, or only the syntax of the computation.
We examine the drawbacks and advantages of several correctness standards, and identify a particular standard to be of general utility. Furthermore, we formalize the notion of decisive operations, and show how it can be applied to greatly simplify semantic serializability proofs.
We apply the concepts of syntactic and semantic serializability to the development of several novel algorithms, including an extension of the tree protocol to changing trees, a highly concurrent B-tree algorithm, and a wait-free set manipulation algorithm. Useful techniques appearing in the design are identified, and the correctness proofs serve as examples of the techniques previously described.
Title: On the Optimization of Term Rewriting
Candidate: Li, Ke
Advisor(s): Kedem, Zvi
Abstract:
Term rewriting systems (TRSs) are widely applied in automated theorem proving, equational languages, logic programming, specification of software and hardware, and other symbolic computations. An important computation procedure in applications of TRSs is to reduce a term to its normal form. The research of this thesis examines the complexity of normal form computation and explores efficient rewriting strategies for it.
A rewriting strategy is said to be optimal if, for any term, it always generates the shortest derivation when computing the term's normal form. First, we prove that a universal optimal rewriting strategy for arbitrary canonical TRSs does not exist unless NP = P. We prove the same result for AC TRSs, in which some functions are associative and commutative.
To find efficient rewriting strategies, we divide TRSs into three categories: variable-more, variable-equal, and variable-fewer, and propose optimal rewriting strategies for the first two categories and approximate strategies for the last.
We have done experiments with RRL (Rewrite Rule Laboratory), an automated theorem prover with term rewriting as the basic inference rule. The experimental results confirm our theoretical results. Based on our theoretical and experimental results, we improve RRL by implementing new programs that automatically choose an efficient rewriting strategy for a given term rewriting system.
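Normal-form computation under a fixed strategy can be sketched on a toy TRS. The sketch below applies a leftmost-innermost strategy to Peano addition; the term representation and rule encoding are illustrative, not RRL's:

```python
def rewrite_once(term, rules):
    """One leftmost-innermost rewriting step, or None if `term` is in
    normal form.  Terms are nested tuples ('f', arg, ...) or strings;
    each rule is a function mapping a redex to its reduct, or None."""
    if isinstance(term, tuple):
        # try to rewrite inside subterms first (innermost), left to right
        for i, sub in enumerate(term):
            new = rewrite_once(sub, rules)
            if new is not None:
                return term[:i] + (new,) + term[i + 1:]
    for rule in rules:
        reduct = rule(term)
        if reduct is not None:
            return reduct
    return None

def normal_form(term, rules):
    """Repeatedly rewrite until no rule applies.  The choice of which
    redex to contract at each step is exactly what a rewriting
    strategy fixes, and what the thesis seeks to optimize."""
    while True:
        nxt = rewrite_once(term, rules)
        if nxt is None:
            return term
        term = nxt

# Peano addition:  add(0, y) -> y      add(s(x), y) -> s(add(x, y))
rules = [
    lambda t: t[2]
        if isinstance(t, tuple) and t[0] == 'add' and t[1] == '0' else None,
    lambda t: ('s', ('add', t[1][1], t[2]))
        if isinstance(t, tuple) and t[0] == 'add'
        and isinstance(t[1], tuple) and t[1][0] == 's' else None,
]
two = ('s', ('s', '0'))
three = ('s', two)
five = normal_form(('add', two, three), rules)
```

For this confluent, terminating system every strategy reaches the same normal form; the derivation length, which the thesis's optimal and approximate strategies minimize, is what varies from strategy to strategy in general.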
Title: The Design and Implementation of ALLOY, a Higher Level Parallel Programming Language
Candidate: Mitsolides, Thanasis
Advisor(s): Harrison, Malcolm C.
Abstract:
The goal of this thesis is to show that it is possible to define a parallel higher-level programming language for programming in the large which can easily express both complicated parallel problems and traditional serial ones. Such a language would provide many good features of serial and parallel programming languages and be appropriate for programming massively parallel computing systems. To demonstrate this, a simple language called ALLOY was designed. The main features of this language could be incorporated into other languages.
ALLOY directly supports functional, object-oriented, and logic programming styles in a unified and controlled framework. Evaluation modes support serial or parallel execution, eager or lazy evaluation, and non-determinism or multiple solutions. These modes can be combined freely. ALLOY is simple, utilizing only 29 primitives, half of which are for object-oriented programming.
The power of ALLOY is demonstrated through a wide variety of examples: a) partition sort and an FP library, demonstrating clarity, efficiency, and simple parallelism; b) prime numbers and buffering, demonstrating the ability to select between eager and lazy evaluation; c) systolic sort and merge sort, demonstrating dynamic networks of communicating processes; d) N-queens and list permutations, demonstrating serial and parallel searching. A library is given for programming in logic programming styles. Finally, a number of parallel objects demonstrate ALLOY's ability to exploit massively parallel architectures effectively.
An interpreter for ALLOY, together with a number of utilities and a programming environment, has been written in Common Lisp. The system is available via anonymous ftp. It is shown that ALLOY can have a reasonably efficient implementation on shared-memory multiprocessor (MIMD) systems supporting highly parallel operations, on distributed architectures, and possibly on dataflow architectures as well.
Title: Decomposition and Fictitious Domains Methods for Elliptic Boundary Value Problems
Author(s): Nepomnyaschikh, S.
Abstract:
Boundary value problems for elliptic second order equations in three-dimensional domains with piecewise smooth boundaries are considered. Discretization of the problem is performed using a conventional version of the finite element method with piecewise linear basis functions. The main purpose of the paper is the construction of a preconditioning operator for the resulting system of grid equations. The method is based on two approaches: decomposition of the domain into subdomains and using a new version of the method of fictitious domains. The rate of convergence of the corresponding conjugate gradient method is independent of both the grid size and the number of subdomains.
Title: Optimality Conditions and Duality Theory for Minimizing Sums of the Largest Eigenvalues of Symmetric Matrices
Author(s): Overton, M.; Womersley, R.
Abstract:
This paper gives max characterizations for the sum of the largest eigenvalues of a symmetric matrix. The elements which achieve the maximum provide a concise characterization of the generalized gradient of the eigenvalue sum in terms of a dual matrix. The dual matrix provides the information required to either verify first-order optimality conditions at a point or to generate a descent direction for the eigenvalue sum from that point, splitting a multiple eigenvalue if necessary. A model minimization algorithm is outlined, and connections with the classical literature on sums of eigenvalues are explained. Sums of the largest eigenvalues in absolute value are also addressed.
Keywords: symmetric matrix, maximum eigenvalues, spectral radius, minimax problem, max characterization, generalized gradient.
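For context, the prototypical max characterization of this kind is Ky Fan's classical theorem (standard material, stated here for reference rather than in the paper's own formulation): for a symmetric $n \times n$ matrix $A$ with eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$,

$$\lambda_1 + \cdots + \lambda_k = \max \{ {\rm tr}(U^T A U) : U \in {\bf R}^{n \times k},\ U^T U = I_k \},$$

with the maximum attained when the columns of $U$ span an invariant subspace associated with the $k$ largest eigenvalues. The elements achieving the maximum are what give rise to a dual characterization of the eigenvalue sum.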
Title: Semantic program analyses for storage management optimizations in functional language implementations
Candidate: Park, Young G.
Advisor(s): Goldberg, Benjamin
Abstract:
One of the major overheads in implementing functional languages in both uniprocessor and multiprocessor environments is the storage management overhead due to dynamic allocation and automatic reclamation of indefinite-extent storage. We investigate compiler optimizations that reduce such overhead by statically inferring lifetime information about dynamically allocated objects.
We have developed a set of compile-time semantic analyses for a higher-order monomorphic strict functional language, based on denotational semantics and abstract interpretation.
These analyses extend to both polymorphic and non-strict (normal-order or lazy evaluation) languages.
Using statically inferred escape information, we have proposed a variety of storage management optimization techniques including stack allocation, explicit reclamation, in-place reuse, reference counting elimination, block allocation/reclamation, and improving generational garbage collection.
Title: An Additive Schwarz Method for the P-Version Finite Element Method
Author(s): Pavarino, L.
Abstract:
The additive Schwarz method was originally proposed for the h-version finite element method for elliptic problems. In this paper, we apply it to the p-version, in which increased accuracy is achieved by increasing the degree of the elements while the mesh is fixed. We obtain a constant bound, independent of p, for the condition number of the iteration operator in two and three dimensions. The result holds for linear, self-adjoint, second order elliptic problems and for quadrilateral elements.
Title: Counting Real Zeros
Candidate: Pedersen, Paul
Advisor(s): Mishra, Bud
Abstract:
This thesis presents an $n$-dimensional generalization of Hermite's theorem for counting the real roots of a polynomial using quadratic forms. We solve the problem of counting the number of real solutions of a system of polynomial equations within an algebraic polyhedron in $n$-dimensional space, where the polynomials are taken to have rational coefficients.
Our algorithm is purely symbolic, which means that it may be used to implement infinite-precision algorithms for arithmetic in the real-algebraic subset of the real numbers. We present algorithms for doing this as an application of the general theory.
Our algorithms are based on resultant theory, both because this theory provides insights into the algorithms, and because it makes possible a comparatively clear complexity analysis which shows the algorithms to be worst-case optimal, i.e., singly exponential in the degree of the polynomials.
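The classical one-variable case that the thesis generalizes can be sketched concretely: by Hermite's theorem, the number of distinct real roots of a rational polynomial equals the signature of the Hankel matrix of Newton power sums of its roots. The sketch below assumes a monic polynomial with nonzero leading principal minors (so Jacobi's sign rule yields the signature); the thesis handles the general situation.

```python
from fractions import Fraction

def power_sums(coeffs, m):
    """coeffs = [a0, ..., a_{n-1}] of the monic polynomial
    x^n + a_{n-1} x^{n-1} + ... + a0.  Returns power sums s_0..s_{m-1}
    of the roots, via Newton's identities."""
    n = len(coeffs)
    a = [Fraction(c) for c in coeffs]           # a[i] multiplies x^i
    s = [Fraction(n)]
    for k in range(1, m):
        if k <= n:
            acc = sum((a[n - i] * s[k - i] for i in range(1, k)), Fraction(0))
            acc += k * a[n - k]
        else:
            acc = sum((a[n - i] * s[k - i] for i in range(1, n + 1)), Fraction(0))
        s.append(-acc)
    return s

def det(mat):
    """Exact determinant over the rationals, by Gaussian elimination."""
    m = [row[:] for row in mat]
    n, sign, d = len(m), 1, Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if m[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            m[col], m[piv] = m[piv], m[col]
            sign = -sign
        d *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return sign * d

def count_real_roots(coeffs):
    """Distinct real roots = signature of the Hankel matrix H[i][j] = s_{i+j},
    computed here by Jacobi's rule on the leading principal minors."""
    n = len(coeffs)
    s = power_sums(coeffs, 2 * n - 1)
    H = [[s[i + j] for j in range(n)] for i in range(n)]
    minors = [det([row[:k] for row in H[:k]]) for k in range(1, n + 1)]
    seq = [Fraction(1)] + minors
    changes = sum(1 for x, y in zip(seq, seq[1:]) if x * y < 0)
    return n - 2 * changes                       # signature of H

print(count_real_roots([0, -1, 0]))   # x^3 - x: 3 distinct real roots
print(count_real_roots([1, 0]))       # x^2 + 1: 0
```

Everything runs in exact rational arithmetic, which is the "purely symbolic" character the abstract emphasizes.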
Title: Combinatorial and algorithmic analysis of stabbing and visibility problems in three-dimensional space
Candidate: Pellegrini, Marco
Advisor(s): Pollack, Richard
Abstract:
Given a set $T$ of triangles in 3-space, with $|T| = n$, let ${\cal S}(T)$ be the set of all lines stabbing the set $T$. The combinatorial descriptive complexity of ${\cal S}(T)$ is denoted by $\#{\cal S}(T)$. The following questions about ${\cal S}(T)$ are considered in this thesis: (a) answer the query "given a line $l$, is $l \in {\cal S}(T)$?" (query problem); (b) decide whether ${\cal S}(T) \ne \emptyset$ (existence problem); (c) give upper and lower bounds on $\#{\cal S}(T)$. The following results are shown: (1) There is an $\Omega(n^3)$ lower bound for $\#{\cal S}(T)$. Also, ${\cal S}(T)$ may have $\Omega(n^2)$ connected components. (2) There is an $O(n^{3+\epsilon})$ upper bound on $\#{\cal S}(T)$. Within the same time bound it is possible to solve the existence problem. (3) The existence problem for triangles on a set of planes with $g$ different plane inclinations can be solved in $O(g^2 n^2 \log n)$ time. (4) The query problem is solvable with $O(n^{2+\epsilon})$ preprocessing and storage and $O(\log n)$ query time. (5) The results (1), (2), (3) and (4) extend, with the same asymptotic bounds, to sets of convex polyhedra with total complexity $n$.
Given a set $T$ of $n$ disjoint triangles, the ray shooting problem for $T$ is the following: preprocess $T$ so as to be able to answer queries of the form "given a ray $\rho$, does $\rho$ hit any triangle in $T$?". The following results are shown: (1) Using $O(n^{3+\epsilon})$ randomized preprocessing time and storage, we can answer ray-shooting queries in $O(\sqrt{n}\log^2 n)$ worst-case query time. (2) If we are given $m > n^{7/5}$ rays and $n$ disjoint triangles, we can answer all the ray-shooting queries in $O(m^{5/6-\delta} n^{5/6+5\delta}\log n + m\log n + n\log m)$ randomized expected time and $O(m+n)$ space, for every $\delta > 0$. The multiplicative constants depend on $\delta$. (3) Given $m$ rays and $n$ axis-oriented boxes, we can answer ray-shooting queries in randomized expected time $O(m^{3/4-\delta} n^{3/4+3\delta}\log^3 n + m\log^3 n + n\log m)$ and $O(m+n)$ space, for $1/28 < \delta < 1/9$. The multiplicative constants depend on $\delta$.
Title: Properties of Convex Polytopes
Candidate: Prabhu, N.
Advisor(s): Pollack, Richard
Abstract:
The thesis presents some results about the boundary complexes of convex polytopes. For every $n > c d\sqrt{d}$ ($c$ a constant) one can construct a simple $d$-polytope with $n$ vertices. In fact, for all $n > c d\sqrt{d}$, we construct a Hamiltonian simple $d$-polytope with $n$ vertices. The Hamiltonicity of the constructed polytopes improves a result of Victor Klee.
Title: Amortized Complexity of Data Structures
Candidate: Sundar, Rajamani
Advisor(s): Boppana, Ravi
Abstract:
This thesis investigates the amortized complexity of several fundamental data structure problems and introduces interesting ideas for proving lower bounds on amortized complexity and for performing amortized analysis.
Title: Performance Evaluation of Solutions to the TLB Consistency Problem
Candidate: Teller, Patricia
Advisor(s): Gottlieb, Allan
Abstract:
To implement virtual memory efficiently, virtual-to-physical address translation information is stored in page tables and cached in translation-lookaside buffers (TLBs). In multiprocessors with multiple TLBs, page-table modifications can result in outdated TLB entries, the use of which can cause erroneous memory accesses.
We propose three new solutions to this TLB consistency problem which, unlike existing solutions for highly parallel shared-memory multiprocessors, do not require interprocessor synchronization and communication, do not interrupt processor execution, and do not introduce unnecessary serialization.
The cost of each of our solutions is embodied in the cost of TLB reloads, which load translation information for referenced pages into TLBs. Two assume TLBs at processors and one assumes TLBs at memory. We study their performance in scalable multiprocessor architectures via a trace-driven simulation system capable of simulating a range of systems using just one address trace.
Our results show that system performance improves if TLBs are located at memory, rather than processors, provided that memory is organized as multiple paging arenas, where the mapping of pages to arenas is fixed.
A class of parallel workloads can produce a number of TLB reloads, R, that grows linearly with the number of processors, N. A set of our simulations for processor-based TLBs validates this model.
A processor-based TLB reload costs O(log N) because of network transit. Thus, the management of processor-based TLBs, whether or not it ensures consistency, incurs an overhead that grows as N log N.
The cost of a memory-based TLB reload within a paging arena can be made smaller than that of a processor-based TLB, since additional network transits are not required.
Simulation results show that when there is only one paging arena, memory-based TLBs exhibit generally larger miss rates than processor-based TLBs, and the related overhead is generally larger. When there are two paging arenas, memory-based TLBs produce smaller miss rates than processor-based TLBs of equal size, and the related overhead is generally smaller. To maintain low overhead for large machines, the number of paging arenas likely must grow as O(N).
Title: Applications and Analysis of Probabilistic Techniques
Candidate: Tetali, Prasad
Advisor(s): Spencer, Joel
Abstract:
The thesis illustrates the strength of randomness by applying some recent probabilistic techniques to solve problems in number theory, graph theory and computer science.
The first part of the thesis is concerned with random construction of integer sequences with certain additive properties. A set of natural numbers is called an asymptotic basis of order k if every sufficiently large number can be expressed as a sum of k distinct numbers from the set. We prove that for every fixed k, there exists an asymptotic basis of order k such that the number of representations of $n$ is $\Theta(\log n)$. The case $k = 2$ was proved in 1956 by Paul Erdős.
The second part deals with the analysis of random walks on graphs. Random walks on graphs are known to have interesting analogies in electrical networks. This thesis provides a precise characterization of effective resistance in electrical networks in terms of random walks on the underlying graphs. This interpretation of effective resistance yields interesting new results and new proofs of some known results. The main result is an exact formula for the hitting time between two vertices in terms of the effective resistances in the network, settling an open question. This is much in the spirit of the commute time result of Chandra et al.
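The electrical analogy can be checked numerically on a toy graph. The sketch below is an illustration, not the thesis's construction: it verifies the Chandra et al. commute-time identity, commute time $= 2mR(u,v)$, on a three-vertex path, computing effective resistance from the grounded Laplacian and hitting times from their defining linear system.

```python
from fractions import Fraction

def solve(A, b):
    """Exact Gauss-Jordan elimination: solve A x = b over the rationals."""
    n = len(A)
    M = [[Fraction(A[i][j]) for j in range(n)] + [Fraction(b[i])]
         for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def effective_resistance(n, edges, u, v):
    """Unit resistors on edges: inject unit current at u, ground v;
    the resulting potential at u is R(u, v)."""
    L = [[0] * n for _ in range(n)]              # graph Laplacian
    for a, b in edges:
        L[a][a] += 1; L[b][b] += 1
        L[a][b] -= 1; L[b][a] -= 1
    keep = [i for i in range(n) if i != v]       # ground node v
    A = [[L[i][j] for j in keep] for i in keep]
    x = solve(A, [1 if i == u else 0 for i in keep])
    return x[keep.index(u)]

def hitting_times_to(n, edges, target):
    """Expected steps H(i, target) of the simple random walk, from the
    linear system H(i) = 1 + average of H over i's neighbors."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b); adj[b].append(a)
    keep = [i for i in range(n) if i != target]
    A = [[(len(adj[i]) if i == j else 0) - adj[i].count(j)
          for j in keep] for i in keep]
    x = solve(A, [len(adj[i]) for i in keep])
    H = {target: Fraction(0)}
    H.update(dict(zip(keep, x)))
    return H

edges = [(0, 1), (1, 2)]                         # a path on three vertices
m = len(edges)
R = effective_resistance(3, edges, 0, 2)         # two unit resistors in series
commute = hitting_times_to(3, edges, 2)[0] + hitting_times_to(3, edges, 0)[2]
print(R, commute, 2 * m * R)                     # 2 8 8
```

Here $H(0,2) = H(2,0) = 4$, so the commute time 8 matches $2mR = 2 \cdot 2 \cdot 2$.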
Title: Resilient Computations in the Presence of Slow-Downs
Candidate: Turek, John
Advisor(s): Shasha, Dennis; Cole, Richard
Abstract:
With the advent of low-cost workstations, distributed systems are becoming increasingly attractive. However, as the number of components in a system increases, so does the probability of some component failing. When system designers discuss fault tolerance, they typically restrict themselves to the problem of handling fail-stop failures. This work proposes an enhanced failure model that allows processes to fail by either slowing down or stopping; slow processes may later speed up, continue to proceed slowly, or eventually stop. We call such failures slow-downs. The model does not assume the ability to distinguish among these possibilities, say, by using a timeout mechanism, nor does it assume that it is possible to kill a slow process.
This thesis presents several results in this context. We discuss how to execute transactions under the slow-down model when the correctness criterion is serializability. We then discuss how to transform a class of lock-based concurrent data structures into nonblocking data structures. Both results are developed in the context of a shared-memory machine having an atomic compare&swap.
We conclude this thesis by giving algorithms that can be used to emulate a reliable shared memory with compare&swap on a message passing system prone to slow-downs.
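The flavor of a nonblocking structure built on compare&swap can be suggested in miniature. The sketch below is not the thesis's transformation: it is a standard Treiber-style stack whose only synchronization is a CAS, here simulated in Python with a lock standing in for the atomic hardware primitive. A process that slows down mid-operation cannot block others; its stale CAS simply fails and the operation retries.

```python
import threading

class CASCell:
    """Holds one reference; cas() simulates the atomic compare&swap
    primitive (the internal lock models hardware atomicity only)."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def cas(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

class TreiberStack:
    """Nonblocking stack: nodes are (item, rest) pairs linked from top."""
    def __init__(self):
        self.top = CASCell(None)

    def push(self, item):
        while True:                  # retry loop: never blocks, only retries
            old = self.top.load()
            if self.top.cas(old, (item, old)):
                return

    def pop(self):
        while True:
            old = self.top.load()
            if old is None:
                return None          # empty stack
            item, rest = old
            if self.top.cas(old, rest):
                return item

s = TreiberStack()
for x in (1, 2, 3):
    s.push(x)
print(s.pop(), s.pop(), s.pop())     # 3 2 1 (LIFO order)
```

A delayed process holding no lock leaves the structure fully usable, which is exactly the property the slow-down model demands.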
Title: Query Optimization in Database and Information Retrieval Systems
Candidate: Wang, Tsong-Li
Advisor(s): Shasha, Dennis
Abstract:
Recently, several prototype and commercial systems based on a loosely coupled, shared-nothing architecture have been proposed and built for database applications. To achieve speed-ups proportional to the number of processors for operations such as selections and joins, such systems often distribute data across storage units using a hashing function. In the first part of this thesis, we investigate ways of minimizing response time for various multi-join queries in such systems. We develop a dynamic programming algorithm for queries whose closures are chains. We then prove NP-completeness for more general queries and propose four heuristics for them. We evaluate experimentally the relative performance of these heuristics and their performance relative to the optimum. The empirical results show that a hybrid heuristic, combining our chain algorithm with a heuristic related to Kruskal's spanning tree algorithm, performs well.
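The chain case invites an interval dynamic program of the classical matrix-chain flavor: the optimal plan for a contiguous run of relations splits at some point into two optimal subplans. The sketch below is an illustration with a stand-in cost function (product of interface sizes), not the response-time model of the thesis.

```python
from functools import lru_cache

def best_chain_plan(sizes, join_cost):
    """sizes: the n+1 interface sizes along a chain of n operands.
    Returns (minimum total cost, parenthesization string)."""
    n = len(sizes) - 1

    @lru_cache(maxsize=None)
    def opt(i, j):                   # best plan for operands i..j (inclusive)
        if i == j:
            return (0, f"R{i}")
        best = None
        for k in range(i, j):        # try every split point
            lc, lp = opt(i, k)
            rc, rp = opt(k + 1, j)
            cost = lc + rc + join_cost(sizes[i], sizes[k + 1], sizes[j + 1])
            if best is None or cost < best[0]:
                best = (cost, f"({lp} {rp})")
        return best

    return opt(0, n - 1)

# Stand-in cost: product of the three interface sizes (matrix-chain style).
cost, plan = best_chain_plan([10, 30, 5, 60], lambda a, b, c: a * b * c)
print(cost, plan)                    # 4500 ((R0 R1) R2)
```

With a real per-join cost model plugged in, the same $O(n^3)$ recurrence yields the optimal chain plan.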
In the second part of the thesis, we present a scheme to answer best-match queries from a file containing a collection of objects. A best-match query is to find the objects in the file which are closest (according to some (dis)similarity measure) to a given target.
Previous work suggested that one can reduce the computational effort required to achieve the desired results using the triangle inequality when starting with a data structure for the file which reflects some precomputed intrafile distances. We generalize the technique to allow the optimum use of any given set of precomputed intrafile distances. We then extend our scheme to a class of queries for retrieving similar or dissimilar objects that commonly arise in vision and molecular biology. Artificial data and actual protein sequences are used to illustrate the effectiveness of our scheme for different queries, and to compare its performance with previous algorithms.
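The pruning idea can be sketched with a single precomputed pivot distance (the thesis allows optimum use of an arbitrary set of precomputed intrafile distances). By the triangle inequality, $d(q,x) \ge |d(q,p) - d(x,p)|$, so an object can be rejected without a distance computation whenever this lower bound already exceeds the best match so far. The names and toy metric space below are illustrative.

```python
def best_match(query, objects, dist, pivot):
    """Nearest neighbor of `query` among `objects` under metric `dist`,
    using distances to one pivot to skip hopeless candidates."""
    d_qp = dist(query, pivot)
    precomputed = {x: dist(x, pivot) for x in objects}   # done once per file
    best, best_d, computed = None, float("inf"), 0
    for x in objects:
        if abs(d_qp - precomputed[x]) >= best_d:
            continue                 # pruned: lower bound already too large
        d = dist(query, x)
        computed += 1
        if d < best_d:
            best, best_d = x, d
    return best, best_d, computed

objects = tuple(range(0, 100, 7))    # toy metric space: integers, |a - b|
dist = lambda a, b: abs(a - b)
match, d, computed = best_match(40, objects, dist, pivot=0)
print(match, d, computed)            # 42 2 7
```

Only 7 of the 15 candidate distances are actually computed here; with better pivots or more precomputed distances the savings grow, which is the effect the thesis generalizes and optimizes.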
Finally, we implement our techniques into a tree information system that enables users to retrieve and extract information from trees based on approximate comparison. We expect this system to have applications in pattern recognition, biology, linguistics, and programming languages. The system is implemented in C and X-windows, and is fully operational on SUN workstations.
Title: Some Schwarz Methods for Symmetric and Nonsymmetric Elliptic Problems
Author(s): Widlund, O.
Abstract:
This paper begins with an introduction to additive and multiplicative Schwarz methods. A two-level method is then reviewed and a new result on its rate of convergence is established for the case when the overlap is small. Recent results by Xuejun Zhang, on multi-level Schwarz methods, are formulated and discussed. The paper is concluded with a discussion of recent joint results with Xiao-Chuan Cai on nonsymmetric and indefinite problems.
Title: Toward a Fully Integrated VLSI CAD System: from Custom to Fully Automatic
Candidate: You, Yongtao
Advisor(s): Siegel, Alan
Abstract:
This thesis describes an integrated CAD environment intended to support almost all phases of the VLSI circuit design cycle, from high-level circuit description down to mask specification. Several VLSI CAD tools have been integrated into the environment, including the multi-level simulator Msim, the hardware description language CHDL, some automatic placement tools, a schematic layout editor, and the UC Berkeley-developed geometry layout editor Magic.
The multi-level simulator Msim supports top-down design by allowing circuits whose components are described at different levels to be simulated together. The levels of circuit description currently supported include a hardware description language CHDL, which is a variant of the C programming language for circuit behavior descriptions, a schematic layout representation, and the Magic layout from which masks for wafer fabrication can be generated.
The schematic layout editor allows designers to specify interconnections among circuit components in a very efficient manner. It supports both behavioral descriptions and high level geometric layout of a circuit. Designers can have a graphical view of their design, and specify, within this graphical organization, the behavioral description of components at different levels of abstraction. These schematic layouts with different levels of representation can be simulated using the multi-level simulator Msim.
The automatic placement tool presently performs bottom-up iterative improvement, with simulated annealing as its assistant when needed. An interactive graphics interface is provided which allows human intervention on intermediate as well as final layouts.
In addition, the linear (true) charge-sharing modeling problem with indeterminate transistor switches is shown to be NP-complete, which explains why charge sharing is handled exclusively within the lattice model in our switch-level simulation.
Title: Domain Decomposition Algorithms for the Biharmonic Dirichlet Problem
Author(s): Zhang, X.
Abstract:
We consider additive Schwarz methods for the biharmonic Dirichlet problem and show that the algorithms have optimal convergence properties for some conforming finite elements. Some multilevel methods are also discussed.
A class of multilevel methods for second order problems is considered in the additive Schwarz framework. It is established that, in the general case, the condition number of the iterative operator grows at most linearly with the number of levels. The bound is independent of the mesh sizes and the number of levels under a regularity assumption. This is an improvement of a result by Dryja and Widlund on a multilevel additive Schwarz algorithm, and the theory given by Bramble, Pasciak and Xu for the BPX algorithm.
Additive Schwarz and iterative substructuring algorithms for the biharmonic equation are also considered. These are domain decomposition methods which have previously been developed extensively for second order elliptic problems by Bramble, Pasciak and Schatz, Dryja and Widlund and others.
Optimal convergence properties are established for additive Schwarz algorithms for the biharmonic equation discretized by certain conforming finite elements. The number of iterations for the iterative substructuring methods grows only as the logarithm of the number of degrees of freedom associated with a typical subregion. It is also demonstrated that it is possible to simplify the basic algorithms; this decreases the cost, but not the rate of convergence, of the iterative methods. In the analysis, new tools are developed to deal with Hermitian elements. Certain new inequalities for discrete norms for finite element spaces are also used.
Title: Edge representation from wavelet transform maxima
Candidate: Zhong, Sifen
Advisor(s): Mallat, Stephane
Abstract:
The multiscale edges of a signal are its sharp variation points measured at different scales. This thesis studies a model of multiscale edge representation based on the local maxima of the wavelet transform. The wavelet transform is a mathematical formulation of multiscale decomposition: it decomposes a signal into multiple components indexed by a scale parameter. A particular class of wavelets is used such that each of these components is the first derivative of a smoothed version of the signal, with the scale parameter indicating the degree of smoothing. The local maxima of this wavelet transform therefore constitute a multiscale edge representation. This thesis shows that the local maxima not only identify the edges but also characterize them. An algorithm to reconstruct a signal from its local maxima representation is developed. The experimental results show that the algorithm reconstructs the original signal and that the reconstruction is stable. This implies that the local maxima representation is a reorganization of the signal information. Therefore, various pattern analysis algorithms can be developed based solely on the properties of edges. Image processing can also be done through the multiscale edge representation. An application to image coding is described.
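In one dimension the representation can be sketched as follows: smooth the signal at several scales, differentiate, and keep the local maxima of the absolute response. The Gaussian filters and scales below are illustrative stand-ins for the particular wavelet class of the thesis; the point is that the maxima sit at the edge at every scale.

```python
import math

def smoothed_derivative(signal, scale):
    """Convolve with a Gaussian of the given scale, then take first
    differences: equivalently, filter with a derivative-of-Gaussian wavelet."""
    radius = 3 * scale
    kernel = [math.exp(-(t * t) / (2.0 * scale * scale))
              for t in range(-radius, radius + 1)]
    norm = sum(kernel)
    n = len(signal)
    smooth = [sum(kernel[r + radius] * signal[min(max(i + r, 0), n - 1)]
                  for r in range(-radius, radius + 1)) / norm
              for i in range(n)]                 # edges clamped at boundaries
    return [smooth[min(i + 1, n - 1)] - smooth[i] for i in range(n)]

def modulus_maxima(response):
    """Indices where |response| is a local maximum."""
    return [i for i in range(1, len(response) - 1)
            if abs(response[i]) > abs(response[i - 1])
            and abs(response[i]) >= abs(response[i + 1])]

step = [0.0] * 16 + [1.0] * 16                   # a single sharp edge
for scale in (1, 2, 4):
    print(scale, modulus_maxima(smoothed_derivative(step, scale)))
# The maxima sit at the edge at every scale, which is what makes the
# local maxima a multiscale *edge* representation.
```

The thesis's reconstruction result says, in effect, that these maxima (positions and values across scales) retain essentially all of the signal's information.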