Theses & Reports
Instructions for submitting a technical report or thesis.
You can find technical reports published prior to 1990 archived here.
-
TR1996-721
1996
Primal-Dual Interior-Point Methods for Semidefinite Programming: Convergence Rates, Stability and Numerical Results
Alizadeh, F.;
Haeberly, J.A.; Overton, M.L.
Abstract
|
PDF
Title: Primal-Dual Interior-Point Methods for Semidefinite Programming: Convergence Rates, Stability and Numerical Results
Author(s): Alizadeh, F.; Haeberly, J.A.; Overton, M.L.
Abstract:
Primal-dual interior-point path-following methods for semidefinite programming (SDP) are considered. Several variants are discussed, based on Newton's method applied to three equations: primal feasibility, dual feasibility, and some form of centering condition.
The focus is on three such algorithms, called respectively the XZ, XZ+ZX and Q methods. For the XZ+ZX and Q algorithms, the Newton system is well-defined and its Jabobian is nonsingular at the solution, under nondegeneracy assumptions. The associated Schur complement matrix has an unbounded condition number on the central path, under the nondegeneracy assumptions and an additional rank assumption.
Practical aspects are discussed, including Mehrotra predictor-corrector variants and issues of numerical stability. Compared to the other methods considered, the XZ+ZX method is more robust with respect to its ability to step close to the boundary, converges more rapidly, and achieves higher accuracy.
-
Ph.D. Thesis
1996
Algorithms in Semi-Algabraic Geometry
Basu, Saugata
Abstract
|
PDF
Title: Algorithms in Semi-Algabraic Geometry
Candidate: Basu, Saugata
Advisor(s): Pollack, Richard
Abstract:
In this thesis we present new algorithms to solve several very general problems of semi-algebraic geometry. Our algorithms are currently the best algorithms for solving these problems. In addition, we have proved new bounds on the topological complexity of real semi-algebraic sets, in terms of the parameters of the polynomial system defining them, which improve some old and widely used results in this field.
The first part of the thesis deals mainly with the decision problem for the first order theory of real closed fields, and the more general problem of quantifier elimination. We give algorithms which improve the complexity of of all the previously known algorithms for these problems. Moreover, our techniques allow us to prove some purely mathematical theorems on the number of connected components and on the existence of small rational points in a given semi-algebraic set.
The second part of this work deals with connectivity questions of semi-algebraic sets. We develop new techniques in order to give an algorithm for computing roadmaps of semi-algebraic sets which improves on the complexity of the previous algorithms for this problem.
The third part of this work deals with bounding the topological complexity of semi-algebraic sets in terms of the number and the degrees of the polynomials describing it. We extend and improve a classical and widely used result of Oleinik and Petrovsky(1949), Thom (1965) and Milnor(1964), bounding the sum of the Betti numbers of semi-algebraic sets. Using the ideas behind this result, we give the first singly exponential algorithm for computing the Euler characteristic of an arbitrary semi-algebraic set.
One common thread that links these results is that our bounds are separated into a combinatorial part (the part depending on the number of polynomials) and an algebraic part (the part depending on the degrees of the polynomials). The combinatorial part of the complexity of our algorithms is frequently tight and this marks the improvement of many of our results. This is most striking when one considers that in many applications, for instance in computational geometry, it is the number of polynomials which is the most important parameter (the degrees and the number of variables are usually small). Another important and new feature of some of our results is that when the given semi-algebraic set is contained in a lower dimensional variety, the combinatorial part of the complexity depends on the dimension of this variety rather than on the dimension of the ambient space. This is useful when one considers semi-algebraic sets which have low real dimension embedded in a higher dimensional space.
-
TR1996-729
1996
PLinda User Manual
Brown, T.;
Jeong, K.; Li, B.; Talla, S.; Wyckoff, P.; Shasha, D.
Abstract
|
PDF
Title: PLinda User Manual
Author(s): Brown, T.; Jeong, K.; Li, B.; Talla, S.; Wyckoff, P.; Shasha, D.
Abstract:
Persistent Linda (PLinda) is a programming environment for writing fault-tolerant distributed/parallel programs that may be run on networks of workstations. PLinda is a set of extensions to the Linda parallel programming model and PLinda/C++ (and Fortran77 respectively) is an implementation combined with the sequential language C++.
The PLinda User Manual introduces the PLinda model, mechanics of the PLinda operations, and programming in PLinda/C++ and PLinda/Fortran77.
-
TR1996-717
1996
Schwarz Preconditioners for Spectral and Mortar Finite Element Methods with Applications to Incompressible Fluids
Casarin, M.A., Jr.
Abstract
|
PDF
Title: Schwarz Preconditioners for Spectral and Mortar Finite Element Methods with Applications to Incompressible Fluids
Author(s): Casarin, M.A., Jr.
Abstract:
The spectral element method has been used extensively for the simulation of fluid flows. The resulting linear systems are often not amenable to direct methods of solution, and are especially ill-conditioned. Domain decomposition preconditioners, well adapted to the solution on parallel computers, are proposed and analyzed; both two and three space dimensions are considered.
Second-order elliptic equations are considered first, and the now well-developed theory of domain decomposition methods for finite elements is fully extended to spectral elements. This includes an analysis of exotic coarse spaces, which have proven necessary for the efficient solution of elliptic problems with large discontinuities in the coefficients, as well as a study of overlapping methods. Estimates of the condition numbers of the Schur complement restricted to an edge (in two dimensions) or to a face (in three dimensions) are also given; in particular, a fast method is designed and studied in full detail for problems with many subregions.
The Stokes problem, when restricted to the space of discrete divergence free velocities, is symmetric positive definite. A number of preconditioners are proposed, which are based on previous results for the scalar elliptic case, and new global models. The construction of a basis for the constrained velocity space is not required,and the resulting condition numbers grow only weakly with the degree $N$ and are independent of the number of subdomains.
We also consider the stationary Navier-Stokes equations, solved with Newton's method. In each iteration, a non-symmetric indefinite problem is solved using a Schwarz preconditioner. A new coarse space is proposed which satisfies the usual properties required by the elliptic theory, and also a specific $H^1$-approximation property. The rate of convergence of the algorithm grows only weakly with $N$, and does not depend on the number of subdomains, or the Newton step.
Finally, a hierarchical basis preconditioner for the mortar finite element method in two dimensions is proposed and analyzed. It is also further shown that the analysis of the symmetric positive definite preconditioner can also be applied to construct preconditioners for symmetric indefinite problems arising from second-order elliptic equations. Numerical results are presented for the Helmholtz equation.
-
TR1996-725
1996
Building a Fast Double-Dummy Bridge Solver
Chang, M-S.
Abstract
|
PDF
Title: Building a Fast Double-Dummy Bridge Solver
Author(s): Chang, M-S.
Abstract:
Compared to other games, particularly chess, the research in computer bridge is immature, and the best bridge-playing programs are mediocre. In this paper we address the problem of designing a fast double-dummy bridge game (i.e., a simplified bridge game with perfect information) solver. Although th size of the game tree we generated for searching the best line of play is huge (about on the order of $13! \cdot 2^{39} \approx 10^{21}$, even if we assume the average branching factor for players to follow suit is just 2), we show that, through varieties of searching techniques and some efficient moves ordering and pruning heuristics, most double-dummy bridge hands can be solved within a reasonable amount of time. In this paper we first give a brief introduction to computer bridge and previous work on the card-playing phase of bridge. Next, we describe the top-level architecture of our double-dummy solver (dds), followed by a number of implementation techniques we employed in our dds. Finally we present experimental results, draw our conclusion and describe some future work toward automating card-playing in real bridge.
-
TR1996-730
1996
An O(n log n) Algorithm for the Maximum Agreement Subtree Problem for Binary Trees
Cole, R.;
Farach, M.; Hariharan, R.; Przytycka, T.; Thorup, M.
Abstract
|
PDF
Title: An O(n log n) Algorithm for the Maximum Agreement Subtree Problem for Binary Trees
Author(s): Cole, R.; Farach, M.; Hariharan, R.; Przytycka, T.; Thorup, M.
Abstract:
The Maximum Agreement Subtree problem is the following:
Given two trees whose leaves are drawn from the same set of items (e.g., species), find the largest subset of these items so that the portions of the two trees restricted to these items are isomorphic. We consider the case which occurs frequently in practice, i.e., the case when the trees are binary, and give an O(n log n) time algorithm for this problem.
-
TR1996-731
1996
Tree Pattern Matching and Subset Matching in Randomized O(n log^3 m) Time
Cole, R.;
Hariharan, R.
Abstract
|
PDF
Title: Tree Pattern Matching and Subset Matching in Randomized O(n log^3 m) Time
Author(s): Cole, R.; Hariharan, R.
Abstract:
The main goal of this paper is to give an efficient algorithm for the Tree Pattern Matching problem. We also introduce and give an efficient algorithm for the Subset Matching problem.
The Subset Matching problem is to find all occurrences of a pattern string p of length m in a text string t of length n, where each pattern and text location is a set of characters drawn from some alphabet. The pattern is said to occur at text position i if the set p[j] is a subset of the set t[i+j-1], for all j, 1 <= j <= m. We give an O((s+n)\log^3 m) randomized algorithm for this problem, where s denotes the sum of the sizes of all the sets.
Then we reduce the Tree Pattern Matching problem to a number of instances of the Subset Matching problem. This reduction takes linear time and the sum of the sizes of the Subset Matching problems obtained is also linear. Coupled with our first result, this implies an O(nlog^3 m) time randomized of metric spaces are also considered.
-
TR1996-724
1996
Two Heuristics for the Steiner Tree Problem
Dreyer, D. R.;
Overton, M.L.
Abstract
|
PDF
Title: Two Heuristics for the Steiner Tree Problem
Author(s): Dreyer, D. R.; Overton, M.L.
Abstract:
The Steiner tree problem is to find the tree with minimal Euclidean length spanning a set of fixed points in the plane, given the ability to add points (Steiner points). The problem is NP-hard, so polynomial-time heuristics are desired. We present two such heuristics, both of which utilize an efficient method for optimizing a tree with a given topology. The first systematically inserts Steiner points between edges of the minimal spanning tree meeting at angles less than 120 degrees, performing a local optimization at the end. The second begins by finding the Steiner tree for three of the fixed points. Then, at each iteration, it introduces a new fixed point to the tree, connecting it to each possible edge by inserting a Steiner point, and minimizes over all connections, performing a local optimizations for each. We present a variety of test cases that demonstrate the strengths and weaknesses of both algorithms.
-
Ph.D. Thesis
1996
Statistical Source Channel Models for Natural Language Understanding
Epstein, Mark
Abstract
|
PDF
Title: Statistical Source Channel Models for Natural Language Understanding
Candidate: Epstein, Mark
Advisor(s): Grishman, Ralph
Abstract:
The problem of Natural Language Understanding (NLU) has intrigued researchers since the 1960's. Most researchers working in computational linguistics focus on linguistic solutions to their problems. They develop grammars and parsers to process the input natural language into a meaning representation . In this thesis, a new approach is utilized. Borrowing from the field of communication theory , an information theoretic approach to natural language understanding is applied. This is based on the source-channel model of communication.
The source-channel model of NLU assumes that the user has a meaning in the domain of the application that he wishes to convey. This meaning is sent through a noisy channel . The observer receives the English sentence as output from the noisy channel. The observer then submits the English sentence to a decoder , which determines the meaning most likely to have generated the English. The decoder uses mathematical models of the channel and the meanings to process the English sentence. Thus, the following problems must be addressed in a source-channel model for NLU:
- 1.
- A mathematical model of the noisy-channel must be developed.
- 2.
- The parameters of the model must be set, either manually or by an automatic training procedure.
- 3.
- A decoder must be built to search through the meaning space for the most likely meaning to have generated the observed English.
This dissertation focuses on the first two of these problems. Several mathematical models of the noisy channel are developed. They are trained from a corpus of context independent sentence pairs consisting of both English and the corresponding meaning. The parameters of the models are trained to maximize the likelihood of the model's prediction of the observed training data using Dempster and Laird's Expectation-Maximization algorithm . Results are presented for the Air Travel Information Service (ATIS) domain.
-
TR1996-720
1996
A Model and Solution to the DNA Flipping String Problem
Geiger, D.;
Parida, L.
Abstract
|
PDF
Title: A Model and Solution to the DNA Flipping String Problem
Author(s): Geiger, D.; Parida, L.
Abstract:
We consider the case where a pool of DNA molecules clones both, flipped and not-flipped, have been cut by restriction enzymes. Ideally, each clone is cut in the same positions, although in practice due to errors, this does not always happen. The computational problem is to determine where the cuts have occurred.
This is a key problem in determining the structure of the original DNA molecule.
A single molecule is represented by a string of 1's and 0's, with cuts represented by $1's$. A set of molecules clones (with errors) is observed, but the orientation/parity of each molecule is unknown. Clear is that the location of the observed cuts of one molecule are dependent on the parity: flipping the molecule would result in the cuts location, as observed, being ``flipped'' .
We propose a Bayesian approach to generate a posterior distribution on the cuts and parity, given the data. We first present an approximate algorithm where we attempt to divide the problem into subproblems, but it is not guaranteed to solve the problem. Then, we propose another approximate method based on a statistical framework and a mean field annealing algorithm. It computes the maximum posterior marginal (MPM estimator) and maximum aposteriori estimate (MAP estimator).
We also provide evidence that the exact solution of the problem is intractable.
-
TR1996-727
1996
Hierarchically Split Cube Forests for Decision Support: description and tuned design
Johnson, T.;
Shasha, D.
Abstract
|
PDF
Title: Hierarchically Split Cube Forests for Decision Support: description and tuned design
Author(s): Johnson, T.; Shasha, D.
Abstract:
The paradigmatic view of data in decision support consists of a set of dimensions (e.g., location, product, time period, ...), each encoding a hierarchy (e.g., location has hemisphere, country, state/province, ..., block). Typical queries consist of aggregates over a quantifiable attribute (e.g., sales) as a function of at most one attribute in each dimension of this ``data cube.'' For example, find the sum of all sales of blue polo shirts in Palm Beach during the last quarter. In this paper, we introduce an index structure for storing and indexing aggregates, called ``cube forests,'' to support such cube queries efficiently --- one index search is usually enough.
In their most general form, cube forests require a lot of space. So, we present an optimized structure, called ``hierarchically split cube forests'' that exploit the hierarchical nature of the data to save space. We then present a model and algorithms to arrive at designs that further reduce update time, but suffer an increase in query time. Our experiments bear out the model and show that the structure has promise for decision support applications in read-intensive environments.
-
TR1996-716
1996
Preconditioners for Indefinite Problems
Klawonn, A.
Abstract
|
PDF
Title: Preconditioners for Indefinite Problems
Author(s): Klawonn, A.
Abstract:
Two different preconditioners for symmetric saddle point problems with a penalty term are analyzed. The saddle point problems are discretized by mixed finite elements. The preconditioners are applied in combination with Krylov space methods. It is shown that both methods yield convergence rates that are independent from both, the discretization and the penalty parameters. The first method is based on a symmetric positive definite block-diagonal preconditioner and the second one uses a non-symmetric and indefinite block-triangular preconditioner. Numerical results are presented for a problem of linear elasticity. The preconditioners in our experiments are based on domain decomposition and multilevel techniques. It is further shown that the analysis of the symmetric positive definite preconditioner can also be applied to construct preconditioners for symmetric indefinite problems arising from second-order elliptic equations. Numerical results are presented for the Helmholtz equation.
-
TR1996-715
1996
Skip-Over: Algorithms and Complexity for Overloaded Systems that Allow Skips
Koren, G.;
Shasha, D.
Abstract
|
PDF
Title: Skip-Over: Algorithms and Complexity for Overloaded Systems that Allow Skips
Author(s): Koren, G.; Shasha, D.
Abstract:
In applications ranging from video reception to telecommunications and packet communication to aircraft control, tasks enter periodically and have fixed response time constraints, but missing a deadline is acceptable, provided most deadlines are met. We call such tasks ``occasionally skippable''. We look at the problem of uniprocessor scheduling of occasionally skippable periodic tasks in an environment having periodic tasks. We show that making optimal use of skips is NP-hard. We then look at two algorithms called Skip-Over Algorithms (one a variant of earliest deadline first and one of rate monotonic scheduling) that exploit skips. We give schedulability bounds for both.
-
TR1996-723
1996
Highly Efficient Instruction Scheduling of Realtime Programs on RISC Processors
Leung, A.;
Palem, K.V.; Pnueli, A.
Abstract
|
PDF
Title: Highly Efficient Instruction Scheduling of Realtime Programs on RISC Processors
Author(s): Leung, A.; Palem, K.V.; Pnueli, A.
Abstract:
Enabled by RISC technologies, low-cost commodity microprocessors are performing at ever increasing levels, significantly via instruction level parallelism (ILP). This in turn increases the opportunities for their use in a variety of day-to-day applications ranging from the simple control of appliances such as microwave ovens, to sophisticated systems for cabin control in modern aircraft. Indeed, ``embedded'' applications such as these represent segments in the computer industry with great potential for growth. However, this growth is currently impeded by the lack of robust optimizing compiler technologies that support the assured, rapid and inexpensive prototyping of real-time software in the context of microprocessors with ILP. In this paper, we will present fast (polynomial-time) algorithms for compile-time instruction scheduling, of programs instrumented with timing-constraints, on processors with ILP. Our algorithms can be distinguished from earlier work in that they are guaranteed to find feasible schedules --- those satisfying the timing-constraints --- whenever such schedules exist, in cases of practical interest. Consequently, they can serve as powerful engines and can simultaneously support the ``analysis'' of the program prior to compilation, as well as during compilation once a feasible schedule is identified via analysis. We will also describe a novel notation, Time_tract, for specifying timing-constraints in programs, independent of the base language being used to develop the embedded application; Time_tract specifications are language independent and can be instrumented into imperative and object-oriented languages non-intrusively. As we will show, the instruction scheduling questions that arise out of Time_tract specifications are always ``tractable''. In contrast, a range of specification mechanisms proposed earlier yield substantially intractable instruction scheduling questions, thereby limiting their potential utility. We will sketch a formal and precise comparison of the tractability and related expressive power issues between Time_tract and some of the extant mechanisms for specifying properties of timed programs; this will be done using the canonical framework of timed-automata.
-
TR1996-718
1996
CoRRet: A CONSTRAINT Based Environment for Rapid Prototyping Real Time Programs
Palem, K.
Abstract
|
PDF
Title: CoRRet: A CONSTRAINT Based Environment for Rapid Prototyping Real Time Programs
Author(s): Palem, K.
Abstract:
The information revolution that we are in the midst of has led to the use of computers controlling applications ranging from automobiles and games, to video-pumps in the information highway. These applications are distinguished by the fact that they use programs with special timing relationships between their constituent elements. For example, a program running in the microprocessor controlling an ABS system in a modern automobile must sense and react to the friction coefficient between the brake pads and the wheel at well-defined intervals of time; failure to do so will result in a systemic failure of the brakes. Referred to typically as embedded systems, these applications constitute a significant portion of the potential growth in the computer industry. However, this growth opportunity is being hampered by a lack of adequate support via software development tools, to aid the easy, rapid and correct prototyping of embedded applications.
In this report, we outline CoRReT, a COnstraint based environment for the Rapid prototyping of REal Time programs. The report outlines the overall system architecture as well as the key modules in this environment that are being currently developed. CoRReT is a scheduling centric system in that a suite of algorithms for instruction scheduling programs instrumented with real-time constraints, are at its core. These algorithms are an integral part of an (optimizing) compiler which will compile these programs automaticallywhile attempting to ensure that the timing constraints are met; when the constraints are met, the resulting schedule for the instructions is referred to be feasible. If a feasible schedule is found, it will be fed automatically into a code-generator in the back-end of the compiler. Our envisioned scheduler can --- in addition to traditional control- and data-dependence constraints in the source program --- also cope with a variety of timing constraints specified by the programmer.
Our focus is on computational platforms that embody parallelism at two levels of granularity. At the highest level, we envision a tightly-coupled parallel machine offering large-scale parallelism. In this setting, a single embedded application can be distributed across the individual processors of the cluster. Furthermore, each processor in this parallel machine can embody Instruction Level Parallelism (ILP) at a fine-grained level.
Unfortunately, due to a lack of automatic tools and technology that can provide compilation support for real-time constraints ubiquitous to embedded applications, parallel computing platforms have not proliferated in this setting. Considering the fine-grained case first, RISC processors with ILP have not yet found a niche in this domain; currently, developers of embedded systems are reluctant to embrace ILP technologies due to the onerous task of ensuring timing relationships in the program by hand --- a difficulty compounded by parallelism (at a fine-grained level) in the processor. Clearly, providing support through automation that frees the programmer of these difficulties, is a means of overcoming this challenge.
Our response to this challenge via CoRReT is to develop scheduling methodologies and tools for automatically harnessing very high performance from these platforms, in the context of embedded systems. In the absence of time-constraints, major progress has been achieved in this direction at the coarse-grained level. The situation is even better at the fine-grained level where scheduling technology is being used routinely in product-quality compilers for RISC processors.
The methodology on which CoRReT is based is independent of any particular target processor, and is applicable to third and fourth generation languages. Furthermore, we propose to use the same scheduling engines during the static analysis of the program as well as during compilation. We anticipate this ``confluence'' in the scheduling algorithms to aid in shorter prototyping cycles, since identical schedules will be used by the analysis tools and the back-end of the compiler to generate code. We envision that the algorithms and tools that go into CoRReT will naturally form an integral part of a full-fledged programming environment for prototyping real-time programs on parallel platforms.
-
Ph.D. Thesis
1996
Solving the Navier-Stokes Equations on a Distributed Parallel Computer
Sabbagh, Hadil
Abstract
|
PDF
Title: Solving the Navier-Stokes Equations on a Distributed Parallel Computer
Candidate: Sabbagh, Hadil
Advisor(s): Peskin, Charles S.
Abstract:
Speed and space are two major issues in computational fluid dynamics today. Scalable parallel or distributed computers offer the promise of faster time to solve problems through parallelism and solutions to larger problems by adding more parallel processors with their own private memories. These systems use message passing to share data between processors. Parallel programs are difficult to write, especially for message passing systems, and there are few well-studied test cases.
In this dissertation, we solve the incompressible Navier-Stokes equations on a periodic cubic domain (3-torus). The numerical method is a finite difference method that consists of two parts: upwind differencing applied to the non-linear terms and solution of the Stokes equations. The latter are solved implicitly using a three-dimensional FFT. For the parallel implementation, the domain is divided into equally sized non-periodic cubic subdomains. Each subdomain is assigned to a processor; the processors form a process grid which is periodic. The parallel upwind differencing is preceded by an exchange of face data. The discrete Fourier transform in the Stokes solver is computed by applying one-dimensional FFTs sequentially in the three coordinate directions. In each coordinate direction, data must be exchanged only among those processors that lie on the same line of the process grid.
The parallel algorithm was implemented twice: once using PVM and once using MPI. Although both implementations are described in the thesis, the performance of only the MPI version is discussed.
The Navier-Stokes solver is tested on the IBM SP-2. Three constant problem size and three scalability experiments are used to analyze the performance of the solver. The fluid solver achieves a speedup of 48.8 when solving a 240 * 240 * 240 problem on 216 processors. Furthermore, there is evidence of scalability.
-
TR1996-719
1996
NYU Reactive Gripper: An Implementation
Teichmann, M.;
Mishra, B.
Abstract
|
PDF
Title: NYU Reactive Gripper: An Implementation
Author(s): Teichmann, M.; Mishra, B.
Abstract:
We consider the problem of grasping an unknown polygonal flat object using a parallel jaw gripper. Our design equips a standard gripper with several light-beam sensors (close to each jaw) and employs a control scheme based on a reactive grasping algorithm. This is done by probing the object to locate a good grasp position, and then grasping, without moving the object significantly. The goal is to do as little motion as possible to find a grasp. In this paper, we discuss an implementation of this device using NYU's MOSAIC robot, following a quick overview of the underlying reactivity principle.
-
TR1996-726
1996
Some Numerical Results Using An Additive Schwarz Method For Maxwell's Equations
Toselli, A.
Abstract
|
PDF
Title: Some Numerical Results Using An Additive Schwarz Method For Maxwell's Equations
Author(s): Toselli, A.
Abstract:
We present some numerical results for a two-level additive overlapping Schwarz method applied to the 2-D Maxwell's equations. Nedelec finite elements defined on rectangles are employed. Numerical examples show the convergence properties of the method, when varying the mesh size of the coarse and fine problems, the overlap and the time step of the implicit finite difference scheme employed.
-
TR1996-722
1996
A Note on Scheduling Algorithms for Processors with Lookahead
Ungureanu, C.
Abstract
|
PDF
Title: A Note on Scheduling Algorithms for Processors with Lookahead
Author(s): Ungureanu, C.
Abstract:
Many superscalar processors designed today are able to dynamically schedule instructions. Dynamic scheduling means that a processor is able to analyze a portion of the instruction stream ``on the fly'', and has the capability of issuing an instruction other than the next one available in the input, in order to avoid stalling. Such an instruction is said to be executed out of order.
Scheduling algorithms for machines with in-order execution are used in most compilers today. However, schedules which are optimal for machines with in-order execution may be sub-optimal for a machine with out-of-order execution. Here, we are presenting an algorithm which produces a local schedule for a trace of basic blocks, such that the completion time is minimized for a processor with a depth of the pipeline k=2 and dynamic scheduling ability with scope size s=2. The algorithm runs in polynomial time. A generalization of the algorithm to work for machines with larger scopes is straightforward.
-
TR1996-728
1996
Formal Models of Distributed Memory Management
Ungureanu, C.;
Goldberg, B.
Abstract
|
PDF
Title: Formal Models of Distributed Memory Management
Author(s): Ungureanu, C.; Goldberg, B.
Abstract:
We develop an abstract model of memory management in distributed systems. The model is low-level enough so we can express communication, allocation and garbage collection, but otherwise hide many of the lower-level details of an actual implementation.
Recently, such formal models have been developed for memory management in a functional, sequential setting by Morrisett, Felleisen, and Harper. The models are rewriting systems whose terms are programs. Programs have both the "code" (control string) and the "store" syntactically apparent. Evaluation is expressed as conditional rewriting and includes store operations. Garbage collection becomes a rewriting relation that removes part of the store without affecting the behavior of the program.
By using techniques developed for communicating and concurrent systems such as Milner's CCS, we extend the model for a distributed environment. Sending and receiving messages is also made apparent at the syntactic level. A very general garbage collection rule based on reachability is introduced and proved correct. Now, proving correct a specific collection strategy is reduced to showing that the relation between programs defined by the strategy is a subrelation of the general relation. Any actual implementation which is capable of providing the transitions (including their atomicity constraints) specified by the strategy is therefore correct.