Analysis and Transformation of SetTheoretic Languages
Bob Paige
Courant Institute, New York University
This minicourse will use SETL2 as a vehicle for showing how types
and transformations can be used
to integrate algorithm design and analysis, program development, and
high level compilation of set theoretic programming languages.
There are four lectures, each lasting two hours. Several References
may be found
here.
Topics
 1. Introduction to SETL2

The first lecture will describe the kernel of the SETL2 language, that which
was designed and implemented recently by Kirk Snyder based on
SETL1, which Jack Schwartz designed almost 30 years ago. It will
be illustrated
with mathematical expressions, data representations,
algorithms, and problem specifications. We describe
SETL sources of inefficiency, and program improvement by finite
differencing.
A good reference
manual on the core SETL2 language (without object oriented features) is
found here .
Setl2 is installed
at NYU. Instructions on how to use the NYU implementation, how
to install SETL2 on your own site, and sample
programs are found by clicking here.
 2. Data Structure Selection

We show how to simulate typed
SETL set and map primitive operations on a sequential RAM
in real time. A generic read method tailored to the types of the
input variables converts
external input in string form
into initial data structures
in linear time with respect to the length of the
input string.
The method is used to
reconstruct and extend the linear time fragment of Willard's
database predicate retrieval theory.
 3. Programming Productivity Experiments

Experimental evidence is presented suggesting that a transformational
methodology yields a fivefold improvement
in the productivity of algorithm
implementation over conventional hand crafted programming. There will
be a live demonstration.
Course Reading and Assignments
Reading for Lecture 1
SETL overview: Reference [Keller and Paige], Sec. 2.1;
Finite Differencing Overview: " Programming With
Invariants " : whole paper (handout). Alternatively, you can look
here for a brief intuitive explanation
of finite differencing,
and here for a more formal explanation.
Finite Differrencing derivations sketched for cycle
testing and topological
sorting are guides for homework assignments 2 and
3 just below.
Homework

(SAMPLE)
The center of a finite nonempty connected free tree T can be found
in the following way. Let x be any leaf (i.e., a vertex with only one
adjacent vertex). Let y be any leaf furthest away from x (in terms of
number of edges). Let z be any leaf furthest away from y. The path P
connecting z to y is a longest path in T. If P contains an odd number
of vertices, then the single vertex at the middle of P is the center
of T. If P contains an even number of vertices, then the two middle
vertices of P form the unique center of T.
Write an executable tree center SETL2 program whose body contains only a read
statement and a print statement as in the style of very high
level programs found here.
Prove that this specification is
equivalent (i.e., computes the same center for finite nonempty
connected free trees) to the medium level program found
here. (answer)

Write an executable topological sorting SETL2 program whose body contains only a read
statement and a print statement as in the style of very high
level programs found here.

Describe how finite differencing can be used to improve the
medium level graph reachability program.
Implement your
algorithm, and test it using the data found here.
Is the output
from your program the same as from the original program? Is it the
same as from executing the very high level reachability
program. Should it
be the same? Please accompany your final program with an explanation
of which invariants were used, and how they are maintained.

Same as 3. but for the algorithm to find the maximal independent
set of a graph found here. Use input found
here. How
can the original algorithm be simplified if we know that the graph is
undirected?
Reading for Lecture 2
Data Structure Selection Methodology: (Reference [Goyal and Paige, sec. 2.1],
[Keller and Paige, pp. 12  20], and [Paige and Yang])
Data Structure Selection
is applied to cycle testing
using data structures found here.
Optional: Read references [Paige 89], [Keller and Paige, section 3],
[Paige 94].
Homework

Show how the toplogical sorting program at the end of
this file
can be simulated in realtime on a conventional
machine, so that each primitive set operation is implemented by a
conventional pointer or cursor operation. That is, annotate the low
level SETL2 topological sorting program with realtime simulation
assertions as was done for cycle testing, and
draw a diagram of the
topsort data structures as for cycle testing.
Explain why you can
overlay the space used to implement minset with numpred?

Rewrite the low level SETL2 topsort program in terms of the data
structures you designed in the previous question. Model these data
structures in
terms of SETL2 tuplevalued data. Your program should have no sets or
maps, and no dynamically growing tuples. However, keep the same
abstract read statement. Just after the read statement, preallocate
the tuples (whose sizes remain the same during the rest of the
program) that implement your data structures, and enter values from
the input into them. Your preallocation code may include any SETL
expression (including sets and maps), no matter how inefficient.

Translate your fast SETL2 topsort program into a low level
SETL2 program, but read high
SETL level input as is found here . Implement a
read method
(tailored to this problem) that validates this input and translates it
into your initial data structures in linear time in the length of the
input string. Do the same for printing the total
ordering as a SETL2 tuple.
It may be helpful to examine the multiset discrimination procedures
found here. (WARNING: the procedures found
there are not yet debugged.)
Reading for Lecture 3
Productivity experiments: [Cai and Paige, SIGSOFT 93].