Analysis and Transformation of Set-Theoretic Languages

Bob Paige

Courant Institute, New York University

This minicourse will use SETL2 as a vehicle for showing how types and transformations can be used to integrate algorithm design and analysis, program development, and high level compilation of set theoretic programming languages. There are four lectures, each lasting two hours. Several References may be found here.


1. Introduction to SETL2
The first lecture will describe the kernel of the SETL2 language, that which was designed and implemented recently by Kirk Snyder based on SETL1, which Jack Schwartz designed almost 30 years ago. It will be illustrated with mathematical expressions, data representations, algorithms, and problem specifications. We describe SETL sources of inefficiency, and program improvement by finite differencing.

A good reference manual on the core SETL2 language (without object oriented features) is found here . Setl2 is installed at NYU. Instructions on how to use the NYU implementation, how to install SETL2 on your own site, and sample programs are found by clicking here.

2. Data Structure Selection
We show how to simulate typed SETL set and map primitive operations on a sequential RAM in real time. A generic read method tailored to the types of the input variables converts external input in string form into initial data structures in linear time with respect to the length of the input string.

The method is used to reconstruct and extend the linear time fragment of Willard's database predicate retrieval theory.

3. Programming Productivity Experiments
Experimental evidence is presented suggesting that a transformational methodology yields a five-fold improvement in the productivity of algorithm implementation over conventional hand crafted programming. There will be a live demonstration.

Course Reading and Assignments

Reading for Lecture 1

SETL overview: Reference [Keller and Paige], Sec. 2.1;
Finite Differencing Overview: " Programming With Invariants " : whole paper (handout). Alternatively, you can look here for a brief intuitive explanation of finite differencing, and here for a more formal explanation. Finite Differrencing derivations sketched for cycle testing and topological sorting are guides for homework assignments 2 and 3 just below.


  1. (SAMPLE) The center of a finite nonempty connected free tree T can be found in the following way. Let x be any leaf (i.e., a vertex with only one adjacent vertex). Let y be any leaf furthest away from x (in terms of number of edges). Let z be any leaf furthest away from y. The path P connecting z to y is a longest path in T. If P contains an odd number of vertices, then the single vertex at the middle of P is the center of T. If P contains an even number of vertices, then the two middle vertices of P form the unique center of T.

    Write an executable tree center SETL2 program whose body contains only a read statement and a print statement as in the style of very high level programs found here. Prove that this specification is equivalent (i.e., computes the same center for finite nonempty connected free trees) to the medium level program found here. (answer)

  2. Write an executable topological sorting SETL2 program whose body contains only a read statement and a print statement as in the style of very high level programs found here.
  3. Describe how finite differencing can be used to improve the medium level graph reachability program. Implement your algorithm, and test it using the data found here. Is the output from your program the same as from the original program? Is it the same as from executing the very high level reachability program. Should it be the same? Please accompany your final program with an explanation of which invariants were used, and how they are maintained.
  4. Same as 3. but for the algorithm to find the maximal independent set of a graph found here. Use input found here. How can the original algorithm be simplified if we know that the graph is undirected?

Reading for Lecture 2

Data Structure Selection Methodology: (Reference [Goyal and Paige, sec. 2.1], [Keller and Paige, pp. 12 - 20], and [Paige and Yang]) Data Structure Selection is applied to cycle testing using data structures found here.
Optional: Read references [Paige 89], [Keller and Paige, section 3], [Paige 94].


  1. Show how the toplogical sorting program at the end of this file can be simulated in real-time on a conventional machine, so that each primitive set operation is implemented by a conventional pointer or cursor operation. That is, annotate the low level SETL2 topological sorting program with realtime simulation assertions as was done for cycle testing, and draw a diagram of the topsort data structures as for cycle testing. Explain why you can overlay the space used to implement minset with numpred?
  2. Rewrite the low level SETL2 topsort program in terms of the data structures you designed in the previous question. Model these data structures in terms of SETL2 tuple-valued data. Your program should have no sets or maps, and no dynamically growing tuples. However, keep the same abstract read statement. Just after the read statement, preallocate the tuples (whose sizes remain the same during the rest of the program) that implement your data structures, and enter values from the input into them. Your preallocation code may include any SETL expression (including sets and maps), no matter how inefficient.
  3. Translate your fast SETL2 topsort program into a low level SETL2 program, but read high SETL level input as is found here . Implement a read method (tailored to this problem) that validates this input and translates it into your initial data structures in linear time in the length of the input string. Do the same for printing the total ordering as a SETL2 tuple.

    It may be helpful to examine the multiset discrimination procedures found here. (WARNING: the procedures found there are not yet debugged.)

Reading for Lecture 3

Productivity experiments: [Cai and Paige, SIGSOFT 93].