Compilation of Array-Style Programs
for Distributed Memory MIMD Machines:
a Geometric Approach
4:30 p.m., Thursday, September 15, 1994
12th floor conference room, 719 Broadway
Distributed memory MIMD (Multiple Instruction Multiple Data) machines are emerging as a cost-effective means of speeding up numerically intensive programs. They scale more easily than other parallel machines. But writing explicitly parallel programs for these machines is both difficult and error prone. Compilers for languages like HPF make the task easier by generating the necessary inter-processor communication from the data distribution directives supplied by the programmer. This dissertation shows that for a large class of array-style programs automatic data distribution can produce a significant speedup on a distributed memory MIMD machine. Array-style programs use array primitives to manipulate entire arrays, rather than looping explicitly over the array elements. APL programs are typically array-style.
We show how to apply automated data distribution to APL programs, that treat arrays and operations on them as atomic. Automated data distribution determines the necessary inter-processor communication from the way APL primitives manipulate the entire arrays, rather than by complex algebraic analysis of the patterns of array subscripts, as would be done in more conventional compilers. A simple distribution and alignment scheme automatically distributes arrays across available processors. Arrays can be dynamic, with sizes varying during program execution. Data distribution is guided by array size estimates. Distribution trade-off analysis attempts to optimize the initial distribution by comparing the estimated communication and computation times, and replicating arrays whose partitioning results in excessive communication.
Building on the APL to C compiler developed by W.-M. Ching, we produce explicitly parallelized C, from APL source programs. We describe the parallel implementation of most of the APL primitives. The implementation of several APL primitives uses the monotonic data movement algorithm. The ideas developed are demonstrated with eight APL programs of varying complexity. We show the speedup and efficiency obtained when running these programs on 2 to 32 processors. The speedup achieved on 32 processors, ranging from 7 to 30, shows the technique to be applicable to a wide range of programs.