Next: 5.3 Guidelines Up: 5. On Data Processing Previous: 5.1 The Field

5.2 Problems and Solutions

The intelligent modern programmer, faced with the task of designing a data processing system, will try to use existing packages as much as possible, and thereby reduce the job to one of coordinating large chunks of software by means of relatively small interconnection programs.

These programs, when they are written in an interpretive ``shell'' language, are often called scripts, and typically operate at the highest semantic level in the system, dealing as they do with objects at the granularity of files and whole programs. Shell languages have their foundations in early ``job control'' languages.

Because the shell language has been the unavoidable starting point for any serious computer work right up until the age of the GUI (and still is, for many programmers), and because memory was not always a cheap and abundant resource, shell languages have tended to be very thin, lacking such amenities as lexical structure and sometimes even control structure. Many do not support the concept of an algebraic expression, and most provide little opportunity for static checking. Historically, therefore, the goal of a system designer has been to do as little as possible at this high level, and merely use the shell as a launching pad for programs coded for efficiency.

Not surprisingly, we feel the drag of this history today. People still tend (1) to use unnecessarily weak shell languages, (2) to use systems programming languages where high-level languages would be more appropriate, and (3) to show little regard for their shell scripts. The last of these phenomena probably results from the fact that most shell languages discourage good programming style. Every successful tool starts to be overused at some point, and this is exactly the situation we find ourselves in now--shell languages are being pressed to perform feats they were never designed for.

Much of this pressure on our worn-out tools comes from the modern data processing scene. What more obvious language to write small interconnecting programs in than the locally available shell language? If programs were as small as they are initially conceived to be in the minds of their creators, and if they stayed that way, all might be well, but of course the simple program hacked together in the weak language all too often grows into the unmanageable monster, and earns the respect of the unwary only after it has caused considerable frustration.

It is commonly said that the choice of language has a controlling influence on how we think about programming. It is equally true to say that it has a profound influence on what we think of a program after it has been written, though the operative factor in that regard is the care taken by the programmer to make the program readable in the first place. Of course, some languages make writing readable programs easier than others do.

SETL's particular contribution to readability, when programs are written in a style appropriate to its mathematical character, is that its most fundamental and reusable forms make sense when literally ``read out loud'' in phrases like ``the set of [all] x in [the universe] S such that P(x) [holds]'' for ``{x in S | P(x)}'', or ``if [there] exists [an] x in S such that P(x) [holds] ...'' for ``if exists x in S | P(x) ...''. These may seem idle matters to those who have no experience with SETL, but the ``dual view'' of sets and predicates alluded to in Section 1.1 [Why SETL?] is actually tremendously important in helping the programmer to stand back and look at a set or predicate as being delineated by constraints on a universe (the mathematical view), or to move closer and look inside to see a mechanism in which iterators produce candidate values that are tested in turn and either accepted or rejected (the algorithmic view). The strength in the fact that the same set or predicate can be viewed in both of these ways is that the mental image created for the one view provides a helpful double check on the other.
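A small sketch may make the two readings concrete; the universe s and the predicates here are invented purely for illustration:

```
s := {1..20};                      -- a small universe
evens := {x in s | x mod 2 = 0};   -- "the set of all x in s such that x mod 2 = 0"
if exists x in s | x > 15 then     -- "if there exists an x in s such that x > 15"
  print(x);                        -- the quantifier binds x to a witness
end if;
```

Read mathematically, evens is delineated by a constraint on the universe s; read algorithmically, the iterator produces candidate values that the test accepts or rejects in turn.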

A similar psychology obtains in the case {expression : iterators | predicate}, where further readability springs from the focus on the expression that characterizes all the members of the set. The singleton {expression} then appears as a degenerate form of this set former, and the enumerated set {expr1, expr2, ..., exprk} is a natural generalization of the singleton, lending still more readability to SETL programs through uniformity of notation.
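The three forms can be seen side by side in a short sketch (the names and values are invented for illustration):

```
squares := {x*x : x in {1..10} | x mod 2 = 0};  -- general set former
single  := {9};                                 -- singleton: a degenerate set former
primes  := {2, 3, 5, 7};                        -- enumerated set: generalized singleton
```

The uniformity is the point: moving from the enumerated set to the singleton to the full set former changes only how the members are specified, not the surrounding notation.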

It is interesting to compare how SETL encourages high-level ways of thinking about problem solving to how languages such as Ada 95, with their strong support for defining and implementing high-level abstractions efficiently, do it. SETL takes the ``minimum of fuss'' approach, and really offers little beyond a few well chosen abstractions from the foundations of mathematics, generally free of inconvenient restrictions and machine-level concerns. Ada 95, on the other hand, deliberately predefines few abstractions, but provides facilities whereby a skilled programmer can create high-level abstractions running the gamut from generic to completely application-specific.

In the world of data processing, at least in that large part of it that involves small programs, SETL is attractive in combining readability with conciseness, so that a person with almost no knowledge of a system or its conventions can usually start to understand a SETL program rather quickly without getting lost in details. Of course, a well written Ada 95 program will also have this quality, but the program will probably have taken much longer to write than the equivalent SETL program, and will inevitably be longer textually. There is no place for the quick and dirty in any realm, but sometimes, especially for small programs, the writer's need to save time and the reader's need to get the right idea quickly are better served by a minimal, high-level script than by an exemplar of masonry.

If programs are the modules of a system, software engineering teaches us that we are most likely to achieve clean interfaces and comprehensible implementations by keeping those modules as small as they comfortably can be. In the modern data processing setting, the fact that a small program can be modified with much more confidence than a large one is also a good defense against shifting user requirements. Furthermore, the substantial cost advantages of using pre-existing large software components as much as possible dictate a strong anti-monolith policy in favor of small interconnecting programs. Finally, the rise of the network over the slowly rusting mainframe militates in favor of a distributed approach, and while Ada 95 is an excellent example of a language that allows the coordination issues to be dealt with without sacrificing the advantages of static checking, Ada 95 is really a systems and applications programming language, as distinct from what might be regarded as a more than usually respectable scripting language (SETL). In a large data processing system, various languages, some of them quite specialized, will be found useful at various points, and again this argues for many little programs over a few big ones.

The tools at the disposal of a data processing programmer must be flexible, convenient, reasonably efficient, and robust. This is at least as much a matter of good implementation as it is of good language design. Because people tend to look down on data processing tools in the first place, they will rapidly become impatient with them unless they are obviously of high quality and scope, so although the design of a language should not be too fixated on implementation concerns, it should at least balance idealism with enough foresight to accommodate practice in a data processing context. This has been the motivation behind most of the SETL extensions described in this dissertation.

David Bacon