Next: 6.7 Beyond the Fringe Up: 6. Conclusions Previous: 6.5 Exceptions



6.6 Miscellaneous Desiderata

There are numerous minor features that could be added to SETL to enhance its already excellent support for Internet data processing without greatly increasing the complexity of the language. Here we mention a few.

6.6.1 Lexical Nesting

Routines (procedure and operator definitions) cannot be nested in SETL, though procedure nesting is allowed in SETL2. From a software engineering point of view this matters little in small programs, but strictly speaking, any routine q that is purely a ``helper'' for another routine p ought to be private to p, and the most convenient way of arranging this is to have q lexically contained within p.

This notion extends to variables and constants as well as routines, and SETL would be improved further by allowing names to be declared locally to control structures, as in Algol 68. If this extension is made, the rarely used keyword begin should also be imported from Algol 68, solely to frame a local scope where no control structure is otherwise needed.

Indeed, my own feeling is that SETL would do well to follow the lead of Algol 68 and C++ in allowing declarations to occur anywhere that other statements can occur. A name bound by such a declaration can then be referenced throughout the remainder of the scope. For SETL, a special proviso would be needed: if a name occurs that has no applicable binding anywhere in the current scope, its default declaration is taken to be at the beginning of the innermost enclosing routine (where the main program unit is considered a routine for this purpose). If the name is declared in a given scope, the standard rule that it cannot be referenced in that scope before its point of declaration would apply.
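SETL cannot express the nesting of routines, but a language that permits it makes the arrangement concrete. In the Python sketch below, the helper `mean` plays the role of q and is defined inside `summarize`, which plays the role of p. As a result, `mean` is invisible outside `summarize`. Both names are illustrative only, and this is not proposed SETL syntax.

```python
def summarize(values):
    """Plays the role of the public routine p."""

    def mean(xs):
        # Plays the role of the helper q: it is defined inside summarize,
        # so code outside summarize can neither call nor rebind it.
        return sum(xs) / len(xs)

    # Only summarize itself can use mean.
    return mean(values), max(values) - min(values)


print(summarize([1, 2, 3, 6]))  # prints (3.0, 5)
```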

6.6.2 Filename Globbing

All Unix shells are able to create lists of filenames based on patterns that universally include the asterisk (*) as a ``wild card'' that matches any run of 0 or more characters. The standard Unix 98 shell, and most other Unix shells, also support the question mark (?) to match any single character and a bracket-enclosed ([ ]) run of characters to match any character in that run. Many shells, including the C shell and bash, additionally accept a brace-enclosed ({}) list of strings giving a set of alternatives, though this is not part of the Unix 98 shell standard. For example, if the files foo.c and foo.o are present in the current working directory (see getwd and chdir in Section 2.6 [Files, Links, and Directories]), then the patterns foo.*, foo.?, and foo.[co] all stand for the same pair of filenames, as does foo.{c,o} in shells that support braces. Notice that although several characters have special significance in this so-called globbing convention, their meaning differs from that in regular expressions. In a glob pattern, * by itself matches any run of characters and ? matches exactly one. In a regular expression, these characters instead modify the preceding item, and it is the dot that matches an arbitrary character.
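To make the contrast with regular expressions concrete, here is a small Python sketch. Python's standard fnmatch module implements the *, ? and [ ] conventions, but not braces. The same pattern strings can therefore be tried once as globs and once as regular expressions. The helper names and the list of filenames are illustrative assumptions.

```python
import fnmatch
import re

# Filenames assumed to be present, as in the foo.c / foo.o example.
NAMES = ["foo.c", "foo.o", "food", "foo"]


def glob_matches(pattern, names):
    # fnmatchcase applies shell-style wildcards without case folding.
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


def regex_matches(pattern, names):
    # The same string read as a regular expression over the whole name.
    return [n for n in names if re.fullmatch(pattern, n)]


for pat in ["foo.*", "foo.?", "foo.[co]"]:
    print("glob ", pat, glob_matches(pat, NAMES))   # ['foo.c', 'foo.o'] each time
for pat in ["foo.*", "foo.?"]:
    print("regex", pat, regex_matches(pat, NAMES))  # all four; then ['food', 'foo']
```

Read as a regular expression, "foo.?" means "foo" followed by at most one arbitrary character. It therefore accepts exactly the two names that the glob rejects.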

In SETL, the most appropriate realization of such a feature would seem to be a glob operator. It would accept a string containing a pattern obeying the conventions of the Unix 98 shell and yield a (possibly null) tuple of strings naming the files that match that pattern. Shells perform a similar expansion but differ in how they treat a pattern that matches nothing: the standard shell simply leaves the pattern unexpanded, whereas the C shell issues a diagnostic and abandons construction of the command. All shells have quoting conventions that allow special characters which would normally be expanded to stand for themselves in filenames. In programs, such filenames are of course accessed simply by not globbing them.
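The semantics proposed for such an operator can be prototyped in a few lines of Python, as in the sketch below. `setl_glob` and `expand_braces` are hypothetical names. Because Python's glob module does not handle braces, the sketch expands them itself, then globs each alternative and returns a tuple. That tuple is empty when nothing matches, so the sketch follows neither the sh nor the csh behavior. The brace expander is a simplification: it handles nesting, but it leaves an unbalanced group, or one without a comma, as literal text.

```python
import glob
import os
import tempfile


def expand_braces(pattern):
    """Expand the first top-level {a,b,...} group, recursively.

    Simplification: an unbalanced group, or one without a comma,
    is left as literal text.
    """
    start = pattern.find("{")
    if start == -1:
        return [pattern]
    depth, last, alts = 0, start + 1, []
    for i in range(start, len(pattern)):
        ch = pattern[i]
        if ch == "{":
            depth += 1
        elif ch == "," and depth == 1:
            alts.append(pattern[last:i])
            last = i + 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                alts.append(pattern[last:i])
                if len(alts) < 2:
                    return [pattern]
                prefix, suffix = pattern[:start], pattern[i + 1:]
                out = []
                for alt in alts:
                    out.extend(expand_braces(prefix + alt + suffix))
                return out
    return [pattern]


def setl_glob(pattern):
    """Hypothetical glob operator: a (possibly empty) tuple of filenames."""
    names = []
    for alt in expand_braces(pattern):
        names.extend(sorted(glob.glob(alt)))  # sorted, as shells do
    return tuple(names)


with tempfile.TemporaryDirectory() as d:
    for name in ("foo.c", "foo.o", "bar.h"):
        open(os.path.join(d, name), "w").close()
    base = glob.escape(d)  # protect any wildcard characters in the dir name
    for pat in ("foo.*", "foo.?", "foo.[co]", "foo.{c,o}", "nothing.*"):
        found = setl_glob(os.path.join(base, pat))
        print(pat, [os.path.basename(p) for p in found])
```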

A glob function has appeared in the Posix [117] specification, and is now part of Unix 98, along with an fnmatch function which tests a single filename to see if it satisfies a given glob-style pattern. Provision of a roughly equivalent SETL primitive would be appropriate when all the main vendors of Unix systems have caught up with these potentially very helpful functions. ``Word'' expansion in the shell sense would dovetail with this kind of filename expansion, so that abbreviations for user home directory names and other simple expressions that are familiar to shell users could be easily accessible without the need for such convolutions as

[fred_home] := split (filter ('bash -c "echo ~fred"'));
to obtain a home (login) directory name.
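For comparison, a language with a built-in word-expansion facility needs no subprocess for this. On POSIX systems, Python's os.path.expanduser performs the shell's ~ and ~user expansion. The C-level POSIX routine for fuller shell-style word expansion is wordexp. The helper `home_of` below is a hypothetical name, offered as a sketch rather than a proposal for SETL's interface.

```python
import os.path


def home_of(user=""):
    """Return the home directory for ~user (or for ~ if user is empty),
    or None if the user is unknown.

    Hypothetical helper: relies on os.path.expanduser, which returns
    its argument unchanged when it cannot expand it.
    """
    tilde = "~" + user
    expanded = os.path.expanduser(tilde)
    return None if expanded == tilde else expanded


print(home_of("fred"))  # fred's home directory, or None if there is no such user
```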

6.6.3 Format-Directed I/O

Another convenience, especially valuable in lower-level programming languages, is format-directed I/O such as that found in Fortran, the C library (standardized by Posix and Unix 98), and Algol 68. COBOL pictures are among the most sophisticated formatting features in any popular programming language. The SETL functions whole, fixed, and floating (Section 2.14.2 [Formatting and Extracting Values]) get their names from Algol 68 routines, which have the interesting property of corresponding one-for-one with symbolic expressions in the format strings of the Algol 68 transput (I/O) system.

The need for format-directed I/O is smaller in high-level languages than in lower-level ones, because in the former it is so easy to build strings and manipulate them as values. Formats also tend to be written in rather arcane little sublanguages. In most cases these place the expression to be output, or the variable to be input, quite far from the description of its appearance, making the correspondence difficult to discern. Nevertheless, formats can be quite useful and concise for encoding complex output layouts or dealing with highly structured inputs, particularly as they tend to reflect layouts rather pictorially. Algol 68 is a language of high enough level in its handling of strings, and rich enough in its set of I/O primitives, to make formats anything but a sine qua non. Even so, my experience with it was that for some tasks formats were still preferable to long concatenations of string-forming expressions.
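The trade-off can be seen in a small Python sketch, since Python's % operator follows C printf conventions. Both hypothetical functions below produce the same report line. The line has a name left-justified in 12 columns, a count right-justified in 5, and a price with two decimals right-justified in 9. The format string shows the layout at a glance, whereas the concatenation spreads it over several expressions. The column widths are an arbitrary example.

```python
def row_format(name, qty, price):
    # Layout expressed directly as a printf-style format.
    return "%-12s%5d%9.2f" % (name, qty, price)


def row_concat(name, qty, price):
    # Same layout built from string-forming expressions
    # (assumes a non-negative price).
    cents = round(price * 100)
    amount = str(cents // 100) + "." + str(cents % 100).rjust(2, "0")
    return name.ljust(12) + str(qty).rjust(5) + amount.rjust(9)


for item in [("widgets", 12, 3.5), ("gadgets", 7, 12.25)]:
    print(row_format(*item))
```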

For SETL, where there has been some effort in recent years to remain compatible with Unix (a moving but definitely slowing target in this decade), the most natural choice for a format sublanguage would seem to be one which strives to remain close to that of the C-callable printf and scanf series. This has not yet been assessed in serious detail, however.

A related format conversion issue arises for dates and times. The fdate primitive described in Section 2.16 [Time] can render the number of milliseconds since the beginning of 1970 (UTC) as a date and time in the current time zone or in UTC. However, there is currently no corresponding primitive for taking a formatted date and time apart into its constituents, in the manner of the Unix 98 strptime routine. Nor is there one for recombining those parts into a single integer representing a time, in the manner of mktime.
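The missing pair of operations can be illustrated with Python's time module, whose strptime and mktime mirror the Unix 98 routines of those names. fdate counts milliseconds from the beginning of 1970 (UTC), so the sketch recombines the parsed fields with calendar.timegm, which treats them as UTC. For contrast, it also shows time.mktime, which treats them as local time. The function names and the default format are assumptions chosen for illustration.

```python
import calendar
import time

FMT = "%Y-%m-%d %H:%M:%S"


def parse_utc_ms(text, fmt=FMT):
    """Take a formatted UTC date and time apart (strptime), then
    recombine the fields into milliseconds since 1970-01-01 UTC."""
    fields = time.strptime(text, fmt)  # struct_time: year, month, day, ...
    return calendar.timegm(fields) * 1000


def parse_local_ms(text, fmt=FMT):
    """The same, but interpreting the fields in local time (mktime)."""
    return int(time.mktime(time.strptime(text, fmt))) * 1000


def format_utc_ms(ms, fmt=FMT):
    """The inverse direction, roughly what fdate offers for UTC."""
    return time.strftime(fmt, time.gmtime(ms // 1000))


ms = parse_utc_ms("1999-12-10 00:00:00")
print(ms)                 # 944784000000
print(format_utc_ms(ms))  # 1999-12-10 00:00:00
```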

6.6.4 High-Level Internet Protocols

One of the strengths of the Java API is its support for Internet protocols above the level of UDP and TCP, such as FTP, HTTP, and even (through third-party sources) SMTP, NNTP, and so on. URL ``connections'' can be opened, and for those which use HTTP, the associated MIME header information can be fetched and set through method calls on the object representing the connection.

In SETL, communication via HTTP is accomplished using a package of SETL routines that must be imported into every SETL program that wants to use them. URLs are probably going to be with us for a long time. It would therefore be much more convenient and natural to communicate with the entities addressed by URLs through one or more I/O modes such as `url', `url-in', and `url-out', directly supported by open. These would be the first modes to take part in processing the data in a stream rather than simply passing it through, so some fairly serious design work will be needed here. For example, should MIME headers appear as a map, be manipulated by a mechanism like getenv/setenv (Section 2.1 [Invocation Environment]), or both? How should non-HTTP protocols such as FTP, which can also be specified with URLs, be treated?
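Python's urllib.request illustrates one possible answer to the header question. The response returned by urlopen is a readable stream, and its MIME headers are exposed through a map-like object. The sketch below uses a data: URL, which urllib handles locally, so it runs without a network. fetch_with_headers is a hypothetical name, and copying the headers into a dict is just one design choice among those the text raises.

```python
from urllib.request import urlopen


def fetch_with_headers(url):
    """Open a URL, returning (headers as a dict, body as bytes)."""
    with urlopen(url) as resp:
        headers = dict(resp.headers.items())  # MIME headers as a map
        body = resp.read()                    # contents of the stream
    return headers, body


hdrs, body = fetch_with_headers("data:text/plain;charset=utf-8,Hello%2C%20SETL")
print(hdrs)
print(body)
```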

The open-endedness of this problem in fact suggests that the only viable solution will be a modular one, where support for protocols for things like distributed file systems, database systems, and transaction management systems will have to be mediated by add-on modules.
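Python's urllib is itself organized along these modular lines. Each URL scheme is served by a handler object, and an add-on can supply a new handler without any change to the core. The sketch below registers a handler for an invented echo: scheme, which is purely illustrative and simply returns the rest of the URL as the body. It stands in for a module that might speak a real protocol. The class name and scheme are assumptions; BaseHandler, build_opener and addinfourl are standard library names.

```python
import email
import io
from urllib.request import BaseHandler, build_opener
from urllib.response import addinfourl


class EchoHandler(BaseHandler):
    """Add-on handler for a made-up 'echo:' scheme.

    urllib dispatches on method names: echo_open serves URLs whose
    scheme is 'echo'.
    """

    def echo_open(self, req):
        data = req.selector.encode("utf-8")  # the text after 'echo:'
        headers = email.message_from_string(
            "Content-type: text/plain\nContent-length: %d\n" % len(data))
        return addinfourl(io.BytesIO(data), headers, req.full_url)


opener = build_opener(EchoHandler())  # default handlers plus the add-on
with opener.open("echo:hello") as resp:
    print(resp.headers["Content-type"], resp.read())
```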


David Bacon
1999-12-10