FOM: The axioms of ZF

Fri Jan 23 19:45:48 EST 1998

In one of his previous messages Harvey Friedman noted that
The ZF axioms can be viewed as an extrapolation of the axioms of 
finite set theory. This is a very important observation (although I 
am not sure what its implications really are). In this message  
I like to give further support to the claim that most of ZF axioms are
just an extrapolation from the finite case by looking at them from a 
somewhat unusual point of view: that of commercial relational database 
query languages (like SQL).  Although the axioms of ZF and these 
query languages were developed completely independently of each other, 
the similarity of ideas is striking.  Moreover: this point of view  
provides a new, different presentation of ZF, which is based on  purely 
syntactic considerations  (in contrast to the usual semantical one).

 First, to those who are unfamiliar with the subject: a database is 
just a finite set of FINITE relations P_1,...,P_n (called "tables") 
which are defined on some domain. To make things simpler we may
assume here that a domain can be either the set of natural
numbers or some subset of it which includes all the numbers which
are mentioned in the tables. A query is just a first-order formula
A(X) (where X=x_1,..,x_k is a list of all the free variables
in A). The language in which  A is written includes =, P_1,..., P_n, 
numerals, and perhaps symbols for some standard relations and functions, 
like < and +. An answer to a query is the set of all the tuples in the 
domain which satisfy it, given the interpretation of P_1,..,P_n 
which is supplied by the database, together with the standard 
interpretation of the other symbols.

Now, not every formula can be accepted as a query as far as database 
systems are concerned. A first demand that we have is that the answer 
to a query should be FINITE (so it can be presented as a "table").
A query is usually called "SAFE" if this is the case for every possible
finite interpretation of P_1,...,P_n. A second demand is that the answer
to a query should be COMPUTABLE. A third, stronger demand, is
that the query should be DOMAIN-INDEPENDENT. This means that the
answer should be the same relative to every domain which includes the
numbers that are mentioned either in the query or in one of the tables.
Domain-independence implies the other two conditions (but not vice versa).
Commercial query languages are designed so that it is possible to
make in them only domain-independent queries. Unfortunately, neither
"safety" nor "Domain-independence" are decidable (even if we allow
only P_1,...,P_n and = in queries and consider only finite domains). 
Therefore all commercial query languages allow to use as queries only 
formulas from some decidable, syntactically defined subclass of the class
of domain-independent queries. Several suggestions have been made
what this syntactically defined subclass should be
(and some of them have also ben implemented). What I shall  
describe below is a natural improvement of that which is described 
in Ullman's book: "Principles of Database and Knowledge-Base systems".
The idea of this improvement is to generalize the notions of safety 
and of domain-independence, so that the basic syntactic characterization 
will be that of a formula A(X,Y) being safe (or domain-independent) 
RELATIVE to X. The intented meaning is that every formula which is 
obtained from A(X,Y) by substituting concrete values for the variables 
in Y should be safe (or domain-independent) according to the original,
restricted  definitions of these notions. A domain-independent (or safe)
formula is then  just a formula which is domain-independent (or safe)
relative to its set of free variables. Now the use of this relativized 
notion allows a clean syntactical characterization of the queries
Ullman and others permit to use. This characterization is given by 
the following conditions:

0) If A is d.i. (domain-independent) relative to X and Z is a subset
   of X the A is d.i. relative to Z.
1) x=t and t=x are d.i. relative to {x} if t is a valid term not
   containing  free occurences of x.
2) P_i(t_1,...,t_n_i) is d.i. relative to X in case X is a subset
      of {t_1,...,t_n_i}.
3) If A and B are both d.i. relative to X then so is A\vee B (A or B).
4) If A is d.i. relative to X and $x \in X$ then $\exists x A$ 
   is d.i. relative to X-{x}.
5) if A is d.i. relative to X, B is d.i. relative to Y, and no y in Y
   is free in A, then A&B and B&A are d.i. relative to the union of X and Y.  
6) If A  is d.i. relative to the empty set of variables, then so 
   is its negation.

(it is worth noting that condition 3 corresponds to the union operation
of what is known as the relational algebra, condition 4 to projection and
condition 5 to join and selection)

When we turn to the weaker notion of safety, we can characterize a wide 
class of (relatively) safe formulas by replacing the last condition with:

7) Every formula is safe relative to the empty set of variables.

(one can add also conditions like: x<t is safe relative to {x} provided t
does not contain free occurences of x. It will be then also d.i. w. r. t.
{x} if we consider only domains which are initial sagements of N)

Let us turn now to ZF. Here we may call a formula A "safe relative to
the variable x" if unrestricted comprehension is allowed w.r.t.
A and x, i.e.: $\exists Z (\forall x. x\in Z iff A$ is valid/provable
(depending on one's philosophy). Here A may, of course, contains 
parameters. 

For simplicity of presentation, I assume in what follows that the language
we use has the means to make definitions (this assumption is tacitly
assumed also in Shoenfield's chapter in the Handbook of Mathematical
Logic, in which the axioms of ZF are explained and an attempt is
made to justify them on a semantic ground). I'll assume also that
=, $\in$ and $\subseteq$ are all primitive, and that the axioms of ZF
include the basic relationships between them. The various
comprehension axioms of ZF (pairing, union, separation, replacement
and powerset) can then be replaced by the following principle:

  If the formula A is SYNTACTICALLY safe relative to x then 
unrestricted comprehension w.r.t. A and x is an axiom.

What remains to be defined is the notion of SYNTACTICALL SAFETY. For
this one needs again to generalize it to the notion of A being safe
relative to an arbitrary finite set X of variables. For this it is 
possible to use exactly  the same conditions 0-7 as above, with just one 
change: replace condition 2 by: $x \in t$ and $x \subseteq t$ 
are safe relative to {x}, provided t contains no free occurences of x. 
since \in and \subseteq are the primitives 
relations, this in fact is the most natural change to make!)
What we get is a system which is equivalent to ZF (provided, of course,
that we add the infinity and the foundation axioms).

As an example, note that the existence of the cartezian product 
of two sets, U and V follows from the fact that the following
formula is syntactically safe realative to x:

 $\exists a \exist b. a\in U \wedge b\in V \wedge x=<a,b>$

(one should here justifies first the use of the term <a,b>, which 
is easy. U and V are here parameters). Note that the powerset
axiom is NOT used in this argument!) 

The upshot of this is that safe queries in finite database theory
and formulas which are allowed in forming sets in ZF are defined by 
exactly the same principles!

Two notes are in order here:

1) The powerset axiom is reflected here by choosing $\subseteq$ as
a primitive relation. If we take it as a defined one (in the usual
way) then we need to formulate a more subtle extra condition
concerning safety of the bound in bounded universal quantification.

2) If we insist on using just standard first-order language
in the basic signature (i.e.: no definitions of new constants
or new function symbols are allowed) then replacement causes problems. 
The solution is to translate the above meta-conditions of 
safety into the language. Details will be supplied in a paper
I am now writing.

  Finally, what about domain-independence? Is there any analogue in
the literature on set theory? Sure there is: the concept of Absoluteness
is in fact a particular case of relative domain-independence: it is
domain-independence relative to the empty set of variables. The exact
meaning of this notion is determined here by the class of domains 
we are willing to consider.
(In set theory we allows, first of all, only transitive domains. Usually
we have other conditions as well). Again a large class of formulas
which are usually d.i. relative to some set X of variables is determined
by conditions 0-6 above. Only this time we usually demand only $x\in t$
to be d.i. ralative to {x} in condition 2 (provided x is not free in t),
but not $x \susseteq t$, and the term t in conditions 1 and 2 should be 
absolute itself (a proper definition of this should also be given,
of course). It is interesting that while 
in relational database theory the main interest is in d.i. of formulas 
relative to all of their free variables, in set theory the interest 
has been in d.i. of formulas relative to the EMPTY set of variables.

  A detailed investigation of relative d.i. (or relative absoluteness)
in set theory will also be included in the paper I am now writing.

Arnon Avron

Name: Arnon Avron
Position: Professor of Mathematics and Computer Science
Institution: Tel-Aviv University (Israel)
Research interest: foundations of Logic and mathematics, Applications
                   of Logic in Computer Science, non-classical logics.
For more information: www.math.tau.ac.il/~aa/