# FOM: Canonical codings

JoeShipman@aol.com JoeShipman at aol.com
Mon Sep 3 03:18:29 EDT 2001

```Axel Boldt's question about canonical non-computable numbers raises some
interesting issues.  Some codings involve arbitrary choices, others seem to
be more canonical, but precise definitions are lacking.

The following bijective codings of sets of finite objects seem to be arguably
"canonical":

Successor:{non-negative integers}-->{positive integers}
Successor(x)=x+1

Fold:{integers}-->{positive integers}
Fold(x)=0.5 + |2x-0.5| (this "counts" the integers in the sequence
0,1,-1,2,-2,3,-3,...; the only other candidate for canonicity counts
0,-1,1,-2,2,-3,3,...)

Godelnumber:{eventually-zero infinite sequences of non-negative integers}-->
{positive integers}
Godelnumber((x1,x2,x3,...,x_n,0,0,0,0,.....))=(p1^x1)*(p2^x2)*...*(p_n^x_n)
where p1,p2,... is the sequence of primes 2,3,5,...

Basetwo:{finite sets of non-negative integers}-->{non-negative integers}
Basetwo({x1,x2,...,x_n}) = 2^x1+2^x2+...+2^x_n (the empty set goes to an
empty sum which is 0, so combined with Successor this counts finite sets of
non-negative integers {},{0},{1},{0,1},{2},{0,2},{1,2},{0,1,2},{3},...

HF:{hereditarily finite sets}-->{non-negative integers}
HF(X)=sum over x in X of 2^HF(x)
This counts V_omega as
{},
{{}},
{{{}}},
{{},{{}}},
{{{{}}}},
{{},{{{}}}},
{{{}},{{{}}}},
{{},{{}},{{{}}}}},
{{{},{{}}}},
etc.

What is missing?  We might like a canonical enumeration of rationals and of
ordered pairs of non-negative integers.  For pairs there are a number of ways
to do it, but none seems to be obviously "most canonical".  My candidate
would be

OPair:{ordered pairs of non-negative integers}-->non-negative integers
OPair((x,y))=x + (x+y)(x+y+1)/2
This counts pairs
(0,0),(0,1),(1,0),(0,2),(1,1),(2,0),(0,3),(1,2),(2,1),(3,0),(0,4),...

To canonically enumerate the rationals we have a problem.  The usual way to
do it is to just enumerate positive rationals (x+1)/(y+1) by enumerating the
corresponding pairs (x,y), and discarding those expressions not in lowest
terms.  But this "generate and test" procedure is not naturally bijective, in
particular the inverse function is no longer straightforward to calculate.

A somewhat better way would be to use Farey fractions and enumerate rationals
between 0/1 and 1/1 as follows:
Preparatory stage: 0/1,1/1
Stage 0: 1/2,
Stage 1: 1/3,2/3,
Stage 2: 1/4,2/5,3/5,3/4,
Stage 3: 1/5,2/7,3/8,3/7,4/7,5/8,5/7,4/5,
Stage 4: 1/6,...

The pattern here is that at the nth stage we enumerate the 2^n fractions
formed by taking "Farey averages" of adjacent fractions from the previous
stages.  The "Farey average" of a/b and c/d is (a+c)/(b+d).  It is easy to
show this is a bijection of the fractions generated starting with stage 0 and
the rationals between 0 and 1; this can be composed with the "Fold" and
0/1, and get all rationals.

But this still doesn't seem compellingly non-arbitrary.  Any better
suggestions?

We've seen that we can get a canonical coding of Diophantine equations (with
non-negative integer coefficients) by using the Godelnumber coding twice,
once for the sequence of coefficients in a polynomial and once for the
sequence of exponents in a monomial.

Word problems are harder to do canonically because of the asymmetry involved
(you have to give the generators and relations and separately a word to test
for triviality).  Fortunately, for both groups and semigroups the triviality
problem is also unsolvable and instances can be represented more
symmetrically.

Triviality problem for semigroups
Instance: a finite set of relations (pairs of words in a finite alphabet
which are declared to be equal; assume that every letter in the alphabet
appears in at least one relation so that we don't need to specify generators
separately--we can do this because if some generator isn't involved in any
relations the semigroup is nontrivial).
Question: whether the relations imply that all the letters (1-letter words)
in the alphabet are equal to the identity element (corresponding to the empty
word).

The problem with canonically enumerating instances here is that we don't have
such a great pairing function, and a relation needs to be a pair of words
because we don't have inverses available.

Triviality problem for groups
Instance: a finite set W={w1,w2,...,w_n} of words in the doubly infinite
alphabet x_1,x_-1,x_2,x_-2,x_3,x_-3,....
Question: Whether W generates a subgroup of the free group on
{x_1,x_2,x_3,....} that is equal to the subgroup generated by exactly those
generators x_i or x_-i which appear in some word in W.  (As before, we don't
care about generators that aren't involved in any relation word because
they'd prevent triviality.)

This has some hope of being canonical.  We can use Fold to make the alphabet
singly infinite and Basetwo to enumerate finite sets; ***the only problem is
canonically enumerating the set of words in a denumerably infinite
alphabet***.

Here is one dopey way: enumerate all words beginning with x in the 2-letter
alphabet {x,'} and then reinterpret such a word as a word in the denumerably
infinite alphabet {x, x', x'', x'''',x'''',...} in the obvious way, tacking
the empty word on the front.

Can anyone suggest a better way to do this?

(Yes, I know that there is a relatively small integer n such that the word
problem for groups with n-generator presentations is already unsolvable, and
it is easy to enumerate words in a finite alphabet, but the choice of n is
arbitrary until we can find the minimal such n, and we are nowhere near able
to do that!)

-- Joe Shipman

```