non-computability of overfitting

José Manuel Rodríguez Caballero josephcmac at gmail.com
Wed Jan 26 14:40:19 EST 2022


Sam Sanders asked:

> The question is thus whether the below ML result is of the same kind, or
> whether ML [machine learning] problems that appear in practice are
> undecidable.


Herbert A. Simon [2] wrote an essay relevant to the question of which
machine learning problems appear in practice. I quote his conclusion,
which may be applied to mathematics as practiced by human mathematicians
(as opposed to how a machine could do mathematics):

> My thesis has been that one path to the construction of a non-trivial
> theory of complex systems is by way of a theory of hierarchy. Empirically,
> a large proportion of the complex systems we observe in nature exhibit
> hierarchic structure. On theoretical grounds we could expect complex
> systems to be hierarchies in a world in which complexity had to evolve from
> simplicity. In their dynamics, hierarchies have a property,
> near-decomposability, that greatly simplifies their behavior.
> Near-decomposability also simplifies the description of a complex system,
> and makes it easier to understand how the information needed for the
> development or reproduction of the system can be stored in reasonable
> compass.


There are results in the theory of artificial neural networks that are
useless in practice, for example, the Universal Approximation Theorem [1],
which states that a neural network can approximate any continuous function
on any compact set if and only if its continuous activation function is
not a polynomial. According to experts such as Stéphane Mallat, this
result is useless in practice, since it amounts to the statement that
neural networks with non-polynomial activation functions can be used as
computer memory (the network approximates the continuous function on a
compact set because it memorizes it, not because it has learned anything
deep about the function).
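
For reference, the density version of the theorem proved in [1] can be
stated as follows (a standard formulation in LaTeX notation, not a
verbatim quotation from the paper):

  Let \sigma \in C(\mathbb{R}). Then
  \mathrm{span}\{\, x \mapsto \sigma(w \cdot x + b) : w \in \mathbb{R}^d,\ b \in \mathbb{R} \,\}
  is dense in C(K), in the uniform norm, for every compact
  K \subset \mathbb{R}^d and every d, if and only if \sigma is not a
  polynomial.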

A neural network is determined by a (finite-dimensional) tensor of weights
(this is why the Python library for neural networks is called TensorFlow).
So we can imagine the process of learning from data as a sequence of
tensors of the same dimensions, where each tensor is produced from the
previous tensor and the data by an optimization algorithm, e.g., gradient
descent. Let C(n) be the Kolmogorov complexity of the n-th term of this
sequence (notice that this function is defined only up to an additive
constant, hence we are in an asymptotic framework). If the first tensor is
zero, C(1) is at its minimum value: the neural network is extremely easy
to describe. As n grows, C(n) may increase and decrease several times. A
necessary condition for learning could be that, at some index n_learning,
the Kolmogorov complexity C(n_learning) of the neural network equals the
Kolmogorov complexity of the training data. A sufficient condition for
overfitting could be that C(n) moves far away from the Kolmogorov
complexity of the training data for some n > n_learning. The conjecture is
that n_learning cannot be computed as a function of the parameters of the
problem.
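
Since Kolmogorov complexity is uncomputable, any experiment along these
lines must replace C(n) with a computable proxy. Below is a minimal sketch
in Python that uses the gzip-compressed length of the serialized weight
tensor as a crude upper bound on C(n); the linear model, synthetic data,
and hyperparameters are hypothetical placeholders, not part of the
proposal above.

  import gzip
  import numpy as np

  def compressed_length(tensor):
      # Crude computable proxy for Kolmogorov complexity: byte length
      # of the gzip-compressed, quantized, serialized tensor.
      quantized = np.round(tensor, 3).astype(np.float32)
      return len(gzip.compress(quantized.tobytes()))

  # Hypothetical setup: a linear model trained by gradient descent on
  # synthetic data (stand-ins for a real network and dataset).
  rng = np.random.default_rng(0)
  X = rng.normal(size=(100, 10))
  y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=100)

  w = np.zeros(10)  # the first tensor is zero, so C(1) is minimal
  data_proxy = compressed_length(np.concatenate([X.ravel(), y]))

  for n in range(1, 201):
      grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
      w -= 0.01 * grad                   # next tensor in the sequence
      if n % 50 == 0:
          print(f"step {n}: C(n) proxy = {compressed_length(w)}, "
                f"data proxy = {data_proxy}")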

Notice that a way to describe a fixed overfitted neural network (a
computer memory) is essentially as a pair consisting of the training data
and the number of steps needed to memorize the data. If this is a minimal
description, then it is strictly larger than the minimal description of
the data alone, because it also involves the above-mentioned number of
steps.
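
In symbols (a sketch, writing C for Kolmogorov complexity, N for the
overfitted network, D for the training data, and n for the number of
training steps): since N can be reconstructed by running the fixed
training algorithm on D for n steps,

  C(N) \le C(D) + \log_2 n + O(\log \log n),

and if the pair (D, n) is indeed a minimal description of N, then C(N)
exceeds C(D) by roughly the number of bits needed to specify n.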

As a practical application, which may provide insights into the field of
human intelligence, let's consider the question: if humans are essentially
learning machines, why do they understand instead of memorizing
everything? In other words: if humans are essentially learning machines,
why do they not overfit? One approach would be to deny that humans can be
modeled by Turing machines. This is the position of J.R. Lucas [3] and
Roger Penrose [4]. Both claim to derive this conclusion from Gödel’s first
incompleteness theorem. This position has been criticized by Solomon Feferman
[5].

My solution would be that the uncomputability of the problems humans face
prevents them from overfitting. In the case of mathematicians, we can
imagine an expert simply doing research on the Monster (finite) group and
ignoring everything else in mathematics, including the natural numbers. For
the sake of argument, let's assume that this human researcher is immortal
but as mentally limited as any human. Then, given enough time, it's easy to
imagine this expert giving up reasoning and just memorizing everything
about the Monster group. Nevertheless, if his field of expertise were number
theory, the overfitting would never happen, since it would be prevented by
the undecidability of the Entscheidungsproblem (Hilbert's decision
problem). Notice that this argument can be applied to any artificial
learning machine facing uncomputable problems. According to Gregory Chaitin
[6], undecidability, uncomputability and similar phenomena from the field
of foundations of mathematics are crucial for biological evolution.

Kind regards,
Jose M.


[1] Pinkus, Allan. "Approximation theory of the MLP model in neural
networks." *Acta Numerica* 8 (1999): 143–195.
doi:10.1017/S0962492900002919. https://doi.org/10.1017%2FS0962492900002919

[2] Simon, Herbert A. "The architecture of complexity." *Facets of systems
science*. Springer, Boston, MA, 1991. 457-476.
https://cs.brown.edu/courses/csci1810/resources/ch3_readings/Simon%20-The%20Architecture%20of%20Complexity.pdf

[3] Lucas, John R. "Minds, Machines and Gödel." *Philosophy* 36.137 (1961):
112-127.

[4] Penrose, Roger. *Shadows of the Mind*. Oxford: Oxford University
Press, 1994.

[5] Feferman, Solomon. "Penrose's Gödelian argument." *Psyche* 2.1 (1996).
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.518.8666&rep=rep1&type=pdf

[6] Chaitin, G. J. "Life as evolving software." In *A Computable
Universe*. World Scientific, Singapore, 2012.