non-computability of overfitting

José Manuel Rodríguez Caballero josephcmac at gmail.com
Fri Jan 21 21:13:07 EST 2022


On Fri, Jan 21, 2022 at 8:53 PM JOSEPH SHIPMAN <joeshipman at aol.com> wrote:

> This is an excellent conjecture, because, if true, it would be the best
> example I know of a natural uncomputable function with easily computable
> upper and lower bounds that aren’t very far from each other.
>
> — JS
>

Despite the possible ambiguities in my formulation, I think the idea of a
relationship between (natural and artificial) learning and
non-computability can be attributed to Roger Penrose:

" ‘Non-computability’ and causal agency As shown by Gödel's theorem,
Penrose [23], [24] described how the mental quality of ‘understanding’
cannot be encapsulated by any computational system and must derive from
some ‘non-computable’ effect. Moreover, the neurocomputational approach to
volition, where algorithmic computation completely determines all thought
processes, appears to preclude any possibility for independent causal
agency, or free will. Something else is needed. What non-computable factor
may occur in the brain? "
Reference:
https://www.sciencedirect.com/science/article/pii/S1571064513001188



> Sent from my iPhone
>
> On Jan 21, 2022, at 7:51 PM, José Manuel Rodríguez Caballero <
> josephcmac at gmail.com> wrote:
>
>
> Dear FOM members,
> In the field of (supervised) machine learning, we consider two finite
> sequences x_train = (x_1, ..., x_n) (features) and y_train = (y_1, ...,
> y_n) (labels). We are interested in inferring the rule that produces y_i
> from x_i. A possible way [1] to formalize the notion of learning in the
> framework of Kolmogorov complexity [2] is to identify "learning" with
> finding the shortest algorithm D such that D(x_i) = y_i for all i = 1, 2,
> ..., n. This algorithm is called the decompressor of x_train and y_train.
> Then, given another sequence x_test = (x_{n+1}, x_{n+2}, ..., x_{n+k}),
> the predictions of this learning process are the sequence y_test = (
> D(x_{n+1}), D(x_{n+2}), ..., D(x_{n+k}) ). The practical problem is that
> the function associating x_train and y_train to D is non-computable.
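>
> To make this concrete, here is a toy sketch (my own illustration, not
> taken from the references): programs are postfix arithmetic expressions
> over a small token alphabet, and we search for the shortest one that
> reproduces the training labels. The search only terminates because the
> program length is capped; without such a cap, the shortest decompressor
> is exactly the kind of object that cannot be computed in general.
>
>     import itertools
>
>     TOKENS = ["x", "1", "2", "+", "*", "-"]  # postfix token alphabet
>
>     def run(prog, x):
>         """Evaluate a postfix program at x; None if malformed."""
>         stack = []
>         for tok in prog:
>             if tok == "x":
>                 stack.append(x)
>             elif tok.isdigit():
>                 stack.append(int(tok))
>             else:
>                 if len(stack) < 2:
>                     return None
>                 b, a = stack.pop(), stack.pop()
>                 stack.append({"+": a + b, "*": a * b, "-": a - b}[tok])
>         return stack[0] if len(stack) == 1 else None
>
>     def shortest_decompressor(x_train, y_train, max_len=7):
>         # Enumerate by length, so the first match is a shortest program.
>         for length in range(1, max_len + 1):
>             for prog in itertools.product(TOKENS, repeat=length):
>                 if all(run(prog, x) == y for x, y in zip(x_train, y_train)):
>                     return prog
>         return None
>
>     # Labels generated by y = x*x + 1:
>     print(shortest_decompressor([1, 2, 3], [2, 5, 10]))
>     # -> ('x', 'x', '*', '1', '+')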
>
> What machine learning engineers usually do is consider a space [3, 4]
> containing algorithms E such that E(x_i) approximates y_i for all i = 1,
> 2, ..., n, and move smoothly from one such algorithm (a point in the
> space) to a neighboring algorithm (another point) to improve the
> approximation. The improvements are produced by the iterations of a
> method such as gradient descent. It has been empirically observed that,
> in practical situations, on the test dataset x_test = (x_{n+1}, x_{n+2},
> ..., x_{n+k}) the predictions E( x_{n + i} ) approximate the values they
> are intended to predict better and better as the number of iterations
> increases, but when the number of iterations is too large, the
> approximation degenerates and becomes worse than the result obtained
> with fewer iterations. This phenomenon is known as overfitting.
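>
> As a toy numerical illustration (my own sketch; the degree, learning
> rate, and noise level are arbitrary choices), fit a degree-9 polynomial
> to noisy samples of a smooth target by gradient descent and track the
> test error; in typical runs it reaches a minimum at some intermediate
> iteration and then rises again:
>
>     import numpy as np
>
>     rng = np.random.default_rng(0)
>     x_train = np.linspace(-1, 1, 10)
>     y_train = np.sin(np.pi * x_train) + 0.3 * rng.standard_normal(10)
>     x_test = np.linspace(-1, 1, 50)
>     y_test = np.sin(np.pi * x_test)
>
>     X_tr = np.vander(x_train, 10)   # degree-9 design matrices
>     X_te = np.vander(x_test, 10)
>     w = np.zeros(10)
>
>     best_iter, best_err = 0, np.inf
>     for t in range(1, 20001):
>         grad = X_tr.T @ (X_tr @ w - y_train) / len(y_train)
>         w -= 0.1 * grad             # plain gradient descent step
>         test_err = np.mean((X_te @ w - y_test) ** 2)
>         if test_err < best_err:
>             best_iter, best_err = t, test_err
>
>     # After best_iter, further iterations tend to raise the test
>     # error: overfitting.
>     print(best_iter, best_err)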
>
> A possible way to formalize overfitting mathematically, although not the
> only one, is to suppose that E( x_{n + i} ) is intended to predict D(
> x_{n + i} ) for all i = 1, 2, ..., k. In this framework, can it be proved
> that the maximum number of iterations just before overfitting sets in is
> a non-computable function of the parameters of the problem? I am not
> excluding that this number could be computable in some particular cases.
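>
> One way to state the question precisely (my notation, one choice among
> many): fix the initialization, let E_t be the algorithm after t
> iterations, and define the test error
>
>     err(t) = sum_{i=1}^{k} d( E_t(x_{n+i}), D(x_{n+i}) )
>
> for some fixed metric d. Overfitting then begins at the first t* with
> err(t* + 1) > err(t*), and the question is whether the map from
> (x_train, y_train, x_test) to t* is computable. Note that t* inherits a
> dependence on D, which is itself obtained by an unbounded search over
> programs.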
>
> Kind regards,
> Jose M.
>
> References:
> [1] Verbeek, Jakob J. "Overfitting using the Minimum Description Length
> Principle." (2000).
> https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.19.9142&rep=rep1&type=pdf
>
> [2] Pearlmutter, Barak A., and Ronald Rosenfeld. "Chaitin-Kolmogorov
> complexity and generalization in neural networks." Advances in Neural
> Information Processing Systems (1991): 925-931.
>
> [3] Dong, Xiao, Jiasong Wu, and Ling Zhou. "How deep learning works --
> the geometry of deep learning." arXiv preprint arXiv:1710.10784 (2017).
>
> [4] Neyshabur, Behnam, Ryota Tomioka, Ruslan Salakhutdinov, and Nathan
> Srebro. "Geometry of optimization and implicit regularization in deep
> learning." arXiv preprint arXiv:1705.03071 (2017).
>
>

