non-computability of overfitting

Josef Urban josef.urban at gmail.com
Wed Jan 26 14:43:48 EST 2022


I am far from being an expert in theoretical ML, but I don't think
overfitting is a solved problem, either theoretically or practically.

Even in the currently hottest neural settings, there has been interesting
recent research contradicting the standard practical wisdom that more/bigger
training implies more overfitting [1].

There are also interesting recent theoretical results about the unsuitability
of neural methods for hard computational problems [2].

Probably the part of (quite theoretical) ML that is most closely related to
topics like Kolmogorov complexity is Solomonoff induction [3]. There are
certainly provable theorems about uncomputability in that setting, but I am
not sure how much overfitting has been studied there.
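
For readers who have not seen [3]: up to technical details, the central
object there is Solomonoff's universal prior

  M(x) = sum over all programs p with U(p) = x*  of  2^(-|p|),

the sum ranging over the programs p whose output on a fixed universal
(monotone) machine U begins with the string x. M is lower semicomputable
but not computable, which is one place the uncomputability results come from.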

Josef

[1]: https://arxiv.org/pdf/1812.11118.pdf
[2]: https://arxiv.org/abs/2002.09398
[3]: https://en.wikipedia.org/wiki/Solomonoff%27s_theory_of_inductive_inference


On Sat, Jan 22, 2022 at 1:52 AM José Manuel Rodríguez Caballero <
josephcmac at gmail.com> wrote:

> Dear FOM members,
> In the field of (supervised) machine learning, we consider two finite
> sequences x_train = (x_1, ..., x_n) (features) and y_train = (y_1, ...,
> y_n) (labels). We are interested in inferring the rule to obtain y_i from
> x_i. A possible way [1] to formalize the notion of learning in the
> framework of Kolmogorov complexity [2] could be to identify "learning"
> with the smallest algorithm D such that D(x_i) = y_i for all i = 1, 2,
> ..., n. This algorithm is called the decompressor of x_train and y_train.
> Then, given another sequence x_test = (x_{n+1}, x_{n+2}, ..., x_{n+k}),
> the prediction of this learning process is the sequence y_test = ( D(x_{n+1}),
> D(x_{n+2}), ..., D(x_{n+k}) ). The practical problem is that the function
> associating x_train and y_train to D is non-computable.
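>
> To make the identification concrete, here is a purely illustrative toy
> search for such a decompressor (Python; the tiny expression grammar and
> the bound on program size are artificial choices made only so that the
> search terminates, since in general no such bound is available, which is
> exactly the non-computability mentioned above):
>
> from itertools import product
>
> ATOMS = ["x", "1", "2"]          # building blocks of our toy "programs"
> OPS = ["+", "-", "*"]
>
> def candidate_programs(max_ops=4):
>     """Enumerate expressions such as 'x+x+1', fewest operators first
>     (a crude stand-in for ordering programs by description length)."""
>     for n_ops in range(max_ops + 1):
>         for atoms in product(ATOMS, repeat=n_ops + 1):
>             for ops in product(OPS, repeat=n_ops):
>                 expr = atoms[0]
>                 for op, a in zip(ops, atoms[1:]):
>                     expr += op + a
>                 yield expr
>
> def decompressor(x_train, y_train, max_ops=4):
>     """Return the first (i.e. shortest) expression D with D(x_i) = y_i
>     for every training pair; None if no bounded program fits."""
>     for expr in candidate_programs(max_ops):
>         if all(eval(expr, {"x": x}) == y for x, y in zip(x_train, y_train)):
>             return expr
>     return None
>
> D = decompressor([1, 2, 3], [3, 5, 7])            # finds 'x+x+1'
> print(D, [eval(D, {"x": x}) for x in [4, 5]])     # predictions on x_test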
>
> What is usually done by machine learning engineers is to consider a space
> [3, 4] containing the algorithms E such that E(x_i) approximates y_i for
> all i = 1, 2, ..., n and to move smoothly from one such algorithm (a point)
> to a neighboring algorithm (a point) in order to improve the approximation. The
> improvements of the approximation are given by the iterations of a method,
> e.g., gradient descent. It was empirically observed that, in practical
> situations, when considering the dataset for testing x_test =
> (x_{n+1}, x_{n+2}, ..., x_{n+k}), the predictions E( x_{n + i} )
> increasingly approximate the values that they are intended to predict as
> the number of iterations increases, but when the number of iterations is
> too large, the approximation degenerates and becomes worse than the result
> obtained with fewer iterations. This phenomenon is known as overfitting.
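>
> A toy illustration of this early-stopping picture (Python/NumPy; the
> model, data and step size below are arbitrary choices, not taken from
> the references): an over-parameterised polynomial is fitted to a few
> noisy points by plain gradient descent, and we record the iteration at
> which the test error is smallest; the training error typically keeps
> decreasing beyond that point while the test error starts to rise.
>
> import numpy as np
>
> rng = np.random.default_rng(0)
> true_f = lambda x: np.sin(2 * np.pi * x)
>
> # ten noisy training points, fifty noisy test points
> x_train = rng.uniform(0, 1, 10)
> y_train = true_f(x_train) + 0.2 * rng.standard_normal(10)
> x_test = rng.uniform(0, 1, 50)
> y_test = true_f(x_test) + 0.2 * rng.standard_normal(50)
>
> degree = 9                                   # 10 parameters for 10 points
> Phi_train = np.vander(x_train, degree + 1)   # columns x^9, ..., x^0
> Phi_test = np.vander(x_test, degree + 1)
>
> w = np.zeros(degree + 1)
> lr = 0.1
> best_iter, best_test = 0, np.inf
> for t in range(1, 20001):
>     grad = Phi_train.T @ (Phi_train @ w - y_train) / len(y_train)
>     w -= lr * grad
>     test_mse = np.mean((Phi_test @ w - y_test) ** 2)
>     if test_mse < best_test:
>         best_test, best_iter = test_mse, t
>
> train_mse = np.mean((Phi_train @ w - y_train) ** 2)
> print("test error minimal at iteration", best_iter,
>       "| final training error", train_mse)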
>
> A possible way to mathematically formalize overfitting, although not the
> only one, could be to suppose that E( x_{n + i} ) is intended to predict
> D( x_{n + i} ) for all i = 1, 2, ..., k. In this framework, can it be proved
> that the maximum number of iterations just before overfitting is a
> non-computable function of the parameters of the problem? I am not
> excluding the possibility that this number is computable in some particular cases.
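>
> One way to make the question precise, under the identification above, would
> be to ask whether the iteration count
>
>   T* = argmin over t of  |E_t(x_{n+1}) - D(x_{n+1})| + ... + |E_t(x_{n+k}) - D(x_{n+k})|,
>
> where E_t denotes the algorithm reached after t iterations of the method, is
> a computable function of x_train, y_train, x_test and the starting point E_0.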
>
> Kind regards,
> Jose M.
>
> References:
> [1] Verbeek, Jakob J. "Overfitting using the Minimum Description Length
> Principle." (2000).
> https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.19.9142&rep=rep1&type=pdf
>
> [2] Pearlmutter, Barak A., and Ronald Rosenfeld. "Chaitin-Kolmogorov
> complexity and generalization in neural networks." Advances in neural
> information processing systems (1991): 925-931.
>
> [3] Dong, Xiao, Jiasong Wu, and Ling Zhou. "How deep learning works--The
> geometry of deep learning." arXiv preprint arXiv:1710.10784 (2017).
>
> [4] Neyshabur, B., R. Tomioka, R. Salakhutdinov, and N. Srebro. "Geometry of
> optimization and implicit regularization in deep learning." arXiv preprint
> arXiv:1705.03071 (2017).
>
>