Authors: Martin Raphan and Eero P. Simoncelli

Title: Learning least squares estimators without assumed priors or supervision 

 Abstract: 
 The two standard methods of obtaining a least-squares optimal estimator are
 (1) Bayesian estimation, in which one assumes a prior distribution on the true
 values and combines this with a model of the measurement process to obtain
 an optimal estimator, and (2) supervised regression, in which one optimizes a
 parametric estimator over a training set containing pairs of corrupted
 measurements and their associated true values.  But many real-world systems do
 not have access to either supervised training examples or a prior model.
 Here, we study the problem of obtaining an optimal estimator given a
 measurement process with known statistics, and a set of corrupted measurements
 of random values drawn from an unknown prior.  We develop a general form of
 nonparametric empirical Bayesian estimator that is written as a direct
 function of the measurement density, with no explicit reference to the prior.
 We study the observation conditions under which such "prior-free" estimators
 may be obtained, and we derive specific forms for a variety of different
 corruption processes.  Each of these prior-free estimators may also be used to
 express the mean squared estimation error as an expectation over the
 measurement density, thus generalizing Stein's unbiased risk estimator (SURE),
 which provides such an expression for the additive Gaussian noise case.
 Minimizing this expression over measurement samples provides an "unsupervised
 regression" method of learning an optimal estimator from noisy measurements in
 the absence of clean training data.  We show that combining a prior-free
 estimator with its corresponding unsupervised regression form produces a
 generalization of the "score matching" procedure for parametric density
 estimation, and we develop an incremental form of learning for estimators that
 are written as a linear combination of nonlinear kernel functions.  Finally,
 we show through numerical simulations that the convergence of these estimators
 can be comparable to their supervised or Bayesian counterparts.