Authors: Martin Raphan and Eero P. Simoncelli
Title: Learning least squares estimators without assumed priors or supervision
Abstract: The two standard methods of obtaining a least-squares optimal estimator are (1) Bayesian estimation, in which one assumes a prior distribution on the true values and combines this with a model of the measurement process to obtain an optimal estimator, and (2) supervised regression, in which one optimizes a parametric estimator over a training set containing pairs of corrupted measurements and their associated true values. But many real-world systems do not have access to either supervised training examples or a prior model. Here, we study the problem of obtaining an optimal estimator given a measurement process with known statistics, and a set of corrupted measurements of random values drawn from an unknown prior. We develop a general form of nonparametric empirical Bayesian estimator that is written as a direct function of the measurement density, with no explicit reference to the prior. We study the observation conditions under which such "prior-free" estimators may be obtained, and we derive specific forms for a variety of different corruption processes. Each of these prior-free estimators may also be used to express the mean squared estimation error as an expectation over the measurement density, thus generalizing Stein's unbiased risk estimator (SURE), which provides such an expression for the additive Gaussian noise case. Minimizing this expression over measurement samples provides an "unsupervised regression" method of learning an optimal estimator from noisy measurements in the absence of clean training data.
We show that combining a prior-free estimator with its corresponding unsupervised regression form produces a generalization of the "score matching" procedure for parametric density estimation, and we develop an incremental form of learning for estimators that are written as a linear combination of nonlinear kernel functions. Finally, we show through numerical simulations that the convergence of these estimators can be comparable to their supervised or Bayesian counterparts.
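As an illustration of the "unsupervised regression" idea in the one case the abstract names explicitly (additive Gaussian noise, where SURE applies), the following is a minimal sketch and not the authors' implementation. It assumes a scalar linear estimator x_hat(y) = a * y, a known noise standard deviation sigma, and a Laplace prior chosen only to generate test data; the function names sure_linear and fit_linear_unsupervised are hypothetical. For an estimator written as x_hat(y) = y + g(y), SURE expresses the mean squared error as sigma^2 + E[g(Y)^2 + 2 sigma^2 g'(Y)], an expectation over noisy measurements alone.

```python
import numpy as np


def sure_linear(a, y, sigma):
    """SURE estimate of the MSE of x_hat(y) = a * y under additive Gaussian noise.

    Uses only the noisy samples y and the known noise level sigma.
    """
    g = (a - 1.0) * y      # write the estimator as x_hat(y) = y + g(y)
    g_prime = a - 1.0      # derivative of g with respect to y
    return sigma**2 + np.mean(g**2) + 2.0 * sigma**2 * g_prime


def fit_linear_unsupervised(y, sigma):
    """Closed-form minimizer of sure_linear over a (set d/da to zero)."""
    return 1.0 - sigma**2 / np.mean(y**2)


rng = np.random.default_rng(0)
sigma = 1.0

# Clean values from a prior that the estimator never sees (assumed Laplace here).
x = rng.laplace(loc=0.0, scale=1.0, size=200_000)
# Corrupted measurements: additive Gaussian noise with known statistics.
y = x + rng.normal(0.0, sigma, size=x.shape)

# Unsupervised fit: uses y and sigma only.
a_sure = fit_linear_unsupervised(y, sigma)

# Supervised least-squares fit: uses the clean/noisy pairs, for comparison only.
a_supervised = np.sum(x * y) / np.sum(y * y)

# True MSE (needs x) versus its SURE estimate (does not need x).
mse_true = np.mean((a_sure * y - x) ** 2)
mse_sure = sure_linear(a_sure, y, sigma)

print(f"a (unsupervised, SURE): {a_sure:.4f}")
print(f"a (supervised):         {a_supervised:.4f}")
print(f"true MSE: {mse_true:.4f}   SURE estimate: {mse_sure:.4f}")
```

In this sketch the unsupervised and supervised coefficients should agree up to sampling error, which mirrors the abstract's claim about comparable convergence only for this simple linear, Gaussian-noise setting; the nonparametric and kernel-based estimators described above are not reproduced here.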