Title: On Transductive Regression
Authors: Corinna Cortes and Mehryar Mohri

In many modern large-scale learning applications, the amount of
unlabeled data far exceeds that of labeled data. A common instance of
this problem is the 'transductive' setting where the unlabeled
test points are known to the learning algorithm. This paper presents a
study of regression problems in that setting. It presents
'explicit' VC-dimension error bounds for transductive regression
that hold for all bounded loss functions and coincide with the tight
classification bounds of Vapnik when applied to classification.  It
also presents a new transductive regression algorithm inspired by our
bound that admits a primal and kernelized closed-form solution and
deals efficiently with large amounts of unlabeled data. The algorithm
exploits the position of unlabeled points to locally estimate their
labels and then uses a global optimization to ensure robust
predictions. Our study also includes the results of experiments with
several publicly available regression data sets with up to 20,000
unlabeled examples.  A comparison with other transductive regression
algorithms shows that our algorithm performs well and that it scales to
large data sets.
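The two-stage scheme described above (local label estimation followed by a global optimization with a closed-form primal solution) can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the neighborhood size `k`, the regularization parameters `lam`, `C`, and `C_u`, and the function name are all illustrative assumptions, and the local estimate here is a simple k-nearest-neighbor average.

```python
import numpy as np

def transductive_ridge(X_l, y_l, X_u, k=3, lam=1.0, C=1.0, C_u=0.1):
    """Sketch of a two-stage transductive regressor (illustrative only).

    Stage 1: estimate a pseudo-label for each unlabeled point from the
    labels of its k nearest labeled neighbors.
    Stage 2: solve the global objective
        lam*||w||^2 + C*||X_l w - y_l||^2 + C_u*||X_u w - y_u||^2
    which admits a closed-form primal solution.
    """
    # Local estimation: average the labels of the k nearest labeled points.
    d2 = ((X_u[:, None, :] - X_l[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]
    y_u = y_l[nn].mean(axis=1)
    # Global optimization: setting the gradient of the objective to zero
    # gives a linear system in the weight vector w.
    d = X_l.shape[1]
    A = lam * np.eye(d) + C * X_l.T @ X_l + C_u * X_u.T @ X_u
    b = C * X_l.T @ y_l + C_u * X_u.T @ y_u
    return np.linalg.solve(A, b)

# Usage on noiseless synthetic linear data: the recovered weights
# should lie close to the true weight vector.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X_l = rng.normal(size=(50, 2))
y_l = X_l @ w_true
X_u = rng.normal(size=(20, 2))
w = transductive_ridge(X_l, y_l, X_u, lam=1e-3)
```

A kernelized version would follow from the representer-style argument, expressing `w` as a combination of the labeled and unlabeled points, which is what lets the method handle large unlabeled sets efficiently.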