Factor Graphs for Relational Regression
Candidate: Sumit Chopra
Advisor: Yann LeCun


Inherent in many interesting regression problems is a rich underlying inter-sample "Relational Structure". In these problems, the samples may be related to each other in ways such that the unknown variables associated with any sample not only depends on its individual attributes, but also depends on the variables associated with related samples. One such problem, whose importance is further emphasized by the present economic crises, is understanding real estate prices. The price of a house clearly depends on its individual attributes, such as, the number of bedrooms. However, the price also depends on the neighborhood in which the house lies and on the time period in which it was sold. This effect of neighborhood and time on the price is not directly measurable. It is merely reflected in the prices of other houses in the vicinity that were sold around the same time period. Uncovering these spatio-temporal dependencies can certainly help better understand house prices, while at the same time improving prediction accuracy.

Problems of this nature fall in the domain of "Statistical Relational Learning". However the drawback of most models proposed so far is that they cater only to classification problems. To this end, we propose "relational factor graph" models for doing regression in relational data. A single factor graph is used to capture, one, dependencies among individual variables of sample, and two, dependencies among variables associated with multiple samples. The proposed models are capable of capturing hidden inter-sample dependencies via latent variables, and also permits non-linear log-likelihood functions in parameter space, thereby allowing considerably more complex architectures. Efficient inference and learning algorithms for relational factor graphs are proposed. The models are applied to predicting the prices of real estate properties and for constructing house price indices. The relational aspect of the model accounts for the hidden spatio-temporal influences on the price of every house. Experiments show that one can achieve considerably superior performance by identifying and using the underlying spatio-temporal structure associated with the problem. To the best of our knowledge this is the first work in the direction of relational regression and is also the first work in constructing house price indices by simultaneously accounting for the spatio-temporal effects on house prices using large-scale industry standard data set.