Abstract
========

Feature learning is becoming an important topic in machine learning, but most studies have focused on developing heuristic methods without proper justification. In this talk I will discuss two application problems we investigated recently that led to simple and effective feature learning algorithms based on sound statistical models. The first is unsupervised learning of super-vector coding for image classification, and the second is supervised learning of regularized decision forests for generic nonlinear prediction problems. Both methods can be regarded as learning nonlinear basis functions for statistical additive models. The first method is mathematically motivated by the computation of the Hellinger distance between two empirical probability distributions in high dimension, and our solution achieves state-of-the-art performance in image classification. The second method is motivated by boosted decision trees, where we apply the modern concept of structured sparsity regularization to design superior algorithms.
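
The abstract mentions the Hellinger distance between empirical distributions as the motivation for super-vector coding but does not define it. As background, here is a minimal sketch of the standard Hellinger distance for discrete distributions, H(p, q) = (1/sqrt(2)) * ||sqrt(p) - sqrt(q)||_2; this is the textbook definition, not the authors' coding scheme itself:

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions.

    Both p and q are nonnegative vectors summing to 1 over the same support.
    The distance lies in [0, 1]: 0 for identical distributions,
    1 for distributions with disjoint support.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2.0)

# Example usage on two small empirical distributions
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(hellinger(p, q))
```

The square-root transform inside the norm is what links this distance to coding schemes: it maps distributions onto the unit sphere, where Euclidean geometry applies.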