Publications

All BibTeX entries in one file: Here.
  • International peer-reviewed conferences and journals:

    • Deciding How To Decide: Self-Control and Meta-Decision Making,
      Y-Lan Boureau, Peter Sokol-Hessner, and Nathaniel Daw
      Trends in Cognitive Sciences, 2015

      Abstract

      Many different situations related to self-control involve competition between two routes to decisions: default and frugal versus more resource-intensive. Examples include habits versus deliberative decisions, fatigue versus cognitive effort, and Pavlovian versus instrumental decision making. We propose that these situations are linked by a strikingly similar core dilemma, pitting the opportunity costs of monopolizing shared resources such as executive functions for some time against the possibility of obtaining a better outcome. We offer a unifying normative perspective on this underlying rational meta-optimization, review how this may tie together recent advances in many separate areas, and connect several independent models. Finally, we suggest that the crucial mechanisms and meta-decision variables may be shared across domains.

      BibTeX

      @article{boureau-tics-15,
         title={Deciding How To Decide: Self-Control and Meta-Decision Making},
         author={Boureau, Y-Lan and Sokol-Hessner, Peter and Daw, Nathaniel D},
         journal={Trends in Cognitive Sciences},
         year={2015},
         publisher={Elsevier}
      }
         
    • Ask the locals: multi-way local pooling for image recognition,
      Y-Lan Boureau, Nicolas Le Roux, Francis Bach, Jean Ponce, and Yann LeCun
      Proc. International Conference on Computer Vision (ICCV'11), 2011

      Abstract

      Invariant representations in object recognition systems are generally obtained by pooling feature vectors over spatially local neighborhoods. But pooling is not local in the feature vector space, so that widely dissimilar features may be pooled together if they are in nearby locations. Recent approaches rely on sophisticated encoding methods and more specialized codebooks (or dictionaries), e.g., learned on subsets of descriptors which are close in feature space, to circumvent this problem. In this work, we argue that a common trait found in much recent work in image recognition or retrieval is that it leverages locality in feature space on top of purely spatial locality. We propose to apply this idea in its simplest form to an object recognition system based on the spatial pyramid framework, to increase the performance of small dictionaries with very little added engineering. State-of-the-art results on several object recognition benchmarks show the promise of this approach.

      BibTeX

      @inproceedings {boureau-iccv-11,
      	title = "Ask the locals: multi-way local pooling for image recognition",
      	author = "Boureau, {Y-Lan} and {Le Roux}, Nicolas and Bach, Francis and Ponce, Jean and LeCun, Yann",
      	booktitle = "Proc. International Conference on Computer Vision (ICCV'11)",
      	publisher = "IEEE",   
      	year = "2011"
      }
         
    • Opponency Revisited: Competition and Cooperation Between Dopamine and Serotonin,
      Y-Lan Boureau and Peter Dayan
      Neuropsychopharmacology Reviews, 2011

      Abstract

      Affective valence lies on a spectrum ranging from punishment to reward. The coding of such spectra in the brain almost always involves opponency between pairs of systems or structures. There is ample evidence for the role of dopamine in the appetitive half of this spectrum, but little agreement about the existence, nature, or role of putative aversive opponents such as serotonin. In this review, we consider the structure of opponency in terms of previous biases about the nature of the decision problems that animals face, the conflicts that may thus arise between Pavlovian and instrumental responses, and an additional spectrum joining invigoration to inhibition. We use this analysis to shed light on aspects of the role of serotonin and its interactions with dopamine.
      Keywords: dopamine; norepinephrine; opponency; psychiatry and behavioral sciences; reinforcement learning; serotonin

      BibTeX

      @ARTICLE{Boureau2011,
      	author = {Y-Lan Boureau and Peter Dayan},
      	title = {Opponency Revisited: Competition and Cooperation Between Dopamine
      		and Serotonin},
      	journal = {Neuropsychopharmacology},
      	year = {2011},
      	month = {Jan},
      	volume= {36},
      	pages= {74--97},
      	doi = {10.1038/npp.2010.151},
      	pii = {npp2010151},
      	pmid = {20881948},
      	url = {http://dx.doi.org/10.1038/npp.2010.151}
      }
         
    • Learning Convolutional Feature Hierarchies for Visual Recognition,
      Koray Kavukcuoglu, Pierre Sermanet, Y-Lan Boureau, Karol Gregor, Michael Mathieu, and Yann LeCun
      Advances in Neural Information Processing Systems (NIPS 2010), 2011

      Abstract

      We propose an unsupervised method for learning multi-stage hierarchies of sparse convolutional features. While sparse coding has become an increasingly popular method for learning visual features, it is most often trained at the patch level. Applying the resulting filters convolutionally results in highly redundant codes because overlapping patches are encoded in isolation. By training convolutionally over large image windows, our method reduces the redundancy between feature vectors at neighboring locations and improves the efficiency of the overall representation. In addition to a linear decoder that reconstructs the image from sparse features, our method trains an efficient feed-forward encoder that predicts quasi-sparse features from the input. While patch-based training rarely produces anything but oriented edge detectors, we show that convolutional training produces highly diverse filters, including center-surround filters, corner detectors, cross detectors, and oriented grating detectors. We show that using these filters in a multi-stage convolutional network architecture improves performance on a number of visual recognition and detection tasks.

      BibTeX

      @inproceedings{ koray-nips-10,
              author = "Kavukcuoglu, Koray and Sermanet, Pierre and Boureau, {Y-Lan} and Gregor, Karol and Mathieu, {Micha\"el} and LeCun, Yann",
              title = "Learning Convolutional Feature Hierarchies for Visual Recognition",
              booktitle = "Advances in Neural Information Processing Systems (NIPS 2010)",
              year = "2010"
      }
      
         
    • A theoretical analysis of feature pooling in vision algorithms,
      Y-Lan Boureau, Jean Ponce, and Yann LeCun
      Proc. International Conference on Machine Learning (ICML'10), 2010

      Abstract

      Many modern visual recognition algorithms incorporate a step of spatial ‘pooling’, where the outputs of several nearby feature detectors are combined into a local or global ‘bag of features’, in a way that preserves task-related information while removing irrelevant details. Pooling is used to achieve invariance to image transformations, more compact representations, and better robustness to noise and clutter. Several papers have shown that the details of the pooling operation can greatly influence the performance, but studies have so far been purely empirical. In this paper, we show that the reasons underlying the performance of various pooling methods are obscured by several confounding factors, such as the link between the sample cardinality in a spatial pool and the resolution at which low-level features have been extracted. We provide a detailed theoretical analysis of max pooling and average pooling, and give extensive empirical comparisons for object recognition tasks.

      BibTeX

      @inproceedings {boureau-icml-10,
      	title = "A theoretical analysis of feature pooling in vision algorithms",
      	author = "Boureau, {Y-Lan} and Ponce, Jean and LeCun, Yann",
      	booktitle = "Proc. International Conference on Machine Learning (ICML'10)",
      	year = "2010"
      }
         
    • Learning Mid-Level Features for Recognition,
      Y-Lan Boureau, Francis Bach, Yann LeCun, and Jean Ponce
      Proc. International Conference on Computer Vision and Pattern Recognition (CVPR'10), IEEE, 2010

      Abstract

      Many successful models for scene or object recognition transform low-level descriptors (such as Gabor filter responses, or SIFT descriptors) into richer representations of intermediate complexity. This process can often be broken down into two steps: (1) a coding step, which performs a pointwise transformation of the descriptors into a representation better adapted to the task, and (2) a pooling step, which summarizes the coded features over larger neighborhoods. Several combinations of coding and pooling schemes have been proposed in the literature. The goal of this paper is threefold. We seek to establish the relative importance of each step of mid-level feature extraction through a comprehensive cross evaluation of several types of coding modules (hard and soft vector quantization, sparse coding) and pooling schemes (by taking the average, or the maximum), which obtains state-of-the-art performance or better on several recognition benchmarks. We show how to improve the best performing coding scheme by learning a supervised discriminative dictionary for sparse coding. We provide theoretical and empirical insight into the remarkable performance of max pooling. By teasing apart components shared by modern mid-level feature extractors, our approach aims to facilitate the design of better recognition architectures.

      BibTeX

      @inproceedings {boureau-cvpr-10,
      	title = "Learning Mid-Level Features for Recognition",
      	author = "Boureau, {Y-Lan} and Bach, Francis and LeCun, Yann and Ponce, Jean",
      	booktitle = "Proc. International Conference on Computer Vision and Pattern Recognition (CVPR'10)",
      	publisher = "IEEE",   
      	year = "2010"
      }
      
         
    • Sparse feature learning for deep belief networks,
      Marc'Aurelio Ranzato, Y-Lan Boureau, and Yann LeCun
      Advances in Neural Information Processing Systems (NIPS 2007), 2007

      Abstract

      Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the representation to have certain desirable properties (e.g., low dimension, sparsity). Others are based on approximating the density by stochastically reconstructing the input from the representation. We describe a novel and efficient algorithm to learn sparse representations, and compare it theoretically and experimentally with a similar machine trained probabilistically, namely a Restricted Boltzmann Machine. We propose a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation. We demonstrate this method by extracting features from a dataset of handwritten numerals, and from a dataset of natural image patches. We show that by stacking multiple levels of such machines and by training sequentially, high-order dependencies between the input observed variables can be captured.

      BibTeX

      @inproceedings{ ranzato-nips-07,
      	author = "Ranzato, Marc'Aurelio and Boureau, {Y-Lan} and LeCun, Yann",
      	title = "Sparse feature learning for deep belief networks",
      	booktitle = "Advances in Neural Information Processing Systems (NIPS 2007)",
      	year = "2007"
      }
       
    • Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition,
      Marc'Aurelio Ranzato, Fu-Jie Huang, Y-Lan Boureau, and Yann LeCun
      Proc. International Conference on Computer Vision and Pattern Recognition (CVPR'07), IEEE, 2007

      Abstract

      We present an unsupervised method for learning a hierarchy of sparse feature detectors that are invariant to small shifts and distortions. The resulting feature extractor consists of multiple convolution filters, followed by a pointwise sigmoid non-linearity, and a feature-pooling layer that computes the max of each filter output within adjacent windows. A second level of larger and more invariant features is obtained by training the same algorithm on patches of features from the first level. Training a supervised classifier on these features yields 0.64% error on MNIST, and 54% average recognition rate on Caltech 101 with 30 training samples per category. While the resulting architecture is similar to convolutional networks, the layer-wise unsupervised training procedure alleviates the over-parameterization problems that plague purely supervised learning procedures, and yields good performance with very few labeled training samples.

      BibTeX

      @inproceedings{ ranzato-cvpr-07,
      	author = "Ranzato, {Marc'Aurelio} and Huang, {Fu-Jie} and Boureau, {Y-Lan} and LeCun, Yann",
      	title = "Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition",
      	booktitle = "Proc. Computer Vision and Pattern Recognition Conference (CVPR'07)",
      	publisher = "IEEE Press",
      	year = 2007
      }
      
    • A Unified Energy-Based Framework for Unsupervised Learning,
      Marc'Aurelio Ranzato, Y-Lan Boureau, Sumit Chopra, and Yann LeCun
      Proc. Conference on AI and Statistics (AI-Stats), 2007

      Abstract

      We introduce a view of unsupervised learning that integrates probabilistic and nonprobabilistic methods for clustering, dimensionality reduction, and feature extraction in a unified framework. In this framework, an energy function assigns low energies to input points that are similar to training samples, and high energies to unobserved points. Learning consists of minimizing the energies of training samples while ensuring that the energies of unobserved ones are higher. Some traditional methods construct the architecture so that only a small number of points can have low energy, while other methods explicitly “pull up” on the energies of unobserved points. In probabilistic methods, the energy of unobserved points is pulled up by minimizing the log partition function, an expensive and sometimes intractable process. We explore different and more efficient methods using an energy-based approach. In particular, we show that a simple solution is to restrict the amount of information contained in codes that represent the data. We demonstrate such a method by training it on natural image patches and by applying it to image denoising.

      BibTeX

      @inproceedings{ ranzato-unsup-07,
      	author = "Ranzato, {Marc'Aurelio} and Boureau, {Y-Lan} and Chopra, Sumit and LeCun, Yann",
      	title = "A Unified Energy-Based Framework for Unsupervised Learning",
      	booktitle = "Proc. Conference on AI and Statistics (AI-Stats)",
      	year = 2007
      }
      
  • Thesis:

    Learning hierarchical feature extractors for image recognition
    Y-Lan Boureau
    PhD Thesis, 2012.

    Abstract

    Telling cow from sheep is effortless for most animals, but requires much engineering for computers. In this thesis, we seek to tease out basic principles that underlie many recent advances in image recognition. First, we recast many methods into a common unsupervised feature extraction framework based on an alternation of coding steps, which encode the input by comparing it with a collection of reference patterns, and pooling steps, which compute an aggregation statistic summarizing the codes within some region of interest of the image.
    Within that framework, we conduct extensive comparative evaluations of many coding or pooling operators proposed in the literature. Our results demonstrate a robust superiority of sparse coding (which decomposes an input as a linear combination of a few visual words) and max pooling (which summarizes a set of inputs by their maximum value). We also propose macrofeatures, which import into the popular spatial pyramid framework the joint encoding of nearby features commonly practiced in neural networks, and obtain significantly improved image recognition performance. Next, we analyze the statistical properties of max pooling that underlie its better performance, through a simple theoretical model of feature activation. We then present results of experiments that confirm many predictions of the model. Beyond the pooling operator itself, an important parameter is the set of pools over which the summary statistic is computed. We propose locality in feature configuration space as a natural criterion for devising better pools. Finally, we propose ways to make coding faster and more powerful through fast convolutional feedforward architectures, and examine how to incorporate supervision into feature extraction schemes. Overall, our experiments offer insights into what makes current systems work so well, and state-of-the-art results on several image recognition benchmarks.
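Several of the abstracts above, and the thesis abstract in particular, describe mid-level feature extraction as a coding step followed by a pooling step. What follows is a minimal sketch of that two-step structure. It is not code from these papers. It assumes hard vector quantization against a small, hypothetical codebook as the coding step. The papers report that sparse coding works better, but it is omitted here for brevity. The sketch then compares average and max pooling over a single pool. All function names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_vq_code(descriptor: Vector, codebook: Sequence[Vector]) -> List[float]:
    """Coding step: one-hot code marking the nearest codeword (hard vector quantization)."""
    distances = [squared_distance(descriptor, word) for word in codebook]
    nearest = distances.index(min(distances))
    return [1.0 if k == nearest else 0.0 for k in range(len(codebook))]


def average_pool(codes: Sequence[Vector]) -> List[float]:
    """Pooling step: per-dimension mean of all codes in one pool."""
    return [sum(column) / len(codes) for column in zip(*codes)]


def max_pool(codes: Sequence[Vector]) -> List[float]:
    """Pooling step: per-dimension maximum of all codes in one pool."""
    return [max(column) for column in zip(*codes)]


def mid_level_feature(
    descriptors: Sequence[Vector],
    codebook: Sequence[Vector],
    pool: Callable[[Sequence[Vector]], List[float]] = max_pool,
) -> List[float]:
    """Code every local descriptor, then summarize the codes with one pooling operator."""
    codes = [hard_vq_code(d, codebook) for d in descriptors]
    return pool(codes)


# Toy data: four 2-D local descriptors and a hypothetical three-word codebook.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.2)]

print(mid_level_feature(descriptors, codebook, average_pool))  # [0.5, 0.5, 0.0]
print(mid_level_feature(descriptors, codebook, max_pool))      # [1.0, 1.0, 0.0]
```

With hard vector quantization, average pooling yields a normalized histogram of codeword counts. Max pooling only records which codewords occurred at least once in the pool.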
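A toy calculation shows why the choice between max and average pooling interacts with pool size. Assume, purely for illustration, a single binary feature that fires independently with probability α at each of P locations in a pool. The average-pooled value then has mean α and variance α(1 - α)/P. The max-pooled value is 1 unless the feature never fires, so its mean is 1 - (1 - α)^P, which approaches 1 as P grows. The sketch below computes these moments in closed form and checks the means with a seeded Monte Carlo simulation. The model and the function names are assumptions made here; this is not necessarily the analysis in the ICML'10 paper or the thesis.

```python
import random
from typing import Tuple


def average_pool_stats(alpha: float, pool_size: int) -> Tuple[float, float]:
    """Mean and variance of the average of pool_size i.i.d. Bernoulli(alpha) activations."""
    mean = alpha
    variance = alpha * (1.0 - alpha) / pool_size
    return mean, variance


def max_pool_stats(alpha: float, pool_size: int) -> Tuple[float, float]:
    """Mean and variance of the max of pool_size i.i.d. Bernoulli(alpha) activations.

    The max is itself Bernoulli: it is 0 only if every activation is 0.
    """
    mean = 1.0 - (1.0 - alpha) ** pool_size
    variance = mean * (1.0 - mean)
    return mean, variance


def simulate_means(alpha: float, pool_size: int, trials: int = 10000, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo estimates of the expected average-pooled and max-pooled values."""
    rng = random.Random(seed)
    avg_total = 0.0
    max_total = 0.0
    for _ in range(trials):
        activations = [1 if rng.random() < alpha else 0 for _ in range(pool_size)]
        avg_total += sum(activations) / pool_size
        max_total += max(activations)
    return avg_total / trials, max_total / trials


alpha = 0.05  # hypothetical firing probability of a rare feature
for pool_size in (1, 4, 16, 64):
    avg_mean, avg_var = average_pool_stats(alpha, pool_size)
    max_mean, max_var = max_pool_stats(alpha, pool_size)
    sim_avg, sim_max = simulate_means(alpha, pool_size)
    print(
        f"P={pool_size:2d}  average: mean {avg_mean:.3f}, var {avg_var:.4f} (sim {sim_avg:.3f})"
        f"  max: mean {max_mean:.3f}, var {max_var:.4f} (sim {sim_max:.3f})"
    )
```

Take a rare feature with α = 0.05. Its expected max-pooled value rises from 0.05 at P = 1 to about 0.56 at P = 16. Its expected average-pooled value stays at 0.05 for every pool size.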
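The ICCV'11 abstract argues for pooling over neighborhoods that are local in feature space as well as in image space. Below is a minimal sketch of that idea, built on assumptions made only for illustration. First, each local descriptor is assigned to the nearest of a few hypothetical prototypes in descriptor space. Codes are then max-pooled separately for every (spatial cell, prototype) pair, so dissimilar descriptors that fall in the same cell are not pooled together. The sketch does not reproduce how the paper obtains its partition of feature space. The prototypes, cell indices, and function names are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def nearest_index(x: Vector, prototypes: Sequence[Vector]) -> int:
    """Index of the prototype closest to x in squared Euclidean distance."""
    distances = [sum((a - b) ** 2 for a, b in zip(x, p)) for p in prototypes]
    return distances.index(min(distances))


def multiway_max_pool(
    descriptors: Sequence[Vector],
    codes: Sequence[Vector],
    cells: Sequence[int],
    prototypes: Sequence[Vector],
) -> Dict[Tuple[int, int], List[float]]:
    """Max-pool codes separately for every (spatial cell, feature-space prototype) pair.

    descriptors[i] is the local descriptor at location i, codes[i] its code,
    and cells[i] the index of the spatial cell that contains location i.
    """
    pools: Dict[Tuple[int, int], List[float]] = {}
    for descriptor, code, cell in zip(descriptors, codes, cells):
        key = (cell, nearest_index(descriptor, prototypes))
        if key in pools:
            pools[key] = [max(a, b) for a, b in zip(pools[key], code)]
        else:
            pools[key] = list(code)
    return pools


def concatenate_pools(
    pools: Dict[Tuple[int, int], List[float]], n_cells: int, n_prototypes: int, code_dim: int
) -> List[float]:
    """Concatenate pooled vectors in a fixed (cell, prototype) order; empty pools become zeros."""
    feature: List[float] = []
    for cell in range(n_cells):
        for proto in range(n_prototypes):
            feature.extend(pools.get((cell, proto), [0.0] * code_dim))
    return feature


# Toy data: six locations split into two spatial cells, two hypothetical prototypes.
descriptors = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.2), (0.2, 0.0), (4.9, 5.0), (5.1, 4.8)]
codes = [(1.0, 0.0), (0.4, 0.2), (0.0, 0.7), (0.3, 0.0), (0.2, 0.9), (0.5, 0.1)]
cells = [0, 0, 0, 1, 1, 1]
prototypes = [(0.0, 0.0), (5.0, 5.0)]

pools = multiway_max_pool(descriptors, codes, cells, prototypes)
print(concatenate_pools(pools, n_cells=2, n_prototypes=2, code_dim=2))
# [1.0, 0.2, 0.0, 0.7, 0.3, 0.0, 0.5, 0.9]
```

In this toy example, plain spatial max pooling over cell 0 would return [1.0, 0.7], mixing the codes of descriptors near both prototypes. Splitting the cell by prototype keeps [1.0, 0.2] and [0.0, 0.7] apart. Filling empty (cell, prototype) pools with zeros assumes nonnegative codes.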