Towards Scalable Representation Learning for Visual Recognition
Speaker: Saining Xie, Facebook AI Research
Location: 60 Fifth Avenue 150
Date: March 29, 2022, 2 p.m.
A powerful biological and cognitive representation is essential for humans' remarkable visual recognition abilities. Deep learning has achieved unprecedented success in a variety of domains over the last decade. One major driving force is representation learning, which is concerned with learning efficient, accurate, and robust representations from raw data that are useful for a downstream classifier or predictor.
A modern deep learning system is composed of two core and often intertwined components: 1) neural network architectures and 2) representation learning algorithms. In this talk, we will present several studies in both directions. On the neural network modeling side, we will examine modern network design principles and how they affect the scaling behavior of ConvNets and recent Vision Transformers. Additionally, we will demonstrate how we can acquire a better understanding of neural network connectivity patterns through the lens of random graphs. In terms of representation learning algorithms, we will discuss our recent efforts to move beyond the traditional supervised learning paradigm and demonstrate how self-supervised visual representation learning, which does not require human-annotated labels, can outperform its supervised learning counterpart across a variety of visual recognition tasks. The talk will encompass a variety of vision application domains and modalities (e.g., 2D images, 3D scenes, and language). The goal is to show existing connections between the techniques specialized for different input modalities and provide some insights into the diverse challenges that each modality presents. Finally, we will discuss several pressing challenges and opportunities that the "big model era" raises for computer vision research.
Saining Xie is a research scientist at Facebook AI Research (FAIR). He received his Ph.D. and M.S. degrees in computer science from the University of California San Diego, advised by Zhuowen Tu. Prior to that, he received his Bachelor's degree from Shanghai Jiao Tong University. He has broad research interests in deep learning and computer vision, with a focus on developing deep representation learning techniques to push the boundaries of core visual recognition. His research has been extensively cited (more than 16,000 times) by other researchers and adopted in several industrial-scale applications. He is also a recipient of the Marr Prize Honorable Mention at ICCV 2015.
In-person attendance is available only to those with active NYU ID cards.