Colloquium Details
MAE Talk - Towards a "Science" of Data In Robotics
Speaker: Joey Hejna, Stanford
Location: 5 Metrotech Center LC400
Date: March 2, 2026, 11 a.m.
Host:
Synopsis:
The remarkable success of generalist models in language and vision has been driven by a simple recipe: large, diverse datasets and high-capacity architectures. Robotics, however, presents a unique challenge — robot data is expensive to collect, and more data does not always translate to better performance. This raises a fundamental question: what makes robot data good, and how should we collect, curate, and leverage it? In this talk, I will argue that answering these questions requires a principled science of data for embodied AI. In the pre-training regime, I will describe how we can design methods for efficiently curating and mixing datasets for robotics, dramatically improving the performance of policies. In the post-training regime, I will re-examine how we model preference data from human labelers, allowing us to develop simpler learning algorithms. Finally, I will conclude by outlining future steps towards gaining a better understanding of robot data and the capabilities it will unlock.
Speaker Bio:
Joey Hejna is a final-year PhD student in the computer science department at Stanford University, advised by Dorsa Sadigh and supported by an NDSEG fellowship. His research focuses on robot learning, where he tries to leverage grounded insights about data for policy learning. His work in this area was nominated for the best paper award at CoRL 2024. During his PhD, Joey spent time as a research intern at both Physical Intelligence and Google DeepMind Robotics. Prior to his PhD, he completed his undergraduate studies at UC Berkeley, where he worked with Professors Pieter Abbeel and Lerrel Pinto.
Notes:
In-person attendance is available only to those with active NYU ID cards.