NYC Computer Vision Day 2024
Date: April 1, 2024, 10AM - 6PM
Location: New York University Kimmel Center, Rosenthal Pavilion


The NYC Computer Vision Day is an invite-only, informal day for the computer vision community from NYC and its surroundings to share ideas and meet. A primary focus is visibility for graduate students and early-career researchers. In addition to a strong showing of ≈260 researchers from 60+ research labs and 15+ universities, we anticipate a small number of our industry friends.

Tentative Schedule

Our schedule gives near-equal time to talks and informal discussion to encourage conversation; the talks come primarily from students to give them visibility in the community; and we start slightly later than usual since many attendees are traveling from far away.

Breakfast, lunch, and coffee are all provided, and there is time for informal discussion before the official start time of 10AM and after the official end time of 6PM.

9:30 - 10AM: Breakfast Available and Informal Conversations (Not part of official program)
10AM - Noon: Talk Session 1: 2 Keynotes, 9 Lightning Talks
Keynote 1: Chuang Gan (UMass Amherst)
Learning World Models for Embodied Generalist Agents
⚡ Rundi Wu (Columbia): ReconFusion: 3D Reconstruction with Diffusion Priors
⚡ Yueyu Hu (NYU): Towards 3D Telepresence via Point Cloud Videos: Compression, Streaming, and Rendering
⚡ Jason Ma (Penn): Foundation Reward Models for Robot Learning from Human Videos
⚡ Akshaj Veldanda (NYU): Hyper-parameter Tuning for Fair Classification without Sensitive Attribute Access
Keynote 2: Christine Allen-Blanchette (Princeton)
Representing color as a symmetry

We introduce a convolutional neural network that is equivariant to color variation by design. We leverage the observation that changes in hue and saturation can be modeled geometrically to encode color information explicitly, resulting in improved interpretability, accuracy, and generalizability over conventional counterparts.
⚡ Ruoshi Liu (Columbia): Learning to Design Tools in the Real World
⚡ Zeliang Zhang (U. Rochester): Discover and Mitigate Multiple Biased Subgroups in Image Classifiers
⚡ Aishik Konwer (Stony Brook University): Less is Enough: Representation Learning with Low-resource Medical Imaging Datasets
⚡ Katherine Xu / Huzheng Yang (Penn): Amodal Completion via Progressive Mixed Context Diffusion / Brain Decodes Deep Nets
⚡ Shengyi Qian (Michigan/NYU): Understanding 3D Object Interaction from Ordinary Images
Noon - 1:30PM: Lunch & Networking
1:30 - 2:40PM: Talk Session 2: 10 Lightning Talks
⚡ Mahi Shafiullah (NYU): On Bringing Robots Home
⚡ Ruyi Lian (Stony Brook University): CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network
⚡ Faith Johnson (Rutgers): Feudal Networks for Visual Navigation
⚡ Aditya Chattopadhyay (Penn/JHU): An Information-theoretic Framework for Explainable ML
⚡ R. Kenny Jones (Brown): Learning to Infer Generative Template Programs for Visual Concepts
⚡ Xuan Wang (CUNY): GMC: A General Framework of Multi-stage Context Learning and Utilization for Visual Detection Tasks
⚡ Nate Gillman (Brown): Self-Correcting Self-Consuming Loops for Generative Model Training
⚡ Lahav Lipson (Princeton): Rapid 3D Mapping
⚡ Rahul Sajnani (Brown): GeoDiffuser: Geometry-Based Image Editing with Diffusion Models
⚡ Mingzhen Huang (SUNY Buffalo): Detecting Text-Image Inconsistency with Diffusion Models
2:40 - 3:40PM: Poster Session 1 & Coffee Break
3:40 - 5:00PM: Talk Session 3: 1 Keynote, 7 Lightning Talks
Keynote 3: Lingjie Liu (Penn)
Single-view 3D Reconstruction with Diffusion Priors

Single-view 3D reconstruction is an ill-posed problem for traditional reconstruction algorithms. In this talk, I will present our recent work using diffusion priors for single-view reconstruction.
⚡ Alexandros Graikos (Stony Brook University): Diffusion models for synthesis of large digital histopathology images
⚡ Shimian Zhang (Penn State): Recurrence in Human and Machine Perception
⚡ Sunnie S. Y. Kim (Princeton): Bridging Computer Vision and HCI: Understanding End-Users' Trust and Explainability Needs in a Real-World Computer Vision Application
⚡ Cheng Phoo (Cornell) & Utkarsh Mall (Cornell/Columbia): Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment
⚡ Xudong Lin (Columbia): Stop Wasting Computation on Crossmodal Pretraining for Large Multimodal Models
⚡ Xichen Pan (NYU): Image Sculpting: Precise Object Editing with 3D Geometry Control
⚡ Irving Fang (NYU): EgoPAT3Dv2: Predicting 3D Action Target from 2D Egocentric Vision for Human-Robot Interaction
5:00 - 6:00PM: Poster Session 2 & Snacks
6:00 - 6:30PM: Snacks Available and Informal Conversations (Not part of official program)

Presentation Information

We will have both oral presentations and posters.

Posters: Each attending PI will be given one 24" (high) x 36" (wide) posterboard in one session. The board can be used as the PI sees fit: for instance, a single larger poster or multiple smaller posters.

Presentations: Lightning talks are 5 minutes; keynotes are 17 minutes plus 5 minutes for questions.

Attendance Information

NYU has strict building security, and our room has limited capacity (≤284 in the configuration that allows posters). We therefore have a strict guest list: if you are not a confirmed guest, you will not be admitted to the building or the event. There are no exceptions.

Organizers and Sponsors

David Fouhey (New York University, Lead), Jia Deng (Princeton University), Noah Snavely (Cornell Tech), Olga Russakovsky (Princeton University), Carl Vondrick (Columbia University), Saining Xie (New York University)


Sponsored by NYU Tandon and NYU Courant

Image from Daniel Schwen, Icons from Flaticon