Interests and Approach to Computer Vision

Vision touches on all areas of cognition and perception, and occupies a large part of our brain activity. Computer Vision studies the processing of images and thus, in its own way, addresses cognitive and perceptual issues such as object recognition and stereovision. One of the most intriguing mysteries is how the recognition process, after collecting data (images), partially processes it without knowledge of the recognition task (the bottom-up process) and partially uses the models and tasks to quickly recognize those models in the image (the top-down process). This mystery is central to a classical and fundamental problem in computer vision: to delineate the complete boundary of an object in an image, deciding what is figure and what is background. This problem epitomizes the mystery of how to integrate bottom-up and top-down processes. The bottom-up process removes image redundancy by grouping local features. The top-down process aims to remove further image redundancy by seeking the models in the images.

My work revolves around concrete problems that move toward this question. (While I will use the first person in describing my research, much of the work is collaborative, as detailed in the publications.) My work seeks to find, in images, prior model representations of surfaces and object shapes. The choice is not arbitrary.

Surface representation is the most basic concept to help group local features and explain bottom-up processes. One of my early works studied the formation of surfaces from noisy and sparse data and connected it to other known deterministic approaches. When studying the phenomenon of Illusory Contours (Surfaces), I based my formulation on constructing piecewise smooth surfaces, where the discontinuities give rise to the illusory contours (it is known in psychology that all illusory contours are perceived as surface depth boundaries). Moreover, I have recently investigated how the model incorporates a bias toward convex shapes, using a new continuous measure of the convexity of shapes. Most previous work was concerned only with extending contours and did not consider surfaces (a couple of exceptions have not been tested), and it cannot account for convexity. Stereovision is a direct method to recover surfaces. I formulated Stereo Matching as the problem of seeking surfaces from a pair of images (left and right); special attention was given to occlusions and discontinuities of these surfaces. I have worked out precisely how to relate occlusions to discontinuities in a computational framework. In my formulation, the optimal matching is the one that yields the optimal surface. We have also studied the detection of Multi-Junctions, since image junctions (e.g., corners, T-junctions, X-junctions) provide important local information about surface occlusions and transparency.
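
To make the idea of matching with occlusions concrete, here is a minimal sketch of a textbook scanline stereo matcher solved by dynamic programming. It is only an illustration under simplifying assumptions, not the formulation from the publications: the function name `scanline_stereo`, the squared-difference matching cost, and the constant occlusion penalty `occ` are all hypothetical choices, and the ordering of matches along the scanline is assumed to be preserved.

```python
def scanline_stereo(left, right, occ=1.0):
    """Match one left scanline to one right scanline by dynamic programming.

    Illustrative assumptions: squared intensity difference as the matching
    cost, a constant penalty `occ` for each pixel visible in only one image,
    and matches that keep their left-to-right order.
    Returns (disparity, total_cost); disparity[i] is None when left pixel i
    is occluded, otherwise x_left - x_right.
    """
    n, m = len(left), len(right)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # left[i-1] matches right[j-1]
                c = cost[i - 1][j - 1] + (left[i - 1] - right[j - 1]) ** 2
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "match"
            if i > 0:  # left[i-1] is visible only in the left image
                c = cost[i - 1][j] + occ
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "left_only"
            if j > 0:  # right[j-1] is visible only in the right image
                c = cost[i][j - 1] + occ
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "right_only"
    # Trace the optimal path back from the end of both scanlines.
    disparity = [None] * n
    i, j = n, m
    while i > 0 or j > 0:
        if move[i][j] == "match":
            disparity[i - 1] = (i - 1) - (j - 1)
            i, j = i - 1, j - 1
        elif move[i][j] == "left_only":
            i -= 1
        else:
            j -= 1
    return disparity, cost[n][m]


if __name__ == "__main__":
    left = [5, 5, 9, 9, 1, 1]
    right = [9, 9, 1, 1, 7, 7]
    print(scanline_stereo(left, right))
```

In this toy input the right scanline is the left one shifted by two pixels, so the first two left pixels have no counterpart and are labeled as occluded, while the remaining pixels receive a constant disparity. A jump in disparity along the optimal path always coincides with a run of unmatched pixels, which is the simplest form of the link between occlusions and discontinuities.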

Object shapes are a basic feature in the representation of models. Motivated to understand the top-down process and its relation to the bottom-up process, I have focused on the problem of recognizing articulated and occluded objects. The work addresses the question: How can a given shape contour model (say, the contour of a frontal view of a person) be recognized in an image when the instance of the model in the image has changed due to view variations, small deformations, articulations, and possibly partial occlusion? When seeking a representation for shapes, I showed that energy functions could be applied to this problem domain. This is the first optimization method that computes what is essentially a symmetry axis representation. It minimizes a sum of local energies (based on pairing of the contour shape). It gives a stable solution that is not sensitive to small changes in the contour, since the energy potential can be crafted for that goal. The algorithm, which guarantees a global optimum in polynomial time, outputs a tree representation of shapes, where edges between nodes represent parts of the object shape. We have also adapted a snake model (for computing contour boundaries) to a Markov Random Field formulation suitable for the use of efficient algorithms. Our recent work has sought to combine these two models ("snake" and "shape") so as to detect a contour that simultaneously matches the tree representation of a task shape. A formulation based on simply summing cost functions is possible, but entails a significant computational burden. We are investigating better ways to combine these models.

Methodologically, I have long been interested in obtaining global characteristics based on the accumulation of local features. It was then natural to first adopt, and subsequently create, Markov Random Field models (MRFs) of surfaces with occlusions and discontinuities. These models support an optimization framework that searches for a global minimum through local computations, e.g., dynamic programming, shortest path algorithms, maximum flow algorithms, other graph techniques, and statistical (physics) techniques such as mean field and EM algorithms. In sum, I am seeking representations that are global and delineated by the accumulation of local features. Methods that are global but not based on local structures have difficulties with discontinuities, occlusions, and articulations, while methods that are local but not global can be led astray trivially.
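
As a small illustration of a global minimum obtained purely from local computations, the sketch below finds the exact minimum-energy labeling of a one-dimensional chain MRF by dynamic programming. It is a generic textbook construction rather than any specific model from my publications: the name `chain_mrf_map`, the quadratic data term, and the truncated quadratic smoothness term (with hypothetical parameters `lam` and `trunc`) are assumptions chosen so that smooth regions are favored while a large jump, i.e. a discontinuity, pays only a bounded penalty.

```python
def chain_mrf_map(obs, labels, lam=1.0, trunc=4.0):
    """Exact minimum-energy labeling of a 1-D chain MRF (Viterbi-style DP).

    Assumed energy (illustrative only):
        sum_i (obs[i] - x[i])**2
        + lam * sum_i min((x[i] - x[i+1])**2, trunc)
    The truncation bounds the cost of a jump, so discontinuities survive.
    Returns (labeling, energy).
    """
    n, k = len(obs), len(labels)

    def data(i, a):
        return (obs[i] - labels[a]) ** 2

    def smooth(a, b):
        return lam * min((labels[a] - labels[b]) ** 2, trunc)

    # cost[a]: best energy of the chain up to the current site, ending in label a.
    cost = [data(0, a) for a in range(k)]
    back = []
    for i in range(1, n):
        new_cost, ptr = [], []
        for b in range(k):
            best_a = min(range(k), key=lambda a: cost[a] + smooth(a, b))
            new_cost.append(cost[best_a] + smooth(best_a, b) + data(i, b))
            ptr.append(best_a)
        cost = new_cost
        back.append(ptr)

    # Recover the optimal labeling by following the back-pointers.
    last = min(range(k), key=lambda a: cost[a])
    energy = cost[last]
    path = [last]
    for ptr in reversed(back):
        last = ptr[last]
        path.append(last)
    path.reverse()
    return [labels[a] for a in path], energy


if __name__ == "__main__":
    noisy_step = [0, 0, 0, 10, 10, 10]
    print(chain_mrf_map(noisy_step, labels=list(range(11))))
```

Each step looks only at neighboring sites, yet the result is the global minimum over all labelings of the chain. The running time is proportional to the number of sites times the square of the number of labels. On graphs with loops, such as image grids, this exact recursion no longer applies directly, which is where the other techniques listed above come in.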

There are numerous applications related to these questions in the areas of security, medical imaging, and entertainment. I have been interested in applying and testing these ideas on realistic applications. At Siemens Corporate Research I developed a snake model, which uses dynamic programming, to detect the left ventricle of the heart from a spatio-temporal sequence of MRI images. The algorithm provides important wall strength information about the left ventricle. The software is a component of Siemens' current MRI machines. Presently, I am exploring applications of image registration in medicine.

Teaching is an exciting and important part of my research program, for I believe the field of Vision draws together several important themes in computer science. It touches on deep scientific questions about how our mind works and the essence of computing. It has a great engineering component, capable of building real systems that are cool and of great impact to society. It has one drawback: the discipline is still in its infancy. Thus, constructing appropriate courses for new students of the field requires continual elaboration and updating. I am committed to fostering the development of the field through teaching and advising students.