Interests and Approach to Computer Vision

Vision touches on all areas of cognition and perception, and occupies a large part of our brain activity. Computer Vision studies the processing of images and thus, in its own way, addresses cognitive and perceptual issues such as stereovision. It is thought that Vision works by layers of processing. Early layers are general (bottom up) and increase redundancy by adding more representations of the image, thus increasing the storage required. Later layers are more task dependent, or model dependent: they remove information from previous layers that is not needed for such tasks (or models), while extracting task (or model) invariant information. These later layers compress the information by removing unneeded data and grouping information into invariants, thus reducing the information progressively as one moves deeper into the layers.

My work revolves around concrete problems that move towards understanding this layered process. (While I will use the first person in describing my research, much of the work is collaborative, as detailed in the publications.)

Skeletons is in Progress. When studying the phenomenon of Illusory Contours (Surfaces), I based my formulation on constructing piecewise smooth surfaces, where the discontinuities give rise to the illusory contours (it is known in psychology that all illusory contours are perceived as surface depth boundaries). I formulated Stereo Matching as the problem of seeking surfaces from a pair of images (left and right); special attention was given to occlusions and discontinuities of these surfaces. I have worked out precisely how to relate occlusions to discontinuities in a computational framework. In my formulation, the optimal matching is the one that yields the optimal surface. We have also studied the detection of Multi-Junctions, since image junctions (e.g., corners, T-junctions, X-junctions) provide important local information about surface occlusions and transparency.
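To make the relation between matching and occlusion concrete, here is a minimal sketch of a scanline matcher based on dynamic programming. It is an illustration only, not the formulation from the publications: the function name match_scanlines, the absolute intensity difference used as the matching cost, and the constant occlusion_cost are all assumptions chosen for simplicity. Each pixel is either matched to a pixel in the other scanline or declared occluded, and the best alignment is the one with the lowest total cost.

```python
def match_scanlines(left, right, occlusion_cost=1.0):
    """Align two intensity scanlines (hypothetical, simplified cost model).

    A pixel is either matched (cost: absolute intensity difference) or
    declared occluded (cost: occlusion_cost). Returns the minimum total
    cost and the list of matched index pairs (left_index, right_index).
    """
    n, m = len(left), len(right)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                # Match left[i-1] with right[j-1].
                c = cost[i - 1][j - 1] + abs(left[i - 1] - right[j - 1])
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "match"
            if i > 0:
                # left[i-1] is visible only in the left image.
                c = cost[i - 1][j] + occlusion_cost
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "occluded_left"
            if j > 0:
                # right[j-1] is visible only in the right image.
                c = cost[i][j - 1] + occlusion_cost
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "occluded_right"

    # Trace the optimal path back from the end of both scanlines.
    matches = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "match":
            matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "occluded_left":
            i -= 1
        else:
            j -= 1
    matches.reverse()
    return cost[n][m], matches


if __name__ == "__main__":
    total, pairs = match_scanlines([10, 10, 50, 50, 10],
                                   [10, 50, 50, 10, 10],
                                   occlusion_cost=5.0)
    print(total, pairs)
```

In this toy setting, a jump in the index offset between consecutive matched pairs (a disparity discontinuity) can only occur where a pixel has been left unmatched, which is the sense in which occlusions and discontinuities go together.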

Object shapes are a basic feature in the representation of models. Motivated to understand the top down process and its relation to the bottom up process, I have focused on the problem of recognizing articulated and occluded objects. The work aims at the question: How can a given shape contour model (say a contour of a frontal view of a person) be recognized in an image, when the instance of the model in the image has changed due to view variations, small deformations, articulations, and possibly partial occlusion? When seeking a representation for shapes I showed that energy functions could be applied to this problem domain. This is the first optimization method that computes what is essentially a symmetry axis representation. It minimizes a sum of local energies (based on pairing of the contour shape). It gives a stable solution not sensitive to small changes in the contour, since the energy potential can be crafted for such a goal. The algorithm, which gives guaranteed global optimization in polynomial time, outputs a tree representation of shapes, where edges between nodes represent parts of the object shape. We have also adapted a snake model (for computing contour boundaries) to a Markov Random Field formulation suitable for the use of efficient algorithms. Our recent work has sought to combine these two models ("snake" and "shape") so as to detect a contour that simultaneously matches the tree representation of a task shape. A formulation based on simply summing cost functions is possible, but entails a significant computational burden. We are investigating better ways to combine these models.

Methodologically, I have long been interested in obtaining global characteristics based on the accumulation of local features. It was then natural to first adopt, and subsequently create, Markov Random Field models (MRFs) of surfaces with occlusions and discontinuities. These models support an optimization framework that searches for a global minimum derived by local computations, e.g., dynamic programming, shortest path algorithms, maximum flow algorithms, other graph techniques, and statistical (physics) techniques such as mean field and EM algorithms. In sum, I am seeking representations that are global and delineated by the accumulation of local features. Methods that are global but not based on local structures have difficulties with discontinuities, occlusions, and articulations, while methods that are local but not global can easily be led astray.
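As a small illustration of how a global minimum can be obtained from purely local computations, the following sketch applies dynamic programming to an MRF defined on a chain. It is a generic textbook construction rather than any specific model from my publications; the names minimize_chain_energy, unary, and pairwise are hypothetical. The energy is the sum of a per-site cost and a cost between neighboring labels, and the recursion only ever looks at one pair of neighbors at a time.

```python
def minimize_chain_energy(unary, pairwise):
    """Exact minimization of a chain MRF energy by dynamic programming.

    unary[i][k]    : cost of assigning label k to site i
    pairwise(a, b) : cost of neighboring sites taking labels a and b
    Returns (minimum energy, list of optimal labels).
    """
    num_labels = len(unary[0])
    best = list(unary[0])   # best[k]: lowest energy of a prefix ending in label k
    back = []               # back[i-1][k]: best predecessor label for site i

    for i in range(1, len(unary)):
        new_best, pointers = [], []
        for k in range(num_labels):
            prev = min(range(num_labels),
                       key=lambda p: best[p] + pairwise(p, k))
            new_best.append(best[prev] + pairwise(prev, k) + unary[i][k])
            pointers.append(prev)
        best = new_best
        back.append(pointers)

    # Pick the best final label, then follow the stored pointers backwards.
    label = min(range(num_labels), key=lambda k: best[k])
    energy = best[label]
    labels = [label]
    for pointers in reversed(back):
        label = pointers[label]
        labels.append(label)
    labels.reverse()
    return energy, labels


if __name__ == "__main__":
    # Two labels; site 2 weakly prefers label 1, all other sites prefer label 0.
    unary = [[0, 3], [0, 3], [2, 1], [0, 3]]
    potts = lambda a, b: 0 if a == b else 2   # penalize label changes
    print(minimize_chain_energy(unary, potts))
```

With the label-change penalty, the weak local preference at site 2 is overruled and the chain is labeled uniformly; with no penalty, each site simply follows its own data. This is the trade-off between local evidence and global consistency in miniature.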

There are numerous applications related to these questions in the areas of security, medical imaging, and entertainment. I have been interested in applying and testing these ideas on realistic problems. At Siemens Corporate Research I developed a snake model, which uses dynamic programming, to detect the left ventricle of the heart from a spatio-temporal sequence of MRI images. The algorithm provides important wall strength information about the left ventricle. The software is a component of Siemens' current MRI machines. Presently, I am exploring applications of image registration in medicine.

Teaching is an exciting and important part of my research program, for I believe the field of Vision draws together several important themes in computer science. It touches on deep scientific questions about how our mind works and the essence of computing. It has a great engineering component, capable of building real systems that are cool and of great impact to society. It has one drawback: the discipline is still in its infancy. Thus, it requires elaboration and updating to construct appropriate courses for new students of the field. I am committed to fostering the development of the field through teaching and advising students.