Perceptual Organization and Object Recognition

Wednesday, 5pm-7pm. Classroom 402.


Davi Geiger, geiger@cs.nyu.edu
Office: WWH-419
Phone: x8-3235
Office Hours: Wednesday 3pm-5pm.

David Jacobs, dwj@research.nj.nec.com
Office: WWH-430
Phone: x8-3340 (Wednesday), (609) 951-2752 (Other days).
Office Hours: Wednesday 3pm-5pm.

Course description

This course will look at advanced topics in higher-level computer vision. Each week, we will read and discuss three papers. During the first half of the course we will consider papers on perceptual organization that address such problems as illusory contour formation, perceptual saliency, and the segmentation of regions in images. During the second half of the course, we will look at work on visual object recognition. We will consider a variety of approaches to recognition, including search-based approaches that have been motivated by insights from computational geometry and AI, appearance-based approaches that attempt to model 3-D objects as the set of 2-D images that they can produce, and work on object classification and shape similarity. While our focus will be on computational work, we will also read a number of papers that describe the psychological background of this work. Our goal will be to achieve an in-depth understanding of the most significant current approaches to these problems.


Previous experience in computer or human vision is desirable, but not required. Students will be asked to read three papers for each class. For each class, they will be required to write a one-page review of one of these papers, due before we discuss that paper. Each week, every paper in the next week's class will be assigned to a different student, who will lead the discussion of that paper. Students should therefore expect to lead the discussion for several papers throughout the semester, the exact number depending on class size.


Class 1: Introduction

Perceptual Organization

Class 2: Approaches to Perceptual Organization

Required Reading
  • G. Kanizsa, Organization in Vision, Chapter 5.
  • Julesz, B., ``Subjective Contours in Early Vision and Beyond,'' DIMACS Workshop on Partitioning Data Sets, edited by Cox, I., Hansen, P., and Julesz, B., 1995.
  • Rock, I., The Logic of Perception, Chapter 6.
  • Witkin, A., and J. Tenenbaum, ``On the Role of Structure in Vision,'' in Human and Machine Vision, edited by Beck, Hope, and Rosenfeld. Academic Press, New York.

Class 3: Contour Grouping I

Required Reading
  • S. Parent and S. W. Zucker, ``Trace inference, curvature consistency and curve detection'', IEEE PAMI, Vol. 11, No. 8, pp. 823-839, August 1989.
  • M. Kass, A. Witkin and D. Terzopoulos, ``Snakes: Active Contour Models,'' in Proc. First International Conference on Computer Vision, pp. 259-268, England, 1987.
  • D. Field, A. Hayes, and R. Hess, "Contour integration by the human visual system: evidence for a local 'Association Field'," Vision Res., Vol. 33, No. 2, pp. 173-193, 1993.
  • J. Elder and S. Zucker, ``A Measure of Closure,'' Vision Research, 34(24):3361-3370.
  • I. Kovacs and B. Julesz, ``A Closed Curve is Much More than an Incomplete One: Effect of Closure in Figure-Ground Segmentation.'' Proc. Nat. Acad. Sci., USA, 90:7495-7497, 1993.

Related Reading
  • R. von der Heydt, ``Form Analysis in Visual Cortex'', pp. 365--382, in The Cognitive Neurosciences, edited by M. Gazzaniga, MIT Press.
  • N. Rubin, K. Nakayama and R. Shapley "Abrupt learning and retinal size specificity in illusory-contour perception" Current Biology Vol 7, No 7, pp 461--467, June 1997.
  • D. Lowe, ``Three-dimensional Object Recognition from Single Two-dimensional Images,'' TR #202, NYU Courant Institute 1986.

Class 4: Contour Grouping II

Required Reading
  • A. Shashua and S. Ullman, ``Structural Saliency: The Detection of Globally Salient Structures Using a Locally Connected Network,'' IEEE ICCV:321-327, 1988.
  • T. Alter and R. Basri, ``Extracting Salient Curves from Images: An Analysis of the Saliency Network,'' Int. J. of Comp. Vision (forthcoming).
  • G. Guy and G. Medioni, ``Inferring Global Perceptual Contours from Local Features,'' Int. J. of Comp. Vis., 20:113-133.
  • L. Williams and D. Jacobs, ``Stochastic Completion Fields: A Neural Model of Illusory Contour Shape and Salience'', Neural Computation, 9(4):837--858.
  • D. Mumford, ``Elastica and Computer Vision'', Algebraic Geometry and its Applications, edited by Chandrajit Bajaj, New York, Springer-Verlag, 1994.

Related Reading
  • E. Sharon, A. Brandt and R. Basri, ``Completion Energies and Scale,'' CVPR:884--890, 1997.
  • S. Ullman, ``Filling-in the Gaps: The Shape of Subjective Contours and a Model for their Generation,'' Biological Cybernetics(21):1--6, 1976.
  • S. Grossberg and E. Mingolla, ``Neural Dynamics of Perceptual Grouping: Textures, boundaries and emergent segmentations,'' Perception and Psychophysics, 38(2):141--171, 1985.

Class 5: Region Grouping I

Required Reading
  • S. Geman and D. Geman. "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images", IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6:721--741, 1984.
  • K. Kumaran and D. Geiger and L. Gurvits. "Illusory Surface Perception and Visual Organization" Network: Computation in Neural Systems (7) 33--60, 1996.
  • T. Adelson, "Perceptual Organization and the Judgment of Brightness", Science, Vol. 262, pp. 2042-2044, December, 1993.

Related Reading
  • D. Mumford and J. Shah. Boundary detection by minimizing functionals, I. In Proc. IEEE Conf. on Computer Vision & Pattern Recognition, San Francisco, CA, 1985.

Class 6: Region Grouping II

Required Reading
  • Jianbo Shi and Jitendra Malik "Normalized cuts and image segmentation", Conference in Computer Vision and Pattern Recognition, pp 731--737, San Juan, 1997.
  • S. Wuerger, R. Shapley, N. Rubin, "On the visually perceived direction of motion by Hans Wallach: 60 years later", Perception, vol. 25, pp 1317-1367, 1996.

Related Reading
  • I. Cox and S. Rao and Y. Zhong. Ratio regions: a technique for image segmentation. Working paper, NEC Research Institute, 1996.
  • A. Amir and M. Lindenbaum, "Quantitative Analysis of Grouping Processes", 4th European Conference on Computer Vision, vol. I, pp 371-384

Class 7: The Symmetry Axis

Required Reading
  • H. Blum, ``Biological Shape and Visual Science'', J. of Theoretical Biology, 38:205-287, 1973.
  • K. Siddiqi and B.B. Kimia, ``Parts of Visual Form: Computational Aspects'', IEEE PAMI, Vol. 17, No. 3, pp. 239-251, March, 1995.
  • J. Ponce, D. Chelberg, W. B. Mann, ``Invariant Properties of Straight Homogeneous Generalized Cylinders and Their Contours'' IEEE PAMI, Vol. 11, No. 9, pp. 951-966, September, 1989.

Related Reading
  • G. Scott, S. Turner and A. Zisserman, ``Using a Mixed Wave/Diffusion Process to Elicit the Symmetry Set,'' Image and Vision Computing, 7(1):63--70.
  • T-L. Liu, D. Geiger, R. Kohn, "A Variational Approach to Computing the Self-Similarities of Shapes" International Conference in Computer Vision, India, 1998.

Object Detection

Class 8: Approaches

Required Reading
  • Biederman, I., 1985, ``Human Image Understanding: Recent Research and a Theory,'' Computer Vision, Graphics, and Image Processing, (32):29-73.
  • H. Bulthoff, S. Edelman, and M. Tarr, ``How Are Three-Dimensional Objects Represented in the Brain?'' MIT AI Memo #1479.
  • D. Marr, ``Visual Information Processing: The Structure and Creation of Visual Representations,'' Phil. Trans. R. Soc. Lond., B. 290, 199--218, 1980.

Related Reading
  • Lowe, D., 1985, Perceptual Organization and Visual Recognition, Kluwer Academic Publishers, The Netherlands.
  • W.E.L. Grimson, Object Recognition by Computer: The role of geometric constraints, MIT Press, Cambridge, 1990.
  • S. Ullman, High-Level Vision, MIT Press, Cambridge, 1996.

Class 9: Principal Component Analysis

Required Reading
  • R. Duda and P. Hart "Pattern Classification and Scene Analysis", John Wiley, New York, 1973.
  • M. Turk and A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, 3, 71-86.
  • Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J. "Training Models of Shape from Sets of Examples," Proceedings of the British Machine Vision Conference, Springer-Verlag, 1992, pp. 9-18.

Related Reading
  • D. Beymer and T. Poggio "Image representation for visual learning", Science, 272:1905-1909, June 1996.
  • M. Kirby and L. Sirovich, "The application of the Karhunen-Loeve procedure for the characterization of human faces", IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(1):103-108, January 1990.
  • Y. Adini, Y. Moses, and S. Ullman, ``Face Recognition: The Problem of Compensating for Changes in Illumination Direction,'' IEEE Trans. PAMI 19(7):721--732.

Class 10: Representing 3-D objects by their 2-D images

Required Reading
  • Belhumeur, P. and Kriegman, D., 1996. ``What is the Set of Images of an Object Under All Possible Lighting Conditions?'', Int. J. of Comp. Vis. (forthcoming).
  • Murase, H., Nayar, S., "Visual Learning and Recognition of 3-D Objects from Appearance." International Journal of Computer Vision, (1994).
  • Ullman, S. and Basri, R., 1991, ``Recognition by Linear Combinations of Models,'' IEEE Trans. PAMI, 13(10):992-1007.

Related Reading
  • Tomasi, C., Kanade, T., "Shape and Motion from Image Streams under Orthography: a Factorization Method," International Journal of Computer Vision, Vol. 9, No. 2 (1992) pp. 137-154.
  • Shashua, A., ``On Photometric Issues in 3D Visual Recognition from a Single Image,'' International Journal of Computer Vision (21)1/2:99--122, (January 1997).

Class 11: Search

Required Reading
  • T. Cass, ``Polynomial-Time Geometric Matching for Object Recognition,'' Int. J. of Comp. Vis. 21(1/2):37--62.
  • Fischler, M. A., and R. C. Bolles, ``Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography,'' Communications of the Association for Computing Machinery, 24(6):381-395, 1981.
  • A. Treisman and G. Gelade, "A feature integration theory of attention", Cognitive Psychology 12, pp 97-136, 1980.

Related Reading
  • Baird, H., 1985, Model-Based Image Matching Using Location, MIT Press, Cambridge. (excerpt)
  • Roberts, L., 1966, ``Machine Perception of Three-Dimensional Solid Objects,'' Optical and Electro-optical Information Processing, edited by J. Tippett, MIT Press, Cambridge.
  • Breuel, T., 1991, ``Model Based Recognition using Pruned Correspondence Search,'' IEEE Conference on Computer Vision and Pattern Recognition, pp. 257--268.
  • Rucklidge, W., ``Locating Objects Using the Hausdorff Distance,'' Fifth Int. Conf. on Comp. Vis.:457--464.

Class 12: Indexing

Required Reading
  • Lamdan, Y., J.T. Schwartz and H.J. Wolfson, 1990, ``Affine Invariant Model-Based Object Recognition,'' IEEE Transactions Robotics and Automation, 6:578--589.
  • Forsyth, D., J.L. Mundy, A. Zisserman, C. Coelho, A. Heller, and C. Rothwell, 1991, ``Invariant Descriptors for 3-D Object Recognition and Pose'', IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(10):971--991.
  • D. Jacobs ``Matching 3-D Models to 2-D Images'', International Journal of Computer Vision (21)1/2:123--153, (January 1997).

Related Reading
  • Weiss, I., 1993. ``Geometric invariants and object recognition,'' International Journal of Computer Vision, 10(3): 207--231.
  • Weinshall, D. 1993. Model Based Invariants for 3D Vision. International Journal of Computer Vision, 10(1):27-42.
  • Vijayakumar, B., Kriegman, D., and Ponce, J., 1995. ``Invariant-Based Recognition of Complex Curved 3D Objects from Image Contours,'' Fifth Int. Conf. on Comp. Vis.:508--514.
  • Burns, J., R. Weiss, and E. Riseman, 1992, ``The Non-Existence of General-Case View-Invariants,'' Geometric Invariance in Computer Vision, edited by J. Mundy, and A. Zisserman, MIT Press, Cambridge.

Class 13: Information Theory and Similarity

Required Reading
  • P. Viola and W. Wells, "Alignment by Maximization of Mutual Information" International Conference on Computer Vision, Boston, MA, 1995.
  • Grenander, to be announced.

Related Reading
  • S. C. Zhu, Y. N. Wu, and D. Mumford, "Filters, Random Fields, and Minimax Entropy (FRAME): Towards a unified theory for texture modeling". Proc. Comp. Vision and Patt. Recog., San Francisco, 1996.
  • S. Mallat and Z. Zhang, ``Matching Pursuit with Time-Frequency Dictionaries'', IEEE Trans. on Signal Processing, Dec. 1993.
  • H. Barlow, ``The absolute efficiency of perceptual decisions,'' in Phil. Trans. of the R. Soc. Lond. B 290 71-82, 1980.
  • Y. Amit, D. Geman and K. Wilder, "Recognizing Shapes from Simple Queries about Geometry" TR University of Massachusetts, 1995.
  • U. Grenander, Y. Chow, and D. M. Keenan. "Hands: A Pattern Theoretic Study of Biological Shapes", Springer Verlag, New York, 1991.

Class 14: Classification

Required Reading
  • D. Mumford, "Mathematical Theories of Shape: Do they model perception?", Proceedings of SPIE workshop on Geometric Methods in Computer Vision.
  • E. Rosch, C. Mervis, W. Gray, D. Johnson, and P. Boyes-Braem, ``Basic Objects in Natural Categories'', Cognitive Psychology, 8:382--439.
  • M. Zerroug and R. Nevatia, ``Using Invariance and Quasi-Invariance for the Segmentation and Recovery of Curved Objects,'' Applications of Invariance in Computer Vision, edited by J. Mundy and A. Zisserman, Springer-Verlag, Berlin, Heidelberg, 1994.