6.892 Statistical Vision and Learning

Syllabus

Because the course is new, it will not follow a precise syllabus. My current plan is to spend the first two days motivating the course, during which we will:

Review the major aspects of visual processing.
Touch on some influential approaches.
Discuss Marr's thinking on scientific theories.

We will then spend 4 or 5 classes reviewing background material. This will include the basic principles of statistical inference, an introduction to the neurophysiology of the visual system, a small bit of information theory, and a discussion of some "neural network" research.

The first few lectures will follow Duda and Hart quite closely. Currently it appears as though this will be the only "required" textbook. It is my personal belief that perhaps 90% of current research on statistical approaches to computer vision, learning, or neural networks is closely related to sections of Duda and Hart. It is well worth the purchase price.

Bayes Decision Theory
Classifiers and Discriminant Functions
Parameter estimation and supervised learning
Curse of dimensionality
Non-parametric techniques
Linear discriminant functions
Unsupervised learning and clustering
Multi-layer perceptrons.
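As a reminder of where the Duda and Hart material begins, here is a minimal sketch of the Bayes decision rule, assuming two hypothetical one-dimensional classes with Gaussian class-conditional densities. The function names `gaussian_pdf` and `bayes_classify`, the class labels, and all numbers are illustrative assumptions, not taken from the text. The rule picks the class with the largest posterior P(w|x) = p(x|w)P(w)/p(x), which minimizes the probability of error.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def bayes_classify(x, classes):
    """Minimum-error-rate decision for a scalar observation x.

    `classes` maps a label to (prior, mean, std). Returns the label with the
    largest posterior P(w|x) = p(x|w) P(w) / p(x), plus all posteriors.
    """
    joint = {label: prior * gaussian_pdf(x, mean, std)
             for label, (prior, mean, std) in classes.items()}
    evidence = sum(joint.values())          # p(x), the normalizer
    posteriors = {label: j / evidence for label, j in joint.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors

# Hypothetical two-class problem: equal variances, unequal priors.
classes = {"w1": (0.7, 0.0, 1.0), "w2": (0.3, 2.0, 1.0)}
label, post = bayes_classify(1.0, classes)
print(label, round(post["w1"], 3))
```

At x = 1.0 the two likelihoods are equal, so the posterior simply equals the prior. With equal variances the decision boundary moves toward the less probable class: here it sits at x = 1 + ln(7/3)/2, about 1.42, rather than at the midpoint 1.0 between the means.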

We will briefly review some of the known physiology of the visual cortex. While much has been revealed in recent years, as documented in literally thousands of journal articles, some very fundamental questions remain. Since our time is limited, the lectures will aim to bring the class to the point where we can understand some of the computational theories of perceptual processing in the visual cortex. In this section we will briefly cover:

The optics of the eye.
The structure of the retina.
Projections from the eye to LGN and visual cortex.
The receptive field.
Retinotopy.
Segregation of visual input.
Primary visual cortex.

Cell response properties.
Organization.

Temporal cortex (area IT).

This will bring us to the first material perhaps not covered in other classes: information transmission theories of visual processing. There were a number of pioneering theories of what visual processing might be for (see, for example, Barlow, Lettvin, and others). The visual processing areas are adaptive and rely on visual experience to ensure proper development. Kittens raised in the dark grow up to become cats who cannot see. This provides an intriguing test for theories of visual processing: can they be used to define adaptation algorithms that, when exposed to various "natural" stimuli, yield physiologically plausible receptive fields? Before proceeding we will review information theory. The best textbook in this area is Cover and Thomas (it too is well worth the money, but most likely I will try to copy sections of it...).

The entropy of a random process.
Discrete vs. continuous entropy.
Mutual information.
Asymmetric divergence.
Information transmission.
The Gaussian and a bag of info theory tricks.
Entropy estimators.
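Several of the quantities listed above are easy to compute for discrete distributions. The sketch below uses base-2 logarithms and hypothetical helper names (`entropy`, `kl_divergence`, `mutual_information`); it computes entropy, the asymmetric (Kullback-Leibler) divergence, and mutual information written as the divergence between a joint distribution and the product of its marginals. The example is a binary symmetric channel with uniform input, a standard textbook case whose mutual information is 1 - H(0.1) bits.

```python
import math

def entropy(p):
    """Shannon entropy H(P) in bits of a discrete distribution (list of probabilities)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """Asymmetric divergence D(P||Q) in bits; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint):
    """I(X;Y) in bits from a joint table joint[i][j] = P(X=i, Y=j).

    Computed as D(P(X,Y) || P(X)P(Y)): zero exactly when X and Y are independent.
    """
    px = [sum(row) for row in joint]                 # marginal of X
    py = [sum(col) for col in zip(*joint)]           # marginal of Y
    flat_joint = [pxy for row in joint for pxy in row]
    flat_indep = [px[i] * py[j] for i in range(len(px)) for j in range(len(py))]
    return kl_divergence(flat_joint, flat_indep)

# Binary symmetric channel: uniform input bit, flipped with probability eps.
eps = 0.1
joint = [[0.5 * (1 - eps), 0.5 * eps],
         [0.5 * eps, 0.5 * (1 - eps)]]
print(entropy([0.5, 0.5]))        # 1.0
print(mutual_information(joint))  # about 0.531 bits
```

Note that `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ, which is why the divergence is called asymmetric.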

We will then discuss a number of influential theories in the area (2 or 3 classes).

Linsker's Info-Max theory of receptive field development.
Hebb's rule for the adaptation of the synapse.
Oja's rule for determining the first principal component.
Oja's multi-unit and Sanger's GHA algorithm for PCA.
Atick
What is the distribution of natural images?
Field's work on the role of visual processing.
Olshausen and Field's theory of receptive field development.
Bell and Sejnowski's theory of receptive field development.

The Bell and Sejnowski ICA algorithm.
The Pearlmutter and Parra Bayesian formulation.
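To make the Hebb and Oja entries concrete, here is a minimal sketch of Oja's rule for a single linear unit, assuming zero-mean input vectors. The update is w <- w + eta * y * (x - y * w) with y = w . x: the first term is the plain Hebbian update, and the -eta * y^2 * w term keeps the weight vector from growing without bound, so w tends toward a unit-length first principal component. The function name `oja_first_component`, the learning rate, and the synthetic 2-D data are illustrative assumptions.

```python
import math
import random

def oja_first_component(samples, eta=0.002, epochs=10, seed=0):
    """Estimate the first principal component of zero-mean data with Oja's rule."""
    rng = random.Random(seed)
    dim = len(samples[0])
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    w = [wi / norm for wi in w]                  # start from a random unit vector
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x in data:
            y = sum(wi * xi for wi, xi in zip(w, x))            # unit output y = w . x
            # Hebbian term eta*y*x plus the decay term -eta*y^2*w.
            w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

# Hypothetical zero-mean 2-D data stretched along the direction at 30 degrees.
rng = random.Random(1)
theta = math.radians(30)
u = (math.cos(theta), math.sin(theta))    # true principal direction
v = (-math.sin(theta), math.cos(theta))   # orthogonal direction
samples = []
for _ in range(2000):
    a, b = rng.gauss(0.0, 3.0), rng.gauss(0.0, 0.5)
    samples.append((a * u[0] + b * v[0], a * u[1] + b * v[1]))

w = oja_first_component(samples)
print(w, math.hypot(*w))   # close to +/-u, with length close to 1
```

The sign of the result is arbitrary (both u and -u are principal directions). Sanger's GHA extends the same idea to several units by subtracting the contributions of earlier units, which this sketch does not attempt.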

If there is time and interest we will discuss some theories for the topographic layout of the visual cortex. These are interesting because they are related to the process of mixture modelling.

Kohonen map.
The elastic net.

Finally, to round out our discussion of the role of statistics and information theory in understanding biological perceptual processing, we will discuss:

Bialek's analysis of neuronal spikes in the visual perception of flies.

A number of mathematically related approaches have attempted to address "higher-level" processing. These theories lie in an area between engineering and neuroscience.

Becker and Hinton's use of information for visual processing.
Zemel and Hinton's use of coding for neural network learning.
Hinton et al. and the Helmholtz machine.

Discussion: Do these theories tell us anything about the brain? Do they tell us anything about engineering?
Has there ever been a computational theory of vision that has told us something about the brain?
Are the goals of science (understanding natural phenomena) and engineering (constructing artifacts to solve problems) ever synergistic?

At this point the course will segue into a discussion of engineering approaches to vision (i.e., those whose ultimate evaluation criterion is how well, and how often, they work). We will begin our discussion with low-level vision. Low-level visual processing is especially interesting because from first principles we can show that there are many valid interpretations of any image. These can only be differentiated with the help of a prior model of what it is that we are seeing. It is a bit surprising, but every time we look at a scene we must invoke Bayes!

Shape from Shading

Horn's approach
The role of prior models.
Linear shape from shading.
Freeman's Bayesian shape from shading.

Edge detection and region segmentation.

Zero-crossings of the difference of Gaussians (DoG).
The prior model built into edge detection.
Canny edge detector.
Markov Random Fields
Leclerc's Minimum Description Length edges.
Bayesian segmentation.
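The DoG zero-crossing idea can be sketched in one dimension: blur the signal with a narrow and a wide Gaussian, subtract, and mark places where the difference changes sign. The sketch assumes a width ratio of 1.6 (a value often quoted as a good approximation to the Laplacian of a Gaussian), edge replication at the borders, and a small hypothetical threshold `min_jump` that discards sign flips caused by round-off in flat regions. The function names are illustrative, not from any particular paper or library.

```python
import math

def gaussian_kernel(sigma):
    """Normalized, symmetric 1-D Gaussian kernel truncated at 3 sigma."""
    radius = int(math.ceil(3 * sigma))
    k = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def blur(signal, kernel):
    """Filter `signal` with a symmetric kernel, replicating edge samples."""
    r = len(kernel) // 2
    n = len(signal)
    return [sum(kv * signal[min(max(i + j - r, 0), n - 1)]
                for j, kv in enumerate(kernel))
            for i in range(n)]

def dog_zero_crossings(signal, sigma=1.0, ratio=1.6, min_jump=0.01):
    """Indices i where the DoG response changes sign between samples i and i+1."""
    narrow = blur(signal, gaussian_kernel(sigma))
    wide = blur(signal, gaussian_kernel(ratio * sigma))
    dog = [a - b for a, b in zip(narrow, wide)]
    # Keep only sign changes with a non-negligible jump (rejects round-off noise).
    return [i for i in range(len(dog) - 1)
            if dog[i] * dog[i + 1] < 0 and abs(dog[i + 1] - dog[i]) > min_jump]

step = [0.0] * 20 + [1.0] * 20   # hypothetical 1-D "image" with one step edge
print(dog_zero_crossings(step))  # [19]
```

The threshold is itself a small prior assumption: it declares that weak sign changes are not edges.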

Somewhere in the midst of the above discussion we will digress on the topic of intermediate representations of images. This work is based on the insight that a compressed representation of an image is often useful for reasoning.

Prior models for images.
Compression algorithms = Prior models

Run Length Coding.
Edge based compression.
Pyramid coding.
Wavelet coding.

Markov Random Fields
Other prior models
Texture analysis and synthesis.
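One way to read "Compression algorithms = Prior models": a code gives short descriptions to the images it expects, so its code lengths implicitly define a probability over images. A minimal run-length coding sketch for a binary scanline shows this; the function names and the two example scanlines are hypothetical. A scanline made of a few long runs needs 3 (value, length) pairs, while an alternating scanline of the same length needs 32, so run-length coding implicitly "believes" that neighboring pixels tend to be equal.

```python
from itertools import groupby

def rle_encode(scanline):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(scanline)]

def rle_decode(runs):
    """Invert rle_encode."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out

smooth_line = [0] * 12 + [1] * 8 + [0] * 12   # few long runs
noisy_line = [i % 2 for i in range(32)]       # alternating pixels
print(len(rle_encode(smooth_line)), len(rle_encode(noisy_line)))  # 3 32
```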

Bayes's law turns out to be equally useful when analyzing higher-level vision, such as object recognition:

Template matching as maximum likelihood.
What is an edge and what is it good for?
Edge matching approaches.
Generalized Hough Transforms
Statistical edge matching approaches.
Discussion: Are edges a good idea?
Eigenfaces.
Fisherfaces.
Flexible Templates.
Complex features: a point between edges and images?
Mutual information: an outrageously general matching metric?
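The link between template matching and maximum likelihood can be sketched under one stated assumption: the observed signal equals the template plus independent Gaussian noise of known variance. The log-likelihood of placing the template at offset t is then -SSD(t) / (2 sigma^2) plus a constant, so the maximum-likelihood placement is the one with the smallest sum of squared differences. The sketch is 1-D for brevity; the function names and data are hypothetical.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match_template(signal, template, noise_std=1.0):
    """Maximum-likelihood placement of `template` in a 1-D `signal`.

    Assumes signal[t + k] = template[k] + n_k with n_k i.i.d. N(0, noise_std^2).
    Scores are log-likelihoods up to an additive constant.
    """
    m = len(template)
    scores = [-ssd(signal[t:t + m], template) / (2 * noise_std ** 2)
              for t in range(len(signal) - m + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

signal = [0.1, -0.2, 0.0, 1.1, 2.9, 1.2, 0.1, -0.1]   # hypothetical noisy data
template = [1.0, 3.0, 1.0]
best, scores = match_template(signal, template)
print(best)  # 3
```

A different noise model gives a different matching score; for example, unknown changes in lighting break the SSD assumption, which is one motivation for the more general metrics discussed at the end of the course.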

Paul A. Viola
Wed Sep 4 18:44:23 EDT 1996