The Learning and Vision Group – Our Common Ground



Since the fields of learning and vision are so young, and their intersection is even younger, there is not yet a standard curriculum or body of knowledge whose mastery is considered essential for study in these fields. However, given the foci of the current learning and vision group, experience has shown that certain methodologies, mathematical techniques, and skills are important for participating effectively in the group.

Of course, effective participation involves both understanding for the individual and contribution (in the form of feedback) to other members of the group. Students without the requisite background will be limited in both respects.

What you will find below

Below is a roadmap for your path through the world of probabilistic and statistical machine learning and vision. For each term, there are recommendations for the background material you should develop, courses that may help you develop it, and some of the basic papers and methods you will probably want to look at as you build the standard background for the field. If you find the earlier material easy or basic, great! Then go ahead and get started on the rest of it. If you find the recommendations daunting, hang in there. It can be done, as proven by the once fantastically ignorant Erik Miller.

Below the term-by-term recommendations are some topic areas which are considered very important by some and not so important by others. That is, there is less consensus about how central these concepts are. I nevertheless include a few of these topic areas for those with voracious appetites for knowledge or for those who are interested in digging deeper into certain topic areas.

By the middle of your third year, you will probably be digging deeper into topics of particular interest to you, and so presumably you will be more self-directed. However, you might still refer to this page to see if you have any big holes in your background.

1st year, Fall Term

Background material

Fundamentals of probability: axioms of probability, sample spaces, conditional probability, joint distributions, Bayes’ Rule, Gaussian distributions. Moments of distributions.
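As a quick self-check on conditional probability and Bayes' Rule, here is a minimal Python sketch. The scenario and all of its numbers (a detector with a 90% hit rate, a 5% false-alarm rate, and a 1% prior) are made up for illustration and do not come from any of the courses or papers listed on this page.

```python
def posterior(prior, p_pos_given_h, p_pos_given_not_h):
    """Return P(H | positive observation) using Bayes' Rule."""
    # Law of total probability: P(positive) summed over H and not-H.
    evidence = p_pos_given_h * prior + p_pos_given_not_h * (1.0 - prior)
    return p_pos_given_h * prior / evidence


# Hypothetical numbers: 1% of image windows contain a face, the detector
# fires on 90% of faces and on 5% of non-faces.
p = posterior(prior=0.01, p_pos_given_h=0.90, p_pos_given_not_h=0.05)
print(f"P(face | detector fires) = {p:.3f}")
```

With these assumed numbers the posterior is only about 15%, because the low prior outweighs the detector's accuracy. If that result surprises you, it is worth working through by hand.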

Courses to Consider

*6.041 (same as 6.431) – Applied Probability Theory. If you’re weak in probability, take this class early on. You’ll be glad you did.

6.801 (same as 6.866) – Machine Vision

Papers to Read

Paul’s Thesis. This may be difficult for some new students (it was for me!). However, it is an excellent way to measure your progress in mastering the fundamentals of our group, since you should eventually understand pretty much everything in it. It demonstrates interesting and effective methods of applying probability theory, information theory, non-parametric density estimation, stochastic gradient descent algorithms, and more!

An Introduction to Hidden Markov Models. Rabiner, L.R. and Juang, B.H. This will be a good way to see if your probability is up to snuff. It also incorporates dynamic programming, a critical computer science technique that you should pick up if you don’t already know it. (For more on dynamic programming, see Cormen, Leiserson, and Rivest, Introduction to Algorithms.) It will also introduce you to the Expectation-Maximization algorithm, a cornerstone of modern AI research.
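To give a feel for how dynamic programming shows up in Hidden Markov Models, here is a minimal Python sketch of the forward algorithm, which computes the probability of an observation sequence. The two-state model and its numbers are hypothetical, chosen only for illustration; they are not taken from the Rabiner and Juang paper.

```python
def forward(obs, start, trans, emit):
    """Return P(obs) under an HMM via the forward algorithm.

    start[i]    : P(first state = i)
    trans[i][j] : P(next state = j | current state = i)
    emit[i][o]  : P(observation = o | state = i)
    """
    n = len(start)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # Reuse the previous alpha instead of enumerating state sequences.
        alpha = [
            emit[j][o] * sum(alpha[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return sum(alpha)


# Hypothetical two-state model with two observation symbols (0 and 1).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(forward([0, 1, 0], start, trans, emit))
```

The point of the recursion is cost: for N states and T observations it takes on the order of T·N² operations, whereas summing over every possible state sequence takes on the order of N^T.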

Let's put some more papers in here….

1st year, Spring Term

Background material

Linear Systems Theory. Be able to define linearity. Eigenvectors, eigenvalues, principal components analysis. Linear perceptrons (why are they not very useful?). Finite and infinite linear transforms. Your linear algebra should be getting de-rusted. If you haven’t had a course, better go get one! Or at least set aside some time to learn the basics.
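One way to test whether you can define linearity is to state the superposition property, T(a·x1 + b·x2) = a·T(x1) + b·T(x2), and check it on a system. The sketch below does this in Python for two toy systems chosen for illustration (a two-point moving average and pointwise squaring); the function names are hypothetical.

```python
def moving_average(x):
    """Two-point moving average: y[n] = (x[n] + x[n-1]) / 2, with x[-1] = 0."""
    return [(x[n] + (x[n - 1] if n > 0 else 0.0)) / 2.0 for n in range(len(x))]


def square(x):
    """Pointwise squaring: y[n] = x[n] ** 2."""
    return [v * v for v in x]


def is_linear_on(system, x1, x2, a, b, tol=1e-9):
    """Check superposition on one example pair of inputs and scalars."""
    combined = [a * u + b * v for u, v in zip(x1, x2)]
    lhs = system(combined)                      # T(a*x1 + b*x2)
    rhs = [a * u + b * v                        # a*T(x1) + b*T(x2)
           for u, v in zip(system(x1), system(x2))]
    return all(abs(p - q) <= tol for p, q in zip(lhs, rhs))


x1, x2 = [1.0, 2.0, 3.0], [0.5, -1.0, 4.0]
print(is_linear_on(moving_average, x1, x2, 2.0, -3.0))  # superposition holds
print(is_linear_on(square, x1, x2, 2.0, -3.0))          # superposition fails
```

Note the asymmetry: a single failing example proves a system is not linear, but a passing example is only evidence. Showing linearity requires an argument that holds for all inputs and scalars.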

Courses to Consider

6.003 – Signals and Systems (covers linear time-invariant systems, Fourier and Laplace transforms, etc.). NOTE: some people without an engineering background find this class difficult to take without first taking 6.002.

Papers to Read (need papers and links)

Paper using PCA (Eigenfaces?)

Paper featuring Fisher's Linear Discriminants (Fisherfaces?)

Condensation Algorithm.

K-means clustering.

K-nearest neighbors.

E-M algorithm.

Vector Quantization

2nd year, Fall Term

Background material

Time to get up to speed in AI. The following should start to mean something to you: Bayesian estimation, Maximum Likelihood (ML) decision theory, Maximum A Posteriori (MAP) decision theory, the definition of entropy, the definition of mutual information, gradient descent algorithms, stochastic gradient descent, simulated annealing, neural nets, overfitting, and the bias-variance tradeoff. A great way to get this stuff is to take the Machine Learning class. If you have a strong engineering and math background, taking this class in your first term is certainly doable.
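Since the definitions of entropy and mutual information come up constantly, here is a minimal Python sketch of both for discrete distributions, measured in bits. The two joint tables are toy examples made up for illustration.

```python
from math import log2


def entropy(p):
    """Shannon entropy in bits of a discrete distribution given as a list."""
    return -sum(q * log2(q) for q in p if q > 0)


def mutual_information(joint):
    """I(X;Y) in bits from a joint table joint[x][y] = P(X=x, Y=y)."""
    px = [sum(row) for row in joint]           # marginal of X
    py = [sum(col) for col in zip(*joint)]     # marginal of Y
    return sum(
        pxy * log2(pxy / (px[x] * py[y]))
        for x, row in enumerate(joint)
        for y, pxy in enumerate(row)
        if pxy > 0
    )


independent = [[0.25, 0.25], [0.25, 0.25]]  # X and Y are independent fair bits
copy = [[0.5, 0.0], [0.0, 0.5]]             # Y is always equal to X
print(entropy([0.5, 0.5]))
print(mutual_information(independent))
print(mutual_information(copy))
```

A fair bit has one bit of entropy. Independent variables share no information, and when Y is an exact copy of X the mutual information equals the entropy of X.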

People with backgrounds in signal processing have made major contributions to AI, applying such techniques as Kalman filtering, parametric density estimation, and other commonly used signal processing techniques to classic AI problems. You may want to talk to people with backgrounds in signal processing (like John Fisher, Alan Willsky, etc.) to decide whether this route is for you. If so, I recommend 6.011 followed by 6.432. Below is a description of 6.011. It will probably be quite difficult for you if you have not had a class like 6.003, Signals and Systems.

Courses to Consider

*6.891 – Machine Learning and Neural Networks. Paul Viola has taught this class for the past couple of years. It is an excellent introduction to many of the most important and classic techniques in probabilistic and statistical AI. Requisite material for anyone in the group. This course covers the topics listed in the first paragraph above under "Background Material" and many other topics.

6.011 – Introduction to Communication, Control, and Signal Processing. Input-output and state-space models of linear systems driven by deterministic and random signals; time- and transform-domain representations. Sampling, discrete-time processing of continuous-time signals. State feedback, observers. Probabilistic models; stochastic processes, correlation functions, power spectra, whitening filters. Detection; matched filters. Least-mean square error estimation; Wiener filtering.

Papers to Read

Blind Source Separation – Bell and Sejnowski

Tutorial paper on wavelets - Wavelets for Computer Graphics: A Primer, by Eric Stollnitz, Tony DeRose, and David Salesin

Shiftable Multiscale Transforms, Simoncelli, Freeman, Adelson, and Heeger

Steerable Filters, by Adelson et al.?

Mixtures of Gaussians (Anyone got one of these?)

*Paul’s Thesis – Time to understand this completely.

Tutorial on Support Vector Machines by

Cover and Thomas, Elements of Information Theory, Chapter 2.

Christopher Bishop’s book "Neural Networks for Pattern Recognition." Chapters 1, 2, 3, and the first half of Chapter 4.

Duda and Hart (1st Edition). Chapters 1-3. 

2nd year, Spring Term

Take a break and finish your master’s thesis!

3rd year, Fall Term

Background material

Generalized linear models, Bayes nets, Statistical Physics Models, Markov Random Fields. None of these is essential, but all of them make frequent appearances in the literature. Any one of these you don’t understand at least a little bit will probably become an uncomfortable spot for you eventually. Make understanding them a long-term goal.

Courses to Consider

6.432 – Stochastic Processes. Known as 6.011 on steroids. When you come out of here, you should have a firm grasp on basic parametric estimation (including random and so-called non-random parameter estimation), the Cramér-Rao bound, Kalman and Wiener filtering, and lots of other stuff which is ubiquitous in the signal processing approach to AI. This course includes cool applications of abstract algebra, namely the conceptualization of random variables as elements of a vector space. Very worthwhile. A challenging class for most people.

6.441 – Information Theory. If you aren’t comfortable with entropy and mutual information when you start, you will be when you finish! Discrete and continuous entropy and MI. Source coding and Channel coding. The Asymptotic Equipartition Property. Typicality. Joint Typicality. This is all good stuff, but you can get a lot of it by reading Chapters 2, 3, 5, and 12 in Cover and Thomas.

Papers to Read (Optional)

Introduction to Bayesian Networks. The more of this book you read, the better you will understand Bayes Nets.

One of Paul and Jeremy’s papers on multi-resolution features.



The Information Bottleneck Method