6.891 Machine Learning

Text

Second edition of Pattern Classification by Duda, Hart & Stork.

Articles

Feature Selection

Toward optimal feature selection. D. Koller and M. Sahami. Proceedings of the 13th International Conference on Machine Learning (ICML). 1996.

Boosting

Additive Logistic Regression: A Statistical View of Boosting. J. Friedman, T. Hastie and R. Tibshirani. Stanford University Technical Report. 1998.

Support Vector Machines (SVMs)

A Tutorial on Support Vector Machines for Pattern Recognition. Christopher Burges. Machine Learning. 1998.

Simple Learning Algorithms for Training Support Vector Machines. Colin Campbell and Nello Cristianini. University of Bristol Technical Report. 1998.

An Introduction to Kernel Methods. Colin Campbell. Radial Basis Function Networks: Design and Applications. 2000.

Expectation-Maximization (EM)

Learning to Classify Text from Labeled and Unlabeled Documents. Kamal Nigam et al. Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98). 1998.

Maximum Likelihood from Incomplete Data via the EM Algorithm. Dempster, Laird and Rubin. Journal of the Royal Statistical Society, B, volume 39, pp. 1-38. 1977.

Clustering

Agglomerative Information Bottleneck. Noam Slonim and Naftali Tishby. Advances in Neural Information Processing Systems 12. 2000.

Bayesian Networks

A tutorial on learning with Bayesian networks. David Heckerman. Microsoft Research Technical Report MSR-TR-95-06. 1995.

Probabilistic Independence Networks for Hidden Markov Probability Models. P. Smyth, D. Heckerman, M. Jordan. Technical Report MSR-TR-96-03, Microsoft Research. 1996.

Sampling

Probabilistic Inference Using Markov Chain Monte Carlo Methods. Radford M. Neal. Technical Report CRG-TR-93-1, Dept. of Computer Science, University of Toronto. 1993.
See page 33 for a description of Importance Sampling.

Reinforcement Learning

Reinforcement Learning: A Survey. Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore. Journal of Artificial Intelligence Research, vol. 4, pp. 237-277. 1996.

Other Reference Texts

Various Machine Learning Topics
Machine Learning. Tom Mitchell. McGraw-Hill. 1997.
Neural Networks for Pattern Recognition. Christopher Bishop. Oxford University Press. 1995.

Boosting, VC-Dimension
An Introduction to Computational Learning Theory. Michael Kearns and Umesh Vazirani. The MIT Press. 1994.

Information Theory (Entropy, Mutual Information)
Elements of Information Theory. Thomas Cover and Joy Thomas. John Wiley & Sons. 1991.

Linear Algebra
Introduction to Linear Algebra. Gilbert Strang. Wellesley-Cambridge Press. 1993.

Demonstrations/Applications

Yoav Freund's AdaBoost Applet

AT&T/Lucent SVM Applet

ifile: intelligent mail filter

MIT Face Detection

CMU Face Detection (Demo)

JavaBayes - Bayesian Networks in Java