6.891 Machine Learning


Second edition of Pattern Classification by Duda, Hart & Stork.


  • Feature Selection
    Toward Optimal Feature Selection. D. Koller and M. Sahami. Proceedings of the 13th International Conference on Machine Learning (ICML). 1996.
  • Boosting
    Additive Logistic Regression: A Statistical View of Boosting. J. Friedman, T. Hastie and R. Tibshirani. Stanford University Technical Report. 1998.
  • Support Vector Machines (SVMs)
    A Tutorial on Support Vector Machines for Pattern Recognition. Christopher Burges. Machine Learning. 1998.
    Simple Learning Algorithms for Training Support Vector Machines. Colin Campbell and Nello Cristianini. University of Bristol Technical Report. 1998.
    An Introduction to Kernel Methods. Colin Campbell. Radial Basis Function Networks: Design and Applications. 2000.
  • Expectation-Maximization (EM)
    Learning to Classify Text from Labeled and Unlabeled Documents. Kamal Nigam et al. Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98). 1998.
    Maximum Likelihood from Incomplete Data via the EM Algorithm. Dempster, Laird and Rubin. Journal of the Royal Statistical Society, B, volume 39, pp. 1-38. 1977.
  • Clustering
    Agglomerative Information Bottleneck. Noam Slonim and Naftali Tishby. Advances in Neural Information Processing Systems 12. 2000.
  • Bayesian Networks
    A Tutorial on Learning with Bayesian Networks. David Heckerman. Microsoft Research Technical Report MSR-TR-95-06. 1995.
    Probabilistic Independence Networks for Hidden Markov Probability Models. P. Smyth, D. Heckerman, M. Jordan. Technical Report MSR-TR-96-03, Microsoft Research. 1996.
  • Sampling
    Probabilistic Inference Using Markov Chain Monte Carlo Methods. Radford M. Neal. Technical Report CRG-TR-93-1, Dept. of Computer Science, University of Toronto. 1993.
      See page 33 for a description of Importance Sampling.
  • Reinforcement Learning
    Reinforcement Learning: A Survey. Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore. Journal of Artificial Intelligence Research, vol 4, pp. 237-277. 1996.

Other Reference Texts

  • Various Machine Learning Topics
    Machine Learning. Tom Mitchell. McGraw-Hill. 1997.
    Neural Networks for Pattern Recognition. Christopher Bishop. Oxford University Press. 1995.
  • Boosting, VC-Dimension
    An Introduction to Computational Learning Theory. Michael Kearns and Umesh Vazirani. The MIT Press. 1994.
  • Information Theory (Entropy, Mutual Information)
    Elements of Information Theory. Thomas Cover and Joy Thomas. John Wiley & Sons. 1991.
  • Linear Algebra
    Introduction to Linear Algebra. Gilbert Strang. Wellesley-Cambridge Press. 1993.

Demonstrations/Applications

  • Yoav Freund's AdaBoost Applet
  • AT&T/Lucent SVM Applet
  • ifile: intelligent mail filter
  • MIT Face Detection
  • CMU Face Detection (Demo)
  • JavaBayes - Bayesian Networks in Java