NTT-MIT Research Collaboration: a partnership in the future of communication and computation

Adaptive Information Filtering with Minimal Instruction


Start date: 07/2000

Tommi Jaakkola and Tomaso Poggio

Naonori Ueda

Project summary

We develop mathematical foundations and proof-of-concept tools for accurate information retrieval.

Project description


It is generally hard to find a few relevant pieces of information (such as research articles) within a large dataset of predominantly incomplete and possibly superficially similar information (such as technical report archives). This problem has become one of the pervasive challenges of information technology. In this project we exploit and further develop automated information filtering methods that arise from a specific synthesis of modern machine learning techniques. The tools we develop can function accurately with minimal instruction about what is relevant, learn from related filtering problems, and make use of any optional feedback provided by, or automatically queried from, the user. The results of this project can be readily translated into various applied and commercial uses. We plan to build proof-of-concept tools that apply flexible document retrieval and filtering algorithms to various databases in molecular biology.

Presentations and posters

T. Jaakkola (2000). Maximum entropy approach to classification with incomplete labels and other discrimination problems.

T. Jaakkola (1998). Exploiting generative models in discriminative classifiers.

Publications

A. Corduneanu and T. Jaakkola (2001). Stable mixing of complete and incomplete information. Submitted.

T. Jaakkola and H. Siegelmann (2001). Active information retrieval. Submitted.

M. Szummer and T. Jaakkola (2001). Clustering and efficient use of unlabeled examples. Submitted.

M. Szummer and T. Jaakkola (2000). Kernel expansions with unlabeled examples. To appear in Advances in Neural Information Processing Systems 13.

T. Jebara and T. Jaakkola (2000). Feature selection and dualities in maximum entropy discrimination.

T. Evgeniou, M. Pontil and T. Poggio (2000). Regularization networks and support vector machines. Advances in Computational Mathematics, 13(1):1-50.

C. Papageorgiou and T. Poggio (2000). A trainable system for object detection. International Journal of Computer Vision, 38(1):15-33.

T. Jaakkola and M. Jordan (1999). Variational probabilistic inference and the QMR-DT database. Journal of Artificial Intelligence Research, 10:291-322.

T. Jaakkola, M. Meila and T. Jebara (1999). Maximum entropy discrimination. In Advances in Neural Information Processing Systems 12.

S. Mukherjee and V. Vapnik (1999). Multivariate density estimation: an SVM approach. CBCL Paper #170/AI Memo #1653, Massachusetts Institute of Technology.

S. Mukherjee, P. Tamayo, J.P. Mesirov, D. Slonim, A. Verri and T. Poggio (1999). Support vector machine classification of microarray data. CBCL Paper #182/AI Memo #1676, Massachusetts Institute of Technology.

T. Jaakkola, M. Diekhans and D. Haussler (1998). A discriminative framework for detecting remote protein homologies. Journal of Computational Biology.

T. Jaakkola and D. Haussler (1998). Exploiting generative models in discriminative classifiers. In Advances in Neural Information Processing Systems 11.

Proposals and progress reports


NTT Bi-Annual Progress Report, July to December 2000

NTT Bi-Annual Progress Report, January to June 2001

NTT Bi-Annual Progress Report, July to December 2001

NTT Bi-Annual Progress Report, January to June 2002

NTT Bi-Annual Progress Report, July to December 2002
