Feature Selection for Unsupervised Learning Applied to Content-Based Image Retrieval


Jennifer Dy


Northeastern University


This talk has two parts. First, I will present my approach to content-based image retrieval (CBIR), called the customized queries approach (CQA). CBIR is the retrieval of images from a database using sample images, rather than text, as queries. The goal of my research is to help doctors diagnose patients by finding similar images with known pathologies. I will describe CQA, apply it to a database of high-resolution computed tomography images of the lungs, and show that our system improves doctors' diagnoses and that CQA increases retrieval precision over the traditional single-feature-vector approach.

In the second part of the talk, I will explore feature selection for unsupervised learning. Choosing the features to represent data is important because it significantly affects the performance of learning algorithms, including that of CQA. Typically, a human defines the features or attributes that are potentially useful. Because not all of these features may be needed to identify the human-labeled categories, a subset is chosen from the original pool using an automated feature selection algorithm. The feature selection problem becomes more difficult when the labeled categories are unavailable, as in unsupervised learning. I will present the issues involved in developing automated feature selection for unsupervised learning through my algorithm, FSSEM (Feature Subset Selection wrapped around Expectation-Maximization clustering), and through two performance criteria for evaluating candidate feature subsets: maximum likelihood and scatter separability. I will explain the dimensionality biases of these criteria and present a normalization scheme that can be applied to any criterion to ameliorate these biases. I will then present experimental results showing the performance of FSSEM.
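As a rough illustration of the scatter separability criterion mentioned above, the following Python sketch computes one common form of it, trace(S_w^-1 S_b), for a given feature subset and a fixed set of cluster labels. This is a minimal sketch under stated assumptions, not the FSSEM implementation: the function name `scatter_separability`, the synthetic data, and the use of fixed labels in place of clusters found by EM are all hypothetical choices made for the example.

```python
import numpy as np


def scatter_separability(X, labels):
    """Return trace(S_w^-1 S_b) for data X (n x d) and hard cluster labels.

    S_w is the within-cluster scatter and S_b the between-cluster scatter,
    each weighted by the fraction of points in the cluster. The name and
    exact weighting are assumptions made for this illustration.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape
    overall_mean = X.mean(axis=0)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for k in np.unique(labels):
        members = X[labels == k]
        weight = len(members) / n
        mean_k = members.mean(axis=0)
        centered = members - mean_k
        # Cluster covariance, weighted by the cluster's share of the data.
        S_w += weight * (centered.T @ centered) / len(members)
        diff = (mean_k - overall_mean).reshape(-1, 1)
        S_b += weight * (diff @ diff.T)
    # Solve S_w Z = S_b instead of forming the inverse explicitly.
    return float(np.trace(np.linalg.solve(S_w, S_b)))


# Synthetic data (assumed): one informative feature plus two noise features.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
informative = rng.normal(size=(200, 1)) + 4.0 * labels[:, None]
noise = rng.normal(size=(200, 2))
X = np.hstack([informative, noise])

score_one_feature = scatter_separability(X[:, [0]], labels)
score_all_features = scatter_separability(X, labels)
print(score_one_feature, score_all_features)
```

With the cluster assignments held fixed, this form of the criterion cannot decrease when features are added, even pure noise features, which is one way a bias toward higher dimensionality can arise and why some normalization across subset sizes is needed.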

Prof. Dy received her Ph.D. from Purdue University in 2001, where she studied machine learning and medical image analysis and retrieval. In 2002, she joined the Electrical and Computer Engineering department at Northeastern University as an assistant professor.