\section{Introduction}

Recently, mechanisms related to the Support Vector Machine (SVM) paradigm have 
%Recently, Support Vector Machine (SVM) related mechanisms have 
produced by far the best results for
information retrieval, e.g., in experiments on the
standard {\em Reuters} dataset \citep{DPHS}.
%\citep{ManevitzMalik1}.

These studies, however, train on both positive and
negative examples, as the basic SVM paradigm suggests.  We have
been interested instead in information retrieval using {\em only}
positive examples for training.  This is important in many
applications \citep[see][]{ManevitzMalik1}.  Consider, for example,
trying to classify sites of ``interest'' to a web surfer where the only
information available is the history of the user's activities.  One can 
envisage identifying typical positive examples by such tracking, but
it would be hard to identify representative negative examples.  Of course,
the absence of negative information entails a price, and one should not
expect results as good as those obtained when negative examples are available \citep{DPHS, TJ}.
%\citep{TJ}.

Since \citet{Scholkopf1} recently extended the 
%Since Sch\"{o}lkopf et al \citep{Scholkopf1} recently extended the 
SVM methodology to handle training using only positive information
(what they call ``one-class'' classification), we decided to apply
their method to document classification and compare it with
other one-class methods, including a method we recently developed and
studied that uses a compression neural network as a filter.


There are many parameters in these methods, including the representation
of the data and the decisions involved in adapting basically two-class
methods to one-class ones.  Our studies are fairly broad, although
not completely comprehensive; below we describe each of the choices we made.

In the end, it turns out that the method of Sch\"{o}lkopf et al.\ works very well:
it is substantially better than all other methods except the neural-network-based
one, with which it is comparable.  Moreover, it is somewhat simpler
to implement than the neural network method.

However, the method turns out to be surprisingly
sensitive to specific choices of representation and kernel, in ways that
are not very transparent.  For example, it works best with a
binary representation, as opposed to the tf-idf or ``Hadamard'' representations,
which are known to be superior for other methods.  In addition, the
proper choice of kernel depends on the number of features in the
binary vector.  Since performance varies dramatically
with these choices, the method is not robust without
a deeper understanding of these representation issues.

For the moment, therefore, we
prefer the neural network method for reasons of robustness.












