\title{Machine Learning with Data Dependent Hypothesis Classes}

\author{\name Adam Cannon \email cannon@cs.columbia.edu \\
       \addr Department of Computer Science\\
       Columbia University\\
       New York, NY 10027, USA
       \AND
       \name J. Mark Ettinger  \email ettinger@lanl.gov \\
       \addr Nonproliferation and International Security Group, NIS-8\\
       Los Alamos National Laboratory\\
       Los Alamos, NM 87545, USA
       \AND
       \name Don Hush \email dhush@lanl.gov\\
       \name Clint Scovel \email jcs@lanl.gov\\
       \addr Modeling, Algorithms, and Informatics Group, CCS-3\\
       Los Alamos National Laboratory\\
       Los Alamos, NM 87545, USA}
       
\editor{Peter Bartlett}
\maketitle

\begin{abstract}%
We extend the VC theory of statistical learning to
data dependent spaces of classifiers.
This theory can be viewed as a decomposition
of classifier design into two components;
the first component is a restriction to a data dependent
hypothesis class
and the second is empirical risk minimization
within that class.
We define a measure of complexity for
data dependent hypothesis classes and
provide data dependent versions of
bounds on error deviance and estimation error.
We also provide
a structural risk minimization procedure
over data dependent hierarchies and prove consistency.
We use this theory to provide a framework for
studying the trade-offs between performance and
computational complexity in classifier design.
As a consequence we obtain
a new family of classifiers with dimension independent
performance bounds and efficient learning procedures.
\end{abstract}