Last update Sun Mar 19 05:05:02 2006


AIM-2005-037

Author[s]: Charles C. Kemp and Aaron Edsinger

Visual Tool Tip Detection and Position Estimation for Robotic Manipulation of Unknown Human Tools

November 16, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-037.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-037.pdf

Robots that use human tools could more easily work with people, perform tasks that are important to people, and benefit from human strategies for accomplishing these tasks. For a wide variety of tools and tasks, control of the tool's endpoint is sufficient for its use. In this paper we present a straight-forward method for rapidly detecting the endpoint of an unmodeled tool and estimating its position with respect to the robot's hand. The robot rotates the tool while using optical flow to detect the most rapidly moving image points, and then finds the 3D position with respect to its hand that best explains these noisy 2D detections. The resulting 3D position estimate allows the robot to control the position of the tool endpoint and predict its visual location. We show successful results for this method using a humanoid robot with a variety of traditional tools, including a pen, a hammer, and pliers, as well as more general tools such as a bottle and the robot's own finger.


AIM-2005-036

CBCL-259

Author[s]: T. Serre, M. Kouh, C. Cadieu, U. Knoblich, G. Kreiman, T. Poggio

A theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex

December 19, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-036.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-036.pdf

We describe a quantitative theory to account for the computations performed by the feedforward path of the ventral stream of visual cortex and the local circuits implementing them. We show that a model instantiating the theory is capable of performing recognition on datasets of complex images at the level of human observers in rapid categorization tasks. We also show that the theory is consistent with (and in some case has predicted) several properties of neurons in V1, V4, IT and PFC. The theory seems sufficiently comprehensive, detailed and satisfactory to represent an interesting challenge for physiologists and modelers: either disprove its basic features or propose alternative theories of equivalent scope. The theory suggests a number of open questions for visual physiology and psychophysics.


AIM-2005-035

CBCL-258

Author[s]: Yuri Ivanov, Thomas Serre and Jacob Bouvrie

Confidence weighted classifier combination for multi-modal human identification

December 14, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-035.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-035.pdf

In this paper we describe a technique of classifier combination used in a human identification system. The system integrates all available features from multi-modal sources within a Bayesian framework. The framework allows representing a class of popular classifier combination rules and methods within a single formalism. It relies on a “per-class” measure of confidence derived from performance of each classifier on training data that is shown to improve performance on a synthetic data set. The method is especially relevant in autonomous surveillance setting where varying time scales and missing features are a common occurrence. We show an application of this technique to the real-world surveillance database of video and audio recordings of people collected over several weeks in the office setting.


AIM-2005-034

Author[s]: Leonid Taycher, Gregory Shakhnarovich, David Demirdjian, and Trevor Darrell

Conditional Random People: Tracking Humans with CRFs and Grid Filters

December 1, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-034.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-034.pdf

We describe a state-space tracking approach based on a Conditional Random Field (CRF) model, where the observation potentials are \emph{learned} from data. We find functions that embed both state and observation into a space where similarity corresponds to $L_1$ distance, and define an observation potential based on distance in this space. This potential is extremely fast to compute and in conjunction with a grid-filtering framework can be used to reduce a continuous state estimation problem to a discrete one. We show how a state temporal prior in the grid-filter can be computed in a manner similar to a sparse HMM, resulting in real-time system performance. The resulting system is used for human pose tracking in video sequences.


AIM-2005-033

Author[s]: Sanjoy Dasgupta, Adam Tauman Kalai, Claire Monteleoni

Analysis of Perceptron-Based Active Learning

November 17, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-033.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-033.pdf

We start by showing that in an active learning setting, the Perceptron algorithm needs $\Omega(\frac{1}{\epsilon^2})$ labels to learn linear separators within generalization error $\epsilon$. We then present a simple selective sampling algorithm for this problem, which combines a modification of the perceptron update with an adaptive filtering rule for deciding which points to query. For data distributed uniformly over the unit sphere, we show that our algorithm reaches generalization error $\epsilon$ after asking for just $\tilde{O}(d \log \frac{1}{\epsilon})$ labels. This exponential improvement over the usual sample complexity of supervised learning has previously been demonstrated only for the computationally more complex query-by-committee algorithm.


AIM-2005-032

Author[s]: Claire Monteleoni, Tommi Jaakkola

Online Learning of Non-stationary Sequences

November 17, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-032.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-032.pdf

We consider an online learning scenario in which the learner can make predictions on the basis of a fixed set of experts. We derive upper and lower relative loss bounds for a class of universal learning algorithms involving a switching dynamics over the choice of the experts. On the basis of the performance bounds we provide the optimal a priori discretization of the switching-rate parameter that governs the switching dynamics. We demonstrate the algorithm in the context of wireless networks.


AIM-2005-031

Author[s]: Alexandr Andoni and Piotr Indyk

New LSH-based Algorithm for Approximate Nearest Neighbor

November 3, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-031.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-031.pdf

We present an algorithm for c-approximate nearest neighbor problem in a d-dimensional Euclidean space, achieving query time of O(dn^{1/c^2+o(1)}) and space O(dn + n^{1+1/c^2+o(1)}).


AIM-2005-030

CBCL-257

Author[s]: Ross Lippert and Ryan Rifkin

Asymptotics of Gaussian Regularized Least-Squares

October 20, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-030.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-030.pdf

We consider regularized least-squares (RLS) with a Gaussian kernel. We prove that if we let the Gaussian bandwidth $\sigma \rightarrow \infty$ while letting the regularization parameter $\lambda \rightarrow 0$, the RLS solution tends to a polynomial whose order is controlled by the relative rates of decay of $\frac{1}{\sigma^2}$ and $\lambda$: if $\lambda = \sigma^{-(2k+1)}$, then, as $\sigma \rightarrow \infty$, the RLS solution tends to the $k$th order polynomial with minimal empirical error. We illustrate the result with an example.


AIM-2005-029

CBCL-256

Author[s]: Gadi Geiger & Domenic G Amara

Towards the Prevention of Dyslexia

October 18, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-029.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-029.pdf

Previous studies have shown that dyslexic individuals who supplement windowed reading practice with intensive small-scale hand-eye coordination tasks exhibit marked improvement in their reading skills. Here we examine whether similar hand-eye coordination activities, in the form of artwork performed by children in kindergarten, first and second grades, could reduce the number of students at-risk for reading problems. Our results suggest that daily hand-eye coordination activities significantly reduce the number of students at-risk. We believe that the effectiveness of these activities derives from their ability to prepare the students perceptually for reading.


AIM-2005-028

CBCL-255

Author[s]: Sanmay Das

Learning to Trade with Insider Information

October 7, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-028.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-028.pdf

This paper introduces algorithms for learning how to trade using insider (superior) information in Kyle's model of financial markets. Prior results in finance theory relied on the insider having perfect knowledge of the structure and parameters of the market. I show here that it is possible to learn the equilibrium trading strategy when its form is known even without knowledge of the parameters governing trading in the model. However, the rate of convergence to equilibrium is slow, and an approximate algorithm that does not converge to the equilibrium strategy achieves better utility when the horizon is limited. I analyze this approximate algorithm from the perspective of reinforcement learning and discuss the importance of domain knowledge in designing a successful learning algorithm.


AIM-2005-027

Author[s]: Georgios Theocharous, Sridhar Mahadevan, Leslie Pack Kaelbling

Spatial and Temporal Abstractions in POMDPs Applied to Robot Navigation

September 27, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-027.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-027.pdf

Partially observable Markov decision processes (POMDPs) are a well studied paradigm for programming autonomous robots, where the robot sequentially chooses actions to achieve long term goals efficiently. Unfortunately, for real world robots and other similar domains, the uncertain outcomes of the actions and the fact that the true world state may not be completely observable make learning of models of the world extremely difficult, and using them algorithmically infeasible. In this paper we show that learning POMDP models and planning with them can become significantly easier when we incorporate into our algorithms the notions of spatial and tempral abstraction. We demonstrate the superiority of our algorithms by comparing them with previous flat approaches for large scale robot navigation.


AIM-2005-026

Author[s]: Chris Stauffer

Automated Audio-visual Activity Analysis

September 20, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-026.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-026.pdf

Current computer vision techniques can effectively monitor gross activities in sparse environments. Unfortunately, visual stimulus is often not sufficient for reliably discriminating between many types of activity. In many cases where the visual information required for a particular task is extremely subtle or non-existent, there is often audio stimulus that is extremely salient for a particular classification or anomaly detection task. Unfortunately unlike visual events, independent sounds are often very ambiguous and not sufficient to define useful events themselves. Without an effective method of learning causally-linked temporal sequences of sound events that are coupled to the visual events, these sound events are generally only useful for independent anomalous sounds detection, e.g., detecting a gunshot or breaking glass. This paper outlines a method for automatically detecting a set of audio events and visual events in a particular environment, for determining statistical anomalies, for automatically clustering these detected events into meaningful clusters, and for learning salient temporal relationships between the audio and visual events. This results in a compact description of the different types of compound audio-visual events in an environment.


AIM-2005-025

Author[s]: Bryan C. Russell, Antonio Torralba, Kevin P. Murphy, and William T. Freeman

LabelMe: a database and web-based tool for image annotation

September 8, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-025.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-025.pdf

Research in object detection and recognition in cluttered scenes requires large image collections with ground truth labels. The labels should provide information about the object classes present in each image, as well as their shape and locations, and possibly other attributes such as pose. Such data is useful for testing, as well as for supervised learning. This project provides a web-based annotation tool that makes it easy to annotate images, and to instantly share such annotations with the community. This tool, plus an initial set of 10,000 images (3000 of which have been labeled), can be found at http://www.csail.mit.edu/$\sim$brussell/research/LabelMe/intro.html


AIM-2005-024

Author[s]: Whitman Richards

Collective Choice with Uncertain Domain Moldels

August 16, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-024.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-024.pdf

When groups of individuals make choices among several alternatives, the most compelling social outcome is the Condorcet winner, namely the alternative beating all others in a pair-wise contest. Obviously the Condorcet winner cannot be overturned if one sub-group proposes another alternative it happens to favor. However, in some cases, and especially with haphazard voting, there will be no clear unique winner, with the outcome consisting of a triple of pair-wise winners that each beat different subsets of the alternatives (i.e. a “top-cycle”.) We explore the sensitivity of Condorcet winners to various perturbations in the voting process that lead to top-cycles. Surprisingly, variations in the number of votes for each alternative is much less important than consistency in a voter’s view of how alternatives are related. As more and more voter’s preference orderings on alternatives depart from a shared model of the domain, then unique Condorcet outcomes become increasingly unlikely.


AIM-2005-023

CBCL-254

Author[s]: Jerry Jun Yokono and Tomaso Poggio

Boosting a Biologically Inspired Local Descriptor for Geometry-free Face and Full Multi-view 3D Object Recognition

July 7, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-023.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-023.pdf

Object recognition systems relying on local descriptors are increasingly used because of their perceived robustness with respect to occlusions and to global geometrical deformations. Descriptors of this type -- based on a set of oriented Gaussian derivative filters -- are used in our recognition system. In this paper, we explore a multi-view 3D object recognition system that does not use explicit geometrical information. The basic idea is to find discriminant features to describe an object across different views. A boosting procedure is used to select features out of a large feature pool of local features collected from the positive training examples. We describe experiments on face images with excellent recognition rate.


AIM-2005-022

CBCL-253

Author[s]: Chou Hung, Gabriel Kreiman, Tomaso Poggio, James J. DiCarlo

Ultra-fast Object Recognition from Few Spikes

July 6, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-022.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-022.pdf

Understanding the complex brain computations leading to object recognition requires quantitatively characterizing the information represented in inferior temporal cortex (IT), the highest stage of the primate visual stream. A read-out technique based on a trainable classifier is used to characterize the neural coding of selectivity and invariance at the population level. The activity of very small populations of independently recorded IT neurons (~100 randomly selected cells) over very short time intervals (as small as 12.5 ms) contains surprisingly accurate and robust information about both object ‘identity’ and ‘category’, which is furthermore highly invariant to object position and scale. Significantly, selectivity and invariance are present even for novel objects, indicating that these properties arise from the intrinsic circuitry and do not require object-specific learning. Within the limits of the technique, there is no detectable difference in the latency or temporal resolution of the IT information supporting so-called ‘categorization’ (a.k. basic level) and ‘identification’ (a.k. subordinate level) tasks. Furthermore, where information, in particular information about stimulus location and scale, can also be read-out from the same small population of IT neurons. These results show how it is possible to decode invariant object information rapidly, accurately and robustly from a small population in IT and provide insights into the nature of the neural code for different kinds of object-related information.


AIM-2005-021

Author[s]: ali rahimi, ben recht, trevor darrell

Nonlinear Latent Variable Models for Video Sequences

June 6, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-021.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-021.pdf

Many high-dimensional time-varying signals can be modeled as a sequence of noisy nonlinear observations of a low-dimensional dynamical process. Given high-dimensional observations and a distribution describing the dynamical process, we present a computationally inexpensive approximate algorithm for estimating the inverse of this mapping. Once this mapping is learned, we can invert it to construct a generative model for the signals. Our algorithm can be thought of as learning a manifold of images by taking into account the dynamics underlying the low-dimensional representation of these images. It also serves as a nonlinear system identification procedure that estimates the inverse of the observation function in nonlinear dynamic system. Our algorithm reduces to a generalized eigenvalue problem, so it does not suffer from the computational or local minimum issues traditionally associated with nonlinear system identification, allowing us to apply it to the problem of learning generative models for video sequences.


AIM-2005-020

Author[s]: Florent Segonne, Jean-Philippe Pons, Bruce Fischl, and Eric Grimson

A Novel Active Contour Framework. Multi-component Level Set Evolution under Topology Control

June 1, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-020.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-020.pdf

We present a novel framework to exert a topology control over a level set evolution. Level set methods offer several advantages over parametric active contours, in particular automated topological changes. In some applications, where some a priori knowledge of the target topology is available, topological changes may not be desirable. A method, based on the concept of simple point borrowed from digital topology, was recently proposed to achieve a strict topology preservation during a level set evolution. However, topologically constrained evolutions often generate topological barriers that lead to large geometric inconsistencies. We introduce a topologically controlled level set framework that greatly alleviates this problem. Unlike existing work, our method allows connected components to merge, split or vanish under some specific conditions that ensure that no topological defects are generated. We demonstrate the strength of our method on a wide range of numerical experiments.


AIM-2005-019

CBCL-252

Author[s]: Andrea Caponnetto, Lorenzo Rosasco, Ernesto De Vito and Alessandro Verri

Empirical Effective Dimension and Optimal Rates for Regularized Least Squares Algorithm

May 27, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-019.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-019.pdf

This paper presents an approach to model selection for regularized least-squares on reproducing kernel Hilbert spaces in the semi-supervised setting. The role of effective dimension was recently shown to be crucial in the definition of a rule for the choice of the regularization parameter, attaining asymptotic optimal performances in a minimax sense. The main goal of the present paper is showing how the effective dimension can be replaced by an empirical counterpart while conserving optimality. The empirical effective dimension can be computed from independent unlabelled samples. This makes the approach particularly appealing in the semi-supervised setting.


AIM-2005-018

CBCL-250

Author[s]: Andrea Caponnetto and Alexander Rakhlin

Some Properties of Empirical Risk Minimization over Donsker Classes

May 17, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-018.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-018.pdf

We study properties of algorithms which minimize (or almost minimize) empirical error over a Donsker class of functions. We show that the L2-diameter of the set of almost-minimizers is converging to zero in probability. Therefore, as the number of samples grows, it is becoming unlikely that adding a point (or a number of points) to the training set will result in a large jump (in L2 distance) to a new hypothesis. We also show that under some conditions the expected errors of the almost-minimizers are becoming close with a rate faster than n^{-1/2}.


AIM-2005-017

Author[s]: Thade Nahnsen, Ozlem Uzuner, Boris Katz

Lexical Chains and Sliding Locality Windows in Content-based Text Similarity Detection

May 19, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-017.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-017.pdf

We present a system to determine content similarity of documents. More specifically, our goal is to identify book chapters that are translations of the same original chapter; this task requires identification of not only the different topics in the documents but also the particular flow of these topics. We experiment with different representations employing n-grams of lexical chains and test these representations on a corpus of approximately 1000 chapters gathered from books with multiple parallel translations. Our representations include the cosine similarity of attribute vectors of n-grams of lexical chains, the cosine similarity of tf*idf-weighted keywords, and the cosine similarity of unweighted lexical chains (unigrams of lexical chains) as well as multiplicative combinations of the similarity measures produced by these approaches. Our results identify fourgrams of unordered lexical chains as a particularly useful representation for text similarity evaluation.


AIM-2005-016

Author[s]: Christopher Taylor, Ali Rahimi, Jonathan Bachrach and Howard Shrobe

Simultaneous Localization, Calibration, and Tracking in an ad Hoc Sensor Network

April 26, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-016.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-016.pdf

We introduce Simultaneous Localization and Tracking (SLAT), the problem of tracking a target in a sensor network while simultaneously localizing and calibrating the nodes of the network. Our proposed solution, LaSLAT, is a Bayesian filter providing on-line probabilistic estimates of sensor locations and target tracks. It does not require globally accessible beacon signals or accurate ranging between the nodes. When applied to a network of 27 sensor nodes, our algorithm can localize the nodes to within one or two centimeters.


AIM-2005-015

CBCL-249

Author[s]: Ernesto De Vito and Andrea Caponnetto

Risk Bounds for Regularized Least-squares Algorithm with Operator-valued kernels

May 16, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-015.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-015.pdf

We show that recent results in [3] on risk bounds for regularized least-squares on reproducing kernel Hilbert spaces can be straightforwardly extended to the vector-valued regression setting. We first briefly introduce central concepts on operator-valued kernels. Then we show how risk bounds can be expressed in terms of a generalization of effective dimension.


AIM-2005-014

Author[s]: Jacob Eisenstein and Randall Davis

Gestural Cues for Sentence Segmentation

April 19, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-014.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-014.pdf

In human-human dialogues, face-to-face meetings are often preferred over phone conversations. One explanation is that non-verbal modalities such as gesture provide additional information, making communication more efficient and accurate. If so, computer processing of natural language could improve by attending to non-verbal modalities as well. We consider the problem of sentence segmentation, using hand-annotated gesture features to improve recognition. We find that gesture features correlate well with sentence boundaries, but that these features improve the overall performance of a language-only system only marginally. This finding is in line with previous research on this topic. We provide a regression analysis, revealing that for sentence boundary detection, the gestural features are largely redundant with the language model and pause features. This suggests that gestural features can still be useful when speech recognition is inaccurate.


AIM-2005-013

CBCL-248

Author[s]: Andrea Caponnetto and Ernesto De Vito

Fast Rates for Regularized Least-squares Algorithm

April 14, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-013.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-013.pdf

We develop a theoretical analysis of generalization performances of regularized least-squares on reproducing kernel Hilbert spaces for supervised learning. We show that the concept of effective dimension of an integral operator plays a central role in the definition of a criterion for the choice of the regularization parameter as a function of the number of samples. In fact, a minimax analysis is performed which shows asymptotic optimality of the above-mentioned criterion.


AIM-2005-012

Author[s]: Jacob Beal

Learning From Snapshot Examples

April 13, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-012.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-012.pdf

Examples are a powerful tool for teaching both humans and computers. In order to learn from examples, however, a student must first extract the examples from its stream of perception. Snapshot learning is a general approach to this problem, in which relevant samples of perception are used as examples. Learning from these examples can in turn improve the judgement of the snapshot mechanism, improving the quality of future examples. One way to implement snapshot learning is the Top-Cliff heuristic, which identifies relevant samples using a generalized notion of peaks. I apply snapshot learning with the Top-Cliff heuristic to solve a distributed learning problem and show that the resulting system learns rapidly and robustly, and can hallucinate useful examples in a perceptual stream from a teacherless system.


AIM-2005-011

Author[s]: Justin Werfel, Yaneer Bar-Yam, Radhika Nagpal

Construction by robot swarms using extended stigmergy

April 8, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-011.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-011.pdf

We describe a system in which simple, identical, autonomous robots assemble two-dimensional structures out of identical building blocks. We show that, in a system divided in this way into mobile units and structural units, giving the blocks limited communication abilities enables robots to have sufficient global structural knowledge to rapidly build elaborate pre-designed structures. In this way we extend the principle of stigmergy (storing information in the environment) used by social insects, by increasing the capabilities of the blocks that represent that environmental information. As a result, arbitrary solid structures can be built using a few fixed, local behaviors, without requiring construction to be planned out in detail.


AIM-2005-010

Author[s]: Kilian M. Pohl, John Fisher, W. Eric L. Grimson, William M. Wells

An Expectation Maximization Approach for Integrated Registration, Segmentation, and Intensity Correction

April 1, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-010.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-010.pdf

This paper presents a statistical framework which combines the registration of an atlas with the segmentation of MR images. We use an Expectation Maximization-based algorithm to find a solution within the model, which simultaneously estimates image inhomogeneities, anatomical labelmap, and a mapping from the atlas to the image space. An example of the approach is given for a brain structure-dependent affine mapping approach. The algorithm produces high quality segmentations for brain tissues as well as their substructures. We demonstrate the approach on a set of 30 brain MR images. In addition, we show that the approach performs better than similar methods which separate the registration from the segmentation problem.


AIM-2005-009

CBCL-247

Author[s]: Lior Wolf & Stanley Bileschi

Combining Variable Selection with Dimensionality Reduction

March 30, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-009.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-009.pdf

This paper bridges the gap between variable selection methods (e.g., Pearson coefficients, KS test) and dimensionality reduction algorithms (e.g., PCA, LDA). Variable selection algorithms encounter difficulties dealing with highly correlated data, since many features are similar in quality. Dimensionality reduction algorithms tend to combine all variables and cannot select a subset of significant variables. Our approach combines both methodologies by applying variable selection followed by dimensionality reduction. This combination makes sense only when using the same utility function in both stages, which we do. The resulting algorithm benefits from complex features as variable selection algorithms do, and at the same time enjoys the benefits of dimensionality reduction.1


AIM-2005-008

Author[s]: Leonid Taycher, John W. Fisher III, and Trevor Darrell

Combining Object and Feature Dynamics in Probabilistic Tracking

March 2, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-008.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-008.pdf

Objects can exhibit different dynamics at different scales, a property that is often exploited by visual tracking algorithms. A local dynamic model is typically used to extract image features that are then used as inputs to a system for tracking the entire object using a global dynamic model. Approximate local dynamics may be brittle---point trackers drift due to image noise and adaptive background models adapt to foreground objects that become stationary---but constraints from the global model can make them more robust. We propose a probabilistic framework for incorporating global dynamics knowledge into the local feature extraction processes. A global tracking algorithm can be formulated as a generative model and used to predict feature values that influence the observation process of the feature extractor. We combine such models in a multichain graphical model framework. We show the utility of our framework for improving feature tracking and thus shape and motion estimates in a batch factorization algorithm. We also propose an approximate filtering algorithm appropriate for online applications, and demonstrate its application to problems such as background subtraction, structure from motion and articulated body tracking.


AIM-2005-007

Author[s]: Kristen Grauman and Trevor Darrell

Pyramid Match Kernels: Discriminative Classification with Sets of Image Features

March 17, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-007.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-007.pdf

Discriminative learning is challenging when examples are sets of local image features, and the sets vary in cardinality and lack any sort of meaningful ordering. Kernel-based classification methods can learn complex decision boundaries, but a kernel similarity measure for unordered set inputs must somehow solve for correspondences -- generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function which maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in this space. This ``pyramid match" computation is linear in the number of features, and it implicitly finds correspondences based on the finest resolution histogram cell where a matched pair first appears. Since the kernel does not penalize the presence of extra features, it is robust to clutter. We show the kernel function is positive-definite, making it valid for use in learning algorithms whose optimal solutions are guaranteed only for Mercer kernels. We demonstrate our algorithm on object recognition tasks and show it to be dramatically faster than current approaches.


AIM-2005-006

CBCL-246

Author[s]: Benjamin Balas, Pawan Sinha

Receptive field structures for recognition

March 1, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-006.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-006.pdf

Localized operators, like Gabor wavelets and difference-of-Gaussian filters, are considered to be useful tools for image representation. This is due to their ability to form a ‘sparse code’ that can serve as a basis set for high-fidelity reconstruction of natural images. However, for many visual tasks, the more appropriate criterion of representational efficacy is ‘recognition’, rather than ‘reconstruction’. It is unclear whether simple local features provide the stability necessary to subserve robust recognition of complex objects. In this paper, we search the space of two-lobed differential operators for those that constitute a good representational code under recognition/discrimination criteria. We find that a novel operator, which we call the ‘dissociated dipole’ displays useful properties in this regard. We describe simple computational experiments to assess the merits of such dipoles relative to the more traditional local operators. The results suggest that non-local operators constitute a vocabulary that is stable across a range of image transformations.


AIM-2005-005

Author[s]: Josef Sivic, Bryan C. Russell, Alexei A. Efros, Andrew Zisserman, William T. Freeman

Discovering object categories in image collections

February 25, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-005.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-005.pdf

Given a set of images containing multiple object categories, we seek to discover those categories and their image locations without supervision. We achieve this using generative models from the statistical text literature: probabilistic Latent Semantic Analysis (pLSA), and Latent Dirichlet Allocation (LDA). In text analysis these are used to discover topics in a corpus using the bag-of-words document representation. Here we discover topics as object categories, so that an image containing instances of several categories is modelled as a mixture of topics. The models are applied to images by using a visual analogue of a word, formed by vector quantizing SIFT like region descriptors. We investigate a set of increasingly demanding scenarios, starting with image sets containing only two object categories through to sets containing multiple categories (including airplanes, cars, faces, motorbikes, spotted cats) and background clutter. The object categories sample both intra-class and scale variation, and both the categories and their approximate spatial layout are found without supervision. We also demonstrate classification of unseen images and images containing multiple objects. Performance of the proposed unsupervised method is compared to the semi-supervised approach of Fergus et al.


AITR-2005-004

Author[s]: Ozlem Uzuner

Identifying Expression Fingerprints using Linguistic Information

November 16, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AITR-2005-004.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AITR-2005-004.pdf

This thesis presents a technology to complement taxation-based policy proposals aimed at addressing the digital copyright problem. The approach presented facilitates identification of intellectual property using expression fingerprints. Copyright law protects expression of content. Recognizing literary works for copyright protection requires identification of the expression of their content. The expression fingerprints described in this thesis use a novel set of linguistic features that capture both the content presented in documents and the manner of expression used in conveying this content. These fingerprints consist of both syntactic and semantic elements of language. Examples of the syntactic elements of expression include structures of embedding and embedded verb phrases. The semantic elements of expression consist of high-level, broad semantic categories. Syntactic and semantic elements of expression enable generation of models that correctly identify books and their paraphrases 82% of the time, providing a significant (approximately 18%) improvement over models that use tfidf-weighted keywords. The performance of models built with these features is also better than models created with standard features used in stylometry (e.g., function words), which yield an accuracy of 62%. In the non-digital world, copyright holders collect revenues by controlling distribution of their works. Current approaches to the digital copyright problem attempt to provide copyright holders with the same kind of control over distribution by employing Digital Rights Management (DRM) systems. However, DRM systems also enable copyright holders to control and limit fair use, to inhibit others' speech, and to collect private information about individual users of digital works. Digital tracking technologies enable alternate solutions to the digital copyright problem; some of these solutions can protect creative incentives of copyright holders in the absence of control over distribution of works. Expression fingerprints facilitate digital tracking even when literary works are DRM- and watermark-free, and even when they are paraphrased. As such, they enable metering popularity of works and make practicable solutions that encourage large-scale dissemination and unrestricted use of digital works and that protect the revenues of copyright holders, for example through taxation-based revenue collection and distribution systems, without imposing limits on distribution.


AIM-2005-004

Author[s]: Reina Riemann, Keith Winstein

Improving 802.11 Range with Forward Error Correction

February 24, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-004.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-004.pdf

The ISO/IEC 8802-11:1999(E) specification uses a 32-bit CRC for error detection and whole-packet retransmissions for recovery. In long-distance or high-interference links where the probability of a bit error is high, this strategy results in excessive losses, because any erroneous bit causes an entire packet to be discarded. By ignoring the CRC and adding redundancy to 802.11 payloads in software, we achieved substantially reduced loss rates on indoor and outdoor long-distance links and extended line-of-sight range outdoors by 70 percent.


AITR-2005-003

Author[s]: Christopher J. Taylor

Simultaneous Localization and Tracking in Wireless Ad-hoc Sensor Networks

May 31, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AITR-2005-003.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AITR-2005-003.pdf

In this thesis we present LaSLAT, a sensor network algorithm that simultaneously localizes sensors, calibrates sensing hardware, and tracks unconstrained moving targets using only range measurements between the sensors and the target. LaSLAT is based on a Bayesian filter, which updates a probability distribution over the quantities of interest as measurements arrive. The algorithm is distributable, and requires only a constant amount of space with respect to the number of measurements incorporated. LaSLAT is easy to adapt to new types of hardware and new physical environments due to its use of intuitive probability distributions: one adaptation demonstrated in this thesis uses a mixture measurement model to detect and compensate for bad acoustic range measurements due to echoes. We also present results from a centralized Java implementation of LaSLAT on both two- and three-dimensional sensor networks in which ranges are obtained using the Cricket ranging system. LaSLAT is able to localize sensors to within several centimeters of their ground truth positions while recovering a range measurement bias for each sensor and the complete trajectory of the mobile.


AIM-2005-003

Author[s]: Gerald Jay Sussman and Jack Wisdom

Functional Differential Geometry

February 2, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-003.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-003.pdf

Differential geometry is deceptively simple. It is surprisingly easy to get the right answer with unclear and informal symbol manipulation. To address this problem we use computer programs to communicate a precise understanding of the computations in differential geometry. Expressing the methods of differential geometry in a computer language forces them to be unambiguous and computationally effective. The task of formulating a method as a computer-executable program and debugging that program is a powerful exercise in the learning process. Also, once formalized procedurally, a mathematical idea becomes a tool that can be used directly to compute results.


AITR-2005-002

CBCL-251

Author[s]: Jia Jane Wu

Comparing Visual Features for Morphing Based Recognition

May 25, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AITR-2005-002.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AITR-2005-002.pdf

This thesis presents a method of object classification using the idea of deformable shape matching. Three types of visual features, geometric blur, C1 and SIFT, are used to generate feature descriptors. These feature descriptors are then used to find point correspondences between pairs of images. Various morphable models are created by small subsets of these correspondences using thin-plate spline. Given these morphs, a simple algorithm, least median of squares (LMEDS), is used to find the best morph. A scoring metric, using both LMEDS and distance transform, is used to classify test images based on a nearest neighbor algorithm. We perform the experiments on the Caltech 101 dataset [5]. To ease computation, for each test image, a shortlist is created containing 10 of the most likely candidates. We were unable to duplicate the performance of [1] in the shortlist stage because we did not use hand-segmentation to extract objects for our training images. However, our gain from the shortlist to correspondence stage is comparable to theirs. In our experiments, we improved from 21% to 28% (gain of 33%), while [1] improved from 41% to 48% (gain of 17%). We find that using a non-shape based approach, C2 [14], the overall classification rate of 33.61% is higher than all of the shaped based methods tested in our experiments.


AIM-2005-002

CBCL-244

Author[s]: Benjamin Balas

Using computational models to study texture representations in the human visual system.

February 7, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-002.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-002.pdf

Traditionally, human texture perception has been studied using artificial textures made of random-dot patterns or abstract structured elements. At the same time, computer algorithms for the synthesis of natural textures have improved dramatically. The current study seeks to unify these two fields of research through a psychophysical assessment of a particular computational model, thus providing a sense of what image statistics are most vital for representing a range of natural textures. We employ Portilla and Simoncelli’s 2000 model of texture synthesis for this task (a parametric model of analysis and synthesis designed to mimic computations carried out by the human visual system). We find an intriguing interaction between texture type (periodic v. structured) and image statistics (autocorrelation function and filter magnitude correlations), suggesting different processing strategies may be employed for these two texture families under pre-attentive viewing.


AITR-2005-001

Author[s]: Attila Kondacs

Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods

January 28, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AITR-2005-001.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AITR-2005-001.pdf

In this thesis I will be concerned with linking the observed speech signal to the configuration of articulators. Due to the potentially rapid motion of the articulators, the speech signal can be highly non-stationary. The typical linear analysis techniques that assume quasi-stationarity may not have sufficient time-frequency resolution to determine the place of articulation. I argue that the traditional low and high-level primitives of speech processing, frequency and phonemes, are inadequate and should be replaced by a representation with three layers: 1. short pitch period resonances and other spatio-temporal patterns 2. articulator configuration trajectories 3. syllables. The patterns indicate articulator configuration trajectories (how the tongue, jaws, etc. are moving), which are interpreted as syllables and words. My patterns are an alternative to frequency. I use short time-domain features of the sound waveform, which can be extracted from each vowel pitch period pattern, to identify the positions of the articulators with high reliability. These features are important because by capitalizing on detailed measurements within a single pitch period, the rapid articulator movements can be tracked. No linear signal processing approach can achieve the combination of sensitivity to short term changes and measurement accuracy resulting from these nonlinear techniques. The measurements I use are neurophysiologically plausible: the auditory system could be using similar methods. I have demonstrated this approach by constructing a robust technique for categorizing the English voiced stops as the consonants B, D, or G based on the vocalic portions of their releases. The classification recognizes 93.5%, 81.8% and 86.1% of the b, d and g to ae transitions with false positive rates 2.9%, 8.7% and 2.6% respectively.


AIM-2005-001

Author[s]: Jacob Beal, Gerald Sussman

Biologically-Inspired Robust Spatial Programming

January 18, 2005

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-001.ps

ftp://publications.ai.mit.edu/ai-publications/2005/AIM-2005-001.pdf

Inspired by the robustness and flexibility of biological systems, we are developing linguistic and programming tools to allow us to program spatial systems populated by vast numbers of unreliable components interconnected in unknown, irregular, and time-varying ways. We organize our computations around geometry, making the fact that our system is made up of discrete individuals implicit. Geometry allows us to specify requirements in terms of the behavior of the space occupied by the aggregate rather than the behavior of individuals, thereby decreasing complexity. So we describe the behavior of space explicitly, abstracting away the discrete nature of the components. As an example, we present the Amorphous Medium Language, which describes behavior in terms of homeostatic maintenance of constraints on nested regions of space.


AIM-2004-031

CBCL-245

Author[s]: Minjoon Kouh and Tomaso Poggio

A general mechanism for tuning: Gain control circuits and synapses underlie tuning of cortical neurons

December 31, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-031.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-031.pdf

Tuning to an optimal stimulus is a widespread property of neurons in cortex. We propose that such tuning is a consequence of normalization or gain control circuits. We also present a biologically plausible neural circuitry of tuning.


AIM-2004-030

Author[s]: Percy Liang and Nathan Srebro

Methods and Experiments With Bounded Tree-width Markov Networks

December 30, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-030.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-030.pdf

Markov trees generalize naturally to bounded tree-width Markov networks, on which exact computations can still be done efficiently. However, learning the maximum likelihood Markov network with tree-width greater than 1 is NP-hard, so we discuss a few algorithms for approximating the optimal Markov network. We present a set of methods for training a density estimator. Each method is specified by three arguments: tree-width, model scoring metric (maximum likelihood or minimum description length), and model representation (using one joint distribution or several class-conditional distributions). On these methods, we give empirical results on density estimation and classification tasks and explore the implications of these arguments.


AIM-2004-029

Author[s]: Whitman Richards & H. Sebastian Seung

Neural Voting Machines

December 31, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-029.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-029.pdf

“Winner-take-all” networks typically pick as winners that alternative with the largest excitatory input. This choice is far from optimal when there is uncertainty in the strength of the inputs, and when information is available about how alternatives may be related. In the Social Choice community, many other procedures will yield more robust winners. The Borda Count and the pair-wise Condorcet tally are among the most favored. Their implementations are simple modifications of classical recurrent networks.


AIM-2004-028

Author[s]: Luis Perez-Breva

Cascading Regularized Classifiers

April 21, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-028.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-028.pdf

Among the various methods to combine classifiers, Boosting was originally thought as an stratagem to cascade pairs of classifiers through their disagreement. I recover the same idea from the work of Niyogi et al. to show how to loosen the requirement of weak learnability, central to Boosting, and introduce a new cascading stratagem. The paper concludes with an empirical study of an implementation of the cascade that, under assumptions that mirror the conditions imposed by Viola and Jones in [VJ01], has the property to preserve the generalization ability of boosting.


AIM-2004-027

Author[s]: Kristen Grauman and Trevor Darrell

Efficient Image Matching with Distributions of Local Invariant Features

November 22, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-027.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-027.pdf

Sets of local features that are invariant to common image transformations are an effective representation to use when comparing images; current methods typically judge feature sets' similarity via a voting scheme (which ignores co-occurrence statistics) or by comparing histograms over a set of prototypes (which must be found by clustering). We present a method for efficiently comparing images based on their discrete distributions (bags) of distinctive local invariant features, without clustering descriptors. Similarity between images is measured with an approximation of the Earth Mover's Distance (EMD), which quickly computes the minimal-cost correspondence between two bags of features. Each image's feature distribution is mapped into a normed space with a low-distortion embedding of EMD. Examples most similar to a novel query image are retrieved in time sublinear in the number of examples via approximate nearest neighbor search in the embedded space. We also show how the feature representation may be extended to encode the distribution of geometric constraints between the invariant features appearing in each image. We evaluate our technique with scene recognition and texture classification tasks.


AIM-2004-026

CBCL-243

Author[s]: Thomas Serre, Lior Wolf and Tomaso Poggio

A new biologically motivated framework for robust object recognition

November 14, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-026.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-026.pdf

In this paper, we introduce a novel set of features for robust object recognition, which exhibits outstanding performances on a variety of object categories while being capable of learning from only a few training examples. Each element of this set is a complex feature obtained by combining position- and scale-tolerant edge-detectors over neighboring positions and multiple orientations. Our system - motivated by a quantitative model of visual cortex - outperforms state-of-the-art systems on a variety of object image datasets from different groups. We also show that our system is able to learn from very few examples with no prior category knowledge. The success of the approach is also a suggestive plausibility proof for a class of feed-forward models of object recognition in cortex. Finally, we conjecture the existence of a universal overcomplete dictionary of features that could handle the recognition of all object categories.


AIM-2004-025

CBCL-242

Author[s]: Lior Wolf and Ian Martin

Regularization Through Feature Knock Out

November 12, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-025.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-025.pdf

In this paper, we present and analyze a novel regularization technique based on enhancing our dataset with corrupted copies of the original data. The motivation is that since the learning algorithm lacks information about which parts of the data are reliable, it has to produce more robust classification functions. We then demonstrate how this regularization leads to redundancy in the resulting classifiers, which is somewhat in contrast to the common interpretations of the Occam’s razor principle. Using this framework, we propose a simple addition to the gentle boosting algorithm which enables it to work with only a few examples. We test this new algorithm on a variety of datasets and show convincing results.


AIM-2004-024

CBCL-241

Author[s]: Charles Cadieu, Minjoon Kouh, Maximilian Riesenhuber, and Tomaso Poggio

Shape Representation in V4: Investigating Position-Specific Tuning for Boundary Conformation with the Standard Model of Object Recognition

November 12, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-024.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-024.pdf

The computational processes in the intermediate stages of the ventral pathway responsible for visual object recognition are not well understood. A recent physiological study by A. Pasupathy and C. Connor in intermediate area V4 using contour stimuli, proposes that a population of V4 neurons display bjectcentered, position-specific curvature tuning [18]. The “standard model” of object recognition, a recently developed model [23] to account for recognition properties of IT cells (extending classical suggestions by Hubel, Wiesel and others [9, 10, 19]), is used here to model the response of the V4 cells described in [18]. Our results show that a feedforward, network level mechanism can exhibit selectivity and invariance properties that correspond to the responses of the V4 cells described in [18]. These results suggest how object-centered, position-specific curvature tuning of V4 cells may arise from combinations of complex V1 cell responses. Furthermore, the model makes predictions about the responses of the same V4 cells studied by Pasupathy and Connor to novel gray level patterns, such as gratings and natural images. These predictions suggest specific experiments to further explore shape representation in V4.


AIM-2004-023

Author[s]: Kurt Steinkraus, Leslie Pack Kaelbling

Combining dynamic abstractions in large MDPs

October 21, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-023.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-023.pdf

One of the reasons that it is difficult to plan and act in real-world domains is that they are very large. Existing research generally deals with the large domain size using a static representation and exploiting a single type of domain structure. In this paper, we create a framework that encapsulates existing and new abstraction and approximation methods into modules, and combines arbitrary modules into a system that allows for dynamic representation changes. We show that the dynamic changes of representation allow our framework to solve larger and more interesting domains than were previously possible, and while there are no optimality guarantees, suitable module choices gain tractability at little cost to optimality.


AIM-2004-022

Author[s]: Gene Yeo, Eric Van Nostrand, Dirk Holste, Tomaso Poggio, Christopher Burge

Predictive identification of alternative events conserved in human and mouse

September 30, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-022.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-022.pdf

Alternative pre-messenger RNA splicing affects a majority of human genes and plays important roles in development and disease. Alternative splicing (AS) events conserved since the divergence of human and mouse are likely of primary biological importance, but relatively few such events are known. Here we describe sequence features that distinguish exons subject to evolutionarily conserved AS, which we call 'alternative- conserved exons' (ACEs) from other orthologous human/mouse exons, and integrate these features into an exon classification algorithm, ACEScan. Genome-wide analysis of annotated orthologous human-mouse exon pairs identified ~2,000 predicted ACEs. Alternative splicing was verified in both human and mouse tissues using an RT-PCR- sequencing protocol for 21 of 30 (70%) predicted ACEs tested, supporting the validity of a majority of ACEScan predictions. By contrast, AS was observed in mouse tissues for only 2 of 15 (13%) tested exons which had EST or cDNA evidence of AS in human but were not predicted ACEs, and was never observed for eleven negative control exons in human or mouse tissues. Predicted ACEs were much more likely to preserve reading frame, and less likely to disrupt protein domains than other AS events, and were enriched in genes expressed in the brain and in genes involved in transcriptional regulation, RNA processing and development. Our results also imply that the vast majority of AS events represented in the human EST databases are not conserved in mouse, and therefore may represent aberrant, disease- or allele-specific, or highly lineage-restricted splicing events.


AIM-2004-021

Author[s]: Michael R. Benjamin

The Interval Programming Model for Multi-objective Decision Making

September 27, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-021.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-021.pdf

The interval programming model (IvP) is a mathematical programming model for representing and solving multi-objective optimization problems. The central characteristic of the model is the use of piecewise linearly defined objective functions and a solution method that searches through the combination space of pieces rather than through the actual decision space. The piecewise functions typically represent an approximation of some underlying function, but this concession is balanced on the positive side by relative freedom from function form assumptions as well as the assurance of global optimality. In this paper the model and solution algorithms are described, and the applicability of IvP to certain applications are discussed.


AIM-2004-020

CBCL-240

Author[s]: Gabriel Kreiman, Chou Hung, Tomaso Poggio, James DiCarlo

Selectivity of Local Field Potentials in Macaque Inferior Temporal Cortex

September 21, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-020.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-020.pdf

While single neurons in inferior temporal (IT) cortex show differential responses to distinct complex stimuli, little is known about the responses of populations of neurons in IT. We recorded single electrode data, including multi-unit activity (MUA) and local field potentials (LFP), from 618 sites in the inferior temporal cortex of macaque monkeys while the animals passively viewed 78 different pictures of complex stimuli. The LFPs were obtained by low-pass filtering the extracellular electrophysiological signal with a corner frequency of 300 Hz. As reported previously, we observed that spike counts from MUA showed selectivity for some of the pictures. Strikingly, the LFP data, which is thought to constitute an average over large numbers of neurons, also showed significantly selective responses. The LFP responses were less selective than the MUA responses both in terms of the proportion of selective sites as well as in the selectivity of each site. We observed that there was only little overlap between the selectivity of MUA and LFP recordings from the same electrode. To assess the spatial organization of selective responses, we compared the selectivity of nearby sites recorded along the same penetration and sites recorded from different penetrations. We observed that MUA selectivity was correlated on spatial scales up to 800 m while the LFP selectivity was correlated over a larger spatial extent, with significant correlations between sites separated by several mm. Our data support the idea that there is some topographical arrangement to the organization of selectivity in inferior temporal cortex and that this organization may be relevant for the representation of object identity in IT.


AIM-2004-019

Author[s]: Charles Kemp, Thomas L. Griffiths and Joshua B. Tenenbaum

Discovering Latent Classes in Relational Data

July 22, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-019.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-019.pdf

We present a framework for learning abstract relational knowledge with the aim of explaining how people acquire intuitive theories of physical, biological, or social systems. Our approach is based on a generative relational model with latent classes, and simultaneously determines the kinds of entities that exist in a domain, the number of these latent classes, and the relations between classes that are possible or likely. This model goes beyond previous psychological models of category learning, which consider attributes associated with individual categories but not relationships between categories. We apply this domain-general framework to two specific problems: learning the structure of kinship systems and learning causal theories.


AIM-2004-018

Author[s]: Ozlem Uzuner

Distribution Volume Tracking on Privacy-Enhanced Wireless Grid

July 25, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-018.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-018.pdf

In this paper, we discuss a wireless grid in which users are highly mobile, and form ad-hoc and sometimes short-lived connections with other devices. As they roam through networks, the users may choose to employ privacy-enhancing technologies to address their privacy needs and benefit from the computational power of the grid for a variety of tasks, including sharing content. The high rate of mobility of the users on the wireless grid, when combined with privacy enhancing mechanisms and ad-hoc connections, makes it difficult to conclusively link devices and/or individuals with network activities and to hold them liable for particular downloads. Protecting intellectual property in this scenario requires a solution that can work in absence of knowledge about behavior of particular individuals. Building on previous work, we argue for a solution that ensures proper compensation to content owners without inhibiting use and dissemination of works. Our proposal is based on digital tracking for measuring distribution volume of content and compensation of authors based on this accounting information. The emphasis is on obtaining good estimates of rate of popularity of works, without keeping track of activities of individuals or devices. The contribution of this paper is a revenue protection mechanism, Distribution Volume Tracking, that does not invade the privacy of users in the wireless grid and works even in the presence of privacy-enhancing technologies they may employ.


AIM-2004-017

CBCL-239

Author[s]: Thomas Serre and Maximilian Riesenhuber

Realistic Modeling of Simple and Complex Cell Tuning in the HMAX Model, and Implications for Invariant Object Recognition in Cortex

July 27, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-017.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-017.pdf

Riesenhuber \& Poggio recently proposed a model of object recognition in cortex which, beyond integrating general beliefs about the visual system in a quantitative framework, made testable predictions about visual processing. In particular, they showed that invariant object representation could be obtained with a selective pooling mechanism over properly chosen afferents through a {\sc max} operation: For instance, at the complex cells level, pooling over a group of simple cells at the same preferred orientation and position in space but at slightly different spatial frequency would provide scale tolerance, while pooling over a group of simple cells at the same preferred orientation and spatial frequency but at slightly different position in space would provide position tolerance. Indirect support for such mechanisms in the visual system come from the ability of the architecture at the top level to replicate shape tuning as well as shift and size invariance properties of ``view-tuned cells'' (VTUs) found in inferotemporal cortex (IT), the highest area in the ventral visual stream, thought to be crucial in mediating object recognition in cortex. There is also now good physiological evidence that a {\sc max} operation is performed at various levels along the ventral stream. However, in the original paper by Riesenhuber \& Poggio, tuning and pooling parameters of model units in early and intermediate areas were only qualitatively inspired by physiological data. In particular, many studies have investigated the tuning properties of simple and complex cells in primary visual cortex, V1. We show that units in the early levels of HMAX can be tuned to produce realistic simple and complex cell-like tuning, and that the earlier findings on the invariance properties of model VTUs still hold in this more realistic version of the model.


AIM-2004-016

Author[s]: Tevfik Metin Sezgin and Randall Davis

Early Sketch Processing with Application in HMM Based Sketch Recognition

July 28, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-016.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-016.pdf

Freehand sketching is a natural and crucial part of everyday human interaction, yet is almost totally unsupported by current user interfaces. With the increasing availability of tablet notebooks and pen based PDAs, sketch based interaction has gained attention as a natural interaction modality. We are working to combine the flexibility and ease of use of paper and pencil with the processing power of a computer, to produce a user interface for design that feels as natural as paper, yet is considerably smarter. One of the most basic tasks in accomplishing this is converting the original digitized pen strokes in a sketch into the intended geometric objects. In this paper we describe an implemented system that combines multiple sources of knowledge to provide robust early processing for freehand sketching. We also show how this early processing system can be used as part of a fast sketch recognition system with polynomial time segmentation and recognition algorithms.


AIM-2004-015

Author[s]: Mihai Badoiu, Piotr Indyk, Anastasios Sidiropoulos

A Constant-Factor Approximation Algorithm for Embedding Unweighted Graphs into Trees

July 5, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-015.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-015.pdf

We present a constant-factor approximation algorithm for computing an embedding of the shortest path metric of an unweighted graph into a tree, that minimizes the multiplicative distortion.


AIM-2004-014

Author[s]: Piotr Indyk and David Woodruff

Optimal Approximations of the Frequency Moments

July 2, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-014.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-014.pdf

We give a one-pass, O~(m^{1-2/k})-space algorithm for estimating the k-th frequency moment of a data stream for any real k>2. Together with known lower bounds, this resolves the main problem left open by Alon, Matias, Szegedy, STOC'96. Our algorithm enables deletions as well as insertions of stream elements.


AIM-2004-013

Author[s]: Antonio Torralba, Kevin P. Murphy, William T. Freeman

Contextual models for object detection using boosted random fields

June 25, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-013.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-013.pdf

We seek to both detect and segment objects in images. To exploit both local image data as well as contextual information, we introduce Boosted Random Fields (BRFs), which uses Boosting to learn the graph structure and local evidence of a conditional random field (CRF). The graph structure is learned by assembling graph fragments in an additive model. The connections between individual pixels are not very informative, but by using dense graphs, we can pool information from large regions of the image; dense models also support efficient inference. We show how contextual information from other objects can improve detection performance, both in terms of accuracy and speed, by using a computational cascade. We apply our system to detect stuff and things in office and street scenes.


AIM-2004-012

Author[s]: Jaime Teevan

How People Re-find Information When the Web Changes

June 18, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-012.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-012.pdf

This paper investigates how people return to information in a dynamic information environment. For example, a person might want to return to Web content via a link encountered earlier on a Web page, only to learn that the link has since been removed. Changes can benefit users by providing new information, but they hinder returning to previously viewed information. The observational study presented here analyzed instances, collected via a Web search, where people expressed difficulty re-finding information because of changes to the information or its environment. A number of interesting observations arose from this analysis, including that the path originally taken to get to the information target appeared important in its re-retrieval, whereas, surprisingly, the temporal aspects of when the information was seen before were not. While people expressed frustration when problems arose, an explanation of why the change had occurred was often sufficient to allay that frustration, even in the absence of a solution. The implications of these observations for systems that support re-finding in dynamic environments are discussed.


AIM-2004-011

Author[s]: Lilla Zollei, John Fisher, William Wells

A Unified Statistical and Information Theoretic Framework for Multi-modal Image Registration

April 28, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-011.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-011.pdf

We formulate and interpret several multi-modal registration methods in the context of a unified statistical and information theoretic framework. A unified interpretation clarifies the implicit assumptions of each method yielding a better understanding of their relative strengths and weaknesses. Additionally, we discuss a generative statistical model from which we derive a novel analysis tool, the "auto-information function", as a means of assessing and exploiting the common spatial dependencies inherent in multi-modal imagery. We analytically derive useful properties of the "auto-information" as well as verify them empirically on multi-modal imagery. Among the useful aspects of the "auto-information function" is that it can be computed from imaging modalities independently and it allows one to decompose the search space of registration problems.


AIM-2004-010

CBCL-238

Author[s]: Jerry Jun Yokono and Tomaso Poggio

Rotation Invariant Object Recognition from One Training Example

April 27, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-010.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-010.pdf

Local descriptors are increasingly used for the task of object recognition because of their perceived robustness with respect to occlusions and to global geometrical deformations. Such a descriptor--based on a set of oriented Gaussian derivative filters-- is used in our recognition system. We report here an evaluation of several techniques for orientation estimation to achieve rotation invariance of the descriptor. We also describe feature selection based on a single training image. Virtual images are generated by rotating and rescaling the image and robust features are selected. The results confirm robust performance in cluttered scenes, in the presence of partial occlusions, and when the object is embedded in different backgrounds.


AITR-2004-009

Author[s]: Nathan Srebro

Learning with Matrix Factorizations

November 22, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-009.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-009.pdf

Matrices that can be factored into a product of two simpler matrices can serve as a useful and often natural model in the analysis of tabulated or high-dimensional data. Models based on matrix factorization (Factor Analysis, PCA) have been extensively used in statistical analysis and machine learning for over a century, with many new formulations and models suggested in recent years (Latent Semantic Indexing, Aspect Models, Probabilistic PCA, Exponential PCA, Non-Negative Matrix Factorization and others). In this thesis we address several issues related to learning with matrix factorizations: we study the asymptotic behavior and generalization ability of existing methods, suggest new optimization methods, and present a novel maximum-margin high-dimensional matrix factorization formulation.


AIM-2004-009

Author[s]: Antonio Torralba

Contextual Influences on Saliency

April 14, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-009.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-009.pdf

This article describes a model for including scene/context priors in attention guidance. In the proposed scheme, visual context information can be available early in the visual processing chain, in order to modulate the saliency of image regions and to provide an efficient short cut for object detection and recognition. The scene is represented by means of a low-dimensional global description obtained from low-level features. The global scene features are then used to predict the probability of presence of the target object in the scene, and its location and scale, before exploring the image. Scene information can then be used to modulate the saliency of image regions early during the visual processing in order to provide an efficient short cut for object detection and recognition.


AITR-2004-008

Author[s]: Justin Werfel

Neural Network Models for Zebra Finch Song Production and Reinforcement Learning

November 9, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-008.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-008.pdf

The zebra finch is a standard experimental system for studying learning and generation of temporally extended motor patterns. The first part of this project concerned the evaluation of simple models for the operation and structure of the network in the motor nucleus RA. A directed excitatory chain with a global inhibitory network, for which experimental evidence exists, was found to produce waves of activity similar to those observed in RA; this similarity included one particularly important feature of the measured activity, synchrony between the onset of bursting in one neuron and the offset of bursting in another. Other models, which were simpler and more analytically tractable, were also able to exhibit this feature, but not for parameter values quantitatively close to those observed. Another issue of interest concerns how these networks are initially learned by the bird during song acquisition. The second part of the project concerned the analysis of exemplars of REINFORCE algorithms, a general class of algorithms for reinforcement learning in neural networks, which are on several counts more biologically plausible than standard prescriptions such as backpropagation. The former compared favorably with backpropagation on tasks involving single input-output pairs, though a noise analysis suggested it should not perform so well. On tasks involving trajectory learning, REINFORCE algorithms meet with some success, though the analysis that predicts their success on input-output-pair tasks fails to explain it for trajectories.


AIM-2004-008

Author[s]: Antonio Torralba, Kevin P. Murphy, William T. Freeman

Sharing visual features for multiclass and multiview object detection

April 14, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-008.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-008.pdf

We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (run-time) computational complexity, and the (training-time) sample complexity, scales linearly with the number of classes to be detected. It seems unlikely that such an approach will scale up to allow recognition of hundreds or thousands of objects. We present a multi-class boosting procedure (joint boosting) that reduces the computational and sample complexity, by finding common features that can be shared across the classes (and/or views). The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required, and therefore the computational cost, is observed to scale approximately logarithmically with the number of classes. The features selected jointly are closer to edges and generic features typical of many natural structures instead of finding specific object parts. Those generic features generalize better and reduce considerably the computational cost of an algorithm for multi-class object detection.


AITR-2004-007

Author[s]: Lisa Tucker-Kellogg

Systematic Conformational Search with Constraint Satisfaction

October 1, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-007.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-007.pdf

Throughout biological, chemical, and pharmaceutical research, conformational searches are used to explore the possible three-dimensional configurations of molecules. This thesis describes a new systematic method for conformational search, including an application of the method to determining the structure of a peptide via solid-state NMR spectroscopy. A separate portion of the thesis is about protein-DNA binding, with a three-dimensional macromolecular structure determined by x-ray crystallography. The search method in this thesis enumerates all conformations of a molecule (at a given level of torsion angle resolution) that satisfy a set of local geometric constraints, such as constraints derived from NMR experiments. Systematic searches, historically used for small molecules, generally now use some form of divide-and-conquer for application to larger molecules. Our method can achieve a significant improvement in runtime by making some major and counter-intuitive modifications to traditional divide-and-conquer: (1) OmniMerge divides a polymer into many alternative pairs of subchains and searches all the pairs, instead of simply cutting in half and searching two subchains. Although the extra searches may appear wasteful, the bottleneck stage of the overall search, which is to re-connect the conformations of the largest subchains, can be greatly accelerated by the availability of alternative pairs of sidechains. (2) Propagation of disqualified conformations across overlapping subchains can disqualify infeasible conformations very rapidly, which further offsets the cost of searching the extra subchains of OmniMerge. (3) The search may be run in two stages, once at low-resolution using a side-effect of OmniMerge to determine an optimal partitioning of the molecule into efficient subchains; then again at high-resolution while making use of the precomputed subchains. (4) An A* function prioritizes each subchain based on estimated future search costs. Subchains with sufficiently low priority can be omitted from the search, which improves efficiency. A common theme of these four ideas is to make good choices about how to break the large search problem into lower-dimensional subproblems. In addition, the search method uses heuristic local searches within the overall systematic framework, to maintain the systematic guarantee while providing the empirical efficiency of stochastic search. These novel algorithms were implemented and the effectiveness of each innovation is demonstrated on a highly constrained peptide with 40 degrees of freedom.


AIM-2004-007

CBCL-237

Author[s]: Jerry Jun Yokono and Tomaso Poggio

Evaluation of sets of oriented and non-oriented receptive fields as local descriptors

March 24, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-007.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-007.pdf

Local descriptors are increasingly used for the task of object recognition because of their perceived robustness with respect to occlusions and to global geometrical deformations. We propose a performance criterion for a local descriptor based on the tradeoff between selectivity and invariance. In this paper, we evaluate several local descriptors with respect to selectivity and invariance. The descriptors that we evaluated are Gaussian derivatives up to the third order, gray image patches, and Laplacian-based descriptors with either three scales or one scale filters. We compare selectivity and invariance to several affine changes such as rotation, scale, brightness, and viewpoint. Comparisons have been made keeping the dimensionality of the descriptors roughly constant. The overall results indicate a good performance by the descriptor based on a set of oriented Gaussian filters. It is interesting that oriented receptive fields similar to the Gaussian derivatives as well as receptive fields similar to the Laplacian are found in primate visual cortex.


AITR-2004-006

Author[s]: Artur Miguel Arsenio

Cognitive-Developmental Learning for a Humanoid Robot: A Caregiver's Gift

September 26, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-006.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-006.pdf

The goal of this work is to build a cognitive system for the humanoid robot, Cog, that exploits human caregivers as catalysts to perceive and learn about actions, objects, scenes, people, and the robot itself. This thesis addresses a broad spectrum of machine learning problems across several categorization levels. Actions by embodied agents are used to automatically generate training data for the learning mechanisms, so that the robot develops categorization autonomously. Taking inspiration from the human brain, a framework of algorithms and methodologies was implemented to emulate different cognitive capabilities on the humanoid robot Cog. This framework is effectively applied to a collection of AI, computer vision, and signal processing problems. Cognitive capabilities of the humanoid robot are developmentally created, starting from infant-like abilities for detecting, segmenting, and recognizing percepts over multiple sensing modalities. Human caregivers provide a helping hand for communicating such information to the robot. This is done by actions that create meaningful events (by changing the world in which the robot is situated) thus inducing the "compliant perception" of objects from these human-robot interactions. Self-exploration of the world extends the robot's knowledge concerning object properties. This thesis argues for enculturating humanoid robots using infant development as a metaphor for building a humanoid robot's cognitive abilities. A human caregiver redesigns a humanoid's brain by teaching the humanoid robot as she would teach a child, using children's learning aids such as books, drawing boards, or other cognitive artifacts. Multi-modal object properties are learned using these tools and inserted into several recognition schemes, which are then applied to developmentally acquire new object representations. The humanoid robot therefore sees the world through the caregiver's eyes. Building an artificial humanoid robot's brain, even at an infant's cognitive level, has been a long quest which still lies only in the realm of our imagination. Our efforts towards such a dimly imaginable task are developed according to two alternate and complementary views: cognitive and developmental.


AIM-2004-006

CBCL-236

Author[s]: Riesenhuber, Jarudi, Gilad, Sinha

Face processing in humans is compatible with a simple shape-based model of vision

March 5, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-006.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-006.pdf

Understanding how the human visual system recognizes objects is one of the key challenges in neuroscience. Inspired by a large body of physiological evidence (Felleman and Van Essen, 1991; Hubel and Wiesel, 1962; Livingstone and Hubel, 1988; Tso et al., 2001; Zeki, 1993), a general class of recognition models has emerged which is based on a hierarchical organization of visual processing, with succeeding stages being sensitive to image features of increasing complexity (Hummel and Biederman, 1992; Riesenhuber and Poggio, 1999; Selfridge, 1959). However, these models appear to be incompatible with some well-known psychophysical results. Prominent among these are experiments investigating recognition impairments caused by vertical inversion of images, especially those of faces. It has been reported that faces that differ “featurally” are much easier to distinguish when inverted than those that differ “configurally” (Freire et al., 2000; Le Grand et al., 2001; Mondloch et al., 2002) – a finding that is difficult to reconcile with the aforementioned models. Here we show that after controlling for subjects’ expectations, there is no difference between “featurally” and “configurally” transformed faces in terms of inversion effect. This result reinforces the plausibility of simple hierarchical models of object representation and recognition in cortex.


AIM-2004-005

Author[s]: Howard Shrobe and Robert Laddaga

New Architectural Models for Visibly Controllable Computing: The Relevance of Dynamic Object Oriented Architectures and Plan Based Computing Models

February 9, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-005.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-005.pdf

Traditionally, we've focussed on the question of how to make a system easy to code the first time, or perhaps on how to ease the system's continued evolution. But if we look at life cycle costs, then we must conclude that the important question is how to make a system easy to operate. To do this we need to make it easy for the operators to see what's going on and to then manipulate the system so that it does what it is supposed to. This is a radically different criterion for success. What makes a computer system visible and controllable? This is a difficult question, but it's clear that today's modern operating systems with nearly 50 million source lines of code are neither. Strikingly, the MIT Lisp Machine and its commercial successors provided almost the same functionality as today's mainstream sytsems, but with only 1 Million lines of code. This paper is a retrospective examination of the features of the Lisp Machine hardware and software system. Our key claim is that by building the Object Abstraction into the lowest tiers of the system, great synergy and clarity were obtained. It is our hope that this is a lesson that can impact tomorrow's designs. We also speculate on how the spirit of the Lisp Machine could be extended to include a comprehensive access control model and how new layers of abstraction could further enrich this model.


AITR-2004-004

Author[s]: Robert A. Hearn

Building Grounded Abstractions for Artificial Intelligence Programming

June 16, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-004.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-004.pdf

Most Artificial Intelligence (AI) work can be characterized as either ``high-level'' (e.g., logical, symbolic) or ``low-level'' (e.g., connectionist networks, behavior-based robotics). Each approach suffers from particular drawbacks. High-level AI uses abstractions that often have no relation to the way real, biological brains work. Low-level AI, on the other hand, tends to lack the powerful abstractions that are needed to express complex structures and relationships. I have tried to combine the best features of both approaches, by building a set of programming abstractions defined in terms of simple, biologically plausible components. At the ``ground level'', I define a primitive, perceptron-like computational unit. I then show how more abstract computational units may be implemented in terms of the primitive units, and show the utility of the abstract units in sample networks. The new units make it possible to build networks using concepts such as long-term memories, short-term memories, and frames. As a demonstration of these abstractions, I have implemented a simulator for ``creatures'' controlled by a network of abstract units. The creatures exist in a simple 2D world, and exhibit behaviors such as catching mobile prey and sorting colored blocks into matching boxes. This program demonstrates that it is possible to build systems that can interact effectively with a dynamic physical environment, yet use symbolic representations to control aspects of their behavior.


AIM-2004-004

CBCL-235

Author[s]: Robert Schneider and Maximilian Riesenhuber

On the difficulty of feature-based attentional modulations in visual object recognition: A modeling study.

January 14, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-004.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-004.pdf

Numerous psychophysical experiments have shown an important role for attentional modulations in vision. Behaviorally, allocation of attention can improve performance in object detection and recognition tasks. At the neural level, attention increases firing rates of neurons in visual cortex whose preferred stimulus is currently attended to. However, it is not yet known how these two phenomena are linked, i.e., how the visual system could be "tuned" in a task-dependent fashion to improve task performance. To answer this question, we performed simulations with the HMAX model of object recognition in cortex [45]. We modulated firing rates of model neurons in accordance with experimental results about effects of feature-based attention on single neurons and measured changes in the model's performance in a variety of object recognition tasks. It turned out that recognition performance could only be improved under very limited circumstances and that attentional influences on the process of object recognition per se tend to display a lack of specificity or raise false alarm rates. These observations lead us to postulate a new role for the observed attention-related neural response modulations.


AITR-2004-003

Author[s]: Jonathan A. Goler

BioJADE: A Design and Simulation Tool for Synthetic Biological Systems

May 28, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-003.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-003.pdf

The next generations of both biological engineering and computer engineering demand that control be exerted at the molecular level. Creating, characterizing and controlling synthetic biological systems may provide us with the ability to build cells that are capable of a plethora of activities, from computation to synthesizing nanostructures. To develop these systems, we must have a set of tools not only for synthesizing systems, but also designing and simulating them. The BioJADE project provides a comprehensive, extensible design and simulation platform for synthetic biology. BioJADE is a graphical design tool built in Java, utilizing a database back end, and supports a range of simulations using an XML communication protocol. BioJADE currently supports a library of over 100 parts with which it can compile designs into actual DNA, and then generate synthesis instructions to build the physical parts. The BioJADE project contributes several tools to Synthetic Biology. BioJADE in itself is a powerful tool for synthetic biology designers. Additionally, we developed and now make use of a centralized BioBricks repository, which enables the sharing of BioBrick components between researchers, and vastly reduces the barriers to entry for aspiring Synthetic Biologists.


AIM-2004-003

Author[s]: Kristen Grauman, Gregory Shakhnarovich, Trevor Darrell

Virtual Visual Hulls: Example-Based 3D Shape Estimation from a Single Silhouette

January 28, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-003.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-003.pdf

Recovering a volumetric model of a person, car, or other object of interest from a single snapshot would be useful for many computer graphics applications. 3D model estimation in general is hard, and currently requires active sensors, multiple views, or integration over time. For a known object class, however, 3D shape can be successfully inferred from a single snapshot. We present a method for generating a ``virtual visual hull''-- an estimate of the 3D shape of an object from a known class, given a single silhouette observed from an unknown viewpoint. For a given class, a large database of multi-view silhouette examples from calibrated, though possibly varied, camera rigs are collected. To infer a novel single view input silhouette's virtual visual hull, we search for 3D shapes in the database which are most consistent with the observed contour. The input is matched to component single views of the multi-view training examples. A set of viewpoint-aligned virtual views are generated from the visual hulls corresponding to these examples. The 3D shape estimate for the input is then found by interpolating between the contours of these aligned views. When the underlying shape is ambiguous given a single view silhouette, we produce multiple visual hull hypotheses; if a sequence of input images is available, a dynamic programming approach is applied to find the maximum likelihood path through the feasible hypotheses over time. We show results of our algorithm on real and synthetic images of people.


AITR-2004-002

Author[s]: Jonathan Kennell

Generative Temporal Planning with Complex Processes

May 18, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-002.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-002.pdf

Autonomous vehicles are increasingly being used in mission-critical applications, and robust methods are needed for controlling these inherently unreliable and complex systems. This thesis advocates the use of model-based programming, which allows mission designers to program autonomous missions at the level of a coach or wing commander. To support such a system, this thesis presents the Spock generative planner. To generate plans, Spock must be able to piece together vehicle commands and team tactics that have a complex behavior represented by concurrent processes. This is in contrast to traditional planners, whose operators represent simple atomic or durative actions. Spock represents operators using the RMPL language, which describes behaviors using parallel and sequential compositions of state and activity episodes. RMPL is useful for controlling mobile autonomous missions because it allows mission designers to quickly encode expressive activity models using object-oriented design methods and an intuitive set of activity combinators. Spock also is significant in that it uniformly represents operators and plan-space processes in terms of Temporal Plan Networks, which support temporal flexibility for robust plan execution. Finally, Spock is implemented as a forward progression optimal planner that walks monotonically forward through plan processes, closing any open conditions and resolving any conflicts. This thesis describes the Spock algorithm in detail, along with example problems and test results.


AIM-2004-002

CBCL-234

Author[s]: Lior Wolf, Amnon Shashua, and Sayan Mukherjee

Selecting Relevant Genes with a Spectral Approach

January 27, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-002.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-002.pdf

Array technologies have made it possible to record simultaneously the expression pattern of thousands of genes. A fundamental problem in the analysis of gene expression data is the identification of highly relevant genes that either discriminate between phenotypic labels or are important with respect to the cellular process studied in the experiment: for example cell cycle or heat shock in yeast experiments, chemical or genetic perturbations of mammalian cell lines, and genes involved in class discovery for human tumors. In this paper we focus on the task of unsupervised gene selection. The problem of selecting a small subset of genes is particularly challenging as the datasets involved are typically characterized by a very small sample size — in the order of few tens of tissue samples — and by a very large feature space as the number of genes tend to be in the high thousands. We propose a model independent approach which scores candidate gene selections using spectral properties of the candidate affinity matrix. The algorithm is very straightforward to implement yet contains a number of remarkable properties which guarantee consistent sparse selections. To illustrate the value of our approach we applied our algorithm on five different datasets. The first consists of time course data from four well studied Hematopoietic cell lines (HL-60, Jurkat, NB4, and U937). The other four datasets include three well studied treatment outcomes (large cell lymphoma, childhood medulloblastomas, breast tumors) and one unpublished dataset (lymph status). We compared our approach both with other unsupervised methods (SOM,PCA,GS) and with supervised methods (SNR,RMB,RFE). The results clearly show that our approach considerably outperforms all the other unsupervised approaches in our study, is competitive with supervised methods and in some case even outperforms supervised approaches.


AITR-2004-001

Author[s]: Oana L. Stamatoiu

Learning Commonsense Categorical Knowledge in a Thread Memory System

May 18, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-001.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AITR-2004-001.pdf

If we are to understand how we can build machines capable of broad purpose learning and reasoning, we must first aim to build systems that can represent, acquire, and reason about the kinds of commonsense knowledge that we humans have about the world. This endeavor suggests steps such as identifying the kinds of knowledge people commonly have about the world, constructing suitable knowledge representations, and exploring the mechanisms that people use to make judgments about the everyday world. In this work, I contribute to these goals by proposing an architecture for a system that can learn commonsense knowledge about the properties and behavior of objects in the world. The architecture described here augments previous machine learning systems in four ways: (1) it relies on a seven dimensional notion of context, built from information recently given to the system, to learn and reason about objects' properties; (2) it has multiple methods that it can use to reason about objects, so that when one method fails, it can fall back on others; (3) it illustrates the usefulness of reasoning about objects by thinking about their similarity to other, better known objects, and by inferring properties of objects from the categories that they belong to; and (4) it represents an attempt to build an autonomous learner and reasoner, that sets its own goals for learning about the world and deduces new facts by reflecting on its acquired knowledge. This thesis describes this architecture, as well as a first implementation, that can learn from sentences such as ``A blue bird flew to the tree'' and ``The small bird flew to the cage'' that birds can fly. One of the main contributions of this work lies in suggesting a further set of salient ideas about how we can build broader purpose commonsense artificial learners and reasoners.


AIM-2004-001

CBCL-233

Author[s]: Alexander Rakhlin, Dmitry Panchenko, Sayan Mukherjee

Risk Bounds for Mixture Density Estimation

January 27, 2004

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-001.ps

ftp://publications.ai.mit.edu/ai-publications/2004/AIM-2004-001.pdf

In this paper we focus on the problem of estimating a bounded density using a finite combination of densities from a given class. We consider the Maximum Likelihood Procedure (MLE) and the greedy procedure described by Li and Barron. Approximation and estimation bounds are given for the above methods. We extend and improve upon the estimation results of Li and Barron, and in particular prove an $O(\frac{1}{\sqrt{n}})$ bound on the estimation error which does not depend on the number of densities in the estimated combination.


AIM-2003-027

Author[s]: Jacob Beal and Seth Gilbert

RamboNodes for the Metropolitan Ad Hoc Network

December 17, 2003

ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-027.ps

ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-027.pdf

We present an algorithm to store data robustly in a large, geographically distributed network by means of localized regions of data storage that move in response to changing conditions. For example, data might migrate away from failures or toward regions of high demand. The PersistentNode algorithm provides this service robustly, but with limited safety guarantees. We use the RAMBO framework to transform PersistentNode into RamboNode, an algorithm that guarantees atomic consistency in exchange for increased cost and decreased liveness. In addition, a half-life analysis of RamboNode shows that it is robust against continuous low-rate failures. Finally, we provide experimental simulations for the algorithm on 2000 nodes, demonstrating how it services requests and examining how it responds to failures.


AIM-2003-026

Author[s]: Kristen Grauman and Trevor Darrell

Fast Contour Matching Using Approximate Earth Mover's Distance

December 5, 2003

ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-026.ps

ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-026.pdf

Weighted graph matching is a good way to align a pair of shapes represented by a set of descriptive local features; the set of correspondences produced by the minimum cost of matching features from one shape to the features of the other often reveals how similar the two shapes are. However, due to the complexity of computing the exact minimum cost matching, previous algorithms could only run efficiently when using a limited number of features per shape, and could not scale to perform retrievals from large databases. We present a contour matching algorithm that quickly computes the minimum weight matching between sets of descriptive local features using a recently introduced low-distortion embedding of the Earth Mover's Distance (EMD) into a normed space. Given a novel embedded contour, the nearest neighbors in a database of embedded contours are retrieved in sublinear time via approximate nearest neighbors search. We demonstrate our shape matching method on databases of 10,000 images of human figures and 60,000 images of handwritten digits.


AIM-2003-025

Author[s]: Yu-Han Chang, Tracey Ho, Leslie Pack Kaelbling

Mobilized ad-hoc networks: A reinforcement learning approach

December 4, 2003

ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-025.ps

ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-025.pdf

Research in mobile ad-hoc networks has focused on situations in which nodes have no control over their movements. We investigate an important but overlooked domain in which nodes do have control over their movements. Reinforcement learning methods can be used to control both packet routing decisions and node mobility, dramatically improving the connectivity of the network. We first motivate the problem by presenting theoretical bounds for the connectivity improvement of partially mobile networks and then present superior empirical results under a variety of different scenarios in which the mobile nodes in our ad-hoc network are embedded with adaptive routing policies and learned movement policies.


AIM-2003-024

CBCL-232

Author[s]: Christian Morgenstern, Bernd Heisele

Component based recognition of objects in an office environment

November 28, 2003

ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-024.ps

ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-024.pdf

We present a component-based approach for recognizing objects under large pose changes. From a set of training images of a given object we extract a large number of components which are clustered based on the similarity of their image features and their locations within the object image. The cluster centers build an initial set of component templates from which we select a subset for the final recognizer. In experiments we evaluate different sizes and types of components and three standard techniques for component selection. The component classifiers are finally compared to global classifiers on a database of four objects.


AIM-2003-023

Author[s]: Jacob Eisenstein

Evolving Robocode Tank Fighters

October 28, 2003

ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-023.ps

ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-023.pdf

In this paper, I describe the application of genetic programming to evolve a controller for a robotic tank in a simulated environment. The purpose is to explore how genetic techniques can best be applied to produce controllers based on subsumption and behavior oriented languages such as REX. As part of my implementation, I developed TableRex, a modification of REX that can be expressed on a fixed-length genome. Using a fixed subsumption architecture of TableRex modules, I evolved robots that beat some of the most competitive hand-coded adversaries.


AIM-2003-022

Author[s]: Michael G. Ross and Leslie Pack Kaelbling

Learning object segmentation from video data

September 8, 2003

ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-022.ps

ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-022.pdf

This memo describes the initial results of a project to create a self-supervised algorithm for learning object segmentation from video data. Developmental psychology and computational experience have demonstrated that the motion segmentation of objects is a simpler, more primitive process than the detection of object boundaries by static image cues. Therefore, motion information provides a plausible supervision signal for learning the static boundary detection task and for evaluating performance on a test set. A video camera and previously developed background subtraction algorithms can automatically produce a large database of motion-segmented images for minimal cost. The purpose of this work is to use the information in such a database to learn how to detect the object boundaries in novel images using static information, such as color, texture, and shape. This work was funded in part by the Office of Naval Research contract #N00014-00-1-0298, in part by the Singapore-MIT Alliance agreement of 11/6/98, and in part by a National Science Foundation Graduate Student Fellowship.


AIM-2003-021

CBCL-231

Author[s]: Minjoon Kouh and Maximilian Riesenhuber

Investigating shape representation in area V4 with HMAX: Orientation and Grating selectivities

September 8, 2003

ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-021.ps

ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-021.pdf

The question of how shape is represented is of central interest to understanding visual processing in cortex. While tuning properties of the cells in early part of the ventral visual stream, thought to be responsible for object recognition in the primate, are comparatively well understood, several different theories have been proposed regarding tuning in higher visual areas, such as V4. We used the model of object recognition in cortex presented by Riesenhuber and Poggio (1999), where more complex shape tuning in higher layers is the result of combining afferent inputs tuned to simpler features, and compared the tuning properties of model units in intermediate layers to those of V4 neurons from the literature. In particular, we investigated the issue of shape representation in visual area V1 and V4 using oriented bars and various types of gratings (polar, hyperbolic, and Cartesian), as used in several physiology experiments. Our computational model was able to reproduce several physiological findings, such as the broadening distribution of the orientation bandwidths and the emergence of a bias toward non-Cartesian stimuli. Interestingly, the simulation results suggest that some V4 neurons receive input from afferents with spatially separated receptive fields, leading to experimentally testable predictions. However, the simulations also show that the stimulus set of Cartesian and non-Cartesian gratings is not sufficiently complex to probe shape tuning in higher areas, necessitating the use of more complex stimulus sets.


AIM-2003-020

CBCL-230

Author[s]: Hiroaki Shimizu and Tomaso Poggio

Direction Estimation of Pedestrian from Images

August 27, 2003

ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-020.ps

ftp://publications.ai.mit.edu/ai-publications/2003/AIM-2003-020.pdf

The capability of estimating the walking direction of people would be useful in many applications such as those involving autonomous cars and robots. We introduce an approach for estimating the walking direction of people from images, based on learning the correct classification of a still image by using SVMs. We find that the performance