Last update Sun Mar 19 05:05:02 2006


Author[s]: Charles C. Kemp and Aaron Edsinger

Visual Tool Tip Detection and Position Estimation for Robotic Manipulation of Unknown Human Tools

November 16, 2005

Robots that use human tools could more easily work with people, perform tasks that are important to people, and benefit from human strategies for accomplishing these tasks. For a wide variety of tools and tasks, control of the tool's endpoint is sufficient for its use. In this paper we present a straight-forward method for rapidly detecting the endpoint of an unmodeled tool and estimating its position with respect to the robot's hand. The robot rotates the tool while using optical flow to detect the most rapidly moving image points, and then finds the 3D position with respect to its hand that best explains these noisy 2D detections. The resulting 3D position estimate allows the robot to control the position of the tool endpoint and predict its visual location. We show successful results for this method using a humanoid robot with a variety of traditional tools, including a pen, a hammer, and pliers, as well as more general tools such as a bottle and the robot's own finger.



Author[s]: T. Serre, M. Kouh, C. Cadieu, U. Knoblich, G. Kreiman, T. Poggio

A theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex

December 19, 2005

We describe a quantitative theory to account for the computations performed by the feedforward path of the ventral stream of visual cortex and the local circuits implementing them. We show that a model instantiating the theory is capable of performing recognition on datasets of complex images at the level of human observers in rapid categorization tasks. We also show that the theory is consistent with (and in some case has predicted) several properties of neurons in V1, V4, IT and PFC. The theory seems sufficiently comprehensive, detailed and satisfactory to represent an interesting challenge for physiologists and modelers: either disprove its basic features or propose alternative theories of equivalent scope. The theory suggests a number of open questions for visual physiology and psychophysics.



Author[s]: Yuri Ivanov, Thomas Serre and Jacob Bouvrie

Confidence weighted classifier combination for multi-modal human identification

December 14, 2005

In this paper we describe a technique of classifier combination used in a human identification system. The system integrates all available features from multi-modal sources within a Bayesian framework. The framework allows representing a class of popular classifier combination rules and methods within a single formalism. It relies on a “per-class” measure of confidence derived from performance of each classifier on training data that is shown to improve performance on a synthetic data set. The method is especially relevant in autonomous surveillance setting where varying time scales and missing features are a common occurrence. We show an application of this technique to the real-world surveillance database of video and audio recordings of people collected over several weeks in the office setting.


Author[s]: Leonid Taycher, Gregory Shakhnarovich, David Demirdjian, and Trevor Darrell

Conditional Random People: Tracking Humans with CRFs and Grid Filters

December 1, 2005

We describe a state-space tracking approach based on a Conditional Random Field (CRF) model, where the observation potentials are \emph{learned} from data. We find functions that embed both state and observation into a space where similarity corresponds to $L_1$ distance, and define an observation potential based on distance in this space. This potential is extremely fast to compute and in conjunction with a grid-filtering framework can be used to reduce a continuous state estimation problem to a discrete one. We show how a state temporal prior in the grid-filter can be computed in a manner similar to a sparse HMM, resulting in real-time system performance. The resulting system is used for human pose tracking in video sequences.


Author[s]: Sanjoy Dasgupta, Adam Tauman Kalai, Claire Monteleoni

Analysis of Perceptron-Based Active Learning

November 17, 2005

We start by showing that in an active learning setting, the Perceptron algorithm needs $\Omega(\frac{1}{\epsilon^2})$ labels to learn linear separators within generalization error $\epsilon$. We then present a simple selective sampling algorithm for this problem, which combines a modification of the perceptron update with an adaptive filtering rule for deciding which points to query. For data distributed uniformly over the unit sphere, we show that our algorithm reaches generalization error $\epsilon$ after asking for just $\tilde{O}(d \log \frac{1}{\epsilon})$ labels. This exponential improvement over the usual sample complexity of supervised learning has previously been demonstrated only for the computationally more complex query-by-committee algorithm.


Author[s]: Claire Monteleoni, Tommi Jaakkola

Online Learning of Non-stationary Sequences

November 17, 2005

We consider an online learning scenario in which the learner can make predictions on the basis of a fixed set of experts. We derive upper and lower relative loss bounds for a class of universal learning algorithms involving a switching dynamics over the choice of the experts. On the basis of the performance bounds we provide the optimal a priori discretization of the switching-rate parameter that governs the switching dynamics. We demonstrate the algorithm in the context of wireless networks.


Author[s]: Alexandr Andoni and Piotr Indyk

New LSH-based Algorithm for Approximate Nearest Neighbor

November 3, 2005

We present an algorithm for c-approximate nearest neighbor problem in a d-dimensional Euclidean space, achieving query time of O(dn^{1/c^2+o(1)}) and space O(dn + n^{1+1/c^2+o(1)}).



Author[s]: Ross Lippert and Ryan Rifkin

Asymptotics of Gaussian Regularized Least-Squares

October 20, 2005

We consider regularized least-squares (RLS) with a Gaussian kernel. We prove that if we let the Gaussian bandwidth $\sigma \rightarrow \infty$ while letting the regularization parameter $\lambda \rightarrow 0$, the RLS solution tends to a polynomial whose order is controlled by the relative rates of decay of $\frac{1}{\sigma^2}$ and $\lambda$: if $\lambda = \sigma^{-(2k+1)}$, then, as $\sigma \rightarrow \infty$, the RLS solution tends to the $k$th order polynomial with minimal empirical error. We illustrate the result with an example.



Author[s]: Gadi Geiger & Domenic G Amara

Towards the Prevention of Dyslexia

October 18, 2005

Previous studies have shown that dyslexic individuals who supplement windowed reading practice with intensive small-scale hand-eye coordination tasks exhibit marked improvement in their reading skills. Here we examine whether similar hand-eye coordination activities, in the form of artwork performed by children in kindergarten, first and second grades, could reduce the number of students at-risk for reading problems. Our results suggest that daily hand-eye coordination activities significantly reduce the number of students at-risk. We believe that the effectiveness of these activities derives from their ability to prepare the students perceptually for reading.



Author[s]: Sanmay Das

Learning to Trade with Insider Information

October 7, 2005

This paper introduces algorithms for learning how to trade using insider (superior) information in Kyle's model of financial markets. Prior results in finance theory relied on the insider having perfect knowledge of the structure and parameters of the market. I show here that it is possible to learn the equilibrium trading strategy when its form is known even without knowledge of the parameters governing trading in the model. However, the rate of convergence to equilibrium is slow, and an approximate algorithm that does not converge to the equilibrium strategy achieves better utility when the horizon is limited. I analyze this approximate algorithm from the perspective of reinforcement learning and discuss the importance of domain knowledge in designing a successful learning algorithm.


Author[s]: Georgios Theocharous, Sridhar Mahadevan, Leslie Pack Kaelbling

Spatial and Temporal Abstractions in POMDPs Applied to Robot Navigation

September 27, 2005

Partially observable Markov decision processes (POMDPs) are a well studied paradigm for programming autonomous robots, where the robot sequentially chooses actions to achieve long term goals efficiently. Unfortunately, for real world robots and other similar domains, the uncertain outcomes of the actions and the fact that the true world state may not be completely observable make learning of models of the world extremely difficult, and using them algorithmically infeasible. In this paper we show that learning POMDP models and planning with them can become significantly easier when we incorporate into our algorithms the notions of spatial and tempral abstraction. We demonstrate the superiority of our algorithms by comparing them with previous flat approaches for large scale robot navigation.


Author[s]: Chris Stauffer

Automated Audio-visual Activity Analysis

September 20, 2005

Current computer vision techniques can effectively monitor gross activities in sparse environments. Unfortunately, visual stimulus is often not sufficient for reliably discriminating between many types of activity. In many cases where the visual information required for a particular task is extremely subtle or non-existent, there is often audio stimulus that is extremely salient for a particular classification or anomaly detection task. Unfortunately unlike visual events, independent sounds are often very ambiguous and not sufficient to define useful events themselves. Without an effective method of learning causally-linked temporal sequences of sound events that are coupled to the visual events, these sound events are generally only useful for independent anomalous sounds detection, e.g., detecting a gunshot or breaking glass. This paper outlines a method for automatically detecting a set of audio events and visual events in a particular environment, for determining statistical anomalies, for automatically clustering these detected events into meaningful clusters, and for learning salient temporal relationships between the audio and visual events. This results in a compact description of the different types of compound audio-visual events in an environment.


Author[s]: Bryan C. Russell, Antonio Torralba, Kevin P. Murphy, and William T. Freeman

LabelMe: a database and web-based tool for image annotation

September 8, 2005

Research in object detection and recognition in cluttered scenes requires large image collections with ground truth labels. The labels should provide information about the object classes present in each image, as well as their shape and locations, and possibly other attributes such as pose. Such data is useful for testing, as well as for supervised learning. This project provides a web-based annotation tool that makes it easy to annotate images, and to instantly share such annotations with the community. This tool, plus an initial set of 10,000 images (3000 of which have been labeled), can be found at$\sim$brussell/research/LabelMe/intro.html


Author[s]: Whitman Richards

Collective Choice with Uncertain Domain Moldels

August 16, 2005

When groups of individuals make choices among several alternatives, the most compelling social outcome is the Condorcet winner, namely the alternative beating all others in a pair-wise contest. Obviously the Condorcet winner cannot be overturned if one sub-group proposes another alternative it happens to favor. However, in some cases, and especially with haphazard voting, there will be no clear unique winner, with the outcome consisting of a triple of pair-wise winners that each beat different subsets of the alternatives (i.e. a “top-cycle”.) We explore the sensitivity of Condorcet winners to various perturbations in the voting process that lead to top-cycles. Surprisingly, variations in the number of votes for each alternative is much less important than consistency in a voter’s view of how alternatives are related. As more and more voter’s preference orderings on alternatives depart from a shared model of the domain, then unique Condorcet outcomes become increasingly unlikely.



Author[s]: Jerry Jun Yokono and Tomaso Poggio

Boosting a Biologically Inspired Local Descriptor for Geometry-free Face and Full Multi-view 3D Object Recognition

July 7, 2005

Object recognition systems relying on local descriptors are increasingly used because of their perceived robustness with respect to occlusions and to global geometrical deformations. Descriptors of this type -- based on a set of oriented Gaussian derivative filters -- are used in our recognition system. In this paper, we explore a multi-view 3D object recognition system that does not use explicit geometrical information. The basic idea is to find discriminant features to describe an object across different views. A boosting procedure is used to select features out of a large feature pool of local features collected from the positive training examples. We describe experiments on face images with excellent recognition rate.



Author[s]: Chou Hung, Gabriel Kreiman, Tomaso Poggio, James J. DiCarlo

Ultra-fast Object Recognition from Few Spikes

July 6, 2005

Understanding the complex brain computations leading to object recognition requires quantitatively characterizing the information represented in inferior temporal cortex (IT), the highest stage of the primate visual stream. A read-out technique based on a trainable classifier is used to characterize the neural coding of selectivity and invariance at the population level. The activity of very small populations of independently recorded IT neurons (~100 randomly selected cells) over very short time intervals (as small as 12.5 ms) contains surprisingly accurate and robust information about both object ‘identity’ and ‘category’, which is furthermore highly invariant to object position and scale. Significantly, selectivity and invariance are present even for novel objects, indicating that these properties arise from the intrinsic circuitry and do not require object-specific learning. Within the limits of the technique, there is no detectable difference in the latency or temporal resolution of the IT information supporting so-called ‘categorization’ (a.k. basic level) and ‘identification’ (a.k. subordinate level) tasks. Furthermore, where information, in particular information about stimulus location and scale, can also be read-out from the same small population of IT neurons. These results show how it is possible to decode invariant object information rapidly, accurately and robustly from a small population in IT and provide insights into the nature of the neural code for different kinds of object-related information.


Author[s]: ali rahimi, ben recht, trevor darrell

Nonlinear Latent Variable Models for Video Sequences

June 6, 2005

Many high-dimensional time-varying signals can be modeled as a sequence of noisy nonlinear observations of a low-dimensional dynamical process. Given high-dimensional observations and a distribution describing the dynamical process, we present a computationally inexpensive approximate algorithm for estimating the inverse of this mapping. Once this mapping is learned, we can invert it to construct a generative model for the signals. Our algorithm can be thought of as learning a manifold of images by taking into account the dynamics underlying the low-dimensional representation of these images. It also serves as a nonlinear system identification procedure that estimates the inverse of the observation function in nonlinear dynamic system. Our algorithm reduces to a generalized eigenvalue problem, so it does not suffer from the computational or local minimum issues traditionally associated with nonlinear system identification, allowing us to apply it to the problem of learning generative models for video sequences.


Author[s]: Florent Segonne, Jean-Philippe Pons, Bruce Fischl, and Eric Grimson

A Novel Active Contour Framework. Multi-component Level Set Evolution under Topology Control

June 1, 2005

We present a novel framework to exert a topology control over a level set evolution. Level set methods offer several advantages over parametric active contours, in particular automated topological changes. In some applications, where some a priori knowledge of the target topology is available, topological changes may not be desirable. A method, based on the concept of simple point borrowed from digital topology, was recently proposed to achieve a strict topology preservation during a level set evolution. However, topologically constrained evolutions often generate topological barriers that lead to large geometric inconsistencies. We introduce a topologically controlled level set framework that greatly alleviates this problem. Unlike existing work, our method allows connected components to merge, split or vanish under some specific conditions that ensure that no topological defects are generated. We demonstrate the strength of our method on a wide range of numerical experiments.



Author[s]: Andrea Caponnetto, Lorenzo Rosasco, Ernesto De Vito and Alessandro Verri

Empirical Effective Dimension and Optimal Rates for Regularized Least Squares Algorithm

May 27, 2005

This paper presents an approach to model selection for regularized least-squares on reproducing kernel Hilbert spaces in the semi-supervised setting. The role of effective dimension was recently shown to be crucial in the definition of a rule for the choice of the regularization parameter, attaining asymptotic optimal performances in a minimax sense. The main goal of the present paper is showing how the effective dimension can be replaced by an empirical counterpart while conserving optimality. The empirical effective dimension can be computed from independent unlabelled samples. This makes the approach particularly appealing in the semi-supervised setting.



Author[s]: Andrea Caponnetto and Alexander Rakhlin

Some Properties of Empirical Risk Minimization over Donsker Classes

May 17, 2005

We study properties of algorithms which minimize (or almost minimize) empirical error over a Donsker class of functions. We show that the L2-diameter of the set of almost-minimizers is converging to zero in probability. Therefore, as the number of samples grows, it is becoming unlikely that adding a point (or a number of points) to the training set will result in a large jump (in L2 distance) to a new hypothesis. We also show that under some conditions the expected errors of the almost-minimizers are becoming close with a rate faster than n^{-1/2}.


Author[s]: Thade Nahnsen, Ozlem Uzuner, Boris Katz

Lexical Chains and Sliding Locality Windows in Content-based Text Similarity Detection

May 19, 2005

We present a system to determine content similarity of documents. More specifically, our goal is to identify book chapters that are translations of the same original chapter; this task requires identification of not only the different topics in the documents but also the particular flow of these topics. We experiment with different representations employing n-grams of lexical chains and test these representations on a corpus of approximately 1000 chapters gathered from books with multiple parallel translations. Our representations include the cosine similarity of attribute vectors of n-grams of lexical chains, the cosine similarity of tf*idf-weighted keywords, and the cosine similarity of unweighted lexical chains (unigrams of lexical chains) as well as multiplicative combinations of the similarity measures produced by these approaches. Our results identify fourgrams of unordered lexical chains as a particularly useful representation for text similarity evaluation.


Author[s]: Christopher Taylor, Ali Rahimi, Jonathan Bachrach and Howard Shrobe

Simultaneous Localization, Calibration, and Tracking in an ad Hoc Sensor Network

April 26, 2005

We introduce Simultaneous Localization and Tracking (SLAT), the problem of tracking a target in a sensor network while simultaneously localizing and calibrating the nodes of the network. Our proposed solution, LaSLAT, is a Bayesian filter providing on-line probabilistic estimates of sensor locations and target tracks. It does not require globally accessible beacon signals or accurate ranging between the nodes. When applied to a network of 27 sensor nodes, our algorithm can localize the nodes to within one or two centimeters.



Author[s]: Ernesto De Vito and Andrea Caponnetto

Risk Bounds for Regularized Least-squares Algorithm with Operator-valued kernels

May 16, 2005

We show that recent results in [3] on risk bounds for regularized least-squares on reproducing kernel Hilbert spaces can be straightforwardly extended to the vector-valued regression setting. We first briefly introduce central concepts on operator-valued kernels. Then we show how risk bounds can be expressed in terms of a generalization of effective dimension.


Author[s]: Jacob Eisenstein and Randall Davis

Gestural Cues for Sentence Segmentation

April 19, 2005

In human-human dialogues, face-to-face meetings are often preferred over phone conversations. One explanation is that non-verbal modalities such as gesture provide additional information, making communication more efficient and accurate. If so, computer processing of natural language could improve by attending to non-verbal modalities as well. We consider the problem of sentence segmentation, using hand-annotated gesture features to improve recognition. We find that gesture features correlate well with sentence boundaries, but that these features improve the overall performance of a language-only system only marginally. This finding is in line with previous research on this topic. We provide a regression analysis, revealing that for sentence boundary detection, the gestural features are largely redundant with the language model and pause features. This suggests that gestural features can still be useful when speech recognition is inaccurate.



Author[s]: Andrea Caponnetto and Ernesto De Vito

Fast Rates for Regularized Least-squares Algorithm

April 14, 2005

We develop a theoretical analysis of generalization performances of regularized least-squares on reproducing kernel Hilbert spaces for supervised learning. We show that the concept of effective dimension of an integral operator plays a central role in the definition of a criterion for the choice of the regularization parameter as a function of the number of samples. In fact, a minimax analysis is performed which shows asymptotic optimality of the above-mentioned criterion.


Author[s]: Jacob Beal

Learning From Snapshot Examples

April 13, 2005

Examples are a powerful tool for teaching both humans and computers. In order to learn from examples, however, a student must first extract the examples from its stream of perception. Snapshot learning is a general approach to this problem, in which relevant samples of perception are used as examples. Learning from these examples can in turn improve the judgement of the snapshot mechanism, improving the quality of future examples. One way to implement snapshot learning is the Top-Cliff heuristic, which identifies relevant samples using a generalized notion of peaks. I apply snapshot learning with the Top-Cliff heuristic to solve a distributed learning problem and show that the resulting system learns rapidly and robustly, and can hallucinate useful examples in a perceptual stream from a teacherless system.


Author[s]: Justin Werfel, Yaneer Bar-Yam, Radhika Nagpal

Construction by robot swarms using extended stigmergy

April 8, 2005

We describe a system in which simple, identical, autonomous robots assemble two-dimensional structures out of identical building blocks. We show that, in a system divided in this way into mobile units and structural units, giving the blocks limited communication abilities enables robots to have sufficient global structural knowledge to rapidly build elaborate pre-designed structures. In this way we extend the principle of stigmergy (storing information in the environment) used by social insects, by increasing the capabilities of the blocks that represent that environmental information. As a result, arbitrary solid structures can be built using a few fixed, local behaviors, without requiring construction to be planned out in detail.


Author[s]: Kilian M. Pohl, John Fisher, W. Eric L. Grimson, William M. Wells

An Expectation Maximization Approach for Integrated Registration, Segmentation, and Intensity Correction

April 1, 2005

This paper presents a statistical framework which combines the registration of an atlas with the segmentation of MR images. We use an Expectation Maximization-based algorithm to find a solution within the model, which simultaneously estimates image inhomogeneities, anatomical labelmap, and a mapping from the atlas to the image space. An example of the approach is given for a brain structure-dependent affine mapping approach. The algorithm produces high quality segmentations for brain tissues as well as their substructures. We demonstrate the approach on a set of 30 brain MR images. In addition, we show that the approach performs better than similar methods which separate the registration from the segmentation problem.



Author[s]: Lior Wolf & Stanley Bileschi

Combining Variable Selection with Dimensionality Reduction

March 30, 2005

This paper bridges the gap between variable selection methods (e.g., Pearson coefficients, KS test) and dimensionality reduction algorithms (e.g., PCA, LDA). Variable selection algorithms encounter difficulties dealing with highly correlated data, since many features are similar in quality. Dimensionality reduction algorithms tend to combine all variables and cannot select a subset of significant variables. Our approach combines both methodologies by applying variable selection followed by dimensionality reduction. This combination makes sense only when using the same utility function in both stages, which we do. The resulting algorithm benefits from complex features as variable selection algorithms do, and at the same time enjoys the benefits of dimensionality reduction.1


Author[s]: Leonid Taycher, John W. Fisher III, and Trevor Darrell

Combining Object and Feature Dynamics in Probabilistic Tracking

March 2, 2005

Objects can exhibit different dynamics at different scales, a property that is often exploited by visual tracking algorithms. A local dynamic model is typically used to extract image features that are then used as inputs to a system for tracking the entire object using a global dynamic model. Approximate local dynamics may be brittle---point trackers drift due to image noise and adaptive background models adapt to foreground objects that become stationary---but constraints from the global model can make them more robust. We propose a probabilistic framework for incorporating global dynamics knowledge into the local feature extraction processes. A global tracking algorithm can be formulated as a generative model and used to predict feature values that influence the observation process of the feature extractor. We combine such models in a multichain graphical model framework. We show the utility of our framework for improving feature tracking and thus shape and motion estimates in a batch factorization algorithm. We also propose an approximate filtering algorithm appropriate for online applications, and demonstrate its application to problems such as background subtraction, structure from motion and articulated body tracking.


Author[s]: Kristen Grauman and Trevor Darrell

Pyramid Match Kernels: Discriminative Classification with Sets of Image Features

March 17, 2005

Discriminative learning is challenging when examples are sets of local image features, and the sets vary in cardinality and lack any sort of meaningful ordering. Kernel-based classification methods can learn complex decision boundaries, but a kernel similarity measure for unordered set inputs must somehow solve for correspondences -- generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function which maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in this space. This ``pyramid match" computation is linear in the number of features, and it implicitly finds correspondences based on the finest resolution histogram cell where a matched pair first appears. Since the kernel does not penalize the presence of extra features, it is robust to clutter. We show the kernel function is positive-definite, making it valid for use in learning algorithms whose optimal solutions are guaranteed only for Mercer kernels. We demonstrate our algorithm on object recognition tasks and show it to be dramatically faster than current approaches.



Author[s]: Benjamin Balas, Pawan Sinha

Receptive field structures for recognition

March 1, 2005

Localized operators, like Gabor wavelets and difference-of-Gaussian filters, are considered to be useful tools for image representation. This is due to their ability to form a ‘sparse code’ that can serve as a basis set for high-fidelity reconstruction of natural images. However, for many visual tasks, the more appropriate criterion of representational efficacy is ‘recognition’, rather than ‘reconstruction’. It is unclear whether simple local features provide the stability necessary to subserve robust recognition of complex objects. In this paper, we search the space of two-lobed differential operators for those that constitute a good representational code under recognition/discrimination criteria. We find that a novel operator, which we call the ‘dissociated dipole’ displays useful properties in this regard. We describe simple computational experiments to assess the merits of such dipoles relative to the more traditional local operators. The results suggest that non-local operators constitute a vocabulary that is stable across a range of image transformations.


Author[s]: Josef Sivic, Bryan C. Russell, Alexei A. Efros, Andrew Zisserman, William T. Freeman

Discovering object categories in image collections

February 25, 2005

Given a set of images containing multiple object categories, we seek to discover those categories and their image locations without supervision. We achieve this using generative models from the statistical text literature: probabilistic Latent Semantic Analysis (pLSA), and Latent Dirichlet Allocation (LDA). In text analysis these are used to discover topics in a corpus using the bag-of-words document representation. Here we discover topics as object categories, so that an image containing instances of several categories is modelled as a mixture of topics. The models are applied to images by using a visual analogue of a word, formed by vector quantizing SIFT like region descriptors. We investigate a set of increasingly demanding scenarios, starting with image sets containing only two object categories through to sets containing multiple categories (including airplanes, cars, faces, motorbikes, spotted cats) and background clutter. The object categories sample both intra-class and scale variation, and both the categories and their approximate spatial layout are found without supervision. We also demonstrate classification of unseen images and images containing multiple objects. Performance of the proposed unsupervised method is compared to the semi-supervised approach of Fergus et al.


Author[s]: Ozlem Uzuner

Identifying Expression Fingerprints using Linguistic Information

November 16, 2005

This thesis presents a technology to complement taxation-based policy proposals aimed at addressing the digital copyright problem. The approach presented facilitates identification of intellectual property using expression fingerprints. Copyright law protects expression of content. Recognizing literary works for copyright protection requires identification of the expression of their content. The expression fingerprints described in this thesis use a novel set of linguistic features that capture both the content presented in documents and the manner of expression used in conveying this content. These fingerprints consist of both syntactic and semantic elements of language. Examples of the syntactic elements of expression include structures of embedding and embedded verb phrases. The semantic elements of expression consist of high-level, broad semantic categories. Syntactic and semantic elements of expression enable generation of models that correctly identify books and their paraphrases 82% of the time, providing a significant (approximately 18%) improvement over models that use tfidf-weighted keywords. The performance of models built with these features is also better than models created with standard features used in stylometry (e.g., function words), which yield an accuracy of 62%. In the non-digital world, copyright holders collect revenues by controlling distribution of their works. Current approaches to the digital copyright problem attempt to provide copyright holders with the same kind of control over distribution by employing Digital Rights Management (DRM) systems. However, DRM systems also enable copyright holders to control and limit fair use, to inhibit others' speech, and to collect private information about individual users of digital works. Digital tracking technologies enable alternate solutions to the digital copyright problem; some of these solutions can protect creative incentives of copyright holders in the absence of control over distribution of works. Expression fingerprints facilitate digital tracking even when literary works are DRM- and watermark-free, and even when they are paraphrased. As such, they enable metering popularity of works and make practicable solutions that encourage large-scale dissemination and unrestricted use of digital works and that protect the revenues of copyright holders, for example through taxation-based revenue collection and distribution systems, without imposing limits on distribution.


Author[s]: Reina Riemann, Keith Winstein

Improving 802.11 Range with Forward Error Correction

February 24, 2005

The ISO/IEC 8802-11:1999(E) specification uses a 32-bit CRC for error detection and whole-packet retransmissions for recovery. In long-distance or high-interference links where the probability of a bit error is high, this strategy results in excessive losses, because any erroneous bit causes an entire packet to be discarded. By ignoring the CRC and adding redundancy to 802.11 payloads in software, we achieved substantially reduced loss rates on indoor and outdoor long-distance links and extended line-of-sight range outdoors by 70 percent.


Author[s]: Christopher J. Taylor

Simultaneous Localization and Tracking in Wireless Ad-hoc Sensor Networks

May 31, 2005

In this thesis we present LaSLAT, a sensor network algorithm that simultaneously localizes sensors, calibrates sensing hardware, and tracks unconstrained moving targets using only range measurements between the sensors and the target. LaSLAT is based on a Bayesian filter, which updates a probability distribution over the quantities of interest as measurements arrive. The algorithm is distributable, and requires only a constant amount of space with respect to the number of measurements incorporated. LaSLAT is easy to adapt to new types of hardware and new physical environments due to its use of intuitive probability distributions: one adaptation demonstrated in this thesis uses a mixture measurement model to detect and compensate for bad acoustic range measurements due to echoes. We also present results from a centralized Java implementation of LaSLAT on both two- and three-dimensional sensor networks in which ranges are obtained using the Cricket ranging system. LaSLAT is able to localize sensors to within several centimeters of their ground truth positions while recovering a range measurement bias for each sensor and the complete trajectory of the mobile.


Author[s]: Gerald Jay Sussman and Jack Wisdom

Functional Differential Geometry

February 2, 2005

Differential geometry is deceptively simple. It is surprisingly easy to get the right answer with unclear and informal symbol manipulation. To address this problem we use computer programs to communicate a precise understanding of the computations in differential geometry. Expressing the methods of differential geometry in a computer language forces them to be unambiguous and computationally effective. The task of formulating a method as a computer-executable program and debugging that program is a powerful exercise in the learning process. Also, once formalized procedurally, a mathematical idea becomes a tool that can be used directly to compute results.



Author[s]: Jia Jane Wu

Comparing Visual Features for Morphing Based Recognition

May 25, 2005

This thesis presents a method of object classification using the idea of deformable shape matching. Three types of visual features, geometric blur, C1 and SIFT, are used to generate feature descriptors. These feature descriptors are then used to find point correspondences between pairs of images. Various morphable models are created by small subsets of these correspondences using thin-plate spline. Given these morphs, a simple algorithm, least median of squares (LMEDS), is used to find the best morph. A scoring metric, using both LMEDS and distance transform, is used to classify test images based on a nearest neighbor algorithm. We perform the experiments on the Caltech 101 dataset [5]. To ease computation, for each test image, a shortlist is created containing 10 of the most likely candidates. We were unable to duplicate the performance of [1] in the shortlist stage because we did not use hand-segmentation to extract objects for our training images. However, our gain from the shortlist to correspondence stage is comparable to theirs. In our experiments, we improved from 21% to 28% (gain of 33%), while [1] improved from 41% to 48% (gain of 17%). We find that using a non-shape based approach, C2 [14], the overall classification rate of 33.61% is higher than all of the shaped based methods tested in our experiments.



Author[s]: Benjamin Balas

Using computational models to study texture representations in the human visual system.

February 7, 2005

Traditionally, human texture perception has been studied using artificial textures made of random-dot patterns or abstract structured elements. At the same time, computer algorithms for the synthesis of natural textures have improved dramatically. The current study seeks to unify these two fields of research through a psychophysical assessment of a particular computational model, thus providing a sense of what image statistics are most vital for representing a range of natural textures. We employ Portilla and Simoncelli’s 2000 model of texture synthesis for this task (a parametric model of analysis and synthesis designed to mimic computations carried out by the human visual system). We find an intriguing interaction between texture type (periodic v. structured) and image statistics (autocorrelation function and filter magnitude correlations), suggesting different processing strategies may be employed for these two texture families under pre-attentive viewing.


Author[s]: Attila Kondacs

Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods

January 28, 2005

In this thesis I will be concerned with linking the observed speech signal to the configuration of articulators. Due to the potentially rapid motion of the articulators, the speech signal can be highly non-stationary. The typical linear analysis techniques that assume quasi-stationarity may not have sufficient time-frequency resolution to determine the place of articulation. I argue that the traditional low and high-level primitives of speech processing, frequency and phonemes, are inadequate and should be replaced by a representation with three layers: 1. short pitch period resonances and other spatio-temporal patterns 2. articulator configuration trajectories 3. syllables. The patterns indicate articulator configuration trajectories (how the tongue, jaws, etc. are moving), which are interpreted as syllables and words. My patterns are an alternative to frequency. I use short time-domain features of the sound waveform, which can be extracted from each vowel pitch period pattern, to identify the positions of the articulators with high reliability. These features are important because by capitalizing on detailed measurements within a single pitch period, the rapid articulator movements can be tracked. No linear signal processing approach can achieve the combination of sensitivity to short term changes and measurement accuracy resulting from these nonlinear techniques. The measurements I use are neurophysiologically plausible: the auditory system could be using similar methods. I have demonstrated this approach by constructing a robust technique for categorizing the English voiced stops as the consonants B, D, or G based on the vocalic portions of their releases. The classification recognizes 93.5%, 81.8% and 86.1% of the b, d and g to ae transitions with false positive rates 2.9%, 8.7% and 2.6% respectively.


Author[s]: Jacob Beal, Gerald Sussman

Biologically-Inspired Robust Spatial Programming

January 18, 2005

Inspired by the robustness and flexibility of biological systems, we are developing linguistic and programming tools to allow us to program spatial systems populated by vast numbers of unreliable components interconnected in unknown, irregular, and time-varying ways. We organize our computations around geometry, making the fact that our system is made up of discrete individuals implicit. Geometry allows us to specify requirements in terms of the behavior of the space occupied by the aggregate rather than the behavior of individuals, thereby decreasing complexity. So we describe the behavior of space explicitly, abstracting away the discrete nature of the components. As an example, we present the Amorphous Medium Language, which describes behavior in terms of homeostatic maintenance of constraints on nested regions of space.



Author[s]: Minjoon Kouh and Tomaso Poggio

A general mechanism for tuning: Gain control circuits and synapses underlie tuning of cortical neurons

December 31, 2004

Tuning to an optimal stimulus is a widespread property of neurons in cortex. We propose that such tuning is a consequence of normalization or gain control circuits. We also present a biologically plausible neural circuitry of tuning.


Author[s]: Percy Liang and Nathan Srebro

Methods and Experiments With Bounded Tree-width Markov Networks

December 30, 2004

Markov trees generalize naturally to bounded tree-width Markov networks, on which exact computations can still be done efficiently. However, learning the maximum likelihood Markov network with tree-width greater than 1 is NP-hard, so we discuss a few algorithms for approximating the optimal Markov network. We present a set of methods for training a density estimator. Each method is specified by three arguments: tree-width, model scoring metric (maximum likelihood or minimum description length), and model representation (using one joint distribution or several class-conditional distributions). On these methods, we give empirical results on density estimation and classification tasks and explore the implications of these arguments.


Author[s]: Whitman Richards & H. Sebastian Seung

Neural Voting Machines

December 31, 2004

“Winner-take-all” networks typically pick as winners that alternative with the largest excitatory input. This choice is far from optimal when there is uncertainty in the strength of the inputs, and when information is available about how alternatives may be related. In the Social Choice community, many other procedures will yield more robust winners. The Borda Count and the pair-wise Condorcet tally are among the most favored. Their implementations are simple modifications of classical recurrent networks.


Author[s]: Luis Perez-Breva

Cascading Regularized Classifiers

April 21, 2004

Among the various methods to combine classifiers, Boosting was originally thought as an stratagem to cascade pairs of classifiers through their disagreement. I recover the same idea from the work of Niyogi et al. to show how to loosen the requirement of weak learnability, central to Boosting, and introduce a new cascading stratagem. The paper concludes with an empirical study of an implementation of the cascade that, under assumptions that mirror the conditions imposed by Viola and Jones in [VJ01], has the property to preserve the generalization ability of boosting.


Author[s]: Kristen Grauman and Trevor Darrell

Efficient Image Matching with Distributions of Local Invariant Features

November 22, 2004

Sets of local features that are invariant to common image transformations are an effective representation to use when comparing images; current methods typically judge feature sets' similarity via a voting scheme (which ignores co-occurrence statistics) or by comparing histograms over a set of prototypes (which must be found by clustering). We present a method for efficiently comparing images based on their discrete distributions (bags) of distinctive local invariant features, without clustering descriptors. Similarity between images is measured with an approximation of the Earth Mover's Distance (EMD), which quickly computes the minimal-cost correspondence between two bags of features. Each image's feature distribution is mapped into a normed space with a low-distortion embedding of EMD. Examples most similar to a novel query image are retrieved in time sublinear in the number of examples via approximate nearest neighbor search in the embedded space. We also show how the feature representation may be extended to encode the distribution of geometric constraints between the invariant features appearing in each image. We evaluate our technique with scene recognition and texture classification tasks.



Author[s]: Thomas Serre, Lior Wolf and Tomaso Poggio

A new biologically motivated framework for robust object recognition

November 14, 2004

In this paper, we introduce a novel set of features for robust object recognition, which exhibits outstanding performances on a variety of object categories while being capable of learning from only a few training examples. Each element of this set is a complex feature obtained by combining position- and scale-tolerant edge-detectors over neighboring positions and multiple orientations. Our system - motivated by a quantitative model of visual cortex - outperforms state-of-the-art systems on a variety of object image datasets from different groups. We also show that our system is able to learn from very few examples with no prior category knowledge. The success of the approach is also a suggestive plausibility proof for a class of feed-forward models of object recognition in cortex. Finally, we conjecture the existence of a universal overcomplete dictionary of features that could handle the recognition of all object categories.



Author[s]: Lior Wolf and Ian Martin

Regularization Through Feature Knock Out

November 12, 2004

In this paper, we present and analyze a novel regularization technique based on enhancing our dataset with corrupted copies of the original data. The motivation is that since the learning algorithm lacks information about which parts of the data are reliable, it has to produce more robust classification functions. We then demonstrate how this regularization leads to redundancy in the resulting classifiers, which is somewhat in contrast to the common interpretations of the Occam’s razor principle. Using this framework, we propose a simple addition to the gentle boosting algorithm which enables it to work with only a few examples. We test this new algorithm on a variety of datasets and show convincing results.



Author[s]: Charles Cadieu, Minjoon Kouh, Maximilian Riesenhuber, and Tomaso Poggio

Shape Representation in V4: Investigating Position-Specific Tuning for Boundary Conformation with the Standard Model of Object Recognition

November 12, 2004

The computational processes in the intermediate stages of the ventral pathway responsible for visual object recognition are not well understood. A recent physiological study by A. Pasupathy and C. Connor in intermediate area V4 using contour stimuli, proposes that a population of V4 neurons display bjectcentered, position-specific curvature tuning [18]. The “standard model” of object recognition, a recently developed model [23] to account for recognition properties of IT cells (extending classical suggestions by Hubel, Wiesel and others [9, 10, 19]), is used here to model the response of the V4 cells described in [18]. Our results show that a feedforward, network level mechanism can exhibit selectivity and invariance properties that correspond to the responses of the V4 cells described in [18]. These results suggest how object-centered, position-specific curvature tuning of V4 cells may arise from combinations of complex V1 cell responses. Furthermore, the model makes predictions about the responses of the same V4 cells studied by Pasupathy and Connor to novel gray level patterns, such as gratings and natural images. These predictions suggest specific experiments to further explore shape representation in V4.


Author[s]: Kurt Steinkraus, Leslie Pack Kaelbling

Combining dynamic abstractions in large MDPs

October 21, 2004

One of the reasons that it is difficult to plan and act in real-world domains is that they are very large. Existing research generally deals with the large domain size using a static representation and exploiting a single type of domain structure. In this paper, we create a framework that encapsulates existing and new abstraction and approximation methods into modules, and combines arbitrary modules into a system that allows for dynamic representation changes. We show that the dynamic changes of representation allow our framework to solve larger and more interesting domains than were previously possible, and while there are no optimality guarantees, suitable module choices gain tractability at little cost to optimality.


Author[s]: Gene Yeo, Eric Van Nostrand, Dirk Holste, Tomaso Poggio, Christopher Burge

Predictive identification of alternative events conserved in human and mouse

September 30, 2004

Alternative pre-messenger RNA splicing affects a majority of human genes and plays important roles in development and disease. Alternative splicing (AS) events conserved since the divergence of human and mouse are likely of primary biological importance, but relatively few such events are known. Here we describe sequence features that distinguish exons subject to evolutionarily conserved AS, which we call 'alternative- conserved exons' (ACEs) from other orthologous human/mouse exons, and integrate these features into an exon classification algorithm, ACEScan. Genome-wide analysis of annotated orthologous human-mouse exon pairs identified ~2,000 predicted ACEs. Alternative splicing was verified in both human and mouse tissues using an RT-PCR- sequencing protocol for 21 of 30 (70%) predicted ACEs tested, supporting the validity of a majority of ACEScan predictions. By contrast, AS was observed in mouse tissues for only 2 of 15 (13%) tested exons which had EST or cDNA evidence of AS in human but were not predicted ACEs, and was never observed for eleven negative control exons in human or mouse tissues. Predicted ACEs were much more likely to preserve reading frame, and less likely to disrupt protein domains than other AS events, and were enriched in genes expressed in the brain and in genes involved in transcriptional regulation, RNA processing and development. Our results also imply that the vast majority of AS events represented in the human EST databases are not conserved in mouse, and therefore may represent aberrant, disease- or allele-specific, or highly lineage-restricted splicing events.


Author[s]: Michael R. Benjamin

The Interval Programming Model for Multi-objective Decision Making

September 27, 2004

The interval programming model (IvP) is a mathematical programming model for representing and solving multi-objective optimization problems. The central characteristic of the model is the use of piecewise linearly defined objective functions and a solution method that searches through the combination space of pieces rather than through the actual decision space. The piecewise functions typically represent an approximation of some underlying function, but this concession is balanced on the positive side by relative freedom from function form assumptions as well as the assurance of global optimality. In this paper the model and solution algorithms are described, and the applicability of IvP to certain applications are discussed.



Author[s]: Gabriel Kreiman, Chou Hung, Tomaso Poggio, James DiCarlo

Selectivity of Local Field Potentials in Macaque Inferior Temporal Cortex

September 21, 2004

While single neurons in inferior temporal (IT) cortex show differential responses to distinct complex stimuli, little is known about the responses of populations of neurons in IT. We recorded single electrode data, including multi-unit activity (MUA) and local field potentials (LFP), from 618 sites in the inferior temporal cortex of macaque monkeys while the animals passively viewed 78 different pictures of complex stimuli. The LFPs were obtained by low-pass filtering the extracellular electrophysiological signal with a corner frequency of 300 Hz. As reported previously, we observed that spike counts from MUA showed selectivity for some of the pictures. Strikingly, the LFP data, which is thought to constitute an average over large numbers of neurons, also showed significantly selective responses. The LFP responses were less selective than the MUA responses both in terms of the proportion of selective sites as well as in the selectivity of each site. We observed that there was only little overlap between the selectivity of MUA and LFP recordings from the same electrode. To assess the spatial organization of selective responses, we compared the selectivity of nearby sites recorded along the same penetration and sites recorded from different penetrations. We observed that MUA selectivity was correlated on spatial scales up to 800 m while the LFP selectivity was correlated over a larger spatial extent, with significant correlations between sites separated by several mm. Our data support the idea that there is some topographical arrangement to the organization of selectivity in inferior temporal cortex and that this organization may be relevant for the representation of object identity in IT.


Author[s]: Charles Kemp, Thomas L. Griffiths and Joshua B. Tenenbaum

Discovering Latent Classes in Relational Data

July 22, 2004

We present a framework for learning abstract relational knowledge with the aim of explaining how people acquire intuitive theories of physical, biological, or social systems. Our approach is based on a generative relational model with latent classes, and simultaneously determines the kinds of entities that exist in a domain, the number of these latent classes, and the relations between classes that are possible or likely. This model goes beyond previous psychological models of category learning, which consider attributes associated with individual categories but not relationships between categories. We apply this domain-general framework to two specific problems: learning the structure of kinship systems and learning causal theories.


Author[s]: Ozlem Uzuner

Distribution Volume Tracking on Privacy-Enhanced Wireless Grid

July 25, 2004

In this paper, we discuss a wireless grid in which users are highly mobile, and form ad-hoc and sometimes short-lived connections with other devices. As they roam through networks, the users may choose to employ privacy-enhancing technologies to address their privacy needs and benefit from the computational power of the grid for a variety of tasks, including sharing content. The high rate of mobility of the users on the wireless grid, when combined with privacy enhancing mechanisms and ad-hoc connections, makes it difficult to conclusively link devices and/or individuals with network activities and to hold them liable for particular downloads. Protecting intellectual property in this scenario requires a solution that can work in absence of knowledge about behavior of particular individuals. Building on previous work, we argue for a solution that ensures proper compensation to content owners without inhibiting use and dissemination of works. Our proposal is based on digital tracking for measuring distribution volume of content and compensation of authors based on this accounting information. The emphasis is on obtaining good estimates of rate of popularity of works, without keeping track of activities of individuals or devices. The contribution of this paper is a revenue protection mechanism, Distribution Volume Tracking, that does not invade the privacy of users in the wireless grid and works even in the presence of privacy-enhancing technologies they may employ.



Author[s]: Thomas Serre and Maximilian Riesenhuber

Realistic Modeling of Simple and Complex Cell Tuning in the HMAX Model, and Implications for Invariant Object Recognition in Cortex

July 27, 2004

Riesenhuber \& Poggio recently proposed a model of object recognition in cortex which, beyond integrating general beliefs about the visual system in a quantitative framework, made testable predictions about visual processing. In particular, they showed that invariant object representation could be obtained with a selective pooling mechanism over properly chosen afferents through a {\sc max} operation: For instance, at the complex cells level, pooling over a group of simple cells at the same preferred orientation and position in space but at slightly different spatial frequency would provide scale tolerance, while pooling over a group of simple cells at the same preferred orientation and spatial frequency but at slightly different position in space would provide position tolerance. Indirect support for such mechanisms in the visual system come from the ability of the architecture at the top level to replicate shape tuning as well as shift and size invariance properties of ``view-tuned cells'' (VTUs) found in inferotemporal cortex (IT), the highest area in the ventral visual stream, thought to be crucial in mediating object recognition in cortex. There is also now good physiological evidence that a {\sc max} operation is performed at various levels along the ventral stream. However, in the original paper by Riesenhuber \& Poggio, tuning and pooling parameters of model units in early and intermediate areas were only qualitatively inspired by physiological data. In particular, many studies have investigated the tuning properties of simple and complex cells in primary visual cortex, V1. We show that units in the early levels of HMAX can be tuned to produce realistic simple and complex cell-like tuning, and that the earlier findings on the invariance properties of model VTUs still hold in this more realistic version of the model.


Author[s]: Tevfik Metin Sezgin and Randall Davis

Early Sketch Processing with Application in HMM Based Sketch Recognition

July 28, 2004

Freehand sketching is a natural and crucial part of everyday human interaction, yet is almost totally unsupported by current user interfaces. With the increasing availability of tablet notebooks and pen based PDAs, sketch based interaction has gained attention as a natural interaction modality. We are working to combine the flexibility and ease of use of paper and pencil with the processing power of a computer, to produce a user interface for design that feels as natural as paper, yet is considerably smarter. One of the most basic tasks in accomplishing this is converting the original digitized pen strokes in a sketch into the intended geometric objects. In this paper we describe an implemented system that combines multiple sources of knowledge to provide robust early processing for freehand sketching. We also show how this early processing system can be used as part of a fast sketch recognition system with polynomial time segmentation and recognition algorithms.


Author[s]: Mihai Badoiu, Piotr Indyk, Anastasios Sidiropoulos

A Constant-Factor Approximation Algorithm for Embedding Unweighted Graphs into Trees

July 5, 2004

We present a constant-factor approximation algorithm for computing an embedding of the shortest path metric of an unweighted graph into a tree, that minimizes the multiplicative distortion.


Author[s]: Piotr Indyk and David Woodruff

Optimal Approximations of the Frequency Moments

July 2, 2004

We give a one-pass, O~(m^{1-2/k})-space algorithm for estimating the k-th frequency moment of a data stream for any real k>2. Together with known lower bounds, this resolves the main problem left open by Alon, Matias, Szegedy, STOC'96. Our algorithm enables deletions as well as insertions of stream elements.


Author[s]: Antonio Torralba, Kevin P. Murphy, William T. Freeman

Contextual models for object detection using boosted random fields

June 25, 2004

We seek to both detect and segment objects in images. To exploit both local image data as well as contextual information, we introduce Boosted Random Fields (BRFs), which uses Boosting to learn the graph structure and local evidence of a conditional random field (CRF). The graph structure is learned by assembling graph fragments in an additive model. The connections between individual pixels are not very informative, but by using dense graphs, we can pool information from large regions of the image; dense models also support efficient inference. We show how contextual information from other objects can improve detection performance, both in terms of accuracy and speed, by using a computational cascade. We apply our system to detect stuff and things in office and street scenes.


Author[s]: Jaime Teevan

How People Re-find Information When the Web Changes

June 18, 2004

This paper investigates how people return to information in a dynamic information environment. For example, a person might want to return to Web content via a link encountered earlier on a Web page, only to learn that the link has since been removed. Changes can benefit users by providing new information, but they hinder returning to previously viewed information. The observational study presented here analyzed instances, collected via a Web search, where people expressed difficulty re-finding information because of changes to the information or its environment. A number of interesting observations arose from this analysis, including that the path originally taken to get to the information target appeared important in its re-retrieval, whereas, surprisingly, the temporal aspects of when the information was seen before were not. While people expressed frustration when problems arose, an explanation of why the change had occurred was often sufficient to allay that frustration, even in the absence of a solution. The implications of these observations for systems that support re-finding in dynamic environments are discussed.


Author[s]: Lilla Zollei, John Fisher, William Wells

A Unified Statistical and Information Theoretic Framework for Multi-modal Image Registration

April 28, 2004

We formulate and interpret several multi-modal registration methods in the context of a unified statistical and information theoretic framework. A unified interpretation clarifies the implicit assumptions of each method yielding a better understanding of their relative strengths and weaknesses. Additionally, we discuss a generative statistical model from which we derive a novel analysis tool, the "auto-information function", as a means of assessing and exploiting the common spatial dependencies inherent in multi-modal imagery. We analytically derive useful properties of the "auto-information" as well as verify them empirically on multi-modal imagery. Among the useful aspects of the "auto-information function" is that it can be computed from imaging modalities independently and it allows one to decompose the search space of registration problems.



Author[s]: Jerry Jun Yokono and Tomaso Poggio

Rotation Invariant Object Recognition from One Training Example

April 27, 2004

Local descriptors are increasingly used for the task of object recognition because of their perceived robustness with respect to occlusions and to global geometrical deformations. Such a descriptor--based on a set of oriented Gaussian derivative filters-- is used in our recognition system. We report here an evaluation of several techniques for orientation estimation to achieve rotation invariance of the descriptor. We also describe feature selection based on a single training image. Virtual images are generated by rotating and rescaling the image and robust features are selected. The results confirm robust performance in cluttered scenes, in the presence of partial occlusions, and when the object is embedded in different backgrounds.


Author[s]: Nathan Srebro

Learning with Matrix Factorizations

November 22, 2004

Matrices that can be factored into a product of two simpler matrices can serve as a useful and often natural model in the analysis of tabulated or high-dimensional data. Models based on matrix factorization (Factor Analysis, PCA) have been extensively used in statistical analysis and machine learning for over a century, with many new formulations and models suggested in recent years (Latent Semantic Indexing, Aspect Models, Probabilistic PCA, Exponential PCA, Non-Negative Matrix Factorization and others). In this thesis we address several issues related to learning with matrix factorizations: we study the asymptotic behavior and generalization ability of existing methods, suggest new optimization methods, and present a novel maximum-margin high-dimensional matrix factorization formulation.


Author[s]: Antonio Torralba

Contextual Influences on Saliency

April 14, 2004

This article describes a model for including scene/context priors in attention guidance. In the proposed scheme, visual context information can be available early in the visual processing chain, in order to modulate the saliency of image regions and to provide an efficient short cut for object detection and recognition. The scene is represented by means of a low-dimensional global description obtained from low-level features. The global scene features are then used to predict the probability of presence of the target object in the scene, and its location and scale, before exploring the image. Scene information can then be used to modulate the saliency of image regions early during the visual processing in order to provide an efficient short cut for object detection and recognition.


Author[s]: Justin Werfel

Neural Network Models for Zebra Finch Song Production and Reinforcement Learning

November 9, 2004

The zebra finch is a standard experimental system for studying learning and generation of temporally extended motor patterns. The first part of this project concerned the evaluation of simple models for the operation and structure of the network in the motor nucleus RA. A directed excitatory chain with a global inhibitory network, for which experimental evidence exists, was found to produce waves of activity similar to those observed in RA; this similarity included one particularly important feature of the measured activity, synchrony between the onset of bursting in one neuron and the offset of bursting in another. Other models, which were simpler and more analytically tractable, were also able to exhibit this feature, but not for parameter values quantitatively close to those observed. Another issue of interest concerns how these networks are initially learned by the bird during song acquisition. The second part of the project concerned the analysis of exemplars of REINFORCE algorithms, a general class of algorithms for reinforcement learning in neural networks, which are on several counts more biologically plausible than standard prescriptions such as backpropagation. The former compared favorably with backpropagation on tasks involving single input-output pairs, though a noise analysis suggested it should not perform so well. On tasks involving trajectory learning, REINFORCE algorithms meet with some success, though the analysis that predicts their success on input-output-pair tasks fails to explain it for trajectories.


Author[s]: Antonio Torralba, Kevin P. Murphy, William T. Freeman

Sharing visual features for multiclass and multiview object detection

April 14, 2004

We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (run-time) computational complexity, and the (training-time) sample complexity, scales linearly with the number of classes to be detected. It seems unlikely that such an approach will scale up to allow recognition of hundreds or thousands of objects. We present a multi-class boosting procedure (joint boosting) that reduces the computational and sample complexity, by finding common features that can be shared across the classes (and/or views). The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required, and therefore the computational cost, is observed to scale approximately logarithmically with the number of classes. The features selected jointly are closer to edges and generic features typical of many natural structures instead of finding specific object parts. Those generic features generalize better and reduce considerably the computational cost of an algorithm for multi-class object detection.


Author[s]: Lisa Tucker-Kellogg

Systematic Conformational Search with Constraint Satisfaction

October 1, 2004

Throughout biological, chemical, and pharmaceutical research, conformational searches are used to explore the possible three-dimensional configurations of molecules. This thesis describes a new systematic method for conformational search, including an application of the method to determining the structure of a peptide via solid-state NMR spectroscopy. A separate portion of the thesis is about protein-DNA binding, with a three-dimensional macromolecular structure determined by x-ray crystallography. The search method in this thesis enumerates all conformations of a molecule (at a given level of torsion angle resolution) that satisfy a set of local geometric constraints, such as constraints derived from NMR experiments. Systematic searches, historically used for small molecules, generally now use some form of divide-and-conquer for application to larger molecules. Our method can achieve a significant improvement in runtime by making some major and counter-intuitive modifications to traditional divide-and-conquer: (1) OmniMerge divides a polymer into many alternative pairs of subchains and searches all the pairs, instead of simply cutting in half and searching two subchains. Although the extra searches may appear wasteful, the bottleneck stage of the overall search, which is to re-connect the conformations of the largest subchains, can be greatly accelerated by the availability of alternative pairs of sidechains. (2) Propagation of disqualified conformations across overlapping subchains can disqualify infeasible conformations very rapidly, which further offsets the cost of searching the extra subchains of OmniMerge. (3) The search may be run in two stages, once at low-resolution using a side-effect of OmniMerge to determine an optimal partitioning of the molecule into efficient subchains; then again at high-resolution while making use of the precomputed subchains. (4) An A* function prioritizes each subchain based on estimated future search costs. Subchains with sufficiently low priority can be omitted from the search, which improves efficiency. A common theme of these four ideas is to make good choices about how to break the large search problem into lower-dimensional subproblems. In addition, the search method uses heuristic local searches within the overall systematic framework, to maintain the systematic guarantee while providing the empirical efficiency of stochastic search. These novel algorithms were implemented and the effectiveness of each innovation is demonstrated on a highly constrained peptide with 40 degrees of freedom.



Author[s]: Jerry Jun Yokono and Tomaso Poggio

Evaluation of sets of oriented and non-oriented receptive fields as local descriptors

March 24, 2004

Local descriptors are increasingly used for the task of object recognition because of their perceived robustness with respect to occlusions and to global geometrical deformations. We propose a performance criterion for a local descriptor based on the tradeoff between selectivity and invariance. In this paper, we evaluate several local descriptors with respect to selectivity and invariance. The descriptors that we evaluated are Gaussian derivatives up to the third order, gray image patches, and Laplacian-based descriptors with either three scales or one scale filters. We compare selectivity and invariance to several affine changes such as rotation, scale, brightness, and viewpoint. Comparisons have been made keeping the dimensionality of the descriptors roughly constant. The overall results indicate a good performance by the descriptor based on a set of oriented Gaussian filters. It is interesting that oriented receptive fields similar to the Gaussian derivatives as well as receptive fields similar to the Laplacian are found in primate visual cortex.


Author[s]: Artur Miguel Arsenio

Cognitive-Developmental Learning for a Humanoid Robot: A Caregiver's Gift

September 26, 2004

The goal of this work is to build a cognitive system for the humanoid robot, Cog, that exploits human caregivers as catalysts to perceive and learn about actions, objects, scenes, people, and the robot itself. This thesis addresses a broad spectrum of machine learning problems across several categorization levels. Actions by embodied agents are used to automatically generate training data for the learning mechanisms, so that the robot develops categorization autonomously. Taking inspiration from the human brain, a framework of algorithms and methodologies was implemented to emulate different cognitive capabilities on the humanoid robot Cog. This framework is effectively applied to a collection of AI, computer vision, and signal processing problems. Cognitive capabilities of the humanoid robot are developmentally created, starting from infant-like abilities for detecting, segmenting, and recognizing percepts over multiple sensing modalities. Human caregivers provide a helping hand for communicating such information to the robot. This is done by actions that create meaningful events (by changing the world in which the robot is situated) thus inducing the "compliant perception" of objects from these human-robot interactions. Self-exploration of the world extends the robot's knowledge concerning object properties. This thesis argues for enculturating humanoid robots using infant development as a metaphor for building a humanoid robot's cognitive abilities. A human caregiver redesigns a humanoid's brain by teaching the humanoid robot as she would teach a child, using children's learning aids such as books, drawing boards, or other cognitive artifacts. Multi-modal object properties are learned using these tools and inserted into several recognition schemes, which are then applied to developmentally acquire new object representations. The humanoid robot therefore sees the world through the caregiver's eyes. Building an artificial humanoid robot's brain, even at an infant's cognitive level, has been a long quest which still lies only in the realm of our imagination. Our efforts towards such a dimly imaginable task are developed according to two alternate and complementary views: cognitive and developmental.



Author[s]: Riesenhuber, Jarudi, Gilad, Sinha

Face processing in humans is compatible with a simple shape-based model of vision

March 5, 2004

Understanding how the human visual system recognizes objects is one of the key challenges in neuroscience. Inspired by a large body of physiological evidence (Felleman and Van Essen, 1991; Hubel and Wiesel, 1962; Livingstone and Hubel, 1988; Tso et al., 2001; Zeki, 1993), a general class of recognition models has emerged which is based on a hierarchical organization of visual processing, with succeeding stages being sensitive to image features of increasing complexity (Hummel and Biederman, 1992; Riesenhuber and Poggio, 1999; Selfridge, 1959). However, these models appear to be incompatible with some well-known psychophysical results. Prominent among these are experiments investigating recognition impairments caused by vertical inversion of images, especially those of faces. It has been reported that faces that differ “featurally” are much easier to distinguish when inverted than those that differ “configurally” (Freire et al., 2000; Le Grand et al., 2001; Mondloch et al., 2002) – a finding that is difficult to reconcile with the aforementioned models. Here we show that after controlling for subjects’ expectations, there is no difference between “featurally” and “configurally” transformed faces in terms of inversion effect. This result reinforces the plausibility of simple hierarchical models of object representation and recognition in cortex.


Author[s]: Howard Shrobe and Robert Laddaga

New Architectural Models for Visibly Controllable Computing: The Relevance of Dynamic Object Oriented Architectures and Plan Based Computing Models

February 9, 2004

Traditionally, we've focussed on the question of how to make a system easy to code the first time, or perhaps on how to ease the system's continued evolution. But if we look at life cycle costs, then we must conclude that the important question is how to make a system easy to operate. To do this we need to make it easy for the operators to see what's going on and to then manipulate the system so that it does what it is supposed to. This is a radically different criterion for success. What makes a computer system visible and controllable? This is a difficult question, but it's clear that today's modern operating systems with nearly 50 million source lines of code are neither. Strikingly, the MIT Lisp Machine and its commercial successors provided almost the same functionality as today's mainstream sytsems, but with only 1 Million lines of code. This paper is a retrospective examination of the features of the Lisp Machine hardware and software system. Our key claim is that by building the Object Abstraction into the lowest tiers of the system, great synergy and clarity were obtained. It is our hope that this is a lesson that can impact tomorrow's designs. We also speculate on how the spirit of the Lisp Machine could be extended to include a comprehensive access control model and how new layers of abstraction could further enrich this model.


Author[s]: Robert A. Hearn

Building Grounded Abstractions for Artificial Intelligence Programming

June 16, 2004

Most Artificial Intelligence (AI) work can be characterized as either ``high-level'' (e.g., logical, symbolic) or ``low-level'' (e.g., connectionist networks, behavior-based robotics). Each approach suffers from particular drawbacks. High-level AI uses abstractions that often have no relation to the way real, biological brains work. Low-level AI, on the other hand, tends to lack the powerful abstractions that are needed to express complex structures and relationships. I have tried to combine the best features of both approaches, by building a set of programming abstractions defined in terms of simple, biologically plausible components. At the ``ground level'', I define a primitive, perceptron-like computational unit. I then show how more abstract computational units may be implemented in terms of the primitive units, and show the utility of the abstract units in sample networks. The new units make it possible to build networks using concepts such as long-term memories, short-term memories, and frames. As a demonstration of these abstractions, I have implemented a simulator for ``creatures'' controlled by a network of abstract units. The creatures exist in a simple 2D world, and exhibit behaviors such as catching mobile prey and sorting colored blocks into matching boxes. This program demonstrates that it is possible to build systems that can interact effectively with a dynamic physical environment, yet use symbolic representations to control aspects of their behavior.



Author[s]: Robert Schneider and Maximilian Riesenhuber

On the difficulty of feature-based attentional modulations in visual object recognition: A modeling study.

January 14, 2004

Numerous psychophysical experiments have shown an important role for attentional modulations in vision. Behaviorally, allocation of attention can improve performance in object detection and recognition tasks. At the neural level, attention increases firing rates of neurons in visual cortex whose preferred stimulus is currently attended to. However, it is not yet known how these two phenomena are linked, i.e., how the visual system could be "tuned" in a task-dependent fashion to improve task performance. To answer this question, we performed simulations with the HMAX model of object recognition in cortex [45]. We modulated firing rates of model neurons in accordance with experimental results about effects of feature-based attention on single neurons and measured changes in the model's performance in a variety of object recognition tasks. It turned out that recognition performance could only be improved under very limited circumstances and that attentional influences on the process of object recognition per se tend to display a lack of specificity or raise false alarm rates. These observations lead us to postulate a new role for the observed attention-related neural response modulations.


Author[s]: Jonathan A. Goler

BioJADE: A Design and Simulation Tool for Synthetic Biological Systems

May 28, 2004

The next generations of both biological engineering and computer engineering demand that control be exerted at the molecular level. Creating, characterizing and controlling synthetic biological systems may provide us with the ability to build cells that are capable of a plethora of activities, from computation to synthesizing nanostructures. To develop these systems, we must have a set of tools not only for synthesizing systems, but also designing and simulating them. The BioJADE project provides a comprehensive, extensible design and simulation platform for synthetic biology. BioJADE is a graphical design tool built in Java, utilizing a database back end, and supports a range of simulations using an XML communication protocol. BioJADE currently supports a library of over 100 parts with which it can compile designs into actual DNA, and then generate synthesis instructions to build the physical parts. The BioJADE project contributes several tools to Synthetic Biology. BioJADE in itself is a powerful tool for synthetic biology designers. Additionally, we developed and now make use of a centralized BioBricks repository, which enables the sharing of BioBrick components between researchers, and vastly reduces the barriers to entry for aspiring Synthetic Biologists.


Author[s]: Kristen Grauman, Gregory Shakhnarovich, Trevor Darrell

Virtual Visual Hulls: Example-Based 3D Shape Estimation from a Single Silhouette

January 28, 2004

Recovering a volumetric model of a person, car, or other object of interest from a single snapshot would be useful for many computer graphics applications. 3D model estimation in general is hard, and currently requires active sensors, multiple views, or integration over time. For a known object class, however, 3D shape can be successfully inferred from a single snapshot. We present a method for generating a ``virtual visual hull''-- an estimate of the 3D shape of an object from a known class, given a single silhouette observed from an unknown viewpoint. For a given class, a large database of multi-view silhouette examples from calibrated, though possibly varied, camera rigs are collected. To infer a novel single view input silhouette's virtual visual hull, we search for 3D shapes in the database which are most consistent with the observed contour. The input is matched to component single views of the multi-view training examples. A set of viewpoint-aligned virtual views are generated from the visual hulls corresponding to these examples. The 3D shape estimate for the input is then found by interpolating between the contours of these aligned views. When the underlying shape is ambiguous given a single view silhouette, we produce multiple visual hull hypotheses; if a sequence of input images is available, a dynamic programming approach is applied to find the maximum likelihood path through the feasible hypotheses over time. We show results of our algorithm on real and synthetic images of people.


Author[s]: Jonathan Kennell

Generative Temporal Planning with Complex Processes

May 18, 2004

Autonomous vehicles are increasingly being used in mission-critical applications, and robust methods are needed for controlling these inherently unreliable and complex systems. This thesis advocates the use of model-based programming, which allows mission designers to program autonomous missions at the level of a coach or wing commander. To support such a system, this thesis presents the Spock generative planner. To generate plans, Spock must be able to piece together vehicle commands and team tactics that have a complex behavior represented by concurrent processes. This is in contrast to traditional planners, whose operators represent simple atomic or durative actions. Spock represents operators using the RMPL language, which describes behaviors using parallel and sequential compositions of state and activity episodes. RMPL is useful for controlling mobile autonomous missions because it allows mission designers to quickly encode expressive activity models using object-oriented design methods and an intuitive set of activity combinators. Spock also is significant in that it uniformly represents operators and plan-space processes in terms of Temporal Plan Networks, which support temporal flexibility for robust plan execution. Finally, Spock is implemented as a forward progression optimal planner that walks monotonically forward through plan processes, closing any open conditions and resolving any conflicts. This thesis describes the Spock algorithm in detail, along with example problems and test results.



Author[s]: Lior Wolf, Amnon Shashua, and Sayan Mukherjee

Selecting Relevant Genes with a Spectral Approach

January 27, 2004

Array technologies have made it possible to record simultaneously the expression pattern of thousands of genes. A fundamental problem in the analysis of gene expression data is the identification of highly relevant genes that either discriminate between phenotypic labels or are important with respect to the cellular process studied in the experiment: for example cell cycle or heat shock in yeast experiments, chemical or genetic perturbations of mammalian cell lines, and genes involved in class discovery for human tumors. In this paper we focus on the task of unsupervised gene selection. The problem of selecting a small subset of genes is particularly challenging as the datasets involved are typically characterized by a very small sample size — in the order of few tens of tissue samples — and by a very large feature space as the number of genes tend to be in the high thousands. We propose a model independent approach which scores candidate gene selections using spectral properties of the candidate affinity matrix. The algorithm is very straightforward to implement yet contains a number of remarkable properties which guarantee consistent sparse selections. To illustrate the value of our approach we applied our algorithm on five different datasets. The first consists of time course data from four well studied Hematopoietic cell lines (HL-60, Jurkat, NB4, and U937). The other four datasets include three well studied treatment outcomes (large cell lymphoma, childhood medulloblastomas, breast tumors) and one unpublished dataset (lymph status). We compared our approach both with other unsupervised methods (SOM,PCA,GS) and with supervised methods (SNR,RMB,RFE). The results clearly show that our approach considerably outperforms all the other unsupervised approaches in our study, is competitive with supervised methods and in some case even outperforms supervised approaches.


Author[s]: Oana L. Stamatoiu

Learning Commonsense Categorical Knowledge in a Thread Memory System

May 18, 2004

If we are to understand how we can build machines capable of broad purpose learning and reasoning, we must first aim to build systems that can represent, acquire, and reason about the kinds of commonsense knowledge that we humans have about the world. This endeavor suggests steps such as identifying the kinds of knowledge people commonly have about the world, constructing suitable knowledge representations, and exploring the mechanisms that people use to make judgments about the everyday world. In this work, I contribute to these goals by proposing an architecture for a system that can learn commonsense knowledge about the properties and behavior of objects in the world. The architecture described here augments previous machine learning systems in four ways: (1) it relies on a seven dimensional notion of context, built from information recently given to the system, to learn and reason about objects' properties; (2) it has multiple methods that it can use to reason about objects, so that when one method fails, it can fall back on others; (3) it illustrates the usefulness of reasoning about objects by thinking about their similarity to other, better known objects, and by inferring properties of objects from the categories that they belong to; and (4) it represents an attempt to build an autonomous learner and reasoner, that sets its own goals for learning about the world and deduces new facts by reflecting on its acquired knowledge. This thesis describes this architecture, as well as a first implementation, that can learn from sentences such as ``A blue bird flew to the tree'' and ``The small bird flew to the cage'' that birds can fly. One of the main contributions of this work lies in suggesting a further set of salient ideas about how we can build broader purpose commonsense artificial learners and reasoners.



Author[s]: Alexander Rakhlin, Dmitry Panchenko, Sayan Mukherjee

Risk Bounds for Mixture Density Estimation

January 27, 2004

In this paper we focus on the problem of estimating a bounded density using a finite combination of densities from a given class. We consider the Maximum Likelihood Procedure (MLE) and the greedy procedure described by Li and Barron. Approximation and estimation bounds are given for the above methods. We extend and improve upon the estimation results of Li and Barron, and in particular prove an $O(\frac{1}{\sqrt{n}})$ bound on the estimation error which does not depend on the number of densities in the estimated combination.


Author[s]: Jacob Beal and Seth Gilbert

RamboNodes for the Metropolitan Ad Hoc Network

December 17, 2003

We present an algorithm to store data robustly in a large, geographically distributed network by means of localized regions of data storage that move in response to changing conditions. For example, data might migrate away from failures or toward regions of high demand. The PersistentNode algorithm provides this service robustly, but with limited safety guarantees. We use the RAMBO framework to transform PersistentNode into RamboNode, an algorithm that guarantees atomic consistency in exchange for increased cost and decreased liveness. In addition, a half-life analysis of RamboNode shows that it is robust against continuous low-rate failures. Finally, we provide experimental simulations for the algorithm on 2000 nodes, demonstrating how it services requests and examining how it responds to failures.


Author[s]: Kristen Grauman and Trevor Darrell

Fast Contour Matching Using Approximate Earth Mover's Distance

December 5, 2003

Weighted graph matching is a good way to align a pair of shapes represented by a set of descriptive local features; the set of correspondences produced by the minimum cost of matching features from one shape to the features of the other often reveals how similar the two shapes are. However, due to the complexity of computing the exact minimum cost matching, previous algorithms could only run efficiently when using a limited number of features per shape, and could not scale to perform retrievals from large databases. We present a contour matching algorithm that quickly computes the minimum weight matching between sets of descriptive local features using a recently introduced low-distortion embedding of the Earth Mover's Distance (EMD) into a normed space. Given a novel embedded contour, the nearest neighbors in a database of embedded contours are retrieved in sublinear time via approximate nearest neighbors search. We demonstrate our shape matching method on databases of 10,000 images of human figures and 60,000 images of handwritten digits.


Author[s]: Yu-Han Chang, Tracey Ho, Leslie Pack Kaelbling

Mobilized ad-hoc networks: A reinforcement learning approach

December 4, 2003

Research in mobile ad-hoc networks has focused on situations in which nodes have no control over their movements. We investigate an important but overlooked domain in which nodes do have control over their movements. Reinforcement learning methods can be used to control both packet routing decisions and node mobility, dramatically improving the connectivity of the network. We first motivate the problem by presenting theoretical bounds for the connectivity improvement of partially mobile networks and then present superior empirical results under a variety of different scenarios in which the mobile nodes in our ad-hoc network are embedded with adaptive routing policies and learned movement policies.



Author[s]: Christian Morgenstern, Bernd Heisele

Component based recognition of objects in an office environment

November 28, 2003

We present a component-based approach for recognizing objects under large pose changes. From a set of training images of a given object we extract a large number of components which are clustered based on the similarity of their image features and their locations within the object image. The cluster centers build an initial set of component templates from which we select a subset for the final recognizer. In experiments we evaluate different sizes and types of components and three standard techniques for component selection. The component classifiers are finally compared to global classifiers on a database of four objects.


Author[s]: Jacob Eisenstein

Evolving Robocode Tank Fighters

October 28, 2003

In this paper, I describe the application of genetic programming to evolve a controller for a robotic tank in a simulated environment. The purpose is to explore how genetic techniques can best be applied to produce controllers based on subsumption and behavior oriented languages such as REX. As part of my implementation, I developed TableRex, a modification of REX that can be expressed on a fixed-length genome. Using a fixed subsumption architecture of TableRex modules, I evolved robots that beat some of the most competitive hand-coded adversaries.


Author[s]: Michael G. Ross and Leslie Pack Kaelbling

Learning object segmentation from video data

September 8, 2003

This memo describes the initial results of a project to create a self-supervised algorithm for learning object segmentation from video data. Developmental psychology and computational experience have demonstrated that the motion segmentation of objects is a simpler, more primitive process than the detection of object boundaries by static image cues. Therefore, motion information provides a plausible supervision signal for learning the static boundary detection task and for evaluating performance on a test set. A video camera and previously developed background subtraction algorithms can automatically produce a large database of motion-segmented images for minimal cost. The purpose of this work is to use the information in such a database to learn how to detect the object boundaries in novel images using static information, such as color, texture, and shape. This work was funded in part by the Office of Naval Research contract #N00014-00-1-0298, in part by the Singapore-MIT Alliance agreement of 11/6/98, and in part by a National Science Foundation Graduate Student Fellowship.



Author[s]: Minjoon Kouh and Maximilian Riesenhuber

Investigating shape representation in area V4 with HMAX: Orientation and Grating selectivities

September 8, 2003

The question of how shape is represented is of central interest to understanding visual processing in cortex. While tuning properties of the cells in early part of the ventral visual stream, thought to be responsible for object recognition in the primate, are comparatively well understood, several different theories have been proposed regarding tuning in higher visual areas, such as V4. We used the model of object recognition in cortex presented by Riesenhuber and Poggio (1999), where more complex shape tuning in higher layers is the result of combining afferent inputs tuned to simpler features, and compared the tuning properties of model units in intermediate layers to those of V4 neurons from the literature. In particular, we investigated the issue of shape representation in visual area V1 and V4 using oriented bars and various types of gratings (polar, hyperbolic, and Cartesian), as used in several physiology experiments. Our computational model was able to reproduce several physiological findings, such as the broadening distribution of the orientation bandwidths and the emergence of a bias toward non-Cartesian stimuli. Interestingly, the simulation results suggest that some V4 neurons receive input from afferents with spatially separated receptive fields, leading to experimentally testable predictions. However, the simulations also show that the stimulus set of Cartesian and non-Cartesian gratings is not sufficiently complex to probe shape tuning in higher areas, necessitating the use of more complex stimulus sets.



Author[s]: Hiroaki Shimizu and Tomaso Poggio

Direction Estimation of Pedestrian from Images

August 27, 2003

The capability of estimating the walking direction of people would be useful in many applications such as those involving autonomous cars and robots. We introduce an approach for estimating the walking direction of people from images, based on learning the correct classification of a still image by using SVMs. We find that the performance of the system can be improved by classifying each image of a walking sequence and combining the outputs of the classifier. Experiments were performed to evaluate our system and estimate the trade-off between number of images in walking sequences and performance.


Author[s]: Sayan Mukherjee, Polina Golland and Dmitry Panchenko

Permutation Tests for Classification

August 28, 2003

We introduce and explore an approach to estimating statistical significance of classification accuracy, which is particularly useful in scientific applications of machine learning where high dimensionality of the data and the small number of training examples render most standard convergence bounds too loose to yield a meaningful guarantee of the generalization ability of the classifier. Instead, we estimate statistical significance of the observed classification accuracy, or the likelihood of observing such accuracy by chance due to spurious correlations of the high-dimensional data patterns with the class labels in the given training set. We adopt permutation testing, a non-parametric technique previously developed in classical statistics for hypothesis testing in the generative setting (i.e., comparing two probability distributions). We demonstrate the method on real examples from neuroimaging studies and DNA microarray analysis and suggest a theoretical analysis of the procedure that relates the asymptotic behavior of the test to the existing convergence bounds.



Author[s]: Benjamin J. Balas, Pawan Sinha

Dissociated Dipoles: Image representation via non-local comparisons

August 13, 2003

A fundamental question in visual neuroscience is how to represent image structure. The most common representational schemes rely on differential operators that compare adjacent image regions. While well-suited to encoding local relationships, such operators have significant drawbacks. Specifically, each filter’s span is confounded with the size of its sub-fields, making it difficult to compare small regions across large distances. We find that such long-distance comparisons are more tolerant to common image transformations than purely local ones, suggesting they may provide a useful vocabulary for image encoding. . We introduce the “Dissociated Dipole,” or “Sticks” operator, for encoding non-local image relationships. This operator de-couples filter span from sub-field size, enabling parametric movement between edge and region-based representation modes. We report on the perceptual plausibility of the operator, and the computational advantages of non-local encoding. Our results suggest that non-local encoding may be an effective scheme for representing image structure.


Author[s]: Austin Che

Fluorescence Assay for Polymerase Arrival Rates

August 31, 2003

To engineer complex synthetic biological systems will require modular design, assembly, and characterization strategies. The RNA polymerase arrival rate (PAR) is defined to be the rate that RNA polymerases arrive at a specified location on the DNA. Designing and characterizing biological modules in terms of RNA polymerase arrival rates provides for many advantages in the construction and modeling of biological systems. PARMESAN is an in vitro method for measuring polymerase arrival rates using pyrrolo-dC, a fluorescent DNA base that can substitute for cytosine. Pyrrolo-dC shows a detectable fluorescence difference when in single-stranded versus double-stranded DNA. During transcription, RNA polymerase separates the two strands of DNA, leading to a change in the fluorescence of pyrrolo-dC. By incorporating pyrrolo-dC at specific locations in the DNA, fluorescence changes can be taken as a direct measurement of the polymerase arrival rate.


Author[s]: Jacob Beal

Near-Optimal Distributed Failure Circumscription

August 11, 2003

Small failures should only disrupt a small part of a network. One way to do this is by marking the surrounding area as untrustworthy --- circumscribing the failure. This can be done with a distributed algorithm using hierarchical clustering and neighbor relations, and the resulting circumscription is near-optimal for convex failures.


Author[s]: Krzysztof Gajos and Howard Shrobe

Delegation, Arbitration and High-Level Service Discovery as Key Elements of a Software Infrastructure for Pervasive Computing

June 2003

The dream of pervasive computing is slowly becoming a reality. A number of projects around the world are constantly contributing ideas and solutions that are bound to change the way we interact with our environments and with one another. An essential component of the future is a software infrastructure that is capable of supporting interactions on scales ranging from a single physical space to intercontinental collaborations. Such infrastructure must help applications adapt to very diverse environments and must protect people’s privacy and respect their personal preferences. In this paper we indicate a number of limitations present in the software infrastructures proposed so far (including our previous work). We then describe the framework for building an infrastructure that satisfies the abovementioned criteria. This framework hinges on the concepts of delegation, arbitration and high-level service discovery. Components of our own implementation of such an infrastructure are presented.


Author[s]: Pedro F. Felzenszwalb

Representation and Detection of Shapes in Images

August 8, 2003

We present a set of techniques that can be used to represent and detect shapes in images. Our methods revolve around a particular shape representation based on the description of objects using triangulated polygons. This representation is similar to the medial axis transform and has important properties from a computational perspective. The first problem we consider is the detection of non-rigid objects in images using deformable models. We present an efficient algorithm to solve this problem in a wide range of situations, and show examples in both natural and medical images. We also consider the problem of learning an accurate non-rigid shape model for a class of objects from examples. We show how to learn good models while constraining them to the form required by the detection algorithm. Finally, we consider the problem of low-level image segmentation and grouping. We describe a stochastic grammar that generates arbitrary triangulated polygons while capturing Gestalt principles of shape regularity. This grammar is used as a prior model over random shapes in a low level algorithm that detects objects in images.


Author[s]: Kimberle Koile, Konrad Tollmar, David Demirdjian, Howard Shrobe and Trevor Darrell

Activity Zones for Context-Aware Computing

June 10, 2003

Location is a primary cue in many context-aware computing systems, and is often represented as a global coordinate, room number, or Euclidean distance various landmarks. A user?s concept of location, however, is often defined in terms of regions in which common activities occur. We show how to partition a space into such regions based on patterns of observed user location and motion. These regions, which we call activity zones, represent regions of similar user activity, and can be used to trigger application actions, retrieve information based on previous context, and present information to users. We suggest that context- aware applications can benefit from a location representation learned from observing users. We describe an implementation of our system and present two example applications whose behavior is controlled by users? entry, exit, and presence in the zones.


Author[s]: Samson Timoner

Compact Representations for Fast Nonrigid Registration of Medical Images

July 4, 2003

We develop efficient techniques for the non-rigid registration of medical images by using representations that adapt to the anatomy found in such images. Images of anatomical structures typically have uniform intensity interiors and smooth boundaries. We create methods to represent such regions compactly using tetrahedra. Unlike voxel-based representations, tetrahedra can accurately describe the expected smooth surfaces of medical objects. Furthermore, the interior of such objects can be represented using a small number of tetrahedra. Rather than describing a medical object using tens of thousands of voxels, our representations generally contain only a few thousand elements. Tetrahedra facilitate the creation of efficient non-rigid registration algorithms based on finite element methods (FEM). We create a fast, FEM-based method to non-rigidly register segmented anatomical structures from two subjects. Using our compact tetrahedral representations, this method generally requires less than one minute of processing time on a desktop PC. We also create a novel method for the non-rigid registration of gray scale images. To facilitate a fast method, we create a tetrahedral representation of a displacement field that automatically adapts to both the anatomy in an image and to the displacement field. The resulting algorithm has a computational cost that is dominated by the number of nodes in the mesh (about 10,000), rather than the number of voxels in an image (nearly 10,000,000). For many non-rigid registration problems, we can find a transformation from one image to another in five minutes. This speed is important as it allows use of the algorithm during surgery. We apply our algorithms to find correlations between the shape of anatomical structures and the presence of schizophrenia. We show that a study based on our representations outperforms studies based on other representations. We also use the results of our non-rigid registration algorithm as the basis of a segmentation algorithm. That algorithm also outperforms other methods in our tests, producing smoother segmentations and more accurately reproducing manual segmentations.


Author[s]: Martin C. Martin

The Essential Dynamics Algorithm: Essential Results

May 2003

This paper presents a novel algorithm for learning in a class of stochastic Markov decision processes (MDPs) with continuous state and action spaces that trades speed for accuracy. A transform of the stochastic MDP into a deterministic one is presented which captures the essence of the original dynamics, in a sense made precise. In this transformed MDP, the calculation of values is greatly simplified. The online algorithm estimates the model of the transformed MDP and simultaneously does policy search against it. Bounds on the error of this approximation are proven, and experimental results in a bicycle riding domain are presented. The algorithm learns near optimal policies in orders of magnitude fewer interactions with the stochastic MDP, using less domain knowledge. All code used in the experiments is available on the project’s web site.


Author[s]: Lily Lee

Gait Analysis for Classification

June 26, 2003

This thesis describes a representation of gait appearance for the purpose of person identification and classification. This gait representation is based on simple localized image features such as moments extracted from orthogonal view video silhouettes of human walking motion. A suite of time-integration methods, spanning a range of coarseness of time aggregation and modeling of feature distributions, are applied to these image features to create a suite of gait sequence representations. Despite their simplicity, the resulting feature vectors contain enough information to perform well on human identification and gender classification tasks. We demonstrate the accuracy of recognition on gait video sequences collected over different days and times and under varying lighting environments. Each of the integration methods are investigated for their advantages and disadvantages. An improved gait representation is built based on our experiences with the initial set of gait representations. In addition, we show gender classification results using our gait appearance features, the effect of our heuristic feature selection method, and the significance of individual features.


Author[s]: Lawrence Shih and David Karger

Learning Classes Correlated to a Hierarchy

May 2003

Trees are a common way of organizing large amounts of information by placing items with similar characteristics near one another in the tree. We introduce a classification problem where a given tree structure gives us information on the best way to label nearby elements. We suggest there are many practical problems that fall under this domain. We propose a way to map the classification problem onto a standard Bayesian inference problem. We also give a fast, specialized inference algorithm that incrementally updates relevant probabilities. We apply this algorithm to web-classification problems and show that our algorithm empirically works well.


Author[s]: Matthew J. Marjanovic

Teaching an Old Robot New Tricks: Learning Novel Tasks via Interaction with People and Things

June 20, 2003

As AI has begun to reach out beyond its symbolic, objectivist roots into the embodied, experientialist realm, many projects are exploring different aspects of creating machines which interact with and respond to the world as humans do. Techniques for visual processing, object recognition, emotional response, gesture production and recognition, etc., are necessary components of a complete humanoid robot. However, most projects invariably concentrate on developing a few of these individual components, neglecting the issue of how all of these pieces would eventually fit together. The focus of the work in this dissertation is on creating a framework into which such specific competencies can be embedded, in a way that they can interact with each other and build layers of new functionality. To be of any practical value, such a framework must satisfy the real-world constraints of functioning in real-time with noisy sensors and actuators. The humanoid robot Cog provides an unapologetically adequate platform from which to take on such a challenge. This work makes three contributions to embodied AI. First, it offers a general-purpose architecture for developing behavior-based systems distributed over networks of PC's. Second, it provides a motor-control system that simulates several biological features which impact the development of motor behavior. Third, it develops a framework for a system which enables a robot to learn new behaviors via interacting with itself and the outside world. A few basic functional modules are built into this framework, enough to demonstrate the robot learning some very simple behaviors taught by a human trainer. A primary motivation for this project is the notion that it is practically impossible to build an "intelligent" machine unless it is designed partly to build itself. This work is a proof-of-concept of such an approach to integrating multiple perceptual and motor systems into a complete learning agent.


Author[s]: Andreas F. Wehowsky

Safe Distributed Coordination of Heterogeneous Robots through Dynamic Simple Temporal Networks

May 30, 2003

Research on autonomous intelligent systems has focused on how robots can robustly carry out missions in uncertain and harsh environments with very little or no human intervention. Robotic execution languages such as RAPs, ESL, and TDL improve robustness by managing functionally redundant procedures for achieving goals. The model-based programming approach extends this by guaranteeing correctness of execution through pre-planning of non-deterministic timed threads of activities. Executing model-based programs effectively on distributed autonomous platforms requires distributing this pre-planning process. This thesis presents a distributed planner for modelbased programs whose planning and execution is distributed among agents with widely varying levels of processor power and memory resources. We make two key contributions. First, we reformulate a model-based program, which describes cooperative activities, into a hierarchical dynamic simple temporal network. This enables efficient distributed coordination of robots and supports deployment on heterogeneous robots. Second, we introduce a distributed temporal planner, called DTP, which solves hierarchical dynamic simple temporal networks with the assistance of the distributed Bellman- Ford shortest path algorithm. The implementation of DTP has been demonstrated successfully on a wide range of randomly generated examples and on a pursuer-evader challenge problem in simulation.


Author[s]: Jacob Beal

A Robust Amorphous Hierarchy from Persistent Nodes

May 1, 2003

For a very large network deployed in space with only nearby nodes able to talk to each other, we want to do tasks like robust routing and data storage. One way to organize the network is via a hierarchy, but hierarchies often have a few critical nodes whose death can disrupt organization over long distances. I address this with a system of distributed aggregates called Persistent Nodes, such that spatially local failures disrupt the hierarchy in an area proportional to the diameter of the failure. I describe and analyze this system, which has been implemented in simulation.


Author[s]: Claire Monteleoni

Online Learning of Non-stationary Sequences

June 12, 2003

We consider an online learning scenario in which the learner can make predictions on the basis of a fixed set of experts. The performance of each expert may change over time in a manner unknown to the learner. We formulate a class of universal learning algorithms for this problem by expressing them as simple Bayesian algorithms operating on models analogous to Hidden Markov Models (HMMs). We derive a new performance bound for such algorithms which is considerably simpler than existing bounds. The bound provides the basis for learning the rate at which the identity of the optimal expert switches over time. We find an analytic expression for the a priori resolution at which we need to learn the rate parameter. We extend our scalar switching-rate result to models of the switching-rate that are governed by a matrix of parameters, i.e. arbitrary homogeneous HMMs. We apply and examine our algorithm in the context of the problem of energy management in wireless networks. We analyze the new results in the framework of Information Theory.


Author[s]: Jacob Beal

Persistent Nodes for Reliable Memory in Geographically Local Networks

April 15, 2003

A Persistent Node is a redundant distributed mechanism for storing a key/value pair reliably in a geographically local network. In this paper, I develop a method of establishing Persistent Nodes in an amorphous matrix. I address issues of construction, usage, atomicity guarantees and reliability in the face of stopping failures. Applications include routing, congestion control, and data storage in gigascale networks.



Author[s]: Ezra Rosen

Face Representation in Cortex: Studies Using a Simple and Not So Special Model

June 5, 2003

The face inversion effect has been widely documented as an effect of the uniqueness of face processing. Using a computational model, we show that the face inversion effect is a byproduct of expertise with respect to the face object class. In simulations using HMAX, a hierarchical, shape based model, we show that the magnitude of the inversion effect is a function of the specificity of the representation. Using many, sharply tuned units, an ``expert'' has a large inversion effect. On the other hand, if fewer, broadly tuned units are used, the expertise is lost, and this ``novice'' has a small inversion effect. As the size of the inversion effect is a product of the representation, not the object class, given the right training we can create experts and novices in any object class. Using the same representations as with faces, we create experts and novices for cars. We also measure the feasibility of a view-based model for recognition of rotated objects using HMAX. Using faces, we show that transfer of learning to novel views is possible. Given only one training view, the view-based model can recognize a face at a new orientation via interpolation from the views to which it had been tuned. Although the model can generalize well to upright faces, inverted faces yield poor performance because the features change differently under rotation.


Author[s]: Chris Mario Christoudias, Louis-Philippe Morency and Trevor Darrell

Light Field Morphable Models

April 18, 2003

Statistical shape and texture appearance models are powerful image representations, but previously had been restricted to 2D or simple 3D shapes. In this paper we present a novel 3D morphable model based on image-based rendering techniques, which can represent complex lighting conditions, structures, and surfaces. We describe how to construct a manifold of the multi-view appearance of an object class using light fields and show how to match a 2D image of an object to a point on this manifold. In turn we use the reconstructed light field to render novel views of the object. Our technique overcomes the limitations of polygon based appearance models and uses light fields that are acquired in real-time.



Author[s]: Jennifer Louie

A Biological Model of Object Recognition with Feature Learning

June 2003

Previous biological models of object recognition in cortex have been evaluated using idealized scenes and have hard-coded features, such as the HMAX model by Riesenhuber and Poggio [10]. Because HMAX uses the same set of features for all object classes, it does not perform well in the task of detecting a target object in clutter. This thesis presents a new model that integrates learning of object-specific features with the HMAX. The new model performs better than the standard HMAX and comparably to a computer vision system on face detection. Results from experimenting with unsupervised learning of features and the use of a biologically-plausible classifier are presented.


Author[s]: Gregory Shakhnarovich, Paul Viola and Trevor Darrell

Fast Pose Estimation with Parameter Sensitive Hashing

April 18, 2003

Example-based methods are effective for parameter estimation problems when the underlying system is simple or the dimensionality of the input is low. For complex and high-dimensional problems such as pose estimation, the number of required examples and the computational complexity rapidly becme prohibitively high. We introduce a new algorithm that learns a set of hashing functions that efficiently index examples relevant to a particular estimation task. Our algorithm extends a recently developed method for locality-sensitive hashing, which finds approximate neighbors in time sublinear in the number of examples. This method depends critically on the choice of hash functions; we show how to find the set of hash functions that are optimally relevant to a particular estimation problem. Experiments demonstrate that the resulting algorithm, which we call Parameter-Sensitive Hashing, can rapidly and accurately estimate the articulated pose of human figures from a large database of example images.


Author[s]: Paul Fitzpatrick

From First Contact to Close Encounters: A Developmentally Deep Perceptual System for a Humanoid Robot

June 2003

This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that a robotic platform can build up and maintain a system for object localization, segmentation, and recognition, starting from very little. What the robot starts with is a direct solution to achieving figure/ground separation: it simply 'pokes around' in a region of visual ambiguity and watches what happens. If the arm passes through an area, that area is recognized as free space. If the arm collides with an object, causing it to move, the robot can use that motion to segment the object from the background. Once the robot can acquire reliable segmented views of objects, it learns from them, and from then on recognizes and segments those objects without further contact. Both low-level and high-level visual features can also be learned in this way, and examples are presented for both: orientation detection and affordance recognition, respectively. The motivation for this work is simple. Training on large corpora of annotated real-world data has proven crucial for creating robust solutions to perceptual problems such as speech recognition and face detection. But the powerful tools used during training of such systems are typically stripped away at deployment. Ideally they should remain, particularly for unstable tasks such as object detection, where the set of objects needed in a task tomorrow might be different from the set of objects needed today. The key limiting factor is access to training data, but as this thesis shows, that need not be a problem on a robotic platform that can actively probe its environment, and carry out experiments to resolve ambiguity. This work is an instance of a general approach to learning a new perceptual judgment: find special situations in which the perceptual judgment is easy and study these situations to find correlated features that can be observed more generally.


Author[s]: Kristen Grauman, Gregory Shakhnarovich and Trevor Darrell

Inferring 3D Structure with a Statistical Image-Based Shape Model

April 17, 2003

We present an image-based approach to infer 3D structure parameters using a probabilistic "shape+structure'' model. The 3D shape of a class of objects may be represented by sets of contours from silhouette views simultaneously observed from multiple calibrated cameras. Bayesian reconstructions of new shapes can then be estimated using a prior density constructed with a mixture model and probabilistic principal components analysis. We augment the shape model to incorporate structural features of interest; novel examples with missing structure parameters may then be reconstructed to obtain estimates of these parameters. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction. Our shape model enables accurate estimation of structure despite segmentation errors or missing views in the input silhouettes, and works even with only a single input view. Using a dataset of thousands of pedestrian images generated from a synthetic model, we can perform accurate inference of the 3D locations of 19 joints on the body based on observed silhouette contours from real images.


Author[s]: Kristen Grauman

A Statistical Image-Based Shape Model for Visual Hull Reconstruction and 3D Structure Inference

May 22, 2003

We present a statistical image-based “shape + structure” model for Bayesian visual hull reconstruction and 3D structure inference. The 3D shape of a class of objects is represented by sets of contours from silhouette views simultaneously observed from multiple calibrated cameras. Bayesian reconstructions of new shapes are then estimated using a prior density constructed with a mixture model and probabilistic principal components analysis. We show how the use of a class-specific prior in a visual hull reconstruction can reduce the effect of segmentation errors from the silhouette extraction process. The proposed method is applied to a data set of pedestrian images, and improvements in the approximate 3D models under various noise conditions are shown. We further augment the shape model to incorporate structural features of interest; unknown structural parameters for a novel set of contours are then inferred via the Bayesian reconstruction process. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction. Our shape model enables accurate estimation of structure despite segmentation errors or missing views in the input silhouettes, and works even with only a single input view. Using a data set of thousands of pedestrian images generated from a synthetic model, we can accurately infer the 3D locations of 19 joints on the body based on observed silhouette contours from real images.


Author[s]: Jacob Beal

Leveraging Learning and Language Via Communication Bootstrapping

March 17, 2003

In a Communication Bootstrapping system, peer components with different perceptual worlds invent symbols and syntax based on correlations between their percepts. I propose that Communication Bootstrapping can also be used to acquire functional definitions of words and causal reasoning knowledge. I illustrate this point with several examples, then sketch the architecture of a system in progress which attempts to execute this task.


Author[s]: Louis-Philippe Morency

Stereo-Based Head Pose Tracking Using Iterative Closest Point and Normal Flow Constraint

May 2003

In this text, we present two stereo-based head tracking techniques along with a fast 3D model acquisition system. The first tracking technique is a robust implementation of stereo-based head tracking designed for interactive environments with uncontrolled lighting. We integrate fast face detection and drift reduction algorithms with a gradient-based stereo rigid motion tracking technique. Our system can automatically segment and track a user's head under large rotation and illumination variations. Precision and usability of this approach are compared with previous tracking methods for cursor control and target selection in both desktop and interactive room environments. The second tracking technique is designed to improve the robustness of head pose tracking for fast movements. Our iterative hybrid tracker combines constraints from the ICP (Iterative Closest Point) algorithm and normal flow constraint. This new technique is more precise for small movements and noisy depth than ICP alone, and more robust for large movements than the normal flow constraint alone. We present experiments which test the accuracy of our approach on sequences of real and synthetic stereo images. The 3D model acquisition system we present quickly aligns intensity and depth images, and reconstructs a textured 3D mesh. 3D views are registered with shape alignment based on our iterative hybrid tracker. We reconstruct the 3D model using a new Cubic Ray Projection merging algorithm which takes advantage of a novel data structure: the linked voxel space. We present experiments to test the accuracy of our approach on 3D face modelling using real-time stereo images.


Author[s]: Christine Alvarado, Jaime Teevan, Mark S. Ackerman and David Karger

Surviving the Information Explosion: How People Find Their Electronic Information

April 15, 2003

We report on a study of how people look for information within email, files, and the Web. When locating a document or searching for a specific answer, people relied on their contextual knowledge of their information target to help them find it, often associating the target with a specific document. They appeared to prefer to use this contextual information as a guide in navigating locally in small steps to the desired document rather than directly jumping to their target. We found this behavior was especially true for people with unstructured information organization. We discuss the implications of our findings for the design of personal information management tools.


Author[s]: Antonio Torralba, Kevin P. Murphy, William T. Freeman and Mark A. Rubin

Context-Based Vision System for Place and Object Recognition

March 19, 2003

While navigating in an environment, a vision system has to be able to recognize where it is and what the main objects in the scene are. In this paper we present a context-based vision system for place and object recognition. The goal is to identify familiar locations (e.g., office 610, conference room 941, Main Street), to categorize new environments (office, corridor, street) and to use that information to provide contextual priors for object recognition (e.g., table, chair, car, computer). We present a low- dimensional global image representation that provides relevant information for place recognition and categorization, and how such contextual information introduces strong priors that simplify object recognition. We have trained the system to recognize over 60 locations (indoors and outdoors) and to suggest the presence and locations of more than 20 different object types. The algorithm has been integrated into a mobile system that provides real-time feedback to the user.



Author[s]: Sanmay Das

Intelligent Market-Making in Artificial Financial Markets

June 2003

This thesis describes and evaluates a market-making algorithm for setting prices in financial markets with asymmetric information, and analyzes the properties of artificial markets in which the algorithm is used. The core of our algorithm is a technique for maintaining an online probability density estimate of the underlying value of a stock. Previous theoretical work on market-making has led to price-setting equations for which solutions cannot be achieved in practice, whereas empirical work on algorithms for market-making has focused on sets of heuristics and rules that lack theoretical justification. The algorithm presented in this thesis is theoretically justified by results in finance, and at the same time flexible enough to be easily extended by incorporating modules for dealing with considerations like portfolio risk and competition from other market-makers. We analyze the performance of our algorithm experimentally in artificial markets with different parameter settings and find that many reasonable real-world properties emerge. For example, the spread increases in response to uncertainty about the true value of a stock, average spreads tend to be higher in more volatile markets, and market-makers with lower average spreads perform better in environments with multiple competitive market- makers. In addition, the time series data generated by simple markets populated with market-makers using our algorithm replicate properties of real-world financial time series, such as volatility clustering and the fat-tailed nature of return distributions, without the need to specify explicit models for opinion propagation and herd behavior in the trading crowd.



Author[s]: Izzat N. Jarudi and Pawan Sinha

Relative Contributions of Internal and External Features to Face Recognition

March 2003

The central challenge in face recognition lies in understanding the role different facial features play in our judgments of identity. Notable in this regard are the relative contributions of the internal (eyes, nose and mouth) and external (hair and jaw-line) features. Past studies that have investigated this issue have typically used high-resolution images or good-quality line drawings as facial stimuli. The results obtained are therefore most relevant for understanding the identification of faces at close range. However, given that real-world viewing conditions are rarely optimal, it is also important to know how image degradations, such as loss of resolution caused by large viewing distances, influence our ability to use internal and external features. Here, we report experiments designed to address this issue. Our data characterize how the relative contributions of internal and external features change as a function of image resolution. While we replicated results of previous studies that have shown internal features of familiar faces to be more useful for recognition than external features at high resolution, we found that the two feature sets reverse in importance as resolution decreases. These results suggest that the visual system uses a highly non-linear cue-fusion strategy in combining internal and external features along the dimension of image resolution and that the configural cues that relate the two feature sets play an important role in judgments of facial identity.


Author[s]: Aaron D. Adler

Segmentation and Alignment of Speech and Sketching in a Design Environment

February 2003

Sketches are commonly used in the early stages of design. Our previous system allows users to sketch mechanical systems that the computer interprets. However, some parts of the mechanical system might be too hard or too complicated to express in the sketch. Adding speech recognition to create a multimodal system would move us toward our goal of creating a more natural user interface. This thesis examines the relationship between the verbal and sketch input, particularly how to segment and align the two inputs. Toward this end, subjects were recorded while they sketched and talked. These recordings were transcribed, and a set of rules to perform segmentation and alignment was created. These rules represent the knowledge that the computer needs to perform segmentation and alignment. The rules successfully interpreted the 24 data sets that they were given.



Author[s]: Gadi Geiger, Tony Ezzat and Tomaso Poggio

Perceptual Evaluation of Video-Realistic Speech

February 28, 2003

abstract With many visual speech animation techniques now available, there is a clear need for systematic perceptual evaluation schemes. We describe here our scheme and its application to a new video-realistic (potentially indistinguishable from real recorded video) visual-speech animation system, called Mary 101. Two types of experiments were performed: a) distinguishing visually between real and synthetic image- sequences of the same utterances, ("Turing tests") and b) gauging visual speech recognition by comparing lip-reading performance of the real and synthetic image-sequences of the same utterances ("Intelligibility tests"). Subjects that were presented randomly with either real or synthetic image-sequences could not tell the synthetic from the real sequences above chance level. The same subjects when asked to lip-read the utterances from the same image-sequences recognized speech from real image-sequences significantly better than from synthetic ones. However, performance for both, real and synthetic, were at levels suggested in the literature on lip-reading. We conclude from the two experiments that the animation of Mary 101 is adequate for providing a percept of a talking head. However, additional effort is required to improve the animation for lip-reading purposes like rehabilitation and language learning. In addition, these two tasks could be considered as explicit and implicit perceptual discrimination tasks. In the explicit task (a), each stimulus is classified directly as a synthetic or real image-sequence by detecting a possible difference between the synthetic and the real image-sequences. The implicit perceptual discrimination task (b) consists of a comparison between visual recognition of speech of real and synthetic image-sequences. Our results suggest that implicit perceptual discrimination is a more sensitive method for discrimination between synthetic and real image-sequences than explicit perceptual discrimination.


Author[s]: Leonid Peshkin

Reinforcement Learning by Policy Search

February 14, 2003

One objective of artificial intelligence is to model the behavior of an intelligent agent interacting with its environment. The environment's transformations can be modeled as a Markov chain, whose state is partially observable to the agent and affected by its actions; such processes are known as partially observable Markov decision processes (POMDPs). While the environment's dynamics are assumed to obey certain rules, the agent does not know them and must learn. In this dissertation we focus on the agent's adaptation as captured by the reinforcement learning framework. This means learning a policy---a mapping of observations into actions---based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. The set of policies is constrained by the architecture of the agent's controller. POMDPs require a controller to have a memory. We investigate controllers with memory, including controllers with external memory, finite state controllers and distributed controllers for multi-agent systems. For these various controllers we work out the details of the algorithms which learn by ascending the gradient of expected cumulative reinforcement. Building on statistical learning theory and experiment design theory, a policy evaluation algorithm is developed for the case of experience re-use. We address the question of sufficient experience for uniform convergence of policy evaluation and obtain sample complexity bounds for various estimators. Finally, we demonstrate the performance of the proposed algorithms on several domains, the most complex of which is simulated adaptive packet routing in a telecommunication network.


Author[s]: Harald Steck annd Tommi S. Jaakkola

(Semi-)Predictive Discretization During Model Selection

February 25, 2003

In this paper, we present an approach to discretizing multivariate continuous data while learning the structure of a graphical model. We derive the joint scoring function from the principle of predictive accuracy, which inherently ensures the optimal trade- off between goodness of fit and model complexity (including the number of discretization levels). Using the so-called finest grid implied by the data, our scoring function depends only on the number of data points in the various discretization levels. Not only can it be computed efficiently, but it is also independent of the metric used in the continuous space. Our experiments with gene expression data show that discretization plays a crucial role regarding the resulting network structure.


Author[s]: Timothy Chklovski

Using Analogy to Acquire Commonsense Knowledge from Human Contributors

February 12, 2003

The goal of the work reported here is to capture the commonsense knowledge of non-expert human contributors. Achieving this goal will enable more intelligent human-computer interfaces and pave the way for computers to reason about our world. In the domain of natural language processing, it will provide the world knowledge much needed for semantic processing of natural language. To acquire knowledge from contributors not trained in knowledge engineering, I take the following four steps: (i) develop a knowledge representation (KR) model for simple assertions in natural language, (ii) introduce cumulative analogy, a class of nearest-neighbor based analogical reasoning algorithms over this representation, (iii) argue that cumulative analogy is well suited for knowledge acquisition (KA) based on a theoretical analysis of effectiveness of KA with this approach, and (iv) test the KR model and the effectiveness of the cumulative analogy algorithms empirically. To investigate effectiveness of cumulative analogy for KA empirically, Learner, an open source system for KA by cumulative analogy has been implemented, deployed, and evaluated. (The site "1001 Questions," is available at Learner acquires assertion-level knowledge by constructing shallow semantic analogies between a KA topic and its nearest neighbors and posing these analogies as natural language questions to human contributors. Suppose, for example, that based on the knowledge about "newspapers" already present in the knowledge base, Learner judges "newspaper" to be similar to "book" and "magazine." Further suppose that assertions "books contain information" and "magazines contain information" are also already in the knowledge base. Then Learner will use cumulative analogy from the similar topics to ask humans whether "newspapers contain information." Because similarity between topics is computed based on what is already known about them, Learner exhibits bootstrapping behavior --- the quality of its questions improves as it gathers more knowledge. By summing evidence for and against posing any given question, Learner also exhibits noise tolerance, limiting the effect of incorrect similarities. The KA power of shallow semantic analogy from nearest neighbors is one of the main findings of this thesis. I perform an analysis of commonsense knowledge collected by another research effort that did not rely on analogical reasoning and demonstrate that indeed there is sufficient amount of correlation in the knowledge base to motivate using cumulative analogy from nearest neighbors as a KA method. Empirically, evaluating the percentages of questions answered affirmatively, negatively and judged to be nonsensical in the cumulative analogy case compares favorably with the baseline, no-similarity case that relies on random objects rather than nearest neighbors. Of the questions generated by cumulative analogy, contributors answered 45% affirmatively, 28% negatively and marked 13% as nonsensical; in the control, no-similarity case 8% of questions were answered affirmatively, 60% negatively and 26% were marked as nonsensical.


Author[s]: Nathan Srebro and Tommi Jaakkola

Generalized Low-Rank Approximations

January 15, 2003

We study the frequent problem of approximating a target matrix with a matrix of lower rank. We provide a simple and efficient (EM) algorithm for solving {\em weighted} low rank approximation problems, which, unlike simple matrix factorization problems, do not admit a closed form solution in general. We analyze, in addition, the nature of locally optimal solutions that arise in this context, demonstrate the utility of accommodating the weights in reconstructing the underlying low rank representation, and extend the formulation to non-Gaussian noise models such as classification (collaborative filtering).


Author[s]: Philip Mjong-Hyon Shin Kim

Understanding Subsystems in Biology through Dimensionality Reduction, Graph Partitioning and Analytical Modeling

February 5, 2003

Biological systems exhibit rich and complex behavior through the orchestrated interplay of a large array of components. It is hypothesized that separable subsystems with some degree of functional autonomy exist; deciphering their independent behavior and functionality would greatly facilitate understanding the system as a whole. Discovering and analyzing such subsystems are hence pivotal problems in the quest to gain a quantitative understanding of complex biological systems. In this work, using approaches from machine learning, physics and graph theory, methods for the identification and analysis of such subsystems were developed. A novel methodology, based on a recent machine learning algorithm known as non-negative matrix factorization (NMF), was developed to discover such subsystems in a set of large-scale gene expression data. This set of subsystems was then used to predict functional relationships between genes, and this approach was shown to score significantly higher than conventional methods when benchmarking them against existing databases. Moreover, a mathematical treatment was developed to treat simple network subsystems based only on their topology (independent of particular parameter values). Application to a problem of experimental interest demonstrated the need for extentions to the conventional model to fully explain the experimental data. Finally, the notion of a subsystem was evaluated from a topological perspective. A number of different protein networks were examined to analyze their topological properties with respect to separability, seeking to find separable subsystems. These networks were shown to exhibit separability in a nonintuitive fashion, while the separable subsystems were of strong biological significance. It was demonstrated that the separability property found was not due to incomplete or biased data, but is likely to reflect biological structure.



Author[s]: Sayan Mukherjee, Partha Niyogi, Tomaso Poggio and Ryan Rifkin

Statistical Learning: Stability is Sufficient for Generalization and Necessary and Sufficient for Consistency of Empirical Risk Minimization

December 2002 (revised July 2003)

Solutions of learning problems by Empirical Risk Minimization (ERM) need to be consistent, so that they may be predictive. They also need to be well- posed, so that they can be used robustly. We show that a statistical form of well-posedness, defined in terms of the key property of L-stability, is necessary and sufficient for consistency of ERM.



Author[s]: Luis Perez-Breva and Osamu Yoshimi

Model Selection in Summary Evaluation

December 2002

A difficulty in the design of automated text summarization algorithms is in the objective evaluation. Viewing summarization as a tradeoff between length and information content, we introduce a technique based on a hierarchy of classifiers to rank, through model selection, different summarization methods. This summary evaluation technique allows for broader comparison of summarization methods than the traditional techniques of summary evaluation. We present an empirical study of two simple, albeit widely used, summarization methods that shows the different usages of this automated task-based evaluation system and confirms the results obtained with human-based evaluation methods over smaller corpora.


Author[s]: Jake V. Bouvrie

Multiple Resolution Image Classification

December 2002

Binary image classifiction is a problem that has received much attention in recent years. In this paper we evaluate a selection of popular techniques in an effort to find a feature set/ classifier combination which generalizes well to full resolution image data. We then apply that system to images at one-half through one-sixteenth resolution, and consider the corresponding error rates. In addition, we further observe generalization performance as it depends on the number of training images, and lastly, compare the system's best error rates to that of a human performing an identical classification task given teh same set of test images.


Author[s]: Jacob Beal

Leaderless Distributed Hierarchy Formation

December 2002

I present a system for robust leaderless organization of an amorphous network into hierarchical clusters. This system, which assumes that nodes are spatially embedded and can only talk to neighbors within a given radius, scales to networks of arbitrary size and converges rapidly. The amount of data stored at each node is logarithmic in the diameter of the network, and the hierarchical structure produces an addressing scheme such that there is an invertible relation between distance and address for any pair of nodes. The system adapts automatically to stopping failures, network partition, and reorganization.


Author[s]: Erik B. Sudderth, Alexander T. Ihler, William T. Freeman and Alan S. Willsky

Nonparametric Belief Propagation and Facial Appearance Estimation

December 2002

In many applications of graphical models arising in computer vision, the hidden variables of interest are most naturally specified by continuous, non-Gaussian distributions. There exist inference algorithms for discrete approximations to these continuous distributions, but for the high-dimensional variables typically of interest, discrete inference becomes infeasible. Stochastic methods such as particle filters provide an appealing alternative. However, existing techniques fail to exploit the rich structure of the graphical models describing many vision problems. Drawing on ideas from regularized particle filters and belief propagation (BP), this paper develops a nonparametric belief propagation (NBP) algorithm applicable to general graphs. Each NBP iteration uses an efficient sampling procedure to update kernel-based approximations to the true, continuous likelihoods. The algorithm can accomodate an extremely broad class of potential functions, including nonparametric representations. Thus, NBP extends particle filtering methods to the more general vision problems that graphical models can describe. We apply the NBP algorithm to infer component interrelationships in a parts-based face model, allowing location and reconstruction of occluded features.


Author[s]: Antonio Torralba and William T. Freeman

Properties and Applications of Shape Recipes

December 2002

In low-level vision, the representation of scene properties such as shape, albedo, etc., are very high dimensional as they have to describe complicated structures. The approach proposed here is to let the image itself bear as much of the representational burden as possible. In many situations, scene and image are closely related and it is possible to find a functional relationship between them. The scene information can be represented in reference to the image where the functional specifies how to translate the image into the associated scene. We illustrate the use of this representation for encoding shape information. We show how this representation has appealing properties such as locality and slow variation across space and scale. These properties provide a way of improving shape estimates coming from other sources of information like stereo.


Author[s]: Gerald Jay Sussman and Jack Wisdom

The Role of Programming in the Formulation of Ideas

November 2002

Classical mechanics is deceptively simple. It is surprisingly easy to get the right answer with fallacious reasoning or without real understanding. To address this problem we use computational techniques to communicate a deeper understanding of Classical Mechanics. Computational algorithms are used to express the methods used in the analysis of dynamical phenomena. Expressing the methods in a computer language forces them to be unambiguous and computationally effective. The task of formulating a method as a computer-executable program and debugging that program is a powerful exercise in the learning process. Also, once formalized procedurally, a mathematical idea becomes a tool that can be used directly to compute results.


Author[s]: Jack Wisdom

Swimming in Space-Time

November 2002

Cyclic changes in the shape of a quasi-rigid body on a curved manifold can lead to net translation and/or rotation of the body in the manifold. Presuming space-time is a curved manifold as portrayed by general relativity, translation in space can be accomplished simply by cyclic changes in the shape of a body, without any thrust or external forces.


Author[s]: William T. Freeman and Antonio Torralba

Shape Recipes: Scene Representations that Refer to the Image

September 2002

The goal of low-level vision is to estimate an underlying scene, given an observed image. Real-world scenes (e.g., albedos or shapes) can be very complex, conventionally requiring high dimensional representations which are hard to estimate and store. We propose a low-dimensional representation, called a scene recipe, that relies on the image itself to describe the complex scene configurations. Shape recipes are an example: these are the regression coefficients that predict the bandpassed shape from bandpassed image data. We describe the benefits of this representation, and show two uses illustrating their properties: (1) we improve stereo shape estimates by learning shape recipes at low resolution and applying them at full resolution; (2) Shape recipes implicitly contain information about lighting and materials and we use them for material segmentation.


Author[s]: Marshall F. Tappen, William T. Freeman and Edward H. Adelson

Recovering Intrinsic Images from a Single Image

September 2002

We present an algorithm that uses multiple cues to recover shading and reflectance intrinsic images from a single image. Using both color information and a classifier trained to recognize gray-scale patterns, each image derivative is classified as being caused by shading or a change in the surface's reflectance. Generalized Belief Propagation is then used to propagate information from areas where the correct classification is clear to areas where it is ambiguous. We also show results on real images.


Author[s]: Harald Steck and Tommi S. Jaakkola

On the Dirichlet Prior and Bayesian Regularization

September 2002

A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a Dirichlet prior over the model parameters affects the learned model structure in a domain with discrete variables. Surprisingly, a weak prior in the sense of smaller equivalent sample size leads to a strong regularization of the model structure (sparse graph) given a sufficiently large data set. In particular, the empty graph is obtained in the limit of a vanishing strength of prior belief. This is diametrically opposite to what one may expect in this limit, namely the complete graph from an (unregularized) maximum likelihood estimate. Since the prior affects the parameters as expected, the prior strength balances a "trade-off" between regularizing the parameters or the structure of the model. We demonstrate the benefits of optimizing this trade-off in the sense of predictive accuracy.



Author[s]: M.A. Giese and X. Xie

Exact Solution of the Nonlinear Dynamics of Recurrent Neural Mechanisms for Direction Selectivity

August 2002

Different theoretical models have tried to investigate the feasibility of recurrent neural mechanisms for achieving direction selectivity in the visual cortex. The mathematical analysis of such models has been restricted so far to the case of purely linear networks. We present an exact analytical solution of the nonlinear dynamics of a class of direction selective recurrent neural models with threshold nonlinearity. Our mathematical analysis shows that such networks have form-stable stimulus-locked traveling pulse solutions that are appropriate for modeling the responses of direction selective cortical neurons. Our analysis shows also that the stability of such solutions can break down giving raise to a different class of solutions ("lurching activity waves") that are characterized by a specific spatio-temporal periodicity. These solutions cannot arise in models for direction selectivity with purely linear spatio-temporal filtering.



Author[s]: Martin Alexander Giese and Tomaso Poggio

Biologically Plausible Neural Model for the Recognition of Biological Motion and Actions

August 2002

The visual recognition of complex movements and actions is crucial for communication and survival in many species. Remarkable sensitivity and robustness of biological motion perception have been demonstrated in psychophysical experiments. In recent years, neurons and cortical areas involved in action recognition have been identified in neurophysiological and imaging studies. However, the detailed neural mechanisms that underlie the recognition of such complex movement patterns remain largely unknown. This paper reviews the experimental results and summarizes them in terms of a biologically plausible neural model. The model is based on the key assumption that action recognition is based on learned prototypical patterns and exploits information from the ventral and the dorsal pathway. The model makes specific predictions that motivate new experiments.


Author[s]: J.P. Grossman

Design and Evaluation of the Hamal Parallel Computer

December 5, 2002

Parallel shared-memory machines with hundreds or thousands of processor-memory nodes have been built; in the future we will see machines with millions or even billions of nodes. Associated with such large systems is a new set of design challenges. Many problems must be addressed by an architecture in order for it to be successful; of these, we focus on three in particular. First, a scalable memory system is required. Second, the network messaging protocol must be fault-tolerant. Third, the overheads of thread creation, thread management and synchronization must be extremely low. This thesis presents the complete system design for Hamal, a shared-memory architecture which addresses these concerns and is directly scalable to one million nodes. Virtual memory and distributed objects are implemented in a manner that requires neither inter-node synchronization nor the storage of globally coherent translations at each node. We develop a lightweight fault-tolerant messaging protocol that guarantees message delivery and idempotence across a discarding network. A number of hardware mechanisms provide efficient support for massive multithreading and fine-grained synchronization. Experiments are conducted in simulation, using a trace-driven network simulator to investigate the messaging protocol and a cycle-accurate simulator to evaluate the Hamal architecture. We determine implementation parameters for the messaging protocol which optimize performance. A discarding network is easier to design and can be clocked at a higher rate, and we find that with this protocol its performance can approach that of a non-discarding network. Our simulations of Hamal demonstrate the effectiveness of its thread management and synchronization primitives. In particular, we find register-based synchronization to be an extremely efficient mechanism which can be used to implement a software barrier with a latency of only 523 cycles on a 512 node machine.



Author[s]: Robert Schneider and Maximilian Riesenhuber

A Detailed Look at Scale and Translation Invariance in a Hierarchical Neural Model of Visual Object Recognition

August 2002

The HMAX model has recently been proposed by Riesenhuber & Poggio as a hierarchical model of position- and size-invariant object recognition in visual cortex. It has also turned out to model successfully a number of other properties of the ventral visual stream (the visual pathway thought to be crucial for object recognition in cortex), and particularly of (view- tuned) neurons in macaque inferotemporal cortex, the brain area at the top of the ventral stream. The original modeling study only used ``paperclip'' stimuli, as in the corresponding physiology experiment, and did not explore systematically how model units' invariance properties depended on model parameters. In this study, we aimed at a deeper understanding of the inner workings of HMAX and its performance for various parameter settings and ``natural'' stimulus classes. We examined HMAX responses for different stimulus sizes and positions systematically and found a dependence of model units' responses on stimulus position for which a quantitative description is offered. Interestingly, we find that scale invariance properties of hierarchical neural models are not independent of stimulus class, as opposed to translation invariance, even though both are affine transformations within the image plane.


Author[s]: John M. Van Eepoel

Achieving Real-Time Mode Estimation through Offline Compilation

October 22, 2002

As exploration of our solar system and outerspace move into the future, spacecraft are being developed to venture on increasingly challenging missions with bold objectives. The spacecraft tasked with completing these missions are becoming progressively more complex. This increases the potential for mission failure due to hardware malfunctions and unexpected spacecraft behavior. A solution to this problem lies in the development of an advanced fault management system. Fault management enables spacecraft to respond to failures and take repair actions so that it may continue its mission. The two main approaches developed for spacecraft fault management have been rule-based and model-based systems. Rules map sensor information to system behaviors, thus achieving fast response times, and making the actions of the fault management system explicit. These rules are developed by having a human reason through the interactions between spacecraft components. This process is limited by the number of interactions a human can reason about correctly. In the model-based approach, the human provides component models, and the fault management system reasons automatically about system wide interactions and complex fault combinations. This approach improves correctness, and makes explicit the underlying system models, whereas these are implicit in the rule- based approach. We propose a fault detection engine, Compiled Mode Estimation (CME) that unifies the strengths of the rule-based and model- based approaches. CME uses a compiled model to determine spacecraft behavior more accurately. Reasoning related to fault detection is compiled in an off-line process into a set of concurrent, localized diagnostic rules. These are then combined on-line along with sensor information to reconstruct the diagnosis of the system. These rules enable a human to inspect the diagnostic consequences of CME. Additionally, CME is capable of reasoning through component interactions automatically and still provide fast and correct responses. The implementation of this engine has been tested against the NEAR spacecraft advanced rule-based system, resulting in detection of failures beyond that of the rules. This evolution in fault detection will enable future missions to explore the furthest reaches of the solar system without the burden of human intervention to repair failed components.


Author[s]: Justin Werfel

Implementing Universal Computation in an Evolutionary System

July 2002

Evolutionary algorithms are a common tool in engineering and in the study of natural evolution. Here we take their use in a new direction by showing how they can be made to implement a universal computer. We consider populations of individuals with genes whose values are the variables of interest. By allowing them to interact with one another in a specified environment with limited resources, we demonstrate the ability to construct any arbitrary logic circuit. We explore models based on the limits of small and large populations, and show examples of such a system in action, implementing a simple logic circuit.


Author[s]: Ron O. Dror

Surface Reflectance Recognition and Real-World Illumination Statistics

October 2002

Humans distinguish materials such as metal, plastic, and paper effortlessly at a glance. Traditional computer vision systems cannot solve this problem at all. Recognizing surface reflectance properties from a single photograph is difficult because the observed image depends heavily on the amount of light incident from every direction. A mirrored sphere, for example, produces a different image in every environment. To make matters worse, two surfaces with different reflectance properties could produce identical images. The mirrored sphere simply reflects its surroundings, so in the right artificial setting, it could mimic the appearance of a matte ping-pong ball. Yet, humans possess an intuitive sense of what materials typically "look like" in the real world. This thesis develops computational algorithms with a similar ability to recognize reflectance properties from photographs under unknown, real-world illumination conditions. Real-world illumination is complex, with light typically incident on a surface from every direction. We find, however, that real-world illumination patterns are not arbitrary. They exhibit highly predictable spatial structure, which we describe largely in the wavelet domain. Although they differ in several respects from the typical photographs, illumination patterns share much of the regularity described in the natural image statistics literature. These properties of real-world illumination lead to predictable image statistics for a surface with given reflectance properties. We construct a system that classifies a surface according to its reflectance from a single photograph under unknown illuminination. Our algorithm learns relationships between surface reflectance and certain statistics computed from the observed image. Like the human visual system, we solve the otherwise underconstrained inverse problem of reflectance estimation by taking advantage of the statistical regularity of illumination. For surfaces with homogeneous reflectance properties and known geometry, our system rivals human performance.



Author[s]: Adlar J. Kim and Christian R. Shelton

Modeling Stock Order Flows and Learning Market-Making from Data

June 2002

Stock markets employ specialized traders, market-makers, designed to provide liquidity and volume to the market by constantly supplying both supply and demand. In this paper, we demonstrate a novel method for modeling the market as a dynamic system and a reinforcement learning algorithm that learns profitable market-making strategies when run on this model. The sequence of buys and sells for a particular stock, the order flow, we model as an Input-Output Hidden Markov Model fit to historical data. When combined with the dynamics of the order book, this creates a highly non-linear and difficult dynamic system. Our reinforcement learning algorithm, based on likelihood ratios, is run on this partially-observable environment. We demonstrate learning results for two separate real stocks.



Author[s]: Vinay P. Kumar

Towards Man-Machine Interfaces: Combining Top-down Constraints with Bottom-up Learning in Facial Analysis

September 2002

This thesis proposes a methodology for the design of man-machine interfaces by combining top-down and bottom-up processes in vision. From a computational perspective, we propose that the scientific-cognitive question of combining top- down and bottom-up knowledge is similar to the engineering question of labeling a training set in a supervised learning problem. We investigate these questions in the realm of facial analysis. We propose the use of a linear morphable model (LMM) for representing top-down structure and use it to model various facial variations such as mouth shapes and expression, the pose of faces and visual speech (visemes). We apply a supervised learning method based on support vector machine (SVM) regression for estimating the parameters of LMMs directly from pixel-based representations of faces. We combine these methods for designing new, more self- contained systems for recognizing facial expressions, estimating facial pose and for recognizing visemes.


Author[s]: Andrew "bunnie" Huang

Keeping Secrets in Hardware: the Microsoft Xbox(TM) Case Study

May 26, 2002

This paper discusses the hardware foundations of the cryptosystem employed by the Xbox(TM) video game console from Microsoft. A secret boot block overlay is buried within a system ASIC. This secret boot block decrypts and verifies portions of an external FLASH-type ROM. The presence of the secret boot block is camouflaged by a decoy boot block in the external ROM. The code contained within the secret boot block is transferred to the CPU in the clear over a set of high-speed busses where it can be extracted using simple custom hardware. The paper concludes with recommendations for improving the Xbox security system. One lesson of this study is that the use of a high-performance bus alone is not a sufficient security measure, given the advent of inexpensive, fast rapid prototyping services and high-performance FPGAs.


Author[s]: Carl Steinbach

A Reinforcement-Learning Approach to Power Management

May 2002

We describe an adaptive, mid-level approach to the wireless device power management problem. Our approach is based on reinforcement learning, a machine learning framework for autonomous agents. We describe how our framework can be applied to the power management problem in both infrastructure and ad~hoc wireless networks. From this thesis we conclude that mid-level power management policies can outperform low-level policies and are more convenient to implement than high-level policies. We also conclude that power management policies need to adapt to the user and network, and that a mid-level power management framework based on reinforcement learning fulfills these requirements.



Author[s]: Ulf Knoblich, David J. Freedman and Maximilian Riesenhuber

Categorization in IT and PFC: Model and Experiments

April 18, 2002

In a recent experiment, Freedman et al. recorded from inferotemporal (IT) and prefrontal cortices (PFC) of monkeys performing a "cat/dog" categorization task (Freedman 2001 and Freedman, Riesenhuber, Poggio, Miller 2001). In this paper we analyze the tuning properties of view-tuned units in our HMAX model of object recognition in cortex (Riesenhuber 1999) using the same paradigm and stimuli as in the experiment. We then compare the simulation results to the monkey inferotemporal neuron population data. We find that view-tuned model IT units that were trained without any explicit category information can show category-related tuning as observed in the experiment. This suggests that the tuning properties of experimental IT neurons might primarily be shaped by bottom-up stimulus-space statistics, with little influence of top-down task-specific information. The population of experimental PFC neurons, on the other hand, shows tuning properties that cannot be explained just by stimulus tuning. These analyses are compatible with a model of object recognition in cortex (Riesenhuber 2000) in which a population of shape-tuned neurons provides a general basis for neurons tuned to different recognition tasks.


Author[s]: Andrew "bunnie" Huang

ADAM: A Decentralized Parallel Computer Architecture Featuring Fast Thread and Data Migration and a Uniform Hardware Abstraction

June 2002

The furious pace of Moore's Law is driving computer architecture into a realm where the the speed of light is the dominant factor in system latencies. The number of clock cycles to span a chip are increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has previously made migration an unserviceable solution so far. I present an architecture, implementation, and mechanisms that reduces the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.


Author[s]: Sarah Finney, Natalia H. Gardiol, Leslie Pack Kaelbling and Tim Oates

Learning with Deictic Representation

April 10, 2002

Most reinforcement learning methods operate on propositional representations of the world state. Such representations are often intractably large and generalize poorly. Using a deictic representation is believed to be a viable alternative: they promise generalization while allowing the use of existing reinforcement-learning methods. Yet, there are few experiments on learning with deictic representations reported in the literature. In this paper we explore the effectiveness of two forms of deictic representation and a naive propositional representation in a simple blocks-world domain. We find, empirically, that the deictic representations actually worsen performance. We conclude with a discussion of possible causes of these results and strategies for more effective learning in domains with objects.


Author[s]: Gregory T. Sullivan

Advanced Programming Language Features for Executable Design Patterns "Better Patterns Through Reflection

March 22, 2002

The Design Patterns book [GOF95] presents 24 time-tested patterns that consistently appear in well-designed software systems. Each pattern is presented with a description of the design problem the pattern addresses, as well as sample implementation code and design considerations. This paper explores how the patterns from the "Gang of Four'', or "GOF'' book, as it is often called, appear when similar problems are addressed using a dynamic, higher-order, object-oriented programming language. Some of the patterns disappear -- that is, they are supported directly by language features, some patterns are simpler or have a different focus, and some are essentially unchanged.


Author[s]: Jeremy Hanford Brown

Sparsely Faceted Arrays: A Mechanism Supporting Parallel Allocation, Communication, and Garbage Collection

June 2002

Conventional parallel computer architectures do not provide support for non-uniformly distributed objects. In this thesis, I introduce sparsely faceted arrays (SFAs), a new low- level mechanism for naming regions of memory, or facets, on different processors in a distributed, shared memory parallel processing system. Sparsely faceted arrays address the disconnect between the global distributed arrays provided by conventional architectures (e.g. the Cray T3 series), and the requirements of high-level parallel programming methods that wish to use objects that are distributed over only a subset of processing elements. A sparsely faceted array names a virtual globally-distributed array, but actual facets are lazily allocated. By providing simple semantics and making efficient use of memory, SFAs enable efficient implementation of a variety of non-uniformly distributed data structures and related algorithms. I present example applications which use SFAs, and describe and evaluate simple hardware mechanisms for implementing SFAs. Keeping track of which nodes have allocated facets for a particular SFA is an important task that suggests the need for automatic memory management, including garbage collection. To address this need, I first argue that conventional tracing techniques such as mark/sweep and copying GC are inherently unscalable in parallel systems. I then present a parallel memory-management strategy, based on reference-counting, that is capable of garbage collecting sparsely faceted arrays. I also discuss opportunities for hardware support of this garbage collection strategy. I have implemented a high-level hardware/OS simulator featuring hardware support for sparsely faceted arrays and automatic garbage collection. I describe the simulator and outline a few of the numerous details associated with a "real" implementation of SFAs and SFA-aware garbage collection. Simulation results are used throughout this thesis in the evaluation of hardware support mechanisms.



Author[s]: Ulf Knoblich and Maximilan Riesenhuber

Stimulus Simplification and Object Representation: A Modeling Study

March 15, 2002

Tsunoda et al. (2001) recently studied the nature of object representation in monkey inferotemporal cortex using a combination of optical imaging and extracellular recordings. In particular, they examined IT neuron responses to complex natural objects and "simplified" versions thereof. In that study, in 42% of the cases, optical imaging revealed a decrease in the number of activation patches in IT as stimuli were "simplified". However, in 58% of the cases, "simplification" of the stimuli actually led to the appearance of additional activation patches in IT. Based on these results, the authors propose a scheme in which an object is represented by combinations of active and inactive columns coding for individual features. We examine the patterns of activation caused by the same stimuli as used by Tsunoda et al. in our model of object recognition in cortex (Riesenhuber 99). We find that object-tuned units can show a pattern of appearance and disappearance of features identical to the experiment. Thus, the data of Tsunoda et al. appear to be in quantitative agreement with a simple object-based representation in which an object's identity is coded by its similarities to reference objects. Moreover, the agreement of simulations and experiment suggests that the simplification procedure used by Tsunoda (2001) is not necessarily an accurate method to determine neuronal tuning.


Author[s]: Teodoro Arvizo III

A Virtual Machine for a Type-omega Denotational Proof Language

June 2002

In this thesis, I designed and implemented a virtual machine (VM) for a monomorphic variant of Athena, a type-omega denotational proof language (DPL). This machine attempts to maintain the minimum state required to evaluate Athena phrases. This thesis also includes the design and implementation of a compiler for monomorphic Athena that compiles to the VM. Finally, it includes details on my implementation of a read-eval-print loop that glues together the VM core and the compiler to provide a full, user-accessible interface to monomorphic Athena. The Athena VM provides the same basis for DPLs that the SECD machine does for pure, functional programming and the Warren Abstract Machine does for Prolog.


Author[s]: Joanna J. Bryson

Intelligence by Design: Principles of Modularity and Coordination for Engineerin

September 2001

All intelligence relies on search --- for example, the search for an intelligent agent's next action. Search is only likely to succeed in resource-bounded agents if they have already been biased towards finding the right answer. In artificial agents, the primary source of bias is engineering. This dissertation describes an approach, Behavior-Oriented Design (BOD) for engineering complex agents. A complex agent is one that must arbitrate between potentially conflicting goals or behaviors. Behavior-oriented design builds on work in behavior-based and hybrid architectures for agents, and the object oriented approach to software engineering. The primary contributions of this dissertation are: 1.The BOD architecture: a modular architecture with each module providing specialized representations to facilitate learning. This includes one pre-specified module and representation for action selection or behavior arbitration. The specialized representation underlying BOD action selection is Parallel-rooted, Ordered, Slip-stack Hierarchical (POSH) reactive plans. 2.The BOD development process: an iterative process that alternately scales the agent's capabilities then optimizes the agent for simplicity, exploiting tradeoffs between the component representations. This ongoing process for controlling complexity not only provides bias for the behaving agent, but also facilitates its maintenance and extendibility. The secondary contributions of this dissertation include two implementations of POSH action selection, a procedure for identifying useful idioms in agent architectures and using them to distribute knowledge across agent paradigms, several examples of applying BOD idioms to established architectures, an analysis and comparison of the attributes and design trends of a large number of agent architectures, a comparison of biological (particularly mammalian) intelligence to artificial agent architectures, a novel model of primate transitive inference, and many other examples of BOD agents and BOD development.



Author[s]: Tomaso Poggio, Ryan Rifkin, Sayan Mukherjee and Alex Rakhlin

Bagging Regularizes

March 2002

Intuitively, we expect that averaging --- or bagging --- different regressors with low correlation should smooth their behavior and be somewhat similar to regularization. In this note we make this intuition precise. Using an almost classical definition of stability, we prove that a certain form of averaging provides generalization bounds with a rate of convergence of the same order as Tikhonov regularization --- similar to fashionable RKHS- based learning algorithms.


Author[s]: Jacob Beal

Generating Communications Systems Through Shared Context

January 2002

In a distributed model of intelligence, peer components need to communicate with one another. I present a system which enables two agents connected by a thick twisted bundle of wires to bootstrap a simple communication system from observations of a shared environment. The agents learn a large vocabulary of symbols, as well as inflections on those symbols which allow thematic role-frames to be transmitted. Language acquisition time is rapid and linear in the number of symbols and inflections. The final communication system is robust and performance degrades gradually in the face of problems.


Author[s]: William T. Freeman and Hao Zhang

Shape-Time Photography

January 10, 2002

We introduce a new method to describe, in a single image, changes in shape over time. We acquire both range and image information with a stationary stereo camera. From the pictures taken, we display a composite image consisting of the image data from the surface closest to the camera at every pixel. This reveals the 3-d relationships over time by easy-to-interpret occlusion relationships in the composite image. We call the composite a shape-time photograph. Small errors in depth measurements cause artifacts in the shape-time images. We correct most of these using a Markov network to estimate the most probable front surface, taking into account the depth measurements, their uncertainties, and layer continuity assumptions.


Author[s]: Trevor Darrell, Neal Checka, Alice Oh and Louis-Philippe Morency

Exploring Vision-Based Interfaces: How to Use Your Head in Dual Pointing Tasks

January 2002

The utility of vision-based face tracking for dual pointing tasks is evaluated. We first describe a 3-D face tracking technique based on real-time parametric motion-stereo, which is non-invasive, robust, and self-initialized. The tracker provides a real-time estimate of a ?frontal face ray? whose intersection with the display surface plane is used as a second stream of input for scrolling or pointing, in paral-lel with hand input. We evaluated the performance of com-bined head/hand input on a box selection and coloring task: users selected boxes with one pointer and colors with a second pointer, or performed both tasks with a single pointer. We found that performance with head and one hand was intermediate between single hand performance and dual hand performance. Our results are consistent with previously reported dual hand conflict in symmetric pointing tasks, and suggest that a head-based input stream should be used for asymmetric control.


Author[s]: Lilla Zollei

2D-3D Rigid-Body Registration of X-Ray Fluoroscopy and CT Images

August 2001

The registration of pre-operative volumetric datasets to intra- operative two-dimensional images provides an improved way of verifying patient position and medical instrument loca- tion. In applications from orthopedics to neurosurgery, it has a great value in maintaining up-to-date information about changes due to intervention. We propose a mutual information- based registration algorithm to establish the proper align- ment. For optimization purposes, we compare the perfor- mance of the non-gradient Powell method and two slightly di erent versions of a stochastic gradient ascent strategy: one using a sparsely sampled histogramming approach and the other Parzen windowing to carry out probability density approximation. Our main contribution lies in adopting the stochastic ap- proximation scheme successfully applied in 3D-3D registra- tion problems to the 2D-3D scenario, which obviates the need for the generation of full DRRs at each iteration of pose op- timization. This facilitates a considerable savings in compu- tation expense. We also introduce a new probability density estimator for image intensities via sparse histogramming, de- rive gradient estimates for the density measures required by the maximization procedure and introduce the framework for a multiresolution strategy to the problem. Registration results are presented on uoroscopy and CT datasets of a plastic pelvis and a real skull, and on a high-resolution CT- derived simulated dataset of a real skull, a plastic skull, a plastic pelvis and a plastic lumbar spine segment.



Author[s]: Antonio Torralba and Aude Oliva

Global Depth Perception from Familiar Scene Structure

December 2001

In the absence of cues for absolute depth measurements as binocular disparity, motion, or defocus, the absolute distance between the observer and a scene cannot be measured. The interpretation of shading, edges and junctions may provide a 3D model of the scene but it will not inform about the actual "size" of the space. One possible source of information for absolute depth estimation is the image size of known objects. However, this is computationally complex due to the difficulty of the object recognition process. Here we propose a source of information for absolute depth estimation that does not rely on specific objects: we introduce a procedure for absolute depth estimation based on the recognition of the whole scene. The shape of the space of the scene and the structures present in the scene are strongly related to the scale of observation. We demonstrate that, by recognizing the properties of the structures present in the image, we can infer the scale of the scene, and therefore its absolute mean depth. We illustrate the interest in computing the mean depth of the scene with application to scene recognition and object detection.



Author[s]: Andrew Yip and Pawan Sinha

Role of color in face recognition

December 13, 2001

One of the key challenges in face perception lies in determining the contribution of different cues to face identification. In this study, we focus on the role of color cues. Although color appears to be a salient attribute of faces, past research has suggested that it confers little recognition advantage for identifying people. Here we report experimental results suggesting that color cues do play a role in face recognition and their contribution becomes evident when shape cues are degraded. Under such conditions, recognition performance with color images is significantly better than that with grayscale images. Our experimental results also indicate that the contribution of color may lie not so much in providing diagnostic cues to identity as in aiding low-level image-analysis processes such as segmentation.



Author[s]: Maximilian Riesenhuber

Generalization over contrast and mirror reversal, but not figure-ground reversal, in an "edge-based

December 10, 2001

Baylis & Driver (Nature Neuroscience, 2001) have recently presented data on the response of neurons in macaque inferotemporal cortex (IT) to various stimulus transformations. They report that neurons can generalize over contrast and mirror reversal, but not over figure-ground reversal. This finding is taken to demonstrate that ``the selectivity of IT neurons is not determined simply by the distinctive contours in a display, contrary to simple edge-based models of shape recognition'', citing our recently presented model of object recognition in cortex (Riesenhuber & Poggio, Nature Neuroscience, 1999). In this memo, I show that the main effects of the experiment can be obtained by performing the appropriate simulations in our simple feedforward model. This suggests for IT cell tuning that the possible contributions of explicit edge assignment processes postulated in (Baylis & Driver, 2001) might be smaller than expected.


Author[s]: Ron O. Dror, Edward H. Adelson, and Alan S. Willsky

Recognition of Surface Reflectance Properties from a Single Image under Unknown Real-World Illumination

October 21, 2001

This paper describes a machine vision system that classifies reflectance properties of surfaces such as metal, plastic, or paper, under unknown real-world illumination. We demonstrate performance of our algorithm for surfaces of arbitrary geometry. Reflectance estimation under arbitrary omnidirectional illumination proves highly underconstrained. Our reflectance estimation algorithm succeeds by learning relationships between surface reflectance and certain statistics computed from an observed image, which depend on statistical regularities in the spatial structure of real-world illumination. Although the algorithm assumes known geometry, its statistical nature makes it robust to inaccurate geometry estimates.


Author[s]: Roland W. Fleming, Ron O. Dror, Edward H. Adelson

How do Humans Determine Reflectance Properties under Unknown Illumination?

October 21, 2001

Under normal viewing conditions, humans find it easy to distinguish between objects made out of different materials such as plastic, metal, or paper. Untextured materials such as these have different surface reflectance properties, including lightness and gloss. With single isolated images and unknown illumination conditions, the task of estimating surface reflectance is highly underconstrained, because many combinations of reflection and illumination are consistent with a given image. In order to work out how humans estimate surface reflectance properties, we asked subjects to match the appearance of isolated spheres taken out of their original contexts. We found that subjects were able to perform the task accurately and reliably without contextual information to specify the illumination. The spheres were rendered under a variety of artificial illuminations, such as a single point light source, and a number of photographically-captured real-world illuminations from both indoor and outdoor scenes. Subjects performed more accurately for stimuli viewed under real-world patterns of illumination than under artificial illuminations, suggesting that subjects use stored assumptions about the regularities of real-world illuminations to solve the ill-posed problem.


Author[s]: Konstantine Arkoudas

Simplifying transformations for type-alpha certificates

November 13, 2001

This paper presents an algorithm for simplifying NDL deductions. An array of simplifying transformations are rigorously defined. They are shown to be terminating, and to respect the formal semantis of the language. We also show that the transformations never increase the size or complexity of a deduction---in the worst case, they produce deductions of the same size and complexity as the original. We present several examples of proofs containing various types of "detours", and explain how our procedure eliminates them, resulting in smaller and cleaner deductions. All of the given transformations are fully implemented in SML-NJ. The complete code listing is presented, along with explanatory comments. Finally, although the transformations given here are defined for NDL, we point out that they can be applied to any type-alpha DPL that satisfies a few simple conditions.


Author[s]: Adrian Corduneanu and Tommi Jaakkola

Stable Mixing of Complete and Incomplete Information

November 8, 2001

An increasing number of parameter estimation tasks involve the use of at least two information sources, one complete but limited, the other abundant but incomplete. Standard algorithms such as EM (or em) used in this context are unfortunately not stable in the sense that they can lead to a dramatic loss of accuracy with the inclusion of incomplete observations. We provide a more controlled solution to this problem through differential equations that govern the evolution of locally optimal solutions (fixed points) as a function of the source weighting. This approach permits us to explicitly identify any critical (bifurcation) points leading to choices unsupported by the available complete data. The approach readily applies to any graphical model in O(n^3) time where n is the number of parameters. We use the naive Bayes model to illustrate these ideas and demonstrate the effectiveness of our approach in the context of text classification problems.



Author[s]: Yuri Ostrovsky, Patrick Cavanagh and Pawan Sinha

Perceiving Illumination Inconsistencies in Scenes

November 5, 2001

The human visual system is adept at detecting and encoding statistical regularities in its spatio-temporal environment. Here we report an unexpected failure of this ability in the context of perceiving inconsistencies in illumination distributions across a scene. Contrary to predictions from previous studies [Enns and Rensink, 1990; Sun and Perona, 1996a, 1996b, 1997], we find that the visual system displays a remarkable lack of sensitivity to illumination inconsistencies, both in experimental stimuli and in images of real scenes. Our results allow us to draw inferences regarding how the visual system encodes illumination distributions across scenes. Specifically, they suggest that the visual system does not verify the global consistency of locally derived estimates of illumination direction.



Author[s]: Antonio Torralba and Pawan Sinha

Detecting Faces in Impoverished Images

November 5, 2001

The ability to detect faces in images is of critical ecological significance. It is a pre-requisite for other important face perception tasks such as person identification, gender classification and affect analysis. Here we address the question of how the visual system classifies images into face and non-face patterns. We focus on face detection in impoverished images, which allow us to explore information thresholds required for different levels of performance. Our experimental results provide lower bounds on image resolution needed for reliable discrimination between face and non-face patterns and help characterize the nature of facial representations used by the visual system under degraded viewing conditions. Specifically, they enable an evaluation of the contribution of luminance contrast, image orientation and local context on face-detection performance.


Author[s]: Konstantine Arkoudas

Type-omega DPLs

October 16, 2001

Type-omega DPLs (Denotational Proof Languages) are languages for proof presentation and search that offer strong soundness guarantees. LCF-type systems such as HOL offer similar guarantees, but their soundness relies heavily on static type systems. By contrast, DPLs ensure soundness dynamically, through their evaluation semantics; no type system is necessary. This is possible owing to a novel two-tier syntax that separates deductions from computations, and to the abstraction of assumption bases, which is factored into the semantics of the language and allows for sound evaluation. Every type-omega DPL properly contains a type-alpha DPL, which can be used to present proofs in a lucid and detailed form, exclusively in terms of primitive inference rules. Derived inference rules are expressed as user-defined methods, which are "proof recipes" that take arguments and dynamically perform appropriate deductions. Methods arise naturally via parametric abstraction over type-alpha proofs. In that light, the evaluation of a method call can be viewed as a computation that carries out a type-alpha deduction. The type-alpha proof "unwound" by such a method call is called the "certificate" of the call. Certificates can be checked by exceptionally simple type-alpha interpreters, and thus they are useful whenever we wish to minimize our trusted base. Methods are statically closed over lexical environments, but dynamically scoped over assumption bases. They can take other methods as arguments, they can iterate, and they can branch conditionally. These capabilities, in tandem with the bifurcated syntax of type-omega DPLs and their dynamic assumption-base semantics, allow the user to define methods in a style that is disciplined enough to ensure soundness yet fluid enough to permit succinct and perspicuous expression of arbitrarily sophisticated derived inference rules. We demonstrate every major feature of type-omega DPLs by defining and studying NDL-omega, a higher-order, lexically scoped, call-by-value type-omega DPL for classical zero-order natural deduction---a simple choice that allows us to focus on type-omega syntax and semantics rather than on the subtleties of the underlying logic. We start by illustrating how type-alpha DPLs naturally lead to type-omega DPLs by way of abstraction; present the formal syntax and semantics of NDL-omega; prove several results about it, including soundness; give numerous examples of methods; point out connections to the lambda-phi calculus, a very general framework for type-omega DPLs; introduce a notion of computational and deductive cost; define several instrumented interpreters for computing such costs and for generating certificates; explore the use of type-omega DPLs as general programming languages; show that DPLs do not have to be type-less by formulating a static Hindley-Milner polymorphic type system for NDL-omega; discuss some idiosyncrasies of type-omega DPLs such as the potential divergence of proof checking; and compare type-omega DPLs to other approaches to proof presentation and discovery. Finally, a complete implementation of NDL-omega in SML-NJ is given for users who want to run the examples and experiment with the language.



Author[s]: Jason D. M. Rennie and Ryan Rifkin

Improving Multiclass Text Classification with the Support Vector Machine

October 16, 2001

We compare Naive Bayes and Support Vector Machines on the task of multiclass text classification. Using a variety of approaches to combine the underlying binary classifiers, we find that SVMs substantially outperform Naive Bayes. We present full multiclass results on two well-known text data sets, including the lowest error to date on both data sets. We develop a new indicator of binary performance to show that the SVM's lower multiclass error is a result of its improved binary performance. Furthermore, we demonstrate and explore the surprising result that one-vs-all classification performs favorably compared to other approaches even though it has no error-correcting properties.


Author[s]: Konstantine Arkoudas

Type-alpha DPLs

October 5, 2001

This paper introduces Denotational Proof Languages (DPLs). DPLs are languages for presenting, discovering, and checking formal proofs. In particular, in this paper we discus type-alpha DPLs---a simple class of DPLs for which termination is guaranteed and proof checking can be performed in time linear in the size of the proof. Type-alpha DPLs allow for lucid proof presentation and for efficient proof checking, but not for proof search. Type-omega DPLs allow for search as well as simple presentation and checking, but termination is no longer guaranteed and proof checking may diverge. We do not study type-omega DPLs here. We start by listing some common characteristics of DPLs. We then illustrate with a particularly simple example: a toy type-alpha DPL called PAR, for deducing parities. We present the abstract syntax of PAR, followed by two different kinds of formal semantics: evaluation and denotational. We then relate the two semantics and show how proof checking becomes tantamount to evaluation. We proceed to develop the proof theory of PAR, formulating and studying certain key notions such as observational equivalence that pervade all DPLs. We then present NDL, a type-alpha DPL for classical zero-order natural deduction. Our presentation of NDL mirrors that of PAR, showing how every basic concept that was introduced in PAR resurfaces in NDL. We present sample proofs of several well-known tautologies of propositional logic that demonstrate our thesis that DPL proofs are readable, writable, and concise. Next we contrast DPLs to typed logics based on the Curry-Howard isomorphism, and discuss the distinction between pure and augmented DPLs. Finally we consider the issue of implementing DPLs, presenting an implementation of PAR in SML and one in Athena, and end with some concluding remarks.


Author[s]: Leonid Taycher and Trevor Darrell

Range Segmentation Using Visibility Constraints

September 2001

Visibility constraints can aid the segmentation of foreground objects observed with multiple range images. In our approach, points are defined as foreground if they can be determined to occlude some {em empty space} in the scene. We present an efficient algorithm to estimate foreground points in each range view using explicit epipolar search. In cases where the background pattern is stationary, we show how visibility constraints from other views can generate virtual background values at points with no valid depth in the primary view. We demonstrate the performance of both algorithms for detecting people in indoor office environments.


Author[s]: Ron O. Dror, Edward H. Adelson and Alan S. Willsky

Surface Reflectance Estimation and Natural Illumination Statistics

September 2001

Humans recognize optical reflectance properties of surfaces such as metal, plastic, or paper from a single image without knowledge of illumination. We develop a machine vision system to perform similar recognition tasks automatically. Reflectance estimation under unknown, arbitrary illumination proves highly underconstrained due to the variety of potential illumination distributions and surface reflectance properties. We have found that the spatial structure of real-world illumination possesses some of the statistical regularities observed in the natural image statistics literature. A human or computer vision system may be able to exploit this prior information to determine the most likely surface reflectance given an observed image. We develop an algorithm for reflectance classification under unknown real-world illumination, which learns relationships between surface reflectance and certain features (statistics) computed from a single observed image. We also develop an automatic feature selection method.



Author[s]: Angela J. Yu, Martin A. Giese and Tomaso A. Poggio

Biologically Plausible Neural Circuits for Realization of Maximum Operations

September 2001

Object recognition in the visual cortex is based on a hierarchical architecture, in which specialized brain regions along the ventral pathway extract object features of increasing levels of complexity, accompanied by greater invariance in stimulus size, position, and orientation. Recent theoretical studies postulate a non-linear pooling function, such as the maximum (MAX) operation could be fundamental in achieving such invariance. In this paper, we are concerned with neurally plausible mechanisms that may be involved in realizing the MAX operation. Four canonical circuits are proposed, each based on neural mechanisms that have been previously discussed in the context of cortical processing. Through simulations and mathematical analysis, we examine the relative performance and robustness of these mechanisms. We derive experimentally verifiable predictions for each circuit and discuss their respective physiological considerations.


Author[s]: Erik G. Miller, Kinh Tieu and Chris P. Stauffer

Learning Object-Independent Modes of Variation with Feature Flow Fields

September 2001

We present a unifying framework in which "object-independent" modes of variation are learned from continuous-time data such as video sequences. These modes of variation can be used as "generators" to produce a manifold of images of a new object from a single example of that object. We develop the framework in the context of a well-known example: analyzing the modes of spatial deformations of a scene under camera movement. Our method learns a close approximation to the standard affine deformations that are expected from the geometry of the situation, and does so in a completely unsupervised (i.e. ignorant of the geometry of the situation) fashion. We stress that it is learning a "parameterization", not just the parameter values, of the data. We then demonstrate how we have used the same framework to derive a novel data-driven model of joint color change in images due to common lighting variations. The model is superior to previous models of color change in describing non-linear color changes due to lighting.



Author[s]: Antonio Torralba and Pawan Sinha

Contextual Priming for Object Detection

September 2001

There is general consensus that context can be a rich source of information about an object's identity, location and scale. In fact, the structure of many real-world scenes is governed by strong configurational rules akin to those that apply to a single object. Here we introduce a simple probabilistic framework for modeling the relationship between context and object properties based on the correlation between the statistics of low-level features across the entire scene and the objects that it contains. The resulting scheme serves as an effective procedure for object priming, context driven focus of attention and automatic scale-selection on real-world scenes.


Author[s]: Lily Lee

Gait Dynamics for Recognition and Classification

September 2001

This paper describes a representation of the dynamics of human walking action for the purpose of person identification and classification by gait appearance. Our gait representation is based on simple features such as moments extracted from video silhouettes of human walking motion. We claim that our gait dynamics representation is rich enough for the task of recognition and classification. The use of our feature representation is demonstrated in the task of person recognition from video sequences of orthogonal views of people walking. We demonstrate the accuracy of recognition on gait video sequences collected over different days and times, and under varying lighting environments. In addition, preliminary results are shown on gender classification using our gait dynamics features.



Author[s]: Gene Yeo, Tomaso Poggio

Multiclass Classification of SRBCTs

August 25, 2001

A novel approach to multiclass tumor classification using Artificial Neural Networks (ANNs) was introduced in a recent paper cite{Khan2001}. The method successfully classified and diagnosed small, round blue cell tumors (SRBCTs) of childhood into four distinct categories, neuroblastoma (NB), rhabdomyosarcoma (RMS), non-Hodgkin lymphoma (NHL) and the Ewing family of tumors (EWS), using cDNA gene expression profiles of samples that included both tumor biopsy material and cell lines. We report that using an approach similar to the one reported by Yeang et al cite{Yeang2001}, i.e. multiclass classification by combining outputs of binary classifiers, we achieved equal accuracy with much fewer features. We report the performances of 3 binary classifiers (k-nearest neighbors (kNN), weighted-voting (WV), and support vector machines (SVM)) with 3 feature selection techniques (Golub's Signal to Noise (SN) ratios cite{Golub99}, Fisher scores (FSc) and Mukherjee's SVM feature selection (SVMFS))cite{Sayan98}.



Author[s]: Pawan Sinha and Antonio Torralba

Role of Low-level Mechanisms in Brightness Perception

August 2001

Brightness judgments are a key part of the primate brain’s visual analysis of the environment. There is general consensus that the perceived brightness of an image region is based not only on its actual luminance, but also on the photometric structure of its neighborhood. However, it is unclear precisely how a region’s context influences its perceived brightness. Recent research has suggested that brightness estimation may be based on a sophisticated analysis of scene layout in terms of transparency, illumination and shadows. This work has called into question the role of low-level mechanisms, such as lateral inhibition, as explanations for brightness phenomena. Here we describe experiments with displays for which low-level and high-level analyses make qualitatively different predictions, and with which we can quantitatively assess the trade-offs between low-level and high-level factors. We find that brightness percepts in these displays are governed by low-level stimulus properties, even when these percepts are inconsistent with higher-level interpretations of scene layout. These results point to the important role of low-level mechanisms in determining brightness percepts.


Author[s]: Jacob Beal

An Algorithm for Bootstrapping Communications

August 13, 2001

I present an algorithm which allows two agents to generate a simple language based only on observations of a shared environment. Vocabulary and roles for the language are learned in linear time. Communication is robust and degrades gradually as complexity increases. Dissimilar modes of experience will lead to a shared kernel vocabulary.



Author[s]: Antonio Torralba, Pawan Sinha

Recognizing Indoor Scenes

July 25, 2001

We propose a scheme for indoor place identification based on the recognition of global scene views. Scene views are encoded using a holistic representation that provides low-resolution spatial and spectral information. The holistic nature of the representation dispenses with the need to rely on specific objects or local landmarks and also renders it robust against variations in object configurations. We demonstrate the scheme on the problem of recognizing scenes in video sequences captured while walking through an office environment. We develop a method for distinguishing between 'diagnostic' and 'generic' views and also evaluate changes in system performances as a function of the amount of training data available and the complexity of the representation.



Author[s]: Richard Russell and Pawan Sinha

Perceptually-based Comparison of Image Similarity Metrics

July 2001

The image comparison operation – assessing how well one image matches another – forms a critical component of many image analysis systems and models of human visual processing. Two norms used commonly for this purpose are L1 and L2, which are specific instances of the Minkowski metric. However, there is often not a principled reason for selecting one norm over the other. One way to address this problem is by examining whether one metric better captures the perceptual notion of image similarity than the other. With this goal, we examined perceptual preferences for images retrieved on the basis of the L1 versus the L2 norm. These images were either small fragments without recognizable content, or larger patterns with recognizable content created via vector quantization. In both conditions the subjects showed a consistent preference for images matched using the L1 metric. These results suggest that, in the domain of natural images of the kind we have used, the L1 metric may better capture human notions of image similarity.



Author[s]: Nicholas T. Chan, Ely Dahan, Andrew W. Lo and Tomaso Poggio

Experimental Markets for Product Concepts

July 2001

Market prices are well known to efficiently collect and aggregate diverse information regarding the value of commodities and assets. The role of markets has been particularly suitable to pricing financial securities. This article provides an alternative application of the pricing mechanism to marketing research - using pseudo-securities markets to measure preferences over new product concepts. Surveys, focus groups, concept tests and conjoint studies are methods traditionally used to measure individual and aggregate preferences. Unfortunately, these methods can be biased, costly and time-consuming to conduct. The present research is motivated by the desire to efficiently measure preferences and more accurately predict new product success, based on the efficiency and incentive-compatibility of security trading markets. The article describes a novel market research method, pro-vides insight into why the method should work, and compares the results of several trading experiments against other methodologies such as concept testing and conjoint analysis.



Author[s]: Mariano Alvira, Jim Paris and Ryan Rifkin

The Audiomomma Music Recommendation System

July 2001

We design and implement a system that recommends musicians to listeners. The basic idea is to keep track of what artists a user listens to, to find other users with similar tastes, and to recommend other artists that these similar listeners enjoy. The system utilizes a client-server architecture, a web-based interface, and an SQL database to store and process information. We describe Audiomomma-0.3, a proof-of-concept implementation of the above ideas.



Author[s]: T. Poggio, S. Mukherjee, R. Rifkin, A. Rakhlin, and A. Verri


July 2001

In this note we characterize the role of b ,which is the constant in the standard form of the solution provided by the Support Vector Machine technique f (x )=  i =1 • i K (x ,x i )+b .



Author[s]: Purdy Ho

Rotation Invariant Real-time Face Detection and Recognition System

May 31, 2001

In this report, a face recognition system that is capable of detecting and recognizing frontal and rotated faces was developed. Two face recognition methods focusing on the aspect of pose invariance are presented and evaluated - the whole face approach and the component-based approach. The main challenge of this project is to develop a system that is able to identify faces under different viewing angles in realtime. The development of such a system will enhance the capability and robustness of current face recognition technology. The whole-face approach recognizes faces by classifying a single feature vector consisting of the gray values of the whole face image. The component-based approach first locates the facial components and extracts them. These components are normalized and combined into a single feature vector for classification. The Support Vector Machine (SVM) is used as the classifier for both approaches. Extensive tests with respect to the robustness against pose changes are performed on a database that includes faces rotated up to about 40 degrees in depth. The component-based approach clearly outperforms the whole-face approach on all tests. Although this approach isproven to be more reliable, it is still too slow for real-time applications. That is the reason why a real-time face recognition system using the whole-face approach is implemented to recognize people in color video sequences.


Author[s]: D. Demirdjian and T. Darrell

Motion Estimation from Disparity Images

May 7, 2001

A new method for 3D rigid motion estimation from stereo is proposed in this paper. The appealing feature of this method is that it directly uses the disparity images obtained from stereo matching. We assume that the stereo rig has parallel cameras and show, in that case, the geometric and topological properties of the disparity images. Then we introduce a rigid transformation (called d-motion) that maps two disparity images of a rigidly moving object. We show how it is related to the Euclidean rigid motion and a motion estimation algorithm is derived. We show with experiments that our approach is simple and more accurate than standard approaches.


Author[s]: Tevfik Metin Sezgin

Feature Point Detection and Curve Approximation for Early Processing of Freehand Sketches

May 2001

Freehand sketching is both a natural and crucial part of design, yet is unsupported by current design automation software. We are working to combine the flexibility and ease of use of paper and pencil with the processing power of a computer to produce a design environment that feels as natural as paper, yet is considerably smarter. One of the most basic steps in accomplishing this is converting the original digitized pen strokes in the sketch into the intended geometric objects using feature point detection and approximation. We demonstrate how multiple sources of information can be combined for feature detection in strokes and apply this technique using two approaches to signal processing, one using simple average based thresholding and a second using scale space.


Author[s]: A. Rahimi, L.-P. Morency and T. Darrell

Reducing Drift in Parametric Motion Tracking

May 7, 2001

We develop a class of differential motion trackers that automatically stabilize when in finite domains. Most differ-ential trackers compute motion only relative to one previous frame, accumulating errors indefinitely. We estimate pose changes between a set of past frames, and develop a probabilistic framework for integrating those estimates. We use an approximation to the posterior distribution of pose changes as an uncertainty model for parametric motion in order to help arbitrate the use of multiple base frames. We demonstrate this framework on a simple 2D translational tracker and a 3D, 6-degree of freedom tracker.


Author[s]: Radhika Nagpal

Programmable Self-Assembly: Constructing Global Shape using Biologically-inspire

June 2001

In this thesis I present a language for instructing a sheet of identically-programmed, flexible, autonomous agents (``cells'') to assemble themselves into a predetermined global shape, using local interactions. The global shape is described as a folding construction on a continuous sheet, using a set of axioms from paper-folding (origami). I provide a means of automatically deriving the cell program, executed by all cells, from the global shape description. With this language, a wide variety of global shapes and patterns can be synthesized, using only local interactions between identically-programmed cells. Examples include flat layered shapes, all plane Euclidean constructions, and a variety of tessellation patterns. In contrast to approaches based on cellular automata or evolution, the cell program is directly derived from the global shape description and is composed from a small number of biologically-inspired primitives: gradients, neighborhood query, polarity inversion, cell-to-cell contact and flexible folding. The cell programs are robust, without relying on regular cell placement, global coordinates, or synchronous operation and can tolerate a small amount of random cell death. I show that an average cell neighborhood of 15 is sufficient to reliably self-assemble complex shapes and geometric patterns on randomly distributed cells. The language provides many insights into the relationship between local and global descriptions of behavior, such as the advantage of constructive languages, mechanisms for achieving global robustness, and mechanisms for achieving scale- independent shapes from a single cell program. The language suggests a mechanism by which many related shapes can be created by the same cell program, in the manner of D'Arcy Thompson's famous coordinate transformations. The thesis illuminates how complex morphology and pattern can emerge from local interactions, and how one can engineer robust self-assembly.


Author[s]: Won Hong

Modeling, Estimation, and Control of Robot-Soil Interactions

September 2001

This thesis presents the development of hardware, theory, and experimental methods to enable a robotic manipulator arm to interact with soils and estimate soil properties from interaction forces. Unlike the majority of robotic systems interacting with soil, our objective is parameter estimation, not excavation. To this end, we design our manipulator with a flat plate for easy modeling of interactions. By using a flat plate, we take advantage of the wealth of research on the similar problem of earth pressure on retaining walls. There are a number of existing earth pressure models. These models typically provide estimates of force which are in uncertain relation to the true force. A recent technique, known as numerical limit analysis, provides upper and lower bounds on the true force. Predictions from the numerical limit analysis technique are shown to be in good agreement with other accepted models. Experimental methods for plate insertion, soil-tool interface friction estimation, and control of applied forces on the soil are presented. In addition, a novel graphical technique for inverting the soil models is developed, which is an improvement over standard nonlinear optimization. This graphical technique utilizes the uncertainties associated with each set of force measurements to obtain all possible parameters which could have produced the measured forces. The system is tested on three cohesionless soils, two in a loose state and one in a loose and dense state. The results are compared with friction angles obtained from direct shear tests. The results highlight a number of key points. Common assumptions are made in soil modeling. Most notably, the Mohr-Coulomb failure law and perfectly plastic behavior. In the direct shear tests, a marked dependence of friction angle on the normal stress at low stresses is found. This has ramifications for any study of friction done at low stresses. In addition, gradual failures are often observed for vertical tools and tools inclined away from the direction of motion. After accounting for the change in friction angle at low stresses, the results show good agreement with the direct shear values.


Author[s]: Konstantine Arkoudas

Certified Computation

April 30, 2001

This paper introduces the notion of certified computation. A certified computation does not only produce a result r, but also a correctness certificate, which is a formal proof that r is correct. This can greatly enhance the credibility of the result: if we trust the axioms and inference rules that are used in the certificate,then we can be assured that r is correct. In effect,we obtain a trust reduction: we no longer have to trust the entire computation; we only have to trust the certificate. Typically, the reasoning used in the certificate is much simpler and easier to trust than the entire computation. Certified computation has two main applications: as a software engineering discipline, it can be used to increase the reliability of our code; and as a framework for cooperative computation, it can be used whenever a code consumer executes an algorithm obtained from an untrusted agent and needs to be convinced that the generated results are correct. We propose DPLs (Denotational Proof Languages)as a uniform platform for certified computation. DPLs enforce a sharp separation between logic and control and over versatile mechanicms for constructing certificates. We use Athena as a concrete DPL to illustrate our ideas, and we present two examples of certified computation, giving full working code in both cases.


Author[s]: Aaron Mark Ucko

Predicate Dispatching in the Common Lisp Object System

May 2001

I have added support for predicate dispatching, a powerful generalization of other dispatching mechanisms, to the Common Lisp Object System (CLOS). To demonstrate its utility, I used predicate dispatching to enhance Weyl, a computer algebra system which doubles as a CLOS library. My result is Dispatching-Enhanced Weyl (DEW), a computer algebra system that I have demonstrated to be well suited for both users and programmers.



Author[s]: Javid Sadr and Pawan Sinha

Exploring Object Perception with Random Image Structure Evolution

March 2001

We have developed a technique called RISE (Random Image Structure Evolution), by which one may systematically sample continuous paths in a high-dimensional image space. A basic RISE sequence depicts the evolution of an object's image from a random field, along with the reverse sequence which depicts the transformation of this image back into randomness. The processing steps are designed to ensure that important low-level image attributes such as the frequency spectrum and luminance are held constant throughout a RISE sequence. Experiments based on the RISE paradigm can be used to address some key open issues in object perception. These include determining the neural substrates underlying object perception, the role of prior knowledge and expectation in object perception, and the developmental changes in object perception skills from infancy to adulthood.


Author[s]: Jessica Banks

Design and Control of an Anthropomorphic Robotic Finger with Multi-point Tactile Sensation

May 2001

The goal of this research is to develop the prototype of a tactile sensing platform for anthropomorphic manipulation research. We investigate this problem through the fabrication and simple control of a planar 2-DOF robotic finger inspired by anatomic consistency, self-containment, and adaptability. The robot is equipped with a tactile sensor array based on optical transducer technology whereby localized changes in light intensity within an illuminated foam substrate correspond to the distribution and magnitude of forces applied to the sensor surface plane. The integration of tactile perception is a key component in realizing robotic systems which organically interact with the world. Such natural behavior is characterized by compliant performance that can initiate internal, and respond to external, force application in a dynamic environment. However, most of the current manipulators that support some form of haptic feedback either solely derive proprioceptive sensation or only limit tactile sensors to the mechanical fingertips. These constraints are due to the technological challenges involved in high resolution, multi-point tactile perception. In this work, however, we take the opposite approach, emphasizing the role of full-finger tactile feedback in the refinement of manual capabilities. To this end, we propose and implement a control framework for sensorimotor coordination analogous to infant-level grasping and fixturing reflexes. This thesis details the mechanisms used to achieve these sensory, actuation, and control objectives, along with the design philosophies and biological influences behind them. The results of behavioral experiments with a simple tactilely-modulated control scheme are also described. The hope is to integrate the modular finger into an %engineered analog of the human hand with a complete haptic system.



Author[s]: Nicholas Tung Chan and Christian Shelton

An Electronic Market-Maker

April 17, 2001

This paper presents an adaptive learning model for market-making under the reinforcement learning framework. Reinforcement learning is a learning technique in which agents aim to maximize the long-term accumulated rewards. No knowledge of the market environment, such as the order arrival or price process, is assumed. Instead, the agent learns from real-time market experience and develops explicit market-making strategies, achieving multiple objectives including the maximizing of profits and minimization of the bid-ask spread. The simulation results show initial success in bringing learning techniques to building market-making algorithms.


Author[s]: Jason D. M. Rennie

Improving Multi-class Text Classification with Naive Bayes

September 2001

There are numerous text documents available in electronic form. More and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seeking value in this huge collection requires organization; much of the work of organizing documents can be automated through text classification. The accuracy and our understanding of such systems greatly influences their usefulness. In this paper, we seek 1) to advance the understanding of commonly used text classification techniques, and 2) through that understanding, improve the tools that are available for text classification. We begin by clarifying the assumptions made in the derivation of Naive Bayes, noting basic properties and proposing ways for its extension and improvement. Next, we investigate the quality of Naive Bayes parameter estimates and their impact on classification. Our analysis leads to a theorem which gives an explanation for the improvements that can be found in multiclass classification with Naive Bayes using Error-Correcting Output Codes. We use experimental evidence on two commonly-used data sets to exhibit an application of the theorem. Finally, we show fundamental flaws in a commonly-used feature selection algorithm and develop a statistics-based framework for text feature selection. Greater understanding of Naive Bayes and the properties of text allows us to make better use of it in text classification.



Author[s]: Mariano Alvira and Ryan Rifkin

An Empirical Comparison of SNoW and SVMs for Face Detection

January 2001

Impressive claims have been made for the performance of the SNoW algorithm on face detection tasks by Yang et. al. [7]. In particular, by looking at both their results and those of Heisele et. al. [3], one could infer that the SNoW system performed substantially better than an SVM-based system, even when the SVM used a polynomial kernel and the SNoW system used a particularly simplistic 'primitive' linear representation. We evaluated the two approaches in a controlled experiment, looking directly at performance on a simple, fixed-sized test set, isolating out 'infrastructure' issues related to detecting faces at various scales in large images. We found that SNoW performed about as well as linear SVMs, and substantially worse than polynomial SVMs.



Author[s]: Christian Robert Shelton

Importance Sampling for Reinforcement Learning with Multiple Objectives

August 2001

This thesis considers three complications that arise from applying reinforcement learning to a real-world application. In the process of using reinforcement learning to build an adaptive electronic market-maker, we find the sparsity of data, the partial observability of the domain, and the multiple objectives of the agent to cause serious problems for existing reinforcement learning algorithms. We employ importance sampling (likelihood ratios) to achieve good performance in partially observable Markov decision processes with few data. Our importance sampling estimator requires no knowledge about the environment and places few restrictions on the method of collecting data. It can be used efficiently with reactive controllers, finite-state controllers, or policies with function approximation. We present theoretical analyses of the estimator and incorporate it into a reinforcement learning algorithm. Additionally, this method provides a complete return surface which can be used to balance multiple objectives dynamically. We demonstrate the need for multiple goals in a variety of applications and natural solutions based on our sampling method. The thesis concludes with example results from employing our algorithm to the domain of automated electronic market-making.


Author[s]: Nicolas Meuleau, Leonid Peshkin and Kee-Eung Kim

Exploration in Gradient-Based Reinforcement Learning

April 3, 2001

Gradient-based policy search is an alternative to value-function-based methods for reinforcement learning in non-Markovian domains. One apparent drawback of policy search is its requirement that all actions be 'on-policy'; that is, that there be no explicit exploration. In this paper, we provide a method for using importance sampling to allow any well-behaved directed exploration policy during learning. We show both theoretically and experimentally that using this method can achieve dramatic performance improvements.


Author[s]: Pedro F. Felzenszwalb

Object Recognition with Pictorial Structures

May 2001

This thesis presents a statistical framework for object recognition. The framework is motivated by the pictorial structure models introduced by Fischler and Elschlager nearly 30 years ago. The basic idea is to model an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. The problem of detecting an object in an image and the problem of learning an object model using training examples are naturally formulated under a statistical approach. We present efficient algorithms to solve these problems in our framework. We demonstrate our techniques by training models to represent faces and human bodies. The models are then used to locate the corresponding objects in novel images.



Author[s]: Christian R. Shelton

Policy Improvement for POMDPs Using Normalized Importance Sampling

March 20, 2001

We present a new method for estimating the expected return of a POMDP from experience. The estimator does not assume any knowle ge of the POMDP and allows the experience to be gathered with an arbitrary set of policies. The return is estimated for any new policy of the POMDP. We motivate the estimator from function-approximation and importance sampling points-of-view and derive its theoretical properties. Although the estimator is biased, it has low variance and the bias is often irrelevant when the estimator is used for pair-wise comparisons.We conclude by extending the estimator to policies with memory and compare its performance in a greedy search algorithm to the REINFORCE algorithm showing an order of magnitude reduction in the number of trials required.


Author[s]: Kimberle Koile

The Architect's Collaborator: Toward Intelligent Tools for Conceptual Design

January 2001

In early stages of architectural design, as in other design domains, the language used is often very abstract. In architectural design, for example, architects and their clients use experiential terms such as "private" or "open" to describe spaces. If we are to build programs that can help designers during this early-stage design, we must give those programs the capability to deal with concepts on the level of such abstractions. The work reported in this thesis sought to do that, focusing on two key questions: How are abstract terms such as "private" and "open" translated into physical form? How might one build a tool to assist designers with this process? The Architect's Collaborator (TAC) was built to explore these issues. It is a design assistant that supports iterative design refinement, and that represents and reasons about how experiential qualities are manifested in physical form. Given a starting design and a set of design goals, TAC explores the space of possible designs in search of solutions that satisfy the goals. It employs a strategy we've called dependency-directed redesign: it evaluates a design with respect to a set of goals, then uses an explanation of the evaluation to guide proposal and refinement of repair suggestions; it then carries out the repair suggestions to create new designs. A series of experiments was run to study TAC's behavior. Issues of control structure, goal set size, goal order, and modification operator capabilities were explored. In addition, TAC's use as a design assistant was studied in an experiment using a house in the process of being redesigned. TAC's use as an analysis tool was studied in an experiment using Frank Lloyd Wright's Prairie houses.


Author[s]: T. Darrell, D. Demirdjian, N. Checka and P. Felzenswalb

Plan-view Trajectory Estimation with Dense Stereo Background Models

February 2001

In a known environment, objects may be tracked in multiple views using a set of back-ground models. Stereo-based models can be illumination-invariant, but often have undefined values which inevitably lead to foreground classification errors. We derive dense stereo models for object tracking using long-term, extended dynamic-range imagery, and by detecting and interpolating uniform but unoccluded planar regions. Foreground points are detected quickly in new images using pruned disparity search. We adopt a 'late-segmentation' strategy, using an integrated plan-view density representation. Foreground points are segmented into object regions only when a trajectory is finally estimated, using a dynamic programming-based method. Object entry and exit are optimally determined and are not restricted to special spatial zones.



Author[s]: Thomas Serre, Bernd Heisele, Sayan Mukherjee and Tomaso Poggio

Feature Selection for Face Detection

September 2000

We present a new method to select features for a face detection system using Support Vector Machines (SVMs). In the first step we reduce the dimensionality of the input space by projecting the data into a subset of eigenvectors. The dimension of the subset is determined by a classification criterion based on minimizing a bound on the expected error probability of an SVM. In the second step we select features from the SVM feature space by removing those that have low contributions to the decision function of the SVM.



Author[s]: Maximilian Riesenhuber and Tomaso Poggio

Computational Models of Object Recognition in Cortex: A Review

August 7, 2000

Understanding how biological visual systems perform object recognition is one of the ultimate goals in computational neuroscience. Among the biological models of recognition the main distinctions are between feedforward and feedback and between object-centered and view-centered. From a computational viewpoint the different recognition tasks - for instance categorization and identification - are very similar, representing different trade-offs between specificity and invariance. Thus the different tasks do not strictly require different classes of models. The focus of the review is on feedforward, view-based models that are supported by psychophysical and physiological data.



Author[s]: Chikahito Nakajima, Massimiliano Pontil, Bernd Heisele and Tomaso Poggio

People Recognition in Image Sequences by Supervised Learning

June 2000

We describe a system that learns from examples to recognize people in images taken indoors. Images of people are represented by color-based and shape-based features. Recognition is carried out through combinations of Support Vector Machine classifiers (SVMs). Different types of multiclass strategies based on SVMs are explored and compared to k-Nearest Neighbors classifiers (kNNs). The system works in real time and shows high performance rates for people recognition throughout one day.



Author[s]: Bernd Heisele, Tomaso Poggio and Massimiliano Pontil

Face Detection in Still Gray Images

May 2000

We present a trainable system for detecting frontal and near-frontal views of faces in still gray images using Support Vector Machines (SVMs). We first consider the problem of detecting the whole face pattern by a single SVM classifer. In this context we compare different types of image features, present and evaluate a new method for reducing the number of features and discuss practical issues concerning the parameterization of SVMs and the selection of training data. The second part of the paper describes a component-based method for face detection consisting of a two-level hierarchy of SVM classifers. On the first level, component classifers independently detect components of a face, such as the eyes, the nose, and the mouth. On the second level, a single classifer checks if the geometrical configuration of the detected components in the image matches a geometrical model of a face.



Author[s]: Maximilian Riesenhuber and Tomaso Poggio

The Individual is Nothing, the Class Everything: Psychophysics and Modeling of Recognition in Obect Classes

May, 1, 2000

Most psychophysical studies of object recognition have focussed on the recognition and representation of individual objects subjects had previously explicitely been trained on. Correspondingly, modeling studies have often employed a 'grandmother'-type representation where the objects to be recognized were represented by individual units. However, objects in the natural world are commonly members of a class containing a number of visually similar objects, such as faces, for which physiology studies have provided support for a representation based on a sparse population code, which permits generalization from the learned exemplars to novel objects of that class. In this paper, we present results from psychophysical and modeling studies intended to investigate object recognition in natural ('continuous') object classes. In two experiments, subjects were trained to perform subordinate level discrimination in a continuous object class - images of computer-rendered cars - created using a 3D morphing system. By comparing the recognition performance of trained and untrained subjects we could estimate the effects of viewpoint-specific training and infer properties of the object class-specific representation learned as a result of training. We then compared the experimental findings to simulations, building on our recently presented HMAX model of object recognition in cortex, to investigate the computational properties of a population-based object class representation as outlined above. We find experimental evidence, supported by modeling results, that training builds a viewpoint- and class-specific representation that supplements a pre-existing repre-sentation with lower shape discriminability but possibly greater viewpoint invariance.



Author[s]: Vinay Kumar and Tomaso Poggio

Learning-Based Approach to Estimation of Morphable Model Parameters

September 2000

We describe the key role played by partial evaluation in the Supercomputing Toolkit, a parallel computing system for scientific applications that effectively exploits the vast amount of parallelism exposed by partial evaluation. The Supercomputing Toolkit parallel processor and its associated partial evaluation-based compiler have been used extensively by scientists at MIT, and have made possible recent results in astrophysics showing that the motion of the planets in our solar system is chaotically unstable.



Author[s]: Constantine P. Papageorgiou

A Trainable System for Object Detection in Images and Video Sequences

May 2000

This thesis presents a general, trainable system for object detection in static images and video sequences. The core system finds a certain class of objects in static images of completely unconstrained, cluttered scenes without using motion, tracking, or handcrafted models and without making any assumptions on the scene structure or the number of objects in the scene. The system uses a set of training data of positive and negative example images as input, transforms the pixel images to a Haar wavelet representation, and uses a support vector machine classifier to learn the difference between in-class and out-of-class patterns. To detect objects in out-of-sample images, we do a brute force search over all the subwindows in the image. This system is applied to face, people, and car detection with excellent results. For our extensions to video sequences, we augment the core static detection system in several ways -- 1) extending the representation to five frames, 2) implementing an approximation to a Kalman filter, and 3) modeling detections in an image as a density and propagating this density through time according to measured features. In addition, we present a real-time version of the system that is currently running in a DaimlerChrysler experimental vehicle. As part of this thesis, we also present a system that, instead of detecting full patterns, uses a component-based approach. We find it to be more robust to occlusions, rotations in depth, and severe lighting conditions for people detection than the full body version. We also experiment with various other representations including pixels and principal components and show results that quantify how the number of features, color, and gray-level affect performance.



Author[s]: Theodoros Evgeniou and Massimiliano Pontil

A Note on the Generalization Performance of Kernel Classifiers with Margin

May 1, 2000

We present distribution independent bounds on the generalization misclassification performance of a family of kernel classifiers with margin. Support Vector Machine classifiers (SVM) stem out of this class of machines. The bounds are derived through computations of the $V_gamma$ dimension of a family of loss functions where the SVM one belongs to. Bounds that use functions of margin distributions (i.e. functions of the slack variables of SVM) are derived.



Author[s]: Maximilian Riesenhuber and Tomaso Poggio

A Note on Object Class Representation and Categorical Perception

December 17, 1999

We present a novel scheme ("Categorical Basis Functions", CBF) for object class representation in the brain and contrast it to the "Chorus of Prototypes" scheme recently proposed by Edelman. The power and flexibility of CBF is demonstrated in two examples. CBF is then applied to investigate the phenomenon of Categorical Perception, in particular the finding by Bulthoff et al. (1998) of categorization of faces by gender without corresponding Categorical Perception. Here, CBF makes predictions that can be tested in a psychophysical experiment. Finally, experiments are suggested to further test CBF.


Author[s]: J. Kenneth Salisbury, Jr. and Mandayam A. Srinivasan (editors)

Proceedings of the Fourth PHANTOM Users Group Workshop

November 4, 1999

This Report contains the proceedings of the Fourth Phantom Users Group Workshop contains 17 papers presented October 9-12, 1999 at MIT Endicott House in Dedham Massachusetts. The workshop included sessions on, Tools for Programmers, Dynamic Environments, Perception and Cognition, Haptic Connections, Collision Detection / Collision Response, Medical and Seismic Applications, and Haptics Going Mainstream. The proceedings include papers that cover a variety of subjects in computer haptics including rendering, contact determination, development libraries, and applications in medicine, path planning, data interaction and training.


Author[s]: J.P. Mellor

Automatically Recovering Geometry and Texture from Large Sets of Calibrated Images

October 22, 1999

Three-dimensional models which contain both geometry and texture have numerous applications such as urban planning, physical simulation, and virtual environments. A major focus of computer vision (and recently graphics) research is the automatic recovery of three-dimensional models from two- dimensional images. After many years of research this goal is yet to be achieved. Most practical modeling systems require substantial human input and unlike automatic systems are not scalable. This thesis presents a novel method for automatically recovering dense surface patches using large sets (1000's) of calibrated images taken from arbitrary positions within the scene. Physical instruments, such as Global Positioning System (GPS), inertial sensors, and inclinometers, are used to estimate the position and orientation of each image. Essentially, the problem is to find corresponding points in each of the images. Once a correspondence has been established, calculating its three-dimensional position is simply a matter of geometry. Long baseline images improve the accuracy. Short baseline images and the large number of images greatly simplifies the correspondence problem. The initial stage of the algorithm is completely local and scales linearly with the number of images. Subsequent stages are global in nature, exploit geometric constraints, and scale quadratically with the complexity of the underlying scene. We describe techniques for: 1) detecting and localizing surface patches; 2) refining camera calibration estimates and rejecting false positive surfels; and 3) grouping surface patches into surfaces and growing the surface along a two-dimensional manifold. We also discuss a method for producing high quality, textured three-dimensional models from these surfaces. Some of the most important characteristics of this approach are that it: 1) uses and refines noisy calibration estimates; 2) compensates for large variations in illumination; 3) tolerates significant soft occlusion (e.g. tree branches); and 4) associates, at a fundamental level, an estimated normal (i.e. no frontal-planar assumption) and texture with each surface patch.



Author[s]: Constantine P. Papageorgiou and Tomaso Poggio

A Trainable Object Detection System: Car Detection in Static Images

october 13, 1999

This paper describes a general, trainable architecture for object detection that has previously been applied to face and peoplesdetection with a new application to car detection in static images. Our technique is a learning based approach that uses a set of labeled training data from which an implicit model of an object class -- here, cars -- is learned. Instead of pixel representations that may be noisy and therefore not provide a compact representation for learning, our training images are transformed from pixel space to that of Haar wavelets that respond to local, oriented, multiscale intensity differences. These feature vectors are then used to train a support vector machine classifier. The detection of cars in images is an important step in applications such as traffic monitoring, driver assistance systems, and surveillance, among others. We show several examples of car detection on out-of- sample images and show an ROC curve that highlights the performance of our system.



Author[s]: Vinay P. Kumar and Tomaso Poggio

Learning-Based Approach to Real Time Tracking and Analysis of Faces

September 23, 1999

This paper describes a trainable system capable of tracking faces and facialsfeatures like eyes and nostrils and estimating basic mouth features such as sdegrees of openness and smile in real time. In developing this system, we have addressed the twin issues of image representation and algorithms for learning. We have used the invariance properties of image representations based on Haar wavelets to robustly capture various facial features. Similarly, unlike previous approaches this system is entirely trained using examples and does not rely on a priori (hand-crafted) models of facial features based on optical flow or facial musculature. The system works in several stages that begin with face detection, followed by localization of facial features and estimation of mouth parameters. Each of these stages is formulated as a problem in supervised learning from examples. We apply the new and robust technique of support vector machines (SVM) for classification in the stage of skin segmentation, face detection and eye detection. Estimation of mouth parameters is modeled as a regression from a sparse subset of coefficients (basis functions) of an overcomplete dictionary of Haar wavelets.


Author[s]: Mark M. Millonas and Erik M. Rauch

Trans-membrane Signal Transduction and Biochemical Turing Pattern Formation

September 28, 1999

The Turing mechanism for the production of a broken spatial symmetry in an initially homogeneous system of reacting and diffusing substances has attracted much interest as a potential model for certain aspects of morphogenesis such as pre- patterning in the embryo, and has also served as a model for self-organization in more generic systems. The two features necessary for the formation of Turing patterns are short- range autocatalysis and long-range inhibition which usually only occur when the diffusion rate of the inhibitor is significantly greater than that of the activator. This observation has sometimes been used to cast doubt on applicability of the Turing mechanism to cellular patterning since many messenger molecules that diffuse between cells do so at more-or-less similar rates. Here we show that stationary, symmetry-breaking Turing patterns can form in physiologically realistic systems even when the extracellular diffusion coefficients are equal; the kinetic properties of the 'receiver' and 'transmitter' proteins responsible for signal transduction will be primary factors governing this process.


Author[s]: Kinh Tieu and Paul Viola

Boosting Image Database Retrieval

September 10, 1999

We present an approach for image database retrieval using a very large number of highly- selective features and simple on-line learning. Our approach is predicated on the assumption that each image is generated by a sparse set of visual "causes" and that images which are visually similar share causes. We propose a mechanism for generating a large number of complex features which capture some aspects of this causal structure. Boosting is used to learn simple and efficient classifiers in this complex feature space. Finally we will describe a practical implementation of our retrieval system on a database of 3000 images.


Author[s]: Tommi Jaakkola, Marina Meila and Tony Jebara

Maximum Entropy Discrimination

December 1999

We present a general framework for discriminative estimation based on the maximum entropy principle and its extensions. All calculations involve distributions over structures and/or parameters rather than specific settings and reduce to relative entropy projections. This holds even when the data is not separable within the chosen parametric class, in the context of anomaly detection rather than classification, or when the labels in the training set are uncertain or incomplete. Support vector machines are naturally subsumed under this class and we provide several extensions. We are also able to estimate exactly and efficiently discriminative distributions over tree structures of class- conditional models within this framework. Preliminary experimental results are indicative of the potential in these techniques.


Author[s]: Radhika Nagpal

Organizing a Global Coordinate System from Local Information on an Amorphous Computer

August 29, 1999

This paper demonstrates that it is possible to generate a reasonably accurate coordinate system on randomly distributed processors, using only local information and local communication. By coordinate systems we imply that each element assigns itself a logical coordinate that maps to its global physical location, starting with no apriori knowledge of position or orientation. The algorithm presented is inspired by biological systems that use chemical gradients to determine the position of cells. Extensive analysis and simulation results are presented. Two key results are: there is a critical minimum average neighborhood size of 15 for good accuracy and there is a fundamental limit on the resolution of any coordinate system determined strictly from local communication. We also demonstrate that using this algorithm, random distributions of processors produce significantly better accuracy than regular processor grids - such as those used by cellular automata. This has implications for discrete models of biology as well as for building smart sensor arrays.


Author[s]: Harold Abelson, Don Allen, Daniel Coore, Chris Hanson, George Homsy, Thomas F. Knight, Jr., Radhika Nagpal, Erik Rauch, Gerald Jay Sussman and Ron Weiss

Amorphous Computing

August 29, 1999

Amorphous computing is the development of organizational principles and programming languages for obtaining coherent behaviors from the cooperation of myriads of unreliable parts that are interconnected in unknown, irregular, and time-varying ways. The impetus for amorphous computing comes from developments in microfabrication and fundamental biology, each of which is the basis of a kernel technology that makes it possible to build or grow huge numbers of almost-identical information-processing units at almost no cost. This paper sets out a research agenda for realizing the potential of amorphous computing and surveys some initial progress, both in programming and in fabrication. We describe some approaches to programming amorphous systems, which are inspired by metaphors from biology and physics. We also present the basic ideas of cellular computing, an approach to constructing digital-logic circuits within living cells by representing logic levels by concentrations DNA-binding proteins.



Author[s]: Anuj Mohan

Object Detection in Images by Components

August 11, 1999

In this paper we present a component based person detection system that is capable of detecting frontal, rear and near side views of people, and partially occluded persons in cluttered scenes. The framework that is described here for people is easily applied to other objects as well. The motivation for developing a component based approach is two fold: first, to enhance the performance of person detection systems on frontal and rear views of people and second, to develop a framework that directly addresses the problem of detecting people who are partially occluded or whose body parts blend in with the background. The data classification is handled by several support vector machine classifiers arranged in two layers. This architecture is known as Adaptive Combination of Classifiers (ACC). The system performs very well and is capable of detecting people even when all components of a person are not found. The performance of the system is significantly better than a full body person detector designed along similar lines. This suggests that the improved performance is due to the components based approach and the ACC data classification structure.


Author[s]: Rajesh Kasturirangan

Multiple Scales in Small-World Networks

August 11, 1999

Small-world architectures may be implicated in a range of phenomena from networks of neurons in the cerebral cortex to social networks and propogation of viruses. Small- world networks are interpolations of regular and random networks that retain the advantages of both regular and random networks by being highly clustered like regular networks and having small average path length between nodes, like random networks. While most of the recent attention on small- world networks has focussed on the effect of introducing disorder/randomness into a regular network, we show that that the fundamental mechanism behind the small- world phenomenon is not disorder/ randomness, but the presence of connections of many different length scales. Consequently, in order to explain the small-world phenomenon, we introduce the concept of multiple scale networks and then state the multiple length scale hypothesis. We show that small-world behavior in randomly rewired networks is a consequence of features common to all multiple scale networks. To support the multiple length scale hypothesis, novel network architectures are introduced that need not be a result of random rewiring of a regular network. In each case it is shown that whenever the network exhibits small- world behavior, it also has connections of diverse length scales. We also show that the distribution of the length scales of the new connections is significantly more important than whether the new connections are long range, medium range or short range.


Author[s]: Liana M. Lorigo, Olivier Faugeras, W.E.L. Grimson, Renaud Keriven, Ron Kikinis, Carl-Fredrik Westin

Co-dimension 2 Geodesic Active Contours for MRA Segmentation

August 11, 1999

Automatic and semi-automatic magnetic resonance angiography (MRA)s segmentation techniques can potentially save radiologists larges amounts of time required for manual segmentation and cans facilitate further data analysis. The proposed MRAs segmentation method uses a mathematical modeling technique whichs is well-suited to the complicated curve-like structure of bloods vessels. We define the segmentation task as ans energy minimization over all 3D curves and use a level set methods to search for a solution. Ours approach is an extension of previous level set segmentations techniques to higher co-dimension.



Author[s]: Ryan Rifkin, Massimiliano Pontil and Alessandro Verri

A Note on Support Vector Machines Degeneracy

August 11, 1999

When training Support Vector Machines (SVMs) over non-separable data sets, one sets the threshold $b$ using any dual cost coefficient that is strictly between the bounds of $0$ and $C$. We show that there exist SVM training problems with dual optimal solutions with all coefficients at bounds, but that all such problems are degenerate in the sense that the "optimal separating hyperplane" is given by ${f w} = {f 0}$, and the resulting (degenerate) SVM will classify all future points identically (to the class that supplies more training data). We also derive necessary and sufficient conditions on the input data for this to occur. Finally, we show that an SVM training problem can always be made degenerate by the addition of a single data point belonging to a certain unboundedspolyhedron, which we characterize in terms of its extreme points and rays.



Author[s]: Tony Ezzat and Tomaso Poggio

Visual Speech Synthesis by Morphing Visemes

May 1999

We present MikeTalk, a text-to-audiovisual speech synthesizer which converts input text into an audiovisual speech stream. MikeTalk is built using visemes, which are a small set of images spanning a large range of mouth shapes. The visemes are acquired from a recorded visual corpus of a human subject which is specifically designed to elicit one instantiation of each viseme. Using optical flow methods, correspondence from every viseme to every other viseme is computed automatically. By morphing along this correspondence, a smooth transition between viseme images may be generated. A complete visual utterance is constructed by concatenating viseme transitions. Finally, phoneme and timing information extracted from a text-to-speech synthesizer is exploited to determine which viseme transitions to use, and the rate at which the morphing process should occur. In this manner, we are able to synchronize the visual speech stream with the audio speech stream, and hence give the impression of a photorealistic talking face.


Author[s]: Hany Farid

Detecting Digital Forgeries Using Bispectral Analysis

December 1999

With the rapid increase in low-cost and sophisticated digital technology the need for techniques to authenticate digital material will become more urgent. In this paper we address the problem of authenticating digital signals assuming no explicit prior knowledge of the original. The basic approach that we take is to assume that in the frequency domain a "natural" signal has weak higher-order statistical correlations. We then show that "un-natural" correlations are introduced if this signal is passed through a non-linearity (which would almost surely occur in the creation of a forgery). Techniques from polyspectral analysis are then used to detect the presence of these correlations. We review the basics of polyspectral analysis, show how and why these tools can be used in detecting forgeries and show their effectiveness in analyzing human speech.



Author[s]: Theodoros Evgeniou and Massimiliano Pontil

On the V(subscript gamma) Dimension for Regression in Reproducing Kernel Hilbert Spaces

May 1999

This paper presents a computation of the $V_gamma$ dimension for regression in bounded subspaces of Reproducing Kernel Hilbert Spaces (RKHS) for the Support Vector Machine (SVM) regression $epsilon$-insensitive loss function, and general $L_p$ loss functions. Finiteness of the RV_gamma$ dimension is shown, which also proves uniform convergence in probability for regression machines in RKHS subspaces that use the $L_epsilon$ or general $L_p$ loss functions. This paper presenta a novel proof of this result also for the case that a bias is added to the functions in the RKHS.


Author[s]: Gideon P. Stein, Raquel Romano and Lily Lee

Monitoring Activities from Multiple Video Streams: Establishing a Common Coordinate Frame

April 1999

Passive monitoring of large sites typically requires coordination between multiple cameras, which in turn requires methods for automatically relating events between distributed cameras. This paper tackles the problem of self-calibration of multiple cameras which are very far apart, using feature correspondences to determine the camera geometry. The key problem is finding such correspondences. Since the camera geometry and photometric characteristics vary considerably between images, one cannot use brightness and/or proximity constraints. Instead we apply planar geometric constraints to moving objects in the scene in order to align the scene"s ground plane across multiple views. We do not assume synchronized cameras, and we show that enforcing geometric constraints enables us to align the tracking data in time. Once we have recovered the homography which aligns the planar structure in the scene, we can compute from the homography matrix the 3D position of the plane and the relative camera positions. This in turn enables us to recover a homography matrix which maps the images to an overhead view. We demonstrate this technique in two settings: a controlled lab setting where we test the effects of errors in internal camera calibration, and an uncontrolled, outdoor setting in which the full procedure is applied to external camera calibration and ground plane recovery. In spite of noise in the internal camera parameters and image data, the system successfully recovers both planar structure and relative camera positions in both settings.



Author[s]: Theodoros Evgeniou, Massimiliano Pontil and Tomaso Poggio

A Unified Framework for Regularization Networks and Support Vector Machines

March 1999

Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples -- in particular the regression problem of approximating a multivariate function from sparse data. We present both formulations in a unified framework, namely in the context of Vapnik's theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics.



Author[s]: Sayan Mukherjee and Vladimir Vapnik

Multivariate Density Estimation: An SVM Approach

April 1999

We formulate density estimation as an inverse operator problem. We then use convergence results of empirical distribution functions to true distribution functions to develop an algorithm for multivariate density estimation. The algorithm is based upon a Support Vector Machine (SVM) approach to solving inverse operator problems. The algorithm is implemented and tested on simulated data from different distributions and different dimensionalities, gaussians and laplacians in $R^2$ and $R^{12}$. A comparison in performance is made with Gaussian Mixture Models (GMMs). Our algorithm does as well or better than the GMMs for the simulations tested and has the added advantage of being automated with respect to parameters.


Author[s]: Marina Meila

An Accelerated Chow and Liu Algorithm: Fitting Tree Distributions to High Dimensional Sparse Data

January 1999

Chow and Liu introduced an algorithm for fitting a multivariate distribution with a tree (i.e. a density model that assumes that there are only pairwise dependencies between variables) and that the graph of these dependencies is a spanning tree. The original algorithm is quadratic in the dimesion of the domain, and linear in the number of data points that define the target distribution $P$. This paper shows that for sparse, discrete data, fitting a tree distribution can be done in time and memory that is jointly subquadratic in the number of variables and the size of the data set. The new algorithm, called the acCL algorithm, takes advantage of the sparsity of the data to accelerate the computation of pairwise marginals and the sorting of the resulting mutual informations, achieving speed ups of up to 2-3 orders of magnitude in the experiments.



Author[s]: Massimiliano Pontil, Sayan Mukherjee and Federico Girosi

On the Noise Model of Support Vector Machine Regression

October 1998

Support Vector Machines Regression (SVMR) is a regression technique which has been recently introduced by V. Vapnik and his collaborators (Vapnik, 1995; Vapnik, Golowich and Smola, 1996). In SVMR the goodness of fit is measured not by the usual quadratic loss function (the mean square error), but by a different loss function called Vapnik"s $epsilon$- insensitive loss function, which is similar to the "robust" loss functions introduced by Huber (Huber, 1981). The quadratic loss function is well justified under the assumption of Gaussian additive noise. However, the noise model underlying the choice of Vapnik's loss function is less clear. In this paper the use of Vapnik's loss function is shown to be equivalent to a model of additive and Gaussian noise, where the variance and mean of the Gaussian are random variables. The probability distributions for the variance and mean will be stated explicitly. While this work is presented in the framework of SVMR, it can be extended to justify non-quadratic loss functions in any Maximum Likelihood or Maximum A Posteriori approach. It applies not only to Vapnik's loss function, but to a much broader class of loss functions.



Author[s]: Christian R. Shelton

Three-Dimensional Correspondence

December 1998

This paper describes the problem of three-dimensional object correspondence and presents an algorithm for matching two three-dimensional colored surfaces using polygon reduction and the minimization of an energy function. At the core of this algorithm is a novel data-dependent multi-resolution pyramid for polygonal surfaces. The algorithm is general to correspondence between any two manifolds of the same dimension embedded in a higher dimensional space. Results demonstrating correspondences between various objects are presented and a method for incorporating user input is also detailed.



Author[s]: Massimiliano Pontil, Ryan Rifkin and Theodoros Evgeniou

From Regression to Classification in Support Vector Machines

November 1998

We study the relation between support vector machines (SVMs) for regression (SVMR) and SVM for classification (SVMC). We show that for a given SVMC solution there exists a SVMR solution which is equivalent for a certain choice of the parameters. In particular our result is that for $epsilon$ sufficiently close to one, the optimal hyperplane and threshold for the SVMC problem with regularization parameter C_c are equal to (1-epsilon)^{- 1} times the optimal hyperplane and threshold for SVMR with regularization parameter C_r = (1-epsilon)C_c. A direct consequence of this result is that SVMC can be seen as a special case of SVMR.



Author[s]: Marina Meila, Michael I. Jordan and Quaid Morris

Estimating Dependency Structure as a Hidden Variable

September 1998

This paper introduces a probability model, the mixture of trees that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms that use EM and the Minimum Spanning Tree algorithm to find the ML and MAP mixture of trees for a variety of priors, including the Dirichlet and the MDL priors. We also show that the single tree classifier acts like an implicit feature selector, thus making the classification performance insensitive to irrelevant attributes. Experimental results demonstrate the excellent performance of the new model both in density estimation and in classification.


Author[s]: Hany Farid and Edward H. Adelson

Separating Reflections from Images Using Independent Components Analysis

September 1998

The image of an object can vary dramatically depending on lighting, specularities/reflections and shadows. It is often advantageous to separate these incidental variations from the intrinsic aspects of an image. Along these lines this paper describes a method for photographing objects behind glass and digitally removing the reflections off the glass leaving the image of the objects behind the glass intact. We describe the details of this method which employs simple optical techniques and independent components analysis (ICA) and show its efficacy with several examples.



Author[s]: Nicholas Chan, Blake LeBaron, Andrew Lo and Tomaso Poggio

Information Dissemination and Aggregation in Asset Markets with Simple Intelligent Traders

September 1998

Various studies of asset markets have shown that traders are capable of learning and transmitting information through prices in many situations. In this paper we replace human traders with intelligent software agents in a series of simulated markets. Using these simple learning agents, we are able to replicate several features of the experiments with human subjects, regarding (1) dissemination of information from informed to uninformed traders, and (2) aggregation of information spread over different traders.


Author[s]: Deborah A. Wallach

A Hierarchical Cache Coherent Protocol

September 1992

As the number of processors in distributed- memory multiprocessors grows, efficiently supporting a shared-memory programming model becomes difficult. We have designed the Protocol for Hierarchical Directories (PHD) to allow shared-memory support for systems containing massive numbers of processors. PHD eliminates bandwidth problems by using a scalable network, decreases hot-spots by not relying on a single point to distribute blocks, and uses a scalable amount of space for its directories. PHD provides a shared- memory model by synthesizing a global shared memory from the local memories of processors. PHD supports sequentially consistent read, write, and test- and-set operations. This thesis also introduces a method of describing locality for hierarchical protocols and employs this method in the derivation of an abstract model of the protocol behavior. An embedded model, based on the work of Johnson[ISCA19], describes the protocol behavior when mapped to a k-ary n- cube. The thesis uses these two models to study the average height in the hierarchy that operations reach, the longest path messages travel, the number of messages that operations generate, the inter-transaction issue time, and the protocol overhead for different locality parameters, degrees of multithreading, and machine sizes. We determine that multithreading is only useful for approximately two to four threads; any additional interleaving does not decrease the overall latency. For small machines and high locality applications, this limitation is due mainly to the length of the running threads. For large machines with medium to low locality, this limitation is due mainly to the protocol overhead being too large. Our study using the embedded model shows that in situations where the run length between references to shared memory is at least an order of magnitude longer than the time to process a single state transition in the protocol, applications exhibit good performance. If separate controllers for processing protocol requests are included, the protocol scales to 32k processor machines as long as the application exhibits hierarchical locality: at least 22% of the global references must be able to be satisfied locally; at most 35% of the global references are allowed to reach the top level of the hierarchy.


Author[s]: Parry Husbands, Charles Lee Isbell, Jr. and Alan Edelman

Interactive Supercomputing with MIT Matlab

July 28, 1998

This paper describes MITMatlab, a system that enables users of supercomputers or networked PCs to work on large data sets within Matlab transparently. MITMatlab is based on the Parallel Problems Server (PPServer), a standalone 'linear algebra server' that provides a mechanism for running distributed memory algorithms on large data sets. The PPServer and MITMatlab enable high-performance interactive supercomputing. With such a tool, researchers can now use Matlab as more than a prototyping tool for experimenting with small problems. Instead, MITMatlab makes is possible to visualize and operate interactively on large data sets. This has implications not only in supercomputing, but for Artificial Intelligence applicatons such as Machine Learning, Information Retrieval and Image Processing.



Author[s]: Zhaoping Li

Pre-Attentive Segmentation in the Primary Visual Cortex

June 30, 1998

Stimuli outside classical receptive fields have been shown to exert significant influence over the activities of neurons in primary visual cortexWe propose that contextual influences are used for pre-attentive visual segmentation, in a new framework called segmentation without classification. This means that segmentation of an image into regions occurs without classification of features within a region or comparison of features between regions. This segmentation framework is simpler than previous computational approaches, making it implementable by V1 mechanisms, though higher leve l visual mechanisms are needed to refine its output. However, it easily handles a class of segmentation problems that are tricky in conventional methods. The cortex computes global region boundaries by detecting the breakdown of homogeneity or translation invariance in the input, using local intra-cortical interactions mediated by the horizontal connections. The difference between contextual influences near and far from region boundaries makes neural activities near region boundaries higher than elsewhere, making boundaries more salient for perceptual pop-out. This proposal is implemented in a biologically based model of V1, and demonstrated using examples of texture segmentation and figure-ground segregation. The model performs segmentation in exactly the same neural circuit that solves the dual problem of the enhancement of contours, as is suggested by experimental observations. Its behavior is compared with psychophysical and physiological data on segmentation, contour enhancement, and contextual influences. We discuss the implications of segmentation without classification and the predictions of our V1 model, and relate it to other phenomena such as asymmetry in visual search.


Author[s]: Oded Maron

Learning from Ambiguity

December 1998

There are many learning problems for which the examples given by the teacher are ambiguously labeled. In this thesis, we will examine one framework of learning from ambiguous examples known as Multiple- Instance learning. Each example is a bag, consisting of any number of instances. A bag is labeled negative if all instances in it are negative. A bag is labeled positive if at least one instance in it is positive. Because the instances themselves are not labeled, each positive bag is an ambiguous example. We would like to learn a concept which will correctly classify unseen bags. We have developed a measure called Diverse Density and algorithms for learning from multiple- instance examples. We have applied these techniques to problems in drug design, stock prediction, and image database retrieval. These serve as examples of how to translate the ambiguity in the application domain into bags, as well as successful examples of applying Diverse Density techniques.


Author[s]: Oded Maron and Tomas Lozano-Perez

Visible Decomposition: Real-Time Path Planning in Large Planar Environments

June, 1998

We describe a method called Visible Decomposition for computing collision-free paths in real time through a planar environment with a large number of obstacles. This method divides space into local visibility graphs, ensuring that all operations are local. The search time is kept low since the number of regions is proved to be small. We analyze the computational demands of the algorithm and the quality of the paths it produces. In addition, we show test results on a large simulation testbed.


Author[s]: Gina-Anne Levow

Corpus-Based Techniques for Word Sense Disambiguation

May 27, 1998

The need for robust and easily extensible systems for word sense disambiguation coupled with successes in training systems for a variety of tasks using large on-line corpora has led to extensive research into corpus-based statistical approaches to this problem. Promising results have been achieved by vector space representations of context, clustering combined with a semantic knowledge base, and decision lists based on collocational relations. We evaluate these techniques with respect to three important criteria: how their definition of context affects their ability to incorporate different types of disambiguating information, how they define similarity among senses, and how easily they can generalize to new senses. The strengths and weaknesses of these systems provide guidance for future systems which must capture and model a variety of disambiguating information, both syntactic and semantic.


Author[s]: Charles Isbell and Paul Viola

Restructuring Sparse High Dimensional Data for Effective Retrieval

May 1998

The task in text retrieval is to find the subset of a collection of documents relevant to a user's information request, usually expressed as a set of words. Classically, documents and queries are represented as vectors of word counts. In its simplest form, relevance is defined to be the dot product between a document and a query vector--a measure of the number of common terms. A central difficulty in text retrieval is that the presence or absence of a word is not sufficient to determine relevance to a query. Linear dimensionality reduction has been proposed as a technique for extracting underlying structure from the document collection. In some domains (such as vision) dimensionality reduction reduces computational complexity. In text retrieval it is more often used to improve retrieval performance. We propose an alternative and novel technique that produces sparse representations constructed from sets of highly-related words. Documents and queries are represented by their distance to these sets. and relevance is measured by the number of common clusters. This technique significantly improves retrieval performance, is efficient to compute and shares properties with the optimal linear projection operator and the independent components of documents.



Author[s]: Constantine P. Papgeorgiou, Federico Girosi and Tomaso Poggio

Sparse Correlation Kernel Analysis and Reconstruction

May 1998

This paper presents a new paradigm for signal reconstruction and superresolution, Correlation Kernel Analysis (CKA), that is based on the selection of a sparse set of bases from a large dictionary of class- specific basis functions. The basis functions that we use are the correlation functions of the class of signals we are analyzing. To choose the appropriate features from this large dictionary, we use Support Vector Machine (SVM) regression and compare this to traditional Principal Component Analysis (PCA) for the tasks of signal reconstruction, superresolution, and compression. The testbed we use in this paper is a set of images of pedestrians. This paper also presents results of experiments in which we use a dictionary of multiscale basis functions and then use Basis Pursuit De-Noising to obtain a sparse, multiscale approximation of a signal. The results are analyzed and we conclude that 1) when used with a sparse representation technique, the correlation function is an effective kernel for image reconstruction and superresolution, 2) for image compression, PCA and SVM have different tradeoffs, depending on the particular metric that is used to evaluate the results, 3) in sparse representation techniques, L_1 is not a good proxy for the true measure of sparsity, L_0, and 4) the L_epsilon norm may be a better error metric for image reconstruction and compression than the L_2 norm, though the exact psychophysical metric should take into account high order structure in images.


Author[s]: Anita M. Flynn

Piezoelectric Ultrasonic Micromotors

June 1995

This report describes development of micro- fabricated piezoelectric ultrasonic motors and bulk-ceramic piezoelectric ultrasonic motors. Ultrasonic motors offer the advantage of low speed, high torque operation without the need for gears. They can be made compact and lightweight and provide a holding torque in the absence of applied power, due to the traveling wave frictional coupling mechanism between the rotor and the stator. This report covers modeling, simulation, fabrication and testing of ultrasonic motors. Design of experiments methods were also utilized to find optimal motor parameters. A suite of 8 mm diameter x 3 mm tall motors were machined for these studies and maximum stall torques as large as 10^(- 3) Nm, maximum no-load speeds of 1710 rpm and peak power outputs of 27 mW were realized. Aditionally, this report describes the implementation of a microfabricated ultrasonic motor using thin- film lead zirconate titanate. In a joint project with the Pennsylvania State University Materials Research Laboratory and MIT Lincoln Laboratory, 2 mm and 5 mm diameter stator structures were fabricated on 1 micron thick silicon nitride membranes. Small glass lenses placed down on top spun at 100-300 rpm with 4 V excitation at 90 kHz. The large power densities and stall torques of these piezoelectric ultrasonic motors offer tremendous promis for integrated machines: complete intelligent, electro-mechanical autonomous systems mass-produced in a single fabrication process.


Author[s]: Kenneth Yip and Gerald Jay Sussman

Sparse Representations for Fast, One-Shot Learning

November 1997

Humans rapidly and reliably learn many kinds of regularities and generalizations. We propose a novel model of fast learning that exploits the properties of sparse representations and the constraints imposed by a plausible hardware mechanism. To demonstrate our approach we describe a computational model of acquisition in the domain of morphophonology. We encapsulate phonological information as bidirectional boolean constraint relations operating on the classical linguistic representations of speech sounds in term of distinctive features. The performance model is described as a hardware mechanism that incrementally enforces the constraints. Phonological behavior arises from the action of this mechanism. Constraints are induced from a corpus of common English nouns and verbs. The induction algorithm compiles the corpus into increasingly sophisticated constraints. The algorithm yields one-shot learning from a few examples. Our model has been implemented as a computer program. The program exhibits phonological behavior similar to that of young children. As a bonus the constraints that are acquired can be interpreted as classical linguistic rules.



Author[s]: Tomaso Poggio and Federico Girosi

Notes on PCA, Regularization, Sparsity and Support Vector Machines

May 1998

We derive a new representation for a function as a linear combination of local correlation kernels at optimal sparse locations and discuss its relation to PCA, regularization, sparsity principles and Support Vector Machines. We first review previous results for the approximation of a function from discrete data (Girosi, 1998) in the context of Vapnik"s feature space and dual representation (Vapnik, 1995). We apply them to show 1) that a standard regularization functional with a stabilizer defined in terms of the correlation function induces a regression function in the span of the feature space of classical Principal Components and 2) that there exist a dual representations of the regression function in terms of a regularization network with a kernel equal to a generalized correlation function. We then describe the main observation of the paper: the dual representation in terms of the correlation function can be sparsified using the Support Vector Machines (Vapnik, 1982) technique and this operation is equivalent to sparsify a large dictionary of basis functions adapted to the task, using a variation of Basis Pursuit De-Noising (Chen, Donoho and Saunders, 1995; see also related work by Donahue and Geiger, 1994; Olshausen and Field, 1995; Lewicki and Sejnowski, 1998). In addition to extending the close relations between regularization, Support Vector Machines and sparsity, our work also illuminates and formalizes the LFA concept of Penev and Atick (1996). We discuss the relation between our results, which are about regression, and the different problem of pattern classification.


Author[s]: Kevin K. Lin

Coordinate-Independent Computations on Differential Equations

March 1998

This project investigates the computational representation of differentiable manifolds, with the primary goal of solving partial differential equations using multiple coordinate systems on general n- dimensional spaces. In the process, this abstraction is used to perform accurate integrations of ordinary differential equations using multiple coordinate systems. In the case of linear partial differential equations, however, unexpected difficulties arise even with the simplest equations.


Author[s]: Thomas Marill

Recovery of Three-Dimensional Objects from Single Perspective Images

March 1998

Any three-dimensional wire-frame object constructed out of parallelograms can be recovered from a single perspective two-dimensional image. A procedure for performing the recovery is given.



Author[s]: Maximilian Riesenhuber and Tomaso Poggio

Modeling Invariances in Inferotemporal Cell Tuning

March 1998

In macaque inferotemporal cortex (IT), neurons have been found to respond selectively to complex shapes while showing broad tuning ("invariance") with respect to stimulus transformations such as translation and scale changes and a limited tuning to rotation in depth. Training monkeys with novel, paperclip-like objects, Logothetis et al. could investigate whether these invariance properties are due to experience with exhaustively many transformed instances of an object or if there are mechanisms that allow the cells to show response invariance also to previously unseen instances of that object. They found object-selective cells in anterior IT which exhibited limited invariance to various transformations after training with single object views. While previous models accounted for the tuning of the cells for rotations in depth and for their selectivity to a specific object relative to a population of distractor objects, the model described here attempts to explain in a biologically plausible way the additional properties of translation and size invariance. Using the same stimuli as in the experiment, we find that model IT neurons exhibit invariance properties which closely parallel those of real neurons. Simulations show that the model is capable of unsupervised learning of view-tuned neurons. The model also allows to make experimentally testable predictions regarding novel stimulus transformations and combinations of stimuli.


Author[s]: Brian Scassellati

A Binocular, Foveated Active Vision System

March 1998

This report documents the design and implementation of a binocular, foveated active vision system as part of the Cog project at the MIT Artificial Intelligence Laboratory. The active vision system features a three degree of freedom mechanical platform that supports four color cameras, a motion control system, and a parallel network of digital signal processors for image processing. To demonstrate the capabilities of the system, we present results from four sample visual-motor tasks.


Author[s]: Alan Bawden

Implementing Distributed Systems Using Linear Naming

March 1993

Linear graph reduction is a simple computational model in which the cost of naming things is explicitly represented. The key idea is the notion of "linearity". A name is linear if it is only used once, so with linear naming you cannot create more than one outstanding reference to an entity. As a result, linear naming is cheap to support and easy to reason about. Programs can be translated into the linear graph reduction model such that linear names in the program are implemented directly as linear names in the model. Nonlinear names are supported by constructing them out of linear names. The translation thus exposes those places where the program uses names in expensive, nonlinear ways. Two applications demonstrate the utility of using linear graph reduction: First, in the area of distributed computing, linear naming makes it easy to support cheap cross-network references and highly portable data structures, Linear naming also facilitates demand driven migration of tasks and data around the network without requiring explicit guidance from the programmer. Second, linear graph reduction reveals a new characterization of the phenomenon of state. Systems in which state appears are those which depend on certain - global- system properties. State is not a localizable phenomenon, which suggests that our usual object oriented metaphor for state is flawed.


Author[s]: Radhika Nagpal and Daniel Coore

An Algorithm for Group Formation and Maximal Independent Set in an Amorphous Computer

February 1998

Amorphous computing is the study of programming ultra-scale computing environments of smart sensors and actuators cite{white-paper}. The individual elements are identical, asynchronous, randomly placed, embedded and communicate locally via wireless broadcast. Aggregating the processors into groups is a useful paradigm for programming an amorphous computer because groups can be used for specialization, increased robustness, and efficient resource allocation. This paper presents a new algorithm, called the clubs algorithm, for efficiently aggregating processors into groups in an amorphous computer, in time proportional to the local density of processors. The clubs algorithm is well-suited to the unique characteristics of an amorphous computer. In addition, the algorithm derives two properties from the physical embedding of the amorphous computer: an upper bound on the number of groups formed and a constant upper bound on the density of groups. The clubs algorithm can also be extended to find the maximal independent set (MIS) and $Delta + 1$ vertex coloring in an amorphous computer in $O(log N)$ rounds, where $N$ is the total number of elements and $Delta$ is the maximum degree.



Author[s]: Thomas Hofmann and Jan Puzicha

Statistical Models for Co-occurrence Data

February 1998

Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM--based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.



Author[s]: Yar Weiss and Edward H. Adelson

Slow and Smooth: A Bayesian Theory for the Combination of Local Motion Signals in Human Vision

February 1998

In order to estimate the motion of an object, the visual system needs to combine multiple local measurements, each of which carries some degree of ambiguity. We present a model of motion perception whereby measurements from different image regions are combined according to a Bayesian estimator --- the estimated motion maximizes the posterior probability assuming a prior favoring slow and smooth velocities. In reviewing a large number of previously published phenomena we find that the Bayesian estimator predicts a wide range of psychophysical results. This suggests that the seemingly complex set of illusions arise from a single computational strategy that is optimal under reasonable assumptions.


Author[s]: Gideon P. Stein and Amnon Shashua

Direct Estimation of Motion and Extended Scene Structure from a Moving Stereo Rig

December 1998

We describe a new method for motion estimation and 3D reconstruction from stereo image sequences obtained by a stereo rig moving through a rigid world. We show that given two stereo pairs one can compute the motion of the stereo rig directly from the image derivatives (spatial and temporal). Correspondences are not required. One can then use the images from both pairs combined to compute a dense depth map. The motion estimates between stereo pairs enable us to combine depth maps from all the pairs in the sequence to form an extended scene reconstruction and we show results from a real image sequence. The motion computation is a linear least squares computation using all the pixels in the image. Areas with little or no contrast are implicitly weighted less so one does not have to explicitly apply a confidence measure.



Author[s]: Gideon P. Stein and Amnon Shashua

On Degeneracy of Linear Reconstruction from Three Views: Linear Line Complex and Applications

December 1997

This paper investigates the linear degeneracies of projective structure estimation from point and line features across three views. We show that the rank of the linear system of equations for recovering the trilinear tensor of three views reduces to 23 (instead of 26) in the case when the scene is a Linear Line Complex (set of lines in space intersecting at a common line) and is 21 when the scene is planar. The LLC situation is only linearly degenerate, and we show that one can obtain a unique solution when the admissibility constraints of the tensor are accounted for. The line configuration described by an LLC, rather than being some obscure case, is in fact quite typical. It includes, as a particular example, the case of a camera moving down a hallway in an office environment or down an urban street. Furthermore, an LLC situation may occur as an artifact such as in direct estimation from spatio-temporal derivatives of image brightness. Therefore, an investigation into degeneracies and their remedy is important also in practice.



Author[s]: Theodoros Evgeniou and Tomaso Poggio

Sparse Representations of Multiple Signals

September 1997

We discuss the problem of finding sparse representations of a class of signals. We formalize the problem and prove it is NP-complete both in the case of a single signal and that of multiple ones. Next we develop a simple approximation method to the problem and we show experimental results using artificially generated signals. Furthermore,we use our approximation method to find sparse representations of classes of real signals, specifically of images of pedestrians. We discuss the relation between our formulation of the sparsity problem and the problem of finding representations of objects that are compact and appropriate for detection and classification.


Author[s]: Dan Halperin and Christian R. Shelton

A Perturbation Scheme for Spherical Arrangements with Application to Molecular Modeling

December 1997

We describe a software package for computing and manipulating the subdivision of a sphere by a collection of (not necessarily great) circles and for computing the boundary surface of the union of spheres. We present problems that arise in the implementation of the software and the solutions that we have found for them. At the core of the paper is a novel perturbation scheme to overcome degeneracies and precision problems in computing spherical arrangements while using floating point arithmetic. The scheme is relatively simple, it balances between the efficiency of computation and the magnitude of the perturbation, and it performs well in practice. In one O(n) time pass through the data, it perturbs the inputs necessary to insure no potential degeneracies and then passes the perturbed inputs on to the geometric algorithm. We report and discuss experimental results. Our package is a major component in a larger package aimed to support geometric queries on molecular models; it is currently employed by chemists working in "rational drug design." The spherical subdivisions are used to construct a geometric model of a molecule where each sphere represents an atom. We also give an overview of the molecular modeling package and detail additional features and implementation issues.


Author[s]: J. Kenneth Salisbury and Mandayam A. Srinivasan

Proceedings of the Second PHANToM User's Group Workshop

December 1997

On October 19-22, 1997 the Second PHANToM Users Group Workshop was held at the MIT Endicott House in Dedham, Massachusetts. Designed as a forum for sharing results and insights, the workshop was attended by more than 60 participants from 7 countries. These proceedings report on workshop presentations in diverse areas including rigid and compliant rendering, tool kits, development environments, techniques for scientific data visualization, multi-modal issues and a programming tutorial.



Author[s]: Yair Weiss

Belief Propagation and Revision in Networks with Loops

November 1997

Local belief propagation rules of the sort proposed by Pearl(1988) are guaranteed to converge to the optimal beliefs for singly connected networks. Recently, a number of researchers have empirically demonstrated good performance of these same algorithms on networks with loops, but a theoretical understanding of this performance has yet to be achieved. Here we lay the foundation for an understanding of belief propagation in networks with loops. For networks with a single loop, we derive ananalytical relationship between the steady state beliefs in the loopy network and the true posterior probability. Using this relationship we show a category of networks for which the MAP estimate obtained by belief update and by belief revision can be proven to be optimal (although the beliefs will be incorrect). We show how nodes can use local information in the messages they receive in order to correct the steady state beliefs. Furthermore we prove that for all networks with a single loop, the MAP estimate obtained by belief revisionat convergence is guaranteed to give the globally optimal sequence of states. The result is independent of the length of the cycle and the size of the statespace. For networks with multiple loops, we introduce the concept of a "balanced network" and show simulati.



Author[s]: Shimon Edelman and Sharon Duvdevani-Bar

Visual Recognition and Categorization on the Basis of Similarities to Multiple Class Prototypes

September 1997

To recognize a previously seen object, the visual system must overcome the variability in the object's appearance caused by factors such as illumination and pose. Developments in computer vision suggest that it may be possible to counter the influence of these factors, by learning to interpolate between stored views of the target object, taken under representative combinations of viewing conditions. Daily life situations, however, typically require categorization, rather than recognition, of objects. Due to the open-ended character both of natural kinds and of artificial categories, categorization cannot rely on interpolation between stored examples. Nonetheless, knowledge of several representative members, or prototypes, of each of the categories of interest can still provide the necessary computational substrate for the categorization of new instances. The resulting representational scheme based on similarities to prototypes appears to be computationally viable, and is readily mapped onto the mechanisms of biological vision revealed by recent psychophysical and physiological studies.


Author[s]: Daniel Coore, Radhika Nagpal and Ron Weiss

Paradigms for Structure in an Amorphous Computer

October 1997

Recent developments in microfabrication and nanotechnology will enable the inexpensive manufacturing of massive numbers of tiny computing elements with sensors and actuators. New programming paradigms are required for obtaining organized and coherent behavior from the cooperation of large numbers of unreliable processing elements that are interconnected in unknown, irregular, and possibly time-varying ways. Amorphous computing is the study of developing and programming such ultrascale computing environments. This paper presents an approach to programming an amorphous computer by spontaneously organizing an unstructured collection of processing elements into cooperative groups and hierarchies. This paper introduces a structure called an AC Hierarchy, which logically organizes processors into groups at different levels of granularity. The AC hierarchy simplifies programming of an amorphous computer through new language abstractions, facilitates the design of efficient and robust algorithms, and simplifies the analysis of their performance. Several example applications are presented that greatly benefit from the AC hierarchy. This paper introduces three algorithms for constructing multiple levels of the hierarchy from an unstructured collection of processors.



Author[s]: Zhaoping Li

Visual Segmentation without Classification in a Model of the Primary Visual Cortex

August 1997

Stimuli outside classical receptive fields significantly influence the neurons' activities in primary visual cortex. We propose that such contextual influences are used to segment regions by detecting the breakdown of homogeneity or translation invariance in the input, thus computing global region boundaries using local interactions. This is implemented in a biologically based model of V1, and demonstrated in examples of texture segmentation and figure-ground segregation. By contrast with traditional approaches, segmentation occurs without classification or comparison of features within or between regions and is performed by exactly the same neural circuit responsible for the dual problem of the grouping and enhancement of contours.



Author[s]: Massimiliano Pontil and Alessandro Verri

Properties of Support Vector Machines

August 1997

Support Vector Machines (SVMs) perform pattern recognition between two point classes by finding a decision surface determined by certain points of the training set, termed Support Vectors (SV). This surface, which in some feature space of possibly infinite dimension can be regarded as a hyperplane, is obtained from the solution of a problem of quadratic programming that depends on a regularization parameter. In this paper we study some mathematical properties of support vectors and show that the decision surface can be written as the sum of two orthogonal terms, the first depending only on the margin vectors (which are SVs lying on the margin), the second proportional to the regularization parameter. For almost all values of the parameter, this enables us to predict how the decision surface varies for small parameter changes. In the special but important case of feature space of finite dimension m, we also show that there are at most m+1 margin vectors and observe that m+1 SVs are usually sufficient to fully determine the decision surface. For relatively small m this latter result leads to a consistent reduction of the SV number.



Author[s]: Marina Meila, Michael I. Jordan and Quaid Morris

Estimating Dependency Structure as a Hidden Variable

June 1997

This paper introduces a probability model, the mixture of trees that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms that use EMand the Minimum Spanning Tree algorithm to find the ML and MAP mixtureof trees for a variety of priors, including the Dirichlet and the MDL priors.



Author[s]: Marcus Dill and Shimon Edelman

Translation Invariance in Object Recognition, and Its Relation to Other Visual Transformations

June 1997

Human object recognition is generally considered to tolerate changes of the stimulus position in the visual field. A number of recent studies, however, have cast doubt on the completeness of translation invariance. In a new series of experiments we tried to investigate whether positional specificity of short-term memory is a general property of visual perception. We tested same/different discrimination of computer graphics models that were displayed at the same or at different locations of the visual field, and found complete translation invariance, regardless of the similarity of the animals and irrespective of direction and size of the displacement (Exp. 1 and 2). Decisions were strongly biased towards same decisions if stimuli appeared at a constant location, while after translation subjects displayed a tendency towards different decisions. Even if the spatial order of animal limbs was randomized ("scrambled animals"), no deteriorating effect of shifts in the field of view could be detected (Exp. 3). However, if the influence of single features was reduced (Exp. 4 and 5) small but significant effects of translation could be obtained. Under conditions that do not reveal an influence of translation, rotation in depth strongly interferes with recognition (Exp. 6). Changes of stimulus size did not reduce performance (Exp. 7). Tolerance to these object transformations seems to rely on different brain mechanisms, with translation and scale invariance being achieved in principle, while rotation invariance is not.



Author[s]: Gad Geiger and Jerome Y. Lettvin

A View on Dyslexia

June 1997

We describe here, briefly, a perceptual non- reading measure which reliably distinguishes between dyslexic persons and ordinary readers. More importantly, we describe a regimen of practice with which dyslexics learn a new perceptual strategy for reading. Two controlled experiment on dyslexics children demonstrate the regimen's efficiency.



Author[s]: Federico Girosi

An Equivalence Between Sparse Approximation and Support Vector Machines

May 1997

In the first part of this paper we show a similarity between the principle of Structural Risk Minimization Principle (SRM) (Vapnik, 1982) and the idea of Sparse Approximation, as defined in (Chen, Donoho and Saunders, 1995) and Olshausen and Field (1996). Then we focus on two specific (approximate) implementations of SRM and Sparse Approximation, which have been used to solve the problem of function approximation. For SRM we consider the Support Vector Machine technique proposed by V. Vapnik and his team at AT&T Bell Labs, and for Sparse Approximation we consider a modification of the Basis Pursuit De-Noising algorithm proposed by Chen, Donoho and Saunders (1995). We show that, under certain conditions, these two techniques are equivalent: they give the same solution and they require the solution of the same quadratic programming problem.



Author[s]: Marina Meila and Michael I. Jordan

Triangulation by Continuous Embedding

March 1997

When triangulating a belief network we aim to obtain a junction tree of minimum state space. Searching for the optimal triangulation can be cast as a search over all the permutations of the network's vaeriables. Our approach is to embed the discrete set of permutations in a convex continuous domain D. By suitably extending the cost function over D and solving the continous nonlinear optimization task we hope to obtain a good triangulation with respect to the aformentioned cost. In this paper we introduce an upper bound to the total junction tree weight as the cost function. The appropriatedness of this choice is discussed and explored by simulations. Then we present two ways of embedding the new objective function into continuous domains and show that they perform well compared to the best known heuristic.


Author[s]: William J. Dally, Leonard McMillan, Gary Bishop and Henry Fuchs

The Delta Tree: An Object-Centered Approach to Image-Based Rendering

May 2, 1997

This paper introduces the delta tree, a data structure that represents an object using a set of reference images. It also describes an algorithm for generating arbitrary re- projections of an object by traversing its delta tree. Delta trees are an efficient representation in terms of both storage and rendering performance. Each node of a delta tree stores an image taken from a point on a sampling sphere that encloses the object. Each image is compressed by discarding pixels that can be reconstructed by warping its ancestor's images to the node's viewpoint. The partial image stored at each node is divided into blocks and represented in the frequency domain. The rendering process generates an image at an arbitrary viewpoint by traversing the delta tree from a root node to one or more of its leaves. A subdivision algorithm selects only the required blocks from the nodes along the path. For each block, only the frequency components necessary to reconstruct the final image at an appropriate sampling density are used. This frequency selection mechanism handles both antialiasing and level-of-detail within a single framework. A complex scene is initially rendered by compositing images generated by traversing the delta trees of its components. Once the reference views of a scene are rendered once in this manner, the entire scene can be reprojected to an arbitrary viewpoint by traversing its own delta tree. Our approach is limited to generating views of an object from outside the object's convex hull. In practice we work around this problem by subdividing objects to render views from within the convex hull.



Author[s]: Shai Avidan, Theodoros Evgeniou, Amnon Shashua and Tomaso Poggio

Image-Based View Synthesis

January 1997

We present a new method for rendering novel images of flexible 3D objects from a small number of example images in correspondence. The strength of the method is the ability to synthesize images whose viewing position is significantly far away from the viewing cone of the example images ("view extrapolation"), yet without ever modeling the 3D structure of the scene. The method relies on synthesizing a chain of "trilinear tensors" that governs the warping function from the example images to the novel image, together with a multi-dimensional interpolation function that synthesizes the non-rigid motions of the viewed object from the virtual camera position. We show that two closely spaced example images alone are sufficient in practice to synthesize a significant viewing cone, thus demonstrating the ability of representing an object by a relatively small number of model images --- for the purpose of cheap and fast viewers that can run on standard hardware.



Author[s]: Edgar Osuna, Robert Freund and Federico Girosi

Support Vector Machines: Training and Applications

March 1997

The Support Vector Machine (SVM) is a new and very promising classification technique developed by Vapnik and his group at AT&T Bell Labs. This new learning algorithm can be seen as an alternative training technique for Polynomial, Radial Basis Function and Multi- Layer Perceptron classifiers. An interesting property of this approach is that it is an approximate implementation of the Structural Risk Minimization (SRM) induction principle. The derivation of Support Vector Machines, its relationship with SRM, and its geometrical insight, are discussed in this paper. Training a SVM is equivalent to solve a quadratic programming problem with linear and box constraints in a number of variables equal to the number of data points. When the number of data points exceeds few thousands the problem is very challenging, because the quadratic form is completely dense, so the memory needed to store the problem grows with the square of the number of data points. Therefore, training problems arising in some real applications with large data sets are impossible to load into memory, and cannot be solved using standard non-linear constrained optimization algorithms. We present a decomposition algorithm that can be used to train SVM's over large data sets. The main idea behind the decomposition is the iterative solution of sub-problems and the evaluation of, and also establish the stopping criteria for the algorithm. We present previous approaches, as well as results and important details of our implementation of the algorithm using a second-order variant of the Reduced Gradient Method as the solver of the sub- problems. As an application of SVM's, we present preliminary results we obtained applying SVM to the problem of detecting frontal human faces in real images.



Author[s]: Thomas Vetter, Michael J. Jones and Tomaso Poggio

A Bootstrapping Algorithm for Learning Linear Models of Object Classes

Flexible models of object classes, based on linear combinations of prototypical images, are capable of matching novel images of the same class and have been shown to be a powerful tool to solve several fundamental vision tasks such as recognition, synthesis and correspondence. The key problem in creating a specific flexible model is the computation of pixelwise correspondence between the prototypes, a task done until now in a semiautomatic way. In this paper we describe an algorithm that automatically bootstraps the correspondence between the prototypes. The algorithm - which can be used for 2D images as well as for 3D models - is shown to synthesize successfully a flexible model of frontal face images and a flexible model of handwritten digits.



Author[s]: B. Schoelkopf, K. Sung, C. Burges, F. Girosi, P. Niyogi, T. Poggio and V. Vapnik

Comparing Support Vector Machines with Gaussian Kernels to Radial Basis Function Classifiers

December 1996

The Support Vector (SV) machine is a novel type of learning machine, based on statistical learning theory, which contains polynomial classifiers, neural networks, and radial basis function (RBF) networks as special cases. In the RBF case, the SV algorithm automatically determines centers, weights and threshold such as to minimize an upper bound on the expected test error. The present study is devoted to an experimental comparison of these machines with a classical approach, where the centers are determined by $k$-- means clustering and the weights are found using error backpropagation. We consider three machines, namely a classical RBF machine, an SV machine with Gaussian kernel, and a hybrid system with the centers determined by the SV method and the weights trained by error backpropagation. Our results show that on the US postal service database of handwritten digits, the SV machine achieves the highest test accuracy, followed by the hybrid approach. The SV approach is thus not only theoretically well--founded, but also superior in a practical application.



Author[s]: Joerg C. Lemm

Prior Information and Generalized Questions

In learning problems available information is usually divided into two categories: examples of function values (or training data) and prior information (e.g. a smoothness constraint). This paper 1.) studies aspects on which these two categories usually differ, like their relevance for generalization and their role in the loss function, 2.) presents a unifying formalism, where both types of information are identified with answers to generalized questions, 3.) shows what kind of generalized information is necessary to enable learning, 4.) aims to put usual training data and prior information on a more equal footing by discussing possibilities and variants of measurement and control for generalized questions, including the examples of smoothness and symmetries, 5.) reviews shortly the measurement of linguistic concepts based on fuzzy priors, and principles to combine preprocessors, 6.) uses a Bayesian decision theoretic framework, contrasting parallel and inverse decision problems, 7.) proposes, for problems with non--approximation aspects, a Bayesian two step approximation consisting of posterior maximization and a subsequent risk minimization, 8.) analyses empirical risk minimization under the aspect of nonlocal information 9.) compares the Bayesian two step approximation with empirical risk minimization, including their interpretations of Occam's razor, 10.) formulates examples of stationarity conditions for the maximum posterior approximation with nonlocal and nonconvex priors, leading to inhomogeneous nonlinear equations, similar for example to equations in scattering theory in physics. In summary, this paper focuses on the dependencies between answers to different questions. Because not training examples alone but such dependencies enable generalization, it emphasizes the need of their empirical measurement and control and of a more explicit treatment in theory.


Author[s]: J. Kenneth Salisbury and Mandayam A. Srinivasan (editors)

The Proceedings of the First PHANToM User's Group Workshop

December 1996

These proceedings summarize the results of the First PHANToM User's Group Workshop held September 27-30, 1996 MIT. The goal of the workshop was to bring together a group of active users of the PHANToM Haptic Interface to discuss the scientific and engineering challenges involved in bringing haptics into widespread use, and to explore the future possibilities of this exciting technology. With over 50 attendees and 25 presentations the workshop provided the first large forum for users of a common haptic interface to share results and engage in collaborative discussions. Short papers from the presenters are contained herein and address the following topics: Research Effort Overviews, Displays and Effects, Applications in Teleoperation and Training, Tools for Simulated Worlds and, Data Visualization.


Author[s]: Gideon P. Stein

Lens Distortion Calibration Using Point Correspondences

December 1996

This paper describes a new method for lens distortion calibration using only point correspondences in multiple views, without the need to know either the 3D location of the points or the camera locations. The standard lens distortion model is a model of the deviations of a real camera from the ideal pinhole or projective camera model.Given multiple views of a set of corresponding points taken by ideal pinhole cameras there exist epipolar and trilinear constraints among pairs and triplets of these views. In practice, due to noise in the feature detection and due to lens distortion these constraints do not hold exactly and we get some error. The calibration is a search for the lens distortion parameters that minimize this error. Using simulation and experimental results with real images we explore the properties of this method. We describe the use of this method with the standard lens distortion model, radial and decentering, but it could also be used with any other parametric distortion models. Finally we demonstrate that lens distortion calibration improves the accuracy of 3D reconstruction.


Author[s]: Gideon P. Stein and Amnon Shashua

Direct Methods for Estimation of Structure and Motion from Three Views

December 1996

We describe a new direct method for estimating structure and motion from image intensities of multiple views. We extend the direct methods of Horn- and-Weldon to three views. Adding the third view enables us to solve for motion, and compute a dense depth map of the scene, directly from image spatio -temporal derivatives in a linear manner without first having to find point correspondences or compute optical flow. We describe the advantages and limitations of this method which are then verified through simulation and experiments with real images.


Author[s]: J.P. Mellor, Seth Teller and Tomas Lozano-Perez

Dense Depth Maps from Epipolar Images

November 1996

Recovering three-dimensional information from two-dimensional images is the fundamental goal of stereo techniques. The problem of recovering depth (three- dimensional information) from a set of images is essentially the correspondence problem: Given a point in one image, find the corresponding point in each of the other images. Finding potential correspondences usually involves matching some image property. If the images are from nearby positions, they will vary only slightly, simplifying the matching process. Once a correspondence is known, solving for the depth is simply a matter of geometry. Real images are composed of noisy, discrete samples, therefore the calculated depth will contain error. This error is a function of the baseline or distance between the images. Longer baselines result in more precise depths. This leads to a conflict: short baselines simplify the matching process, but produce imprecise results; long baselines produce precise results, but complicate the matching process. In this paper, we present a method for generating dense depth maps from large sets (1000's) of images taken from arbitrary positions. Long baseline images improve the accuracy. Short baseline images and the large number of images greatly simplifies the correspondence problem, removing nearly all ambiguity. The algorithm presented is completely local and for each pixel generates an evidence versus depth and surface normal distribution. In many cases, the distribution contains a clear and distinct global maximum. The location of this peak determines the depth and its shape can be used to estimate the error. The distribution can also be used to perform a maximum likelihood fit of models directly to the images. We anticipate that the ability to perform maximum likelihood estimation from purely local calculations will prove extremely useful in constructing three dimensional models from large sets of images.



Author[s]: Theodoros Evgeniou

Image Based Rendering Using Algebraic Techniques

November 1996

This paper presents an image-based rendering system using algebraic relations between different views of an object. The system uses pictures of an object taken from known positions. Given three such images it can generate "virtual'' ones as the object would look from any position near the ones that the two input images were taken from. The extrapolation from the example images can be up to about 60 degrees of rotation. The system is based on the trilinear constraints that bind any three view so fan object. As a side result, we propose two new methods for camera calibration. We developed and used one of them. We implemented the system and tested it on real images of objects and faces. We also show experimentally that even when only two images taken from unknown positions are given, the system can be used to render the object from other view points as long as we have a good estimate of the internal parameters of the camera used and we are able to find good correspondence between the example images. In addition, we present the relation between these algebraic constraints and a factorization method for shape and motion estimation. As a result we propose a method for motion estimation in the special case of orthographic projection.


Author[s]: Paul Viola

Complex Feature Recognition: A Bayesian Approach for Learning to Recognize Objects

November 1996

We have developed a new Bayesian framework for visual object recognition which is based on the insight that images of objects can be modeled as a conjunction of local features. This framework can be used to both derive an object recognition algorithm and an algorithm for learning the features themselves. The overall approach, called complex feature recognition or CFR, is unique for several reasons: it is broadly applicable to a wide range of object types, it makes constructing object models easy, it is capable of identifying either the class or the identity of an object, and it is computationally efficient-- requiring time proportional to the size of the image. Instead of a single simple feature such as an edge, CFR uses a large set of complex features that are learned from experience with model objects. The response of a single complex feature contains much more class information than does a single edge. This significantly reduces the number of possible correspondences between the model and the image. In addition, CFR takes advantage of a type of image processing called 'oriented energy'. Oriented energy is used to efficiently pre-process the image to eliminate some of the difficulties associated with changes in lighting and pose.


Author[s]: Andrew A. Berlin

Towards Intelligent Structures: Active Control of Buckling

May 1994

The buckling of compressively-loaded members is one of the most important factors limiting the overall strength and stability of a structure. I have developed novel techniques for using active control to wiggle a structural element in such a way that buckling is prevented. I present the results of analysis, simulation, and experimentation to show that buckling can be prevented through computer- controlled adjustment of dynamical behavior.sI have constructed a small-scale railroad-style truss bridge that contains compressive members that actively resist buckling through the use of piezo-electric actuators. I have also constructed a prototype actively controlled column in which the control forces are applied by tendons, as well as a composite steel column that incorporates piezo-ceramic actuators that are used to counteract buckling. Active control of buckling allows this composite column to support 5.6 times more load than would otherwise be possible.sThese techniques promise to lead to intelligent physical structures that are both stronger and lighter than would otherwise be possible.


Author[s]: Andrew Justin Blumberg

General Purpose Parallel Computation on a DNA Substrate

December 1996

In this paper I describe and extend a new DNA computing paradigm introduced in Blumberg for building massively parallel machines in the DNA-computing models described by Adelman, Cai et. al., and Liu et. al. Employing only DNA operations which have been reported as successfully performed, I present an implementation of a Connection Machine, a SIMD (single-instruction multiple-data) parallel computer as an illustration of how to apply this approach to building computers in this domain (and as an implicit demonstration of PRAM equivalence). This is followed with a description of how to implement a MIMD (multiple-instruction multiple-data) parallel machine. The implementations described herein differ most from existing models in that they employ explicit communication between processing elements (and hence strands of DNA).


Author[s]: Andrew Justin Blumberg

Parallel Function Application on a DNA Substrate

December 1996

In this paper I present a new model that employs a biological (specifically DNA - based) substrate for performing computation. Specifically, I describe strategies for performing parallel function application in the DNA-computing models described by Adelman, Cai et. al., and Liu et. al. Employing only DNA operations which can presently be performed, I discuss some direct algorithms for computing a variety of useful mathematical functions on DNA, culminating in an algorithm for minimizing an arbitrary continuous function. In addition, computing genetic algorithms on a DNA substrate is briefly discussed.


Author[s]: Partha Niyogi

The Informational Complexity of Learning from Examples

September 1996

This thesis attempts to quantify the amount of information needed to learn certain tasks. The tasks chosen vary from learning functions in a Sobolev space using radial basis function networks to learning grammars in the principles and parameters framework of modern linguistic theory. These problems are analyzed from the perspective of computational learning theory and certain unifying perspectives emerge.


Author[s]: Andre DeHon

Reconfigurable Architectures for General-Purpose Computing

September 1996

General-purpose computing devices allow us to (1) customize computation after fabrication and (2) conserve area by reusing expensive active circuitry for different functions in time. We define RP-space, a restricted domain of the general-purpose architectural space focussed on reconfigurable computing architectures. Two dominant features differentiate reconfigurable from special- purpose architectures and account for most of the area overhead associated with RP devices: (1) instructions which tell the device how to behave, and (2) flexible interconnect which supports task dependent dataflow between operations. We can characterize RP- space by the allocation and structure of these resources and compare the efficiencies of architectural points across broad application characteristics. Conventional FPGAs fall at one extreme end of this space and their efficiency ranges over two orders of magnitude across the space of application characteristics. Understanding RP-space and its consequences allows us to pick the best architecture for a task and to search for more robust design points in the space. Our DPGA, a fine- grained computing device which adds small, on-chip instruction memories to FPGAs is one such design point. For typical logic applications and finite- state machines, a DPGA can implement tasks in one-third the area of a traditional FPGA. TSFPGA, a variant of the DPGA which focuses on heavily time- switched interconnect, achieves circuit densities close to the DPGA, while reducing typical physical mapping times from hours to seconds. Rigid, fabrication-time organization of instruction resources significantly narrows the range of efficiency for conventional architectures. To avoid this performance brittleness, we developed MATRIX, the first architecture to defer the binding of instruction resources until run-time, allowing the application to organize resources according to its needs. Our focus MATRIX design point is based on an array of 8-bit ALU and register- file building blocks interconnected via a byte- wide network. With today's silicon, a single chip MATRIX array can deliver over 10 Gop/s (8-bit ops). On sample image processing tasks, we show that MATRIX yields 10-20x the computational density of conventional processors. Understanding the cost structure of RP-space helps us identify these intermediate architectural points and may provide useful insight more broadly in guiding our continual search for robust and efficient general-purpose computing structures.


Author[s]: Miguel Hall

Prototype of a Configurable Web-Based Assessment System

June 1996

The MIT Prototype Educational Assessment System provides subjects and courses at MIT with the ability to perform online assessment. The system includes polices to handle harassment and electronic "flaming" while protecting privacy. Within these frameworks, individual courses and subjects can make their own policy decisions about such matters as to when assessments can occur, who can submit assessments, and how anonymous assessments are. By allowing assessment to take place continually and allowing both students and staff to participate, the system can provide a forum for the online discussion of subjects. Even in the case of scheduled assessments, the system can provide advantages over end-of-term assessment, since the scheduled assessments can occur several times during the semester, allowing subjects to identify and adjust those areas that could use improvement. Subjects can also develop customized questionnaires, perhaps in response to previous assessments, to suit their needs.


Author[s]: Ujjaval Y. Desai, Marcelo M. Mizuki, Ichiro Masaki and Berthold K.P. Horn

Edge and Mean Based Image Compression

November 1996

In this paper, we present a static image compression algorithm for very low bit rate applications. The algorithm reduces spatial redundancy present in images by extracting and encoding edge and mean information. Since the human visual system is highly sensitive to edges, an edge-based compression scheme can produce intelligible images at high compression ratios. We present good quality results for facial as well as textured, 256~x~256 color images at 0.1 to 0.3 bpp. The algorithm described in this paper was designed for high performance, keeping hardware implementation issues in mind. In the next phase of the project, which is currently underway, this algorithm will be implemented in hardware, and new edge- based color image sequence compression algorithms will be developed to achieve compression ratios of over 100, i.e., less than 0.12 bpp from 12 bpp. Potential applications include low power, portable video telephones.



Author[s]: Michael J. Jones and Tomaso Poggio

Model-Based Matching by Linear Combinations of Prototypes

December 1996

We describe a method for modeling object classes (such as faces) using 2D example images and an algorithm for matching a model to a novel image. The object class models are "learned'' from example images that we call prototypes. In addition to the images, the pixelwise correspondences between a reference prototype and each of the other prototypes must also be provided. Thus a model consists of a linear combination of prototypical shapes and textures. A stochastic gradient descent algorithm is used to match a model to a novel image by minimizing the error between the model and the novel image. Example models are shown as well as example matches to novel images. The robustness of the matching algorithm is also evaluated. The technique can be used for a number of applications including the computation of correspondence between novel images of a certain known class, object recognition, image synthesis and image compression.


Author[s]: Ann L. Torres

Virtual Model Control of a Hexapod Walking Robot

December 1996

Since robots are typically designed with an individual actuator at each joint, the control of these systems is often difficult and non- intuitive. This thesis explains a more intuitive control scheme called Virtual Model Control. This thesis also demonstrates the simplicity and ease of this control method by using it to control a simulated walking hexapod. Virtual Model Control uses imagined mechanical components to create virtual forces, which are applied through the joint torques of real actuators. This method produces a straightforward means of controlling joint torques to produce a desired robot behavior. Due to the intuitive nature of this control scheme, the design of a virtual model controller is similar to the design of a controller with basic mechanical components. The ease of this control scheme facilitates the use of a high level control system which can be used above the low level virtual model controllers to modulate the parameters of the imaginary mechanical components. In order to apply Virtual Model Control to parallel mechanisms, a solution to the force distribution problem is required. This thesis uses an extension of Gardner`s Partitioned Force Control method which allows for the specification of constrained degrees of freedom. This virtual model control technique was applied to a simulated hexapod robot. Although the hexapod is a highly non-linear, parallel mechanism, the virtual models allowed text-book control solutions to be used while the robot was walking. Using a simple linear control law, the robot walked while simultaneously balancing a pendulum and tracking an object.


Author[s]: Jerry E. Pratt

Virtual Model Control of a Biped Walking Robot

December 1995

The transformation from high level task specification to low level motion control is a fundamental issue in sensorimotor control in animals and robots. This thesis develops a control scheme called virtual model control which addresses this issue. Virtual model control is a motion control language which uses simulations of imagined mechanical components to create forces, which are applied through joint torques, thereby creating the illusion that the components are connected to the robot. Due to the intuitive nature of this technique, designing a virtual model controller requires the same skills as designing the mechanism itself. A high level control system can be cascaded with the low level virtual model controller to modulate the parameters of the virtual mechanisms. Discrete commands from the high level controller would then result in fluid motion. An extension of Gardner's Partitioned Actuator Set Control method is developed. This method allows for the specification of constraints on the generalized forces which each serial path of a parallel mechanism can apply. Virtual model control has been applied to a bipedal walking robot. A simple algorithm utilizing a simple set of virtual components has successfully compelled the robot to walk eight consecutive steps.



Author[s]: Bruno A. Olshausen

Learning Linear, Sparse, Factorial Codes

December 1996

In previous work (Olshausen & Field 1996), an algorithm was described for learning linear sparse codes which, when trained on natural images, produces a set of basis functions that are spatially localized, oriented, and bandpass (i.e., wavelet-like). This note shows how the algorithm may be interpreted within a maximum-likelihood framework. Several useful insights emerge from this connection: it makes explicit the relation to statistical independence (i.e., factorial coding), it shows a formal relationship to the algorithm of Bell and Sejnowski (1995), and it suggests how to adapt parameters that were previously fixed.


Author[s]: Brian A. LaMacchia

Internet Fish

August 1, 1996

I have invented "Internet Fish," a novel class of resource-discovery tools designed to help users extract useful information from the Internet. Internet Fish (IFish) are semi- autonomous, persistent information brokers; users deploy individual IFish to gather and refine information related to a particular topic. An IFish will initiate research, continue to discover new sources of information, and keep tabs on new developments in that topic. As part of the information-gathering process the user interacts with his IFish to find out what it has learned, answer questions it has posed, and make suggestions for guidance. Internet Fish differ from other Internet resource discovery systems in that they are persistent, personal and dynamic. As part of the information-gathering process IFish conduct extended, long-term conversations with users as they explore. They incorporate deep structural knowledge of the organization and services of the net, and are also capable of on-the-fly reconfiguration, modification and expansion. Human users may dynamically change the IFish in response to changes in the environment, or IFish may initiate such changes itself. IFish maintain internal state, including models of its own structure, behavior, information environment and its user; these models permit an IFish to perform meta-level reasoning about its own structure. To facilitate rapid assembly of particular IFish I have created the Internet Fish Construction Kit. This system provides enabling technology for the entire class of Internet Fish tools; it facilitates both creation of new IFish as well as additions of new capabilities to existing ones. The Construction Kit includes a collection of encapsulated heuristic knowledge modules that may be combined in mix-and-match fashion to create a particular IFish; interfaces to new services written with the Construction Kit may be immediately added to "live" IFish. Using the Construction Kit I have created a demonstration IFish specialized for finding World-Wide Web documents related to a given group of documents. This "Finder" IFish includes heuristics that describe how to interact with the Web in general, explain how to take advantage of various public indexes and classification schemes, and provide a method for discovering similarity relationships among documents.


Author[s]: Ignacio Sean McQuirk

An Analog VLSI Chip for Estimating the Focus of Expansion

August 21, 1996

For applications involving the control of moving vehicles, the recovery of relative motion between a camera and its environment is of high utility. This thesis describes the design and testing of a real- time analog VLSI chip which estimates the focus of expansion (FOE) from measured time-varying images. Our approach assumes a camera moving through a fixed world with translational velocity; the FOE is the projection of the translation vector onto the image plane. This location is the point towards which the camera is moving, and other points appear to be expanding outward from. By way of the camera imaging parameters, the location of the FOE gives the direction of 3-D translation. The algorithm we use for estimating the FOE minimizes the sum of squares of the differences at every pixel between the observed time variation of brightness and the predicted variation given the assumed position of the FOE. This minimization is not straightforward, because the relationship between the brightness derivatives depends on the unknown distance to the surface being imaged. However, image points where brightness is instantaneously constant play a critical role. Ideally, the FOE would be at the intersection of the tangents to the iso- brightness contours at these "stationary" points. In practice, brightness derivatives are hard to estimate accurately given that the image is quite noisy. Reliable results can nevertheless be obtained if the image contains many stationary points and the point is found that minimizes the sum of squares of the perpendicular distances from the tangents at the stationary points. The FOE chip calculates the gradient of this least-squares minimization sum, and the estimation is performed by closing a feedback loop around it. The chip has been implemented using an embedded CCD imager for image acquisition and a row-parallel processing scheme. A 64 x 64 version was fabricated in a 2um CCD/ BiCMOS process through MOSIS with a design goal of 200 mW of on-chip power, a top frame rate of 1000 frames/second, and a basic accuracy of 5%. A complete experimental system which estimates the FOE in real time using real motion and image scenes is demonstrated.


Author[s]: Olin Shivers

Supporting Dynamic Languages on the Java Virtual Machine

April 1996

In this note, I propose two extensions to the Java virtual machine (or VM) to allow dynamic languages such as Dylan, Scheme and Smalltalk to be efficiently implemented on the VM. These extensions do not affect the performance of pure Java programs on the machine. The first extension allows for efficient encoding of dynamic data; the second allows for efficient encoding of language-specific computational elements.


Author[s]: Kenneth Yip and Gerald Jay Sussman

A Computational Model for the Acquisition and Use of Phonological Knowledge

March 1996

Does knowledge of language consist of symbolic rules? How do children learn and use their linguistic knowledge? To elucidate these questions, we present a computational model that acquires phonological knowledge from a corpus of common English nouns and verbs. In our model the phonological knowledge is encapsulated as boolean constraints operating on classical linguistic representations of speech sounds in term of distinctive features. The learning algorithm compiles a corpus of words into increasingly sophisticated constraints. The algorithm is incremental, greedy, and fast. It yields one-shot learning of phonological constraints from a few examples. Our system exhibits behavior similar to that of young children learning phonological knowledge. As a bonus the constraints can be interpreted as classical linguistic rules. The computational model can be implemented by a surprisingly simple hardware mechanism. Our mechanism also sheds light on a fundamental AI question: How are signals related to symbols?


Author[s]: David Beymer

Pose-Invariant Face Recognition Using Real and Virtual Views

March 28, 1996

The problem of automatic face recognition is to visually identify a person in an input image. This task is performed by matching the input face against the faces of known people in a database of faces. Most existing work in face recognition has limited the scope of the problem, however, by dealing primarily with frontal views, neutral expressions, and fixed lighting conditions. To help generalize existing face recognition systems, we look at the problem of recognizing faces under a range of viewpoints. In particular, we consider two cases of this problem: (i) many example views are available of each person, and (ii) only one view is available per person, perhaps a driver's license or passport photograph. Ideally, we would like to address these two cases using a simple view-based approach, where a person is represented in the database by using a number of views on the viewing sphere. While the view-based approach is consistent with case (i), for case (ii) we need to augment the single real view of each person with synthetic views from other viewpoints, views we call 'virtual views'. Virtual views are generated using prior knowledge of face rotation, knowledge that is 'learned' from images of prototype faces. This prior knowledge is used to effectively rotate in depth the single real view available of each person. In this thesis, I present the view- based face recognizer, techniques for synthesizing virtual views, and experimental results using real and virtual views in the recognizer.


Author[s]: Thomas F. Stahovich

SketchIT: A Sketch Interpretation Tool for Conceptual Mechanical Design

March 13, 1996

We describe a program called SketchIT capable of producing multiple families of designs from a single sketch. The program is given a rough sketch (drawn using line segments for part faces and icons for springs and kinematic joints) and a description of the desired behavior. The sketch is "rough" in the sense that taken literally, it may not work. From this single, perhaps flawed sketch and the behavior description, the program produces an entire family of working designs. The program also produces design variants, each of which is itself a family of designs. SketchIT represents each family of designs with a "behavior ensuring parametric model" (BEP-Model), a parametric model augmented with a set of constraints that ensure the geometry provides the desired behavior. The construction of the BEP-Model from the sketch and behavior description is the primary task and source of difficulty in this undertaking. SketchIT begins by abstracting the sketch to produce a qualitative configuration space (qc- space) which it then uses as its primary representation of behavior. SketchIT modifies this initial qc-space until qualitative simulation verifies that it produces the desired behavior. SketchIT's task is then to find geometries that implement this qc-space. It does this using a library of qc-space fragments. Each fragment is a piece of parametric geometry with a set of constraints that ensure the geometry implements a specific kind of boundary (qcs- curve) in qc-space. SketchIT assembles the fragments to produce the BEP-Model. SketchIT produces design variants by mapping the qc-space to multiple implementations, and by transforming rotating parts to translating parts and vice versa.


Author[s]: Kah-Kay Sung

Learning and Example Selection for Object and Pattern Detection

March 13, 1996

This thesis presents a learning based approach for detecting classes of objects and patterns with variable image appearance but highly predictable image boundaries. It consists of two parts. In part one, we introduce our object and pattern detection approach using a concrete human face detection example. The approach first builds a distribution-based model of the target pattern class in an appropriate feature space to describe the target's variable image appearance. It then learns from examples a similarity measure for matching new patterns against the distribution-based target model. The approach makes few assumptions about the target pattern class and should therefore be fairly general, as long as the target class has predictable image boundaries. Because our object and pattern detection approach is very much learning-based, how well a system eventually performs depends heavily on the quality of training examples it receives. The second part of this thesis looks at how one can select high quality examples for function approximation learning tasks. We propose an {em active learning} formulation for function approximation, and show for three specific approximation function classes, that the active example selection strategy learns its target with fewer data samples than random sampling. We then simplify the original active learning formulation, and show how it leads to a tractable example selection paradigm, suitable for use in many object and pattern detection problems.


Author[s]: Tommi S. Jaakkola and Michael I. Jordan

Computing Upper and Lower Bounds on Likelihoods in Intractable Networks

March 1996

We present techniques for computing upper and lower bounds on the likelihoods of partial instantiations of variables in sigmoid and noisy-OR networks. The bounds determine confidence intervals for the desired likelihoods and become useful when the size of the network (or clique size) precludes exact computations. We illustrate the tightness of the obtained bounds by numerical experiments.


Author[s]: Lawrence K. Saul, Tommi Jaakkola and Michael I. Jordan

Mean Field Theory for Sigmoid Belief Networks

August 1996

We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. Our mean field theory provides a tractable approximation to the true probability distribution in these networks; it also yields a lower bound on the likelihood of evidence. We demonstrate the utility of this framework on a benchmark problem in statistical pattern recognition -- the classification of handwritten digits.


Author[s]: Deniz Yuret

From Genetic Algorithms to Efficient Organization

May 1994

The work described in this thesis began as an inquiry into the nature and use of optimization programs based on "genetic algorithms." That inquiry led, eventually, to three powerful heuristics that are broadly applicable in gradient-ascent programs: First, remember the locations of local maxima and restart the optimization program at a place distant from previously located local maxima. Second, adjust the size of probing steps to suit the local nature of the terrain, shrinking when probes do poorly and growing when probes do well. And third, keep track of the directions of recent successes, so as to probe preferentially in the direction of most rapid ascent. These algorithms lie at the core of a novel optimization program that illustrates the power to be had from deploying them together. The efficacy of this program is demonstrated on several test problems selected from a variety of fields, including De Jong's famous test-problem suite, the traveling salesman problem, the problem of coordinate registration for image guided surgery, the energy minimization problem for determining the shape of organic molecules, and the problem of assessing the structure of sedimentary deposits using seismic data.


Author[s]: Philip N. Sabes and Michael I. Jordan

Reinforcement Learning by Probability Matching

Jaunary 1996

We present a new algorithm for associative reinforcement learning. The algorithm is based upon the idea of matching a network's output probability with a probability distribution derived from the environment"s reward signal. This Probability Matching algorithm is shown to perform faster and be less susceptible to local minima than previously existing algorithms. We use Probability Matching to train mixture of experts networks, an architecture for which other reinforcement learning rules fail to converge reliably on even simple problems. This architecture is particularly well suited for our algorithm as it can compute arbitrarily complex functions yet calculation of the output probability is simple.


Author[s]: Marina Meila and Michael I. Jordan

Learning Fine Motion by Markov Mixtures of Experts

November 1995

Compliant control is a standard method for performing fine manipulation tasks, like grasping and assembly, but it requires estimation of the state of contact between the robot arm and the objects involved. Here we present a method to learn a model of the movement from measured data. The method requires little or no prior knowledge and the resulting model explicitly estimates the state of contact. The current state of contact is viewed as the hidden state variable of a discrete HMM. The control dependent transition probabilities between states are modeled as parametrized functions of the measurement We show that their parameters can be estimated from measurements concurrently with the estimation of the parameters of the movement in each state of contact. The learning algorithm is a variant of the EM procedure. The E step is computed exactly; solving the M step exactly would require solving a set of coupled nonlinear algebraic equations in the parameters. Instead, gradient ascent is used to produce an increase in likelihood.


Author[s]: Tina Kapur

Segmentation of Brain Tissue from Magnetic Resonance Images

January 1995

Segmentation of medical imagery is a challenging problem due to the complexity of the images, as well as to the absence of models of the anatomy that fully capture the possible deformations in each structure. Brain tissue is a particularly complex structure, and its segmentation is an important step for studies in temporal change detection of morphology, as well as for 3D visualization in surgical planning. In this paper, we present a method for segmentation of brain tissue from magnetic resonance images that is a combination of three existing techniques from the Computer Vision literature: EM segmentation, binary morphology, and active contour models. Each of these techniques has been customized for the problem of brain tissue segmentation in a way that the resultant method is more robust than its components. Finally, we present the results of a parallel implementation of this method on IBM's supercomputer Power Visualization System for a database of 20 brain scans each with 256x256x124 voxels and validate those against segmentations generated by neuroanatomy experts.



Author[s]: Padhraic Smyth, David Heckerman and Michael Jordan

Probabilistic Independence Networks for Hidden Markov Probability Models

March 13, 1996

Graphical techniques for modeling the dependencies of randomvariables have been explored in a variety of different areas includingstatistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics.Formalisms for manipulating these models have been developedrelatively independently in these research communities. In this paper weexplore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independencenetworks (PINs). The paper contains a self-contained review of the basic principles of PINs.It is shown that the well- known forward-backward (F-B) and Viterbialgorithms for HMMs are special cases of more general inference algorithms forarbitrary PINs. Furthermore, the existence of inference and estimationalgorithms for more general graphical models provides a set of analysistools for HMM practitioners who wish to explore a richer class of HMMstructures.Examples of relatively complex models to handle sensorfusion and coarticulationin speech recognitionare introduced and treated within the graphical model framework toillustrate the advantages of the general approach.


Author[s]: Jonathan A. Rees

A Security Kernel Based on the Lambda-Calculus

March 13, 1996

Cooperation between independent agents depends upon establishing adegree of security. Each of the cooperating agents needs assurance that the cooperation will not endanger resources of value to that agent. In a computer system, a computational mechanism can assure safe cooperation among the system's users by mediating resource access according to desired security policy. Such a mechanism, which is called a security kernel, lies at the heart of many operating systems and programming environments.The report describes Scheme 48, a programming environment whose design is guided by established principles of operating system security. Scheme 48's security kernel is small, consisting of the call- by-value $lambda$-calculus with a few simple extensions to support abstract data types, object mutation, and access to hardware resources. Each agent (user or subsystem) has a separate evaluation environment that holds objects representing privileges granted to that agent. Because environments ultimately determine availability of object references, protection and sharing can be controlled largely by the way in which environments are constructed. I will describe experience with Scheme 48 that shows how it serves as a robust and flexible experimental platform. Two successful applications of Scheme 48 are the programming environment for the Cornell mobile robots, where Scheme 48 runs with no (other) operating system support; and a secure multi- user environment that runs on workstations.


Author[s]: John Bryant Morrell

Parallel Coupled Micro-Macro Actuators

January 1996

This thesis presents a new actuator system consisting of a micro-actuator and a macro- actuator coupled in parallel via a compliant transmission. The system is called the Parallel Coupled Micro-Macro Actuator, or PaCMMA. In this system, the micro-actuator is capable of high bandwidth force control due to its low mass and direct-drive connection to the output shaft. The compliant transmission of the macro-actuator reduces the impedance (stiffness) at the output shaft and increases the dynamic range of force. Performance improvement over single actuator systems was expected in force control, impedance control, force distortion and reduction of transient impact forces. A set of quantitative measures is proposed and the actuator system is evaluated against them: Force Control Bandwidth, Position Bandwidth, Dynamic Range, Impact Force, Impedance ("Backdriveability'"), Force Distortion and Force Performance Space. Several theoretical performance limits are derived from the saturation limits of the system. A control law is proposed and control system performance is compared to the theoretical limits. A prototype testbed was built using permanenent magnet motors and an experimental comparison was performed between this actuator concept and two single actuator systems. The following performance was observed: Force bandwidth of 56Hz, Torque Dynamic Range of 800:1, Peak Torque of 1040mNm, Minimum Torque of 1.3mNm. Peak Impact Force was reduced by an order of magnitude. Distortion at small amplitudes was reduced substantially. Backdriven impedance was reduced by 2-3 orders of magnitude. This actuator system shows promise for manipulator design as well as psychophysical tests of human performance.



Author[s]: Michael I. Jordan and Christopher M. Bishop

Neural Networks

March 13, 1996

We present an overview of current research on artificial neural networks, emphasizing a statistical perspective. We view neural networks as parameterized graphs that make probabilistic assumptions about data, and view learning algorithms as methods for finding parameter values that look probable in the light of the data. We discuss basic issues in representation and learning, and treat some of the practical issues that arise in fitting networks to data. We also discuss links between neural networks and the general formalism of graphical models.



Author[s]: Zoubin Ghahramani and Michael I. Jordan

Factorial Hidden Markov Models

February 9, 1996

We present a framework for learning in hidden Markov models with distributed state representations. Within this framework, we derive a learning algorithm based on the Expectation--Maximization (EM) procedure for maximum likelihood estimation. Analogous to the standard Baum-Welch update rules, the M-step of our algorithm is exact and can be solved analytically. However, due to the combinatorial nature of the hidden state representation, the exact E-step is intractable. A simple and tractable mean field approximation is derived. Empirical results on a set of problems suggest that both the mean field approximation and Gibbs sampling are viable alternatives to the computationally expensive exact algorithm.



Author[s]: Tommi S. Jaakkola, Lawrence K. Saul and Michael I. Jordan

Fast Learning by Bounding Likelihoods in Sigmoid Type Belief Networks

February 9, 1996

Sigmoid type belief networks, a class of probabilistic neural networks, provide a natural framework for compactly representing probabilistic information in a variety of unsupervised and supervised learning problems. Often the parameters used in these networks need to be learned from examples. Unfortunately, estimating the parameters via exact probabilistic calculations (i.e, the EM-algorithm) is intractable even for networks with fairly small numbers of hidden units. We propose to avoid the infeasibility of the E step by bounding likelihoods instead of computing them exactly. We introduce extended and complementary representations for these networks and show that the estimation of the network parameters can be made fast (reduced to quadratic optimization) by performing the estimation in either of the alternative domains. The complementary networks can be used for continuous density estimation as well.



Author[s]: Michael J. Jones, Tomaso Poggio

Model-Based Matching of Line Drawings by Linear Combinations of Prototypes

January 18, 1996

We describe a technique for finding pixelwise correspondences between two images by using models of objects of the same class to guide the search. The object models are 'learned' from example images (also called prototypes) of an object class. The models consist of a linear combination ofsprototypes. The flow fields giving pixelwise correspondences between a base prototype and each of the other prototypes must be given. A novel image of an object of the same class is matched to a model by minimizing an error between the novel image and the current guess for the closest modelsimage. Currently, the algorithm applies to line drawings of objects. An extension to real grey level images is discussed.



Author[s]: Carl de Marcken

The Unsupervised Acquisition of a Lexicon from Continuous Speech

January 18, 1996

We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency.



Author[s]: David C. Somers, Emanuel V. Todorov, Athanassios G. Siapas and Mriganka Sur

Vector-Based Integration of Local and Long-Range Information in Visual Cortex

January 18, 1996

Integration of inputs by cortical neurons provides the basis for the complex information processing performed in the cerebral cortex. Here, we propose a new analytic framework for understanding integration within cortical neuronal receptive fields. Based on the synaptic organization of cortex, we argue that neuronal integration is a systems--level process better studied in terms of local cortical circuitry than at the level of single neurons, and we present a method for constructing self-contained modules which capture (nonlinear) local circuit interactions. In this framework, receptive field elements naturally have dual (rather than the traditional unitary influence since they drive both excitatory and inhibitory cortical neurons. This vector-based analysis, in contrast to scalarsapproaches, greatly simplifies integration by permitting linear summation of inputs from both "classical" and "extraclassical" receptive field regions. We illustrate this by explaining two complex visual cortical phenomena, which are incompatible with scalar notions of neuronal integration.


Author[s]: Thomas Marill

The Three-Dimensional Interpretation of a Class of Simple Line-Drawings

October 1995

We provide a theory of the three-dimensional interpretation of a class of line-drawings called p-images, which are interpreted by the human vision system as parallelepipeds ("boxes"). Despite their simplicity, p-images raise a number of interesting vision questions: *Why are p-images seen as three-dimensional objects? Why not just as flatimages? *What are the dimensions and pose of the perceived objects? *Why are some p-images interpreted as rectangular boxes, while others are seen as skewed, even though there is no obvious distinction between the images? *When p-images are rotated in three dimensions, why are the image-sequences perceived as distorting objects---even though structure-from-motion would predict that rigid objects would be seen? *Why are some three-dimensional parallelepipeds seen as radically different when viewed from different viewpoints? We show that these and related questions can be answered with the help of a single mathematical result and an associated perceptual principle. An interesting special case arises when there are right angles in the p-image. This case represents a singularity in the equations and is mystifying from the vision point of view. It would seem that (at least in this case) the vision system does not follow the ordinary rules of geometry but operates in accordance with other (and as yet unknown) principles.


Author[s]: D.A. Leopold, J.C. Fitzgibbons and N.K. Logothetis

The Role of Attention in Binocular Rivalry as Revealed Through Optokinetic Nystagmus

November 1995

When stimuli presented to the two eyes differ considerably, stable binocular fusion fails, and the subjective percept alternates between the two monocular images, a phenomenon known as binocular rivalry. The influence of attention over this perceptual switching has long been studied, and although there is evidence that attention can affect the alternation rate, its role in the overall dynamics of the rivalry process remains unclear. The present study investigated the relationship between the attention paid to the rivalry stimulus, and the dynamics of the perceptual alternations. Specifically, the temporal course of binocular rivalry was studied as the subjects performed difficult nonvisual and visual concurrent tasks, directing their attention away from the rivalry stimulus. Periods of complete perceptual dominance were compared for the attended condition, where the subjects reported perceptual changes, and the unattended condition, where one of the simultaneous tasks was performed. During both the attended and unattended conditions, phases of rivalry dominance were obtained by analyzing the subject"s optokinetic nystagmus recorded by an electrooculogram, where the polarity of the nystagmus served as an objective indicator of the perceived direction of motion. In all cases, the presence of a difficult concurrent task had little or no effect on the statistics of the alternations, as judged by two classic tests of rivalry, although the overall alternation rate showed a small but significant increase with the concurrent task. It is concluded that the statistical patterns of rivalry alternations are not governed by attentional shifts or decision-making on the part of the subject.


Author[s]: N.K. Logothetis and D.A. Leopold

On the Physiology of Bistable Percepts

November 1995

Binocular rivalry refers to the alternating perceptions experienced when two dissimilar patterns are stereoscopically viewed. To study the neural mechanism that underlies such competitive interactions, single cells were recorded in the visual areas V1, V2, and V4, while monkeys reported the perceived orientation of rivaling sinusoidal grating patterns. A number of neurons in all areas showed alternating periods of excitation and inhibition that correlated with the perceptual dominance and suppression of the cell"s preferred orientation. The remaining population of cells were not influenced by whether or not the optimal stimulus orientation was perceptually suppressed. Response modulation during rivalry was not correlated with cell attributes such as monocularity, binocularity, or disparity tuning. These results suggest that the awareness of a visual pattern during binocular rivalry arises through interactions between neurons at different levels of visual pathways, and that the site of suppression is unlikely to correspond to a particular visual area, as often hypothesized on the basis of psychophysical observations. The cell-types of modulating neurons and their overwhelming preponderance in higher rather than in early visual areas also suggests -- together with earlier psychophysical evidence -- the possibility of a common mechanism underlying rivalry as well as other bistable percepts, such as those experienced with ambiguous figures.


Author[s]: David A. Cohn

Minimizing Statistical Bias with Queries

September 1995

I describe an exploration criterion that attempts to minimize the error of a learner by minimizing its estimated squared bias. I describe experiments with locally-weighted regression on two simple kinematics problems, and observe that this "bias-only" approach outperforms the more common "variance-only" exploration approach, even in the presence of noise.


Author[s]: Jacob Katzenelson and Aharon Unikovski

A Network Charge-Orineted MOS Transistor Model

August 1995

The MOS transistor physical model as described in [3] is presented here as a network model. The goal is to obtain an accurate model, suitable for simulation, free from certain problems reported in the literature [13], and conceptually as simple as possible. To achieve this goal the original model had to be extended and modified. The paper presents the derivation of the network model from physical equations, including the corrections which are required for simulation and which compensate for simplifications introduced in the original physical model. Our intrinsic MOS model consists of three nonlinear voltage-controlled capacitors and a dependent current source. The charges of the capacitors and the current of the current source are functions of the voltages $V_{gs}$, $V_{bs}$, and $V_{ds}$. The complete model consists of the intrinsic model plus the parasitics. The apparent simplicity of the model is a result of hiding information in the characteristics of the nonlinear components. The resulted network model has been checked by simulation and analysis. It is shown that the network model is suitable for simulation: It is defined for any value of the voltages; the functions involved are continuous and satisfy Lipschitz conditions with no jumps at region boundaries; Derivatives have been computed symbolically and are available for use by the Newton-Raphson method. The model"s functions can be measured from the terminals. It is also shown that small channel effects can be included in the model. Higher frequency effects can be modeled by using a network consisting of several sections of the basic lumped model. Future plans include a detailed comparison of the network model with models such as SPICE level 3 and a comparison of the multi- section higher frequency model with experiments.


Author[s]: T.D. Alter and Ronen Basri

Extracting Salient Curves from Images: An Analysis of the Saliency Network

August 1995

The Saliency Network proposed by Shashua and Ullman is a well-known approach to the problem of extracting salient curves from images while performing gap completion. This paper analyzes the Saliency Network. The Saliency Network is attractive for several reasons. First, the network generally prefers long and smooth curves over short or wiggly ones. While computing saliencies, the network also fills in gaps with smooth completions and tolerates noise. Finally, the network is locally connected, and its size is proportional to the size of the image. Nevertheless, our analysis reveals certain weaknesses with the method. In particular, we show cases in which the most salient element does not lie on the perceptually most salient curve. Furthermore, in some cases the saliency measure changes its preferences when curves are scaled uniformly. Also, we show that for certain fragmented curves the measure prefers large gaps over a few small gaps of the same total size. In addition, we analyze the time complexity required by the method. We show that the number of steps required for convergence in serial implementations is quadratic in the size of the network, and in parallel implementations is linear in the size of the network. We discuss problems due to coarse sampling of the range of possible orientations. We show that with proper sampling the complexity of the network becomes cubic in the size of the network. Finally, we consider the possibility of using the Saliency Network for grouping. We show that the Saliency Network recovers the most salient curve efficiently, but it has problems with identifying any salient curve other than the most salient one.


Author[s]: Roberto Brunelli and Tomaso Poggio

Template Matching: Matched Spatial Filters and Beyond

October 1995

Template matching by means of cross-correlation is common practice in pattern recognition. However, its sensitivity to deformations of the pattern and the broad and unsharp peaks it produces are significant drawbacks. This paper reviews some results on how these shortcomings can be removed. Several techniques (Matched Spatial Filters, Synthetic Discriminant Functions, Principal Components Projections and Reconstruction Residuals) are reviewed and compared on a common task: locating eyes in a database of faces. New variants are also proposed and compared: least squares Discriminant Functions and the combined use of projections on eigenfunctions and the corresponding reconstruction residuals. Finally, approximation networks are introduced in an attempt to improve filter design by the introduction of nonlinearity.


Author[s]: Paul A. Viola

Alignment by Maximization of Manual Information

March 1995

A new information-theoretic approach is presented for finding the pose of an object in an image. The technique does not require information about the surface properties of the object, besides its shape, and is robust with respect to variations of illumination. In our derivation, few assumptions are made about the nature of the imaging process. As a result the algorithms are quite general and can foreseeably be used in a wide variety of imaging situations. Experiments are presented that demonstrate the approach registering magnetic resonance (MR) images with computed tomography (CT) images, aligning a complex 3D object model to real scenes including clutter and occlusion, tracking a human head in a video sequence and aligning a view-based 2D object model to real images. The method is based on a formulation of the mutual information between the model and the image called EMMA. As applied here the technique is intensity-based, rather than feature-based. It works well in domains where edge or gradient-magnitude based methods have difficulty, yet it is more robust than traditional correlation. Additionally, it has an efficient implementation that is based on stochastic approximation. Finally, we will describe a number of additional real- world applications that can be solved efficiently and reliably using EMMA. EMMA can be used in machine learning to find maximally informative projections of high-dimensional data. EMMA can also be used to detect and correct corruption in magnetic resonance images (MRI).


Author[s]: Michael R. Blair, Natalya Cohen, David M. LaMacchia and Brian K. Zuzga

MIT SchMUSE: Class-Based Remote Delegation in a Capricious Distributed Environment

February 1993

MIT SchMUSE (pronounced "shmooz") is a concurrent, distributed, delegation-based object-oriented interactive environment with persistent storage. It is designed to run in a "capricious" network environment, where servers can migrate from site to site and can regularly become unavailable. Our design introduces a new form of unique identifiers called "globally unique tickets" that provide globally unique time/space stamps for objects and classes without being location specific. Object location is achieved by a distributed hierarchical lazy lookup mechanism that we call "realm resolution." We also introduce a novel mechanism called "message deferral" for enhanced reliability in the face of remote delegation. We conclude with a comparison to related work and a projection of future work on MIT SchMUSE.


Author[s]: Yoky Matsuoka

Embodiment and Manipulation Learning Process for a Humanoid Hand

May 1995

Babies are born with simple manipulation capabilities such as reflexes to perceived stimuli. Initial discoveries by babies are accidental until they become coordinated and curious enough to actively investigate their surroundings. This thesis explores the development of such primitive learning systems using an embodied light-weight hand with three fingers and a thumb. It is self- contained having four motors and 36 exteroceptor and proprioceptor sensors controlled by an on-palm microcontroller. Primitive manipulation is learned from sensory inputs using competitive learning, back-propagation algorithm and reinforcement learning strategies. This hand will be used for a humanoid being developed at the MIT Artificial Intelligence Laboratory.


Author[s]: James A. Stuart Fiske

Thread Scheduling Mechanisms for Multiple-Context Parallel Processors

June 1995

Scheduling tasks to efficiently use the available processor resources is crucial to minimizing the runtime of applications on shared-memory parallel processors. One factor that contributes to poor processor utilization is the idle time caused by long latency operations, such as remote memory references or processor synchronization operations. One way of tolerating this latency is to use a processor with multiple hardware contexts that can rapidly switch to executing another thread of computation whenever a long latency operation occurs, thus increasing processor utilization by overlapping computation with communication. Although multiple contexts are effective for tolerating latency, this effectiveness can be limited by memory and network bandwidth, by cache interference effects among the multiple contexts, and by critical tasks sharing processor resources with less critical tasks. This thesis presents techniques that increase the effectiveness of multiple contexts by intelligently scheduling threads to make more efficient use of processor pipeline, bandwidth, and cache resources. This thesis proposes thread prioritization as a fundamental mechanism for directing the thread schedule on a multiple-context processor. A priority is assigned to each thread either statically or dynamically and is used by the thread scheduler to decide which threads to load in the contexts, and to decide which context to switch to on a context switch. We develop a multiple-context model that integrates both cache and network effects, and shows how thread prioritization can both maintain high processor utilization, and limit increases in critical path runtime caused by multithreading. The model also shows that in order to be effective in bandwidth limited applications, thread prioritization must be extended to prioritize memory requests. We show how simple hardware can prioritize the running of threads in the multiple contexts, and the issuing of requests to both the local memory and the network. Simulation experiments show how thread prioritization is used in a variety of applications. Thread prioritization can improve the performance of synchronization primitives by minimizing the number of processor cycles wasted in spinning and devoting more cycles to critical threads. Thread prioritization can be used in combination with other techniques to improve cache performance and minimize cache interference between different working sets in the cache. For applications that are critical path limited, thread prioritization can improve performance by allowing processor resources to be devoted preferentially to critical threads. These experimental results show that thread prioritization is a mechanism that can be used to implement a wide range of scheduling policies.


Author[s]: J.P. Mellor

Enhanced Reality Visualization in a Surgical Environment

January 1995

Enhanced reality visualization is the process of enhancing an image by adding to it information which is not present in the original image. A wide variety of information can be added to an image ranging from hidden lines or surfaces to textual or iconic data about a particular part of the image. Enhanced reality visualization is particularly well suited to neurosurgery. By rendering brain structures which are not visible, at the correct location in an image of a patient's head, the surgeon is essentially provided with X-ray vision. He can visualize the spatial relationship between brain structures before he performs a craniotomy and during the surgery he can see what's under the next layer before he cuts through. Given a video image of the patient and a three dimensional model of the patient's brain the problem enhanced reality visualization faces is to render the model from the correct viewpoint and overlay it on the original image. The relationship between the coordinate frames of the patient, the patient's internal anatomy scans and the image plane of the camera observing the patient must be established. This problem is closely related to the camera calibration problem. This report presents a new approach to finding this relationship and develops a system for performing enhanced reality visualization in a surgical environment. Immediately prior to surgery a few circular fiducials are placed near the surgical site. An initial registration of video and internal data is performed using a laser scanner. Following this, our method is fully automatic, runs in nearly real-time, is accurate to within a pixel, allows both patient and camera motion, automatically corrects for changes to the internal camera parameters (focal length, focus, aperture, etc.) and requires only a single image.


Author[s]: Brian Scott Eberman

Contact Sensing: A Sequential Decision Approach to Sensing Manipulation Contact

May 1995

This paper describes a new statistical, model-based approach to building a contact state observer. The observer uses measurements of the contact force and position, and prior information about the task encoded in a graph, to determine the current location of the robot in the task configuration space. Each node represents what the measurements will look like in a small region of configuration space by storing a predictive, statistical, measurement model. This approach assumes that the measurements are statistically block independent conditioned on knowledge of the model, which is a fairly good model of the actual process. Arcs in the graph represent possible transitions between models. Beam Viterbi search is used to match measurement history against possible paths through the model graph in order to estimate the most likely path for the robot. The resulting approach provides a new decision process that can be use as an observer for event driven manipulation programming. The decision procedure is significantly more robust than simple threshold decisions because the measurement history is used to make decisions. The approach can be used to enhance the capabilities of autonomous assembly machines and in quality control applications.


Author[s]: D. McAllester, P. Van Henlenryck and T. Kapur

Three Cuts for Accelerated Interval Propagation

May 1995

This paper addresses the problem of nonlinear multivariate root finding. In an earlier paper we described a system called Newton which finds roots of systems of nonlinear equations using refinements of interval methods. The refinements are inspired by AI constraint propagation techniques. Newton is competative with continuation methods on most benchmarks and can handle a variety of cases that are infeasible for continuation methods. This paper presents three "cuts" which we believe capture the essential theoretical ideas behind the success of Newton. This paper describes the cuts in a concise and abstract manner which, we believe, makes the theoretical content of our work more apparent. Any implementation will need to adopt some heuristic control mechanism. Heuristic control of the cuts is only briefly discussed here.


Author[s]: Elmer S. Hung

Parameter Estimation in Chaotic Systems

April 1995

This report examines how to estimate the parameters of a chaotic system given noisy observations of the state behavior of the system. Investigating parameter estimation for chaotic systems is interesting because of possible applications for high-precision measurement and for use in other signal processing, communication, and control applications involving chaotic systems. In this report, we examine theoretical issues regarding parameter estimation in chaotic systems and develop an efficient algorithm to perform parameter estimation. We discover two properties that are helpful for performing parameter estimation on non-structurally stable systems. First, it turns out that most data in a time series of state observations contribute very little information about the underlying parameters of a system, while a few sections of data may be extraordinarily sensitive to parameter changes. Second, for one-parameter families of systems, we demonstrate that there is often a preferred direction in parameter space governing how easily trajectories of one system can "shadow'" trajectories of nearby systems. This asymmetry of shadowing behavior in parameter space is proved for certain families of maps of the interval. Numerical evidence indicates that similar results may be true for a wide variety of other systems. Using the two properties cited above, we devise an algorithm for performing parameter estimation. Standard parameter estimation techniques such as the extended Kalman filter perform poorly on chaotic systems because of divergence problems. The proposed algorithm achieves accuracies several orders of magnitude better than the Kalman filter and has good convergence properties for large data sets.


Author[s]: David Beymer

Vectorizing Face Images by Interpreting Shape and Texture Computations

September 1995

The correspondence problem in computer vision is basically a matching task between two or more sets of features. In this paper, we introduce a vectorized image representation, which is a feature-based representation where correspondence has been established with respect to a reference image. This representation has two components: (1) shape, or (x, y) feature locations, and (2) texture, defined as the image grey levels mapped onto the standard reference image. This paper explores an automatic technique for "vectorizing" face images. Our face vectorizer alternates back and forth between computation steps for shape and texture, and a key idea is to structure the two computations so that each one uses the output of the other. A hierarchical coarse-to-fine implementation is discussed, and applications are presented to the problems of facial feature detection and registration of two arbitrary faces.


Author[s]: David Beymer and Tomaso Poggio

Face Recognition from One Example View

September 1995

If we are provided a face database with only one example view per person, is it possible to recognize new views of them under a variety of different poses, especially views rotated in depth from the original example view? We investigate using prior knowledge about faces plus each single example view to generate virtual views of each person, or views of the face as seen from different poses. Prior knowledge of faces is represented in an example-based way, using 2D views of a prototype face seen rotating in depth. The synthesized virtual views are evaluated as example views in a view-based approach to pose-invariant face recognition. They are shown to improve the recognition rate over the scenario where only the single real view is used.


Author[s]: Panayotis Skordos and Gerald Jay Sussman

Comparison Between Subsonic Flow Simulation and Physical Measurements of Flue Pipes

April 1995

Direct simulations of wind musical instruments using the compressible Navier Stokes equations have recently become possible through the use of parallel computing and through developments in numerical methods. As a first demonstration, the flow of air and the generation of musical tones inside a soprano recorder are simulated numerically. In addition, physical measurements are made of the acoustic signal generated by the recorder at different blowing speeds. The comparison between simulated and physically measured behavior is encouraging and points towards ways of improving the simulations.


Author[s]: Panayotis A. Skordos

Aeroacoustics on Non-Dedicated Workstations

April 1995

The simulation of subsonic aeroacoustic problems such as the flow-generated sound of wind instruments is well suited for parallel computing on a cluster of non-dedicated workstations. Simulations are demonstrated which employ 20 non-dedicated Hewlett-Packard workstations (HP9000/715), and achieve comparable performance on this problem as a 64-node CM-5 dedicated supercomputer with vector units. The success of the present approach depends on the low communication requirements of the problem (low communication to computation ratio) which arise from the coarse-grain decomposition of the problem and the use of local-interaction methods. Many important problems may be suitable for this type of parallel computing including computer vision, circuit simulation, and other subsonic flow problems.


Author[s]: N.K. Logothetis, J. Pauls and T. Poggio

Spatial Reference Frames for Object Recognition: Tuning for Rotations in Depth

March 1995

The inferior temporal cortex (IT) of monkeys is thought to play an essential role in visual object recognition. Inferotemporal neurons are known to respond to complex visual stimuli, including patterns like faces, hands, or other body parts. What is the role of such neurons in object recognition? The present study examines this question in combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize novel visual 3D objects. A population of neurons in IT were found to respond selectively to such objects that the monkeys had recently learned to recognize. A large majority of these cells discharged maximally for one view of the object, while their response fell off gradually as the object was rotated away from the neuron"s preferred view. Most neurons exhibited orientation-dependent responses also during view-plane rotations. Some neurons were found tuned around two views of the same object, while a very small number of cells responded in a view- invariant manner. For five different objects that were extensively used during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. No selective responses were ever encountered for views that the animal systematically failed to recognize. The results of our experiments suggest that neurons in this area can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. Simple geometric features did not appear to account for the neurons" selective responses. These findings support the idea that a population of neurons -- each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations -- may, as an assembly, encode complex 3D objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.


Author[s]: Marco Fillo, Stephen W. Keckler, William J. Dally, Nicholas P. Carter, Andrew Chang, Yevgeny Gurevich and Whay S. Lee

The M-Machine Multicomputer

March 1995

The M-Machine is an experimental multicomputer being developed to test architectural concepts motivated by the constraints of modern semiconductor technology and the demands of programming systems. The M- Machine computing nodes are connected with a 3-D mesh network; each node is a multithreaded processor incorporating 12 function units, on-chip cache, and local memory. The multiple function units are used to exploit both instruction-level and thread-level parallelism. A user accessible message passing system yields fast communication and synchronization between nodes. Rapid access to remote memory is provided transparently to the user with a combination of hardware and software mechanisms. This paper presents the architecture of the M-Machine and describes how its mechanisms maximize both single thread performance and overall system throughput.


Author[s]: Thomas Vetter and Tomaso Poggio

Linear Object Classes and Image Synthesis from a Single Example Image

March 1995

The need to generate new views of a 3D object from a single real image arises in several fields, including graphics and object recognition. While the traditional approach relies on the use of 3D models, we have recently introduced techniques that are applicable under restricted conditions but simpler. The approach exploits image transformations that are specific to the relevant object class and learnable from example views of other "prototypical" objects of the same class. In this paper, we introduce such a new technique by extending the notion of linear class first proposed by Poggio and Vetter. For linear object classes it is shown that linear transformations can be learned exactly from a basis set of 2D prototypical views. We demonstrate the approach on artificial objects and then show preliminary evidence that the technique can effectively "rotate" high- resolution face images from a single 2D view.


Author[s]: Partha Niyogi and Robert C. Berwick

A Note of Zipf's Law, Natural Languages, and Noncoding DNA Regions

March 1995

In Phys. Rev. Letters (73:2), Mantegna et al. conclude on the basis of Zipf rank frequency data that noncoding DNA sequence regions are more like natural languages than coding regions. We argue on the contrary that an empirical fit to Zipf"s "law" cannot be used as a criterion for similarity to natural languages. Although DNA is a presumably "organized system of signs" in Mandelbrot"s (1961) sense, and observation of statistical featurs of the sort presented in the Mantegna et al. paper does not shed light on the similarity between DNA's "gramar" and natural language grammars, just as the observation of exact Zipf-like behavior cannot distinguish between the underlying processes of tossing an M-sided die or a finite-state branching process.


Author[s]: Aparna Lakshmi Ratan

The Role of Fixation and Visual Attention in Object Recognition

July 21, 1995

This research project is a study of the role of fixation and visual attention in object recognition. In this project, we build an active vision system which can recognize a target object in a cluttered scene efficiently and reliably. Our system integrates visual cues like color and stereo to perform figure/ground separation, yielding candidate regions on which to focus attention. Within each image region, we use stereo to extract features that lie within a narrow disparity range about the fixation position. These selected features are then used as input to an alignment-style recognition system. We show that visual attention and fixation significantly reduce the complexity and the false identifications in model-based recognition using Alignment methods. We also demonstrate that stereo can be used effectively as a figure/ground separator without the need for accurate camera calibration.


Author[s]: Panayotis A. Skordos

Modeling Flue Pipes: Subsonic Flow, Lattice Boltzmann, and Parallel Distributed Computers

Arpil 21, 1995

The problem of simulating the hydrodynamics and the acoustic waves inside wind musical instruments such as the recorder, the organ, and the flute is considered. The problem is attacked by developing suitable local- interaction algorithms and a parallel simulation system on a cluster of non- dedicated workstations. Physical measurements of the acoustic signal of various flue pipes show good agreement with the simulations. Previous attempts at this problem have been frustrated because the modeling of acoustic waves requires small integration time steps which make the simulation very compute-intensive. In addition, the simulation of subsonic viscous compressible flow at high Reynolds numbers is susceptible to slow-growing numerical instabilities which are triggered by high- frequency acoustic modes. The numerical instabilities are mitigated by employing suitable explicit algorithms: lattice Boltzmann method, compressible finite differences, and fourth-order artificial-viscosity filter. Further, a technique for accurate initial and boundary conditions for the lattice Boltzmann method is developed, and the second-order accuracy of the lattice Boltzmann method is demonstrated. The compute-intensive requirements are handled by developing a parallel simulation system on a cluster of non-dedicated workstations. The system achieves 80 percent parallel efficiency (speedup/processors) using 20 HP-Apollo workstations. The system is built on UNIX and TCP/IP communication routines, and includes automatic process migration from busy hosts to free hosts.


Author[s]: Kenji Nagao and Berthold Horn

Direct Object Recognition Using No Higher Than Second or Third Order Statistics of the Image

December 1995

Novel algorithms for object recognition are described that directly recover the transformations relating the image to its model. Unlike methods fitting the typical conventional framework, these new methods do not require exhaustive search for each feature correspondence in order to solve for the transformation. Yet they allow simultaneous object identification and recovery of the transformation. Given hypothesized % potentially corresponding regions in the model and data (2D views) --- which are from planar surfaces of the 3D objects --- these methods allow direct compututation of the parameters of the transformation by which the data may be generated from the model. We propose two algorithms: one based on invariants derived from no higher than second and third order moments of the image, the other via a combination of the affine properties of geometrical and the differential attributes of the image. Empirical results on natural images demonstrate the effectiveness of the proposed algorithms. A sensitivity analysis of the algorithm is presented. We demonstrate in particular that the differential method is quite stable against perturbations --- although not without some error --- when compared with conventional methods. We also demonstrate mathematically that even a single point correspondence suffices, theoretically at least, to recover affine parameters via the differential method.


Author[s]: Matthew M. Williamson

Series Elastic Actuators

September 7, 1995

This thesis presents the design, construction, control and evaluation of a novel force controlled actuator. Traditional force controlled actuators are designed from the premise that "Stiffer is better''. This approach gives a high bandwidth system, prone to problems of contact instability, noise, and low power density. The actuator presented in this thesis is designed from the premise that "Stiffness isn't everything". The actuator, which incorporates a series elastic element, trades off achievable bandwidth for gains in stable, low noise force control, and protection against shock loads. This thesis reviews related work in robot force control, presents theoretical descriptions of the control and expected performance from a series elastic actuator, and describes the design of a test actuator constructed to gather performance data. Finally the performance of the system is evaluated by comparing the performance data to theoretical predictions.


Author[s]: Kenji Nagao and Eric Grimson

Recognizing 3D Object Using Photometric Invariant

April 22, 1995

In this paper we describe a new efficient algorithm for recognizing 3D objects by combining photometric and geometric invariants. Some photometric properties are derived, that are invariant to the changes of illumination and to relative object motion with respect to the camera and/or the lighting source in 3D space. We argue that conventional color constancy algorithms can not be used in the recognition of 3D objects. Further we show recognition does not require a full constancy of colors, rather, it only needs something that remains unchanged under the varying light conditions sand poses of the objects. Combining the derived color invariants and the spatial constraints on the object surfaces, we identify corresponding positions in the model and the data space coordinates, using centroid invariance of corresponding groups of feature positions. Tests are given to show the stability and efficiency of our approach to 3D object recognition.



Author[s]: David A. Cohn, Zoubin Ghahramani and Michael I. Jordan

Active Learning with Statistical Models

March 21, 1995

For many types of learners one can compute the statistically 'optimal' way to select data. We review how these techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate.



Author[s]: Kah Kay Sung and Tomaso Poggio

Example Based Learning for View-Based Human Face Detection

January 24, 1995

We present an example-based learning approach for locating vertical frontal views of human faces in complex scenes. The technique models the distribution of human face patterns by means of a few view-based "face'' and "non-face'' prototype clusters. At each image location, the local pattern is matched against the distribution-based model, and a trained classifier determines, based on the local difference measurements, whether or not a human face exists at the current image location. We provide an analysis that helps identify the critical components of our system.



Author[s]: Michael Jordan and Lei Xu

On Convergence Properties of the EM Algorithm for Gaussian Mixtures

April 21, 1995

"Expectation-Maximization'' (EM) algorithm and gradient-based approaches for maximum likelihood learning of finite Gaussian mixtures. We show that the EM step in parameter space is obtained from the gradient via a projection matrix $P$, and we provide an explicit expression for the matrix. We then analyze the convergence of EM in terms of special properties of $P$ and provide new results analyzing the effect that $P$ has on the likelihood surface. Based on these mathematical results, we present a comparative discussion of the advantages and disadvantages of EM and other algorithms for the learning of Gaussian mixture models.



Author[s]: Pawan Sinha and Tomaso Poggio

View-Based Strategies for 3D Object Recognition

April 21, 1995

A persistent issue of debate in the area of 3D object recognition concerns the nature of the experientially acquired object models in the primate visual system. One prominent proposal in this regard has expounded the use of object centered models, such as representations of the objects’ 3D structures in a coordinate frame independent of the viewing parameters [Marr and Nishihara, 1978]. In contrast to this is another proposal which suggests that the viewing parameters encountered during the learning phase might be inextricably linked to subsequent performance on a recognition task [Tarr and Pinker, 1989; Poggio and Edelman, 1990]. The ‘object model’, according to this idea, is simply a collection of the sample views encountered during training. Given that object centered recognition strategies have the attractive feature of leading to viewpoint independence, they have garnered much of the research effort in the field of computational vision. Furthermore, since human recognition performance seems remarkably robust in the face of imaging variations [Ellis et al., 1989], it has often been implicitly assumed that the visual system employs an object centered strategy. In the present study we examine this assumption more closely. Our experimental results with a class of novel 3D structures strongly suggest the use of a view-based strategy by the human visual system even when it has the opportunity of constructing and using object-centered models. In fact, for our chosen class of objects, the results seem to support a stronger claim: 3D object recognition is 2D view-based.



Author[s]: Douglas A. Jones, Robert C. Berwick, Franklin Cho, Zeeshan Khan, Karen T. Kohl, Naoyuki Nomura, Anand Radhakrishnan, Ulrich Sauerland and Brian Ulicny

Verb Classes and Alternations in Bangla, German, English, and Korean

May 6, 1996

In this report, we investigate the relationship between the semantic and syntactic properties of verbs. Our work is based on the English Verb Classes and Alternations of (Levin, 1993). We explore how these classes are manifested in other languages, in particular, in Bangla, German, and Korean. Our report includes a survey and classification of several hundred verbs from these languages into the cross-linguistic equivalents of Levin's classes. We also explore ways in which our findings may be used to enhance WordNet in two ways: making the English syntactic information of WordNet more fine-grained, and making WordNet multilingual.



Author[s]: Partha Niyogi and Robert Berwick

The Logical Problem of Language Change

December 1995

This paper considers the problem of language change. Linguists must explain not only how languages are learned but also how and why they have evolved along certain trajectories and not others. While the language learning problem has focused on the behavior of individuals and how they acquire a particular grammar from a class of grammars ${cal G}$, here we consider a population of such learners and investigate the emergent, global population characteristics of linguistic communities over several generations. We argue that language change follows logically from specific assumptions about grammatical theories and learning paradigms. In particular, we are able to transform parameterized theories and memoryless acquisition algorithms into grammatical dynamical systems, whose evolution depicts a population's evolving linguistic composition. We investigate the linguistic and computational consequences of this model, showing that the formalization allows one to ask questions about diachronic that one otherwise could not ask, such as the effect of varying initial conditions on the resulting diachronic trajectories. From a more programmatic perspective, we give an example of how the dynamical system model for language change can serve as a way to distinguish among alternative grammatical theories, introducing a formal diachronic adequacy criterion for linguistic theories.



Author[s]: Partha Niyogi and Robert Berwick

A Dynamical Systems Model for Language Change

December 1995

Formalizing linguists' intuitions of language change as a dynamical system, we quantify the time course of language change including sudden vs. gradual changes in languages. We apply the computer model to the historical loss of Verb Second from Old French to modern French, showing that otherwise adequate grammatical theories can fail our new evolutionary criterion.



Author[s]: Partha Niyogi

Sequential Optimal Recovery: A Paradigm for Active Learning

May 12, 1995

In most classical frameworks for learning from examples, it is assumed that examples are randomly drawn and presented to the learner. In this paper, we consider the possibility of a more active learner who is allowed to choose his/her own examples. Our investigations are carried out in a function approximation setting. In particular, using arguments from optimal recovery (Micchelli and Rivlin, 1976), we develop an adaptive sampling strategy (equivalent to adaptive approximation) for arbitrary approximation schemes. We provide a general formulation of the problem and show how it can be regarded as sequential optimal recovery. We demonstrate the application of this general formulation to two special cases of functions on the real line 1) monotonically increasing functions and 2) functions with bounded derivative. An extensive investigation of the sample complexity of approximating these functions is conducted yielding both theoretical and empirical results on test functions. Our theoretical results (stated insPAC-style), along with the simulations demonstrate the superiority of our active scheme over both passive learning as well as classical optimal recovery. The analysis of active function approximation is conducted in a worst-case setting, in contrast with other Bayesian paradigms obtained from optimal design (Mackay, 1992).


Author[s]: Ruth Bergman

Learning World Models in Environments with Manifest Causal Structure

May 5, 1995

This thesis examines the problem of an autonomous agent learning a causal world model of its environment. Previous approaches to learning causal world models have concentrated on environments that are too "easy" (deterministic finite state machines) or too "hard" (containing much hidden state). We describe a new domain --- environments with manifest causal structure - -- for learning. In such environments the agent has an abundance of perceptions of its environment. Specifically, it perceives almost all the relevant information it needs to understand the environment. Many environments of interest have manifest causal structure and we show that an agent can learn the manifest aspects of these environments quickly using straightforward learning techniques. We present a new algorithm to learn a rule-based causal world model from observations in the environment. The learning algorithm includes (1) a low level rule-learning algorithm that converges on a good set of specific rules, (2) a concept learning algorithm that learns concepts by finding completely correlated perceptions, and (3) an algorithm that learns general rules. In addition this thesis examines the problem of finding a good expert from a sequence of experts. Each expert has an "error rate"; we wish to find an expert with a low error rate. However, each expert's error rate and the distribution of error rates are unknown. A new expert-finding algorithm is presented and an upper bound on the expected error rate of the expert is derived.



Author[s]: M. Poggio and T. Poggio

Cooperative Physics of Fly Swarms: An Emergent Behavior

April 11, 1995

We have simulated the behavior of several artificial flies, interacting visually with each other. Each fly is described by a simple tracking system (Poggio and Reichardt, 1973; Land and Collett, 1974) which summarizes behavioral experiments in which individual flies fixate a target. Our main finding is that the interaction of theses implemodules gives rise to a variety of relatively complex behaviors. In particular, we observe a swarm-like behavior of a group of many artificial flies for certain reasonable ranges of our tracking system parameters.


Author[s]: Ian Horswill

Specialization of Perceptual Processes

April 22, 1995

In this report, I discuss the use of vision to support concrete, everyday activity. I will argue that a variety of interesting tasks can be solved using simple and inexpensive vision systems. I will provide a number of working examples in the form of a state-of-the-art mobile robot, Polly, which uses vision to give primitive tours of the seventh floor of the MIT AI Laboratory. By current standards, the robot has a broad behavioral repertoire and is both simple and inexpensive (the complete robot was built for less than $20,000 using commercial board-level components). The approach I will use will be to treat the structure of the agent's activity---its task and environment---as positive resources for the vision system designer. By performing a careful analysis of task and environment, the designer can determine a broad space of mechanisms which can perform the desired activity. My principal thesis is that for a broad range of activities, the space of applicable mechanisms will be broad enough to include a number mechanisms which are simple and economical. The simplest mechanisms that solve a given problem will typically be quite specialized to that problem. One thus worries that building simple vision systems will be require a great deal of {it ad-hoc} engineering that cannot be transferred to other problems. My second thesis is that specialized systems can be analyzed and understood in a principled manner, one that allows general lessons to be extracted from specialized systems. I will present a general approach to analyzing specialization through the use of transformations that provably improve performance. By demonstrating a sequence of transformations that derive a specialized system from a more general one, we can summarize the specialization of the former in a compact form that makes explicit the additional assumptions that it makes about its environment. The summary can be used to predict the performance of the system in novel environments. Individual transformations can be recycled in the design of future systems.



Author[s]: Margrit Betke and Nicholas Makris

Fast Object Recognition in Noisy Images Using Simulated Annealing

January 25, 1995

A fast simulated annealing algorithm is developed for automatic object recognition. The normalized correlation coefficient is used as a measure of the match between a hypothesized object and an image. Templates are generated on-line during the search by transforming model images. Simulated annealing reduces the search time by orders of magnitude with respect to an exhaustive search. The algorithm is applied to the problem of how landmarks, for example, traffic signs, can be recognized by an autonomous vehicle or a navigating robot. The algorithm works well in noisy, real-world images of complicated scenes for model images with high information content.



Author[s]: Zoubin Ghahramani and Michael I. Jordan

Learning from Incomplete Data

January 24,1995

Real-world learning tasks often involve high- dimensional data sets with complex patterns of missing features. In this paper we review the problem of learning from incomplete data from two statistical perspectives---the likelihood-based and the Bayesian. The goal is two-fold: to place current neural network approaches to missing data within a statistical framework, and to describe a set of algorithms, derived from the likelihood-based framework, that handle clustering, classification, and function approximation from incomplete data in a principled and efficient manner. These algorithms are based on mixture modeling and make two distinct appeals to the Expectation-Maximization (EM) principle (Dempster, Laird, and Rubin 1977)-- -both for the estimation of mixture components and for coping with the missing data.



Author[s]: Pawan Sinha

Reciprocal Interactions Between Motion and Form Perception

April 21, 1995

The processes underlying the perceptual analysis of visual form are believed to have minimal interaction with those subserving the perception of visual motion (Livingstone and Hubel, 1987; Victor and Conte, 1990). Recent reports of functionally and anatomically segregated parallel streams in the primate visual cortex seem to support this hypothesis (Ungerlieder and Mishkin, 1982; VanEssen and Maunsell, 1983; Shipp and Zeki, 1985; Zeki and Shipp, 1988; De Yoe et al., 1994). Here we present perceptual evidence that is at odds with this view and instead suggests strong symmetric interactions between the form and motion processes. In one direction, we show that the introduction of specific static figural elements, say 'F', in a simple motion sequence biases an observer to perceive a particular motion field, say 'M'. In the reverse direction, the imposition of the same motion field 'M' on the original sequence leads the observer to perceive illusory static figural elements 'F'. A specific implication of these findings concerns the possible existence of (what we call) motion end-stopped units in the primate visual system. Such units might constitute part of a mechanism for signalling subjective occluding contours based on motion-field discontinuities.


Author[s]: Robert Playter

Passive Dynamics in the Control of Gymnastic Maneuvers

March 1995

The control of aerial gymnastic maneuvers is challenging because these maneuvers frequently involve complex rotational motion and because the performer has limited control of the maneuver during flight. A performer can influence a maneuver using a sequence of limb movements during flight. However, the same sequence may not produce reliable performances in the presence of off-nominal conditions. How do people compensate for variations in performance to reliably produce aerial maneuvers? In this report I explore the role that passive dynamic stability may play in making the performance of aerial maneuvers simple and reliable. I present a control strategy comprised of active and passive components for performing robot front somersaults in the laboratory. I show that passive dynamics can neutrally stabilize the layout somersault which involves an "inherently unstable" rotation about the intermediate principal axis. And I show that a strategy that uses open loop joint torques plus passive dynamics leads to more reliable 1 1/2 twisting front somersaults in simulation than a strategy that uses prescribed limb motion. Results are presented from laboratory experiments on gymnastic robots, from dynamic simulation of humans and robots, and from linear stability analyses of these systems.


Author[s]: Saed G. Younis

Asymptotically Zero Energy Computing Using Split-Level Charge Recovery Logic

June 1994

The dynamic power requirement of CMOS circuits is rapidly becoming a major concern in the design of personal information systems and large computers. In this work we present a number of new CMOS logic families, Charge Recovery Logic (CRL) as well as the much improved Split-Level Charge Recovery Logic (SCRL), within which the transfer of charge between the nodes occurs quasistatically. Operating quasistatically, these logic families have an energy dissipation that drops linearly with operating frequency, i.e., their power consumption drops quadratically with operating frequency as opposed to the linear drop of conventional CMOS. The circuit techniques in these new families rely on constructing an explicitly reversible pipelined logic gate, where the information necessary to recover the energy used to compute a value is provided by computing its logical inverse. Information necessary to uncompute the inverse is available from the subsequent inverse logic stage. We demonstrate the low energy operation of SCRL by presenting the results from the testing of the first fully quasistatic 8 x 8 multiplier chip (SCRL-1) employing SCRL circuit techniques.


Author[s]: Roberto Brunelli

Estimation of Pose and Illuminant Direction for Face Processing

November 1994

In this paper three problems related to the analysis of facial images are addressed: the illuminant direction, the compensation of illumination effects and, finally, the recovery of the pose of the face, restricted to in-depth rotations. The solutions proposed for these problems rely on the use of computer graphics techniques to provide images of faces under different illumination and pose, starting from a database of frontal views under frontal illumination.


Author[s]: Lisa Dron

Computing 3-D Motion in Custom Analog and Digital VLSI

Nov 28, 1994

This thesis examines a complete design framework for a real-time, autonomous system with specialized VLSI hardware for computing 3-D camera motion. In the proposed architecture, the first step is to determine point correspondences between two images. Two processors, a CCD array edge detector and a mixed analog/digital binary block correlator, are proposed for this task. The report is divided into three parts. Part I covers the algorithmic analysis; part II describes the design and test of a 32$ ime $32 CCD edge detector fabricated through MOSIS; and part III compares the design of the mixed analog/digital correlator to a fully digital implementation.


Author[s]: Maja J. Mataric

Interaction and Intelligent Behavior

August 1994

We introduce basic behaviors as primitives for control and learning in situated, embodied agents interacting in complex domains. We propose methods for selecting, formally specifying, algorithmically implementing, empirically evaluating, and combining behaviors from a basic set. We also introduce a general methodology for automatically constructing higher--level behaviors by learning to select from this set. Based on a formulation of reinforcement learning using conditions, behaviors, and shaped reinforcement, out approach makes behavior selection learnable in noisy, uncertain environments with stochastic dynamics. All described ideas are validated with groups of up to 20 mobile robots performing safe-- wandering, following, aggregation, dispersion, homing, flocking, foraging, and learning to forage.


Author[s]: Sebastian Toleg and Tomaso Poggio

Towards an Example-Based Image Compression Architecture for Video-Conferencing

June 1994

This paper consists of two major parts. First, we present the outline of a simple approach to very-low bandwidth video-conferencing system relying on an example-based hierarchical image compression scheme. In particular, we discuss the use of example images as a model, the number of required examples, faces as a class of semi-rigid objects, a hierarchical model based on decomposition into different time-scales, and the decomposition of face images into patches of interest. In the second part, we present several algorithms for image processing and animation as well as experimental evaluations. Among the original contributions of this paper is an automatic algorithm for pose estimation and normalization. We also review and compare different algorithms for finding the nearest neighbors in a database for a new input as well as a generalized algorithm for blending patches of interest in order to synthesize new images. Finally, we outline the possible integration of several algorithms to illustrate a simple model-based video-conference system.


Author[s]: Michael H. Coen

SodaBot: A Software Agent Environment and Construction System

Nov 2, 1994

This thesis presents SodaBot, a general- purpose software agent user-environment and construction system. Its primary component is the basic software agent --- a computational framework for building agents which is essentially an agent operating system. We also present a new language for programming the basic software agent whose primitives are designed around human-level descriptions of agent activity. Via this programming language, users can easily implement a wide-range of typical software agent applications, e.g. personal on-line assistants and meeting scheduling agents. The SodaBot system has been implemented and tested, and its description comprises the bulk of this thesis.


Author[s]: John S. Keen

Logging and Recovery in a Highly Concurrent Database

June 1994

This report addresses the problem of fault tolerance to system failures for database systems that are to run on highly concurrent computers. It assumes that, in general, an application may have a wide distribution in the lifetimes of its transactions. Logging remains the method of choice for ensuring fault tolerance. Generational garbage collection techniques manage the limited disk space reserved for log information; this technique does not require periodic checkpoints and is well suited for applications with a broad range of transaction lifetimes. An arbitrarily large collection of parallel log streams provide the necessary disk bandwidth.


Author[s]: David A. Cohn

Neural Network Exploration Using Optimal Experiment Design

June 1994

We consider the question "How should one act when the only goal is to learn as much as possible?" Building on the theoretical results of Fedorov [1972] and MacKay [1992], we apply techniques from Optimal Experiment Design (OED) to guide the query/action selection of a neural network learner. We demonstrate that these techniques allow the learner to minimize its generalization error by exploring its domain efficiently and completely. We conclude that, while not a panacea, OED-based query/action has much to offer, especially in domains where its high computational costs can be tolerated.


Author[s]: Ammon Shashua and Nassir Navab

Relative Affine Structure: Canonical Model for 3D from 2D Geometry and Applications

June 1994

We propose an affine framework for perspective views, captured by a single extremely simple equation based on a viewer- centered invariant we call "relative affine structure". Via a number of corollaries of our main results we show that our framework unifies previous work --- including Euclidean, projective and affine --- in a natural and simple way, and introduces new, extremely simple, algorithms for the tasks of reconstruction from multiple views, recognition by alignment, and certain image coding applications.


Author[s]: Leon Wong

Automated Reasoning About Classical Mechanics

May 1994

In recent years, researchers in artificial intelligence have become interested in replicating human physical reasoning talents in computers. One of the most important skills in this area is predicting how physical systems will behave. This thesis discusses an implemented program that generates algebraic descriptions of how systems of rigid bodies evolve over time. Discussion about the design of this program identifies a physical reasoning paradigm and knowledge representation approach based on mathematical model construction and algebraic reasoning. This paradigm offers several advantages over methods that have become popular in the field, and seems promising for reasoning about a wide variety of classical mechanics problems.


Author[s]: Andrew Berlin and Rajeev Surati

Partial Evaluation for Scientific Computing: The Supercomputer Toolkit Experience

May 1994

We describe the key role played by partial evaluation in the Supercomputer Toolkit, a parallel computing system for scientific applications that effectively exploits the vast amount of parallelism exposed by partial evaluation. The Supercomputer Toolkit parallel processor and its associated partial evaluation-based compiler have been used extensively by scientists at M.I.T., and have made possible recent results in astrophysics showing that the motion of the planets in our solar system is chaotically unstable.


Author[s]: Panayotis A. Skordos

Parallel Simulation of Subsonic Fluid Dynamics on a Cluster of Workstations

December 1995

An effective approach of simulating fluid dynamics on a cluster of non- dedicated workstations is presented. The approach uses local interaction algorithms, small communication capacity, and automatic migration of parallel processes from busy hosts to free hosts. The approach is well- suited for simulating subsonic flow problems which involve both hydrodynamics and acoustic waves; for example, the flow of air inside wind musical instruments. Typical simulations achieve $80\%$ parallel efficiency (speedup/processors) using 20 HP-Apollo workstations. Detailed measurements of the parallel efficiency of 2D and 3D simulations are presented, and a theoretical model of efficiency is developed which fits closely the measurements. Two numerical methods of fluid dynamics are tested: explicit finite differences, and the lattice Boltzmann method.



Author[s]: Heinrich H. Buelthoff, Shimon Y. Edelman and Michael J. Tarr

How are Three-Deminsional Objects Represented in the Brain?

April 1994

We discuss a variety of object recognition experiments in which human subjects were presented with realistically rendered images of computer-generated three-dimensional objects, with tight control over stimulus shape, surface properties, illumination, and viewpoint, as well as subjects' prior exposure to the stimulus objects. In all experiments recognition performance was: (1) consistently viewpoint dependent; (2) only partially aided by binocular stereo and other depth information, (3) specific to viewpoints that were familiar; (4) systematically disrupted by rotation in depth more than by deforming the two-dimensional images of the stimuli. These results are consistent with recently advanced computational theories of recognition based on view interpolation.


Author[s]: D.W. Jacobs and T.D. Alter

Uncertainty Propagation in Model-Based Recognition

February 1995

Building robust recognition systems requires a careful understanding of the effects of error in sensed features. Error in these image features results in a region of uncertainty in the possible image location of each additional model feature. We present an accurate, analytic approximation for this uncertainty region when model poses are based on matching three image and model points, for both Gaussian and bounded error in the detection of image points, and for both scaled-orthographic and perspective projection models. This result applies to objects that are fully three- dimensional, where past results considered only two- dimensional objects. Further, we introduce a linear programming algorithm to compute the uncertainty region when poses are based on any number of initial matches. Finally, we use these results to extend, from two-dimensional to three- dimensional objects, robust implementations of alignmentt interpretation- tree search, and ransformation clustering.


Author[s]: Margrit Betke, Ronald L. Rivest and Mona Singh

Piecemeal Learning of an Unknown Environment

March 1994

We introduce a new learning problem: learning a graph by piecemeal search, in which the learner must return every so often to its starting point (for refueling, say). We present two linear-time piecemeal-search algorithms for learning city-block graphs: grid graphs with rectangular obstacles.


Author[s]: N.K. Logothetis, J. Pauls and T. Poggio

Viewer-Centered Object Recognition in Monkeys

April 1994

How does the brain recognize three-dimensional objects? We trained monkeys to recognize computer rendered objects presented from an arbitrarily chosen training view, and subsequently tested their ability to generalize recognition for other views. Our results provide additional evidence in favor of with a recognition model that accomplishes view-invariant performance by storing a limited number of object views or templates together with the capacity to interpolate between the templates (Poggio and Edelman, 1990).


Author[s]: Nikos K. Logothetis, Thomas Vetter, Anya Hurlbert and Tomaso Poggio

View-Based Models of 3D Object Recognition and Class-Specific Invariances

April 1994

This paper describes the main features of a view-based model of object recognition. The model tries to capture general properties to be expected in a biological architecture for object recognition. The basic module is a regularization network in which each of the hidden units is broadly tuned to a specific view of the object to be recognized.


Author[s]: James M. Hutchinson, Andrew Lo and Tomaso Poggio

A Nonparametric Approach to Pricing and Hedging Derivative Securities via Learning Networks

April 1994

We propose a nonparametric method for estimating derivative financial asset pricing formulae using learning networks. To demonstrate feasibility, we first simulate Black-Scholes option prices and show that learning networks can recover the Black- Scholes formula from a two-year training set of daily options prices, and that the resulting network formula can be used successfully to both price and delta-hedge options out-of- sample. For comparison, we estimate models using four popular methods: ordinary least squares, radial basis functions, multilayer perceptrons, and projection pursuit. To illustrate practical relevance, we also apply our approach to S&P 500 futures options data from 1987 to 1991.


Author[s]: Karen Beth Sarachik

An Analysis of the Effect of Gaussian Error in Object Recognition

February 1994

Object recognition is complicated by clutter, occlusion, and sensor error. Since pose hypotheses are based on image feature locations, these effects can lead to false negatives and positives. In a typical recognition algorithm, pose hypotheses are tested against the image, and a score is assigned to each hypothesis. We use a statistical model to determine the score distribution associated with correct and incorrect pose hypotheses, and use binary hypothesis testing techniques to distinguish between them. Using this approach we can compare algorithms and noise models, and automatically choose values for internal system thresholds to minimize the probability of making a mistake.


Author[s]: Whitman Richards and Jan J. Koenderink

Trajectory Mapping ("TM''): A New Non-Metric Scaling Technique

December 1993

Trajectory Mapping "TM'' is a new scaling technique designed to recover the parameterizations, axes, and paths used to traverse a feature space. Unlike Multidimensional Scaling (MDS), there is no assumption that the space is homogenous or metric. Although some metric ordering information is obtained with TM, the main output is the feature parameterizations that partition the given domain of object samples into different categories. Following an introductory example, the technique is further illustrated using first a set of colors and then a collection of textures taken from Brodatz (1966).


Author[s]: Partha Niyogi and Federico Girosi

On the Relationship Between Generalization Error, Hypothesis Complexity, and Sample Complexity for Radial Basis Functions

February 1994

In this paper, we bound the generalization error of a class of Radial Basis Function networks, for certain well defined function learning tasks, in terms of the number of parameters and number of examples. We show that the total generalization error is partly due to the insufficient representational capacity of the network (because of its finite size) and partly due to insufficient information about the target function (because of finite number of samples). We make several observations about generalization error which are valid irrespective of the approximation scheme. Our result also sheds light on ways to choose an appropriate network architecture for a particular problem.


Author[s]: Lynne E. Parker

Heterogeneous Multi-Robot Cooperation

February 1994

This report addresses the problem of achieving cooperation within small- to medium- sized teams of heterogeneous mobile robots. I describe a software architecture I have developed, called ALLIANCE, that facilitates robust, fault tolerant, reliable, and adaptive cooperative control. In addition, an extended version of ALLIANCE, called L-ALLIANCE, is described, which incorporates a dynamic parameter update mechanism that allows teams of mobile robots to improve the efficiency of their mission performance through learning. A number of experimental results of implementing these architectures on both physical and simulated mobile robot teams are described. In addition, this report presents the results of studies of a number of issues in mobile robot cooperation, including fault tolerant cooperative control, adaptive action selection, distributed control, robot awareness of team member actions, improving efficiency through learning, inter- robot communication, action recognition, and local versus global control.


Author[s]: Nancy S. Pollard

Parallel Methods for Synthesizing Whole-Hand Grasps from Generalized Prototypes

January 1994

This report addresses the problem of acquiring objects using articulated robotic hands. Standard grasps are used to make the problem tractable, and a technique is developed for generalizing these standard grasps to increase their flexibility to variations in the problem geometry. A generalized grasp description is applied to a new problem situation using a parallel search through hand configuration space, and the result of this operation is a global overview of the space of good solutions. The techniques presented in this report have been implemented, and the results are verified using the Salisbury three- finger robotic hand.


Author[s]: Kanji Nagao and W. Eric L. Grimson

Object Recognition By Alignment Using Invariant Projections of Planar Surfaces

February 1994

In order to recognize an object in an image, we must determine the best transformation from object model to the image. In this paper, we show that for features from coplanar surfaces which undergo linear transformations in space, there exist projections invariant to the surface motions up to rotations in the image field. To use this property, we propose a new alignment approach to object recognition based on centroid alignment of corresponding feature groups. This method uses only a single pair of 2D model and data. Experimental results show the robustness of the proposed method against perturbations of feature positions.


Author[s]: James S. Miller and Guillermo J. Rozas

Garbage Collection is Fast, But a Stack is Faster

March 1994

Prompted by claims that garbage collection can outperform stack allocation when sufficient physical memory is available, we present a careful analysis and set of cross-architecture measurements comparing these two approaches for the implementation of continuation (procedure call) frames. When the frames are allocated on a heap they require additional space, increase the amount of data transferred between memory and registers, and, on current architectures, require more instructions. We find that stack allocation of continuation frames outperforms heap allocation in some cases by almost a factor of three. Thus, stacks remain an important implementation technique for procedure calls, even in the presence of an efficient, compacting garbage collector and large amounts of memory.


Author[s]: David J. Beymer

Face Recognition Under Varying Pose

December 1993

While researchers in computer vision and pattern recognition have worked on automatic techniques for recognizing faces for the last 20 years, most systems specialize on frontal views of the face. We present a face recognizer that works under varying pose, the difficult part of which is to handle face rotations in depth. Building on successful template-based systems, our basic approach is to represent faces with templates from multiple model views that cover different poses from the viewing sphere. Our system has achieved a recognition rate of 98% on a data base of 62 people containing 10 testing and 15 modelling views per person.


Author[s]: Peter R. Nuth

The Named-State Register File

August 1993

This thesis introduces the Named-State Register File, a fine-grain, fully-associative register file. The NSF allows fast context switching between concurrent threads as well as efficient sequential program performance. The NSF holds more live data than conventional register files, and requires less spill and reload traffic to switch between contexts. This thesis demonstrates an implementation of the Named-State Register File and estimates the access time and chip area required for different organizations. Architectural simulations of large sequential and parallel applications show that the NSF can reduce execution time by 9% to 17% compared to alternative register files.


Author[s]: Michael I. Jordan and Lei Xu

Convergence Results for the EM Approach to Mixtures of Experts Architectures

November 1993

The Expectation-Maximization (EM) algorithm is an iterative approach to maximum likelihood parameter estimation. Jordan and Jacobs (1993) recently proposed an EM algorithm for the mixture of experts architecture of Jacobs, Jordan, Nowlan and Hinton (1991) and the hierarchical mixture of experts architecture of Jordan and Jacobs (1992). They showed empirically that the EM algorithm for these architectures yields significantly faster convergence than gradient ascent. In the current paper we provide a theoretical analysis of this algorithm. We show that the algorithm can be regarded as a variable metric algorithm with its searching direction having a positive projection on the gradient of the log likelihood. We also analyze the convergence of the algorithm and provide an explicit expression for the convergence rate. In addition, we describe an acceleration technique that yields a significant speedup in simulation experiments.


Author[s]: James M. Hutchinson

A Radial Basis Function Approach to Financial Time Series Analysis

December 1993

Nonlinear multivariate statistical techniques on fast computers offer the potential to capture more of the dynamics of the high dimensional, noisy systems underlying financial markets than traditional models, while making fewer restrictive assumptions. This thesis presents a collection of practical techniques to address important estimation and confidence issues for Radial Basis Function networks arising from such a data driven approach, including efficient methods for parameter estimation and pruning, a pointwise prediction error estimator, and a methodology for controlling the "data mining'' problem. Novel applications in the finance area are described, including customized, adaptive option pricing and stock price prediction.


Author[s]: Jeffrey M. Siskind

Naive Physics, Event Perception, Lexical Semantics, and Language Acquisition

April 1993

This thesis proposes a computational model of how children may come to learn the meanings of words in their native language. The proposed model is divided into two separate components. One component produces semantic descriptions of visually observed events while the other correlates those descriptions with co-occurring descriptions of those events in natural language. The first part of this thesis describes three implementations of the correlation process whereby representations of the meanings of whole utterances can be decomposed into fragments assigned as representations of the meanings of individual words. The second part of this thesis describes an implemented computer program that recognizes the occurrence of simple spatial motion events in simulated video input.


Author[s]: Martha J. Hiller

The Role of Chemical Mechanisms in Neural Computation and Learning

May 23, 1995

Most computational models of neurons assume that their electrical characteristics are of paramount importance. However, all long- term changes in synaptic efficacy, as well as many short-term effects, are mediated by chemical mechanisms. This technical report explores the interaction between electrical and chemical mechanisms in neural learning and development. Two neural systems that exemplify this interaction are described and modelled. The first is the mechanisms underlying habituation, sensitization, and associative learning in the gill withdrawal reflex circuit in Aplysia, a marine snail. The second is the formation of retinotopic projections in the early visual pathway during embryonic development.


Author[s]: Carl de Marcken

Methods for Parallelizing Search Paths in Phrasing

January 1994

Many search problems are commonly solved with combinatoric algorithms that unnecessarily duplicate and serialize work at considerable computational expense. There are techniques available that can eliminate redundant computations and perform remaining operations concurrently, effectively reducing the branching factors of these algorithms. This thesis applies these techniques to the problem of parsing natural language. The result is an efficient programming language that can reduce some of the expense associated with principle- based parsing and other search problems. The language is used to implement various natural language parsers, and the improvements are compared to those that result from implementing more deterministic theories of language processing.


Author[s]: Ammon Shashua

Algebraic Functions For Recognition

January 1994

In the general case, a trilinear relationship between three perspective views is shown to exist. The trilinearity result is shown to be of much practical use in visual recognition by alignment --- yielding a direct method that cuts through the computations of camera transformation, scene structure and epipolar geometry. The proof of the central result may be of further interest as it demonstrates certain regularities across homographies of the plane and introduces new view invariants. Experiments on simulated and real image data were conducted, including a comparative analysis with epipolar intersection and the linear combination methods, with results indicating a greater degree of robustness in practice and a higher level of performance in re-projection tasks.


Author[s]: Matthew Birkholz

Emacs Lisp in Edwin SScheme

September 1993

The MIT-Scheme program development environment includes a general-purpose text editor, Edwin, that has an extension language, Edwin Scheme. Edwin is very similar to another general-purpose text editor, GNU Emacs, which also has an extension language, Emacs Lisp. The popularity of GNU Emacs has lead to a large library of tools written in Emacs Lisp. The goal of this thesis is to implement a useful subset of Emacs Lisp in Edwin Scheme. This subset was chosen to be sufficient for simple operation of the GNUS news reading program.


Author[s]: Daniel M. Albro

AMAR: A Computational Model of Autosegmental Phonology

October 1993

This report describes a computational system with which phonologists may describe a natural language in terms of autosegmental phonology, currently the most advanced theory pertaining to the sound systems of human languages. This system allows linguists to easily test autosegmental hypotheses against a large corpus of data. The system was designed primarily with tonal systems in mind, but also provides support for tree or feature matrix representation of phonemes (as in The Sound Pattern of English), as well as syllable structures and other aspects of phonological theory. Underspecification is allowed, and trees may be specified before, during, and after rule application. The association convention is automatically applied, and other principles such as the conjunctivity condition are supported. The method of representation was designed such that rules are designated in as close a fashion as possible to the existing conventions of autosegmental theory while adhering to a textual constraint for maximum portability.


Author[s]: Partha Niyogi and Robert C. Berwick

Formalizing Triggers: A Learning Model for Finite Spaces

November 1993

In a recent seminal paper, Gibson and Wexler (1993) take important steps to formalizing the notion of language learning in a (finite) space whose grammars are characterized by a finite number of parameters. They introduce the Triggering Learning Algorithm (TLA) and show that even in finite space convergence may be a problem due to local maxima. In this paper we explicitly formalize learning in finite parameter space as a Markov structure whose states are parameter settings. We show that this captures the dynamics of TLA completely and allows us to explicitly compute the rates of convergence for TLA and other variants of TLA e.g. random walk. Also included in the paper are a corrected version of GW's central convergence proof, a list of "problem states" in addition to local maxima, and batch and PAC-style learning bounds for the model.



Author[s]: Amnon Shashua and Sebastian Toelg

The Quadric Reference Surface: Theory and Applications

June 1994

The conceptual component of this work is about "reference surfaces'' which are the dual of reference frames often used for shape representation purposes. The theoretical component of this work involves the question of whether one can find a unique (and simple) mapping that aligns two arbitrary perspective views of an opaque textured quadric surface in 3D, given (i) few corresponding points in the two views, or (ii) the outline conic of the surface in one view (only) and few corresponding points in the two views. The practical component of this work is concerned with applying the theoretical results as tools for the task of achieving full correspondence between views of arbitrary objects.



Author[s]: Takaya Miyano and Federico Girosi

Forecasting Global Temperature Variations by Neural Networks

August 1994

Global temperature variations between 1861 and 1984 are forecast usingsregularization networks, multilayer perceptrons and linearsautoregression. The regularization network, optimized by stochasticsgradient descent associated with colored noise, gives the bestsforecasts. For all the models, prediction errors noticeably increasesafter 1965. These results are consistent with the hypothesis that thesclimate dynamics is characterized by low-dimensional chaos and thatsthe it may have changed at some point after 1965, which is alsosconsistent with the recent idea of climate change.s


Author[s]: Andre DeHon

Robust, High-Speed Network Design for Large-Scale Multiprocessing

September 1993

As multiprocessor system size scales upward, two important aspects of multiprocessor systems will generally get worse rather than better: (1) interprocessor communication latency will increase and (2) the probability that some component in the system will fail will increase. These problems can prevent us from realizing the potential benefits of large-scale multiprocessing. In this report we consider the problem of designing networks which simultaneously minimize communication latency while maximizing fault tolerance. Using a synergy of techniques including connection topologies, routing protocols, signalling techniques, and packaging technologies we assemble integrated, system-level solutions to this network design problem.


Author[s]: Michael de la Maza

Synthesizing Regularity Exposing Attributes in Large Protein Databases

May 1993

This thesis describes a system that synthesizes regularity exposing attributes from large protein databases. After processing primary and secondary structure data, this system discovers an amino acid representation that captures what are thought to be the three most important amino acid characteristics (size, charge, and hydrophobicity) for tertiary structure prediction. A neural network trained using this 16 bit representation achieves a performance accuracy on the secondary structure prediction problem that is comparable to the one achieved by a neural network trained using the standard 24 bit amino acid representation. In addition, the thesis describes bounds on secondary structure prediction accuracy, derived using an optimal learning algorithm and the probably approximately correct (PAC) model.


Author[s]: Cynthia Ferrell

Robust Agent Control of an Autonomous Robot with Many Sensors and Actuators

May 1993

This thesis presents methods for implementing robust hexpod locomotion on an autonomous robot with many sensors and actuators. The controller is based on the Subsumption Architecture and is fully distributed over approximately 1500 simple, concurrent processes. The robot, Hannibal, weighs approximately 6 pounds and is equipped with over 100 physical sensors, 19 degrees of freedom, and 8 on board computers. We investigate the following topics in depth: distributed control of a complex robot, insect-inspired locomotion control for gait generation and rough terrain mobility, and fault tolerance. The controller was implemented, debugged, and tested on Hannibal. Through a series of experiments, we examined Hannibal's gait generation, rough terrain locomotion, and fault tolerance performance. These results demonstrate that Hannibal exhibits robust, flexible, real-time locomotion over a variety of terrain and tolerates a multitude of hardware failures.


Author[s]: J. Brian Subirana-Vilanova

Mid-Level Vision and Recognition of Non-Rigid Objects

April 1995

We address mid-level vision for the recognition of non-rigid objects. We align model and image using frame curves - which are object or "figure/ground" skeletons. Frame curves are computed, without discontinuities, using Curved Inertia Frames, a provably global scheme implemented on the Connection Machine, based on: non- cartisean networks; a definition of curved axis of inertia; and a ridge detector. I present evidence against frame alignment in human perception. This suggests: frame curves have a role in figure/ground segregation and in fuzzy boundaries; their outside/near/top/ incoming regions are more salient; and that perception begins by setting a reference frame (prior to early vision), and proceeds by processing convex structures.



Author[s]: Tommi Jaakkola, Michael I. Jordan and Satinder P. Singh

On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

August 1993

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP- based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.



Author[s]: Michael I. Jordan and Robert A. Jacobs

Hierarchical Mixtures of Experts and the EM Algorithm

August 1993

We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation- Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.


Author[s]: Rodney Brooks and Lynn A. Stein

Building Brains for Bodies

August 1993

We describe a project to capitalize on newly available levels of computational resources in order to understand human cognition. We will build an integrated physical system including vision, sound input and output, and dextrous manipulation, all controlled by a continuously operating large scale parallel MIMD computer. The resulting system will learn to "think'' by building on its bodily experiences to accomplish progressively more abstract tasks. Past experience suggests that in attempting to build such an integrated system we will have to fundamentally change the way artificial intelligence, cognitive science, linguistics, and philosophy think about the organization of intelligence. We expect to be able to better reconcile the theories that will be developed with current work in neuroscience.



Author[s]: Kah Kay Sung and Partha Niyogi

A Formulation for Active Learning with Applications to Object Detection

June 6, 1996

We discuss a formulation for active example selection for function learning problems. This formulation is obtained by adapting Fedorov's optimal experiment design to the learning problem. We specifically show how to analytically derive example selection algorithms for certain well defined function classes. We then explore the behavior and sample complexity of such active learning algorithms. Finally, we view object detection as a special case of function learning and show how our formulation reduces to a useful heuristic to choose examples to reduce the generalization error.



Author[s]: Reza Shadmehr and Ferdinando Mussa-Ivaldi

Geometric Structure of the Adaptive Controller of the Human Arm

July 1993

The objects with which the hand interacts with may significantly change the dynamics of the arm. How does the brain adapt control of arm movements to this new dynamic? We show that adaptation is via composition of a model of the task's dynamics. By exploring generalization capabilities of this adaptation we infer some of the properties of the computational elements with which the brain formed this model: the elements have broad receptive fields and encode the learned dynamics as a map structured in an intrinsic coordinate system closely related to the geometry of the skeletomusculature. The low- -level nature of these elements suggests that they may represent asset of primitives with which a movement is represented in the CNS.


Author[s]: W. Eric L. Grimson

Why Stereo Vision is Not Always About 3D Reconstruction

July 1993

It is commonly assumed that the goal of stereovision is computing explicit 3D scene reconstructions. We show that very accurate camera calibration is needed to support this, and that such accurate calibration is difficult to achieve and maintain. We argue that for tasks like recognition, figure/ground separation is more important than 3D depth reconstruction, and demonstrate a stereo algorithm that supports figure/ground separation without 3D reconstruction.


Author[s]: Ronald D. Chaney

Feature Extraction Without Edge Detection

September 1993

Information representation is a critical issue in machine vision. The representation strategy in the primitive stages of a vision system has enormous implications for the performance in subsequent stages. Existing feature extraction paradigms, like edge detection, provide sparse and unreliable representations of the image information. In this thesis, we propose a novel feature extraction paradigm. The features consist of salient, simple parts of regions bounded by zero-crossings. The features are dense, stable, and robust. The primary advantage of the features is that they have abstract geometric attributes pertaining to their size and shape. To demonstrate the utility of the feature extraction paradigm, we apply it to passive navigation. We argue that the paradigm is applicable to other early vision problems.



Author[s]: Jose L. Marroquin

Measure Fields for Function Approximation

June 1993

The computation of a piecewise smooth function that approximates a finite set of data points may be decomposed into two decoupled tasks: first, the computation of the locally smooth models, and hence, the segmentation of the data into classes that consist on the sets of points best approximated by each model, and second, the computation of the normalized discriminant functions for each induced class. The approximating function may then be computed as the optimal estimator with respect to this measure field. We give an efficient procedure for effecting both computations, and for the determination of the optimal number of components.



Author[s]: Philippe G. Schyns and Heinrich H. Bulthoff

Conditions for Viewpoint Dependent Face Recognition

August 1993

Poggio and Vetter (1992) showed that learning one view of a bilaterally symmetric object could be sufficient for its recognition, if this view allows the computation of a symmetric, "virtual," view. Faces are roughly bilaterally symmetric objects. Learning a side- view--which always has a symmetric view-- should allow for better generalization performances than learning the frontal view. Two psychophysical experiments tested these predictions. Stimuli were views of shaded 3D models of laser-scanned faces. The first experiment tested whether a particular view of a face was canonical. The second experiment tested which single views of a face give rise to best generalization performances. The results were compatible with the symmetry hypothesis: Learning a side view allowed better generalization performances than learning the frontal view.



Author[s]: David Beymer, Amnon Shashua and Tomaso Poggio

Example Based Image Analysis and Synthesis

November 1993

Image analysis and graphics synthesis can be achieved with learning techniques using directly image examples without physically- based, 3D models. In our technique: -- the mapping from novel images to a vector of "pose" and "expression" parameters can be learned from a small set of example images using a function approximation technique that we call an analysis network; -- the inverse mapping from input "pose" and "expression" parameters to output images can be synthesized from a small set of example images and used to produce new images using a similar synthesis network. The techniques described here have several applications in computer graphics, special effects, interactive multimedia and very low bandwidth teleconferencing.



Author[s]: Federico Girosi, Michael Jones and Tomaso Poggio

Priors Stabilizers and Basis Functions: From Regularization to Radial, Tensor and Additive Splines

June 1993

We had previously shown that regularization principles lead to approximation schemes, as Radial Basis Functions, which are equivalent to networks with one layer of hidden units, called Regularization Networks. In this paper we show that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models, Breiman's hinge functions and some forms of Projection Pursuit Regression. In the probabilistic interpretation of regularization, the different classes of basis functions correspond to different classes of prior probabilities on the approximating function spaces, and therefore to different types of smoothness assumptions. In the final part of the paper, we also show a relation between activation functions of the Gaussian and sigmoidal type.


Author[s]: Christine L. Tsien

Maygen: A Symbolic Debugger Generation System

May 1993

With the development of high-level languages for new computer architectures comes the need for appropriate debugging tools as well. One method for meeting this need would be to develop, from scratch, a symbolic debugger with the introduction of each new language implementation for any given architecture. This, however, seems to require unnecessary duplication of effort among developers. This paper describes Maygen, a "debugger generation system," designed to efficiently provide the desired language-dependent and architecture-dependent debuggers. A prototype of the Maygen system has been implemented and is able to handle the semantically different languages of C and OPAL.


Author[s]: Guillermo J. Rozas

Translucent Procedures, Abstraction without Opacity

October 1993

This report introduces TRANSLUCENT PROCEDURES as a new mechanism for implementing behavioral abstractions. Like an ordinary procedure, a translucent procedure can be invoked, and thus provides an obvious way to capture a BEHAVIOR. Translucent procedures, like ordinary procedures, can be manipulated as first-class objects and combined using functional composition. But unlike ordinary procedures, translucent procedures have structure that can be examined in well-specified non- destructive ways, without invoking the procedure.


Author[s]: Gideon P. Stein

Internal Camera Calibration Using Rotation and Geometric Shapes

February 1993

This paper describes a simple method for internal camera calibration for computer vision. This method is based on tracking image features through a sequence of images while the camera undergoes pure rotation. The location of the features relative to the camera or to each other need not be known and therefore this method can be used both for laboratory calibration and for self calibration in autonomous robots working in unstructured environments. A second method of calibration is also presented. This method uses simple geometric objects such as spheres and straight lines to The camera parameters. Calibration is performed using both methods and the results compared.


Author[s]: Michael E. Caine

The Design of Shape from Motion Constraints

September 1993

This report presents a set of representations methodologies and tools for the purpose of visualizing, analyzing and designing functional shapes in terms of constraints on motion. The core of the research is an interactive computational environment that provides an explicit visual representation of motion constraints produced by shape interactions, and a series of tools that allow for the manipulation of motion constraints and their underlying shapes for the purpose of design.


Author[s]: Charles L. Isbell

Explorations of the Practical Issues of Learning Prediction-Control Tasks Using Temporal Difference Learning Methods

December 1992

There has been recent interest in using temporal difference learning methods to attack problems of prediction and control. While these algorithms have been brought to bear on many problems, they remain poorly understood. It is the purpose of this thesis to further explore these algorithms, presenting a framework for viewing them and raising a number of practical issues and exploring those issues in the context of several case studies. This includes applying the TD(lambda) algorithm to: 1) learning to play tic-tac-toe from the outcome of self-play and of play against a perfectly-playing opponent and 2) learning simple one-dimensional segmentation tasks.


Author[s]: Neil C. Singer and Warren P. Seering

A Simplified Method for Deriving Equations of Motion For Continuous Systems with Flexible Members

May 1993

A method is proposed for deriving dynamical equations for systems with both rigid and flexible components. During the derivation, each flexible component of the system is represented by a "surrogate element" which captures the response characteristics of that component and is easy to mathematically manipulate. The derivation proceeds essentially as if each surrogate element were a rigid body. Application of an extended form of Lagrange's equation yields a set of simultaneous differential equations which can then be transformed to be the exact, partial differential equations for the original flexible system. This method's use facilitates equation generation either by an analyst or through application of software-based symbolic manipulation.


Author[s]: Henry M. Wu

A Method for Eliminating Skew Introduced by Non-Uniform Buffer Delay and Wire Lengths in Clock Distribution Trees

April 1993

The computation of a piecewise smooth function that approximates a finite set of data points is decomposed into two decoupled tasks: first, the computation of the locally smooth models, and hence, the segmentation of the data into classes that consist on the sets of points best approximated by each model, and second, the computation of the normalized discriminant functions for each induced class. The approximating function is then computed as the optimal estimator with respect to this measure field. Applications to image processing and time series prediction are presented as well.


Author[s]: Brian Eberman and S. Kenneth Salisbury

Application of Charge Detection to Dynamic Contact Sensing

March 1993

The manipulation contact forces convey substantial information about the manipulation state. This paper address the fundamental problem of interpreting the force signals without any additional manipulation context. Techniques based on forms of the generalized sequential likelihood ratio test are used to segment individual strain signals into statistically equivalent pieces. We report on our experimental development of the segmentation algorithm and on its results for contact states. The sequential likelihood ratio test is reviewed and some of its special cases and optimal properties are discussed. Finally, we conclude by discussing extensions to the techniques and a contact interpretation framework.


Author[s]: S. Tanveer F. Mahmood

Attentional Selection in Object Recognition


A key problem in object recognition is selection, namely, the problem of identifying regions in an image within which to start the recognition process, ideally by isolating regions that are likely to come from a single object. Such a selection mechanism has been found to be crucial in reducing the combinatorial search involved in the matching stage of object recognition. Even though selection is of help in recognition, it has largely remained unsolved because of the difficulty in isolating regions belonging to objects under complex imaging conditions involving occlusions, changing illumination, and object appearances. This thesis presents a novel approach to the selection problem by proposing a computational model of visual attentional selection as a paradigm for selection in recognition. In particular, it proposes two modes of attentional selection, namely, attracted and pay attention modes as being appropriate for data and model-driven selection in recognition. An implementation of this model has led to new ways of extracting color, texture and line group information in images, and their subsequent use in isolating areas of the scene likely to contain the model object. Among the specific results in this thesis are: a method of specifying color by perceptual color categories for fast color region segmentation and color-based localization of objects, and a result showing that the recognition of texture patterns on model objects is possible under changes in orientation and occlusions without detailed segmentation. The thesis also presents an evaluation of the proposed model by integrating with a 3D from 2D object recognition system and recording the improvement in performance. These results indicate that attentional selection can significantly overcome the computational bottleneck in object recognition, both due to a reduction in the number of features, and due to a reduction in the number of matches during recognition using the information derived during selection. Finally, these studies have revealed a surprising use of selection, namely, in the partial solution of the pose of a 3D object.


Author[s]: Patrick Sobalvarro

A Lifetime-based Garbage Collector for LISP Systems on General-Purpose Computers

February 1988

Garbage collector performance in LISP systems on custom hardware has been substantially improved by the adoption of lifetime-based garbage collection techniques. To date, however, successful lifetime-based garbage collectors have required special- purpose hardware, or at least privileged access to data structures maintained by the virtual memory system. I present here a lifetime-based garbage collector requiring no special-purpose hardware or virtual memory system support, and discuss its performance.


Author[s]: David W. Jacobs

Recognizing 3-D Objects Using 2-D Images

April 1993

We discuss a strategy for visual recognition by forming groups of salient image features, and then using these groups to index into a data base to find all of the matching groups of model features. We discuss the most space efficient possible method of representing 3-D models for indexing from 2-D data, and show how to account for sensing error when indexing. We also present a convex grouping method that is robust and efficient, both theoretically and in practice. Finally, we combine these modules into a complete recognition system, and test its performance on many real images.


Author[s]: Pawan Sinha

Pattern Motion Perception: Feature Tracking or Integration of Component Motions?

October 1994

A key question regarding primate visual motion perception is whether the motion of 2D patterns is recovered by tracking distinctive localizable features [Lorenceau and Gorea, 1989; Rubin and Hochstein, 1992] or by integrating ambiguous local motion estimates [Adelson and Movshon, 1982; Wilson and Kim, 1992]. For a two-grating plaid pattern, this translates to either tracking the grating intersections or to appropriately combining the motion estimates for each grating. Since both component and feature information are simultaneously available in any plaid pattern made of contrast defined gratings, it is unclear how to determine which of the two schemes is actually used to recover the plaid"s motion. To address this problem, we have designed a plaid pattern made with subjective, rather than contrast defined, gratings. The distinguishing characteristic of such a plaid pattern is that it contains no contrast defined intersections that may be tracked. We find that notwithstanding the absence of such features, observers can accurately recover the pattern velocity. Additionally we show that the hypothesis of tracking "illusory features" to estimate pattern motion does not stand up to experimental test. These results present direct evidence in support of the idea that calls for the integration of component motions over the one that mandates tracking localized features to recover 2D pattern motion. The localized features, we suggest, are used primarily as providers of grouping information - which component motion signals to integrate and which not to.


Author[s]: Rajeev Surati

Exploiting the Parallelism Exposed by Partial Evaluation

May 1994

We describe the key role played by partial evaluation in the Supercomputing Toolkit, a parallel computing system for scientific applications that effectively exploits the vast amount of parallelism exposed by partial evaluation. The Supercomputing Toolkit parallel processor and its associated partial evaluation-based compiler have been used extensively by scientists at MIT, and have made possible recent results in astrophysics showing that the motion of the planets in our solar system is chaotically unstable.


Author[s]: Andrew A. Berlin and Rajeev J. Surati

Exploiting the Parallelism Exposed by Partial Evaluation

April 1993

We describe an approach to parallel compilation that seeks to harness the vast amount of fine-grain parallelism that is exposed through partial evaluation of numerically-intensive scientific programs. We have constructed a compiler for the Supercomputer Toolkit parallel processor that uses partial evaluation to break down data abstractions and program structure, producing huge basic blocks that contain large amounts of fine-grain parallelism. We show that this fine-grain prarllelism can be effectively utilized even on coarse-grain parallel architectures by selectively grouping operations together so as to adjust the parallelism grain-size to match the inter-processor communication capabilities of the target architecture.


Author[s]: Jonathan Amsterdam

Automatic Qualitative Modeling of Dynamic Physical Systems

January 1993

This report describes MM, a computer program that can model a variety of mechanical and fluid systems. Given a system's structure and qualitative behavior, MM searches for models using an energy- based modeling framework. MM uses general facts about physical systems to relate behavioral and model properties. These facts enable a more focussed search for models than would be obtained by mere comparison of desired and predicted behaviors. When these facts do not apply, MM uses behavior- constrained qualitative simulation to verify candidate models efficiently. MM can also design experiments to distinguish among multiple candidate models.


Author[s]: Clay Matthew Thompson

Robust Photo-topography by Fusing Shape-from-Shading and Stereo

February 1993

Methods for fusing two computer vision methods are discussed and several example algorithms are presented to illustrate the variational method of fusing algorithms. The example algorithms seek to determine planet topography given two images taken from two different locations with two different lighting conditions. The algorithms each employ assingle cost function that combines the computer vision methods of shape-from- shading and stereo in different ways. The algorithms are closely coupled and take into account all the constraints of the photo- topography problem. The algorithms are run on four synthetic test image sets of varying difficulty.


Author[s]: Tao Daniel Alter

Robust and Efficient 3D Recognition by Alignment

September 1992

Alignment is a prevalent approach for recognizing 3D objects in 2D images. A major problem with current implementations is how to robustly handle errors that propagate from uncertainties in the locations of image features. This thesis gives a technique for bounding these errors. The technique makes use of a new solution to the problem of recovering 3D pose from three matching point pairs under weak-perspective projection. Furthermore, the error bounds are used to demonstrate that using line segments for features instead of points significantly reduces the false positive rate, to the extent that alignment can remain reliable even in cluttered scenes.



Author[s]: Thomas Vetter, Tomaso Poggio and Heinrich B'ulthoff

3D Object Recognition: Symmetry and Virtual Views

December 1992

Many 3D objects in the world around us are strongly constrained. For instance, not only cultural artifacts but also many natural objects are bilaterally symmetric. Thoretical arguments suggest and psychophysical experiments confirm that humans may be better in the recognition of symmetric objects. The hypothesis of symmetry-induced virtual views together with a network model that successfully accounts for human recognition of generic 3D objects leads to predictions that we have verified with psychophysical experiments.


Author[s]: Philip Greenspun

Site Controller: A System for Computer-Aided Civil Engineering and Construction

February 1993

A revolution in earthmoving, a $100 billion industry, can be achieved with three components: the GPS location system, sensors and computers in bulldozers, and SITE CONTROLLER, a central computer system that maintains design data and directs operations. The first two components are widely available; I built SITE CONTROLLER to complete the triangle and describe it here. SITE CONTROLLER assists civil engineers in the design, estimation, and construction of earthworks, including hazardous waste site remediation. The core of SITE CONTROLLER is a site modelling system that represents existing and prospective terrain shapes, roads, hydrology, etc. Around this core are analysis, simulation, and vehicle control tools. Integrating these modules into one program enables civil engineers and contractors to use a single interface and database throughout the life of a project.



Author[s]: Amnon Shashua

Geometric and Algebraic Aspects of 3D Affine and Projective Structures from Perspective 2D Views

July 1993

We investigate the differences --- conceptually and algorithmically --- between affine and projective frameworks for the tasks of visual recognition and reconstruction from perspective views. It is shown that an affine invariant exists between any view and a fixed view chosen as a reference view. This implies that for tasks for which a reference view can be chosen, such as in alignment schemes for visual recognition, projective invariants are not really necessary. We then use the affine invariant to derive new algebraic connections between perspective views. It is shown that three perspective views of an object are connected by certain algebraic functions of image coordinates alone (no structure or camera geometry needs to be involved).



Author[s]: Tomaso Poggio and Anya Hurlbert

Observations on Cortical Mechanisms for Object Recognition andsLearning

December 1993

This paper sketches a hypothetical cortical architecture for visual 3D object recognition based on a recent computational model. The view-centered scheme relies on modules for learning from examples, such as Hyperbf-like networks. Such models capture a class of explanations we call Memory-Based Models (MBM) that contains sparse population coding, memory-based recognition, and codebooks of prototypes. Unlike the sigmoidal units of some artificial neural networks, the units of MBMs are consistent with the description of cortical neurons. We describe how an example of MBM may be realized in terms of cortical circuitry and biophysical mechanisms, consistent with psychophysical and physiological data.


Author[s]: Gary C. Borchardt

Causal Reconstruction

February 1993

Causal reconstruction is the task of reading a written causal description of a physical behavior, forming an internal model of the described activity, and demonstrating comprehension through question answering. T his task is difficult because written d escriptions often do not specify exactly how r eferenced events fit together. This article (1) ch aracterizes the causal reconstruction problem, (2) presents a representation called transition space, which portrays events in terms of "transitions,'' or collections of changes expressible in everyday language, and (3) describes a program called PATHFINDER, which uses the transition space representation to perform causal reconstruction on simplified English descriptions of physical activity.


Author[s]: Athanassios G. Siapas

A Global Approach to Parameter Estimation of Chaotic Dynamical Systems

December 1992

We present a novel approach to parameter estimation of systems with complicated dynamics, as well as evidence for the existence of a universal power law that enables us to quantify the dependence of global geometry on small changes in the parameters of the system. This power law gives rise to what seems to be a new dynamical system invariant.


Author[s]: Amnon Shashua

Geometry and Photometry in 3D Visual Recognition

November 1992

The report addresses the problem of visual recognition under two sources of variability: geometric and photometric. The geometric deals with the relation between 3D objects and their views under orthographic and perspective projection. The photometric deals with the relation between 3D matte objects and their images under changing illumination conditions. Taken together, an alignment- based method is presented for recognizing objects viewed from arbitrary viewing positions and illuminated by arbitrary settings of light sources.


Author[s]: S. Tanveer F. Mahmood

Data and Model-Driven Selection Using Parallel-Line Groups

May 1993

A key problem in model-based object recognition is selection, namely, the problem of isolating regions in an image that are likely to come from a single object. This isolation can be either based solely on image data (data-driven) or can incorporate the knowledge of the model object (model- driven). In this paper we present an approach that exploits the property of closely-spaced parallelism between lines on objects to achieve data and model-driven selection. Specifically, we present a method of identifying groups of closely-spaced parallel lines in images that generates a linear number of small-sized and reliable groups thus meeting several of the desirable requirements of a grouping scheme for recognition. The line groups generated form the basis for data and model-driven selection. Data-driven selection is achieved by selecting salient line groups as judged by a saliency measure that emphasizes the likelihood of the groups coming from single objects. The approach to model-driven selection, on the other hand, uses the description of closely- spaced parallel line groups on the model object to selectively generate line groups in the image that are likely to eb the projections of the model groups under a set of allowable transformations and taking into account the effect of occlusions, illumination changes, and imaging errors. We then discuss the utility of line groups-based selection in the context of reducing the search involved in recognition, both as an independent selection mechanism, and when used in combination with other cues such as color. Finally, we present results that indicate a vast improvement in the performance of a recognition system that is integrated with parallel line groups-based selection.


Author[s]: William M. Wells III

Statistical Object Recognition

January 1993

Two formulations of model-based object recognition are described. MAP Model Matching evaluates joint hypotheses of match and pose, while Posterior Marginal Pose Estimation evaluates the pose only. Local search in pose space is carried out with the Expectation--Maximization (EM) algorithm. Recognition experiments are described where the EM algorithm is used to refine and evaluate pose hypotheses in 2D and 3D. Initial hypotheses for the 2D experiments were generated by a simple indexing method: Angle Pair Indexing. The Linear Combination of Views method of Ullman and Basri is employed as the projection model in the 3D experiments.


Author[s]: Ronald Chaney

Complexity as a Sclae-Space for the Medial Axis Transform

January 1993

The medial axis skeleton is a thin line graph that preserves the topology of a region. The skeleton has often been cited as a useful representation for shape description, region interpretation, and object recognition. Unfortunately, the computation of the skeleton is extremely sensitive to variations in the bounding contour. In this paper, we describe a robust method for computing the medial axis skeleton across a variety of scales. The resulting scale-space is parametric with the complexity of the skeleton, where the complexity is defined as the number of branches in the skeleton.


Author[s]: Michael J. Jones

Using Recurrent Networks for Dimensionality Reduction

September 1992

This report explores how recurrent neural networks can be exploited for learning high- dimensional mappings. Since recurrent networks are as powerful as Turing machines, an interesting question is how recurrent networks can be used to simplify the problem of learning from examples. The main problem with learning high-dimensional functions is the curse of dimensionality which roughly states that the number of examples needed to learn a function increases exponentially with input dimension. This thesis proposes a way of avoiding this problem by using a recurrent network to decompose a high-dimensional function into many lower dimensional functions connected in a feedback loop.


Author[s]: Karen B. Sarachik

Limitations of Geometric Hashing in the Presence of Gaussian Noise

October 1992

This paper presents a detailed error analysis of geometric hashing for 2D object recogition. We analytically derive the probability of false positives and negatives as a function of the number of model and image, features and occlusion, using a 2D Gaussian noise model. The results are presented in the form of ROC (receiver-operating characteristic) curves, which demonstrate that the 2D Gaussian error model always has better performance than that of the bounded uniform model. They also directly indicate the optimal performance that can be achieved for a given clutter and occlusion rate, and how to choose the thresholds to achieve these rates.


Author[s]: Ronald D. Chaney

Analytical Representation of Contours

October 1992

The interpretation and recognition of noisy contours, such as silhouettes, have proven to be difficult. One obstacle to the solution of these problems has been the lack of a robust representation for contours. The contour is represented by a set of pairwise tangent circular arcs. The advantage of such an approach is that mathematical properties such as orientation and curvature are explicityly represented. We introduce a smoothing criterion for the contour tht optimizes the tradeoff between the complexity of the contour and proximity of the data points. The complexity measure is the number of extrema of curvature present in the contour. The smoothing criterion leads us to a true scale-space for contours. We describe the computation of the contour representation as well as the computation of relevant properties of the contour. We consider the potential application of the representation, the smoothing paradigm, and the scale-space to contour interpretation and recognition.


Author[s]: Ronen Basri

Recognition by Prototypes

December 1992

A scheme for recognizing 3D objects from single 2D images is introduced. The scheme proceeds in two stages. In the first stage, the categorization stage, the image is compared to prototype objects. For each prototype, the view that most resembles the image is recovered, and, if the view is found to be similar to the image, the class identity of the object is determined. In the second stage, the identification stage, the observed object is compared to the individual models of its class, where classes are expected to contain objects with relatively similar shapes. For each model, a view that matches the image is sought. If such a view is found, the object's specific identity is determined. The advantage of categorizing the object before it is identified is twofold. First, the image is compared to a smaller number of models, since only models that belong to the object's class need to be considered. Second, the cost of comparing the image to each model in a classis very low, because correspondence is computed once for the whoel class. More specifically, the correspondence and object pose computed in the categorization stage to align the prototype with the image are reused in the identification stage to align the individual models with the image. As a result, identification is reduced to a series fo simple template comparisons. The paper concludes with an algorithm for constructing optimal prototypes for classes of objects.


Author[s]: Jose L. Marroquin and Federico Girosi

Some Extensions of the K-Means Algorithm for Image Segmentation and Pattern Classification

January 1993

In this paper we present some extensions to the k-means algorithm for vector quantization that permit its efficient use in image segmentation and pattern classification tasks. It is shown that by introducing state variables that correspond to certain statistics of the dynamic behavior of the algorithm, it is possible to find the representative centers fo the lower dimensional maniforlds that define the boundaries between classes, for clouds of multi-dimensional, mult-class data; this permits one, for example, to find class boundaries directly from sparse data (e.g., in image segmentation tasks) or to efficiently place centers for pattern classification (e.g., with local Gaussian classifiers). The same state variables can be used to define algorithms for determining adaptively the optimal number of centers for clouds of data with space-varying density. Some examples of the applicatin of these extensions are also given.


Author[s]: Elizabeth Bradley

Taming Chaotic Circuits

September 1992

Control algorithms that exploit chaotic behavior can vastly improve the performance of many practical and useful systems. The program Perfect Moment is built around a collection of such techniques. It autonomously explores a dynamical system's behavior, using rules embodying theorems and definitions from nonlinear dynamics to zero in on interesting and useful parameter ranges and state-space regions. It then constructs a reference trajectory based on that information and causes the system to follow it. This program and its results are illustrated with several examples, among them the phase- locked loop, where sections of chaotic attractors are used to increase the capture range of the circuit.


Author[s]: Feng Zhao

Automatic Analysis and Synthesis of Controllers for Dynamical Systems Based On P

September 1992

I present a novel design methodology for the synthesis of automatic controllers, together with a computational environment---the Control Engineer's Workbench---integrating a suite of programs that automatically analyze and design controllers for high-performance, global control of nonlinear systems. This work demonstrates that difficult control synthesis tasks can be automated, using programs that actively exploit and efficiently represent knowledge of nonlinear dynamics and phase space and effectively use the representation to guide and perform the control design. The Control Engineer's Workbench combines powerful numerical and symbolic computations with artificial intelligence reasoning techniques. As a demonstration, the Workbench automatically designed a high-quality maglev controller that outperforms a previous linear design by a factor of 20.


Author[s]: M. Ali Taalebinezhaad

Robot Motion Vision by Fixation

September 1992

In many motion-vision scenarios, a camera (mounted on a moving vehicle) takes images of an environment to find the "motion'' and shape. We introduce a direct-method called fixation for solving this motion-vision problem in its general case. Fixation uses neither feature-correspondence nor optical-flow. Instead, spatio-temporal brightness gradients are used directly. In contrast to previous direct methods, fixation does not restrict the motion or the environment. Moreover, fixation method neither requires tracked images as its input nor uses mechanical tracking for obtaining fixated images. The experimental results on real images are presented and the implementation issues and techniques are discussed.


Author[s]: M. Ali Taalebinezhaad

Visual Tracking

October 1992

A typical robot vision scenario might involve a vehicle moving with an unknown 3D motion (translation and rotation) while taking intensity images of an arbitrary environment. This paper describes the theory and implementation issues of tracking any desired point in the environment. This method is performed completely in software without any need to mechanically move the camera relative to the vehicle. This tracking technique is simple an inexpensive. Furthermore, it does not use either optical flow or feature correspondence. Instead, the spatio-temporal gradients of the input intensity images are used directly. The experimental results presented support the idea of tracking in software. The final result is a sequence of tracked images where the desired point is kept stationary in the images independent of the nature of the relative motion. Finally, the quality of these tracked images are examined using spatio-temporal gradient maps.


Author[s]: T.D. Alter

3D Pose from Three Corresponding Points Under Weak-Perspective Projection

July 1992

Model-based object recognition commonly involves using a minimal set of matched model and image points to compute the pose of the model in image coordinates. Furthermore, recognition systems often rely on the "weak-perspective" imaging model in place of the perspective imaging model. This paper discusses computing the pose of a model from three corresponding points under weak-perspective projection. A new solution to the problem is proposed which, like previous solutins, involves solving a biquadratic equation. Here the biquadratic is motivate geometrically and its solutions, comprised of an actual and a false solution, are interpreted graphically. The final equations take a new form, which lead to a simple expression for the image position of any unmatched model point.


Author[s]: Rajeev Surati

A Parallelizing Compiler Based on Partial Evaluation

July 1993

We constructed a parallelizing compiler that utilizes partial evaluation to achieve efficient parallel object code from very high-level data independent source programs. On several important scientific applications, the compiler attains parallel performance equivalent to or better than the best observed results from the manual restructuring of code. This is the first attempt to capitalize on partial evaluation's ability to expose low-level parallelism. New static scheduling techniques are used to utilize the fine-grained parallelism of the computations. The compiler maps the computation graph resulting from partial evaluation onto the Supercomputer Toolkit, an eight VLIW processor parallel computer.


Author[s]: Ehud Rivlin and Ronen Basri

Localization and Positioning Using Combinations of Model Views

September 1992

A method for localization and positioning in an indoor environment is presented. The method is based on representing the scene as a set of 2D views and predicting the appearances of novel views by linear combinations of the model views. The method is accurate under weak perspective projection. Analysis of this projection as well as experimental results demonstrate that in many cases it is sufficient to accurately describe the scene. When weak perspective approximation is invalid, an iterative solution to account for the perspective distortions can be employed. A simple algorithm for repositioning, the task of returning to a previously visited position defined by a single view, is derived from this method.


Author[s]: Nicola Ancona and Tomaso Poggio

Optical Flow From 1D Correlation: Application to a Simple Time-To-Crash Detector

October 1993

In the first part of this paper we show that a new technique exploiting 1D correlation of 2D or even 1D patches between successive frames may be sufficient to compute a satisfactory estimation of the optical flow field. The algorithm is well-suited to VLSI implementations. The sparse measurements provided by the technique can be used to compute qualitative properties of the flow for a number of different visual tsks. In particular, the second part of the paper shows how to combine our 1D correlation technique with a scheme for detecting expansion or rotation ([5]) in a simple algorithm which also suggests interesting biological implications. The algorithm provides a rough estimate of time-to-crash. It was tested on real image sequences. We show its performance and compare the results to previous approaches.


Author[s]: Thomas M. Breuel

Geometric Aspects of Visual Object Recognition

May 1992

This thesis presents there important results in visual object recognition based on shape. (1) A new algorithm (RAST; Recognition by Adaptive Sudivisions of Tranformation space) is presented that has lower average-case complexity than any known recognition algorithm. (2) It is shown, both theoretically and empirically, that representing 3D objects as collections of 2D views (the "View-Based Approximation") is feasible and affects the reliability of 3D recognition systems no more than other commonly made approximations. (3) The problem of recognition in cluttered scenes is considered from a Bayesian perspective; the commonly-used "bounded- error errorsmeasure" is demonstrated to correspond to an independence assumption. It is shown that by modeling the statistical properties of real-scenes better, objects can be recognized more reliably.


Author[s]: Ronen Basri and Daphna Weinshall

Distance Metric Between 3D Models and 3D Images for Recognition and Classification

July 1992

Similarity measurements between 3D objects and 2D images are useful for the tasks of object recognition and classification. We distinguish between two types of similarity metrics: metrics computed in image-space (image metrics) and metrics computed in transformation-space (transformation metrics). Existing methods typically use image and the nearest view of the object. Example for such a measure is the Euclidean distance between feature points in the image and corresponding points in the nearest view. (Computing this measure is equivalent to solving the exterior orientation calibration problem.) In this paper we introduce a different type of metrics: transformation metrics. These metrics penalize for the deformatoins applied to the object to produce the observed image. We present a transformation metric that optimally penalizes for "affine deformations" under weak-perspective. A closed-form solution, together with the nearest view according to this metric, are derived. The metric is shown to be equivalent to the Euclidean image metric, in the sense that they bound each other from both above and below. For Euclidean image metric we offier a sub-optimal closed-form solution and an iterative scheme to compute the exact solution.


Author[s]: B. Whitney Rappole, Jr.

Minimizing Residual Vibrations in Flexible Systems

June 1992

Residual vibrations degrade the performance of many systems. Due to the lightweight and flexible nature of space structures, controlling residual vibrations is especially difficult. Also, systems such as the Space Shuttle remote Manipulator System have frequencies that vary significantly based upon configuration and loading. Recently, a technique of minimizing vibrations in flexible structures by command input shaping was developed. This document presents research completed in developing a simple, closed- form method of calculating input shaping sequences for two-mode systems and a system to adapt the command input shaping technique to known changes in system frequency about the workspace. The new techniques were tested on a three-link, flexible manipulator.


Author[s]: Vijay Balasubramanian

Equivalence and Reduction of Hidden Markov Models

January 1993

This report studies when and why two Hidden Markov Models (HMMs) may represent the same stochastic process. HMMs are characterized in terms of equivalence classes whose elements represent identical stochastic processes. This characterization yields polynomial time algorithms to detect equivalent HMMs. We also find fast algorithms to reduce HMMs to essentially unique and minimal canonical representations. The reduction to a canonical form leads to the definition of 'Generalized Markov Models' which are essentially HMMs without the positivity constraint on their parameters. We discuss how this generalization can yield more parsimonious representations of stochastic processes at the cost of the probabilistic interpretation of the model parameters.


Author[s]: Michael D. Ernst (Editor)

Intellectual Property in Computing: (How) Should Software Be Protected? An Industry Perspective

May 1992

The future of the software industry is today being shaped in the courtroom. Most discussions of intellectual property to date, however, have been frames as debates about how the existing law --- promulgated long before the computer revolution --- should be applied to software. This memo is a transcript of a panel discussion on what forms of legal protection should apply to software to best serve both the industry and society in general. After addressing that question we can consider what laws would bring this about.


Author[s]: Kenneth W. Chang

Shaping Inputs to Reduce Vibration in Flexible Space Structures

June 1992

Future NASA plans to launch large space strucutres solicit the need for effective vibration control schemes which can solve the unique problems associated with unwanted residual vibration in flexible spacecraft. In this work, a unique method of input command shaping called impulse shaping is examined. A theoretical background is presented along with some insight into the methdos of calculating multiple mode sequences. The Middeck Active Control Experiment (MACE) is then described as the testbed for hardware experiments. These results are shown and some of the difficulties of dealing with nonlinearities are discussed. The paper is concluded with some conclusions about calculating and implementing impulse shaping in complex nonlinear systems.


Author[s]: Thomas Marill

Why Do We See Three-dimensional Objects?

June 1992

When we look at certain line-drawings, we see three-dimensional objects. The question is why; why not just see two-dimensional images? We theorize that we see objects rather than images because the objects we see are, in a certain mathematical sense, less complex than the images; and that furthermore the particular objects we see will be the least complex of the available alternatives. Experimental data supporting the theory is reported. The work is based on ideas of Solomonoff, Kolmogorov, and the "minimum description length'' concepts of Rissanen.


Author[s]: Timothy D. Tuttle

Understanding and Modeling the Behavior of a Harmonic Drive Gear Transmission

May 1992

In my research, I have performed an extensive experimental investigation of harmonic-drive properties such as stiffness, friction, and kinematic error. From my experimental results, I have found that these properties can be sharply non-linear and highly dependent on operating conditions. Due to the complex interaction of these poorly behaved transmission properties, dynamic response measurements showed surprisingly agitated behavior, especially around system resonance. Theoretical models developed to mimic the observed response illustrated that non-linear frictional effects cannot be ignored in any accurate harmonic-drive representation. Additionally, if behavior around system resonance must be replicated, kinematic error and transmission compliance as well as frictional dissipation from gear- tooth rubbing must all be incorporated into the model.


Author[s]: Patrick G. Sobalvarro

Probabilistic Analysis of Multistage Interconnection Network Performance

April 1992

We present methods of calculating the value of two performance parameters for multipath, multistage interconnection networks: the normalized throughput and the probability of successful message transmission. We develop a set of exact equations for the loading probability mass functions of network channels and a program for solving them exactly. We also develop a Monte Carlo method for approxmiate solution of the equations, and show that the resulting approximation method will always calculate the values of the performance parameters more quickly than direct simulation.


Author[s]: Amnon Shashua

Projective Structure from Two Uncalibrated Images: Structure from Motion and RecRecognition

September 1992

This paper addresses the problem of recovering relative structure, in the form of an invariant, referred to as projective structure, from two views of a 3D scene. The invariant structure is computed without any prior knowledge of camera geometry, or internal calibration, and with the property that perspective and orthographic projections are treated alike, namely, the system makes no assumption regarding the existence of perspective distortions in the input images.


Author[s]: W. Eric Grimson, Daniel P. Huttenlocher and T. D. Alter

Recognizing 3D Ojbects of 2D Images: An Error Analysis

July 1992

Many object recognition systems use a small number of pairings of data and model features to compute the 3D transformation from a model coordinate frame into the sensor coordinate system. With perfect image data, these systems work well. With uncertain image data, however, their performance is less clear. We examine the effects of 2D sensor uncertainty on the computation of 3D model transformations. We use this analysis to bound the uncertainty in the transformation parameters, and the uncertainty associated with transforming other model features into the image. We also examine the impact of the such transformation uncertainty on recognition methods.


Author[s]: Gerald J. Sussman and Jack Wisdom

Chaotic Evolution of the Solar System

March 1992

The evolution of the entire planetary system has been numerically integrated for a time span of nearly 100 million years. This calculation confirms that the evolution of the solar system as a whole is chaotic, with a time scale of exponential divergence of about 4 million years. Additional numerical experiments indicate that the Jovian planet subsystem is chaotic, although some small variations in the model can yield quasiperiodic motion. The motion of Pluto is independently and robustly chaotic.


Author[s]: Linda M. Wills

Automated Program Recognition by Graph Parsing

July 1992

Recognizing standard computational structures (cliches) in a program can help an experienced programmer understand the program. We develop a graph parsing approach to automating program recognition in which programs and cliches are represented in an attributed graph grammar formalism and recognition is achieved by graph parsing. In studying this approach, we evaluate our representation's ability to suppress many common forms of variation which hinder recognition. We investigate the expressiveness of our graph grammar formalism for capturing programming cliches. We empirically and analytically study the computational cost of our recognition approach with respect to two medium-sized, real-world simulator programs.


Author[s]: Lynne E. Parker

Local Versus Global Control Laws for Cooperative Agent Teams

March 1992

The design of the control laws governing the behavior of individual agents is crucial for the successful development of cooperative agent teams. These control laws may utilize a combination of local and/or global knowledge to achieve the resulting group behavior. A key difficulty in this development is deciding the proper balance between local and global control required to achieve the desired emergent group behavior. This paper addresses this issue by presenting some general guidelines and principles for determining the appropriate level of global versus local control. These principles are illustrated and implemented in a "keep formation'' cooperative task case study.


Author[s]: W. Richards and A. Jepson

What Makes a Good Feature?

April 1992

Using a Bayesian framework, we place bounds on just what features are worth computing if inferences about the world properties are to be made from image data. Previously others have proposed that useful features reflect "non-accidental'' or "suspicious'' configurations (such as parallel or colinear lines). We make these notions more precise and show them to be context sensitive.


Author[s]: Stephen W. Keckler

A Coupled Multi-ALU Processing Node for a Highly Parallel Computer

September 1992

This report describes Processor Coupling, a mechanism for controlling multiple ALUs on a single integrated circuit to exploit both instruction-level and inter-thread parallelism. A compiler statically schedules individual threads to discover available intra-thread instruction-level parallelism. The runtime scheduling mechanism interleaves threads, exploiting inter-thread parallelism to maintain high ALU utilization. ALUs are assigned to threads on a cycle byscycle basis, and several threads can be active concurrently. Simulation results show that Processor Coupling performs well both on single threaded and multi-threaded applications. The experiments address the effects of memory latencies, function unit latencies, and communication bandwidth between function units.


Author[s]: Tomaso Poggio and Roberto Brunelli

A Novel Approach to Graphics

February 1992

sWe show that we can optimally represent the set of 2D images producedsby the point features of a rigid 3D model as two lines in twoshigh-dimensional spaces. We then decribe a working recognition systemsin which we represent these spaces discretely in a hash table. We cansaccess this table at run time to find all the groups of model featuressthat could match a group of image features, accounting for the effectssof sensing error. We also use this representation of a model's imagessto demonstrate significant new limitations of two other approaches tosrecognition: invariants, and non-accidental properties.


Author[s]: David W. Jacobs

Space Efficient 3D Model Indexing

February 1992

We show that we can optimally represent the set of 2D images produced by the point features of a rigid 3D model as two lines in two high-dimensional spaces. We then decribe a working recognition system in which we represent these spaces discretely in a hash table. We can access this table at run time to find all the groups of model features that could match a group of image features, accounting for the effects of sensing error. We also use this representation of a model's images to demonstrate significant new limitations of two other approaches to recognition: invariants, and non- accidental properties.


Author[s]: Lyle J. Borg-Graham

On Directional Selectivity in Vertebrate Retina: An Experimental and Computational Study

January 1992

This thesis describes an investigation of retinal directional selectivity. We show intracellular (whole-cell patch) recordings in turtle retina which indicate that this computation occurs prior to the ganglion cell, and we describe a pre-ganglionic circuit model to account for this and other findings which places the non-linear spatio-temporal filter at individual, oriented amacrine cell dendrites. The key non-linearity is provided by interactions between excitatory and inhibitory synaptic inputs onto the dendrites, and their distal tips provide directionally selective excitatory outputs onto ganglion cells. Detailed simulations of putative cells support this model, given reasonable parameter constraints. The performance of the model also suggests that this computational substructure may be relevant within the dendritic trees of CNS neurons in general.


Author[s]: Kah-Kay Sung

A Vector Signal Processing Approach to Color

January 1992

Surface (Lambertain) color is a useful visual cue for analyzing material composition of scenes. This thesis adopts a signal processing approach to color vision. It represents color images as fields of 3D vectors, from which we extract region and boundary information. The first problem we face is one of secondary imaging effects that makes image color different from surface color. We demonstrate a simple but effective polarization based technique that corrects for these effects. We then propose a systematic approach of scalarizing color, that allows us to augment classical image processing tools and concepts for multi-dimensional color signals.


Author[s]: Shimon Edelman, Heinrich Bulthoff, and Erik Sklar

Task and Object Learning in Visual Recognition

January 1991

Human performance in object recognition changes with practice, even in the absence of feedback to the subject. The nature of the change can reveal important properties of the process of recognition. We report an experiment designed to distinguish between non-specific task learning and object- specific practice effects. The results of the experiment support the notion that learning through modification of object representations can be separated from less interesting effects of practice, if appropriate response measures (specifically, the coefficient of variation of response time over views of an object) are used. Furthermore, the results, obtained with computer-generated amoeba-like objects, corroborate previous findings regarding the development of canonical views and related phenomena with practice.


Author[s]: Tomaso Poggio and Thomas Vetter

Recognition and Structure from One 2D Model View: Observations on Prototypes, Object Classes and Symmetries

February 1992

In this note we discuss how recognition can be achieved from a single 2D model view exploiting prior knowledge of an object's structure (e.g. symmetry). We prove that for any bilaterally symmetric 3D object one non- accidental 2D model view is sufficient for recognition. Symmetries of higher order allow the recovery of structure from one 2D view. Linear transformations can be learned exactly from a small set of examples in the case of "linear object classes" and used to produce new views of an object from a single view.


Author[s]: P. A. Skordos

Time-Reversible Maxwell's Demon

September 1992

A time-reversible Maxwell's demon is demonstrated which creates a density difference between two chambers initialized to have equal density. The density difference is estimated theoretically and confirmed by computer simulations. It is found that the reversible Maxwell's demon compresses phase space volume even though its dynamics are time reversible. The significance of phase space volume compression in operating a microscopic heat engine is also discussed.


Author[s]: Michael de la Maza and Bruce Tidor

Boltzmannn Weighted Selection Improves Performance of Genetic Algorithms

December 1991

Modifiable Boltzmann selective pressure is investigated as a tool to control variability in optimizations using genetic algorithms. An implementation of variable selective pressure, modeled after the use of temperature as a parameter in simulated annealing approaches, is described. The convergence behavior of optimization runs is illustrated as a function of selective pressure; the method is compared to a genetic algorithm lacking this control feature and is shown to exhibit superior convergence properties on a small set of test problems. An analysis is presented that compares the selective pressure of this algorithm to a standard selection procedure.


Author[s]: Robert Givan and David McAllester

Tractable Inference Relations

December 1991

We consider the concept of local sets of inference rules. Locality is a syntactic condition on rule sets which guarantees that the inference relation defined by those rules is polynomial time decidable. Unfortunately, determining whether a given rule set is local can be difficult. In this paper we define inductive locality, a strengthening of locality. We also give a procedure which can automatically recognize the locality of any inductively local rule set. Inductive locality seems to be more useful that the earlier concept of strong locality. We show that locality, as a property of rule sets, is undecidable in general.


Author[s]: David McAllester and Jeffrey Siskind

Lifting Transformations

December 1991

Lifting is a well known technique in resolution theorem proving, logic programming, and term rewriting. In this paper we formulate lifting as an efficiency-motivated program transformation applicable to a wide variety of nondeterministic procedures. This formulation allows the immediate lifting of complex procedures, such as the Davis- Putnam algorithm, which are otherwise difficult to lift. We treat both classical lifting, which is based on unification, and various closely related program transformations which we also call lifting transformations. These nonclassical lifting transformations are closely related to constraint techniques in logic programming, resolution, and term rewriting.


Author[s]: David McAllester

Grammar Rewriting

December 1991

We present a term rewriting procedure based on congruence closure that can be used with arbitrary equational theories. This procedure is motivated by the pragmatic need to prove equations in equational theories where confluence can not be achieved. The procedure uses context free grammars to represent equivalence classes of terms. The procedure rewrites grammars rather than terms and uses congruence closure to maintain certain congruence properties of the grammar. Grammars provide concise representations of large term sets. Infinite term sets can be represented with finite grammars and exponentially large term sets can be represented with linear sized grammars.


Author[s]: Robert Givan, David McAllester and Sameer Shalaby

Natural Language Based Inference Procedures Applied to Schubert's Steamroller

December 1991

We have previously argued that the syntactic structure of natural language can be exploited to construct powerful polynomial time inference procedures. This paper supports the earlier arguments by demonstrating that a natural language based polynomial time procedure can solve Schubert's steamroller in a single step.


Author[s]: David McAllester

Observations on Cognitive Judgments

December 1991

It is obvious to anyone familiar with the rules of the game of chess that a king on an empty board can reach every square. It is true, but not obvious, that a knight can reach every square. Why is the first fact obvious but the second fact not? This paper presents an analytic theory of a class of obviousness judgments of this type. Whether or not the specifics of this analysis are correct, it seems that the study of obviousness judgments can be used to construct integrated theories of linguistics, knowledge representation, and inference.


Author[s]: David McAllester and David Rosenblatt

Systematic Nonlinear Planning

December 1991

This paper presents a simple, sound, complete, and systematic algorithm for domain independent STRIPS planning. Simplicity is achieved by starting with a ground procedure and then applying a general and independently verifiable, lifting transformation. Previous planners have been designed directly as lifted procedures. Our ground procedure is a ground version of Tate's NONLIN procedure. In Tate's procedure one is not required to determine whether a prerequisite of a step in an unfinished plan is guarnateed to hold in all linearizations. This allows Tate"s procedure to avoid the use of Chapman"s modal truth criterion. Systematicity is the property that the same plan, or partial plan, is never examined more than once. Systematicity is achieved through a simple modification of Tate's procedure.


Author[s]: Lynn Andrea Stein and Leora Morgenstern

Motivated Action Theory: A Formal Theory of Causal Reasoning

December 1991

When we reason about change over time, causation provides an implicit preference: we prefer sequences of situations in which one situation leads causally to the next, rather than sequences in which one situation follows another at random and without causal connections. In this paper, we explore the problem of temporal reasoning --- reasoning about change over time --- and the crucial role that causation plays in our intuitions. We examine previous approaches to temporal reasoning, and their shortcomings, in light of this analysis. We propose a new system for causal reasoning, motivated action theory, which builds upon causation as a crucial preference creterion. Motivated action theory solves the traditional problems of both forward and backward reasoning, and additionally provides a basis for a new theory of explanation.


Author[s]: Patrick G. Sobalvarro

Calculation of Blocking Probabilities in Multistage Interconnection Networks with Redundant Paths

December 1991

The blocking probability of a network is a common measure of its performance. There exist means of quickly calculating the blocking probabilities of Banyan networks; however, because Banyan networks have no redundant paths, they are not inherently fault-tolerant, and so their use in large-scale multiprocessors is problematic. Unfortunately, the addition of multiple paths between message sources and sinks in a network complicates the calculation of blocking probabilities. A methodology for exact calculation of blocking probabilities for small networks with redundant paths is presented here, with some discussion of its potential use in approximating blocking probabilities for large networks with redundant paths.


Author[s]: Tomaso Poggio, Manfred Fahle and Shimon Edelman

Fast Perceptual Learning in Visual Hyperacuity

December 1991

In many different spatial discrimination tasks, such as in determining the sign of the offset in a vernier stimulus, the human visual system exhibits hyperacuity-level performance by evaluating spatial relations with the precision of a fraction of a photoreceptor"s diameter. We propose that this impressive performance depends in part on a fast learning process that uses relatively few examples and occurs at an early processing stage in the visual pathway. We show that this hypothesis is plausible by demonstrating that it is possible to synthesize, from a small number of examples of a given task, a simple (HyperBF) network that attains the required performance level. We then verify with psychophysical experiments some of the key predictions of our conjecture. In particular, we show that fast timulus-specific learning indeed takes place in the human visual system and that this learning does not transfer between two slightly different hyperacuity tasks.


Author[s]: M. Ali Taalebinezhaad

Towards Autonomous Motion Vision

April 1992

Earlier, we introduced a direct method called fixation for the recovery of shape and motion in the general case. The method uses neither feature correspondence nor optical flow. Instead, it directly employs the spatiotemporal gradients of image brightness. This work reports the experimental results of applying some of our fixation algorithms to a sequence of real images where the motion is a combination of translation and rotation. These results show that parameters such as the fization patch size have crucial effects on the estimation of some motion parameters. Some of the critical issues involved in the implementaion of our autonomous motion vision system are also discussed here. Among those are the criteria for automatic choice of an optimum size for the fixation patch, and an appropriate location for the fixation point which result in good estimates for important motion parameters. Finally, a calibration method is described for identifying the real location of the rotation axis in imaging systems.


Author[s]: Ronen Basri

On The Uniqueness of Correspondence Under Orthographic and Perspective Projections

December 1991

The task of shape recovery from a motion sequence requires the establishment of correspondence between image points. The two processes, the matching process and the shape recovery one, are traditionally viewed as independent. Yet, information obtained during the process of shape recovery can be used to guide the matching process. This paper discusses the mutual relationship between the two processes. The paper is divided into two parts. In the first part we review the constraints imposed on the correspondence by rigid transformations and extend them to objects that undergo general affine (non rigid) transformation (including stretch and shear), as well as to rigid objects with smooth surfaces. In all these cases corresponding points lie along epipolar lines, and these lines can be recovered from a small set of corresponding points. In the second part of the paper we discuss the potential use of epipolar lines in the matching process. We present an algorithm that recovers the correspondence from three contour images. The algorithm was implemented and used to construct object models for recognition. In addition we discuss how epipolar lines can be used to solve the aperture problem.


Author[s]: Ronen Basri

The Alignment of Objects With Smooth Surfaces: Error Analysis of the Curvature Method

November 1991

The recognition of objects with smooth bounding surfaces from their contour images is considerably more complicated than that of objects with sharp edges, since in the former case the set of object points that generates the silhouette contours changes from one view to another. The "curvature method", developed by Basri and Ullman [1988], provides a method to approximate the appearance of such objects from different viewpoints. In this paper we analyze the curvature method. We apply the method to ellipsoidal objects and compute analytically the error obtained for different rotations of the objects. The error depends on the exact shape of the ellipsoid (namely, the relative lengths of its axes), and it increases a sthe ellipsoid becomes "deep" (elongated in the Z-direction). We show that the errors are usually small, and that, in general, a small number of models is required to predict the appearance of an ellipsoid from all possible views. Finally, we show experimentally that the curvature method applies as well to objects with hyperbolic surface patches.


Author[s]: David L. Brock

Dynamic Model and Control of an Artificial Muscle Based on Contractile Polymers

November 1991

A dynamic model and control system of an artificial muscle is presented. The artificial muscle is based on a contractile polymer gel which undergoes abrupt volume changes in response to variations in external conditions. The device uses an acid-base reaction to directly convert chemical to mechanical energy. A nonlinear sliding mode control system is proposed to track desired joint trajectories of a single link controlled by two antagonist muscles. Both the model and controller were implemented and produced acceptable tracking performance at 2Hz.


Author[s]: David L. Brock

Review of Artificial Muscle Based on Contractile Polymers

November 1991

An artificial muscle with strength and speed equal to that of a human muscle may soon be possible. Polymer gels exhibit abrubt volume changes in response to variations in their external conditions -- shrinking or swelling up to 1000 times their original volume. Through the conversion of chemical or electrical energy into mechanical work, a number of devices have already been constructed which produce forces up to 100N/cm2 and contraction rates on the order of a second. Through the promise of an artificial muscle is real, many fundamental physical and engineering questions remain before the extent or limit of these devices is known.


Author[s]: Harold Abelson, Andrew A. Berlin, Jacob Katzenelson, William H. McAllister, Guillermo J. Rozas, Gerald Jay Sussman and Jack Wisdom

The Supercomputer Toolkit: A General Framework for Special-purpose Computing

November 1991

The Toolkit is a family of hardware modules (processors, memory, interconnect, and input- output devices) and a collection of software modules (compilers, simulators, scientific libraries, and high-level front ends) from which high-performance special-purpose computers can be easily configured and programmed. The hardware modules are intended to be standard, reusable parts. These are combined by means of a user- reconfigurable, static interconnect technology. T he Toolkit's software support, based on n ovel compilation techniques, produces e xtremely high- performance numerical code from high-level language input, and will eventually automatically configure hardware modules for particular applications.


Author[s]: Randall Davis

Intellectual Property and Software: The Assumptions are Broken

November 1991

In March 1991 the World Intellectual Property Organization held an international symposium attended primarily by lawyers, to discuss the questions that artificial intelligence poses for intellectual property law (i.e., copyright and patents). This is an edited version of a talk presented there, which argues that AI poses few problems in the near term and that almost all the truly challenging issues arise instead from software in general. The talk was an attempt to bridge the gap between the legal community and the software community, to explain why existing concepts and categories in intellectual property law present such difficult problems for software, and why software as a technology breaks several important assumptions underlying intellectual property law.


Author[s]: Amnon Shashua

Correspondence and Affine Shape from Two Orthographic Views: Motion and Recognition

December 1991

The paper presents a simple model for recovering affine shape and correspondence from two orthographic views of a 3D object. It is shown that four corresponding points along two orthographic views, taken under similar illumination conditions, determine affine shape and correspondence for all other points. The scheme is useful for purposes of visual recognition by generating novel views of an object given two model views. It is also shown that the scheme can handle objects with smooth boundaries, to a good approximation, without introducing any modifications or additional model views.


Author[s]: Ivan A. Bachelder

Contour Matching Using Local Affine Transformations

April 1992

Partial constraints are often available in visual processing tasks requiring the matching of contours in two images. We propose a non- iterative scheme to determine contour matches using locally affine transformations. The method assumes that contours are approximated by the orthographic projection of planar patches within oriented neighborhoods of varying size. For degenerate cases, a minimal matching solution is chosen closest to the minimal pure translation. Performance on noisy synthetic and natural contour imagery is reported.


Author[s]: Michael Eisenberg

Programmable Applications: Interpreter Meets Interface

October 1991

Current fashion in "user-friendly'' software design tends to place an overreliance on direct manipulation interfaces. To be truly expressive (and thus truly user-friendly), applications need both learnable interfaces and domain-enriched languages that are accessible to the user. This paper discusses some of the design issues that arise in the creation of such programmable applications. As an example, we present "SchemePaint", a graphics application that combines a MacPaint-like interface with an interpreter for (a "graphics-enriched'') Scheme.


Author[s]: Elizabeth Bradley

A Control Algorithm for Chaotic Physical Systems

October 1991

Control algorithms which exploit the unique properties of chaos can vastly improve the design and performance of many practical and useful systems. The program Perfect Moment is built around such an algorithm. Given two points in the system's state space, it autonomously maps the space, chooses a set of trajectory segments from the maps, uses them to construct a composite path between the points, then causes the system to follow that path. This program is illustrated with two practical examples: the driven single pendulum and its electronic analog, the phase-locked loop. Strange attractor bridges, which alter the reachability of different state space points, can be used to increase the capture range of the circuit.


Author[s]: Maja Mataric

A Comparative Analysis of Reinforcement Learning Methods

October 1991

This paper analyzes the suitability of reinforcement learning (RL) for both programming and adapting situated agents. We discuss two RL algorithms: Q-learning and the Bucket Brigade. We introduce a special case of the Bucket Brigade, and analyze and compare its performance to Q in a number of experiments. Next we discuss the key problems of RL: time and space complexity, input generalization, sensitivity to parameter values, and selection of the reinforcement function. We address the tradeoffs between the built-in and learned knowledge and the number of training examples required by a learning algorithm. Finally, we suggest directions for future research.


Author[s]: Waldemar Horwat

Concurrent Smalltalk on the Message-Driven Processor

September 1991

Concurrent Smalltalk is the primary language used for programming the J- Machine, a MIMD message-passing computer containing thousands of 36-bit processors connected by a very low latency network. This thesis describes in detail Concurrent Smalltalk and its implementation on the J-Machine, including the Optimist II global optimizing compiler and Cosmos fine-grain parallel operating system. Quantitative and qualitative results are presented.


Author[s]: Lisa Dron

The Multi-Scale Veto Model: A Two-Stage Analog Network for Edge Detection and Image Reconstruction

March 1992

This paper presents the theory behind a model for a two-stage analog network for edge detection and image reconstruction to be implemented in VLSI. Edges are detected in the first stage using the multi-scale veto rule, which eliminates candidates that do not pass a threshold test at each of a set of different spatial scales. The image is reconstructed in the second stage from the brightness values adjacent to edge locations. The MSV rule allows good localization and efficient noise removal. Since the reconstructed images are visually similar to the originals, the possibility exists of achieving significant bandwidth compression.


Author[s]: P.A. Skordos and W.H. Zurek

Maxwell's Demon, Rectifiers, and the Second Law: Computer Simulation of Smoluchowski's Trapdoor

September 1991

We have simulated numerically an automated Maxwell's demon inspired by Smoluchowski's ideas of 1912. Two gas chambers of equal area are connected via an opening that is covered by a trapdoor. The trapdoor can open to the left but not to the right, and is intended to rectify naturally occurring variations in density between the two chambers. Our results confirm that though the trapdoor behaves as a rectifier when large density differences are imposed by external means, it can not extract useful work from the thermal motion of the molecules when left on its own.


Author[s]: J. Brian Subirana-Vilanova and Kah Kay Sung

Multi-Scale Vector-Ridge-Detection for Perceptual Organization Without Edges

December 1992

We present a novel ridge detector that finds ridges on vector fields. It is designed to automatically find the right scale of a ridge even in the presence of noise, multiple steps and narrow valleys. One of the key features of such ridge detector is that it has a zero response at discontinuities. The ridge detector can be applied to scalar and vector quantities such as color. We also present a parallel perceptual organization scheme based on such ridge detector that works without edges; in addition to perceptual groups, the scheme computes potential focus of attention points at which to direct future processing. The relation to human perception and several theoretical findings supporting the scheme are presented. We also show results of a Connection Machine implementation of the scheme for perceptual organization (without edges) using color.


Author[s]: Lynn Andrea Stein

Resolving Ambiguity in Nonmonotonic Inheritance Hierarchies

August 1991

This paper describes a theory of inheritance theories. We present an original theory of inheritance in nonmonotonic hierarchies. The structures on which this theory is based delineate a framework that subsumes most inheritance theories in the literature, providing a new foundation for inheritance. * Our path- based theory is sound and complete w.r.t. a direct model-theoretic semantics. * Both the credulous and the skeptical conclusions of this theory are polynomial-time computable. * We prove that true skeptical inheritance is not contained in the language of path-based inheritance. Because our techniques are modular w.r.t. the definition of specificity, they generalize to provide a unified framework for a broad class of inheritance theories. By describing multiple inheritance theories in the same “language” of credulous extensions, we make principled comparisons rather than the ad-hoc examination of specific examples makes up most of the comparative inheritance work.


Author[s]: Ellen Spertus

Why are There so Few Female Computer Scientists?

August 1991

This report examines why women pursue careers in computer science and related fields far less frequently than men do. In 1990, only 13% of PhDs in computer science went to women, and only 7.8% of computer science professors were female. Causes include the different ways in which boys and girls are raised, the stereotypes of female engineers, subtle biases that females face, problems resulting from working in predominantly male environments, and sexual biases in language. A theme of the report is that women's underrepresentation is not primarily due to direct discrimination but to subconscious behavior that perpetuates the status quo.


Author[s]: Ellen C. Hildreth, Hiroshi Ando, Richard Anderson and Stefan Treue

Recovering Three-Dimensional Structure from Motion with Surface Reconstruction

December 1991

We address the computational role that the construction of a complete surface representation may play in the recovery of 3--D structure from motion. We present a model that combines a feature--based structure-- from- -motion algorithm with smooth surface interpolation. This model can represent multiple surfaces in a given viewing direction, incorporates surface constraints from object boundaries, and groups image features using their 2--D image motion. Computer simulations relate the model's behavior to perceptual observations. In a companion paper, we discuss further perceptual experiments regarding the role of surface reconstruction in the human recovery of 3--D structure from motion.


Author[s]: Daphna Weinshall

The Matching of Doubly Ambiguous Stereograms

July 1991

I have previously described psychophysical experiments that involved the perception of many transparent layers, corresponding to multiple matching, in doubly ambiguous random dot stereograms. Additional experiments are described in the first part of this paper. In one experiment, subjects were required to report the density of dots on each transparent layer. In another experiment, the minimal density of dots on each layer, which is required for the subjects to perceive it as a distinct transparent layer, was measured. The difficulties encountered by stereo matching algorithms, when applied to doubly ambiguous stereograms, are described in the second part of this paper. Algorithms that can be modified to perform consistently with human perception, and the constraints imposed on their parameters by human perception, are discussed.


Author[s]: Shimon Ullman

Sequence-Seeking and Counter Streams: A Model for Information Processing in the Cortex

December 1991

This paper presents a model for the general flow in the neocortex. The basic process, called "sequence-seeking," is a search for a sequence of mappings or transformations, linking source and target representations. The search is bi-directional, "bottom-up" as well as "top-down," and it explores in parallel a large numbe rof alternative sequences. This operation is implemented in a structure termed "counter streams," in which multiple sequences are explored along two separate, complementary pathways which seeking to meet. The first part of the paper discusses the general sequence-seeking scheme and a number of related processes, such as the learning of successful sequences, context effects, and the use of "express lines" and partial matches. The second part discusses biological implications of the model in terms of connections within and between cortical areas. The model is compared with existing data, and a number of new predictions are proposed.


Author[s]: Pegor Papazian

Principles, Opportunism and Seeing in Design: A Computational Approach

June 1991

This thesis introduces elements of a theory of design activity and a computational framework for developing design systems. The theory stresses the opportunistic nature of designing and the complementary roles of focus and distraction, the interdependence of evaluation and generation, the multiplicity of ways of seeing over the history of a design session versus the exclusivity of a given way of seeing over an arbitrarily short period, and the incommensurability of criteria used to evaluate a design. The thesis argues for a principle based rather than rule based approach to designing documents. The Discursive Generator is presented as a computational framework for implementing specific design systems, and a simple system for arranging blocks according to a set of formal principles is developed by way of illustration. Both shape grammars and constraint based systems are used to contrast current trends in design automation with the discursive approach advocated in the thesis. The Discursive Generator is shown to have some important properties lacking in other types of systems, such as dynamism, robustness and the ability to deal with partial designs. When studied in terms of a search metaphor, the Discursive Generator is shown to exhibit behavior which is radically different from some traditional search techniques, and to avoid some of the well-known difficulties associated with them.


Author[s]: David T. Clemens

Region-Based Feature Interpretation for Recognizing 3D Models in 2D Images

June 1991

In model-based vision, there are a huge number of possible ways to match model features to image features. In addition to model shape constraints, there are important match-independent constraints that can efficiently reduce the search without the combinatorics of matching. I demonstrate two specific modules in the context of a complete recognition system, Reggie. The first is a region-based grouping mechanism to find groups of image features that are likely to come from a single object. The second is an interpretive matching scheme to make explicit hypotheses about occlusion and instabilities in the image features.


Author[s]: Michael A. Eisenberg

The Kineticist's Workbench: Combining Symbolic and Numerical Methods in the Simulation of Chemical Reaction 

May 1991

The Kineticist's Workbench is a program that simulates chemical reaction mechanisms by predicting, generating, and interpreting numerical data. Prior to simulation, it analyzes a given mechanism to predict that mechanism's behavior; it then simulates the mechanism numerically; and afterward, it interprets and summarizes the data it has generated. In performing these tasks, the Workbench uses a variety of techniques: graph- theoretic algorithms (for analyzing mechanisms), traditional numerical simulation methods, and algorithms that examine simulation results and reinterpret them in qualitative terms. The Workbench thus serves as a prototype for a new class of scientific computational tools---tools that provide symbiotic collaborations between qualitative and quantitative methods.


Author[s]: Feng Zhao and Richard Thorton

Automatic Design of a Maglev Controller in State Space

December 1991

We describe the automatic synthesis of a global nonlinear controller for stabilizing a magnetic levitation system. The synthesized control system can stabilize the maglev vehicle with large initial displacements from an equilibrium, and possesses a much larger operating region than the classical linear feedback design for the same system. The controller is automatically synthesized by a suite of computational tools. This work demonstrates that the difficult control synthesis task can be automated, using programs that actively exploit knowledge of nonlinear dynamics and state space and combine powerful numerical and symbolic computations with spatial-reasoning techniques.


Author[s]: Yael Moses and Shimon Ullman

Limitations of Non Model-Based Recognition Schemes

May 1991

Different approaches to visual object recognition can be divided into two general classes: model-based vs. non model-based schemes. In this paper we establish some limitation on the class of non model-based recognition schemes. We show that every function that is invariant to viewing position of all objects is the trivial (constant) function. It follows that every consistent recognition scheme for recognizing all 3-D objects must in general be model based. The result is extended to recognition schemes that are imperfect (allowed to make mistakes) or restricted to certain classes of objects.


Author[s]: David M. Siegel

Pose Determination of a Grasped Object Using Limited Sensing

May 1991

This report explores methods for determining the pose of a grasped object using only limited sensor information. The problem of pose determination is to find the position of an object relative to the hand. The information is useful when grasped objects are being manipulated. The problem is hard because of the large space of grasp configurations and the large amount of uncertainty inherent in dexterous hand control. By studying limited sensing approaches, the problem's inherent constraints can be better understood. This understanding helps to show how additional sensor data can be used to make recognition methods more effective and robust.


Author[s]: Ellen C. Hildreth

Recovering Heading for Visually-Guided Navigation

June 1991

We present a model for recovering the direction of heading of an observer who is moving relative to a scene that may contain self-moving objects. The model builds upon an algorithm proposed by Rieger and Lawton (1985), which is based on earlier work by Longuet-Higgens and Prazdny (1981). The algorithm uses velocity differences computed in regions of high depth variation to estimate the location of the focus of expansion, which indicates the observer’s heading direction. We relate the behavior of the proposed model to psychophysical observations regarding the ability of human observers to judge their heading direction, and show how the model can cope with self-moving objects in the environment. We also discuss this model in the broader context of a navigational system that performs tasks requiring rapid sensing and response through the interaction of simple task-specific routines.


Author[s]: Joachim Heel

Temporal Surface Reconstruction

May 1991

This thesis investigates the problem of estimating the three-dimensional structure of a scene from a sequence of images. Structure information is recovered from images continuously using shading, motion or other visual mechanisms. A Kalman filter represents structure in a dense depth map. With each new image, the filter first updates the current depth map by a minimum variance estimate that best fits the new image data and the previous estimate. Then the structure estimate is predicted for the next time step by a transformation that accounts for relative camera motion. Experimental evaluation shows the significant improvement in quality and computation time that can be achieved using this technique.


Author[s]: James M. Hyde

Multiple Mode Vibration Suppression in Controlled Flexible Systems

May 1991

Prior research has led to the development of input command shapers that can reduce residual vibration in single- or multiple-mode flexible systems. We present a method for the development of multiple-mode shapers which are simpler to implement and produce smaller response delays than previous designs. An MIT / NASA experimental flexible structure, MACE, is employed as a test article for the validation of the new shaping method. We examine the results of tests conducted on simulations of MACE. The new shapers are shown to be effective in suppressing multiple-mode vibration, even in the presence of mild kinematic and dynamic non-linearities.


Author[s]: Larry R. Dennison

Reliable Interconnection Networks for Parallel Computers

October 1991

This technical report describes a new protocol, the Unique Token Protocol, for reliable message communication. This protocol eliminates the need for end-to-end acknowledgments and minimizes the communication effort when no dynamic errors occur. Various properties of end-to-end protocols are presented. The unique token protocol solves the associated problems. It eliminates source buffering by maintaining in the network at least two copies of a message. A token is used to decide if a message was delivered to the destination exactly once. This technical report also presents a possible implementation of the protocol in a worm-hole routed, 3-D mesh network.


Author[s]: Rodney A. Brooks

Intelligence Without Reason

April 1991

Computers and Thought are the two categories that together define Artificial Intelligence as a discipline. It is generally accepted that work in Artificial Intelligence over the last thirty years has had a strong influence on aspects of computer architectures. In this paper we also make the converse claim; that the state of computer architecture has been a strong influence on our models of thought. The Von Neumann model of computation has lead Artificial Intelligence in particular directions. Intelligence in biological systems is completely different. Recent work in behavior-based Artificial Intelligenge has produced new models of intelligence that are much closer in spirit to biological systems. The non-Von Neumann computational models they use share many characteristics with biological computation.


Author[s]: Minoru Maruyama, Federico Girosi and Tomaso Poggio

A Connection Between GRBF and MLP

April 1992

Both multilayer perceptrons (MLP) and Generalized Radial Basis Functions (GRBF) have good approximation properties, theoretically and experimentally. Are they related? The main point of this paper is to show that for normalized inputs, multilayer perceptron networks are radial function networks (albeit with a non-standard radial function). This provides an interpretation of the weights w as centers t of the radial function network, and therefore as equivalent to templates. This insight may be useful for practical applications, including better initialization procedures for MLP. In the remainder of the paper, we discuss the relation between the radial functions that correspond to the sigmoid for normalized inputs and well-behaved radial basis functions, such as the Gaussian. In particular, we observe that the radial function associated with the sigmoid is an activation function that is good approximation to Gaussian basis functions for a range of values of the bias parameter. The implication is that a MLP network can always simulate a Gaussian GRBF network (with the same number of units but less parameters); the converse is true only for certain values of the bias parameter. Numerical experiments indicate that this constraint is not always satisfied in practice by MLP networks trained with backpropagation. Multiscale GRBF networks, on the other hand, can approximate MLP networks with a similar number of parameters.


Author[s]: Tomaso Poggio, Allessandro Verri and Vincent Torre

Green Theorems and Qualitative Properties of the Optical Flow

April 1991

How can one compute qualitative properties of the optical flow, such as expansion or rotation, in a way which is robust and invariant to the position of the focus of expansion or the center of rotation? We suggest a particularly simple algorithm, well-suited to VLSI implementations, that exploits well-known relations between the integral and differential properties of vector fields and their linear behaviour near singularities.


Author[s]: Federico Girosi and Gabriele Anzellotti

Convergence Rates of Approximation by Translates

March 1992

In this paper we consider the problem of approximating a function belonging to some funtion space ø by a linear comination of n translates of a given function G. Ussing a lemma by Jones (1990) and Barron (1991) we show that it is possible to define function spaces and functions G for which the rate of convergence to zero of the erro is 0(1/n) in any number of dimensions. The apparent avoidance of the "curse of dimensionality" is due to the fact that these function spaces are more and more constrained as the dimension increases. Examples include spaces of the Sobolev tpe, in which the number of weak derivatives is required to be larger than the number of dimensions. We give results both for approximation in the L2 norm and in the Lc norm. The interesting feature of these results is that, thanks to the constructive nature of Jones" and Barron"s lemma, an iterative procedure is defined that can achieve this rate.


Author[s]: Federico Girosi

Models of Noise and Robust Estimates

November 1991

Given n noisy observations g; of the same quantity f, it is common use to give an estimate of f by minimizing the function Eni=1(gi-f)2. From a statistical point of view this corresponds to computing the Maximum likelihood estimate, under the assumption of Gaussian noise. However, it is well known that this choice leads to results that are very sensitive to the presence of outliers in the data. For this reason it has been proposed to minimize the functions of the form Eni=1V(gi-f), where V is a function that increases less rapidly than the square. Several choices for V have been proposed and successfully used to obtain "robust" estimates. In this paper we show that, for a class of functions V, using these robust estimators corresponds to assuming that data are corrupted by Gaussian noise whose variance fluctuates according to some given probability distribution, that uniquely determines the shape of V.


Author[s]: Feng Zhao

Phase Space Navigator: Towards Automating Control Synthesis in Phase Spaces for Nonlinear Control Systems

April 1991

We develop a novel autonomous control synthesis strategy called Phase Space Navigator for the automatic synthesis of nonlinear control systems. The Phase Space Navigator generates global control laws by synthesizing flow shapes of dynamical systems and planning and navigating system trajectories in the phase spaces. Parsing phase spaces into trajectory flow pipes provide a way to efficiently reason about the phase space structures and search for global control paths. The strategy is particularly suitable for synthesizing high-performance control systems that do not lend themselves to traditional design and analysis techniques.


Author[s]: Daniel Kersten and Heinrich Bulthoff

Apparent Opacity Affects Perception of Structure from Motion

January 1991

The judgment of surface attributes such as transparency or opacity is often considered to be a higher-level visual process that would make use of low-level stereo or motion information to tease apart the transparent from the opaque parts. In this study, we describe a new illusion and some results that question the above view by showing that depth from transparency and opacity can override the rigidity bias in perceiving depth from motion. This provides support for the idea that the brain's computation of the surface material attribute of transparency may have to be done either before, or in parallel with the computation of structure from motion.


Author[s]: Henry Minsky

A Parallel Crossbar Routing Chip for a Shared Memory Multiprocessor

March 1991

This thesis describes the design and implementation of an integrated circuit and associated packaging to be used as the building block for the data routing network of a large scale shared memory multiprocessor system. A general purpose multiprocessor depends on high-bandwidth, low-latency communications between computing elements. This thesis describes the design and construction of RN1, a novel self-routing, enhanced crossbar switch as a CMOS VLSI chip. This chip provides the basic building block for a scalable pipelined routing network with byte- wide data channels. A series of RN1 chips can be cascaded with no additional internal network components to form a multistage fault-tolerant routing switch. The chip is designed to operate at clock frequencies up to 100Mhz using Hewlett- Packard's HP34 $1.2\mu$ process. This aggressive performance goal demands that special attention be paid to optimization of the logic architecture and circuit design.


Author[s]: Brian A. LaMacchia

Basis Reduction Algorithms and Subset Sum Problems

June 1991

This thesis investigates a new approach to lattice basis reduction suggested by M. Seysen. Seysen's algorithm attempts to globally reduce a lattice basis, whereas the Lenstra, Lenstra, Lovasz (LLL) family of reduction algorithms concentrates on local reductions. We show that Seysen's algorithm is well suited for reducing certain classes of lattice bases, and often requires much less time in practice than the LLL algorithm. We also demonstrate how Seysen's algorithm for basis reduction may be applied to subset sum problems. Seysen's technique, used in combination with the LLL algorithm, and other heuristics, enables us to solve a much larger class of subset sum problems than was previously possible.


Author[s]: Thomas Knight and Henry M. Wu

A Method for Skew-free Distribution of Digital Signals Using Matched Variable Delay Lines

March 1992

The ability to distribute signals everywhere in a circuit with controlled and known delays is essential in large, high-speed digital systems. We present a technique by which a signal driver can adjust the arrival time of the signal at the end of the wire using a pair of matched variable delay lines. We show an implemention of this idea requiring no extra wiring, and how it can be extended to distribute signals skew-free to receivers along the signal run. We demonstrate how this scheme fits into the boundary scan logic of a VLSI chip.


Author[s]: Chris Hanson

MIT Scheme Reference Manual

January 1991

MIT Scheme is an implementation of the Scheme programming language that runs on many popular workstations. The MIT Scheme Reference Manual describes the special forms, procedures, and datatypes provided by the implementation for use by application programmers.


Author[s]: A. Lumsdaine, J.L. Wyatt, Jr. and I.M. Elfadel

Nonlinear Analog Networks for Image Smoothing and Segmentation

January 1991

Image smoothing and segmentation algorithms are frequently formulatedsas optimization problems. Linear and nonlinear (reciprocal) resistivesnetworks have solutions characterized by an extremum principle. Thus,sappropriately designed networks can automatically solve certainssmoothing and segmentation problems in robot vision. This papersconsiders switched linear resistive networks and nonlinear resistivesnetworks for such tasks. The latter network type is derived from thesformer via an intermediate stochastic formulation, and a new resultsrelating the solution sets of the two is given for the "zerostermperature'' limit. We then present simulation studies of severalscontinuation methods that can be gracefully implemented in analog VLSIsand that seem to give "good'' results for these non- convexsoptimization problems.


Author[s]: Elizabeth Bradley

Control Algorithms for Chaotic Systems

March 1991

This paper presents techniques that actively exploit chaotic behavior to accomplish otherwise-impossible control tasks. The state space is mapped by numerical integration at different system parameter values and trajectory segments from several of these maps are automatically combined into a path between the desired system states. A fine- grained search and high computational accuracy are required to locate appropriate trajectory segments, piece them together and cause the system to follow this composite path. The sensitivity of a chaotic system's state-space topology to the parameters of its equations and of its trajectories to the initial conditions make this approach rewarding in spite of its computational demands.


Author[s]: Lynn Andrea Stein

Imagination and Situated Cognition

February 1991

A subsumption-based mobile robot is extended to perform cognitive tasks. Following directions, the robot navigates directly to previously unexplored goals. This robot exploits a novel architecture based on the idea that cognition uses the underlying machinery of interaction, imagining sensations and actions.


Author[s]: Anselm Spoerri

The Early Detection of Motion Boundaries

May 1990

This thesis shows how to detect boundaries on the basis of motion information alone. The detection is performed in two stages: (i) the local estimation of motion discontinuities and of the visual flowsfield; (ii) the extraction of complete boundaries belonging to differently moving objects. For the first stage, three new methods are presented: the "Bimodality Tests,'' the "Bi-distribution Test,'' and the "Dynamic Occlusion Method.'' The second stage consists of applying the "Structural Saliency Method,'' by Sha'ashua and Ullman to extract complete and unique boundaries from the output of the first stage. The developed methods can successfully segment complex motion sequences.


Author[s]: Feng Zhao

Extracting and Representing Qualitative Behaviors of Complex Systems in Phase Spaces

March 1991

We develop a qualitative method for understanding and representing phase space structures of complex systems and demonstrate the method with a program, MAPS --- Modeler and Analyzer for Phase Spaces, using deep domain knowledge of dynamical system theory. Given a dynamical system, the program generates a complete, high level symbolic description of the phase space structure sensible to human beings and manipulable by other programs. Using the phase space descriptions, we are developing a novel control synthesis strategy to automatically synthesize a controller for a nonlinear system in the phase space to achieve desired properties.


Author[s]: Ellen Spertus and William J. Dally

Experiments with Dataflow on a General-Purpose Parallel Computer

January 1991

The MIT J-Machine, a massively-parallel computer, is an experiment in providing general-purpose mechanisms for communication, synchronization, and naming that will support a wide variety of parallel models of comptuation. We have developed two experimental dataflow programming systems for the J-Machine. For the first system, we adapted Papadopoulos' explicit token store to implement static and then dynamic dataflow. Our second system made use of Iannucci's hybrid execution model to combine several dataflow graph nodes into a single sequence, decreasing scheduling overhead. By combining the strengths of the two systems, it is possible to produce a system with competitive performance.


Author[s]: Tomaso Poggio, Manfred Fahle and Shimon Edelman

Synthesis of Visual Modules from Examples: Learning Hyperacuity

January 1991

Networks that solve specific visual tasks, such as the evaluation of spatial relations with hyperacuity precision, can be eastily synthesized from a small set of examples. This may have significant implications for the interpretation of many psychophysical results in terms of neuronal models.


Author[s]: Tanveer Fathima Syeda-Mahmood

Data and Model-Driven Selection Using Color Regions

February 1992

A key problem in model-based object recognition is selection, namely, the problem of determining which regions in the image are likely to come from a single object. In this paper we present an approach that extracts and uses color region information to perform selection either based solely on image- data (data-driven), or based on the knowledge of the color description of the model (model -driven). The paper presents a method of perceptual color specification by color categories to extract perceptual color regions. It also discusses the utility of color-based selection in reducing the search involved in recognition.


Author[s]: Anita M. Flynn, Lee S. Tavrow, Stephen F. Bart and Rodney A. Brooks

Piezoelectric Micromotors for Microrobots

February 1991

By combining new robot control systems with piezoelectric motors and micromechanics, we propose creating micromechanical systems which are small, cheap and completely autonomous. We have fabricated small - a few millimeters in diameter - piezoelectric motors using ferroelectric thin films and consisting of two pieces: a stator and a rotor. The stationary stator includes a piezoelectric film in which we induce bending in the form of a traveling wave. Anything which sits atop the stator is propelled by the wave. A small glass lens placed upon the stator becomes the spinning rotor. Using thin films of PZT on silicon nitride memebranes, various types of actuator structures have been fabricated.


Author[s]: David J. Beymer

Finding Junctions Using the Image Gradient

December 1991

Junctions are the intersection points of three or more intensity surfaces in an image. An analysis of zero crossings and the gradient near junctions demonstrates that gradient- based edge detection schemes fragment edges at junctions. This fragmentation is caused by the intrinsic pairing of zero crossings and a destructive interference of edge gradients at junctions. Using the previous gradient analysis, we propose a junction detector that finds junctions in edge maps by following gradient ridges and using the minimum direction of saddle points in the gradient. The junction detector is demonstrated on real imagery and previous approaches to junction detection are discussed.


Author[s]: Joachim Dengler

Estimation of Discontinuous Displacement Vector Fields with the Minimum Description Length Criterion

October 1990

A new noniterative approach to determine displacement vector fields with discontinuities is described. In order to overcome the limitations of current methods, the problem is regarded as a general modelling problem. Starting from a family of regularized estimates, by measuring the difference in description length the compatibility between different levels of regularization is determined. This gives local but noisy evidence of possible model boundaries at multiple scales. With the two constraints of continous lines of discontinuities and the spatial coincidence assumption consistent boundary evidence is found. Based on this combined evidence the model is updated, now describing homogeneous regions with sharp discontinuities.


Author[s]: Daphna Weinshall

The Shape of Shading

October 1990

This paper discusses the relationship between the shape of the shading, the surface whose depth at each point equals the brightness in the image, and the shape of the original surface. I suggest the shading as an initial local approximation to shape, and discuss the scope of this approximation and what it may be good for. In particular, qualitative surface features, such as the sign of the Gaussian curvature, can be computed in some cases directly from the shading. Finally, a method to compute the direction of the illuminant (assuming a single point light source) from shading on occluding contours is shown.


Author[s]: Antionio Bicchi

A Criterion for the Optimal Design of Multiaxis Force Sensors

October 1990

This paper deals with the design of multi-axis force (also known as force/torque) sensors, as considered within the framework of optimal design theory. The principal goal of this paper is to identify a mathematical objective function, whose minimization corresponds to the optimization of sensor accuracy. The methodology employed is derived from linear algebra and analysis of numerical stability. The problem of optimizing the number of basic transducers employed in a multi- component sensor is also addressed. Finally, applications of the proposed method to the design of a simple sensor as well as to the optimization of a novel, 6-axis miniaturized sensor are discussed.


Author[s]: Antonio Bicchi, J. Kenneth Salisbury and David L. Brock

Contact Sensing from Force Measurements

October 1990

This paper addresses contact sensing, i.e. the problem of resolving the location of a contact, the force at the interface and the moment about the contact normals. Called "intrinsic'' contact sensing for the use of internal force and torque measurements, this method allows for practical devices which provide simple, relevant contact information in practical robotic applications. Such sensors have been used in conjunction with robot hands to identify objects, determine surface friction, detect slip, augment grasp stability, measure object mass, probe surfaces, control collision and a variety of other useful tasks. This paper describes the theoretical basis for their operation and provides a framework for future device design.


Author[s]: Claudio Melchiorri and J.K. Salisbury

Exploiting the Redundancy of a Hand-Arm Robotic System

October 1990

In this report, a method for exploiting the redundancy of a hand-arm mechanical system for manipulation tasks is illustrated. The basic idea is to try to exploit the different intrinsic capabilities of the arm and hand subsystems. The Jacobian transpose technique is at the core of the method: different behaviors of the two subsystems are obtained by means of constraints in Null(J) generated by non-orthogonal projectors. Comments about the computation of the constraints are reported in the memo, as well as a description of some preliminary experiments on a robotic system at the A.I. Lab., M.I.T.


Author[s]: Eric Sven Ristad

Computational Structure of Human Language

October 1990

The central thesis of this report is that human language is NP-complete. That is, the process of comprehending and producing utterances is bounded above by the class NP, and below by NP-hardness. This constructive complexity thesis has two empirical consequences. The first is to predict that a linguistic theory outside NP is unnaturally powerful. The second is to predict that a linguistic theory easier than NP-hard is descriptively inadequate. To prove the lower bound, I show that the following three subproblems of language comprehension are all NP-hard: decide whether a given sound is possible sound of a given language; disambiguate a sequence of words; and compute the antecedents of pronouns. The proofs are based directly on the empirical facts of the language user’s knowledge, under an appropriate idealization. Therefore, they are invariant across linguistic theories. (For this reason, no knowledge of linguistic theory is needed to understand the proofs, only knowledge of English.) To illustrate the usefulness of the upper bound, I show that two widely-accepted analyses of the language user’s knowledge (of syntactic ellipsis and phonological dependencies) lead to complexity outside of NP (PSPACE-hard and Undecidable, respectively). Next, guided by the complexity proofs, I construct alternate linguisitic analyses that are strictly superior on descriptive grounds, as well as being less complex computationally (in NP). The report also presents a new framework for linguistic theorizing, that resolves important puzzles in generative linguistics, and guides the mathematical investigation of human language.


Author[s]: Thomas M. Breuel

An Efficient Correspondence Based Algorithm for 2D and 3D Model Based Recognition

October 1990

A polynomial time algorithm (pruned correspondence search, PCS) with good average case performance for solving a wide class of geometric maximal matching problems, including the problem of recognizing 3D objects from a single 2D image, is presented. Efficient verification algorithms, based on a linear representation of location constraints, are given for the case of affine transformations among vector spaces and for the case of rigid 2D and 3D transformations with scale. Some preliminary experiments suggest that PCS is a practical algorithm. Its similarity to existing correspondence based algorithms means that a number of existing techniques for speedup can be incorporated into PCS to improve its performance.


Author[s]: Choon P. Goh

Model Selection for Solving Kinematics Problems

September 1990

There has been much interest in the area of model-based reasoning within the Artificial Intelligence community, particularly in its application to diagnosis and troubleshooting. The core issue in this thesis, simply put, is, model-based reasoning is fine, but whence the model? Where do the models come from? How do we know we have the right models? What does the right model mean anyway? Our work has three major components. The first component deals with how we determine whether a piece of information is relevant to solving a problem. We have three ways of determining relevance: derivational, situational and an order-of- magnitude reasoning process. The second component deals with the defining and building of models for solving problems. We identify these models, determine what we need to know about them, and importantly, determine when they are appropriate. Currently, the system has a collection of four basic models and two hybrid models. This collection of models has been successfully tested on a set of fifteen simple kinematics problems. The third major component of our work deals with how the models are selected.


Author[s]: Yang Meng Tan

Supporting Reuse and Evolution in Software Design

October 1990

Program design is an area of programming that can benefit significantly from machine-mediated assistance. A proposed tool, called the Design Apprentice (DA), can assist a programmer in the detailed design of programs. The DA supports software reuse through a library of commonly-used algorithmic fragments, or cliches, that codifies standard programming. The cliche library enables the programmer to describe the design of a program concisely. The DA can detect some kinds of inconsistencies and incompleteness in program descriptions. It automates detailed design by automatically selecting appropriate algorithms and data structures. It supports the evolution of program designs by keeping explicit dependencies between the design decisions made. These capabilities of the DA are underlaid bya model of programming, called programming by successive elaboration, which mimics the way programmers interact. Programming by successive elaboration is characterized by the use of breadth-first exposition of layered program descriptions and the successive modifications of descriptions. A scenario is presented to illustrate the concept of the DA. Technques for automating the detailed design process are described. A framework is given in which designs are incrementally augmented and modified by a succession of design steps. A library of cliches and a suite of design steps needed to support the scenario are presented.


Author[s]: Brian Eberman and David L. Brock

Line Kinematics for Whole-Arm Manipulation

January 1991

A Whole-Arm Manipulator uses every surface to both sense and interact with the environment. To facilitate the analysis and control of a Whole-Arm Manipulator, line geometry is used to describe the location and trajectory of the links. Applications of line kinematics are described and implemented on the MIT Whole-Arm Manipulator (WAM-1).


Author[s]: Bruno Caprile and Federico Girosi

A Nondeterministic Minimization Algorithm

September 1990

The problem of minimizing a multivariate function is recurrent in many disciplines as Physics, Mathematics, Engeneering and, of course, Computer Science. In this paper we describe a simple nondeterministic algorithm which is based on the idea of adaptive noise, and that proved to be particularly effective in the minimization of a class of multivariate, continuous valued, smooth functions, associated with some recent extension of regularization theory by Poggio and Girosi (1990). Results obtained by using this method and a more traditional gradient descent technique are also compared.


Author[s]: Tomaso Poggio

A Theory of How the Brain Might Work

December 1990

I wish to propose a quite speculative new version of the grandmother cell theory to explain how the brain, or parts of it, may work. In particular, I discuss how the visual system may learn to recognize 3D objects. The model would apply directly to the cortical cells involved in visual face recognition. I will also outline the relation of our theory to existing models of the cerebellum and of motor control. Specific biophysical mechanisms can be readily suggested as part of a basic type of neural circuitry that can learn to approximate multidimensional input-output mappings from sets of examples and that is expected to be replicated in different regions of the brain and across modalities. The main points of the theory are: -the brain uses modules for multivariate function approximation as basic components of several of its information processing subsystems. -these modules are realized as HyperBF networks (Poggio and Girosi, 1990a,b). -HyperBF networks can be implemented in terms of biologically plausible mechanisms and circuitry. The theory predicts a specific type of population coding that represents an extension of schemes such as look-up tables. I will conclude with some speculations about the trade-off between memory and computation and the evolution of intelligence.


Author[s]: Anita M. Flynn

The 1990 AI Fair

August 1990

This year, as the finale to the Artificial Intelligence Laboratory’s annual Winter Olympics, the Lab staged an AI Fair – a night devoted to displaying the wide variety of talents and interests within the laboratory. The Fair provided an outlet for creativity and fun in a carnival-like atmosphere. Students organized events from robot boat races to face-recognition vision contests. Research groups came together to make posters and booths explaining their work. The robots rolled down out of the labs, networks were turned over to aerial combat computer games and walls were decorated with posters of zany ideas for the future. Everyone pitched in, and this photograph album is a pictorial account of the fun that night at the AI Fair.


Author[s]: Robert Joseph Hall

Program Improvement by Automatic Redistribution of Intermediate Results

February 1991

Introducing function sharing into designs allows eliminating costly structure by adapting existing structure to perform its function. This can eliminate many inefficiencies of reusing general componentssin specific contexts. "Redistribution of intermediate results'' focuses on instances where adaptation requires only addition/deletion of data flow and unused code removal. I show that this approach unifies and extends several well- known optimization classes. The system performs search and screening by deriving, using a novel explanation-based generalization technique, operational filtering predicates from input teleological information. The key advantage is to focus the system's effort on optimizations that are easier to prove safe.


Author[s]: W. Eric L. Grimson, Daniel P. Huttenlocher and David W. Jacobs

Affine Matching with Bounded Sensor Error: A Study of Geometric Hashing and Alignment

August 1991

Affine transformations are often used in recognition systems, to approximate the effects of perspective projection. The underlying mathematics is for exact feature data, with no positional uncertainty. In practice, heuristics are added to handle uncertainty. We provide a precise analysis of affine point matching, obtaining an expression for the range of affine-invariant values consistent with bounded uncertainty. This analysis reveals that the range of affine- invariant values depends on the actual $x$- $y$-positions of the features, i.e. with uncertainty, affine representations are not invariant with respect to the Cartesian coordinate system. We analyze the effect of this on geometric hashing and alignment recognition methods.


Author[s]: Harold Abelson, Andrew A. Berlin, Jacob Katzenelson, William H. McAllister, Guillermo J. Rozas and Gerald Jay Sussman

The Supercomputer Toolkit and Its Applications

July 1990

The Supercomputer Toolkit is a proposed family of standard hardware and software components from which special-purpose machines can be easily configured. Using the Toolkit, a scientist or an engineer, starting with a suitable computational problem, will be able to readily configure a special purpose multiprocessor that attains supercomputer- class performance on that problem, at a fraction of the cost of a general purpose supercomputer. The Toolkit is currently being built as a joint project between Hewlett- Packard and MIT. The software and the applications are in various stages of development and research.


Author[s]: Andrew Andai Chien

Concurrent Aggregates (CA): An Object-Oriented Language for Fine-Grained Message-Passing Machines

July 1990

Fine-grained parallel machines have the potential for very high speed computation. To program massively-concurrent MIMD machines, programmers need tools for managing complexity. These tools should not restrict program concurrency. Concurrent Aggregates (CA) provides multiple-access data abstraction tools, Aggregates, which can be used to implement abstractions with virtually unlimited potential for concurrency. Such tools allow programmers to modularize programs without reducing concurrency. I describe the design, motivation, implementation and evaluation of Concurrent Aggregates. CA has been used to construct a number of application programs. Multi-access data abstractions are found to be useful in constructing highly concurrent programs.


Author[s]: Bruce R. Thompson

The PHD: A Planar, Harmonic Drive Robot for Joint Torque Control

May 1990

This thesis details the development of a model of a seven degree of freedom manipulator for position control. Then, it goes on to discuss the design and construction of a the PHD, a robot built to serve two purposes: first, to perform research on joint torque control schemes, and second, to determine the important dynamic characteristics of the Harmonic Drive. The PHD, is a planar, three degree of freedom arm with torque sensors integral to each joint. Preliminary testing has shown that a simple linear spring model of the Harmonic Drive's flexibility is suitable in many situations.


Author[s]: Donald Scott Wills

Pi: A Parallel Architecture Interface for Multi-Model Execution

May 1990

This thesis defines Pi, a parallel architecture interface that separates model and machine issues, allowing them to be addressed independently. This provides greater flexibility for both the model and machine builder. Pi addresses a set of common parallel model requirements including low latency communication, fast task switching, low cost synchronization, efficient storage management, the ability to exploit locality, and efficient support for sequential code. Since Pi provides generic parallel operations, it can efficiently support many parallel programming models including hybrids of existing models. Pi also forms a basis of comparison for architectural components.


Author[s]: Michael Dean Levin

Design and Control of a Closed-Loop Brushless Torque Actuator

May 1990

This report explores the design and control issues associated with a brushless actuator capable of achieving extremely high torque accuracy. Models of several different motor - sensor configurations were studied to determine dynamic characteristics. A reaction torque sensor fixed to the motor stator was implemented to decouple the transmission dynamics from the sensor. This resulted in a compact actuator with higher bandwidth and precision than could be obtained with an inline or joint sensor. Testing demonstrated that closed-loop torque accuracy was within 0.1%, and the mechanical bandwidth approached 300 Hz.


Author[s]: Manfred Fahle

On the Shifter Hyposthesis for the Elimination of Motion Blur

August 1990

Moving objects may stimulate many retinal photoreceptors within the integration time of the receptors without motion blur being experienced. Anderson and vanEssen (1987) suggested that the neuronal representation of retinal images is shifted on its way to the cortex, in an opposite direction to the motion. Thus, the cortical representation of objects would be stationary. I have measured thresholds for two vernier stimuli, moving simultaneously into opposite directions over identical positions. Motion blur for these stimuli is not stronger than with a single moving stimulus, and thresholds can be below a photoreceptor diameter. This result cannot be easily reconciled with the hypothesis of Tshifter circuitsU.


Author[s]: Manfred Fahle and Gunther Palm

A Model for Rivalry Between Cognitive Contours

June 1990

The interactions between illusory and real contours have been inves- tigated under monocular, binocular and dichoptic conditions. Results show that under all three presentation conditions, periodic alternations, generally called rivalry, occur during the perception of cognitive (or illusory) triangles, while earlier research had failed to find such rivalry (Bradley & Dumais, 1975). With line triangles, rivalry is experienced only under dichoptic conditions. A model is proposed to account for the observed phenomena, and the results of simulations are presented.


Author[s]: Shimon Edelman and Heinrich H. Bulthoff

Viewpoint-Specific Representations in Three-Dimensional Object Recognition

August 1990

We report a series of psychophysical experiments that explore different aspects of the problem of object representation and recognition in human vision. Contrary to the paradigmatic view which holds that the representations are three-dimensional and object-centered, the results consistently support the notion of view-specific representations that include at most partial depth information. In simulated experiments that involved the same stimuli shown to the human subjects, computational models built around two-dimensional multiple-view representations replicated our main psychophysical results, including patterns of generalization errors and the time course of perceptual learning.


Author[s]: Gary C. Borchardt

Transition Space

November 1990

Informal causal descriptions of physical systems abound in sources such as encyclopedias, reports and user's manuals. Yet these descriptions remain largely opaque to computer processing. This paper proposes a representational framework in which such descriptions are viewed as providing partial specifications of paths in a space of possible transitions, or transition space. In this framework, the task of comprehending informal causal descriptions emerges as one of completing the specifications of paths in transition space---filling causal gaps and relating accounts of activity varied by analogy and abstraction. The use of the representation and its operations is illustrated in the context of a simple description concerning rocket propulsion.


Author[s]: Camille Z. Chammas

Analysis and Implementation of Robust Grasping Behaviors

May 1990

This thesis addresses the problem of developing automatic grasping capabilities for robotic hands. Using a 2-jointed and a 4- jointed nmodel of the hand, we establish the geometric conditions necessary for achieving form closure grasps of cylindrical objects. We then define and show how to construct the grasping pre-image for quasi-static (friction dominated) and zero-G (inertia dominated) motions for sensorless and sensor-driven grasps with and without arm motions. While the approach does not rely on detailed modeling, it is computationally inexpensive, reliable, and easy to implement. Example behaviors were successfully implemented on the Salisbury hand and on a planar 2- fingered, 4 degree-of-freedom hand.


Author[s]: Jonathan Amsterdam

The Iterate Manual

October 1990

This is the manual for version 1.1 of Iterate, a powerful iteration macro for Common Lisp. Iterate is similar to Loop but provides numerous additional features, is well integrated with Lisp, and is extensible.


Author[s]: Helen Greiner

Passive and Active Grasping with a Prehensile Robot End-Effector

May 1990

This report presents a design of a new type of robot end-effector with inherent mechanical grasping capabilities. Concentrating on designing an end-effector to grasp a simple class of objects, cylindrical, allowed a design with only one degree of actuation. The key features of this design are high bandwidth response to forces, passive grasping capabilities, ease of control, and ability to wrap around objects with simple geometries providing form closure. A prototype of this mechanism was built to evaluate these features.


Author[s]: David J. Bennett

The Control of Human Arm Movement Models and Mechanical Constraints

May 1990

A serial-link manipulator may form a mobile closed kinematic chain when interacting with the environment, if it is redundant with respect to the task degrees of freedom (DOFs) at the endpoint. If the mobile closed chain assumes a number of configurations, then loop consistency equations permit the manipulator and task kinematics to be calibrated simultaneously using only the joint angle readings; endpoint sensing is not required. Example tasks include a fixed endpoint (0 DOF task), the opening of a door (1 DOF task), and point contact (3 DOF task). Identifiability conditions are derived for these various tasks.


Author[s]: Ellen Spertus

Dataflow Computation for the J-Machine

May 1990

The dataflow model of computation exposes and exploits parallelism in programs without requiring programmer annotation; however, instruction- level dataflow is too fine-grained to be efficient on general-purpose processors. A popular solution is to develop a "hybrid'' model of computation where regions of dataflow graphs are combined into sequential blocks of code. I have implemented such a system to allow the J- Machine to run Id programs, leaving exposed a high amount of parallelism --- such as among loop iterations. I describe this system and provide an analysis of its strengths and weaknesses and those of the J-Machine, along with ideas for improvement.


Author[s]: Jeff F. Tabor

Noise Reduction Using Low Weight and Constant Weight Coding Techniques

May 1990

Signalling off-chip requires significant current. As a result, a chip's power-supply current changes drastically during certain output-bus transitions. These current fluctuations cause a voltage drop between the chip and circuit board due to the parasitic inductance of the power-supply package leads. Digital designers often go to great lengths to reduce this "transmitted" noise. Cray, for instance, carefully balances output signals using a technique called differential signalling to guarantee a chip has constant output current. Transmitted-noise reduction costs Cray a factor of two in output pins and wires. Coding achieves similar results at smaller costs.


Author[s]: Patrick H. Winston and Satayjit Rao

Repairing Learned Knowledge Using Experience

May 1990

Explanation-based learning occurs when something useful is retained from an explanation, usually an account of how some particular problem can be solved given a sound theory. Many real-world explanations are not based on sound theory, however, and wrong things may be learned accidentally, as subsequent failures will likely demonstrate. In this paper, we describe ways to isolate the facts that cause failures, ways to explain why those facts cause problems, and ways to repair learning mistakes. In particular, our program learns to distinguish pails from cups after making a few mistakes.


Author[s]: Anita Flynn (editor)

Olympic Robot Building Manual

December 1988

The 1989 AI Lab Winter Olympics will take a slightly different twist from previous Olympiads. Although there will still be a dozen or so athletic competitions, the annual talent show finale will now be a display not of human talent, but of robot talent. Spurred on by the question, "Why aren't there more robots running around the AI Lab?", Olympic Robot Building is an attempt to teach everyone how to build a robot and get them started. Robot kits will be given out the last week of classes before the Christmas break and teams have until the Robot Talent Show, January 27th, to build a machine that intelligently connects perception to action. There is no constraint on what can be built; participants are free to pick their own problems and solution implementations. As Olympic Robot Building is purposefully a talent show, there is no particular obstacle course to be traversed or specific feat to be demonstrated. The hope is that this format will promote creativity, freedom and imagination. This manual provides a guide to overcoming all the practical problems in building things. What follows are tutorials on the components supplied in the kits: a microprocessor circuit "brain", a variety of sensors and motors, a mechanical building block system, a complete software development environment, some example robots and a few tips on debugging and prototyping. Parts given out in the kits can be used, ignored or supplemented, as the kits are designed primarily to overcome the intertia of getting started. If all goes well, then come February, there should be all kinds of new members running around the AI Lab!


Author[s]: David Jerome Braunegg

MARVEL: A System for Recognizing World Locations with Stereo Vision

June 1990

To use a world model, a mobile robot must be able to determine its own position in the world. To support truly autonomous navigation, I present MARVEL, a system that builds and maintains its own models of world locations and uses these models to recognize its world position from stereo vision input. MARVEL is designed to be robust with respect to input errors and to respond to a gradually changing world by updating its world location models. I present results from real- world tests of the system that demonstrate its reliability. MARVEL fits into a world modeling system under development.


Author[s]: Maja J. Mataric

A Distributed Model for Mobile Robot Environment-Learning and Navigation

May 1990

A distributed method for mobile robot navigation, spatial learning, and path planning is presented. It is implemented on a sonar- based physical robot, Toto, consisting of three competence layers: 1) Low-level navigation: a collection of reflex-like rules resulting in emergent boundary-tracing. 2) Landmark detection: dynamically extracts landmarks from the robot's motion. 3) Map learning: constructs a distributed map of landmarks. The parallel implementation allows for localization in constant time. Spreading of activation computes both topological and physical shortest paths in linear time. The main issues addressed are: distributed, procedural, and qualitative representation and computation, emergent behaviors, dynamic landmarks, minimized communication.


Author[s]: Rodney A. Brooks

The Behavior Language; User's Guide

April 1990

The Behavior Language is a rule-based real- time parallel robot programming language originally based on ideas from [Brooks 86], [Connell 89], and [Maes 89]. It compiles into a modified and extended version of the subsumption architecture [Brooks 86] and thus has backends for a number of processors including the Motorola 68000 and 68HCll, the Hitachi 6301, and Common Lisp. Behaviors are groups of rules which are activatable by a number of different schemes. There are no shared data structures across behaviors, but instead all communication is by explicit message passing. All rules are assumed to run in parallel and asynchronously. It includes the earlier notions of inhibition and suppression, along with a number of mechanisms for spreading of activation.


Author[s]: W. Eric L. Grimson

The Effect of Indexing on the Complexity of Object Recognition

April 1990

Many current recognition systems use constrained search to locate objects in cluttered environments. Previous formal analysis has shown that the expected amount of search is quadratic in the number of model and data features, if all the data is known to come from a sinlge object, but is exponential when spurious data is included. If one can group the data into subsets likely to have come from a single object, then terminating the search once a "good enough" interpretation is found reduces the expected search to cubic. Without successful grouping, terminated search is still exponential. These results apply to finding instances of a known object in the data. In this paper, we turn to the problem of selecting models from a library, and examine the combinatorics of determining that a candidate object is not present in the data. We show that the expected search is again exponential, implying that naïve approaches to indexing are likely to carry an expensive overhead, since an exponential amount of work is needed to week out each of the incorrect models. The analytic results are shown to be in agreement with empirical data for cluttered object recognition.


Author[s]: Andre DeHon, Tom Knight and Marvin Minsky

Fault-Tolerant Design for Multistage Routing Networks

April 1990

As the size of digital systems increases, the mean time between single component failures diminishes. To avoid component related failures, large computers must be fault-tolerant. In this paper, we focus on methods for achieving a high degree of fault- tolerance in multistage routing networks. We describe a multipath scheme for providing end-to-end fault-tolerance on large networks. The scheme improves routing performance while keeping network latency low. We also describe the novel routing component, RN1, which implements this scheme, showing how it can be the basic building block for fault- tolerant multistage routing networks.


Author[s]: Andre DeHon

Fat-Tree Routing for Transit

February 1990

The Transit network provides high-speed, low-latency, fault-tolerant interconnect for high-performance, multiprocessor computers. The basic connection scheme for Transit uses bidelta style, multistage networks to support up to 256 processors. Scaling to larger machines by simply extending the bidelta network topology will result in a uniform degradation of network latency between all processors. By employing a fat- tree network structure in larger systems, the network provides locality and universality properties which can help minimize the impact of scaling on network latency. This report details the topology and construction issues associated with integrating Transit routing technology into fat-tree interconnect topologies.


Author[s]: William Singhose

Shaping Inputs to Reduce Vibration: A Vector Diagram Approach

March 1990

This paper describes a method for limiting vibration in flexible systems by shaping the system inputs. Unlike most previous attempts at input shaping, this method does not require an extensive system model or lengthy numerical computation; only knowledge of the system natural frequency and damping ratio are required. The effectiveness of this method when there are errors in the system model is explored and quantified. An algorithm is presented which, given an upper bound on acceptable residual vibration amplitude, determines a shaping strategy that is insensitive to errors in the estimated natural frequency. A procedure for shaping inputs to systems with input constraints is outlined. The shaping method is evaluated by dynamic simulations and hardware experiments.


Author[s]: Federico Girosi, Tomaso Poggio and Bruno Caprile

Extensions of a Theory of Networks for Approximation and Learning: Outliers and Negative Examples

July 1990

Learning an input-output mapping from a set of examples can be regarded as synthesizing an approximation of a multi-dimensional function. From this point of view, this form of learning is closely related to regularization theory. In this note, we extend the theory by introducing ways of dealing with two aspects of learning: learning in the presence of unreliable examples and learning from positive and negative examples. The first extension corresponds to dealing with outliers among the sparse data. The second one corresponds to exploiting information about points or regions in the range of the function that are forbidden.


Author[s]: J. Brian Subirana-Vilanova and Whitman Richards

Perceptual Organization, Figure-Ground, Attention and Saliency

August 1991

Notions of figure-ground, inside-outside are difficult to define in a computational sense, yet seem intuitively meaningful. We propose that "figure" is an attention-directed region of visual information processing, and has a non- discrete boundary. Associated with "figure" is a coordinate frame and a "frame curve" which helps initiate the shape recognition process by selecting and grouping convex image chunks for later matching- to-model. We show that human perception is biased to see chunks outside the frame as more salient than those inside. Specific tasks, however, can reverse this bias. Near/far, top/bottom and expansion/contraction also behave similarly.


Author[s]: Elizabeth Bradley

Causes and Effects of Chaos

December 1990

Most of the recent literature on chaos and nonlinear dynamics is written either for popular science magazine readers or for advanced mathematicians. This paper gives a broad introduction to this interesting and rapidly growing field at a level that is between the two. The graphical and analytical tools used in the literature are explained and demonstrated, the rudiments of the current theory are outlined and that theory is discussed in the context of several examples: an electronic circuit, a chemical reaction and a system of satellites in the solar system.


Author[s]: David McAllester

Automatic Recognition of Tractability in Inference Relations

February 1990

A procedure is given for recognizing sets of inference rules that generate polynomial time decidable inference relations. The procedure can automatically recognize the tractability of the inference rules underlying congruence closure. The recognition of tractability for that particular rule set constitutes mechanical verification of a theorem originally proved independently by Kozen and Shostak. The procedure is algorithmic, rather than heuristic, and the class of automatically recognizable tractable rule sets can be precisely characterized. A series of examples of rule sets whose tractability is non-trivial, yet machine recognizable, is also given. The technical framework developed here is viewed as a first step toward a general theory of tractable inference relations.


Author[s]: Nancy S. Pollard

The Grasping Problem: Toward Task-Level Programming for an Articulated Hand

May 1990

This report presents a system for generating a stable, feasible, and reachable grasp of a polyhedral object. A set of contact points on the object is found that can result in a stable grasp; a feasible grasp is found in which the robot contacts the object at those contact points; and a path is constructed from the initial configuration of the robot to the stable, feasible final grasp configuration. The algorithm described in the report is designed for the Salisbury hand mounted on a Puma 560 arm, but a similar approach could be used to develop grasping systems for other robots.


Author[s]: Manfred Fahle

Parallel Computation of Vernier Offsets, Curvature and Chevrons in Humans

December 1989

A vernier offset is detected at once among straight lines, and reaction times are almost independent of the number of simultaneously presented stimuli (distractors), indicating parallel processing of vernier offsets. Reaction times for identifying a vernier offset to one side among verniers offset to the opposite side increase with the number of distractors, indicating serial processing. Even deviations below a photoreceptor diameter can be detected at once. The visual system thus attains positional accuracy below the photoreceptor diameter simultaneously at different positions. I conclude that deviation from straightness, or change of orientation, is detected in parallel over the visual field. Discontinuities or gradients in orientation may represent an elementary feature of vision.


Author[s]: Manfred Fahle

Limits of Precision for Human Eye Motor Control

November 1989

Dichoptic presentation of vernier stimuli, i.e., one segment to each eye, yielded three times higher thresholds than binocular presentation, mainly due to uncorrelated movements of both eyes. Thresholds allow one to calculate an upper estimate for the amplitudes of uncorrelated eye movements during fixation. This estimate matches the best results from direct eye position recording, with the calculated mean amplitude of eye tremor corresponding to roughly one photoreceptor diameter. The combined amplitude of both correlated and uncorrelated eye movements was also measured by delaying one segment of the vernier relative to its partner under monocular or dichoptic conditions.


Author[s]: Manfred Fahle and Tom Troscianko

Computation of Texture and Stereoscopic Depth in Humans

October 1989

The computation of texture and of stereoscopic depth is limited by a number of factors in the design of the optical front-end and subsequent processing stages in humans and machines. A number of limiting factors in the human visual system, such as resolution of the optics and opto-electronic interface, contrast, luminance, temporal resolution and eccentricity are reviewed and evaluated concerning their relevance for the recognition of texture and stereoscopic depth. The algorithms used by the human brain to discriminate between textures and to compute stereoscopic depth are very fast and efficient. Their study might be beneficial for the development of better algorithms in machine vision.


Author[s]: Howard B. Reubenstein

Automated Acquisition of Evolving Informal Descriptions

June 1990

The Listener is an automated system that unintrusively performs knowledge acquisition from informal input. The Listener develops a coherent internal representation of a description from an initial set of disorganized, imprecise, incomplete, ambiguous, and possibly inconsistent statements. The Listener can produce a summary document from its internal representation to facilitate communication, review, and validation. A special purpose Listener, called the Requirements Apprentice (RA), has been implemented in the software requirements acquisition domain. Unlike most other requirements analysis tools, which start from a formal description language, the focus of the RA is on the transition between informal and formal specifications.


Author[s]: David Chapman

Vision, Instruction, and Action

April 1990

This thesis describes Sonja, a system which uses instructions in the course of visually- guided activity. The thesis explores an integration of research in vision, activity, and natural language pragmatics. Sonja's visual system demonstrates the use of several intermediate visual processes, particularly visual search and routines, previously proposed on psychophysical grounds. The computations Sonja performs are compatible with the constraints imposed by neuroscientifically plausible hardware. Although Sonja can operate autonomously, it can also make flexible use of instructions provided by a human advisor. The system grounds its understanding of these instructions in perception and action.


Author[s]: Joachim Heel

Direct Estimation of Structure and Motion from Multiple Frames

March 1990

This paper presents a method for the estimation of scene structure and camera motion from a sequence of images. This approach is fundamentally new. No computation of optical flow or feature correspondences is required. The method processes image sequences of arbitrary length and exploits the redundancy for a significant reduction in error over time. No assumptions are made about camera motion or surface structure. Both quantities are fully recovered. Our method combines the "direct'' motion vision approach with the theory of recursive estimation. Each step is illustrated and evaluated with results from real images.


Author[s]: Feng Zhao

Machine Recognition as Representation and Search

December 1989

Generality, representation, and control have been the central issues in machine recognition. Model-based recognition is the search for consistent matches of the model and image features. We present a comparative framework for the evaluation of different approaches, particularly those of ACRONYM, RAF, and Ikeuchi et al. The strengths and weaknesses of these approaches are discussed and compared and the remedies are suggested. Various tradeoffs made in the implementations are analyzed with respect to the systems' intended task-domains. The requirements for a versatile recognition system are motivated. Several directions for future research are pointed out.


Author[s]: M. Ali Taalebinezhaad

Direct Recovery of Motion and Shape in the General Case by Fixation

March 1990

This work introduces a direct method called FIXATION for solving the general motion vision problem. This Fixation method results in a constraint equation between translational and rotational velocities that in combination with the Brightness-Change Constraint Equation (BCCE) solves the general motion vision problem, arbitrary motion with respect to an arbitrary rigid environment. Neither Correspondence nor Optical Flow has been used here. Recently Direct Motion Vision methods have used the BCCE for solving the motion vision problem of special motions or environments. In contrast to those solutions, the Fixation method does not put such severe restrictions on the motion or the environment.


Author[s]: David J. Braunegg

Location Recognition Using Stereo Vision

October 1989

A mobile robot must be able to determine its own position in the world. To support truly autonomous navigation, we present a system that builds and maintains its own models of world locations and uses these models to recognize its world position from stereo vision input. The system is designed to be robust with respect to input errors and to respond to a gradually changing world by updating the world location models. We present results from tests of the system that demonstrate its reliability. The model builder and recognition system fit into a planned world modeling system that we describe.


Author[s]: David J. Braunegg

An Alternative to Using the 3D Delaunay Tessellation for Representing Freespace

September 1989

Representing the world in terms of visible surfaces and the freespacesexisting between these surfaces and the viewer is an important problemsin robotics. Recently, researchers have proposed using the 3DsDelaunay Tessellation for representing 3D stereo vision data and thesfreespace determined therefrom. We discuss problems with using thes3D Delaunay Tessellation as the basis of the representation andspropose an alternative representation that we are currentlysinvestigating. This new representation is appropriate for planningsmobile robot navigation and promises to be robust when using stereosdata that has errors and uncertainty.


Author[s]: David J. Braunegg

Stereo Feature Matching in Disparity Space

September 1989

This paper describes a new method for matching, validating, and disambiguating features for stereo vision. It is based on the Marr-Poggio- Grimson stereo matching algorithm which uses zero-crossing contours in difference-of-Gaussian filtered images as features. The matched contours are represented in disparity space, which makes the information needed for matched contour validation and disambiguation easily accessible. The use of disparity space also makes the algorithm conceptually cleaner than previous implementations of the Marr- Poggio-Grimson algorithm and yields a more efficient matching process.


Author[s]: Andrew Trice and Randall Davis

Consensus Knowledge Acquisition

December 1989

We have developed a method and prototype program for assisting two experts in their attempts to construct a single, consensus knowledge base. We show that consensus building can be effectively facilitated by a debugging approach that identifies, explains, and resolves discrepancies in their knowledge. To implement this approach we identify and use recognition and repair procedures for a variety of discrepancies. Examples of this knowledge are illustrated\ with sample transcripts from CARTER, a system for reconciling two rule-based systems. Implications for resolving other kinds of knowledge representations are also examined.


Author[s]: Rodney A. Brooks and Anita M. Flynn

Fast, Cheap and Out of Control

December 1989

Spur-of-the-moment planetary exploration missions are within our reach. Complex systems and complex missions usually take years of planning and force launches to become incredibly expensive. We argue here for cheap, fast missions using large numbers of mass produced simple autonomous robots that are small by today's standards, perhaps 1 to 2kg. We suggest that within a few years it will be possible, at modest cost, to invade a planet with millions of tiny robots.


Author[s]: Shimon Edelman and Tomaso Poggio

Bringing the Grandmother Back into the Picture: A Memory-Based View of Object Recognition

April 1990

We describe experiments with a versatile pictorial prototype based learning scheme for 3D object recognition. The GRBF scheme seems to be amenable to realization in biophysical hardware because the only kind of computation it involves can be effectively carried out by combining receptive fields. Furthermore, the scheme is computationally attractive because it brings together the old notion of a "grandmother'' cell and the rigorous approximation methods of regularization and splines.


Author[s]: Pattie Maes

How to Do the Right Thing

October 1989

This paper presents a novel approach to the problem of action selection for an autonomous agent. An agent is viewed as a collection of competence modules. Action selection is modeled as an emergent property of an activation/inhibition dynamics among these modules. A concrete action selection algorithm is presented and a detailed account of the results is given. This algorithm combines characteristics of both traditional planners and reactive systems: it produces fast and robust activity in a tight interaction loop with the environment, while at the same time allowing for some prediction and planning to take place.


Author[s]: Marc H. Raibert, H. Benjamin Brown, Jr., Michael Chepponis, Jeff Koechling, Jessica K. Hodgins, Diane Dustman, W. Kevin Brennan, David S. Barrett, Clay M. Thompson, John Daniell Hebert, Woojin Lee and Lance Borvansky

Dynamically Stable Legged Locomotion (September 1985-Septembers1989)

September 1989

This report documents our work in exploring active balance for dynamic legged systems for the period from September 1985 through September 1989. The purpose of this research is to build a foundation of knowledge that can lead both to the construction of useful legged vehicles and to a better understanding of animal locomotion. In this report we focus on the control of biped locomotion, the use of terrain footholds, running at high speed, biped gymnastics, symmetry in running, and the mechanical design of articulated legs.


Author[s]: Eric Sven Ristad and Robert C. Berwick

Computational Consequences of Agreement and Ambiguity in Natural Language

November 1988

The computer science technique of computational complexity analysis can provide powerful insights into the algorithm- neutral analysis of information processing tasks. Here we show that a simple, theory- neutral linguistic model of syntactic agreement and ambiguity demonstrates that natural language parsing may be computationally intractable. Significantly, we show that it may be syntactic features rather than rules that can cause this difficulty. Informally, human languages and the computationally intractable Satisfiability (SAT) problem share two costly computional mechanisms: both enforce agreement among symbols across unbounded distances (Subject-Verb agreement) and both allow ambiguity (is a word a Noun or a Verb?).


Author[s]: David W. Jacobs

Grouping For Recognition

November 1989

This paper presents a new method of grouping edges in order to recognize objects. This grouping method succeeds on images of both two- and three- dimensional objects. So that the recognition system can consider first the collections of edges most likely to lead to the correct recognition of objects, we order groups of edges based on the likelihood that a single object produced them. The grouping module estimates this likelihood using the distance that separates edges and their relative orientation. This ordering greatly reduces the amount of computation required to locate objects and improves the system's robustness to error.


Author[s]: David McAllester and Robert Givan

Natural Language Syntax and First Order Preference

October 1989

We have argued elsewhere that first order inference can be made more efficient by using non-standard syntax for first order logic. In this paper we show how a fragment of English syntax under Montague semantics provides the foundation of a new inference procedure. This procedure seems more effective than corresponding procedures based on either classical syntax of our previously proposed taxonomic syntax. This observation may provide a functional explanation for some of the syntactic structure of English.


Author[s]: Henrich Bulthoff and Manfred Fahle

Disparity Gradients and Depth Scaling

September 1989

The binocular perception of shape and depth relations between objects can change considerably if the viewing direction is changed only by a small angle. We explored this effect psychophysically and found a strong depth reduction effect for large disparity gradients. The effect is found to be strongest for horizontally oriented stimuli, and stronger for line stimuli than for points. This depth scaling effect is discussed in a computational framework of stereo based on a Baysian approach which allows integration of information from different types of matching primitives weighted according to their robustness.


Author[s]: Harold Abelson

The Bifurcation Interpreter: A Step Towards the Automatic Analysis of Dynamical Systems

September 1989

The Bifurcation Interpreter is a computer program that autonomously explores the steady-state orbits of one-parameter families of periodically- driven oscillators. To report its findings, the Interpreter generates schematic diagrams and English text descriptions similar to those appearing in the science and engineering research literature. Given a system of equations as input, the Interpreter uses symbolic algebra to automatically generate numerical procedures that simulate the system. The Interpreter incorporates knowledge about dynamical systems theory, which it uses to guide the simulations, to interpret the results, and to minimize the effects of numerical error.


Author[s]: Ed Gamble

A Comparison of Hardware Implementations for Low-Level Vision Algorithms

November 1989

Early and intermediate vision algorithms, such as smoothing and discontinuity detection, are often implemented on general- purpose serial, and more recently, parallel computers. Special-purpose hardware implementations of low-level vision algorithms may be needed to achieve real- time processing. This memo reviews and analyzes some hardware implementations of low-level vision algorithms. Two types of hardware implementations are considered: the digital signal processing chips of Ruetz (and Broderson) and the analog VLSI circuits of Carver Mead. The advantages and disadvantages of these two approaches for producing a general, real-time vision system are considered.


Author[s]: Michael Eisenberg

Descriptive Simulation: Combining Symbolic and Numerical Methods in the Analysis of Chemical Reaction Mechanisms

September 1989

The Kineticist's Workbench is a computer program currently under development whose purpose is to help chemists understand, analyze, and simplify complex chemical reaction mechanisms. This paper discusses one module of the program that numerically simulates mechanisms and constructs qualitative descriptions of the simulation results. These descriptions are given in terms that are meaningful to the working chemist (e.g., steady states, stable oscillations, and so on); and the descriptions (as well as the data structures used to construct them) are accessible as input to other programs.


Author[s]: Eric Sven Ristad

Computational Structure of GPSG Models: Revised Generalized Phrase Structure Grammar

September 1989

The primary goal of this report is to demonstrate how considerations from computational complexity theory can inform grammatical theorizing. To this end, generalized phrase structure grammar (GPSG) linguistic theory is revised so that its power more closely matches the limited ability of an ideal speaker--hearer: GPSG Recognition is EXP-POLY time hard, while Revised GPSG Recognition is NP-complete. A second goal is to provide a theoretical framework within which to better understand the wide range of existing GPSG models, embodied in formal definitions as well as in implemented computer programs. A grammar for English and an informal explanation of the GPSG/RGPSG syntactic features are included in appendices.


Author[s]: Tomaso Poggio and Federico Girosi

Continuous Stochastic Cellular Automata that Have a Stationary Distribution and No Detailed Balance

December 1990

Marroquin and Ramirez (1990) have recently discovered a class of discrete stochastic cellular automata with Gibbsian invariant measures that have a non-reversible dynamic behavior. Practical applications include more powerful algorithms than the Metropolis algorithm to compute MRF models. In this paper we describe a large class of stochastic dynamical systems that has a Gibbs asymptotic distribution but does not satisfy reversibility. We characterize sufficient properties of a sub-class of stochastic differential equations in terms of the associated Fokker-Planck equation for the existence of an asymptotic probability distribution in the system of coordinates which is given. Practical implications include VLSI analog circuits to compute coupled MRF models.


Author[s]: Tomaso Poggio and Federico Girosi

Extensions of a Theory of Networks for Approximation and Learning: Dimensionality Reduction and Clustering

April 1990

The theory developed in Poggio and Girosi (1989) shows the equivalence between regularization and a class of three-layer networks that we call regularization networks or Hyper Basis Functions. These networks are also closely related to the classical Radial Basis Functions used for interpolation tasks and to several pattern recognition and neural network algorithms. In this note, we extend the theory by defining a general form of these networks with two sets of modifiable parameters in addition to the coefficients $c_\ alpha$: moving centers and adjustable norm- weight.


Author[s]: Bonnie J. Dorr

Conceptual Basis of the Lexicon in Machine Translation

August 1989

This report describes the organization and content of lexical information required for the task of machine translation. In particular, the lexical-conceptual basis for UNITRAN, an implemented machine translation system, will be described. UNITRAN uses an underlying form called lexical conceptual structure to perform lexical selection and syntactic realization. Lexical word entries have two levels of description: the first is an underlying lexical-semantic representation that is derived from hierarchically organized primitives, and the second is a mapping from this representation to a corresponding syntactic structure. The interaction of these two levels will be discussed and the lexical selection and syntactic realization processes will be described.


Author[s]: Brian LaMacchia and Jason Nieh

The Standard Map Machine

September 1989

We have designed the Standard Map Machine(SMM) as an answer to the intensive computational requirements involved in the study of chaotic behavior in nonlinear systems. The high-speed and high-precision performance of this computer is due to its simple architecture specialized to the numerical computations required of nonlinear systems. In this report, we discuss the design and implementation of this special-purpose machine.


Author[s]: Federico Girosi and Tomaso Poggio

Networks and the Best Approximation Property

October 1989

Networks can be considered as approximation schemes. Multilayer networks of the backpropagation type can approximate arbitrarily well continuous functions (Cybenko, 1989; Funahashi, 1989; Stinchcombe and White, 1989). We prove that networks derived from regularization theory and including Radial Basis Function (Poggio and Girosi, 1989), have a similar property. From the point of view of approximation theory, however, the property of approximating continous functions arbitrarily well is not sufficient for characterizing good approximation schemes. More critical is the property of best approximation. The main result of this paper is that multilayer networks, of the type used in backpropagation, are not best approximation. For regularization networks (in particular Radial Basis Function networks) we prove existence and uniqueness of best approximation.


Author[s]: Kenneth Man-Kam Yip

KAM: Automatic Planning and Interpretation of Numerical Experiments Using Geometrical Methods

August 1989

KAM is a computer program that can automatically plan, monitor, and interpret numerical experiments with Hamiltonian systems with two degrees of freedom. The program has recently helped solve an open problem in hydrodynamics. Unlike other approaches to qualitative reasoning about physical system dynamics, KAM embodies a significant amount of knowledge about nonlinear dynamics. KAM's ability to control numerical experiments arises from the fact that it not only produces pictures for us to see, but also looks at (sic---in its mind's eye) the pictures it draws to guide its own actions. KAM is organized in three semantic levels: orbit recognition, phase space searching, and parameter space searching. Within each level spatial properties and relationships that are not explicitly represented in the initial representation are extracted by applying three operations ---(1) aggregation, (2) partition, and (3) classification--- iteratively.


Author[s]: Jean-Pierre Schott

Three-Dimensional Motion Estimation Using Shading Information in Multiple Frames

September 1989

A new formulation for recovering the structure and motion parameters of a moving patch using both motion and shading information is presented. It is based on a new differential constraint equation (FICE) that links the spatiotemporal gradients of irradiance to the motion and structure parameters and the temporal variations of the surface shading. The FICE separates the contribution to the irradiance spatiotemporal gradients of the gradients due to texture from those due to shading and allows the FICE to be used for textured and textureless surface. The new approach, combining motion and shading information, leads directly to two different contributions: it can compensate for the effects of shading variations in recovering the shape and motion; and it can exploit the shading/illumination effects to recover motion and shape when they cannot be recovered without it. The FICE formulation is also extended to multiple frames.


Author[s]: Lyle J. Borg-Graham

Modelling the Somantic Electrical Response of Hippocampal Pyramidal Neurons

September 1989

A modeling study of hippocampal pyramidal neurons is described. This study is based on simulations using HIPPO, a program which simulates the somatic electrical activity of these cells. HIPPO is based on a) descriptions of eleven non-linear conductances that have been either reported for this class of cell in the literature or postulated in the present study, and b) an approximation of the electrotonic structure of the cell that is derived in this thesis, based on data for the linear properties of these cells. HIPPO is used a) to integrate empirical data from a variety of sources on the electrical characteristics of this type of cell, b) to investigate the functional significance of the various elements that underly the electrical behavior, and c) to provide a tool for the electrophysiologist to supplement direct observation of these cells and provide a method of testing speculations regarding parameters that are not accessible.


Author[s]: Bonnie J. Dorr

Lexical Conceptual Structure and Generation in Machine Translation

June 1989

This report introduces an implemented scheme for generating target- language sentences using a compositional representation of meaning called lexical conceptual structure. Lexical conceptual structure facilitates two crucial operations associated with generation: lexical selection and syntactic realization. The compositional nature of the representation is particularly valuable for these two operations when semantically equivalent source-and-target-language words and phrases are structurally or thematically divergent. To determine the correct lexical items and syntactic realization associated with the surface form in such cases, the underlying lexical-semantic forms are systematically mapped to the target-language syntactic structures. The model described constitutes a lexical-semantic extension to UNITRAN.


Author[s]: Shimon Edelman and Daphna Weinshall

Computational Vision: A Critical Review

October 1989

We review the progress made in computational vision, as represented by Marr's approach, in the last fifteen years. First, we briefly outline computational theories developed for low, middle and high-level vision. We then discuss in more detail solutions proposed to three representative problems in vision, each dealing with a different level of visual processing. Finally, we discuss modifications to the currently established computational paradigm that appear to be dictated by the recent developments in vision.


Author[s]: Thomas Marill

Recognizing Three-Dimensional Objects without the Use of Models

September 1989

We present an approach to the problem of recognizing three-dimensional objects from line-drawings. In this approach there are no models. The system needs only to be given a single picture of an object; it can then recognize the object in arbitrary orientations.


Author[s]: Sandiway Fong

Free Indexation: Combinatorial Analysis and a Compositional Algorithm

December 1989

In the principles-and-parameters model of language, the principle known as "free indexation'' plays an important part in determining the referential properties of elements such as anaphors and pronominals. This paper addresses two issues. (1) We investigate the combinatorics of free indexation. In particular, we show that free indexation must produce an exponential number of referentially distinct structures. (2) We introduce a compositional free indexation algorithm. We prove that the algorithm is "optimal.'' More precisely, by relating the compositional structure of the formulation to the combinatorial analysis, we show that the algorithm enumerates precisely all possible indexings, without duplicates.


Author[s]: Michael A. Erdmann

On Probabilistic Strategies for Robot Tasks

August 1989

Robots must act purposefully and successfully in an uncertain world. Sensory information is inaccurate or noisy, actions may have a range of effects, and the robot’s environment is only partially and imprecisely modeled. This thesis introduces active randomization by a robot, both in selecting actions to execute and in focusing on sensory information to interpret, as a basic tool for overcoming uncertainty. An example of randomization is given by the strategy of shaking a bin containing a part in order to orient the part in a desired stable state with some high probability. Another example consists of first using reliable sensory information to bring two parts close together, then relying on short random motions to actually mate the two parts, once the part motions lie below the available sensing resolution. Further examples include tapping parts that are tightly wedged, twirling gears before trying to mesh them, and vibrating parts to facilitate a mating operation.


Author[s]: Anya C. Hurlbert

The Computation of Color

September 1989

This thesis takes an interdisciplinary approach to the study of color vision, focussing on the phenomenon of color constancy formulated as a computational problem. The primary contributions of the thesis are (1) the demonstration of a formal framework for lightness algorithms; (2) the derivation of a new lightness algorithm based on regularization theory; (3) the synthesis of an adaptive lightness algorithm using "learning" techniques; (4) the development of an image segmentation algorithm that uses luminance and color information to mark material boundaries; and (5) an experimental investigation into the cues that human observers use to judge the color of the illuminant. Other computational approaches to color are reviewed and some of their links to psychophysics and physiology are explored.


Author[s]: Andrew Dean Christian

Design and Implementation of a Flexible Robot

August 1989

This robot has low natural frequencies of vibration. Insights into the problems of designing joint and link flexibility are discussed. The robot has three flexible rotary actuators and two flexible, interchangeable links, and is controlled by three independent processors on a VMEbus. Results from experiments on the control of residual vibration for different types of robot motion are presented. Impulse prefiltering and slowly accelerating moves are compared and shown to be effective at reducing residual vibration.


Author[s]: Shimon Ullman and Ronen Basri

Recognition by Linear Combinations of Models

August 1989

Visual object recognition requires the matching of an image with a set of models stored in memory. In this paper we propose an approach to recognition in which a 3-D object is represented by the linear combination of 2-D images of the object. If M = {M1,…Mk} is the set of pictures representing a given object, and P is the 2-D image of an object to be recognized, then P is considered an instance of M if P = Eki=aiMi for some constants ai. We show that this approach handles correctly rigid 3-D transformations of objects with sharp as well as smooth boundaries, and can also handle non-rigid transformations. The paper is divided into two parts. In the first part we show that the variety of views depicting the same object under different transformations can often be expressed as the linear combinations of a small number of views. In the second part we suggest how this linear combinatino property may be used in the recognition process.


Author[s]: Jonathan Connell

A Colony Architecture for an Artificial Creature

September 1989

This report describes a working autonomous mobile robot whose only goal is to collect and return empty soda cans. It operates in an unmodified office environment occupied by moving people. The robot is controlled by a collection of over 40 independent "behaviors'' distributed over a loosely coupled network of 24 processors. Together this ensemble helps the robot locate cans with its laser rangefinder, collect them with its on-board manipulator, and bring them home using a compass and an array of proximity sensors. We discuss the advantages of using such a multi-agent control system and show how to decompose the required tasks into component activities. We also examine the benefits and limitations of spatially local, stateless, and independent computation by the agents.


Author[s]: Anita M. Flynn and Rodney A. Brooks

Battling Reality

October 1989

In the four years that the MIT Mobile Robot Project has benn in existence, we have built ten robots that focus research in various areas concerned with building intelligent systems. Towards this end, we have embarked on trying to build useful autonomous creatures that live and work in the real world. Many of the preconceived notions entertained before we started building our robots turned out to be misguided. Some issues we thought would be hard have worked successfully from day one and subsystems we imagined to be trivial have become tremendous time sinks. Oddly enough, one of our biggest failures has led to some of our favorite successes. This paper describes the changing paths our research has taken due to the lessons learned from the practical realities of building robots.


Author[s]: Shimon Edelman and Daphna Weinshall

A Self-Organizing Multiple-View Representation of 3D Objects

August 1989

We explore representation of 3D objects in which several distinct 2D views are stored for each object. We demonstrate the ability of a two-layer network of thresholded summation units to support such representations. Using unsupervised Hebbian relaxation, we trained the network to recognise ten objects from different viewpoints. The training process led to the emergence of compact representations of the specific input views. When tested on novel views of the same objects, the network exhibited a substantial generalisation capability. In simulated psychophysical experiments, the network's behavior was qualitatively similar to that of human subjects.


Author[s]: Andrew Berlin and Daniel Weise

Compiling Scientific Code Using Partial Evaluation

July 1989

Scientists are faced with a dilemma: either they can write abstract programs that express their understanding of a problem, but which do not execute efficiently; or they can write programs that computers can execute efficiently, but which are difficult to write and difficult to understand. We have developed a compiler that uses partial evaluation and scheduling techniques to provide a solution to this dilemma.


Author[s]: Andrew A. Berlin

A Compilation Strategy for Numerical Programs Based on Partial Evaluation

February 1989

This work demonstrates how partial evaluation can be put to practical use in the domain of high-performance numerical computation. I have developed a technique for performing partial evaluation by using placeholders to propagate intermediate results. For an important class of numerical programs, a compiler based on this technique improves performance by an order of magnitude over conventional compilation techniques. I show that by eliminating inherently sequential data-structure references, partial evaluation exposes the low-level parallelism inherent in a computation. I have implemented several parallel scheduling and analysis programs that study the tradeoffs involved in the design of an architecture that can effectively utilize this parallelism. I present these results using the 9- body gravitational attraction problem as an example.


Author[s]: W. Kenneth Stewart, Jr.

Multisensor Modeling Underwater with Uncertain Information

July 1988

This thesis develops an approach to the construction of multidimensional stochastic models for intelligent systems exploring an underwater environment. It describes methods for building models by a three- dimensional spatial decomposition of stochastic, multisensor feature vectors. New sensor information is incrementally incorporated into the model by stochastic backprojection. Error and ambiguity are explicitly accounted for by blurring a spatial projection of remote sensor data before incorporation. The stochastic models can be used to derive surface maps or other representations of the environment. The methods are demonstrated on data sets from multibeam bathymetric surveying, towed sidescan bathymetry, towed sidescan acoustic imagery, and high-resolution scanning sonar aboard a remotely operated vehicle.


Author[s]: Ellen C. Hildreth, Norberto M. Grzywacz, Edward H. Adelson and Victor K. Inada

The Perceptual Buildup of Three-Dimensional Structure from Motion

August 1989

We present psychophysical experiments that measure the accuracy of perceived 3D structure derived from relative image motion. The experiments are motivated by Ullman's incremental rigidity scheme, which builds up 3D structure incrementally over an extended time. Our main conclusions are: first, the human system derives an accurate model of the relative depths of moving points, even in the presence of noise; second, the accuracy of 3D structure improves with time, eventually reaching a plateau; and third, the 3D structure currently perceived depends on previous 3D models. Through computer simulations, we relate the psychophysical observations to the behavior of Ullman's model.


Author[s]: Tomaso Poggio and Federico Girosi

A Theory of Networks for Appxoimation and Learning

July 1989

Learning an input-output mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multi-dimensional function, that is solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nolinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. We develop a theoretical framework for approximation based on regularization techniques that leads to a class of three-layer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the well-known Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods such as Parzen windows and potential functions and to several neural network algorithms, such as Kanerva's associative memory, backpropagation and Kohonen's topology preserving map. They also have an interesting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.


Author[s]: Jason Nieh

Using Special-Purpose Computing to Examine Chaotic Behavior in Nonlinear Mappings

September 1989

Studying chaotic behavior in nonlinear systems requires numerous computations in order to simulate the behavior of such systems. The Standard Map Machine was designed and implemented as a special computer for performing these intensive computations with high-speed and high- precision. Its impressive performance is due to its simple architecture specialized to the numerical computations required of nonlinear systems. This report discusses the design and implementation of the Standard Map Machine and its use in the study of nonlinear mappings; in particular, the study of the standard map.


Author[s]: Shimon Edelman, Heinrich Bulthoff and Daphna Weinshall

Stimulus Familiarity Determines Recognition Strategy for Novel 3-D Objects

July 1989

We describe a psychophysical investigation of the effects of object complexity and familiarity on the variation of recognition time and recognition accuracy over different views of novel 3D objects. Our findings indicate that with practice the response times for different views become more uniform and the initially orderly dependency of the response time on the distance to a "good" view disappears. One possible interpretation of our results is in terms of a tradeoff between memory needed for storing specific-view representations of objects and time spent in recognizing the objects.


Author[s]: J. Brian Subirana-Vilanova

Curved Inertia Frames: Visual Attention and Perceptual Organization Using Convexity and Symmetry

October 1991

In this paper we present an approach to perceptual organization and attention based on Curved Inertia Frames (C.I.F.), a novel definition of "curved axis of inertia'' tolerant to noisy and spurious data. The definition is useful because it can find frames that correspond to large, smooth, convex, symmetric and central parts. It is novel because it is global and can detect curved axes. We discuss briefly the relation to human perception, the recognition of non-rigid objects, shape description, and extensions to finding "features", inside/outside relations, and long- smooth ridges in arbitrary surfaces.


Author[s]: Thomas Marill

Computer Perception of Three-Dimensional Objects

August 1989

We first pose the following problem: to develop a program which takes line-drawings as input and constructs three-dimensional objects as output, such that the output objects are the same as the ones we see when we look at the input line-drawing. We then introduce the principle of minimum standard- deviation of angles (MSDA) and discuss a program based on MSDA. We present the results of testing this program with a variety of line- drawings and show that the program constitutes a solution to the stated problem over the range of line-drawings tested. Finally, we relate this work to its historical antecedents in the psychological and computer-vision literature.


Author[s]: David McAllester and Robert Givan

Taxonomic Syntax for First-Order Inference

June 1989

Most knowledge representation languages are based on classes and taxonomic relationships between classes. Taxonomic hierarchies without defaults or exceptions are semantically equivalent to a collection of formulas in first order predicate calculus. Although designers of knowledge representation languages often express an intuitive feeling that there must be some advantage to representing facts as taxonomic relationships rather than first order formulas, there are few, if any, technical results supporting this intuition. We attempt to remedy this situation by presenting a taxonomic syntax for first order predicate calculus and a series of theorems that support the claim that taxonomic syntax is superior to classical syntax.


Author[s]: Todd Anthony Cass

Feature Matching for Object Localization in the Presence of Uncertainty

May 1990

We consider the problem of matching model and sensory data features in the presence of geometric uncertainty, for the purpose of object localization and identification. The problem is to construct sets of model feature and sensory data feature pairs that are geometrically consistent given that there is uncertainty in the geometry of the sensory data features. If there is no geometric uncertainty, polynomial-time algorithms are possible for feature matching, yet these approaches can fail when there is uncertainty in the geometry of data features. Existing matching and recognition techniques which account for the geometric uncertainty in features either cannot guarantee finding a correct solution, or can construct geometrically consistent sets of feature pairs yet have worst case exponential complexity in terms of the number of features. The major new contribution of this work is to demonstrate a polynomial-time algorithm for constructing sets of geometrically consistent feature pairs given uncertainty in the geometry of the data features. We show that under a certain model of geometric uncertainty the feature matching problem in the presence of uncertainty is of polynomial complexity. This has important theoretical implications by demonstrating an upper bound on the complexity of the matching problem, an by offering insight into the nature of the matching problem itself. These insights prove useful in the solution to the matching problem in higher dimensional cases as well, such as matching three-dimensional models to either two or three-dimensional sensory data. The approach is based on an analysis of the space of feasible transformation parameters. This paper outlines the mathematical basis for the method, and describes the implementation of an algorithm for the procedure. Experiments demonstrating the method are reported.


Author[s]: Todd A. Cass

Robust 2-D Model-Based Object Recognition

May 1988

Techniques, suitable for parallel implementation, for robust 2D model-based object recognition in the presence of sensor error are studied. Models and scene data are represented as local geometric features and robust hypothesis of feature matchings and transformations is considered. Bounds on the error in the image feature geometry are assumed constraining possible matchings and transformations. Transformation sampling is introduced as a simple, robust, polynomial-time, and highly parallel method of searching the space of transformations to hypothesize feature matchings. Key to the approach is that error in image feature measurement is explicitly accounted for. A Connection Machine implementation and experiments on real images are presented.


Author[s]: Daphna Weinshall

Direct Computation of 3D Shape Invariants and the Focus of Expansion

May 1989

Structure from motion often refers to the computation of 3D structure from a matched sequence of images. However, a depth map of a surface is difficult to compute and may not be a good representation for storage and recognition. Given matched images, I will first show that the sign of the normal curvature in a given direction at a given point in the image can be computed from a simple difference of slopes of line-segments in one image. Using this result, local surface patches can be classified as convex, concave, parabolic (cylindrical), hyperbolic (saddle point) or planar. At the same time the translational component of the optical flow is obtained, from which the focus of expansion can be computed.


Author[s]: Jeffrey Van Baalen

Toward a Theory of Representation Design

May 1989

This research is concerned with designing representations for analytical reasoning problems (of the sort found on the GRE and LSAT). These problems test the ability to draw logical conclusions. A computer program was developed that takes as input a straightforward predicate calculus translation of a problem, requests additional information if necessary, decides what to represent and how, designs representations capturing the constraints of the problem, and creates and executes a LISP program that uses those representations to produce a solution. Even though these problems are typically difficult for theorem provers to solve, the LISP program that uses the designed representations is very efficient.


Author[s]: Anita M. Flynn, Rodney A. Brooks and Lee S. Tavrow

Twilight Zones and Cornerstones: A Gnat Robot Double Feature

July 1989

We want to build tiny gnat-sized robots, a millimeter or two in diameter. They will be cheap, disposable, totally self-contained autonomous agents able to do useful things in the world. This paper consists of two parts. The first describes why we want to build them. The second is a technical outline of how to go about it. Gnat robots are going to change the world.


Author[s]: Michelle Kwok Lee

Summarizing Qualitative Behavior from Measurements of NonlinearsCircuits

May 1989

This report describes a program which automatically characterizes the behavior of any driven, nonlinear, electrical circuit. To do this, the program autonomously selects interesting input parameters, drives the circuit, measures its response, performs a set of numeric computations on the measured data, interprets the results, and decomposes the circuit's parameter space into regions of qualitatively distinct behavior. The output is a two-dimensional portrait summarizing the high-level, qualitative behavior of the circuit for every point in the graph, an accompanying textual explanation describing any interesting patterns observed in the diagram, and a symbolic description of the circuit's behavior which can be passed on to other programs for further analysis.


Author[s]: Michael R. Brent

Causal/Temporal Connectives: Syntax and Lexicon

September 1989

This report elucidates the linguistic representation of temporal relations among events. This involves examining sentences that contain two clauses connected by words like once, by the time, when, and before. Specifically, the effect of the tenses of the connected clauses on the acceptability of sentences are examined. For example, Rachel disappeared once Jon had fallen asleep is fine, but *Rachel had disappeared once Jon fell asleep is unacceptable. A theory of acceptability is developed and its implications for interpretation discussed. Factoring of the linguisitic knowledge into a general, syntactic component and a lexical component clarifies the interpretation problem. Finally, a computer model of the theory is demonstrated.


Author[s]: Anita M. Flynn, Rodney A. Brooks, William M. Wells III and David S. Barrett

SQUIRT: The Prototypical Mobile Robot for Autonomous Graduate Students

July 1989

This paper describes an exercise in building a complete robot aimed at being as small as possible but using off-the-shelf components exclusively. The result is an autonomous mobile robot slightly larger than one cubic inch which incorporates sensing, actuation, onboard computation, and onboard power supplies. Nicknamed Squirt, this robot acts as a 'bug', hiding in dark corners and venturing out in the direction of last heard noises, only moving after the noises are long gone.


Author[s]: Henry M. Wu

A Multiprocessor Architecture Using Modular Arithmetic for Very High Precision Computation

April 1989

We outline a multiprocessor architecture that uses modular arithmetic to implement numerical computation with 900 bits of intermediate precision. A proposed prototype, to be implemented with off-the-shelf parts, will perform high-precision arithmetic as fast as some workstations and mini- computers can perform IEEE double-precision arithmetic. We discuss how the structure of modular arithmetic conveniently maps into a simple, pipelined multiprocessor architecture. We present techniques we developed to overcome a few classical drawbacks of modular arithmetic. Our architecture is suitable to and essential for the study of chaotic dynamical systems.


Author[s]: Michael D. Monegan

An Object-Oriented Software Reuse Tool

April 1989

The Object-oriented Reuse Tool (ORT) supports the reuse of object-oriented software by maintaining a library of reusable classes and recording information about their reusability as well as information associated with their design and verification. In the early design phases of object-oriented development, ORT facilitates reuse by providing a flexible way to navigate the library, thereby aiding in the process of refining a design to maximally reuse existing classes. A collection of extensions to ORT have also been identified. These extensions would compose the remainder of a system useful in increasing reuse in object-oriented software production.


Author[s]: Bror V. H. Saxberg

A Modern Differential Geometric Approach to Shape from Shading

June 1989

How the visual system extracts shape information from a single grey-level image can be approached by examining how the information about shape is contained in the image. This technical report considers the characteristic equations derived by Horn as a dynamical system. Certain image critical points generate dynamical system critical points. The stable and unstable manifolds of these critical points correspond to convex and concave solution surfaces, giving more general existence and uniqueness results. A new kind of highly parallel, robust shape from shading algorithm is suggested on neighborhoods of these critical points. The information at bounding contours in the image is also analyzed.


Author[s]: Andrew Christian

Design Considerations for an Earth-Based Flexible Robotic System

March 1989

This paper provides insights into the problems of designing a robot with joint and link flexibility. The relationship between the deflection of the robot under gravity is correlated with the fundamental frequency of vibration. We consider different types of link geometry and evaluate the flexibility potential of different materials. Some general conclusions and guidelines for constructing a flexible robot are given.


Author[s]: Davi Geiger and Federico Girosi

Parallel and Deterministic Algorithms for MRFs: Surface Reconstruction and Integration

May 1989

In recent years many researchers have investigated the use of Markov random fields (MRFs) for computer vision. The computational complexity of the implementation has been a drawback of MRFs. In this paper we derive deterministic approximations to MRFs models. All the theoretical results are obtained in the framework of the mean field theory from statistical mechanics. Because we use MRFs models the mean field equations lead to parallel and iterative algorithms. One of the considered models for image reconstruction is shown to give in a natural way the graduate non-convexity algorithm proposed by Blake and Zisserman.


Author[s]: Karen Beth Sarachik

Visual Navigation: Constructing and Utilizing Simple Maps of an Indoor Environment

March 1989

The goal of this work is to navigate through an office environmentsusing only visual information gathered from four cameras placed onboard a mobile robot. The method is insensitive to physical changes within the room it is inspecting, such as moving objects. Forward and rotational motion vision are used to find doors and rooms, and these can be used to build topological maps. The map is built without the use of odometry or trajectory integration. The long term goal of the project described here is for the robot to build simple maps of its environment and to localize itself within this framework.


Author[s]: Richard P. Wildes

On Interpreting Stereo Disparity

February 1989

The problems under consideration center around the interpretation of binocular stereo disparity. In particular, the goal is to establish a set of mappings from stereo disparity to corresponding three-dimensional scene geometry. An analysis has been developed that shows how disparity information can be interpreted in terms of three-dimensional scene properties, such as surface depth, discontinuities, and orientation. These theoretical developments have been embodied in a set of computer algorithms for the recovery of scene geometry from input stereo disparity. The results of applying these algorithms to several disparity maps are presented. Comparisons are made to the interpretation of stereo disparity by biological systems.


Author[s]: W. Eric L. Grimson

The Combinatorics of Heuristic Search Termination for Object Recognition in Cluttered Environments

May 1989

Many recognition systems use constrained search to locate objects in cluttered environments. Earlier analysis showed that the expected search is quadratic in the number of model and data features, if all the data comes from one object, but is exponential when spurious data is included. To overcome this, many methods terminate search once an interpretation that is "good enough" is found. We formally examine the combinatorics of this, showing that correct termination procedures dramatically reduce search. We provide conditions on the object model and the scene clutter such that the expected search is quartic. These results are shown to agree with empirical data for cluttered object recognition.


Author[s]: W. Eric L. Grimson and Daniel P. Huttenlocher

On the Verification of Hypothesized Matches in Model-Based Recognition

May 1989

In model-based recognition, ad hoc techniques are used to decide if a match of data to model is correct. Generally an empirically determined threshold is placed on the fraction of model features that must be matched. We rigorously derive conditions under which to accept a match, relating the probability of a random match to the fraction of model features accounted for, as a function of the number of model features, number of image features and the sensor noise. We analyze some existing recognition systems and show that our method yields results comparable with experimental data.


Author[s]: Jeremy M. Wertheimer

Derivation of an Efficient Rule System Pattern Matcher

February 1989

Formalizing algorithm derivations is a necessary prerequisite for developing automated algorithm design systems. This report describes a derivation of an algorithm for incrementally matching conjunctive patterns against a growing database. This algorithm, which is modeled on the Rete matcher used in the OPS5 production system, forms a basis for efficiently implementing a rule system. The highlights of this derivation are: (1) a formal specification for the rule system matching problem, (2) derivation of an algorithm for this task using a lattice-theoretic model of conjunctive and disjunctive variable substitutions, and (3) optimization of this algorithm, using finite differencing, for incrementally processing new data.


Author[s]: Thomas M. Breuel

Indexing for Visual Recognition from a Large Model Base

August 1990

This paper describes a new approach to the model base indexing stage of visual object recognition. Fast model base indexing of 3D objects is achieved by accessing a database of encoded 2D views of the objects using a fast 2D matching algorithm. The algorithm is specifically intended as a plausible solution for the problem of indexing into very large model bases that general purpose vision systems and robots will have to deal with in the future. Other properties that make the indexing algorithm attractive are that it can take advantage of most geometric and non- geometric properties of features without modification, and that it addresses the incremental model acquisition problem for 3D objects.


Author[s]: Jeff Palmucci, Carl Waldsburger, David Duis and Paul Krause

Experience with Acore: Implementing GHC with Actors

August 1990

This paper presents a concurrent interpreter for a general-purpose concurrent logic programming language, Guarded Horn Clauses (GHC). Unlike typical implementations of GHC in logic programming languages, the interpreter is implemented in the Actor language Acore. The primary motivation for this work was to probe the strengths and weaknesses of Acore as a platform for developing sophisticated programs. The GHC interpreter provided a rich testbed for exploring Actor programming methodology. The interpreter is a pedagogical investigation of the mapping of GHC constructs onto the Actor model. Since we opted for simplicity over optimization, the interpreter is somewhat inefficient.


Author[s]: Berthold K.P. Horn

Height and Gradient from Shading

May 1989

The method described here for recovering the shape of a surface from a shaded image can deal with complex, wrinkled surfaces. Integrability can be enforced easily because both surface height and gradient are represented. The robustness of the method stems in part from linearization of the reflectance map about the current estimate of the surface orientation at each picture cell. The new scheme can find an exact solution of a given shape-from-shading problem even though a regularizing term is included. This is a reflection of the fact that shape-from- shading problems are not ill-posed when boundary conditions are available or when the image contains singular points.


Author[s]: Henry M. Wu

Performance Evaluation of the Scheme 86 and HP Precision Architecture

April 1989

The Scheme86 and the HP Precision Architectures represent different trends in computer processor design. The former uses wide micro-instructions, parallel hardware, and a low latency memory interface. The latter encourages pipelined implementation and visible interlocks. To compare the merits of these approaches, algorithms frequently encountered in numerical and symbolic computation were hand-coded for each architecture. Timings were done in simulators and the results were evaluated to determine the speed of each design. Based on these measurements, conclusions were drawn as to which aspects of each architecture are suitable for a high- performance computer.


Author[s]: Richard C. Waters

XP. A Common Lisp Pretty Printing System

March 1989

XP provides efficient and flexible support for pretty printing in Common Lisp. Its single greatest advantage is that it allows the full benefits of pretty printing to be obtained when printing data structures, as well as when printing program code. XP is efficient, because it is based on a linear time algorithm that uses only a small fixed amount of storage. XP is flexible, because users can control the exact form of the output via a set of special format directives. XP can operate on arbitrary data structures, because facilities are provided for specifying pretty printing methods for any type of object. XP also modifies the way abbreviation based on length, nesting depth, and circularity is supported so that they automatically apply to user-defined functions that perform output – e.g., print functions for structures. In addition, a new abbreviation mechanism is introduced that can be used to limit the total numbers of lines printed.


Author[s]: Richard C. Waters

XP. A Common Lisp Pretty Printing System

August 1989

XP provides efficient and flexible support for pretty printing in Common Lisp. Its single greatest advantage is that it allows the full benefits of pretty printing to be obtained when printing data structures, as well as when printing program code. XP is efficient, because it is based on a linear time algorithm that uses a small fixed amount of storage. XP is flexible, because users can control the exact form of the output via a set of special format directives. XP can operate on arbitrary data structures, because facilities are provided for specifying pretty printing methods for any type of object.


Author[s]: Thomas F. Knight, Jr. and Patrick G. Sobalvarro

Routing Statistics for Unqueued Banyan Networks

September 1990

Banyan networks comprise a large class of networks that have been used for interconnection in large-scale multiprocessors and telephone switching systems. Regular variants of Banyan networks, such as delta and butterfly networks, have been used in multiprocessors such as the IBM RP3 and the BBN Butterfly. Analysis of the performance of Banyan networks has typically focused on these regular variants. We present a methodology for performance analysis of unbuffered Banyan multistage interconnection networks. The methodology has two novel features: it allows analysis of networks where some inputs are more likely to be active than others, and allows analysis of Banyan networks of arbitrary topology.


Author[s]: Charles Rich and Richard C. Waters

Intelligent Assistance for Program Recognition, Design, Optimization, and Debugging

January 1989

A recognition assistant will help reconstruct the design of a program, given only its source code. A design assistant will assist a programmer by detecting errors and inconsistencies in his design choices and by automatically making many straightforward implementation decisions. An optimization assistant will help improve the performance of programs by identifying intermediate results that can be reused. A debugging assistant will aid in the detection, localization, and repair of errors in designs as well as completed programs.


Author[s]: Mark Harper Shirley

Generating Circuit Tests by Exploiting Designed Behavior

December 1988

This thesis describes two programs for generating tests for digital circuits that exploit several kinds of expert knowledge not used by previous approaches. First, many test generation problems can be solved efficiently using operation relations, a novel representation of circuit behavior that connects internal component operations with directly executable circuit operations. Operation relations can be computed efficiently by searching traces of simulated circuit behavior. Second, experts write test programs rather than test vectors because programs are more readable and compact. Test programs can be constructed automatically by merging program fragments using expert-supplied goal-refinement rules and domain-independent planning techniques.


Author[s]: Boris Katz

Using English For Indexing and Retrieving

October 1988

This paper describes a natural language system START. The system analyzes English text and automatically transforms it into an appropriate representation, the knowledge base, which incorporates the information found in the text. The user gains access to information stored in the knowledge base by querying it in English. The system analyzes the query and decides through a matching process what information in the knowledge base is relevant to the question. Then it retrieves this information and formulates its response also in English.


Author[s]: Harold Abelson, Michael Eisenberg, Mathew Halfact, Jacob Katzenelson, Elisha Sacks, Gerald Jay Sussman, Jack Wisdom and Ken Yip

Intelligence in Scientific Computing

November 1988

Combining numerical techniques with ideas from symbolic computation and with methods incorporating knowledge of science and mathematics leads to a new category of intelligent computational tools for scientists and engineers. These tools autonomously prepare simulation experiments from high- level specifications of physical models. For computationally intensive experiments, they automatically design special-purpose numerical engines optimized to perform the necessary computations. They actively monitor numerical and physical experiments. They interpret experimental data and formulate numerical results in qualitative terms. They enable their human users to control computational experiments in terms of high-level behavioral descriptions.


Author[s]: Eric Saund

The Role of Knowledge in Visual Shape Representation

October 1988

This report shows how knowledge about the visual world can be built into a shape representation in the form of a descriptive vocabulary making explicit the important geometrical relationships comprising objects' shapes. Two computational tools are offered: (1) Shapestokens are placed on a Scale- Space Blackboard, (2) Dimensionality- reduction captures deformation classes in configurations of tokens. Knowledge lies in the token types and deformation classes tailored to the constraints and regularities ofparticular shape worlds. A hierarchical shape vocabulary has been implemented supporting several later visual tasks in the two-dimensional shape domain of the dorsal fins of fishes.


Author[s]: Rodney A. Brooks

A Robot that Walks: Emergent Behaviors from a Carefully Evolved Network

February 1989

Most animals have significant behavioral expertise built in without having to explicitly learn it all from scratch. This expertise is a product of evolution of the organism; it can be viewed as a very long term form of learning which provides a structured system within which individuals might learn more specialized skills or abilities. This paper suggests one possible mechanism for analagous robot evolution by describing a carefully designed series of networks, each one being a strict augmentation of the previous one, which control a six legged walking machine capable of walking over rough terrain and following a person passively sensed in the infrared spectrum. As the completely decentralized networks are augmented, the robot's performance and behavior repertoire demonstrably improve. The rationale for such demonstrations is that they may provide a hint as to the requirements for automatically building massive networks to carry out complex sensory-motor tasks. The experiments with an actual robot ensure that an essence of reality is maintained and that no critical problems have been ignored.


Author[s]: Allen C. Ward

A Theory of Quantitative Inference Applied to a Mechanical Design Compiler

January 1989

This thesis presents the ideas underlying a computer program that takes as input a schematic of a mechanical or hydraulic power transmission system, plus specifications and a utility function, and returns catalog numbers from predefined catalogs for the optimal selection of components implementing the design. Unlike programs for designing single components or systems, the program provides the designer with a high level "language" in which to compose new designs. It then performs some of the detailed design process. The process of "compilation" is based on a formalization of quantitative inferences about hierarchically organized sets of artifacts and operating conditions. This allows the design compilation without the exhaustive enumeration of alternatives.


Author[s]: Terence D. Sanger

Optimal Unsupervised Learning in Feedforward Neural Networks

January 1989

We investigate the properties of feedforward neural networks trained with Hebbian learning algorithms. A new unsupervised algorithm is proposed which produces statistically uncorrelated outputs. The algorithm causes the weights of the network to converge to the eigenvectors of the input correlation with largest eigenvalues. The algorithm is closely related to the technique of Self-supervised Backpropagation, as well as other algorithms for unsupervised learning. Applications of the algorithm to texture processing, image coding, and stereo depth edge detection are given. We show that the algorithm can lead to the development of filters qualitatively similar to those found in primate visual cortex.


Author[s]: Philip E. Agre

The Dynamic Structures of Everyday Life

October 1988

Computational theories of action have generally understood the organized nature of human activity through the construction and execution of plans. By consigning the phenomena of contingency and improvisation to peripheral roles, this view has led to impractical technical proposals. As an alternative, I suggest that contingency is a central feature of everyday activity and that improvisation is the central kind of human activity. I also offer a computational model of certain aspects of everyday routine activity based on an account of improvised activity called running arguments and an account of representation for situated agents called deictic representation .


Author[s]: Allen C. Ward and Warren Seering

The Performance of a Mechanical Design 'Compiler'

January 1989

A mechanical design "compiler" has been developed which, given an appropriate schematic, specifications, and utility function for a mechanical design, returns catalog numbers for an optimal implementation. The compiler has been successfully tested on a variety of mechanical and hydraulic power transmission designs and a few temperature sensing designs. Times required have been at worst proportional to the logarithm of the number of possible combinations of catalog numbers.


Author[s]: Richard C. Waters

Optimization of Series Expressions: Part II: Overview of the Theory and Implementation

January 1989

The benefits of programming in a functional style are well known. In particular, algorithms that are expressed as compositions of functions operating on series/vectors/streams of data elements are much easier to understand and modify than equivalent algorithms expressed as loops. Unfortunately, many programmers hesitate to use series expressions, because they are typically implemented very inefficiently---the prime source of inefficiency being the creation of intermediate series objects. A restricted class of series expressions, obviously synchronizable series expressions, is defined which can be evaluated very efficiently. At the cost of introducing restrictions which place modest limits on the series expressions which can be written, the restrictions guarantee that the creation of intermediate series objects is never necessary. This makes it possible to automatically convert obviously synchronizable series expressions into highly efficient loops using straight forward algorithms.


Author[s]: Richard C. Waters

Optimization of Series Expressions: Part I: User's Manual for the Series Macro Package

January 1989

The benefits of programming in a functional style are well known. In particular, algorithms that are expressed as compositions of functions operating on series/vectors/streams of data elements are much easier to understand and modify than equivalent algorithms expressed as loops. Unfortunately, many programmers hesitate to use series expressions, because they are typically implemented very inefficiently. A Common Lisp macro package (OSS) has been implemented which supports a restricted class of series expressions, obviously synchronizable series expressions, which can be evaluated very efficiently by automatically converting them into loops. Using this macro package, programmers can obtain the advantages of expressing computations as series expressions without incurring any run-time overhead.


Author[s]: Benjamin J. Paul

A Systems Approach to the Torque Control of a Permanent Magnet Brushless Motor

August 1987

Many approaches to force control have assumed the ability to command torques accurately. Concurrently, much research has been devoted to developing accurate torque actuation schemes. Often, torque sensors have been utilized to close a feedback loop around output torque. In this paper, the torque control of a brushless motor is investigated through: the design, construction, and utilization of a joint torque sensor for feedback control; and the development and implementation of techniques for phase current based feedforeward torque control. It is concluded that simply closing a torque loop is no longer necessarily the best alternative since reasonably accurate current based torque control is achievable.


Author[s]: Waldemar Horwat

A Concurrent Smalltalk Compiler for the Message-Driven Processor

May 1988

This thesis describes Optimist, an optimizing compiler for the Concurrent Smalltalk language developed by the Concurrent VLSI Architecture Group. Optimist compiles Concurrent Smalltalk to the assembly language of the Message-Driven Processor (MDP). The compiler includes numerous optimization techniques such as dead code elimination, dataflow analysis, constant folding, move elimination, concurrency analysis, duplicate code merging, tail forwarding, use of register variables, as well as various MDP-specific optimizations in the code generator. The MDP presents some unique challenges and opportunities for compilation. Due to the MDP's small memory size, it is critical that the size of the generated code be as small as possible. The MDP is an inherently concurrent processor with efficient mechanisms for sending and receiving messages; the compiler takes advantage of these mechanisms. The MDP's tagged architecture allows very efficient support of object-oriented languages such as Concurrent Smalltalk. The initial goals for the MDP were to have the MDP execute about twenty instructions per method and contain 4096 words of memory. This compiler shows that these goals are too optimistic -- most methods are longer, both in terms of code size and running time. Thus, the memory size of the MDP should be increased.


Author[s]: Eric W. Aboaf

Task-Level Robot Learning

August 1988

We are investigating how to program robots so that they learn from experience. Our goal is to develop principled methods of learning that can improve a robot's performance of a wide range of dynamic tasks. We have developed task-level learning that successfully improves a robot's performance of two complex tasks, ball-throwing and juggling. With task- level learning, a robot practices a task, monitors its own performance, and uses that experience to adjust its task-level commands. This learning method serves to complement other approaches, such as model calibration, for improving robot performance.


Author[s]: Davi Geiger and Tomaso Poggio

An Optimal Scale for Edge Detection

September 1988

Many problems in early vision are ill posed. Edge detection is a typical example. This paper applies regularization techniques to the problem of edge detection. We derive an optimal filter for edge detection with a size controlled by the regularization parameter $\ lambda $ and compare it to the Gaussian filter. A formula relating the signal-to-noise ratio to the parameter $\lambda $ is derived from regularization analysis for the case of small values of $\lambda$. We also discuss the method of Generalized Cross Validation for obtaining the optimal filter scale. Finally, we use our framework to explain two perceptual phenomena: coarsely quantized images becoming recognizable by either blurring or adding noise.


Author[s]: Walter Charles Hamscher

Model-Based Troubleshooting of Digital Systems

August 1988

This thesis describes a methodology, a representation, and an implemented program for troubleshooting digital circuit boards at roughly the level of expertise one might expect in a human novice. Existing methods for model-based troubleshooting have not scaled up to deal with complex circuits, in part because traditional circuit models do not explicitly represent aspects of the device that troubleshooters would consider important. For complex devices the model of the target device should be constructed with the goal of troubleshooting explicitly in mind. Given that methodology, the principal contributions of the thesis are ways of representing complex circuits to help make troubleshooting feasible. Temporally coarse behavior descriptions are a particularly powerful simplification. Instantiating this idea for the circuit domain produces a vocabulary for describing digital signals. The vocabulary has a level of temporal detail sufficient to make useful predictions abut the response of the circuit while it remains coarse enough to make those predictions computationally tractable. Other contributions are principles for using these representations. Although not embodied in a program, these principles are sufficiently concrete that models can be constructed manually from existing circuit descriptions such as schematics, part specifications, and state diagrams. One such principle is that if there are components with particularly likely failure modes or failure modes in which their behavior is drastically simplified, this knowledge should be incorporated into the model. Further contributions include the solution of technical problems resulting from the use of explicit temporal representations and design descriptions with tangled hierarchies.


Author[s]: Daphna Weinshall

Seeing 'Ghost' Solutions in Stereo Vision

September 1988

A unique matching is a stated objective of most computational theories of stereo vision. This report describes situations where humans perceive a small number of surfaces carried by non-unique matching of random dot patterns, although a unique solution exists and is observed unambiguously in the perception of isolated features. We find both cases where non-unique matchings compete and suppress each other and cases where they are all perceived as transparent surfaces. The circumstances under which each behavior occurs are discussed and a possible explanation is sketched. It appears that matching reduces many false targets to a few, but may still yield multiple solutions in some cases through a (possibly different) process of surface interpolation.


Author[s]: Steven D. Eppinger

Modeling Robot Dynamic Performance for Endpoint Force Control

September 1988

This research aims to understand the fundamental dynamic behavior of servo- controlled machinery in response to various types of sensory feedback. As an example of such a system, we study robot force control, a scheme which promises to greatly expand the capabilities of industrial robots by allowing manipulators to interact with uncertain and dynamic tasks. Dynamic models are developed which allow the effects of actuator dynamics, structural flexibility, and workpiece interaction to be explored in the frequency and time domains. The models are used first to explain the causes of robot force control instability, and then to find methods of improving this performance.


Author[s]: Berthold K.P. Horn

Parallel Networks for Machine Vision

December 1988

The amount of computation required to solve many early vision problems is prodigious, and so it has long been thought that systems that operate in a reasonable amount of time will only become feasible when parallel systems become available. Such systems now exist in digital form, but most are large and expensive. These machines constitute an invaluable test- bed for the development of new algorithms, but they can probably not be scaled down rapidly in both physical size and cost, despite continued advances in semiconductor technology and machine architecture. Simple analog networks can perform interesting computations, as has been known for a long time. We have reached the point where it is feasible to experiment with implementation of these ideas in VLSI form, particularly if we focus on networks composed of locally interconnected passive elements, linear amplifiers, and simple nonlinear components. While there have been excursions into the development of ideas in this area since the very beginnings of work on machine vision, much work remains to be done. Progress will depend on careful attention to matching of the capabilities of simple networks to the needs of early vision. Note that this is not at all intended to be anything like a review of the field, but merely a collection of some ideas that seem to be interesting.


Author[s]: Brian K. Totty

An Operating Environment for the Jellybean Machine

May 1988

The Jellybean Machine is a scalable MIMD concurrent processor consisting of special purpose RISC processors loosely coupled into a low latency network. I have developed an operating system to provide the supportive environment required to efficiently coordinate the collective power of the distributed processing elements. The system services are developed in detail, and may be of interest to other designers of fine grain, distributed memory processing networks.


Author[s]: William Dally, Andrew Chien, Stuart Fiske, Waldemar Horwat, John Keen, Peter Nuth, Jerry Larivee and Brian Totty

Message-Driven Processor Architecture: Verson 11

August 1988

The Message-Driven Processor is a node of a large-scale multiprocessor being developed by the Concurrent VLSI Architecture Group. It is intended to support fine-grained, message passing, parallel computation. It contains several novel architectural features, such as a low-latency network interface, extensive type- checking hardware, and on-chip memory that can be used as an associative lookup table. This document is a programmer's guide to the MDP. It describes the processor's register architecture, instruction set, and the data types supported by the processor. It also details the MDP's message sending and exception handling facilities.


Author[s]: Hsien-Che Lee

Estimating the Illuminant Color from the Shading of a Smooth Surface

August 1988

5{ a uniform wall illuminated by a spot light often gives a strong impression of the illuminant color. How can it be possible to know if it is a white wall illuminated by yellow light or a yellow wall illuminated by white light? If the wall is a Lambertian reflector, it would not be possible to tell the difference. However, in the real world, some amount of specular reflection is almost always present. In this memo, it is shown that the computation is possible in most practical cases.


Author[s]: James J. Little and Alessandro Verri

Analysis of Differential and Matching Methods for Optical Flow

August 1988

Several algorithms for optical flow are studied theoretically and experimentally. Differential and matching methods are examined; these two methods have differing domains of application- differential methods are best when displacements in the image are small (<2 pixels) while matching methods work well for moderate displacements but do not handle sub-pixel motions. Both types of optical flow algorithm can use either local or global constraints, such as spatial smoothness. Local matching and differential techniques and global differential techniques will be examined. Most algorithms for optical flow utilize weak assumptions on the local variation of the flow and on the variation of image brightness. Strengthening these assumptions improves the flow computation. The computational consequence of this is a need for larger spatial and temporal support. Global differential approaches can be extended to local (patchwise) differential methods and local differential methods using higher derivatives. Using larger support is valid when constraint on the local shape of the flow are satisfied. We show that a simple constraint on the local shape of the optical flow, that there is slow spatial variation in the image plane, is often satisfied. We show how local differential methods imply the constraints for related methods using higher derivatives. Experiments show the behavior of these optical flow methods on velocity fields which so not obey the assumptions. Implementation of these methods highlights the importance of numerical differentiation. Numerical approximation of derivatives require care, in two respects: first, it is important that the temporal and spatial derivatives be matched, because of the significant scale differences in space and time, and, second, the derivative estimates improve with larger support.


Author[s]: Margaret Morrison Fleck

Bondaries and Topological Algorithms

September 1988

This thesis develops a model for the topological structure of situations. In this model, the topological structure of space is altered by the presence or absence of boundaries, such as those at the edges of objects. This allows the intuitive meaning of topological concepts such as region connectivity, function continuity, and preservation of topological structure to be modeled using the standard mathematical definitions. The thesis shows that these concepts are important in a wide range of artificial intelligence problems, including low- level vision, high-level vision, natural language semantics, and high-level reasoning.


Author[s]: Allen C. Ward and Warren Seering

Quantitative Inference in a Mechanical Design Compiler

January 1989

This paper presents the ideas underlying a program that takes as input a schematic of a mechanical or hydraulic power transmission system, plus specifications and a utility function, and returns catalog numbers from predefined catalogs for the optimal selection of components implementing the design. It thus provides the designer with a high level "language" in which to compose new designs, then performs some of the detailed design process for him. The program is based on a formalization of quantitative inferences about hierarchically organized sets of artifacts and operating conditions, which allows design compilation without the exhaustive enumeration of alternatives.


Author[s]: Shimon Ullman and Amnon Sha'ashua

Structural Saliency: The Detection of Globally Salient Structures Using a Locally Connected Network

July 1988

Certain salient structures in images attract our immediate attention without requiring a systematic scan. We present a method for computing saliency by a simple iterative scheme, using a uniform network of locally connected processing elements. The network uses an optimization approach to produce a "saliency map," a representation of the image emphasizing salient locations. The main properties of the network are: (i) the computations are simple and local, (ii) globally salient structures emerge with a small number of iterations, and (iii) as a by- product of the computations, contours are smoothed and gaps are filled in.


Author[s]: Shimon Ullman and Ronen Basri

The Alignment of Objects with Smooth Surfaces

July 1988

This paper examines the recognition of rigid objects bounded by smooth surfaces using an alignment approach. The projected image of such an object changes during rotation in a manner that is difficult to predict. A method to approach this problem is suggested, using the 3D surface curvature at the points along the silhouette. The curvature information requires a single number for each point along the object's silhouette, the magnitude of the curvature vector at the point. We have implemented and tested this method on images of complex 3D objects; it was found to give accurate predictions of the objects' appearances for large transformations. A small number of models can be used to predict the new appearance of an object from any viewpoint.


Author[s]: Randall Davis and Walter C. Hamscher

Model-Based Reasoning: Troubleshooting

July 1988

To determine why something has stopped working, it is useful to know how it was supposed to work in the first place. That simple observation underlies some of the considerable interest generated in recent years on the topic of model-based reasoning, particularly its application to diagnosis and troubleshooting. This paper surveys the current state of the art, reviewing areas that are well understood and exploring areas that present challenging research topics. It views the fundamental paradigm as the interaction of prediction and observation, and explores it by examining three fundamental subproblems: Generating hypotheses by reasoning from a symptom to a collection of components whose misbehavior may plausibly have caused that symptom; testing each hypothesis to see whether it can account for all available observations of device behavior; then discriminating among the ones that survive testing. We analyze each of these independently at the knowledge level, i.e., attempting to understand what reasoning capabilities arise from the different varieties of knowledge available to the program. We find that while a wide range of apparently diverse model-based systems have been built for diagnosis and troubleshooting, they are for the most part variations on the central theme outlined here. Their diversity lies primarily in the varying amounts and kinds of knowledge they bring to bear at each stage of the process; the underlying paradigm is fundamentally the same. Our survey of this familiar territory leads to a second major conclusion of the paper: Diagnostic reasoning from a model is reasonably understood. Given a model of behavior and structure, we know how to use it in a variety of ways to produce a diagnosis. There is, by contrast, a rich supply of open research issues in the modeling process itself. In a sense we know how to do model-based reasoning; we do not know how to model the behavior of complex devices, how to create models, and how to select the "right" model for the task at hand.


Author[s]: Sundar Narasimhan

Dexterous Robotic Hands: Kinematics and Control

November 1988

This report presents issues relating to the kinematics and control of dexterous robotic hands using the Utah-MIT hand as an illustrative example. The emphasis throughout is on the actual implementation and testing of the theoretical concepts presented. The kinematics of such hands is interesting and complicated owing to the large number of degrees of freedom involved. The implementation of position and force control algorithms on such tendon driven hands has previously suffered from inefficient formulations and a lack of sophisticated computer hardware. Both these problems are addressed in this report. A multiprocessor architecture has been built with high performance microcomputers on which real-time algorithms can be efficiently implemented. A large software library has also been built to facilitate flexible software development on this architecture. The position and force control algorithms described herein have been implemented and tested on this hardware.


Author[s]: Panayotis S. Skordos

Multistep Methods for Integrating the Solar System

July 1988

High order multistep methods, run at constant stepsize, are very effective for integrating the Newtonian solar system for extended periods of time. I have studied the stability and error growth of these methods when applied to harmonic oscillators and two-body systems like the Sun-Jupiter pair. I have also tried to design better multistep integrators than the traditional Stormer and Cowell methods, and I have found a few interesting ones.


Author[s]: William T. Townsend

The Effect of Transmission Design on Force-Controlled Manipulator Performance

April 1988

Previous research in force control has focused on the choice of appropriate servo implementation without corresponding regard to the choice of mechanical hardware. This report analyzes the effect of mechanical properties such as contact compliance, actuator-to-joint compliance, torque ripple, and highly nonlinear dry friction in the transmission mechanisms of a manipulator. A set of requisites for high performance then guides the development of mechanical- design and servo strategies for improved performance. A single-degree-of-freedom transmission testbed was constructed that confirms the predicted effect of Coulomb friction on robustness; design and construction of a cable-driven, four-degree-of- freedom, "whole-arm" manipulator illustrates the recommended design strategies.


Author[s]: Ron I. Kuper

Dependency-Directed Localization of Software Bugs

May 1989

Software bugs are violated specifications. Debugging is the process that culminates in repairing a program so that it satisfies its specification. An important part of debugging is localization, whereby the smallest region of the program that manifests the bug is found. The Debugging Assistant (DEBUSSI) localizes bugs by reasoning about logical dependencies. DEBUSSI manipulates the assumptions that underlie a bug manifestation, eventually localizing the bug to one particular assumption. At the same time, DEBUSSI acquires specification information, thereby extending its understanding of the buggy program. The techniques used for debugging fully implemented code are also appropriate for validating partial designs.


Author[s]: Paul Resnick

Generalizing on Multiple Grounds: Performance Learning in Model-Based Technology

February 1989

This thesis explores ways to augment a model-based diagnostic program with a learning component, so that it speeds up as it solves problems. Several learning components are proposed, each exploiting a different kind of similarity between diagnostic examples. Through analysis and experiments, we explore the effect each learning component has on the performance of a model-based diagnostic program. We also analyze more abstractly the performance effects of Explanation-Based Generalization, a technology that is used in several of the proposed learning components.


Author[s]: Peng Wu

Test Generation Guided Design for Testability

July 1988

This thesis presents a new approach to building a design for testability (DFT) system. The system takes a digital circuit description, finds out the problems in testing it, and suggests circuit modifications to correct those problems. The key contributions of the thesis research are (1) setting design for testability in the context of test generation (TG), (2) using failures during FG to focus on testability problems, and (3) relating circuit modifications directly to the failures. A natural functionality set is used to represent the maximum functionalities that a component can have. The current implementation has only primitive domain knowledge and needs other work as well. However, armed with the knowledge of TG, it has already demonstrated its ability and produced some interesting results on a simple microprocessor.


Author[s]: Philip E. Agre and David Chapman

What Are Plans For?

October 1989

What plans are like depends on how they're used. We contrast two views of plan use. On the plan-as-program-view, plan use is the execution of an effective procedure. On the plan-as-communication view, plan use is like following natural language instructions. We have begun work on computational models of plans-as-communication, building on our previous work on improvised activity and on ideas from sociology.


Author[s]: Alan Bawden and Jonathan Rees

Syntactic Closures

June 1988

In this paper we describe {\it syntactic closures}. Syntactic closures address the scoping problems that arise when writing macros. We discuss some issues raised by introducing syntactic closures into the macro expansion interface, and we compare syntactic closures with other approaches. Included is a complete implementation.


Author[s]: Reid G. Simmons

Combining Associational and Causal Reasoning to Solve Interpretation and Planning Problems

August 1988

This report describes a paradigm for combining associational and causal reasoning to achieve efficient and robust problem-solving behavior. The Generate, Test and Debug (GTD) paradigm generates initial hypotheses using associational (heuristic) rules. The tester verifies hypotheses, supplying the debugger with causal explanations for bugs found if the test fails. The debugger uses domain- independent causal reasoning techniques to repair hypotheses, analyzing domain models and the causal explanations produced by the tester to determine how to replace faulty assumptions made by the generator. We analyze the strengths and weaknesses of associational and causal reasoning techniques, and present a theory of debugging plans and interpretations. The GTD paradigm has been implemented and tested in the domains of geologic interpretation, the blocks world, and Tower of Hanoi problems.


Author[s]: Richard James Doyle

Hypothesizing Device Mechanisms: Opening Up the Black Box

June 1988

I describe an approach to forming hypotheses about hidden mechanism configurations within devices given external observations and a vocabulary of primitive mechanisms. An implemented causal modelling system called JACK constructs explanations for why a second piece of toast comes out lighter, why the slide in a tire gauge does not slip back inside when the gauge is removed from the tire, and how in a refrigerator a single substance can serve as a heat sink for the interior and a heat source for the exterior. I report the number of hypotheses admitted for each device example, and provide empirical results which isolate the pruning power due to different constraint sources.


Author[s]: Steven D. Eppinger and Warren P. Seering

Modeling Robot Flexibility for Endpoint Force Control

May 1988

Dynamic models have been developed in an attempt to match the response of a robot arm. The experimental data show rigid-body and five resonant modes. The frequency response and pole-zero arrays for various models of structural flexibility are compared with the data to evaluate the characteristics of the models, and to provide insight into the nature of the flexibility in the robot. Certain models are better able to depict transmission flexibility while others describe types of structural flexibility.


Author[s]: Daniel Peter Huttenlocher

Three-Dimensional Recognition of Solid Objects from a Two-Dimensional Image

October 1988

This thesis addresses the problem of recognizing solid objects in the three- dimensional world, using two-dimensional shape information extracted from a single image. Objects can be partly occluded and can occur in cluttered scenes. A model based approach is taken, where stored models are matched to an image. The matching problem is separated into two stages, which employ different representations of objects. The first stage uses the smallest possible number of local features to find transformations from a model to an image. This minimizes the amount of search required in recognition. The second stage uses the entire edge contour of an object to verify each transformation. This reduces the chance of finding false matches.


Author[s]: W. Eric L. Grimson and David Huttenlocher

On the Sensitivity of the Hough Transform for Object Recognition

May 1988

A common method for finding an object's pose is the generalized Hough transform, which accumulates evidence for possible coordinate transformations in a parameter space and takes large clusters of similar transformations as evidence of a correct solution. We analyze this approach by deriving theoretical bounds on the set of transformations consistent with each data- model feature pairing, and by deriving bounds on the likelihood of false peaks in the parameter space, as a function of noise, occlusion, and tessellation effects. We argue that blithely applying such methods to complex recognition tasks is a risky proposition, as the probability of false positives can be very high.


Author[s]: Karl T. Ulrich

Computation and Pre-Parametric Design

September 1988

My work is broadly concerned with the question "How can designs bessynthesized computationally?" The project deals primarily with mechanical devices and focuses on pre- parametric design: design at the level of detail of a blackboard sketch rather than at the level of detail of an engineering drawing. I explore the project ideas in the domain of single-input single-output dynamic systems, like pressure gauges, accelerometers, and pneumatic cylinders. The problem solution consists of two steps: 1) generate a schematic description of the device in terms of idealized functional elements, and then 2) from the schematic description generate a physical description.


Author[s]: Jacob Katzenelson

Computational Structure of the N-body Problem

April 1988

This work considers the organization and performance of computations on parallel computers of tree algorithms for the N-body problem where the number of particles is on the order of a million. The N-body problem is formulated as a set of recursive equations based on a few elementary functions, which leads to a computational structure in the form of a pyramid-like graph, where each vertex is a process, and each arc a communication link. The pyramid is mapped to three different processor configurations: (1) A pyramid of processors corresponding to the processes pyramid graph; (2) A hypercube of processors, e.g., a connection-machine like architecture; (3) A rather small array, e.g., $2 \times 2 \ times 2$, of processors faster than the ones considered in (1) and (2) above. Simulations of this size can be performed on any of the three architectures in reasonable time.


Author[s]: Boris Katz and Beth Levin

Exploiting Lexical Regularities in Designing Natural Language Systems

April 1988

This paper presents the lexical component of the START Question Answering system developed at the MIT Artificial Intelligence Laboratory. START is able to interpret correctly a wide range of semantic relationships associated with alternate expressions of the arguments of verbs. The design of the system takes advantage of the results of recent linguistic research into the structure of the lexicon, allowing START to attain a broader range of coverage than many existing systems.


Author[s]: Andrew A. Berlin and Henry M. Wu

Scheme86: A System for Interpreting Scheme

April 1988

Scheme86 is a computer system designed to interpret programs written in the Scheme dialect of Lisp. A specialized architecture, coupled with new techniques for optimizing register management in the interpreter, allows Scheme86 to execute interpreted Scheme at a speed comparable to that of compiled Lisp on conventional workstations.


Author[s]: Gerald Jay Sussman and Jack Wisdom

Numerical Evidence that the Motion of Pluto is Chaotic

April 1988

The Digital Orrery has been used to perform an integration of the motion of the outer planets for 845 million years. This integration indicates that the long-term motion of the planet Pluto is chaotic. Nearby trajectories diverge exponentially with an e-folding time of only about 20 million years.


Author[s]: Ellen C. Hildreth and Shimon Ullman

The Computational Study of Vision

April 1988

The computational approach to the study of vision inquires directly into the sort of information processing needed to extract important information from the changing visual image---information such as the three- dimensional structure and movement of objects in the scene, or the color and texture of object surfaces. An important contribution that computational studies have made is to show how difficult vision is to perform, and how complex are the processes needed to perform visual tasks successfully. This article reviews some computational studies of vision, focusing on edge detection, binocular stereo, motion analysis, intermediate vision, and object recognition.


Author[s]: Joachim Heel

Dynamical Systems and Motion Vision

April 1988

In this paper we show how the theory of dynamical systems can be employed to solve problems in motion vision. In particular we develop algorithms for the recovery of dense depth maps and motion parameters using state space observers or filters. Four different dynamical models of the imaging situation are investigated and corresponding filters/ observers derived. The most powerful of these algorithms recovers depth and motion of general nature using a brightness change constraint assumption. No feature- matching preprocessor is required.


Author[s]: Kenneth Alan Pasch

Heuristics for Job-Shop Scheduling

January 1988

Two methods of obtaining approximate solutions to the classic General Job-shop Scheduling Program are investigated. The first method is iterative. A sampling of the solution space is used to decide which of a collection of space pruning constraints are consistent with "good" schedules. The selected space pruning constraints are then used to reduce the search space and the sampling is repeated. This approach can be used either to verify whether some set of space pruning constraints can prune with discrimination or to generate solutions directly. Schedules can be represented as trajectories through a Cartesian space. Under the objective criteria of Minimum maximum Lateness family of “good" schedules (trajectories) are geometric neighbors (reside with some “tube”) in this space. This second method of generating solutions takes advantage of this adjacency by pruning the space from the outside in thus converging gradually upon this “tube.” One the average this methods significantly outperforms an array of the Priority Dispatch rules when the object criteria is that of Minimum Maximum Lateness. It also compares favorably with a recent relaxation procedure.


Author[s]: Daniel S. Weld

Theories of Comparative Analysis

May 1988

Comparative analysis is the problem of predicting how a system will react to perturbations in its parameters, and why. For example, comparative analysis could be asked to explain why the period of an oscillating spring/block system would increase if the mass of the block were larger. This thesis formalizes the task of comparative analysis and presents two solution techniques: differential qualitative (DQ) analysis and exaggeration. Both techniques solve many comparative analysis problems, providing explanations suitable for use by design systems, automated diagnosis, intelligent tutoring systems, and explanation based generalization. This thesis explains the theoretical basis for each technique, describes how they are implemented, and discusses the difference between the two. DQ analysis is sound; it never generates an incorrect answer to a comparative analysis question. Although exaggeration does occasionally produce misleading answers, it solves a larger class of problems than DQ analysis and frequently results in simpler explanations.


Author[s]: Steven J. Gordon and Warren P. Seering

Real-Time Part Position Sensing

May 1988

A light stripe vision system is used to measure the location of polyhedral features of parts from a single frame of video camera output. Issues such as accuracy in locating the line segments of intersection in the image and combining redundant information from multiple measurements and multiple sources are addressed. In 2.5 seconds, a prototype sensor was capable of locating a two inch cube to an accuracy (one standard deviation) of .002 inches (.055 mm) in translation and .1 degrees (.0015 radians) in rotation. When integrated with a manipulator, the system was capable of performing high precision assembly tasks.


Author[s]: Elisha Sacks

Automatic Qualitative Analysis of Ordinary Differential Equations Using Piecewise Linear Approximations

March 1988

This paper explores automating the qualitative analysis of physical systems. It describes a program, called PLR, that takes parameterized ordinary differential equations as input and produces a qualitative description of the solutions for all initial values. PLR approximates intractable nonlinear systems with piecewise linear ones, analyzes the approximations, and draws conclusions about the original systems. It chooses approximations that are accurate enough to reproduce the essential properties of their nonlinear prototypes, yet simple enough to be analyzed completely and efficiently. It derives additional properties, such as boundedness or periodicity, by theoretical methods. I demonstrate PLR on several common nonlinear systems and on published examples from mechanical engineering.


Author[s]: Neil C. Singer

Residual Vibration Reduction in Computer Controlled Machines

February 1989

Control of machines that exhibit flexibility becomes important when designers attempt to push the state of the art with faster, lighter machines. Three steps are necessary for the control of a flexible planet. First, a good model of the plant must exist. Second, a good controller must be designed. Third, inputs to the controller must be constructed using knowledge of the system dynamic response. There is a great deal of literature pertaining to modeling and control but little dealing with the shaping of system inputs. Chapter 2 examines two input shaping techniques based on frequency domain analysis. The first involves the use of the first deriviate of a gaussian exponential as a driving function template. The second, acasual filtering, involves removal of energy from the driving functions at the resonant frequencies of the system. Chapter 3 presents a linear programming technique for generating vibration-reducing driving functions for systems. Chapter 4 extends the results of the previous chapter by developing a direct solution to the new class of driving functions. A detailed analysis of the new technique is presented from five different perspectives and several extensions are presented. Chapter 5 verifies the theories of the previous two chapters with hardware experiments. Because the new technique resembles common signal filtering, chapter 6 compares the new approach to eleven standard filters. The new technique will be shown to result in less residual vibrations, have better robustness to system parameter uncertainty, and require less computation than other currently used shaping techniques.


Author[s]: Stephen L. Chiu

Generating Compliant Motion of Objects with an Articulated Hand

June 1985

The flexibility of the robot is the key to its success as a viable aid to production. Flexibility of a robot can be explained in two directions. The first is to increase the physical generality of the robot such that it can be easily reconfigured to handle a wide variety of tasks. The second direction is to increase the ability of the robot to interact with its environment such that tasks can still be successfully completed in the presence of uncertainties. The use of articulated hands are capable of adapting to a wide variety of grasp shapes, hence reducing the need for special tooling. The availability of low mass, high bandwidth points close to the manipulated object also offers significant improvements I the control of fine motions. This thesis provides a framework for using articulated hands to perform local manipulation of objects. N particular, it addresses the issues in effecting compliant motions of objects in Cartesian space. The Stanford/JPL hand is used as an example to illustrate a number of concepts. The examples provide a unified methodology for controlling articulated hands grasping with point contacts. We also present a high-level hand programming system based on the methodologies developed in this thesis. Compliant motion of grasped objects and dexterous manipulations can be easily described in the LISP-based hand programming language.


Author[s]: Eric Saund

Symbolic Construction of a 2D Scale-Space Image

April 1988

The shapes of naturally occurring objects characteristically involve spatial events occurring at many scales. This paper offers a symbolic approach to constructing a primitive shape description across scales for 2D binary (silhouette) shape images: grouping operations are performed over collections of tokens residing on a Scale-Space Blackboard. Two types of grouping operations are identified that, respectively: (1) aggregate edge primitives at one scale into edge primitives at a coarser scale and (2) group edge primitives into partial-region assertions, including curved- contours, primitive-corners, and bars. This approach avoids several drawbacks of numerical smoothing methods.


Author[s]: Neil Singer and Warren Seering

Preshaping Command Inputs to Reduce System Vibration

January 1988

A method is presented for generating shaped command inputs which significantly reduce or eliminate endpoint vibration. Desired system inputs are altered so that the system completes the requested move without residual vibration. A short move time penalty is incurred (on the order of one period of the first mode of vibration). The preshaping technique is robust under system parameter uncertainty and may be applied to both open and closed loop systems. The Draper Laboratory's Space Shuttle Remote Manipulator System simulator (DRS) is used to evaluate the method. Results show a factor of 25 reduction in endpoint residual vibration for typical moves of the DRS.


Author[s]: Gary L. Drescher

Demystifying Quantum Mechanics: A Simple Universe with Quantum Uncertainty

December 1988

An artificial universe is defined that has entirely deterministic laws with exclusively local interactions, and that exhibits the fundamental quantum uncertainty phenomenon: superposed states mutually interfere, but only to the extent that no observation distinguishes among them. Showing how such a universe could be elucidates interpretational issues of actual quantum mechanics. The artificial universe is a much-simplified version of Everett's real- world model (the so-called multiple-worlds formulation). In the artificial world, as in Everett's model, the tradeoff between interference and observation is deducible from the universe formalism. Artificial world examples analogous to the quantum double- slit experiment and the EPR experiment are presented.


Author[s]: Jonathan H. Connell

A Behavior-Based Arm Controller

June 1988

In this paper we describe a working, implemented controller for a real, physical mobile robot arm. The controller is composed of a collection of 15 independent behaviors which run, in real time, on a set of 8 loosely coupled on-board 8-bit microprocessors. We describe how these behaviors cooperate to actually seek out and retrieve objects using local sensory data. We also discuss the methodology used to decompose this collection task and the types of spatial representation and reasoning used by the system.


Author[s]: Christopher G. Atkeson, Eric W. Aboaf, Joseph McIntyre and David J. Reinkensmeyer

Model-Based Robot Learning

April 1988

Models play an important role in learning from practice. Models of a controlled system can be used as learning operators to refine commands on the basis of performance errors. The examples used to demonstrate this include positioning a limb at a visual target and following a defined trajectory. Better models lead to faster correction of command errors, requiring less practice to attain a given level of performance. The benefits of accurate modeling are improved performance in all aspects of control, while the risks of inadequate modeling are poor learning performance, or even degradation of performance with practice.


Author[s]: David W. Jacobs

The Use of Grouping in Visual Object Recognition.

January 1988

The report describes a recognition system called GROPER, which performs grouping by using distance and relative orientation constraints that estimate the likelihood of different edges in an image coming from the same object. The thesis presents both a theoretical analysis of the grouping problem and a practical implementation of a grouping system. GROPER also uses an indexing module to allow it to make use of knowledge of different objects, any of which might appear in an image. We test GROPER by comparing it to a similar recognition system that does not use grouping.


Author[s]: Pyung H. Chang

Analysis and Control of Robot Manipulators with Kinematic Redundancy

May 1987

A closed-form solution formula for the kinematic control of manipulators with redundancy is derived, using the Lagrangian multiplier method. Differential relationship equivalent to the Resolved Motion Method has been also derived. The proposed method is proved to provide with the exact equilibrium state for the Resolved Motion Method. This exactness in the proposed method fixes the repeatability problem in the Resolved Motion Method, and establishes a fixed transformation from workspace to the joint space. Also the method, owing to the exactness, is demonstrated to give more accurate trajectories than the Resolved Motion Method. In addition, a new performance measure for redundancy control has been developed. This measure, if used with kinematic control methods, helps achieve dexterous movements including singularity avoidance. Compared to other measures such as the manipulability measure and the condition number, this measure tends to give superior performances in terms of preserving the repeatability property and providing with smoother joint velocity trajectories. Using the fixed transformation property, Taylor’s Bounded Deviation Paths Algorithm has been extended to the redundant manipulators.


Author[s]: Pyung H. Chang

A Dexterity Measure for the Kinematic Control of Robot Manipulator with Redundany

February 1988

We have derived a new performance measure, product of minors of the Jacobian matrix, that tells how far kinematically redundant manipulators are from singularity. It was demonstrated that previously used performance measures, namely condition number and manipulability measure allowed to change configurations, caused repeatability problems and discontinuity effects. The new measure, on the other hand, assures that the arm solution remains in the same configuration, thus effectively preventing these problems.


Author[s]: Richard C. Waters

System Validation via Constraint Modeling

February 1988

Constraint modeling could be a very important system validation method, because its abilities are complementary to both testing and code inspection. In particular, even though the ability of constraint modeling to find errors is limited by the simplifications which are introduced when making a constraint model, constraint modeling can locate important classes of errors which are caused by non-local faults (i.e., are hard to find with code inspection) and manifest themselves as failures only in unusual situations (i.e., are hard to find with testing).


Author[s]: W. Eric L. Grimson

The Combinatorics of Object Recognition in Cluttered Environments Using Constrained Search

February 1988

When clustering techniques such as the Hough transform are used to isolate likely subspaces of the search space, empirical performance in cluttered scenes improves considerably. In this paper we establish formal bounds on the combinatorics of this approach. Under some simple assumptions, we show that the expected complexity of recognizing isolated objects is quadratic in the number of model and sensory fragments, but that the expected complexity of recognizing objects in cluttered environments is exponential in the size of the correct interpretation. We also provide formal bounds on the efficacy of using the Hough transform to preselect likely subspaces, showing that the problem remains exponential, but that in practical terms, the size of the problem is significantly decreased.


Author[s]: Peter Heinrich Meckl

Control of Vibration in Mechanical Systems Using Shaped Reference Inputs

January 1988

Dynamic systems which undergo rapid motion can excite natural frequencies that lead to residual vibration at the end of motion. This work presents a method to shape force profiles that reduce excitation energy at the natural frequencies in order to reduce residual vibration for fast moves. Such profiles are developed using a ramped sinusoid function and its harmonics, choosing coefficients to reduce spectral energy at the natural frequencies of the system. To improve robustness with respect to parameter uncertainty, spectral energy is reduced for a range of frequencies surrounding the nominal natural frequency. An additional set of versine profiles are also constructed to permit motion at constant speed for velocity-limited systems. These shaped force profiles are incorporated into a simple closed-loop system with position and velocity feedback. The force input is doubly integrated to generate a shaped position reference for the controller to follow. This control scheme is evaluated on the MIT Cartesian Robot. The shaped inputs generate motions with minimum residual vibration when actuator saturation is avoided. Feedback control compensates for the effect of friction Using only a knowledge of the natural frequencies of the system to shape the force inputs, vibration can also be attenuated in modes which vibrate in directions other than the motion direction. When moving several axes, the use of shaped inputs allows minimum residual vibration even when the natural frequencies are dynamically changing by a limited amount.


Author[s]: Yishai A. Feldman and Charles Rich

Pattern-Directed Invocation with Changing Equations

May 1988

The interaction of pattern-directed invocation with equality in an automated reasoning system gives rise to a completeness problem. In such systems, a demon needs to be invoked not only when its pattern exactly matches a term in the reasoning data base, but also when it is possible to create a variant that matches. An incremental algorithm has been developed which solves this problem without generating all possible variants of terms in the database. The algorithm is shown to be complete for a class of demons, called transparent demons, in which there is a well-behaved logical relationship between the pattern and the body of the demon.


Author[s]: Rodney A. Brooks, Jonathan Connell and Peter Ning

Herbert: A Second Generation Mobile Robot

January 1988

In mobile robot research we believe the structure of the platform, its capabilities, the choice of sensors, their capabilities, and the choice of processors, both onboard and offboard, greatly constrains the direction of research activity centered on the platform. We examine the design and tradeoffs in a low cost mobile platform we have built while paying careful attention to issues of sensing, manipulation, onboard processing and debuggability of the total system. The robot, named Herbert, is a completely autonomous mobile robot with an onboard parallel processor and special hardware support for the subsumption architecture [Brooks (1986)], an onboard manipulator and a laser range scanner. All processors are simple low speed 8-bit micro-processors. The robot is capable of real time three dimensional vision, while simultaneously carrying out manipulator and navigation tasks.


Author[s]: Bonnie J. Dorr

A Lexical Conceptual Approach to Generation for Machine Translation

January 1988

Current approaches to generation for machine translation make use of direct- replacement templates, large grammars, and knowledge-based inferencing techniques. Not only are rules language-specific, but they are too simplistic to handle sentences that exhibit more complex phenomena. Furthermore, these systems are not easily extendable to other languages because the rules that map the internal representation to the surface form are entirely dependent on both the domain of the system and the language being generated. Finally an adequate interlingual representation has not yet been discovered; thus, knowledge-based inferencing is necessary and syntactic cross-linguistic generalization cannot be exploited. This report introduces a plan for the development of a theoretically based computational scheme of natural language generation for a translation system. The emphasis of the project is the mapping from the lexical conceptual structure of sentences to an underlying or "base" syntactic structure called deep structure. This approach tackles the problems of thematic and structural divergence, i.e., it allows generation of target language sentences that are not thematically or structurally equivalent to their conceptually equivalent source language counterparts. Two other more secondary tasks, construction of a dictionary and mapping from dep structure to surface structure, will also be discussed. The generator operates on a constrained grammatical theory rather than on a set of surface level transformations. If the endeavor succeeds, there will no longer be a need for large, detailed grammars; general knowledge-based inferencing will not be necessary; lexical selection and syntactic realization will bw facilitated; and the model will be general enough for extension to other languages.


Author[s]: Kenneth Haase

Soft Objects: A Paradigm for Object Oriented Programming

March 1990

This paper introduces soft objects, a new paradigm for object oriented programming. This paradigm replaces the traditional notion of object classes with the specification of transforming procedures which transform simpler objects into more complicated objects. These transforming procedures incrementally construct new objects by adding new state or providing handlers for new messages. Unlike other incremental approaches (e.g. the inherited exist handlers of Object Logo [Drescher, 1987]), transforming procedures are strict functions which always return new objects; rather than conflating objects and object abstractions (classes), soft objects distinctly separates objects and their abstractions. The composition of these transforming procedures replaces the inheritance schemes of class oriented approaches; order of composition of transforming procedure makes explicit the inheritance indeterminancies introduced by multiple super classes. Issues regarding semantics, efficiency, and security are discussed in the context of several alternative implementation models and the code of a complete implementation is provided in an appendix.


Author[s]: Neil C. Singer and Warren P. Seering

Utilizing Dynamic Stability to Orient Parts

February 1988

The intent of this research is to study the dynamic behavior of a solid body resting on a moving surface. Results of the study are then used to propose methods for controlling the orientation of parts in preparation for automatic assembly. Two dynamic models are discussed in detail. The first examines the impacts required to cause reorientation of a part. The second investigates the use of oscillatory motion to selectively reorient parts. This study demonstrates that the dynamic behaviors of solid bodies, under the conditions mentioned above, vary considerably with small changes in geometry or orientation.


Author[s]: Jintae Lee

Knowledge Base Integration: What Can We Learn from Database Integration Research?

January 1988

This paper examines the issues and the solutions that have been studied in database (DB) integration research and tries to draw lessons from them for knowledge base (KB) integration.


Author[s]: Alfonso Garcia Reynoso

Structural Dynamics Model of a Cartesian Robot

October 1985

Methods are developed for predicting vibration response characteristics of systems which change configuration during operation. A cartesian robot, an example of such a position-dependent system, served as a test case for these methods and was studied in detail. The chosen system model was formulated using the technique of Component Mode Synthesis (CMS). The model assumes that he system is slowly varying, and connects the carriages to each other and to the robot structure at the slowly varying connection points. The modal data required for each component is obtained experimentally in order to get a realistic model. The analysis results in prediction of vibrations that are produced by the inertia forces as well as gravity and friction forces which arise when the robot carriages move with some prescribed motion. Computer simulations and experimental determinations are conducted in order to calculate the vibrations at the robot end- effector. Comparisons are shown to validate the model in two ways: for fixed configuration the mode shapes and natural frequencies are examined, and then for changing configuration the residual vibration at the end of the mode is evaluated. A preliminary study was done on a geometrically nonlinear system which also has position-dependency. The system consisted of a flexible four-bar linkage with elastic input and output shafts. The behavior of the rocker-beam is analyzed for different boundary conditions to show how some limiting cases are obtained. A dimensional analysis leads to an evaluation of the consequences of dynamic similarity on the resulting vibration.


Author[s]: Jacob Katzenelson and Richard Zippel

Software Structuring Principles for VLSI CAD

December 1987

A frustrating aspect of the frequent changes to large VLSI CAD systems is that so little of the old available programs can be reused. It takes too much time and effort to find the reusable pieces and recast them for the new use. Our thesis is that such systems can be designed for reusability by designing the software as layers of problem oriented languages, which are implemented by suitably extending a "base" language. We illustrate this methodology with respect to VLSI CAD programs and a particular language layer: a language for handling networks. We present two different implementations. The first uses UNIX and Enhanced C. The second approach uses Common Lisp on a Lisp machine.


Author[s]: Daphna Weinshall

Qualitative Depth and Shape from Stereo, in Agreement with Psychophysical Evidendence

December 1987

Obtaining exact depth from binocular disparities is hard if camera calibration is needed. We will show that qualitative depth information can be obtained from stereo disparities with almost no computations and with no prior knowledge (or computation) of camera parameters. We derive two expressions that order all matched points in the images in two distinct depth-consistent ways from image coordinates only. One is a tilt-related order $\lambda$, the other is a depth-related order $\chi$. Using $\lambda$ demonstrates some anomalies and unusual characteristics that have been observed in psychophysical experiments. The same approach is applied to qualitatively estimate changes in the curvature of a contour on the surface of an object, with either $x$- or $y$- coordinate fixed.


Author[s]: Eric W. Aboaf, Christopher G. Atkeson and David J. Reinkensmeyer

Task-Level Robot Learning: Ball Throwing

December 1987

We are investigating how to program robots so that they learn tasks from practice. One method, task-level learning, provides advantages over simply perfecting models of the robot's lower level systems. Task-level learning can compensate for the structural modeling errors of the robot's lower level control systems and can speed up the learning process by reducing the degrees of freedom of the models to be learned. We demonstrate two general learning procedures---fixed-model learning and refined-model learning---on a ball- throwing robot system.


Author[s]: Charles Rich

Inspection Methods in Programming: Cliches and Plans

December 1987

Inspection methods are a kind of engineering problem solving based on the recognition and use of standard forms or cliches. Examples are given of program analysis, program synthesis and program validation by inspection. A formalism, called the Plan Calculus, is defined and used to represent programming cliches in a convenient, canonical, and programming- language independent fashion.


Author[s]: Charles Rich and Richard C. Waters

The Programmer's Apprentice Project: A Research Overview

November 1987

The goal of the Programmer's Apprentice project is to develop a theory of how expert programmers analyze, synthesize, modify, explain, specify, verify, and document programs. This research goal overlaps both artificial intelligence and software engineering. From the viewpoint of artificial intelligence, we have chosen programming as a domain in which to study fundamental issues of knowledge representation and reasoning. From the viewpoint of software engineering, we seek to automate the programming process by applying techniques from artificial intelligence.


Author[s]: Aaron F. Bobick

Natural Object Categorization

November 1987

This thesis addresses the problem of categorizing natural objects. To provide a criteria for categorization we propose that the purpose of a categorization is to support the inference of unobserved properties of objects from the observed properties. Because no such set of categories can be constructed in an arbitrary world, we present the Principle of Natural Modes as a claim about the structure of the world. We first define an evaluation function that measures how well a set of categories supports the inference goals of the observer. Entropy measures for property uncertainty and category uncertainty are combined through a free parameter that reflects the goals of the observer. Natural categorizations are shown to be those that are stable with respect to this free parameter. The evaluation function is tested in the domain of leaves and is found to be sensitive to the structure of the natural categories corresponding to the different species. We next develop a categorization paradigm that utilizes the categorization evaluation function in recovering natural categories. A statistical hypothesis generation algorithm is presented that is shown to be an effective categorization procedure. Examples drawn from several natural domains are presented, including data known to be a difficult test case for numerical categorization techniques. We next extend the categorization paradigm such that multiple levels of natural categories are recovered; by means of recursively invoking the categorization procedure both the genera and species are recovered in a population of anaerobic bacteria. Finally, a method is presented for evaluating the utility of features in recovering natural categories. This method also provides a mechanism for determining which features are constrained by the different processes present in a multiple modal world.


Author[s]: Bonnie Jean Dorr

UNITRAN: A Principle-Based Approach to Machine Translation

December 1987

Machine translation has been a particularly difficult problem in the area of Natural Language Processing for over two decades. Early approaches to translation failed since interaction effects of complex phenomena in part made translation appear to be unmanageable. Later approaches to the problem have succeeded (although only bilingually), but are based on many language- specific rules of a context-free nature. This report presents an alternative approach to natural language translation that relies on principle-based descriptions of grammar rather than rule-oriented descriptions. The model that has been constructed is based on abstract principles as developed by Chomsky (1981) and several other researchers working within the “Government and Binding” (GB) framework. Thus, the grammar is viewed as a modular system of principles rather than a large set of ad hoc language-specific rules.


Author[s]: Gerald Roylance

Expressing Mathematical Subroutines Constructively

November 1987

The typical subroutines that compute $\sin(x)$ and $\exp(x)$ bear little resemblance to our mathematical knowledge of these functions: they are composed of concrete arithmetic expressions that include many mysterious numerical constants. Instead of programming these subroutines conventionally, we can express their construction using symbolic ideas such as periodicity and Taylor series. Such an approach has many advantages: the code is closer to the mathematical basis of the function, less vulnerable to errors, and is trivially adaptable to various precisions.


Author[s]: Bonnie Jean Dorr

UNITRAN: An Interlingual Machine Translation System

December 1987

This report describes the UNITRAN (UNIversal TRANslator) system, an implementation of a principle-based approach to natural language translation. The system is "interlingual", i.e., the model is based on universal principles that hold across all languages; the distinctions among languages are handled by settings of parameters associated with the universal principles. Interaction effects of linguistic principles are handled by the syste so that the programmer does not need to specifically spell out the details of rule applications. Only a small set of principles covers all languages; thus, the unmanageable grammar size of alternative approaches is no longer a problem.


Author[s]: Matthew Halfant and Gerald Jay Sussman

Abstraction in Numerical Methods

October 1987

We illustrate how the liberal use of high-order procedural abstractions and infinite streams helps us to express some of the vocabulary and methods of numerical analysis. We develop a software toolbox encapsulating the technique of Richardson extrapolation, and we apply these tools to the problems of numerical integration and differentiation. By separating the idea of Richardson extrapolation from its use in particular circumstances, we indicate how numerical programs can be written that exhibit the structure of the ideas from which they are formed.


Author[s]: Rado Jasinschi and Alan Yuille

Non-Rigid Motion and Regge Calculus

November 1987

We study the problem of recovering the structure from motion of figures which are allowed to perform a controlled non-rigid motion. We use Regge Calculus to approximate a general surface by a net of triangles. The non- rigid flexing motion we deal with corresponds to keeping the triangles rigid and allowing bending only at the joins between triangles. We show that depth information can be obtained by using a modified version of the Incremental Rigidity Scheme devised by Ullman (1984). We modify this scheme to allow for flexing motion and call our version the Incremental Semirigidity Scheme.


Author[s]: Feng Zhao

An O(N) Algorithm for Three-Dimensional N-Body Simulations

October 1987

We develop an algorithm that computes the gravitational potentials and forces on N point- masses interacting in three-dimensional space. The algorithm, based on analytical techniques developed by Rokhlin and Greengard, runs in order N time. In contrast to other fast N-body methods such as tree codes, which only approximate the interaction potentials and forces, this method is exact – it computes the potentials and forces to within any prespecified tolerance up to machine precision. We present an implementation of the algorithm for a sequential machine. We numerically verify the algorithm, and compare its speed with that of an O(N2) direct force computation. We also describe a parallel version of the algorithm that runs on the Connection Machine in order 0(logN) time. We compare experimental results with those of the sequential implementation and discuss how to minimize communication overhead on the parallel machine.


Author[s]: Berthold K.P. Horn

Relative Orientation

September 1987

Before corresponding points in images taken with two cameras can be used to recover distances to objects in a scene, one has to determine the position and orientation of one camera relative to the other. This is the classic photogrammetric problem of relative orientation, central to the interpretation of binocular stereo information. Described here is a particularly simple iterative scheme for recovering relative orientation that, unlike existing methods, does not require a good initial guess for the baseline and the rotation.


Author[s]: Michael B. Kashket

A Government-Binding Based Parser for Warlpiri, a Free-Word Order Language

January 1987

Free-word order languages have long posed significant problems for standard parsing algorithms. This thesis presents an implemented parser, based on Government- Binding (GB) theory, for a particular free-word order language, Warlpiri, an aboriginal language of central Australia. The words in a sentence of a free-word order language may swap about relatively freely with little effect on meaning: the permutations of a sentence mean essentially the same thing. It is assumed that this similarity in meaning is directly reflected in the syntax. The parser presented here properly processes free word order because it assigns the same syntactic structure to the permutations of a single sentence. The parser also handles fixed word order, as well as other phenomena. On the view presented here, there is no such thing as a “configurational” or “non-configurational” language. Rather, there is a spectrum of languages that are more or less ordered. The operation of this parsing system is quite different in character from that of more traditional rule-based parsing systems, e.g., context-free parsers. In this system, parsing is carried out via the construction of two different structures, one encoding precedence information and one encoding hierarchical information. This bipartite representation is the key to handling both free- and fixed-order phenomena. This thesis first presents an overview of the portion of Warlpiri that can be parsed. Following this is a description of the linguistic theory on which the parser is based. The chapter after that describes the representations and algorithms of the parser. In conclusion, the parser is compared to related work. The appendix contains a substantial list of test cases – both grammatical and ungrammatical – that the parser has actually processed.


Author[s]: David L. Brock

Enhancing the Dexterity of a Robot Hand Using Controlled Slip

May 1987

Humans can effortlessly manipulate objects in their hands, dexterously sliding and twisting them within their grasp. Robots, however, have none of these capabilities, they simply grasp objects rigidly in their end effectors. To investigate this common form of human manipulation, an analysis of controlled slipping of a grasped object within a robot hand was performed. The Salisbury robot hand demonstrated many of these controlled slipping techniques, illustrating many results of this analysis. First, the possible slipping motions were found as a function of the location, orientation, and types of contact between the hand and object. Second, for a given grasp, the contact types were determined as a function of the grasping force and the external forces on the object. Finally, by changing the grasping force, the robot modified the constraints on the object and affect controlled slipping slipping motions.


Author[s]: Alan Yuille and Shimon Ullman

Rigidity and Smoothness of Motion

November 1987

sMany theories of structure from motion divide the process into twosparts which are solved using different assumptions. Smoothness of thesvelocity field is often assumed to solve the motion correspondencesproblem, and then rigidity is used to recover the 3D structure. Wesprove results showing that, in a statistical sense, smoothness of thesvelocity field follows from rigidity of the motion.


Author[s]: Kenneth W. Haase, Jr.

TYPICAL: A Knowledge Representation System for Automated Discovery and Inference

August 1987

TYPICAL is a package for describing and making automatic inferences about a broad class of SCHEME predicate functions. These functions, called types following popular usage, delineate classes of primitive SCHEME objects, composite data structures, and abstract descriptions. TYPICAL types are generated by an extensible combinator language from either existing types or primitive terminals. These generated types are located in a lattice of predicate subsumption which captures necessary entailment between types; if satisfaction of one type necessarily entail satisfaction of another, the first type is below the second in the lattice. The inferences make by TYPICAL computes the position of the new definition within the lattice and establishes it there. This information is then accessible to both later inferences and other programs (reasoning systems, code analyzers, etc) which may need the information for their own purposes. TYPICAL was developed as a representation language for the discovery program Cyrano; particular examples are given of TYPICAL’s application in the Cyrano program.


Author[s]: Alan Yuille

Energy Functions for Early Vision and Analog Networks

November 1987

This paper describes attempts to model the modules of early vision in terms of minimizing energy functions, in particular energy functions allowing discontinuities in the solution. It examines the success of using Hopfield-style analog networks for solving such problems. Finally it discusses the limitations of the energy function approach.


Author[s]: Harold Abelson and Gerald Jay Sussman

Lisp: A Language for Stratified Design

August 1987

We exhibit programs that illustrate the power of Lisp as a language for expressing the design and organization of computational systems. The examples are chosen to highlight the importance of abstraction in program design and to draw attention to the use of procedures to express abstractions.


Author[s]: W. Eric L. Grimson

On the Recognition of Parameterized Objects

October 1987

Determining the identity and pose of occluded objects from noisy data is a critical step in interacting intelligently with an unstructured environment. Previous work has shown that local measurements of position and surface orientation may be used in a constrained search process to solve this problem, for the case of rigid objects, either two-dimensional or three-dimensional. This paper considers the more general problem of recognizing and locating objects that can vary in parameterized ways. We consider objects with rotational, translational, or scaling degrees of freedom, and objects that undergo stretching transformations. We show that the constrained search method can be extended to handle the recognition and localization of such generalized classes of object families.


Author[s]: Rodney A. Brooks, Anita M. Flynn and Thomas Marill

Self Calibration of Motion and Stereo Vision for Mobile RobotsNavigation

August 1987

We report on experiments with a mobile robot using one vision process (forward motion vision) to calibrate another (stereo vision) without resorting to any external units of measurement. Both are calibrated to a velocity dependent coordinate system which is natural to the task of obstacle avoidance. The foundations of these algorithms, in a world of perfect measurement, are quite elementary. The contribution of this work is to make them noise tolerant while remaining simple computationally. Both the algorithms and the calibration procedure are easy to implement and have shallow computational depth, making them (1) run at reasonable speed on moderate uni-processors, (2) appear practical to run continuously, maintaining an up-to-the- second calibration on a mobile robot, and (3) appear to be good candidates for massively parallel implementations.


Author[s]: W. Eric L. Grimson

On the Recognition of Curved Objects

July 1987

Determining the identity and pose of occluded objects from noisy data is a critical part of a system's intelligent interaction with an unstructured environment. Previous work has shown that local measurements of the position and surface orientation of small patches of an object's surface may be used in a constrained search process to solve this problem for the case of rigid polygonal objects using two-dimensional sensory data, or rigid polyhedral objects using three-dimensional data. This note extends the recognition system to deal with the problem of recognizing and locating curved objects. The extension is done in two dimensions, and applies to the recognition of two-dimensional objects from two-dimensional data, or to the recognition of three-dimensional objects in stable positions from two- dimensional data.


Author[s]: Bruce Randall Donald

Error Detection and Recovery for Robot Motion Planning with Uncertainty

July 1987

Robots must plan and execute tasks in the presence of uncertainty. Uncertainty arises from sensing errors, control errors, and uncertainty in the geometry of the environment. The last, which is called model error, has received little previous attention. We present a framework for computing motion strategies that are guaranteed to succeed in the presence of all three kinds of uncertainty. The motion strategies comprise sensor- based gross motions, compliant motions, and simple pushing motions.


Author[s]: James V. Mahoney

Image Chunking: Defining Spatial Building Blocks for Scene Analysis

August 1987

Rapid judgments about the properties and spatial relations of objects are the crux of visually guided interaction with the world. Vision begins, however, with essentially pointwise representations of the scene, such as arrays of pixels or small edge fragments. For adequate time-performance in recognition, manipulation, navigation, and reasoning, the processes that extract meaningful entities from the pointwise representations must exploit parallelism. This report develops a framework for the fast extraction of scene entities, based on a simple, local model of parallel computation.sAn image chunk is a subset of an image that can act as a unit in the course of spatial analysis. A parallel preprocessing stage constructs a variety of simple chunks uniformly over the visual array. On the basis of these chunks, subsequent serial processes locate relevant scene components and assemble detailed descriptions of them rapidly. This thesis defines image chunks that facilitate the most potentially time- consuming operations of spatial analysis--- boundary tracing, area coloring, and the selection of locations at which to apply detailed analysis. Fast parallel processes for computing these chunks from images, and chunk-based formulations of indexing, tracing, and coloring, are presented. These processes have been simulated and evaluated on the lisp machine and the connection machine.


Author[s]: David Allen McAllester

ONTIC: A Knowledge Representation System for Mathematics

July 1987

Ontic is an interactive system for developing and verifying mathematics. Ontic’s verification mechanism is capable of automatically finding and applying information from a library containing hundreds of mathematical facts. Starting with only the axioms of Zermelo- Fraenkel set theory, the Ontic system has been used to build a data base of definitions and lemmas leading to a proof of the Stone representation theorem for Boolean lattices. The Ontic system has been used to explore issues in knowledge representation, automated deduction, and the automatic use of large data bases.


Author[s]: Daniel Wayne Weise

Formal Multilevel Hierarchical Verification of Synchronous MOS Circuits

June 1987

I have designed and implemented a system for the multilevel verification of synchronous MOS VLSI circuits. The system, called Silica Pithecus, accepts the schematic of an MOS circuit and a specification of the circuit’s intended digital behavior. Silica Pithecus determines if the circuit meets its specification. If the circuit fails to meet its specification Silica Pithecus returns to the designer the reason for the failure. Unlike earlier verifiers which modelled primitives (e.g., transistors) as unidirectional digital devices, Silica Pithecus models primitives more realistically. Transistors are modelled as bidirectional devices of varying resistances, and nodes are modelled as capacitors. Silica Pithecus operates hierarchically, interactively, and incrementally. Major contributions of this research include a formal understanding of the relationship between different behavioral descriptions (e.g., signal, boolean, and arithmetic descriptions) of the same device, and a formalization of the relationship between the structure, behavior, and context of device. Given these formal structures my methods find sufficient conditions on the inputs of circuits which guarantee the correct operation of the circuit in the desired descriptive domain. These methods are algorithmic and complete. They also handle complex phenomena such as races and charge sharing. Informal notions such as races and hazards are shown to be derivable from the correctness conditions used by my methods.


Author[s]: Sundar Narasimhan, David M. Siegel and John M. Hollerbach

A Standard Architecture for Controlling Robots

July 1988

This paper describes a fully implemented computational architecture that controls the Utah-MIT dextrous hand and other complex robots. Robots like the Utah-MIT hand are characterized by large numbers of actuators and sensors, and require high servo rates. Consequently, powerful and flexible computer architectures are needed to control them. The architecture described in this paper derives its power from the highly efficient real-time environment provided for its control processors, coupled with a development host that enables flexible program development. By mapping the memory of a dedicated group of processors into the address space of a host computer, efficient sharing of system resources between them is possible. The software is characterized by a few simple design concepts but provides the facilities out of which more powerful utilities like multi- processor pseudoterminal emulator, a transparent and fast file server, and a flexible symbolic debugger could be constructed.


Author[s]: Michael A. Gennert and Shahriar Negahdaripour

Relaxing the Brightness Constancy Assumption in Computing Optical Flow

June 1987

Optical flow is the apparent (or perceived) motion of image brightness patterns arising from relative motion of objects and observer. Estimation of the optical flow requires the application of two kinds of constraint: the flow field smoothness constraint and the brightness constancy constraint. The brightness constancy constraint permits one to match image brightness values across images, but is very restrictive. We propose replacing this constraint with a more general constraint, which permits a linear transformation between image brightness values. The transformation parameters are allowed to vary smoothly so that inexact matching is allowed. We describe the implementation on a highly parallel computer and present sample results.


Author[s]: Michael D. Riley

Time-Frequency Representations for Speech Signals

May 1987

This work addresses two related questions. The first question is what joint time-frequency energy representations are most appropriate for auditory signals, in particular, for speech signals in sonorant regions. The quadratic transforms of the signal are examined, a large class that includes, for example, the spectrograms and the Wigner distribution. Quasi-stationarity is not assumed, since this would neglect dynamic regions. A set of desired properties is proposed for the representation: (1) shift-invariance, (2) positivity, (3) superposition, (4) locality, and (5) smoothness. Several relations among these properties are proved: shift-invariance and positivity imply the transform is a superposition of spectrograms; positivity and superposition are equivalent conditions when the transform is real; positivity limits the simultaneous time and frequency resolution (locality) possible for the transform, defining an uncertainty relation for joint time-frequency energy representations; and locality and smoothness tradeoff by the 2-D generalization of the classical uncertainty relation. The transform that best meets these criteria is derived, which consists of two-dimensionally smoothed Wigner distributions with (possibly oriented) 2-D guassian kernels. These transforms are then related to time-frequency filtering, a method for estimating the time- varying ‘transfer function’ of the vocal tract, which is somewhat analogous to ceptstral filtering generalized to the time-varying case. Natural speech examples are provided. The second question addressed is how to obtain a rich, symbolic description of the phonetically relevant features in these time-frequency energy surfaces, the so-called schematic spectrogram. Time-frequency ridges, the 2-D analog of spectral peaks, are one feature that is proposed. If non-oriented kernels are used for the energy representation, then the ridge tops can be identified, with zero-crossings in the inner product of the gradient vector and the direction of greatest downward curvature. If oriented kernels are used, the method can be generalized to give better orientation selectivity (e.g., at intersecting ridges) at the cost of poorer time-frequency locality. Many speech examples are given showing the performance for some traditionally difficult cases: semi- vowels and glides, nasalized vowels, consonant-vowel transitions, female speech, and imperfect transmission channels.


Author[s]: T. Poggio and C. Koch

Synapses That Compute Motion

June 1987

Biophysics of computation is a new field that attempts to characterize the role in information processing of the several biophysical mechanisms in neurons, synapses, and membranes that have been uncovered in recent years. In this article, we review a synaptic mechanism, based on the interaction between excitation and silent inhibition, that implements a veto-like operation. Synapses of this type may underlie direction selectivity to direction of motion in the vertebrate retina.


Author[s]: Robert C. Berwick

Principle-Based Parsing

June 1987

During the past few years, there has been much discussion of a shift from rule-based systems to principle-based systems for natural language processing. This paper outlines the major computational advantages of principle-based parsing, its differences from the usual rule-based approach, and surveys several existing principle-based parsing systems used for handling languages as diverse as Warlpiri, English, and Spanish, as well as language translation.


Author[s]: Ed Gamble and Tomaso Poggio

Visual Integration and Detection of Discontinuities: The Key Role of Intensity Edges

October 1987

Integration of several vision modules is likely to be one of the keys to the power and robustness of the human visual system. The problem of integrating early vision cues is also emerging as a central problem in current computer vision research. In this paper we suggest that integration is best performed at the location of discontinuities in early processes, such as discontinuities in image brightness, depth, motion, texture and color. Coupled Markov Random Field models, based on Bayes estimation techiques, can be used to combine vision modalities with their discontinuities. These models generate algorithms that map naturally onto parallel fine-grained architectures such as the Connection Machine. We derive a scheme to integrate intensity edges with stereo depth and motion field information and show results on synthetic and natural images. The use of intensity edges to integrate other visual cues and to help discover discontinuities emerges as a general and powerful principle.


Author[s]: Harry Voorhees

Finding Texture Boundaries in Images

June 1987

Texture provides one cue for identifying the physical cause of an intensity edge, such as occlusion, shadow, surface orientation or reflectance change. Marr, Julesz, and others have proposed that texture is represented by small lines or blobs, called 'textons' by Julesz [1981a], together with their attributes, such as orientation, elongation, and intensity. Psychophysical studies suggest that texture boundaries are perceived where distributions of attributes over neighborhoods of textons differ significantly. However, these studies, which deal with synthetic images, neglect to consider two important questions: How can these textons be extracted from images of natural scenes? And how, exactly, are texture boundaries then found? This thesis proposes answers to these questions by presenting an algorithm for computing blobs from natural images and a st