CSAIL Publications

Technical Reports

Work Products

Historical Collections

horizontal line

CSAIL Digital Archive - Artificial Intelligence Laboratory Series
Publications 1500 through 1599

AI PublicationsLast update Sun Mar 19 05:05:02 2006

AITR-1500

Author[s]: Saed G. Younis

Asymptotically Zero Energy Computing Using Split-Level Charge Recovery Logic

June 1994

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1500.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1500.pdf

The dynamic power requirement of CMOS circuits is rapidly becoming a major concern in the design of personal information systems and large computers. In this work we present a number of new CMOS logic families, Charge Recovery Logic (CRL) as well as the much improved Split-Level Charge Recovery Logic (SCRL), within which the transfer of charge between the nodes occurs quasistatically. Operating quasistatically, these logic families have an energy dissipation that drops linearly with operating frequency, i.e., their power consumption drops quadratically with operating frequency as opposed to the linear drop of conventional CMOS. The circuit techniques in these new families rely on constructing an explicitly reversible pipelined logic gate, where the information necessary to recover the energy used to compute a value is provided by computing its logical inverse. Information necessary to uncompute the inverse is available from the subsequent inverse logic stage. We demonstrate the low energy operation of SCRL by presenting the results from the testing of the first fully quasistatic 8 x 8 multiplier chip (SCRL-1) employing SCRL circuit techniques.

AITR-1504

Author[s]: Robert Playter

Passive Dynamics in the Control of Gymnastic Maneuvers

March 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1504.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1504.pdf

The control of aerial gymnastic maneuvers is challenging because these maneuvers frequently involve complex rotational motion and because the performer has limited control of the maneuver during flight. A performer can influence a maneuver using a sequence of limb movements during flight. However, the same sequence may not produce reliable performances in the presence of off-nominal conditions. How do people compensate for variations in performance to reliably produce aerial maneuvers? In this report I explore the role that passive dynamic stability may play in making the performance of aerial maneuvers simple and reliable. I present a control strategy comprised of active and passive components for performing robot front somersaults in the laboratory. I show that passive dynamics can neutrally stabilize the layout somersault which involves an "inherently unstable" rotation about the intermediate principal axis. And I show that a strategy that uses open loop joint torques plus passive dynamics leads to more reliable 1 1/2 twisting front somersaults in simulation than a strategy that uses prescribed limb motion. Results are presented from laboratory experiments on gymnastic robots, from dynamic simulation of humans and robots, and from linear stability analyses of these systems.

AIM-1506

CBCL-104

Author[s]: Pawan Sinha

Reciprocal Interactions Between Motion and Form Perception

April 21, 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1506.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1506.pdf

The processes underlying the perceptual analysis of visual form are believed to have minimal interaction with those subserving the perception of visual motion (Livingstone and Hubel, 1987; Victor and Conte, 1990). Recent reports of functionally and anatomically segregated parallel streams in the primate visual cortex seem to support this hypothesis (Ungerlieder and Mishkin, 1982; VanEssen and Maunsell, 1983; Shipp and Zeki, 1985; Zeki and Shipp, 1988; De Yoe et al., 1994). Here we present perceptual evidence that is at odds with this view and instead suggests strong symmetric interactions between the form and motion processes. In one direction, we show that the introduction of specific static figural elements, say 'F', in a simple motion sequence biases an observer to perceive a particular motion field, say 'M'. In the reverse direction, the imposition of the same motion field 'M' on the original sequence leads the observer to perceive illusory static figural elements 'F'. A specific implication of these findings concerns the possible existence of (what we call) motion end-stopped units in the primate visual system. Such units might constitute part of a mechanism for signalling subjective occluding contours based on motion-field discontinuities.

AIM-1509

CBCL-108

Author[s]: Zoubin Ghahramani and Michael I. Jordan

Learning from Incomplete Data

January 24,1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1509.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1509.pdf

Real-world learning tasks often involve high- dimensional data sets with complex patterns of missing features. In this paper we review the problem of learning from incomplete data from two statistical perspectives---the likelihood-based and the Bayesian. The goal is two-fold: to place current neural network approaches to missing data within a statistical framework, and to describe a set of algorithms, derived from the likelihood-based framework, that handle clustering, classification, and function approximation from incomplete data in a principled and efficient manner. These algorithms are based on mixture modeling and make two distinct appeals to the Expectation-Maximization (EM) principle (Dempster, Laird, and Rubin 1977)-- -both for the estimation of mixture components and for coping with the missing data.

AIM-1510

CBCL-109

Author[s]: Margrit Betke and Nicholas Makris

Fast Object Recognition in Noisy Images Using Simulated Annealing

January 25, 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1510.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1510.pdf

A fast simulated annealing algorithm is developed for automatic object recognition. The normalized correlation coefficient is used as a measure of the match between a hypothesized object and an image. Templates are generated on-line during the search by transforming model images. Simulated annealing reduces the search time by orders of magnitude with respect to an exhaustive search. The algorithm is applied to the problem of how landmarks, for example, traffic signs, can be recognized by an autonomous vehicle or a navigating robot. The algorithm works well in noisy, real-world images of complicated scenes for model images with high information content.

AITR-1511

Author[s]: Ian Horswill

Specialization of Perceptual Processes

April 22, 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1511.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1511.pdf

In this report, I discuss the use of vision to support concrete, everyday activity. I will argue that a variety of interesting tasks can be solved using simple and inexpensive vision systems. I will provide a number of working examples in the form of a state-of-the-art mobile robot, Polly, which uses vision to give primitive tours of the seventh floor of the MIT AI Laboratory. By current standards, the robot has a broad behavioral repertoire and is both simple and inexpensive (the complete robot was built for less than $20,000 using commercial board-level components). The approach I will use will be to treat the structure of the agent's activity---its task and environment---as positive resources for the vision system designer. By performing a careful analysis of task and environment, the designer can determine a broad space of mechanisms which can perform the desired activity. My principal thesis is that for a broad range of activities, the space of applicable mechanisms will be broad enough to include a number mechanisms which are simple and economical. The simplest mechanisms that solve a given problem will typically be quite specialized to that problem. One thus worries that building simple vision systems will be require a great deal of {it ad-hoc} engineering that cannot be transferred to other problems. My second thesis is that specialized systems can be analyzed and understood in a principled manner, one that allows general lessons to be extracted from specialized systems. I will present a general approach to analyzing specialization through the use of transformations that provably improve performance. By demonstrating a sequence of transformations that derive a specialized system from a more general one, we can summarize the specialization of the former in a compact form that makes explicit the additional assumptions that it makes about its environment. The summary can be used to predict the performance of the system in novel environments. Individual transformations can be recycled in the design of future systems.

AIM-1512

CBCL-103

Author[s]: M. Poggio and T. Poggio

Cooperative Physics of Fly Swarms: An Emergent Behavior

April 11, 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1512.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1512.pdf

We have simulated the behavior of several artificial flies, interacting visually with each other. Each fly is described by a simple tracking system (Poggio and Reichardt, 1973; Land and Collett, 1974) which summarizes behavioral experiments in which individual flies fixate a target. Our main finding is that the interaction of theses implemodules gives rise to a variety of relatively complex behaviors. In particular, we observe a swarm-like behavior of a group of many artificial flies for certain reasonable ranges of our tracking system parameters.

AITR-1513

Author[s]: Ruth Bergman

Learning World Models in Environments with Manifest Causal Structure

May 5, 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1513.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1513.pdf

This thesis examines the problem of an autonomous agent learning a causal world model of its environment. Previous approaches to learning causal world models have concentrated on environments that are too "easy" (deterministic finite state machines) or too "hard" (containing much hidden state). We describe a new domain --- environments with manifest causal structure - -- for learning. In such environments the agent has an abundance of perceptions of its environment. Specifically, it perceives almost all the relevant information it needs to understand the environment. Many environments of interest have manifest causal structure and we show that an agent can learn the manifest aspects of these environments quickly using straightforward learning techniques. We present a new algorithm to learn a rule-based causal world model from observations in the environment. The learning algorithm includes (1) a low level rule-learning algorithm that converges on a good set of specific rules, (2) a concept learning algorithm that learns concepts by finding completely correlated perceptions, and (3) an algorithm that learns general rules. In addition this thesis examines the problem of finding a good expert from a sequence of experts. Each expert has an "error rate"; we wish to find an expert with a low error rate. However, each expert's error rate and the distribution of error rates are unknown. A new expert-finding algorithm is presented and an upper bound on the expected error rate of the expert is derived.

AIM-1514

CBCL-113

Author[s]: Partha Niyogi

Sequential Optimal Recovery: A Paradigm for Active Learning

May 12, 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1514.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1514.pdf

In most classical frameworks for learning from examples, it is assumed that examples are randomly drawn and presented to the learner. In this paper, we consider the possibility of a more active learner who is allowed to choose his/her own examples. Our investigations are carried out in a function approximation setting. In particular, using arguments from optimal recovery (Micchelli and Rivlin, 1976), we develop an adaptive sampling strategy (equivalent to adaptive approximation) for arbitrary approximation schemes. We provide a general formulation of the problem and show how it can be regarded as sequential optimal recovery. We demonstrate the application of this general formulation to two special cases of functions on the real line 1) monotonically increasing functions and 2) functions with bounded derivative. An extensive investigation of the sample complexity of approximating these functions is conducted yielding both theoretical and empirical results on test functions. Our theoretical results (stated insPAC-style), along with the simulations demonstrate the superiority of our active scheme over both passive learning as well as classical optimal recovery. The analysis of active function approximation is conducted in a worst-case setting, in contrast with other Bayesian paradigms obtained from optimal design (Mackay, 1992).

AIM-1515

CBCL-114

Author[s]: Partha Niyogi and Robert Berwick

A Dynamical Systems Model for Language Change

December 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1515.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1515.pdf

Formalizing linguists' intuitions of language change as a dynamical system, we quantify the time course of language change including sudden vs. gradual changes in languages. We apply the computer model to the historical loss of Verb Second from Old French to modern French, showing that otherwise adequate grammatical theories can fail our new evolutionary criterion.

AIM-1516

CBCL-115

Author[s]: Partha Niyogi and Robert Berwick

The Logical Problem of Language Change

December 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1516.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1516.pdf

This paper considers the problem of language change. Linguists must explain not only how languages are learned but also how and why they have evolved along certain trajectories and not others. While the language learning problem has focused on the behavior of individuals and how they acquire a particular grammar from a class of grammars ${cal G}$, here we consider a population of such learners and investigate the emergent, global population characteristics of linguistic communities over several generations. We argue that language change follows logically from specific assumptions about grammatical theories and learning paradigms. In particular, we are able to transform parameterized theories and memoryless acquisition algorithms into grammatical dynamical systems, whose evolution depicts a population's evolving linguistic composition. We investigate the linguistic and computational consequences of this model, showing that the formalization allows one to ask questions about diachronic that one otherwise could not ask, such as the effect of varying initial conditions on the resulting diachronic trajectories. From a more programmatic perspective, we give an example of how the dynamical system model for language change can serve as a way to distinguish among alternative grammatical theories, introducing a formal diachronic adequacy criterion for linguistic theories.

AIM-1517

CBCL-137

Author[s]: Douglas A. Jones, Robert C. Berwick, Franklin Cho, Zeeshan Khan, Karen T. Kohl, Naoyuki Nomura, Anand Radhakrishnan, Ulrich Sauerland and Brian Ulicny

Verb Classes and Alternations in Bangla, German, English, and Korean

May 6, 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1517.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1517.pdf

In this report, we investigate the relationship between the semantic and syntactic properties of verbs. Our work is based on the English Verb Classes and Alternations of (Levin, 1993). We explore how these classes are manifested in other languages, in particular, in Bangla, German, and Korean. Our report includes a survey and classification of several hundred verbs from these languages into the cross-linguistic equivalents of Levin's classes. We also explore ways in which our findings may be used to enhance WordNet in two ways: making the English syntactic information of WordNet more fine-grained, and making WordNet multilingual.

AIM-1518

CBCL-106

Author[s]: Pawan Sinha and Tomaso Poggio

View-Based Strategies for 3D Object Recognition

April 21, 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1518.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1518.pdf

A persistent issue of debate in the area of 3D object recognition concerns the nature of the experientially acquired object models in the primate visual system. One prominent proposal in this regard has expounded the use of object centered models, such as representations of the objects’ 3D structures in a coordinate frame independent of the viewing parameters [Marr and Nishihara, 1978]. In contrast to this is another proposal which suggests that the viewing parameters encountered during the learning phase might be inextricably linked to subsequent performance on a recognition task [Tarr and Pinker, 1989; Poggio and Edelman, 1990]. The ‘object model’, according to this idea, is simply a collection of the sample views encountered during training. Given that object centered recognition strategies have the attractive feature of leading to viewpoint independence, they have garnered much of the research effort in the field of computational vision. Furthermore, since human recognition performance seems remarkably robust in the face of imaging variations [Ellis et al., 1989], it has often been implicitly assumed that the visual system employs an object centered strategy. In the present study we examine this assumption more closely. Our experimental results with a class of novel 3D structures strongly suggest the use of a view-based strategy by the human visual system even when it has the opportunity of constructing and using object-centered models. In fact, for our chosen class of objects, the results seem to support a stronger claim: 3D object recognition is 2D view-based.

AIM-1520

CBCL-111

Author[s]: Michael Jordan and Lei Xu

On Convergence Properties of the EM Algorithm for Gaussian Mixtures

April 21, 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1520.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1520.pdf

"Expectation-Maximization'' (EM) algorithm and gradient-based approaches for maximum likelihood learning of finite Gaussian mixtures. We show that the EM step in parameter space is obtained from the gradient via a projection matrix $P$, and we provide an explicit expression for the matrix. We then analyze the convergence of EM in terms of special properties of $P$ and provide new results analyzing the effect that $P$ has on the likelihood surface. Based on these mathematical results, we present a comparative discussion of the advantages and disadvantages of EM and other algorithms for the learning of Gaussian mixture models.

AIM-1521

CBCL-112

Author[s]: Kah Kay Sung and Tomaso Poggio

Example Based Learning for View-Based Human Face Detection

January 24, 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1521.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1521.pdf

We present an example-based learning approach for locating vertical frontal views of human faces in complex scenes. The technique models the distribution of human face patterns by means of a few view-based "face'' and "non-face'' prototype clusters. At each image location, the local pattern is matched against the distribution-based model, and a trained classifier determines, based on the local difference measurements, whether or not a human face exists at the current image location. We provide an analysis that helps identify the critical components of our system.

AIM-1522

CBCL-110

Author[s]: David A. Cohn, Zoubin Ghahramani and Michael I. Jordan

Active Learning with Statistical Models

March 21, 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1522.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1522.pdf

For many types of learners one can compute the statistically 'optimal' way to select data. We review how these techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate.

AIM-1523

Author[s]: Kenji Nagao and Eric Grimson

Recognizing 3D Object Using Photometric Invariant

April 22, 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1523.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1523.pdf

In this paper we describe a new efficient algorithm for recognizing 3D objects by combining photometric and geometric invariants. Some photometric properties are derived, that are invariant to the changes of illumination and to relative object motion with respect to the camera and/or the lighting source in 3D space. We argue that conventional color constancy algorithms can not be used in the recognition of 3D objects. Further we show recognition does not require a full constancy of colors, rather, it only needs something that remains unchanged under the varying light conditions sand poses of the objects. Combining the derived color invariants and the spatial constraints on the object surfaces, we identify corresponding positions in the model and the data space coordinates, using centroid invariance of corresponding groups of feature positions. Tests are given to show the stability and efficiency of our approach to 3D object recognition.

AITR-1524

Author[s]: Matthew M. Williamson

Series Elastic Actuators

September 7, 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1524.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1524.pdf

This thesis presents the design, construction, control and evaluation of a novel force controlled actuator. Traditional force controlled actuators are designed from the premise that "Stiffer is better''. This approach gives a high bandwidth system, prone to problems of contact instability, noise, and low power density. The actuator presented in this thesis is designed from the premise that "Stiffness isn't everything". The actuator, which incorporates a series elastic element, trades off achievable bandwidth for gains in stable, low noise force control, and protection against shock loads. This thesis reviews related work in robot force control, presents theoretical descriptions of the control and expected performance from a series elastic actuator, and describes the design of a test actuator constructed to gather performance data. Finally the performance of the system is evaluated by comparing the performance data to theoretical predictions.

AIM-1526

Author[s]: Kenji Nagao and Berthold Horn

Direct Object Recognition Using No Higher Than Second or Third Order Statistics of the Image

December 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1526.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1526.pdf

Novel algorithms for object recognition are described that directly recover the transformations relating the image to its model. Unlike methods fitting the typical conventional framework, these new methods do not require exhaustive search for each feature correspondence in order to solve for the transformation. Yet they allow simultaneous object identification and recovery of the transformation. Given hypothesized % potentially corresponding regions in the model and data (2D views) --- which are from planar surfaces of the 3D objects --- these methods allow direct compututation of the parameters of the transformation by which the data may be generated from the model. We propose two algorithms: one based on invariants derived from no higher than second and third order moments of the image, the other via a combination of the affine properties of geometrical and the differential attributes of the image. Empirical results on natural images demonstrate the effectiveness of the proposed algorithms. A sensitivity analysis of the algorithm is presented. We demonstrate in particular that the differential method is quite stable against perturbations --- although not without some error --- when compared with conventional methods. We also demonstrate mathematically that even a single point correspondence suffices, theoretically at least, to recover affine parameters via the differential method.

AITR-1527

Author[s]: Panayotis A. Skordos

Modeling Flue Pipes: Subsonic Flow, Lattice Boltzmann, and Parallel Distributed Computers

Arpil 21, 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1527.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1527.pdf

The problem of simulating the hydrodynamics and the acoustic waves inside wind musical instruments such as the recorder, the organ, and the flute is considered. The problem is attacked by developing suitable local- interaction algorithms and a parallel simulation system on a cluster of non- dedicated workstations. Physical measurements of the acoustic signal of various flue pipes show good agreement with the simulations. Previous attempts at this problem have been frustrated because the modeling of acoustic waves requires small integration time steps which make the simulation very compute-intensive. In addition, the simulation of subsonic viscous compressible flow at high Reynolds numbers is susceptible to slow-growing numerical instabilities which are triggered by high- frequency acoustic modes. The numerical instabilities are mitigated by employing suitable explicit algorithms: lattice Boltzmann method, compressible finite differences, and fourth-order artificial-viscosity filter. Further, a technique for accurate initial and boundary conditions for the lattice Boltzmann method is developed, and the second-order accuracy of the lattice Boltzmann method is demonstrated. The compute-intensive requirements are handled by developing a parallel simulation system on a cluster of non-dedicated workstations. The system achieves 80 percent parallel efficiency (speedup/processors) using 20 HP-Apollo workstations. The system is built on UNIX and TCP/IP communication routines, and includes automatic process migration from busy hosts to free hosts.

AITR-1529

Author[s]: Aparna Lakshmi Ratan

The Role of Fixation and Visual Attention in Object Recognition

July 21, 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1529.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1529.pdf

This research project is a study of the role of fixation and visual attention in object recognition. In this project, we build an active vision system which can recognize a target object in a cluttered scene efficiently and reliably. Our system integrates visual cues like color and stereo to perform figure/ground separation, yielding candidate regions on which to focus attention. Within each image region, we use stereo to extract features that lie within a narrow disparity range about the fixation position. These selected features are then used as input to an alignment-style recognition system. We show that visual attention and fixation significantly reduce the complexity and the false identifications in model-based recognition using Alignment methods. We also demonstrate that stereo can be used effectively as a figure/ground separator without the need for accurate camera calibration.

AIM-1530

Author[s]: Partha Niyogi and Robert C. Berwick

A Note of Zipf's Law, Natural Languages, and Noncoding DNA Regions

March 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1530.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1530.pdf

In Phys. Rev. Letters (73:2), Mantegna et al. conclude on the basis of Zipf rank frequency data that noncoding DNA sequence regions are more like natural languages than coding regions. We argue on the contrary that an empirical fit to Zipf"s "law" cannot be used as a criterion for similarity to natural languages. Although DNA is a presumably "organized system of signs" in Mandelbrot"s (1961) sense, and observation of statistical featurs of the sort presented in the Mantegna et al. paper does not shed light on the similarity between DNA's "gramar" and natural language grammars, just as the observation of exact Zipf-like behavior cannot distinguish between the underlying processes of tossing an M-sided die or a finite-state branching process.

AIM-1531

Author[s]: Thomas Vetter and Tomaso Poggio

Linear Object Classes and Image Synthesis from a Single Example Image

March 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1531.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1531.pdf

The need to generate new views of a 3D object from a single real image arises in several fields, including graphics and object recognition. While the traditional approach relies on the use of 3D models, we have recently introduced techniques that are applicable under restricted conditions but simpler. The approach exploits image transformations that are specific to the relevant object class and learnable from example views of other "prototypical" objects of the same class. In this paper, we introduce such a new technique by extending the notion of linear class first proposed by Poggio and Vetter. For linear object classes it is shown that linear transformations can be learned exactly from a basis set of 2D prototypical views. We demonstrate the approach on artificial objects and then show preliminary evidence that the technique can effectively "rotate" high- resolution face images from a single 2D view.

AIM-1532

Author[s]: Marco Fillo, Stephen W. Keckler, William J. Dally, Nicholas P. Carter, Andrew Chang, Yevgeny Gurevich and Whay S. Lee

The M-Machine Multicomputer

March 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1532.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1532.pdf

The M-Machine is an experimental multicomputer being developed to test architectural concepts motivated by the constraints of modern semiconductor technology and the demands of programming systems. The M- Machine computing nodes are connected with a 3-D mesh network; each node is a multithreaded processor incorporating 12 function units, on-chip cache, and local memory. The multiple function units are used to exploit both instruction-level and thread-level parallelism. A user accessible message passing system yields fast communication and synchronization between nodes. Rapid access to remote memory is provided transparently to the user with a combination of hardware and software mechanisms. This paper presents the architecture of the M-Machine and describes how its mechanisms maximize both single thread performance and overall system throughput.

AIM-1533

Author[s]: N.K. Logothetis, J. Pauls and T. Poggio

Spatial Reference Frames for Object Recognition: Tuning for Rotations in Depth

March 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1533.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1533.pdf

The inferior temporal cortex (IT) of monkeys is thought to play an essential role in visual object recognition. Inferotemporal neurons are known to respond to complex visual stimuli, including patterns like faces, hands, or other body parts. What is the role of such neurons in object recognition? The present study examines this question in combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize novel visual 3D objects. A population of neurons in IT were found to respond selectively to such objects that the monkeys had recently learned to recognize. A large majority of these cells discharged maximally for one view of the object, while their response fell off gradually as the object was rotated away from the neuron"s preferred view. Most neurons exhibited orientation-dependent responses also during view-plane rotations. Some neurons were found tuned around two views of the same object, while a very small number of cells responded in a view- invariant manner. For five different objects that were extensively used during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. No selective responses were ever encountered for views that the animal systematically failed to recognize. The results of our experiments suggest that neurons in this area can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. Simple geometric features did not appear to account for the neurons" selective responses. These findings support the idea that a population of neurons -- each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations -- may, as an assembly, encode complex 3D objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

AIM-1534

Author[s]: Panayotis A. Skordos

Aeroacoustics on Non-Dedicated Workstations

April 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1534.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1534.pdf

The simulation of subsonic aeroacoustic problems such as the flow-generated sound of wind instruments is well suited for parallel computing on a cluster of non-dedicated workstations. Simulations are demonstrated which employ 20 non-dedicated Hewlett-Packard workstations (HP9000/715), and achieve comparable performance on this problem as a 64-node CM-5 dedicated supercomputer with vector units. The success of the present approach depends on the low communication requirements of the problem (low communication to computation ratio) which arise from the coarse-grain decomposition of the problem and the use of local-interaction methods. Many important problems may be suitable for this type of parallel computing including computer vision, circuit simulation, and other subsonic flow problems.

AIM-1535

Author[s]: Panayotis Skordos and Gerald Jay Sussman

Comparison Between Subsonic Flow Simulation and Physical Measurements of Flue Pipes

April 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1535.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1535.pdf

Direct simulations of wind musical instruments using the compressible Navier Stokes equations have recently become possible through the use of parallel computing and through developments in numerical methods. As a first demonstration, the flow of air and the generation of musical tones inside a soprano recorder are simulated numerically. In addition, physical measurements are made of the acoustic signal generated by the recorder at different blowing speeds. The comparison between simulated and physically measured behavior is encouraging and points towards ways of improving the simulations.

AIM-1536

Author[s]: David Beymer and Tomaso Poggio

Face Recognition from One Example View

September 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1536.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1536.pdf

If we are provided a face database with only one example view per person, is it possible to recognize new views of them under a variety of different poses, especially views rotated in depth from the original example view? We investigate using prior knowledge about faces plus each single example view to generate virtual views of each person, or views of the face as seen from different poses. Prior knowledge of faces is represented in an example-based way, using 2D views of a prototype face seen rotating in depth. The synthesized virtual views are evaluated as example views in a view-based approach to pose-invariant face recognition. They are shown to improve the recognition rate over the scenario where only the single real view is used.

AIM-1537

Author[s]: David Beymer

Vectorizing Face Images by Interpreting Shape and Texture Computations

September 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1537.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1537.pdf

The correspondence problem in computer vision is basically a matching task between two or more sets of features. In this paper, we introduce a vectorized image representation, which is a feature-based representation where correspondence has been established with respect to a reference image. This representation has two components: (1) shape, or (x, y) feature locations, and (2) texture, defined as the image grey levels mapped onto the standard reference image. This paper explores an automatic technique for "vectorizing" face images. Our face vectorizer alternates back and forth between computation steps for shape and texture, and a key idea is to structure the two computations so that each one uses the output of the other. A hierarchical coarse-to-fine implementation is discussed, and applications are presented to the problems of facial feature detection and registration of two arbitrary faces.

AITR-1541

Author[s]: Elmer S. Hung

Parameter Estimation in Chaotic Systems

April 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1541.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1541.pdf

This report examines how to estimate the parameters of a chaotic system given noisy observations of the state behavior of the system. Investigating parameter estimation for chaotic systems is interesting because of possible applications for high-precision measurement and for use in other signal processing, communication, and control applications involving chaotic systems. In this report, we examine theoretical issues regarding parameter estimation in chaotic systems and develop an efficient algorithm to perform parameter estimation. We discover two properties that are helpful for performing parameter estimation on non-structurally stable systems. First, it turns out that most data in a time series of state observations contribute very little information about the underlying parameters of a system, while a few sections of data may be extraordinarily sensitive to parameter changes. Second, for one-parameter families of systems, we demonstrate that there is often a preferred direction in parameter space governing how easily trajectories of one system can "shadow'" trajectories of nearby systems. This asymmetry of shadowing behavior in parameter space is proved for certain families of maps of the interval. Numerical evidence indicates that similar results may be true for a wide variety of other systems. Using the two properties cited above, we devise an algorithm for performing parameter estimation. Standard parameter estimation techniques such as the extended Kalman filter perform poorly on chaotic systems because of divergence problems. The proposed algorithm achieves accuracies several orders of magnitude better than the Kalman filter and has good convergence properties for large data sets.

AIM-1542

Author[s]: D. McAllester, P. Van Henlenryck and T. Kapur

Three Cuts for Accelerated Interval Propagation

May 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1542.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1542.pdf

This paper addresses the problem of nonlinear multivariate root finding. In an earlier paper we described a system called Newton which finds roots of systems of nonlinear equations using refinements of interval methods. The refinements are inspired by AI constraint propagation techniques. Newton is competative with continuation methods on most benchmarks and can handle a variety of cases that are infeasible for continuation methods. This paper presents three "cuts" which we believe capture the essential theoretical ideas behind the success of Newton. This paper describes the cuts in a concise and abstract manner which, we believe, makes the theoretical content of our work more apparent. Any implementation will need to adopt some heuristic control mechanism. Heuristic control of the cuts is only briefly discussed here.

AITR-1543

Author[s]: Brian Scott Eberman

Contact Sensing: A Sequential Decision Approach to Sensing Manipulation Contact

May 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1543.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1543.pdf

This paper describes a new statistical, model-based approach to building a contact state observer. The observer uses measurements of the contact force and position, and prior information about the task encoded in a graph, to determine the current location of the robot in the task configuration space. Each node represents what the measurements will look like in a small region of configuration space by storing a predictive, statistical, measurement model. This approach assumes that the measurements are statistically block independent conditioned on knowledge of the model, which is a fairly good model of the actual process. Arcs in the graph represent possible transitions between models. Beam Viterbi search is used to match measurement history against possible paths through the model graph in order to estimate the most likely path for the robot. The resulting approach provides a new decision process that can be use as an observer for event driven manipulation programming. The decision procedure is significantly more robust than simple threshold decisions because the measurement history is used to make decisions. The approach can be used to enhance the capabilities of autonomous assembly machines and in quality control applications.

AITR-1544

Author[s]: J.P. Mellor

Enhanced Reality Visualization in a Surgical Environment

January 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1544.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1544.pdf

Enhanced reality visualization is the process of enhancing an image by adding to it information which is not present in the original image. A wide variety of information can be added to an image ranging from hidden lines or surfaces to textual or iconic data about a particular part of the image. Enhanced reality visualization is particularly well suited to neurosurgery. By rendering brain structures which are not visible, at the correct location in an image of a patient's head, the surgeon is essentially provided with X-ray vision. He can visualize the spatial relationship between brain structures before he performs a craniotomy and during the surgery he can see what's under the next layer before he cuts through. Given a video image of the patient and a three dimensional model of the patient's brain the problem enhanced reality visualization faces is to render the model from the correct viewpoint and overlay it on the original image. The relationship between the coordinate frames of the patient, the patient's internal anatomy scans and the image plane of the camera observing the patient must be established. This problem is closely related to the camera calibration problem. This report presents a new approach to finding this relationship and develops a system for performing enhanced reality visualization in a surgical environment. Immediately prior to surgery a few circular fiducials are placed near the surgical site. An initial registration of video and internal data is performed using a laser scanner. Following this, our method is fully automatic, runs in nearly real-time, is accurate to within a pixel, allows both patient and camera motion, automatically corrects for changes to the internal camera parameters (focal length, focus, aperture, etc.) and requires only a single image.

AITR-1545

Author[s]: James A. Stuart Fiske

Thread Scheduling Mechanisms for Multiple-Context Parallel Processors

June 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1545.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1545.pdf

Scheduling tasks to efficiently use the available processor resources is crucial to minimizing the runtime of applications on shared-memory parallel processors. One factor that contributes to poor processor utilization is the idle time caused by long latency operations, such as remote memory references or processor synchronization operations. One way of tolerating this latency is to use a processor with multiple hardware contexts that can rapidly switch to executing another thread of computation whenever a long latency operation occurs, thus increasing processor utilization by overlapping computation with communication. Although multiple contexts are effective for tolerating latency, this effectiveness can be limited by memory and network bandwidth, by cache interference effects among the multiple contexts, and by critical tasks sharing processor resources with less critical tasks. This thesis presents techniques that increase the effectiveness of multiple contexts by intelligently scheduling threads to make more efficient use of processor pipeline, bandwidth, and cache resources. This thesis proposes thread prioritization as a fundamental mechanism for directing the thread schedule on a multiple-context processor. A priority is assigned to each thread either statically or dynamically and is used by the thread scheduler to decide which threads to load in the contexts, and to decide which context to switch to on a context switch. We develop a multiple-context model that integrates both cache and network effects, and shows how thread prioritization can both maintain high processor utilization, and limit increases in critical path runtime caused by multithreading. The model also shows that in order to be effective in bandwidth limited applications, thread prioritization must be extended to prioritize memory requests. We show how simple hardware can prioritize the running of threads in the multiple contexts, and the issuing of requests to both the local memory and the network. Simulation experiments show how thread prioritization is used in a variety of applications. Thread prioritization can improve the performance of synchronization primitives by minimizing the number of processor cycles wasted in spinning and devoting more cycles to critical threads. Thread prioritization can be used in combination with other techniques to improve cache performance and minimize cache interference between different working sets in the cache. For applications that are critical path limited, thread prioritization can improve performance by allowing processor resources to be devoted preferentially to critical threads. These experimental results show that thread prioritization is a mechanism that can be used to implement a wide range of scheduling policies.

AITR-1546

Author[s]: Yoky Matsuoka

Embodiment and Manipulation Learning Process for a Humanoid Hand

May 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1546.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1546.pdf

Babies are born with simple manipulation capabilities such as reflexes to perceived stimuli. Initial discoveries by babies are accidental until they become coordinated and curious enough to actively investigate their surroundings. This thesis explores the development of such primitive learning systems using an embodied light-weight hand with three fingers and a thumb. It is self- contained having four motors and 36 exteroceptor and proprioceptor sensors controlled by an on-palm microcontroller. Primitive manipulation is learned from sensory inputs using competitive learning, back-propagation algorithm and reinforcement learning strategies. This hand will be used for a humanoid being developed at the MIT Artificial Intelligence Laboratory.

AIM-1547

Author[s]: Michael R. Blair, Natalya Cohen, David M. LaMacchia and Brian K. Zuzga

MIT SchMUSE: Class-Based Remote Delegation in a Capricious Distributed Environment

February 1993

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1547.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1547.pdf

MIT SchMUSE (pronounced "shmooz") is a concurrent, distributed, delegation-based object-oriented interactive environment with persistent storage. It is designed to run in a "capricious" network environment, where servers can migrate from site to site and can regularly become unavailable. Our design introduces a new form of unique identifiers called "globally unique tickets" that provide globally unique time/space stamps for objects and classes without being location specific. Object location is achieved by a distributed hierarchical lazy lookup mechanism that we call "realm resolution." We also introduce a novel mechanism called "message deferral" for enhanced reliability in the face of remote delegation. We conclude with a comparison to related work and a projection of future work on MIT SchMUSE.

AITR-1548

Author[s]: Paul A. Viola

Alignment by Maximization of Manual Information

March 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1548.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1548.pdf

A new information-theoretic approach is presented for finding the pose of an object in an image. The technique does not require information about the surface properties of the object, besides its shape, and is robust with respect to variations of illumination. In our derivation, few assumptions are made about the nature of the imaging process. As a result the algorithms are quite general and can foreseeably be used in a wide variety of imaging situations. Experiments are presented that demonstrate the approach registering magnetic resonance (MR) images with computed tomography (CT) images, aligning a complex 3D object model to real scenes including clutter and occlusion, tracking a human head in a video sequence and aligning a view-based 2D object model to real images. The method is based on a formulation of the mutual information between the model and the image called EMMA. As applied here the technique is intensity-based, rather than feature-based. It works well in domains where edge or gradient-magnitude based methods have difficulty, yet it is more robust than traditional correlation. Additionally, it has an efficient implementation that is based on stochastic approximation. Finally, we will describe a number of additional real- world applications that can be solved efficiently and reliably using EMMA. EMMA can be used in machine learning to find maximally informative projections of high-dimensional data. EMMA can also be used to detect and correct corruption in magnetic resonance images (MRI).

AIM-1549

Author[s]: Roberto Brunelli and Tomaso Poggio

Template Matching: Matched Spatial Filters and Beyond

October 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1549.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1549.pdf

Template matching by means of cross-correlation is common practice in pattern recognition. However, its sensitivity to deformations of the pattern and the broad and unsharp peaks it produces are significant drawbacks. This paper reviews some results on how these shortcomings can be removed. Several techniques (Matched Spatial Filters, Synthetic Discriminant Functions, Principal Components Projections and Reconstruction Residuals) are reviewed and compared on a common task: locating eyes in a database of faces. New variants are also proposed and compared: least squares Discriminant Functions and the combined use of projections on eigenfunctions and the corresponding reconstruction residuals. Finally, approximation networks are introduced in an attempt to improve filter design by the introduction of nonlinearity.

AIM-1550

Author[s]: T.D. Alter and Ronen Basri

Extracting Salient Curves from Images: An Analysis of the Saliency Network

August 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1550.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1550.pdf

The Saliency Network proposed by Shashua and Ullman is a well-known approach to the problem of extracting salient curves from images while performing gap completion. This paper analyzes the Saliency Network. The Saliency Network is attractive for several reasons. First, the network generally prefers long and smooth curves over short or wiggly ones. While computing saliencies, the network also fills in gaps with smooth completions and tolerates noise. Finally, the network is locally connected, and its size is proportional to the size of the image. Nevertheless, our analysis reveals certain weaknesses with the method. In particular, we show cases in which the most salient element does not lie on the perceptually most salient curve. Furthermore, in some cases the saliency measure changes its preferences when curves are scaled uniformly. Also, we show that for certain fragmented curves the measure prefers large gaps over a few small gaps of the same total size. In addition, we analyze the time complexity required by the method. We show that the number of steps required for convergence in serial implementations is quadratic in the size of the network, and in parallel implementations is linear in the size of the network. We discuss problems due to coarse sampling of the range of possible orientations. We show that with proper sampling the complexity of the network becomes cubic in the size of the network. Finally, we consider the possibility of using the Saliency Network for grouping. We show that the Saliency Network recovers the most salient curve efficiently, but it has problems with identifying any salient curve other than the most salient one.

AIM-1551

Author[s]: Jacob Katzenelson and Aharon Unikovski

A Network Charge-Orineted MOS Transistor Model

August 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1551.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1551.pdf

The MOS transistor physical model as described in [3] is presented here as a network model. The goal is to obtain an accurate model, suitable for simulation, free from certain problems reported in the literature [13], and conceptually as simple as possible. To achieve this goal the original model had to be extended and modified. The paper presents the derivation of the network model from physical equations, including the corrections which are required for simulation and which compensate for simplifications introduced in the original physical model. Our intrinsic MOS model consists of three nonlinear voltage-controlled capacitors and a dependent current source. The charges of the capacitors and the current of the current source are functions of the voltages $V_{gs}$, $V_{bs}$, and $V_{ds}$. The complete model consists of the intrinsic model plus the parasitics. The apparent simplicity of the model is a result of hiding information in the characteristics of the nonlinear components. The resulted network model has been checked by simulation and analysis. It is shown that the network model is suitable for simulation: It is defined for any value of the voltages; the functions involved are continuous and satisfy Lipschitz conditions with no jumps at region boundaries; Derivatives have been computed symbolically and are available for use by the Newton-Raphson method. The model"s functions can be measured from the terminals. It is also shown that small channel effects can be included in the model. Higher frequency effects can be modeled by using a network consisting of several sections of the basic lumped model. Future plans include a detailed comparison of the network model with models such as SPICE level 3 and a comparison of the multi- section higher frequency model with experiments.

AIM-1552

Author[s]: David A. Cohn

Minimizing Statistical Bias with Queries

September 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1552.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1552.pdf

I describe an exploration criterion that attempts to minimize the error of a learner by minimizing its estimated squared bias. I describe experiments with locally-weighted regression on two simple kinematics problems, and observe that this "bias-only" approach outperforms the more common "variance-only" exploration approach, even in the presence of noise.

AIM-1553

Author[s]: N.K. Logothetis and D.A. Leopold

On the Physiology of Bistable Percepts

November 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1553.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1553.pdf

Binocular rivalry refers to the alternating perceptions experienced when two dissimilar patterns are stereoscopically viewed. To study the neural mechanism that underlies such competitive interactions, single cells were recorded in the visual areas V1, V2, and V4, while monkeys reported the perceived orientation of rivaling sinusoidal grating patterns. A number of neurons in all areas showed alternating periods of excitation and inhibition that correlated with the perceptual dominance and suppression of the cell"s preferred orientation. The remaining population of cells were not influenced by whether or not the optimal stimulus orientation was perceptually suppressed. Response modulation during rivalry was not correlated with cell attributes such as monocularity, binocularity, or disparity tuning. These results suggest that the awareness of a visual pattern during binocular rivalry arises through interactions between neurons at different levels of visual pathways, and that the site of suppression is unlikely to correspond to a particular visual area, as often hypothesized on the basis of psychophysical observations. The cell-types of modulating neurons and their overwhelming preponderance in higher rather than in early visual areas also suggests -- together with earlier psychophysical evidence -- the possibility of a common mechanism underlying rivalry as well as other bistable percepts, such as those experienced with ambiguous figures.

AIM-1554

Author[s]: D.A. Leopold, J.C. Fitzgibbons and N.K. Logothetis

The Role of Attention in Binocular Rivalry as Revealed Through Optokinetic Nystagmus

November 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1554.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1554.pdf

When stimuli presented to the two eyes differ considerably, stable binocular fusion fails, and the subjective percept alternates between the two monocular images, a phenomenon known as binocular rivalry. The influence of attention over this perceptual switching has long been studied, and although there is evidence that attention can affect the alternation rate, its role in the overall dynamics of the rivalry process remains unclear. The present study investigated the relationship between the attention paid to the rivalry stimulus, and the dynamics of the perceptual alternations. Specifically, the temporal course of binocular rivalry was studied as the subjects performed difficult nonvisual and visual concurrent tasks, directing their attention away from the rivalry stimulus. Periods of complete perceptual dominance were compared for the attended condition, where the subjects reported perceptual changes, and the unattended condition, where one of the simultaneous tasks was performed. During both the attended and unattended conditions, phases of rivalry dominance were obtained by analyzing the subject"s optokinetic nystagmus recorded by an electrooculogram, where the polarity of the nystagmus served as an objective indicator of the perceived direction of motion. In all cases, the presence of a difficult concurrent task had little or no effect on the statistics of the alternations, as judged by two classic tests of rivalry, although the overall alternation rate showed a small but significant increase with the concurrent task. It is concluded that the statistical patterns of rivalry alternations are not governed by attentional shifts or decision-making on the part of the subject.

AIM-1555

Author[s]: Thomas Marill

The Three-Dimensional Interpretation of a Class of Simple Line-Drawings

October 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1555.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1555.pdf

We provide a theory of the three-dimensional interpretation of a class of line-drawings called p-images, which are interpreted by the human vision system as parallelepipeds ("boxes"). Despite their simplicity, p-images raise a number of interesting vision questions: *Why are p-images seen as three-dimensional objects? Why not just as flatimages? *What are the dimensions and pose of the perceived objects? *Why are some p-images interpreted as rectangular boxes, while others are seen as skewed, even though there is no obvious distinction between the images? *When p-images are rotated in three dimensions, why are the image-sequences perceived as distorting objects---even though structure-from-motion would predict that rigid objects would be seen? *Why are some three-dimensional parallelepipeds seen as radically different when viewed from different viewpoints? We show that these and related questions can be answered with the help of a single mathematical result and an associated perceptual principle. An interesting special case arises when there are right angles in the p-image. This case represents a singularity in the equations and is mystifying from the vision point of view. It would seem that (at least in this case) the vision system does not follow the ordinary rules of geometry but operates in accordance with other (and as yet unknown) principles.

AIM-1556

CBCL-127

Author[s]: David C. Somers, Emanuel V. Todorov, Athanassios G. Siapas and Mriganka Sur

Vector-Based Integration of Local and Long-Range Information in Visual Cortex

January 18, 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1556.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1556.pdf

Integration of inputs by cortical neurons provides the basis for the complex information processing performed in the cerebral cortex. Here, we propose a new analytic framework for understanding integration within cortical neuronal receptive fields. Based on the synaptic organization of cortex, we argue that neuronal integration is a systems--level process better studied in terms of local cortical circuitry than at the level of single neurons, and we present a method for constructing self-contained modules which capture (nonlinear) local circuit interactions. In this framework, receptive field elements naturally have dual (rather than the traditional unitary influence since they drive both excitatory and inhibitory cortical neurons. This vector-based analysis, in contrast to scalarsapproaches, greatly simplifies integration by permitting linear summation of inputs from both "classical" and "extraclassical" receptive field regions. We illustrate this by explaining two complex visual cortical phenomena, which are incompatible with scalar notions of neuronal integration.

AIM-1558

CBCL-129

Author[s]: Carl de Marcken

The Unsupervised Acquisition of a Lexicon from Continuous Speech

January 18, 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1558.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1558.pdf

We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency.

AIM-1559

CBCL-128

Author[s]: Michael J. Jones, Tomaso Poggio

Model-Based Matching of Line Drawings by Linear Combinations of Prototypes

January 18, 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1559.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1559.pdf

We describe a technique for finding pixelwise correspondences between two images by using models of objects of the same class to guide the search. The object models are 'learned' from example images (also called prototypes) of an object class. The models consist of a linear combination ofsprototypes. The flow fields giving pixelwise correspondences between a base prototype and each of the other prototypes must be given. A novel image of an object of the same class is matched to a model by minimizing an error between the novel image and the current guess for the closest modelsimage. Currently, the algorithm applies to line drawings of objects. An extension to real grey level images is discussed.

AIM-1560

CBCL-129

Author[s]: Tommi S. Jaakkola, Lawrence K. Saul and Michael I. Jordan

Fast Learning by Bounding Likelihoods in Sigmoid Type Belief Networks

February 9, 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1560.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1560.pdf

Sigmoid type belief networks, a class of probabilistic neural networks, provide a natural framework for compactly representing probabilistic information in a variety of unsupervised and supervised learning problems. Often the parameters used in these networks need to be learned from examples. Unfortunately, estimating the parameters via exact probabilistic calculations (i.e, the EM-algorithm) is intractable even for networks with fairly small numbers of hidden units. We propose to avoid the infeasibility of the E step by bounding likelihoods instead of computing them exactly. We introduce extended and complementary representations for these networks and show that the estimation of the network parameters can be made fast (reduced to quadratic optimization) by performing the estimation in either of the alternative domains. The complementary networks can be used for continuous density estimation as well.

AIM-1561

CBCL-130

Author[s]: Zoubin Ghahramani and Michael I. Jordan

Factorial Hidden Markov Models

February 9, 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1561.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1561.pdf

We present a framework for learning in hidden Markov models with distributed state representations. Within this framework, we derive a learning algorithm based on the Expectation--Maximization (EM) procedure for maximum likelihood estimation. Analogous to the standard Baum-Welch update rules, the M-step of our algorithm is exact and can be solved analytically. However, due to the combinatorial nature of the hidden state representation, the exact E-step is intractable. A simple and tractable mean field approximation is derived. Empirical results on a set of problems suggest that both the mean field approximation and Gibbs sampling are viable alternatives to the computationally expensive exact algorithm.

AIM-1562

CBCL-131

Author[s]: Michael I. Jordan and Christopher M. Bishop

Neural Networks

March 13, 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1562.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1562.pdf

We present an overview of current research on artificial neural networks, emphasizing a statistical perspective. We view neural networks as parameterized graphs that make probabilistic assumptions about data, and view learning algorithms as methods for finding parameter values that look probable in the light of the data. We discuss basic issues in representation and learning, and treat some of the practical issues that arise in fitting networks to data. We also discuss links between neural networks and the general formalism of graphical models.

AITR-1563

Author[s]: John Bryant Morrell

Parallel Coupled Micro-Macro Actuators

January 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1563.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1563.pdf

This thesis presents a new actuator system consisting of a micro-actuator and a macro- actuator coupled in parallel via a compliant transmission. The system is called the Parallel Coupled Micro-Macro Actuator, or PaCMMA. In this system, the micro-actuator is capable of high bandwidth force control due to its low mass and direct-drive connection to the output shaft. The compliant transmission of the macro-actuator reduces the impedance (stiffness) at the output shaft and increases the dynamic range of force. Performance improvement over single actuator systems was expected in force control, impedance control, force distortion and reduction of transient impact forces. A set of quantitative measures is proposed and the actuator system is evaluated against them: Force Control Bandwidth, Position Bandwidth, Dynamic Range, Impact Force, Impedance ("Backdriveability'"), Force Distortion and Force Performance Space. Several theoretical performance limits are derived from the saturation limits of the system. A control law is proposed and control system performance is compared to the theoretical limits. A prototype testbed was built using permanenent magnet motors and an experimental comparison was performed between this actuator concept and two single actuator systems. The following performance was observed: Force bandwidth of 56Hz, Torque Dynamic Range of 800:1, Peak Torque of 1040mNm, Minimum Torque of 1.3mNm. Peak Impact Force was reduced by an order of magnitude. Distortion at small amplitudes was reduced substantially. Backdriven impedance was reduced by 2-3 orders of magnitude. This actuator system shows promise for manipulator design as well as psychophysical tests of human performance.

AIM-1564

Author[s]: Jonathan A. Rees

A Security Kernel Based on the Lambda-Calculus

March 13, 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1564.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1564.pdf

Cooperation between independent agents depends upon establishing adegree of security. Each of the cooperating agents needs assurance that the cooperation will not endanger resources of value to that agent. In a computer system, a computational mechanism can assure safe cooperation among the system's users by mediating resource access according to desired security policy. Such a mechanism, which is called a security kernel, lies at the heart of many operating systems and programming environments.The report describes Scheme 48, a programming environment whose design is guided by established principles of operating system security. Scheme 48's security kernel is small, consisting of the call- by-value $lambda$-calculus with a few simple extensions to support abstract data types, object mutation, and access to hardware resources. Each agent (user or subsystem) has a separate evaluation environment that holds objects representing privileges granted to that agent. Because environments ultimately determine availability of object references, protection and sharing can be controlled largely by the way in which environments are constructed. I will describe experience with Scheme 48 that shows how it serves as a robust and flexible experimental platform. Two successful applications of Scheme 48 are the programming environment for the Cornell mobile robots, where Scheme 48 runs with no (other) operating system support; and a secure multi- user environment that runs on workstations.

AIM-1565

CBCL-132

Author[s]: Padhraic Smyth, David Heckerman and Michael Jordan

Probabilistic Independence Networks for Hidden Markov Probability Models

March 13, 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1565.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1565.pdf

Graphical techniques for modeling the dependencies of randomvariables have been explored in a variety of different areas includingstatistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics.Formalisms for manipulating these models have been developedrelatively independently in these research communities. In this paper weexplore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independencenetworks (PINs). The paper contains a self-contained review of the basic principles of PINs.It is shown that the well- known forward-backward (F-B) and Viterbialgorithms for HMMs are special cases of more general inference algorithms forarbitrary PINs. Furthermore, the existence of inference and estimationalgorithms for more general graphical models provides a set of analysistools for HMM practitioners who wish to explore a richer class of HMMstructures.Examples of relatively complex models to handle sensorfusion and coarticulationin speech recognitionare introduced and treated within the graphical model framework toillustrate the advantages of the general approach.

AITR-1566

Author[s]: Tina Kapur

Segmentation of Brain Tissue from Magnetic Resonance Images

January 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1566.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1566.pdf

Segmentation of medical imagery is a challenging problem due to the complexity of the images, as well as to the absence of models of the anatomy that fully capture the possible deformations in each structure. Brain tissue is a particularly complex structure, and its segmentation is an important step for studies in temporal change detection of morphology, as well as for 3D visualization in surgical planning. In this paper, we present a method for segmentation of brain tissue from magnetic resonance images that is a combination of three existing techniques from the Computer Vision literature: EM segmentation, binary morphology, and active contour models. Each of these techniques has been customized for the problem of brain tissue segmentation in a way that the resultant method is more robust than its components. Finally, we present the results of a parallel implementation of this method on IBM's supercomputer Power Visualization System for a database of 20 brain scans each with 256x256x124 voxels and validate those against segmentations generated by neuroanatomy experts.

AIM-1567

Author[s]: Marina Meila and Michael I. Jordan

Learning Fine Motion by Markov Mixtures of Experts

November 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1567.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1567.pdf

Compliant control is a standard method for performing fine manipulation tasks, like grasping and assembly, but it requires estimation of the state of contact between the robot arm and the objects involved. Here we present a method to learn a model of the movement from measured data. The method requires little or no prior knowledge and the resulting model explicitly estimates the state of contact. The current state of contact is viewed as the hidden state variable of a discrete HMM. The control dependent transition probabilities between states are modeled as parametrized functions of the measurement We show that their parameters can be estimated from measurements concurrently with the estimation of the parameters of the movement in each state of contact. The learning algorithm is a variant of the EM procedure. The E step is computed exactly; solving the M step exactly would require solving a set of coupled nonlinear algebraic equations in the parameters. Instead, gradient ascent is used to produce an increase in likelihood.

AIM-1568

Author[s]: Philip N. Sabes and Michael I. Jordan

Reinforcement Learning by Probability Matching

Jaunary 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1568.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1568.pdf

We present a new algorithm for associative reinforcement learning. The algorithm is based upon the idea of matching a network's output probability with a probability distribution derived from the environment"s reward signal. This Probability Matching algorithm is shown to perform faster and be less susceptible to local minima than previously existing algorithms. We use Probability Matching to train mixture of experts networks, an architecture for which other reinforcement learning rules fail to converge reliably on even simple problems. This architecture is particularly well suited for our algorithm as it can compute arbitrarily complex functions yet calculation of the output probability is simple.

AITR-1569

Author[s]: Deniz Yuret

From Genetic Algorithms to Efficient Organization

May 1994

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1569.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1569.pdf

The work described in this thesis began as an inquiry into the nature and use of optimization programs based on "genetic algorithms." That inquiry led, eventually, to three powerful heuristics that are broadly applicable in gradient-ascent programs: First, remember the locations of local maxima and restart the optimization program at a place distant from previously located local maxima. Second, adjust the size of probing steps to suit the local nature of the terrain, shrinking when probes do poorly and growing when probes do well. And third, keep track of the directions of recent successes, so as to probe preferentially in the direction of most rapid ascent. These algorithms lie at the core of a novel optimization program that illustrates the power to be had from deploying them together. The efficacy of this program is demonstrated on several test problems selected from a variety of fields, including De Jong's famous test-problem suite, the traveling salesman problem, the problem of coordinate registration for image guided surgery, the energy minimization problem for determining the shape of organic molecules, and the problem of assessing the structure of sedimentary deposits using seismic data.

AIM-1570

Author[s]: Lawrence K. Saul, Tommi Jaakkola and Michael I. Jordan

Mean Field Theory for Sigmoid Belief Networks

August 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1570.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1570.pdf

We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. Our mean field theory provides a tractable approximation to the true probability distribution in these networks; it also yields a lower bound on the likelihood of evidence. We demonstrate the utility of this framework on a benchmark problem in statistical pattern recognition -- the classification of handwritten digits.

AIM-1571

Author[s]: Tommi S. Jaakkola and Michael I. Jordan

Computing Upper and Lower Bounds on Likelihoods in Intractable Networks

March 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1571.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1571.pdf

We present techniques for computing upper and lower bounds on the likelihoods of partial instantiations of variables in sigmoid and noisy-OR networks. The bounds determine confidence intervals for the desired likelihoods and become useful when the size of the network (or clique size) precludes exact computations. We illustrate the tightness of the obtained bounds by numerical experiments.

AITR-1572

Author[s]: Kah-Kay Sung

Learning and Example Selection for Object and Pattern Detection

March 13, 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1572.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1572.pdf

This thesis presents a learning based approach for detecting classes of objects and patterns with variable image appearance but highly predictable image boundaries. It consists of two parts. In part one, we introduce our object and pattern detection approach using a concrete human face detection example. The approach first builds a distribution-based model of the target pattern class in an appropriate feature space to describe the target's variable image appearance. It then learns from examples a similarity measure for matching new patterns against the distribution-based target model. The approach makes few assumptions about the target pattern class and should therefore be fairly general, as long as the target class has predictable image boundaries. Because our object and pattern detection approach is very much learning-based, how well a system eventually performs depends heavily on the quality of training examples it receives. The second part of this thesis looks at how one can select high quality examples for function approximation learning tasks. We propose an {em active learning} formulation for function approximation, and show for three specific approximation function classes, that the active example selection strategy learns its target with fewer data samples than random sampling. We then simplify the original active learning formulation, and show how it leads to a tractable example selection paradigm, suitable for use in many object and pattern detection problems.

AITR-1573

Author[s]: Thomas F. Stahovich

SketchIT: A Sketch Interpretation Tool for Conceptual Mechanical Design

March 13, 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1573.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1573.pdf

We describe a program called SketchIT capable of producing multiple families of designs from a single sketch. The program is given a rough sketch (drawn using line segments for part faces and icons for springs and kinematic joints) and a description of the desired behavior. The sketch is "rough" in the sense that taken literally, it may not work. From this single, perhaps flawed sketch and the behavior description, the program produces an entire family of working designs. The program also produces design variants, each of which is itself a family of designs. SketchIT represents each family of designs with a "behavior ensuring parametric model" (BEP-Model), a parametric model augmented with a set of constraints that ensure the geometry provides the desired behavior. The construction of the BEP-Model from the sketch and behavior description is the primary task and source of difficulty in this undertaking. SketchIT begins by abstracting the sketch to produce a qualitative configuration space (qc- space) which it then uses as its primary representation of behavior. SketchIT modifies this initial qc-space until qualitative simulation verifies that it produces the desired behavior. SketchIT's task is then to find geometries that implement this qc-space. It does this using a library of qc-space fragments. Each fragment is a piece of parametric geometry with a set of constraints that ensure the geometry implements a specific kind of boundary (qcs- curve) in qc-space. SketchIT assembles the fragments to produce the BEP-Model. SketchIT produces design variants by mapping the qc-space to multiple implementations, and by transforming rotating parts to translating parts and vice versa.

AITR-1574

Author[s]: David Beymer

Pose-Invariant Face Recognition Using Real and Virtual Views

March 28, 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1574.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1574.pdf

The problem of automatic face recognition is to visually identify a person in an input image. This task is performed by matching the input face against the faces of known people in a database of faces. Most existing work in face recognition has limited the scope of the problem, however, by dealing primarily with frontal views, neutral expressions, and fixed lighting conditions. To help generalize existing face recognition systems, we look at the problem of recognizing faces under a range of viewpoints. In particular, we consider two cases of this problem: (i) many example views are available of each person, and (ii) only one view is available per person, perhaps a driver's license or passport photograph. Ideally, we would like to address these two cases using a simple view-based approach, where a person is represented in the database by using a number of views on the viewing sphere. While the view-based approach is consistent with case (i), for case (ii) we need to augment the single real view of each person with synthetic views from other viewpoints, views we call 'virtual views'. Virtual views are generated using prior knowledge of face rotation, knowledge that is 'learned' from images of prototype faces. This prior knowledge is used to effectively rotate in depth the single real view available of each person. In this thesis, I present the view- based face recognizer, techniques for synthesizing virtual views, and experimental results using real and virtual views in the recognizer.

AIM-1575

Author[s]: Kenneth Yip and Gerald Jay Sussman

A Computational Model for the Acquisition and Use of Phonological Knowledge

March 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1575.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1575.pdf

Does knowledge of language consist of symbolic rules? How do children learn and use their linguistic knowledge? To elucidate these questions, we present a computational model that acquires phonological knowledge from a corpus of common English nouns and verbs. In our model the phonological knowledge is encapsulated as boolean constraints operating on classical linguistic representations of speech sounds in term of distinctive features. The learning algorithm compiles a corpus of words into increasingly sophisticated constraints. The algorithm is incremental, greedy, and fast. It yields one-shot learning of phonological constraints from a few examples. Our system exhibits behavior similar to that of young children learning phonological knowledge. As a bonus the constraints can be interpreted as classical linguistic rules. The computational model can be implemented by a surprisingly simple hardware mechanism. Our mechanism also sheds light on a fundamental AI question: How are signals related to symbols?

AIM-1576

Author[s]: Olin Shivers

Supporting Dynamic Languages on the Java Virtual Machine

April 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1576.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1576.pdf

In this note, I propose two extensions to the Java virtual machine (or VM) to allow dynamic languages such as Dylan, Scheme and Smalltalk to be efficiently implemented on the VM. These extensions do not affect the performance of pure Java programs on the machine. The first extension allows for efficient encoding of dynamic data; the second allows for efficient encoding of language-specific computational elements.

AITR-1577

Author[s]: Ignacio Sean McQuirk

An Analog VLSI Chip for Estimating the Focus of Expansion

August 21, 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1577.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1577.pdf

For applications involving the control of moving vehicles, the recovery of relative motion between a camera and its environment is of high utility. This thesis describes the design and testing of a real- time analog VLSI chip which estimates the focus of expansion (FOE) from measured time-varying images. Our approach assumes a camera moving through a fixed world with translational velocity; the FOE is the projection of the translation vector onto the image plane. This location is the point towards which the camera is moving, and other points appear to be expanding outward from. By way of the camera imaging parameters, the location of the FOE gives the direction of 3-D translation. The algorithm we use for estimating the FOE minimizes the sum of squares of the differences at every pixel between the observed time variation of brightness and the predicted variation given the assumed position of the FOE. This minimization is not straightforward, because the relationship between the brightness derivatives depends on the unknown distance to the surface being imaged. However, image points where brightness is instantaneously constant play a critical role. Ideally, the FOE would be at the intersection of the tangents to the iso- brightness contours at these "stationary" points. In practice, brightness derivatives are hard to estimate accurately given that the image is quite noisy. Reliable results can nevertheless be obtained if the image contains many stationary points and the point is found that minimizes the sum of squares of the perpendicular distances from the tangents at the stationary points. The FOE chip calculates the gradient of this least-squares minimization sum, and the estimation is performed by closing a feedback loop around it. The chip has been implemented using an embedded CCD imager for image acquisition and a row-parallel processing scheme. A 64 x 64 version was fabricated in a 2um CCD/ BiCMOS process through MOSIS with a design goal of 200 mW of on-chip power, a top frame rate of 1000 frames/second, and a basic accuracy of 5%. A complete experimental system which estimates the FOE in real time using real motion and image scenes is demonstrated.

AITR-1579

Author[s]: Brian A. LaMacchia

Internet Fish

August 1, 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1579.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1579.pdf

I have invented "Internet Fish," a novel class of resource-discovery tools designed to help users extract useful information from the Internet. Internet Fish (IFish) are semi- autonomous, persistent information brokers; users deploy individual IFish to gather and refine information related to a particular topic. An IFish will initiate research, continue to discover new sources of information, and keep tabs on new developments in that topic. As part of the information-gathering process the user interacts with his IFish to find out what it has learned, answer questions it has posed, and make suggestions for guidance. Internet Fish differ from other Internet resource discovery systems in that they are persistent, personal and dynamic. As part of the information-gathering process IFish conduct extended, long-term conversations with users as they explore. They incorporate deep structural knowledge of the organization and services of the net, and are also capable of on-the-fly reconfiguration, modification and expansion. Human users may dynamically change the IFish in response to changes in the environment, or IFish may initiate such changes itself. IFish maintain internal state, including models of its own structure, behavior, information environment and its user; these models permit an IFish to perform meta-level reasoning about its own structure. To facilitate rapid assembly of particular IFish I have created the Internet Fish Construction Kit. This system provides enabling technology for the entire class of Internet Fish tools; it facilitates both creation of new IFish as well as additions of new capabilities to existing ones. The Construction Kit includes a collection of encapsulated heuristic knowledge modules that may be combined in mix-and-match fashion to create a particular IFish; interfaces to new services written with the Construction Kit may be immediately added to "live" IFish. Using the Construction Kit I have created a demonstration IFish specialized for finding World-Wide Web documents related to a given group of documents. This "Finder" IFish includes heuristics that describe how to interact with the Web in general, explain how to take advantage of various public indexes and classification schemes, and provide a method for discovering similarity relationships among documents.

AIM-1580

CBCL-138

Author[s]: Bruno A. Olshausen

Learning Linear, Sparse, Factorial Codes

December 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1580.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1580.pdf

In previous work (Olshausen & Field 1996), an algorithm was described for learning linear sparse codes which, when trained on natural images, produces a set of basis functions that are spatially localized, oriented, and bandpass (i.e., wavelet-like). This note shows how the algorithm may be interpreted within a maximum-likelihood framework. Several useful insights emerge from this connection: it makes explicit the relation to statistical independence (i.e., factorial coding), it shows a formal relationship to the algorithm of Bell and Sejnowski (1995), and it suggests how to adapt parameters that were previously fixed.

AITR-1581

Author[s]: Jerry E. Pratt

Virtual Model Control of a Biped Walking Robot

December 1995

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1581.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1581.pdf

The transformation from high level task specification to low level motion control is a fundamental issue in sensorimotor control in animals and robots. This thesis develops a control scheme called virtual model control which addresses this issue. Virtual model control is a motion control language which uses simulations of imagined mechanical components to create forces, which are applied through joint torques, thereby creating the illusion that the components are connected to the robot. Due to the intuitive nature of this technique, designing a virtual model controller requires the same skills as designing the mechanism itself. A high level control system can be cascaded with the low level virtual model controller to modulate the parameters of the virtual mechanisms. Discrete commands from the high level controller would then result in fluid motion. An extension of Gardner's Partitioned Actuator Set Control method is developed. This method allows for the specification of constraints on the generalized forces which each serial path of a parallel mechanism can apply. Virtual model control has been applied to a bipedal walking robot. A simple algorithm utilizing a simple set of virtual components has successfully compelled the robot to walk eight consecutive steps.

AITR-1582

Author[s]: Ann L. Torres

Virtual Model Control of a Hexapod Walking Robot

December 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1582.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1582.pdf

Since robots are typically designed with an individual actuator at each joint, the control of these systems is often difficult and non- intuitive. This thesis explains a more intuitive control scheme called Virtual Model Control. This thesis also demonstrates the simplicity and ease of this control method by using it to control a simulated walking hexapod. Virtual Model Control uses imagined mechanical components to create virtual forces, which are applied through the joint torques of real actuators. This method produces a straightforward means of controlling joint torques to produce a desired robot behavior. Due to the intuitive nature of this control scheme, the design of a virtual model controller is similar to the design of a controller with basic mechanical components. The ease of this control scheme facilitates the use of a high level control system which can be used above the low level virtual model controllers to modulate the parameters of the imaginary mechanical components. In order to apply Virtual Model Control to parallel mechanisms, a solution to the force distribution problem is required. This thesis uses an extension of Gardner`s Partitioned Force Control method which allows for the specification of constrained degrees of freedom. This virtual model control technique was applied to a simulated hexapod robot. Although the hexapod is a highly non-linear, parallel mechanism, the virtual models allowed text-book control solutions to be used while the robot was walking. Using a simple linear control law, the robot walked while simultaneously balancing a pendulum and tracking an object.

AIM-1583

CBCL-139

Author[s]: Michael J. Jones and Tomaso Poggio

Model-Based Matching by Linear Combinations of Prototypes

December 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1583.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1583.pdf

We describe a method for modeling object classes (such as faces) using 2D example images and an algorithm for matching a model to a novel image. The object class models are "learned'' from example images that we call prototypes. In addition to the images, the pixelwise correspondences between a reference prototype and each of the other prototypes must also be provided. Thus a model consists of a linear combination of prototypical shapes and textures. A stochastic gradient descent algorithm is used to match a model to a novel image by minimizing the error between the model and the novel image. Example models are shown as well as example matches to novel images. The robustness of the matching algorithm is also evaluated. The technique can be used for a number of applications including the computation of correspondence between novel images of a certain known class, object recognition, image synthesis and image compression.

AIM-1584

Author[s]: Ujjaval Y. Desai, Marcelo M. Mizuki, Ichiro Masaki and Berthold K.P. Horn

Edge and Mean Based Image Compression

November 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1584.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1584.pdf

In this paper, we present a static image compression algorithm for very low bit rate applications. The algorithm reduces spatial redundancy present in images by extracting and encoding edge and mean information. Since the human visual system is highly sensitive to edges, an edge-based compression scheme can produce intelligible images at high compression ratios. We present good quality results for facial as well as textured, 256~x~256 color images at 0.1 to 0.3 bpp. The algorithm described in this paper was designed for high performance, keeping hardware implementation issues in mind. In the next phase of the project, which is currently underway, this algorithm will be implemented in hardware, and new edge- based color image sequence compression algorithms will be developed to achieve compression ratios of over 100, i.e., less than 0.12 bpp from 12 bpp. Potential applications include low power, portable video telephones.

AITR-1585

Author[s]: Miguel Hall

Prototype of a Configurable Web-Based Assessment System

June 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1585.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1585.pdf

The MIT Prototype Educational Assessment System provides subjects and courses at MIT with the ability to perform online assessment. The system includes polices to handle harassment and electronic "flaming" while protecting privacy. Within these frameworks, individual courses and subjects can make their own policy decisions about such matters as to when assessments can occur, who can submit assessments, and how anonymous assessments are. By allowing assessment to take place continually and allowing both students and staff to participate, the system can provide a forum for the online discussion of subjects. Even in the case of scheduled assessments, the system can provide advantages over end-of-term assessment, since the scheduled assessments can occur several times during the semester, allowing subjects to identify and adjust those areas that could use improvement. Subjects can also develop customized questionnaires, perhaps in response to previous assessments, to suit their needs.

AITR-1586

Author[s]: Andre DeHon

Reconfigurable Architectures for General-Purpose Computing

September 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1586.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1586.pdf

General-purpose computing devices allow us to (1) customize computation after fabrication and (2) conserve area by reusing expensive active circuitry for different functions in time. We define RP-space, a restricted domain of the general-purpose architectural space focussed on reconfigurable computing architectures. Two dominant features differentiate reconfigurable from special- purpose architectures and account for most of the area overhead associated with RP devices: (1) instructions which tell the device how to behave, and (2) flexible interconnect which supports task dependent dataflow between operations. We can characterize RP- space by the allocation and structure of these resources and compare the efficiencies of architectural points across broad application characteristics. Conventional FPGAs fall at one extreme end of this space and their efficiency ranges over two orders of magnitude across the space of application characteristics. Understanding RP-space and its consequences allows us to pick the best architecture for a task and to search for more robust design points in the space. Our DPGA, a fine- grained computing device which adds small, on-chip instruction memories to FPGAs is one such design point. For typical logic applications and finite- state machines, a DPGA can implement tasks in one-third the area of a traditional FPGA. TSFPGA, a variant of the DPGA which focuses on heavily time- switched interconnect, achieves circuit densities close to the DPGA, while reducing typical physical mapping times from hours to seconds. Rigid, fabrication-time organization of instruction resources significantly narrows the range of efficiency for conventional architectures. To avoid this performance brittleness, we developed MATRIX, the first architecture to defer the binding of instruction resources until run-time, allowing the application to organize resources according to its needs. Our focus MATRIX design point is based on an array of 8-bit ALU and register- file building blocks interconnected via a byte- wide network. With today's silicon, a single chip MATRIX array can deliver over 10 Gop/s (8-bit ops). On sample image processing tasks, we show that MATRIX yields 10-20x the computational density of conventional processors. Understanding the cost structure of RP-space helps us identify these intermediate architectural points and may provide useful insight more broadly in guiding our continual search for robust and efficient general-purpose computing structures.

AITR-1587

Author[s]: Partha Niyogi

The Informational Complexity of Learning from Examples

September 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1587.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1587.pdf

This thesis attempts to quantify the amount of information needed to learn certain tasks. The tasks chosen vary from learning functions in a Sobolev space using radial basis function networks to learning grammars in the principles and parameters framework of modern linguistic theory. These problems are analyzed from the perspective of computational learning theory and certain unifying perspectives emerge.

AIM-1588

Author[s]: Andrew Justin Blumberg

Parallel Function Application on a DNA Substrate

December 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1588.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1588.pdf

In this paper I present a new model that employs a biological (specifically DNA - based) substrate for performing computation. Specifically, I describe strategies for performing parallel function application in the DNA-computing models described by Adelman, Cai et. al., and Liu et. al. Employing only DNA operations which can presently be performed, I discuss some direct algorithms for computing a variety of useful mathematical functions on DNA, culminating in an algorithm for minimizing an arbitrary continuous function. In addition, computing genetic algorithms on a DNA substrate is briefly discussed.

AIM-1589

Author[s]: Andrew Justin Blumberg

General Purpose Parallel Computation on a DNA Substrate

December 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1589.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1589.pdf

In this paper I describe and extend a new DNA computing paradigm introduced in Blumberg for building massively parallel machines in the DNA-computing models described by Adelman, Cai et. al., and Liu et. al. Employing only DNA operations which have been reported as successfully performed, I present an implementation of a Connection Machine, a SIMD (single-instruction multiple-data) parallel computer as an illustration of how to apply this approach to building computers in this domain (and as an implicit demonstration of PRAM equivalence). This is followed with a description of how to implement a MIMD (multiple-instruction multiple-data) parallel machine. The implementations described herein differ most from existing models in that they employ explicit communication between processing elements (and hence strands of DNA).

AITR-1590

Author[s]: Andrew A. Berlin

Towards Intelligent Structures: Active Control of Buckling

May 1994

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1590.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1590.pdf

The buckling of compressively-loaded members is one of the most important factors limiting the overall strength and stability of a structure. I have developed novel techniques for using active control to wiggle a structural element in such a way that buckling is prevented. I present the results of analysis, simulation, and experimentation to show that buckling can be prevented through computer- controlled adjustment of dynamical behavior.sI have constructed a small-scale railroad-style truss bridge that contains compressive members that actively resist buckling through the use of piezo-electric actuators. I have also constructed a prototype actively controlled column in which the control forces are applied by tendons, as well as a composite steel column that incorporates piezo-ceramic actuators that are used to counteract buckling. Active control of buckling allows this composite column to support 5.6 times more load than would otherwise be possible.sThese techniques promise to lead to intelligent physical structures that are both stronger and lighter than would otherwise be possible.

AIM-1591

Author[s]: Paul Viola

Complex Feature Recognition: A Bayesian Approach for Learning to Recognize Objects

November 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1591.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1591.pdf

We have developed a new Bayesian framework for visual object recognition which is based on the insight that images of objects can be modeled as a conjunction of local features. This framework can be used to both derive an object recognition algorithm and an algorithm for learning the features themselves. The overall approach, called complex feature recognition or CFR, is unique for several reasons: it is broadly applicable to a wide range of object types, it makes constructing object models easy, it is capable of identifying either the class or the identity of an object, and it is computationally efficient-- requiring time proportional to the size of the image. Instead of a single simple feature such as an edge, CFR uses a large set of complex features that are learned from experience with model objects. The response of a single complex feature contains much more class information than does a single edge. This significantly reduces the number of possible correspondences between the model and the image. In addition, CFR takes advantage of a type of image processing called 'oriented energy'. Oriented energy is used to efficiently pre-process the image to eliminate some of the difficulties associated with changes in lighting and pose.

AIM-1592

CBCL-140

Author[s]: Theodoros Evgeniou

Image Based Rendering Using Algebraic Techniques

November 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1592.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1592.pdf

This paper presents an image-based rendering system using algebraic relations between different views of an object. The system uses pictures of an object taken from known positions. Given three such images it can generate "virtual'' ones as the object would look from any position near the ones that the two input images were taken from. The extrapolation from the example images can be up to about 60 degrees of rotation. The system is based on the trilinear constraints that bind any three view so fan object. As a side result, we propose two new methods for camera calibration. We developed and used one of them. We implemented the system and tested it on real images of objects and faces. We also show experimentally that even when only two images taken from unknown positions are given, the system can be used to render the object from other view points as long as we have a good estimate of the internal parameters of the camera used and we are able to find good correspondence between the example images. In addition, we present the relation between these algebraic constraints and a factorization method for shape and motion estimation. As a result we propose a method for motion estimation in the special case of orthographic projection.

AIM-1593

Author[s]: J.P. Mellor, Seth Teller and Tomas Lozano-Perez

Dense Depth Maps from Epipolar Images

November 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1593.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1593.pdf

Recovering three-dimensional information from two-dimensional images is the fundamental goal of stereo techniques. The problem of recovering depth (three- dimensional information) from a set of images is essentially the correspondence problem: Given a point in one image, find the corresponding point in each of the other images. Finding potential correspondences usually involves matching some image property. If the images are from nearby positions, they will vary only slightly, simplifying the matching process. Once a correspondence is known, solving for the depth is simply a matter of geometry. Real images are composed of noisy, discrete samples, therefore the calculated depth will contain error. This error is a function of the baseline or distance between the images. Longer baselines result in more precise depths. This leads to a conflict: short baselines simplify the matching process, but produce imprecise results; long baselines produce precise results, but complicate the matching process. In this paper, we present a method for generating dense depth maps from large sets (1000's) of images taken from arbitrary positions. Long baseline images improve the accuracy. Short baseline images and the large number of images greatly simplifies the correspondence problem, removing nearly all ambiguity. The algorithm presented is completely local and for each pixel generates an evidence versus depth and surface normal distribution. In many cases, the distribution contains a clear and distinct global maximum. The location of this peak determines the depth and its shape can be used to estimate the error. The distribution can also be used to perform a maximum likelihood fit of models directly to the images. We anticipate that the ability to perform maximum likelihood estimation from purely local calculations will prove extremely useful in constructing three dimensional models from large sets of images.

AIM-1594

Author[s]: Gideon P. Stein and Amnon Shashua

Direct Methods for Estimation of Structure and Motion from Three Views

December 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1594.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1594.pdf

We describe a new direct method for estimating structure and motion from image intensities of multiple views. We extend the direct methods of Horn- and-Weldon to three views. Adding the third view enables us to solve for motion, and compute a dense depth map of the scene, directly from image spatio -temporal derivatives in a linear manner without first having to find point correspondences or compute optical flow. We describe the advantages and limitations of this method which are then verified through simulation and experiments with real images.

AIM-1595

Author[s]: Gideon P. Stein

Lens Distortion Calibration Using Point Correspondences

December 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1595.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1595.pdf

This paper describes a new method for lens distortion calibration using only point correspondences in multiple views, without the need to know either the 3D location of the points or the camera locations. The standard lens distortion model is a model of the deviations of a real camera from the ideal pinhole or projective camera model.Given multiple views of a set of corresponding points taken by ideal pinhole cameras there exist epipolar and trilinear constraints among pairs and triplets of these views. In practice, due to noise in the feature detection and due to lens distortion these constraints do not hold exactly and we get some error. The calibration is a search for the lens distortion parameters that minimize this error. Using simulation and experimental results with real images we explore the properties of this method. We describe the use of this method with the standard lens distortion model, radial and decentering, but it could also be used with any other parametric distortion models. Finally we demonstrate that lens distortion calibration improves the accuracy of 3D reconstruction.

AITR-1596

Author[s]: J. Kenneth Salisbury and Mandayam A. Srinivasan (editors)

The Proceedings of the First PHANToM User's Group Workshop

December 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1596.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1596.pdf

These proceedings summarize the results of the First PHANToM User's Group Workshop held September 27-30, 1996 MIT. The goal of the workshop was to bring together a group of active users of the PHANToM Haptic Interface to discuss the scientific and engineering challenges involved in bringing haptics into widespread use, and to explore the future possibilities of this exciting technology. With over 50 attendees and 25 presentations the workshop provided the first large forum for users of a common haptic interface to share results and engage in collaborative discussions. Short papers from the presenters are contained herein and address the following topics: Research Effort Overviews, Displays and Effects, Applications in Teleoperation and Training, Tools for Simulated Worlds and, Data Visualization.

AIM-1598

CBCL-141

Author[s]: Joerg C. Lemm

Prior Information and Generalized Questions

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1598.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1598.pdf

In learning problems available information is usually divided into two categories: examples of function values (or training data) and prior information (e.g. a smoothness constraint). This paper 1.) studies aspects on which these two categories usually differ, like their relevance for generalization and their role in the loss function, 2.) presents a unifying formalism, where both types of information are identified with answers to generalized questions, 3.) shows what kind of generalized information is necessary to enable learning, 4.) aims to put usual training data and prior information on a more equal footing by discussing possibilities and variants of measurement and control for generalized questions, including the examples of smoothness and symmetries, 5.) reviews shortly the measurement of linguistic concepts based on fuzzy priors, and principles to combine preprocessors, 6.) uses a Bayesian decision theoretic framework, contrasting parallel and inverse decision problems, 7.) proposes, for problems with non--approximation aspects, a Bayesian two step approximation consisting of posterior maximization and a subsequent risk minimization, 8.) analyses empirical risk minimization under the aspect of nonlocal information 9.) compares the Bayesian two step approximation with empirical risk minimization, including their interpretations of Occam's razor, 10.) formulates examples of stationarity conditions for the maximum posterior approximation with nonlocal and nonconvex priors, leading to inhomogeneous nonlinear equations, similar for example to equations in scattering theory in physics. In summary, this paper focuses on the dependencies between answers to different questions. Because not training examples alone but such dependencies enable generalization, it emphasizes the need of their empirical measurement and control and of a more explicit treatment in theory.

AIM-1599

CBCL-142

Author[s]: B. Schoelkopf, K. Sung, C. Burges, F. Girosi, P. Niyogi, T. Poggio and V. Vapnik

Comparing Support Vector Machines with Gaussian Kernels to Radial Basis Function Classifiers

December 1996

ftp://publications.ai.mit.edu/ai-publications/1500-1999/AIM-1599.ps

ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1599.pdf

The Support Vector (SV) machine is a novel type of learning machine, based on statistical learning theory, which contains polynomial classifiers, neural networks, and radial basis function (RBF) networks as special cases. In the RBF case, the SV algorithm automatically determines centers, weights and threshold such as to minimize an upper bound on the expected test error. The present study is devoted to an experimental comparison of these machines with a classical approach, where the centers are determined by $k$-- means clustering and the weights are found using error backpropagation. We consider three machines, namely a classical RBF machine, an SV machine with Gaussian kernel, and a hybrid system with the centers determined by the SV method and the weights trained by error backpropagation. Our results show that on the US postal service database of handwritten digits, the SV machine achieves the highest test accuracy, followed by the hybrid approach. The SV approach is thus not only theoretically well--founded, but also superior in a practical application.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu