Web Page: Achieved Deliverables

Cog Brief List:
* Detecting Head Orientation
* Mimicry
* Distinguishing Animate from Inanimate
* Joint Reference
* Simulated Musculature

Detecting Head Orientation:

Web Page: Achieved Deliverable: We have implemented and evaluated a system that detects the orientation of a person's head from as far as six meters away from the robot. To accomplish this, we have implemented a multi-stage behavior. Whenever the robot sees an item of interest, it moves its eyes and head to bring that object within the field of view of the foveal cameras. A face-finding algorithm based on skin color and shape is used to identify faces, and a software zoom is used to capture as much information as possible. The system then identifies a set of facial features (eyes and nose/mouth) and uses a model of human facial structure to identify the orientation of the person's head (a sketch of this kind of estimate appears below, after the Mimicry section).

Watch it in action: there is a video of this system running on Cog's monitors, as well as an imitation video in which head orientation matters.

Refer to: Brian Scassellati. "Foundations for a Theory of Mind for a Humanoid Robot", Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, Cambridge, MA, PhD Thesis, June 2001.

Mimicry:

Web Page: Achieved Deliverable: Cog's torso was retrofitted with force-sensing capabilities in order to implement body motion via virtual spring force control. In addition, we developed a representational language for humanoid motor control inspired by the neurophysiological organizing principle of motor primitives. Together, these allow Cog to broadly mimic the motions of a person with whom it interacts, using its body or arms. In the arm imitation behavior, the robot continuously tracks many object trajectories. A trajectory is selected on the basis of animacy and the attentional state of the instructor. Motion trajectories are then converted from a visual representation to a motor representation that the robot can execute. The performance of this mimicry response was evaluated with naive human instructors.

Related Publications:
Aaron Edsinger. "A Gestural Language for a Humanoid Robot", Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, Cambridge, MA, Master's Thesis, 2000.
Brian Scassellati. "Foundations for a Theory of Mind for a Humanoid Robot", Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, Cambridge, MA, PhD Thesis, June 2001.

See it in action:

Video clip 1: A video clip of Cog's new force-control torso exhibiting virtual spring behavior. The ability to use virtual spring control on the torso allows for full body/arm integration and for safe human-robot interaction.

Video clip 2: This video clip shows an example of Cog mimicking the movement of a person. The visual attention system directs the robot to look and turn its head toward the person. Cog observes the movement of the person's hand, recognizes that movement as an animate stimulus, and responds by moving its own hand in a similar fashion.

Video clip 3: We have also tested the performance of this mimicry response with naive human instructors. In this case, the subject gives the robot the American Sign Language gesture for "eat", which the robot mimics back at the person. Note that the robot has no understanding of the semantics of this gesture; it is merely mirroring the person's action.
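The virtual spring behavior in the Mimicry deliverable can be illustrated with a minimal control sketch. The gains, torque limit, and joint interface (angle, velocity, target, command_torque) below are illustrative assumptions, not Cog's actual parameters or software.

```python
# Minimal sketch of virtual-spring (spring-damper) joint control.
# Gains, limits, and the joint interface are illustrative assumptions.

def virtual_spring_torque(q, q_des, dq, k=20.0, b=2.0, torque_limit=5.0):
    """Spring-damper torque pulling joint angle q (rad) toward q_des."""
    tau = k * (q_des - q) - b * dq                      # virtual spring + damper
    return max(-torque_limit, min(torque_limit, tau))   # saturate to stay gentle

def control_step(joints):
    # Commanding torques rather than positions lets a person push the torso
    # away from its posture and have it yield springily, which is what makes
    # whole-body interaction and mimicry safe around people.
    for j in joints:                                    # hypothetical joint objects
        j.command_torque(virtual_spring_torque(j.angle, j.target, j.velocity))
```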
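For the Detecting Head Orientation deliverable at the top of this page, the following is a rough sketch of recovering orientation from a few facial features (two eyes and a nose/mouth point). The weak-perspective ratios and the eye_nose_ratio parameter are assumptions for illustration, not the facial model actually used on the robot.

```python
import math

# Rough sketch: estimate (roll, yaw, pitch) of a head from 2D feature locations.
# Image coordinates are assumed to have y increasing downward.

def head_orientation(left_eye, right_eye, nose, eye_nose_ratio=0.6):
    ex, ey = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    inter_ocular = math.hypot(ex, ey) or 1e-9
    roll = math.atan2(ey, ex)                            # tilt of the eye line

    mid = ((left_eye[0] + right_eye[0]) / 2.0, (left_eye[1] + right_eye[1]) / 2.0)

    # Nose displacement from the eye midpoint, normalized by inter-ocular
    # distance, approximates yaw (left/right) and pitch (up/down) relative to
    # what an assumed frontal face would show.
    yaw = math.asin(max(-1.0, min(1.0, (nose[0] - mid[0]) / inter_ocular)))
    expected_drop = eye_nose_ratio * inter_ocular        # assumed frontal offset
    pitch = math.asin(max(-1.0, min(1.0,
            ((nose[1] - mid[1]) - expected_drop) / inter_ocular)))
    return roll, yaw, pitch
```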
Distinguishing Animate from Inanimate:

Web Page: Achieved Deliverable: We have implemented a system that distinguishes the movement patterns of animate objects from those of inanimate objects. This system uses a multi-agent architecture to represent a set of naive rules of physics drawn from experimental results on human subjects. These naive rules represent the effects of gravity, inertia, and other intuitive parts of Newtonian mechanics. We have evaluated this system by comparing its results to human performance on classifying the movement of point-light sources, and found the system to be more than 85% accurate on a test suite of recorded real-world data (a sketch of this rule-based judgment appears below, after the Simulated Musculature section).

Watch it in action: a video of a ball moving down an inclined plane.

Refer to: Brian Scassellati. "Discriminating Animate from Inanimate Visual Stimuli", to appear at the International Joint Conference on Artificial Intelligence, Seattle, Washington, August 2001.

Joint Reference:

Web Page: Achieved Deliverable: Using its new 2-DOF hands, which exploit series elastic actuators and rapid prototyping technology, Cog demonstrated basic grasping and gestures. The gestural ability was combined with models from human development for establishing joint reference, that is, for the robot to attend to the same object that an instructor is attending to. Objects that are within the approximate attention range of the human instructor are made more salient to the robot. Head orientation is the primary cue to the instructor's attention (sketched below, after the Simulated Musculature section).

Related Reference: Brian Scassellati. "Foundations for a Theory of Mind for a Humanoid Robot", Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, Cambridge, MA, PhD Thesis, June 2001.

Watch it in action:

Video 1: A video clip of Cog's new hand demonstrating various grasping behaviors. The 2-DOF hands utilize series elastic actuators and rapid prototyping technology.

Video 2: This video clip demonstrates the simple ways in which Cog interprets the intentions of the instructor. Note that unlike the other video clips, in this example the instructor was given a specific sequence of tasks to perform in front of the robot. The instructor was asked to "get the robot's attention and then look over at the block". Cog responds by first fixating the instructor and then shifting its gaze to the block. The instructor was then asked to again get the robot's attention and to reach slowly for the block. Cog looks back at the instructor, observes the instructor moving toward the block, and interprets that the instructor might want the block. Although Cog has relatively little capability to assist the instructor in this case, we programmed the robot to attempt to reach for any target that the instructor became interested in.

Simulated Musculature:

Web Page: Achieved Deliverable: Cog's arm and body are controlled via simulated muscle-like elements that span multiple joints and operate independently. Muscle strength and fatigue over time are modulated by a biochemical model. The muscle-like elements are inspired by real physiology and allow Cog to move with dynamics that are more human-like than conventional manipulator control.

Related Reference: Bryan Adams. "Learning Humanoid Arm Gestures". Working Notes, AAAI Spring Symposium Series: Learning Grounded Representations, Stanford, CA, March 26-28, 2001, pp. 1-3.
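For the Distinguishing Animate from Inanimate deliverable above, this sketch conveys the flavor of the rule-based judgment: each expert agent checks whether one naive rule of physics can account for a point trajectory, and motion that no rule explains is labelled animate. The particular experts, thresholds, and voting rule are assumptions for illustration, not the system's actual agents.

```python
import math

# Velocities are per-frame (vx, vy) pairs; image y is assumed to increase downward.

def _headings(velocities):
    return [math.atan2(vy, vx) for vx, vy in velocities if (vx, vy) != (0.0, 0.0)]

def inertia_expert(velocities, tol=0.15):
    """Explainable as inertial motion if the heading barely changes."""
    h = _headings(velocities)
    turns = [abs((b - a + math.pi) % (2 * math.pi) - math.pi) for a, b in zip(h, h[1:])]
    return all(t < tol for t in turns)

def gravity_expert(velocities, tol=0.05):
    """Explainable as falling if downward speed never decreases much."""
    vys = [vy for _, vy in velocities]
    return all(b - a >= -tol for a, b in zip(vys, vys[1:]))

def constant_speed_expert(velocities, tol=0.2):
    """Explainable as passive coasting if speed stays near its mean."""
    speeds = [math.hypot(vx, vy) for vx, vy in velocities]
    mean = sum(speeds) / max(len(speeds), 1)
    return all(abs(s - mean) <= tol * (mean + 1e-9) for s in speeds)

def is_animate(velocities):
    # If no naive-physics expert accounts for the motion, call it animate.
    experts = (inertia_expert, gravity_expert, constant_speed_expert)
    return not any(expert(velocities) for expert in experts)
```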
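For the Joint Reference deliverable, here is a minimal sketch of making objects near the instructor's gaze direction more salient, so that the attention system tends to fixate what the instructor is attending to. The 2D geometry, cone width, and gain are illustrative assumptions.

```python
import math

def gaze_saliency_boost(obj_pos, head_pos, gaze_dir, cone_deg=20.0, gain=2.0):
    """Multiplicative boost for an object near the gaze ray; gaze_dir is a unit vector."""
    dx, dy = obj_pos[0] - head_pos[0], obj_pos[1] - head_pos[1]
    dist = math.hypot(dx, dy) or 1e-9
    # Angle between the gaze direction and the direction to the object.
    cos_angle = (dx * gaze_dir[0] + dy * gaze_dir[1]) / dist
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return gain if angle <= cone_deg else 1.0

def reweight(objects, head_pos, gaze_dir):
    """objects: list of (position, base_saliency); returns boosted saliencies."""
    return [s * gaze_saliency_boost(p, head_pos, gaze_dir) for p, s in objects]
```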
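For the Simulated Musculature deliverable, a toy sketch of a muscle-like element that spans several joints and weakens with a simple fatigue state. The constants and the moment-arm representation stand in for the biochemical model and are assumptions, not the robot's implementation.

```python
class MuscleElement:
    def __init__(self, moment_arms, max_force=10.0, fatigue_rate=0.02, recovery_rate=0.01):
        self.moment_arms = moment_arms      # {joint_name: moment arm}; may span joints
        self.max_force = max_force
        self.fatigue = 0.0                  # 0 = fresh, 1 = exhausted
        self.fatigue_rate = fatigue_rate
        self.recovery_rate = recovery_rate

    def step(self, activation, dt=0.01):
        """Return {joint: torque} for one control tick, given activation in [0, 1]."""
        activation = max(0.0, min(1.0, activation))
        # Fatigue accumulates with use and recovers at rest.
        self.fatigue += (activation * self.fatigue_rate
                         - (1.0 - activation) * self.recovery_rate) * dt
        self.fatigue = max(0.0, min(1.0, self.fatigue))
        force = activation * self.max_force * (1.0 - self.fatigue)
        # Each spanned joint feels the muscle force scaled by its moment arm,
        # so one element naturally couples multiple joints.
        return {joint: arm * force for joint, arm in self.moment_arms.items()}
```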
Lazlo Attentional System Based on Space Variant Vision:

Web Page: Achieved Deliverable: Lazlo uses an attentional system based completely on space-variant (in particular, log-polar) vision. This allowed Lazlo to saccade and track with human-like smoothness, pace, and accuracy. Algorithms for color processing, optic flow, and disparity computation were developed. The attentional software modules are the first layer of a more complicated system, which will next incorporate the learning of object recognition, trajectory tracking, and naive physics understanding during the robot's natural interaction with the environment. (A sketch of log-polar sampling appears later on this page, after the Head Pose Estimation deliverable.)

Related Reference: Giorgio Metta. "An Attentional System for a Humanoid Robot Exploiting Space Variant Vision". In submission to Humanoids 2001.

Kismet Short List:
* Vocabulary Management
* Head Pose Estimation
* Process Learning
* Face Recognition

Vocabulary Management:

Web Page: Achieved Deliverable: Kismet needs to acquire a vocabulary relevant to a human's purpose: for verbal tasking of a humanoid robot, we need to be able to adapt its vocabulary to the task at hand. Toward this goal, we have, first, implemented a command protocol for introducing vocabulary to Kismet. Second, we have developed an unsupervised mechanism for extracting candidate vocabulary items from natural continuous speech. Third, we have analyzed the speech used in teaching Kismet words in order to determine whether humans naturally modify their speech in ways that would enable better word learning by the robot.

Related Reports:
Paulina Varchavskaia, Paul Fitzpatrick, and Cynthia Breazeal. "Characterizing and Processing Robot-Directed Speech". Submitted to the IEEE-RAS International Conference on Humanoid Robots 2001, Tokyo, Japan.
http://www.ai.mit.edu/people/paulfitz/pub/human2001-vocabulary.ps
http://www.ai.mit.edu/people/paulfitz/pub/human2001-vocabulary.pdf
Paul Fitzpatrick. "From Word-spotting to OOV Modelling". Term paper for MIT course 6.345.
http://www.ai.mit.edu/people/paulfitz/pub/paulfitz-oov.ps
http://www.ai.mit.edu/people/paulfitz/pub/paulfitz-oov.pdf

Head Pose Estimation:

Web Page: Achieved Deliverable: We developed a fully automatic system for recovering the rigid components of head pose. The conventional approach of tracking pose changes relative to a reference configuration can give high accuracy but is subject to drift. In face-to-face interaction with a robot, there are likely to be frequent presentations of the head in a close-to-frontal orientation, so we used these presentations to make opportunistic corrections. Pose was tracked in an intermediate mixed coordinate system chosen to minimize the impact of errors in estimates of the 3D shape of the head being tracked. This is vital for practical application to unknown users in cluttered conditions.
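For the Lazlo attentional system above, a minimal sketch of space-variant (log-polar) sampling: resolution is high at the fovea and falls off toward the periphery, so a small buffer still covers a wide field of view. Grid sizes and nearest-neighbour sampling are illustrative choices, not Lazlo's implementation.

```python
import math

def logpolar_sample(image, center, n_rings=64, n_wedges=128, r_min=2.0):
    """Map a grayscale image (list of rows) to an n_rings x n_wedges log-polar grid."""
    h, w = len(image), len(image[0])
    cx, cy = center
    r_max = math.hypot(max(cx, w - cx), max(cy, h - cy))
    growth = math.log(r_max / r_min) / (n_rings - 1)     # ring radii grow exponentially
    out = [[0.0] * n_wedges for _ in range(n_rings)]
    for ring in range(n_rings):
        r = r_min * math.exp(growth * ring)
        for wedge in range(n_wedges):
            theta = 2.0 * math.pi * wedge / n_wedges
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy + r * math.sin(theta)))
            if 0 <= x < w and 0 <= y < h:                # nearest-neighbour sample
                out[ring][wedge] = image[y][x]
    return out
```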
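The opportunistic correction in the head-pose deliverable just described can be sketched as a tracker that integrates relative pose between frames and re-anchors whenever the face appears close to frontal, which bounds drift. The frontality score and threshold below are assumptions for illustration, not the actual estimator.

```python
class HeadPoseTracker:
    def __init__(self, frontal_threshold=0.9):
        self.pose = (0.0, 0.0, 0.0)          # (roll, yaw, pitch) relative to frontal
        self.frontal_threshold = frontal_threshold

    def update(self, delta_pose, frontal_score):
        """delta_pose: per-frame rotation estimate; frontal_score in [0, 1],
        where 1 means a clearly frontal presentation."""
        if frontal_score >= self.frontal_threshold:
            self.pose = (0.0, 0.0, 0.0)      # opportunistic correction at frontal views
        else:
            # Otherwise integrate the relative motion (drift accumulates only here).
            self.pose = tuple(p + d for p, d in zip(self.pose, delta_pose))
        return self.pose
```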
Related Publication: Paul Fitzpatrick. "Head Pose Estimation Without Manual Initialization". Term paper for MIT course 6.892.
http://www.ai.mit.edu/people/paulfitz/pub/paulfitz-headpose.ps
http://www.ai.mit.edu/people/paulfitz/pub/paulfitz-headpose.pdf

Watch it in action: http://www.ai.mit.edu/people/paulfitz/pub/paulfitz-gridlock.mpg

Process Learning:

Web Page: Achieved Deliverable: Communicating a task to a robot involves introducing it to actions and percepts peculiar to that task, and showing how these can be structured into the complete activity. In this work, the structure of the task is communicated to the robot first. Examples of the activity are then presented, with any unfamiliar actions and percepts accompanied by verbal annotation. This allows the robot, using Augmented Markov Models, to identify the role these components need to play within the activity.

Watch it in action: http://www.ai.mit.edu/people/paulfitz/pub/paulfitz-process.mov

Face Recognition:

Web Page: Achieved Deliverable:
TR:
Recent Accomplishment:
Related Publication:
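For the Process Learning deliverable above, a toy sketch of learning task structure from demonstrations: transition statistics between activity steps are accumulated incrementally, so a previously unfamiliar, verbally annotated step acquires its role from where it appears in the sequence. The step labels are hypothetical, and this only illustrates the idea, not the Augmented Markov Model machinery actually used.

```python
from collections import defaultdict

class TaskModel:
    def __init__(self):
        self.transitions = defaultdict(lambda: defaultdict(int))

    def observe(self, steps):
        """steps: sequence of step labels from one demonstration of the activity."""
        for a, b in zip(steps, steps[1:]):
            self.transitions[a][b] += 1

    def next_step_probs(self, step):
        total = sum(self.transitions[step].values())
        return {b: c / total for b, c in self.transitions[step].items()} if total else {}

# After a couple of demonstrations, the model captures the role an annotated
# step ("stack") plays between familiar steps.
model = TaskModel()
model.observe(["reach", "grasp", "stack", "release"])
model.observe(["reach", "grasp", "stack", "release"])
print(model.next_step_probs("grasp"))   # {'stack': 1.0}
```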