Human-Robot Dynamic Social Interaction
NTT9904-01
Progress Report: July 1, 2002 – December 31, 2002
Rodney Brooks
Project Overview
NTT
researchers are interested in the question of whether a physical robot produces
a more direct emotional coupling with human beings than does a computer-generated
graphical image of a similar robot. At MIT we are building a robot that has human-like facial
expressions and shoulder and neck gestures, and that perceives human motion and
facial expressions. These capabilities are
coupled to an emotional system so that the person and the robot naturally
follow the social dynamics of normal human communication. This robot will be installed
at the NTT Communications Science Laboratories in Kyoto where the response of
human subjects will be measured and compared to their response to a graphical
interface.
Progress Through December 2002
In
2000 and 2001 we delivered a preliminary robot to NTT, updated the design of
the ultimate robot, called Kismet, resolved complex mechanical issues
surrounding Kismet, and fabricated Kismet components. Software infrastructure for Kismet was also developed. In the
first half of 2002 Kismet was completely assembled, its software implemented, and the robot
transferred to CSL in Kyoto.
The handoff was facilitated by a two-week visit in June by two NTT CSL researchers,
Kazuhiko Shinozawa and Futoshi Naya. The NTT researchers described their
experimental methodology and current progress, and MIT researchers worked with them
to familiarize them with troubleshooting, mechanical operation,
and software control of Kismet.
In
the latter half of 2002, NTT prepared Kismet for experimental use. Kismet was
fitted with a face shell that will play an integral role in the nature of
social responses between Kismet and human subjects. MIT's research has continued to focus on developing software that
can run on the NTT Kismet. We have
used Cog as the experimental platform at MIT; NTT Kismet and Cog share the same
software structure above their low-level motor and frame-grabbing code. The work at MIT in the second half of
2002 involved the development of an egocentric map. An egocentric map keeps track of the locations of objects
relative to the robot's body. The
frame of reference for the objects in the world is centered within the robot
and principally derived from its visual and manipulative field. This allows the robot to look away from
one part of the scene, and still know what to expect when it looks back. This
capability will be useful for NTT Kismet so that it can have more sophisticated
interactions with people, knowing where they are even when it looks away.
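The idea behind the egocentric map can be illustrated with a minimal sketch. The class and method names below are hypothetical, not taken from the actual Cog software; the sketch only shows the body-centered bookkeeping the text describes: a gaze-relative sighting is converted into a body-centered coordinate, so the remembered location stays valid when the robot looks away.

```python
import math

# Minimal sketch of an egocentric (body-centered) object map.
# All names here are illustrative, not from the Cog codebase.
class EgocentricMap:
    def __init__(self):
        self.objects = {}  # name -> (x, y) in the body-centered frame

    def observe(self, name, gaze_angle, distance, bearing):
        # A sighting is gaze-relative: the object lies at `bearing`
        # radians off the current line of sight. Store it in the body
        # frame so it survives later changes of gaze direction.
        theta = gaze_angle + bearing
        self.objects[name] = (distance * math.cos(theta),
                              distance * math.sin(theta))

    def expected_bearing(self, name, gaze_angle):
        # Where the robot should expect the object, relative to its
        # current line of sight, when it looks back toward it.
        x, y = self.objects[name]
        return math.atan2(y, x) - gaze_angle

m = EgocentricMap()
m.observe("ball", gaze_angle=0.0, distance=1.0, bearing=0.5)
# After the head turns by 0.2 rad, the remembered object should
# appear 0.3 rad off the new line of sight:
print(round(m.expected_bearing("ball", gaze_angle=0.2), 2))  # 0.3
```

Because locations are stored relative to the body rather than the gaze, no remapping is needed when the head moves; only the query accounts for the current gaze angle.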
We
have also endowed Cog with a speech processing system where acoustic input can
either come from a microphone array on the body of the robot, or a portable
microphone held by a human. This
is a practical arrangement since it means that under normal operation it is
possible to speak to the robot by simply standing near it, but if the
recognition rate degrades due to a large amount of noise in the background, it
is still possible to operate the robot with the portable microphone. To further
improve recognition performance, a small vocabulary is used, with provision
to easily introduce new words. In practice, relatively few words are needed for
a particular task, so this is a very reasonable trade-off.
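The fallback arrangement and the small, extensible vocabulary can be sketched as follows. Everything here is illustrative: the threshold value, function names, and word list are assumptions, not details of the actual speech system.

```python
# Hypothetical sketch of the audio-source fallback and extensible
# small vocabulary described above; names and values are illustrative.
SNR_THRESHOLD_DB = 10.0  # assumed cutoff for acceptable array input

def pick_source(array_snr_db, portable_available):
    # Prefer the on-body microphone array; fall back to the portable
    # microphone when background noise degrades the array signal.
    if array_snr_db >= SNR_THRESHOLD_DB or not portable_available:
        return "body_array"
    return "portable"

class SmallVocabulary:
    # Keep the recognizer's word list small, but easy to extend.
    def __init__(self, words):
        self.words = set(words)

    def add(self, word):
        self.words.add(word)

    def accepts(self, word):
        return word in self.words

vocab = SmallVocabulary(["look", "grab", "stop"])
vocab.add("ball")  # introducing a new word for a new task
print(pick_source(4.0, portable_available=True))  # noisy room -> portable
print(vocab.accepts("ball"))                      # True
```

The trade-off mirrors the one in the text: restricting the word list keeps recognition accuracy high, and new task-specific words can be added on demand.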
Research Plan for the Next Six Months
The
egocentric map and speech processing modules will be combined with an object
recognition module where the robot learns to locate and identify objects that
it has previously touched. Our goal is to allow the robot to learn how to
perform a task through demonstration by taking over the task incrementally.
The
MIT team will also support the experiments using Kismet at NTT CSL in
Kyoto. We will provide updated
hardware as it is needed. We will
visit NTT at a critical time in the experiment schedule to provide on-site
hardware support for the robot.