Humans as Robots

a picture of me wearing the system

Above is a picture of me wearing the first version of the system while inspecting a toy. Below is a diagram depicting the main components of the most recent version of the system. The backpack holds batteries and a laptop that communicates wirelessly with a computer cluster. A hat-mounted FireWire camera captures video of the dominant hand's workspace from the wearer's perspective. Kinematic measurements of the head and dominant arm are made by 4 Intersense devices: one worn on the head, one on the wrist, one on the upper arm, and one on the torso.

diagram of the system

Human intelligence relies on a wealth of commonsense acquired from a lifetime of experience. To achieve the long-term goals of artificial human intelligence, researchers must find ways to endow machines with this type of commonsense.

Humanoid robots could serve as a direct approach to the acquisition of this type of competence, since a sufficiently sophisticated humanoid robot would be able to experience much of the world in the same way as humans. Currently, however, humanoid robots have very limited experience with the world due to obstacles ranging from mechanical design to social constraints on the use of autonomous robots.

example frames of the creature's perception

This figure shows three snapshots of data from a sequence of activity monitored by the system. The wearer reaches for a cup and drinks from it. The top row consists of images from the glasses-mounted camera. The bottom row shows depictions of the corresponding data provided by the Intersense devices, which give 9 orientation values in total.

Wearable computing systems have the potential to measure a great deal of the sensory input and physical output of a person as he or she experiences everyday activities. Much can be learned through passive observation of these measurements. However, if we can also find ways for the wearable system to strongly influence the behavior of the person wearing the system, many learning tasks can be made easier.

We are designing Duo, a wearable creature that works with a cooperative human in order to learn about everyday objects in the world and the ways in which they are commonly used. The wearable learns about the world by watching and sometimes making requests of the wearer as he goes through activities in the day. By using the same sensory input as the person and co-opting his output behaviors, the wearable creature serves as a top layer of control in a subsumption architecture with the human serving as a powerful mechanical and computational infrastructure.

As an initial exploration into this class of wearable applications, we are creating a wearable system that attempts to learn commonsense about everyday actions as they relate to objects and changes to the environment. As shown in the diagram at the top of the page, the system currently consists of a hat-mounted camera, through which the creature watches the world, and 4 Intersense devices, each of which provides an absolute orientation and which together let the creature measure the kinematic configuration of the wearer's head and dominant arm. The creature also serves as a high-level controller that attempts to co-opt the wearer's behaviors by requesting actions through headphones. For example, the creature can currently request that the wearer look at an object that the wearer is manipulating, so that the creature can see it better and segment it using the LED array. Likewise, Duo can ask that the wearer keep his head still in order to make perception easier. We hypothesize that a broad array of actions useful for learning can be successfully prompted by speech from Duo. In the future, Duo may ask the wearer to repeat an action by uttering "do that again!", which should help the creature segment the activity into meaningful parts. More generally, by requesting actions, the wearable creature can test hypotheses it has made about actions and their effects in the world.
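As a rough illustration of how absolute orientations can describe a kinematic configuration, the sketch below expresses one sensor's orientation relative to another (for example, head relative to torso) and reports the angle between them. It is only a sketch under stated assumptions: the rotation-matrix representation, the helper names, and the example angles are hypothetical and are not taken from the Intersense interface or from Duo's code.

```python
import math

def rot_z(deg):
    """Rotation matrix for a rotation of `deg` degrees about the z axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def relative_rotation(parent, child):
    """Orientation of `child` expressed in the frame of `parent`.

    Both arguments are absolute orientations (3x3 rotation matrices),
    which is the assumed form of each sensor reading.
    """
    return matmul(transpose(parent), child)

def rotation_angle_deg(r):
    """Total rotation angle of a rotation matrix, in degrees."""
    cos_angle = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle))

# Hypothetical readings: torso turned 10 degrees, head turned 40 degrees.
torso = rot_z(10.0)
head = rot_z(40.0)
head_in_torso = relative_rotation(torso, head)
neck_angle = rotation_angle_deg(head_in_torso)
```

Applying the same relative-rotation step to the torso, upper-arm, and wrist sensors would give shoulder and elbow rotations in the same way, and comparing one sensor's orientation at two points in time gives a simple measure of how much that body part has moved.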

example segmentations from Duo

This figure shows two segmentations of common manipulable objects by Duo. When Duo detects that the wearer has reached for an object, Duo uses speech through the headphones to request that the person look at the object. When the person holds up the object to look at it, Duo flashes the LEDs in order to produce the segmentations shown in this figure. The first column shows Duo's view before the LED flash and the second column shows the view during the LED flash. The third column shows the difference between the flashed and non-flashed images. The fourth column shows the object and hand mask produced by thresholding the difference image from the third column. The final column shows the masks applied to the image to segment the hands holding the objects in the images.
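The columns described above amount to a difference-and-threshold pipeline, which can be sketched in a few lines. This is a minimal sketch, not Duo's implementation: it assumes grayscale frames stored as plain lists of rows, a positive-only difference (nearby surfaces get brighter under the flash), a hypothetical threshold value, and that the mask is applied to the flashed frame.

```python
def flash_segment(no_flash, flash, threshold=40):
    """Segment foreground pixels from a non-flashed/flashed frame pair.

    Frames are same-sized lists of rows of 0-255 intensities.
    Returns (mask, segmented): a boolean mask of pixels that brightened
    by more than `threshold`, and the flashed frame with all other
    pixels set to 0.
    """
    same_shape = len(no_flash) == len(flash) and all(
        len(a) == len(b) for a, b in zip(no_flash, flash))
    if not same_shape:
        raise ValueError("frames must have the same shape")

    mask, segmented = [], []
    for row_dark, row_lit in zip(no_flash, flash):
        # Difference image: how much each pixel brightened under the flash.
        diff = [max(lit - dark, 0) for dark, lit in zip(row_dark, row_lit)]
        # Threshold the difference to get the hand-and-object mask.
        mask_row = [d > threshold for d in diff]
        mask.append(mask_row)
        # Apply the mask to keep only foreground pixels.
        segmented.append([lit if keep else 0
                          for lit, keep in zip(row_lit, mask_row)])
    return mask, segmented
```

The approach relies on the hand and object being much closer to the LEDs than the background, so they brighten far more during the flash. In practice the threshold would need tuning to the LED brightness and ambient light, and the two frames must be captured close enough together that the head has barely moved.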

Duo's first behaviors work together with a cooperative human to acquire high-quality segmentations of everyday manipulable objects used by the person wearing the system. When Duo detects that the wearer has reached for an object, it uses speech through the headphones to ask the wearer to look at it. While the wearer looks at the object, Duo flashes the LED array in order to segment the hand and object, which are in the foreground, from the rest of the world in the background. During this time, Duo also monitors the wearer's head movements. If the wearer's head moves significantly, Duo requests that the wearer keep his head still. We are currently working with these segmented images to perform object tracking, detection, and recognition. We are also creating segmentation and learning code for the object-related actions of the wearer.
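The behavior described above can be read as a small state machine, sketched below. This is an illustrative sketch only: the state names, the action strings, the head-speed threshold, and the choice to delay the flash until the head is still are all assumptions, and the inputs (`reached`, `looking`, `head_speed`) stand in for detectors that are not specified here.

```python
WATCHING = "watching"      # passively observing the wearer
INSPECTING = "inspecting"  # waiting for a clear, steady view of the object

ASK_LOOK = "say: please look at the object"
ASK_STILL = "say: please keep your head still"
FLASH = "flash LEDs and capture frame pair"

def duo_step(state, reached, looking, head_speed, max_head_speed=15.0):
    """Advance the sketch by one tick; return (next_state, actions).

    reached        -- True if a reach for an object was just detected
    looking        -- True if the wearer is looking at the held object
    head_speed     -- head angular speed in degrees/second (assumed unit)
    max_head_speed -- hypothetical limit above which the head is "moving"
    """
    if state == WATCHING:
        if reached:
            return INSPECTING, [ASK_LOOK]
        return WATCHING, []
    # state == INSPECTING
    if not looking:
        return INSPECTING, []           # keep waiting for the wearer
    if head_speed > max_head_speed:
        return INSPECTING, [ASK_STILL]  # too much head motion
    return WATCHING, [FLASH]            # steady view: segment the object
```

A controller loop would call `duo_step` once per sensor update, speak any request through the headphones, and trigger the LED array when the flash action appears.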

Our overall goal is to develop a viable system for the acquisition of commonsense related to everyday human activities. A successful creature would be able to learn and control a set of common behaviors performed by a cooperative human and would be able to relate common action patterns to the visual appearance of objects and to observed changes in the world.