Humans as Robots
Above is a picture of me wearing the first version of the system
while inspecting a toy. Below is a diagram depicting the main
components of the most recent version of the system. The backpack
holds batteries and a laptop that communicates wirelessly with a
computer cluster. A hat-mounted firewire camera captures video of
the dominant hand's workspace from the person's
perspective. Kinematic measurements of the head and dominant arm are
performed by four Intersense devices: one worn on the head, another
on the wrist, a third on the upper arm, and the fourth on the
torso.
Human intelligence relies on a wealth of commonsense acquired from a
lifetime of experience. In order to achieve the long-term goals of
artificial human intelligence, researchers must find ways to endow
machines with this type of commonsense.
Humanoid robots could serve as a direct approach to the acquisition of
this type of competence, since a sufficiently sophisticated humanoid
robot would be able to experience much of the world in the same way as
humans. Currently, however, humanoid robots have very limited
experience with the world due to obstacles ranging from mechanical
design to social constraints on the use of autonomous robots.
This figure shows three snapshots of data from a sequence
of activity monitored by the system. The wearer reaches for a cup and
drinks from it. The top row consists of images from the
glasses-mounted camera. The bottom row shows depictions of the
corresponding data provided by the Intersense devices, which give 9
orientation values in total.
Wearable computing systems have the potential to measure a great deal
of the sensory input and physical output of a person as he or she
experiences everyday activities. Much can be learned through passive
observation of these measurements. However, if we can also find ways
for the wearable system to strongly influence the behavior of the
person wearing the system, many learning tasks can be made easier.
We are designing Duo, a wearable creature that works with a
cooperative human in order to learn about everyday objects in the
world and the ways in which they are commonly used. The wearable
learns about the world by watching the wearer, and sometimes making
requests of him, as he goes through his daily activities. By using the same
sensory input as the person and co-opting his output behaviors, the
wearable creature serves as a top layer of control in a subsumption
architecture with the human serving as a powerful mechanical and
computational infrastructure.
As an initial exploration into this class of wearable applications, we
are creating a wearable system that attempts to learn commonsense
about everyday actions as they relate to objects and changes to the
environment. As shown in the diagram at the top of the page, the
system currently consists of a hat-mounted camera, through which the
creature watches the world, and four Intersense devices, each of which
provides an absolute orientation. From these orientations the creature
measures the kinematic configuration of the wearer's head and dominant
arm. The creature also serves as a high-level controller that attempts to
co-opt the wearer's behaviors by requesting actions through
headphones. For example, the creature can currently request that the
wearer look at an object that the wearer is manipulating, in order to
see it better and segment it using the LED array. Likewise, Duo can ask
that the wearer keep his head still, in order to make perception
easier. We hypothesize that a broad array of actions useful for
learning can be successfully prompted by speech from Duo. In the future,
Duo may ask the wearer to repeat an action by uttering "do that
again!", which should help the creature segment the activity into
meaningful parts. More generally, by requesting actions, the wearable
creature can test hypotheses it has made about actions and their
effects in the world.
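One way to picture how four absolute orientations could describe a kinematic configuration is to express each body segment's orientation relative to its neighbor. The following is a minimal sketch under stated assumptions, not a description of Duo's actual code: it assumes each sensor reports a 3x3 rotation matrix from its own frame to a shared world frame, and that each sensor is aligned with the body segment it is worn on. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # 3x3 rotation matrix (assumed sensor output format)


def rotation_z(degrees: float) -> Matrix:
    """Rotation about the z axis, used here only to build example inputs."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transpose(m: Matrix) -> Matrix:
    """Transpose of a matrix; for a rotation this is also its inverse."""
    return [list(column) for column in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 3x3 matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def relative_rotation(parent: Matrix, child: Matrix) -> Matrix:
    """Orientation of the child segment expressed in the parent's frame."""
    return matmul(transpose(parent), child)


def rotation_angle_deg(m: Matrix) -> float:
    """Total angle of a rotation matrix, recovered from its trace."""
    trace = m[0][0] + m[1][1] + m[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


# Hypothetical readings: torso turned 30 degrees, upper arm turned 75 degrees.
torso = rotation_z(30.0)
upper_arm = rotation_z(75.0)
shoulder = relative_rotation(torso, upper_arm)
shoulder_angle = rotation_angle_deg(shoulder)
```

In this sketch the torso sensor acts as the root of the chain, so the same `relative_rotation` call could be applied to the upper arm and wrist sensors to estimate the elbow, again assuming the sensors are aligned with the limbs.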
This figure shows two segmentations of common manipulable objects
by Duo. When Duo detects that the wearer has reached for an object,
Duo requests, via speech through the headphones, that the person look
at the object. When the person holds up the object to look at it, Duo
flashes the LEDs in order to produce the segmentations shown in this
figure. The first column shows Duo's view before the LED flash and the
second column shows the view during the LED flash. The third column
shows the difference between the flashed and non-flashed images. The
fourth column shows the object and hand mask produced by thresholding
the difference image from the third column. The final column shows the
masks applied to the images, which segments the hands and the objects
they hold.
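The column-by-column pipeline in this caption (difference image, thresholded mask, masked image) can be sketched as follows. This is a minimal, dependency-free illustration rather than Duo's actual code. It assumes grayscale frames stored as nested lists of 0-255 intensities, an arbitrary threshold of 40, and that the mask is applied to the flashed frame; the source specifies none of these. The function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]   # grayscale frame: rows of 0-255 intensities (assumed)
Mask = List[List[bool]]   # True where a pixel is treated as foreground


def flash_difference(no_flash: Image, flash: Image) -> Image:
    """Per-pixel brightness gain from the unflashed to the flashed frame."""
    return [
        [max(f - n, 0) for n, f in zip(row_n, row_f)]
        for row_n, row_f in zip(no_flash, flash)
    ]


def threshold_mask(diff: Image, threshold: int) -> Mask:
    """Mark pixels whose brightness gain exceeds the threshold."""
    return [[value > threshold for value in row] for row in diff]


def apply_mask(image: Image, mask: Mask) -> Image:
    """Keep masked pixels and set everything else to zero."""
    return [
        [pixel if keep else 0 for pixel, keep in zip(row_i, row_m)]
        for row_i, row_m in zip(image, mask)
    ]


def segment_foreground(
    no_flash: Image, flash: Image, threshold: int = 40
) -> Tuple[Mask, Image]:
    """Run the difference, threshold, and masking steps in order."""
    diff = flash_difference(no_flash, flash)
    mask = threshold_mask(diff, threshold)
    return mask, apply_mask(flash, mask)


# Tiny hypothetical frames: the middle column brightens strongly under the flash.
before = [[10, 10, 10], [10, 50, 10]]
during = [[12, 120, 11], [10, 140, 15]]
mask, segmented = segment_foreground(before, during)
```

The sketch relies on nearby surfaces, such as the hand and the held object, brightening more under the flash than the distant background, so a single threshold on the difference image separates the two.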
Duo's first behaviors work together with a cooperative human to
acquire high-quality segmentations of everyday manipulable objects
used by the person wearing the system. When Duo detects that the
wearer has reached for an object, it asks the wearer, via speech
through the headphones, to look at it. While the wearer looks at the
object, Duo flashes the LED array in order to segment the hand and
object, which are in the foreground, from the rest of the world in the
background. During this time, Duo also monitors the wearer's head movements.
If the wearer's head moves significantly, Duo requests that the wearer
keep his head still. We are currently working with these segmented
images to perform object tracking, detection, and recognition. We are
also creating segmentation and learning code for the object-related
actions of the wearer.
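The head-stillness check described above could be sketched roughly as follows. This is an illustration under assumptions, not the system's implementation: it assumes the head sensor yields (yaw, pitch, roll) samples in degrees, compares each sample against the first one in a window, and uses an arbitrary 5-degree limit per axis. The prompt wording and all names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Orientation = Tuple[float, float, float]  # (yaw, pitch, roll) in degrees (assumed)


def wrapped_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def head_moved(samples: Iterable[Orientation], max_change_deg: float = 5.0) -> bool:
    """True if any axis drifts past the limit relative to the first sample."""
    window = list(samples)
    if not window:
        return False
    reference = window[0]
    return any(
        wrapped_difference(angle, ref) > max_change_deg
        for sample in window[1:]
        for angle, ref in zip(sample, reference)
    )


def stillness_prompt(
    samples: Iterable[Orientation], max_change_deg: float = 5.0
) -> Optional[str]:
    """Return a spoken request if the head moved too much, otherwise None."""
    if head_moved(samples, max_change_deg):
        return "Please keep your head still."
    return None


# Hypothetical window of head readings taken while the wearer looks at an object.
steady = [(0.0, 0.0, 0.0), (1.0, -2.0, 0.5)]
drifting = [(358.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
```

Comparing Euler angles axis by axis is a simplification chosen for brevity; a fuller treatment would measure the single rotation angle between two orientations.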
Our overall goal is to develop a viable system for the acquisition of
commonsense related to everyday human activities. A successful
creature would be able to learn and control a set of common behaviors
performed by a cooperative human and would be able to relate common
action patterns to the visual appearance of objects and to observed
changes in the world.