From an external perspective, the behavior is quite
rudimentary. Given a visual stimulus, typically provided by a researcher waving an object in front
of its cameras, the robot saccades to foveate the target and then reaches out its arm
toward it. Early reaches are inaccurate and often point in the wrong direction
altogether, but after a few hours of practice their accuracy improves drastically.
The reaching algorithm combines several subsystems. A motion
detection routine identifies a salient stimulus, which serves as the target for the saccade
module. Foveation guarantees that the target always lies at the center of the visual
field. Its retinal coordinates are therefore fixed, and the position of the target relative
to the robot is wholly characterized by the gaze angle of the eyes (only two degrees of
freedom). Once the target is foveated, the joint configuration needed to point at it is
generated from the gaze angle of the eyes using a "ballistic map." The arm controller then
uses this configuration to generate the reach.
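
The flow from stimulus to reach can be sketched as a small pipeline. This is a minimal Python sketch under stated assumptions, not Cog's implementation: the `ReachPipeline` class, the assumed saccade-map signature (pixel offset from the image center plus current gaze in, new gaze out), the default `image_center`, and the toy linear maps in the usage example are all hypothetical stand-ins for the learned subsystems.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Vec2 = Tuple[float, float]


@dataclass
class ReachPipeline:
    # Hypothetical stand-ins for Cog's subsystems (not the original code).
    # saccade_map: (pixel offset from image center, current gaze) -> gaze that foveates it
    saccade_map: Callable[[Vec2, Vec2], Vec2]
    # ballistic_map: gaze angles (pan, tilt) -> 2-DOF arm command for the arm controller
    ballistic_map: Callable[[Vec2], Vec2]
    # Assumed image center in pixels (illustrative value only).
    image_center: Vec2 = (64.0, 64.0)

    def reach(self, stimulus_pixel: Vec2, gaze: Vec2) -> Tuple[Vec2, Vec2]:
        """Saccade to a salient stimulus, then compute the arm command for it."""
        # The motion detector is assumed to have already reported stimulus_pixel.
        offset = (stimulus_pixel[0] - self.image_center[0],
                  stimulus_pixel[1] - self.image_center[1])
        # After the saccade the target sits at the image center, so the two
        # gaze angles alone locate it relative to the robot.
        new_gaze = self.saccade_map(offset, gaze)
        # The ballistic map turns those two numbers into an arm command.
        return new_gaze, self.ballistic_map(new_gaze)


if __name__ == "__main__":
    # Toy linear maps purely for illustration.
    pipeline = ReachPipeline(
        saccade_map=lambda off, g: (g[0] + 0.1 * off[0], g[1] + 0.1 * off[1]),
        ballistic_map=lambda g: (2.0 * g[0], 2.0 * g[1]),
    )
    print(pipeline.reach(stimulus_pixel=(74.0, 64.0), gaze=(0.0, 0.0)))
```

The point of the structure is that nothing downstream of the saccade ever sees pixel coordinates: the ballistic map receives only the two gaze angles.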
Training the ballistic map is complicated because the error signal is expressed in the wrong
coordinate space. When the arm is extended, the robot waves its hand, and this motion
is used to locate the end of the arm in the visual field. The distance of the hand from
the center of the visual field measures the reach error. This error signal is measured in
pixels, whereas the map being trained relates gaze angles to joint positions, so the reach
error measured by the visual system cannot be used directly to train the ballistic map.
However, the saccade map has been trained to relate pixel positions to gaze angles. It
converts the reach error, measured as a pixel offset on the retina, into an offset in the
gaze angles of the eyes (as if Cog were looking at a different target).
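
The conversion from a pixel error to a gaze error can be sketched by reusing the saccade map: ask where the eyes would have had to point to foveate the hand, and subtract the current gaze. This hypothetical sketch assumes a saccade map that takes a pixel offset from the image center and the current gaze and returns the gaze that would foveate that point; `reach_error_in_gaze` and the toy linear saccade map are illustrative names and values, not Cog's code.

```python
from typing import Callable, Tuple

Vec2 = Tuple[float, float]
# Assumed signature: (pixel offset from image center, current gaze) -> new gaze.
SaccadeMap = Callable[[Vec2, Vec2], Vec2]


def reach_error_in_gaze(saccade_map: SaccadeMap, gaze: Vec2,
                        hand_pixel: Vec2, image_center: Vec2) -> Vec2:
    """Convert the hand's pixel offset (raw reach error) into a gaze-angle offset."""
    # Raw reach error: where the waving hand appears relative to the fovea.
    offset = (hand_pixel[0] - image_center[0], hand_pixel[1] - image_center[1])
    # Gaze Cog would need in order to foveate the hand instead of the target.
    hand_gaze = saccade_map(offset, gaze)
    # The reach error expressed as an offset in gaze angles.
    return (hand_gaze[0] - gaze[0], hand_gaze[1] - gaze[1])


if __name__ == "__main__":
    # Toy linear saccade map: 0.5 gaze units per pixel (illustrative only).
    toy_saccade_map: SaccadeMap = lambda off, g: (g[0] + 0.5 * off[0], g[1] + 0.5 * off[1])
    # Target foveated at gaze (1.0, 2.0); hand seen 4 px and -2 px from the center.
    print(reach_error_in_gaze(toy_saccade_map, (1.0, 2.0), (68.0, 62.0), (64.0, 64.0)))
```

With a nonlinear saccade map the same subtraction still works, because the map is queried at the actual current gaze rather than linearized once.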
This is still not enough to train the ballistic map. The error is now expressed in
gaze angles rather than joint positions; that is, we know where Cog could have looked,
but not how it should have moved the arm. To train the ballistic map, we also need a
"forward map": a forward kinematics function that gives the gaze angle of the hand
for a commanded set of joint positions. The error in gaze coordinates can be
back-propagated through this map, yielding a signal suitable for training the
ballistic map.
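
One standard way to back-propagate a gaze-space error through a forward map is to multiply it by the transpose of the forward map's Jacobian, which gives the steepest-descent direction for the arm command on the squared gaze error. The sketch below does this with a central-difference Jacobian and uses the result to train a hypothetical linear ballistic map. The linear model, the learning rate, the uniform target distribution, and the use of a known forward map in place of a learned one are all assumptions for illustration, not a description of Cog's learner.

```python
import numpy as np


def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f: R^n -> R^m at x (an m x n matrix)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        cols.append((np.asarray(f(x + dx)) - np.asarray(f(x - dx))) / (2 * eps))
    return np.stack(cols, axis=1)


def arm_space_error(forward_map, arm_command, gaze_error):
    """Back-propagate a gaze-space error through the forward map.

    gaze_error is the target gaze minus the gaze at which the hand was seen.
    Returns J^T e, the steepest-descent direction for the arm command on the
    loss 0.5 * ||e||^2.
    """
    J = numerical_jacobian(forward_map, arm_command)
    return J.T @ np.asarray(gaze_error, dtype=float)


def train_linear_ballistic_map(forward_map, steps=2000, lr=0.1, seed=0):
    """Train a hypothetical linear ballistic map W (arm = W @ gaze)."""
    rng = np.random.default_rng(seed)
    W = np.zeros((2, 2))
    for _ in range(steps):
        target_gaze = rng.uniform(-1.0, 1.0, size=2)   # a foveated target
        arm = W @ target_gaze                          # ballistic reach
        seen_gaze = np.asarray(forward_map(arm))       # where the hand was seen
        delta_arm = arm_space_error(forward_map, arm, target_gaze - seen_gaze)
        W += lr * np.outer(delta_arm, target_gaze)     # gradient step on W
    return W


if __name__ == "__main__":
    # Synthetic forward map; note the negative entry (one axis is inverted).
    A = np.array([[2.0, 0.0], [0.0, -1.0]])
    W = train_linear_ballistic_map(lambda a: A @ a)
    print(np.round(W, 3))  # should approach the inverse of A
```

If the Jacobian had the wrong sign on some axis, `J.T @ e` would point the update the wrong way along that axis, which is why the sign of the forward map's derivative matters for this kind of training.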
The forward map is learned incrementally during every reach: after each
reach we know the commanded arm position as well as the resulting hand position measured in
eye gaze coordinates (even though that was not the target position). For the ballistic map
to train properly, the derivative of the forward map must have the correct signs; otherwise
the back-propagated error would push the arm in the wrong direction. Hence, training of the
forward map begins first, during a "flailing" period in which Cog performs reaches to random
arm positions distributed throughout its workspace.
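
The incremental training of the forward map during flailing can be sketched with an online least-mean-squares (delta-rule) update on an affine model. The affine model, the LMS rule, the `LinearForwardMap` and `flail` names, the toy workspace [-1, 1] x [-1, 1], and the synthetic "true" kinematics in the usage example are all assumptions for illustration; the text does not say how Cog represents or fits its forward map.

```python
import numpy as np


class LinearForwardMap:
    """Hypothetical forward map: 2-DOF arm command -> hand gaze (pan, tilt).

    Modelled as an affine map g = W @ [a, 1] and trained online with the
    LMS (delta) rule, one reach at a time.
    """

    def __init__(self, lr=0.1):
        self.W = np.zeros((2, 3))
        self.lr = lr

    def predict(self, arm):
        return self.W @ np.append(arm, 1.0)

    def update(self, arm, observed_gaze):
        # One incremental step after a reach: compare the predicted hand gaze
        # with the gaze at which the hand was actually seen.
        x = np.append(arm, 1.0)
        error = np.asarray(observed_gaze, dtype=float) - self.W @ x
        self.W += self.lr * np.outer(error, x)

    def jacobian(self):
        # d(gaze)/d(arm); its signs are what ballistic-map training relies on.
        return self.W[:, :2]


def flail(true_forward, model, n_reaches=3000, seed=0):
    """Flailing phase: reach to random arm commands spread over the workspace."""
    rng = np.random.default_rng(seed)
    for _ in range(n_reaches):
        arm = rng.uniform(-1.0, 1.0, size=2)   # random posture in the toy workspace
        model.update(arm, true_forward(arm))   # observe where the hand appears
    return model


if __name__ == "__main__":
    true_A = np.array([[0.8, 0.3], [-0.2, 1.1]])
    true_b = np.array([0.1, -0.05])
    model = flail(lambda a: true_A @ a + true_b, LinearForwardMap())
    print(np.sign(model.jacobian()) == np.sign(true_A))
```

A sign check like the one in the usage example is one way to decide, under these assumptions, that flailing has gone on long enough to start training the ballistic map.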
Although four of the arm's joints are active in moving the hand to a particular
position in space (the other two control the orientation of the hand), we re-parameterize
the arm so that a reach controls only two degrees of freedom. The position of the
outstretched arm is governed by a normalized vector of "postural primitives." A
primitive is a fixed set of joint angles, corresponding to a static position of the arm
placed at a corner of the workspace. Three such primitives form a basis for the workspace.
The joint-space command for the arm is calculated by interpolating between the joint angles
of the primitives, weighted by the coefficients of the primitive-space vector. Since the
vector in primitive space is normalized, its three coefficients give rise to only two
degrees of freedom. Hence, a mapping between eye gaze position and arm position, and vice
versa, is a simple, non-degenerate $\mathbb{R}^2 \rightarrow \mathbb{R}^2$ function, which
considerably simplifies learning.
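
The interpolation between postural primitives can be sketched as a weighted sum of joint-angle vectors. This sketch assumes that "normalized" means the coefficients sum to one, so that two free parameters (u, v) determine all three coefficients; the text does not specify the normalization, and the `PRIMITIVES` joint angles and function names below are made-up illustrative values, not Cog's.

```python
import numpy as np

# Hypothetical primitives: joint angles (degrees) for the four joints that
# position the hand, one row per corner of the workspace. Values are made up.
PRIMITIVES = np.array([
    [0.0, 30.0, 0.0, 90.0],
    [45.0, 10.0, 20.0, 60.0],
    [-30.0, 50.0, -10.0, 80.0],
])


def coefficients_from_2d(u, v):
    """Map two free parameters to three coefficients that sum to one.

    Assumes the normalization is an L1 (sum-to-one) constraint; negative
    coefficients would mean extrapolating outside the primitives' triangle.
    """
    return np.array([u, v, 1.0 - u - v])


def joint_command(coeffs, primitives=PRIMITIVES):
    """Interpolate joint angles between primitives, weighted by the coefficients."""
    coeffs = np.asarray(coeffs, dtype=float)
    coeffs = coeffs / coeffs.sum()          # enforce the assumed normalization
    return coeffs @ primitives              # weighted sum of joint-angle vectors


if __name__ == "__main__":
    # Halfway between the first two primitives.
    print(joint_command(coefficients_from_2d(0.5, 0.5)))
```

Because every reach in this sketch is specified by the pair (u, v), both the ballistic map and the forward map reduce to functions from two numbers to two numbers, matching the $\mathbb{R}^2 \rightarrow \mathbb{R}^2$ structure described above.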
Unfortunately, postural primitives as formulated here are very
brittle: the primitives are chosen ad hoc to yield a reasonable workspace. Finding methods
to adaptively generate primitives and divide the workspace is a subject of active
research.