From an external perspective, the behavior is quite
    rudimentary. Given a visual stimulus, typically an object waved by a researcher in front
    of its cameras, the robot saccades to foveate on the target, and then reaches out its arm
    toward the target. Early reaches are inaccurate, and often in the wrong direction
    altogether, but after a few hours of practice the accuracy improves drastically.
    The reaching algorithm involves an amalgam of several subsystems. A motion
    detection routine identifies a salient stimulus, which serves as a target for the saccade
    module. This foveation guarantees that the target is always at the center of the visual
    field; the retinal coordinates of the target are therefore fixed at the image center, and
    the position of the target relative to the robot is wholly characterized by the gaze
    angles of the eyes (only two degrees of freedom). Once the target is foveated, the
    joint configuration necessary to point to that target is generated from the gaze angle of
    the eyes using a "ballistic map." This configuration is used by the arm
    controller to generate the reach.
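    To make the flow of control concrete, the following is a minimal sketch of one reach
    cycle; the callables (detect_motion, saccade, read_gaze, ballistic_map, arm) are
    hypothetical stand-ins for the subsystems just described, not Cog's actual interfaces.

\begin{verbatim}
def reach_once(detect_motion, saccade, read_gaze, ballistic_map, arm):
    """One reach cycle; every argument is a hypothetical callable standing
    in for one of the subsystems described above."""
    pixel_target = detect_motion()   # salient stimulus, retinal pixel coordinates
    saccade(pixel_target)            # foveate: drive the target to the image center
    gaze = read_gaze()               # eye gaze angles (two degrees of freedom)
    joints = ballistic_map(gaze)     # gaze angles -> arm joint configuration
    arm(joints)                      # command the ballistic reach
\end{verbatim}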
    Training the ballistic map is complicated by the inappropriate coordinate
    space of the error signal. When the arm is extended, the robot waves its hand. This motion
    is used to locate the end of the arm in the visual field. The distance of the hand from
    the center of the visual field is the measure of the reach error. However, this error
    signal is measured in units of pixels, yet the map being trained relates gaze angles to
    joint positions. The reach error measured by the visual system cannot be directly used to
    train the ballistic map. However, the saccade map has been trained to relate pixel
    positions to gaze angles. The saccade map converts the reach error, measured as a pixel
    offset on the retina, into an offset in the gaze angles of the eyes (as if Cog were
      looking at a different target). 
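    As a sketch of this conversion, assume the trained saccade map can be queried as a
    function from a retinal pixel offset to the gaze-angle offset that would foveate it; the
    linear gain below is purely illustrative, since on Cog this relation is learned.

\begin{verbatim}
import numpy as np

# Illustrative stand-in for the trained saccade map (assumed linear here).
SACCADE_GAIN = np.array([[0.1, 0.0],
                         [0.0, 0.1]])   # gaze degrees per pixel (assumed)

def gaze_error_from_pixels(hand_pixel_offset):
    """Reach error measured as a pixel offset of the hand from the image
    center -> the same error expressed as an offset in eye gaze angles."""
    return SACCADE_GAIN @ np.asarray(hand_pixel_offset, dtype=float)
\end{verbatim}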
    This is still not enough to train the ballistic map. Our error is now in
    terms of gaze angles, not joint positions --- i.e. we know where Cog could have looked,
    but not how it should have moved the arm. To train the ballistic map, we also need a
    "forward map" --- i.e. a forward kinematics function which gives the gaze angle
    of the hand in response to a commanded set of joint positions. The error in gaze
    coordinates can be back-propagated through this map, yielding a signal appropriate for
    training the ballistic map. 
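    A minimal sketch of this training step follows, assuming the forward map is available as
    a callable from arm coordinates to the gaze angles of the hand; the gaze-space error is
    pushed back through a finite-difference Jacobian of that map, which stands in for
    back-propagation through whatever representation the forward map actually uses.

\begin{verbatim}
import numpy as np

def corrected_arm_command(gaze_target, gaze_error, ballistic_map, forward_map,
                          lr=0.1, eps=1e-3):
    """gaze_error = (gaze of hand) - (gaze of target); returns an improved
    arm command to serve as a training example for the ballistic map."""
    arm_cmd = np.asarray(ballistic_map(gaze_target), dtype=float)
    # Finite-difference Jacobian of the forward map: d(hand gaze)/d(arm).
    base = np.asarray(forward_map(arm_cmd), dtype=float)
    J = np.zeros((base.size, arm_cmd.size))
    for i in range(arm_cmd.size):
        d = np.zeros(arm_cmd.size)
        d[i] = eps
        J[:, i] = (np.asarray(forward_map(arm_cmd + d)) - base) / eps
    # Back-propagate the gaze-space error into arm coordinates and step
    # against it (gradient of the squared gaze error).
    return arm_cmd - lr * (J.T @ np.asarray(gaze_error, dtype=float))
\end{verbatim}

    The ballistic map would then be nudged toward producing this corrected command for the
    same gaze target.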
    The forward map is learned incrementally during every reach: after each
    reach we know the commanded arm position, as well as the hand's final position measured
    in eye-gaze coordinates (even though that was not the target position). For the ballistic map to train
    properly, the forward map must have the correct signs in its derivative. Hence, training
    of the forward map begins first, during a "flailing" period in which Cog
    performs reaches to random arm positions distributed through its workspace.
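    The sketch below shows one way such incremental learning could look, fitting an affine
    forward map by recursive least squares and bootstrapping it with random
    "flailing" reaches; the representation, the update rule, and the
    observe_hand_gaze callback are assumptions, not Cog's implementation.

\begin{verbatim}
import numpy as np

class ForwardMap:
    """Affine map from arm coordinates (here, the 2-D primitive space
    described below) to the eye gaze angles at which the hand is seen."""
    def __init__(self, arm_dim=2):
        self.W = np.zeros((2, arm_dim + 1))     # affine weights
        self.P = np.eye(arm_dim + 1) * 1e3      # RLS covariance

    def __call__(self, arm):
        return self.W @ np.append(arm, 1.0)

    def update(self, arm_cmd, hand_gaze):
        """Record one reach: the commanded arm position and where the
        waving hand was actually seen, in gaze coordinates."""
        x = np.append(arm_cmd, 1.0)
        k = self.P @ x / (1.0 + x @ self.P @ x)         # RLS gain
        self.W += np.outer(hand_gaze - self.W @ x, k)   # correct prediction
        self.P -= np.outer(k, x @ self.P)

def flail(forward_map, observe_hand_gaze, n_reaches=50, rng=np.random):
    """Random reaches through the workspace, performed before ballistic
    training so the forward map's derivative acquires the correct signs."""
    for _ in range(n_reaches):
        arm_cmd = rng.uniform(0.0, 1.0, size=2)
        forward_map.update(arm_cmd, observe_hand_gaze(arm_cmd))
\end{verbatim}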
    Although the arm has four joints active in moving the hand to a particular
    position in space (the other two control the orientation of the hand), we re-parameterize
    so that we control only two degrees of freedom for a reach. The position of the
    outstretched arm is governed by a normalized vector of "postural primitives." A
    primitive is a fixed set of joint angles, corresponding to a static position of the arm,
    placed at a corner of the workspace. Three such primitives form a basis for the workspace.
    The joint-space command for the arm is calculated by interpolating the joint-space
    components of the primitives, weighted by the coefficients of the primitive-space
    vector. Since the vector in primitive space is normalized, three coefficients give rise to
    only two degrees of freedom. Hence, a mapping between eye gaze position and arm position,
    and vice versa, is a simple, non-degenerate $R^2 \rightarrow R^2$ function. This
    considerably simplifies learning.
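    A sketch of this interpolation follows, with three made-up primitive postures; the
    joint-angle values are illustrative only, not Cog's calibrated postures.

\begin{verbatim}
import numpy as np

# Three hypothetical postural primitives: each is a fixed set of joint
# angles for the four reaching joints, placed at a corner of the workspace.
PRIMITIVES = np.array([[ 0.0,  0.5, -0.3,  0.2],
                       [ 0.8, -0.2,  0.4,  0.1],
                       [-0.6,  0.3,  0.7,  0.0]])   # radians, illustrative

def joint_command(primitive_coeffs):
    """Blend the primitives according to a primitive-space vector.  The
    coefficients are normalized to sum to one, so three of them carry only
    two degrees of freedom."""
    w = np.asarray(primitive_coeffs, dtype=float)
    w = w / w.sum()          # normalization removes one degree of freedom
    return w @ PRIMITIVES    # weighted interpolation of joint angles
\end{verbatim}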
    Unfortunately, the notion of postural primitives as formulated is very
    brittle: the primitives are chosen ad hoc to yield a reasonable workspace. Finding methods
    to adaptively generate primitives and divide the workspace is a subject of active
    research.