    Multi-Modal Orientation Behavior
    
      
     
    
      The Cog Shop 
      MIT Artificial Intelligence Laboratory 
      545 Technology Square, #920 
      Cambridge, MA 02139 
    
      
     
     To integrate multi-modal sensory information (visual, auditory, tactile, etc.) and use
    this information to orient the robot to the source of sensory stimuli. 
       
    The ability to orient toward visual, auditory, or tactile stimuli is an
    important skill for systems intended to interact with and explore their environment. In
    the brain of mammalian vertebrates, the Superior Colliculus is specialized for integrating
    multi-modal sensory information, and for using this information to orient the animal to
    the source of sensory stimuli, such as noisy, moving objects. Within the Superior Colliculus,
    this ability appears to be implemented using layers of registered, multi-modal,
    topographic maps. Inspired by the structure, function, and plasticity of the Superior
    Colliculus, we are in the process of implementing multi-modal orientation behaviors on our
    humanoid robot using registered topographic maps.  
    
    In the Superior Colliculus of the cat, there are visuotopic maps representing motion in
    visual space, somatotopic maps yielding a body representation of tactile inputs, and
    spatiotopic maps of auditory space encoding inter-aural time differences (ITD) and
    inter-aural intensity differences (IID). Hence, a sensory stimulus originating from a
    given direction will elicit activity in the corresponding region of the appropriate
    sensory map. There are also motor movement maps consisting of pre-motor neurons whose
    movement fields are topologically organized. These maps exist for the eyes, head, neck,
    body, and ears. Stimulating a specific region in the map elicits a corresponding motor
    movement. 
    These multi-modal maps overlap and are aligned with each other so that they share a
    common multisensory spatial coordinate system. The maps are said to be registered
    with one another when this is the case. Arranging multi-modal information into a common
    representational framework (within each map) and registering the maps with respect to
    each other allows the information within each map to interact with and influence the other maps.
     
    There are two advantages to this organizational strategy. First, it is an
    economical way of specifying the location of peripheral stimuli and of organizing and
    activating the motor program required to orient toward the stimuli, thereby allowing any
    sensory modality to orient the other sensory organs to the source of stimulation. Second, it
    supports enhancement of simultaneous sensory cues. Stimuli that occur in the same place at
    the same time are likely to be interrelated by common causality. For instance, a bird
    rustling in the bushes will provide both visual motion and auditory cues. During
    enhancement, certain combinations of meaningful stimuli become more salient because their
    neuronal responses are spatio-temporally related. Once the multi-modal maps are aligned,
    neuronal enhancement (or depression) is a function of the temporal and spatial
    relationships of neural activity among the maps. 
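
    The enhancement idea can be illustrated with a minimal Python sketch. It assumes two
    already registered maps stored as grids of equal size, and it models enhancement as a
    multiplicative bonus at sites where both maps are active at the same time step. The
    function name `enhance`, the `gain` parameter, and the multiplicative rule itself are
    illustrative assumptions, not the mechanism of the Superior Colliculus or of the robot.

```python
def enhance(visual, auditory, gain=1.0):
    """Combine two registered maps of equal shape.

    Assumed rule (illustrative only): combined activity is the sum of both
    maps plus a bonus proportional to their product, so sites that are
    active in both modalities at once stand out from single-modality sites.
    """
    rows, cols = len(visual), len(visual[0])
    assert rows == len(auditory) and cols == len(auditory[0])
    return [
        [visual[r][c] + auditory[r][c] + gain * visual[r][c] * auditory[r][c]
         for c in range(cols)]
        for r in range(rows)
    ]


# A bird rustling in the bushes: motion and sound coincide at site (1, 2),
# while site (2, 0) carries visual motion only.
visual = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.8], [0.3, 0.0, 0.0]]
auditory = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]
combined = enhance(visual, auditory, gain=2.0)
```

    With these inputs the coincident site ends up well above the sum of its two inputs,
    while the visual-only site keeps its original activity. A depression term could be
    added in the same way for cues that disagree in space or time.
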
    In our framework, a map is a two-dimensional array of elements where each element
    corresponds to a site in the map. The maps are arranged into interconnected layers, where
    a given map can be interfaced to more than one map. Each connection is uni-directional, so
    recurrent connections between maps require both a feedforward connection and a feedback
    connection. The activity level of sites on one map is passed to another map through these
    connections; hence the input to a given map is a function of the spatio-temporal activity
    of the maps feeding into it and the connectivity between these maps. Currently, all
    connections have equal weights, although this could change in the future. The output of a
    given map is its spatio-temporal activity pattern. What this pattern of activity
    represents depends upon the map: if it is a visuotopic map, it could represent motion
    coming from a particular direction in the visual field; if it is an oculomotor map, it
    could encode a motor command to move the eyes, and so forth. 
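
    As a rough sketch of this framework in Python, a map can be held as a two-dimensional
    list of site activities with a list of one-way links to sites on other maps. The class
    name `TopographicMap` and the methods `connect` and `propagate` are hypothetical names
    chosen for illustration, and the one-to-one wiring in the usage lines is only one
    possible connectivity.

```python
class TopographicMap:
    """A two-dimensional array of sites with uni-directional outgoing links."""

    def __init__(self, name, rows, cols):
        self.name = name
        self.rows, self.cols = rows, cols
        self.activity = [[0.0] * cols for _ in range(rows)]
        # Each link is (source_site, target_map, target_site).
        self.links = []

    def connect(self, site, target_map, target_site):
        """Add a one-way connection from a site here to a site on another map.

        A recurrent loop needs a second call on the target map pointing back.
        """
        self.links.append((site, target_map, target_site))

    def propagate(self, weight=1.0):
        """Pass activity along every outgoing link; all links share one weight."""
        for (r, c), target, (tr, tc) in self.links:
            target.activity[tr][tc] += weight * self.activity[r][c]


# Usage: wire a 3x3 visuotopic map one-to-one onto a 3x3 oculomotor map.
visual = TopographicMap("visuotopic", 3, 3)
motor = TopographicMap("oculomotor", 3, 3)
for r in range(3):
    for c in range(3):
        visual.connect((r, c), motor, (r, c))

visual.activity[0][2] = 1.0   # motion seen at the top-right site
visual.propagate()            # the matching oculomotor site becomes active
```

    Keeping the links on the source map makes the uni-directional nature of each
    connection explicit: feedback from the motor map would be a separate set of links
    stored on `motor`.
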
    The smallest map ensemble capable of producing an observable behavior consists of a
    sensory input map, a motor output map, and an established set of connections between them.
    The input map could have a fairly rigid structure consisting simply of time-differenced
    intensity images. Because visual information already contains a spatial component, this
    simple map is topographic without any additional tuning. The motor map could be
    established such that a given site on the map corresponds to a particular motor
    displacement from the current position. If the motor displacement commands vary linearly
    with motor space, for instance, this map is also topographically organized. Assuming the
    cameras are motionless, a moving object occupies a localized region in the visual field,
    and correspondingly causes a localized intensity difference (an active region) in the
    time-differenced image map. If there exist connections from this region of the
    time-difference map to the appropriate region of the oculomotor map, then a motion
    stimulus in the visual field excites the corresponding region of the visual motion map,
    which in turn excites the connected region of the oculomotor map, which evokes the
    necessary camera motion to foveate the stimulus. 
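
    The pipeline just described can be sketched in a few lines of Python. This is a
    minimal illustration under stated assumptions: images are small grids of intensities,
    the most active site stands in for the active region, and the oculomotor map is
    linear with a fixed number of degrees per site. The function names, the threshold,
    and the sign conventions (positive pan is rightward, positive tilt is upward) are
    all hypothetical.

```python
def time_difference(previous, current):
    """Time-differenced intensity image: absolute change at every pixel."""
    return [[abs(c - p) for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, current)]


def strongest_site(diff_map, threshold=0.1):
    """Return the (row, col) of the most active site above threshold, or None."""
    best, best_site = threshold, None
    for r, row in enumerate(diff_map):
        for c, value in enumerate(row):
            if value > best:
                best, best_site = value, (r, c)
    return best_site


def motor_displacement(site, shape, degrees_per_site):
    """Linear oculomotor map: displacement from the current eye position
    grows linearly with the site's distance from the map center."""
    rows, cols = shape
    r, c = site
    pan = (c - (cols - 1) / 2) * degrees_per_site
    tilt = ((rows - 1) / 2 - r) * degrees_per_site
    return pan, tilt


# Cameras are motionless; something moves at row 1, column 4 of a 5x5 image.
previous = [[0.0] * 5 for _ in range(5)]
current = [row[:] for row in previous]
current[1][4] = 1.0

diff = time_difference(previous, current)
site = strongest_site(diff)
command = motor_displacement(site, (5, 5), degrees_per_site=4.0)
```

    Here the connection between the two maps is the identity (a site in the difference
    map drives the same site in the motor map), which is the registration that the
    learning process is meant to discover.
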
       
    Of course, sensor-to-motor integration is only one type of multi-modal
    registration. As mentioned earlier, it is also possible to register sensor-to-sensor maps,
    such as aligning an auditory ITD map with the visuotopic motion map. By integrating these
    maps to motor maps, the robot could orient to visual or auditory stimuli. 
    Several mechanisms and models have been proposed to account for the alignment process
    of sensory-motor maps in animals. The mechanisms we use for map organization and alignment
    on Cog are inspired by similar mechanisms. However, different combinations of mechanisms
    are used depending on what is being learned, i.e., tuning the organization within a map,
    registering different sensory maps, or registering sensory maps and motor maps. A variety
    of mechanisms determine how map connections are established. Guided by sensori-motor
    experience, these mechanisms govern how connections between maps are modified to improve
    behavioral performance. 
    
    These mechanisms have been used to register a visual motion map with an oculomotor map
    so that the robot saccades to moving objects seen in either its peripheral field of view
    or its foveal field of view. A successful saccade is one that centers the stimulus in the
    foveal field of view. The registration of the visuotopic map with the oculomotor map
    proceeds from an initial random mapping between the two maps. So far we have learned the
    registration for the center 20x20 degree region of the peripheral map (this region
    corresponds to the fovea). 
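
    The text does not specify the learning rule, so the following Python sketch only
    illustrates the general idea of refining a random initial mapping from experience.
    It assumes a one-dimensional (pan only) map, an idealized eye that rotates by exactly
    the commanded number of degrees, and a simple error-correction update in which each
    site's command is nudged by a fraction of the residual error seen after the saccade.
    The function name `learn_registration` and the parameters `rate`, `trials`, and
    `seed` are hypothetical.

```python
import random


def learn_registration(offsets, rate=0.5, trials=50, seed=0):
    """Learn one pan command (degrees) per visual site by error correction.

    offsets: angular position of each visual site relative to the foveal center.
    Assumption: the simulated eye moves by exactly the commanded angle.
    """
    rng = random.Random(seed)
    # Initial random mapping between visual sites and motor commands.
    commands = {off: rng.uniform(-10.0, 10.0) for off in offsets}
    for _ in range(trials):
        for off in offsets:
            # After the saccade, the stimulus sits this far from the center.
            residual = off - commands[off]
            # Move the stored command a fraction of the way toward correcting it.
            commands[off] += rate * residual
    return commands


# Sites spanning the central 20 degrees, in 5 degree steps.
learned = learn_registration([-10, -5, 0, 5, 10])
```

    Under these assumptions the residual shrinks by the factor `1 - rate` on every
    trial, so each learned command converges to its site's offset. A real camera and
    motor system would add noise and nonlinearity that this sketch ignores.
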
    
    The figure below shows the performance of the learned mapping. The error at each site
    of the map is measured as the displacement of the centroid of the stimulus (after the
    saccade) from the center of the foveal field of view. The error is measured in degrees. The
    performance is analogous to the average error over the mapping, i.e., the average of the
    absolute value of the error at each site. 
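
    The error measure described above can be written out as a short Python sketch. It
    assumes the post-saccade foveal image is a grid of non-negative intensities in which
    only the stimulus is bright, and that a fixed `degrees_per_pixel` factor converts
    pixel distance to visual angle. The function names and that conversion factor are
    assumptions made for illustration.

```python
import math


def centroid(image):
    """Intensity-weighted centroid (row, col) of the image, or None if blank."""
    total = sum(sum(row) for row in image)
    if total == 0:
        return None
    row_c = sum(i * v for i, row in enumerate(image) for v in row) / total
    col_c = sum(j * v for row in image for j, v in enumerate(row)) / total
    return row_c, col_c


def saccade_error(image, degrees_per_pixel):
    """Displacement, in degrees, of the stimulus centroid from the image center."""
    found = centroid(image)
    if found is None:
        return None  # no stimulus visible after the saccade
    rows, cols = len(image), len(image[0])
    d_row = found[0] - (rows - 1) / 2
    d_col = found[1] - (cols - 1) / 2
    return math.hypot(d_row, d_col) * degrees_per_pixel


def mean_absolute_error(errors):
    """Average of the absolute error over all map sites."""
    return sum(abs(e) for e in errors) / len(errors)


# Stimulus occupying two pixels just right of center in a 5x5 foveal image.
image = [[0.0] * 5 for _ in range(5)]
image[2][3] = 1.0
image[2][4] = 1.0
error = saccade_error(image, degrees_per_pixel=0.5)
average = mean_absolute_error([0.5, -1.5, 1.0])
```

    In this example the centroid lies 1.5 pixels from the center, so the error is 0.75
    degrees at the assumed scale. Averaging the absolute per-site errors gives a single
    performance figure for the whole mapping.
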
       
    
    Ongoing work extends this approach to sensory-sensory map registration to integrate
    auditory and visual information so that the robot orients to both noisy and moving
    stimuli. The neck and body degrees of freedom are also being incorporated for full body
    orientation to stimuli.  
      
     