     Methodology: Keystones of our Alternative Methodology

     The Cog Shop
     MIT Artificial Intelligence Laboratory
     545 Technology Square, #920
     Cambridge, MA 02139
    In recent years, AI research has
    begun to move away from the assumptions of classical AI: monolithic internal models,
    monolithic control, and general purpose processing. However, these concepts are still
    prevalent in much current work and are deeply ingrained in many architectures for
     intelligent systems. Our alternative methodology is based on evidence from cognitive
     science and neuroscience and focuses on four attributes that we believe are critical
     to human intelligence: developmental organization, social interaction, embodiment and
     physical coupling, and multimodal integration.
    In this section, we summarize some of the evidence that has led us to abandon those
    assumptions about intelligence that classical AI continues to uphold. We then briefly
    review the alternative methodology that we have been using in constructing humanoid
     robotic systems. For further references, see our longer overview paper.
    
    In studying human intelligence, three common conceptual errors often occur: reliance on
    monolithic internal models, on monolithic control, and on general purpose processing.
     These and other errors derive primarily from naive models based on subjective observation
     and introspection, and from biases introduced by common computational metaphors
     (mathematical logic, von Neumann architectures, etc.). A modern understanding of cognitive science and neuroscience
    refutes these assumptions. 
    
    There is evidence that in normal tasks humans tend to minimize their internal
    representation of the world. Ballard, Hayhoe, and Pelz (1995) have shown that in
    performing a complex task, like building a copy of a display of blocks, humans do not
    build an internal model of the entire visible scene. By changing the display while
    subjects were looking away, Ballard found that subjects noticed only the most drastic of
    changes; rather than keeping a complete model of the scene, they instead left that
    information in the world and continued to refer back to the scene while performing the
    copying task. 
    There is also evidence that there are multiple internal representations, which are not
     mutually consistent. For example, in the phenomenon of blindsight, cortically blind
    patients can discriminate different visual stimuli, but report seeing nothing. This
    inconsistency would not be a feature of a single central model of visual space. 
     These experiments, and many others like them, such as the work by Gazzaniga and LeDoux on
     split brain patients or Rensink, O'Regan, and Clark on changes in visual scenes,
    convincingly demonstrate that humans do not construct a full, monolithic model of the
    environment. Instead humans tend to only represent what is immediately relevant from the
    environment, and those representations do not have full access to one another. 
    
    Naive introspection and observation can lead one to believe in a neurological
     equivalent of the central processing unit: something that makes the decisions and
    controls the other functions of the organism. While there are undoubtedly control
    structures, this model of a single, unitary control system is not supported by evidence
    from cognitive science. 
    One example comes from studies of split brain patients by Gazzaniga and LeDoux. As an
    experimental treatment for severe epilepsy in these patients, the corpus callosum (the
    main structure connecting the two hemispheres of the brain) was surgically cut. The
    patients are surprisingly normal after the operation, but with deficits that are revealed
    by presenting different information to either side of the (now unconnected) brain. Since
    each hemisphere controls one side of the body, the experimenters can probe the behavior of
    each hemisphere independently (for example, by observing the subject picking up an object
    appropriate to the scene that they had viewed). In one example, a snow scene was presented
    to the right hemisphere and the leg of a chicken to the left. The subject selected a
    chicken head to match the chicken leg, explaining with the verbally dominant left
    hemisphere that "I saw the claw and picked the chicken". When the right
    hemisphere then picked a shovel to correctly match the snow, the left hemisphere explained
    that you need a shovel to "clean out the chicken shed" (from p. 148 of Gazzaniga
    and LeDoux). The separate halves of the subject independently acted appropriately, but one
    side falsely explained the choice of the other. This suggests that there are multiple
    independent control systems, rather than a single monolithic one. 
    
    The brain is conventionally thought to be a general purpose machine, acting with equal
    skill on any type of operation that it performs by invoking a set of powerful rules.
    However, humans seem to be proficient only in particular sets of skills, at the expense of
     other skills, often in non-obvious ways. A good example of this is the Stroop effect: when
     subjects must name the ink colors of a list of printed words, performance depends on the
     semantic content of the words, and the task becomes very difficult when the words are
     themselves the names of non-corresponding colors (for example, the word "red" printed in
     green ink). This effect demonstrates the specialized nature of human computational
     processes and interactions.
     Even in deductive logic, humans often perform extremely poorly, and their performance
     depends strongly on context. Wason (1966) found that subjects were unable to apply the
     negative rule of if-then inference when four cards were labeled only with single letters
     and digits. However, given additional context, with the cards labeled so that they could
     be understood as names and ages, subjects easily solved exactly the same problem.
    Further, humans often do not use subroutine-like rules for making decisions. They are
    often more emotional than rational, and there is evidence that this emotional content is
     an important aspect of decision making (for example, the work of Damasio).
    
    In an attempt to simplify the problem of building complex intelligent systems,
    classical AI approaches tended to ignore or avoid many aspects of human intelligence. We
    believe that many of these discarded elements are essential to human intelligence. Our
    methodology exploits four central aspects of human intelligence: development, social
    interaction, physical interaction and integration. Development forms the framework by
    which humans successfully acquire increasingly more complex skills and competencies.
    Social interaction allows humans to exploit other humans for assistance, teaching, and
    knowledge. Embodiment and physical coupling allow humans to use the world itself as a tool
    for organizing and manipulating knowledge. Integration allows humans to maximize the
     efficacy and accuracy of complementary sensory and motor systems. We believe not only
     that these four themes are critical to the understanding of human intelligence, but also
     that they actually simplify the problem of creating human-like intelligence.
    
    Humans are not born with complete reasoning systems, complete motor systems, or even
    complete sensory systems. Instead, they undergo a process of development where they
    perform incrementally more difficult tasks in more complex environments en route to the
    adult state. Building systems developmentally facilitates learning both by providing a
    structured decomposition of skills and by gradually increasing the complexity of the task
    to match the competency of the system. 
    Development is an incremental process. Behaviors and learned skills that have already
    been mastered prepare and enable the acquisition of more advanced behaviors by providing
    subskills and knowledge that can be re-used, by placing simplifying constraints on the
    acquisition, and by minimizing new information that must be acquired. For example, Diamond
    (1990) shows that infants between five and twelve months of age progress through a number
    of distinct phases in the development of visually guided reaching. In this progression,
    infants in later phases consistently demonstrate more sophisticated reaching strategies to
     retrieve a toy in more challenging scenarios. As the infant's reaching competency
     develops, later stages incrementally improve upon the competency afforded by the previous
     stages. Within our group, Marjanovic, Scassellati, and Williamson applied a similar
     bootstrapping technique to enable the robot to learn to point to a visual target.
     Scassellati has discussed how a humanoid robot might acquire basic social competencies
     through this sort of developmental methodology. Other examples of developmental learning
     that we have explored can be found in our related publications.
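     To make the flavor of this bootstrapping concrete, the sketch below shows how an
     already-learned saccade map (retinal position to gaze motor command) could be re-used
     as the error signal for learning a ballistic pointing map. The maps, gains, and the toy
     "world" are invented for illustration and are not the implementation used on Cog.

        import numpy as np

        # Illustrative sketch only: re-use an earlier skill (the saccade map) to
        # supervise a later one (pointing), in the spirit of the bootstrapping
        # described above. All numbers here are arbitrary stand-ins.
        rng = np.random.default_rng(0)

        def saccade_map(retinal_xy):
            # Assume this skill was learned earlier: it converts a retinal offset
            # into the gaze-motor correction that would foveate that point.
            return 0.01 * np.asarray(retinal_xy)

        def fingertip_offset_on_retina(arm_cmd, gaze_cmd):
            # Stand-in for the world: how far the fingertip lands from the fixated
            # target on the retina, given the arm command and the gaze direction.
            correct_arm = 2.5 * np.asarray(gaze_cmd)
            return 100.0 * (np.asarray(arm_cmd) - correct_arm)

        W = np.zeros((2, 2))                         # pointing map: gaze -> arm command
        for trial in range(2000):
            gaze = rng.uniform(-1.0, 1.0, size=2)    # fixate a random target
            arm = W @ gaze                           # ballistic pointing attempt
            retinal_err = fingertip_offset_on_retina(arm, gaze)
            gaze_err = saccade_map(retinal_err)      # the saccade map turns the visual
                                                     # error into motor coordinates
            W -= 0.5 * np.outer(gaze_err, gaze)      # simple error-driven update

        print("learned pointing map:\n", W)          # approaches 2.5 * identity

     The point of the sketch is only that the error signal for the new skill comes for free
     from a skill learned earlier; the visual side does not have to be re-learned.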
    By gradually increasing the complexity of the required task, a developmental process
    optimizes learning. For example, infants are born with low acuity vision which simplifies
     the visual input they must process. The infant's visual performance develops in step
    with their ability to process the influx of stimulation. The same is true for the motor
    system. Newborn infants do not have independent control over each degree of freedom of
    their limbs, but through a gradual increase in the granularity of their motor control they
    learn to coordinate the full complexity of their bodies. A process in which the acuity of
    both sensory and motor systems are gradually increased significantly reduces the
    difficulty of the learning problem. The caregiver also acts to gradually increase the task
    complexity by structuring and controlling the complexity of the environment. By exploiting
    a gradual increase in complexity both internal and external, while reusing structures and
    information gained from previously learned behaviors, we hope to be able to learn
     increasingly sophisticated behaviors. We believe that these methods will allow us to
     construct systems which scale autonomously.
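     The sketch below isolates the scheduling idea: a simple learner starts with a very
     coarse one-dimensional "retina", and the resolution is doubled only once the running
     error at the current stage is small, with the weights learned at the coarser stage
     carried forward. The task, threshold, and learning rule are invented for illustration.

        import numpy as np

        # Illustrative sketch only: a developmental schedule that increases sensory
        # acuity as competence grows, reusing what was learned at the coarser stage.
        rng = np.random.default_rng(1)

        def sample(resolution):
            pos = rng.integers(resolution)               # one bright "target" pixel
            retina = np.zeros(resolution)
            retina[pos] = 1.0
            return retina, pos / (resolution - 1)        # normalized true position

        def train_stage(w, steps=3000, lr=0.1):
            err = 1.0
            for _ in range(steps):
                retina, target = sample(len(w))
                pred = w @ retina
                w += lr * (target - pred) * retina       # LMS update
                err = 0.99 * err + 0.01 * abs(target - pred)
            return err                                   # running error estimate

        w = np.zeros(4)                                  # "newborn": 4-pixel retina
        while True:
            err = train_stage(w)
            print(f"resolution {len(w):3d}: running error {err:.3f}")
            if len(w) >= 64:
                break
            if err < 0.05:                               # competent at this stage,
                w = np.repeat(w, 2)                      # so double the acuity and
                                                         # carry the weights forward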
    
    Human infants are extremely dependent on their caregivers, relying upon them not only
    for basic necessities but also as a guide to their development. This reliance on social
    contact is so integrated into our species that it is hard to imagine a completely asocial
     human; developmental disorders that affect social development, such as autism and
     Asperger's syndrome, are extremely debilitating and can have far-reaching
    consequences. Building social skills into an artificial intelligence provides not only a
    natural means of human-machine interaction but also a mechanism for bootstrapping more
    complex behavior. Our research program has investigated social interaction both as a means
    for bootstrapping and as an instance of developmental progression. 
     Social interaction can be a means to facilitate learning. New skills may be socially
     transferred from caregiver to infant through mimicry or imitation, through direct tutelage,
     or by means of scaffolding, in which a more able adult manipulates the infant's
     interactions with the environment to foster novel abilities. Commonly, scaffolding involves
     reducing distractions, marking the task's critical attributes, reducing the number of
     degrees of freedom in the target task, and enabling the infant to experience the end or
     outcome before she is cognitively or physically capable of seeking and attaining it for
     herself. We are currently engaged in work studying the bootstrapping of new behaviors
     from social interactions (Breazeal and Velasquez 1998).
     The social skills required to make use of scaffolding are complex. Infants acquire
     these social skills through a developmental progression. One of the earliest precursors is
     the ability to share attention with the caregiver. This ability can take many forms, from
     the recognition of a pointing gesture to maintaining eye contact. In our work, we have
     also examined social interaction from this developmental perspective, building systems
     that can recognize and respond to joint attention by finding faces and eyes, and by
     imitating the head nods of the caregiver (Scassellati 1998).
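     As a rough illustration of the coarse-to-fine structure of the face-and-eye finding
     task, the sketch below locates a face and then searches for eyes only within the face
     region. It uses OpenCV's stock Haar cascades purely for convenience; it is not the
     detector used on our robots.

        import cv2

        # Illustrative sketch only: coarse-to-fine search, using OpenCV's bundled
        # Haar cascades rather than the detectors used on Cog.
        face_cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        eye_cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_eye.xml")

        def find_faces_and_eyes(frame_bgr):
            gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
            faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            results = []
            for (x, y, w, h) in faces:
                roi = gray[y:y + h, x:x + w]              # restrict the eye search
                eyes = eye_cascade.detectMultiScale(roi)  # to the detected face
                results.append(((x, y, w, h),
                                [(x + ex, y + ey, ew, eh) for (ex, ey, ew, eh) in eyes]))
            return results

        # Example usage, assuming a camera is attached:
        #   ok, frame = cv2.VideoCapture(0).read()
        #   print(find_faces_and_eyes(frame))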
    
    Perhaps the most obvious, and most overlooked, aspect of human intelligence is that it
     is embodied. A principal tenet of our methodology is to build and test real robotic
    systems. We believe that building human-like intelligence requires human-like interaction
    with the world. Humanoid form is important both to allow humans to interact socially with
    the robot in a natural way and to provide similar task constraints. 
    The direct physical coupling between action and perception reduces the need for an
    intermediary representation. For an embodied system, internal representations can be
    ultimately grounded in sensory-motor interactions with the world. Our systems are
    physically coupled with the world and operate directly in that world without any explicit
    representations of it. There are representations, or accumulations of state, but these
    only refer to the internal workings of the system; they are meaningless without
    interaction with the outside world. The embedding of the system within the world enables
    the internal accumulations of state to provide useful behavior. 
    In addition we believe that building a real system is computationally less complex than
    simulating such a system. The effects of gravity, friction, and natural human interaction
    are obtained for free, without any computation. Embodied systems can also perform some
    complex tasks in relatively simple ways by exploiting the properties of the complete
    system. For example, when putting a jug of milk in the refrigerator, you can exploit the
     pendulum action of your arm to move the milk (an example from Green, 1982). The swing of
    the jug does not need to be explicitly planned or controlled, since it is the natural
    behavior of the system. Instead of having to plan the whole motion, the system only has to
     modulate, guide, and correct the natural dynamics. We have implemented one such scheme
     using self-adaptive oscillators to drive the joints of the robot's arm.
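     As a rough sketch of such a scheme, the code below implements a Matsuoka-style
     two-neuron oscillator whose output drives a toy second-order joint, with the joint
     angle fed back into the oscillator so that the two can entrain. The gains and the
     joint model are invented for illustration and are not the parameters used on the robot.

        # Illustrative sketch only: a Matsuoka-style half-center oscillator driving
        # one joint. Joint-angle feedback enters the two neurons with opposite sign,
        # which is what allows the oscillator to entrain to the arm's dynamics.
        def step_oscillator(state, joint_angle, dt=0.001,
                            tau=0.04, T=0.4, beta=2.5, w=2.5, u=1.0, k_fb=1.0):
            x1, v1, x2, v2 = state
            y1, y2 = max(x1, 0.0), max(x2, 0.0)
            dx1 = (-x1 - beta * v1 - w * y2 + u - k_fb * joint_angle) / tau
            dv1 = (-v1 + y1) / T
            dx2 = (-x2 - beta * v2 - w * y1 + u + k_fb * joint_angle) / tau
            dv2 = (-v2 + y2) / T
            new = (x1 + dt * dx1, v1 + dt * dv1, x2 + dt * dx2, v2 + dt * dv2)
            torque = max(new[0], 0.0) - max(new[2], 0.0)
            return new, torque

        # Toy joint: a damped, spring-loaded second-order system (made-up constants).
        angle, vel = 0.0, 0.0
        state = (0.1, 0.0, 0.0, 0.0)          # small asymmetry to start oscillating
        for i in range(5000):
            state, torque = step_oscillator(state, angle)
            acc = 5.0 * torque - 1.0 * vel - 20.0 * angle
            vel += 0.001 * acc
            angle += 0.001 * vel
            if i % 500 == 0:
                print(f"t={i * 0.001:4.2f}s  angle={angle:+.3f}  torque={torque:+.3f}")

     Because the oscillator entrains to the feedback, the controller never plans a
     trajectory; it only shapes an oscillation that the joint's own dynamics tend to produce.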
    
    Humans have the capability to receive an enormous amount of information from the world.
    Visual, auditory, somatosensory, and olfactory cues are all processed simultaneously to
    provide us with our view of the world. However, there is evidence that the sensory
    modalities are not independent; stimuli from one modality can and do influence the
     perception of stimuli in another modality. For example, Churchland, Ramachandran, and
     Sejnowski (1994) demonstrated how audition can induce illusory visual motion. Vision can
     likewise cause auditory illusions, such as the McGurk effect. These studies
    demonstrate that sensory modalities cannot be treated independently. 
    Sensory integration can simplify the computation necessary for a given task. Attempting
    to perform the task using only one modality is sometimes awkward and computationally
    intensive. Utilizing the complementary nature of separate modalities can result in a
    reduction of overall computation. We have implemented several mechanisms on Cog that use
    multimodal integration to aid in increasing performance or developing competencies. For
     example, O'Reilly and Scassellati implemented a system that stabilized images from a
     moving camera using vestibular feedback.
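     A crude electronic analogue of that behavior is sketched below: an integrated
     rate-gyro ("vestibular") signal is used to shift each frame back against the measured
     head rotation. The pixels-per-radian constant, the toy image, and the pure horizontal
     shift are assumptions made for illustration and do not describe the system built for Cog.

        import numpy as np

        # Illustrative sketch only: counter-shift each frame by the integrated gyro
        # signal so that the world appears stationary while the "head" pans.
        PIXELS_PER_RADIAN = 320.0             # depends on the camera field of view

        def stabilize(frame, gyro_rate, dt, accumulated_shift):
            accumulated_shift += gyro_rate * dt * PIXELS_PER_RADIAN
            shift = int(round(accumulated_shift))
            return np.roll(frame, -shift, axis=1), accumulated_shift

        # Toy demonstration: a vertical stripe stays put as the head pans at 0.5 rad/s.
        frame0 = np.zeros((48, 64))
        frame0[:, 30:34] = 1.0
        shift_state, rate, dt = 0.0, 0.5, 0.033
        for step in range(10):
            head_angle = rate * (step + 1) * dt           # how far the head has panned
            raw = np.roll(frame0, int(round(head_angle * PIXELS_PER_RADIAN)), axis=1)
            stab, shift_state = stabilize(raw, rate, dt, shift_state)
            print("stripe column in stabilized frame:", int(np.argmax(stab[0])))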
    By integrating different sensory modalities we can exploit the multimodal nature of
    stimuli to facilitate learning. For example, objects that make noise often move. This
     correlation can be exploited to facilitate perception. Wertheimer (1961) showed that
     vision and audition interact from birth; even ten-minute-old infants will turn their eyes
     toward an auditory cue. This interaction between the senses continues to develop; visual
     stimuli greatly affect the development of sound localization. In our work, Irie built an
     auditory system that utilizes visual information to train auditory localization. This work
    highlights not only the development of sensory integration, but also the simplification of
    computational requirements that can be obtained through integration. 
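     The sketch below captures that supervisory structure in toy form: whenever a sound
     source is also seen, the visually measured azimuth is used as the teaching signal for
     a simple estimator that maps an interaural level difference to azimuth. The sensor
     model and gains are invented for illustration and are not Irie's implementation.

        import numpy as np

        # Illustrative sketch only: vision supervises auditory localization.
        rng = np.random.default_rng(2)

        def world_sample():
            azimuth = rng.uniform(-1.0, 1.0)                  # radians, roughly ahead
            ild = 4.0 * np.sin(azimuth) + 0.1 * rng.normal()  # toy interaural cue
            visual_azimuth = azimuth + 0.02 * rng.normal()    # vision is accurate
            return ild, visual_azimuth, azimuth

        # Fit azimuth ~ a * ild + b online, supervised only by the visual estimate.
        a, b = 0.0, 0.0
        for _ in range(5000):
            ild, seen_az, _ = world_sample()
            err = seen_az - (a * ild + b)                     # visual teaching signal
            a += 0.01 * err * ild
            b += 0.01 * err

        # After training, sound alone is enough to localize reasonably well.
        test_errs = [abs(a * ild + b - true_az)
                     for ild, _, true_az in (world_sample() for _ in range(500))]
        print(f"mean auditory-only error: {np.mean(test_errs):.3f} rad")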