Keystones of our Alternative Methodology
The Cog Shop
MIT Artificial Intelligence Laboratory
545 Technology Square, #920
Cambridge, MA 02139
In recent years, AI research has
begun to move away from the assumptions of classical AI: monolithic internal models,
monolithic control, and general purpose processing. However, these concepts are still
prevalent in much current work and are deeply ingrained in many architectures for
Our alternative methodology is based on evidence from cognitive
science and neuroscience, and focuses on four attributes that we believe are
critical to human intelligence: developmental organization, social interaction,
embodiment and physical coupling, and multimodal integration.
In this section, we summarize some of the evidence that has led us to abandon those
assumptions about intelligence that classical AI continues to uphold. We then briefly
review the alternative methodology that we have been using in constructing humanoid
robotic systems. For further references, see the paper [postscript, compressed, 36 pages, 2.2Mb] [PDF, 36 pages, 484k].
In studying human intelligence, three common conceptual errors often occur: reliance on
monolithic internal models, on monolithic control, and on general purpose processing.
These and other errors primarily derive from naive models based on subjective observation
and introspection, and biases from common computational metaphors (mathematical logic, Von
Neumann architectures, etc.). A modern understanding of cognitive science and neuroscience
refutes these assumptions.
There is evidence that in normal tasks humans tend to minimize their internal
representation of the world. Ballard, Hayhoe, and Pelz (1995) have shown that in
performing a complex task, like building a copy of a display of blocks, humans do not
build an internal model of the entire visible scene. By changing the display while
subjects were looking away, Ballard found that subjects noticed only the most drastic of
changes; rather than keeping a complete model of the scene, they instead left that
information in the world and continued to refer back to the scene while performing the task.
There is also evidence that there are multiple internal representations, which are not
mutually consistent. For example, in the phenomenon of blindsight, cortically blind
patients can discriminate different visual stimuli, but report seeing nothing. This
inconsistency would not be a feature of a single central model of visual space.
These experiments and many others like them, such as the work by Gazzaniga and LeDoux on
split brain patients or Rensink, O'Regan, and Clark on changes in visual scenes,
convincingly demonstrate that humans do not construct a full, monolithic model of the
environment. Instead humans tend to only represent what is immediately relevant from the
environment, and those representations do not have full access to one another.
Naive introspection and observation can lead one to believe in a neurological
equivalent of the central processing unit, something that makes the decisions and
controls the other functions of the organism. While there are undoubtedly control
structures, this model of a single, unitary control system is not supported by evidence
from cognitive science.
One example comes from studies of split brain patients by Gazzaniga and LeDoux. As an
experimental treatment for severe epilepsy in these patients, the corpus callosum (the
main structure connecting the two hemispheres of the brain) was surgically cut. The
patients are surprisingly normal after the operation, but with deficits that are revealed
by presenting different information to either side of the (now unconnected) brain. Since
each hemisphere controls one side of the body, the experimenters can probe the behavior of
each hemisphere independently (for example, by observing the subject picking up an object
appropriate to the scene that they had viewed). In one example, a snow scene was presented
to the right hemisphere and the leg of a chicken to the left. The subject selected a
chicken head to match the chicken leg, explaining with the verbally dominant left
hemisphere that "I saw the claw and picked the chicken". When the right
hemisphere then picked a shovel to correctly match the snow, the left hemisphere explained
that you need a shovel to "clean out the chicken shed" (from p. 148 of Gazzaniga
and LeDoux). The separate halves of the subject independently acted appropriately, but one
side falsely explained the choice of the other. This suggests that there are multiple
independent control systems, rather than a single monolithic one.
The brain is conventionally thought to be a general purpose machine, acting with equal
skill on any type of operation that it performs by invoking a set of powerful rules.
However, humans seem to be proficient only in particular sets of skills, at the expense of
other skills, often in non-obvious ways. A good example of this is the Stroop effect. When
presented with a list of words written in a variety of colors, performance in a color
recognition and articulation task is dependent on the semantic content of the words; the
task is very difficult if names of colors are printed in non-corresponding colors. This
experiment demonstrates the specialized nature of human computational processes and
Even in the areas of deductive logic, humans often perform extremely poorly in
different contexts. Wason (1966) found that subjects were unable to apply the negative
rule of if-then inference when four cards were labeled with single letters and digits.
However, with additional context---labeling the cards such that they were understandable
as names and ages---subjects could easily solve exactly the same problem.
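The logic of the card task can be made explicit with a short sketch. The card faces used here (E, K, 4, 7), the rule "if a card shows a vowel, its other side shows an even number", and the function names are illustrative assumptions, not details taken from the study as described above. A rule of the form "if P then Q" can only be falsified by a card with P on one side and not-Q on the other, so the only cards worth turning over are those showing P and those showing not-Q. The not-Q card is the one that requires the negative rule of inference.

```python
def cards_to_turn(visible_faces, shows_p, shows_not_q):
    """Return the faces that must be turned over to test 'if P then Q'.

    Only a card with P on one side and not-Q on the other can falsify
    the rule, so we select the faces showing P or showing not-Q.
    """
    return [face for face in visible_faces if shows_p(face) or shows_not_q(face)]


def is_vowel(face):
    # P: the visible face is a vowel (hypothetical rule for illustration).
    return face in "AEIOU"


def is_odd_number(face):
    # not-Q: the visible face is an odd number.
    return face.isdigit() and int(face) % 2 == 1


abstract_cards = ["E", "K", "4", "7"]
selected = cards_to_turn(abstract_cards, is_vowel, is_odd_number)
print(selected)  # ['E', '7']
```

The same function applies unchanged to a contextual version of the task; only the two predicates differ, which is what makes the gap in human performance between the two versions notable.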
Further, humans often do not use subroutine-like rules for making decisions. They are
often more emotional than rational, and there is evidence that this emotional content is
an important aspect of decision making (for example, the work from Damasio).
In an attempt to simplify the problem of building complex intelligent systems,
classical AI approaches tended to ignore or avoid many aspects of human intelligence. We
believe that many of these discarded elements are essential to human intelligence. Our
methodology exploits four central aspects of human intelligence: development, social
interaction, physical interaction and integration. Development forms the framework by
which humans successfully acquire increasingly complex skills and competencies.
Social interaction allows humans to exploit other humans for assistance, teaching, and
knowledge. Embodiment and physical coupling allow humans to use the world itself as a tool
for organizing and manipulating knowledge. Integration allows humans to maximize the
efficacy and accuracy of complementary sensory and motor systems. We believe that not only
are these four themes critical to the understanding of human intelligence but also they
actually simplify the problem of creating human-like intelligence.
Humans are not born with complete reasoning systems, complete motor systems, or even
complete sensory systems. Instead, they undergo a process of development where they
perform incrementally more difficult tasks in more complex environments en route to the
adult state. Building systems developmentally facilitates learning both by providing a
structured decomposition of skills and by gradually increasing the complexity of the task
to match the competency of the system.
Development is an incremental process. Behaviors and learned skills that have already
been mastered prepare and enable the acquisition of more advanced behaviors by providing
subskills and knowledge that can be re-used, by placing simplifying constraints on the
acquisition, and by minimizing new information that must be acquired. For example, Diamond
(1990) shows that infants between five and twelve months of age progress through a number
of distinct phases in the development of visually guided reaching. In this progression,
infants in later phases consistently demonstrate more sophisticated reaching strategies to
retrieve a toy in more challenging scenarios. As the infant's reaching competency
develops, later stages incrementally improve upon the competency afforded by the previous
stages. Within our group, Marjanovic, Scassellati, and Williamson [postscript, compressed, 10 pages, 577k] [PDF, 10 pages, 1.2 Mb]
applied a similar bootstrapping technique to enable the robot to learn to point to a
visual target. Scassellati [postscript,
compressed, 20 pages, 1.3 Mb] [PDF, 20 pages,
360k] has discussed how a humanoid robot might acquire basic
social competencies through this sort of developmental methodology. Other examples of
developmental learning that we have explored can be found in [postscript, compressed, 10 pages, 280k] and [postscript,
compressed, 13 pages, 1.25 Meg] [PDF, 13
By gradually increasing the complexity of the required task, a developmental process
optimizes learning. For example, infants are born with low acuity vision which simplifies
the visual input they must process. The infant's visual performance develops in step
with their ability to process the influx of stimulation. The same is true for the motor
system. Newborn infants do not have independent control over each degree of freedom of
their limbs, but through a gradual increase in the granularity of their motor control they
learn to coordinate the full complexity of their bodies. A process in which the acuity of
both sensory and motor systems is gradually increased significantly reduces the
difficulty of the learning problem. The caregiver also acts to gradually increase the task
complexity by structuring and controlling the complexity of the environment. By exploiting
a gradual increase in complexity both internal and external, while reusing structures and
information gained from previously learned behaviors, we hope to be able to learn
increasingly sophisticated behaviors. We believe that these methods will allow us to
construct systems which scale autonomously [postscript, 5 pages, 92k] [postscript, compressed, 7 pages, 370k] [PDF, 7 pages, 730k].
Human infants are extremely dependent on their caregivers, relying upon them not only
for basic necessities but also as a guide to their development. This reliance on social
contact is so integrated into our species that it is hard to imagine a completely asocial
human; developmental disorders that affect social development, such as autism and
Asperger's syndrome, are extremely debilitating and can have far-reaching
consequences. Building social skills into an artificial intelligence provides not only a
natural means of human-machine interaction but also a mechanism for bootstrapping more
complex behavior. Our research program has investigated social interaction both as a means
for bootstrapping and as an instance of developmental progression.
Social interaction can be a means to facilitate learning. New skills may be socially
transferred from caregiver to infant through mimicry or imitation, through direct tutelage,
or by means of scaffolding, in which a more able adult manipulates the infant's
interactions with the environment to foster novel abilities. Commonly, scaffolding involves
reducing distractions, marking the task's critical attributes, reducing the number of
degrees of freedom in the target task, and enabling the infant to experience the end or
outcome before she is cognitively or physically capable of seeking and attaining it for
herself. We are currently engaged in work studying bootstrapping new behaviors from social
compressed, 43 pages, 963k] and <<<< link to
Breazeal-Velasquez 1998 SAB paper>>>>>
The social skills required to make use of scaffolding are complex. Infants acquire
these social skills through a developmental progression. One of the earliest precursors is
the ability to share attention with the caregiver. This ability can take many forms, from
the recognition of a pointing gesture to maintaining eye contact [postscript, compressed, 20 pages, 1.3 Mb] [PDF, 20 pages, 360k]. In our
work, we have also examined social interaction from this developmental perspective,
building systems that can recognize and respond to joint attention by finding faces and
eyes [postscript, compressed, 8 pages,
1.75 Meg] [PDF, 8 pages, 2.2 Mb] and imitating head nods of the caregiver <<<<link to Scazs
1998 Autonomous Agents paper>>>>>.
Perhaps the most obvious, and most overlooked, aspect of human intelligence is that it
is embodied. A principal tenet of our methodology is to build and test real robotic
systems. We believe that building human-like intelligence requires human-like interaction
with the world. Humanoid form is important both to allow humans to interact socially with
the robot in a natural way and to provide similar task constraints.
The direct physical coupling between action and perception reduces the need for an
intermediary representation. For an embodied system, internal representations can be
ultimately grounded in sensory-motor interactions with the world. Our systems are
physically coupled with the world and operate directly in that world without any explicit
representations of it. There are representations, or accumulations of state, but these
only refer to the internal workings of the system; they are meaningless without
interaction with the outside world. The embedding of the system within the world enables
the internal accumulations of state to provide useful behavior.
In addition we believe that building a real system is computationally less complex than
simulating such a system. The effects of gravity, friction, and natural human interaction
are obtained for free, without any computation. Embodied systems can also perform some
complex tasks in relatively simple ways by exploiting the properties of the complete
system. For example, when putting a jug of milk in the refrigerator, you can exploit the
pendulum action of your arm to move the milk (an example from Green (1982)). The swing of
the jug does not need to be explicitly planned or controlled, since it is the natural
behavior of the system. Instead of having to plan the whole motion, the system only has to
modulate, guide and correct the natural dynamics. We have implemented one such scheme
using self-adaptive oscillators to drive the joints of the robot's arm [postscript, gzipped, 6 pages, 202k] [ftp] and [postscript, gzipped, 7 pages, 682k] [ftp].
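As a rough illustration of how an oscillator can supply a rhythmic joint command without explicit trajectory planning, the following is a minimal sketch of a two-neuron mutual-inhibition oscillator in the style of Matsuoka. The choice of this particular oscillator model, all parameter values, and the `feedback` hook are assumptions made for illustration; the text above does not specify the scheme used on the robot. The two neurons inhibit each other and fatigue over time, so activity alternates between them, and the difference of their outputs can serve as a command for an antagonistic pair of actuators.

```python
def simulate_oscillator(steps=10000, dt=0.001, tau1=0.1, tau2=0.2,
                        beta=2.5, w=2.5, c=1.0, feedback=None):
    """Euler-integrate a two-neuron mutual-inhibition oscillator.

    feedback: optional function mapping the step index to a signed signal
    (for example a joint angle, hypothetical here). Its positive part
    inhibits neuron 0 and its negative part inhibits neuron 1.
    Returns the list of output samples y0 - y1.
    """
    x = [0.1, 0.0]   # membrane states; asymmetric start breaks the symmetry
    v = [0.0, 0.0]   # self-inhibition (fatigue) states
    outputs = []
    for k in range(steps):
        g = feedback(k) if feedback is not None else 0.0
        drive = [max(0.0, g), max(0.0, -g)]
        y = [max(0.0, x[0]), max(0.0, x[1])]  # rectified firing rates
        dx = [(-x[i] - beta * v[i] - w * y[1 - i] + c - drive[i]) / tau1
              for i in range(2)]
        dv = [(-v[i] + y[i]) / tau2 for i in range(2)]
        for i in range(2):
            x[i] += dt * dx[i]
            v[i] += dt * dv[i]
        outputs.append(y[0] - y[1])
    return outputs


def count_sign_changes(signal):
    """Count transitions between positive and negative samples, ignoring zeros."""
    signs = [1 if s > 0 else -1 for s in signal if s != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


command = simulate_oscillator()
print(count_sign_changes(command), "sign changes in 10 simulated seconds")
```

With `feedback` left unset the oscillator runs open loop. Passing a signal derived from the joint itself is one way such an oscillator could adjust to the dynamics of the limb it drives, although that coupling is not modeled in this sketch.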
Humans have the capability to receive an enormous amount of information from the world.
Visual, auditory, somatosensory, and olfactory cues are all processed simultaneously to
provide us with our view of the world. However, there is evidence that the sensory
modalities are not independent; stimuli from one modality can and do influence the
perception of stimuli in another modality. For example, Churchland, Ramachandran, and
Sejnowski (1994) demonstrated an example of how audition can cause illusory visual motion.
Vision can cause auditory illusions too, such as the McGurk effect. These studies
demonstrate that sensory modalities cannot be treated independently.
Sensory integration can simplify the computation necessary for a given task. Attempting
to perform the task using only one modality is sometimes awkward and computationally
intensive. Utilizing the complementary nature of separate modalities can result in a
reduction of overall computation. We have implemented several mechanisms on Cog that use
multimodal integration to aid in increasing performance or developing competencies. For
example, O'Reilly and Scassellati implemented a system that stabilized images from a
moving camera using vestibular feedback.
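One simple way to picture this kind of stabilization is a compensatory command that counter-rotates the camera in proportion to the head velocity reported by a vestibular sensor, with the proportionality gain tuned from the residual motion. The sketch below is a hypothetical one-dimensional version under that assumption; the function name, learning rate, and update rule are ours and are not taken from the system described above.

```python
import math


def adapt_stabilization_gain(head_velocities, learning_rate=0.05, gain=0.0):
    """Tune a counter-rotation gain from residual camera motion.

    For each head velocity sample h, the camera is commanded to rotate
    at -gain * h. The residual h + command is zero when the image is
    perfectly stabilized, and is used as the error signal for the gain.
    """
    for h in head_velocities:
        command = -gain * h          # compensatory camera velocity
        residual = h + command       # remaining rotation relative to the world
        gain += learning_rate * residual * h
    return gain


# Synthetic sinusoidal head motion (arbitrary units).
head_motion = [math.sin(0.1 * k) for k in range(2000)]
learned_gain = adapt_stabilization_gain(head_motion)
print(round(learned_gain, 3))
```

In this idealized setting the gain approaches 1, meaning the camera rotates exactly opposite to the head. The vestibular signal makes the problem easy because the disturbance is measured directly rather than inferred from the image.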
By integrating different sensory modalities we can exploit the multimodal nature of
stimuli to facilitate learning. For example, objects that make noise often move. This
correlation can be exploited to facilitate perception. Wertheimer (1961) has shown that
vision and audition interact from birth; even ten-minute-old children will turn their eyes
toward an auditory cue. This interaction between the senses continues to develop; visual
stimuli greatly affect the development of sound localization. In our work, Irie built an
auditory system that utilizes visual information to train auditory localization
<<<< link to Roberts 1997 IJCAI paper>>>>. This work
highlights not only the development of sensory integration, but also the simplification of
computational requirements that can be obtained through integration.
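The idea of using vision as a teacher for audition can be sketched as a small online regression. In this hypothetical example, each sample pairs an auditory cue (a normalized interaural time difference) with the azimuth at which the sound source was seen, and a linear map from cue to azimuth is corrected after every sample. The linear model, the synthetic data, and all names and constants are assumptions for illustration and do not describe the system built in our group.

```python
import random


def train_audio_localizer(samples, learning_rate=0.1):
    """Learn azimuth = weight * itd + bias, using vision as the teacher.

    samples: iterable of (itd, visual_azimuth) pairs, where itd is a
    normalized interaural time difference in [-1, 1] and visual_azimuth
    is the direction (in degrees) at which the source was seen.
    """
    weight, bias = 0.0, 0.0
    for itd, visual_azimuth in samples:
        error = visual_azimuth - (weight * itd + bias)  # vision minus audition
        weight += learning_rate * error * itd
        bias += learning_rate * error
    return weight, bias


# Synthetic, noise-free data: the "true" relation is 30 * itd + 5 degrees.
rng = random.Random(0)
samples = []
for _ in range(2000):
    itd = rng.uniform(-1.0, 1.0)
    samples.append((itd, 30.0 * itd + 5.0))

weight, bias = train_audio_localizer(samples)
print(round(weight, 2), round(bias, 2))
```

No external supervision is needed here: the training signal comes for free whenever a sound source is also visible, which is the sense in which integration reduces the computational burden.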