
Our Research Methodology



The Cog Shop
MIT Artificial Intelligence Laboratory
545 Technology Square, #920
Cambridge, MA 02139

write to the Cog Documentation Project: cdp@ai.mit.edu
In recent years, AI research has begun to move away from the assumptions of classical AI: monolithic internal models, monolithic control, and general purpose processing. However, these concepts are still prevalent in much current work and are deeply ingrained in many architectures for intelligent systems.

Our alternative methodology is based on evidence from cognitive science and neuroscience. It focuses on four attributes that we believe are critical to human intelligence: developmental organization, social interaction, embodiment and physical coupling, and multimodal integration.

In this section, we summarize some of the evidence that has led us to abandon those assumptions about intelligence that classical AI continues to uphold. We then briefly review the alternative methodology that we have been using in constructing humanoid robotic systems. For further references, see the paper [postscript, compressed, 36 pages, 2.2Mb] [PDF, 36 pages, 484k].

False Assumptions about Human Intelligence

In studying human intelligence, three common conceptual errors often occur: reliance on monolithic internal models, on monolithic control, and on general purpose processing. These and other errors primarily derive from naive models based on subjective observation and introspection, and biases from common computational metaphors (mathematical logic, Von Neumann architectures, etc.). A modern understanding of cognitive science and neuroscience refutes these assumptions.

Humans have no full monolithic internal models.

There is evidence that in normal tasks humans tend to minimize their internal representation of the world. Ballard, Hayhoe, and Pelz (1995) have shown that in performing a complex task, like building a copy of a display of blocks, humans do not build an internal model of the entire visible scene. By changing the display while subjects were looking away, Ballard found that subjects noticed only the most drastic of changes; rather than keeping a complete model of the scene, they instead left that information in the world and continued to refer back to the scene while performing the copying task.

There is also evidence that there are multiple internal representations, which are not mutually consistent. For example, in the phenomenon of blindsight, cortically blind patients can discriminate different visual stimuli, but report seeing nothing. This inconsistency would not be a feature of a single central model of visual space.

These experiments and many others like them, such as the work by Gazzaniga and LeDoux on split brain patients or Rensink, O’Regan, and Clark on changes in visual scenes, convincingly demonstrate that humans do not construct a full, monolithic model of the environment. Instead, humans tend to represent only what is immediately relevant from the environment, and those representations do not have full access to one another.

Humans have no monolithic control.

Naive introspection and observation can lead one to believe in a neurological equivalent of the central processing unit—something that makes the decisions and controls the other functions of the organism. While there are undoubtedly control structures, this model of a single, unitary control system is not supported by evidence from cognitive science.

One example comes from studies of split brain patients by Gazzaniga and LeDoux. As an experimental treatment for severe epilepsy in these patients, the corpus callosum (the main structure connecting the two hemispheres of the brain) was surgically cut. The patients are surprisingly normal after the operation, but with deficits that are revealed by presenting different information to either side of the (now unconnected) brain. Since each hemisphere controls one side of the body, the experimenters can probe the behavior of each hemisphere independently (for example, by observing the subject picking up an object appropriate to the scene that they had viewed). In one example, a snow scene was presented to the right hemisphere and the leg of a chicken to the left. The subject selected a chicken head to match the chicken leg, explaining with the verbally dominant left hemisphere that "I saw the claw and picked the chicken". When the right hemisphere then picked a shovel to correctly match the snow, the left hemisphere explained that you need a shovel to "clean out the chicken shed" (from p. 148 of Gazzaniga and LeDoux). The separate halves of the subject independently acted appropriately, but one side falsely explained the choice of the other. This suggests that there are multiple independent control systems, rather than a single monolithic one.

Humans are not general purpose.

The brain is conventionally thought to be a general purpose machine, acting with equal skill on any type of operation that it performs by invoking a set of powerful rules. However, humans seem to be proficient only in particular sets of skills, at the expense of other skills, often in non-obvious ways. A good example of this is the Stroop effect. When presented with a list of words written in a variety of colors, performance in a color recognition and articulation task is dependent on the semantic content of the words; the task is very difficult if names of colors are printed in non-corresponding colors. This experiment demonstrates the specialized nature of human computational processes and interactions.

Even in the area of deductive logic, human performance varies sharply with context and is often extremely poor. Wason (1966) found that subjects were unable to apply the negative rule of if-then inference when four cards were labeled with single letters and digits. However, with additional context (labeling the cards such that they were understandable as names and ages), subjects could easily solve exactly the same problem.
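
The logic of the selection task can be made concrete with a small sketch. It assumes the commonly cited abstract form of the task, in which the rule is "if a card shows a vowel on one side, it has an even number on the other" and the visible faces are E, K, 4 and 7. The specific rule, the faces, and the function names are illustrative assumptions, not details taken from the description above.

```python
VOWELS = set("AEIOU")


def is_vowel(face):
    """Antecedent P of the assumed rule: the visible face is a vowel."""
    return face.isalpha() and face.upper() in VOWELS


def is_odd_digit(face):
    """Negated consequent (not Q): the visible face is an odd digit."""
    return face.isdigit() and int(face) % 2 == 1


def cards_to_turn(visible_faces, is_p, is_not_q):
    """Return the cards that could falsify 'if P then Q'.

    Only two kinds of card matter: one showing P (its hidden side might
    be not-Q) and one showing not-Q (its hidden side might be P).
    Cards showing not-P or Q cannot violate the rule whatever is hidden.
    """
    return [face for face in visible_faces if is_p(face) or is_not_q(face)]


print(cards_to_turn(["E", "K", "4", "7"], is_vowel, is_odd_digit))  # ['E', '7']
```

Turning the 7 is the step that corresponds to the negative rule of if-then inference. The same function applies unchanged to a contextualized version of the task once the two predicates are swapped for ones about names and ages, which is the sense in which the two problems are logically identical.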

Further, humans often do not use subroutine-like rules for making decisions. They are often more emotional than rational, and there is evidence that this emotional content is an important aspect of decision making (for example, the work from Damasio).

Essences of Human Intelligence

In an attempt to simplify the problem of building complex intelligent systems, classical AI approaches tended to ignore or avoid many aspects of human intelligence. We believe that many of these discarded elements are essential to human intelligence. Our methodology exploits four central aspects of human intelligence: development, social interaction, physical interaction and integration. Development forms the framework by which humans successfully acquire increasingly complex skills and competencies. Social interaction allows humans to exploit other humans for assistance, teaching, and knowledge. Embodiment and physical coupling allow humans to use the world itself as a tool for organizing and manipulating knowledge. Integration allows humans to maximize the efficacy and accuracy of complementary sensory and motor systems. We believe that these four themes are not only critical to the understanding of human intelligence but also simplify the problem of creating human-like intelligence.


Development

Humans are not born with complete reasoning systems, complete motor systems, or even complete sensory systems. Instead, they undergo a process of development where they perform incrementally more difficult tasks in more complex environments en route to the adult state. Building systems developmentally facilitates learning both by providing a structured decomposition of skills and by gradually increasing the complexity of the task to match the competency of the system.

Development is an incremental process. Behaviors and learned skills that have already been mastered prepare and enable the acquisition of more advanced behaviors by providing subskills and knowledge that can be re-used, by placing simplifying constraints on the acquisition, and by minimizing new information that must be acquired. For example, Diamond (1990) shows that infants between five and twelve months of age progress through a number of distinct phases in the development of visually guided reaching. In this progression, infants in later phases consistently demonstrate more sophisticated reaching strategies to retrieve a toy in more challenging scenarios. As the infant’s reaching competency develops, later stages incrementally improve upon the competency afforded by the previous stages. Within our group, Marjanovic, Scassellati, and Williamson [postscript, compressed, 10 pages, 577k] [PDF, 10 pages, 1.2 Mb] applied a similar bootstrapping technique to enable the robot to learn to point to a visual target. Scassellati [postscript, compressed, 20 pages, 1.3 Mb] [PDF, 20 pages, 360k] has discussed how a humanoid robot might acquire basic social competencies through this sort of developmental methodology. Other examples of developmental learning that we have explored can be found in [postscript, compressed, 10 pages, 280k] and [postscript, compressed, 13 pages, 1.25 Meg] [PDF, 13 pages, 370k].

By gradually increasing the complexity of the required task, a developmental process optimizes learning. For example, infants are born with low acuity vision, which simplifies the visual input they must process. The infant’s visual performance develops in step with their ability to process the influx of stimulation. The same is true for the motor system. Newborn infants do not have independent control over each degree of freedom of their limbs, but through a gradual increase in the granularity of their motor control they learn to coordinate the full complexity of their bodies. A process in which the acuity of both sensory and motor systems is gradually increased significantly reduces the difficulty of the learning problem. The caregiver also acts to gradually increase the task complexity by structuring and controlling the complexity of the environment. By exploiting a gradual increase in complexity, both internal and external, while reusing structures and information gained from previously learned behaviors, we hope to be able to learn increasingly sophisticated behaviors. We believe that these methods will allow us to construct systems which scale autonomously [postscript, 5 pages, 92k] [postscript, compressed, 7 pages, 370k] [PDF, 7 pages, 730k].

Social Interaction

Human infants are extremely dependent on their caregivers, relying upon them not only for basic necessities but also as a guide to their development. This reliance on social contact is so integrated into our species that it is hard to imagine a completely asocial human; developmental disorders that affect social development, such as autism and Asperger’s syndrome, are extremely debilitating and can have far-reaching consequences. Building social skills into an artificial intelligence provides not only a natural means of human-machine interaction but also a mechanism for bootstrapping more complex behavior. Our research program has investigated social interaction both as a means for bootstrapping and as an instance of developmental progression.

Social interaction can be a means to facilitate learning. New skills may be socially transferred from caregiver to infant through mimicry or imitation, through direct tutelage, or by means of scaffolding, in which a more able adult manipulates the infant’s interactions with the environment to foster novel abilities. Commonly, scaffolding involves reducing distractions, marking the task’s critical attributes, reducing the number of degrees of freedom in the target task, and enabling the infant to experience the end or outcome before she is cognitively or physically capable of seeking and attaining it for herself. We are currently studying how new behaviors can be bootstrapped from social interactions; see [postscript, compressed, 43 pages, 963k] and the 1998 SAB paper by Breazeal and Velasquez.

The social skills required to make use of scaffolding are complex. Infants acquire these social skills through a developmental progression. One of the earliest precursors is the ability to share attention with the caregiver. This ability can take many forms, from the recognition of a pointing gesture to maintaining eye contact [postscript, compressed, 20 pages, 1.3 Mb] [PDF, 20 pages, 360k]. In our work, we have also examined social interaction from this developmental perspective, building systems that can recognize and respond to joint attention by finding faces and eyes [postscript, compressed, 8 pages, 1.75 Meg] [PDF, 8 pages, 2.2 Mb] and imitating head nods of the caregiver (see Scassellati’s 1998 Autonomous Agents paper).

Embodiment and Physical Coupling

Perhaps the most obvious, and most overlooked, aspect of human intelligence is that it is embodied. A principal tenet of our methodology is to build and test real robotic systems. We believe that building human-like intelligence requires human-like interaction with the world. Humanoid form is important both to allow humans to interact socially with the robot in a natural way and to provide similar task constraints.

The direct physical coupling between action and perception reduces the need for an intermediary representation. For an embodied system, internal representations can be ultimately grounded in sensory-motor interactions with the world. Our systems are physically coupled with the world and operate directly in that world without any explicit representations of it. There are representations, or accumulations of state, but these only refer to the internal workings of the system; they are meaningless without interaction with the outside world. The embedding of the system within the world enables the internal accumulations of state to provide useful behavior.

In addition we believe that building a real system is computationally less complex than simulating such a system. The effects of gravity, friction, and natural human interaction are obtained for free, without any computation. Embodied systems can also perform some complex tasks in relatively simple ways by exploiting the properties of the complete system. For example, when putting a jug of milk in the refrigerator, you can exploit the pendulum action of your arm to move the milk (an example from Green (1982)). The swing of the jug does not need to be explicitly planned or controlled, since it is the natural behavior of the system. Instead of having to plan the whole motion, the system only has to modulate, guide and correct the natural dynamics. We have implemented one such scheme using self-adaptive oscillators to drive the joints of the robot’s arm [postscript, gzipped, 6 pages, 202k] [ftp] and [postscript, gzipped, 7 pages, 682k] [ftp].
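
The idea of modulating natural dynamics rather than planning a whole motion can be illustrated with a toy simulation. The sketch below is not the oscillator scheme cited above. It assumes a damped, linearized pendulum as a stand-in for the arm, and the arm length, damping, drive strength, and function name are all hypothetical values chosen for illustration. It compares the swing produced by the same small periodic push applied at the pendulum's natural frequency and at three times that frequency.

```python
import math

GRAVITY = 9.81  # m/s^2


def swing_amplitude(drive_freq, length=0.5, damping=0.5, drive_accel=1.0,
                    dt=0.001, duration=20.0):
    """Simulate a damped, linearized pendulum under a small periodic drive.

    Returns the peak absolute angle (radians) seen during the final
    quarter of the run, after most of the start-up transient has decayed.
    """
    theta, omega = 0.0, 0.0          # angle and angular velocity
    steps = round(duration / dt)
    peak = 0.0
    for i in range(steps):
        t = i * dt
        accel = (-(GRAVITY / length) * theta      # restoring term
                 - damping * omega                # viscous damping
                 + drive_accel * math.cos(drive_freq * t))  # small push
        omega += accel * dt   # semi-implicit Euler: update velocity first,
        theta += omega * dt   # then position using the new velocity
        if i >= 3 * steps // 4:
            peak = max(peak, abs(theta))
    return peak


natural_freq = math.sqrt(GRAVITY / 0.5)  # rad/s for the assumed 0.5 m arm
print("driven at natural frequency:", swing_amplitude(natural_freq))
print("driven at 3x natural frequency:", swing_amplitude(3 * natural_freq))
```

With identical effort, the push timed to the natural frequency produces a far larger swing, which is the sense in which a controller that works with the dynamics of the limb has less to compute and less to do.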


Multimodal Integration

Humans have the capability to receive an enormous amount of information from the world. Visual, auditory, somatosensory, and olfactory cues are all processed simultaneously to provide us with our view of the world. However, there is evidence that the sensory modalities are not independent; stimuli from one modality can and do influence the perception of stimuli in another modality. For example, Churchland, Ramachandran, and Sejnowski (1994) demonstrated an example of how audition can cause illusory visual motion. Vision can cause auditory illusions too, such as the McGurk effect. These studies demonstrate that sensory modalities cannot be treated independently.

Sensory integration can simplify the computation necessary for a given task. Attempting to perform the task using only one modality is sometimes awkward and computationally intensive. Utilizing the complementary nature of separate modalities can result in a reduction of overall computation. We have implemented several mechanisms on Cog that use multimodal integration to aid in increasing performance or developing competencies. For example, O’Reilly and Scassellati implemented a system that stabilized images from a moving camera using vestibular feedback.

By integrating different sensory modalities we can exploit the multimodal nature of stimuli to facilitate learning. For example, objects that make noise often move. This correlation can be exploited to facilitate perception. Wertheimer (1961) has shown that vision and audition interact from birth; even ten-minute-old children will turn their eyes toward an auditory cue. This interaction between the senses continues to develop; visual stimuli greatly affect the development of sound localization. In our work, Irie built an auditory system that utilizes visual information to train auditory localization (see Irie’s 1997 IJCAI paper). This work highlights not only the development of sensory integration, but also the simplification of computational requirements that can be obtained through integration.
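
One way to picture vision acting as a teacher for audition is the minimal sketch below. It is not a description of the system Irie built. It assumes a single normalized auditory cue (for example, an interaural difference scaled to the range -1 to 1), a linear relation between that cue and the angle of the source, and a simple delta-rule update; the function names, the 40-degree scale, and the training data are hypothetical.

```python
def train_audio_localizer(samples, learning_rate=0.1, epochs=200):
    """Learn angle = gain * cue + offset from (cue, visual_angle) pairs.

    Each time a sound source is also seen, the visually measured angle
    serves as the target and the auditory estimate is nudged toward it.
    """
    gain, offset = 0.0, 0.0
    for _ in range(epochs):
        for cue, visual_angle in samples:
            error = visual_angle - (gain * cue + offset)
            gain += learning_rate * error * cue
            offset += learning_rate * error
    return gain, offset


def locate_by_sound(cue, gain, offset):
    """Estimate the source angle from the auditory cue alone."""
    return gain * cue + offset


# Hypothetical training data: vision reports 40 degrees per unit of cue.
samples = [(i / 10, 40.0 * (i / 10)) for i in range(-10, 11)]
gain, offset = train_audio_localizer(samples)
print(locate_by_sound(0.5, gain, offset))
```

After training, the auditory estimate can be used when the source is out of view, and no hand-built acoustic model of the head is required because the mapping is calibrated by the visual signal.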


Brought to you by Brian Scassellati


Representatives of the press who are interested in acquiring further information about the Cog project should contact Elizabeth Thomson, thomson@mit.edu, from the MIT News Office,  http://web.mit.edu/newsoffice/www/ .

