The Human Intelligence Enterprise

The Explanation of Human Intelligence

A Roadmap for Research

Robert C. Berwick, Thomas F. Knight, Jr.,
Howard E. Shrobe, Gerald Jay Sussman, Shimon Ullman,
Patrick Henry Winston, and Kenneth Yip

The Time has Come

Modern work on understanding human intelligence from a computational point of view began about 50 years ago, just after the development of computers stimulated landmark papers, such as those of Alan Turing on the “Turing Test” and Claude Shannon on computer chess.

We expect that developing a full understanding of the components of human intelligence could prove to be one of those great intellectual challenges that take 100 years to meet, so, measured in time, we should be halfway there. We do not have half of the ideas, however. As one would expect, the obvious ideas are not always the right ideas, yet the obvious ideas take time to develop, test, and be found wanting.

Today, however, recently developed ideas and experimental methods—conceived and developed by visionary people in artificial intelligence, cognitive science, linguistics, systems neuroscience, and molecular neuroscience—fuel optimism for what can be done during the next ten years.

These new ideas and methods bring a golden opportunity within reach—the sort of opportunity that comes along just a few times per century. During the next ten to twenty years, we can substantially solve the problem of understanding intelligence. Then, during the next thirty to forty years, we can develop applications with truly humanlike intelligence—applications that will change the world in the same way that aviation and computers changed the world in the 20th century.

With a view toward reaching for the biggest goals, we have held lengthy, individual discussions with many prominent MIT scientists, such as Noam Chomsky, Emilio Bizzi, Randall Davis, Morris Halle, Michael Jordan, Jerry Lettvin, Alec Marantz, Joel Moses, Marvin Minsky, Alex Pentland, Tomas Lozano-Perez, Steve Pinker, Tomaso Poggio, Mitch Resnick, Elizabeth Spelke, Mriganka Sur, Matthew Wilson, and many other prominent scientists from other institutions.

From these discussions, we have reached the conclusions expressed in this paper.

Scientific Impact

Explaining how the brain thinks, from a scientific perspective, is a big goal—on the same level as explaining the solar system, evolution, and the genetic code. Why we want to do it is self evident:

Having succeeded in understanding how the brain thinks, our species will enjoy an enrichment of self image, as a greater understanding of how we function works its way into the following:

Applied Impact

Creating truly intelligent systems, from an applied perspective, is also a big goal—on the same level as creating airplanes and computers.

In the nearer term, we anticipate that understanding how the brain thinks will advance education:

Further out, we can imagine, for example, the following, perhaps directly wired to our brains:

What Is New?

Today, we are much better situated to undertake the mission of understanding human intelligence than, say, 10 years ago. There are several reasons.

First, neuroscience has launched revolutionary improvements in experimental methods:

Second, our own views have matured, leading us to the following conclusions:

By “The intelligence is in the I/O,” we mean that intelligence lies in the use and re-use of vision, language, and motor systems, not behind them.

By “There is recognizable engineering in the brain,” we mean that nature exploits powerful computational mechanisms, brought into purposeful combinations by genetic, environmental, and other biological forces working together. For example, we should not be surprised to find analogs of shift registers, constraint-propagators, and bidirectional search engines.

By “There is good science in the brain's engineering,” we mean that nature's computing mechanisms exploit powerful computing ideas. For example, we should not be surprised to find the central limit theorem hard at work, suppressing noise, in sparsely populated, high-dimensional spaces.

Third, there has been a steady accumulation of knowledge in artificial intelligence, cognitive science, linguistics, systems neuroscience, and molecular neuroscience.

From the perspective of artificial intelligence, it seemed, during the field's infancy, that understanding intelligence from a computational point of view might be a matter of ten years of hard work. Then, during an adolescent phase, it seemed that understanding intelligence might follow only from 300 years of incremental progress.

Evidently, during infancy, we did not know enough to be scared; then, during adolescence, we did not know enough to be optimistic.

Now, with maturity, we look back, reflectively, on a great deal of helpful experience. We look forward, reflectively, with the optimism that comes with an armamentarium of accumulated ideas.

Fourth, computational infrastructure has improved enormously. During the past 10 years we have seen:

Why Now?

Progress in neuroscience, the maturing of our views, the steady advance of knowledge, and infrastructure miracles all work together to bring tremendous opportunities into view. By concentrating our forces, we can seize those opportunities and translate them into successes.

Why MIT?

MIT can pull together strengths in artificial intelligence, cognitive science, linguistics, systems neuroscience, and molecular neuroscience that no other institution can match, at just the time when recently developed ideas and experimental methods have laid the foundation for a decade of breakthroughs.

What Should Be Done?

A Strategic Plan Should Be Written

We propose to transform this paper into a strategic plan. As the views and desires of more people are brought in, we will be able to go into much greater detail with respect to what will be done. In that strategic plan, we will exhibit a detailed description of a graded set of problems to be solved, thus ensuring that progress will be steady. We will also specify a series of engineering spinoffs, thus ensuring that the work will produce practical benefits, as well as scientific advances.

The Work Should Be Collaborative and Multidisciplinary

Meanwhile, we supply short descriptions of past work that we find particularly congruent with our views, along with short descriptions of future work that we propose to do.

We provide these examples to illustrate the characteristically multidisciplinary view that we think is essential. We want to create powerful positive feedback loops in which human or animal experiments lead to computational hypotheses, which lead to experimental computer programs or computational devices, which lead back to human or animal experiments.

Representative Examples

In each example, we provide the general question; the anvil on which an answer is hammered out; the contributing, multidisciplinary foundational ideas; and for work already done, the results.

Because our purpose is to expose how we think, with a minimum number of words, the explanation style is telegraphic, rather than tutorial.


How do we recognize objects?

Investigator: Shimon Ullman

Anvil: How can a machine recognize printed characters?

Foundational ideas:

Result: Streams and counter streams.

According to Ullman's theory, perceived characters are shifted and scaled in many ways, producing a tree of altered characters. This tree is analogous to the forward search from a starting point. Then, remembered views of various characters are combined, rotated, and skewed in many ways, producing a second tree. This second tree is analogous to backward search from a goal. One tree constitutes the stream; the other, the counter stream.

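Recognition occurs when a leaf of the forward search tree matches a leaf of the backward search tree. Search blowup is kept under control because bidirectional search halves the exponent: with branching factor b and a match at depth d, each tree need grow only to about b^(d/2) leaves, rather than a single tree growing to about b^d.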

If only an approximate match occurs, further search occurs, with each transformation, from both directions, concentrated in the vicinity of the transformations that produced the approximate match.

Ullman tests his theory via programs that recognize Chinese characters, with encouraging results.
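The saving from searching in both directions can be illustrated with a small sketch. It is not Ullman's program. The pose representation (x shift, y shift, scale step), the six unit transformations, and the function names are all assumptions made for illustration. Because repeated poses are merged here, the trees grow more slowly than b^d, so the saving is smaller than in a true tree, but it points the same way.

```python
# Hypothetical toy: a pose is (x_shift, y_shift, scale_step).
# Each move has its inverse in the list, so the backward tree can
# be grown from the remembered view with the same moves.
MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def neighbors(pose):
    """All poses reachable from `pose` by one transformation."""
    x, y, s = pose
    return [(x + dx, y + dy, s + ds) for dx, dy, ds in MOVES]


def expand(frontier, seen):
    """Grow one tree by one level and return its new leaves."""
    leaves = []
    for pose in frontier:
        for nxt in neighbors(pose):
            if nxt not in seen:
                seen.add(nxt)
                leaves.append(nxt)
    return leaves


def one_way_search(start, goal):
    """Forward search only. Returns (depth, poses generated)."""
    seen, frontier, depth = {start}, [start], 0
    while goal not in seen:
        frontier = expand(frontier, seen)
        depth += 1
    return depth, len(seen)


def two_way_search(start, goal):
    """Stream and counter stream. Returns (depth, poses generated)."""
    fwd_seen, bwd_seen = {start}, {goal}
    fwd_frontier, bwd_frontier = [start], [goal]
    depth = 0
    # Stop as soon as the two trees share a pose.
    while not (fwd_seen & bwd_seen):
        # Grow whichever tree currently has fewer leaves.
        if len(fwd_frontier) <= len(bwd_frontier):
            fwd_frontier = expand(fwd_frontier, fwd_seen)
        else:
            bwd_frontier = expand(bwd_frontier, bwd_seen)
        depth += 1
    return depth, len(fwd_seen) + len(bwd_seen)


if __name__ == "__main__":
    perceived, remembered = (0, 0, 0), (3, -2, 1)
    print("one-way :", one_way_search(perceived, remembered))
    print("two-way :", two_way_search(perceived, remembered))
```

Both searches find the same number of transformation steps between the perceived and remembered poses, but the two-way version generates fewer poses on the way.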

Derivative questions: Where else might streams and counter streams constitute an illuminating model? Analogical matching? Simulation?


How do we learn phonological rules?

Investigators: Kenneth Yip and Gerald J. Sussman

Anvil: How do we learn English pluralization and past-tense rules?

Foundational ideas:

Result: Sparse-space learning

Yip and Sussman began by envisioning a shift register, with each element a distinctive feature vector. In addition, auxiliary bits represent syntactic and semantic elements, with labels such as plural and past. Then, learning reduces to finding bidirectional constraints that maintain consistency between patterns of distinctive features in the shift register and the auxiliary bits.

At first, Yip and Sussman were held up by knowing too much: they looked for ways of using logic-minimization methods to find the consistency-enforcing constraints.

Then, they remembered that Morris Halle, who taught them about distinctive features, asserted that all human languages have far fewer than 100 phonemes. Hence, legitimate phonemes occupy only a small fraction of distinctive-feature space. Realizing that the space is sparsely filled, they proceeded to devise a constraint-learning algorithm with many desirable, experimentally verified features. For example, linguistic rules are learned from just a few examples, but tidal waves of examples do no harm; the linguistic rules are functionally equivalent to the rules worked out earlier by Halle and Chomsky by hand; and the learning algorithm readily handles noise and exceptions.

These results differ from standard neural net approaches in that only one or two unsupervised examples enable correct learning, just as with children.

Derivative questions: What other sorts of learning might be facilitated by operation in a sparse space? Is the utility of sparse-space learning one reason for having the functional equivalent of symbols? Do sparse spaces have other roles?

Sample question

What is the role of symbols in thinking?

Anvil: How do we suppress noise in listening to speech?

Foundational ideas:

Hypothesis: Sparse-space noise suppression.

In a sparse space, a phoneme vector can be corrupted by a great deal of normally distributed noise before the corrupted vector will become closer to some other phoneme. In a sparse space, with many dimensions, the central limit theorem dictates that a truly random vector will be approximately equally far from any set of randomly selected points.

Accordingly, from an engineering point of view, high-dimensional, sparsely filled spaces have highly desirable characteristics, so it is natural to wonder if evolution has stumbled onto such a noise-suppression solution.
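The two properties claimed for sparse, high-dimensional spaces can be checked numerically with a small sketch. Everything concrete in it is an assumption made for illustration: 40 random prototype vectors stand in for phonemes, their coordinates are +1 or -1, and the dimension counts and noise level are arbitrary. It does not model real distinctive features.

```python
import math
import random
import statistics


def make_prototypes(count, dims, rng):
    """Hypothetical 'phonemes': random +/-1 vectors in a dims-dimensional space."""
    return [[rng.choice((-1.0, 1.0)) for _ in range(dims)] for _ in range(count)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(vector, prototypes):
    """Index of the prototype closest to `vector`."""
    return min(range(len(prototypes)), key=lambda i: distance(vector, prototypes[i]))


def recovery_rate(dims, sigma, count=40, trials=200, seed=0):
    """Fraction of noisy prototypes still decoded as the right prototype."""
    rng = random.Random(seed)
    prototypes = make_prototypes(count, dims, rng)
    hits = 0
    for _ in range(trials):
        target = rng.randrange(count)
        # Add normally distributed noise to every coordinate.
        noisy = [x + rng.gauss(0.0, sigma) for x in prototypes[target]]
        if nearest(noisy, prototypes) == target:
            hits += 1
    return hits / trials


def distance_spread(dims, count=40, seed=1):
    """Relative spread (std / mean) of distances from one random vector
    to all prototypes; a small value means 'about equally far from all'."""
    rng = random.Random(seed)
    prototypes = make_prototypes(count, dims, rng)
    probe = [rng.choice((-1.0, 1.0)) for _ in range(dims)]
    distances = [distance(probe, p) for p in prototypes]
    return statistics.pstdev(distances) / statistics.mean(distances)


if __name__ == "__main__":
    for dims in (8, 256):
        print(dims, recovery_rate(dims, sigma=2.0), round(distance_spread(dims), 3))
```

With the noise standard deviation set to twice the size of each coordinate, decoding in the low-dimensional space fails often, while in the high-dimensional space the corrupted vector almost always remains closest to its own prototype. The spread of distances from a random vector to the prototypes also shrinks as the dimension grows.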

Derivative questions: Are words the sparse fillers of a space of phoneme sequences? Is there a bidirectionality to the interaction of distinctive features, phonemes, and words? Can a few shaky distinctive features determine a few shaky phonemes, which are enough to fix a word, which in turn shores up and corrects the phoneme interpretation, which shores up and corrects the distinctive feature interpretation?

Sample question

How do we exploit analogous situations?

Anvil: How might brains perform domain-specific matches?

Foundational ideas:

Hypothesis: Bidirectional domain-specific matching.

There has been much work on analogy-based reasoning in artificial intelligence. Winston, for example, has shown that programs can find and exploit parallels in simple theories (water pipes and circuits), story precis (Shakespearean plots), and functional descriptions of simple objects (cups).

The work remains primitive, however, for several reasons, one of which is that there always seems to be a need to make the analogical matcher smarter and smarter, until it becomes a sort of all-knowing homunculus.

The alternative is to think of matching as a product of a bidirectional domain-specific search, analogous to Ullman's streams and counter streams process, except that now the transformations are not shifts, scales, and rotations, but rather mappings of one sort of object or relation into another.

At first, when nothing is known about a domain, the mapping is necessarily varied, and the search for a match must be broad. Skinny water pipes become resistors or capacitors or inductors. Macbeth becomes Hamlet or Claudius or Gertrude. Cups become bowls or bricks or toothbrushes. Later, however, as problems are solved, transformation sequences that do well in a domain are remembered with that domain, so that experience leads to routine skill.

Transformation-sequence memories are domain specific. Thus, matching knowledge is stored outside the matcher, eliminating the all-knowing homunculus problem. Also, becoming good at one domain does not transfer readily to becoming good at another, for the transformation sequences are likely different. Well-educated engineers are not necessarily good at, say, post-modern literary criticism.
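The idea that matching knowledge lives outside the matcher, indexed by domain, can be sketched as follows. This is a toy under stated assumptions, not a proposed implementation: situations are sets of (relation, object, object) facts, a transformation is a simple renaming of objects, the class and function names are hypothetical, and the bidirectional part of the search is left out so that only the domain-specific memory is shown.

```python
from itertools import chain, permutations


def rename(facts, substitution):
    """Apply an object-to-object substitution to (relation, a, b) facts."""
    return {(rel, substitution.get(a, a), substitution.get(b, b))
            for rel, a, b in facts}


class DomainMatcher:
    """A deliberately dumb matcher; what it learns is stored per domain."""

    def __init__(self):
        self.memory = {}  # domain name -> substitutions that have worked

    def match(self, domain, source, target, vocabulary):
        """Return (substitution, candidates tried), or (None, tried)."""
        objects = sorted({obj for _, a, b in source for obj in (a, b)})
        remembered = self.memory.setdefault(domain, [])
        # Broad search: every way of mapping source objects to target terms.
        broad = (dict(zip(objects, choice))
                 for choice in permutations(vocabulary, len(objects)))
        tried = 0
        # Remembered substitutions for this domain are tried first.
        for substitution in chain(list(remembered), broad):
            tried += 1
            if rename(source, substitution) == target:
                if substitution not in remembered:
                    remembered.append(substitution)
                return substitution, tried
        return None, tried


# Usage: water pipes and circuits, with made-up facts.
matcher = DomainMatcher()
parts = ["capacitor", "inductor", "resistor", "battery", "current"]
water = {("drives", "pump", "pipe"), ("limits", "pipe", "flow")}
circuit = {("drives", "battery", "resistor"), ("limits", "resistor", "current")}

mapping, first_cost = matcher.match("circuits", water, circuit, parts)
_, second_cost = matcher.match(
    "circuits", {("drives", "pump", "pipe")}, {("drives", "battery", "resistor")}, parts)
_, other_cost = matcher.match("plots", water, circuit, parts)
```

The first problem in a domain requires a broad search. A later problem in the same domain is solved on the first try from memory, while the same problem posed under a different domain name gets no benefit, which mirrors the lack of transfer described above.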

Sample question

How might brains exploit randomness in neural process development?

Anvil: Why do brains have cortical columns?

Foundational ideas:

Hypothesis: Columns provide multiple venues for bringing relevant features into close proximity.

We see intriguing points of similarity between randomized interconnection networks and the cortical processing areas and columns.

In computer architecture, we have found that the most powerful interconnection switches rely on multiple, randomly interconnected stages of switching elements, where each stage brings together in a small, localized area, a distinct set of the problem dimensions. The randomized wiring performs a sort of dimensional transformation between stages, allowing a very high dimensional space to be laid out efficiently in a much lower dimensional package.

In nature, cortex is organized into cortical processing areas and cortical columns. We know that vision, for example, proceeds through tens of distinct cortical processing areas, each consisting of tens of thousands of 200 to 400 micron diameter cortical columns. These columns are often interleaved in such a way as to provide locality to important computations, such as binocular stereo processing.

The ability to bring multiple features into physical proximity may underlie our ability to recognize individual concepts and groups of concepts organized into situations.

Sample question

What is the role of downward projections in problem solving?

Anvil: Can downward projections invoke simulations that support problem solving?

Foundational ideas:

Hypothesis: Downward projections may provide a means by which visual and linguistic abstractions are translated into simulated visual or linguistic inputs, thus re-engaging visual and linguistic machinery over and over as problem solving proceeds.

This sort of thinking may prove to be consistent with Ullman's streams-and-counter-streams theory of object recognition, because if the downward stream is reflected back at some level, rather than continuing toward a match with perceptual inputs, then higher levels see a reflected image as if it were perceived. The visual routines introduced in Ullman's earlier work could operate on such reflected images as if they were real. Thus, visual routines could compute answers to problems that stimulate downward flow from visual memory. For example, if a child is too young to add, he can count on his fingers, or he can close his eyes and imagine what it would be like to count fingers.

Another place where this sort of problem solving may occur is in speech recognition: we may use motor control signals to the vocal tract planning units as a way to generate hypotheses about speech.

Other Questions

Who Should Participate?

We see inclusive collaborations in which the researchers—with backgrounds in artificial intelligence, cognitive science, linguistics, systems neuroscience, molecular neuroscience, and other disciplines—all sit at a round table, with no contributing discipline viewed as more central than another.

A Computational World View

From the broadest perspective, the proposed steps toward understanding thinking computationally have important parallels in other disciplines, such as biology and physics.

Over the centuries we have developed formal mathematical languages for expressing various aspects of our understanding of the universe. The development of each mathematical language was spurred by the need to express the previously inexpressible. Each such language development enabled scientific progress by creating a new window through which to view the world. Consider, for example, the following:

Mathematical language                  Understanding enabled
Calculus and differential equations    Classical mechanics
Partial differential equations         Electromagnetic fields
Matrices                               Quantum mechanics
Tensors                                General relativity

Computation—like a mathematical language—provides us with a new window of understanding, because computation enables us to describe complex systems. Using computation, we can express, formalize, and study very complex processes in a precise and elegant manner.

Viewing computation as a powerful tool, complementing those offered by traditional mathematics, is a jolting proposition. Many scientists still subscribe to a more limited view of computation, believing that computation is useful in just the traditional ways:

By contrast, we have come to believe that computation has a much more profound role in science:

Example: Biology

Computational explanations provide us with new ways to extend our understanding of biological systems, as illustrated by the following modeling examples:

Engineering Impact of a Deeper Biological Understanding

An interest in computational explanations will have a big impact on the way bioengineering is done, enabling progress toward the really high-impact short- and long-term engineering goals, such as the following:

Example: Physics

Forward-thinking computer scientists and physicists can think about information as a substance, in the same sense that physicists have traditionally viewed energy, matter, and momentum as substances. From this perspective, it seems evident that, say, a deep understanding of quantum mechanics will involve an understanding of information and its conservation laws. Also, discrete models of spacetime may solve problems in renormalization, self-energy of electrons, and the finite information capacity of spacetime regions.

Engineering Impact of a Deeper Physical Understanding

The information-as-substance perspective makes it possible for engineers to think of new ways to understand materials and new ways to connect information processing to physical materials:


Education and research are mutually enabling. A healthy educational program stimulates research, and an exciting research program enables new educational tracks. In another paper, we lay out an approximation to what might become an undergraduate curriculum.