The motivation behind creating Cog is the hypothesis that:
Humanoid intelligence requires humanoid interactions with the world.
(Read The Cog Project: Building a Humanoid
Robot for more details.)
Avoiding flighty anthropomorphism, you can consider Cog to be a set of sensors and
actuators which tries to approximate the sensory and motor dynamics of a human body.
Except for legs and a flexible spine, the major degrees of motor freedom in the trunk,
head, and arms are all there. Sight exists, in the form of video cameras. Hearing and
touch are on the drawing board. Proprioception in the form of joint position and torque is
already in place; a vestibular system is on the way. Hands are being built as you read
this, and a system for vocalization is also in the works. Cog is a single hardware
platform which seeks to bring together each of the many subfields of Artificial
Intelligence into one unified, coherent, functional whole.
Why build a human-like robot?
In thinking about human-level intelligence, there are two sets of reasons one might
build a robot with humanoid form.
If one takes seriously the arguments of Johnson and Lakoff, then the form of our bodies
is critical to the representations that we develop and use for both our internal
thought (whatever that might mean...) and our language. If we are to build a robot
with human-like intelligence, then it must have a human-like body in order to be able to
develop similar sorts of representations. However, there is a large cautionary note to
accompany this particular line of reasoning. Since we can only build a very crude
approximation to a human body there is a danger that the essential aspects of the human
body will be totally missed. There is thus a danger of engaging in cargo-cult science,
where only the broad outline form is mimicked, but none of the internal essentials are
there at all.
A second reason for building a humanoid form robot stands on firmer ground. An
important aspect of being human is interaction with other humans. For a human-level
intelligent robot to gain experience in interacting with humans it needs a large number of
interactions. If the robot has humanoid form then it will be both easy and natural
for humans to interact with it in a human-like way. In fact, it has been our observation
that with just a very few human-like cues from a humanoid robot, people naturally fall
into the pattern of interacting with it as if it were a human. Thus we can get a large
source of dynamic interaction examples for the robot to participate in. These examples can
be used with various internal and external evaluation functions to provide experiences for
learning in the robot. Note that this source would not be at all possible if we simply had
a disembodied human intelligence. There would be no reason for people to interact with it
in a human-like way.
Why not just simulate it?
One might argue that a well simulated human face on a monitor would be as engaging as a
robot---perhaps so, but it might be necessary to make the face appear to be part of a
robot viewed by a distant TV camera, and even then the illusion of reality and engagement
might well disappear if the interacting humans were to know it was a simulation. These
arguments, in both directions, are of course speculative, and it would be interesting,
though difficult, to carry out careful experiments to determine the truth. Rather than
being a binary truth, it may well be the case that the level of natural interaction is a
function of the physical reality of the simulation, leading to another set of difficult
engineering problems. Our experience, a terribly introspective and dangerous
thing in general, leads us to believe that a physical robot is more engaging than a screen
image, no matter how sophisticated.
But in any case...
It turns out to be easier to build real robots than to simulate complex interactions
with the world, including perception and motor control. Leaving those things out would
deprive us of key insights into the nature of human intelligence.
Already from our work with Cog we have discovered how being forced to learn an
ocular-motor map so that the robot can saccade its high-resolution cameras (something that
wouldn't be necessary in simulation) gives us a basis for learning visually guided behavior.
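The flavor of that ocular-motor map learning can be conveyed with a minimal sketch. This is not Cog's actual code; the plant model, gains, and learning rule here are illustrative assumptions. The learner must discover, from its own saccades, how motor commands translate into image motion, without ever being told the camera's true gains.

```python
import numpy as np

# Hypothetical "eye plant": how many pixels the image shifts per unit of
# motor command on each axis. Hidden from the learner.
TRUE_GAIN = np.array([0.55, 1.7])

def retinal_shift(cmd):
    """Simulated plant: motor command -> resulting image shift (in pixels)."""
    return TRUE_GAIN * cmd

def learn_gain(steps=200, lr=0.1, seed=0):
    """Learn the motor gains by repeatedly saccading and observing the error."""
    rng = np.random.default_rng(seed)
    gain = np.ones(2)  # initial guess: one pixel per motor unit
    for _ in range(steps):
        # Pick a target offset on the retina (kept away from zero pixels).
        target = rng.uniform(2.0, 20.0, size=2) * rng.choice([-1.0, 1.0], size=2)
        cmd = target / gain                 # saccade using the current map
        shift = retinal_shift(cmd)          # observe where the eye actually lands
        gain += lr * (shift / cmd - gain)   # nudge the map toward the observed gain
    return gain

print(learn_gain())  # estimate converges near TRUE_GAIN
```

The point of the sketch is only that the mapping is calibrated by acting and observing, not programmed in; in a pure simulation the gains would be known by construction and there would be nothing to learn.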
To do a worthwhile simulation you have to understand all the issues relevant to the
simulation beforehand; but as far as human-level intelligence is concerned, that is
exactly what we are trying to find out: the relevant issues.