The computer has no previous knowledge of the objects it is observing. It gains experience with the world and objects in that world by observing. It then uses that experience to organize what it sees.
The system learns to identify different categories of objects based on characteristics such as size, shape, and behavior (namely speed and paths of movement).
We would like our system to be capable of classifying an object given only the instantaneous motion silhouette of the object without supervision. While this seems like a lofty goal, we can use our tracking sequences to learn some invariance on our space of silhouettes. As a person walks or an automobile drives through a scene, it presents differently at each frame. We attempt to use the invariance shown within the tracks to give us information about how to cluster the representations.
In our current domain, we are able to find two clearly distinct clusters of motion silhouettes: one pertaining to cars and one pertaining to people. Using those automatically generated clusters, we have taken all the activity in a scene for a particular morning and separated it into two TrackApplets. While this classification is simple, it can cut the size of your query domain in half in many cases.
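The idea of letting a track vouch for its own silhouettes can be sketched as follows. This is only an illustration under stated assumptions, not the system's actual method: the two features (height/width aspect ratio and normalized area), the plain 2-means clustering, the majority vote per track, and the names `two_means` and `label_tracks` are all hypothetical choices made for the example.

```python
from collections import Counter

def two_means(points, iters=20):
    """Plain 2-means on feature tuples (an assumed stand-in for the real clustering).
    Seeds are the lexicographically smallest and largest points."""
    centers = [min(points), max(points)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each silhouette to the nearest center (squared Euclidean distance).
        labels = [
            min((0, 1), key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])))
            for p in points
        ]
        # Move each center to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def label_tracks(frames):
    """frames: list of (track_id, feature_tuple), one entry per observed silhouette.
    Every silhouette in a track belongs to the same object, so the track
    takes the majority cluster of its frames."""
    labels = two_means([features for _, features in frames])
    votes = {}
    for (track_id, _), lab in zip(frames, labels):
        votes.setdefault(track_id, Counter())[lab] += 1
    return {track_id: c.most_common(1)[0][0] for track_id, c in votes.items()}

# Made-up data: (aspect ratio, normalized area). Track "a" is tall and small
# (person-like); track "b" is wide and large (car-like). The third frame of
# track "a" is a poor silhouette that looks car-like on its own.
frames = [
    ("a", (2.5, 0.10)), ("a", (2.7, 0.12)), ("a", (0.9, 0.50)),
    ("b", (0.5, 0.80)), ("b", (0.4, 0.90)), ("b", (0.6, 0.85)),
]
track_labels = label_tracks(frames)
```

In this toy data the ambiguous frame is clustered with the car-like silhouettes when taken alone, but the vote across its track still assigns track "a" to the person-like cluster.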
|Classification system||Resulting classes for One Day|
This example uses the position, movement, and size characteristics of the objects over time to cluster based on the activity of the object. It also clusters based on identity in certain cases.
| 0 || 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 ||10||11||12||13||14||15|
|PRESS THE NUMBERS ABOVE TO SEE THE CLASSES OF ACTIVITIES.|
Above is a binary tree that shows how the activities for the scene (shown in the upper left) have been automatically broken down. At the top of the tree is the "universal" event. Every event is shown in the corresponding template. The first break is based on direction (eastbound vs. westbound). While this may seem like an obvious break to make, the system made it only because very few objects showed both eastbound and westbound characteristics. This is not the case in many scenes.
The second break (on both sides of the tree) separates path traffic from road traffic. Further breaks divide the activities into very tight clusters. If you click on any of the numbers below the tree you will be able to view the activities of the corresponding leaf node in a Java 1.1 Applet.
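A top-down binary breakdown of this kind can be sketched in a few lines. The sketch below is an assumption-laden illustration, not the system's algorithm: each track is reduced to two made-up features (direction as +1 for eastbound or -1 for westbound, and a lateral position between 0 and 1), and each node splits at the mean of whichever feature varies most. The names `build_tree` and `leaves` are hypothetical.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def build_tree(tracks, depth):
    """tracks: list of (name, feature_tuple). Recursively split on the
    feature with the largest variance, thresholding at its mean."""
    names = [name for name, _ in tracks]
    if depth == 0 or len(tracks) < 2:
        return {"leaf": names}
    dims = range(len(tracks[0][1]))
    d = max(dims, key=lambda i: variance([f[i] for _, f in tracks]))
    threshold = sum(f[d] for _, f in tracks) / len(tracks)
    left = [t for t in tracks if t[1][d] <= threshold]
    right = [t for t in tracks if t[1][d] > threshold]
    if not left or not right:
        # No feature separates the tracks any further.
        return {"leaf": names}
    return {
        "feature": d,
        "threshold": threshold,
        "left": build_tree(left, depth - 1),
        "right": build_tree(right, depth - 1),
    }

def leaves(node):
    """Collect the leaf clusters from left to right."""
    if "leaf" in node:
        return [node["leaf"]]
    return leaves(node["left"]) + leaves(node["right"])

# Made-up tracks: (direction, lateral position); paths near 0.2, road near 0.8.
tracks = [
    ("east_road_1", (1.0, 0.80)), ("east_road_2", (1.0, 0.85)),
    ("east_path_1", (1.0, 0.20)), ("east_path_2", (1.0, 0.15)),
    ("west_road_1", (-1.0, 0.75)), ("west_road_2", (-1.0, 0.80)),
    ("west_path_1", (-1.0, 0.25)), ("west_path_2", (-1.0, 0.20)),
]
tree = build_tree(tracks, depth=2)
```

With this toy data the root splits on direction because no track mixes eastbound and westbound values, and each half then splits on lateral position, separating path traffic from road traffic. If many tracks showed both directions, direction would no longer dominate the variance and a different first break would be chosen.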
The computer does a pretty good job classifying the objects it sees. As a general rule, the computer is able to accurately distinguish people from cars. Bicycles, because they have attributes of both people and cars, are problematic.
Why does the computer make some mistakes?
We take for granted our ability to instantly distinguish a car from a bicycle, or a group of people walking single file. In fact, this ability is made possible by years of experience with these objects using multiple senses (hearing and smell, for example, as well as vision). Researchers hope to eventually increase the computer's accuracy by combining its visual sense with an ability to hear.