Looking @ People: Estimation and Explanation of Human Motion

Michael J. Black
Xerox Palo Alto Research Center
http://www.parc.xerox.com/black/

November 6th, 1997
2:30pm
refreshments at 2:15pm
NE43 - 8th Floor Playroom

The estimation and explanation of human motion in image sequences is a challenging problem with diverse applications in human-computer interaction, medicine, robotics, animation, video databases, and surveillance, to name a few. Motion is intimately tied to our behavior; we move when we communicate through facial expressions and gestures and when we interact with each other and with objects in the world. Recovering this motion is necessary if we want computers to understand human action.

Parameterized models (e.g. affine) are popular for motion estimation in rigid scenes since they provide accurate motion estimates and concise descriptions that can be used for recognition. Human motion, however, violates many of the assumptions made by these approaches. First, there may be multiple motions present in an image region. Second, the changes in image appearance between frames may not be well modeled by image motion. Third, traditional models of image motion such as affine are a poor approximation to the non-rigid motion of body parts such as mouths.
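
For readers unfamiliar with the term, an affine model describes the motion of an image region with six numbers. The sketch below is only an illustration and is not taken from the talk: the function names, the parameter ordering, and the synthetic data are assumptions. It evaluates such a model and recovers its parameters from clean flow vectors by ordinary least squares.

```python
import numpy as np

def affine_flow(params, x, y):
    """Evaluate a six-parameter affine flow field at coordinates (x, y)."""
    a0, a1, a2, a3, a4, a5 = params
    u = a0 + a1 * x + a2 * y  # horizontal motion
    v = a3 + a4 * x + a5 * y  # vertical motion
    return u, v

def fit_affine(x, y, u, v):
    """Ordinary least-squares fit of the six parameters to flow vectors."""
    A = np.column_stack([np.ones_like(x), x, y])
    pu, *_ = np.linalg.lstsq(A, u, rcond=None)
    pv, *_ = np.linalg.lstsq(A, v, rcond=None)
    return np.concatenate([pu, pv])

# Synthetic 5x5 region centered on the origin (hypothetical data).
ys, xs = np.mgrid[-2:3, -2:3]
x = xs.ravel().astype(float)
y = ys.ravel().astype(float)
true_params = np.array([1.0, 0.1, 0.0, -0.5, 0.0, 0.1])
u, v = affine_flow(true_params, x, y)
estimated = fit_affine(x, y, u, v)
```

A single fit of this kind presumes one coherent motion per region, which is the assumption that the three problems listed above call into question for human motion.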

In this talk I will present an approach for estimating motion that can cope with many of these problems. The approach uses parameterized models of both image motion and image appearance change that can be "learned" from examples. Changes in the image sequence over time can be thought of as resulting from a "mixture" of causes or "layers". We robustly estimate the parameters of each layer and segment the image sequence into layers using the Expectation Maximization (EM) algorithm. Using examples of human facial expressions, speech, and articulated motion, I will illustrate how the estimated image-change parameters can be used to recognize human motions.
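
To give a feel for how EM can separate a sequence into layers, here is a deliberately simplified sketch. It is not the method of the talk: it assumes each layer is a pure translation, uses a plain Gaussian likelihood rather than a robust estimator, and works on synthetic flow vectors. The function name `em_layers`, the fixed `sigma`, and the data are all hypothetical.

```python
import numpy as np

def em_layers(flow, means, sigma=0.5, n_iter=20):
    """Soft-assign 2D flow vectors to translational layers with EM.

    flow:  (N, 2) array of per-pixel motion vectors
    means: (K, 2) initial guess of each layer's translation
    """
    means = np.asarray(means, dtype=float).copy()
    k = len(means)
    weights = np.full(k, 1.0 / k)  # mixing proportions
    for _ in range(n_iter):
        # E-step: ownership of each vector by each layer.
        d2 = ((flow[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_resp = np.log(weights) - d2 / (2.0 * sigma ** 2)
        log_resp -= log_resp.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate each layer's translation and proportion.
        total = resp.sum(axis=0) + 1e-12
        means = (resp.T @ flow) / total[:, None]
        weights = total / total.sum()
    return means, resp

# Two synthetic layers moving in different directions (hypothetical data).
rng = np.random.default_rng(0)
layer_a = np.array([2.0, 0.0]) + 0.1 * rng.standard_normal((100, 2))
layer_b = np.array([-1.0, 1.0]) + 0.1 * rng.standard_normal((60, 2))
flow = np.vstack([layer_a, layer_b])

means, resp = em_layers(flow, means=[[1.0, 0.0], [0.0, 1.0]])
labels = resp.argmax(axis=1)  # hard segmentation into layers
```

The rows of `resp` play the role of a soft segmentation: each vector is shared among the layers in proportion to how well each layer explains it, and the layer parameters are then re-estimated from those shares.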

I will conclude with some thoughts on what remains to be done to reach our goal of understanding human motion.


Michael Black received his Ph.D. in 1992 from Yale University and is currently the head of the image understanding research group at Xerox PARC.