Variable Viewpoint Reality
Progress Report: July 1, 1999 - December 31, 1999
Paul Viola and Eric Grimson
In the foreseeable future, sporting events will be recorded in super high fidelity from hundreds or even thousands of cameras. Currently the nature of television broadcasting demands that only a single viewpoint be shown at any particular time. This viewpoint is necessarily a compromise and is typically chosen to displease the fewest viewers.
In this project we are creating a new viewing paradigm that will take advantage of recent and emerging methods in computer vision, virtual reality and computer graphics, together with the computational capabilities likely to be available on next generation machines and networks. This new paradigm will give each viewer the ability to view the field from any arbitrary viewpoint -- from the point of view of the ball headed toward the soccer goal; or from that of the goalie defending the goal; as the quarterback dropping back to pass; or as a hitter waiting for a pitch. In this way, the viewer can observe exactly those portions of the game which most interest him, and from the viewpoint that most interests him (e.g. some fans may want to have the best view of Michael Jordan as he sails toward the basket; others may want to see the world from his point of view).
Progress Through December 1999
We have made rapid progress on a number of problems related to the goals of the Variable Viewpoint Reality project:
We have developed a number of basic algorithms for 3D reconstruction. One approach is designed to work in real time on many cameras. Another is a bit slower, but is designed to yield higher quality results. A third attempts to find the arm, leg and body positions of a human being from one or multiple camera views.
Each of these algorithms is in its very earliest stages. We are exploring these algorithms now: constructing implementations, testing assumptions, etc.
We have designed and set up a multiple-camera system for acquiring data in real time. This system was designed to be flexible and to work indoors. Right now we have 12 cameras working in synchrony. We would like to set up more.
We have acquired a great deal of multi-camera data. This is allowing us to test our algorithms and to develop new ideas.
In collaboration with students working on another project, we have been observing outdoor activities. The resulting system provides coarse tracking information for multiple people and cars, and can also recognize simple activities.
New Results since July 1999
We have developed a new algorithm for the reconstruction of 3D shapes. This algorithm addresses one of the key problems we have encountered to date: noise in the camera observations. In the previous reconstruction algorithm, each camera attempts to segment the object from the background. These segments are then intersected to form a 3D shape. In some cases noise in the cameras leads to incorrect segmentation, which in turn leads to poor reconstruction. Our new algorithm explicitly models this noise and introduces a prior over shapes. The result is the Bayes-optimal reconstruction, which is very insensitive to noise. These results are described in a pre-print paper available from the MIT/NTT web page: http://www.ai.mit.edu/projects/ntt.
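To make the contrast between silhouette intersection and a noise-aware reconstruction concrete, here is a minimal toy sketch. It is not the algorithm from the pre-print. It assumes three hypothetical orthographic cameras looking along the axes of a small voxel grid, and it replaces the prior over shapes with a much simpler independent per-voxel occupancy prior. The parameters p_hit, p_false and prior are illustrative assumptions, as are all function names.

```python
import math
from itertools import product

# Toy setup (an assumption for illustration): three orthographic cameras
# looking along the x, y and z axes of an n x n x n voxel grid. Each
# silhouette is the set of pixel coordinates labelled foreground.

def project(voxel, axis):
    """Drop the coordinate along the viewing axis to get the voxel's pixel."""
    return tuple(c for i, c in enumerate(voxel) if i != axis)

def render_silhouettes(shape):
    """Noise-free silhouettes of a voxel set, one per axis-aligned camera."""
    return [{project(v, axis) for v in shape} for axis in range(3)]

def intersect_silhouettes(silhouettes, n):
    """Hard intersection: keep a voxel only if every camera sees foreground."""
    return {
        v for v in product(range(n), repeat=3)
        if all(project(v, axis) in sil for axis, sil in enumerate(silhouettes))
    }

def posterior_reconstruction(silhouettes, n, p_hit=0.9, p_false=0.1, prior=0.3):
    """Keep a voxel when its posterior occupancy probability exceeds 0.5.

    Simplifying assumptions: each camera gives an independent noisy label,
    with P(foreground | occupied) = p_hit and P(foreground | empty) = p_false,
    and each voxel has the same independent prior probability of occupancy.
    """
    log_odds_prior = math.log(prior / (1 - prior))
    fg_llr = math.log(p_hit / p_false)              # evidence from a foreground label
    bg_llr = math.log((1 - p_hit) / (1 - p_false))  # evidence from a background label
    kept = set()
    for v in product(range(n), repeat=3):
        log_odds = log_odds_prior
        for axis, sil in enumerate(silhouettes):
            log_odds += fg_llr if project(v, axis) in sil else bg_llr
        if log_odds > 0:
            kept.add(v)
    return kept

# Ground truth: a 2x2x2 cube inside a 4x4x4 grid.
cube = set(product((1, 2), repeat=3))
silhouettes = render_silhouettes(cube)
silhouettes[2].discard((1, 1))  # simulate one segmentation error in the z-axis camera

hard = intersect_silhouettes(silhouettes, 4)
soft = posterior_reconstruction(silhouettes, 4)
print(len(hard), len(soft))  # prints: 6 8
```

In this toy case a single mislabelled pixel removes two voxels from the hard intersection, while the posterior test keeps them because the other two cameras outweigh the one dissenting label. A real system would use calibrated perspective cameras and a prior that couples neighbouring voxels, neither of which is modelled here.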
We have made significant progress on texture mapping 3D shapes. This will allow us to create much more convincing VVR displays.
We have demonstrated the system performing real-time 3D reconstruction using 16 cameras. This system combines many of the results mentioned above.
Research Plan for the Next Six Months
Integrating the new reconstruction algorithms with the real-time system.
Incorporating more cameras into the real-time system.
Improving tracking of multiple people.