Virtual Viewpoint Reality
NTT: Visit
1/7/99
Overview of VVR Meeting
  Motivation from MIT ...
  Discuss current and related work
    Video Activity Monitoring and Recognition
    3D Modeling
    Demonstrations
  Related NTT Efforts
  Discussion of collaboration
  Future work
  Lunch
Motivating Scenario
  Construct a system that will allow a user to observe a sporting event from any viewpoint.
    From behind the goal
    Along the path of the ball
    As a participating player
  Provide high-level commentary/statistics
    Analyze plays
    Flag goals/fouls/offsides/strikes
Given a number of fixed cameras…
Can we simulate any other?
A Virtual Reality Spectator Environment
  Build an exciting, fun, high-profile system
    Sports: Soccer, Hockey, Tennis, Basketball
    Drama, Dance, Ballet
  Leverage MIT technology in:
    Vision/Video Analysis
      Tracking, Calibration, Action Recognition
    Image/Video Databases
    Graphics
  Build a system that provides data available nowhere else…
    Record/Study Human movements and actions
    Motion Capture / Motion Generation
Factor 1: Window of Opportunity
  20-50 cameras in a stadium
    Soon there will be many more
  HDTV is digital
    Flexible, very high bandwidth transmissions
  Future Televisions will be Computers
    Plenty of extra computation available
    3D Graphics hardware will be integrated
  Economics of sports
    Dollar investments by broadcasters are huge (Billions)
    Computation is getting cheaper
Factor 2: Research
  Calibration
    How to automatically calibrate 100 moving cameras?
  Tracking
    How to detect and represent 30 moving entities?
  Resolution
    Assuming moveable/zoomable cameras: How to direct cameras towards the important events?
  Action Understanding
    Can we automatically detect significant events - fouls, goals, defensive/offensive plays?
    Can we direct the user towards points of interest?
    Can we learn from user feedback?
Factor 3: Research
  Learning / Statistics
    Estimating the shape of complex objects like human beings is hard. How can we effectively use prior models?
    Can we develop statistical models for human motions?
    For the actions of an entire team?
  Graphics
    What are the most efficient/effective representations for the immersive video stream?
    What is the best scheme for rendering it?
    How to combine conflicting information into a single graphical image?
Factor 4: Enabling Other Applications
  Cyberware Room
    A room that records the shape of everything in it.
    Every action and motion.
  Provide Unprecedented Information
    Study human motion
      Build a model to synthesize motions (Movies)
    Study sports activities
      Provide constructive feedback
    Study ballet and dance
      Critique?
    Study drama and acting
Factor 5: NTT Interest and Involvement
  NTT has expertise:
    Networking and information transmission
    Computer Vision
    Human Interfaces
  We would like your feedback here!
Overview of VVR Meeting
  Motivation from MIT ...
  Discuss current and related work (MIT)
    Video Activity Monitoring and Recognition
    3D Modeling
    Demonstrations
  Related NTT Efforts
  Discussion of collaboration
  Future work
  Lunch
Progress on 3D Reconstruction
  Simple intersection of silhouettes
    Efficient but limited.
  Tomographic reconstruction
    Based on medical reconstructions.
  Probabilistic Voxel Analysis (Poxels)
    Handles transparency.
Simple Technical Approach
  1: Integration/Calibration of Multiple Cameras
  2: Segmentation of Actors from Field
    Yields silhouettes -> FRUSTA
  3: Build Coarse 3D Models
    Intersection of FRUSTA
  4: Refine Coarse 3D Models
    Wide baseline stereo
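The slides do not say how the actors are segmented from the field in step 2. One common option is to difference each frame against an image of the empty field, and the sketch below assumes that approach. The function name segment_silhouette and the threshold value are hypothetical, chosen only for illustration.

```python
import numpy as np

def segment_silhouette(frame, background, threshold=30.0):
    """Return a boolean mask that is True where the frame differs from the background.

    frame, background: arrays of shape (H, W) or (H, W, 3), same shape.
    threshold: hypothetical tuning parameter, in intensity units.
    """
    diff = np.abs(frame.astype(np.float64) - background.astype(np.float64))
    if diff.ndim == 3:
        diff = diff.max(axis=2)  # largest change over the color channels
    return diff > threshold

# Synthetic example: a uniform "field" with a brighter 3x2 "actor" pasted in.
background = np.full((8, 8), 100, dtype=np.uint8)
frame = background.copy()
frame[2:5, 3:5] = 200
mask = segment_silhouette(frame, background)
print(mask.astype(int))
```

A fixed threshold is the simplest choice; real footage with shadows and lighting changes would likely need a per-pixel background model instead.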
Idea in 2D
Idea in 2D: Segment
Idea in 2D: Intersection
Coarse Shape
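The 2D idea (segment each view, then intersect) can be sketched in a few lines. This is a minimal illustration, not the presenters' implementation: it assumes ideal cameras placed on a circle and looking at the origin, a synthetic disk as the object, and silhouettes stored as 1D boolean arrays over viewing angle. All function names and parameters (cam_dist, half_fov, n_bins) are hypothetical.

```python
import numpy as np

def view_coordinate(points, cam_angle, cam_dist=3.0):
    """Bearing (radians) of each 2D point seen from a camera on a circle, looking at the origin."""
    cam = cam_dist * np.array([np.cos(cam_angle), np.sin(cam_angle)])
    axis = -cam / cam_dist                                # unit optical axis
    rel = points - cam
    along = rel @ axis                                    # depth along the axis
    across = rel[:, 0] * axis[1] - rel[:, 1] * axis[0]    # 2D cross product
    return np.arctan2(across, along)

def to_bins(u, half_fov=0.6, n_bins=64):
    """Quantize bearings into 1D image pixels."""
    idx = np.floor((u + half_fov) / (2 * half_fov) * n_bins).astype(int)
    return np.clip(idx, 0, n_bins - 1)

def silhouette(object_points, cam_angle, n_bins=64):
    """1D silhouette: pixels hit by at least one object point."""
    sil = np.zeros(n_bins, dtype=bool)
    sil[to_bins(view_coordinate(object_points, cam_angle), n_bins=n_bins)] = True
    return sil

def intersect_frusta(grid_points, silhouettes, cam_angles):
    """Keep grid cells that fall inside the silhouette wedge of every view."""
    inside = np.ones(len(grid_points), dtype=bool)
    for sil, angle in zip(silhouettes, cam_angles):
        inside &= sil[to_bins(view_coordinate(grid_points, angle), n_bins=len(sil))]
    return inside

# Scene grid over [-1, 1]^2 and a synthetic disk-shaped object.
n = 64
xs = np.linspace(-1, 1, n)
gx, gy = np.meshgrid(xs, xs)
grid = np.column_stack([gx.ravel(), gy.ravel()])
obj = (grid[:, 0] - 0.2) ** 2 + (grid[:, 1] + 0.1) ** 2 < 0.3 ** 2

def coarse_shape(n_views):
    angles = np.linspace(0, 2 * np.pi, n_views, endpoint=False)
    sils = [silhouette(grid[obj], a) for a in angles]
    return intersect_frusta(grid, sils, angles)

hull_2 = coarse_shape(2)
hull_8 = coarse_shape(8)
print("object cells:", obj.sum(), "8-view hull:", hull_8.sum(), "2-view hull:", hull_2.sum())
```

The intersection always contains the true object and shrinks as views are added, which is why the result is a coarse shape that a later refinement step can improve.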
Real Data: Tweety
  Data acquired on a turntable
  180 views are available… not all are used.
Intersection of Frusta
  Intersection of 18 frusta
  Computations are very fast
    perhaps real-time
Agreement provides additional information
Tomographic Reconstruction
  Motivated by medical imaging
    CT - Computed Tomography
  Measurements are line integrals in a volume
  Reconstruction is by back-projection & deconvolution
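For reference, the standard parallel-beam CT relations behind these bullets can be written for a single 2D slice as follows. The symbols f (the unknown density), p_theta (the measurement at angle theta), s (position along the detector) and b (the back-projected image) are notation assumed here, not taken from the slides.

```latex
% Measurement: line integral of f along the line x cos(theta) + y sin(theta) = s
p_\theta(s) = \iint f(x, y)\, \delta(x\cos\theta + y\sin\theta - s)\, dx\, dy

% Back-projection: smear every measurement back along its line and sum over angles
b(x, y) = \int_0^{\pi} p_\theta(x\cos\theta + y\sin\theta)\, d\theta
        = \left(f * \frac{1}{r}\right)(x, y), \qquad r = \sqrt{x^2 + y^2}
```

Because plain back-projection returns f blurred by 1/r, a deconvolution (or equivalent filtering) step is needed to recover f itself.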
Acquiring Multiple Images (2D)
Backprojecting Rays
Back-projection of image intensities
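The acquire-then-back-project sequence in these figure slides can be mimicked on synthetic data. The sketch below is an assumption-laden illustration: it uses ideal parallel projections of a 2D disk, omits the deconvolution step, and all names (bin_index, project, back_project) and parameter values are hypothetical.

```python
import numpy as np

def bin_index(grid_x, grid_y, theta, n_bins, extent):
    """Detector bin hit by each grid cell for a parallel projection at angle theta."""
    s = grid_x * np.cos(theta) + grid_y * np.sin(theta)
    idx = np.floor((s + extent) / (2 * extent) * n_bins).astype(int)
    return np.clip(idx, 0, n_bins - 1)

def project(image, grid_x, grid_y, theta, n_bins, extent):
    """Acquire one 1D image: sum the intensities along each ray."""
    idx = bin_index(grid_x, grid_y, theta, n_bins, extent)
    return np.bincount(idx.ravel(), weights=image.ravel(), minlength=n_bins)

def back_project(projections, thetas, grid_x, grid_y, extent):
    """Smear each 1D image back along its rays and average over views (no filtering)."""
    recon = np.zeros(grid_x.shape)
    for p, theta in zip(projections, thetas):
        recon += p[bin_index(grid_x, grid_y, theta, len(p), extent)]
    return recon / len(thetas)

# Synthetic scene: a small disk on a 65x65 grid over [-1, 1]^2.
n = 65
xs = np.linspace(-1, 1, n)
grid_x, grid_y = np.meshgrid(xs, xs)
image = ((grid_x - 0.3) ** 2 + (grid_y + 0.2) ** 2 < 0.15 ** 2).astype(float)

extent = np.sqrt(2.0)                                  # half-width covering the grid diagonal
thetas = np.linspace(0, np.pi, 36, endpoint=False)     # 36 views over a half turn
projections = [project(image, grid_x, grid_y, t, 96, extent) for t in thetas]
recon = back_project(projections, thetas, grid_x, grid_y, extent)

peak = np.unravel_index(np.argmax(recon), recon.shape)
print("brightest reconstructed cell at x=%.2f, y=%.2f" % (grid_x[peak], grid_y[peak]))
```

The unfiltered result is brightest at the object's location but is surrounded by a blurred halo, so the recovered values are only approximate.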
Volume Render...
  Captures shape very well
  Intensities are not perfect
Results…