MIT Media Laboratory
Perceptual Computing
The Perceptual Computing Section of the Media Laboratory is concerned
with making computers understand their environment, with special
emphasis on understanding and cooperating with people. The Section
consists of eight faculty, six postgraduate researchers, and more than
forty graduate students in the areas of vision, music, speech, and
multimodal interaction. Faculty participating in the Media Lab's ICCV
open house include Aaron Bobick, Mike Bove, Roz Picard, Ted Adelson,
and Alex Pentland.
Demonstration Abstracts
- Lightness, Transparency, and Mid-Level Vision
- ADELSON, EDWARD
- Some new brightness illusions will be demonstrated. These illusions
indicate the importance of mid-level mechanisms involving transparency,
occlusion, and lighting.
- Object-oriented television and Cheops processing system
- AGAMANOLIS, STEFAN
- Structured video represents a television program as 2D and 3D objects
rather than as pixels or frames; the objects are "transmitted" together
with a script that tells how to assemble them into a program. Cheops is
a data-flow computer built in the lab that can display such structured
video programs in real time.
- Automated Extraction and Resynthesis of Walkers
- ASKEY, DAVID
- Automated layer decomposition of walkers in an image sequence:
an approach for efficient coding and resynthesis of walking
motion using component layers.
- Put That There/Models from Video
- AZARBAYEJANI, ALI
- We will show a wide-baseline stereo system for tracking people in 3-D
based on symbolic correspondence. The system is self-calibrated, and
its output is used for gestural control in a 3-D audio-visual
environment. We will also show a system for building 3-D models from
video, based on our structure-from-motion research described in last
month's IEEE PAMI.
- Ambient Microphone Speech Recognition
- BASU, SUMIT
- This demonstration illustrates the use of an array of
microphones along with visual cues to perform speech
recognition "at a distance" in a noisy, open environment.
- Semiautomatic 3D model building and lens distortion correction
- BECKER, SHAWN
- Reconstructing camera parameters, planar 3-D geometry and surface
texture, given one or more views of a scene with pre-selected parallel
and coplanar edges. This technique has been used to generate a 3-D
textured database from a set of still images taken with an
uncalibrated 35mm camera. This technique has also been used to
determine 3-D positions of actors from video.
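The underlying geometry can be illustrated with the classic
vanishing-point constraint: the vanishing points of two orthogonal 3-D
directions determine the focal length once the principal point is
assumed. The NumPy sketch below uses hypothetical function names and
shows only this one constraint; the actual system also recovers lens
distortion and a fuller parameter set.

    import numpy as np

    def line_through(p, q):
        # Homogeneous line through two image points (x, y).
        return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

    def vanishing_point(lines):
        # Least-squares intersection of image lines that are parallel in 3-D.
        _, _, vt = np.linalg.svd(np.asarray(lines, dtype=float))
        v = vt[-1]
        return v[:2] / v[2]

    def focal_from_orthogonal_vps(v1, v2, principal_point):
        # For orthogonal 3-D directions: (v1 - c) . (v2 - c) = -f^2.
        d1 = np.asarray(v1) - principal_point
        d2 = np.asarray(v2) - principal_point
        f2 = -np.dot(d1, d2)
        if f2 <= 0:
            raise ValueError("vanishing points inconsistent with orthogonality")
        return np.sqrt(f2)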
- Physics-based scene understanding
- BRAND, MATT
- Knowledge-intensive vision systems can understand scenes with complex
visual and causal structure. This demo shows the visual analysis and
explanation of a variety of artifacts, including mechanical transmissions.
- Phase Space Recognition of Human Body Motion
- CAMPBELL, LEE
- This work presents a method for representing and recognizing human
body motion. It identifies sets of constraints that are diagnostic
of a movement; different constraints identify different movements.
- Vision-Steered Phased-Array Microphones
- CASEY, MIKE
- A beam-forming microphone array is used to capture noisy speech input
from the ALIVE space. Using the position information provided by the
vision system, we obtain audio signal enhancements of up to 10 dB.
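As a minimal sketch of the idea behind this demo, the following
delay-and-sum beamformer steers an array toward a known talker
position, such as one supplied by a vision system. It assumes
time-aligned channels and integer-sample delays; the demo's actual
phased-array processing may differ.

    import numpy as np

    SPEED_OF_SOUND = 343.0  # m/s

    def delay_and_sum(signals, mic_positions, source_position, fs):
        # signals:         (n_mics, n_samples) recordings
        # mic_positions:   (n_mics, 3) coordinates in meters
        # source_position: (3,) talker position from the vision system
        # fs:              sampling rate in Hz
        dists = np.linalg.norm(mic_positions - source_position, axis=1)
        # Advance each channel so wavefronts from the source line up.
        delays = (dists - dists.min()) / SPEED_OF_SOUND
        shifts = np.round(delays * fs).astype(int)
        out = np.zeros(signals.shape[1])
        for sig, s in zip(signals, shifts):
            out[: signals.shape[1] - s] += sig[s:]
        return out / len(signals)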
- ALIVE, Active Face Tracking/Recognition/Pose Estimation
- DARRELL, TREVOR
- We will show active face tracking, recognition, and pose estimation
in the ALIVE system. Users can walk about a room and interact with
autonomous virtual creatures in a "Magic Mirror" paradigm; the
creatures can recognize, track, and respond to the user's face as well
as body position and hand gestures.
- Recognizing Facial Expressions
- ESSA, IRFAN
- We describe our methods for extracting detailed representations of
facial motion from video. We will show how these representations can
be used for coding, analysis, recognition, tracking, and synthesis of
facial expressions.
- Transaural Rendering
- GARDNER, BILL
- The STIVE demo will feature a three-dimensional audio system which
uses only two speakers to create the illusion of sounds emanating from
arbitrary directions around the listener.
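A minimal sketch of the crosstalk-cancellation idea behind two-speaker
3-D audio: if the 2x2 acoustic transfer matrix from the speakers to the
listener's ears is known per frequency bin, inverting it yields speaker
signals that deliver a desired binaural signal to each ear. The
transfer-matrix formulation here is an illustrative assumption; the
abstract does not specify the demo's filters.

    import numpy as np

    def crosstalk_cancel(binaural, H):
        # binaural: (2, n_bins) spectra intended for the left/right ear
        # H:        (n_bins, 2, 2) speaker-to-ear transfer matrix per bin,
        #           indexed H[k][ear][speaker] (measured or modeled)
        out = np.empty_like(binaural)
        for k in range(binaural.shape[1]):
            # Invert the 2x2 acoustic path so each ear hears only
            # its intended channel.
            out[:, k] = np.linalg.solve(H[k], binaural[:, k])
        return out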
- Closed-World Tracking
- INTILLE, STEPHEN
- Tracking for video annotation using contextual information to
dynamically select tracked features. Example domain: football plays.
- Wold-based Texture Modeling
- LIU, FANG
- We apply the Wold-based texture model to image database retrieval.
The Wold model provides perceptually sensible features that correspond
well to the dimensions reported as most important in human texture
perception: periodicity, directionality, and randomness.
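For illustration only, the sketch below computes crude frequency-domain
proxies for two of these dimensions. The actual Wold decomposition
separates a homogeneous random field into deterministic (periodic and
directional) and indeterministic (random) components in a far more
principled way; nothing below is the demo's algorithm.

    import numpy as np

    def spectral_texture_features(patch, n_peaks=10):
        # Power spectrum of a zero-mean texture patch.
        F = np.abs(np.fft.fftshift(np.fft.fft2(patch - patch.mean()))) ** 2
        total = F.sum() + 1e-12

        # Periodicity proxy: energy fraction carried by the strongest
        # peaks (the harmonic part of a Wold-style model is peaky).
        peaks = np.sort(F.ravel())[-n_peaks:]
        periodicity = peaks.sum() / total

        # Directionality proxy: concentration of energy over orientation.
        h, w = F.shape
        ys, xs = np.mgrid[0:h, 0:w]
        theta = np.arctan2(ys - h // 2, xs - w // 2) % np.pi
        hist, _ = np.histogram(theta, bins=18, weights=F)
        hist = hist / (hist.sum() + 1e-12)
        directionality = hist.max()

        return periodicity, directionality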
- Video Orbits for Mosaicing and Resolution Enhancement, and Wearable Computers
- MANN, STEVE
- A new featureless multiscale method of estimating the homographic
coordinate transformation between a pair of images. The method is used
to make pictures with a "visual filter" equipped with image acquisition
and display capability. Standing in a single location, the user scans a
scene onto a large "video canvas," where each new frame undergoes the
appropriate homographic coordinate transformation to insert it
correctly into the image mosaic. I will also show my work on wearable
computers and NetCam.
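The compositing step can be sketched as follows: once the homography
relating a new frame to the canvas has been estimated (the featureless
estimation itself is not shown here), the frame is warped into the
mosaic by inverse mapping. A minimal NumPy sketch for grayscale images
with nearest-neighbor sampling:

    import numpy as np

    def composite_into_mosaic(canvas, frame, H):
        # H maps frame coordinates to canvas coordinates.
        Hinv = np.linalg.inv(H)
        ch, cw = canvas.shape
        ys, xs = np.mgrid[0:ch, 0:cw]
        pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
        src = Hinv @ pts  # canvas pixel -> frame pixel (homogeneous)
        sx = np.round(src[0] / src[2]).astype(int)
        sy = np.round(src[1] / src[2]).astype(int)
        fh, fw = frame.shape
        ok = (sx >= 0) & (sx < fw) & (sy >= 0) & (sy < fh)
        canvas[ys.ravel()[ok], xs.ravel()[ok]] = frame[sy[ok], sx[ok]]
        return canvas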
- Photobook: Content-Based Image Retrieval
- MINKA, TOM
- Content-based image annotation is complicated by the fact that feature
salience varies with context. FourEyes indexes images using several
features which are consulted independently, based on user interaction.
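A toy illustration of context-dependent feature salience (explicitly
not the actual FourEyes learning algorithm): combine per-feature
distances to the query with weights, and adjust the weights according
to which features agree with the user's positive and negative examples.
All names below are hypothetical.

    import numpy as np

    def rank_images(distances, weights):
        # distances: dict feature_name -> (n_images,) distances to query
        # weights:   dict feature_name -> current reliability weight
        total = sum(w * distances[f] for f, w in weights.items())
        return np.argsort(total)

    def update_weights(weights, distances, liked, disliked, lr=0.5):
        # Multiplicative update: boost features that rank liked images
        # closer to the query than disliked ones.
        for f in weights:
            agree = distances[f][liked].mean() < distances[f][disliked].mean()
            weights[f] *= (1 + lr) if agree else (1 - lr)
        z = sum(weights.values())
        for f in weights:
            weights[f] /= z
        return weights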
- Large Database Face Recognition, and Active Face Recognition/Tracking/Pose Recognition
- MOGHADDAM, BABACK
- An automatic system for detection, recognition and model-based coding
of human faces is presented. The system is able to detect human faces
(at various scales and different poses) in the input scene and
geometrically align them prior to recognition and compression. The
system has been tested successfully on over 2,000 faces from ARPA's
FERET program.
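The recognition step in systems of this kind is commonly an eigenspace
("eigenface") projection followed by nearest-neighbor matching. A
minimal sketch, assuming geometrically aligned grayscale faces stored
as rows of a matrix; the demo's detection, pose handling, and
model-based coding stages are not shown.

    import numpy as np

    def train_eigenfaces(faces, n_components=20):
        # faces: (n_faces, n_pixels) aligned, flattened face images.
        mean = faces.mean(axis=0)
        # SVD of the centered data gives the eigenfaces as rows of vt.
        _, _, vt = np.linalg.svd(faces - mean, full_matrices=False)
        return mean, vt[:n_components]

    def project(face, mean, basis):
        return basis @ (face - mean)

    def recognize(face, gallery_coeffs, mean, basis):
        # Nearest neighbor in eigenface coefficient space.
        c = project(face, mean, basis)
        d = np.linalg.norm(gallery_coeffs - c, axis=1)
        return int(np.argmin(d)), float(d.min())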
- Thin-plate Models for Motion Analysis and Object Recognition
- NASTAR, CHAHAB
- We present a deformable model for nonrigid motion tracking (e.g.,
heart motion). A similar model can be used for object recognition
(e.g., face recognition).
- Detecting Kinetic Occlusion
- NIYOGI, SOURABH
- Detecting motion boundaries in image sequences through local
spatiotemporal junction analysis; deducing ordinal depth locally from
accretion and deletion cues.
- SmartCam
- PINHANEZ, CLAUDIO
- A SmartCam is a robotic camera that operates in a TV studio without
a cameraman, using computer vision to find objects and people in
complex scenes. The development of SmartCams requires new methods and
ideas in context-based vision, action recognition, and architecture of
computer vision systems.
- High-Dimensional Probabilistic Modeling
- POPAT, KRIS
- Improved probabilistic models often mean better performance in a
variety of systems. Accurate modeling usually requires high-dimensional
models, with their attendant difficulties. We explore several
approaches to high-dimensional modeling and apply them to image
compression and restoration, and to texture synthesis and
classification.
- M-Lattice -- Nonlinear Dynamics For Vision and Image Processing
- SHERSTINSKY, ALEX
- This research investigates the mathematical properties of the
reaction-diffusion model, originated by Alan Turing to explain
morphogenesis, and of the new "M-Lattice" system derived from it. We
demonstrate these models' applications to computational vision and
image processing.
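As background, a two-morphogen reaction-diffusion system of the kind
Turing proposed can be simulated in a few lines. The Gray-Scott variant
below is a standard textbook system, not the M-Lattice itself; it
produces spot and stripe patterns depending on its feed and kill rates.

    import numpy as np

    def laplacian(Z):
        # 5-point Laplacian with wrap-around boundaries.
        return (np.roll(Z, 1, 0) + np.roll(Z, -1, 0) +
                np.roll(Z, 1, 1) + np.roll(Z, -1, 1) - 4 * Z)

    def gray_scott(n=128, steps=5000, Du=0.16, Dv=0.08, F=0.035, k=0.065):
        U = np.ones((n, n))
        V = np.zeros((n, n))
        # Seed a small perturbation so patterns can nucleate.
        U[n//2-5:n//2+5, n//2-5:n//2+5] = 0.5
        V[n//2-5:n//2+5, n//2-5:n//2+5] = 0.25
        for _ in range(steps):
            uvv = U * V * V
            U += Du * laplacian(U) - uvv + F * (1 - U)
            V += Dv * laplacian(V) + uvv - (F + k) * V
        return V  # spots or stripes emerge depending on F and k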
- Real-time visual recognition of American Sign Language, and Wearable Computing
- STARNER, THAD
- Full-sentence ASL over a 40-word lexicon is recognized in real time
with 99.2% accuracy, without explicit modeling of the fingers. A single
color camera is used for tracking. I will also show my work on wearable
computers and remembrance agents.
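Recognizers of this kind typically score candidate word models with
hidden Markov models over hand-tracking features. The abstract does not
detail the classifier, so the Viterbi scorer below is an illustrative
assumption, not the demo's exact method.

    import numpy as np

    def viterbi_score(log_A, log_pi, log_obs):
        # log_A:   (S, S) log transition probabilities for one word model
        # log_pi:  (S,) log initial-state probabilities
        # log_obs: (T, S) per-frame log-likelihoods of tracking features
        # Returns the best path's log-likelihood; the word model scoring
        # highest on a segment is the recognized sign.
        T, S = log_obs.shape
        delta = log_pi + log_obs[0]
        for t in range(1, T):
            delta = (delta[:, None] + log_A).max(axis=0) + log_obs[t]
        return delta.max()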
- Scene Cut Detection and Motion Texture Modeling
- SZUMMER, MARTIN
- 1) A robust algorithm for finding cuts in video -- to "skip ahead to
the next shot." 2) A stochastic motion model for estimating and
resynthesizing spatio-temporal patterns (water, smoke, etc.).
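A common baseline for the cut-detection half of this demo (the demo's
own robust algorithm may differ) is histogram differencing with an
adaptive, outlier-based threshold:

    import numpy as np

    def find_cuts(frames, n_bins=64, z_thresh=4.0):
        # Flag cuts where the gray-level histogram difference between
        # consecutive frames is an outlier for the sequence.
        hists = [np.histogram(f, bins=n_bins, range=(0, 255),
                              density=True)[0] for f in frames]
        d = np.array([np.abs(h2 - h1).sum()
                      for h1, h2 in zip(hists[:-1], hists[1:])])
        # Robust threshold: distance to the median in MAD units.
        med = np.median(d)
        mad = np.median(np.abs(d - med)) + 1e-12
        return np.flatnonzero((d - med) / (1.4826 * mad) > z_thresh) + 1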
- Query by Content in Video Sequences
- WACHMAN, JOSH
- Unsupervised, Cross-Modal Characterization of Discourse in Tonight
Show Monologues: preliminary results from analysis of audio and
visual-kinesic features, processed with the isodata clustering
algorithm, demonstrate a bottom-up approach to discourse analysis.
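The isodata algorithm mentioned here alternates k-means-style
assignment and update steps, plus heuristics for splitting and merging
clusters. A sketch of the core iteration (split/merge omitted), with
hypothetical parameter choices:

    import numpy as np

    def isodata_core(X, k, iters=50, seed=0):
        # X: (n_samples, n_features) feature vectors, e.g. per-frame
        # audio and visual-kinesic measurements.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), k, replace=False)].astype(float)
        for _ in range(iters):
            # Assign each sample to its nearest cluster center.
            labels = np.argmin(
                ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
            # Move each center to the mean of its members.
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(axis=0)
        return labels, centers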
- Layered Image Representation
- WANG, JOHN
- We will demonstrate novel techniques in motion estimation and
segmentation based on mid-level vision concepts for applications in
image coding, data compression, video special effects, and 3D
structure recovery.
- Non-Rigid Motion Segmentation: Psychophysics and Modeling
- WEISS, YAIR
- Estimating non-rigid motion requires integrating some motion
constraints while segmenting out others. We will show psychophysical
demonstrations that reveal how the human visual system solves this
dilemma.
- Learning Visual Behavior for Gesture Analysis
- WILSON, ANDREW
- The "visual behavior" of gesture is recovered from a number of example image
sequences by concurrently training the temporal model and multiple models of
the visual scene. The training process is demonstrated.
- ALIVE
- WREN, CHRIS
- Active face tracking, recognition, and pose estimation in the ALIVE
system; see the description under DARRELL, TREVOR above.