High-Resolution Mapping and Modeling of Multi-Floor Architectural Interiors
MIT9904-20
Progress Report: July 1, 2002–December 31, 2002
Seth Teller
Project Overview
Our research has three long-term goals. First, to develop rapid capture methods for geometric environments, using autonomous (robotic) sensors. Second, to develop a pervasive location capability for indoor environments (without GPS), so that a hand-held mobile computing device can reliably determine its position and orientation. Third, to develop several fundamentally new devices and applications (the software compass, software marker, and software flashlight) that combine the captured model with the hand-held device's positioning ability, enabling the user to interact directly with the environment away from the desktop.
Our NTT-sponsored efforts focus on three aspects
of this long-term goal. First, we
are developing computer-vision capture methods:
algorithms to localize and fuse a low-resolution omni-directional video
stream gathered by a rolling or hand-held camera. (Joint work with Prof. John
Leonard in the Department of Ocean Engineering.) Second, we are developing procedural capture methods: a
"compiler" that takes legacy 2D CAD information as input, and produces
well-formed 3D architectural
models as output. Third, in
collaboration with Prof. Hari Balakrishnan, we are generalizing the Cricket
location infrastructure to
support orientation (as well as position) determination, and building early
prototypes of the devices and applications mentioned above.
These
research goals overlap with the interests of several NTT laboratories (and projects): the Cyber-Space
Laboratories (3D Information Processing Systems); the Communications Science
Laboratories (GeoLink); the Telecommunication Energy Laboratories (Low-Power
Radio-Frequency Devices); and the Network Innovation Laboratories (Software
Radios).
Progress Through December 2002 (Updated to February 2003)
Computer-Vision Model Capture. We
have achieved the largest-scale autonomous capture tasks with loop closing
reported to date in the autonomous robotics literature. These include rolling robot deployment
for sessions ranging from tens of minutes to nearly two hours in duration, in
various MIT corridor networks with complex topologies (see Figures in
accompanying slides). Along with
Ph.D. student Michael Bosse, we have formulated the "Atlas Framework" for
extended mapping, in which the environment is represented and recovered as a
collection of locally rigid sub-maps, each connected to its neighbors by a set
of uncertain geometric transformations.
This is motivated by our intuition about how humans navigate in complex
spaces: we are aware of the local
metric structure of our environment, but do not usually have a globally consistent
mental model of the entire space.
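A minimal sketch of this representation (hypothetical names; the actual Atlas implementation is considerably more elaborate) stores each locally rigid sub-map alongside the uncertain transformations linking it to its neighbors:

```python
import math

class Submap:
    """A locally rigid sub-map: features expressed in a local frame."""
    def __init__(self, name, features):
        self.name = name
        self.features = features  # [(x, y), ...] in this sub-map's frame

class Edge:
    """Uncertain rigid transform (dx, dy, dtheta) linking two sub-maps."""
    def __init__(self, a, b, dx, dy, dtheta, sigma):
        self.a, self.b = a, b
        self.dx, self.dy, self.dtheta = dx, dy, dtheta
        self.sigma = sigma  # scalar stand-in for a full covariance matrix

def to_frame_a(edge, x, y):
    """Express a point given in sub-map b's frame in sub-map a's frame."""
    c, s = math.cos(edge.dtheta), math.sin(edge.dtheta)
    return (edge.dx + c * x - s * y, edge.dy + s * x + c * y)

# Two rooms joined by a short passageway. No globally consistent frame
# is ever built; only the uncertain transform between neighbors is kept.
room_a = Submap("room_a", [(0.0, 0.0), (4.0, 0.0)])
room_b = Submap("room_b", [(0.0, 0.0), (3.0, 0.0)])
door = Edge(room_a, room_b, dx=5.0, dy=1.0, dtheta=math.pi / 2, sigma=0.1)

# A feature of room_b, expressed in room_a's frame only when needed:
x, y = to_frame_a(door, 3.0, 0.0)  # approximately (5.0, 4.0)
```

Because the uncertainty lives on the connecting edges rather than inside the sub-maps, each sub-map stays internally rigid no matter how the connections are later adjusted.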
We have demonstrated the following results. First, we can use the image information
to improve the raw navigation data available from the sensor (typically as
odometry or inertial integration).
This allows us to use relatively less accurate (and therefore cheaper) navigation sensors on board.
Second, we can stabilize the image sequences to persistent structure in
the scene (vanishing points), removing the camera rotation introduced by the
operator. This yields a smoother viewing experience, allowing the user to better maintain a sense of orientation within the simulated environment. Stabilized video also compresses significantly more
efficiently than unstabilized video, due to increased inter-frame coherence.
Third, we have addressed the scaling problem with an approach called "Atlas generation."
Rather than attempt to recover a single, globally consistent scene
model, we instead produce a set of local maps connected only by their
(overlapping) boundaries. This
allows us to push uncertainty out of the maps themselves and into the
connecting transformations, much as a human exits one room, through a short
passageway, and enters another room. For most applications, this notion of "local"
orientation is sufficient. Fourth,
we have demonstrated the Atlas framework over environments spanning hundreds of
meters, with repeated visits to certain areas of the environment ("loop closing").
Procedural Model Generation. We have made significant progress in our effort to extract detailed
three-dimensional geometric models from legacy two-dimensional CAD files. MIT's Department of Facilities
maintains an extensive corpus of more than 800 floorplans, each including
vector (line segment) representations of exterior walls, interior walls,
load-bearing columns, doorways and windows. MIT also maintains a "base map" situating each building
(ground floor) on campus, and delineating roads, sidewalks, walking paths,
grassy areas, and parking areas.
Finally, various topographic representations of campus elevations with
respect to local sea level are available.
We have combined all of these elements using a
series of parsing and interpretation scripts, run daily in "batch mode" in the early morning hours. The end-to-end script retrieves the base map from MIT's web site, then fetches all floorplans for each building found on the base map.
The floorplans are then segmented into layers, and each layer is
separately extruded into three-dimensional form. The result is exterior geometry with exterior doors and
windows, and interior geometry with interior walls, doors, stairwells, elevator
shafts, etc. All geometry is generated at three "levels of detail" for efficient rendering: the "low-detail" model is a simple vertical prism with no doors or windows; the "medium-detail" model has doors and windows; and the "high-detail" model has full exterior and interior
viewing, the renderer selects the appropriate level of detail using the
viewer's distance from the building.
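The extrusion and level-of-detail selection described above can be sketched as follows (illustrative names and thresholds, not the production scripts):

```python
def extrude_segment(x0, y0, x1, y1, z0, z1):
    """Lift a 2D wall segment from a floorplan into a vertical quad
    (four corners, counter-clockwise)."""
    return [(x0, y0, z0), (x1, y1, z0), (x1, y1, z1), (x0, y0, z1)]

def select_lod(distance_m, medium_at=50.0, high_at=15.0):
    """Pick a level of detail from the viewer's distance to the building."""
    if distance_m <= high_at:
        return "high"    # full exterior and interior geometry
    if distance_m <= medium_at:
        return "medium"  # exterior with doors and windows
    return "low"         # simple vertical prism, no openings

# One interior wall, extruded floor-to-ceiling (2.7 m):
quad = extrude_segment(0.0, 0.0, 4.0, 0.0, z0=0.0, z1=2.7)
assert quad[2] == (4.0, 0.0, 2.7)

# A distant building renders as a prism; a nearby one in full detail:
assert select_lod(200.0) == "low"
assert select_lod(10.0) == "high"
```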
Recent progress on this front includes integration of these floorplans
with an existing spatial database for large-scale walkthroughs, and the
development (in progress) of new visibility algorithms for indoor-outdoor
scenes.
Generalizing the Cricket Position
Infrastructure. Our third major area of effort is in generalizing
the Cricket location-determination infrastructure to support orientation
computations, that is, to determine both position and attitude (bearing and
elevation angle) of a hand-held device.
We have designed a prototype device, the Cricket "software compass,"
that uses multiple ultrasonic receivers to infer orientation from phase
differentials at the receiver.
This device is not yet in fabrication, however. In the interim we have developed an
equivalent capability through the use of two ordinary Cricket (position)
listeners, attached to either end of a board about 75 cm long. From the difference in reported listener
positions, we compute the position and attitude of the board. We have also integrated a laser
range-finder and VGA projector to produce a prototype "software flashlight." The range-finder
yields depth to a modeled projection surface in the environment. The VGA projector allows projection of
known model geometry (for example, hidden wires or pipes) onto the projection
surface. Together, these
components make a fundamentally new device possible, one that allows a kind of
"X-ray" vision through the ordinarily opaque surfaces of the environment. At present the accuracy of the software
flashlight is rather poor, but we are continuously improving its components.
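The two-listener arrangement described above reduces to simple geometry. A sketch of the computation, assuming ideal (noise-free) listener positions:

```python
import math

def board_pose(p0, p1):
    """Given 3D positions of two Cricket listeners fixed at the ends of
    a rigid board, return the board midpoint plus its bearing and
    elevation angles (radians)."""
    dx, dy, dz = (b - a for a, b in zip(p0, p1))
    mid = tuple((a + b) / 2.0 for a, b in zip(p0, p1))
    bearing = math.atan2(dy, dx)                    # heading in floor plane
    elevation = math.atan2(dz, math.hypot(dx, dy))  # tilt above horizontal
    return mid, bearing, elevation

# Listeners 75 cm apart; the second sits east of and 10 cm above the first:
mid, bearing, elevation = board_pose((0.0, 0.0, 1.0), (0.75, 0.0, 1.1))
# mid is (0.375, 0.0, 1.05); bearing is 0.0; elevation is about 7.6 degrees
```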
Recent progress includes the extension of this
device to have "software marker" capability, in which the user points at a
location in the world and illuminates it using the laser range finder. The software compass integrates
position, attitude, and range information to infer the XYZ coordinates of the
world point indicated by the user.
The user can then attach metadata to this world point to form a kind of
"virtual tag" indexed by location.
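Combining position, attitude, and range to obtain the world point is then a single ray computation; a hypothetical sketch:

```python
import math

def marked_point(position, bearing, elevation, range_m):
    """Walk range_m from the device position along the pointing direction
    given by bearing (about vertical) and elevation (above horizontal),
    yielding the XYZ world point the user indicated."""
    x, y, z = position
    horiz = range_m * math.cos(elevation)
    return (x + horiz * math.cos(bearing),
            y + horiz * math.sin(bearing),
            z + range_m * math.sin(elevation))

# Device at eye height, held level, target surface 4 m away:
tag_location = marked_point((1.0, 2.0, 1.5), bearing=0.0,
                            elevation=0.0, range_m=4.0)
assert tag_location == (5.0, 2.0, 1.5)

# Metadata keyed by this location forms a "virtual tag":
tags = {tag_location: "junction box behind drywall"}
```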
Research Plans for the Next Six Months
Over
the next six months, we plan to continue with each of the research efforts
above.
First,
we will continue to develop scalable computer-vision methods for model
capture. Our next goal is to
support generation of Atlases for even larger environments, and environments
that intermix indoor and outdoor elements. We will continue to develop new data structures and
rendering algorithms for viewing dense, registered imagery and extracted
three-dimensional geometry (typically point and edge features and
piecewise-planar models).
Second,
we will continue to develop scalable procedural CAD-based methods for model
generation. We are working with the MIT Department of Facilities to achieve more comprehensive parsing and
interpretation of their posted CAD data.
(At present we correctly process only a fraction of the available
floorplans.) We are also
integrating existing procedural algorithms for populating furniture based on
space type. We are developing
procedural population algorithms to fill in elevator shafts and cab assemblies,
stairwells, wheelchair ramps, and pedestrian bridges from icons on the source
CAD plans. Finally, we will
continue to integrate our network API to serve location-specific data (model
geometry, space name and type information, adjacency information) to mobile
hand-held computers. This API will
support location-aware applications such as route-finding, software marking
(including entry of as-built CAD information), resource discovery, and the
software flashlight for information overlay.
Finally,
we will continue to develop the Cricket and Software Compass architecture. We are actively engineering the first-
and second-generation Cricket Beacon and Listener hardware to improve its
accuracy, precision, channel efficiency and power usage. For example, the current algorithm to
detect the start of the ultrasound pulse is naïve, and could be greatly improved
with the addition of a simple transmission pattern and matched filter. The current uncertainty in detection
produces a rather large spatial uncertainty in the recovered position of the
listener, which we hope to reduce significantly. Also, under some circumstances the beacon circuitry can
drain a significant amount of power to ground, wasting battery life. We are actively examining these issues.
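The matched-filter idea can be illustrated by cross-correlating the received samples against the known transmission pattern (illustrative code, not the listener firmware):

```python
def detect_onset(samples, pattern):
    """Return the sample index where the known transmission pattern best
    matches the received signal (peak of the cross-correlation). A
    single-threshold detector fires on the first loud sample; the matched
    filter instead exploits the whole pattern shape."""
    best_idx, best_score = 0, float("-inf")
    for i in range(len(samples) - len(pattern) + 1):
        window = samples[i:i + len(pattern)]
        score = sum(s * p for s, p in zip(window, pattern))
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx

# A known +/- chip pattern, buried in a noisy received buffer:
pattern = [1, -1, 1, 1, -1]
received = [0.1, -0.2, 0.0, 0.9, -1.1, 1.0, 0.8, -0.9, 0.1, 0.0]
onset = detect_onset(received, pattern)  # 3: correlation peaks there
```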
We
have recruited a new UROP student to tackle the problem of developing a
reliable software compass using a single Cricket listener with multiple ultrasound
transducers. Our current prototype
is accurate to about five degrees of bearing. We hope to improve our design in two ways. First, to increase the bearing accuracy
to a fraction of a degree. Second,
to recover orientation information along multiple axes by using a 3D
arrangement of the ultrasound transducers. For example, placing four transducers at four vertices of a
cube would enable the listener to recover two, rather than one, orientation
degrees of freedom with respect to each sensed beacon.
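The underlying time-differential principle can be illustrated for a single transducer pair under a far-field assumption (names and values are ours, not the Cricket firmware):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at room temperature (approximate)

def angle_of_arrival(dt_s, baseline_m):
    """Far-field angle between the incoming wavefront and one transducer
    pair's baseline: the nearer transducer hears the pulse earlier by
    dt_s = baseline_m * cos(theta) / c."""
    cos_theta = SPEED_OF_SOUND * dt_s / baseline_m
    cos_theta = max(-1.0, min(1.0, cos_theta))  # clamp numerical noise
    return math.acos(cos_theta)

# Transducers 5 cm apart; a pulse arriving 72.9 microseconds earlier at
# one transducer corresponds to roughly 60 degrees off the baseline:
theta = angle_of_arrival(72.9e-6, 0.05)
assert abs(math.degrees(theta) - 60.0) < 0.1
```

Two such pairs along perpendicular baselines (as with transducers at cube vertices) would yield the two orientation degrees of freedom per beacon mentioned above.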
The
software flashlight prototype serves as a proof of concept, but its accuracy is
limited at present due to the uncertainty in the underlying Cricket listener
position algorithms. In parallel
with the above efforts, we will continue to develop calibration algorithms to
recover the optical parameters of the VGA projector, for example by aligning
projected geometry to equivalent fiducial geometry in the scene. In the next six months we hope to show
the software flashlight deployed throughout a large room (tens of meters on a
side), and able to faithfully project structural geometry: walls, edges, corners, support beams,
doors, and window frames. We
also hope to show a proof of concept of an "assisted deployment" method for
beacons, in which a few beacons are initially deployed and programmed by hand,
then additional beacons are deployed and semi-automatically discover their
position with the help of a human operator and software compass.