Learning Rich, Tractable Models of the Real World


Progress Report: January 1, 2001–June 30, 2001

Leslie Pack Kaelbling



Project Overview

From a robot's perspective, the everyday world of a household or a city street is exceedingly complex and dynamic. To operate effectively in such domains, robots must learn models of how the world works and use them to predict the effects of their actions. In traditional AI, such models were represented in first-order logic and related languages. These models could not represent the inherent uncertainty in the world, and they were not connected to real perceptual systems. More recent AI techniques can learn models directly from perceptual data, but they are representationally impoverished: they cannot refer to objects as such, or make relational generalizations of the form "If object A is on object B, then if I move object B, object A will probably move too."
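To make the contrast concrete, here is a minimal Python sketch of the relational rule quoted above. The rule is written once over the variables A and B and applied to whatever objects happen to be present. The function name `predict_moves`, the encoding of `on` as (above, below) pairs, and the probability `P_CARRIED` are illustrative assumptions, not part of the project's systems. The sketch covers only objects resting directly on the moved one.

```python
from typing import Dict, Set, Tuple

# Illustrative placeholder for "probably"; an assumption, not a learned value.
P_CARRIED = 0.9


def predict_moves(on: Set[Tuple[str, str]], moved: str) -> Dict[str, float]:
    """Hypothetical relational rule: on(A, B) and move(B) -> A probably moves.

    `on` holds pairs (a, b) meaning "a is on b". Because the rule is stated
    over variables, it applies to any objects, including ones never named
    during learning. Only objects directly on `moved` are covered.
    """
    predictions = {moved: 1.0}  # the moved object itself certainly moves
    for a, b in on:
        if b == moved:  # A is on B, and B is being moved
            predictions[a] = P_CARRIED
    return predictions


if __name__ == "__main__":
    scene = {("cup", "tray"), ("tray", "table")}
    print(predict_moves(scene, "tray"))
```

A purely propositional learner would instead need a separate feature such as "cup-on-tray" for each pair of named objects. Anything it learned about one pair would not carry over to another pair.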

We are engaged in developing learning algorithms that can represent and acquire such information from interaction with a noisy real world.


Progress Through June 2001

In September 2000, three new members joined the project: a postdoctoral researcher, Tim Oates, and two research assistants, Natalia Hernandez and Sarah Finney. We spent the first three months surveying the literature and developing a concrete research agenda.

In January, we began a set of experiments exploring the use of deictic propositional representations in a simple blocks-world environment. The fundamental idea is that objects should be described in terms of their relation to the agent, such as "the block I am looking at", rather than with arbitrary names such as "block 23". A few earlier studies had examined learning with deictic representations. However, they were small, and their representations appeared to have been carefully hand-tuned to the domain. We wanted to explore the use of general reinforcement-learning techniques in these domains without hand-crafting the representations. We had two goals. The first was to compare a propositional deictic representation with a more traditional propositional representation. The second was to understand how different reinforcement-learning methods, in particular neuro-dynamic programming and U-Tree, interact with deictic representations.
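The difference between the two representations can be illustrated with a small Python sketch. It assumes a hypothetical blocks-world state given as a list of stacks, with the bottom block first. It also assumes block colors stored in a dictionary, and an attention marker `focus` given as (stack index, height). The particular features are illustrative only and are not the ones used in our experiments. Renaming the blocks changes the traditional propositional vector but leaves the deictic one unchanged. In addition, the length of the deictic vector does not depend on the number of blocks.

```python
from typing import Dict, List, Tuple

# Hypothetical state: a list of stacks, each listed bottom to top by block name.
State = List[List[str]]


def propositional_features(state: State, names: List[str]) -> Tuple[bool, ...]:
    """Traditional encoding: on(x, y) for every ordered pair of named blocks,
    plus on_table(x) for each block, so n blocks give n * n booleans."""
    on = set()
    on_table = set()
    for stack in state:
        on_table.add(stack[0])
        for below, above in zip(stack, stack[1:]):
            on.add((above, below))  # (x, y) means "x is on y"
    pairs = [(x, y) for x in names for y in names if x != y]
    return tuple(p in on for p in pairs) + tuple(x in on_table for x in names)


def deictic_features(state: State, focus: Tuple[int, int],
                     colors: Dict[str, str]) -> Tuple[str, bool, bool]:
    """Deictic encoding: describe only "the block I am looking at".
    `focus` is the (stack index, height) of the attended block."""
    s, h = focus
    stack = state[s]
    return (
        colors[stack[h]],    # color of the attended block
        h + 1 < len(stack),  # is another block on top of it?
        h == 0,              # is it sitting on the table?
    )


if __name__ == "__main__":
    s1, c1 = [["a", "b"], ["c"]], {"a": "red", "b": "blue", "c": "red"}
    s2, c2 = [["x", "y"], ["z"]], {"x": "red", "y": "blue", "z": "red"}
    names = ["a", "b", "c", "x", "y", "z"]
    # Same configuration, different block names:
    print(propositional_features(s1, names) == propositional_features(s2, names))  # False
    print(deictic_features(s1, (0, 0), c1) == deictic_features(s2, (0, 0), c2))    # True
```

A full deictic agent would also have actions that move its focus from block to block. Those actions are omitted here, so the sketch shows only the observation side of the representation.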

We made substantial progress during this period, but most of our results were negative: none of the techniques we tried worked well in the block-stacking domain. After the initial set of experiments, we ran many more experiments aimed at pinning down exactly why the initial ones failed, and we learned a great deal in the process. The details are described in a technical report; the major points are listed below:

Our main conclusion from this work is that it is important to identify the structure of the domain and of the way observations are generated, rather than to try to learn policies directly via reinforcement learning.


Research Plan for the Next Six Months

In the next six months (the final period of this project), we plan to investigate methods for learning forward models of the domain's dynamics, rather than trying to learn the reinforcement function directly. Because deictic observations reveal only part of the world, the learning problem is partially observable. If we can learn a forward model, we can use it to build a state estimator that tracks the underlying state from the history of actions and observations, and acting on that estimate renders the problem completely observable. This is a large undertaking, but we have the following concrete steps planned: