Learning and Using Models

Abstract

Unlike model-free reinforcement learning (RL) methods, which learn directly from experience in the domain, model-based methods learn a model of the domain's transition and reward functions on-line and plan a policy using this model. Once the method has learned an accurate model, it can plan an optimal policy on this model without any further experience in the world. Therefore, when model-based methods are able to learn a good model quickly, they frequently have better sample efficiency than model-free methods, which must continue taking actions in the world for values to propagate back to previous states. Another advantage of model-based methods is that they can use their models to plan multi-step exploration trajectories. In particular, many methods drive the agent to explore where there is uncertainty in the model, so as to learn the model as fast as possible. In this chapter, we survey some of the types of models used in model-based methods and ways of learning them, as well as methods for planning on these models. In addition, we examine the typical architectures for combining model learning and planning, which vary depending on whether the designer wants the algorithm to run on-line, in batch mode, or in real time. One of the main performance criteria for these algorithms is sample complexity, or how many actions the algorithm must take to learn. We examine the sample efficiency of a few methods, which depends heavily on having intelligent exploration mechanisms. We survey some approaches to solving the exploration problem, including Bayesian methods that maintain a belief distribution over possible models to explicitly measure uncertainty in the model. We show some empirical comparisons of various model-based and model-free methods on two example domains before concluding with a survey of current research on scaling these methods up to larger domains with improved sample and computational complexity.
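
The learn-a-model-then-plan loop described in the abstract can be illustrated with a minimal sketch, assuming a small tabular domain. Everything named here is hypothetical and not taken from the report: the `TabularModel` class, the toy `chain_step` domain, and the parameters `m`, `r_max`, and `gamma`. The model is a maximum-likelihood estimate from visit counts, planning is plain value iteration, and exploration is driven by treating unvisited state-actions optimistically, in the spirit of R-Max. It is one possible instance of the idea, not the chapter's method.

```python
import numpy as np


class TabularModel:
    """Count-based model of transitions and rewards (hypothetical helper)."""

    def __init__(self, n_states, n_actions, m=1, r_max=1.0):
        self.counts = np.zeros((n_states, n_actions, n_states))
        self.reward_sum = np.zeros((n_states, n_actions))
        self.m = m          # visits needed before a state-action counts as known
        self.r_max = r_max  # assumed upper bound on the one-step reward

    def update(self, s, a, r, s_next):
        """Record one real-world transition."""
        self.counts[s, a, s_next] += 1
        self.reward_sum[s, a] += r

    def plan(self, gamma=0.95, iters=200):
        """Value iteration on the learned model; returns a Q-table."""
        n = self.counts.sum(axis=2)           # visits per (state, action)
        known = n >= self.m
        safe_n = np.maximum(n, 1)             # avoid division by zero
        P = self.counts / safe_n[:, :, None]  # estimated transition probabilities
        R = self.reward_sum / safe_n          # estimated mean rewards
        optimistic = self.r_max / (1.0 - gamma)
        Q = np.zeros_like(R)
        for _ in range(iters):
            V = Q.max(axis=1)
            # Unknown state-actions get the best possible value, which
            # pulls the greedy policy toward unexplored parts of the domain.
            Q = np.where(known, R + gamma * (P @ V), optimistic)
        return Q


def chain_step(s, a, n_states=4):
    """Toy domain (assumption): action 1 moves right, action 0 moves left.
    Reward is 1 whenever the next state is the last state."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return r, s_next


def run(n_steps=60, n_states=4):
    """Act greedily on the optimistic model, updating it after every step."""
    model = TabularModel(n_states, n_actions=2)
    s = 0
    for _ in range(n_steps):
        Q = model.plan()
        a = int(Q[s].argmax())
        r, s_next = chain_step(s, a, n_states)
        model.update(s, a, r, s_next)
        s = s_next
    return model.plan()


if __name__ == "__main__":
    print(run().argmax(axis=1))  # greedy action for each state
```

Because the toy domain is deterministic, a single visit per state-action (`m=1`) is enough for an exact model; a stochastic domain would need a larger `m` or an explicit uncertainty estimate, such as the Bayesian belief over models mentioned in the abstract.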

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2011
Accession Number
AD1024588

Entities

People

  • Peter Stone
  • Todd Hester

Organizations

  • University of Texas at Austin

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Artificial Intelligence Software
  • Artificial Neural Networks
  • Bayesian Networks
  • Compressed Sensing
  • Computational Complexity
  • Computational Science
  • Computer Programming
  • Computer Programs
  • Computer Science
  • Computers
  • Dimensionality Reduction
  • Dynamic Programming
  • Gaussian Processes
  • Generative Models
  • Information Processing
  • Information Systems
  • Machine Learning
  • Models
  • Monte Carlo Method
  • Neural Networks
  • Probabilistic Models
  • Probability
  • Probability Distributions
  • Reinforcement Learning
  • Sampling
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Neural Network Machine Learning
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Learning Algorithms