Generalizing Dijkstra's Algorithm and Gaussian Elimination for Solving MDPs

Abstract

The authors study the problem of computing the optimal value function for a Markov Decision Process (MDP) with positive costs. Computing this function quickly and accurately is a basic step in many schemes for deciding how to act in stochastic environments. There are efficient algorithms that compute value functions for special types of MDPs. For deterministic MDPs with S states and A actions, Dijkstra's algorithm runs in time O(AS log S). And, in single-action MDPs (Markov chains), standard linear-algebraic algorithms find the value function in time O(S sup 3), or faster by taking advantage of sparsity or good conditioning. Algorithms for solving general MDPs can take much longer: the authors are not aware of any speed guarantees better than those for comparably sized linear programs. They present a family of algorithms that reduce to Dijkstra's algorithm when applied to deterministic MDPs, and to standard techniques for solving linear equations when applied to Markov chains. More importantly, they demonstrate experimentally that these algorithms perform well when applied to MDPs that "almost" have the required special structure.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
May 01, 2005
Accession Number
ADA456811

Entities

People

  • Geoffrey J. Gordon
  • H. B. Mcmahan

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Autonomy
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Computations
  • Computer Science
  • Convergence
  • Elimination
  • Equations
  • Iterations
  • Linear Algebra
  • Linear Systems
  • Markov Chains
  • Monotone Functions
  • Motion Planning
  • Noise
  • Numerical Analysis
  • Probability
  • Robots
  • Trajectories

Fields of Study

  • Computer science

Readers

  • Mathematical Modeling and Probability Theory.
  • Operations Research