An Adaptive Sampling Algorithm for Solving Markov Decision Processes

Abstract

Based on recent results for multi-armed bandit problems, the authors propose an adaptive sampling algorithm that approximates the optimal value of a finite horizon Markov decision process (MDP) with infinite state space but finite action space and bounded rewards. The algorithm adaptively chooses which action to sample as the sampling process proceeds, and it is proven that the estimate produced by the algorithm is asymptotically unbiased and the worst possible bias is bounded by a quantity that converges to zero at rate of Order of H(lnN)N, where H is the horizon length and N is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is Order of ((Abs. Val. A)N) to the H power, independent of the state space size, where Abs. Val. A is the size of the action space. The algorithm can be used to create an approximate receding horizon control to solve infinite horizon MDPs.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
May 31, 2002
Accession Number
ADA438505

Entities

People

  • Hyeong S. Chang
  • Michael C. Fu
  • Steven I Marcus

Organizations

  • University of Maryland

Tags

Communities of Interest

  • Autonomy
  • C4I
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Air Force
  • Algorithms
  • Applied Mathematics
  • Collecting Methods
  • Computational Complexity
  • Engineering
  • Equations
  • Inequalities
  • Machine Learning
  • Markov Chains
  • Mathematics
  • Probability
  • Probability Distributions
  • Random Variables
  • Sampling
  • Stochastic Processes
  • Theorems

Fields of Study

  • Computer science

Readers

  • Adaptive Control and Estimation with Uncertainty in Dynamic Systems.
  • Analytical Mechanics
  • Mathematical Modeling and Probability Theory.

Technology Areas

  • Space