An Evolutionary Random Policy Search Algorithm for Solving Markov Decision Processes

Abstract

This paper presents a new randomized search method called Evolutionary Random Policy Search (ERPS) for solving infinite horizon discounted cost Markov Decision Process (MDP) problems. The algorithm is particularly targeted at problems with large or uncountable action spaces. ERPS approaches a given MDP by iteratively dividing it into a sequence of smaller, random, sub-MDP problems based on information obtained from random sampling of the entire action space and local search. Each sub-MDP is then solved approximately by using a variant of the standard policy improvement technique, where an elite policy is obtained. We show that the sequence of elite policies converges to an optimal policy with probability one. An adaptive version of the algorithm that improves the efficiency of the search process while maintaining the convergence properties of ERPS is also proposed. Some numerical studies are carried out to illustrate the algorithm and compare it with existing procedures.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2004
Accession Number
AD1005567

Entities

People

  • Jiaqiao Hu
  • Michael C. Fu
  • Steven I Marcus
  • Vahid R. Ramezani

Organizations

  • University of Maryland

Tags

Communities of Interest

  • C4I
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Computational Complexity
  • Computer Programming
  • Computer Science
  • Dynamic Programming
  • Engineering
  • Genetic Algorithms
  • Linear Programming
  • Mathematical Programming
  • Operations Research
  • Optimization
  • Probability
  • Probability Distributions
  • Random Variables
  • Statistical Sampling
  • Theorems

Fields of Study

  • Computer science

Readers

  • Groundwater Contamination Remediation.
  • Mathematical Modeling and Probability Theory.
  • Operations Research

Technology Areas

  • Space