Discounting, Ergodicity, and Convergence for Markov Decision Processes

Abstract

The convergence of Markov decision processes in horizon length is commonly associated with the discount rate $\alpha$. For example, the total cost function for a broad set of problems is known to converge as $O(\alpha^n)$. It is, however, the relative cost function (the total cost function modulo an additive constant) that determines policy convergence. Relative cost convergence in turn depends both on the discount factor and on ergodic properties of nonhomogeneous Markov chains. We show in particular that, for the stationary, finite state space, compact action space Markov decision problem, the relative cost function converges as $O((\alpha\lambda)^n)$, $0 < \lambda \le 1$. The proof is constructive and shows that $\lambda$ equals the argument of the subdominant eigenvalue of the optimal infinite horizon policy.
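The distinction between total cost and relative cost convergence can be illustrated numerically. The following is a minimal sketch, not the report's construction: the two-state, two-action problem, its costs, and its transition probabilities are invented for illustration, and the names `ALPHA`, `COST`, `P`, `bellman`, and `convergence_trace` are hypothetical. It runs value iteration from a zero terminal cost and records, for each successive difference between horizon lengths, both its sup norm (which tracks total cost) and its span, the maximum minus the minimum component (which ignores additive constants and so tracks relative cost).

```python
# Illustrative 2-state, 2-action discounted MDP (cost minimization).
# All numbers below are assumptions made up for this sketch.
ALPHA = 0.9                                # discount factor
COST = [[1.0, 2.0], [3.0, 2.5]]            # COST[s][a]
P = [[[0.7, 0.3], [0.4, 0.6]],             # P[s][a][s_next]
     [[0.5, 0.5], [0.3, 0.7]]]


def bellman(v):
    """Apply one step of value iteration to the cost vector v."""
    return [
        min(
            COST[s][a] + ALPHA * sum(p * vj for p, vj in zip(P[s][a], v))
            for a in range(len(COST[s]))
        )
        for s in range(len(COST))
    ]


def convergence_trace(n_steps):
    """Return (sup_norms, spans) of the differences v_{n+1} - v_n.

    The sup norm measures total cost convergence; the span
    (max component minus min component) is unchanged by adding a
    constant to every state, so it measures relative cost convergence.
    """
    v = [0.0] * len(COST)
    sup_norms, spans = [], []
    for _ in range(n_steps):
        v_next = bellman(v)
        diff = [b - a for a, b in zip(v, v_next)]
        sup_norms.append(max(abs(d) for d in diff))
        spans.append(max(diff) - min(diff))
        v = v_next
    return sup_norms, spans


if __name__ == "__main__":
    sup_norms, spans = convergence_trace(15)
    for n, (d, sp) in enumerate(zip(sup_norms, spans)):
        print(n, d, sp)
```

In this example the sup norm shrinks by a factor of roughly $\alpha$ per step, while the span shrinks noticeably faster, consistent with a rate of the form $(\alpha\lambda)^n$ with $\lambda < 1$. The sketch does not compute $\lambda$ from the eigenvalues of the optimal policy; it only displays the two rates side by side.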

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1975
Accession Number
ADA009264

Entities

People

  • Thomas E. Morton
  • William E. Wecker

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Additives (Chemicals)
  • Commerce
  • Convergence
  • Cooperation
  • Eigenvalues
  • Ergodic Processes
  • Markov Chains
  • Mathematics
  • Military Research
  • Probability
  • Stationary
  • Stochastic Processes

Readers

  • Analytical Mechanics
  • Life Cycle Cost Analysis
  • Mathematical Modeling and Probability Theory

Technology Areas

  • Space