Discounting, Ergodicity, and Convergence for Markov Decision Processes.
Abstract
The convergence of Markov decision processes in horizon length is commonly associated with the discount rate $\alpha$. For example, the total cost function for a broad set of problems is known to converge $O(\alpha^n)$. It is, however, the relative cost function (the total cost function modulo an additive constant) that determines policy convergence. Relative cost convergence in turn depends both on the discount factor and on ergodic properties of nonhomogeneous Markov chains. We show in particular that, for the stationary finite-state-space, compact-action-space Markov decision problem, the relative cost function converges $O((\alpha\lambda)^n)$, $0 < \lambda \le 1$. The proof is constructive and shows that $\lambda$ equals the argument of the subdominant eigenvalue of the optimal infinite horizon policy.
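The distinction between the two rates can be illustrated numerically. The sketch below is not taken from the report: it assumes a single fixed policy on a two-state chain, with a hypothetical transition matrix, cost vector, and discount factor, and it simply iterates the finite-horizon recursion. In this toy case the total cost error shrinks by a factor near alpha per step, while the relative cost error (the difference between the two states' costs) shrinks by alpha times the chain's second eigenvalue.

```python
# Illustrative sketch only: one fixed policy, two states, made-up numbers.
ALPHA = 0.95                      # discount factor (assumed)
P = [[0.9, 0.1], [0.3, 0.7]]      # transition matrix; its eigenvalues are 1 and 0.6
C = [1.0, 3.0]                    # per-stage costs (assumed)
LAMBDA = 0.6                      # subdominant eigenvalue of P


def step(v):
    """One horizon step of the recursion v_{n+1} = c + alpha * P v_n."""
    return [C[i] + ALPHA * sum(P[i][j] * v[j] for j in range(2))
            for i in range(2)]


def iterate(n):
    """n-stage total cost vector, starting from zero terminal cost."""
    v = [0.0, 0.0]
    for _ in range(n):
        v = step(v)
    return v


# Infinite-horizon costs, approximated by a very long horizon.
V_INF = iterate(2000)


def error_ratios(n):
    """Per-step shrink factor of the total and relative cost errors at horizon n."""
    a, b = iterate(n), iterate(n + 1)
    total = (b[0] - V_INF[0]) / (a[0] - V_INF[0])
    rel_inf = V_INF[0] - V_INF[1]
    relative = ((b[0] - b[1]) - rel_inf) / ((a[0] - a[1]) - rel_inf)
    return total, relative


if __name__ == "__main__":
    total_ratio, _ = error_ratios(50)
    _, relative_ratio = error_ratios(10)
    print("total cost error ratio:   ", total_ratio)
    print("relative cost error ratio:", relative_ratio)
    print("alpha * lambda:           ", ALPHA * LAMBDA)
```

The relative ratio is measured at a short horizon because the relative error falls to floating-point noise quickly. The report's result concerns the optimal policy of a full decision problem with a compact action space, which this fixed-policy sketch does not attempt to reproduce.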
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 1975
- Accession Number: ADA009264
Entities
People
- Thomas E. Morton
- William E. Wecker
Organizations
- Carnegie Mellon University