Discounting, Ergodicity, and Convergence for Markov Decision Processes.

Abstract

The convergence of Markov decision processes in horizon length is commonly associated with the discount rate $\alpha$. For example, the total cost function for a broad set of problems is known to converge $O(\alpha^n)$. It is, however, the relative cost function (total cost function modulo an additive constant) which determines policy convergence. Relative cost convergence in turn depends both on the discount factor and on ergodic properties of nonhomogeneous Markov chains. We show in particular that, for the stationary finite state space, compact action space Markov decision problem, the relative cost function converges $O((\alpha\lambda)^n)$, $0 < \lambda \le 1$. The proof is constructive and shows that $\lambda$ equals the argument of the subdominant eigenvalue of the optimal infinite horizon policy.
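
The two rates in the abstract can be illustrated numerically. The following is a minimal sketch, not taken from the report: the two-state, two-action cost-minimising problem, its costs, and its transition probabilities are all hypothetical, and $\lambda$ is read here as the magnitude of the subdominant eigenvalue of the optimal policy's transition matrix (an assumption). With these numbers the first action is optimal in both states, the optimal transition matrix has eigenvalues 1 and 0.3, and the discount factor is 0.9.

```python
ALPHA = 0.9  # discount factor (illustrative value, not from the report)

# Hypothetical 2-state, 2-action cost-minimising MDP.
# COST[s][a] is the one-step cost; P[s][a] is the next-state distribution.
COST = [[1.0, 5.0], [2.0, 6.0]]
P = [
    [[0.7, 0.3], [0.5, 0.5]],  # state 0: action 0, action 1
    [[0.4, 0.6], [0.5, 0.5]],  # state 1: action 0, action 1
]


def bellman_step(v):
    """One step of value iteration: v_{n+1}(s) = min_a [c(s,a) + alpha * E v_n]."""
    return [
        min(
            COST[s][a] + ALPHA * sum(p * x for p, x in zip(P[s][a], v))
            for a in range(2)
        )
        for s in range(2)
    ]


def value_iteration(n):
    """Return the list [v_0, v_1, ..., v_n], starting from v_0 = 0."""
    history = [[0.0, 0.0]]
    for _ in range(n):
        history.append(bellman_step(history[-1]))
    return history


def relative(v):
    """Relative cost: the cost of state 1 measured against reference state 0."""
    return v[1] - v[0]


history = value_iteration(2000)
v_star = history[-1]  # long-run iterate used as a proxy for the infinite-horizon cost

# Sup-norm error of the total cost and error of the relative cost, per horizon n.
total_err = [max(abs(x - y) for x, y in zip(v, v_star)) for v in history[:15]]
rel_err = [abs(relative(v) - relative(v_star)) for v in history[:15]]

# Successive error ratios estimate the geometric convergence rates.
total_ratio = total_err[10] / total_err[9]
rel_ratio = rel_err[10] / rel_err[9]

print("total cost error ratio:   ", total_ratio)
print("relative cost error ratio:", rel_ratio)
```

In this sketch the total cost error should shrink by a factor close to $\alpha = 0.9$ per step, while the relative cost error should shrink by a factor close to $\alpha\lambda = 0.9 \times 0.3 = 0.27$, which is why the optimal policy can settle long before the total cost does.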

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 1975
Accession Number: ADA009264

Entities

People

Thomas E. Morton
William E. Wecker

Organizations

Carnegie Mellon University
