PERTURBATION THEORY AND MARKOVIAN DECISION PROCESSES.

Abstract

The Howard-Jewell algorithm for programming over a Markov-renewal process is analyzed in terms of a perturbation theory formalism which describes how the stationary distribution changes when the transition probabilities change. The policy improvement technique is derived from this new viewpoint. The relative values may be interpreted as partial derivatives of the gain rate with respect to policy. The value equations are shown to be solvable, with the relative values unique up to one additive constant, if and only if the underlying Markov chain is irreducible. The policy iteration algorithm is shown not to cycle, this guaranteeing convergence. A discussion of the existence, uniqueness, and characterization of the solution to the functional equation of dynamic programming is given. Emphasis is placed upon the value maximization of transient states. The fundamental matrix is developed as a useful tool for doing perturbation theory, describing firstpassage properties of semi-Markov processes, and for dealing with semi-Markov processes with rewards. (Author)

Document Details

Document Type: Technical Report
Publication Date: Jun 01, 1965
Accession Number: AD0618406

Entities

People

Paul J. Schweitzer

Organizations

Massachusetts Institute of Technology

PERTURBATION THEORY AND MARKOVIAN DECISION PROCESSES.

Abstract

Document Details

Entities

People

Organizations

Tags

Communities of Interest

DTIC Thesaurus Topics

Fields of Study

Readers