PERTURBATION THEORY AND MARKOVIAN DECISION PROCESSES.

Abstract

The Howard-Jewell algorithm for programming over a Markov-renewal process is analyzed in terms of a perturbation theory formalism which describes how the stationary distribution changes when the transition probabilities change. The policy improvement technique is derived from this new viewpoint. The relative values may be interpreted as partial derivatives of the gain rate with respect to policy. The value equations are shown to be solvable, with the relative values unique up to one additive constant, if and only if the underlying Markov chain is irreducible. The policy iteration algorithm is shown not to cycle, this guaranteeing convergence. A discussion of the existence, uniqueness, and characterization of the solution to the functional equation of dynamic programming is given. Emphasis is placed upon the value maximization of transient states. The fundamental matrix is developed as a useful tool for doing perturbation theory, describing firstpassage properties of semi-Markov processes, and for dealing with semi-Markov processes with rewards. (Author)

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 1965
Accession Number
AD0618406

Entities

People

  • Paul J. Schweitzer

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Complex Systems
  • Computer Programming
  • Dynamic Programming
  • Equations
  • Markov Chains
  • Markov Processes
  • Perturbation Theory
  • Perturbations
  • Probability

Fields of Study

  • Mathematics

Readers

  • Calculus or Mathematical Analysis
  • Mathematical Modeling and Probability Theory.
  • Operations Research