ALTERNATIVE SCHEMES FOR INVESTIGATING MARKOV DECISION PROCESSES.

Abstract

The report is concerned with the problem of finding optimal policies for infinite-horizon Markovian decision models. Several algorithms applicable to discrete time, discrete state, single-chain Markov processes with undiscounted rewards are described. One of these algorithms, called 'the iterative method with limits,' is proved and its properties are discussed. The method consists of finding upper and lower limits for the optimal gain per transition of the infinite-horizon process. These limits improve on each iteration of the algorithm, and converge to the final value of the optimal gain, ultimately. Simultaneously, the algorithm produces the 'relative value' variables, as well as the optimal policy. The computational efficiency of the aforementioned algorithms was compared by solving several test problems. It was found that for problems for which the number of alternatives per state is considerably less than the number of states, the iterative method with limits enjoys a serious speed advantage over the well-known policy iteration algorithms. The former method also produces very accurate results with very little programming effort. (Author)

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 1967
Accession Number
AD0655893

Entities

People

  • Amedeo R. Odoni

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Computer Programming
  • Efficiency
  • Iterations
  • Markov Processes
  • Mathematics
  • Transitions

Readers

  • Mathematical Modeling and Probability Theory.
  • Systems Analysis and Design