DISCRETE DYNAMIC PROGRAMMING WITH SENSITIVE DISCOUNT OPTIMALITY CRITERIA.

Abstract

Stationary, discrete and continuous time parameter, finite state and action Markovian decision processes with substochastic transition matrices are studied where the future rewards are discounted. The methods of successive approximations and policy improvement are shown to converge to the maximal expected discounted reward when the interst rate is negative provided the expected discounted time until the process stops is finite under every stationary policy. A family of discount optimality criteria of varying degrees of sensitivity is introduced. The family includes and unifies the discount criteria studied by Blackwell. The existence of stationary optimal policies is established constructively and the class of such policies is characterized. The algorithms are essentially specializations and refinements of a policy improvement method recently devised by Miller and the author for finding stationary policies having maximal expected discounted reward for all small enough positive interest rates. The results exploit the fact that the Laurent expansion in rho (the interest rate) about the origin of the discounted return under a stationary policy has a simple form. (Author)

Document Details

Document Type
Technical Report
Publication Date
Aug 30, 1968
Accession Number
AD0673965

Entities

People

  • Arthur F. Veinott Jr.

Organizations

  • Stanford University

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Computer Programming
  • Dynamic Programming
  • Evolutionary Algorithms
  • Heuristic Methods
  • Mathematics
  • Sensitivity
  • Specialization
  • Stationary
  • Transitions

Readers

  • Mathematical Modeling and Probability Theory.
  • Operations Research