Some Results in Dynamic Programming

Abstract

The first part of the report discusses a dynamic programming model in which all rewards obtained by the decision maker are assumed nonnegative. The decision maker's objective is to successively choose actions so as to maximize his expected reward earned over an infinite time span. It follows from known results that the decision maker's choice need only depend upon the outcome of a randomization that depends on the model only through the state of the model and the time when the choice is made. It is shown by counterexample that this is basically the smallest class of decision rules that need be considered. Conditions under which a stationary policy is optimal are also presented. The second part of the report discusses the same model under a new criteria, namely, the average cost incurred per unit time. An example is presented in which there does not exist an epsilon-optimal randomized stationary policy.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Apr 01, 1971
Accession Number
AD0722347

Entities

People

  • Sheldon M. Ross

Organizations

  • University of California, Berkeley

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Applied Mathematics
  • California
  • Computer Programming
  • Dynamic Programming
  • Engineering
  • Industrial Engineering
  • Interdisciplinary Science
  • Military Research
  • Operations Research
  • Probability
  • Random Variables
  • Stationary
  • Transitions
  • United States
  • United States Government
  • Universities

Fields of Study

  • Computer science

Readers

  • Economics
  • Mathematical Modeling and Probability Theory.