Markov Decision Problems with Expected Utility Criteria

Abstract

Finite state and action Markov decision problems with expected utility criteria are analyzed. A Markov decision chain (or sequential decision process) is defined in the usual manner, but instead of maximizing the expected sum (or product) of rewards, the objective is to maximize the expectation of a cardinal utility function defined on the sequence of rewards. The standard notions and results of infinite horizon problems with additive utility are extended to the case of general utility. Appropriate optimality equations, conserving strategies, and policy iteration and value iteration concepts are given. Analogues of negative, positive, convergent and transient dynamic programming are defined and the standard results are proven; e.g., conserving strategies are optimal in the cases analogous to negative, convergent and transient dynamic programming. Because utility functions are not generally separable, optimal strategies must take into account more than just the current state. Generalizations of memoryless and stationary strategies are nevertheless given, and conditions are established under which an optimal memoryless or stationary strategy exists. The basic notion is summarized utility: the rewards received up to each decision epoch are summarized (with 'sufficient' richness) in a single number.
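
The idea of carrying a one-number summary of past rewards alongside the current state can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the report: the toy model `MODEL`, the two-epoch `HORIZON` (the report treats infinite horizon problems), the target-style `utility`, and the choice of the running reward total as the summary. Under those assumptions, backward induction over (epoch, state, summary) gives the optimal expected utility, and the best action in the same state changes with the summary.

```python
from functools import lru_cache

# Hypothetical toy model (not from the report): one state, two actions.
# MODEL[state][action] is a list of (probability, next_state, reward).
MODEL = {
    0: {
        "safe": [(1.0, 0, 1)],
        "risky": [(0.5, 0, 0), (0.5, 0, 2)],
    }
}
HORIZON = 2  # assumed finite horizon, chosen only to keep the sketch short


def utility(total):
    """Assumed non-separable utility: 1 if the reward total reaches 3, else 0."""
    return 1.0 if total >= 3 else 0.0


def action_value(t, state, summary, action):
    """Expected utility of taking `action` now and acting optimally afterwards."""
    return sum(
        p * value(t + 1, nxt, summary + r) for p, nxt, r in MODEL[state][action]
    )


@lru_cache(maxsize=None)
def value(t, state, summary):
    """Optimal expected utility from epoch t, given state and reward summary."""
    if t == HORIZON:
        return utility(summary)
    return max(action_value(t, state, summary, a) for a in MODEL[state])


def best_action(t, state, summary):
    """An action attaining the maximum at (t, state, summary)."""
    return max(MODEL[state], key=lambda a: action_value(t, state, summary, a))


print(value(0, 0, 0))
print(best_action(1, 0, 1), best_action(1, 0, 2))
```

In this toy problem the decision maker at the second epoch prefers the risky action when the accumulated reward is 1 and the safe action when it is 2, although the state is the same in both cases. A strategy that looks only at the current state cannot represent that choice, whereas one that also looks at the summary can.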

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 1975
Accession Number
ADA016235

Entities

People

  • David M. Kreps

Organizations

  • Stanford University

Tags

Communities of Interest

  • Human Systems

DTIC Thesaurus Topics

  • Additives (Chemicals)
  • Analogs
  • Computer Programming
  • Dynamic Programming
  • Equations
  • Iterations
  • Mathematics
  • Sequences
  • Standards
  • Stationary

Readers

  • Mathematical Modeling and Probability Theory
  • Operations Research