Markov Decision Problems with Expected Utility Criteria.

Abstract

Finite state and action Markov decision problems with expected utility criteria are analyzed. A Markov decision chain (or sequential decision process) is defined in the usual manner. Instead of maximizing the expected sum (or product) of rewards, however, the objective is to maximize the expectation of a cardinal utility function defined on the sequence of rewards. The standard notions and results of infinite horizon problems with additive utility are extended to the case of general utility. Appropriate optimality equations, conserving strategies, policy iteration and value iteration concepts are given. Analogues of negative, positive, convergent and transient dynamic programming are defined and the standard results are proven; e.g., conserving strategies are optimal in the cases analogous to negative, convergent and transient dynamic programming. Because utility functions are not generally separable, optimal strategies must take into account more than just the current state. Generalizations of memoryless and stationary strategies are nevertheless given, and conditions are established under which an optimal memoryless/stationary strategy exists. The basic notion is summarized utility: rewards up to every decision epoch are summarized (with 'sufficient' richness) in a single number.
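The idea of carrying a one-number summary of past rewards can be illustrated with a small sketch. It is not the report's construction: the toy chain `P`, the finite `HORIZON`, the choice of the running reward total as the summary, and the target-style `utility` function are all assumptions made for illustration (the report itself treats infinite horizon problems). The sketch runs backward induction over the pair (state, summary) and shows that the best action can depend on the summary even when the state is unchanged.

```python
from functools import lru_cache

# Hypothetical toy chain (an assumption, not from the report):
# P[state][action] is a list of (probability, next_state, reward).
P = {
    0: {
        "safe": [(1.0, 0, 1)],
        "risky": [(0.5, 0, 3), (0.5, 0, -1)],
    },
}
HORIZON = 2  # number of decision epochs (assumed finite for simplicity)


def utility(total):
    # Assumed non-separable utility: 1 if the total reward reaches 3, else 0.
    return 1.0 if total >= 3 else 0.0


@lru_cache(maxsize=None)
def value(t, state, summary):
    """Return (best expected utility, best action) from epoch t onward,
    given the current state and the summarized rewards so far."""
    if t == HORIZON:
        return utility(summary), None
    best = None
    for action, outcomes in P[state].items():
        expected = sum(
            p * value(t + 1, nxt, summary + r)[0] for p, nxt, r in outcomes
        )
        # Ties are broken in favor of the action listed first.
        if best is None or expected > best[0]:
            best = (expected, action)
    return best


# Same state, different summaries, different best actions.
print(value(1, 0, 3))  # target already met: protect it
print(value(1, 0, 0))  # target not yet met: gamble for it
```

In this toy case the strategy that looks only at the current state cannot be optimal at both summaries, which is the reason the abstract gives for replacing the state with a richer description before looking for memoryless or stationary strategies.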

Document Details

Document Type: Technical Report
Publication Date: Jun 01, 1975
Accession Number: ADA016235

Entities

People

David M. Kreps

Organizations

Stanford University
