Markov Decision Problems with Expected Utility Criteria
Abstract
Finite state and action Markov decision problems with expected utility criteria are analyzed. A Markov decision chain (or sequential decision process) is defined in the usual manner. But instead of seeking to maximize the expected sum (or product) of rewards, the objective is maximization of the expectation of some cardinal utility function defined on the sequence of rewards. The standard notions and results of infinite horizon problems with additive utility are extended to the case of general utility. Appropriate optimality equations, conserving strategies, policy iteration and value iteration concepts are given. Analogues of negative, positive, convergent and transient dynamic programming are defined and the standard results proven; e.g., conserving strategies are optimal in the cases analogous to negative, convergent and transient dynamic programming. Because utility functions are not generally separable, optimal strategies must take into account more than just the current state. But generalizations of memoryless and stationary strategies are given, and conditions are established under which there exists an optimal memoryless/stationary strategy. The basic notion is summarized utility; rewards up to every decision epoch are summarized (with 'sufficient' richness) in a single number.
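The role of a summary can be illustrated with a minimal sketch that is not taken from the report. It assumes a finite horizon (the report treats the infinite-horizon case), uses the running total of rewards as the summary, and applies a hypothetical threshold utility to the final total. The chain, its rewards, and every name below (`TRANSITIONS`, `utility`, `value`, `q_value`, `best_action`) are illustrative assumptions. Backward induction runs over the pair (state, summary) rather than over the state alone.

```python
from functools import lru_cache

# Hypothetical two-state, two-action chain; all numbers are illustrative.
# TRANSITIONS[state][action] = list of (probability, next_state, reward)
TRANSITIONS = {
    0: {
        "safe": [(1.0, 0, 1.0)],
        "risky": [(0.5, 1, 4.0), (0.5, 0, -1.0)],
    },
    1: {
        "safe": [(1.0, 1, 1.0)],
        "risky": [(0.5, 0, 4.0), (0.5, 1, -1.0)],
    },
}


def utility(total):
    """Assumed non-separable utility: 1 if the total reward reaches 3, else 0."""
    return 1.0 if total >= 3.0 else 0.0


@lru_cache(maxsize=None)
def value(state, summary, steps_left):
    """Optimal expected utility from the augmented state (state, summary)."""
    if steps_left == 0:
        return utility(summary)
    return max(
        q_value(state, summary, steps_left, action)
        for action in TRANSITIONS[state]
    )


def q_value(state, summary, steps_left, action):
    """Expected utility of taking `action` now and acting optimally afterwards."""
    return sum(
        prob * value(next_state, summary + reward, steps_left - 1)
        for prob, next_state, reward in TRANSITIONS[state][action]
    )


def best_action(state, summary, steps_left):
    """Action maximizing expected utility in the augmented state."""
    return max(
        TRANSITIONS[state],
        key=lambda action: q_value(state, summary, steps_left, action),
    )


if __name__ == "__main__":
    # Same chain state, one step left, different reward histories.
    print(best_action(0, 2.0, 1))
    print(best_action(0, 0.0, 1))
```

In this sketch the preferred action in state 0 with one step left depends on the accumulated reward: a total of 2.0 favors the sure reward, while a total of 0.0 favors the gamble. A strategy that looks only at the current state cannot express both choices, but one that looks at the state together with the single summary number can.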
Document Details
- Document Type: Technical Report
- Publication Date: Jun 01, 1975
- Accession Number: ADA016235
Entities
People
- David M. Kreps
Organizations
- Stanford University