Finite State Continuous Time Markov Decision Processes with a Finite Planning Horizon
Abstract
The system considered may be in one of n states at any point in time and its probability law is a Markov process which depends on the policy (control) chosen. The return to the system over a given planning horizon is the integral (over that horizon) of a return rate which depends on both the policy and the sample path of the process. The objective is to find a policy which maximizes the expected return over the given planning horizon. A necessary and sufficient condition for optimality is obtained, and a constructive proof is given that there is a piecewise constant policy which is optimal. A bound on the number of switches (points where the piecewise constant policy jumps) is obtained for the case where there are two states. (Author)
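The setup described in the abstract can be illustrated numerically. The following is a minimal sketch, not the report's constructive proof: it assumes the standard dynamic programming equation for this kind of problem, -dv_i/dt = max_a { r(i,a) + sum_j q(i,a,j) (v_j - v_i) } with v(T) = 0, and integrates it backward in time with a plain explicit Euler step. The function names, the data layout (`rates[a][i][j]`, `rewards[a][i]`), and the two-state example numbers are all hypothetical and chosen only for illustration.

```python
def solve_finite_horizon(rates, rewards, horizon, steps):
    """Backward explicit-Euler sketch of the finite-horizon optimality equation.

    rates[a][i][j]  : transition rate from state i to state j under action a
                      (assumed layout; diagonal entries are ignored)
    rewards[a][i]   : return rate earned in state i under action a
    Returns (v0, policy): v0[i] approximates the optimal expected return
    from state i at time 0, and policy[k][i] is the maximizing action in
    state i on the k-th time step (k = 0 is the step starting at time 0).
    """
    n = len(rewards[0])
    h = horizon / steps
    v = [0.0] * n  # terminal condition: no return is earned after the horizon
    policy = []
    for _ in range(steps):
        new_v, acts = [], []
        for i in range(n):
            best_val, best_a = None, None
            for a in range(len(rewards)):
                # return rate plus expected rate of change of value from jumps
                drift = rewards[a][i] + sum(
                    rates[a][i][j] * (v[j] - v[i]) for j in range(n) if j != i
                )
                if best_val is None or drift > best_val:
                    best_val, best_a = drift, a
            new_v.append(v[i] + h * best_val)
            acts.append(best_a)
        v = new_v
        policy.append(acts)
    policy.reverse()  # steps were computed from the horizon back to time 0
    return v, policy


def count_switches(policy, state):
    """Count the time steps at which the discretized policy changes in `state`."""
    return sum(
        1 for k in range(1, len(policy)) if policy[k][state] != policy[k - 1][state]
    )


# Hypothetical two-state example: state 0 is "working", state 1 is "failed".
# Action 0 is passive; action 1 pays an effort cost to speed up recovery.
example_rates = [
    [[0.0, 1.0], [0.5, 0.0]],  # action 0
    [[0.0, 1.0], [2.0, 0.0]],  # action 1
]
example_rewards = [
    [1.0, 0.0],   # action 0
    [0.5, -0.5],  # action 1
]

values, policy = solve_finite_horizon(example_rates, example_rewards, 5.0, 5000)
print("approximate values at time 0:", values)
print("switches in state 1:", count_switches(policy, 1))
```

In this example the discretized policy for the failed state uses the costly recovery action early in the horizon and drops it near the end, when too little time remains for the effort to pay off. The switch count reported here is a property of the numerical approximation only and should not be read as the bound obtained in the report.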
Document Details
- Document Type: Technical Report
- Publication Date: Apr 01, 1967
- Accession Number: AD0651467
Entities
People
- Bruce L. Miller
Organizations
- RAND Corporation