Partially Observable Markov Decision Processes over an Infinite Planning Horizon with Discounting
Abstract
This is the last in a series of technical reports concerned with mathematical approaches to instructional sequence optimization in instructional systems. This report deals with Markov decision processes in which the true state of the system is not known with certainty. Hence, the state of the system is characterized by a probability vector. Each action yields an expected reward, transforms the system to a new state, and yields an observable outcome. One wishes to determine an action for each probability state vector so as to maximize the total expected reward. This report treats the infinite time horizon with a discount factor, using a partial N-dimensional Maclaurin series to approximate the total optimal reward as a function of the probability state vector.
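The probability state vector described in the abstract can be illustrated with a minimal sketch of the standard Bayes update for a partially observable Markov decision process. This is not taken from the report: the function names, the array layout, and the two-state numbers below are all hypothetical, and the sketch assumes the observable outcome depends on the new state and the action.

```python
def belief_update(belief, transition, observation, action, outcome):
    """Return the updated probability state vector and the outcome's probability.

    Assumed (hypothetical) layout:
      transition[action][i][j]  = P(new state j | state i, action)
      observation[action][j][o] = P(outcome o | new state j, action)
    """
    n = len(belief)
    # Weight each possible new state by how well it explains the observed outcome.
    unnormalized = [
        observation[action][j][outcome]
        * sum(belief[i] * transition[action][i][j] for i in range(n))
        for j in range(n)
    ]
    likelihood = sum(unnormalized)  # P(outcome | belief, action)
    if likelihood == 0.0:
        raise ValueError("outcome has zero probability under this belief and action")
    return [u / likelihood for u in unnormalized], likelihood


def expected_reward(belief, reward, action):
    """Expected one-step reward, with reward[action][i] the reward in state i."""
    return sum(b * r for b, r in zip(belief, reward[action]))


# Hypothetical two-state, single-action example.
transition = [[[0.9, 0.1], [0.2, 0.8]]]
observation = [[[0.8, 0.2], [0.3, 0.7]]]
reward = [[1.0, 0.0]]
belief = [0.5, 0.5]

posterior, likelihood = belief_update(belief, transition, observation, action=0, outcome=0)
immediate = expected_reward(belief, reward, action=0)
```

In the discounted infinite-horizon setting, a policy maps each such vector to an action; the sketch covers only the one-step update and reward, not the series approximation of the optimal reward that the report develops.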
Document Details
- Document Type: Technical Report
- Publication Date: Mar 01, 1976
- Accession Number: ADA025077
Entities
People
- Richard D. Wollmer
Organizations
- University of Southern California