Partially Observable Markov Decision Processes over an Infinite Planning Horizon with Discounting

Abstract

This is the last in a series of technical reports on mathematical approaches to optimizing instructional sequences in instructional systems. This paper deals with Markov decision processes in which the true state of the system is not known with certainty, so the state of the system is characterized by a probability vector. Each action yields an expected reward, transforms the system to a new state, and yields an observable outcome. One wishes to determine an action for each probability state vector so as to maximize the total expected reward. This report treats the infinite time horizon with a discount factor, using a partial N-dimensional Maclaurin series to approximate the total optimal reward as a function of the probability state vector.
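
As an illustration of the setting the abstract describes, the sketch below shows how a probability state vector might be revised after an action is taken and an outcome is observed, using the standard Bayes update for partially observable Markov decision processes. It is not taken from the report: the function names, the nested-list layout of `transition`, `observation`, and `reward`, and the numbers in the usage note are all assumptions made for this example, and the report's Maclaurin series approximation is not reproduced here.

```python
def belief_update(belief, transition, observation, action, outcome):
    """Bayes update of a probability state vector after one action and outcome.

    Assumed (hypothetical) layouts, not taken from the report:
      transition[a][s][t]  = P(next state t | current state s, action a)
      observation[a][t][o] = P(outcome o | next state t, action a)
    """
    n = len(belief)
    # Predict: push the current belief through the transition model.
    predicted = [
        sum(belief[s] * transition[action][s][t] for s in range(n))
        for t in range(n)
    ]
    # Correct: weight each predicted state by the likelihood of the outcome.
    unnormalized = [
        predicted[t] * observation[action][t][outcome] for t in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("outcome has zero probability under this belief")
    # Normalize so the result is again a probability vector.
    return [u / total for u in unnormalized]


def expected_reward(belief, reward, action):
    """Expected immediate reward of an action under the current belief.

    Assumed (hypothetical) layout: reward[a][s] = reward for action a in state s.
    """
    return sum(b * reward[action][s] for s, b in enumerate(belief))
```

A hypothetical two-state usage: with `transition = [[[1.0, 0.0], [0.0, 1.0]]]` and `observation = [[[0.8, 0.2], [0.2, 0.8]]]`, calling `belief_update([0.5, 0.5], transition, observation, 0, 0)` shifts the belief toward the first state, since outcome 0 is more likely there.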

Document Details

Document Type: Technical Report
Publication Date: Mar 01, 1976
Accession Number: ADA025077

Entities

People

Richard D. Wollmer

Organizations

University of Southern California
