Partially Observable Markov Decision Processes over an Infinite Planning Horizon with Discounting

Abstract

This is the last in a series of technical reports on mathematical approaches to optimizing instructional sequences in instructional systems. This paper deals with Markov decision processes in which the true state of the system is not known with certainty; hence the state of the system is characterized by a probability vector over the possible states. Each action yields an expected reward, transforms the system to a new state, and yields an observable outcome. The goal is to determine an action for each probability state vector so as to maximize the total expected discounted reward. This report treats the infinite time horizon with a discount factor, using a partial N-dimensional Maclaurin series to approximate the total optimal reward as a function of the probability state vector.
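The abstract's ingredients (a probability state vector, an expected reward per action, a state transition, and an observable outcome) can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not the report's method: the two-state "teach/test" model, the arrays P, Q and R, the discount BETA, and the functions update and value are all hypothetical. update applies Bayes' rule to produce the new probability state vector after an action and its observed outcome. value computes a depth-limited discounted Bellman lookahead, a standard finite-horizon stand-in for the infinite-horizon optimal reward; it is not the partial Maclaurin-series approximation the report uses.

```python
"""Belief-state bookkeeping for a small POMDP (hypothetical two-state example).

Sketch only: this is NOT the report's Maclaurin-series method. It shows the
Bayesian update of the probability state vector and a depth-limited
discounted Bellman lookahead over that vector.
"""

from typing import List, Tuple

Belief = List[float]

# Hypothetical model (illustrative numbers, not taken from the report):
# P[a][i][j]: probability of moving from state i to state j under action a
# Q[a][j][k]: probability of observing outcome k when action a lands in state j
# R[a][i]:    expected immediate reward of action a taken in state i
P = [
    [[0.6, 0.4], [0.0, 1.0]],  # action 0 ("teach"): may move state 0 -> 1
    [[1.0, 0.0], [0.0, 1.0]],  # action 1 ("test"): leaves the state unchanged
]
Q = [
    [[0.5, 0.5], [0.5, 0.5]],  # teaching reveals nothing about the state
    [[0.8, 0.2], [0.1, 0.9]],  # testing: outcome 1 is likelier in state 1
]
R = [
    [-1.0, -1.0],
    [-0.5, 2.0],
]
BETA = 0.9  # discount factor (hypothetical value)


def update(b: Belief, a: int, k: int) -> Tuple[float, Belief]:
    """Return (Pr(k | b, a), posterior belief after action a and outcome k)."""
    n = len(b)
    # Unnormalized posterior: predicted state probability times outcome likelihood.
    unnorm = [
        sum(b[i] * P[a][i][j] for i in range(n)) * Q[a][j][k]
        for j in range(n)
    ]
    prob_k = sum(unnorm)
    if prob_k == 0.0:
        return 0.0, list(b)  # outcome impossible; posterior is irrelevant
    return prob_k, [u / prob_k for u in unnorm]


def value(b: Belief, depth: int) -> float:
    """Depth-limited approximation of the optimal discounted reward V(b)."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in range(len(P)):
        # Expected immediate reward of action a under belief b.
        immediate = sum(bi * ri for bi, ri in zip(b, R[a]))
        # Expected discounted continuation value over possible outcomes.
        future = 0.0
        for k in range(len(Q[a][0])):
            prob_k, b_next = update(b, a, k)
            if prob_k > 0.0:
                future += prob_k * value(b_next, depth - 1)
        best = max(best, immediate + BETA * future)
    return best


if __name__ == "__main__":
    prob, posterior = update([0.5, 0.5], 1, 1)
    print(prob, posterior)
    print(value([0.5, 0.5], 6))
```

Because the dynamic-programming operator is a contraction with modulus β, the depth-d value differs from the infinite-horizon optimum by at most β^d · max|R| / (1 - β). The cost of the lookahead, however, grows exponentially with depth (actions × outcomes per level), which is one motivation for compact functional approximations of the optimal reward over the probability state vector, such as the series expansion the report develops.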


Document Details

Document Type
Technical Report
Publication Date
Mar 01, 1976
Accession Number
ADA025077

Entities

People

  • Richard D. Wollmer

Organizations

  • University of Southern California

Tags

Communities of Interest

  • Biomedical
  • Human Systems

DTIC Thesaurus Topics

  • Algorithms
  • Education
  • Human Resources
  • Learning
  • Linear Programming
  • Management Personnel
  • Manpower Utilization
  • Military Research
  • Operations Research
  • Personnel Management
  • Probability
  • Psychology
  • Simplex Method
  • Social Sciences
  • Students
  • Systems Engineering
  • Training

Readers

  • Calculus or Mathematical Analysis
  • Mathematical Modeling and Probability Theory