Steering Policies for Markov Decision Processes Under a Recurrence Condition

Abstract

This paper presents a class of adaptive policies for Markov decision processes (MDPs) with long-run average performance measures. Under a recurrence condition, the proposed policy alternates between two stationary policies so as to adaptively steer the sample average cost toward a desired value. Direct sample path arguments are used to establish the convergence of the sample average costs, and the performance of the adaptive policy is discussed. The results are particularly useful for constrained MDPs with a single constraint. Applications include a wide class of constrained MDPs with finite state space (Beutler and Ross 1985), an optimal flow control problem (Ma and Makowski 1987), and an optimal resource allocation problem (Nain and Ross 1986).
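The alternation described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the report's construction: the two-state chain, the transition probabilities, the costs, the policy names `low` and `high`, and the rule of re-selecting the policy only at visits to state 0 (standing in for the recurrence condition) are all hypothetical choices made for illustration.

```python
import random


def simulate_steering(target, steps=100_000, seed=0):
    """Toy steering policy on a hypothetical two-state chain.

    Returns the sample average cost after `steps` transitions.
    """
    rng = random.Random(seed)
    # Assumed model: p_to_1[policy][state] is the probability of moving
    # to state 1. Under "low" the long-run average cost is about 0.22,
    # under "high" about 0.78, so any target in between is reachable.
    p_to_1 = {"low": [0.2, 0.3], "high": [0.7, 0.8]}
    cost = [0.0, 1.0]  # assumed cost: 1 per step spent in state 1

    state, policy = 0, "low"
    total = 0.0
    for t in range(1, steps + 1):
        total += cost[state]
        # Re-select the stationary policy only at visits to state 0,
        # which is recurrent under both policies in this toy model.
        if state == 0:
            policy = "low" if total / t > target else "high"
        state = 1 if rng.random() < p_to_1[policy][state] else 0
    return total / steps


if __name__ == "__main__":
    print(simulate_steering(0.5))
```

In this toy model, the running average is pushed down whenever it exceeds the target and pushed up otherwise, so it settles near any target lying between the average costs of the two stationary policies. This mirrors how a single constraint in a constrained MDP might be met by mixing two policies over time.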

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 1988
Accession Number: ADA454595

Entities

People

  • Armand M. Makowski
  • Dye-jyu Ma

Organizations

  • University of Maryland

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Abstracts
  • Availability
  • Classification
  • Contracts
  • Convergence
  • Flow
  • Hypervelocity Flow
  • Information Operations
  • Instructions
  • Maryland
  • Monitoring
  • Security
  • Standards
  • Steering
  • Universities

Readers

  • Adaptive Control and Estimation with Uncertainty in Dynamic Systems.
  • Statistical inference.

Technology Areas

  • Space
  • Space - Spacecraft Maneuvers