Learning Representation and Control in Markov Decision Processes

Abstract

This research investigated algorithms for approximately solving Markov decision processes (MDPs), a widely used model of sequential decision making. Much past work on solving MDPs in adaptive dynamic programming and reinforcement learning has assumed that representations, such as basis functions, are provided by a human expert. The research investigated a variety of approaches to automatic basis construction, including reward-sensitive and reward-invariant methods, diagonalization and dilation methods, and orthogonal and over-complete representations. A unifying perspective on these basis construction methods emerged from showing that they result from different power series expansions of value functions, including the Neumann series expansion, the Laurent series expansion, and the Schultz expansion. The research also developed new computational algorithms for learning sparse solutions to MDPs using convex optimization methods.
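
As an illustration of the Neumann series expansion named in the abstract, the following minimal sketch (not taken from the report) evaluates a fixed policy on a small Markov chain. It assumes a discounted setting with discount factor gamma, in which the value function V = (I - gamma P)^(-1) R can be expanded as the sum over k of gamma^k P^k R. The transition matrix P, reward vector R, discount factor, and the function name neumann_value are all hypothetical choices made for this example.

```python
import numpy as np

def neumann_value(P, R, gamma, num_terms):
    """Approximate V = (I - gamma P)^{-1} R with a truncated Neumann series."""
    V = np.zeros(len(R), dtype=float)
    term = np.asarray(R, dtype=float)   # k = 0 term: gamma^0 P^0 R
    for _ in range(num_terms):
        V += term
        term = gamma * (P @ term)       # next term: gamma^(k+1) P^(k+1) R
    return V

# Hypothetical 3-state chain; the last state is absorbing and pays reward 1.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.9, 0.1],
              [0.0, 0.0, 1.0]])
R = np.array([0.0, 0.0, 1.0])
gamma = 0.9

V_exact = np.linalg.solve(np.eye(3) - gamma * P, R)    # direct linear solve
V_series = neumann_value(P, R, gamma, num_terms=200)   # truncated series
print(V_exact)
print(V_series)
```

With enough terms the truncated series should agree with the direct solve. The successive terms R, PR, P^2 R, and so on can also be read as candidate basis vectors, which is one way to see how a series expansion of the value function relates to basis construction.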

Document Details

Document Type
Technical Report
Publication Date
Oct 21, 2013
Accession Number
ADA587838

Entities

People

  • Sridhar Mahadevan

Organizations

  • University of Massachusetts Amherst

Tags

Communities of Interest

  • Energy and Power Technologies
  • Human Systems

DTIC Thesaurus Topics

  • Air Force
  • Air Force Research Laboratories
  • Algorithms
  • Artificial Intelligence
  • Bayesian Networks
  • Computational Science
  • Dimensionality Reduction
  • Dynamic Programming
  • Equations
  • Feature Selection
  • Machine Learning
  • Markov Chains
  • Operations Research
  • Optimization
  • Power Series
  • Probability
  • Reinforcement Learning

Readers

  • Calculus or Mathematical Analysis
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms