Reinforcement Learning from Human Reward: Discounting in Episodic Tasks

Abstract

Several studies have demonstrated that teaching agents by human-generated reward can be a powerful technique. However, the algorithmic space for learning from human reward has hitherto not been explored systematically. Using model-based reinforcement learning from human reward in goal-based, episodic tasks, we investigate how anticipated future rewards should be discounted to create behavior that performs well on the task that the human trainer intends to teach. We identify a positive circuits problem with low discounting (i.e., high discount factors) that arises from an observed bias among humans towards giving positive reward. Empirical analyses indicate that high discounting (i.e., low discount factors) of human reward is necessary in goal-based, episodic tasks and lend credence to the existence of the positive circuits problem.
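
The positive circuits problem described in the abstract can be illustrated with a small numeric sketch. It is not the report's method or experimental setup: the reward values, the three-step goal path, and the function names below are hypothetical, chosen only to show how a high discount factor can make an endless positively rewarded loop look more valuable than finishing the episode.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite sequence of rewards."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def circuit_value(reward_per_step, gamma):
    """Value of repeating a loop forever at a constant per-step reward.

    Closed form of the geometric series; requires 0 <= gamma < 1.
    """
    return reward_per_step / (1.0 - gamma)


def prefers_circuit(gamma, goal_rewards, circuit_reward):
    """True if looping forever is worth more than the path that ends the episode."""
    return circuit_value(circuit_reward, gamma) > discounted_return(goal_rewards, gamma)


# Hypothetical trainer behavior (assumed, not taken from the report):
# +1.0 for each of three steps that reach the goal and end the episode,
# and a net +0.5 per step for a loop that never reaches the goal.
GOAL_REWARDS = [1.0, 1.0, 1.0]
CIRCUIT_REWARD = 0.5

if __name__ == "__main__":
    for gamma in (0.0, 0.5, 0.7, 0.9, 0.99):
        choice = "circuit" if prefers_circuit(gamma, GOAL_REWARDS, CIRCUIT_REWARD) else "goal"
        print(f"gamma={gamma}: agent prefers the {choice}")
```

With these assumed numbers the preference flips once the discount factor gets high enough (between 0.7 and 0.9 here), which matches the abstract's claim that high discounting (a low discount factor) is the safer choice when human reward is biased toward positive values.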

Document Details

Document Type: Technical Report
Publication Date: Sep 13, 2012
Accession Number: AD1024591

Entities

People

  • Peter Stone
  • W. B. Knox

Organizations

  • University of Texas at Austin

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Algorithms
  • Computer Programming
  • Computer Science
  • Computers
  • Instructions
  • Intervals
  • Learning
  • Mountains
  • Reinforcement Learning
  • Specifications
  • Supervised Machine Learning
  • Task Performance And Analysis
  • Training
  • Transitions
  • Web Browsers

Fields of Study

  • Biology
  • Computer science
  • Psychology

Readers

  • Artificial Intelligence
  • Brain and Cognitive Science; Experimental Psychology; Cognitive Neuroscience

Technology Areas

  • AI & ML
  • Space
  • Space - Spacecraft Maneuvers