Learning from Noisy and Delayed Rewards: The Value of Reinforcement Learning to Defense Modeling and Simulation

Abstract

Modeling and simulation of military operations requires human behavior models capable of learning from experience in complex environments in which feedback on action quality is noisy and delayed. This research examines the potential of reinforcement learning, a class of Artificial Intelligence learning algorithms, to address this need. A novel reinforcement learning algorithm that uses the exponentially weighted average reward as an action-value estimator is described. Empirical results indicate that this relatively straight-forward approach improves learning speed in both benchmark environments and in challenging applied settings. Applications of reinforcement learning in the verification of the reward structure of a training simulation, the improvement in the performance of a discrete event simulation scheduling tool, and in enabling adaptive decision-making in combat simulation are presented. To place reinforcement learning within the context of broader models of human information processing, a practical cognitive architecture is developed and applied to the representation of a population within a conflict area. These varied applications and domains demonstrate that the potential for the use of reinforcement learning within modeling and simulation is great.

Open PDF

Document Details

Document Type: Technical Report
Publication Date: Sep 01, 2012
Accession Number: ADA567384

Entities

People

Jonathan K. Alt

Organizations

Naval Postgraduate School

Learning from Noisy and Delayed Rewards: The Value of Reinforcement Learning to Defense Modeling and Simulation

Abstract

Document Details

Entities

People

Organizations

Tags

Communities of Interest

DTIC Thesaurus Topics

Fields of Study

Readers

Technology Areas