Learning from Noisy and Delayed Rewards: The Value of Reinforcement Learning to Defense Modeling and Simulation
Abstract
Modeling and simulation of military operations requires human behavior models capable of learning from experience in complex environments where feedback on action quality is noisy and delayed. This research examines the potential of reinforcement learning, a class of artificial intelligence learning algorithms, to address this need. A novel reinforcement learning algorithm that uses the exponentially weighted average reward as an action-value estimator is described. Empirical results indicate that this relatively straightforward approach improves learning speed both in benchmark environments and in challenging applied settings. Three applications of reinforcement learning are presented: verifying the reward structure of a training simulation, improving the performance of a discrete event simulation scheduling tool, and enabling adaptive decision-making in combat simulation. To place reinforcement learning within the context of broader models of human information processing, a practical cognitive architecture is developed and applied to the representation of a population within a conflict area. These varied applications and domains demonstrate the considerable potential of reinforcement learning for modeling and simulation.
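The abstract names the core idea, using an exponentially weighted average of observed rewards as the action-value estimate, without stating its exact form. The sketch below assumes a single-state (bandit) setting with noisy but immediate rewards and an epsilon-greedy policy, and uses the standard constant-step-size update Q(a) <- Q(a) + alpha * (r - Q(a)), which weights a reward observed k updates ago by alpha * (1 - alpha)^k. The names `EWAAgent`, `run_noisy_bandit`, `alpha`, and `epsilon` are hypothetical and chosen only for illustration; this is not the report's implementation, and it does not model delayed rewards.

```python
import random


class EWAAgent:
    """Hypothetical epsilon-greedy agent whose action values are
    exponentially weighted averages of the rewards each action produced."""

    def __init__(self, n_actions, alpha=0.1, epsilon=0.1, seed=None):
        self.q = [0.0] * n_actions  # action-value estimates, start at zero
        self.alpha = alpha          # step size: weight on the newest reward
        self.epsilon = epsilon      # probability of a random exploratory action
        self.rng = random.Random(seed)

    def select_action(self):
        # Explore with probability epsilon; otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        best = max(self.q)
        # Break ties uniformly among the highest-valued actions.
        return self.rng.choice([a for a, v in enumerate(self.q) if v == best])

    def update(self, action, reward):
        # Q <- Q + alpha * (r - Q), equivalently Q <- (1 - alpha) * Q + alpha * r,
        # so older rewards decay geometrically by a factor of (1 - alpha).
        self.q[action] += self.alpha * (reward - self.q[action])


def run_noisy_bandit(means, noise_sd=0.5, steps=2000, seed=0):
    """Hypothetical test bed: each action pays its mean plus Gaussian noise."""
    env_rng = random.Random(seed)
    agent = EWAAgent(len(means), alpha=0.1, epsilon=0.1, seed=seed + 1)
    for _ in range(steps):
        action = agent.select_action()
        reward = env_rng.gauss(means[action], noise_sd)
        agent.update(action, reward)
    return agent
```

For example, `run_noisy_bandit([0.0, 1.0])` returns an agent whose estimate for the second action settles near 1.0 despite the reward noise. A constant step size keeps the estimate responsive to recent rewards, unlike a sample mean whose effective step size shrinks as 1/n; a larger `alpha` tracks changes faster but averages away less noise.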
Document Details
- Document Type: Technical Report
- Publication Date: Sep 01, 2012
- Accession Number: ADA567384
Entities
People
- Jonathan K. Alt
Organizations
- Naval Postgraduate School