Learning Control Actions for Guided Projectiles Using the Proximal Policy Optimization (PPO) Algorithm
Abstract
In this report we present a data-driven approach to closed-loop control of the Laboratory Technology Vehicle. We use Proximal Policy Optimization (PPO), a reinforcement learning algorithm that has been shown to perform well on a variety of tasks. PPO owes its success to the stability with which it finds solutions, while retaining many of the positive properties of policy gradient methods. Although PPO has been successful across the literature, it struggles when the reward is sparse. That is the case in our precision-munitions application, where the objective is to hit a specific target. To address the sparse reward, we present a curriculum learning method built around PPO. The curriculum divides learning into stages of incrementally increasing complexity, which alleviates the sparsity of the reward signal. The proposed method is shown to outperform learning without a curriculum.
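The staged curriculum described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the stages are defined, so everything here is an assumption made for illustration: the class name `Curriculum`, the idea of tightening a hit radius from stage to stage, the radii, the promotion threshold, and the window size are all hypothetical and are not taken from the report. In a full training loop, a PPO policy update would run between the recorded episodes.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Curriculum:
    """Hypothetical staged curriculum: each stage tightens the hit radius."""
    hit_radii_m: tuple = (100.0, 30.0, 10.0, 3.0)  # assumed values, not from the report
    promote_at: float = 0.8  # assumed success rate required to advance a stage
    window: int = 50         # number of episodes in the rolling success window
    stage: int = 0
    _recent: deque = field(default_factory=deque)

    def reward(self, miss_distance_m: float) -> float:
        """Sparse terminal reward: 1 inside the current stage's radius, else 0."""
        return 1.0 if miss_distance_m <= self.hit_radii_m[self.stage] else 0.0

    def record(self, miss_distance_m: float) -> None:
        """Log one episode; advance a stage once the rolling success rate is high."""
        self._recent.append(self.reward(miss_distance_m))
        if len(self._recent) > self.window:
            self._recent.popleft()
        window_full = len(self._recent) == self.window
        at_last_stage = self.stage == len(self.hit_radii_m) - 1
        if (window_full and not at_last_stage
                and sum(self._recent) / self.window >= self.promote_at):
            self.stage += 1
            self._recent.clear()  # measure success afresh against the new radius
```

Early stages use a loose radius, so the agent receives a nonzero reward often enough to learn from. Later stages approach the true objective, where an untrained policy would almost never be rewarded.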
Document Details
- Document Type: Technical Report
- Publication Date: Oct 01, 2022
- Accession Number: AD1182956
Entities
People
- Bethany Allik
- Christopher Hsu
- Franklin Shedleski
Organizations
- United States Army Research Laboratory