Learning Control Actions for Guided Projectiles Using the Proximal Policy Optimization (PPO) Algorithm

Abstract

In this report we present a data-driven approach for closed-loop control of the Laboratory Technology Vehicle. We use the Proximal Policy Optimization (PPO) algorithm, a reinforcement learning algorithm that has been shown to perform well on a variety of tasks. PPO's success is due to its stability in finding solutions, in addition to its having many of the positive properties of policy gradient methods. Although PPO has been successful across the literature, it struggles when the reward is sparse, which is the case for our application to precision munitions, where the objective is to hit a specific target. To address the sparse reward, we present a curriculum learning method built around PPO. The curriculum divides learning into stages of incrementally increasing complexity, alleviating the sparsity of the reward signal. The proposed method is shown to outperform learning without a curriculum.
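
The idea of staging a sparse hit-or-miss reward can be illustrated with a minimal Python sketch. It is not the report's implementation: the names Stage, Curriculum, and sparse_reward, the use of a shrinking hit radius as the measure of difficulty, and the promotion rule based on a rolling success rate are all assumptions made for illustration. The PPO update itself is omitted; the sketch only shows how a curriculum could decide which reward tolerance the learner currently trains against.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    # Hypothetical stage definition: a miss distance at or below
    # hit_radius counts as a hit; promote_at is the success rate
    # required before moving to the next (harder) stage.
    hit_radius: float
    promote_at: float


def sparse_reward(miss_distance: float, hit_radius: float) -> float:
    """Return 1.0 for a hit and 0.0 otherwise (a sparse terminal reward)."""
    return 1.0 if miss_distance <= hit_radius else 0.0


class Curriculum:
    """Tracks recent episode outcomes and advances through the stages."""

    def __init__(self, stages, window=100):
        self.stages = list(stages)
        self.window = window      # number of recent episodes to average over
        self.index = 0            # index of the current stage
        self.results = []         # rewards of recent episodes in this stage

    @property
    def stage(self) -> Stage:
        return self.stages[self.index]

    def record(self, miss_distance: float) -> float:
        """Score one finished episode and promote the stage if warranted."""
        reward = sparse_reward(miss_distance, self.stage.hit_radius)
        self.results.append(reward)
        self.results = self.results[-self.window:]

        window_full = len(self.results) == self.window
        success_rate = sum(self.results) / len(self.results)
        is_last_stage = self.index == len(self.stages) - 1

        if window_full and success_rate >= self.stage.promote_at and not is_last_stage:
            self.index += 1       # tighten the tolerance
            self.results = []     # start a fresh window for the new stage
        return reward
```

In a training loop, the reward returned by record would be the terminal reward passed to the policy update for that episode. Early stages with a large hit radius produce frequent nonzero rewards, and later stages approach the true objective of hitting a specific target.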

Document Details

Document Type: Technical Report
Publication Date: Oct 01, 2022
Accession Number: AD1182956

Entities

People

Bethany Allik
Christopher Hsu
Franklin Shedleski

Organizations

United States Army Research Laboratory
