Improving Reinforcement Learning Efficacy with Causality, Trajectories, and the Value of Information

Abstract

Our architecture for autonomy in the world is inspired by human cognition, which deconstructs the world into objects and their properties. Objects are extracted from the environment in an unsupervised manner by a foveated vision system and stored in external memory, where any component of the algorithm can access them. The agent still interacts with the world through exploration and rewards, i.e. Reinforcement Learning (RL), but it must perform inference to discover object properties instead of relying on huge numbers of interactions under Markovian assumptions. RL is slow to converge and does not scale well, so it is far from useful in scenarios where the problem is high dimensional, the data is scarce, and near-real-time operation is required. The goal of the current proposal to SoA is to improve the mathematical foundations of RL theory to make RL more efficient under the constraints of our current architecture.

The proposal's main goal is to bring value of information (VoI) and causality to RL to improve the efficiency of learning while providing solid mathematical foundations (indeed, information theory is a theory of bounds). Basically, we propose to decrease the number of interactions with the environment by using causality, and to organize the agent's past knowledge in trajectories so that winning strategies can be discovered by inference and applied more efficiently. We divided the work into two tasks:

1- Our hypothesis is that the learning rate should be controlled by how certain the agent is about the cause of the reward or punishment. Since the agent has its own memory of past objects and their affordances, it should be possible to employ these features to link actions to rewards with probability 1 in some cases. We will use causality for this purpose, i.e. the policy should learn the action in one shot when causality is established.

2- A strategy is a sequence of moves that leads to a solution. Because of the Markov assumption in RL, there is no strategy beyond Bellman's maximization of future rewards (or discounted reward). Real-world problems may not be Markovian, hence developing strategies for sequential decision making in RL is important. Our hypothesis is that the [...] will validate the improvements in video games because it is a sufficiently rich environment where we have some control to test each one of the phases of the research. We will start with Super Mario (Super Mario Bros. [...] to interact with the environment. But we will demonstrate that the learned skills generalize to other games, such as Sonic the Hedgehog [...] Mortal Kombat III and Street Fighter II for the Super NES. These latter games will help highlight how effective portable task knowledge is in facilitating generalizable agent behaviors, compared to deep RL strategies that rely purely on point-wise estimation and generalization.

However, the ultimate goal is to move to problems important for the Navy. This is where the collaboration with the co-PI, Dr. Sledge, is going to be crucial. One such example is underwater mine countermeasures and explosive remediation using teams of heterogeneous vehicles. For these applications, we would like to direct a team of vehicles to divide and survey a given workspace to locate and neutralize explosive targets. The other example that we will consider is vehicle counter-swarming, wherein the goal is to direct a team of agents to neutralize a series of enemy vehicles. This is an incredibly challenging application, since the enemy vehicles may behave in ways that are unpredictable to the agents in order to complete their unknown goals.

*Abstract approved for public release
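As an illustration of the first task's hypothesis only, the idea that the learning rate should follow the agent's certainty about the cause of a reward can be sketched with a tabular Q-learning step. This is a minimal sketch under stated assumptions, not the method of the proposal: the function name `causal_q_update`, the `causal_confidence` parameter, and the choice to use that confidence directly as the step size are all hypothetical, and the abstract does not say how the confidence would be estimated.

```python
from collections import defaultdict


def causal_q_update(q, state, action, reward, next_state, actions,
                    causal_confidence, gamma=0.9):
    """One tabular Q-learning step.

    Assumption (not from the abstract): the step size is set equal to
    `causal_confidence`, the agent's estimated probability in [0, 1]
    that `action` caused `reward`.
    """
    if not 0.0 <= causal_confidence <= 1.0:
        raise ValueError("causal_confidence must lie in [0, 1]")
    # Standard Bellman target: reward plus discounted best next value.
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    # With confidence 1 the estimate jumps to the target in one shot;
    # with confidence 0 the estimate is left unchanged.
    q[(state, action)] += causal_confidence * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical states and actions, for illustration only.
q = defaultdict(float)
actions = ["jump", "run"]
one_shot = causal_q_update(q, "s0", "jump", 1.0, "s1", actions,
                           causal_confidence=1.0)
partial = causal_q_update(q, "s0", "run", 1.0, "s1", actions,
                          causal_confidence=0.25)
```

In this sketch a fully established cause moves the value estimate to its target in a single update, while an uncertain cause produces only a proportionally small correction.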

Document Details

Document Type: DoD Grant Award
Publication Date: Apr 06, 2021
Source ID: N000142112295

Entities

People

José Príncipe

Organizations

Office of Naval Research
United States Navy
University of Florida
