Aligning LLMs for Reasoning and Explanations in Reinforcement Learning
Abstract
This project aims to explore an explanation framework for reinforcement learning (RL) agents operating in complex environments and tasks. More specifically, we expect this framework to (1) explain, in natural language, the inner logic behind an RL agent's action at any given state; and (2) help people understand how much each training task contributes to final performance, and in turn why an agent performs well or poorly after training. Unlike prior approaches in explainable and causal reinforcement learning, our framework uses natural language to explain the agent's actions, and it can do so for every frame of action, which makes the explanations inherently more intuitive and comprehensible. Moreover, a more intuitive account of RL behavior can improve our understanding of the RL training process, for example by identifying which training stage yields the most pronounced improvement in final performance. A significant obstacle, however, is that RL training in intricate settings relies mostly on visual signals such as videos and images and lacks a complementary text engine. In addition, current LLMs do not possess the spatial and logical reasoning capabilities needed to understand the agent's behavior. A vision-language model (VLM) can address this problem because it captures the spatio-temporal information in the video signal. We plan to systematically address the following challenges: (1) How to integrate an LLM into the RL training process. This alignment would enable the LLM to accurately comprehend and articulate the behavior logic of the RL agent. (2) How to align the distribution of an LLM with that of a VLM. This determines whether our framework can generalize to real-world scenarios.
This research project is expected to explore the first model that can generate natural-language explanations for an RL agent's actions in arbitrary complex environments and tasks, indicate the role of each training task, and visualize the agent's learning process.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Feb 05, 2025
- Source ID: FA23862414031
Entities
People
- Flora Salim
Organizations
- Air Force Office of Scientific Research
- United States Air Force
- University of New South Wales