Using Large Language Models as World Models in Visual Environments

Abstract

Model-based reinforcement learning (RL) aims to reduce the need for costly environment interaction by using a world model to simulate that interaction. However, current vision-based world models often produce inaccurate trajectories, which makes them less reliable for planning and simulation. To address this, we propose a world model grounded in explicit textual representations. Our method transforms visual states into tokenized textual representations with explicit semantic meaning and uses large language models (LLMs) to predict the next state in that textual form. Our preliminary experimental results show that the proposed text-grounded world model imagines trajectories accurately, enabling improved policy training.
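
The loop described in the abstract (serialize a state as text, predict the next textual state, repeat to imagine a trajectory) can be sketched roughly as follows. This is an illustrative sketch only, not the project's implementation: it assumes the visual state has already been reduced to a small symbolic grid, and `stub_predictor` is a hypothetical rule-based stand-in for the LLM call. The names `grid_to_text`, `text_to_grid`, and `imagine` are likewise made up for this example.

```python
from typing import Callable, List

Grid = List[List[str]]


def grid_to_text(grid: Grid) -> str:
    """Serialize a symbolic grid as text: one line per row, cells space-separated."""
    return "\n".join(" ".join(row) for row in grid)


def text_to_grid(text: str) -> Grid:
    """Inverse of grid_to_text."""
    return [line.split(" ") for line in text.split("\n")]


def stub_predictor(state_text: str, action: str) -> str:
    """Hypothetical stand-in for an LLM next-state prediction.

    Moves the agent 'A' one cell left or right unless a wall '#' or the
    grid edge blocks it. A real system would prompt an LLM here instead.
    """
    grid = text_to_grid(state_text)
    for row in grid:
        if "A" in row:
            col = row.index("A")
            break
    else:
        return state_text  # no agent found: leave the state unchanged
    step = {"left": -1, "right": 1}.get(action, 0)
    new_col = col + step
    if step != 0 and 0 <= new_col < len(row) and row[new_col] != "#":
        row[col], row[new_col] = ".", "A"
    return grid_to_text(grid)


def imagine(
    state_text: str,
    actions: List[str],
    predict: Callable[[str, str], str],
) -> List[str]:
    """Roll out an imagined trajectory by feeding each prediction back in."""
    trajectory = [state_text]
    for action in actions:
        state_text = predict(state_text, action)
        trajectory.append(state_text)
    return trajectory


start = grid_to_text([["#", "A", ".", "#"]])
rollout = imagine(start, ["right", "right", "left"], stub_predictor)
```

Because `imagine` only depends on a `(state_text, action) -> next_state_text` callable, the stub could in principle be swapped for a function that prompts a language model, and a policy could then be trained on the imagined states without further environment interaction.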

Document Details

Document Type: DoD Grant Award
Publication Date: Feb 06, 2025
Source ID: FA23862514013

Entities

People

Hyun Oh Song

Organizations

Air Force Office of Scientific Research
Seoul National University
United States Air Force
