Rethinking Reinforcement Learning with Astrocyte-Neuron Computations

Abstract

Deep reinforcement learning (RL) has enabled super-human performance in challenging video games and control tasks, and particularly in games with perfect knowledge and deep search (Go, chess). Despite this success, solving real-world problems with RL algorithms remains next to impossible. We identify five key problems that current state-of-the-art RL cannot solve effectively. P1: The need for an efficient solution to the exploration-exploitation problem. P2: The need for more efficient learning of Q-values via feature and task similarity transfer. P3: How to learn new tasks without forgetting older tasks: the problem of catastrophic forgetting. P4: How to deploy temporal abstractions to solve new tasks. P5: The problem of credit assignment for rewarding actions over extended time-scales. Because today's RL algorithms are firmly grounded in the theory of Markov systems, leveraging past knowledge for solving new tasks requires the development of novel computational frameworks that go beyond the Markov assumption and integrate information across multiple time-scales. By integrating information across time, astrocytes may play an essential role in solving problems with long temporal dependencies. We will build a new theoretical framework for RL based on a theory of potential-based methods. By separating value estimation from action selection, and integrating costs and benefits across time, the theory naturally solves the exploration-exploitation dilemma (P1) and enables mapping its predictions to the functional anatomy of brain structures, including the prefrontal cortex (PFC) and striatum. Importantly, preferential innervation of the PFC by the neuromodulator norepinephrine and of the striatum by dopamine represents an important mechanism for selectively activating astrocytes and modulating task-specific neuronal output.
The theory thus relates to the architecture of neuron-glia interactions in the brain, which we will formalize with a new class of networks, Glial Deep Neural Networks (GDNNs). We will examine the predictions of the theory and GDNNs experimentally in humans and mice using two tasks that exemplify non-Markov processes: one involving contextual sequential decisions and another involving transfer learning. We posit that by selectively modulating plasticity of synapses on subsets of neurons in the PFC and striatum, astrocytes can prevent interference in storage of unrelated tasks and promote information sharing between similar tasks (P3, P4). Instead of searching over all possible features, we propose that the brain leverages similarity between tasks to efficiently identify critical features (P2, P5). We suggest that activation of PFC by norepinephrine underlies exploration, and dopamine activation of the striatum underlies exploitation (P1). To decompose a new task into subtasks and recompose previously solved subtasks to solve a future, more complex task, we propose that a crucial mechanism involves rapid astrocyte-neuron dynamics in PFC coordinated with slower astrocyte-neuron 'chunking' in the striatum (P2, P3 and P5). Our team of six investigators has exceptional theoretical, computational and experimental expertise. In Thrust 1, we will define our theoretical advances and demonstrate their application to the experimental tasks. In Thrust 2, we will design and carry out behavioral and fMRI experiments in humans on the tasks above, with explicit evaluation of data against theoretical and GDNN predictions. In Thrust 3, we will carry out simpler versions of the two tasks in mice, and again evaluate the data against predictions. We will then circle back and update the theory. Insights from our project will guide a principled framework for sequential and transfer learning based on non-Markovian RL.
We expect that the theory and algorithms developed in this work will lead to RL systems that are significantly more data-efficient and capable of scaling to problems that are beyon
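
For readers unfamiliar with the baseline the abstract argues against, the following is a minimal sketch of standard tabular Q-learning with epsilon-greedy exploration, written so that value estimation and action selection are separate functions. It is a textbook Markov-setting illustration of the exploration-exploitation problem (P1) and of Q-values (P2), not the potential-based theory or the GDNNs proposed here. The chain environment, the function names, and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def select_action(q_row, epsilon, rng):
    """Action selection: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties at random so the untrained agent does not always pick action 0.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def update_value(q, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    """Value estimation: one-step Q-learning update, independent of the policy."""
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def step(s, a, n_states):
    """Hypothetical chain task: action 1 moves right, action 0 moves left.

    Reaching the rightmost state ends the episode with reward 1.
    """
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done


def train(n_states=5, episodes=500, epsilon=0.2, seed=0):
    """Run Q-learning on the chain and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = select_action(q[s], epsilon, rng)
            s_next, r, done = step(s, a, n_states)
            update_value(q, s, a, r, s_next, done)
            s = s_next
    return q


if __name__ == "__main__":
    q_table = train()
    for state, (left, right) in enumerate(q_table):
        print(f"state {state}: Q(left)={left:.3f}  Q(right)={right:.3f}")
```

In this sketch the update depends only on the current transition, which is the Markov assumption the abstract proposes to move beyond; the fixed `epsilon` is the kind of hand-tuned exploration schedule that P1 refers to.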

Document Details

Document Type: DoD Grant Award
Publication Date: Oct 07, 2021
Source ID: W911NF2110328

Entities

People

Mriganka Sur

Organizations

Army Contracting Command
Massachusetts Institute of Technology
United States Army
