Multimodal Vision-Language Architectures for Continual Learning and Complex Tasks Tracking: 23-00003510

Abstract

We will build the next-generation multimodal GPL system, with unprecedented applicability in vision, audio, language, and embodied tasks. Many previous works, such as GPT-3, improve through massive scaling, but further order-of-magnitude scaling is infeasible. These works rely on a monolithic network to retain knowledge and solve complex inference tasks. Instead, inspired by theories of human learning and memory, we propose to add complementary learning systems, including long-term and task-duration memory. The dense network weights model common objects and tasks, while the sparse memory tokens enable fast learning and retention of infrequent examples. We further investigate program generation as a scalable and interpretable mechanism for performing complex, multi-step tasks. Despite the well-known importance of multiple memory and reasoning systems in humans, AI researchers have focused almost exclusively on developing monolithic systems. Thus, we believe this research could lead to major breakthroughs in what AI systems can achieve. Specifically, we expect the following direct benefits: (1) multimodal AI systems that can perform a broad range of tasks specified in natural language; (2) the ability to acquire knowledge in real time while retaining previous knowledge; (3) the ability to perform multi-step tasks and to report both the plan and intermediate results; and (4) the ability to retain and use instructions and task context when performing longer-duration tasks. We will evaluate our systems on a wide variety of vision, language, and embodied benchmarks. Long-term applications include robotics, image and video analysis, inspection, and AI assistants, among many others. Approved for Public Release.

Document Details

Document Type: DoD Grant Award
Publication Date: Apr 12, 2023
Source ID: N000142312383

Entities

People

  • Derek Hoiem

Organizations

  • Office of Naval Research
  • United States Navy
  • University of Illinois Urbana–Champaign

Tags

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning
  • Parallel and Distributed Computing
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - DoD AI Strategy
  • AI & ML - Neural Networks
  • Autonomy