Self-Guided Learning within Broad Task Domains

Abstract

In some benchmarks, computer vision and machine learning systems are highly successful and sometimes even outperform humans. Yet modern AI systems have limited application because they require enormous data sources and rely on models that work only on narrowly defined tasks and datasets. Our proposed work addresses these shortcomings with two innovations: (1) create systems that operate within broad task domains, such as vision-language or tablet programs, instead of solving a discrete set of tasks; and (2) design learning methods that exploit large, diversely annotated data sources to gain proficiency on target tasks with less target data. We expect this work to lead to general-purpose AI systems that continuously learn new tasks and adapt to changing environments. Our work specifically addresses two challenges. First, we propose general-purpose architectures, where the key idea is to narrow the interface to broaden applicability. Consider how the chef's knife, with its simple interface of blade and handle, has broader applicability than a food processor with its many buttons and accessories. We will design an architecture that enables a broad range of tasks to be performed with limited input/output channels, in contrast with current architectures that have a specialized output for each category and type of prediction. Second, we propose methods for self-guided learning. Continuous learning requires that a system choose what it learns from and how it adapts in order to perform well on known tasks, to be prepared for future tasks, and to retain performance on already learned tasks. Prior work in meta-learning, continual learning, and avoiding catastrophic forgetting typically assumes that a small number of tasks are learned sequentially. We will investigate new methods to learn from a large and growing repository of available data and annotations. We plan to investigate these challenges within the application domains of vision-language tasks and children's learning games.
For vision-language tasks, given an image and text that describes the task (e.g., "How many windows are installed?"), the system must output appropriate annotations (bounding boxes, counts, segmentations) and/or text. We motivate the system with use cases in safety, progress, and quality monitoring for construction, where tasks may vary by company and project. Our evaluation focuses on the ability to learn target tasks from limited data and to transfer across skills (e.g., detecting people not wearing hard hats after learning to classify whether someone is wearing a hard hat). In children's learning games, given the standard tablet interface of image (screen) and audio/text, the system must touch, drag, and drop to complete modules in learning games. We will investigate curriculum learning and neural-symbolic architectures, and the evaluation will focus on the ability to complete new modules with a limited number of trials.
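The "narrow interface" idea for vision-language tasks can be illustrated with a minimal sketch. It is not the architecture proposed in this award. Every name in it (`TaskInput`, `TaskOutput`, `toy_system`, `count_from_boxes`) is hypothetical, the box convention is an assumption, and the "model" is a fixed stand-in rather than a learned system. It shows only that one input type (image plus task text) and one output type (text and/or annotations) can serve several tasks without a specialized output head per task.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed box convention for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


@dataclass
class TaskInput:
    """Single input channel: an image plus free-form text describing the task."""
    image: List[List[int]]  # placeholder for pixel data
    prompt: str


@dataclass
class TaskOutput:
    """Single output channel: optional text and/or a list of box annotations."""
    text: Optional[str] = None
    boxes: List[Box] = field(default_factory=list)


def count_from_boxes(output: TaskOutput) -> TaskOutput:
    """Reuse detections to answer a counting question (skill reuse across tasks)."""
    return TaskOutput(text=str(len(output.boxes)), boxes=output.boxes)


def toy_system(task: TaskInput) -> TaskOutput:
    """Stand-in for a learned model; it returns two fixed detections."""
    detections = TaskOutput(boxes=[(0.0, 0.0, 10.0, 10.0), (20.0, 0.0, 30.0, 10.0)])
    if task.prompt.lower().startswith("how many"):
        return count_from_boxes(detections)
    return detections


if __name__ == "__main__":
    answer = toy_system(TaskInput(image=[[0]], prompt="How many windows are installed?"))
    print(answer.text, answer.boxes)
```

Under these assumptions, a new task is expressed by changing the prompt, not by adding a new output type.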

Document Details

Document Type
DoD Grant Award
Publication Date
Aug 20, 2021
Source ID
N000142112705

Entities

People

  • Derek Hoiem

Organizations

  • Office of Naval Research
  • United States Navy
  • University of Illinois Urbana–Champaign

Tags

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.
  • Artificial Intelligence
  • Neural Network Machine Learning.

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks