EMBODIED VISUAL INTELLIGENCE

Abstract

In the past decade, the democratization of machine learning methods has led to an explosion of successes. These successes have been widely heralded, and where they remain incomplete, it is widely believed that more data and more compute power will close the gap (e.g., de Melo et al. 2022). Many leading proponents have become so bold as to assert that their methods suffice to explain the breadth of human intelligence (Silver et al. 2021). Has the field of Artificial Intelligence really solved how to embody human intelligence in machines in just the 66 years since the 1956 Dartmouth Conference? The answer is not as clear as some claim (see Marcus 2022 for just one of the many skeptics). The Dartmouth goal was "that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it". Of course, not every aspect of intelligence was known in 1956, and myriad mysteries remain many decades later. Our research program has addressed many aspects of visual attention and executive control. The goal is to build a prototype of a new visual cognition architecture, consistent with the known aspects of how humans solve complex visuospatial tasks in the real world. We sought a theory that would make falsifiable predictions that motivate new experiments, as well as a physical embodiment that might have practical utility. We have made solid progress on each, including the discovery of new knowledge about human visual processing. However, as the project moved forward, we gave increasing consideration to how such behavior might be learned. We realized that we did not have a sufficient quantitative understanding of human visuospatial behavior, let alone data for a learning method, and that this is an open issue for human visual intelligence. We discovered that such data did not exist, so we developed an experimental infrastructure and methodology for its collection.
The surprises that this work revealed led to this new project. We tried many forms of reinforcement learning (RL) on our tasks, to no avail. On closer examination, we found too many unaligned elements and decided that we needed to invent a methodology that would support our overall goals. We named this ACE (Active Composition and Exploration): a combination of learning, planning, search, representation, active agent observation, perception and task-driven behavior for a real 3D world. In this new project, we seek to develop the theory, methods and embodiments of a visually intelligent artificial agent. The practical utility lies in domains where there is a need to view and interpret novel and difficult visual scenarios and to dynamically formulate responses in a mission-directed context.

Document Details

Document Type: DoD Grant Award
Publication Date: Apr 20, 2023
Source ID: FA95502210538

Entities

People

John Tsotsos

Organizations

Air Force Office of Scientific Research
United States Air Force
York University
