Self-Learning Perception through Interaction with the Real World
Abstract
Modern advancements in Machine Learning (ML) have brought about transformative progress in a variety of computer vision problems such as image classification, detection, segmentation and captioning. The recipe thus far has been: gather large labeled datasets and train large parametric models on them. However, such a learning paradigm with static datasets has proven to be insufficient for our real visual world. Although our current vision systems can score high on seemingly difficult benchmarks, they remain brittle to out-of-distribution examples that are present throughout the real world. Circumventing this brittleness has proven hard, and even modestly reducing it has required substantial human effort, either by prescribing data priors such as augmentations or by the tedious collection of curated data for a specific task.
From infancy, children spontaneously map visual input onto abstract representations of the world with little apparent supervision or reward. Some of these mappings may be hard-wired, but many are constructed by children themselves as they make their way through the world. Importantly, children neither use carefully curated or static datasets, nor are they expected to engage as passive observers. They instead explore, experiment, and constantly engage in a rich visual world.
A system that physically interacts with its environment would not only obtain data that better reflects the scenarios it would see in the real world, but would also actively propose and investigate hypotheses in it. In this proposal we seek to create a vision system that is more human-like by learning from data gathered through interaction with the world. Instead of viewing data as a fixed, unchanging repository, we cast the data collection process as a learning problem in itself.
A vision framework that can effectively do this will be able to move away from static benchmark datasets and instead self-learn through actively collected, continual and embodied data in the real world.
Expected Outcome: Creating such embodied vision systems will require rethinking our vision systems from the ground up. We will first need to develop systems that can actively interact with their environment and collect data through intrinsic-motivation-based exploration. As the embodied system continuously collects data, it will need to simultaneously assimilate this experience, reason about the world, and decide what to do next. To accelerate the self-learning of our visual systems and be useful in human-centric environments, our embodied agents will need to make the best use of any form of human information available, passive and active. Using such data will bring about unique machine learning challenges, such as learning from highly correlated data, from continual data streams, and in a single pass.
Impact on DoD: The result will be the ability to build vision systems that can more robustly support any of the DoD's missions that benefit from visual understanding of the world. Success in this effort could dramatically increase DoD capabilities for deploying autonomous agents for critical functionalities such as surveillance, intelligence gathering & reconnaissance, autonomous control of vehicles and naval vessels, repair & maintenance, and disaster relief. Our approach will greatly increase the capabilities and problem domains where visual intelligence systems could be applied.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Oct 07, 2022
- Source ID: N000142212773
Entities
People
- Pieter Abbeel
Organizations
- Office of Naval Research
- United States Navy
- University of California Regents