Knowledge Graphs for Planning and Perception
Abstract
Over the last decade, the computer vision community has made significant advances in representing and modeling the visual world. But our current approaches are surprisingly different from the way humans perceive that world. While modern learning-based approaches can recognize some categories with high accuracy, they usually require thousands of labeled examples for each of these categories. However, most categories in the real world have far fewer examples, forming a long-tail distribution [61]. And yet, even when shown only a few examples or even a single one, humans have the remarkable ability to recognize these categories with high accuracy. In the same vein, most of our planning and reinforcement learning (RL) algorithms require millions of trial-and-error instances to learn a policy; humans, on the other hand, can learn how to perform actions from very few examples. We believe the key lies in reasoning: our current systems use bottom-up, feed-forward processing without any commonsense reasoning, whereas humans use commonsense relationships to reason about objects, parts, physics, functions, and even intentions. This proposal aims to bridge this gap, focusing on how we can exploit commonsense knowledge for planning and perception. Our long-term goal is to bring commonsense reasoning into the fold of end-to-end learning approaches. In this project, we propose to represent commonsense as a knowledge graph and to use the knowledge graph as a constraint when performing end-to-end learning for perception and planning. We learn the knowledge graph using a multi-task, multi-modal, lifelong learning framework. More importantly, once we have learned the knowledge graph, we learn how to use it for multiple tasks such as image classification, zero-shot recognition, navigation, and even manipulation tasks such as grasping.
Specifically, we learn information propagation models on top of these knowledge graphs, which allow us to model errors in the graph (i.e., the knowledge database) and the uncertainty in individual classifications or detections.
Impact: The proposed research can be viewed as an attempt to return the focus of computer vision and AI to knowledge representation and reasoning. We therefore expect it to result in major advances across AI, including vision, natural language processing, and robotics. It could be a critical enabling technology for applications such as autonomous systems, surveillance, visual data mining, and personal robotics. Some aspects of this research might have ramifications beyond traditional areas. For example, we are collaborating with materials science researchers to develop a web-based learning system for automatically building a microstructure image database. More importantly, we anticipate that fundamental aspects of this research will interest the psychology and neuroscience communities as a way to test long-standing questions about vision, learning, and cognition. In collaboration with Michael Tarr and Elissa Aminoff, we are investigating whether humans create and use associations. In collaboration with Josh Tenenbaum, we will explore algorithms inspired by human behavioral experiments.
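The abstract does not specify the form of its information propagation models, so the following is only a hedged illustration of the general idea, not the project's method. It uses generic damped label propagation: at each step, a concept's belief becomes a mix of its own detector score and the weighted mean of its neighbours' beliefs, so a weak detection can be supported by related concepts while no single score is trusted outright. The concept names, edge weights, and the `alpha` and `steps` parameters are all hypothetical; in a learned model these would presumably be trained rather than hand-set.

```python
from typing import Dict, List, Tuple

# A weighted, undirected edge between two concepts (hypothetical toy data).
Edge = Tuple[str, str, float]


def build_adjacency(edges: List[Edge]) -> Dict[str, Dict[str, float]]:
    """Turn an edge list into a symmetric adjacency map: node -> {neighbour: weight}."""
    adj: Dict[str, Dict[str, float]] = {}
    for a, b, w in edges:
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj


def propagate(
    scores: Dict[str, float],
    adj: Dict[str, Dict[str, float]],
    alpha: float = 0.5,
    steps: int = 10,
) -> Dict[str, float]:
    """Damped label propagation over a knowledge graph (illustrative only).

    Each step, a node's belief becomes a convex mix of its own detector score
    and the weighted mean of its neighbours' current beliefs. `alpha` in [0, 1]
    is how much the graph is trusted relative to the detector: alpha = 0 returns
    the raw scores, larger alpha lets related concepts reinforce each other.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    nodes = set(adj) | set(scores)
    belief = {v: scores.get(v, 0.0) for v in nodes}
    for _ in range(steps):
        new_belief = {}
        for v in nodes:
            nbrs = adj.get(v, {})
            total_w = sum(nbrs.values())
            prior = scores.get(v, 0.0)
            if total_w > 0:
                # Weighted mean of neighbour beliefs: the graph's "vote" for v.
                msg = sum(w * belief[u] for u, w in nbrs.items()) / total_w
                new_belief[v] = (1 - alpha) * prior + alpha * msg
            else:
                # Isolated concept: no graph evidence, keep the detector score.
                new_belief[v] = prior
        belief = new_belief
    return belief


# Hypothetical example: a weak "leash" detection is supported by a confident
# "dog" detection, and "collar" (never detected) picks up belief via "leash".
edges = [("dog", "leash", 0.8), ("dog", "cat", 0.3), ("leash", "collar", 0.9)]
detector = {"dog": 0.9, "leash": 0.2, "cat": 0.1}
beliefs = propagate(detector, build_adjacency(edges), alpha=0.5, steps=20)
for concept, b in sorted(beliefs.items()):
    print(f"{concept}: {b:.3f}")
```

In this toy run the confident "dog" score is also pulled down toward its neighbours. A single fixed `alpha` applies one level of trust to every edge; a learned propagation model could, in principle, weight individual edges and nodes, which is one way to account for errors in the graph itself.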
Document Details
- Document Type: DoD Grant Award
- Publication Date: Jul 10, 2018
- Source ID: N000141812312
Entities
People
- Abhinav Gupta
Organizations
- Massachusetts Institute of Technology
- Office of Naval Research
- United States Navy