Visual Question Answering (VQA)
Abstract
Statement of Work: Develop underlying capabilities for building a semantic-based visual question-answering system that can communicate with humans in natural language.
Objective: The goal is to enable machines to understand semantic content in images and to communicate this understanding as effectively as humans do, via natural language.
Approach: The PI proposes to address the problem of Visual Question Answering (VQA). Given an image and a free-form, natural language question about the image, the task is to automatically produce a concise, accurate, free-form, natural language answer. This research is expected to generate new datasets, knowledge, and techniques in pure computer vision, in integrating vision and language, in developing visual common sense, and in interpretable models. Contributions are also expected in training the machine to be curious and actively ask questions in order to learn, and in training the machine to know what it knows and what it does not. Deep learning is a key approach in this proposal. The PI will build on her pioneering work in developing universal attributes and relative attributes. Another innovative aspect of the proposed research is using drawings and sketches to train the system to recognize subtle differences between similar concepts.
Overall Merit and ONR Mission/Relevance: This research addresses ONR's Information Dominance focus area, as well as its Autonomy and Unmanned Systems focus area. This work is expected to advance visual question-answering systems for use by intelligence analysts, as well as enhanced image interpretation capabilities for autonomous agents. This research is expected to develop novel approaches toward building sophisticated semantic-based visual question-answering systems.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Aug 12, 2016
- Source ID: N000141612647
Entities
People
- Devi Parikh
Organizations
- Office of Naval Research
- United States Navy
- Virginia Tech