Convolutional-feature Analysis and Control for Mobile Visual Scene Perception
Abstract
A visual scene can be defined as an environment composed of objects and surfaces arranged in a spatial layout with statistical regularities that are valuable for perception and semantic interpretation. Recent studies have shown that the spatial envelope and scene attributes, such as colors, lines, openness, and texture patterns, enable rapid scene perception and facilitate object recognition. Combining scene context information with deep learning algorithms for nonlinear feature extraction has recently been shown to be highly effective at classifying a single image, or a video obtained by one camera with little or no motion. These methods, however, do not provide a natural way to integrate asynchronous frames obtained by multiple mobile cameras, or to parse data and construct generative models of objects interacting in a scene. Building on a combination of recent developments by the PIs, ranging from image processing to decentralized estimation and control, the proposed research project will develop a deep-learning Bayesian optimization framework hinging on sparse features for mobile cooperative scene perception. Because most frames in a video are redundant and only a subset of pixels in a frame is informative, this research will develop a convolutional feature extraction technique that extracts task-relevant data via relevance backpropagation. Models of task-relevant objects and scene attributes will be learned from available physics-based and generative data-driven models. Obtaining scene models in addition to classifications will make it possible to infer intent and relationships, predict future actions, and provide a semantic scene interpretation to an operator in the loop. These models will also be necessary for developing information-driven strategies to actively and collaboratively obtain additional videos.
The methods developed in this research will exhibit the following key capabilities: (i) extract mission-relevant data from video with little or no prior knowledge of the scene; (ii) fuse spatio-temporal data acquired from different viewpoints and under changes in appearance, scale, illumination, and focus; (iii) extract and share compact models and classifications autonomously, using few manually labeled data; (iv) operate robustly under dynamic and possibly disconnected communication topologies.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Jan 04, 2017
- Source ID: N000141712175
Entities
People
- Silvia Ferrari
Organizations
- Cornell University
- Office of Naval Research
- United States Navy