THIS IS A CONTINUATION OF N00014-14-1-0232 Deep Structured Learning for Scene Understanding

Abstract

Short Work Statement: Investigate Deep Boltzmann Machines (DBMs) for learning the structure and parameters of complex objects and scenes, and develop fast and scalable learning and inference algorithms for scene understanding. Evaluate these methods on a number of popular datasets for recognition of object and scene types.

Objective: Investigate Deep Boltzmann Machines for learning the structure and parameters of complex concepts (notably complex objects and scenes), and develop fast and scalable learning and inference algorithms for scene understanding.

Approach: The PIs will investigate Deep Boltzmann Machines for learning the structure and parameters of complex concepts. They will develop fast and scalable learning and inference algorithms for DBMs. The focus will be recognition of complex objects and scenes, which will be represented by DPMs (Deformable Parts Models). They will investigate the relationship between DBMs and Convolutional Networks and will develop hybrid networks for more efficient learning. They will develop approximate learning and inference methods, in particular sparse DBMs, that make DBMs more tractable and suitable for real-time use. The PIs will also investigate integrating image data and high-level knowledge for joint segmentation and recognition in the DBM framework. These algorithms will be tested on standard datasets in the vision community for object recognition and holistic scene recognition.

Overall Merits/ONR Relevance: This work is expected to contribute substantially to methods for learning complex concepts, including complex objects, their relationships, and scenes. This work is highly relevant to enabling the Navy's Autonomy and Information Dominance. Automated methods for recognizing objects and scenes are of critical importance to naval missions, including perception for autonomous agents and the understanding of surveillance imagery.

Progress: This is a collaboration of Raquel Urtasun and Ruslan Salakhutdinov.
The focus of this project is to investigate deep learning methods and extend them to image/video understanding and natural language processing. Representations learned with deep learning have been shown to outperform hand-crafted features in a wide variety of application domains. We developed two new deep learning algorithms that learn complex representations while taking into account the dependencies between the random variables we wish to predict. The main difficulty in developing such algorithms is that, unlike in the standard neural-network setting, the forward pass is itself hard (NP-hard in general), since it involves inference in an MRF. Our first algorithm uses approximations similar to those employed in LP relaxations for inference in graphical models to obtain a single-loop algorithm that interleaves the computation of the forward and backward passes, resulting in much faster convergence. Our second algorithm exploits the fact that mean-field inference can be seen as a recurrent net, providing very fast learning for fully connected CRFs with Gaussian potentials. This is an important case, as state-of-the-art semantic segmentation can be achieved with such MRFs. In computer vision, we have developed object detectors that exploit deep features and depth for very accurate 3D object detection. In fact, our approach is currently the state of the art on the challenging KITTI dataset for autonomous driving. For 2D object detection, we have developed an approach that uses deep features encoding context, appearance and segmentation, showing remarkable improvements over R-CNN on the difficult PASCAL VOC challenge. We have also shown that deep learning can be used to perform joint instance-level segmentation and depth ordering from a single monocular image.
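To make the "mean-field as a recurrent net" idea concrete, the following is a minimal sketch of the standard mean-field update for a fully connected CRF, assuming a Potts label compatibility and a single Gaussian pairwise kernel. It is not the authors' implementation: the function names (`mean_field`, `gaussian_kernel`, `softmax`) and parameters (`weight`, `bandwidth`, `num_iters`) are hypothetical, and the dense O(N^2) kernel is used only for clarity, whereas practical systems use fast high-dimensional filtering. Each iteration applies the same update to the previous marginals, so unrolling a fixed number of iterations yields a recurrent network.

```python
import math


def gaussian_kernel(fi, fj, bandwidth):
    """Gaussian affinity exp(-||fi - fj||^2 / (2 * bandwidth^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(fi, fj))
    return math.exp(-d2 / (2.0 * bandwidth ** 2))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def mean_field(unary, features, weight=1.0, bandwidth=1.0, num_iters=5):
    """Mean-field inference for a fully connected CRF (hypothetical sketch).

    unary[i][l]: unary energy of label l at pixel i (lower is better).
    features[i]: feature vector of pixel i (e.g. position and colour).
    Returns approximate marginals q[i][l].
    """
    n = len(unary)
    num_labels = len(unary[0])
    # Dense N x N affinity matrix; self-connections are excluded.
    k = [[0.0 if i == j else gaussian_kernel(features[i], features[j], bandwidth)
          for j in range(n)] for i in range(n)]
    # Initialise the marginals from the unary energies alone.
    q = [softmax([-u for u in unary[i]]) for i in range(n)]
    for _ in range(num_iters):
        new_q = []
        for i in range(n):
            # Message passing: affinity-weighted sum of neighbours' marginals.
            msg = [sum(k[i][j] * q[j][l] for j in range(n))
                   for l in range(num_labels)]
            # Potts compatibility: label l is penalised by the mass that
            # neighbours place on every other label.
            total = sum(msg)
            pairwise = [weight * (total - msg[l]) for l in range(num_labels)]
            energy = [unary[i][l] + pairwise[l] for l in range(num_labels)]
            new_q.append(softmax([-e for e in energy]))
        # Parallel update of all pixels: one "time step" of the recurrent net.
        q = new_q
    return q


if __name__ == "__main__":
    # Three pixels, two labels: pixel 1 is ambiguous but lies close to
    # pixel 0 in feature space, so it is pulled towards pixel 0's label.
    unary = [[0.0, 3.0], [1.0, 1.0], [3.0, 0.0]]
    features = [[0.0], [0.1], [5.0]]
    for row in mean_field(unary, features, weight=2.0, bandwidth=1.0):
        print([round(p, 3) for p in row])
```

Because every step is built from differentiable operations, the unrolled iterations can be trained end to end by backpropagation; this sketch covers only the forward pass, and learning would require running it inside an automatic-differentiation framework.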
In the context of aerial images, we have developed a very efficient approach that employs freely available maps (i.e., OpenStreetMap) to construct an MRF for road segmentation where both feature computation and inference can be

Document Details

Document Type: DoD Grant Award
Publication Date: Sep 23, 2016
Source ID: N000141612792

Entities

People

Raquel Urtasun

Organizations

Office of Naval Research
United States Navy
University of Toronto
