Deep Structured Learning for Scene Understanding

Abstract

Statement of Work: Investigate Deep Boltzmann Machines (DBMs) for learning the structure and parameters of complex objects and scenes, and develop fast and scalable learning and inference algorithms for scene understanding. Evaluate these methods on a number of popular datasets for recognition of object and scene types.

Objective: Investigate Deep Boltzmann Machines for learning the structure and parameters of complex concepts (notably complex objects and scenes), and develop fast and scalable learning and inference algorithms for scene understanding.

Approach: This is a collaboration of Ruslan Salakhutdinov (CMU) and Raquel Urtasun (Univ. of Toronto). The PIs will investigate Deep Boltzmann Machines for learning the structure and parameters of complex concepts. They will develop fast and scalable learning and inference algorithms for DBMs. The focus will be recognition of complex objects and scenes, which will be represented by DPM (Deformable Parts Model). They will investigate the relationship between DBMs and Convolutional Networks and will develop hybrid networks for more efficient learning. They will develop approximate methods for learning and inference, based on sparse DBMs, that will make DBMs more tractable and able to run in real time. The PIs will also investigate integrating image data and high-level knowledge for joint segmentation and recognition in the DBM framework. These algorithms will be tested on standard datasets in the vision community for object recognition and holistic scene recognition.

Overall Merit and ONR Mission/Relevance: This work is highly relevant to ONR Autonomy and Information Dominance. Automated methods for recognizing objects and scenes are of critical importance to naval missions that include perception for autonomous agents and understanding surveillance imagery. This work is expected to contribute substantially to methods for learning complex concepts, including complex objects and their relationships, and scenes.

Progress: This is a collaboration of Raquel Urtasun and Ruslan Salakhutdinov. The focus of this project is to investigate Deep Learning methods and extend them to image/video understanding and natural language processing. Representations learned using deep learning have been shown to outperform hand-crafted features in a wide variety of application domains.

We developed two new deep learning algorithms that can learn complex representations while taking into account the dependencies between the random variables we are interested in predicting. The main difficulty in developing such algorithms is that, unlike the standard neural net setting, the forward pass is complex (NP-hard), as it involves inference in an MRF. Our first algorithm uses approximations similar to those employed in LP relaxations for inference in graphical models to create a single-loop algorithm that interleaves the computation of the forward and backward passes, resulting in much faster convergence. Our second algorithm exploits the fact that mean-field can be seen as a recurrent net to provide very fast learning in the context of fully-connected CRFs with Gaussian potentials. This is an important case, as the state of the art in semantic segmentation can be achieved with such MRFs.

In computer vision, we have developed object detectors that exploit deep features and depth for very accurate 3D object detection. In fact, our approach is currently the state of the art on the challenging KITTI dataset for autonomous driving. In the context of 2D object detection, we have developed an approach that uses deep features encoding context, appearance and segmentation, showing remarkable improvements over R-CNN in the difficult PASCAL VOC challenge. We have also shown that deep learning can be used to perform joint instance-level segmentation and depth ordering from a single monocular image. In the context of aerial images, we have developed a very efficient approach that employs freely available maps (i.e., OpenStreetMap) to constru
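
The remark in the Progress section that mean-field can be seen as a recurrent net can be illustrated with a small sketch. This is not the PIs' algorithm: it is a generic parallel mean-field update for a tiny fully-connected CRF with a Gaussian pairwise kernel, where every iteration applies the same update and so resembles one step of a recurrent network. The Potts-style label agreement term, the function names, and the parameters `weight`, `bandwidth`, and `num_steps` are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_kernel(features, bandwidth=1.0):
    """Pairwise Gaussian affinities between node feature vectors (zero on the diagonal)."""
    n = len(features)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                kernel[i][j] = math.exp(-dist2 / (2.0 * bandwidth ** 2))
    return kernel


def mean_field(unary, features, num_steps=5, weight=1.0, bandwidth=1.0):
    """Run num_steps parallel mean-field updates.

    unary[i][l] is the score (negative energy) of node i taking label l.
    Assumed pairwise term: a Potts penalty, scaled by the Gaussian kernel,
    whenever two nodes disagree. Up to a per-node constant this is the same
    as rewarding agreement, which is the form used in the update below.
    """
    kernel = gaussian_kernel(features, bandwidth)
    n = len(unary)
    num_labels = len(unary[0])
    beliefs = [softmax(u) for u in unary]  # initial state of the "recurrent net"
    for _ in range(num_steps):  # each pass plays the role of one recurrent step
        updated = []
        for i in range(n):
            scores = []
            for label in range(num_labels):
                # Message passing: kernel-weighted vote of the other nodes for this label.
                message = sum(kernel[i][j] * beliefs[j][label] for j in range(n))
                scores.append(unary[i][label] + weight * message)
            updated.append(softmax(scores))  # normalization acts as the nonlinearity
        beliefs = updated
    return beliefs


# Hypothetical toy problem: three nearby nodes and one distant node, two labels.
features = [[0.0], [0.1], [0.2], [3.0]]
unary = [[2.0, 0.0], [0.0, 0.2], [2.0, 0.0], [0.0, 2.0]]
beliefs = mean_field(unary, features)
labels = [max(range(len(row)), key=row.__getitem__) for row in beliefs]
```

In this toy problem the second node has a weak unary preference for label 1, but its two close neighbors pull it toward label 0, while the distant fourth node is left almost unaffected. Because every step is a differentiable function of the unary scores and kernel parameters, the fixed number of steps can in principle be unrolled and trained with backpropagation, which is the sense in which the update behaves like a recurrent network.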

Document Details

Document Type: DoD Grant Award
Publication Date: Sep 23, 2016
Source ID: N000141613074

Entities

People

Ruslan Salakhutdinov

Organizations

Massachusetts Institute of Technology
Office of Naval Research
United States Navy
