StreetScenes: Towards Scene Understanding in Still Images

Abstract

This thesis describes an effort to construct a scene understanding system that is able to analyze the content of real images. While constructing the system we had to provide solutions to many of the fundamental questions that every student of object recognition deals with daily. These include the choice of data set, the choice of success measurement, the representation of the image content, the selection of inference engine, and the representation of the relations between objects. The main test-bed for our system is the CBCL StreetScenes data base. It is a carefully labeled set of images, much larger than any similar data set available at the time it was collected. Each image in this data set was labeled for 9 common classes such as cars, pedestrians, roads and trees. Our system represents each image using a set of features that are based on a model of the human visual system constructed in our lab. We demonstrate that this biologically motivated image representation, along with its extensions, constitutes an effective representation for object detection, facilitating unprecedented levels of detection accuracy. Similarly to biological vision systems, our system uses hierarchical representations. We therefore explore the possible ways of combining information across the hierarchy into the final perception. Our system is trained using standard machine learning machinery, which was first applied to computer vision in earlier work of Prof. Poggio and others. We demonstrate how the same standard methods can be used to model relations between objects in images as well, capturing context information. The resulting system detects and localizes, using a unified set of tools and image representations, compact objects such as cars, amorphous objects such as trees and roads, and the relations between objects within the scene. The same representation also excels in identifying objects in clutter without scanning the image.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
May 01, 2006
Accession Number
ADA454810

Entities

People

  • Stanley M. Bileschi

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • C4I

DTIC Thesaurus Topics

  • Accuracy
  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automata Theory
  • Computer Languages
  • Computer Science
  • Computer Vision
  • Computers
  • Databases
  • Electrical Engineering
  • Information Science
  • Machine Learning
  • Network Science
  • Object Recognition
  • Ontologies
  • Supervised Machine Learning
  • Urban Areas

Fields of Study

  • Computer science

Readers

  • Computer Vision.
  • Distributed Systems and Data Platform Development
  • Neural Network Machine Learning.

Technology Areas

  • AI & ML