Realistic Modeling of Simple and Complex Cell Tuning in the HMAX Model, and Implications for Invariant Object Recognition in Cortex

Abstract

Riesenhuber & Poggio recently proposed a model of object recognition in cortex which, beyond integrating general beliefs about the visual system in a quantitative framework, made testable predictions about visual processing. In particular, they showed that invariant object representation could be obtained with a selective pooling mechanism over properly chosen afferents through a MAX operation. For instance, at the complex cell level, pooling over a group of simple cells with the same preferred orientation and position in space but slightly different preferred spatial frequencies would provide scale tolerance, while pooling over a group of simple cells with the same preferred orientation and spatial frequency but slightly different positions in space would provide position tolerance. Indirect support for such mechanisms in the visual system comes from the ability of the architecture at the top level to replicate the shape tuning as well as the shift and size invariance properties of "view-tuned cells" (VTUs) found in inferotemporal cortex (IT), the highest area in the ventral visual stream, which is thought to be crucial in mediating object recognition in cortex. There is also now good physiological evidence that a MAX operation is performed at various levels along the ventral stream. However, in the original paper by Riesenhuber & Poggio, the tuning and pooling parameters of model units in early and intermediate areas were only qualitatively inspired by physiological data. Many studies have investigated the tuning properties of simple and complex cells in primary visual cortex (V1). We show that units in the early levels of HMAX can be tuned to produce realistic simple and complex cell-like tuning, and that the earlier findings on the invariance properties of model VTUs still hold in this more realistic version of the model.
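
The MAX pooling idea described in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the report's implementation: the Gabor-like filter, its parameters (size 9, wavelength 6, sigma 2), the bar stimulus, and all function names (`gabor_1d`, `simple_cells`, `complex_cell`, `bar_at`) are assumptions chosen only for illustration. It shows that a unit taking the MAX over simple cells at different positions responds identically to a shifted stimulus, while a single simple cell does not.

```python
import math

def gabor_1d(size, wavelength, sigma):
    """Even-symmetric 1D Gabor profile, made zero-mean and unit-norm (illustrative values)."""
    half = size // 2
    w = [math.exp(-(x * x) / (2 * sigma ** 2)) * math.cos(2 * math.pi * x / wavelength)
         for x in range(-half, half + 1)]
    mean = sum(w) / len(w)
    w = [v - mean for v in w]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

def simple_cells(signal, kernel):
    """Rectified filter response at every valid position: one 'simple cell' per position."""
    k = len(kernel)
    return [abs(sum(signal[i + j] * kernel[j] for j in range(k)))
            for i in range(len(signal) - k + 1)]

def complex_cell(afferents):
    """MAX operation over a group of simple-cell afferents."""
    return max(afferents)

def bar_at(p, n=40, width=3):
    """Hypothetical stimulus: a bar of the given width starting at position p."""
    s = [0.0] * n
    for i in range(p, p + width):
        s[i] = 1.0
    return s

kernel = gabor_1d(9, 6.0, 2.0)
resp_a = simple_cells(bar_at(12), kernel)   # original stimulus
resp_b = simple_cells(bar_at(15), kernel)   # same stimulus shifted by 3 samples

# Position tolerance: pool over simple cells with the same filter at different positions.
pooled_a = complex_cell(resp_a)
pooled_b = complex_cell(resp_b)

# A single simple cell (fixed position 8) is not tolerant to the shift.
single_a, single_b = resp_a[8], resp_b[8]

# Scale tolerance: pool over simple cells at one position with different preferred wavelengths.
scale_pooled = complex_cell(
    [simple_cells(bar_at(12), gabor_1d(9, wl, 2.0))[8] for wl in (5.0, 6.0, 7.0)]
)

print(pooled_a, pooled_b)   # equal: the pooled response is unchanged by the shift
print(single_a, single_b)   # differ: the single simple cell is position-sensitive
```

In this sketch the pooling range covers every valid position, so the tolerance is exact for any shift that keeps the bar inside the signal; a model unit pooling over a limited neighborhood would be tolerant only to shifts within that neighborhood.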

Document Details

Document Type
Technical Report
Publication Date
Jul 01, 2004
Accession Number
ADA459692

Entities

People

  • Maximilian Riesenhuber
  • Thomas Serre

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Biomedical
  • C4I

DTIC Thesaurus Topics

  • Amplitude
  • Applied Computer Science
  • Artificial Intelligence
  • Aspect Ratio
  • Bandwidth
  • Cognitive Science
  • Computer Science
  • Computer Vision
  • Contracts
  • Frequency
  • Frequency Bands
  • Object Recognition
  • Recognition
  • Standards
  • Three Dimensional
  • Two Dimensional
  • Visual Cortex

Readers

  • Computer Vision
  • Neural Network Machine Learning
  • Theoretical Analysis

Technology Areas

  • Space