Intelligent Scene Analysis and Recognition

Abstract

Knowing the name of a location and one's position relative to nearby landmarks not only facilitates the end user's navigation but also makes it possible to offer follow-up geographical services. This report addresses Visual Location Recognition and Registration (VLRR), the problem of predicting the name and relative position of a location using only a captured query image. The problem is almost ill-posed. On one hand, there is no formal definition of what constitutes a location, and it is still unclear which properties of a location help to perform the recognition. On the other hand, determining the relative position of the end user requires image registration under large viewpoint variation, which itself suffers severely from the well-known matching ambiguity. To address the first difficulty, a Bag-of-Features (BoF) model based on a visual codebook is used, where the codebook is obtained by unsupervised clustering of local image features extracted from the training images. Each location and query image can then be represented efficiently by a histogram of the occurrences of the codebook's visual words. Finally, a classifier trained by supervised learning makes the final decision based on the similarity of those histograms. However, the BoF model does not account for the fact that different visual words provide different discriminative power in the subsequent location classification. Therefore, a simple and novel weighting scheme, called Visual Words Aggregation Weighting (VWAW), is proposed. It assumes that visual words that are the cluster centers of highly aggregated local image features, and that have fewer neighboring words, are more important than others. These two assumptions are reasonable in the sense that a highly aggregated cluster center usually has a smaller clustering error, and a visual word with fewer neighbors is more discriminative and robust.
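The pipeline described in the abstract (codebook by clustering, per-image histograms, similarity-based decision, per-word weights) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the report: the function names, the `radius` parameter, the weight formula (cluster size divided by one plus the number of nearby words), the use of weighted histogram intersection as the similarity, and the best-match rule that stands in for the trained classifier. The abstract does not give the actual VWAW formula.

```python
import math
import random


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the returned centers act as the visual words."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the previous center if a cluster ends up empty.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def bof_histogram(features, codebook):
    """Normalized histogram of visual-word occurrences for one image."""
    hist = [0.0] * len(codebook)
    for f in features:
        hist[nearest(f, codebook)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def aggregation_weights(train_features, codebook, radius):
    """Hypothetical weights: many members and few nearby words give a larger weight."""
    counts = [0] * len(codebook)
    for f in train_features:
        counts[nearest(f, codebook)] += 1
    raw = []
    for i, c in enumerate(codebook):
        neighbors = sum(
            1 for j, o in enumerate(codebook)
            if j != i and math.dist(c, o) < radius
        )
        raw.append(counts[i] / (1 + neighbors))
    total = sum(raw) or 1.0
    return [w / total for w in raw]


def weighted_similarity(h1, h2, weights):
    """Weighted histogram intersection (an assumed similarity measure)."""
    return sum(w * min(a, b) for w, a, b in zip(weights, h1, h2))


def classify(query_hist, location_hists, weights):
    """Best-match rule, a simple stand-in for the supervised classifier."""
    return max(
        location_hists,
        key=lambda name: weighted_similarity(query_hist, location_hists[name], weights),
    )
```

In use, `kmeans` would be run once over local features (as tuples of floats) pooled from all training images, `bof_histogram` would be computed for each location and for the query image, and `classify` would return the name of the best-matching location. Setting all weights equal reduces this sketch to an unweighted BoF comparison.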

Document Details

Document Type
Technical Report
Publication Date
Mar 30, 2010
Accession Number
ADA519403

Entities

People

  • Kai-kuang Ma

Organizations

  • Nanyang Technological University

Tags

Communities of Interest

  • Energy and Power Technologies
  • Space

DTIC Thesaurus Topics

  • Algorithms
  • Computational Complexity
  • Computations
  • Computer Vision
  • Consistency
  • Databases
  • Detection
  • Detectors
  • Frequency
  • Image Recognition
  • Image Registration
  • Information Science
  • Measurement
  • Mobile Devices
  • Mobile Phones
  • Object Recognition
  • Recognition

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Speech Processing/Speech Recognition
  • Vision Science/Vision Psychology/Cognitive Neuroscience

Technology Areas

  • AI & ML