Large Scale Hierarchical K-Means Based Image Retrieval With MapReduce

Abstract

Image retrieval remains one of the most heavily researched areas in computer vision. Image retrieval methods have been used in autonomous vehicle localization research, in object recognition applications, and commercially in projects such as Google Glass. Current image retrieval methods struggle to scale to image datasets that can easily reach billions of images. To process these growing datasets, we distribute the computation required for image retrieval across a cluster of machines using Apache Hadoop. While there are many techniques for image retrieval, we focus on systems that use hierarchical k-means trees. Successful retrieval systems based on hierarchical k-means trees have been built in two ways: by using the tree as a visual vocabulary to build an inverted file index and applying a bag-of-words retrieval approach, or by building the tree as a full representation of every image in the database and applying a k-nearest-neighbor voting scheme for retrieval. The two approaches involve different levels of approximation, and each has strengths and weaknesses that must be weighed against the needs of the application. We implement both approaches with MapReduce for the first time and compare them in terms of image retrieval precision, index creation run-time, and image retrieval throughput. We present experiments with up to 2 million images on a cluster of 20 virtual machines.
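The bag-of-words approach summarized above (hierarchical k-means tree as a visual vocabulary, inverted file index, voting at query time) can be sketched in a few lines of Python. This is a minimal single-process illustration under stated assumptions, not the report's Hadoop implementation: all function names are hypothetical, the farthest-first seeding and tf-idf scoring are choices made here for reproducibility, the 2-D points stand in for real image descriptors, and the map, shuffle, and reduce steps are simulated in memory rather than run on a cluster.

```python
import itertools
import math
from collections import defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    """Index of the center closest to descriptor p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))

def kmeans(points, k, iters=10):
    """Lloyd's k-means; farthest-first seeding is an assumption for determinism."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def build_tree(points, k, depth, ids):
    """Recursively cluster points into a k-ary tree; leaves are visual words."""
    if depth == 0 or len(points) <= k:
        return {"word": next(ids)}
    centers = kmeans(points, k)
    groups = [[] for _ in centers]
    for p in points:
        groups[nearest(p, centers)].append(p)
    children = [build_tree(g, k, depth - 1, ids) for g in groups]
    return {"centers": centers, "children": children}

def quantize(tree, p):
    """Descend the tree greedily and return the leaf's visual-word id."""
    node = tree
    while "word" not in node:
        node = node["children"][nearest(p, node["centers"])]
    return node["word"]

def map_image(image_id, descriptors, tree):
    """Map step (hypothetical): emit (visual_word, image_id) per descriptor."""
    for d in descriptors:
        yield quantize(tree, d), image_id

def reduce_word(word, image_ids):
    """Reduce step (hypothetical): build one word's posting list of counts."""
    counts = defaultdict(int)
    for image_id in image_ids:
        counts[image_id] += 1
    return word, dict(counts)

def build_index(images, tree):
    """Run map, an in-memory shuffle (group by word), then reduce."""
    shuffled = defaultdict(list)
    for image_id, descriptors in images.items():
        for word, iid in map_image(image_id, descriptors, tree):
            shuffled[word].append(iid)
    return dict(reduce_word(w, iids) for w, iids in shuffled.items())

def query(index, tree, descriptors, n_images):
    """Score database images by tf-idf weighted votes over shared words."""
    scores = defaultdict(float)
    for d in descriptors:
        postings = index.get(quantize(tree, d), {})
        if not postings:
            continue
        idf = math.log(n_images / len(postings))
        for image_id, tf in postings.items():
            scores[image_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy 2-D "descriptors" standing in for real high-dimensional features.
images = {
    "a": [(0, 0), (0, 1), (1, 0)],
    "b": [(50, 0), (50, 1), (51, 0)],
    "c": [(0, 50), (0, 51), (1, 50)],
}
all_points = [p for ds in images.values() for p in ds]
tree = build_tree(all_points, k=2, depth=2, ids=itertools.count())
index = build_index(images, tree)
ranking = query(index, tree, images["b"], n_images=len(images))
print(ranking[0][0])  # prints "b"
```

In a Hadoop deployment one would presumably ship the trained tree to every mapper (for example through the distributed cache) and let the framework's shuffle perform the grouping that `build_index` does with a dictionary. The full-representation variant would instead keep each descriptor and its image id at the leaves and vote over nearest neighbors; that variant is not shown here.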


Document Details

Document Type
Technical Report
Publication Date
Mar 27, 2014
Accession Number
ADA602439

Entities

People

  • William E. Murphy

Organizations

  • Air Force Institute of Technology

Tags

Communities of Interest

  • Autonomy
  • Human Systems
  • Space

DTIC Thesaurus Topics

  • Air Force
  • Algorithms
  • Artificial Intelligence
  • Computer Programming
  • Computer Vision
  • Computers
  • Data Sets
  • Data Storage Systems
  • Detection
  • Detectors
  • Electrical Engineering
  • Feature Extraction
  • Reliability
  • Training
  • United States Strategic Command
  • Unmanned Aerial Vehicles
  • Virtual Machines

Fields of Study

  • Computer science

Readers

  • Computer Vision
  • Distributed Systems and Data Platform Development
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • Autonomy