Monocular Camera Localization Using a Bag of Visual Words from Virtual World Data
Abstract
The Visual Localization (VL) problem asks how to determine, from a query image, the pose of the camera that captured it. Red Green Blue (RGB) cameras have become common and inexpensive, and they capture large amounts of data, which has made VL pipelines that generate accurate results increasingly interesting and important. These pipelines are useful in everything from Simultaneous Localization and Mapping (SLAM) and Smoothing and Mapping (SAM) for robotics to applications in Augmented Reality (AR). The work detailed in this paper seeks to determine whether a Bag of Visual Words (BOVW) can look past repetitious features in an indoor environment and localize the camera that captured an image. This pipeline is intended to be used as a truth system to verify results from other navigation techniques in development by the Autonomy and Navigation Technology (ANT) Center at the Air Force Institute of Technology (AFIT), in lieu of more expensive solutions such as motion capture systems or the Global Positioning System (GPS), which is unreliable in an indoor environment.
Document Details
- Document Type: Technical Report
- Publication Date: Mar 24, 2022
- Accession Number: AD1166918
Entities
People
- Joshua A. Rinaldi
Organizations
- Air Force Institute of Technology