Web-Scale Search-Based Data Extraction and Integration

Abstract

In the current age of abundant, digitized geographic data, the classic, manual approach to geospatial feature discovery and gazetteer creation is cost-prohibitive. While geographic data has become increasingly prevalent on the open Web, it remains largely unstructured and difficult to study. The GeoEngine project has developed generalizable methods for automatic gazetteer generation based on the ample but unstructured data on the open Web. GeoEngine solves this problem with a three-tiered architecture: automatic data discovery and extraction, machine-based semantic aggregation, and human validation. GeoEngine has produced specific but generalizable solutions in the following areas: sub-city feature discovery in domestic and foreign locales; neighborhood boundary discovery and refinement; physical feature gazetteer generation and attribute addition; Wikipedia traversal, extraction, and auto-correction; and a comprehensive "Places Profile" of Afghanistan. These methods allow for fast, automated gazetteer generation and support for geospatial research by leveraging the abundance of unstructured data on the open Web, and they provide new ways of thinking about old problems in geographic information systems.

Document Details

Document Type: Technical Report
Publication Date: Oct 17, 2011
Accession Number: ADA554205

Entities

People

  • Govind Kabra
  • Kevin C. Chang
  • Truman Shuck

Tags

Communities of Interest

  • Biomedical
  • C4I
  • Energy and Power Technologies
  • Human Systems
  • Materials and Manufacturing Processes
  • Space

DTIC Thesaurus Topics

  • Arabic Language
  • Buildings And Structures
  • Computer Programming
  • Computer Programs
  • Computers
  • Contracts
  • Databases
  • Geography
  • Information Systems
  • Natural Language Processing
  • Natural Languages
  • Network Protocols
  • Ontologies
  • Social Media
  • United States
  • Web Browsers
  • Websites

Readers

  • Computer Vision
  • Neural Network Machine Learning
  • Software Engineering