InfoXtract Location Normalization: A Hybrid Approach to Geographic References in Information Extraction

Abstract

Ambiguity is very high for location names. For example, there are 23 cities named Buffalo in the U.S. Based on our previous work, this paper presents a refined hybrid approach to geographic references using our information extraction engine InfoXtract. The InfoXtract location normalization module consists of local pattern matching and discourse co-occurrence analysis as well as default senses. Multiple knowledge sources are used in a number of ways: (i) pattern matching driven by local context, (ii) maximum spanning tree search for discourse analysis, and (iii) applying default sense heuristics and extracting default senses from the web. The module achieves 96% accuracy on our test collections, which consist of both news articles and tourist guides. The performance contribution of each component of the module is also benchmarked and discussed.
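
The three components named in the abstract can be illustrated with a small Python sketch. It is not the InfoXtract implementation: the gazetteer, the default-sense table, the same-region edge weight, and the rule for reading senses off the spanning tree are all hypothetical choices made for this example. It shows one plausible way to combine a local pattern ("Name, ST"), a maximum spanning tree over candidate senses built with Kruskal's algorithm, and a default-sense fallback.

```python
import re
from itertools import combinations

# Hypothetical toy gazetteer: surface name -> candidate senses (name, region).
GAZETTEER = {
    "Buffalo": [("Buffalo", "NY"), ("Buffalo", "TX"), ("Buffalo", "WY")],
    "Albany": [("Albany", "NY"), ("Albany", "GA")],
    "Rochester": [("Rochester", "NY"), ("Rochester", "MN")],
}
# Hypothetical default senses, used only when context gives no evidence.
DEFAULT_SENSE = {
    "Buffalo": ("Buffalo", "NY"),
    "Albany": ("Albany", "NY"),
    "Rochester": ("Rochester", "MN"),
}

def local_pattern_senses(text):
    """Step (i): resolve names followed by a region code, e.g. 'Buffalo, TX'."""
    fixed = {}
    for name, region in re.findall(r"\b([A-Z][a-z]+), ([A-Z]{2})\b", text):
        if (name, region) in GAZETTEER.get(name, []):
            fixed[name] = (name, region)
    return fixed

def edge_weight(a, b):
    """Assumed scoring: two senses in the same region support each other."""
    return 1.0 if a[1] == b[1] else 0.0

def maximum_spanning_tree(nodes, edges):
    """Kruskal's algorithm over edges sorted by descending weight."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    tree = []
    for w, a, b in sorted(edges, key=lambda e: e[0], reverse=True):
        ra, rb = find(a), find(b)
        if ra != rb:  # keep the edge only if it joins two components
            parent[ra] = rb
            tree.append((w, a, b))
    return tree

def normalize(text):
    """Map each known location name in text to one sense."""
    names = [n for n in GAZETTEER if re.search(rf"\b{n}\b", text)]
    fixed = local_pattern_senses(text)
    candidates = {n: [fixed[n]] if n in fixed else GAZETTEER[n] for n in names}

    # Step (ii): connect senses of different names, then build the tree.
    nodes = [s for senses in candidates.values() for s in senses]
    edges = [(edge_weight(a, b), a, b)
             for a, b in combinations(nodes, 2) if a[0] != b[0]]
    score = {s: 0.0 for s in nodes}
    for w, a, b in maximum_spanning_tree(nodes, edges):
        score[a] += w
        score[b] += w

    result = {}
    for name, senses in candidates.items():
        best = max(senses, key=lambda s: score[s])
        if name in fixed or score[best] > 0:
            result[name] = best
        else:
            result[name] = DEFAULT_SENSE[name]  # step (iii): default sense
    return result

if __name__ == "__main__":
    print(normalize("She flew from Buffalo to Albany and then Rochester."))
    print(normalize("Rochester is cold in winter."))
```

In this toy setup, a lone mention of Rochester falls back to its default sense, while a document that also mentions Buffalo and Albany pulls all three names toward the shared region through the tree edges. A realistic edge weight would draw on richer evidence than a shared region code.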

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2003
Accession Number
ADA457797

Entities

People

  • Cheng Niu
  • Huifeng Li
  • Rohini K. Srihari
  • Wei Li

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Algorithms
  • Ambiguity
  • British Columbia
  • Canada
  • Computational Linguistics
  • Extraction
  • Islands
  • Language
  • Linguistics
  • Machine Learning
  • Markov Models
  • Measurement
  • Models
  • Natural Languages
  • New York
  • South Africa
  • Visualizations

Fields of Study

  • Computer science

Readers

  • Artificial Intelligence
  • Computational Fluid Dynamics (CFD)
  • Computational Linguistics

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval