Resolving Partial Name Mentions Using String Metrics

Abstract

Information Extraction is concerned with discovering entities, relationships and events from text. Before relationships and events can be discovered accurately, it is critical to resolve all mentions of the same entity. This process is known as coreference resolution. Coreferent mentions of an entity can take a number of forms, including pronominal mentions, partial name mentions, and honorifics. This report addresses the problem of resolving partial name mentions to their canonical form within a text document using character-based string metrics. Based on a review and investigation of some of the main character-based string metrics, we developed a method to resolve partial name mentions within a document. This method applies the Jaro-Winkler string comparator and a variation of the Smith-Waterman string similarity measure. Applied to name mentions sourced from a sample of emails, the method achieved a precision of 97%; applied to name mentions from news articles, it achieved a precision of 100%.
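For readers unfamiliar with the Jaro-Winkler comparator named in the abstract, the following is a minimal sketch of the standard textbook formulation in Python. It is not taken from the report: the function names, the prefix scale of 0.1 and the prefix cap of 4 characters are the conventional defaults and are assumptions here, and the report's variation of Smith-Waterman is not reproduced.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1] (illustrative, not the report's code)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order, then halve.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str,
                 prefix_scale: float = 0.1,  # assumed conventional default
                 max_prefix: int = 4) -> float:  # assumed conventional cap
    """Jaro similarity boosted for strings that share a common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


if __name__ == "__main__":
    # Classic illustration pair; the score is approximately 0.961.
    print(round(jaro_winkler("MARTHA", "MARHTA"), 3))
```

The prefix boost rewards strings that agree in their first few characters, which suits names that differ only by a misspelling or a truncated ending. A local-alignment measure such as Smith-Waterman complements it by scoring a short mention that appears as a substring of a longer canonical name.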

Document Details

Document Type
Technical Report
Publication Date
Dec 01, 2007
Accession Number
ADA484334

Entities

People

  • Jyotsna Das
  • Poh Lian Choong

Organizations

  • Defence Science and Technology Group

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Addressing
  • Algorithms
  • Artificial Intelligence
  • Australia
  • Classification
  • Command And Control
  • Comparators
  • Data Mining
  • Dynamic Programming
  • Electronic Mail
  • Engineering
  • Information Systems
  • Machine Learning
  • New York
  • Pattern Recognition
  • Personality
  • Precision

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Military Engineering

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval