Resolving Partial Name Mentions Using String Metrics
Abstract
Information extraction is concerned with discovering entities, relationships and events in text. Before relationships and events can be discovered accurately, all mentions of the same entity must be resolved, a process known as coreference resolution. Coreferent mentions of an entity can take several forms, including pronominal mentions, partial name mentions and honorifics. This report addresses the problem of resolving partial name mentions to their canonical form within a text document using character-based string metrics. Based on a review and investigation of some of the main character-based string metrics, we developed a method to resolve partial name mentions within a document. The method applies the Jaro-Winkler string comparator and a variation of the Smith-Waterman string similarity measure. Applied to name mentions drawn from a sample of emails, it achieved a precision of 97%; applied to name mentions drawn from news articles, it achieved a precision of 100%.
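For readers unfamiliar with the two metrics named in the abstract, the following is a minimal Python sketch of their textbook formulations. It is not the report's method: the abstract does not describe the Smith-Waterman variation it uses, so the plain local-alignment score below is only a stand-in. The function names, the Winkler prefix scale of 0.1 with a four-character prefix cap, and the match, mismatch and gap scores are all illustrative assumptions.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and lie within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed defaults)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


def smith_waterman(s1: str, s2: str, match: int = 2,
                   mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score with a linear gap penalty.

    The scoring values are illustrative, not those of the report.
    """
    prev = [0] * (len(s2) + 1)
    best = 0
    for i in range(1, len(s1) + 1):
        curr = [0] * (len(s2) + 1)
        for j in range(1, len(s2) + 1):
            step = match if s1[i - 1] == s2[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + step,
                          prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
    print(smith_waterman("smith", "john smith"))
```

The two metrics suit different cases. Jaro-Winkler rewards strings of similar length that share a prefix, which helps with spelling variants, while a local-alignment score stays high when a partial mention such as a bare surname occurs intact inside a longer full name.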
Document Details
- Document Type: Technical Report
- Publication Date: Dec 01, 2007
- Accession Number: ADA484334
Entities
People
- Jyotsna Das
- Poh Lian Choong
Organizations
- Defence Science and Technology Group