Robust Reading: Identification and Tracing of Ambiguous Names

Abstract

A given entity, whether a person, a location, or an organization, may be mentioned in text in multiple, ambiguous ways. Understanding natural language requires identifying whether different mentions of a name, within and across documents, represent the same entity. We develop an unsupervised learning approach and show that it accurately resolves the name identification and tracing problem. At the heart of our approach is a generative model of how documents are produced and how names are sprinkled into them. In its most general form, our model assumes: (1) a joint distribution over entities; (2) an author model, which assumes that at least one mention of an entity in a document is easily identifiable and then generates other mentions via (3) an appearance model, which governs how mentions are transformed from the representative mention. We show how to estimate the model and perform inference with it, and how this resolves several aspects of the problem from the perspective of applications such as question answering.
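The three-part generative story in the abstract can be illustrated with a toy sketch. This is not the report's model or code. The entity prior, the set of transformations (`keep`, `last_word`, `acronym`), and their weights are all hypothetical. For simplicity, entities are drawn independently rather than from a learned joint distribution. The sketch also shows a naive inference step that resolves a mention to the entity maximizing P(e) · P(m | e).

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical prior over entities, P(e). A real model would use a joint distribution.
ENTITY_PRIOR: Dict[str, float] = {
    "John F. Kennedy": 0.5,
    "University of Illinois": 0.3,
    "Chicago": 0.2,
}

# Hypothetical appearance-model transformations of the representative mention.
def keep(name: str) -> str:
    return name

def last_word(name: str) -> str:
    return name.split()[-1]

def acronym(name: str) -> str:
    # First letter of each capitalized token: "John F. Kennedy" -> "JFK"
    return "".join(w[0] for w in name.split() if w[0].isupper())

# Hypothetical transformation weights, P(transform).
TRANSFORMS: List[Tuple[Callable[[str], str], float]] = [
    (keep, 0.5), (last_word, 0.3), (acronym, 0.2),
]

def sample_entities(rng: random.Random, k: int) -> List[str]:
    """Draw k entities from the prior, then drop repeats while keeping order."""
    names = list(ENTITY_PRIOR)
    drawn = rng.choices(names, weights=[ENTITY_PRIOR[n] for n in names], k=k)
    return list(dict.fromkeys(drawn))

def generate_document(rng: random.Random, num_entities: int = 2,
                      mentions_per_entity: int = 3) -> List[Tuple[str, List[str]]]:
    """Author model: emit the easily identified representative mention first,
    then derive further mentions through the appearance model."""
    fns = [f for f, _ in TRANSFORMS]
    weights = [w for _, w in TRANSFORMS]
    doc = []
    for entity in sample_entities(rng, num_entities):
        mentions = [entity]  # the representative mention
        for _ in range(mentions_per_entity - 1):
            transform = rng.choices(fns, weights=weights, k=1)[0]
            mentions.append(transform(entity))
        doc.append((entity, mentions))
    return doc

def mention_likelihood(mention: str, entity: str) -> float:
    """P(m | e): total weight of transformations that map e to m."""
    return sum(w for f, w in TRANSFORMS if f(entity) == mention)

def resolve(mention: str) -> Optional[str]:
    """Pick argmax_e P(e) * P(m | e); return None if no entity explains m."""
    scores = {e: p * mention_likelihood(mention, e) for e, p in ENTITY_PRIOR.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

if __name__ == "__main__":
    rng = random.Random(0)
    for entity, mentions in generate_document(rng):
        print(entity, "->", mentions)
    print(resolve("JFK"), resolve("Illinois"))
```

Because several transformations can yield the same string (for "Chicago", both `keep` and `last_word`), `mention_likelihood` sums over them rather than picking one. This is the usual marginalization over hidden generation paths.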


Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2004
Accession Number
ADA457894

Entities

People

  • Dan Roth
  • Paul Morie
  • Xin Li

Organizations

  • University of Illinois Urbana–Champaign

Tags

Communities of Interest

  • Autonomy
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Accuracy
  • Computational Science
  • Computer Science
  • Generative Models
  • Hidden Markov Models
  • Identification
  • Language
  • Linguistics
  • Machine Learning
  • Markov Models
  • Maximum Likelihood Estimation
  • Models
  • Probabilistic Models
  • Probability
  • Probability Distributions
  • Test Sets
  • Unsupervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Joint Military Operations and Doctrine.
  • Neural Network Machine Learning.
  • Theoretical Analysis.

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Information Retrieval