Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French

Abstract

This paper presents a statistical decision procedure for lexical ambiguity resolution. The algorithm exploits both local syntactic patterns and more distant collocational evidence, producing an efficient, effective, and highly perspicuous recipe for resolving a given ambiguity. Because it identifies and uses only the single best piece of disambiguating evidence in a target context, the algorithm avoids the difficult problem of modeling complex statistical dependencies between features. Although the algorithm applies directly to a wide class of ambiguities, it is described and evaluated here on a realistic case study: restoring missing accents in Spanish and French text. Current accuracy exceeds 99% on the full task and is typically over 90% even for the most difficult ambiguities.
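The core idea described above is to rank individual pieces of evidence by strength and let only the strongest one present in a context decide. It can be sketched as follows. This is a minimal sketch under stated assumptions, not the report's implementation. Evidence strength is taken here as the absolute smoothed log-likelihood ratio of the two accent patterns given a feature. The smoothing constant `alpha`, the feature strings (such as `w-1=la` for "previous word is *la*"), and the function names `train_decision_list` and `classify` are hypothetical. The toy French *côte*/*côté* examples are invented for illustration and are not data from the report.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(examples, alpha=0.1):
    """Build a decision list from (features, label) pairs.

    Hypothetical sketch: assumes a binary ambiguity (exactly two labels)
    and scores each feature by |log((count_a + alpha) / (count_b + alpha))|.
    """
    labels = sorted({label for _, label in examples})
    if len(labels) != 2:
        raise ValueError("this sketch handles binary ambiguities only")
    a, b = labels
    counts = defaultdict(Counter)  # feature -> per-label counts
    label_totals = Counter()
    for feats, label in examples:
        label_totals[label] += 1
        for f in set(feats):
            counts[f][label] += 1
    rules = []
    for f, c in counts.items():
        # Smoothed log-likelihood ratio; its sign picks the predicted label.
        llr = math.log((c[a] + alpha) / (c[b] + alpha))
        rules.append((abs(llr), f, a if llr > 0 else b))
    # Strongest evidence first; ties broken alphabetically by feature name.
    rules.sort(key=lambda r: (-r[0], r[1]))
    default = label_totals.most_common(1)[0][0]  # fallback: majority label
    return rules, default


def classify(rules, default, feats):
    """Return (label, deciding_feature): the first matching rule wins."""
    feats = set(feats)
    for _strength, f, label in rules:
        if f in feats:
            return label, f
    return default, None  # no evidence matched


# Hypothetical toy training data for French "cote" -> côte / côté.
train = [
    ({"w-1=la", "w+1=est"}, "côte"),
    ({"w-1=la", "w+1=d'Azur"}, "côte"),
    ({"w-1=du", "w+1=de"}, "côté"),
    ({"w-1=de", "w+1=gauche"}, "côté"),
    ({"w-1=du", "w+1=droit"}, "côté"),
]
rules, default = train_decision_list(train)
print(classify(rules, default, {"w-1=la", "w+1=droit"}))  # ('côte', 'w-1=la')
```

In the usage example, the context contains conflicting evidence. *la* points to *côte*, and *droit* points to *côté*. Only the higher-ranked feature (`w-1=la`, seen twice in training) is consulted, so the remaining evidence is ignored rather than combined. This is what lets the method sidestep modeling dependencies between features.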

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 1994
Accession Number
ADA574712

Entities

People

  • David Yarowsky

Organizations

  • University of Pennsylvania

Tags

Communities of Interest

  • Human Systems

DTIC Thesaurus Topics

  • Accuracy
  • Algorithms
  • Case Studies
  • Computational Linguistics
  • Computational Processes
  • Computer Languages
  • Computer Programs
  • Computers
  • Language
  • Linguistics
  • Machine Learning
  • Models
  • Natural Language Processing
  • Probabilistic Models
  • Probability
  • Probability Distributions
  • Semantic Models

Readers

  • Computational Linguistics
  • Neural Network Machine Learning