A Report of Recent Progress in Transformation-Based Error-Driven Learning

Abstract

Most recent research in trainable part-of-speech taggers has explored stochastic tagging. While these taggers obtain high accuracy, linguistic information is captured indirectly, typically in tens of thousands of lexical and contextual probabilities. In [Brill 92], a trainable rule-based tagger was described that obtained performance comparable to that of stochastic taggers, but captured relevant linguistic information in a small number of simple non-stochastic rules. In this paper, we describe a number of extensions to this rule-based tagger. First, we describe a method for expressing lexical relations in tagging that stochastic taggers are currently unable to express. Next, we show a rule-based approach to tagging unknown words. Finally, we show how the tagger can be extended into a k-best tagger, where multiple tags can be assigned to words in some cases of uncertainty.
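
To make the idea of "a small number of simple non-stochastic rules" concrete, the following is a minimal sketch of how an ordered list of contextual transformations might be applied to an initial tagging. It is an illustration only, not the report's implementation: the Rule class, the apply_rules function, the single "previous tag" trigger, and the example tags and sentence are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical transformation: change from_tag to to_tag
    when the preceding word carries prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def apply_rules(tagged: List[Tuple[str, str]], rules: List[Rule]) -> List[Tuple[str, str]]:
    """Apply each rule in order to a list of (word, tag) pairs."""
    tags = [tag for _, tag in tagged]
    for rule in rules:
        # Triggers are checked against the tags as they stood before this
        # rule started, so one rule's changes cannot trigger that same rule.
        new_tags = list(tags)
        for i in range(1, len(tags)):
            if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
                new_tags[i] = rule.to_tag
        tags = new_tags
    return [(word, tag) for (word, _), tag in zip(tagged, tags)]


# Assumed initial tagging: each word starts with some default tag.
sentence = [("to", "TO"), ("race", "NN")]
# Assumed rule: a noun tag directly after "TO" becomes a verb tag.
rules = [Rule(from_tag="NN", to_tag="VB", prev_tag="TO")]

result = apply_rules(sentence, rules)
print(result)
```

In this sketch the whole model is the ordered rule list, which can be read and edited directly, in contrast to a large table of probabilities. How the rules are learned from errors, and the extensions to lexical relations, unknown words, and k-best tagging, are outside what this example covers.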

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1994
Accession Number
ADA460636

Entities

People

  • Eric Brill

Organizations

  • Massachusetts Institute of Technology

Tags

DTIC Thesaurus Topics

  • Accuracy
  • Artificial Intelligence
  • Automated Speech Recognition
  • Computational Linguistics
  • Computational Science
  • Computer Science
  • Errors
  • Information Science
  • Language
  • Learning
  • Linguistics
  • Markov Models
  • Models
  • Natural Language Processing
  • Natural Languages
  • Probability
  • Test Sets

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation