A Simple Rule-Based Part of Speech Tagger

Abstract

Automatic part of speech tagging is an area of natural language processing where statistical techniques have been more successful than rule-based methods. In this paper, we present a simple rule-based part of speech tagger which automatically acquires its rules and tags with accuracy comparable to stochastic taggers. The rule-based tagger has many advantages over these taggers, including: a vast reduction in stored information required, the perspicuity of a small set of meaningful rules, ease of finding and implementing improvements to the tagger, and better portability from one tag set, corpus genre or language to another. Perhaps the biggest contribution of this work is in demonstrating that the stochastic method is not the only viable method for part of speech tagging. The fact that a simple rule-based tagger that automatically learns its rules can perform so well should offer encouragement for researchers to further explore rule-based tagging, searching for a better and more expressive set of rule templates and other variations on the simple but effective theme described below.
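
The abstract does not spell out the rule format or the learning procedure, so the following is only a minimal sketch of how a tagger might acquire rules from a template automatically. It assumes a baseline that gives each word its most frequent training tag, a single hypothetical template ("change tag A to B when the previous tag is Z"), and greedy selection of whichever rule most reduces errors. The function names, the default tag for unknown words, and the tiny corpus are all invented for illustration and are not taken from the report.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sents):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def baseline_tag(words, lexicon, default="NN"):
    # Unknown words get a default tag (an assumption of this sketch).
    return [lexicon.get(w, default) for w in words]

def apply_rule(tags, rule):
    """Rule (a, b, z): change tag a to b when the previous tag is z.
    Context is read from the tags as they were before this rule ran."""
    a, b, z = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == a and tags[i - 1] == z:
            out[i] = b
    return out

def count_errors(pred_sents, gold_sents):
    return sum(p != g for ps, gs in zip(pred_sents, gold_sents)
               for p, g in zip(ps, gs))

def learn_rules(word_sents, gold_sents, lexicon, max_rules=10):
    """Greedily pick the template instance that most reduces errors."""
    current = [baseline_tag(ws, lexicon) for ws in word_sents]
    tagset = sorted({t for gs in gold_sents for t in gs})
    rules = []
    for _ in range(max_rules):
        best, best_err = None, count_errors(current, gold_sents)
        for a in tagset:
            for b in tagset:
                if a == b:
                    continue
                for z in tagset:
                    rule = (a, b, z)
                    trial = [apply_rule(c, rule) for c in current]
                    err = count_errors(trial, gold_sents)
                    if err < best_err:
                        best, best_err = rule, err
        if best is None:  # no rule improves accuracy any further
            break
        rules.append(best)
        current = [apply_rule(c, best) for c in current]
    return rules

def tag(words, lexicon, rules):
    tags = baseline_tag(words, lexicon)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags

# Invented toy corpus: "can" is mostly a modal, but a noun after a determiner.
corpus = [
    [("the", "DT"), ("can", "NN"), ("rusted", "VBD")],
    [("a", "DT"), ("can", "NN"), ("fell", "VBD")],
    [("I", "PRP"), ("can", "MD"), ("swim", "VB")],
    [("you", "PRP"), ("can", "MD"), ("go", "VB")],
    [("we", "PRP"), ("can", "MD"), ("run", "VB")],
]
lexicon = train_baseline(corpus)
word_sents = [[w for w, _ in s] for s in corpus]
gold_sents = [[t for _, t in s] for s in corpus]
rules = learn_rules(word_sents, gold_sents, lexicon)
print(rules)
print(tag(["the", "can", "fell"], lexicon, rules))
```

On this toy data the baseline mis-tags "can" after a determiner, and the learner recovers one readable correction rule. That illustrates, on a very small scale, the abstract's point about storing a small set of meaningful rules rather than large probability tables.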

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 1992
Accession Number: ADA460532

Entities

People

  • Eric Brill

Organizations

  • University of Pennsylvania

Tags

DTIC Thesaurus Topics

  • Accuracy
  • Acquisition
  • Computational Linguistics
  • Dictionaries
  • Errors
  • Information Science
  • Language
  • Linguistics
  • Markov Models
  • Models
  • Natural Language Processing
  • Natural Languages
  • Probabilistic Models
  • Probability
  • Statistics
  • Template Patterns
  • Training

Fields of Study

  • Computer science

Readers

  • Computational Linguistics

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation