Development of a Stemming Algorithm.

Abstract

A stemming algorithm, a procedure to reduce all words with the same stem to a common form, is useful in many areas of computational linguistics and information retrieval work. While the form of the algorithm varies with its application, certain linguistic problems are common to any stemming procedure. As a basis for evaluation of previous attempts to deal with these problems, the paper discusses the theoretical and practical attributes of stemming algorithms. A new version of a context-sensitive, longest-match stemming algorithm for English, developed for use in a library information transfer system but of general application, is then proposed. A major linguistic problem in stemming, variation in spelling of stems, is discussed in some detail and several feasible programmed solutions are outlined, along with sample results of one of these methods. (Author)
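
As a rough illustration of what "context-sensitive, longest-match" stemming means, the following minimal Python sketch removes the longest listed ending whose context condition accepts the remaining stem, then applies one simple recoding step for spelling variation in stems. The ending list, the conditions, the consonant set, and the names ENDINGS, strip_ending, undouble, and stem_word are all hypothetical choices made for this example; they are not the endings or rules given in the report.

```python
# Illustrative sketch only. The endings, conditions, and recoding rule below
# are assumptions made for this example, not the report's actual rule set.


def min_len(n):
    """Build a context condition: the remaining stem must have >= n letters."""
    return lambda stem: len(stem) >= n


# Each ending maps to a condition that the remaining stem must satisfy.
ENDINGS = {
    "ations": min_len(3),
    "ation": min_len(3),
    "ings": min_len(3),
    "ing": min_len(3),
    "ness": min_len(3),
    "ed": min_len(3),
    "es": min_len(3),
    # Do not strip a final "s" after "s" or "u" (e.g. "glass", "bonus").
    "s": lambda stem: len(stem) >= 3 and not stem.endswith(("s", "u")),
}

MAX_ENDING = max(len(ending) for ending in ENDINGS)


def strip_ending(word):
    """Remove the longest listed ending whose condition accepts the stem."""
    # Try candidate endings from longest to shortest (longest match first).
    for size in range(min(MAX_ENDING, len(word) - 1), 0, -1):
        ending, stem = word[-size:], word[:-size]
        condition = ENDINGS.get(ending)
        if condition is not None and condition(stem):
            return stem
    return word  # no ending applies, so the word is left unchanged


def undouble(stem):
    """Recoding step: collapse a doubled final consonant ("sitt" -> "sit")."""
    if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] in "bdglmnprt":
        return stem[:-1]
    return stem


def stem_word(word):
    """Strip the ending, then recode the stem's spelling."""
    return undouble(strip_ending(word.lower()))


if __name__ == "__main__":
    for w in ["sitting", "sits", "relations", "relation", "sing", "glass"]:
        print(w, "->", stem_word(w))
```

In this sketch, "sitting" and "sits" reduce to the same form only because of the recoding step, which is the kind of stem-spelling variation the abstract refers to, while "sing" is left alone because the condition on "ing" rejects a one-letter stem.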

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 1968
Accession Number
AD0735504

Entities

People

  • Julie B. Lovins

Organizations

  • Massachusetts Institute of Technology

Tags

DTIC Thesaurus Topics

  • Algorithms
  • Computational Linguistics
  • Information Exchange
  • Information Processing
  • Information Retrieval
  • Information Transfer
  • Linguistics
  • Stemming
  • Test And Evaluation

Readers

  • Computational Linguistics
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Machine Translation