Developing a Corpus Specific Stoplist Using Quantitative Comparison

Abstract

We are overwhelmed with electronic information, and the situation is unlikely to improve. Working with information on a daily basis is increasingly common, and we spend more and more time searching for it because more of it is available. This thesis examines how to provide faster access to the information we want to find. Today's requirements center on searching for information using queries. At the heart of the query process is the removal of search terms that have little or no significance to the search being performed. Such words, called stopwords, have little searching power and are compiled in a stoplist. Stoplists are usually constructed from commonly occurring words in the English language, an approach that is acceptable for systems handling broad categories of information. We build a stoplist for a specific area of interest from a specific body of linguistic data, or corpus. A stoplist developed from an Air Force corpus is tested to see whether it is more effective than a stoplist created from a general-use corpus.
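
The stoplist idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the thesis selects stopwords, so the raw-frequency cutoff used here is an assumption, and the function names (`tokenize`, `build_stoplist`, `filter_query`) and the three-sentence corpus are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_stoplist(documents, size):
    """Return the `size` most frequent words across the documents.

    Assumption: raw frequency is the selection criterion. The thesis
    may use a different quantitative comparison.
    """
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return {word for word, _ in counts.most_common(size)}

def filter_query(query, stoplist):
    """Drop stopwords from a query, keeping the remaining terms in order."""
    return [term for term in tokenize(query) if term not in stoplist]

# Hypothetical toy corpus standing in for a domain-specific collection.
corpus = [
    "the aircraft departed the base",
    "the aircraft returned to the base",
    "the squadron serviced the aircraft",
]

stoplist = build_stoplist(corpus, size=2)
terms = filter_query("the aircraft maintenance schedule", stoplist)
print(sorted(stoplist))
print(terms)
```

In this toy corpus a word such as "aircraft" occurs in every document, so it carries little searching power there even though a general English stoplist would not contain it. That is the kind of difference a corpus-specific stoplist is meant to capture.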

Document Details

Document Type
Technical Report
Publication Date
Dec 01, 1997
Accession Number
ADA334570

Entities

People

  • Craig N. Berg

Organizations

  • Air Force Institute of Technology

Tags

Communities of Interest

  • Air Platforms
  • Biomedical
  • C4I
  • Space

DTIC Thesaurus Topics

  • Air Force
  • Computer Programming
  • Computer Programs
  • Computers
  • Databases
  • Defense Systems
  • Information Retrieval
  • Language
  • Military Science
  • Spreadsheet Software
  • Statistical Analysis
  • Students
  • Tactical Reconnaissance
  • United States
  • War Colleges
  • Warfare
  • Word Processors

Readers

  • Computational Linguistics
  • Geospatial Intelligence and Artificial Intelligence Analytics
  • Theoretical Analysis

Technology Areas

  • Microelectronics