Text Filtering in MUC-3 and MUC-4

Abstract

One of the changes from the Third (MUC-3) to the Fourth (MUC-4) Message Understanding Conference was the emergence of text filtering as an explicit topic of discussion. In this paper we examine text filtering in MUC systems with three goals in mind. First, we clarify the difference between two uses of the term "text filtering" in the context of data extraction systems, and relate both to prior research on information retrieval (IR). Second, we discuss the use of text filtering components in MUC-3 and MUC-4 systems, and present a preliminary scheme for classifying data extraction systems by the features over which they do text filtering. Finally, we examine the text filtering effectiveness of MUC-3 and MUC-4 systems, and introduce some approaches to evaluating text filtering systems that may be of interest in their own right. Two questions of crucial interest are whether sites improved their system-level text filtering effectiveness from MUC-3 to MUC-4, and what the effectiveness of MUC systems would be on real-world data streams. Because of changes in both test set and system design since MUC-3, we were not able to address the first question. With respect to the second question, however, we present preliminary evidence suggesting that the text filtering precision of MUC systems declines as the generality of the data stream they process (i.e., the proportion of relevant documents) decreases. The ramifications of this for future research and for operational systems are discussed.
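
The dependence of precision on generality noted in the abstract can be illustrated with a standard IR identity: if a filter's recall R and fallout F are held fixed, its precision on a stream with generality G is R·G / (R·G + F·(1 − G)). The sketch below is only an illustration of that identity. The function names and the recall and fallout values are hypothetical and are not taken from the report, and the assumption that recall and fallout stay constant across data streams is a simplification.

```python
def precision(recall: float, fallout: float, generality: float) -> float:
    """Precision implied by a fixed recall and fallout at a given generality.

    recall     -- fraction of relevant documents the filter passes
    fallout    -- fraction of irrelevant documents the filter passes
    generality -- fraction of documents in the stream that are relevant
    """
    relevant_passed = recall * generality
    irrelevant_passed = fallout * (1.0 - generality)
    passed = relevant_passed + irrelevant_passed
    if passed == 0.0:
        raise ValueError("the filter passes no documents; precision is undefined")
    return relevant_passed / passed


def precision_by_generality(recall, fallout, generalities):
    """Return (generality, precision) pairs for a list of generality values."""
    return [(g, precision(recall, fallout, g)) for g in generalities]


if __name__ == "__main__":
    # Hypothetical operating point, chosen only for illustration.
    for g, p in precision_by_generality(0.9, 0.1, [0.5, 0.1, 0.01]):
        print(f"generality={g:.2f}  precision={p:.3f}")
```

Under these assumed values, precision falls steadily as the proportion of relevant documents shrinks, even though the filter itself is unchanged.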

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 1992
Accession Number
ADA637495

Entities

People

  • David D. Lewis
  • Richard M. Tong

Tags

Communities of Interest

  • Human Systems
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Classification
  • Computer Programs
  • Data Sets
  • Extraction
  • Filters
  • Filtration
  • Information Retrieval
  • Language
  • Latin America
  • Machine Learning
  • Natural Languages
  • Precision
  • Signal Detection
  • Template Patterns
  • Terrorists
  • Test And Evaluation
  • Test Sets

Fields of Study

  • Computer science

Readers

  • Information Retrieval
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval