Text Filtering in MUC-3 and MUC-4

Abstract

One of the changes from the Third (MUC-3) to the Fourth (MUC-4) Message Understanding Conference was the emergence of text filtering as an explicit topic of discussion. In this paper we examine text filtering in MUC systems with three goals in mind. First, we clarify the difference between two uses of the term "text filtering" in the context of data extraction systems, and place these phenomena in the context of prior research on information retrieval (IR). Second, we discuss the use of text filtering components in MUC-3 and MUC-4 systems, and present a preliminary scheme for classifying data extraction systems in terms of the features over which they do text filtering. Finally, we examine the text filtering effectiveness of MUC-3 and MUC-4 systems, and introduce some approaches to the evaluation of text filtering systems that may be of interest in themselves. Two questions of crucial interest are whether sites improved their system-level text filtering effectiveness from MUC-3 to MUC-4, and what the effectiveness of MUC systems would be on real-world data streams. Because of changes in both test set and system design since MUC-3, we were not able to address the first question. With respect to the second question, however, we present preliminary evidence suggesting that the text filtering precision of MUC systems declines with the generality of the data stream they process, i.e., the proportion of relevant documents. The ramifications of this for future research and for operational systems are discussed.
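
The dependence of precision on generality mentioned in the abstract can be illustrated with a standard identity from IR evaluation. The sketch below is not taken from the report: it assumes, purely for illustration, that a filter's recall and fallout stay fixed while the proportion of relevant documents in the stream changes. The function name and the operating point (recall 0.9, fallout 0.1) are hypothetical.

```python
def precision(recall: float, fallout: float, generality: float) -> float:
    """Precision implied by recall, fallout, and generality.

    recall     = fraction of relevant documents that pass the filter
    fallout    = fraction of non-relevant documents that pass the filter
    generality = fraction of documents in the stream that are relevant
    """
    true_pos = recall * generality            # relevant documents passed
    false_pos = fallout * (1.0 - generality)  # non-relevant documents passed
    passed = true_pos + false_pos
    if passed == 0.0:
        raise ValueError("filter passes no documents; precision is undefined")
    return true_pos / passed


if __name__ == "__main__":
    # Hypothetical operating point, not measured on any MUC system.
    for g in (0.5, 0.1, 0.01):
        print(f"generality={g:.2f}  precision={precision(0.9, 0.1, g):.3f}")
```

Under this fixed-recall, fixed-fallout assumption, precision falls as the stream contains proportionally fewer relevant documents, which is one way to see why results on a test set rich in relevant documents may not carry over to a sparser real-world stream.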

Document Details

Document Type: Technical Report
Publication Date: Jun 01, 1992
Accession Number: ADA637495

Entities

People

David D. Lewis
Richard M. Tong
