Document Filtering Using Semantic Information from a Machine Readable Dictionary

Abstract

Large-scale information retrieval systems need to narrow the flow of documents that receive further fine-grained analysis to those with a high potential for relevance to their respective users. This paper reports on our research into the usefulness of semantic codes from a machine-readable dictionary for filtering large sets of incoming documents by their broad subject appropriateness to a topic of interest. The Subject Field Coder produces a summary-level semantic representation of a text's contents by tagging each word in the document with the appropriate, disambiguated Subject Field Code (SFC). The within-document SFCs are normalized to produce a vector of SFCs representing that document's contents. Queries are likewise represented as SFC vectors and compared to the SFC vectors of incoming documents, which are then ranked by similarity to the query SFC vector. Only those documents whose SFC vectors exhibit a predetermined degree of similarity to the query SFC vector are passed to later system components for more refined representation and matching. The assignment of SFCs is fully automatic and efficient, and it has been empirically tested as a reasonable approach for ranking documents from a very large incoming flow. We report details of the implementation, as well as the results of an empirical test of the Subject Field Coder on fifty queries.
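
The filtering pipeline described in the abstract (tag words with SFCs, normalize into a vector, compare to the query vector, keep documents above a cutoff) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the report: the toy lexicon and its two-letter codes are invented, disambiguation is skipped by giving each word a single code, normalization is taken to be relative frequency, and cosine similarity stands in for the similarity measure, which the abstract does not name. The threshold value is likewise arbitrary.

```python
import math
from collections import Counter

# Hypothetical word -> SFC lookup with invented codes. A real system would
# take codes from a machine readable dictionary and disambiguate among
# several candidate codes per word; this sketch assigns one code per word.
TOY_LEXICON = {
    "bank": "EC", "loan": "EC", "trade": "EC",
    "election": "PL", "senate": "PL", "vote": "PL",
}

def sfc_vector(text):
    """Map each SFC found in the text to its relative frequency (assumed normalization)."""
    codes = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    counts = Counter(codes)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()} if total else {}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def filter_documents(query, documents, threshold=0.5):
    """Return (score, document) pairs at or above the threshold, best first."""
    q = sfc_vector(query)
    scored = [(cosine(q, sfc_vector(d)), d) for d in documents]
    kept = [(s, d) for s, d in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)

if __name__ == "__main__":
    docs = ["bank loan trade", "senate vote election", "bank vote"]
    for score, doc in filter_documents("loan trade", docs):
        print(f"{score:.3f}  {doc}")
```

In this sketch only the documents that pass the cutoff would move on to the later, more expensive matching stages; raising or lowering `threshold` trades recall against the volume of documents passed downstream.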

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 1993
Accession Number
ADA457808

Entities

People

  • Edmund S. Yu
  • Elizabeth D. Liddy
  • Woojin Paik

Organizations

  • Syracuse University

Tags

Communities of Interest

  • Engineered Resilient Systems
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Abstracts
  • Acquisition
  • Algorithms
  • Classification
  • Commerce
  • Computer Science
  • Databases
  • Detection
  • Dictionaries
  • English Language
  • Filtration
  • Information Retrieval
  • Language
  • Natural Languages
  • Political Science
  • Sociology
  • Universities

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Library and Information Science
  • Operations Research

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval