Using Information Extraction to Improve Document Retrieval

Abstract

The authors describe an approach to applying a particular kind of Natural Language Processing (NLP) system to the TREC routing task in Information Retrieval (IR). Rather than attempting to use NLP techniques to index the documents in a corpus, they adapted an information extraction (IE) system to act as a post-filter on the output of an IR system. The IE system was configured to score each of the top 2000 documents returned by the IR system and to rerank those documents on the basis of that score. One aim was to improve precision on routing tasks. Another was to make it easier to write IE grammars for multiple topics.
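
The post-filter arrangement described in the abstract can be illustrated with a minimal sketch. It assumes the IR system returns an ordered list of (document ID, text) pairs and that the IE component can be treated as a function from text to a numeric score. The names rerank_top_k and toy_ie_score are hypothetical, and the phrase-counting scorer is only a stand-in, not the report's IE system or its grammars.

```python
from typing import Callable, List, Tuple

Doc = Tuple[str, str]  # (doc_id, text); an assumed representation


def rerank_top_k(ir_ranked: List[Doc],
                 ie_score: Callable[[str], float],
                 k: int = 2000) -> List[str]:
    """Rescore the top-k IR results and return their IDs in the new order."""
    top = ir_ranked[:k]
    # Keep the original IR position so that ties fall back to the IR ranking.
    scored = [(ie_score(text), pos, doc_id)
              for pos, (doc_id, text) in enumerate(top)]
    # Higher IE score first; among equal scores, earlier IR rank first.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, _, doc_id in scored]


def toy_ie_score(text: str) -> float:
    """Hypothetical stand-in scorer: counts occurrences of topic phrases."""
    phrases = ("oil spill", "cleanup")
    lowered = text.lower()
    return float(sum(lowered.count(p) for p in phrases))


ir_output = [
    ("d1", "Quarterly earnings report for a shipping company."),
    ("d2", "Oil spill cleanup began after the tanker ran aground."),
    ("d3", "The oil spill was reported near the coast."),
]
print(rerank_top_k(ir_output, toy_ie_score))  # ['d2', 'd3', 'd1']
```

Falling back to the IR rank on ties is a design choice of this sketch: documents about which the scorer has nothing to say keep their original relative order.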

Document Details

Document Type
Technical Report
Publication Date
Jan 09, 1998
Accession Number
ADA470701

Entities

People

  • David C. Martin
  • David Israel
  • Jeff Petit
  • John Bear

Organizations

  • SRI International

Tags

DTIC Thesaurus Topics

  • Computational Linguistics
  • Environmental Pollution
  • Environmental Restoration And Remediation
  • Extraction
  • Formal Languages
  • Grammars
  • Information Retrieval
  • Language
  • Linguistics
  • Natural Language Processing
  • Natural Languages
  • Ocean Surveillance
  • Precision
  • Probabilistic Models
  • Sequences
  • Template Patterns
  • Water Pollution

Fields of Study

  • Computer science

Readers

  • Computational Linguistics

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation