Building Effective Queries in Natural Language Information Retrieval

Abstract

In this paper we report on our natural language information retrieval (NLIR) project as related to the recently concluded 5th Text Retrieval Conference (TREC-5). The main thrust of this project is to use natural language processing techniques to enhance the effectiveness of full-text document retrieval. One of our goals was to demonstrate that robust, if relatively shallow, NLP can help to derive a better representation of text documents for statistical search. Recently, we have turned our attention away from text representation issues and toward query development problems. While our NLIR system still performs extensive natural language processing to extract phrasal and other indexing terms, our focus has shifted to the problem of building effective search queries. Specifically, we are interested in query construction that uses words, sentences, and entire passages to expand initial topic specifications in an attempt to cover their various angles, aspects, and contexts. Based on our earlier results indicating that NLP is more effective with long, descriptive queries, we allowed long passages from related documents to be liberally imported into the queries. This method appears to have produced a dramatic improvement in the performance of two different statistical search engines that we tested (Cornell's SMART and NIST's Prise), boosting the average precision by at least 40%. In this paper we discuss both manual and automatic procedures for query expansion within a new stream-based information retrieval model.
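
As a rough illustration of the passage-based expansion idea described in the abstract, the following Python sketch appends the best-matching passages from related documents to an initial topic statement. It is not the procedure used in the report: the function names (`tokenize`, `split_passages`, `expand_query`), the word-overlap scoring, and the `max_passages` and `min_overlap` parameters are all assumptions made for this example, and no part of it calls SMART or Prise.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_passages(document):
    """Treat blank-line-separated paragraphs as passages (an assumption)."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def expand_query(topic, related_documents, max_passages=2, min_overlap=1):
    """Append the passages that share the most words with the topic.

    Word overlap is a stand-in scoring rule chosen for simplicity; it is
    hypothetical and not taken from the report.
    """
    topic_terms = set(tokenize(topic))
    scored = []
    for document in related_documents:
        for passage in split_passages(document):
            overlap = len(topic_terms & set(tokenize(passage)))
            if overlap >= min_overlap:
                scored.append((overlap, passage))
    # Highest overlap first; the sort is stable, so ties keep document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    selected = [passage for _, passage in scored[:max_passages]]
    # The expanded query is the original topic followed by whole passages.
    return "\n\n".join([topic] + selected)


if __name__ == "__main__":
    topic = "electric automobiles"
    related = [
        "Electric automobiles use battery packs.\n\nThe weather was sunny.",
        "Charging stations for electric cars are expanding.",
    ]
    print(expand_query(topic, related))
```

The expanded text would then be submitted to a search engine as one long query, which matches the abstract's observation that long, descriptive queries benefit more from NLP-derived indexing terms.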

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1997
Accession Number
ADA460509

Entities

People

  • Fang Lin
  • Jin Wang
  • Jose Perez-Carballo
  • Tomek Strzalkowski

Tags

Communities of Interest

  • Weapons Technologies

DTIC Thesaurus Topics

  • Abstracts
  • Air Traffic
  • Air Traffic Controllers
  • Automatic
  • Automation
  • Computing-Related Activities
  • Control Systems
  • Databases
  • Electric Automobiles
  • Information Processing
  • Information Retrieval
  • Language
  • Natural Language Processing
  • Natural Languages
  • Precision
  • Sequences
  • Standards

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Educational Psychology

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval