Effective Retrieval with Distributed Collections

Abstract

This paper evaluates the retrieval effectiveness of distributed information retrieval systems in realistic environments. We find that when a large number of collections are available, the retrieval effectiveness is significantly worse than that of centralized systems, mainly because typical queries are not adequate for the purpose of choosing the right collections. We propose two techniques to address the problem. One is to use phrase information in the collection selection index and the other is query expansion. Both techniques enhance the discriminatory power of typical queries for choosing the right collections and hence significantly improve retrieval results. Query expansion, in particular, brings the effectiveness of searching a large set of distributed collections close to that of searching a centralized collection.

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 1997
Accession Number: ADA341194

Entities

People

  • Jamie Callan
  • Jinxi Xu

Organizations

  • University of Massachusetts Amherst

Tags

Communities of Interest

  • C4I
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Air Pressure
  • Algorithms
  • Automobiles
  • Boundaries
  • Cardiovascular Physiological Phenomena
  • Computer Science
  • Computers
  • Degradation
  • Environment
  • Frequency
  • Hypertension
  • Information Retrieval
  • Judgment
  • Language
  • Natural Languages
  • Test Sets
  • United States

Fields of Study

  • Computer science

Readers

  • Geospatial Intelligence and Artificial Intelligence Analytics
  • Regression Analysis
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Learning Algorithms