Comparing the Performance of Collection Selection Algorithms

Abstract

The proliferation of online information resources increases the importance of effective and efficient information retrieval in a multicollection environment. Multicollection searching is cast in three parts: collection selection (also referred to as database selection), query processing, and results merging. In this work, we focus on the evaluation of the first step, collection selection. We present a detailed discussion of the methodology that we used to evaluate and compare collection selection approaches, covering both test environments and evaluation measures. We compare the CORI, CVV, and gGlOSS collection selection approaches using six test environments built on three document testbeds. We note similar trends in performance among the collection selection approaches, but the CORI approach consistently outperforms the others, suggesting that effective collection selection can be achieved using limited information about each collection. The contributions of this work are both the assembled evaluation methodology and the application of that methodology to compare collection selection approaches in a standardized environment.

Document Details

Document Type
Technical Report
Publication Date
Oct 01, 2003
Accession Number
ADA466183

Entities

People

  • Allison L. Powell
  • James C. French

Organizations

  • University of Virginia

Tags

Communities of Interest

  • C4I
  • Engineered Resilient Systems

DTIC Thesaurus Topics

  • Algorithms
  • Computer Science
  • Databases
  • Estimators
  • Information Retrieval
  • Information Science
  • Information Systems
  • Intellectual Property
  • Internet
  • Judgment
  • Knowledge Management
  • Language
  • Models
  • New York
  • Probabilistic Models
  • Statistics
  • World Wide Web

Fields of Study

  • Computer science

Readers

  • Business Analytics
  • Distributed Systems and Data Platform Development
  • Neural Network Machine Learning

Technology Areas

  • AI & ML