Distributed EDLSI, BM25, and Power Norm at TREC 2008

Abstract

This paper describes our participation in the TREC Legal Track in 2008. Our first set of experiments used Latent Semantic Indexing (LSI) with a small number of dimensions, a technique we call Essential Dimensions of Latent Semantic Indexing (EDLSI). Because the experimental dataset is large, we designed a distributed version of EDLSI for our submitted runs. We submitted two distributed EDLSI runs, one with k = 10 and one with k = 41, where k is the LSI dimensionality reduction parameter. We also submitted a traditional vector space baseline for comparison with the EDLSI results. This paper describes our experimental design and the results of these experiments. EDLSI clearly outperforms traditional vector space retrieval on a variety of TREC reporting metrics. We also describe experiments designed as a follow-up to our TREC Legal 2007 submission. These experiments test weighting and normalization schemes as well as relevance feedback techniques. Our primary intent was to compare the BM25 weighting scheme with our power normalization technique. BM25 outperformed all of our other submissions on the competition metric (F1 at K) for both the ad hoc and relevance feedback tasks, but power normalization outperformed BM25 in our ad hoc experiments when the 2007 metric (estimated recall at B) was used for comparison.
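The following minimal sketch illustrates the general idea of EDLSI scoring and one simple way to distribute it. It assumes that EDLSI blends a rank-k LSI score with a plain vector space score using a weight x (x = 0.2 here is an assumed default), that scores are raw dot products rather than cosines, and that distribution means splitting the collection into disjoint document partitions that share one term vocabulary, decomposing each partition independently, and merging the ranked lists. The function names `edlsi_scores` and `distributed_edlsi` are hypothetical. This is an illustration under those assumptions, not the authors' implementation.

```python
import numpy as np


def edlsi_scores(A, q, k, x=0.2):
    """Blend rank-k LSI scores with vector space scores (hypothetical helper).

    A: term-by-document matrix (terms x docs); q: query vector over the same terms.
    x: assumed weight on the LSI component; (1 - x) goes to the vector space score.
    """
    # Truncated SVD: keep only the k largest singular triplets.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    # LSI scores: q^T A_k, where A_k = U_k S_k V_k^T.
    lsi = ((q @ Uk) * sk) @ Vtk
    # Traditional vector space scores: q^T A.
    vsm = q @ A
    return x * lsi + (1 - x) * vsm


def distributed_edlsi(partitions, q, k, x=0.2, top=10):
    """Run EDLSI on each partition independently, then merge (hypothetical scheme).

    partitions: list of (doc_ids, A_i) pairs; every A_i uses the same term rows.
    """
    merged = []
    for doc_ids, A_part in partitions:
        scores = edlsi_scores(A_part, q, k, x)
        merged.extend(zip(doc_ids, scores.tolist()))
    # Merge by sorting raw scores across partitions, highest first.
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged[:top]
```

Because each partition gets its own SVD, the rank-k LSI components are not strictly comparable across partitions. Blending with the shared vector space score (weight 1 - x) is one plausible reason a small k can still merge reasonably, but how the submitted runs actually handled this is not stated here.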

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2008
Accession Number
ADA512902

Entities

People

  • Andrew Lilly
  • April Kontostathis
  • Raymond J Spiteri

Organizations

  • Ursinus College

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Algorithms
  • Competition
  • Computer Science
  • Dimensionality Reduction
  • Equations
  • Experimental Design
  • Feedback
  • Frequency
  • Information Processing
  • Information Retrieval
  • Mathematics
  • Natural Language Processing
  • Precision
  • Standards
  • Training
  • Universities
  • Vector Spaces

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation
  • Forest Ecology

Technology Areas

  • Space