Sparse Matrix Factorization: Applications to Latent Semantic Indexing

Abstract

This article describes the use of Latent Semantic Indexing (LSI) and several of its variants for the TREC Legal batch task. Both folding-in and Essential Dimensions of LSI (EDLSI) appeared promising for recall-focused retrieval on a collection of this size. We also developed a new LSI technique that replaces the Singular Value Decomposition (SVD) with a different matrix factorization, the sparse column-row approximation (SCRA). We conclude that all three LSI techniques perform similarly. Although our 2009 results improved significantly over our 2008 results, most of the benefit appears to have come from a better method for selecting the parameter K, the rank cutoff that yields the best balance between precision and recall.

Document Details

Document Type: Technical Report
Publication Date: Nov 01, 2009
Accession Number: ADA517770

Entities

People

  • April Kontostathis
  • Erin Moulding
  • Raymond J Spiteri

Organizations

  • University of Saskatchewan

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Algebra
  • Algorithms
  • Artificial Intelligence
  • Competition
  • Computations
  • Computer Science
  • Frequency
  • Information Processing
  • Information Retrieval
  • Mathematics
  • Natural Language Processing
  • Precision
  • Sparse Matrix
  • Standards
  • Universities
  • Vector Spaces

Readers

  • Information Retrieval
  • Linear Algebra
  • Regression Analysis