Comparison of Human and Latent Semantic Analysis (LSA) Judgements of Pairwise Document Similarities for a News Corpus
Abstract
Correlations between human and Latent Semantic Analysis (LSA) judgements of pairwise document similarity were explored on a set of 50 news documents. LSA is a modern and commonly used technique for automatic determination of document similarity. LSA users must choose local and global weighting schemes, the number of factors to be retained, stop word lists and whether to use a background set of documents. Global weighting schemes had more effect than local weighting schemes. Use of a stop word list almost always improved performance. Introduction of a background set of similar documents increased larger correlations and reduced smaller ones. The correlations ranged between approximately 0 and 0.6 depending on the LSA settings, indicating the importance of correct settings. The low maximum correlation indicates that information presentation schemes based on LSA may often be at variance with visualisations based on human decisions, even when using the best settings for a data set.
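The choices listed in the abstract (stop word list, local and global weighting, number of retained factors) can be illustrated with a minimal sketch. Everything here is an assumption for illustration and is not taken from the report: the log-entropy weighting is just one common local/global combination, the stop word list and toy documents are invented, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical, tiny stop word list (illustrative only).
STOP_WORDS = frozenset({"the", "a", "of", "and", "in", "to"})


def term_document_matrix(docs, stop_words=frozenset()):
    """Build a terms x documents count matrix from whitespace-tokenised text."""
    tokenised = [[w for w in d.lower().split() if w not in stop_words] for d in docs]
    vocab = sorted({w for doc in tokenised for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenised):
        for w in doc:
            counts[index[w], j] += 1
    return counts, vocab


def log_entropy(counts):
    """Apply log local weighting and entropy global weighting (assumed scheme)."""
    n_docs = counts.shape[1]
    local = np.log1p(counts)  # local weight: log(1 + term frequency)
    row_totals = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    if n_docs > 1:
        global_w = 1.0 + plogp.sum(axis=1) / np.log(n_docs)
    else:
        global_w = np.ones(counts.shape[0])
    return local * global_w[:, None]


def lsa_similarities(weighted, n_factors):
    """Cosine similarity between documents after keeping n_factors SVD factors."""
    _, s, vt = np.linalg.svd(weighted, full_matrices=False)
    k = min(n_factors, len(s))
    doc_vecs = (s[:k, None] * vt[:k]).T  # documents x factors
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty documents
    unit = doc_vecs / norms
    return unit @ unit.T


def upper_triangle(matrix):
    """Return the distinct pairwise entries (above the diagonal) as a vector."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]


def correlation_with_humans(lsa_sim, human_sim):
    """Pearson correlation between LSA and human pairwise similarity matrices."""
    return np.corrcoef(upper_triangle(lsa_sim), upper_triangle(human_sim))[0, 1]


# Toy documents, invented for this sketch.
docs = ["the cat sat", "the cat ran", "stocks fell"]
counts, vocab = term_document_matrix(docs, STOP_WORDS)
sims = lsa_similarities(log_entropy(counts), n_factors=2)
```

Given a matrix of averaged human ratings for the same document pairs, `correlation_with_humans(sims, human_ratings)` would give one correlation value per LSA configuration, so varying the weighting, the stop word list or `n_factors` gives a rough picture of how sensitive the agreement is to each setting.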
Document Details
- Document Type: Technical Report
- Publication Date: Sep 01, 2004
- Accession Number: ADA427585
Entities
People
- Brandon Pincombe
Organizations
- Defence Science and Technology Group