Ensemble Clustering for Result Diversification

Abstract

This paper describes the participation of the University of Twente in the Web track of TREC 2012. Our baseline approach uses the Mirex toolkit, an open-source tool that sequentially scans all the documents. For result diversification, we experimented with improving the quality of clusters through ensemble clustering. We combined clusters obtained by different clustering methods (such as LDA and K-means) with clusters obtained from different types of data (such as document text and anchor text). Our two-layer ensemble run performed better than both the LDA-based diversification run and a run without diversification.
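
The abstract does not say how the individual clusterings are combined. As a rough illustration of the general idea only, the sketch below uses a co-association approach, a common ensemble clustering technique that is assumed here and not taken from the report: documents are grouped together when a majority of the base clusterings agree that they belong together. The function names, the threshold, and the three example labelings are all hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of base clusterings that place each document pair together."""
    n = len(labelings[0])
    scores = {}
    for i, j in combinations(range(n), 2):
        agree = sum(1 for labels in labelings if labels[i] == labels[j])
        scores[(i, j)] = agree / len(labelings)
    return scores


def consensus_clusters(labelings, threshold=0.5):
    """Merge documents whose co-association exceeds the threshold.

    Assumption: merging is transitive (union-find), so this behaves like
    single-link clustering on the co-association scores.
    """
    n = len(labelings[0])
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), score in co_association(labelings).items():
        if score > threshold:
            parent[find(j)] = find(i)

    # Relabel roots as consecutive cluster ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Hypothetical cluster labels for five documents from three base clusterings.
lda_on_text = [0, 0, 1, 1, 2]
kmeans_on_text = [0, 0, 0, 1, 1]
kmeans_on_anchor_text = [1, 1, 0, 0, 0]

base = [lda_on_text, kmeans_on_text, kmeans_on_anchor_text]
print(consensus_clusters(base))  # [0, 0, 1, 1, 1]
```

Cluster ids only need to be consistent within one labeling, so the base clusterings can number their clusters independently. A diversification step could then pick top-ranked documents from different consensus clusters, although how the report does this is not stated in the abstract.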

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2012
Accession Number
ADA581520

Entities

People

  • Djoerd Hiemstra
  • Dong Nguyen

Organizations

  • University of Twente

Tags

Communities of Interest

  • Biomedical

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Base Lines
  • Clustering
  • Generative Models
  • Information Operations
  • Language
  • Models
  • Probability
  • Random Walk
  • Scientific Research
  • Solar System
  • Standards
  • Universities

Fields of Study

  • Computer science

Readers

  • Information Retrieval
  • Neural Network Machine Learning