Heuristic Ranking and Diversification of Web Documents

Abstract

We describe the participation of the University of Amsterdam's Intelligent Systems Lab in the Web Track at TREC 2009. We participated in the ad hoc and diversity tasks. We find that spam is an important issue in the ad hoc task and that Wikipedia-based heuristic optimization approaches, which are assumed to reduce spam in the top-ranked results, help to boost retrieval performance. For the diversity task, we explored different methods: clustering and a topic model-based approach perform similarly, and both outperform a query log-based approach.

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2009
Accession Number
ADA517703

Entities

People

  • Edgar Meij
  • Jiyin He
  • Katja Hofmann
  • Krisztian Balog
  • Maarten de Rijke
  • Manos Tsagkias
  • Wouter Weerkamp

Organizations

  • University of Amsterdam

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Base Lines
  • Clustering
  • Education
  • Electronic Mail
  • Filters
  • Filtration
  • Heuristic Methods
  • Information Operations
  • Information Retrieval
  • Intelligent Systems
  • Language
  • New York
  • Online Communications
  • Probability
  • Standards

Fields of Study

  • Computer science

Readers

  • Information Retrieval
  • Neural Network Machine Learning