IRRA at TREC 2012: Divergence From Independence (DFI)

Abstract

The IRRA (IR-Ra) group participated in the TREC 2012 Web track with a system implementing a non-parametric term weighting method based on measuring the divergence from independence (DFI). This is the IRRA group's third year of participation, following the TREC 2009 and 2010 Web tracks. This year, the aim is to evaluate a new DFI-based term weighting model developed on the basis of Shannon's information theory (Shannon, 1949), together with a heuristic approach that is expected to improve early precision when used with DFI term weighting. The TERRIER retrieval platform, version 3.0 (Ounis et al., 2007), is used to index and search the ClueWeb09-T09B data set (the Category B data set), a subset of about 50 million English Web pages. During indexing and searching, terms are stemmed (using Porter's stemmer as implemented in TERRIER), but stopwords are not removed. The result sets are filtered using the fusion of two spam-page lists provided by Cormack et al. (2010) for the ClueWeb09 document collection.
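To illustrate the general idea behind divergence from independence, the Python sketch below scores query terms with one commonly described DFI variant, the standardization-based measure. It compares a term's observed frequency in a document with the frequency expected if the term and the document were independent. This is an illustrative assumption only: it is not the Shannon-based model evaluated in this report, and the function and parameter names (`dfi_weight`, `score_document`, `coll_freq`, `total_tokens`) are hypothetical.

```python
import math
from typing import Dict, Iterable


def dfi_weight(tf: float, doc_len: float, coll_freq: float, total_tokens: float) -> float:
    """Standardized DFI weight of one term in one document (illustrative variant, assumed)."""
    if tf <= 0 or doc_len <= 0 or coll_freq <= 0 or total_tokens <= 0:
        return 0.0
    # Frequency expected under independence of term and document:
    # the term's share of all collection tokens times the document length.
    expected = coll_freq * doc_len / total_tokens
    # Terms seen no more often than expected are treated as uninformative.
    if tf <= expected:
        return 0.0
    # Standardized deviation of the observed frequency from the expected one.
    z = (tf - expected) / math.sqrt(expected)
    # Logarithmic damping; adding 1 keeps the argument above 1, so the weight is positive.
    return math.log2(z + 1.0)


def score_document(query_terms: Iterable[str], doc_tf: Dict[str, int], doc_len: int,
                   coll_freq: Dict[str, int], total_tokens: int) -> float:
    """Sum the DFI weights of the query terms for one document (hypothetical helper)."""
    return sum(
        dfi_weight(doc_tf.get(t, 0), doc_len, coll_freq.get(t, 0), total_tokens)
        for t in query_terms
    )


if __name__ == "__main__":
    # Toy numbers: a 100-token document, in a 1,000,000-token collection,
    # containing "dfi" 5 times; "dfi" occurs 1,000 times in the whole collection.
    print(score_document(["dfi", "web"], {"dfi": 5}, 100,
                         {"dfi": 1000, "web": 50000}, 1_000_000))
```

No free parameter appears in the weight, which matches the sense in which DFI is called non-parametric. A term contributes only when it occurs more often than independence would predict, and the logarithm damps the growth of its weight as the observed frequency rises.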

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2012
Accession Number
ADA576843

Entities

People

  • Bekir T. Dincer

Organizations

  • Muğla University

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Data Sets
  • Electronic Mail
  • Engineering
  • Filtration
  • Frequency
  • Index Terms
  • Indexes
  • Information Retrieval
  • Information Theory
  • Models
  • Precision
  • Probabilistic Models
  • Probability
  • Standards
  • Test And Evaluation
  • Weighting Functions

Readers

  • Information Retrieval
  • Neural Network Machine Learning