IRRA at TREC 2009: Index Term Weighting based on Divergence From Independence Model
Abstract
The IRRA (IR-Ra) group participated in the 2009 Web track (both the adhoc task and the diversity task) and the Million Query track. This year, the major concern was to examine the effectiveness of a novel, nonparametric index term weighting model, divergence from independence (DFI). The notion of independence, which underlies the well-known exploratory statistical data analysis technique called correspondence analysis (Greenacre, 1984; Jambu, 1991), can be adapted to the index term weighting problem. In this respect, it can be thought of as a qualitative description of the importance of terms for the documents in which they appear, where importance means a term's contribution to the information content of a document relative to other terms. According to the independence notion, if the ratio of the frequencies of two different terms is the same across documents, the terms are independent of the documents. For example, each Web page contains a pair of "html" tags and a pair of "body" tags, so the ratio of the frequencies of these tags is the same across all Web pages, indicating that the "html" and "body" tags are independent of Web pages.
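The independence notion described in the abstract can be illustrated with a minimal sketch. It only checks whether the frequency ratio of two terms is constant across documents; it is not the DFI weighting formula from the report. The page counts, the term "trec", and the function names are hypothetical and chosen for illustration.

```python
from fractions import Fraction

# Hypothetical term counts per Web page (illustrative data, not from the report).
pages = [
    {"html": 2, "body": 2, "trec": 5},
    {"html": 2, "body": 2, "trec": 0},
    {"html": 2, "body": 2, "trec": 1},
]


def frequency_ratios(docs, term_a, term_b):
    """Return tf(term_a) / tf(term_b) for each document in which term_b occurs."""
    return [
        Fraction(d.get(term_a, 0), d[term_b])
        for d in docs
        if d.get(term_b, 0) > 0
    ]


def same_ratio_across_documents(docs, term_a, term_b):
    """True when the frequency ratio of the two terms is identical in every document."""
    ratios = frequency_ratios(docs, term_a, term_b)
    return len(set(ratios)) <= 1


# "html" and "body" occur in the same proportion on every page.
html_body_constant = same_ratio_across_documents(pages, "html", "body")

# The hypothetical content term "trec" varies from page to page.
trec_body_constant = same_ratio_across_documents(pages, "trec", "body")

print(html_body_constant, trec_body_constant)
```

Exact fractions are used instead of floating-point division so that equal ratios compare as equal. In this toy data the tag pair has a constant ratio, while the content term does not, which is the kind of departure from independence a weighting model could reward.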
Document Details
- Document Type: Technical Report
- Publication Date: Nov 01, 2009
- Accession Number: ADA517855
Entities
People
- Bahar Karaoglan
- Bekir T. Dincer
- Ilker Kocabase
Organizations
- Muğla University