IRRA at TREC 2009: Index Term Weighting based on Divergence From Independence Model

Abstract

The IRRA (IR-Ra) group participated in the 2009 Web track (both the adhoc and diversity tasks) and in the Million Query track. This year, the main concern was to examine the effectiveness of a novel, nonparametric index term weighting model: divergence from independence (DFI). The notion of independence, which underlies the well-known exploratory statistical data analysis technique called correspondence analysis (Greenacre, 1984; Jambu, 1991), can be adapted to the index term weighting problem. In this respect, it can be thought of as a qualitative description of the importance of terms for the documents in which they appear, where importance means a term's contribution to a document's information content relative to other terms. According to the independence notion, if the ratio of the frequencies of two different terms is the same across documents, those terms are independent of the documents. For example, each Web page contains a pair of "html" tags and a pair of "body" tags, so the ratio of the frequencies of these tags is the same across all Web pages, indicating that the "html" and "body" tags are independent of Web pages.
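The independence notion can be illustrated with the standard contingency-table expectation used in correspondence analysis and the chi-square test. The sketch below treats a term-document frequency table as that contingency table: rows are terms and columns are documents. It computes the expected frequency of each term in each document under independence, then the standardized residual (observed minus expected, divided by the square root of expected) as one possible measure of divergence from independence. This is an assumption made for illustration. The abstract does not give the exact DFI weighting formula, so the function names and the choice of measure here are hypothetical and should not be read as the report's method.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]


def expected_under_independence(counts: Matrix) -> Matrix:
    """Expected frequency of term i in document j if terms and documents are
    independent: e_ij = (row total of i) * (column total of j) / grand total."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def standardized_residuals(counts: Matrix) -> Matrix:
    """(observed - expected) / sqrt(expected) for every term-document cell.
    All zeros means the terms' frequency ratios are the same in every document."""
    expected = expected_under_independence(counts)
    return [
        [(o - e) / sqrt(e) if e > 0 else 0.0 for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(counts, expected)
    ]


if __name__ == "__main__":
    # Rows are terms, columns are documents (hypothetical counts).
    proportional = [[1, 2, 3],   # term A
                    [2, 4, 6]]   # term B: always twice term A
    print(standardized_residuals(proportional))  # every cell is 0.0

    tags_vs_content = [[2, 2, 2],   # "html": one pair of tags per page
                       [6, 0, 1]]   # "retrieval": concentrated in page 1
    print(standardized_residuals(tags_vs_content))
```

In the first table the ratio of the two terms is 1:2 in every document, so observed and expected frequencies coincide and no cell diverges from independence. In the second table, the hypothetical content term "retrieval" has a positive residual in page 1, where it occurs more often than independence predicts. In this sketch, that positive divergence is what marks the term as informative for that page.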

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2009
Accession Number
ADA517855

Entities

People

  • Bahar Karaoglan
  • Bekir T. Dincer
  • Ilker Kocabase

Organizations

  • Muğla University

Tags

DTIC Thesaurus Topics

  • Contrast
  • Data Analysis
  • Data Science
  • Data Sets
  • Dimensionality Reduction
  • Electronic Mail
  • Equations
  • Frequency
  • Index Terms
  • Indexes
  • Information Processing
  • Information Retrieval
  • Information Science
  • New York
  • Numbers
  • Standards
  • Statistics

Fields of Study

  • Computer science
