Measuring Robustness with First Relevant Score in the TREC 2012 Microblog Track

Abstract

In this paper, we measure the effectiveness of various experimental search techniques not only with traditional TREC ad hoc search measures, such as Average Precision, R-precision and Precision at 30, but also with robust measures based solely on the rank of the first relevant item retrieved, such as First Relevant Score and Generalized Success at 30. We report the results of our experiments in the Real-Time Adhoc Search Task of the TREC 2012 Microblog Track, which investigated the effectiveness of ad hoc search over a collection of more than 10 million tweets. For the experimental technique of favoring tweets with URLs, both the traditional and the robust measures indicated statistically significant increases in the mean score. However, for an experimental blind feedback technique, which is known to be non-robust because it typically makes poor results even worse, the traditional Average Precision measure indicated a statistically significant increase in the mean score, whereas some of the measures based solely on the rank of the first relevant item detected a statistically significant decrease.
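The per-topic measures named above can be sketched as follows. This is a minimal Python illustration, not the report's implementation. The helper names are hypothetical, and the `relevant` set stands in for the official relevance judgments for one topic. The two first-relevant measures are assumed to use a geometric decay of the form base^(1-r), where r is the rank of the first relevant tweet: base 1.08 with no cutoff for First Relevant Score, and base 1.024 with a rank-30 cutoff for Generalized Success at 30. Those bases and cutoffs are assumptions, and the report's exact definitions may differ.

```python
from typing import Optional, Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 30) -> float:
    """Fraction of the top k retrieved tweets that are relevant (P@30 by default)."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def r_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant tweets for the topic."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranking[:r] if doc in relevant) / r


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each relevant tweet retrieved,
    divided by the total number of relevant tweets (unretrieved ones count as 0)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def first_relevant_rank(ranking: Sequence[str], relevant: Set[str]) -> Optional[int]:
    """1-based rank of the first relevant tweet, or None if none is retrieved."""
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return i
    return None


def first_relevant_score(ranking: Sequence[str], relevant: Set[str],
                         base: float = 1.08) -> float:
    """ASSUMED form: base ** (1 - r), or 0 if no relevant tweet is retrieved."""
    r = first_relevant_rank(ranking, relevant)
    return 0.0 if r is None else base ** (1 - r)


def generalized_success(ranking: Sequence[str], relevant: Set[str],
                        k: int = 30, base: float = 1.024) -> float:
    """ASSUMED form: base ** (1 - r) if r <= k, else 0."""
    r = first_relevant_rank(ranking, relevant)
    return 0.0 if r is None or r > k else base ** (1 - r)


if __name__ == "__main__":
    ranking = ["t1", "t2", "t3", "t4"]   # hypothetical tweet IDs in ranked order
    relevant = {"t2", "t4"}              # hypothetical judgments for one topic
    print(average_precision(ranking, relevant))     # 0.5
    print(first_relevant_score(ranking, relevant))  # about 0.926 under the assumed base
```

Because the last two functions depend only on r, a technique that improves already-good topics but pushes the first relevant tweet further down on poor topics can raise mean Average Precision while lowering the mean of these rank-based scores. The scores above are per topic; the mean scores compared in the report would be averages over all topics.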

Document Details

Document Type: Technical Report
Publication Date: Feb 03, 2013
Accession Number: ADA581327

Entities

People

Stephen Tomlinson

Organizations

Open Text Corporation