TREC 2013 Web Track Overview

Abstract

The goal of the TREC Web track is to explore and evaluate retrieval approaches over large-scale subsets of the Web, currently on the order of one billion pages. For TREC 2013, the fifth year of the Web track, we implemented the following significant updates compared to 2012. First, the Diversity task was replaced with a new Risk-sensitive retrieval task that explores the tradeoffs systems can achieve between effectiveness (overall gains across queries) and robustness (minimizing the probability of significant failure, relative to a provided baseline). Second, we based the 2013 Web track experiments on the new ClueWeb12 collection created by the Language Technologies Institute at Carnegie Mellon University. ClueWeb12 is a successor to the ClueWeb09 dataset, comprising about one billion Web pages crawled between February and May 2012. The crawling and collection process for ClueWeb12 included a rich set of seed URLs based on commercial search traffic, Twitter, and other sources, as well as multiple measures for flagging undesirable content such as spam, pornography, and malware. The Adhoc task continued as in previous years.
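
The tradeoff between effectiveness and robustness described in the abstract can be illustrated with a small sketch. The abstract does not define the track's evaluation measure, so everything below is an assumption made for illustration: the function name `risk_sensitive_utility`, the penalty parameter `alpha`, and the use of per-query score differences against the baseline are hypothetical choices, not the report's stated method. The idea is that gains over the baseline count at face value, while losses are weighted by `1 + alpha`, so a larger `alpha` rewards robust systems over ones that win big on some queries and fail badly on others.

```python
def risk_sensitive_utility(run_scores, baseline_scores, alpha=1.0):
    """Average per-query gain over a baseline, with losses penalized.

    run_scores and baseline_scores are equal-length sequences of
    per-query effectiveness scores (for example, one score per topic).
    alpha >= 0 is an illustrative penalty weight (an assumption, not
    taken from the abstract): alpha = 0 reduces to the plain average
    difference, larger values punish losses more heavily.
    """
    if len(run_scores) != len(baseline_scores):
        raise ValueError("score lists must have the same length")
    if not run_scores:
        raise ValueError("score lists must not be empty")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")

    total = 0.0
    for run, base in zip(run_scores, baseline_scores):
        delta = run - base
        if delta >= 0:
            total += delta                  # win (or tie): counted as-is
        else:
            total += (1.0 + alpha) * delta  # loss: weighted by 1 + alpha
    return total / len(run_scores)


# Hypothetical per-query scores for three topics.
run = [0.5, 0.2, 0.4]
baseline = [0.3, 0.4, 0.4]

plain = risk_sensitive_utility(run, baseline, alpha=0.0)
averse = risk_sensitive_utility(run, baseline, alpha=1.0)
```

With these made-up scores, the run gains on one topic and loses the same amount on another, so the plain average difference is roughly zero, while the risk-averse setting turns negative because the loss is counted twice.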

Document Details

Document Type: Technical Report
Publication Date: Jan 30, 2014
Accession Number: ADA600873

Entities

People

Charles L. Clarke
Ellen M. Voorhees
Fernando Diaz
Kevyn Collins-Thompson
Paul Bennett

Organizations

University of Michigan
