Evaluating Stream Filtering for Entity Profile Updates for TREC 2013 (KBA Track Overview)

Abstract

The Knowledge Base Acceleration (KBA) track in TREC 2013 expanded the entity-centric filtering evaluation from TREC KBA 2012. This track evaluates systems that filter a time-ordered corpus for documents and slot fills that would change an entity profile in a predefined list of entities. We doubled the size of the KBA streamcorpus to twelve thousand contiguous hours and a billion documents from blogs, news, and Web content. We quadrupled the number of entities used as query topics, drawn from structured knowledge bases (KBs) such as Wikipedia and Twitter. We also added a second task component: identifying entity slot values that change over the course of the stream. This Streaming Slot Filling (SSF) subtask focuses on natural language understanding and is a step toward decomposing the profile update process undertaken by humans maintaining a knowledge base. A successful KBA system must do more than resolve the meaning of entity mentions by linking documents to the KB: it must also distinguish vitally relevant documents and new slot fills that would change a target entity's profile. This combines thinking from natural language processing (NLP) and information retrieval (IR). Filtering tracks in TREC have typically used queries based on topics described by a set of keyword queries or short descriptions, and annotators have generated relevance judgments based on their personal interpretation of the topic. For TREC 2013, we selected a set of filter topics based on Wikipedia and Twitter entities: 98 people, 19 organizations, and 24 facilities. Assessors judged ~50k documents, which included all documents that mention a name from a handcrafted list of surface form names of the 141 target entities. Judgments for documents from before February 2012 were provided to TREC teams as training data, and the remaining 12 months of data were used to measure the F_1 accuracy and scaled utility of these systems. We present peak macro-averaged F_1 scores for all run sub

Document Details

Document Type: Technical Report
Publication Date: Nov 01, 2013
Accession Number: ADA600032

Entities

People

Ce Zhang
Christopher Ré
Daniel A. Roberts
Ellen Voorhees
Ian Soboroff
John R. Frank
Max Kleiman-Weiner
Nilesh Tripuraneni
Steven J. Bauer

Organizations

Massachusetts Institute of Technology
