Evaluating Stream Filtering for Entity Profile Updates for TREC 2013 (KBA Track Overview)
Abstract
The Knowledge Base Acceleration (KBA) track in TREC 2013 expanded the entity-centric filtering evaluation from TREC KBA 2012. This track evaluates systems that filter a time-ordered corpus for documents and slot fills that would change an entity profile in a predefined list of entities. We doubled the size of the KBA streamcorpus to twelve thousand contiguous hours and a billion documents from blogs, news, and Web content. We quadrupled the number of entities as query topics from structured knowledge bases (KB), such as Wikipedia and Twitter. We also added a second task component: identifying entity slot values that change over the course of the stream. This Streaming Slot Filling (SSF) subtask focuses on natural language understanding and is a step toward decomposing the profile update process undertaken by humans maintaining a knowledge base. A successful KBA system must do more than resolve the meaning of entity mentions by linking documents to the KB: it must also distinguish vitally relevant documents and new slot fills that would change a target entity's profile. This combines thinking from natural language processing (NLP) and information retrieval (IR). Filtering tracks in TREC have typically used queries based on topics described by a set of keyword queries or short descriptions, and annotators have generated relevance judgments based on their personal interpretation of the topic. For TREC 2013, we selected a set of filter topics based on Wikipedia and Twitter entities 98 people, 19 organizations, and 24 facilities. Assessors judged ~50k documents, which included all documents that mention a name from a handcrafted list of surface form names of the 141 target entities. Judgments for documents from before February 2012 were provided to TREC teams as training data, and the remaining 12 months of data was used to measure the F_1 accuracy and scaled utility of these systems. We present peak macro-averaged F_1 scores for all run sub
Document Details
- Document Type
- Technical Report
- Publication Date
- Nov 01, 2013
- Accession Number
- ADA600032
Entities
People
- Ce Zhang
- Christopher RĂ©
- Daniel A. Roberts
- Ellen Voorhees
- Ian Soboroff
- John R. Frank
- Max Kleiman-weiner
- Nilesh Tripuraneni
- Steven J. Bauer
Organizations
- Massachusetts Institute of Technology