Siena's Twitter Information Retrieval System: The 2012 Microblog Track

Abstract

Since 1992, the National Institute of Standards and Technology (NIST) has been annually hosting the Text Retrieval Conference (TREC). One of the newest tracks, which started in 2011, is the Microblog Track which uses a well-known social network site, Twitter, as its source of microblog data. Twitter allows its users to post 140 character length tweets to share messages with their followers, posting personal updates, and share major media stories from around the world. In order to evaluate information retrieval on microblog data, groups were provided with a file of about 16 million tweet IDs from January 24th to February 8th, 2011. This allowed us to download the tweet content of each ID for a total of 16,141,812 tweets. Participating teams were given a set of topics to test their retrieval process, and their program would return relevant tweets about that topic. The Siena College Institute of Artificial Intelligence expanded on STIRS, Siena's Twitter Information Retrieval System. The results for our adhoc run showed STIRS' best run to be at 18.08% precision, while the average of the median from all participating teams was 14.86%.

Open PDF

Document Details

Document Type: Technical Report
Publication Date: Nov 01, 2012
Accession Number: ADA581304

Entities

People

Darren Lim
Karl Appel
Lauren Mathews
Sharon Small

Organizations

Siena College

Siena's Twitter Information Retrieval System: The 2012 Microblog Track

Abstract

Document Details

Entities

People

Organizations

Tags

Communities of Interest

DTIC Thesaurus Topics

Fields of Study

Readers

Technology Areas