Siena's Twitter Information Retrieval System: The 2014 Microblog Track

Abstract

As the internet changes dramatically each year, microblogs such as Facebook and Twitter are being used more often as a source of information exchange. Twitter users often learn about current events earlier than they would from their news feeds, as companies and celebrities continue to use Twitter to spread information. Information Retrieval, a topic on which NIST (National Institute of Standards and Technology) holds a conference every year, involves drawing as much information as possible from online sources such as microblogs and determining whether that information can be put to use. The Microblog Track was introduced to TREC (Text REtrieval Conference) in 2011, with Twitter selected as its microblog resource. Twitter allows its users to share short posts of up to 140 characters with their followers, and is used to share anything from fashion trends to the latest terrorist attacks. Because tweets are so short, users often share more information by other means, such as including links or images, which affects whether a tweet contains relevant information. Participating groups for the track were given access to a Twitter API, provided by TREC, over a corpus of 243 million tweets scraped from February 1st to March 31st, 2013. Each group was given a set of test topics on which to test its system, which returns results for the Adhoc and/or Tweet Timeline Generation (TTG) task. In this paper, we describe five Query Expansion modules and three Relevance modules designed for the Microblog Track and built within Siena's Twitter Information Retrieval System (STIRS). Our results show STIRS averaging 61.91% precision on our adhoc run and 85.38% precision on our TTG run.

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2014
Accession Number
ADA618605

Entities

People

  • Darren Lim
  • Lauren Mathews
  • Matthew Roberts
  • Sharon Small
  • Timothy Larock

Organizations

  • Siena College

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Artificial Intelligence
  • Computational Processes
  • Demographic Cohorts
  • Information Retrieval
  • Language
  • Learning
  • Machine Learning
  • Online Communications
  • Precision
  • Social Media
  • Social Networking Services
  • Standards
  • Test And Evaluation
  • Universities

Fields of Study

  • Computer science

Readers

  • Academic Conference Management
  • Distributed Systems and Data Platform Development
  • Information Retrieval

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval