TREC Microblog 2012 Track: Real-Time Algorithm for Microblog Ranking Systems

Abstract

Twitter is becoming a major big-data repository, owing to the sharp increase in its number of users and its growing popularity. The huge volume of user profiles and raw text data continuously poses new research challenges. This paper reports our contribution to, and results in, the TREC 2012 Microblog Track. In this challenge, each participant is required to conduct a "real-time" retrieval task which, given a query topic, seeks the most recent and relevant tweets. We devised an effective real-time ranking algorithm that avoids heavy computational requirements. Our contribution is multifold: (1) adapting an existing ranking method, BM25, to microblogging; (2) enhancing traditional content-based features with knowledge extracted from Wikipedia; (3) employing pseudo-relevance feedback techniques for query expansion; and (4) performing text analysis, such as ad hoc text normalization and POS tagging, to limit noisy data and better represent useful information.
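
For readers unfamiliar with BM25, the ranking method named in the abstract, the following is a minimal sketch of the standard Okapi BM25 scoring function applied to a few short, tweet-like documents. It is not the authors' adapted variant, which the abstract does not specify. The function name `bm25_scores`, the parameter defaults `k1=1.2` and `b=0.75`, the whitespace tokenization, and the sample texts are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query.

    Standard Okapi BM25 with a smoothed, non-negative IDF. The defaults
    for k1 and b are common textbook values, not values from the report.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical sample data, invented for this illustration.
tweets = [
    "bbc world service staff cuts".split(),
    "world cup final tonight".split(),
    "new staff cuts announced at bbc world service".split(),
]
query = "bbc staff cuts".split()

scores = bm25_scores(query, tweets)
# Indices of tweets ordered from highest to lowest score.
ranking = sorted(range(len(tweets)), key=lambda i: scores[i], reverse=True)
```

In this sketch the first and third tweets contain every query term once, so the shorter one ranks higher through length normalization alone. Because tweets are short and vary little in length, that normalization is one plausible place where an adaptation to microblogs might differ from the standard formula, though the abstract does not say what the authors changed.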

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2012
Accession Number
ADA579307

Entities

People

  • Davide F. Gurini
  • Fabio Gasparetti

Organizations

  • Università degli Studi Roma Tre

Tags

Communities of Interest

  • Autonomy
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Artificial Intelligence
  • Data Analysis
  • Digital Information
  • English Language
  • Feedback
  • Frequency
  • Information Retrieval
  • Information Security
  • Machine Learning
  • National Security
  • Precision
  • Security
  • Standards
  • Test And Evaluation
  • Universities

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.
  • Information Retrieval
  • Systems Analysis and Design