ICTNET at Microblog Track TREC 2012

Abstract

The TREC 2012 Microblog Track has two search tasks: Real-time Adhoc and Real-time Filtering. The Tweets2011 corpus is used again, and last year's results serve as the first officially labeled data that participants can use to train their models. In this year's track, the former task is given 60 new queries and the latter is proposed for the first time. In the Real-time Adhoc task, we use the Indri retrieval toolkit to build our retrieval system and propose a pseudo-relevance feedback strategy to expand the original query; we then retrieve the original tweets and use their Indri scores as an important feature. In addition, we compute many other features of these tweets, such as URL, hashtag, entropy, TF-IDF, BM25, language model, and proximity. Finally, we use two learning-to-rank methods, RankSVM and ListNet, to combine all these features and sort the tweets, returning the final ranked list for a specified query. In the Real-time Filtering task, which we assume is similar to topic tracking in the Twitter stream, we build two filtering models, based on a language model and the Vector Space Model respectively. Each model is initialized with the starting query and its relevant tweets. For each newly arriving tweet, the model decides whether it belongs to the topic. If it does, we update the model to keep up with the development of the topic. The rest of this paper is organized as follows. Section 2 discusses the preprocessing of the Tweets2011 corpus. Section 3 discusses the main method for ranking search results in the Real-time Adhoc task. Section 4 describes the two filtering models for the Real-time Filtering task. Experimental results are presented in Section 5. The last section draws conclusions about our work.
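
The filtering loop described in the abstract (initialize from the query and its relevant tweets, judge each incoming tweet, update on acceptance) can be illustrated with a minimal sketch of a Vector Space Model filter. This is not the report's implementation: the class name `VsmFilter`, the whitespace tokenizer, the raw term-frequency weighting, and the `threshold` value of 0.3 are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the report's
    # actual preprocessing is not specified in the abstract.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VsmFilter:
    """Hypothetical topic filter: a term-frequency profile plus a threshold."""

    def __init__(self, query, relevant_tweets, threshold=0.3):
        # threshold is an illustrative value, not taken from the report.
        self.threshold = threshold
        # Initialize the profile from the query and its known relevant tweets.
        self.profile = Counter(tokenize(query))
        for tweet in relevant_tweets:
            self.profile.update(tokenize(tweet))

    def offer(self, tweet):
        # Decide whether the new tweet is on topic; if so, fold its
        # terms into the profile so the model follows the topic.
        vec = Counter(tokenize(tweet))
        if cosine(self.profile, vec) >= self.threshold:
            self.profile.update(vec)
            return True
        return False
```

In use, one filter would be created per topic and `offer` called on each tweet in timestamp order. Updating the profile only on accepted tweets keeps off-topic vocabulary out of the model, at the cost of possible drift if a false positive is accepted.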

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2012
Accession Number
ADA578977

Entities

People

  • Bolong Zhu
  • Cunhui Shi
  • Jinghua Gao
  • Shenghua Liu
  • Xiao Han
  • Xueqi Cheng
  • Yue Liu

Organizations

  • Chinese Academy of Sciences

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Automatic
  • Data Sets
  • Filters
  • Filtration
  • Frequency
  • Governments
  • Information Operations
  • Instructions
  • Language
  • Learning
  • Multithreading
  • Preprocessing
  • Standards
  • Training
  • Vector Spaces

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Information Retrieval

Technology Areas

  • Space