POSTECH at TREC 2009 Blog Track: Top Stories Identification

Abstract

This paper describes our participation in the TREC 2009 Blog Track. Our system consists of a query likelihood component and a news headline prior component, both built on the language modeling framework. For the query likelihood, we propose several approaches to estimating the query language model and the news headline language model. We also suggest two approaches to selecting the 10 relevant supporting posts: Feed-Based Selection and Cluster-Based Selection. Furthermore, we propose two criteria for estimating the news headline prior for a given day. Experimental results show that using the prior significantly improves performance on the task.
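
To illustrate how a query likelihood and a headline prior can be combined in a language modeling framework, the following is a minimal sketch and not the authors' implementation. It assumes Dirichlet smoothing against a background collection model and ranks headlines by log P(H) + log P(Q | H). The function names (`dirichlet_lm`, `rank_headlines`), the smoothing parameter `mu`, and the input layout are hypothetical; the abstract does not specify how the language models or the prior are actually estimated.

```python
import math
from collections import Counter


def dirichlet_lm(tokens, collection, mu=1000.0):
    """Return a function computing a Dirichlet-smoothed P(term | text).

    `collection` maps term -> background count. Smoothing choice and `mu`
    are assumptions made for this sketch.
    """
    counts = Counter(tokens)
    length = len(tokens)
    coll_total = sum(collection.values())
    vocab_size = len(collection)

    def prob(term):
        # Add-one smoothing on the background model keeps unseen terms nonzero.
        p_coll = (collection.get(term, 0) + 1) / (coll_total + vocab_size + 1)
        return (counts.get(term, 0) + mu * p_coll) / (length + mu)

    return prob


def rank_headlines(query_tokens, headlines, priors, collection, mu=1000.0):
    """Rank headlines by log prior + log query likelihood (highest first).

    `headlines` maps headline id -> token list; `priors` maps headline id ->
    prior probability for the given day (how it is estimated is left open).
    """
    scored = []
    for hid, tokens in headlines.items():
        p = dirichlet_lm(tokens, collection, mu)
        log_likelihood = sum(math.log(p(t)) for t in query_tokens)
        scored.append((hid, math.log(priors[hid]) + log_likelihood))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this sketch the prior enters as an additive term in log space, so changing a headline's prior shifts its score by the log of the ratio between the new and old prior, independently of the query terms.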

Document Details

Document Type: Technical Report
Publication Date: Nov 01, 2009
Accession Number: ADA517740

Entities

People

  • Hun-young Jung
  • Jong-hyeok Lee
  • Woosang Song
  • Yeha Lee

Organizations

  • Pohang University of Science and Technology

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Engineering
  • Human Behavior
  • Identification
  • Information Operations
  • Information Retrieval
  • Language
  • Online Communications
  • Preprocessing
  • Standards

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • International Journalism and Media Studies
  • Regression Analysis