Facet Classification of Blogs: Know-Center at the TREC 2009 Blog Distillation Task

Abstract

In this paper, we outline the experiments we carried out for the TREC 2009 Blog Distillation Task. Our system is based on a plain-text index extracted from the XML feeds of the TREC Blogs08 dataset. This index was used to retrieve candidate blogs for the given topics. The retrieved blogs were then classified using a Support Vector Machine trained on a manually labelled subset of the TREC Blogs08 dataset. Our experiments comprised three runs, each using a different feature set: nouns, stylometric properties, and punctuation statistics. Facet identification with our approach was successful, although a significant number of candidate blogs were not retrieved at all.

Document Details

Document Type: Technical Report
Publication Date: Nov 01, 2009
Accession Number: ADA517854

Entities

People

Andreas Juffinger
Elisabeth Lex
Michael Granitzer
