University of Glasgow at TREC 2008: Experiments in Blog, Enterprise, and Relevance Feedback Tracks with Terrier

Abstract

In TREC 2008, we participate in the Blog, Enterprise, and Relevance Feedback tracks. In all tracks, we continue the research and development of the Terrier platform, centred on extending state-of-the-art weighting models based on the Divergence From Randomness (DFR) framework. In particular, we investigate two main themes: proximity-based models, and collection and profile enrichment techniques based on several resources. In the Blog track, we aim to improve our opinion detection techniques and to integrate various new blog-specific features into our Voting Model. For the baseline ad-hoc task, we aim to build strong baselines by applying two different techniques. The first boosts documents in which query terms co-occur within a window of a given size, and the second applies query expansion using collection enrichment. Non-English documents are also removed from the retrieved results. In the opinion-finding task, we experiment with two main opinion detection approaches. The first improves our TREC 2007 dictionary-based approach by automatically building an internal opinion dictionary from the collection itself. We measure how well each term discriminates opinionated content using an information-theoretic divergence measure based on the relevance assessments of previous years. The second approach is based on the OpinionFinder tool, which identifies subjective sentences in text. In particular, we introduce a novel method to measure the informativeness of query terms occurring in close proximity to subjective sentences. In the blog distillation task, we have two research themes.
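
The first baseline technique mentioned in the abstract, boosting documents in which query terms co-occur within a window, can be illustrated with a minimal sketch. This is not the DFR proximity model used in the report or any Terrier API: the function names, the sliding-window counting scheme, the `weight` parameter, and the logarithmic boost are all assumptions made only for illustration.

```python
import math
from itertools import combinations


def cooccurrence_windows(tokens, term_a, term_b, window_size):
    """Count sliding windows of `window_size` tokens that contain both terms."""
    if window_size < 2:
        raise ValueError("window_size must be at least 2")
    # A document shorter than the window is treated as a single window.
    n_windows = max(len(tokens) - window_size + 1, 1)
    count = 0
    for start in range(n_windows):
        window = tokens[start:start + window_size]
        if term_a in window and term_b in window:
            count += 1
    return count


def proximity_boost(tokens, query_terms, window_size=5, weight=0.1):
    """Hypothetical additive boost: weight * log(1 + total pair co-occurrences).

    The boost form is an assumption for illustration, not the report's model.
    """
    unique_terms = sorted(set(query_terms))
    total = sum(
        cooccurrence_windows(tokens, a, b, window_size)
        for a, b in combinations(unique_terms, 2)
    )
    return weight * math.log1p(total)


doc = "the blog post praises the new search engine".split()
boost = proximity_boost(doc, ["blog", "post"], window_size=3, weight=1.0)
```

In a sketch like this, the boost would be added to a document's base retrieval score, so documents where the query terms appear close together rank higher than documents where they are scattered.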

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2008
Accession Number
ADA512687

Entities

People

  • Ben He
  • Craig Macdonald
  • Iadh Ounis
  • Jie Peng
  • Rodrygo L. Santos

Organizations

  • University of Glasgow

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Computer Science
  • Data Fusion
  • Detection
  • Dictionaries
  • Distillation
  • Elections
  • Equations
  • Feedback
  • Frequency
  • Indexes
  • Information Processing
  • Information Science
  • Online Communications
  • Probability
  • Standards
  • Universities
  • Websites

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Information Retrieval
  • Regression Analysis