TREC 2008 at the University at Buffalo: Legal and Blog Track

Abstract

In TREC 2008, the team from the State University of New York at Buffalo participated in the Legal track and the Blog track. For the Legal track, we worked on the interactive search task using the Web-based Legacy Tobacco Document Library Boolean search system. Our experiment achieved reasonable precision but suffered significantly from low recall. These results, together with the appeal and adjudication results, suggest that the concept of document relevance in legal e-discovery deserves further investigation. For the Blog distillation task, our official runs were based on a reduced document model in which only text from the several most content-bearing fields was indexed. This approach yielded encouraging retrieval effectiveness while significantly reducing the index size. We also studied query-independent, query-dependent, and link-based features for finding relevant feeds. For the Blog opinion and polarity tasks, we mainly investigated the usefulness of opinionated words contained in the SentiGI lexicon. Our experimental results showed that the effectiveness of this technique is quite limited, indicating that more sophisticated techniques are needed.

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2008
Accession Number
ADA512967

Entities

People

  • Jianqiang Wang
  • Omar Mukhtar
  • Rohini Srihari
  • Ying Sun

Organizations

  • University at Buffalo

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Character Recognition
  • Commerce
  • Computer Vision
  • Distillation
  • Feature Extraction
  • Government (Foreign)
  • Information Processing
  • Information Science
  • Language
  • Law
  • Marketing
  • Neural Networks
  • New York
  • Retail
  • Statistics
  • Supervised Machine Learning
  • Universities

Readers

  • Computational Linguistics
  • Information Retrieval
  • Systems Analysis and Design