Query Expansion for Noisy Legal Documents

Abstract

The vocabulary of the TREC Legal OCR collection is noisy and very large. Standard techniques for improving retrieval performance, such as content-based query expansion, are ineffective for such a document collection. In our work, we focused on exploiting metadata through blind relevance feedback, on iterative improvement from the reference Boolean run, and on the effects of using terms from different topic fields for automatic query formulation. This paper describes our methodologies and results.
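The abstract does not spell out the feedback procedure. As a rough illustration only, the Python sketch below shows generic blind (pseudo) relevance feedback that draws expansion terms from document metadata rather than from noisy OCR text. It is not the authors' method. The toy collection, the field names `text` and `metadata`, the helper names `score`, `rank` and `expand_with_metadata`, and the parameters `k` and `m` are all hypothetical. Plain frequency counting stands in for whatever term weighting the report actually uses.

```python
from collections import Counter

# Hypothetical document shape: noisy OCR tokens plus cleaner metadata terms.
Document = dict[str, list[str]]


def score(query: list[str], doc: Document) -> int:
    """Toy relevance score: occurrences of query terms in text and metadata."""
    terms = doc["text"] + doc["metadata"]
    return sum(terms.count(q) for q in query)


def rank(query: list[str], docs: dict[str, Document]) -> list[str]:
    """Return document ids by descending score, ties broken by id."""
    return sorted(docs, key=lambda d: (-score(query, docs[d]), d))


def expand_with_metadata(query: list[str], docs: dict[str, Document],
                         k: int = 2, m: int = 2) -> list[str]:
    """Blind relevance feedback (illustrative): assume the top-k ranked
    documents are relevant, then append the m most frequent metadata terms
    from those documents that are not already in the query."""
    top = rank(query, docs)[:k]
    counts = Counter(t for d in top for t in docs[d]["metadata"]
                     if t not in query)
    # most_common orders by count; equal counts keep first-seen order.
    return query + [t for t, _ in counts.most_common(m)]


# Hypothetical toy collection for demonstration.
docs = {
    "d1": {"text": ["tobacco", "advertising", "memo"],
           "metadata": ["philip_morris", "memo"]},
    "d2": {"text": ["tobacco", "sales", "report"],
           "metadata": ["philip_morris", "report"]},
    "d3": {"text": ["weather", "report"],
           "metadata": ["noaa", "report"]},
}

print(expand_with_metadata(["tobacco"], docs))
# → ['tobacco', 'philip_morris', 'memo']
```

Taking expansion terms only from metadata fields is one way to avoid amplifying OCR errors, which matches the abstract's point that content-based expansion performs poorly on this collection. A real system would weight terms, for example by inverse document frequency, instead of using raw counts.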


Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2008
Accession Number
ADA512690

Entities

People

  • Douglas W. Oard
  • Lidan Wang

Organizations

  • University of Maryland

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Character Recognition
  • Computer Science
  • Feedback
  • Information Operations
  • Information Retrieval
  • Intact Stability
  • Judgment
  • Maryland
  • Metadata
  • Optical Character Recognition
  • Personality
  • Production
  • Social Networks
  • Standards
  • Universities

Fields of Study

  • Computer science
