CLIR Experiments at Maryland for TREC-2002: Evidence Combination for Arabic-English Retrieval

Abstract

The experiments reported in this paper focused on techniques for combining evidence in cross-language retrieval, in which Arabic documents are searched using English queries. Evidence from multiple sources of translation knowledge was combined to estimate translation probabilities, and four techniques for estimating query-language term weights from document-language evidence were tried. A new technique that exploits translation probability information was found to outperform a comparable technique that does not use that information. Comparative results for three variants of Arabic "light" stemming are also presented. A simple variant of an existing stemming algorithm was found to yield significantly better retrieval effectiveness.

Document Details

Document Type
Technical Report
Publication Date
Feb 01, 2003
Accession Number
ADA452814

Entities

People

  • Douglas W. Oard
  • Kareem Darwish

Organizations

  • University of Maryland

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Abstracts
  • Automatic
  • Dictionaries
  • Equations
  • Failure Analysis
  • Frequency
  • Information Operations
  • Language
  • Machine Translation
  • Maryland
  • Natural Languages
  • Probability
  • Stemming
  • Text Processing
  • Translations
  • Universities
  • Word Lists

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Regression Analysis