Arabic Information Retrieval at UMass in TREC-10

Abstract

The University of Massachusetts took on the TREC-10 cross-language track with no prior experience in Arabic and no Arabic speakers among our researchers or students. We intended to implement some standard approaches and to extend a language modeling approach to handle co-occurrences. Given the lack of resources (training data, electronic bilingual dictionaries, and stemmers) and our unfamiliarity with Arabic, we had our hands full carrying out standard approaches to monolingual and cross-language Arabic retrieval, and we did not submit any runs based on novel approaches. We submitted three monolingual runs and one cross-language run. We first describe the models, techniques, and resources we used, and then describe each run in detail. Our official runs performed moderately well, placing in the second tier (3rd or 4th place). Since submitting these results, we have improved normalization and stemming, improved dictionary construction, expanded Arabic queries, improved estimation and smoothing in the language models, and added combination of evidence. Together these changes increased performance substantially.

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2006
Accession Number: ADA456273

Entities

People

Leah S. Larkey
Margaret E. Connell

Organizations

University of Massachusetts Amherst
