English-Chinese Information Retrieval at IBM

Abstract

We describe TREC-9 experiments with an IR system that incorporates statistical machine translation, trained on sentence-aligned parallel corpora, for both query translation (English-to-Chinese) and document translation (Chinese-to-English). These systems are contrasted with monolingual Chinese retrieval and with query translation based on a widely available commercial machine translation package. These systems use both words and characters as retrieval features. Comparisons with a baseline from TREC-5/6 allow our experiments to address issues related to the differences between the Beijing and Hong Kong dialects.
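
The abstract does not say how word and character features are combined, so the following is only a minimal sketch of the general idea, not the report's method. It assumes the Chinese text has already been segmented into words; the function name `mixed_features`, the `w:`/`c:` prefixes, and the overlap score are all hypothetical choices made for illustration.

```python
from collections import Counter


def mixed_features(segmented_words):
    """Build a bag of word-level and character-level features.

    Assumption: `segmented_words` is a list of already-segmented Chinese
    words. The "w:" and "c:" prefixes keep the two feature types apart.
    """
    features = Counter()
    for word in segmented_words:
        features["w:" + word] += 1      # word-level feature
        for ch in word:
            features["c:" + ch] += 1    # character-level feature
    return features


# Toy document and query (hypothetical data, not taken from the report).
doc_feats = mixed_features(["香港", "信息", "检索"])
query_feats = mixed_features(["信息", "系统"])

# Counter intersection keeps the minimum count of each shared feature,
# so summing it gives a crude overlap score.
overlap = sum((doc_feats & query_feats).values())
print(overlap)
```

In this toy case the shared features are the word 信息 and its two characters. Character features can still match when the word segmentation of a query and a document disagree, which is one commonly cited reason for indexing Chinese text at both levels.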

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2006
Accession Number
ADA456312

Entities

People

  • J. S. McCarley
  • Martin Franz
  • Wei-Jing Zhu

Organizations

  • IBM Thomas J. Watson Research Center

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Computational Processes
  • Computing-Related Activities
  • Hong Kong
  • Information Operations
  • Information Retrieval
  • Instructions
  • Machine Translation
  • Translations

Readers

  • Computational Linguistics

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation