English-Chinese Information Retrieval at IBM
Abstract
We describe TREC-9 experiments with an IR system that incorporates statistical machine translation, trained on sentence-aligned parallel corpora, for both query translation (English-to-Chinese) and document translation (Chinese-to-English). We contrast these systems with monolingual Chinese retrieval and with query translation based on a widely available commercial machine translation package. The systems use both words and characters as retrieval features. Comparisons with a baseline from TREC-5/6 allow our experiments to address issues related to the differences between the Beijing and Hong Kong dialects.
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 2006
- Accession Number: ADA456312
Entities
People
- J. S. McCarley
- Martin Franz
- Wei-Jing Zhu
Organizations
- IBM Thomas J. Watson Research Center