Translating Collocations for Use in Bilingual Lexicons

Abstract

Collocations are notoriously difficult for non-native speakers to translate, primarily because they are opaque and cannot be translated on a word-by-word basis. We describe a program named Champollion which, given a pair of parallel corpora in two different languages, automatically produces translations of an input list of collocations. Our goal is to provide a tool to compile bilingual lexical information above the word level in multiple languages and domains. The algorithm we use is based on statistical methods and produces p-word translations of n-word collocations in which n and p need not be the same; the collocations can be either flexible or fixed compounds. For example, Champollion translates "to make a decision," "employment equity," and "stock market," respectively, into "prendre une décision," "équité en matière d'emploi," and "bourse." Testing and evaluation of Champollion on one year's worth of the Hansards corpus yielded 300 collocations and their translations, evaluated at 77% accuracy. In this paper, we describe the statistical measures used, the algorithm, and the implementation of Champollion, presenting our results and evaluation.
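
The abstract does not name the statistical measures involved. As an illustration only, the sketch below scores a candidate target word against a source collocation with a Dice coefficient computed over aligned sentence pairs, one common association measure for this kind of task. The function name `dice`, the bag-of-words sentence representation, and the toy corpus `toy_pairs` are assumptions made for this example and are not taken from the report.

```python
def dice(source_phrase, target_word, aligned_pairs):
    """Dice coefficient between a source phrase and a target word.

    aligned_pairs is a list of (source_tokens, target_tokens) tuples,
    each a set of tokens for one aligned sentence pair.
    """
    src_count = 0   # pairs whose source side contains every word of the phrase
    tgt_count = 0   # pairs whose target side contains the candidate word
    both_count = 0  # pairs where both conditions hold
    for src_tokens, tgt_tokens in aligned_pairs:
        has_src = all(word in src_tokens for word in source_phrase)
        has_tgt = target_word in tgt_tokens
        src_count += has_src
        tgt_count += has_tgt
        both_count += has_src and has_tgt
    if src_count + tgt_count == 0:
        return 0.0
    return 2.0 * both_count / (src_count + tgt_count)


# Hypothetical toy corpus, invented for illustration (not Hansards data).
toy_pairs = [
    ({"we", "must", "make", "a", "decision"},
     {"nous", "devons", "prendre", "une", "décision"}),
    ({"they", "make", "a", "decision", "today"},
     {"ils", "prennent", "une", "décision", "aujourd'hui"}),
    ({"the", "stock", "market", "fell"},
     {"la", "bourse", "a", "chuté"}),
    ({"a", "decision", "was", "made"},
     {"une", "décision", "a", "été", "prise"}),
]

score_decision = dice(("make", "decision"), "décision", toy_pairs)
score_bourse = dice(("make", "decision"), "bourse", toy_pairs)
```

In this toy data the collocation co-occurs with "décision" far more often than with "bourse", so the first score is the higher one. A full system would also need to combine several well-correlated words into a multi-word translation and decide their order, which this sketch leaves out.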

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1994
Accession Number
ADA460571

Entities

People

  • Frank Smadja
  • Kathleen McKeown

Organizations

  • Columbia University

Tags

Communities of Interest

  • C4I

DTIC Thesaurus Topics

  • Abstracts
  • Accuracy
  • Algorithms
  • Coefficients
  • Computational Linguistics
  • Computer Science
  • Databases
  • Errors
  • Frequency
  • Information Retrieval
  • Information Science
  • Language
  • Linguistics
  • Machine Translation
  • Statistics
  • Test And Evaluation
  • Translations

Fields of Study

  • Computer science

Readers

  • Computer Science/Computer Engineering/Data Science/Digital Signal Processing.
  • Materials Science (Mechanical Engineering).
  • Speech Processing/Speech Recognition.