A Statistical Word-Level Translation Model for Comparable Corpora

Abstract

This paper presents a statistical word-level mapping model for comparable corpora. The approach rests on the assumption that if two terms have close distributional profiles, the distributional profiles of their translations should also be close in a comparable corpus. We describe the proposed model and lay out a preliminary investigation on intralanguage comparable corpora. The preliminary results are more than 92% accurate, suggesting that the model is feasible. The model still needs improvement and should be tested cross-linguistically before its significance can be assessed.
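
The distributional assumption stated in the abstract can be illustrated with a minimal sketch. The abstract does not say how profiles are built or compared, so everything here is an assumption rather than the report's method: a profile is taken to be the counts of words within a fixed window of a term, closeness is measured with cosine similarity, and the three-sentence corpus and the names `profile` and `cosine` are hypothetical.

```python
from collections import Counter
from math import sqrt


def profile(term, sentences, window=2):
    """Count the words occurring within `window` tokens of `term`.

    Assumption: a window-based co-occurrence count stands in for the
    "distributional profile" mentioned in the abstract.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok != term:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Add every neighbour in the window except the term itself.
            counts.update(tokens[j] for j in range(lo, hi) if j != i)
    return counts


def cosine(p, q):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus, invented only for illustration.
corpus = [
    "the doctor treated the patient",
    "the physician treated the patient",
    "the banker counted the money",
]

sim_close = cosine(profile("doctor", corpus), profile("physician", corpus))
sim_far = cosine(profile("doctor", corpus), profile("banker", corpus))
print(sim_close > sim_far)
```

In this toy setting, terms used in the same contexts receive a higher score than terms used in different contexts. A cross-language version would additionally need a way to compare profiles built over different vocabularies, which the abstract does not detail.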

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 2000
Accession Number
ADA455144

Entities

People

  • Mona Diab
  • Steve Finch

Organizations

  • University of Maryland

Tags

DTIC Thesaurus Topics

  • Algorithms
  • Computational Complexity
  • Computational Linguistics
  • Data Analysis
  • Dictionaries
  • Frequency
  • Information Retrieval
  • Information Science
  • Language
  • Linear Algebra
  • Linguistics
  • Machine Translation
  • Natural Language Processing
  • Statistics
  • Thesauri
  • Translations
  • Word Lists

Readers

  • Computational Linguistics
  • Mathematics or Statistics
  • Theoretical Analysis