Breaking the Resource Bottleneck for Multilingual Parsing

Abstract

We propose a framework that enables the acquisition of annotation-heavy resources such as syntactic dependency tree corpora for low-resource languages by importing linguistic annotations from high-quality English resources. We present a large-scale experiment showing that Chinese dependency trees can be induced by using an English parser, a word alignment package, and a large corpus of sentence-aligned bilingual text. As part of the experiment, we evaluate the quality of a Chinese parser trained on the induced dependency treebank. We find that a parser trained in this manner outperforms some simple baselines in spite of the noise in the induced treebank. The results suggest that projecting syntactic structures from English is a viable option for acquiring annotated syntactic structures quickly and cheaply. We expect the quality of the induced treebank to improve when more sophisticated filtering and error-correction techniques are applied.
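The abstract names the ingredients of the projection step (an English parse, word alignments, and sentence-aligned text) but does not spell out the procedure. The following is a minimal sketch of the general idea only, not the report's actual algorithm: it copies a dependency link to the target sentence when both the dependent and its head have one-to-one alignments, and skips everything else. The function name `project_dependencies`, the `-1` root convention, and the toy alignment are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def project_dependencies(
    english_heads: Dict[int, int],
    alignment: List[Tuple[int, int]],
) -> Dict[int, int]:
    """Project English dependency links onto target-language tokens.

    english_heads maps each English token index to its head index
    (-1 marks the root; this convention is an assumption of the sketch).
    alignment is a list of (english_index, target_index) pairs.
    Only tokens with one-to-one alignments are projected.
    """
    # Count how many links each token takes part in on either side.
    en_counts: Dict[int, int] = {}
    tgt_counts: Dict[int, int] = {}
    for en, tgt in alignment:
        en_counts[en] = en_counts.get(en, 0) + 1
        tgt_counts[tgt] = tgt_counts.get(tgt, 0) + 1

    # Keep only the one-to-one links; other cases are left unprojected here.
    one_to_one = {
        en: tgt
        for en, tgt in alignment
        if en_counts[en] == 1 and tgt_counts[tgt] == 1
    }

    projected: Dict[int, int] = {}
    for dep, head in english_heads.items():
        if dep not in one_to_one:
            continue  # dependent has no usable alignment
        if head == -1:
            projected[one_to_one[dep]] = -1  # the root stays the root
        elif head in one_to_one:
            projected[one_to_one[dep]] = one_to_one[head]
    return projected


# Toy example: a three-token English sentence whose second token is the root,
# aligned to a target sentence with the last two tokens swapped.
heads = {0: 1, 1: -1, 2: 1}
reordered = project_dependencies(heads, [(0, 0), (1, 2), (2, 1)])

# Same parse, but the third English token has no alignment.
partial = project_dependencies(heads, [(0, 0), (1, 1)])
```

Tokens left out of the result are exactly where an induced treebank becomes incomplete or noisy, which is the kind of gap the filtering and error-correction techniques mentioned in the abstract would need to address.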

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2005
Accession Number
ADA440432

Entities

People

  • Amy Weinberg
  • Philip Resnik
  • Rebecca Hwa

Organizations

  • University of Maryland

Tags

Communities of Interest

  • Ground and Sea Platforms

DTIC Thesaurus Topics

  • Acquisition
  • Algorithms
  • Chinese Language
  • Computational Linguistics
  • Crossings
  • Data Acquisition
  • Filters
  • Filtration
  • Foreign Languages
  • Grammars
  • Language
  • Linguistics
  • Machine Translation
  • Natural Language Processing
  • Natural Languages
  • Test Sets
  • Universities

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Economics
  • Neural Network Machine Learning