The Linguistic Core Approach to Structured Translation and Analysis of Low Resource Languages

Abstract

The Linguistic Core MURI project focused on machine translation (MT) and textual analysis (TA) engines for low-resource languages. We produced systems that can be trained with less data by using knowledge-rich linguistic priors, linguistic corpus annotation, monolingual corpora, techniques for cross-lingual training of NLP systems, and compact representations that allow for generalization over small amounts of data. Our research activities ranged from data collection and annotation to the design, development, and evaluation of algorithms and models for text analysis and machine translation. Our work addressed three focus languages from Africa (Kinyarwanda, Malagasy, and Swahili), but we also piloted many techniques on a variety of other languages. This report covers work done during the five years of the project and the sixth-year extension.

Document Details

Document Type: Technical Report
Publication Date: Sep 02, 2017
Accession Number: AD1051063

Entities

Organizations

  • Carnegie Mellon University

Tags

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence Software
  • Bayesian Networks
  • Computational Linguistics
  • Computational Science
  • Computer Languages
  • Grammars
  • Hidden Markov Models
  • Information Science
  • Language
  • Linguistics
  • Machine Translation
  • Markov Models
  • Monte Carlo Method
  • Natural Language Processing
  • Natural Languages
  • Sampling

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Technical Research and Report Writing

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation