The Linguistic-Core Approach to Structured Translation and Analysis of Low-Resource Languages

Abstract

The Linguistic Core MURI project focused on machine translation (MT) and textual analysis (TA) engines for low-resource languages. We produced systems that can be trained with less data by using knowledge-rich linguistic priors, linguistic corpus annotation, monolingual corpora, techniques for cross-lingual training of NLP systems, and compact representations that allow for generalization over small amounts of data. Our research activities ranged from data collection and annotation to the design, development, and evaluation of algorithms and models for text analysis and machine translation. Our work addressed three focus languages from Africa (Kinyarwanda, Malagasy, and Swahili), but we also piloted many techniques on a variety of other languages. This report covers work done during the five years of the project and the sixth-year extension.

Document Details

Document Type
DoD Grant Award
Publication Date
Jun 25, 2021
Source ID
W911NF1010533

Entities

People

  • Jaime Carbonell

Organizations

  • Army Contracting Command
  • Carnegie Mellon University
  • United States Army

Tags

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Technical Research and Report Writing

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation