The Linguistic Core Approach to Structured Translation and Analysis of Low Resource Languages
Abstract
The Linguistic Core MURI project focused on machine translation (MT) and textual analysis (TA) engines for low-resource languages. We produced systems that can be trained with less data by using knowledge-rich linguistic priors, linguistic corpus annotation, monolingual corpora, techniques for cross-lingual training of NLP systems, and compact representations that allow for generalization over small amounts of data. Our research activities ranged from data collection and annotation to the design, development, and evaluation of algorithms and models for text analysis and machine translation. Our work addressed three focus languages from Africa (Kinyarwanda, Malagasy, and Swahili), but we also piloted many techniques on a variety of other languages. This report covers the work done during the five years of the project and its sixth-year extension.
Document Details
- Document Type: Technical Report
- Publication Date: Sep 02, 2017
- Accession Number: AD1051063
Entities
Organizations
- Carnegie Mellon University