Addressing Challenges of Machine Translation of Inuit Languages

Abstract

Machine translation to and from polysynthetic languages, such as those of the Inuit language family, has largely been overlooked because their complex morphology has been a barrier to research in computational methodologies. Polysynthetic languages pack abundant semantic and grammatical information into single words, so their data sets are inherently extremely sparse and computationally challenging for typical word-based analysis. Here, we focus on Inuktitut, a polysynthetic language spoken in Canada that is one of the official languages of the territory of Nunavut and is used in all of its governmental and educational documentation. We discuss Inuktitut, highlighting its polysynthetic typology, word formation, grammatical complexity, morphophonemics, spelling, and dialect variation, and review how this complexity presents challenges for machine translation and morphological processing. We consider the following: improving the performance of a finite-state transducer morphological analyzer using various neural network approaches; using alternate subword units with a neural network architecture to improve over a baseline English-Inuktitut statistical machine translation system, and determining which subword unit yields the most improvement; using a pipelined English-Inuktitut translation system, featuring deep-representation morpheme sequences converted to surface forms, to compete with the best subword system; and using hierarchical structures over morphemes in a novel approach to improve over the best subword system.

Document Details

Document Type
Technical Report
Publication Date
Oct 01, 2018
Accession Number
AD1062179

Entities

People

  • Jeffrey C. Micher

Organizations

  • United States Army Research Laboratory

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Analyzers
  • Artificial Intelligence Computing
  • Artificial Intelligence Software
  • Artificial Neural Networks
  • Computational Linguistics
  • Computational Science
  • Computer Languages
  • Computing System Architectures
  • Data Set
  • Data Sets
  • Digital Data
  • Grammars
  • Information Science
  • Language
  • Linguistics
  • Machine Learning
  • Machine Translation
  • Military Research
  • Morphology (Linguistics)
  • Natural Language Computing
  • Natural Language Processing
  • Natural Languages
  • Network Architecture
  • Neural Networks
  • Recurrent Neural Networks
  • Sequences
  • Test Sets
  • Translations

Fields of Study

  • Computer science

Readers

  • Computational Linguistics

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation