Provenance and Processing of an Inuktitut-English Parallel Corpus Part 1: Inuktitut Data Preparation and Factored Data Format

Abstract

We describe the Nunavut Hansard, a parallel English-Inuktitut corpus derived from Nunavut legislative proceedings, and the processing carried out to prepare the data for morphological analysis and downstream machine translation experiments. We provide all of the scripts and code used to process the data.

Document Details

Document Type: Technical Report
Publication Date: Oct 19, 2018
Accession Number: AD1062208

Entities

People

  • Jeffrey C. Micher

Organizations

  • United States Army Research Laboratory

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Abstracts
  • Analyzers
  • Applied Computer Science
  • Artificial Intelligence Computing
  • Cecum
  • Computational Linguistics
  • Computational Science
  • Computer Science
  • Data Set
  • Data Sets
  • Databases
  • Dictionaries
  • Digital Data
  • Directories
  • Governments
  • Information Science
  • Information Systems
  • Language
  • Law
  • Linguistics
  • Machine Translation
  • Military Research
  • Morphology (Linguistics)
  • Natural Language Computing
  • Natural Language Processing
  • Test Sets
  • Translations
  • United States Government

Readers

  • Computational Linguistics
  • Defense Acquisition Program Management

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation