Chahta Anumpa: A Multimodal Corpus of the Choctaw Language

Abstract

This paper presents a general-use corpus for Choctaw, an indigenous Native American language. The corpus contains audio, video, and text resources, and many of the texts are also translated into English. Both the Oklahoma Choctaw and the Mississippi Choctaw varieties of the language are represented. The data set supports documentation of this threatened language and gives researchers and language teachers access to a diverse collection of resources.


Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2018
Accession Number
AD1158070

Entities

People

  • Eli Pincus
  • Jacqueline Brixey
  • Ron Artstein

Organizations

  • University of Southern California

Tags

Communities of Interest

  • Biomedical

DTIC Thesaurus Topics

  • Audio Files
  • Data Sets
  • Databases
  • Dictionaries
  • Grammars
  • Language
  • Linguistics
  • Louisiana
  • Machine Translation
  • Mississippi
  • Morphology (Linguistics)
  • Native Americans
  • Natural Language Processing
  • Natural Languages
  • Social Media
  • Speech
  • United States

Readers

  • Archaeological Resource Survey
  • Database Systems and Applications
  • Speech Processing/Speech Recognition