Construction of a Phonotactic Dialect Corpus using Semiautomatic Annotation

Abstract

In this paper, we discuss rapid, semiautomatic techniques for annotating detailed phonological phenomena in large corpora. We describe the use of these techniques in developing a corpus of American English dialects. The resulting annotations and corpora will support both large-scale linguistic dialect analysis and automatic dialect identification. We delineate the semiautomatic annotation process that we are currently employing and a set of experiments we ran to validate it. From these experiments, we learned that the use of ASR techniques could significantly increase the throughput and consistency of human annotators.

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2007
Accession Number
ADA521149

Entities

People

  • Christopher Cieri
  • Dominique Estival
  • Joseph Campbell
  • Julie Vonwiller
  • Reva Schwartz
  • Shelley Paget
  • Wade Shen

Organizations

  • Massachusetts Institute of Technology

Tags

DTIC Thesaurus Topics

  • Audio Files
  • Automated Speech Recognition
  • Automatic
  • Consistency
  • Construction
  • Department Of Homeland Security
  • Identification
  • Identification Systems
  • Language
  • Pilot Studies
  • Recognition
  • Semiautomatic
  • Speech
  • Standards
  • United States
  • United States Government
  • Vocalization

Readers

  • Computational Linguistics
  • Speech Processing/Speech Recognition
  • Systems Analysis and Design