Open-Source Multi-Language Audio Database for Spoken Language Processing Applications

Abstract

This report gives a detailed summary of research work completed under Air Force Research Laboratory (AFRL) grant 53925 from April 12, 2010, to April 10, 2012. The work had two main aspects. The first was the collection and annotation of a large open-source database of speech passages from websites such as YouTube. Three hundred passages were collected in each of three languages: English, Mandarin, and Russian. Approximately 30 hours of speech were collected for each language. Each passage has been carefully transcribed at the phrasal level by human listeners: each was first transcribed, and the transcription was then checked, and edited as needed, by at least two additional native-language listeners. The English and Mandarin passages were then force-aligned and labeled at the phonetic level using a combination of manual and automatic methods. The Russian passages have not yet been labeled at the phonetic level. The second aspect of the work was the exploration of several algorithmic methods for improving automatic speech recognition (ASR) on this intelligible but challenging database. The body of the report has four main sections, plus appendices, that introduce, describe, and summarize a portion of the work.

Document Details

Document Type
Technical Report
Publication Date
Dec 01, 2012
Accession Number
ADA571008

Entities

People

  • Stephen Zahorian

Organizations

  • Binghamton University

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Air Force
  • Air Force Research Laboratories
  • Automated Speech Recognition
  • Coding
  • Computational Science
  • Computer Science
  • Databases
  • Decoding
  • Feature Extraction
  • Graphical User Interface
  • Hidden Markov Models
  • Language
  • Markov Models
  • Military Research
  • Neural Networks
  • Recognition
  • Websites

Readers

  • Distributed Systems and Data Platform Development
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation