Open-Source Multi-Language Audio Database for Spoken Language Processing Applications

Abstract

This report gives a detailed summary of research work completed under Air Force Research Laboratory (AFRL) grant 53925 over the period April 12, 2010 to April 10, 2012. The work had two main aspects. The first was the collection and annotation of a large open-source database of speech passages from websites such as YouTube. 300 passages were collected in each of three languages: English, Mandarin, and Russian. Approximately 30 hours of speech were collected for each language. Each passage has been carefully transcribed at the phrasal level by human listeners. After the initial transcription, each passage was checked, and the transcription edited as needed, by at least two additional native-language listeners. The English and Mandarin passages were then force-aligned and labeled at the phonetic level using a combination of manual and automatic methods. The Russian passages have not yet been marked at the phonetic level. The second aspect was the exploration of several algorithmic methods for improving automatic speech recognition (ASR) on this intelligible but challenging database. The body of the report has four main sections, plus appendices, which introduce, describe, and summarize a portion of the work.

Document Details

Document Type: Technical Report
Publication Date: Dec 01, 2012
Accession Number: ADA571008

Entities

People

Stephen Zahorian

Organizations

Binghamton University
