Speech Recognition, Articulatory Feature Detection, and Speech Synthesis in Multiple Languages
Abstract
This document summarizes work completed by General Dynamics under work unit 71840871, Speech Interfaces for Multinational Collaboration, for the period August 2004 to February 2009 under contract FA8650-04-C-6443. The speech technologies developed during this period include speech recognizers, Articulatory Feature (AF) detectors, and speech synthesizers. Speech recognition systems were developed for 15 different languages, and three methods were investigated for improving the performance of the systems: vocal tract length normalization, speaker adaptive training, and recognizer output voting error reduction. English AF detectors were developed using Gaussian mixture models, two-class Multi-Layer Perceptrons (MLPs), fusion MLPs, and multi-class MLPs. The outputs of the AF detectors were used to form the feature set for a speech recognizer. Speech synthesis systems were created for 13 different languages, and the following system modifications were investigated: expanding the label set to include additional contextual factors, changing the minimum description length control factor, and applying speaker clustering and adaptation to create new voices. In addition, two graphical user interfaces were developed for training new voices and synthesizing speech in real time.
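As an illustration of the voting idea behind recognizer output voting error reduction, the following is a minimal sketch and not the report's implementation. It assumes the hypotheses from several recognizers have already been aligned slot by slot, whereas the full technique first builds that alignment by dynamic programming. The names `vote` and `NULL` and the sample hypotheses are hypothetical.

```python
from collections import Counter

# Hypothetical marker for "no word" in an aligned slot (an assumption of this sketch).
NULL = "@"


def vote(aligned_hyps):
    """Pick the most frequent word in each slot of pre-aligned hypotheses.

    Each hypothesis is a list of words of identical length, with NULL
    filling slots where a recognizer produced no word. Ties go to the
    word that appears first among the hypotheses.
    """
    if not aligned_hyps:
        return []
    length = len(aligned_hyps[0])
    if any(len(h) != length for h in aligned_hyps):
        raise ValueError("hypotheses must be aligned to the same length")

    result = []
    for slot in zip(*aligned_hyps):
        word, _count = Counter(slot).most_common(1)[0]
        if word != NULL:  # a winning NULL means "emit nothing" for this slot
            result.append(word)
    return result


# Made-up outputs from three recognizers, already aligned.
hyps = [
    ["the", "cat", "sat", NULL],
    ["the", "cat", "sat", "down"],
    ["a", "cat", "sat", "down"],
]
print(vote(hyps))
```

This sketch uses word frequency only; weighting votes by recognizer confidence scores is a common refinement that is omitted here.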
Document Details
- Document Type: Technical Report
- Publication Date: Nov 01, 2009
- Accession Number: ADA519140
Entities
People
- Brian M. Ore
Organizations
- General Dynamics