Understanding Tonal Languages
Abstract
This report gives a detailed summary of research work completed under Air Force Research Laboratory (AFRL) grant 56236 from November 17, 2010 to November 16, 2012. The main objective was to study various methods for Mandarin syllable recognition. Techniques were explored for both base syllable recognition and lexical tone recognition. The RASC863 database, obtained from the Chinese Linguistic Data Consortium, was used for experimental work. Base syllable phone recognition (60 phones) was done with a Hidden Markov Model recognizer. The best results obtained were approximately 69%. Human listeners were used to establish a baseline for lexical tone recognition. Tone recognition accuracy for humans ranges from about 55% to about 90%, depending on how much context is given to the listeners. The best tone classification accuracy with a neural network classifier is about 76%. The best tone recognition accuracy obtained with a Hidden Markov Model recognizer is about 71%. In addition to ASR experiments with Mandarin, basic research was done on improved pitch tracking and on refinement of spectral/temporal features (DCTCs/DCSCs). It was determined that much longer time intervals are preferred for dynamic feature calculations than are typically used with MFCC features. Also, the "best" segment intervals for Mandarin feature calculations are somewhat longer than for English.
Document Details
- Document Type: Technical Report
- Publication Date: Apr 01, 2013
- Accession Number: ADA584180
Entities
People
- Stephen A. Zahorian
Organizations
- Binghamton University