Machine Recognition vs Human Recognition of Voices
Abstract
While automated speaker recognition by machines can be quite good, as demonstrated in the NIST Speaker Recognition Evaluations, performance can still suffer when environmental conditions, emotions, or recording quality change. This research examines how robust humans are compared with machines at speaker recognition in changing environments. Several data conditions, including short sentences, frequency-selective noise, and time-reversed speech, were used to test the robustness of human listeners versus machine algorithms. Statistical significance tests were completed on the results, and for most conditions human speaker recognition was more robust. The strength of the human listeners was especially evident in the challenging case of noise in the 2000-3000 Hz frequency range. Additional analysis was performed to identify factors that may affect a listener's ability to identify a speaker. For example, the amount of voiced (or unvoiced) speech was examined to see whether it correlated with how easily a speaker's voice was recognized; unfortunately, it did not correlate strongly. Other factors, such as fundamental frequency (pitch), formant locations, jitter (cycle-to-cycle pitch perturbation), shimmer (cycle-to-cycle amplitude perturbation), and other modulation measures, are also being examined. The original goal of this effort was to discover which frequency bands are most important for the familiar speaker recognition task; this research was only a cursory look at what frequency information is important for speaker identification. More listening experiments, with better randomization of stimuli and greater attention to phonetic content, are required.
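The report does not say how its degraded stimuli were produced. The following is a minimal sketch of two of the named conditions, assuming a mono signal held as a list of float samples. It builds noise confined to the 2000-3000 Hz band by summing random-phase cosines placed on the DFT bin grid, scales that noise to a chosen signal-to-noise ratio, and reverses sample order for the time-reversed condition. The function names, the `seed` and `snr_db` parameters, and the multitone noise construction are all hypothetical illustrations. The authors' actual noise type, level, and tooling are not stated.

```python
import math
import random


def band_limited_noise(duration_s, sample_rate, f_lo=2000.0, f_hi=3000.0, seed=0):
    """Hypothetical helper: random-phase multitone noise confined to [f_lo, f_hi] Hz."""
    rng = random.Random(seed)
    n = int(round(duration_s * sample_rate))
    df = sample_rate / n  # DFT bin spacing, so every tone completes whole cycles in n samples
    k_lo = math.ceil(f_lo / df)
    k_hi = math.floor(f_hi / df)
    # One cosine per DFT bin inside the band, each with an independent random phase.
    tones = [(2.0 * math.pi * k * df, rng.uniform(0.0, 2.0 * math.pi))
             for k in range(k_lo, k_hi + 1)]
    return [sum(math.cos(w * t / sample_rate + ph) for w, ph in tones)
            for t in range(n)]


def rms(x):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def add_noise_at_snr(speech, noise, snr_db):
    """Hypothetical helper: scale noise so 20*log10(rms(speech)/rms(scaled noise)) == snr_db, then add.

    zip() truncates the result to the shorter of the two signals.
    """
    scale = rms(speech) / (rms(noise) * 10.0 ** (snr_db / 20.0))
    return [s + scale * v for s, v in zip(speech, noise)]


def time_reverse(x):
    """Time-reversed stimulus: same DFT magnitude spectrum, phonetic content played backwards."""
    return x[::-1]
```

As a usage example, with `speech` sampled at 16 kHz, `add_noise_at_snr(speech, band_limited_noise(len(speech) / 16000, 16000), snr_db=0.0)` yields a stimulus with noise in the 2000-3000 Hz band, and `time_reverse(speech)` yields the reversed condition. The multitone construction needs only the standard library. Because each tone completes whole cycles in the window, the DFT over that window has no energy outside the band apart from rounding error. Band-pass filtering white noise, for example with a Butterworth design, would be an equally reasonable choice.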
Document Details
- Document Type: Technical Report
- Publication Date: May 01, 2012
- Accession Number: ADA568903
Entities
People
- Ronald L. Mitchell
- Stanley J. Wenndt
Organizations
- Air Force Research Laboratory