Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment

Abstract

This study has focused on five complementary research tasks in the domain of audio, speech, language, and speaker recognition and processing. In the area of speaker recognition/identification (SID), advancements have been realized to address acoustic mismatch due to speaker overlap, language mismatch, channel/microphone/additive noise, speaker style (spoken vs. singing), speaker state (physical task stress), distant speech, and environment based (room reverberation). In language ID (LID), advancements have been shown for improved out-of-set language rejection, as well as integrated spectral and prosody based LID solutions. For co-channel and diarization, new algorithms based on gammatone subband frequency modulation was achieved. In diarization, robust speech activity detection based on a combination (Combo-SAD) feature stream was developed. New keyword spotting technology using phonological features as well as audio stream assessment for peak clipping and speaker height estimation were also developed. All algorithms were evaluated on various speech corpora from AFRL, CRSS-UTDallas, and publicly available.

Open PDF

Document Details

Document Type: Technical Report
Publication Date: Oct 01, 2015
Accession Number: ADA623029

Entities

People

John H. Hansen

Organizations

University of Texas at Dallas

Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment

Abstract

Document Details

Entities

People

Organizations

Tags

Communities of Interest

DTIC Thesaurus Topics

Fields of Study

Readers

Technology Areas