Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment

Abstract

This study focused on five complementary research tasks in audio, speech, language, and speaker recognition and processing. In speaker recognition/identification (SID), advances were made in addressing acoustic mismatch due to speaker overlap, language mismatch, channel/microphone/additive noise, speaking style (spoken vs. singing), speaker state (physical task stress), distant speech, and the environment (room reverberation). In language ID (LID), improvements were demonstrated in out-of-set language rejection, along with integrated spectral- and prosody-based LID solutions. For co-channel speech and diarization, new algorithms based on gammatone subband frequency modulation were developed. In diarization, robust speech activity detection based on a combined feature stream (Combo-SAD) was developed. New keyword spotting technology using phonological features was also developed, as were audio stream assessment methods for peak clipping and for speaker height estimation. All algorithms were evaluated on speech corpora from AFRL, CRSS-UTDallas, and publicly available sources.

Document Details

Document Type
Technical Report
Publication Date
Oct 01, 2015
Accession Number
ADA623029

Entities

People

  • John H. Hansen

Organizations

  • University of Texas at Dallas

Tags

Communities of Interest

  • C4I
  • Energy and Power Technologies
  • Ground and Sea Platforms

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automata Theory
  • Automated Speech Recognition
  • Computational Science
  • Computer Languages
  • Data Mining
  • Dimensionality Reduction
  • Feature Extraction
  • Information Retrieval
  • Information Science
  • Machine Learning
  • Natural Language Processing
  • Network Science
  • Neural Networks
  • Supervised Machine Learning

Fields of Study

  • Engineering

Readers

  • Computer Vision
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML