A Computational Auditory Scene Analysis System for Speech Segregation and Robust Speech Recognition

Abstract

A conventional automatic speech recognizer does not perform well in the presence of multiple sound sources, while human listeners are able to segregate and recognize a signal of interest through auditory scene analysis. We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, the ideal binary time-frequency (T-F) mask, which retains the mixture in a local T-F unit if and only if the target is stronger than the interference within the unit. In the first stage, we use harmonicity to segregate the voiced portions of individual sources in each time frame based on multipitch tracking. Additionally, unvoiced portions are segmented based on an onset/offset analysis. In the second stage, speaker characteristics are used to group the T-F units across time frames. The resulting masks are used in an uncertainty decoding framework for automatic speech recognition. We evaluate our system on a speech separation challenge and show that our system yields substantial improvement over the baseline performance.
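
The ideal binary mask described in the abstract can be illustrated with a minimal sketch. This is not the report's implementation: the function names `ideal_binary_mask` and `apply_mask`, the toy energy values, and the assumption that mixture energy is the sum of target and interference energies are all hypothetical and chosen only for illustration. Computing the ideal mask requires the premixed target and interference, which is why the system described above has to estimate it from the mixture.

```python
def ideal_binary_mask(target_energy, interference_energy):
    """Return a 0/1 mask over T-F units.

    A unit is 1 if and only if the target energy is strictly greater than
    the interference energy in that unit. Rows are time frames, columns
    are frequency channels (an assumed layout).
    """
    return [
        [1 if t > i else 0 for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(target_energy, interference_energy)
    ]


def apply_mask(mixture, mask):
    """Retain the mixture where the mask is 1 and zero it elsewhere."""
    return [
        [x * m for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(mixture, mask)
    ]


# Toy data: 2 time frames x 3 frequency channels (made-up energies).
target = [[4.0, 1.0, 0.5], [2.0, 2.0, 3.0]]
interference = [[1.0, 3.0, 0.5], [0.5, 2.5, 1.0]]

# Assumption: energies add in the mixture (sources treated as uncorrelated).
mixture = [
    [t + i for t, i in zip(t_row, i_row)]
    for t_row, i_row in zip(target, interference)
]

mask = ideal_binary_mask(target, interference)
masked = apply_mask(mixture, mask)

print(mask)
print(masked)
```

Units where the two energies are equal are discarded here, because the abstract's definition retains a unit only when the target is stronger than the interference.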

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2007
Accession Number
AD1001212

Entities

People

  • DeLiang Wang
  • Soundararajan Srinivasan
  • Yang Shao
  • Zhaozhang Jin

Organizations

  • Ohio State University

Tags

Communities of Interest

  • Energy and Power Technologies
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Automated Speech Recognition
  • Cognitive Science
  • Cognitive Systems Engineering
  • Computational Science
  • Computer Science
  • Computer Vision
  • Decoding
  • Engineering
  • Hidden Markov Models
  • Identification
  • Information Processing
  • Information Systems
  • Language
  • Probability
  • Recognition

Fields of Study

  • Engineering

Readers

  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference