Speaker Clustering for a Mixture of Singing and Reading (Preprint)

Abstract

In this study, we propose a speaker clustering algorithm based on reading and singing speech samples for each speaker. As a speaking style, singing introduces changes in the time-frequency structure of a speaker's voice. The purpose of this study is to make speech systems, such as speech indexing and retrieval, more robust to intrinsic variations in speech production. Clustering is performed within a GMM mean supervector space. The proposed method has two stages: first, initial clusters are obtained using traditional clustering techniques such as k-means and hierarchical clustering. Next, each cluster is refined in a PLDA subspace, resulting in a more speaker-dependent representation that is less sensitive to speaking style. The proposed algorithm improves the average clustering accuracy of the k-means baseline by 9.3% absolute.
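
The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: scikit-learn has no PLDA, so ordinary LDA fitted on the initial cluster labels is swapped in as a stand-in for the PLDA subspace, and the "supervectors" are synthetic random vectors rather than real GMM mean supervectors. The function name `two_stage_cluster`, the parameter `n_refine`, and the toy data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def two_stage_cluster(supervectors, n_speakers, n_refine=3, seed=0):
    """Hypothetical two-stage clustering: k-means, then subspace refinement."""
    # Stage 1: initial clusters from plain k-means in the supervector space.
    labels = KMeans(n_clusters=n_speakers, n_init=10,
                    random_state=seed).fit_predict(supervectors)

    # Stage 2 (stand-in): project into an LDA subspace learned from the
    # current cluster labels, then re-cluster there. The report uses PLDA;
    # LDA is an assumption made so the sketch runs with scikit-learn alone.
    for _ in range(n_refine):
        lda = LinearDiscriminantAnalysis()
        projected = lda.fit_transform(supervectors, labels)
        new_labels = KMeans(n_clusters=n_speakers, n_init=10,
                            random_state=seed).fit_predict(projected)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels


# Synthetic stand-in data: 4 speakers, 5 "reading" and 5 "singing" vectors
# each. Singing is modeled as a shared style offset added to the speaker mean.
rng = np.random.default_rng(0)
n_speakers, dim = 4, 20
speaker_means = rng.normal(scale=5.0, size=(n_speakers, dim))
style_shift = rng.normal(scale=1.0, size=dim)

rows = []
for mean in speaker_means:
    rows.append(mean + rng.normal(size=(5, dim)))                # reading
    rows.append(mean + style_shift + rng.normal(size=(5, dim)))  # singing
X = np.vstack(rows)

labels = two_stage_cluster(X, n_speakers)
```

Here `labels` holds one cluster index per input vector. With real data, each row of `X` would be a GMM mean supervector extracted from one reading or singing recording, and the refinement stage would be replaced by a trained PLDA model.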

Document Details

Document Type
Technical Report
Publication Date
Mar 01, 2012
Accession Number
ADA568317

Entities

People

  • John H. Hansen
  • Mahnoosh Mehrabani

Organizations

  • University of Texas at Dallas

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Accuracy
  • Air Force
  • Air Force Research Laboratories
  • Algorithms
  • Classification
  • Clustering
  • Computer Science
  • Contracts
  • Data Sets
  • Data Storage Systems
  • Databases
  • Discriminant Analysis
  • Information Retrieval
  • Language
  • Production
  • Recognition
  • Test Sets

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning
  • Speech Processing/Speech Recognition

Technology Areas

  • Space