The MIT Lincoln Laboratory RT-04F Diarization Systems: Applications to Broadcast Audio and Telephone Conversations

Abstract

Audio diarization is the process of annotating an input audio channel with information that attributes (possibly overlapping) temporal regions of signal energy to their specific sources. These sources can include particular speakers, music, background noise sources, and other signal source/channel characteristics. Diarization is useful for making automatic transcripts more readable and for searching and indexing audio archives. In this paper we describe the systems developed by MITLL and used in the DARPA EARS Rich Transcription Fall 2004 (RT-04F) speaker diarization evaluation. The primary system is based on a new proxy speaker model approach, and the secondary system follows a more standard BIC-based clustering approach. We present experiments analyzing the performance of the systems and describe a cross-cluster recombination approach that significantly improves performance. Finally, we present results from applying our system to a summed-channel telephone speech speaker detection task.
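The abstract names BIC-based clustering for the secondary system but does not give the criterion. The following is a minimal sketch, assuming the widely used ΔBIC merge test in which each cluster of feature frames (e.g. cepstral vectors) is modelled by a single full-covariance Gaussian. It is not necessarily the exact variant MITLL used. The helper names `delta_bic` and `bic_cluster` and the penalty weight `lam` are hypothetical illustrations. A positive ΔBIC favours keeping two clusters separate; greedy agglomerative clustering merges the lowest-scoring pair until no pair scores below zero.

```python
import numpy as np


def delta_bic(x1: np.ndarray, x2: np.ndarray, lam: float = 1.0) -> float:
    """Hypothetical ΔBIC: separate Gaussians vs. one shared Gaussian.

    x1, x2: (frames, dims) feature matrices. Positive result means the
    two-model hypothesis wins (keep separate); negative means merge.
    """
    x = np.vstack([x1, x2])
    n1, n2, n = len(x1), len(x2), len(x)
    d = x.shape[1]

    def logdet_cov(seg: np.ndarray) -> float:
        # Maximum-likelihood (biased) full covariance of the segment.
        cov = np.atleast_2d(np.cov(seg, rowvar=False, bias=True))
        sign, logdet = np.linalg.slogdet(cov)
        if sign <= 0:
            raise ValueError("covariance is singular; segment too short")
        return float(logdet)

    # Log-likelihood gain of two Gaussians over one shared Gaussian.
    gain = 0.5 * (n * logdet_cov(x) - n1 * logdet_cov(x1) - n2 * logdet_cov(x2))
    # Extra parameters of the second Gaussian: mean (d) + covariance (d(d+1)/2).
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * lam * n_params * np.log(n)
    return float(gain - penalty)


def bic_cluster(segments: list, lam: float = 1.0) -> list:
    """Greedy agglomerative clustering; returns lists of segment indices."""
    clusters = [np.asarray(s) for s in segments]
    labels = [[i] for i in range(len(segments))]
    while len(clusters) > 1:
        # Find the pair with the lowest ΔBIC (the most "same-speaker" pair).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:
            break  # stopping rule: no pair is better modelled jointly
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        labels[i] = labels[i] + labels[j]
        del clusters[j]  # j > i, so index i is unaffected
        del labels[j]
    return labels
```

The penalty weight `lam` acts as a tuning knob: larger values favour merging and yield fewer clusters. In practice it would be tuned on development data.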


Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2004
Accession Number
ADA511688

Entities

People

  • D. A. Reynolds
  • P. Torres-Carrasquillo

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Audio Files
  • Automated Speech Recognition
  • Automatic
  • Bandwidth
  • Change Detection
  • Classification
  • Clustering
  • Computer Vision
  • Detection
  • Detectors
  • False Alarms
  • Recognition
  • Standards
  • Test And Evaluation
  • Warning Systems

Readers

  • Distributed Systems and Data Platform Development
  • Radio Communications and Signal Processing
  • Speech Processing/Speech Recognition