Towards Link Characterization from Content

Abstract

In processing large volumes of speech and language data, we are often interested in the distribution of languages, speakers, topics, etc. For large data sets, these distributions are typically estimated at a given point in time using pattern classification technology. Such estimates can be highly biased, especially for rare classes. While these biases have been addressed in some applications, they have thus far been ignored in the speech and language literature. This neglect causes significant error for low-frequency classes. Correcting this biased distribution involves exploiting uncertain knowledge of the classifier error patterns. The Metropolis-Hastings algorithm allows us to construct a Bayes estimator for the true class proportions. We experimentally evaluate this algorithm for a speaker recognition task.
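The estimation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-class problem, a uniform prior on the true proportion, and Beta priors on the false-alarm and miss rates derived from hypothetical held-out error counts. The function names, the random-walk proposal, the step size, and all counts are assumptions made for illustration. The posterior mean of the proportion serves as the Bayes estimator under squared-error loss.

```python
import math
import random


def log_posterior(p, fa, miss, n_pos, n_neg, fa_counts, miss_counts):
    """Unnormalised log posterior over (p, fa, miss).

    p    : true proportion of the target class (uniform prior, assumed)
    fa   : false-alarm rate, P(predicted target | true non-target)
    miss : miss rate, P(predicted non-target | true target)
    """
    if not (0.0 < p < 1.0 and 0.0 < fa < 1.0 and 0.0 < miss < 1.0):
        return -math.inf
    # Probability that the classifier labels a random item as "target".
    q = p * (1.0 - miss) + (1.0 - p) * fa
    log_lik = n_pos * math.log(q) + n_neg * math.log(1.0 - q)
    # Beta(errors + 1, correct + 1) densities on the two error rates,
    # i.e. a uniform prior updated with held-out (errors, correct) counts.
    fa_err, fa_ok = fa_counts
    miss_err, miss_ok = miss_counts
    log_prior = (fa_err * math.log(fa) + fa_ok * math.log(1.0 - fa)
                 + miss_err * math.log(miss) + miss_ok * math.log(1.0 - miss))
    return log_lik + log_prior


def posterior_mean_proportion(n_pos, n_neg, fa_counts, miss_counts,
                              n_samples=20000, burn_in=5000,
                              step=0.003, seed=0):
    """Random-walk Metropolis-Hastings estimate of E[p | data]."""
    rng = random.Random(seed)
    fa_err, fa_ok = fa_counts
    miss_err, miss_ok = miss_counts
    # Start from plug-in values so the chain begins in a plausible region.
    state = [
        (n_pos + 1) / (n_pos + n_neg + 2),
        (fa_err + 1) / (fa_err + fa_ok + 2),
        (miss_err + 1) / (miss_err + miss_ok + 2),
    ]
    current = log_posterior(*state, n_pos, n_neg, fa_counts, miss_counts)
    total = 0.0
    for i in range(burn_in + n_samples):
        # Symmetric Gaussian proposal; out-of-range values get log density
        # -inf and are therefore always rejected.
        proposal = [x + rng.gauss(0.0, step) for x in state]
        candidate = log_posterior(*proposal, n_pos, n_neg,
                                  fa_counts, miss_counts)
        # Accept with probability min(1, exp(candidate - current)).
        if math.log(1.0 - rng.random()) < candidate - current:
            state, current = proposal, candidate
        if i >= burn_in:
            total += state[0]
    return total / n_samples


# Synthetic example: counts constructed from a true proportion of 0.02,
# a false-alarm rate of 0.05 and a miss rate of 0.10 (all hypothetical).
n_pos, n_neg = 670, 9330
naive = n_pos / (n_pos + n_neg)
estimate = posterior_mean_proportion(n_pos, n_neg,
                                     fa_counts=(500, 9500),
                                     miss_counts=(100, 900))
print(f"raw classifier rate: {naive:.3f}")
print(f"posterior mean of p: {estimate:.3f}")
```

In this synthetic setup the raw classifier rate is 0.067 even though the counts were built from a true proportion of 0.02, which is the kind of bias for rare classes that the abstract describes. The sampler should pull the estimate well below the raw rate. A multi-class version would replace the scalar rates with a confusion matrix and the proportion with a vector on the simplex.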

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2008
Accession Number
ADA510822

Entities

People

  • Allen Gorin
  • John Grothendieck

Organizations

  • Rutgers University–New Brunswick

Tags

Communities of Interest

  • Autonomy
  • Biomedical

DTIC Thesaurus Topics

  • Algorithms
  • Classification
  • Data Science
  • Data Sets
  • Estimators
  • False Alarms
  • Information Processing
  • Information Science
  • Language
  • Machine Learning
  • Monte Carlo Method
  • Natural Language Processing
  • Probability
  • Random Variables
  • Signal Processing
  • Statistical Algorithms
  • Statistics

Fields of Study

  • Computer science

Readers

  • Adaptive Control and Estimation with Uncertainty in Dynamic Systems
  • Computational Linguistics
  • Regression Analysis

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Machine Translation