Towards Link Characterization from Content
Abstract
In processing large volumes of speech and language data, we are often interested in the distribution of languages, speakers, topics, etc. For large data sets, these distributions are typically estimated at a given point in time using pattern classification technology. Such estimates can be highly biased, especially for rare classes. While these biases have been addressed in some applications, they have thus far been ignored in the speech and language literature. This neglect causes significant error for low-frequency classes. Correcting this biased distribution involves exploiting uncertain knowledge of the classifier error patterns. The Metropolis-Hastings algorithm allows us to construct a Bayes estimator for the true class proportions. We experimentally evaluate this algorithm for a speaker recognition task.
Document Details
- Document Type
- Technical Report
- Publication Date
- Jan 01, 2008
- Accession Number
- ADA510822
Entities
People
- Allen Gorin
- John Grothendieck
Organizations
- Rutgers University–New Brunswick