Mathematical Modelling for the Evaluation of Automated Speech Recognition Systems--Research Area 3.3.1 (c)
Abstract
Automated speech recognition (ASR) systems are now more often found as components inside other applications than as stand-alone applications for transcribing speech word-for-word into text. Statistical pattern recognition techniques allow us to derive a better task-specific evaluation measure for embedded applications than word error rate (WER), which is used for transcription. Our approach considered two applications of ASR: a decision support software system for meetings, in which a summary of a meeting is audited to record all of the decisions that were taken during the meeting, and a specific entity identification task, in which an intelligence analyst identifies triples of "who," "where" and "when" for each event described in transcribed broadcast news. Both of these resemble typical activities of intelligence analysts in open-source intelligence (OSINT) processing and production applications. We assessed two task evaluation measures. The first fixes the input and learns to predict human-subject performance as the transcript for the input varies in accuracy. This measure is well suited to developers of ASR systems who wish to measure the effects of modifications they make to their software during development. The second measure does not hold the input fixed and does not require new human-subject data to be collected for new input.
Document Details
- Document Type: Technical Report
- Publication Date: Jan 07, 2016
- Accession Number: AD1008218
Entities
People
- Gerald Penn
Organizations
- University of Toronto