Morphology-Based Language Modeling for Arabic Speech Recognition
Abstract
Language modeling is a difficult problem for languages with rich morphology. In this paper we investigate the use of morphology-based language models at different stages in a speech recognition system for conversational Arabic. Class-based and single-stream factored language models using morphological word representations are applied within an N-best list rescoring framework. In addition, we explore the use of factored language models in first-pass recognition, which is facilitated by two novel procedures: the data-driven optimization of a multi-stream language model structure, and the conversion of a factored language model to a standard word-based model. We evaluate these techniques on a large-vocabulary recognition task and demonstrate that they lead to perplexity and word error rate reductions.
Document Details
- Document Type: Technical Report
- Publication Date: Oct 08, 2004
- Accession Number: AD1007967
Entities
People
- Andreas Stolcke
- Dimitra Vergyri
- Katrin Kirchhoff
- Kevin Duh
Organizations
- SRI International