Morphology-Based Language Modeling for Arabic Speech Recognition

Abstract

Language modeling is a difficult problem for languages with rich morphology. In this paper we investigate the use of morphology-based language models at different stages in a speech recognition system for conversational Arabic. Class-based and single-stream factored language models using morphological word representations are applied within an N-best list rescoring framework. In addition, we explore the use of factored language models in first-pass recognition, which is facilitated by two novel procedures: the data-driven optimization of a multi-stream language model structure, and the conversion of a factored language model to a standard word-based model. We evaluate these techniques on a large-vocabulary recognition task and demonstrate that they lead to perplexity and word error rate reductions.

Document Details

Document Type
Technical Report
Publication Date
Oct 08, 2004
Accession Number
AD1007967

Entities

People

  • Andreas Stolcke
  • Dimitra Vergyri
  • Katrin Kirchhoff
  • Kevin Duh

Organizations

  • SRI International

Tags

DTIC Thesaurus Topics

  • Adaptive Training
  • Automated Speech Recognition
  • Automatic
  • Counting Methods
  • Genetic Algorithms
  • Grammars
  • Hidden Markov Models
  • Hypotheses
  • Language
  • Markov Models
  • Models
  • Probability
  • Probability Distributions
  • Recognition
  • Standards
  • Test Sets
  • Training

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks