Novel Approaches to Arabic Speech Recognition: Report from the 2002 Johns Hopkins Summer Workshop

Abstract

Although Arabic is currently one of the most widely spoken languages in the world, there has been relatively little speech recognition research on Arabic compared to other languages. Moreover, most previous work has concentrated on the recognition of formal rather than dialectal Arabic. This paper reports on our project at the 2002 Johns Hopkins Summer Workshop, which focused on the recognition of dialectal Arabic. Three problems were addressed: (a) the lack of short vowels and other pronunciation information in Arabic texts; (b) the morphological complexity of Arabic; and (c) the discrepancies between dialectal and formal Arabic. We present novel approaches to automatic vowel restoration, morphology-based language modeling, and the integration of out-of-corpus language model data, and report significant word error rate improvements on the LDC Arabic Call Home task.

Document Details

Document Type
Technical Report
Publication Date
Aug 01, 2002
Accession Number
AD1002466

Entities

People

  • Daben Liu
  • Dimitra Vergyri
  • Feng He
  • Gang Ji
  • Jeff Bilmes
  • John Henderson
  • Katrin Kirchhoff
  • Melissa Egan
  • Mohamed Noamany
  • Nicolae Duta
  • Pat Schone
  • Richard Schwartz
  • Sourin Das

Organizations

  • United States Department of Defense

Tags

DTIC Thesaurus Topics

  • Automated Speech Recognition
  • Automatic
  • Consonants
  • Errors
  • Formal Languages
  • Language
  • Linguistics
  • Morphology (Linguistics)
  • Phonemes
  • Probability
  • Recognition
  • Sequences
  • Standards
  • Statistical Analysis
  • Test And Evaluation
  • Test Sets
  • Workshops

Fields of Study

  • Computer science

Readers

  • Research Science/Academic Research
  • Speech Processing/Speech Recognition
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation