A Transcription Scheme for Languages Employing the Arabic Script Motivated by Speech Processing Application

Abstract

This paper offers a transcription system for Persian, the target language in the Transonics project, a speech-to-speech translation system developed as part of the DARPA Babylon program (The DARPA Babylon Program; Narayanan, 2003). We discuss the transcription systems needed for automated spoken language processing applications in Persian, which uses the Arabic script for writing. The system can easily be modified for Arabic, Dari, Urdu and any other language that uses the Arabic script. The proposed system has two components. One is a phoneme-based transcription of sounds for acoustic modelling in automatic speech recognizers and for text-to-speech synthesizers, using ASCII-based symbols rather than International Phonetic Alphabet symbols. The other is a hybrid system that provides a minimally ambiguous lexical representation that explicitly includes vocalic information; such a representation is needed for language modelling, text-to-speech synthesis and machine translation.
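The need for explicit vocalic information can be illustrated with a minimal sketch. Persian orthography normally leaves short vowels unwritten, so one written form can correspond to several spoken words. The ASCII symbols, the function names and the vowel-slot layout below are assumptions made for illustration only; they are not the symbol table or the representation defined in the report.

```python
# Illustrative only: the ASCII symbols below are assumed, not taken from the report.
KAF, REH, MEEM = "\u06a9", "\u0631", "\u0645"  # Persian letters kaf, re, mim

# Hypothetical letter-to-ASCII table covering just the three letters used here.
LETTER_TO_ASCII = {KAF: "k", REH: "r", MEEM: "m"}


def consonant_skeleton(word):
    """Map each written letter to its (assumed) ASCII consonant symbol."""
    return [LETTER_TO_ASCII[ch] for ch in word]


def vocalize(skeleton, vowels):
    """Place one vowel slot after each consonant; "" means no vowel in that slot."""
    if len(vowels) != len(skeleton):
        raise ValueError("need exactly one vowel slot per consonant")
    return "".join(c + v for c, v in zip(skeleton, vowels))


written = KAF + REH + MEEM  # a single written form without short vowels
skeleton = consonant_skeleton(written)

# Two readings of the same written form, distinguished only by the vowels.
readings = {
    "worm": vocalize(skeleton, ["e", "", ""]),
    "generosity": vocalize(skeleton, ["a", "a", ""]),
}
print(readings)
```

The written form alone yields only the skeleton `k r m`; a representation that carries the vowels keeps the two readings apart, which is the kind of ambiguity reduction the abstract attributes to the hybrid component.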

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2004
Accession Number
ADA458354

Entities

People

  • Panayiotis G. Georgiou
  • Shadi Ganjavi
  • Shrikanth Narayanan

Organizations

  • University of California, Los Angeles

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Alphabets
  • Ambiguity
  • Automated Speech Recognition
  • Consonants
  • Decoding
  • Electrical Engineering
  • English Language
  • Grammars
  • Graphical User Interface
  • Hybrid Systems
  • Language
  • Linguistics
  • Machine Translation
  • Persian Language
  • Speech
  • Speech Analysis
  • Translations

Fields of Study

  • Computer science
  • Engineering

Readers

  • Computational Linguistics
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation