A Scalable Distributed Syntactic, Semantic, and Lexical Language Model

Abstract

This paper presents an attempt at building a large-scale distributed composite language model, formed by seamlessly integrating an n-gram model, a structured language model, and probabilistic latent semantic analysis under a directed Markov random field paradigm, to simultaneously account for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content. The composite language model is trained with a convergent N-best list approximate EM algorithm, followed by a further EM algorithm to improve word prediction power, on corpora of up to a billion tokens stored on a supercomputer. The large-scale distributed composite language model gives a drastic perplexity reduction over n-grams and, when applied to re-ranking the N-best list from a state-of-the-art parsing-based machine translation system, achieves significantly better translation quality as measured by the BLEU score and by the readability of the translations.
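
The two evaluation settings named in the abstract, perplexity and N-best list re-ranking, can be illustrated with a minimal sketch. This is not the report's implementation: the function names, the `lm_weight` parameter, and the toy scoring table are assumptions made for illustration, and a simple lookup stands in for the composite language model.

```python
import math
from typing import Callable, List, Tuple


def perplexity(token_log_probs: List[float]) -> float:
    """Perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def rerank(
    nbest: List[Tuple[str, float]],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Sort (hypothesis, decoder_score) pairs by decoder score plus a
    weighted language-model log-probability, best first.

    lm_weight is a hypothetical tuning parameter, not taken from the report.
    """
    return sorted(
        nbest,
        key=lambda hyp: hyp[1] + lm_weight * lm_logprob(hyp[0]),
        reverse=True,
    )


# Toy stand-in for a language model: a fixed table of sentence log-probabilities.
TOY_SCORES = {"the cat sat": -1.0, "cat the sat": -5.0}


def toy_lm(sentence: str) -> float:
    return TOY_SCORES[sentence]


# The decoder slightly prefers the disfluent hypothesis; the LM term reverses that.
nbest_list = [("cat the sat", -1.0), ("the cat sat", -1.2)]
reranked = rerank(nbest_list, toy_lm)
```

In this sketch a lower perplexity means the model assigns higher probability to the observed tokens, and re-ranking only reorders hypotheses the translation system already produced.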

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2012
Accession Number
ADA563918

Entities

People

  • Lei Zheng
  • Ming Tan
  • Shaojun Wang
  • Wenli Zhou

Organizations

  • Wright State University

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automated Speech Recognition
  • Bayesian Networks
  • Computational Linguistics
  • Computational Science
  • Generative Models
  • Information Science
  • Language
  • Linguistics
  • Machine Learning
  • Markov Models
  • Natural Language Processing
  • Natural Languages
  • Probabilistic Models
  • Probability
  • Probability Distributions

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Neural Network Machine Learning
  • Psychometric Testing or Psychological Assessment

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Machine Translation