A Spectral Algorithm for Latent Dirichlet Allocation

Abstract

Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics) rather than by just one. This increased representational power comes at the cost of a more challenging unsupervised learning problem: the topic-word distributions must be estimated when only the words are observed and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters of a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures using only trigram statistics (i.e., third-order moments, which can be estimated from documents containing just three words). The method is based on an efficiently computable orthogonal tensor decomposition of low-order moments.
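The central computational step named above, an orthogonal tensor decomposition, can be illustrated with a small sketch. The code below is a generic tensor power method with deflation, one standard way to decompose a symmetric, orthogonally decomposable third-order tensor; it is not taken from the report. The function names (`tensor_power_method`, `contract_two_modes`, `normalize`) and parameters (`n_restarts`, `n_iters`, `seed`) are hypothetical. The sketch assumes the tensor already has the form T = Σ_i w_i v_i ⊗ v_i ⊗ v_i with orthonormal v_i and positive w_i. In the LDA setting this would require first whitening the estimated moments, a step not shown here, so the example starts from a synthetic tensor rather than from trigram counts.

```python
import math
import random


def contract_two_modes(T, u):
    """Return the vector T(I, u, u): entry i is sum_{j,l} T[i][j][l] * u[j] * u[l]."""
    k = len(u)
    return [
        sum(T[i][j][l] * u[j] * u[l] for j in range(k) for l in range(k))
        for i in range(k)
    ]


def normalize(x):
    """Scale x to unit Euclidean length (assumes x is not the zero vector)."""
    n = math.sqrt(sum(a * a for a in x))
    return [a / n for a in x]


def tensor_power_method(T, n_components, n_restarts=10, n_iters=100, seed=0):
    """Hypothetical helper: recover (weight, vector) pairs from a symmetric,
    orthogonally decomposable k x k x k tensor T = sum_i w_i v_i (x) v_i (x) v_i
    using power iteration with random restarts, followed by deflation."""
    rng = random.Random(seed)
    k = len(T)
    # Work on a copy so that deflation does not modify the caller's tensor.
    T = [[row[:] for row in plane] for plane in T]
    components = []
    for _ in range(n_components):
        best_weight, best_vec = None, None
        for _ in range(n_restarts):
            u = normalize([rng.gauss(0.0, 1.0) for _ in range(k)])
            for _ in range(n_iters):
                # Power update: u <- T(I, u, u) / ||T(I, u, u)||.
                u = normalize(contract_two_modes(T, u))
            # Weight estimate for this direction: T(u, u, u).
            weight = sum(a * b for a, b in zip(contract_two_modes(T, u), u))
            if best_weight is None or weight > best_weight:
                best_weight, best_vec = weight, u
        components.append((best_weight, best_vec))
        # Deflate: subtract the recovered rank-one term w * u (x) u (x) u.
        for i in range(k):
            for j in range(k):
                for l in range(k):
                    T[i][j][l] -= best_weight * best_vec[i] * best_vec[j] * best_vec[l]
    return components


if __name__ == "__main__":
    # Synthetic tensor built from three orthonormal vectors with weights 3, 2, 1.
    s = 1 / math.sqrt(2)
    true = [(3.0, [s, s, 0.0]), (2.0, [s, -s, 0.0]), (1.0, [0.0, 0.0, 1.0])]
    T = [[[sum(w * v[i] * v[j] * v[l] for w, v in true) for l in range(3)]
          for j in range(3)] for i in range(3)]
    for weight, vec in tensor_power_method(T, n_components=3):
        print(round(weight, 6), [round(a, 6) for a in vec])
```

Random restarts guard against an unlucky starting direction, and deflation removes each recovered rank-one term so that the next round finds a different component. With an exact tensor, the recovered pairs match the synthetic weights and vectors up to floating-point error; with moments estimated from data they would only be approximate.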

Document Details

Document Type
Technical Report
Publication Date
Jul 03, 2014
Accession Number
ADA620167

Entities

People

  • Anima Anandkumar
  • Daniel Hsu
  • Dean P. Foster
  • Sham Kakade
  • Yi-Kai Liu

Organizations

  • University of California, Irvine

Tags

Communities of Interest

  • Autonomy
  • C4I
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Algorithms
  • Clustering
  • Computer Science
  • Correlation Analysis
  • Data Science
  • Decomposition
  • Electronic Mail
  • Factor Analysis
  • Hidden Markov Models
  • Information Retrieval
  • Information Science
  • Machine Learning
  • Markov Models
  • Network Science
  • Probability
  • Statistics
  • Unsupervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Adaptive Control and Estimation with Uncertainty in Dynamic Systems
  • Computational Linguistics
  • Statistical inference

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Information Retrieval
  • AI & ML - Machine Learning Algorithms