When are Overcomplete Representations Identifiable? Uniqueness of Tensor Decompositions Under Expansion Constraints

Abstract

Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime. While general overcomplete admixtures are not identifiable, we establish generic identifiability under a constraint, referred to as topic persistence. Our sufficient conditions for identifiability involve a novel set of expansion conditions on the population structure (i.e. the topic-word matrix) of the persistent topic model. Specifically, we require the existence of a perfect matching from hidden variables to higher order observed variables, and can thus, incorporate overcomplete models. In particular, we establish that random models are identifiable w.h.p. in the overcomplete regime. Moreover, our analysis allows for general (nondegenerate) distributions for modeling the topic proportions. Our proof techniques incorporate a novel class of tensor decompositions which fall in between the well-known candecomp/parafac (CP) and Tucker decompositions, and provide novel conditions for unique tensor decomposition.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Jun 16, 2013
Accession Number
ADA604842

Entities

People

  • Anima Anandkumar
  • Daniel Hsu
  • Majid Janzamin
  • Sham Kakade

Organizations

  • University of California, Irvine

Tags

Communities of Interest

  • C4I
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Algebraic Geometry
  • Algorithms
  • Artificial Intelligence
  • Artificial Intelligence Software
  • Coding
  • Computer Science
  • Dictionaries
  • Generative Models
  • Hidden Markov Models
  • Machine Learning
  • Markov Models
  • Models
  • Notation
  • Probabilistic Models
  • Probability
  • Probability Distributions
  • Random Variables

Fields of Study

  • Mathematics

Readers

  • Data Mining and Knowledge Discovery.
  • Operations Research
  • Statistical inference.