When are Overcomplete Representations Identifiable? Uniqueness of Tensor Decompositions Under Expansion Constraints
Abstract
Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime. While general overcomplete admixtures are not identifiable, we establish generic identifiability under a constraint, referred to as topic persistence. Our sufficient conditions for identifiability involve a novel set of expansion conditions on the population structure (i.e. the topic-word matrix) of the persistent topic model. Specifically, we require the existence of a perfect matching from hidden variables to higher order observed variables, and can thus, incorporate overcomplete models. In particular, we establish that random models are identifiable w.h.p. in the overcomplete regime. Moreover, our analysis allows for general (nondegenerate) distributions for modeling the topic proportions. Our proof techniques incorporate a novel class of tensor decompositions which fall in between the well-known candecomp/parafac (CP) and Tucker decompositions, and provide novel conditions for unique tensor decomposition.
Document Details
- Document Type
- Technical Report
- Publication Date
- Jun 16, 2013
- Accession Number
- ADA604842
Entities
People
- Anima Anandkumar
- Daniel Hsu
- Majid Janzamin
- Sham Kakade
Organizations
- University of California, Irvine