Big Data Covariance Estimation
Abstract
A wide range of statistical methods, commonly applied in the natural and social sciences and in engineering, require an estimate of the relationships existing between many variables, as described by the covariance matrix. These tasks include discovering patterns in unstructured data with principal components analysis, classifying observations with linear and quadratic discriminant analysis, and modeling a dependency network with probabilistic graphical models. They find application, for example, in gene arrays, fMRI, text retrieval, image classification, spectroscopy, climate studies, telemetry, finance and macro-economic analysis.

For small datasets, where the number of variables is much smaller than the sample size, the sample covariance matrix is a natural and efficient tool to estimate a covariance matrix. However, recent technological advances have brought an explosion of Big Data covariance estimation problems, where the matrix that needs to be estimated is quite large compared to the sample size. In this new setting the sample covariance matrix turns out to be inadequate, and alternative statistical methodologies have been and are still being developed.

The proposed method outperforms current algorithms according to different optimality and convergence criteria and under different assumptions on the eigenstructure of the population covariance matrix. Furthermore, the method removes the need to select the rank of the low-rank approximation a posteriori (this issue is automatically solved within the minimization problem) and at the same time provides a realistic estimate of the sparsity structure in the residual matrix.
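The inadequacy of the sample covariance matrix when the number of variables is large relative to the sample size can be illustrated with a small simulation. The sketch below is not the method proposed in this award; it is only a minimal illustration, assuming Gaussian data with an identity population covariance (so every true eigenvalue equals 1). The function names `sample_covariance` and `spectrum_summary`, and the chosen sizes, are hypothetical.

```python
import numpy as np


def sample_covariance(X):
    """Unbiased sample covariance of an (n, p) data matrix X."""
    Xc = X - X.mean(axis=0)              # center each variable
    return Xc.T @ Xc / (X.shape[0] - 1)


def spectrum_summary(n, p, seed=0):
    """Return (rank, smallest eigenvalue, largest eigenvalue) of the
    sample covariance of n draws from N(0, I_p).

    Assumption for illustration only: the population covariance is the
    identity, so all true eigenvalues are exactly 1.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    S = sample_covariance(X)
    eig = np.linalg.eigvalsh(S)          # S is symmetric, eigenvalues ascending
    rank = np.linalg.matrix_rank(S)
    return rank, eig.min(), eig.max()


# Classical setting: few variables, many observations.
small = spectrum_summary(n=2000, p=5)

# Large-dimensional setting: more variables than observations.
big = spectrum_summary(n=20, p=50)

print("n=2000, p=5 :", small)
print("n=20,   p=50:", big)
```

In the first setting the estimated eigenvalues stay close to the true value of 1. In the second, the sample covariance has rank at most n - 1, so it is singular and cannot be inverted, and its largest eigenvalues are inflated well above 1.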
Document Details
- Document Type: DoD Grant Award
- Publication Date: Feb 06, 2017
- Source ID: FA95501710103
Entities
People
- Angela Montanari
Organizations
- Air Force Office of Scientific Research
- United States Air Force