Lasso-type recovery of sparse representations for high-dimensional data

Abstract

The Lasso (Tibshirani, 1996) is an attractive technique for regularization and variable selection for high-dimensional data, where the number of predictor variables p is potentially much larger than the number of samples n. However, it was recently discovered (Zhao and Yu, 2006; Zou, 2005; Meinshausen and Buehlmann, 2006) that the sparsity pattern of the Lasso estimator can only be asymptotically identical to the true sparsity pattern if the design matrix satisfies the so-called irrepresentable condition. This condition can easily be violated in applications due to the presence of highly correlated variables. Here we examine the behavior of the Lasso estimators when the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern in this setting, we show that the estimator is still consistent in the ℓ_2-norm sense for fixed designs under conditions on (a) the number s_n of non-zero components of the vector β_n and (b) the minimal singular values of the design matrices that are induced by selecting on the order of s_n variables. The results are extended to vectors β in weak ℓ_q-balls with 0 < q < 1. Our results imply that, with high probability, all important variables are selected. The set of selected variables is a useful (meaningful) reduction of the original set of variables. Finally, our results are illustrated with the detection of closely adjacent frequencies, a problem encountered in astrophysics.
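
The setting described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the report: it assumes a plain coordinate-descent solver for the Lasso objective (1/(2n))·||y − Xb||² + lam·||b||_1, and the names soft_threshold and lasso_cd, the penalty value lam, and the simulated data (including one column made highly correlated with a true predictor) are all hypothetical choices made for illustration. It checks whether the truly non-zero variables are among the selected ones and compares the ℓ_2 estimation error with the size of the true coefficient vector.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    Illustrative solver only (hypothetical helper, not the report's method).
    """
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                          # running residual y - X b
    col_sq = (X ** 2).sum(axis=0) / n     # (1/n) * ||x_j||^2 for each column
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual
            # (the residual with column j's own contribution added back).
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != b[j]:
                r -= X[:, j] * (new - b[j])
                b[j] = new
    return b


# Simulated high-dimensional data: p > n, three truly non-zero coefficients.
rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
# Assumption for illustration: the last column is highly correlated with
# the first (true) predictor, mimicking the correlated designs in the abstract.
X[:, -1] = 0.95 * X[:, 0] + np.sqrt(1 - 0.95 ** 2) * rng.standard_normal(n)

beta = np.zeros(p)
beta[:3] = 5.0
y = X @ beta + 0.5 * rng.standard_normal(n)

b_hat = lasso_cd(X, y, lam=0.2)

true_support = set(np.flatnonzero(beta).tolist())
selected = set(np.flatnonzero(b_hat).tolist())
l2_error = float(np.linalg.norm(b_hat - beta))

print("true support:", sorted(true_support))
print("number of selected variables:", len(selected))
print("true support contained in selection:", true_support <= selected)
print("l2 error:", l2_error, "vs ||beta||_2:", float(np.linalg.norm(beta)))
```

In a run like this the selected set may contain variables outside the true support, which is the kind of imperfect sparsity recovery the abstract refers to; the quantities of interest are whether the important variables are retained and how small the ℓ_2 error is.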

Document Details

Document Type
Technical Report
Publication Date
Dec 05, 2006
Accession Number
ADA472998

Entities

People

  • Bin Yu
  • Nicolai Meinshausen

Organizations

  • University of California, Berkeley

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Biological Sciences
  • Data Analysis
  • Data Science
  • Detection
  • Estimators
  • Frequency
  • Information Science
  • Information Theory
  • Mathematics
  • Monte Carlo Method
  • Notation
  • Probability
  • Recovery
  • Statistical Algorithms
  • Statistics
  • Theorems

Readers

  • Analytical Mechanics
  • Neural Network Machine Learning
  • Operations Research

Technology Areas

  • Space