Lasso-type recovery of sparse representations for high-dimensional data
Abstract
The Lasso (Tibshirani, 1996) is an attractive technique for regularization and variable selection for high-dimensional data, where the number of predictor variables p is potentially much larger than the number of samples n. However, it was recently discovered (Zhao and Yu, 2006; Zou, 2005; Meinshausen and Buehlmann, 2006) that the sparsity pattern of the Lasso estimator can only be asymptotically identical to the true sparsity pattern if the design matrix satisfies the so-called irrepresentable condition. The latter condition can easily be violated in applications due to the presence of highly correlated variables. Here we examine the behavior of the Lasso estimators if the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern, we show that the estimator is still consistent in the ℓ_2-norm sense for fixed designs under conditions on (a) the number s_n of non-zero components of the vector β_n and (b) the minimal singular values of the design matrices that are induced by selecting of order s_n variables. The results are extended to vectors β in weak ℓ_q-balls with 0 < q < 1. Our results imply that, with high probability, all important variables are selected. The set of selected variables is a useful (meaningful) reduction on the original set of variables. Finally, our results are illustrated with the detection of closely adjacent frequencies, a problem encountered in astrophysics.
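For readers unfamiliar with the estimator discussed in the abstract, the following is a minimal sketch of a Lasso fit by cyclic coordinate descent, using only the Python standard library. It is not the report's own procedure: the function names (`soft_threshold`, `lasso_cd`, `selected_variables`), the solver choice, and the objective scaling 0.5 * ||y - Xb||_2^2 + lam * ||b||_1 are assumptions made for illustration, and the report may use a different normalization of the penalty.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - X b||_2^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows, each a list of p floats; y is a list of n floats.
    The scaling of the objective is an illustrative assumption.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # equals y - X b, since b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Inner product of column j with the residual that excludes b[j].
            rho = sum(X[i][j] * residual[i] for i in range(n)) + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def selected_variables(b):
    """Indices of the non-zero coefficients, i.e. the estimated sparsity pattern."""
    return [j for j, bj in enumerate(b) if bj != 0.0]
```

Calling `selected_variables(lasso_cd(X, y, lam))` gives the set of selected variables for one penalty level. The abstract's claim concerns this set: with highly correlated columns it need not equal the true sparsity pattern, but under the stated conditions it contains all important variables with high probability.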
Document Details
- Document Type: Technical Report
- Publication Date: Dec 05, 2006
- Accession Number: ADA472998
Entities
People
- Bin Yu
- Nicolai Meinshausen
Organizations
- University of California, Berkeley