Lasso-type recovery of sparse representations for high-dimensional data

Abstract

The Lasso (Tibshirani, 1996) is an attractive technique for regularization and variable selection for high-dimensional data, where the number of predictor variables p is potentially much larger than the number of samples n. However, it was recently discovered (Zhao and Yu, 2006; Zou, 2005; Meinshausen and Buehlmann, 2006) that the sparsity pattern of the Lasso estimator can only be asymptotically identical to the true sparsity pattern if the design matrix satisfies the so-called irrepresentable condition. This condition is easily violated in applications because of the presence of highly correlated variables. Here we examine the behavior of the Lasso estimator when the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern, we show that the estimator is still consistent in the ℓ2-norm sense for fixed designs under conditions on (a) the number s_n of non-zero components of the vector β_n and (b) the minimal singular values of the design matrices that are induced by selecting on the order of s_n variables. The results are extended to vectors β in weak ℓq-balls with 0 < q < 1. Our results imply that, with high probability, all important variables are selected. The set of selected variables is a useful (meaningful) reduction of the original set of variables. Finally, our results are illustrated with the detection of closely adjacent frequencies, a problem encountered in astrophysics.
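The irrepresentable condition can be checked numerically for a given design. The following is a minimal sketch in plain Python, assuming the form of the condition used by Zhao and Yu (2006): for the true support S and the signs of β on S, the quantity max over j outside S of |X_jᵀ X_S (X_Sᵀ X_S)⁻¹ sign(β_S)| should stay below 1. The helper names (`solve`, `irrepresentable_value`) and the toy design matrix are hypothetical illustrations, not taken from the report.

```python
# Minimal, dependency-free sketch (hypothetical helper names) of the
# irrepresentable condition in the form used by Zhao and Yu (2006):
#   max_{j not in S} | X_j^T X_S (X_S^T X_S)^{-1} sign(beta_S) |  < 1,
# where S is the true support of beta. Roughly, the Lasso can only
# recover S asymptotically if this quantity stays below 1.


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def column(X, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in X]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def irrepresentable_value(X, support, signs):
    """max over j outside `support` of |X_j^T X_S (X_S^T X_S)^{-1} signs|."""
    cols = [column(X, j) for j in support]
    gram = [[dot(a, b) for b in cols] for a in cols]  # X_S^T X_S
    w = solve(gram, signs)  # (X_S^T X_S)^{-1} sign(beta_S)
    values = []
    for j in range(len(X[0])):
        if j in support:
            continue
        xj = column(X, j)
        values.append(abs(sum(dot(xj, c) * wk for c, wk in zip(cols, w))))
    return max(values)


# Hypothetical toy design (n = 4, p = 3): columns 0 and 1 are orthogonal,
# and column 2 is a highly correlated "noise" variable equal to 0.6 * (x0 + x1).
X = [
    [1.0, 1.0, 1.2],
    [1.0, -1.0, 0.0],
    [-1.0, 1.0, 0.0],
    [-1.0, -1.0, -1.2],
]

# True support S = {0, 1}; the verdict depends on the sign pattern of beta_S.
print(round(irrepresentable_value(X, [0, 1], [1.0, 1.0]), 6))   # 1.2 (violated)
print(round(irrepresentable_value(X, [0, 1], [1.0, -1.0]), 6))  # 0.0 (satisfied)
```

In this toy design, a single variable correlated with both true predictors violates the condition when the two true coefficients share a sign, but not when their signs differ, so the condition depends on the sign pattern of β as well as on the correlations in the design.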


Document Details

Document Type: Technical Report
Publication Date: Dec 05, 2006
Accession Number: ADA472998

Entities

People

Bin Yu
Nicolai Meinshausen

Organizations

University of California, Berkeley