Precision Lasso: accounting for correlations and linear dependencies in high-dimensional genomic data
Abstract
Association studies to discover links between genetic markers and phenotypes are central to bioinformatics. Regularized regression methods, such as variants of the Lasso, are popular for this task. Although these methods predict well in the average case, they select correlated variables unstably and linearly dependent variables inconsistently. Unfortunately, as we demonstrate empirically, correlated and linearly dependent variables often arise in genomic datasets and cause classical variable selection methods to underperform.
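A minimal sketch of why linearly dependent variables make Lasso selection ambiguous, which is the problem the abstract describes. It is not the Precision Lasso method itself. The toy genotype values, the penalty `lam`, and the helper `lasso_objective` are all hypothetical and chosen only for illustration. When two marker columns are identical, the Lasso objective takes the same value whether the weight sits on the first marker, the second, or is split between them, so the objective alone cannot determine which marker to select.

```python
def lasso_objective(X, y, w, lam):
    """Lasso objective: (1 / 2n) * ||y - Xw||^2 + lam * ||w||_1."""
    n = len(y)
    residual_sq = 0.0
    for row, target in zip(X, y):
        pred = sum(x * wj for x, wj in zip(row, w))
        residual_sq += (target - pred) ** 2
    return residual_sq / (2 * n) + lam * sum(abs(wj) for wj in w)


# Hypothetical toy data: two markers with identical genotype columns,
# i.e. an exact linear dependency between the two variables.
genotypes = [0.0, 1.0, 2.0, 1.0, 0.0, 2.0]
X = [[g, g] for g in genotypes]
y = [1.5 * g for g in genotypes]
lam = 0.1  # illustrative penalty strength, not taken from the paper

# Three weight vectors with the same total weight on the duplicated marker.
candidates = {
    "all weight on marker 1": [1.0, 0.0],
    "all weight on marker 2": [0.0, 1.0],
    "weight split evenly": [0.5, 0.5],
}
objectives = {name: lasso_objective(X, y, w, lam) for name, w in candidates.items()}

for name, value in objectives.items():
    print(f"{name}: {value:.6f}")
```

All three candidates print the same objective value, so a solver may return any of them depending on initialization or update order.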
Document Details
- Document Type: Pub Defense Publication
- Publication Date: Sep 01, 2018
- Source ID: 10.1093/bioinformatics/bty750
Entities
People
- Benjamin J. Lengerich
- Bryon Aragam
- Eric P. Xing
- Haohan Wang
Organizations
- Carnegie Mellon University
- National Institutes of Health
- United States Department of Defense