Precision Lasso: accounting for correlations and linear dependencies in high-dimensional genomic data

Abstract

Association studies that seek links between genetic markers and phenotypes are central to bioinformatics. Regularized regression methods, such as variants of the Lasso, are popular for this task. Although these methods predict well in the average case, they select correlated variables unstably and linearly dependent variables inconsistently. Unfortunately, as we demonstrate empirically, correlated and linearly dependent variables are common in genomic datasets and cause classical variable selection methods to underperform.
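
The selection problem with linearly dependent variables can be seen in a small numerical sketch. This is not the Precision Lasso and not code from the paper; it only assumes the standard Lasso objective, (1/2n)·||y − Xβ||² + λ·||β||₁, and a hypothetical toy design matrix whose two columns are identical. The helper name `lasso_objective`, the synthetic data, and the value of `lam` are all illustrative assumptions.

```python
import numpy as np

def lasso_objective(X, y, beta, lam):
    """Standard Lasso objective: (1/2n) * ||y - X beta||^2 + lam * ||beta||_1."""
    n = X.shape[0]
    residual = y - X @ beta
    return 0.5 * residual @ residual / n + lam * np.abs(beta).sum()

# Hypothetical toy data: the second column is an exact copy of the first,
# so the two "markers" are linearly dependent.
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
X = np.column_stack([x, x])
y = 2.0 * x + 0.1 * rng.normal(size=n)

lam = 0.1  # illustrative penalty strength

# Three coefficient vectors that put the same total weight on the
# duplicated variable but "select" different columns.
candidates = {
    "first only": np.array([1.5, 0.0]),
    "second only": np.array([0.0, 1.5]),
    "split evenly": np.array([0.75, 0.75]),
}
objectives = {name: lasso_objective(X, y, b, lam) for name, b in candidates.items()}

for name, value in objectives.items():
    print(f"{name}: {value:.6f}")
```

All three candidates produce the same fitted values and the same L1 penalty, so the objective cannot distinguish between them. Which column ends up selected is then decided by the solver's details rather than by the data, which is one way to read the inconsistent selection described in the abstract.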

Document Details

Document Type
Pub Defense Publication
Publication Date
Sep 01, 2018
Source ID
10.1093/bioinformatics/bty750

Entities

People

  • Benjamin J Lengerich
  • Bryon Aragam
  • Eric P. Xing
  • Haohan Wang

Organizations

  • Carnegie Mellon University
  • National Institutes of Health
  • United States Department of Defense

Tags

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Molecular and genetic basis of cancer
  • Neural Network Machine Learning

Technology Areas

  • Biotechnology