A modern maximum-likelihood theory for high-dimensional logistic regression
Abstract
Logistic regression is a popular model in statistics and machine learning to fit binary outcomes and assess the statistical significance of explanatory variables. Here, the classical theory of maximum-likelihood (ML) estimation is used by most software packages to produce inference. In the now common setting where the number of explanatory variables is not negligible compared with the sample size, we show that classical theory leads to inferential conclusions that cannot be trusted. We develop a theory that provides expressions for the bias and variance of the ML estimate and characterizes the asymptotic distribution of the likelihood-ratio statistic under some assumptions regarding the distribution of the explanatory variables. This theory can be used to provide valid inference.
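The abstract's central claim, that the ML estimate is biased when the number of variables is not negligible relative to the sample size, can be illustrated with a small simulation. The following is a minimal sketch and not the authors' code. The sample size, dimension, signal strength, coefficient pattern, Gaussian covariates, and the helper name `fit_logistic_mle` are all assumptions chosen for illustration. It fits an unpenalized logistic regression with p/n = 0.2 and regresses the fitted coefficients on the true ones. Classical theory would predict a slope near 1.

```python
import numpy as np


def fit_logistic_mle(X, y, max_iter=50, tol=1e-8):
    """Hypothetical helper: Newton's method with step halving, no intercept."""
    n, p = X.shape
    beta = np.zeros(p)

    def loglik(b):
        eta = X @ b
        # log(1 + exp(eta)) computed stably via logaddexp
        return np.sum(y * eta - np.logaddexp(0.0, eta))

    for _ in range(max_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        grad = X.T @ (y - mu)
        if np.max(np.abs(grad)) < tol:
            break
        weights = mu * (1.0 - mu)
        hess = X.T @ (X * weights[:, None])
        step = np.linalg.solve(hess, grad)
        # Halve the step until the log-likelihood does not decrease
        t = 1.0
        current = loglik(beta)
        while loglik(beta + t * step) < current and t > 1e-8:
            t *= 0.5
        beta = beta + t * step
    return beta


# Assumed setup: p/n = 0.2, i.i.d. standard Gaussian covariates,
# and Var(x' beta) = 5 spread over a quarter of the coefficients.
rng = np.random.default_rng(0)
n, p = 2000, 400
signal_var = 5.0
k = p // 8
magnitude = np.sqrt(signal_var / (2 * k))
beta_true = np.zeros(p)
beta_true[:k] = magnitude
beta_true[k:2 * k] = -magnitude

X = rng.standard_normal((n, p))
prob = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
y = (rng.random(n) < prob).astype(float)

beta_hat = fit_logistic_mle(X, y)

# Least-squares slope of the fitted coefficients on the true coefficients
slope = (beta_hat @ beta_true) / (beta_true @ beta_true)
print(f"slope of beta_hat on beta_true: {slope:.3f}")
```

A slope clearly above 1 in this run would be consistent with the bias the abstract describes. The size of the effect depends on the assumed ratio p/n and signal strength.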
Document Details
- Document Type: Pub Defense Publication
- Publication Date: Jul 01, 2019
- Source ID: 10.1073/pnas.1810420116
Entities
People
- Emmanuel Candès
- Pragya Sur
Organizations
- National Science Foundation
- Office of Naval Research
- Simons Foundation
- Stanford University
- Stanford University School of Humanities and Sciences