A modern maximum-likelihood theory for high-dimensional logistic regression

Abstract

Logistic regression is a popular model in statistics and machine learning to fit binary outcomes and assess the statistical significance of explanatory variables. Here, the classical theory of maximum-likelihood (ML) estimation is used by most software packages to produce inference. In the now common setting where the number of explanatory variables is not negligible compared with the sample size, we show that classical theory leads to inferential conclusions that cannot be trusted. We develop a theory that provides expressions for the bias and variance of the ML estimate and characterizes the asymptotic distribution of the likelihood-ratio statistic under some assumptions regarding the distribution of the explanatory variables. This theory can be used to provide valid inference.

Document Details

Document Type: Pub Defense Publication
Publication Date: Jul 01, 2019
Source ID: 10.1073/pnas.1810420116

Entities

People

Emmanuel Candès
Pragya Sur

Organizations

National Science Foundation
Office of Naval Research
Simons Foundation
Stanford University
Stanford University School of Humanities and Sciences
