A modern maximum-likelihood theory for high-dimensional logistic regression

Abstract

Logistic regression is a popular model in statistics and machine learning to fit binary outcomes and assess the statistical significance of explanatory variables. Here, the classical theory of maximum-likelihood (ML) estimation is used by most software packages to produce inference. In the now common setting where the number of explanatory variables is not negligible compared with the sample size, we show that classical theory leads to inferential conclusions that cannot be trusted. We develop a theory that provides expressions for the bias and variance of the ML estimate and characterizes the asymptotic distribution of the likelihood-ratio statistic under some assumptions regarding the distribution of the explanatory variables. This theory can be used to provide valid inference.
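
The bias claim in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code. It assumes i.i.d. Gaussian covariates with variance 1/n, no intercept, a hypothetical sparse coefficient vector scaled so that Var(x'β) = 5, and a hand-rolled Newton solver; the names `sigmoid`, `fit_logistic_mle`, and `inflation_slope` are illustrative. It regresses the ML estimate on the true coefficients through the origin, so a slope near 1 suggests an approximately unbiased estimate and a slope above 1 suggests inflated coefficients.

```python
import numpy as np


def sigmoid(eta):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * eta))


def fit_logistic_mle(X, y, max_iter=50, tol=1e-8):
    """Logistic-regression MLE (no intercept) via Newton's method with step halving."""
    b = np.zeros(X.shape[1])

    def loglik(coef):
        eta = X @ coef
        return np.sum(y * eta - np.logaddexp(0.0, eta))

    ll = loglik(b)
    for _ in range(max_iter):
        mu = sigmoid(X @ b)
        grad = X.T @ (y - mu)                          # score vector
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)
        t = 1.0
        # Halve the step until the log-likelihood does not decrease.
        while loglik(b + t * step) < ll and t > 1e-4:
            t *= 0.5
        b = b + t * step
        new_ll = loglik(b)
        if new_ll - ll < tol:
            break
        ll = new_ll
    return b


def inflation_slope(n, p, gamma2=5.0, seed=0):
    """Least-squares slope (through the origin) of the MLE on the true coefficients."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=1.0 / np.sqrt(n), size=(n, p))
    # Assumed design: a quarter of the coefficients are nonzero, half of
    # those positive and half negative, scaled so Var(x' beta) = gamma2.
    k = p // 4
    amp = np.sqrt(gamma2 * n / k)
    beta = np.zeros(p)
    beta[: k // 2] = amp
    beta[k // 2 : k] = -amp
    y = (rng.random(n) < sigmoid(X @ beta)).astype(float)
    beta_hat = fit_logistic_mle(X, y)
    return float(beta_hat @ beta / (beta @ beta))


slope_low = inflation_slope(n=2000, p=16)    # p/n = 0.008
slope_high = inflation_slope(n=2000, p=400)  # p/n = 0.2
print(f"slope at p/n = 0.008: {slope_low:.2f}")
print(f"slope at p/n = 0.2:   {slope_high:.2f}")
```

Under these assumptions the slope should stay close to 1 when p/n is tiny and move clearly above 1 when p/n = 0.2, which is the kind of bias the theory is meant to quantify.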

Document Details

Document Type
Pub Defense Publication
Publication Date
Jul 01, 2019
Source ID
10.1073/pnas.1810420116

Entities

People

  • Emmanuel Candès
  • Pragya Sur

Organizations

  • National Science Foundation
  • Office of Naval Research
  • Simons Foundation
  • Stanford University
  • Stanford University School of Humanities and Sciences

Tags

Fields of Study

  • Mathematics

Readers

  • Regression analysis
  • Statistical inference

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Learning Algorithms