Nonlinear primal-dual hybrid algorithms to perform variable selection with logistic regression effic

Abstract

Logistic regression is a widely used statistical model for describing the relationship between a binary response variable and the predictor variables in a data set. It is often used in machine learning to identify important predictor variables, whose number in big data sets can range from hundreds of thousands to billions. This task, known as variable selection, is frequently applied to problems in natural language processing, economics, social science, biology, and medicine, among others. Because of the size of modern big data sets, variable selection methods require efficient and robust optimization algorithms to perform well. State-of-the-art algorithms for variable selection, however, were not traditionally designed to handle big data sets. They lack scalable parallelism, scale poorly with problem size, or are prone to producing unreliable numerical results. The lack of scalability makes it challenging to apply variable selection to big data sets, while the lack of robustness may affect both the accuracy and the false discovery rate of variable selection methods. These limitations in efficiency and robustness make variable selection on big data sets essentially impossible without access to adequate, and costly, computational resources.

This problem is further exacerbated because machine learning applications to big data increasingly rely on computing power to make progress. Without efficient and robust algorithms that minimize monetary and energy costs, these limitations prevent new scientific discoveries. Indeed, progress is expected to rapidly become economically and environmentally unsustainable as computational requirements become a severe constraint. This constraint on computational requirements, in particular, has recently been identified as an important challenge to overcome in evolutionary biology and ecology for understanding the influence of climate change on species distributions.

The proposed work will develop technology for both civil and military applications. Specifically, we propose a novel optimization algorithm that overcomes the limitations of state-of-the-art algorithms for variable selection in big data sets. It relies on developing novel mathematical analysis for accelerated nonlinear primal-dual hybrid optimization algorithms. Novel theoretical results that connect logistic regression to Hamilton–Jacobi partial differential equations will be developed. This theoretical connection will be leveraged in several ways. For example, it will allow us to:

(a) automatically and numerically perform parameter selection for logistic regression-based variational models.

(b) control the false discovery rate inherent to variable selection by combining it with knockoff filters.

Finally, efficient implementations for standard CPUs and specialized hardware (e.g., FPGAs) will be developed to obtain fast numerical codes.

If successful, the proposed algorithm will allow DoD to extract relevant information from big data sets in a robust and reliable way, in order to make informed, computation-based decisions. Our efficient implementations will also allow DoD to analyze data sets quickly so that decisions can be made rapidly. (Approved for Public Release)
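The abstract does not spell out the model or the algorithm, so the following is a hedged illustration only. It assumes that "variable selection with logistic regression" means l1-regularized logistic regression:

min_x lam*||x||_1 + sum_i log(1 + exp(-y_i (A x)_i)), with labels y_i in {-1, +1}.

The sketch solves this problem with the standard (Euclidean) primal-dual hybrid gradient method of Chambolle and Pock. It is not the accelerated nonlinear variant described in the proposal. The function names `soft_threshold`, `prox_logistic_conjugate`, and `pdhg_l1_logistic` are hypothetical. The dual proximal step is computed by bisection, which is one simple choice among several.

```python
import numpy as np

# Sketch under stated assumptions: l1-penalized logistic regression solved with
# the standard Euclidean PDHG (Chambolle-Pock), NOT the proposal's nonlinear method.


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def prox_logistic_conjugate(v, y, sigma, iters=50):
    """Proximal operator of sigma * h^*, where h(z) = sum_i log(1 + exp(-y_i z_i)).

    Substituting u_i = -y_i p_i in (0, 1), optimality reduces to the scalar
    equation sigma*log((1-u)/u) - u - y_i*v_i = 0, whose left-hand side is
    strictly decreasing in u, so bisection on (0, 1) finds the unique root.
    """
    c = y * v
    lo = np.zeros_like(c)
    hi = np.ones_like(c)
    for _ in range(iters):
        u = 0.5 * (lo + hi)
        g = sigma * np.log((1.0 - u) / u) - u - c
        # g > 0 means the root lies to the right of u.
        lo = np.where(g > 0, u, lo)
        hi = np.where(g > 0, hi, u)
    u = 0.5 * (lo + hi)
    return -y * u  # map back: p_i = -y_i * u_i


def pdhg_l1_logistic(A, y, lam, iters=5000):
    """Chambolle-Pock iterations for min_x lam*||x||_1 + h(A x)."""
    m, n = A.shape
    L = np.linalg.norm(A, 2)        # spectral norm ||A||
    tau = sigma = 0.95 / L          # guarantees tau * sigma * ||A||^2 < 1
    x = np.zeros(n)
    x_bar = x.copy()
    p = np.zeros(m)
    for _ in range(iters):
        # Dual ascent step followed by the proximal map of sigma * h^*.
        p = prox_logistic_conjugate(p + sigma * (A @ x_bar), y, sigma)
        # Primal descent step followed by l1 shrinkage (this creates exact zeros).
        x_new = soft_threshold(x - tau * (A.T @ p), tau * lam)
        # Extrapolation step of the hybrid scheme.
        x_bar = 2.0 * x_new - x
        x = x_new
    return x
```

Coordinates of the returned `x` that are exactly zero correspond to deselected predictors, and a larger `lam` gives a sparser solution. The step sizes satisfy tau * sigma * ||A||^2 < 1, which is the usual convergence condition for this method.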

Document Details

Document Type
DoD Grant Award
Publication Date
Aug 05, 2022
Source ID
N000142212667

Entities

People

  • Jérôme Darbon

Organizations

  • Brown University
  • Office of Naval Research
  • United States Navy

Tags

Readers

  • Distributed Systems and Data Platform Development
  • Operations Research
  • Regression Analysis

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms