Practical Optimality Guarantees in Estimation and Learning
Abstract
With the amount and heterogeneity of data generated and collected from modern sensors (medical, web-based, physical experiments), classical statistical learning tools and theories are limited, as they generally fail to address considerations outside of statistical accuracy. Researchers in machine learning and statistics have developed a deep and wide-ranging theory of optimal estimation and learning. In modern inferential problems, however, practical use dictates additional constraints on procedures: for example, we might wish to use computationally efficient algorithms, use procedures with limited communication and memory, maintain the privacy of study participants, or guarantee that procedures can handle data that are heterogeneous, heavy-tailed, or not independent and identically distributed. These constraints impose costs on procedures, and for a theory of optimality to be practicable, we must understand these costs and tradeoffs. To that end, this proposal will develop a practical theory of optimality, showing how constraints on procedures necessarily trade against statistical and learning performance, and developing the concomitant optimal procedures. We take a three-pronged approach, building understanding of real-world resource constraints through (i) instance-specific complexity measures, (ii) better models of computation, and (iii) new measures of sample quality, especially in the context of heterogeneous data.

There are a number of directions relevant to the Office of Naval Research and Department of Defense. These fall under mathematical data science, and the research in this proposal focuses on three particular items: resource constraints in learning, large-scale optimization, and data heterogeneity and robustness. Success in this proposal will improve our understanding of (and ability to exploit) necessary tradeoffs in computation versus statistical modeling. It will also yield better large-scale optimization algorithms that are more able to adapt to problem structure and real-world computational substrates; the proposed methods (if successful) will be faster, more accurate, and use less energy for model fitting than current online and other large-scale optimization approaches. Beyond computation, many resource constraints are of interest. For example, with appropriate privacy controls, we may be able to better use the medical records of millions of service members for treatment recommendations and disease study. Another area of importance is in high-sensitivity decision-making scenarios and problems with heterogeneous data sources, where the methods we develop will trade between achieving good performance on average (the typical metric for predictor quality) and achieving uniformly good performance.
Document Details
- Document Type: DoD Grant Award
- Publication Date: May 23, 2019
- Source ID: N000141912288
Entities
People
- John Duchi
Organizations
- Office of Naval Research
- Stanford University
- United States Navy