A Novel Approach to Design of Bandits: Deterministic Analytic Approximations for Bandit Moments

Abstract

This proposal introduces a novel theoretical framework for analyzing and designing stochastic multi-armed bandit (MAB) algorithms through approximations of their deterministic moment dynamics. While the classic MAB literature typically focuses on bounding regret with long-time analysis, we propose a fundamentally different approach that characterizes the approximate time-evolution of bandits using dynamical systems theory. Our framework provides analytic insights into how different policies drive exploration and exploitation, enabling systematic design of improved algorithms. The research first develops a methodology that allows us to derive closed-form expressions for the evolution of a bandit s first two moments (mean and covariance). In particular, by assuming normality ofthe bandit state distribution after an initial burn-in period and leveraging the analytical properties of the error function, we can approximate key probability terms that govern the bandit s behavior. This treatment recovers known asymptotic results, such as logarithmic regret growth, while providing new geometric insights into policy design through the projection onto "northstar" vectors defined by the difference in expected rewards between arms. Our approach establishes a profound connection between MAB problems and optimal control theory, introducing a transformative paradigm for designing bandit algorithms. Rather than following the traditionaltwo-step process of proposing policies and then analyzing their performance, we can then formulate regret minimization as an optimal control problem. This allows direct optimization of policy parameters to minimize cumulative regret, leveraging established techniques from control theory such as adjoint methods and Pontryagin s Maximum Principle. The research program consists of two main thrusts: (1) Analysis and Convergence of moments in general and the normal-error function approximation - establishing rigorous theoretical foundations through error bounds, convergence rates, and asymptotic analysis, including extension to deterministic bandits likeUpper Confidence Bound algorithms; and (2) Connection to Optimal Control Theory - developing a comprehensive framework for policy optimization through control-theoretic principles, enabling systematic design of improved algorithms. The proposed work provides the Office of Naval Research with a key theoretical arbitrage opportunity in sequential decision-making. While many organizations use off-the-shelf implementations of standard MAB algorithms, our framework enables custom design of better algorithms optimized for specific operational contexts. This advantage becomes particularly significant for contextual bandits in dynamic environments, where standard approaches often perform poorly or require excessive exploration. Deliverables include new mathematical models and numerical methods implemented in open-source software, with results disseminated through top conferencesand journals in machine learning, control theory, and applied mathematics. The project will be executed over 36 months by an experienced team with complementary expertise in stochastic modeling, machine learning, and control theory from Columbia University and the Basque Center for Applied Mathematics. Our novel theoretical framework has the potential to significantly advance the state-of-the-art in sequential decision-making under uncertainty. By providing systematic design principles based on problem geometry and enabling precise tuning of exploration-exploitation tradeoffs, this research opens new avenues for developing more efficient and robust bandit algorithms. The resulting methods will have broad applicability across domains requiring adaptive decision-making, from experimental design to personalized medicine to autonomous systems.Approved for Public Release

Document Details

Document Type: DoD Grant Award
Publication Date: Apr 10, 2025
Source ID: N000142512251

Entities

People

Chris Wiggins

Organizations

Office of Naval Research
Trustees of Columbia University in the City of New York
United States Navy

A Novel Approach to Design of Bandits: Deterministic Analytic Approximations for Bandit Moments

Abstract

Document Details

Entities

People

Organizations

Tags

Fields of Study

Readers

Technology Areas