Learning to Control Renewal Processes with Bandit Feedback

Abstract

We consider a bandit problem with K task types from which the controller activates one task at a time. Each task takes a random and possibly heavy-tailed completion time, and a reward is obtained only after the task is completed. The task types are independent from each other, and have distinct and unknown distributions for completion time and reward. For a given time horizon τ, the goal of the controller is to schedule tasks adaptively so as to maximize the reward collected until τ expires. In addition, we allow the controller to interrupt a task and initiate a new one. In addition to the traditional exploration-exploitation dilemma, this interrupt mechanism introduces a new one: should the controller complete the task and get the reward, or interrupt the task for a possibly shorter and more rewarding alternative? We show that for all heavy-tailed and some light-tailed completion time distributions, this interruption mechanism improves the reward linearly over time. Applications of this model include server scheduling, optimal free sampling strategies in advertising and adaptive content selection. From a learning perspective, the interrupt mechanism necessitates learning the whole arm distribution from truncated observations. For this purpose, we propose a robust learning algorithm named UCB-BwI based on median-of-means estimator for possibly heavy-tailed reward and completion time distributions. We show that, in a K-armed bandit setting with an arbitrary set of L possible interrupt times, UCB-BwI achieves O(Kłog(τ)+KL) regret. We also prove that the regret under any admissible policy is Ømega(Kłog(τ)), which implies that UCB-BwI is order optimal.

Document Details

Document Type
Pub Defense Publication
Publication Date
Jun 19, 2019
Source ID
10.1145/3341617.3326158

Entities

People

  • Atilla Eryilmaz
  • R. Srikant
  • Semih Cayci

Organizations

  • Defense Threat Reduction Agency
  • National Science Foundation
  • Ohio State University
  • University of Illinois Urbana–Champaign

Tags

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.
  • Mathematical Modeling and Probability Theory.
  • Statistical inference.