Multi-Armed Bandits with Delayed and Aggregated Rewards

Abstract

We study the canonical multi-armed bandit problem under delayed feedback. Recently proposed algorithms have desirable regret bounds in the delayed-feedback setting but require strict prior knowledge of expected delays. In this work, we study the regret of such delay-resilient algorithms under milder assumptions on delay distributions. We experimentally investigate known theoretical performance bounds and attempt to improve on a recently proposed algorithm by making looser assumptions on prior delay knowledge. Further, we investigate the relationship between delay assumptions and marking an arm as suboptimal.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Aug 01, 2019
Accession Number
AD1078688

Entities

People

  • Chirag Gupta
  • Conor Igoe
  • Jacob Tyo
  • Jonathon Byrd
  • Ojash Neopane

Tags

Communities of Interest

  • Biomedical
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Buildings And Structures
  • Environment
  • Equations
  • Feedback
  • Mathematics
  • Military Research
  • Monitoring
  • Normal Distribution
  • Observation
  • Probability
  • Research Facilities

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.
  • Applied Combinatorial Optimization and Logic Circuit Design.
  • Regression Analysis.