Multi-Armed Bandits with Delayed and Aggregated Rewards
Abstract
We study the canonical multi-armed bandit problem under delayed feedback. Recently proposed algorithms have desirable regret bounds in the delayed-feedback setting but require strict prior knowledge of expected delays. In this work, we study the regret of such delay-resilient algorithms under milder assumptions on delay distributions. We experimentally investigate known theoretical performance bounds and attempt to improve on a recently proposed algorithm by making looser assumptions on prior delay knowledge. Further, we investigate the relationship between delay assumptions and marking an arm as suboptimal.
Document Details
- Document Type
- Technical Report
- Publication Date
- Aug 01, 2019
- Accession Number
- AD1078688
Entities
People
- Chirag Gupta
- Conor Igoe
- Jacob Tyo
- Jonathon Byrd
- Ojash Neopane