Multi-Armed Bandits with Delayed and Aggregated Rewards

Abstract

We study the canonical multi-armed bandit problem under delayed feedback. Recently proposed algorithms have desirable regret bounds in the delayed-feedback setting but require strict prior knowledge of expected delays. In this work, we study the regret of such delay-resilient algorithms under milder assumptions on delay distributions. We experimentally investigate known theoretical performance bounds and attempt to improve on a recently proposed algorithm by making looser assumptions on prior delay knowledge. Further, we investigate the relationship between delay assumptions and marking an arm as suboptimal.
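To make the delayed-feedback setting concrete, the following is a minimal simulation sketch, not the report's algorithm. It assumes Bernoulli arms, delays drawn uniformly from 0 to max_delay rounds, and a standard UCB1 index computed only from rewards that have already arrived. The function name run_delayed_ucb and all of its parameters are hypothetical and chosen for illustration.

```python
import math
import random


def run_delayed_ucb(means, horizon, max_delay, seed=0):
    """UCB1 on Bernoulli arms where each reward arrives after a random delay.

    Illustrative assumptions only: uniform integer delays in [0, max_delay]
    and a plain UCB1 index; none of this is taken from the report.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms   # rewards observed so far, per arm
    sums = [0.0] * n_arms   # sum of observed rewards, per arm
    pulls = [0] * n_arms    # pulls made so far, per arm
    pending = {}            # arrival round -> list of (arm, reward)

    for t in range(1, horizon + 1):
        # Deliver every reward scheduled to arrive at the start of round t.
        for arm, reward in pending.pop(t, []):
            counts[arm] += 1
            sums[arm] += reward

        # Arms with no observed feedback are tried first (least-pulled first);
        # otherwise pick the arm with the largest UCB1 index.
        unobserved = [a for a in range(n_arms) if counts[a] == 0]
        if unobserved:
            arm = min(unobserved, key=lambda a: pulls[a])
        else:
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )

        pulls[arm] += 1
        reward = 1.0 if rng.random() < means[arm] else 0.0
        delay = rng.randint(0, max_delay)
        # A reward generated in round t becomes visible in round t + 1 + delay.
        pending.setdefault(t + 1 + delay, []).append((arm, reward))

    # Pseudo-regret: expected reward lost relative to always pulling the best arm.
    best = max(means)
    regret = sum((best - means[a]) * pulls[a] for a in range(n_arms))
    return pulls, regret
```

Calling run_delayed_ucb([0.2, 0.8], horizon=2000, max_delay=10) returns the per-arm pull counts and the pseudo-regret. Rerunning with a larger max_delay gives a rough feel for how delay affects how quickly the learner stops pulling the suboptimal arm.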

Document Details

Document Type: Technical Report
Publication Date: Aug 01, 2019
Accession Number: AD1078688

Entities

People

Chirag Gupta
Conor Igoe
Jacob Tyo
Jonathon Byrd
Ojash Neopane
