Learning To Maximize Welfare with a Reusable Resource

Abstract

Considerable work has focused on optimal stopping problems where random IID offers arrive sequentially for a single available resource controlled by the decision-maker. After viewing the realization of the offer, the decision-maker irrevocably rejects it, or accepts it, collecting the reward and ending the game. We consider an important extension of this model to a dynamic setting where the resource is "renewable" (a rental, a work assignment, or a temporary position) and can be allocated again after a delay period d. In the case where the reward distribution is known a priori, we design an (asymptotically optimal) 1/2-competitive Prophet Inequality, namely, a policy that collects in expectation at least half of the expected reward collected by a prophet who a priori knows all the realizations. This policy has a particularly simple characterization as a thresholding rule which depends on the reward distribution and the blocking period d, and arises naturally from an LP-relaxation of the prophet's optimal solution. Moreover, it provides the key to extending the result to the case of unknown distributions; here, we construct a dynamic threshold rule using the reward samples collected when the resource is not blocked. We provide a regret guarantee for our algorithm against the best policy in hindsight, and prove a complementing minimax lower bound on the best achievable regret, establishing that our policy achieves, up to poly-logarithmic factors, the best possible regret in this setting.
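To make the setting concrete, the following is a minimal sketch of a fixed-threshold policy on a reusable resource, compared against a prophet who sees the whole reward sequence. It is an illustration only, not the authors' algorithm: the function names are hypothetical, the threshold is an arbitrary input rather than the LP-derived threshold the abstract refers to, and the code assumes that accepting an offer at round t makes the resource available again at round t + d.

```python
from typing import List, Sequence


def threshold_policy_reward(rewards: Sequence[float], threshold: float, delay: int) -> float:
    """Total reward of a fixed-threshold rule (hypothetical helper).

    Assumption: accepting at round t blocks the resource until round t + delay.
    """
    total = 0.0
    next_free = 0  # first round at which the resource is available
    for t, reward in enumerate(rewards):
        if t >= next_free and reward >= threshold:
            total += reward
            next_free = t + delay
    return total


def prophet_reward(rewards: Sequence[float], delay: int) -> float:
    """Best achievable total when all realizations are known in advance.

    Backward dynamic program: at round t either skip the offer, or accept it
    and resume at round t + delay (same blocking assumption as above).
    """
    n = len(rewards)
    best: List[float] = [0.0] * (n + delay + 1)
    for t in range(n - 1, -1, -1):
        best[t] = max(best[t + 1], rewards[t] + best[t + delay])
    return best[0]


if __name__ == "__main__":
    offers = [3.0, 5.0, 1.0, 4.0, 2.0]
    print(threshold_policy_reward(offers, threshold=3.0, delay=2))
    print(prophet_reward(offers, delay=2))
```

On a single fixed sequence the comparison is only illustrative; the 1/2-competitive guarantee described in the abstract is a statement about expectations over the reward distribution.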

Document Details

Document Type: Pub Defense Publication
Publication Date: May 26, 2022
Source ID: 10.1145/3530893

Entities

People

Constantine Caramanis
Matthew Faw
Orestis Papadigenopoulos
Sanjay Shakkottai

Organizations

National Science Foundation
Office of Naval Research
University of Texas at Austin
