ON THE N-ARMED BANDIT AND OTHER PROBLEMS INVOLVING SEQUENTIAL DECISIONS.

Abstract

The N-armed bandit problem can be defined as the problem faced by a decision maker who is allowed a given number of opportunities to sample from N populations which generate random variables whose distributions are not known with certainty. If he wants to maximize his expected payoff, what should his strategy be? The paper gives a general formulation which is later specialized to the case where the N random variables have binary distributions. A numerical solution is given and some properties of the optimal strategy are discussed for both finite and infinite processes; finally, some near-optimal rules are proposed and evaluated. In the second part, the so-called action-timing problem and some variants of it are studied. Models with known probability distributions are considered first, and then this assumption is relaxed to allow for imperfectly known distributions with adaptive learning built into the policy. (Author)
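
As an illustration of the finite, binary-outcome case described in the abstract, the following is a minimal sketch of a backward-induction (dynamic programming) computation of the best achievable expected payoff. It is not taken from the report: the uniform Beta(1,1) prior on each arm's success probability, the unit payoff per success, and the function name `optimal_expected_payoff` are all assumptions made for this example.

```python
from functools import lru_cache


def optimal_expected_payoff(n_arms, horizon):
    """Expected total payoff of the Bayes-optimal policy.

    Assumptions (not from the report): each arm pays 1 with an unknown
    probability and 0 otherwise, and each probability has an independent
    uniform Beta(1,1) prior.
    """

    @lru_cache(maxsize=None)
    def value(state, remaining):
        # state is a tuple of (successes, failures) pairs, one per arm.
        if remaining == 0:
            return 0.0
        best = 0.0
        for i, (s, f) in enumerate(state):
            # Posterior mean of arm i under the assumed uniform prior.
            p = (s + 1) / (s + f + 2)
            win = state[:i] + ((s + 1, f),) + state[i + 1:]
            lose = state[:i] + ((s, f + 1),) + state[i + 1:]
            # Expected payoff of pulling arm i now, then acting optimally.
            q = p * (1.0 + value(win, remaining - 1)) \
                + (1.0 - p) * value(lose, remaining - 1)
            best = max(best, q)
        return best

    start = tuple((0, 0) for _ in range(n_arms))
    return value(start, horizon)


if __name__ == "__main__":
    print(optimal_expected_payoff(2, 2))
```

With a single arm there is nothing to learn that changes the decision, so the value is simply half the number of trials; with two or more arms the value exceeds that, which reflects the benefit of adapting the choice to observed outcomes. The state space grows quickly with the number of arms and trials, so this sketch is only practical for small cases.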

Document Details

Document Type: Technical Report
Publication Date: Oct 01, 1966
Accession Number: AD0644430

Entities

People

Ivan Obregon

Organizations

Massachusetts Institute of Technology
