ON THE N-ARMED BANDIT AND OTHER PROBLEMS INVOLVING SEQUENTIAL DECISIONS.

Abstract

The N-armed bandit problem can be defined as the problem faced by a decision maker who is allowed a given number of opportunities to sample from N populations that generate random variables whose distributions are not known with certainty. If he wants to maximize his expected payoff, what should his strategy be? The paper gives a general formulation, which is later specialized to the case where the N random variables have binary distributions. A numerical solution is given, and some properties of the optimal strategy are discussed for both finite and infinite processes; finally, some near-optimal rules are proposed and evaluated. In a second part, the so-called action-timing problem and some variants of it are studied. Models with known probability distributions are considered first; this assumption is then relaxed to allow for imperfectly known distributions, with adaptive learning built into the policy. (Author)
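To make the finite-horizon binary case concrete, the following is a minimal sketch of one standard way to compute an optimal sequential strategy numerically: backward induction (dynamic programming) over posterior states. It is not the report's own method. It assumes, purely for illustration, that each arm's unknown success probability has an independent uniform Beta(1,1) prior, that each success pays 1 and each failure pays 0, and that payoffs are undiscounted. The names `optimal_value`, `pull_value`, and `best_arm` are hypothetical. A state records the (successes, failures) observed on each arm.

```python
from fractions import Fraction
from functools import lru_cache
from typing import Tuple

# Hypothetical state encoding: one (successes, failures) pair per arm.
State = Tuple[Tuple[int, int], ...]


@lru_cache(maxsize=None)
def optimal_value(state: State, remaining: int) -> Fraction:
    """Maximum expected number of successes in `remaining` further pulls.

    Assumes independent uniform Beta(1,1) priors, payoff 1 per success,
    and no discounting (illustrative assumptions, not from the report).
    """
    if remaining == 0:
        return Fraction(0)
    return max(pull_value(state, remaining, i) for i in range(len(state)))


def pull_value(state: State, remaining: int, i: int) -> Fraction:
    """Expected total payoff of pulling arm i now and acting optimally afterwards."""
    s, f = state[i]
    # Posterior mean of arm i's success probability under a Beta(1,1) prior.
    p = Fraction(s + 1, s + f + 2)
    after_success = state[:i] + ((s + 1, f),) + state[i + 1:]
    after_failure = state[:i] + ((s, f + 1),) + state[i + 1:]
    return (p * (1 + optimal_value(after_success, remaining - 1))
            + (1 - p) * optimal_value(after_failure, remaining - 1))


def best_arm(state: State, remaining: int) -> int:
    """Index of an optimal arm to pull next (ties go to the lowest index)."""
    return max(range(len(state)),
               key=lambda i: pull_value(state, remaining, i))


if __name__ == "__main__":
    start = ((0, 0), (0, 0))           # two arms, nothing observed yet
    print(optimal_value(start, 2))     # exact value as a Fraction
    print(best_arm(((0, 1), (0, 0)), 1))
```

With two fresh arms and two pulls, the optimal rule pulls either arm first, stays with it after a success, and switches after a failure, for an expected payoff of 13/12. `Fraction` keeps the values exact; the number of reachable states grows rapidly with N and the horizon, so exhaustive recursion of this kind is practical only for small problems, which is one motivation for the near-optimal rules the abstract mentions.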

Document Details

Document Type: Technical Report
Publication Date: Oct 01, 1966
Accession Number: AD0644430

Entities

People

  • Ivan Obregon

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Learning
  • Mathematics
  • Probability
  • Probability Distributions
  • Random Variables

Fields of Study

  • Mathematics
