EXTENSIONS OF THE TWO-ARMED BANDIT AND RELATED PROCESSES WITH ON-LINE EXPERIMENTATION.

Abstract

Sequential decision problems are considered in which immediate payoffs are random with unknown means. Using prior knowledge, each decision is based on the prior expected payoff and the future value of information associated with the observed payoff. A fundamental theory is developed and used to extend the two-armed bandit model to multiple arms and setup costs or bonuses, assuming only two states of nature. Conjugate prior densities lead to a 'stay-on-the-winner' rule for bounded variables. A least-squares policy iteration method is developed for computation. Bounds on the optimal return function are derived for general stochastic dynamic programming problems. (Author)

Document Details

Document Type: Technical Report
Publication Date: Nov 15, 1965
Accession Number: AD0623884

Entities

People

Kent Quisel

Organizations

Stanford University
