Using Strong Convergence to Accelerate Value Iteration.
Abstract
Convergence of the relative value function (total value function less total value of a base state) and the optimal policy in length of the horizon for value iteration (markov decision programming) have been recently shown to be geometric with factor alpha beta. Here alpha is the discount factor, and beta < or = 1.0. The case beta < 1.0 is termed 'strong convergence'. It is suggested in this paper that bounds on the convergence rate be estimated computationally during the value iteration process, giving bounds directly on the extrapolated infinite horizon relative value function. Such an extrapolation has two purposes. First, large numbers of iterations in value iteration can be skipped by continuing computation directly with the estimated infinite horizon relative value function (this is directly analogous to quadratic acceleration procedures in non-linear programming). Second, existing procedures for the elimination of non-optimal actions are greatly strengthened, since actions can be eliminated permanently once bounds on the infinite horizon relative value function become sufficiently tight.
Document Details
- Document Type
- Technical Report
- Publication Date
- Mar 01, 1976
- Accession Number
- ADA025604
Entities
People
- Thomas E. Morton
Organizations
- Carnegie Mellon University