Learning Evaluation Functions for Global Optimization

Abstract

In complex sequential decision problems such as scheduling factory production, planning medical treatments, and playing backgammon, optimal decision policies are in general unknown, and it is often difficult, even for human domain experts, to manually encode good decision policies in software. The reinforcement learning methodology of value function approximation (VFA) offers an alternative: systems can learn effective decision policies autonomously, simply by simulating the task and keeping statistics on which decisions lead to good ultimate performance and which do not. This thesis advances the state of the art in VFA in two ways. First, it presents three new VFA algorithms, which apply to three different restricted classes of sequential decision problems: Grow Support for deterministic problems, ROUT for acyclic stochastic problems, and Least Squares TD(lambda) for fixed policy prediction problems. Each is designed to gain robustness and efficiency over current approaches by exploiting the restricted problem structure to which it applies. Second, it introduces STAGE, a new search algorithm for general combinatorial optimization tasks. STAGE learns a problem specific heuristic evaluation function as it searches. The heuristic is trained by supervised linear regression or Least Squares TD(lambda) to predict, from features of states along the search trajectory, how well a fast local search method such as hill climbing will perform starting from each state. Search proceeds by alternating between two stages: performing the fast search to gather new training data, and following the learned heuristic to identify a promising new start state. STAGE has produced good results (in some cases, the best results known) on a variety of combinatorial optimization domains, including VLSI channel routing, Bayes net structure finding, bin packing, Boolean satisfiability, radiotherapy treatment planning, and geographic cartogram design.

Open PDF

Document Details

Document Type: Technical Report
Publication Date: Aug 01, 1998
Accession Number: ADA356034

Entities

People

Justin Andrew Boyan

Organizations

Carnegie Mellon University

Learning Evaluation Functions for Global Optimization

Abstract

Document Details

Entities

People

Organizations

Tags

Communities of Interest

DTIC Thesaurus Topics

Fields of Study

Readers

Technology Areas