Learning Evaluation Functions for Global Optimization

Abstract

In complex sequential decision problems such as scheduling factory production, planning medical treatments, and playing backgammon, optimal decision policies are in general unknown, and it is often difficult, even for human domain experts, to manually encode good decision policies in software. The reinforcement learning methodology of value function approximation (VFA) offers an alternative: systems can learn effective decision policies autonomously, simply by simulating the task and keeping statistics on which decisions lead to good ultimate performance and which do not. This thesis advances the state of the art in VFA in two ways. First, it presents three new VFA algorithms, which apply to three different restricted classes of sequential decision problems: Grow Support for deterministic problems, ROUT for acyclic stochastic problems, and Least Squares TD(lambda) for fixed policy prediction problems. Each is designed to gain robustness and efficiency over current approaches by exploiting the restricted problem structure to which it applies. Second, it introduces STAGE, a new search algorithm for general combinatorial optimization tasks. STAGE learns a problem specific heuristic evaluation function as it searches. The heuristic is trained by supervised linear regression or Least Squares TD(lambda) to predict, from features of states along the search trajectory, how well a fast local search method such as hill climbing will perform starting from each state. Search proceeds by alternating between two stages: performing the fast search to gather new training data, and following the learned heuristic to identify a promising new start state. STAGE has produced good results (in some cases, the best results known) on a variety of combinatorial optimization domains, including VLSI channel routing, Bayes net structure finding, bin packing, Boolean satisfiability, radiotherapy treatment planning, and geographic cartogram design.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Aug 01, 1998
Accession Number
ADA356034

Entities

People

  • Justin Andrew Boyan

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Autonomy
  • C4I
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Bayesian Networks
  • Computational Science
  • Data Mining
  • Information Processing
  • Information Science
  • Information Systems
  • Machine Learning
  • Mathematical Filters
  • Network Science
  • Neural Networks
  • Operations Research
  • Probabilistic Models
  • Robotics
  • Three Dimensional
  • Two Dimensional

Fields of Study

  • Computer science

Readers

  • Operations Research

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Neural Networks