A Hierarchical Network of Provably Optimal Learning Control Systems: Extensions of the Associative Control Process (ACP) Network
Abstract
An associative control process (ACP) network is a learning control system that can reproduce a variety of animal learning results from classical and instrumental conditioning experiments (Klopf, Morgan, and Weaver, 1993; see also the article, 'A Hierarchical Network of Control Systems that Learn'). The ACP networks proposed and tested by Klopf, Morgan, and Weaver are not guaranteed, however, to learn optimal policies for maximizing reinforcement. Optimal behavior is guaranteed for a reinforcement learning system such as Q- learning (Watkins, 1989), but simple Q-learning is incapable of reproducing the animal learning results that ACP networks reproduce. We propose two new models that reproduce the animal learning results and are provably optimal. The first model, the modified ACP network, embodies the smallest number of changes necessary to the ACP network to guarantee that optimal policies will be learned while still reproducing the animal learning results. The second model, the single-layer ACP network, embodies the smallest number of changes necessary to Q-learning to guarantee that is reproduced the animal learning results while still learning optimal policies
Document Details
- Document Type
- Technical Report
- Publication Date
- Jan 01, 1993
- Accession Number
- ADA268288
Entities
People
- A. H. Klopf
- Leemon C. Baird Iii
Organizations
- Wright Laboratory