Control and Learning of Uncertain Dynamical Systems: Optimization, Sampling, and Regret
Abstract
This report shows that first-order methods can provide an effective bridge between optimal control theory and sample-based reinforcement learning. The work focuses on the linear quadratic regulator (LQR) problem and Markov decision processes. Results include a proof that gradient descent started from a stabilizing policy converges to the globally optimal policy, and an algorithm that achieves nearly tight regret bounds for the control of a linear dynamical system with adversarial disturbances.
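To illustrate the convergence claim in the abstract, here is a minimal sketch of gradient descent over the feedback gain of a scalar linear quadratic regulator. It is not taken from the report: the scalar system, the numerical values (`a`, `b`, `q`, `r`, the starting gain `2.0`), the backtracking step rule, and all function names are assumptions chosen for illustration. The cost and gradient use the standard closed-form expressions for the discrete-time infinite-horizon problem with exact (model-based) gradients, and the result is compared against the gain obtained from the scalar Riccati equation.

```python
import math


def lqr_cost_and_grad(k, a, b, q, r, s0=1.0):
    """Cost and gradient for the policy u_t = -k * x_t.

    Dynamics: x_{t+1} = a * x_t + b * u_t, with E[x_0^2] = s0.
    Assumes the policy is stabilizing, i.e. |a - b * k| < 1.
    """
    m = a - b * k                          # closed-loop multiplier
    p = (q + r * k * k) / (1.0 - m * m)    # solves p = q + r*k^2 + m^2 * p
    sigma = s0 / (1.0 - m * m)             # sum over t of E[x_t^2]
    cost = p * s0
    grad = 2.0 * ((r + b * b * p) * k - a * b * p) * sigma
    return cost, grad


def is_stabilizing(k, a, b):
    return abs(a - b * k) < 1.0


def policy_gradient_descent(k, a, b, q, r, iters=1000, step=1e-2):
    """Gradient descent on the gain k, started from a stabilizing k.

    Illustrative step rule: halve the step until the new gain is
    stabilizing and does not increase the cost.
    """
    costs = [lqr_cost_and_grad(k, a, b, q, r)[0]]
    for _ in range(iters):
        cost, grad = lqr_cost_and_grad(k, a, b, q, r)
        eta = step
        while True:
            k_new = k - eta * grad
            if is_stabilizing(k_new, a, b):
                new_cost = lqr_cost_and_grad(k_new, a, b, q, r)[0]
                if new_cost <= cost:
                    break
            eta *= 0.5
        k = k_new
        costs.append(new_cost)
    return k, costs


def riccati_gain(a, b, q, r):
    """Optimal gain from the scalar discrete-time Riccati equation.

    p is the positive root of b^2 p^2 + (r - q b^2 - a^2 r) p - q r = 0.
    """
    c1 = r - q * b * b - a * a * r
    p = (-c1 + math.sqrt(c1 * c1 + 4.0 * b * b * q * r)) / (2.0 * b * b)
    return a * b * p / (r + b * b * p), p


# Hypothetical example values: an unstable open-loop system (a > 1).
a, b, q, r = 1.2, 1.0, 1.0, 1.0
k_gd, costs = policy_gradient_descent(2.0, a, b, q, r)  # k = 2.0 is stabilizing
k_star, p_star = riccati_gain(a, b, q, r)

print("gain from gradient descent:", k_gd)
print("gain from Riccati equation:", k_star)
print("initial cost:", costs[0], "final cost:", costs[-1], "optimal cost:", p_star)
```

The sketch keeps every iterate stabilizing, which is what makes the cost finite and the gradient well defined along the way. The sample-based setting discussed in the abstract would replace the exact gradient with an estimate built from rollouts, which this example does not attempt.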
Document Details
- Document Type: Technical Report
- Publication Date: Nov 01, 2019
- Accession Number: AD1093314
Entities
People
- Maryam Fazel
- Mehran Mesbahi
- Sham Kakade
Organizations
- University of Washington