Online Learning of Markovian Systems with Censored Poisson Arrivals
Abstract
This thesis deals with online optimization of discrete performance measures in Markovian models with incomplete information. We consider a setting where a physical realization of the model is obtained sequentially over a number of periods. The information gathered to date is used to run the model efficiently in future periods. The information is incomplete in two ways: (i) model parameters (the demand rates in our case) are initially unknown but can be estimated from the physical realizations; and (ii) demands are censored when the system is in certain boundary states. The method of Sample Average Approximation is used to solve the optimization problem. More precisely, in each period, sample paths are generated from the distributions estimated to date, and the best model configuration is determined with respect to these sample paths. Sequential observation of the system's behavior allows information to be gathered and a more informed decision to be made in each future round. The method developed in this thesis can be applied in a variety of contexts where nothing is known about the system beforehand but its behavior can be observed, at least partially, in a sequential manner, such as assigning assets for surveillance of remote geographical regions for illicit activity. The motivating setting of this work is the operation of a bike-sharing system with fixed-capacity stations, where an initial number of bikes must be set each day to minimize the number of unsatisfied customers.
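The per-period step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a single station, time-homogeneous Poisson rental and return streams, and a fixed daily horizon, and all function and parameter names (`estimate_rate`, `simulate_path`, `unsatisfied`, `saa_best_start`) are hypothetical. The rate estimate divides the observed count by the time during which arrivals could actually be observed, which is the standard maximum-likelihood estimate for a Poisson rate under this kind of censoring.

```python
import random


def estimate_rate(observed_count, uncensored_time):
    """Poisson rate estimate that ignores time spent in censoring states.

    Assumption: arrivals are only observable while the station is not in
    a boundary state, so exposure is the uncensored time only.
    """
    if uncensored_time <= 0:
        raise ValueError("need positive uncensored observation time")
    return observed_count / uncensored_time


def simulate_path(rent_rate, return_rate, horizon, rng):
    """One sample path: time-ordered events, -1 = rental, +1 = return."""
    events = []
    total = rent_rate + return_rate
    if total <= 0:
        return events
    t = rng.expovariate(total)
    while t <= horizon:
        # Superposed Poisson streams: each event is a rental w.p. rent_rate/total.
        events.append(-1 if rng.random() < rent_rate / total else 1)
        t += rng.expovariate(total)
    return events


def unsatisfied(events, start_bikes, capacity):
    """Count customers turned away (empty on rental, full on return)."""
    bikes, lost = start_bikes, 0
    for e in events:
        nxt = bikes + e
        if 0 <= nxt <= capacity:
            bikes = nxt
        else:
            lost += 1
    return lost


def saa_best_start(rent_rate, return_rate, capacity, horizon, n_paths, seed=0):
    """Sample Average Approximation over initial bike counts 0..capacity.

    The same sample paths are reused for every candidate, and the
    candidate with the smallest average number of lost customers wins.
    """
    rng = random.Random(seed)
    paths = [simulate_path(rent_rate, return_rate, horizon, rng)
             for _ in range(n_paths)]
    costs = {x: sum(unsatisfied(p, x, capacity) for p in paths) / n_paths
             for x in range(capacity + 1)}
    best = min(costs, key=costs.get)
    return best, costs


if __name__ == "__main__":
    # Hypothetical observations: 12 rentals and 8 returns seen in 4.0 uncensored hours.
    lam = estimate_rate(12, 4.0)
    mu = estimate_rate(8, 4.0)
    best, costs = saa_best_start(lam, mu, capacity=10, horizon=8.0, n_paths=200)
    print("chosen initial bikes:", best)
```

In an online setting, the rates would be re-estimated after each observed day and `saa_best_start` rerun with the updated estimates; reusing one set of sample paths across all candidates keeps the comparison between candidates consistent.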
Document Details
- Document Type: Technical Report
- Publication Date: Jun 01, 2021
- Accession Number: AD1150967
Entities
People
- Cedric G. Gibbons Mac-lean
Organizations
- Naval Postgraduate School