Finding a Needle in a Haystack: Utilizing Structures and Predictive Information in Online Optimization
Abstract
Approved for Public ReleaseModern technologies have enabled the acquisition of a large amount of data at an unprecedented rate. Thes e data can be extremely helpful in decision-making processes in time-varying environments, where data can be used to adapt to change s, making more efficient decisions. However, adapting can be challenging in time-varying combinatorial environments where the decisi on-making process involves choosing from exponentially many a ck problems for portfolio optimization, job scheduling, influence maximization in viral marketing, and product ranking on online pla tforms. This research investigates how to facilitate online decision-making processes in time-varying combinatorial environments. It is demanding to strike a balance between exploitation (choosing an action with the highest potential given current data) and explor ation (choosing another action to learn more about the environment).We develop new online learning algorithms that allow decision-ma kers to trade off exploitation against exploration, removing the necessity of experimenting with many options. To do so, we propose to utilize the structures and possible predictive information regarding the underlying combinatorial environments to design fast-lea rning algorithms. We examine to what extent the structures of these environments can be utilized by imitating (approximate) algorith ms designed for offline decision-making. In offline settings, decision-makers have the full knowledge of reward/cost function, and t hey aim to find a near-optimal action in polynomial time with the help of an offline (approximate) algorithm. The offline algorithm, itation during online decision-making. This research investigates if imitating offline algorithms can lead to efficient learning alg orithms for a largeclass of combinatorial decision-making processes. We focus on those that admit either myopic or dynamic programmi ng-based (DP) algorithms for their offline settings (e.g. influence maximization problem for viral marketing, knapsack and job-sched uling problems).When the offline setting admits a myopic algorithm, the near-optimal offline solution is built through a sequence of steps, making the locally optimal choice in each step. To imitate the offline myopic algorithms in the online decision-making proce sses, we explore if every step of the myopic algorithm can be handled with a reasonable error, despite uncertainty. When the offline setting admits a DP-based algorithm, the DP formulation can be represented with a directed acyclic graph (DAG), where every node in the graph represents a potential state in the DP. In online processes, we need to ensure that every node performs its assigned loca l recursive taskenforced by the DPreasonably well, resulting in an online learning algorithmwith strong theoretical performance.An e, the decision-maker may have a forecast of the current reward function or a prediction on how fast the environment evolves over ti me. To incorporate predictive information, we face two main challenges: (i) it is not often obvious how to incorporate any accurate predictive information into online decision-making, and (ii) the predictive information may be inaccurate. Thus it is imperative to design a theory that allows us to incorporate robustly different types of predictive information into online decision-making, allevi ating the exploration-exploitation dilemma. We study how to augment our online learning algorithms that imitate their corresponding offline algorithms with predictive information, leading to robust augmented learning algorithms that significantly outperform algori thms that ignore this information.
Document Details
- Document Type
- DoD Grant Award
- Publication Date
- Aug 20, 2021
- Source ID
- N000142112776
Entities
People
- Negin Golrezaei
Organizations
- Massachusetts Institute of Technology
- Office of Naval Research
- United States Navy