Learning and Coordination for Cross-Layer Optimization in Tactical Wireless Networks

Abstract

Statement of Work and Basic Approach: The next generation of tactical wireless networks will need to support diverse real-time applications, operate over a range of spectrum bands, and meet heterogeneous link-performance requirements such as ultra-low latency and ultra-high reliability. These networks must dynamically learn network states and perform distributed cross-layer optimization over channel selection (PHY), scheduling (MAC), and routing (network layer) to reliably maximize timely-throughput, i.e., the throughput of packets that meet their per-packet latency guarantees. Our objective in this project is to explore online reinforcement learning across multiple concurrent agents as a means of enabling such cross-layer optimization in tactical wireless networks.

Scientific Objectives: We address two fundamental themes in algorithm design for such optimization. First, we aim to design a systematic online reinforcement learning (RL) framework for exploring and identifying cross-layer decisions, with provable guarantees on learning efficiency. Second, we seek a principled approach to coordination that enables a large number of agents to explore valuable portions of the state space while minimizing conflicts in accessing network resources. The ultimate goal is to produce viable algorithm designs that provably satisfy performance requirements and can be validated through empirical studies. The project is divided into three complementary thrusts:

Thrust 1: Online Learning at a Single Node. We will develop algorithms for channel selection and packet scheduling at a single node, posing these problems as sequential cascading bandits and queueing bandits. Our goal is to develop novel posterior sampling approaches that achieve low-regret online learning and thereby attain short queues and high timely-throughput.

Thrust 2: Concurrent RL in a Deadline-Constrained Wireless Network. We will solve the global timely-throughput maximization problem in a distributed manner by modeling it as an episodic Markov Decision Process (MDP) with an unknown kernel (the link success probabilities) that is learned concurrently by multiple agents (packets). Here too, posterior sampling of the kernel will enable low-regret reinforcement learning.

Thrust 3: Multi-Agent RL under the Mean Field Approximation. We will use the simplifications provided by the mean field game framework to obtain a complementary approach to timely-throughput maximization, in which price-taking mean-field agents individually solve MDPs. In doing so, the agents aid the learning of kernel parameters across the whole population. We will study convergence to, and characterize, the cooperative equilibria attained.

Methods to be Employed: The main methodological contributions will be as follows. The first will be in posterior sampling as a means of efficiently learning the kernel of an MDP online; the goal is to obtain structural characterizations of the regret of this approach. The second will be in evaluating mean field models as a means of simplifying multi-agent reinforcement learning; here, the goal is to characterize the optimality of the equilibria attained.

Significance of Proposed Effort: From a scientific perspective, the work will provide a systematic framework for developing online learning algorithms for cross-layer optimization in wireless networks, and its methodological contributions will apply to reinforcement learning as a whole. From the Army's perspective, self-configuring networks are vital to the future of battlefield communication, where they must support a variety of real-time and non-real-time traffic sources under time-varying network connectivity and spectrum availability. This research will also contribute to training the US workforce in the science of wireless communications.
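Thrust 1 frames single-node channel selection as a bandit problem solved by posterior sampling. As a hedged illustration only, the sketch below applies standard Beta-Bernoulli Thompson sampling to a plain (non-cascading) multi-armed bandit in which each channel delivers a packet with an unknown probability. It is not the project's cascading- or queueing-bandit algorithm; the function name `thompson_channel_selection` and the simulated success probabilities are hypothetical.

```python
import random


def thompson_channel_selection(success_probs, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling over a set of channels.

    success_probs: per-channel delivery probabilities (hypothetical). They are
    unknown to the learner and are used here only to simulate outcomes.
    Returns (transmissions per channel, total packets delivered).
    """
    rng = random.Random(seed)
    k = len(success_probs)
    alpha = [1] * k  # Beta(1, 1) uniform prior: alpha tracks 1 + successes
    beta = [1] * k   # beta tracks 1 + failures
    pulls = [0] * k
    delivered = 0
    for _ in range(horizon):
        # Draw one plausible success probability per channel from its posterior
        samples = [rng.betavariate(alpha[c], beta[c]) for c in range(k)]
        # Transmit on the channel whose sampled probability is highest
        chosen = max(range(k), key=lambda c: samples[c])
        pulls[chosen] += 1
        if rng.random() < success_probs[chosen]:
            alpha[chosen] += 1
            delivered += 1
        else:
            beta[chosen] += 1
    return pulls, delivered


if __name__ == "__main__":
    pulls, delivered = thompson_channel_selection([0.2, 0.5, 0.9], horizon=2000)
    print("transmissions per channel:", pulls, "delivered:", delivered)
```

Sampling from the posterior, rather than always picking the highest empirical mean, keeps exploring channels whose estimates are still uncertain, and the exploration of clearly inferior channels tapers off as their posteriors concentrate.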
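Thrust 2 and the methods paragraph describe learning an unknown MDP kernel (link success probabilities) by posterior sampling, with packets acting as concurrent agents. The sketch below is a minimal, hedged instance of posterior sampling for reinforcement learning (PSRL) under simplifying assumptions that are ours, not the project's: a hypothetical four-node topology with made-up link probabilities, a single source and destination, no contention between packets, independent Beta priors per link, and backward induction over the deadline to maximize the probability of on-time delivery. Each packet draws its own kernel sample from a shared posterior, and all observations are pooled at the end of each episode. The names `plan`, `psrl`, `TRUE_P`, `NODES`, and `DEST` are illustrative.

```python
import random

# Hypothetical topology: directed link -> true success probability.
# The learner never reads these values directly; they only simulate outcomes.
TRUE_P = {(0, 1): 0.9, (0, 2): 0.4, (1, 2): 0.8, (1, 3): 0.3, (2, 3): 0.95}
NODES = [0, 1, 2, 3]
DEST = 3


def plan(p, links, nodes, deadline):
    """Backward induction over the deadline.

    V[t][u] is the maximum probability of reaching DEST by the deadline from
    node u at slot t under link success probabilities p; a failed transmission
    leaves the packet at u. Returns (V, policy), policy[t][u] = next hop.
    """
    V = [{u: 1.0 if u == DEST else 0.0 for u in nodes} for _ in range(deadline + 1)]
    policy = [{} for _ in range(deadline)]
    for t in range(deadline - 1, -1, -1):
        for u in nodes:
            if u == DEST:
                continue
            options = [
                (p[(a, v)] * V[t + 1][v] + (1 - p[(a, v)]) * V[t + 1][u], v)
                for (a, v) in links
                if a == u
            ]
            if options:
                V[t][u], policy[t][u] = max(options)
    return V, policy


def psrl(true_p, nodes, source, deadline, episodes, packets_per_episode, seed=0):
    """Concurrent posterior sampling: every packet samples its own kernel from
    a shared Beta posterior; observations are pooled after each episode."""
    rng = random.Random(seed)
    links = list(true_p)
    succ = {link: 1 for link in links}  # Beta(1, 1) prior per link
    fail = {link: 1 for link in links}
    delivery_rate = []
    for _ in range(episodes):
        new_succ = dict.fromkeys(links, 0)
        new_fail = dict.fromkeys(links, 0)
        hits = 0
        for _ in range(packets_per_episode):
            sample = {link: rng.betavariate(succ[link], fail[link]) for link in links}
            _, policy = plan(sample, links, nodes, deadline)
            u = source
            for t in range(deadline):
                if u == DEST or u not in policy[t]:
                    break
                v = policy[t][u]
                if rng.random() < true_p[(u, v)]:  # simulated transmission
                    new_succ[(u, v)] += 1
                    u = v
                else:
                    new_fail[(u, v)] += 1
            hits += int(u == DEST)
        for link in links:  # pool all packets' observations
            succ[link] += new_succ[link]
            fail[link] += new_fail[link]
        delivery_rate.append(hits / packets_per_episode)
    return succ, fail, delivery_rate


if __name__ == "__main__":
    _, _, rates = psrl(TRUE_P, NODES, source=0, deadline=3,
                       episodes=200, packets_per_episode=5)
    print("on-time delivery, last 50 episodes:", sum(rates[-50:]) / 50)
```

With the true probabilities and a deadline of 3 slots, backward induction routes the packet 0 → 1 first (on-time probability 0.776) even though 0 → 2 → 3 has the most reliable final hop, which illustrates why the deadline makes the decision horizon-dependent.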

Document Details

  • Document Type: DoD Grant Award
  • Publication Date: Jun 25, 2019
  • Source ID: W911NF1910367

Entities

People

  • Srinivas Shakkottai

Organizations

  • Army Contracting Command
  • Texas Engineering Experiment Station
  • United States Army

Tags

Fields of Study

  • Computer science

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • Space