Adversarial Risk Analysis for Optimal Obstacle Evasion
Abstract
Research Problem and Objectives: We will combine the Canadian traveler's problem (CTP) with adversarial risk analysis (ARA), using ARA to develop priors for solving the CTP through reinforcement learning. Both aspects entail innovation, and their combination even more so. The goal is to advance the complexity of reinforcement learning applications.
Scope and Technical Approaches: This is an example of a partially observed Markov decision process. The ARA generates the continually updated probabilities that are inputs to the reinforcement learning solution to the CTP. The research will solve an exemplar problem that is realistic enough to reflect plausible applications, but simple enough to be sanity-checked. We shall also develop theoretical results regarding properties of our solution strategy. We envision implementing the project in two phases, ideally with each phase completed in one year. In the first phase, we will focus on complete specification of the partially observed Markov decision process (state space including the discretization of the traversal medium, transition probabilities, decision epochs, reward function, and knapsack constraints). In particular, we will fully specify the partially observed Markov decision process corresponding to the decisions made by a patrol boat captain who seeks to avoid threats and minimize the damage she sustains. The transition probabilities and the captain's beliefs about them will be built from level-1 and level-2 thinking models from ARA. Part of this specification will rely upon subjective distributions developed by the use of two kinds of ARA. This phase has two milestones: the problem specification and the completion of the ARA. In the second phase, we will use reinforcement learning to find a policy that is nearly optimal for the captain. Given the partial observability and knapsack constraints, this will not be straightforward. This phase will also be spent on writing the paper(s) describing the research, which will be submitted for publication. The milestones in this phase are learning the policy function, efficient computational solutions, and the submission of the paper(s). In this phase we will also work on disseminating the methodology developed for the reinforcement learning solution to the partially observed Markov decision problem.
DoD Relevance and Impact on DoD Capabilities: The solution will be demonstrated by application to a realistic toy problem in which the captain of a patrol boat seeks to avoid multiple kinds of threats from a drug-smuggling cartel. However, this approach extends and applies to a broad class of problems, far beyond the patrol boat versus drug cartel fleet example that illustrates the technique. Many situations entail partially observable states with random rewards and knapsack constraints on resources. Using new ideas from ARA to develop the required distributions for the Markov decision process is critical progress. Otherwise, one must rely upon either Bayes Nash equilibrium solution concepts, which are known to be unrealistic representations of people's reasoning, or upon probabilistic risk analysis, which does not properly account for intelligent adversaries. Improper reliance upon probabilistic risk analysis in strategic contexts was the focus of the National Research Council's criticism of the Department of Homeland Security's bioterrorism risk assessment.
Research Outcomes and Deliverables: The development of the methodology will be the main research outcome, while the m[...]rcement learning.
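The abstract's idea of continually updated probabilities feeding a route decision can be illustrated with a minimal Python sketch. It is not the project's method: the function names, the sensor error rates, the route costs, and the starting prior are all hypothetical, and the prior here is a fixed number rather than the output of an ARA model. The sketch treats the captain's belief that a threat sits on the short route as the probability that the route is blocked, updates that belief with Bayes' rule after each noisy observation, and then compares the expected cost of probing the short route against taking a longer safe route, which is the basic trade-off in a Canadian traveler's problem.

```python
def update_threat_belief(prior, detected, p_detect=0.9, p_false_alarm=0.1):
    """Bayes update of P(threat on the short route) after one noisy reading.

    p_detect and p_false_alarm are assumed sensor characteristics.
    """
    like_threat = p_detect if detected else 1.0 - p_detect
    like_clear = p_false_alarm if detected else 1.0 - p_false_alarm
    numerator = like_threat * prior
    return numerator / (numerator + like_clear * (1.0 - prior))


def choose_route(p_blocked, probe_cost=1.0, short_cost=1.0, safe_cost=5.0):
    """Compare taking the safe route directly with probing the short one.

    Probing costs probe_cost to reach the point where a blockage is seen.
    If the route is clear the captain pays short_cost; if it is blocked she
    pays probe_cost again to turn back and then safe_cost.
    """
    probe_expected = (probe_cost
                      + (1.0 - p_blocked) * short_cost
                      + p_blocked * (probe_cost + safe_cost))
    if probe_expected < safe_cost:
        return "probe", probe_expected
    return "safe", safe_cost


if __name__ == "__main__":
    belief = 0.5  # illustrative prior; in the proposal this would come from ARA
    for detected in (False, False):  # two readings that report no threat
        belief = update_threat_belief(belief, detected)
    print(choose_route(belief))
```

With these illustrative costs the expected cost of probing is 2 + 5p for blockage probability p, so probing is preferred whenever p is below 0.6. A full treatment would replace this one-step comparison with a learned policy over the whole partially observed state space and would add the knapsack constraints the abstract mentions.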
Document Details
- Document Type: DoD Grant Award
- Publication Date: Jul 13, 2022
- Source ID: N000142212572
Entities
People
- Elvan Ceyhan
Organizations
- Auburn University
- Office of Naval Research
- United States Navy