Sequential Decision Making with Human Biases
Abstract
Machine learning and artificial intelligence (AI) are increasingly involved in sequential decision making in daily life. In an idealized, clean, and simple scenario, machine learning algorithms can autonomously compute and execute the optimal policy using standard techniques such as reinforcement learning. Indeed, a growing number of applications rely on automatic sequential decision making, from determining which advertisements users see online, to operating robotic entities, to navigating unmanned aerial vehicles.

Despite the success of automatic decision making in these applications, when the stakes are high or when the learning models or objectives are too complicated to be specified accurately, as in government policy making or military applications, fully automatic decision making is not yet fully trusted and may lead to suboptimal outcomes. Therefore, humans are often brought into the loop to make the final call on which policy to execute. In this type of human-AI decision making framework, learning algorithms make recommendations based on the calculated optimal policy, and human decision makers decide which policy to deploy based on the recommended policy. Here the goal of machine learning and AI is to augment, rather than replace, humans in the decision making loop.

In this research proposal, we plan to investigate the above-mentioned human-AI decision making framework, with a focus on how human decision biases affect the adopted policy, whether we can detect and even exploit the biases of an opposing decision maker, and how to design decision making policies that are robust to an opponent's attempts at exploitation. In particular, our proposal consists of three major components:

Thrust 1: Formalize the human-AI sequential decision making framework that incorporates human decision biases. Quantify the effects of human biases and design interventions (e.g., what information to show to the human decision maker) to help improve the outcome of the policy.

Thrust 2: Design strategies that detect and exploit the biases of an opposing decision maker. Develop approaches that learn decision makers' payoff functions from their actions (e.g., using techniques from inverse reinforcement learning). Quantify the amount of information leaked through revealed actions (e.g., how much the enemy can learn about the commander's objective from observing actions taken by the commander).

Thrust 3: Develop strategies that are robust to exploitation. Develop sequential decision making algorithms that are robust to enemy exploitation. Formalize the tradeoff between minimizing the information leak and maximizing the decision maker's utility.
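As an illustration of the information-leak question in Thrust 2, the following is a minimal sketch and not the proposal's method. It assumes a decision maker who picks actions with a softmax (noisily rational) rule over payoffs, and an observer who holds a finite set of candidate objectives and updates a Bayesian posterior after each observed action. The leak is measured here as the drop in entropy of the observer's belief, in bits. All names (`softmax_policy`, `payoff_table`, `beta`, and so on) are hypothetical and chosen only for this example.

```python
import math


def softmax_policy(payoffs, beta):
    """Action probabilities for a noisily rational decision maker.

    beta = 0 gives uniformly random actions; larger beta concentrates
    probability on the highest-payoff action. (Assumed behavior model.)
    """
    top = max(payoffs)
    weights = [math.exp(beta * (p - top)) for p in payoffs]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(dist):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def posterior_over_objectives(prior, payoff_table, actions, beta):
    """Bayes update over candidate objectives given observed actions.

    payoff_table[k][a] is the payoff of action a under candidate
    objective k. The prior must be strictly positive on every candidate.
    """
    log_post = [math.log(p) for p in prior]
    for k, payoffs in enumerate(payoff_table):
        policy = softmax_policy(payoffs, beta)
        for a in actions:
            log_post[k] += math.log(policy[a])
    top = max(log_post)
    unnorm = [math.exp(lp - top) for lp in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def information_leak_bits(prior, payoff_table, actions, beta):
    """Entropy of the prior minus entropy of the posterior, in bits."""
    post = posterior_over_objectives(prior, payoff_table, actions, beta)
    return entropy_bits(prior) - entropy_bits(post)


# Two candidate objectives, two actions; each objective prefers one action.
payoff_table = [[1.0, 0.0], [0.0, 1.0]]
prior = [0.5, 0.5]
observed = [0, 0, 0]  # the decision maker chose action 0 three times

leak = information_leak_bits(prior, payoff_table, observed, beta=2.0)
print(f"leaked information: {leak:.3f} bits")
```

Under this toy model, a fully random decision maker (`beta=0`) leaks nothing, while a more predictable one reveals more about the objective, which is one simple way to picture the tradeoff named in Thrust 3. Note that for a single observed sequence and a non-uniform prior the entropy difference can be negative; averaging it over possible action sequences gives the mutual information between objective and actions.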
Document Details
- Document Type: DoD Grant Award
- Publication Date: May 08, 2020
- Source ID: N000142012240
Entities
People
- Chien-ju Ho
Organizations
- Office of Naval Research
- United States Navy
- Washington University in St. Louis