Deep Recurrent Q-Network Approach for Multi Objective Markov Decision Process in Partially Observable Environment

Abstract

The purpose of the project is to introduce a deep learning method for multi-objective action prediction in partially observable Markov Decision Processes (POMDPs). The proposed architecture is able to capture long histories of observations and the non-linear relationships of spatio-temporal behavior in order to compute the likelihood that the predicted action will be accepted under each objective. This is useful for next-action prediction, which is very important in business because strategic planning, such as optimized logistics delivery, can then be performed proactively. Existing works have focused mostly on robotics and games as the simulated Markov Decision Process (MDP) environment, and have been implemented for environments with complete observation and mostly for single objectives. MDPs have mostly been used as a tool for modeling action prediction. This is done by using vector-valued rewards [1], [2] to learn about the actions of an autonomous decision-making agent in order to help humans make better-informed decisions. Multi-objective MDP [3] allows several objectives to be achieved through the maximization of each objective’s reward vector, learned with deep reinforcement learning. The proposed work will develop and evaluate the performance of a deep learning architecture for POMDPs, simulated to learn multiple objectives such as (i) delivery scheduling, (ii) revenue maximization and (iii) least-distance optimization, which would be used as the input to predict and recommend optimized logistics delivery route plans. The results will be compared against (i) standard and deep recurrent Q-network methods, to measure improvement in terms of POMDP learning, and (ii) multi-objective deep Q-networks in both completely and partially observable environments.
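
To make the idea of vector-valued rewards concrete, the following is a minimal sketch and not the architecture proposed in the award. It swaps the deep recurrent network for a plain tabular Q-learning update, so it does not model partial observability or observation histories. The linear weighting of objectives, the state and action names, and the numeric values are all assumptions chosen for illustration.

```python
# Illustrative sketch only: tabular Q-learning with vector-valued rewards.
# Linear scalarization, the weights, and every name below are assumptions;
# the award abstract does not specify how objectives are combined.

def scalarize(q_vector, weights):
    """Collapse a per-objective Q-vector into one score (assumed linear weighting)."""
    return sum(w * q for w, q in zip(weights, q_vector))


def greedy_action(q_table, state, weights):
    """Return the action in `state` whose scalarized Q-vector is largest."""
    actions = q_table[state]
    return max(actions, key=lambda a: scalarize(actions[a], weights))


def update(q_table, state, action, reward_vector, next_state, weights,
           alpha=0.1, gamma=0.9):
    """Apply one Q-learning step to every objective's component."""
    best_next = greedy_action(q_table, next_state, weights)
    next_q = q_table[next_state][best_next]
    old_q = q_table[state][action]
    q_table[state][action] = [
        q + alpha * (r + gamma * nq - q)
        for q, r, nq in zip(old_q, reward_vector, next_q)
    ]


# Hypothetical example with three objectives:
# (schedule adherence, revenue, negative distance travelled).
q_table = {
    "depot": {"route_a": [0.0, 0.0, 0.0], "route_b": [0.0, 0.0, 0.0]},
    "customer": {"route_a": [0.0, 0.0, 0.0], "route_b": [0.0, 0.0, 0.0]},
}
weights = [0.5, 0.3, 0.2]
update(q_table, "depot", "route_a", [1.0, 2.0, -3.0], "customer", weights)
chosen = greedy_action(q_table, "depot", weights)
```

Each objective keeps its own Q-value component, and the weights are applied only when an action has to be chosen. A recurrent variant would replace the table lookup with a network that summarizes the observation history.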

Document Details

Document Type: DoD Grant Award
Publication Date: Jul 24, 2019
Source ID: FA23861814079

Entities

People

Nurfadhlina Sharef

Organizations

Air Force Office of Scientific Research
United States Air Force
University of Putra Malaysia
