Deep Recurrent Q-Network Approach for Multi Objective Markov Decision Process in Partially Observable Environment

Abstract

Predicting the items relevant to a user's interests in a recommendation system (RS) is an example of a partially observable Markov decision process (POMDP), because users' interests fluctuate over time and the item satisfaction rating matrix is typically sparse. The problem also requires multi-objective optimization (MOO) over three objectives: precision, novelty, and diversity. Existing MOO solutions are based on evolutionary algorithms, which must be combined with rating prediction techniques such as collaborative filtering to fill in the sparse matrix before recommendations can be produced. However, collaborative filtering has limitations in cold-start situations, such as new users. Most RSs focus only on the accuracy of predicting high-rated or trending items, while other metrics such as novelty and diversity, which are equally essential for generating higher-quality recommendations, have mostly been ignored. The main challenge of considering multiple evaluation metrics is the conflict between the objectives: improving either novelty or diversity hurts accuracy, and vice versa. Results show that the deep reinforcement learning (DRL) approaches, the first available DRL approaches for MOO in movie recommendation, outperform the benchmark in the multi-objective setting. The recurrent layer in the DRL agent also remodels the POMDP as a complete MDP environment, which allows the sparse rating matrix to be predicted.
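
As a rough illustration of the recurrent idea (not the report's implementation, whose architecture the abstract does not describe), the following sketch runs the forward pass of a tiny recurrent Q-network in plain Python. The class name TinyRecurrentQNet, the dimensions, the untrained random weights, and the use of a simple tanh (Elman) cell where a deep recurrent Q-network usually uses an LSTM are all hypothetical choices made for brevity. What it shows is that the hidden state carries a summary of the observation history, so the Q-values depend on earlier observations and not only on the current one. This is the mechanism by which a recurrent layer lets an agent treat a partially observable problem much like a fully observable one.

```python
import math
import random


class TinyRecurrentQNet:
    """Hypothetical, untrained recurrent Q-network: tanh (Elman) cell + linear Q head.

    Deep recurrent Q-networks are usually built with an LSTM; a plain tanh cell
    is swapped in here only to keep the sketch short and dependency-free.
    """

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; a real agent would learn these with Q-learning.
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_xh = init(hidden_dim, obs_dim)     # observation -> hidden
        self.w_hh = init(hidden_dim, hidden_dim)  # hidden -> hidden (the recurrence)
        self.w_hq = init(n_actions, hidden_dim)   # hidden -> one Q-value per action
        self.hidden_dim = hidden_dim

    @staticmethod
    def _matvec(matrix, vector):
        return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

    def initial_state(self):
        return [0.0] * self.hidden_dim

    def step(self, obs, h):
        """Fold one observation into the hidden state; return (q_values, new_hidden)."""
        pre = [a + b for a, b in zip(self._matvec(self.w_xh, obs), self._matvec(self.w_hh, h))]
        h_new = [math.tanh(z) for z in pre]
        return self._matvec(self.w_hq, h_new), h_new

    def q_values(self, history):
        """Q-values after reading a whole observation history from the initial state."""
        h = self.initial_state()
        q = [0.0] * len(self.w_hq)
        for obs in history:
            q, h = self.step(obs, h)
        return q


# Two hypothetical histories that end in the same (one-hot) observation.
net = TinyRecurrentQNet(obs_dim=3, hidden_dim=4, n_actions=2)
last = [1.0, 0.0, 0.0]
q_a = net.q_values([[0.0, 1.0, 0.0], last])
q_b = net.q_values([[0.0, 0.0, 1.0], last])
print("history A:", q_a)
print("history B:", q_b)
```

The two histories end in the same observation but yield different Q-values, because the earlier observation still shapes the hidden state. A feed-forward Q-network that sees only the latest observation would return identical values for both.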

Document Details

Document Type: Technical Report
Publication Date: Aug 23, 2021
Accession Number: AD1153753

Entities

People

Nurfadhlina Sharef

Organizations

University of Putra Malaysia
