Deep Recurrent Q-Network Approach for Multi Objective Markov Decision Process in Partially Observable Environment
Abstract
Predicting items relevant to a user's interests in a recommendation system (RS) is an example of a partially observable Markov decision process (POMDP), because user interests fluctuate over time and the item satisfaction rating matrix is typically sparse. The problem also requires multi-objective optimization (MOO) over three objectives: precision, novelty, and diversity. Existing MOO solutions are based on evolutionary algorithms, which must be combined with rating prediction techniques such as collaborative filtering to fill in the sparse matrix before producing recommendations. However, collaborative filtering has limitations when handling cold-start or new users. Most RSs focus merely on the accuracy of predictions for highly rated or trending items, while other metrics such as novelty and diversity, which are equally essential for generating higher-quality recommendations, have mostly been ignored. The main challenge of considering multiple evaluation metrics is the conflict between the objectives: improving novelty or diversity tends to reduce accuracy, and vice versa. Results show that the proposed deep reinforcement learning (DRL) approaches, which are the first DRL approaches available for MOO in movie recommendation, perform better across the multiple objectives than the benchmark. The recurrent layer in the DRL agent is also able to remodel the POMDP as a complete (fully observable) MDP environment, which allows prediction of the sparse rating matrix.
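The conflict between precision, novelty, and diversity described in the abstract can be illustrated with a small sketch. The metric definitions below (novelty as one minus mean popularity, diversity as mean pairwise Jaccard distance over genre sets), the weighted-sum reward, the weights, and the toy data are all assumptions made for illustration; the abstract does not state how the report defines or combines its objectives.

```python
from itertools import combinations


def precision(recommended, liked):
    """Fraction of recommended items that the user actually liked."""
    if not recommended:
        return 0.0
    return sum(1 for item in recommended if item in liked) / len(recommended)


def novelty(recommended, popularity):
    """Mean of (1 - popularity); popularity is an assumed share of users in [0, 1]."""
    if not recommended:
        return 0.0
    return sum(1.0 - popularity[item] for item in recommended) / len(recommended)


def diversity(recommended, genres):
    """Mean pairwise Jaccard distance between the genre sets of recommended items."""
    pairs = list(combinations(recommended, 2))
    if not pairs:
        return 0.0

    def jaccard_distance(a, b):
        union = genres[a] | genres[b]
        if not union:
            return 0.0
        return 1.0 - len(genres[a] & genres[b]) / len(union)

    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


def scalarized_reward(recommended, liked, popularity, genres,
                      weights=(0.5, 0.25, 0.25)):
    """Weighted sum of the three objectives (hypothetical weights, not from the report)."""
    w_precision, w_novelty, w_diversity = weights
    return (w_precision * precision(recommended, liked)
            + w_novelty * novelty(recommended, popularity)
            + w_diversity * diversity(recommended, genres))


# Toy data (hypothetical): two popular action films the user liked, two niche films.
popularity = {"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.05}
genres = {"A": {"action"}, "B": {"action"}, "C": {"drama"}, "D": {"documentary"}}
liked = {"A", "B"}

popular_list = ["A", "B"]  # accurate, but neither novel nor diverse
niche_list = ["C", "D"]    # novel and diverse, but not accurate
mixed_list = ["A", "C"]    # a compromise between the objectives

for name, items in [("popular", popular_list), ("niche", niche_list), ("mixed", mixed_list)]:
    print(name,
          precision(items, liked),
          novelty(items, popularity),
          diversity(items, genres),
          scalarized_reward(items, liked, popularity, genres))
```

With these assumed weights, the mixed list receives the highest scalarized reward, although the popular list has the highest precision and the niche list the highest novelty. A weighted sum is only one way to handle multiple objectives; changing the weights changes which list is preferred.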
Document Details
- Document Type: Technical Report
- Publication Date: Aug 23, 2021
- Accession Number: AD1153753
Entities
People
- Nurfadhlina Sharef
Organizations
- University of Putra Malaysia