DEEP REINFORCEMENT LEARNING BASED INSIGHT SELECTION POLICY

Abstract

We live in the era of ubiquitous sensing and computing. More and more data is being collected and processed from devices, sensors, and systems. This opens up opportunities to discover patterns in these data that could help in gaining a better understanding of the source that produces them. This is useful in a wide range of domains, especially in the area of personal health, in which such knowledge could help users comprehend their behavior and indirectly improve their lifestyle. Insight generators are systems that identify such patterns and verbalize them in a readable text format, referred to as insights. The selection of insights is done using a scoring algorithm that aims to optimize this process based on multiple objectives, e.g., factual correctness, usefulness, and interestingness of insights. In this paper, we propose a novel Reinforcement Learning (RL) framework that, for the first time, recommends health insights in a dynamic environment based on user feedback and estimates of the user's lifestyle quality. Using simple and highly reusable principles of automatic user simulation based on real data, we demonstrate in this preliminary study that the RL solution may improve the selection of insights towards multiple pre-defined objectives.

1. INTRODUCTION

The latest developments in big data, the internet of things, and personal health monitoring have led to a massive increase in the ease and scale at which data is collected and processed. Learning from the information present in the data has been shown to help in running businesses better, managing health care services, and even maintaining a healthier lifestyle. Such understanding mostly takes the form of identifying a significant rise or fall in a certain measurement given a context of interest. Suppose that the sleep data logs of a user of a health monitoring service show that they went to sleep later during the weekends than during weekdays. This can be conveyed to the user as a statement such as, "You went to sleep later during the weekends than the weekdays". Here, the time at which they went to sleep is the measurement, and whether the day is a weekday or a weekend is the context of interest. We call such statements 'insights'. Providing insights that accurately describe the scenarios in which a certain health parameter improved or deteriorated could enable the user to make better lifestyle choices. Moreover, it has been accepted (Abraham & Michie, 2008) that providing relevant information to the user could improve their behavior. The insight generation task can be seen as a natural language generation task in which a generator model creates appropriate insight statements. A generalized framework for such an insight generator (Genf) was proposed, in which components to analyze the data and generate the statements played an important role (Susaiyah et al., 2020). More importantly, the framework has a provision for a user feedback mechanism that captures what types of insights users are interested in.
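The weekend versus weekday example above can be sketched in a few lines of Python. This is only an illustrative sketch and not the Genf implementation: the function name `bedtime_insight`, the encoding of bedtime as hours after noon (so that times around midnight do not wrap), and the `min_gap_hours` threshold are all assumptions introduced here. A real insight generator would typically apply a statistical test rather than a fixed threshold on the difference of means.

```python
from datetime import date
from statistics import mean


def bedtime_insight(logs, min_gap_hours=0.5):
    """Compare mean bedtime on weekends and weekdays and verbalize the result.

    logs: list of (date, bedtime) pairs, with bedtime given in hours after
    noon (hypothetical encoding, e.g. 22:30 -> 10.5 and 00:30 -> 12.5).
    Returns an insight string, or None if the gap is below min_gap_hours.
    """
    # Context of interest: weekend (Saturday=5, Sunday=6) versus weekday.
    weekend = [t for d, t in logs if d.weekday() >= 5]
    weekday = [t for d, t in logs if d.weekday() < 5]
    if not weekend or not weekday:
        return None  # one of the two contexts has no data to compare

    # Measurement: mean bedtime in each context.
    gap = mean(weekend) - mean(weekday)
    if abs(gap) < min_gap_hours:
        return None  # difference too small to report (assumed threshold)

    direction = "later" if gap > 0 else "earlier"
    return f"You went to sleep {direction} during the weekends than the weekdays"


# One illustrative week: Monday 2024-01-01 to Sunday 2024-01-07.
sample_logs = [(date(2024, 1, day), 10.5) for day in range(1, 6)]
sample_logs += [(date(2024, 1, 6), 12.5), (date(2024, 1, 7), 12.5)]
insight = bedtime_insight(sample_logs)
```

With these made-up logs, the weekend mean exceeds the weekday mean by two hours, so the function returns the "later" variant of the statement; when the two means are closer than the threshold, it returns `None` and no insight is produced.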
Implementations of this framework have been shown to incorporate the "overgenerate and rank" approach, in which all candidates that satisfy the insight definition are generated and later filtered using a calculated rank or score (Gatt & Krahmer, 2018; Varges & Mellish, 2010). The selection of the most relevant insight via ranking or scoring from a list of multiple insights is an ongoing research topic. Earlier works have utilized purely statistical insight selection mechanisms, where the top-ranking insights according to a statistical algorithm are selected (Härmä & Helaoui, 2016), often combined with machine-readable knowledge (Musto et al., 2017). Other approaches

