LEARNING TO TAKE A BREAK: SUSTAINABLE OPTIMIZATION OF LONG-TERM USER ENGAGEMENT

Abstract

Optimizing user engagement is a key goal for modern recommendation systems, but blindly pushing users towards increased consumption risks burnout, churn, or even addictive habits. To promote digital well-being, most platforms now offer a service that periodically prompts users to take a break. These prompts, however, must be set up manually, and so may be suboptimal for both users and the system. In this paper, we propose a framework for optimizing long-term engagement by learning individualized breaking policies. Using Lotka-Volterra dynamics, we model users as acting based on two balancing latent states: drive and interest, which must be conserved. We then give an efficient learning algorithm, provide theoretical guarantees, and empirically evaluate its performance on semi-synthetic data.

1. INTRODUCTION

As consumers of content, we have come to rely extensively on algorithmic recommendations. This has made the task of recommending, in a relevant, timely, and personalized manner, key to the success of modern media platforms. Most commercial systems are built with the primary aim of optimizing user engagement, a process in which machine learning plays a central role. But alongside their many successes, recommendation systems have also been scrutinized for heedlessly driving users towards excessive and often undesired levels of consumption. This has raised awareness as to the need for redesigning recommendation systems in ways that actively promote digital well-being. How can media platforms balance business goals with user well-being? One prominent approach, which is now offered by most major platforms, is to periodically prompt users to take breaks (Constine, 2018; Perez, 2018). The idea behind breaks is that occasional disruptions curb the inertial urge for perpetual consumption, and can therefore aid in reducing 'mindless scrolling' (Rauch, 2018), or even addiction (Montag et al., 2018; Ding et al., 2016). As a general means for promoting well-being, breaking is psychologically well-grounded (e.g., Danziger et al., 2011; Sievertsen et al., 2016). But for platforms, breaks serve a utilitarian purpose: their goal is to foster long-term engagement by compensating for the myopic nature of conventional recommendation algorithms, which are typically trained to optimize immediate engagement. Since breaking schedules are applied heuristically on top of existing recommendation policies, and typically need to be set up manually by users, current solutions are unlikely to utilize their full potential (Monge Roffarello & De Russis, 2019). In this paper, we propose a disciplined learning framework for responsible and sustainable optimization of long-term user engagement by controlling breaks.
Our point of departure is that sustained engagement necessitates sustained user well-being, and here we advocate for breaks as a means to establish both. Focusing on feed-based recommendation, our framework optimizes long-term engagement by learning an optimal breaking policy that prescribes individualized breaking schedules. The challenge in learning to break is that the effects of recommendations on users can slowly accumulate over time, rendering ineffectual any policy that relies on clear signs of over-exposure. To be preemptive, we argue that breaks must be scheduled early on, in a way that anticipates the future trajectory of user behavior. To achieve this, we introduce a novel class of behavioral models based on Lotka-Volterra (LV) dynamical systems (Lotka, 1910). These depict users as acting based on two balancing forces: drive to consume and intrinsic interest, with corresponding latent states. Intuitively, high interest increases drive to consume, but prolonged consumption decreases interest; together, these describe how user behavior varies over time and in response to recommendations. Our model captures the notion that interest can be exhausted long before over-consumption is observed. This arms our approach with the prescience needed to prevent burnout by ensuring that interest is sustainably preserved; thus, whereas current solutions target the symptom, ours aims for the cause. Our proposed learning algorithm consists of two steps: First, we embed user interaction sequences in 'LV-space', the set of all possible trajectories that our behavioral model class can express. Then, we optimize individualized breaking policies by solving an optimal control problem over this latent space, in which the control variable is a breaking schedule applied on top of the existing recommendation scheme. Here the challenge is that different breaking policies can lead to different counterfactual trajectories, of which observational data is only partially informative.
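To make the intuition concrete, the following is a minimal illustrative sketch of predator-prey LV dynamics in which interest plays the role of prey and drive the role of predator, and a break simply suspends the consumption coupling so that interest can replenish. The coefficients, the initial state, and the exact form of the coupling here are hypothetical placeholders for exposition; they are not the parameterization used in the paper.

```python
import numpy as np

def simulate_lv(alpha=0.8, beta=1.0, delta=0.5, gamma=0.4,
                break_schedule=None, T=200.0, dt=0.01):
    """Euler-integrate a predator-prey system: `interest` (prey) grows
    at rate alpha and is depleted by consumption; `drive` (predator)
    grows from consumption and decays at rate gamma. While a break is
    active, the consumption coupling is switched off. All coefficients
    are hypothetical, chosen only for illustration."""
    n = int(T / dt)
    interest, drive = 1.0, 1.0
    traj = np.empty((n, 2))
    for i in range(n):
        t = i * dt
        on_break = break_schedule(t) if break_schedule else False
        consume = 0.0 if on_break else 1.0   # break suspends consumption
        d_interest = alpha * interest - consume * beta * interest * drive
        d_drive = consume * delta * interest * drive - gamma * drive
        interest += dt * d_interest
        drive += dt * d_drive
        traj[i] = (interest, drive)
    return traj

# Example: compare no breaks to a periodic schedule (20% duty cycle).
no_breaks = simulate_lv()
with_breaks = simulate_lv(break_schedule=lambda t: (t % 10.0) > 8.0)
```

Plotting both trajectories makes the qualitative story visible: without breaks the two states oscillate as interest is repeatedly depleted, while periodic breaks let interest recover before it collapses.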
Since our goal considers long-term outcomes, our solution is to optimize directly for counterfactual steady states. From a behavioral perspective, we view this as aiming to steer towards sustainable habits; from a computational perspective, under our choice of policy class, this enables tractable learning. As we show, the optimization landscape of LV equilibria admits a compact representation, whose main benefit is that it can be fully described by predictions of individualized user engagement rates. Practically, this is advantageous, as it circumvents the need to take arbitrary and costly exploration steps, and enables learning using readily available predictive tools (e.g., Gupta et al., 2006). We make use of a small set of learned predictive models, each trained on a small and minimally-invasive experimental dataset, which allow us to tune our policy to suit different conditions. The final learned policy has an intuitive interpretation: it takes as input a small set of predictions for a user, and via careful interpolation, applies a decision rule that anticipates the effects of breaking on future outcomes (cf. conventional approaches, which take in predictions and apply the myopic argmax rule). Our main theoretical result is a bound on the expected long-term engagement of our learned breaking policy, relative to the optimal policy in the class. We show that the gap decomposes into three distinct additive terms: (i) predictive error, (ii) modeling error (i.e., embedding distortion), and (iii) variance around the (theoretical) steady state. These provide an intuitive interpretation of the bound, as well as means to understand the effects of different modeling choices. Our proof technique relies on carefully weaving LV equilibrium analysis within conventional concentration bounds for learning. Finally, we provide an empirical evaluation of our approach on semi-synthetic data.
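To give a flavor of why LV steady states admit a compact, closed-form representation, consider the interior equilibrium of a textbook predator-prey system in which breaking a fraction rho of the time is crudely modeled as scaling the consumption couplings by (1 - rho). This is a simplified stand-in for the paper's actual policy class and equilibrium analysis, intended only to show that the equilibrium shifts in closed form with the break rate.

```python
def interior_equilibrium(alpha, beta, delta, gamma, rho=0.0):
    """Closed-form interior fixed point of dx/dt = alpha*x - (1-rho)*beta*x*y,
    dy/dt = (1-rho)*delta*x*y - gamma*y, where x is interest (prey) and
    y is drive (predator). Scaling couplings by (1-rho) is an illustrative
    proxy for breaking a fraction rho of the time."""
    assert 0.0 <= rho < 1.0
    interest_star = gamma / ((1.0 - rho) * delta)
    drive_star = alpha / ((1.0 - rho) * beta)
    return interest_star, drive_star

def lv_derivatives(x, y, alpha, beta, delta, gamma, rho=0.0):
    """Time derivatives of the break-scaled LV system."""
    dx = alpha * x - (1.0 - rho) * beta * x * y
    dy = (1.0 - rho) * delta * x * y - gamma * y
    return dx, dy

# Sanity check: both derivatives vanish at the interior equilibrium.
x_star, y_star = interior_equilibrium(0.8, 1.0, 0.5, 0.4, rho=0.2)
dx, dy = lv_derivatives(x_star, y_star, 0.8, 1.0, 0.5, 0.4, rho=0.2)
assert abs(dx) < 1e-12 and abs(dy) < 1e-12
```

Because the equilibrium is an explicit function of a handful of user-level rates, a learned policy only needs predictions of those rates, rather than full trajectory rollouts, to evaluate candidate break schedules.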
Using the MovieLens 1M dataset, we generate discrete time-series data in a way that captures the essence of our behavioral model, but is different from the actual continuous-time dynamics we optimize over. Results show that despite this gap, our approach improves significantly over myopic baselines, and often closely matches an optimal oracle. Taken together, these demonstrate the potential utility of our approach. Code is available at: https://github.com/lvml-iclr-2023/lvml.

Broader perspective. At a high level, our work argues for viewing recommendation as a task of sustainable resource management. As with other cognitive tasks, engaging with digital content requires the availability of certain cognitive resources: attentional, executive, or emotional. These resources are inherently limited, and prolonged engagement depletes them (Kahneman, 1973; Muraven & Baumeister, 2000); this, in turn, can reduce the capacity of key cognitive processes (e.g., perception, attention, memory, self-control, and decision-making), and in the extreme, cause ego depletion (Baumeister et al., 1998) or cognitive fatigue (Mullette-Gillman et al., 2015). As a means to allow resources to replenish, 'mental breaks' have been shown to be highly effective (Bergum & Lehr, 1962; Henning et al., 1989; Gilboa et al., 2008; Ross et al., 2014; Helton & Russell, 2017). Nevertheless, traditional approaches to recommendation remain agnostic to the idea that recommending takes a cognitive toll: they simply recommend at each point in time the item predicted to be most engaging (Robertson, 1977). As an alternative, our approach explicitly models recommendation as a process which draws on these resources, and therefore must also conserve them.
The subclass of 'Predator-Prey' LV dynamics which we draw on is used extensively in ecology for modeling the dynamics of interacting populations. It demonstrates how over-predation can ultimately lead to self-extinction by eliminating the prey population, but also shows how allowing resources to naturally replenish ensures sustainable relations. As such, here we advocate for studying recommendation systems as human-centric ecosystems, and take one step towards their sustainable design.

1.1. RELATED WORK

User dynamics: latent states and feedback. A recent body of work aims to capture time-varying behavior by modeling users as acting based on dynamic latent states. Broadly, works in this field

