EPIDEMIOPTIM: A TOOLBOX FOR THE OPTIMIZATION OF CONTROL POLICIES IN EPIDEMIOLOGICAL MODELS

Abstract

Epidemiologists model the dynamics of epidemics in order to propose control strategies based on pharmaceutical and non-pharmaceutical interventions (contact limitation, lock-down, vaccination, etc.). Hand-designing such strategies is not trivial because of the number of possible interventions and the difficulty of predicting long-term effects. This task can be cast as an optimization problem where state-of-the-art machine learning algorithms such as deep reinforcement learning might bring significant value. However, the specificity of each domain (epidemic modelling or solving optimization problems) requires strong collaborations between researchers from different fields of expertise. This is why we introduce EpidemiOptim, a Python toolbox that facilitates collaborations between researchers in epidemiology and optimization. EpidemiOptim turns epidemiological models and cost functions into optimization problems via a standard interface commonly used by optimization practitioners (OpenAI Gym). Reinforcement learning algorithms based on Q-Learning with deep neural networks (DQN) and evolutionary algorithms (NSGA-II) are already implemented. We illustrate the use of EpidemiOptim to find optimal policies for dynamical on-off lock-down control under the optimization of death toll and economic recession using a Susceptible-Exposed-Infectious-Removed (SEIR) model for COVID-19. Using EpidemiOptim and its interactive visualization platform in Jupyter notebooks, epidemiologists, optimization practitioners and others (e.g. economists) can easily compare epidemiological models, cost functions and optimization algorithms to address important choices to be made by health decision-makers.

1. INTRODUCTION

The recent COVID-19 pandemic highlights the destructive potential of infectious diseases for our societies, especially for our health but also for our economy. To mitigate their impact, a scientific understanding of their spreading dynamics, coupled with methods that quantify the impact of intervention strategies and the associated uncertainty, is key to supporting and optimizing informed policy making. For example, in the COVID-19 context, large-scale population lock-downs were enforced based on analyses and predictions from mathematical epidemiological models (Ferguson et al., 2005; 2006; Cauchemez et al., 2019; Ferguson et al., 2020). In practice, researchers often consider a small number of relatively coarse, pre-defined intervention strategies and run calibrated epidemiological models to predict their impact (Ferguson et al., 2020). Identifying good strategies is a difficult problem for several reasons: 1) the space of potential strategies can be large, heterogeneous and multi-scale (Halloran et al., 2008); 2) their impact on the epidemic is often difficult to predict; 3) the problem is inherently multi-objective: it often involves public health objectives such as minimizing the death toll or the saturation of intensive care units, but also societal and economic sustainability. For these reasons, pre-defined strategies are bound to be suboptimal. Thus, a major challenge consists in leveraging more sophisticated and adaptive approaches to identify optimal strategies. Machine learning can be used for the optimization of such control policies, with methods ranging from deep reinforcement learning to multi-objective evolutionary algorithms. In other domains, these methods have proven efficient at finding robust control policies, especially in high-dimensional non-stationary environments with uncertainty and partial observation of the state of the system (Deb et al., 2007; Mnih et al., 2015; Silver et al., 2017; Haarnoja et al., 2018; Kalashnikov et al., 2018; Hafner et al., 2019).
Yet researchers in epidemiology, public health, economics and machine learning work in communities that rarely interact, and they often use different tools, formalizations and terminologies. We believe that tackling the major societal challenge of epidemic mitigation requires interdisciplinary collaborations organized around operational scientific tools and goals that researchers from these disciplines can both use and contribute to. To this end, we introduce EpidemiOptim, a Python toolbox that provides a framework to facilitate collaborations between researchers in epidemiology, economics and machine learning. EpidemiOptim turns epidemiological models and cost functions into optimization problems via the standard OpenAI Gym (Brockman et al., 2016) interface that is commonly used by optimization practitioners. Conversely, it provides epidemiologists and economists with easy-to-use access to a variety of deep reinforcement learning and evolutionary algorithms capable of handling different forms of multi-objective optimization under constraints. Thus, EpidemiOptim facilitates the independent update of models by specialists of each topic, while enabling others to leverage implemented models to conduct experimental evaluations. We illustrate the use of EpidemiOptim to find optimal policies for dynamical on-off lock-down control under the optimization of death toll and economic recession using an extended Susceptible-Exposed-Infectious-Removed (SEIR) model for COVID-19 from Prague et al. (2020).
Related Work. We can distinguish two main lines of contributions concerning the optimization of intervention strategies for epidemic response. On the one hand, several contributions focus on providing guidelines and identifying the range of methods available to solve the problem. For example, Yáñez et al. (2019) framed the problem of finding optimal intervention strategies for a disease spread as a reinforcement learning problem; Alamo et al. (2020) provided a road-map that goes from the access to data sources to the final decision-making step; and Shearer et al. (2020) highlighted that a decision model for epidemic response cannot capture all of the social, political, and ethical considerations that these decisions impact. These contributions reveal a major challenge for the community: developing tools that can be easily used, configured and interpreted by decision-makers. On the other hand, computational contributions proposed concrete implementations of such optimization processes. These contributions mostly differ by their definition of epidemiological models (e.g. SEIR (Yaesoubi et al., 2020) or agent-based models (Chandak et al., 2020)), of optimization methods (e.g. deterministic rules (Tarrataca et al., 2020), Bayesian optimization (Chandak et al., 2020), deep RL (Arango & Pelov, 2020) or evolutionary optimization (Miikkulainen et al., 2020)), of cost functions (e.g. fixed weighted sum of health and economic costs (Arango & Pelov, 2020), possibly adding constraints on the school closure budget (Libin et al., 2020), or multi-objective optimization (Miikkulainen et al., 2020)), of state and action spaces (e.g. using the entire observed epidemic history (Yaesoubi et al., 2020) or an image of the disease outbreak to capture the spatial relationships between locations (Probert et al., 2019)), as well as methods for representing the model decisions in a format suitable to decision-makers (e.g. simple summary state representation (Probert et al., 2019) or real-time surveillance data with decision rules (Yaesoubi & Cohen, 2016)). See Appendix A.2 for a detailed description of the aforementioned papers. Given this high diversity of potential methods in the field, our approach aims at providing a standard toolbox facilitating the comparison of different configurations along the aforementioned dimensions in order to assist decision-makers in the evaluation of the range of possible intervention strategies.
Contributions. This paper makes three contributions. First, we formalize the coupling of epidemiological models and optimization algorithms with a particular focus on the multi-objective aspect of such problems (Section 2). Second, based on this formalization, we introduce the EpidemiOptim library, a toolbox that integrates epidemiological models, cost functions, optimization algorithms and experimental tools to easily develop, study and compare epidemic control strategies (Section 3). Third, we demonstrate the utility of the EpidemiOptim library by conducting a case study on the optimization of lock-down policies for the COVID-19 epidemic (Section 4). We use a recent epidemiological model grounded on real data, cost functions based on a standard economic model of GDP loss, as well as state-of-the-art optimization algorithms, all included in the EpidemiOptim library. This is, to our knowledge, the first contribution that compares the performance of different optimization algorithms for the control of intervention strategies on the same epidemiological model. The user of the EpidemiOptim library can interact with the trained policies via Jupyter notebook, exploring the space of cost functions (health cost × economic cost) for a variety of algorithms. The code is made available anonymously at https://tinyurl.com/epidemioptim.
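To make the coupling described in this introduction more concrete, the following minimal sketch shows how an epidemiological model and two cost functions could be exposed through a Gym-style `reset`/`step` interface with an on-off lock-down action. It is an illustration under stated assumptions, not the EpidemiOptim API: the class `ToySEIREnv`, the helper `run_episode`, and all parameter values are hypothetical, the dynamics are a plain discrete-time SEIR model rather than the extended model of Prague et al. (2020), and the reward is a fixed weighted sum of the two costs. The sketch follows the classic four-value `step` return convention of Gym without importing the library.

```python
from dataclasses import dataclass


@dataclass
class ToySEIREnv:
    """Hypothetical Gym-style environment; NOT the EpidemiOptim API."""
    population: float = 1_000_000.0
    beta_free: float = 0.5       # transmission rate without lock-down (assumed)
    beta_lockdown: float = 0.1   # transmission rate under lock-down (assumed)
    sigma: float = 1 / 5         # daily E -> I rate (assumed)
    gamma: float = 1 / 10        # daily I -> R rate (assumed)
    death_rate: float = 0.01     # share of removals counted as deaths (assumed)
    lockdown_cost: float = 1.0   # economic cost per lock-down day (arbitrary unit)
    health_weight: float = 0.5   # weight of the health cost in the reward
    horizon: int = 365           # episode length in days

    def reset(self):
        """Start a new episode with 100 infectious individuals."""
        self.t = 0
        self.state = [self.population - 100.0, 0.0, 100.0, 0.0]  # S, E, I, R
        return tuple(self.state)

    def step(self, action):
        """Advance one day; action is 1 (lock-down on) or 0 (off)."""
        s, e, i, r = self.state
        beta = self.beta_lockdown if action == 1 else self.beta_free
        new_exposed = beta * s * i / self.population
        new_infectious = self.sigma * e
        new_removed = self.gamma * i
        self.state = [
            s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_removed,
            r + new_removed,
        ]
        self.t += 1
        # Both costs are reported separately so multi-objective methods can use them.
        costs = {
            "health": self.death_rate * new_removed,
            "economic": self.lockdown_cost * action,
        }
        # Single-objective reward: negative fixed weighted sum of the two costs.
        w = self.health_weight
        reward = -(w * costs["health"] + (1 - w) * costs["economic"])
        done = self.t >= self.horizon
        return tuple(self.state), reward, done, {"costs": costs}


def run_episode(env, policy):
    """Roll out a policy (observation -> action) and return cumulated costs."""
    obs = env.reset()
    totals = {"health": 0.0, "economic": 0.0}
    done = False
    while not done:
        obs, _, done, info = env.step(policy(obs))
        for name, value in info["costs"].items():
            totals[name] += value
    return totals


if __name__ == "__main__":
    env = ToySEIREnv()
    print("never lock down :", run_episode(env, lambda obs: 0))
    print("always lock down:", run_episode(env, lambda obs: 1))
    # Simple adaptive rule: lock down while more than 1% of people are infectious.
    print("threshold rule  :", run_episode(env, lambda obs: int(obs[2] > 10_000)))
```

Returning the individual costs alongside the scalar reward is one possible design choice: it lets the same environment serve both a single-objective learner that only reads the reward and a multi-objective optimizer that compares policies on the health and economic axes separately.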

