META-ACTIVE LEARNING IN PROBABILISTICALLY-SAFE OPTIMIZATION

Abstract

Learning to control a safety-critical system with latent dynamics (e.g. for deep brain stimulation) requires judiciously taking calculated risks to gain information. We present a probabilistically-safe, meta-active learning approach to efficiently learn system dynamics and optimal configurations. The key to our approach is a novel integration of meta-learning and chance-constrained optimization in which we 1) meta-learn an LSTM-based embedding of the active learning sample history, 2) encode a deep learning-based acquisition function with this embedding into a mixed-integer linear program (MILP), and 3) solve the MILP to find the optimal action trajectory, trading off the predicted information gain from the acquisition function and the likelihood of safe control. We set a new state-of-the-art in active learning to control a high-dimensional system with latent dynamics, achieving a 46% increase in information gain and a 20% speedup in computation time. We then outperform baseline methods in learning the optimal parameter settings for deep brain stimulation in rats to enhance the rats' performance on a cognitive task while safely avoiding unwanted side effects (i.e., triggering seizures).
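The core trade-off described above — maximizing predicted information gain while keeping the probability of safe control above a threshold — can be illustrated with a toy example. The sketch below is not the paper's MILP formulation; it simply enumerates a small, hypothetical discrete action set (the MILP solves this selection problem at scale, over action trajectories), and all scores and probabilities are made up for illustration.

```python
# Toy sketch of chance-constrained action selection (assumed values, not the
# paper's implementation). The MILP in the paper optimizes over trajectories;
# here we enumerate four candidate actions for clarity.

actions = ["a1", "a2", "a3", "a4"]
info_gain = {"a1": 0.9, "a2": 0.7, "a3": 0.5, "a4": 0.2}    # acquisition scores
p_safe    = {"a1": 0.80, "a2": 0.97, "a3": 0.99, "a4": 0.999}
delta = 0.05  # allowed violation probability: require P(safe) >= 1 - delta

# Chance constraint: keep only actions whose safety probability clears 1 - delta.
feasible = [a for a in actions if p_safe[a] >= 1.0 - delta]

# Objective: among probabilistically-safe actions, maximize information gain.
best = max(feasible, key=lambda a: info_gain[a])
print(best)  # -> "a2": a1 gains more information but violates the safety bound
```

Note how the chance constraint rules out the most informative action (a1, with only an 80% safety probability), forcing the calculated-risk behavior the abstract describes.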

1. INTRODUCTION

Safe and efficient control of novel systems with latent dynamics is an important objective in domains from healthcare to robotics. In healthcare, deep brain stimulation devices implanted in the brain can improve memory deficits in patients with Alzheimer's disease (Posporelis et al., 2018), and responsive neurostimulators (RNS) can counter epileptiform activity to mitigate seizures. Yet, the surgeon's trial-and-error process of finding effective RNS parameters for each patient is time-consuming and risky, with poor device settings possibly damaging the brain. Researchers studying active learning and Bayesian optimization have sought to develop algorithms to efficiently and safely learn a system's dynamics, e.g., learning a brain's dynamics for RNS configuration (Ashmaig et al., 2018; Sui et al., 2018). However, because these algorithms fail to scale up to higher-dimensional state-action spaces, researchers utilize only simple voltage and frequency controls rather than all 32 channels of the RNS waveform (Ashmaig et al., 2018). Similarly, tasks in robotics, such as learning the dynamics of a novel robotic system (e.g., an autopilot learning to fly a damaged aircraft), require active learning methods that succeed in higher-dimensional domains. In this paper, we develop a probabilistically-safe, meta-active learning approach that tackles these challenging tasks by efficiently learning system dynamics and optimal configurations. We draw inspiration from recent contributions in meta-learning (Finn et al., 2017; Nagabandi et al., 2019; Wang et al., 2016; Andrychowicz et al., 2016) that seek to leverage a distribution over training tasks to optimize the parameters of a neural network for efficient, online adaptation. Researchers have previously investigated meta-learning for active learning, e.g., learning a Bayesian prior over a Gaussian process (Wang et al., 2018b) for the acquisition function.
However, these approaches do not consider the important problem of safely and actively learning to control a system with altered dynamics, which is a requirement for safety-critical robotic applications. Furthermore, as we show in Section 5, the performance of prior active learning approaches (Kirsch et al., 2019; Hastie et al., 2017) leaves much to be desired on challenging control tasks in healthcare and robotics. We seek to overcome these key limitations of prior work by harnessing the power of meta-learning for active learning: we encode a learned representation of the sample history within a chance-constrained optimization framework for safe, online adaptation. Instead of hand-engineering an acquisition

