CLOPS: CONTINUAL LEARNING OF PHYSIOLOGICAL SIGNALS

Abstract

Deep learning algorithms are known to experience destructive interference when instances violate the assumption of being independent and identically distributed (i.i.d.). This violation, however, is ubiquitous in clinical settings where data are streamed temporally and from a multitude of physiological sensors. To overcome this obstacle, we propose CLOPS, a replay-based continual learning strategy. In three continual learning scenarios based on three publicly available datasets, we show that CLOPS can outperform the state-of-the-art methods, GEM and MIR. Moreover, we propose end-to-end trainable parameters, which we term task-instance parameters, that can be used to quantify task difficulty and similarity. This quantification yields insights into both network interpretability and clinical applications, where task difficulty is poorly quantified.

1. INTRODUCTION

Many deep learning algorithms operate under the assumption that instances are independent and identically-distributed (i.i.d.). The violation of this assumption can be detrimental to the training behaviour and performance of an algorithm. The assumption of independence can be violated, for example, when data are streamed temporally from a sensor. Introducing multiple sensors in a changing environment can introduce covariate shift, arguably the 'Achilles heel' of machine learning model deployment (Quionero-Candela et al., 2009). A plethora of realistic scenarios violate the i.i.d. assumption. This is particularly true in healthcare, where the multitude of physiological sensors generate time-series recordings that may vary temporally (due to seasonal diseases; e.g. flu), across patients (due to different hospitals or hospital settings), and in their modality.

Tackling the challenges posed by such scenarios is the focus of continual learning (CL), whereby a learner, when exposed to tasks in a sequential manner, is expected to perform well on current tasks without compromising performance on previously seen tasks. The outcome is a single algorithm that can reliably solve a multitude of tasks. However, most, if not all, research in this field has been limited to a small handful of imaging datasets (Lopez-Paz & Ranzato, 2017; Aljundi et al., 2019b;a). Although understandable from a benchmarking perspective, such research fails to explore the utility of continual learning methodologies in more realistic healthcare scenarios (Farquhar & Gal, 2018). To the best of our knowledge, we are the first to explore and propose a CL approach in the context of physiological signals. The dynamic and chaotic environment that characterizes healthcare necessitates the availability of algorithms that are dynamically reliable; those that can adapt to potential covariate shift without catastrophically forgetting how to perform tasks from the past.
Such dynamic reliability implies that an algorithm no longer needs to be retrained on data or tasks to which it has been exposed in the past, thus improving its data-efficiency. Moreover, algorithms that perform consistently well across a multitude of tasks are more trustworthy, a desirable trait sought by medical professionals (Spiegelhalter, 2020).

Our Contributions. In this paper, we propose a replay-based continual learning methodology that is based on the following:

1. Importance-guided storage: task-instance parameters, a scalar corresponding to each instance in each task, as informative signals for loss-weighting and buffer-storage.

2. Uncertainty-based acquisition: an active learning-inspired methodology that determines the degree of informativeness of an instance and thus acts as a buffer-acquisition mechanism.
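To make the first contribution concrete, the sketch below shows one way a learnable scalar per training instance can weight that instance's loss and be trained end-to-end alongside the network. This is an illustrative assumption of how such parameters might be wired up, not the paper's exact formulation; the class and function names, the sigmoid squashing, and the initialisation are ours.

```python
import torch
import torch.nn as nn

class TaskInstanceWeights(nn.Module):
    """One learnable scalar per training instance (a hypothetical sketch
    of task-instance parameters). The scalars receive gradients through
    the loss, so instances the network struggles with can be identified
    and later prioritised for buffer storage."""

    def __init__(self, num_instances: int):
        super().__init__()
        # initialised to 0 so that sigmoid(0) = 0.5 for every instance
        self.s = nn.Parameter(torch.zeros(num_instances))

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        # squash to (0, 1) so the per-instance weights stay bounded
        return torch.sigmoid(self.s[idx])

def weighted_loss(logits, targets, weights):
    # weight each instance's cross-entropy term by its learned scalar
    per_instance = nn.functional.cross_entropy(logits, targets, reduction="none")
    return (weights * per_instance).mean()

# toy usage: a dataset of 8 instances, a batch of 4, 3 classes
torch.manual_seed(0)
tip = TaskInstanceWeights(num_instances=8)
logits = torch.randn(4, 3, requires_grad=True)   # stand-in network output
targets = torch.tensor([0, 1, 2, 0])
idx = torch.tensor([0, 1, 2, 3])                 # which instances are in the batch

loss = weighted_loss(logits, targets, tip(idx))
loss.backward()  # gradients flow into both the network and the scalars
```

Because the scalars are ordinary parameters, ranking them at the end of a task gives a cheap importance signal for deciding which instances to store in the replay buffer.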



Figure 1: Illustration of the three continual learning scenarios. A network is sequentially exposed to tasks (Class-IL) with mutually-exclusive pairs of classes, (Time-IL) with data collected at different times of the year, and (Domain-IL) with data from different input modalities.

2. RELATED WORK

Continual learning (CL) approaches have resurfaced in recent years (Parisi et al., 2019). Those similar to ours comprise memory-based methods such as iCaRL (Rebuffi et al., 2017), CLEAR (Rolnick et al., 2019), GEM (Lopez-Paz & Ranzato, 2017), and A-GEM (Chaudhry et al., 2018). In contrast to our work, the latter two methods naively populate their replay buffer with the last m examples observed for a particular task. Isele & Cosgun (2018) and Aljundi et al. (2019b) employ a more sophisticated buffer-storage strategy in which a quadratic programming problem is solved in the absence of task boundaries. Aljundi et al. (2019a) introduce MIR, whereby instances are stored using reservoir sampling and sampled according to whether they would incur the greatest change in loss if parameters were updated on the subsequent task. This approach is computationally expensive, requiring multiple forward and backward passes per batch. The application of CL in the medical domain is limited to that of Lenga et al. (2020), wherein existing methodologies are implemented on chest X-ray datasets. In contrast to previous research that independently investigates buffer-storage and buffer-acquisition strategies, we focus on a dual storage and acquisition strategy.

Active learning (AL) in healthcare has seen increased interest in recent years, with a review of methodologies provided by Settles (2009). For example, Gong et al. (2019) propose a Bayesian deep latent Gaussian model to acquire important features from electronic health record (EHR) data in MIMIC (Johnson et al., 2016) to improve mortality prediction. In dealing with EHR data, Chen et al. (2013) use the distance of unlabelled samples from the hyperplane in an SVM to acquire datapoints. Wang et al. (2019) implement an RNN to acquire ECG samples during training. Zhou et al. (2017) perform transfer learning in conjunction with a convolutional neural network to acquire biomedical images in an online manner. Smailagic et al. (2018; 2019) actively acquire unannotated medical images by measuring their distance in a latent space to images in the training set. Such similarity metrics, however, are sensitive to the amount of available labelled training data. Gal et al. (2017) adopt BALD (Houlsby et al., 2011) with Monte Carlo Dropout to acquire instances that maximize the Jensen-Shannon divergence (JSD) across MC samples. To the best of our knowledge, we are the first to employ AL-inspired acquisition functions in the context of CL.
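The BALD score referenced above can be written as the generalized Jensen-Shannon divergence of the predictive distributions obtained from several stochastic (MC-dropout) forward passes: the entropy of the mean prediction minus the mean entropy of the individual predictions. A minimal NumPy sketch follows; the function name and the toy inputs are our own illustrative choices.

```python
import numpy as np

def bald_score(mc_probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """mc_probs: array of shape (T, N, C) holding class probabilities
    from T MC-dropout forward passes over N instances with C classes.
    Returns per-instance BALD scores:
        H( mean_t p_t ) - mean_t H( p_t ),
    which is the generalized JSD of the T predictive distributions."""
    mean_p = mc_probs.mean(axis=0)                               # (N, C)
    h_of_mean = -(mean_p * np.log(mean_p + eps)).sum(-1)         # (N,)
    mean_of_h = -(mc_probs * np.log(mc_probs + eps)).sum(-1).mean(0)
    return h_of_mean - mean_of_h

# toy example: T=4 passes, N=2 instances, C=2 classes
mc = np.empty((4, 2, 2))
mc[:, 0] = [0.9, 0.1]                       # instance 0: passes agree
mc[:, 1] = [[0.9, 0.1], [0.1, 0.9],
            [0.9, 0.1], [0.1, 0.9]]         # instance 1: passes disagree
scores = bald_score(mc)
```

An instance whose MC predictions disagree (model uncertainty) scores high and would be prioritised for acquisition, while an instance the model is consistently confident about scores near zero even if its predicted class probabilities are themselves uncertain.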

