DEEP REINFORCED ACTIVE LEARNING FOR MULTI-CLASS IMAGE CLASSIFICATION

Abstract

High-accuracy medical image classification can be limited by the cost of acquiring more data as well as the time and expertise needed to label existing images. In this paper, we apply active learning, a method which aims to maximise model performance using a minimal labelled subset of a larger pool of data, to medical image classification. We present a new active learning framework, based on deep reinforcement learning, which learns a query strategy for labelling images based on predictions from a convolutional neural network. Our framework modifies the deep-Q network formulation so that data can additionally be picked based on geometric arguments in the latent space of the classifier. This allows for high-accuracy multi-class classification in a batch-based active learning setting and enables the agent to label datapoints that are both diverse and about which it is most uncertain. We apply our framework to two medical imaging datasets and compare with standard query strategies as well as the most recent reinforcement learning based active learning approach for image classification.

1. INTRODUCTION

Modern methods in machine learning (ML), including deep learning (DL) frameworks, require large amounts of labelled data to train well enough to obtain high performance. Depending on the training task, these data can be very expensive to obtain or annotate, to the extent that traditional approaches become prohibitively costly. Active learning (AL) aims to alleviate this problem by adaptively selecting the training samples with the highest value, in order to construct a minimal training dataset containing the most information for the ML model. To select the most informative samples, a query strategy is applied in each AL cycle; such strategies can either be constructed from knowledge of the specific problem being learnt, or use theoretical criteria to approximate mathematical bounds on the information contained in the data. Standard query strategies in AL include uncertainty-based approaches (Lewis and Gale, 1994; Lewis and Catlett, 1994; Shannon, 1948; Scheffer et al., 2001; Esuli and Sebastiani, 2009; Seung et al., 1992; Dagan and Engelson, 1995), which aim to quantify the model uncertainty about the samples to be selected using different hand-crafted heuristics. Other approaches aim to estimate the expected model change (Roy and McCallum, 2001; Freytag et al., 2014), or employ diversity-based approaches to promote diversity in sampling (Bilgic and Getoor, 2009; Gal et al., 2017; Nguyen and Smeulders, 2004). Some approaches combine different techniques in hybrid query strategies, to take into account both the uncertainty and the diversity of query samples (Ash et al., 2019; Zhdanov, 2019; Shui et al., 2019; Beluch et al., 2018).
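As an illustration of the hand-crafted uncertainty heuristics mentioned above, the following minimal sketch scores a pool of unlabelled samples with three textbook measures (least confidence, margin and Shannon entropy) and returns the indices of the most uncertain ones. It is not the method proposed in this paper; the function names and the toy probability vectors are assumptions made purely for illustration.

```python
import math
from typing import Callable, List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """One minus the probability of the most likely class."""
    return 1.0 - max(probs)


def margin_score(probs: Sequence[float]) -> float:
    """One minus the gap between the two most likely classes.

    Requires at least two classes; a small gap gives a high score.
    """
    top = sorted(probs, reverse=True)
    return 1.0 - (top[0] - top[1])


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(
    pool_probs: Sequence[Sequence[float]],
    k: int,
    score: Callable[[Sequence[float]], float] = entropy,
) -> List[int]:
    """Return the indices of the k pool samples with the highest score."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: score(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical softmax outputs for three unlabelled images (three classes).
pool = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.34, 0.33, 0.33],  # almost uniform, highly uncertain
    [0.60, 0.30, 0.10],  # moderately uncertain
]
queried = select_most_uncertain(pool, k=2)
```

Each of these scores is a fixed rule applied to the classifier output, which is what distinguishes such heuristics from a query strategy that is itself learnt.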
Other methods leverage the exploration-exploitation trade-off and reformulate the AL framework as a bandit problem (Hsu and Lin, 2015; Chu and Lin, 2016) or a reinforcement learning problem (Ebert et al., 2012; Long and Hua, 2015; Konyushkova et al., 2017a). These are, however, still limited by their reliance on hand-crafted strategies, as opposed to learning a new one. The move towards combining deep learning with active learning, which pairs the learning capability of the former on high-dimensional data with the data efficiency of the latter, has led to further method development. However, combining the two is non-trivial: traditional active learning query strategies label samples one by one, whereas batch-mode deep active learning uses batch-based sample querying (Gal and Ghahramani, 2015; Gal et al., 2017; Kirsch et al., 2019; Cardoso et al., 2017) to ensure efficiency in sampling the data. Modern diversity-based approaches in deep active learning include the coreset approach (Sener and Savarese, 2018; Killamsetty et al., 2020; Wei et al., 2015; Shen et al., 2017; Mirzasoleiman et al., 2019), which aims to minimise the Euclidean distance between the sampled and unsampled data points in the latent space of the trained model. Whilst the coreset approach has been shown to work well for image classification tasks (Sener and Savarese, 2018), its performance deteriorates as the number of classes grows. Furthermore, as the dimensionality of the data grows, the distance measure between data points becomes indistinct due to the curse of dimensionality. Semi-supervised approaches (Sinha et al., 2019; Kim et al., 2020; Zhang et al., 2020) aim to alleviate this issue by using an adversarial network as a sampling strategy to pick the data with the largest amount of information in the latent space.
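The coreset idea described above is commonly approximated with a greedy k-center selection: repeatedly pick the unlabelled point that lies furthest, in latent space, from every point already selected. The sketch below shows this greedy rule on toy two-dimensional features. It is only an illustration under stated assumptions (the function names, the toy feature vectors and the choice of plain Euclidean distance are ours) and is not the framework proposed in this paper.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two latent feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_center_greedy(
    features: Sequence[Sequence[float]],
    labelled: Sequence[int],
    budget: int,
) -> List[int]:
    """Greedily pick up to `budget` indices that are far from all centres.

    `labelled` holds the indices of points that are already labelled and
    must be non-empty; these act as the initial centres.
    """
    if not labelled:
        raise ValueError("at least one labelled point is required")

    # Distance from every point to its nearest current centre.
    min_dist = [
        min(euclidean(f, features[j]) for j in labelled) for f in features
    ]

    chosen: List[int] = []
    for _ in range(budget):
        # The point furthest from all centres is the least well covered.
        idx = max(range(len(features)), key=lambda i: min_dist[i])
        if min_dist[idx] == 0.0:
            break  # every remaining point coincides with a centre
        chosen.append(idx)
        # The new centre may now be the nearest one for some points.
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], euclidean(f, features[idx]))
    return chosen


# Hypothetical latent features; index 0 is the only labelled point.
latent = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [10.0, 1.0], [5.0, 5.0]]
batch = k_center_greedy(latent, labelled=[0], budget=2)
```

Because the rule depends only on pairwise distances, it promotes diversity but ignores model uncertainty, and it inherits the weakness noted above when distances become less informative in high dimensions.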
Manually designing the DL models in addition to the AL query strategies requires both expert knowledge of the task at hand and substantial compute resources to train the DL model. Furthermore, as the labelling heuristic is generally specific to the dataset of interest, there is little likelihood that the learnt acquisition function is transferable to other datasets, whereas one based on a meta-learning approach may be more easily applied to other data domains. In this paper we combine active learning, deep learning and reinforcement learning into an end-to-end framework which can automate the design of the acquisition function for active learning with high-dimensional, multi-class data, in a pool-based active learning setting. We additionally introduce a batched-labelling approach, which lets us label multiple datapoints at each step and allows for much more efficient training. By employing coreset-inspired methods, we encourage the reinforcement learning agent to label samples which maximise uncertainty and are also diverse. We apply our model to medical image classification datasets, covering binary and multi-label classification problems as well as differing imaging modalities. We add a range of noise to the images, in order to simulate a real-world annotation setting, and show that our framework is robust to high levels of noise. We compare the classification accuracies to a wide range of sampling methods, including that of Konyushkova et al. (2018), the framework most similar to ours, and show that we outperform all other strategies.

2. RELATED WORK

Recent papers on combining active learning with reinforcement learning aim instead to learn a policy for labelling data from the unlabelled pool in order to maximise model performance. Bachman et al. (2017) and Liu et al. (2018) use information gathered from an expert oracle to learn the policy, whilst Pang et al. (2018) and Padmakumar et al. (2018) use policy gradient methods to learn the acquisition function. Current papers which aim to combine deep reinforcement learning with active learning are not able to label more than one datapoint per step (Konyushkova et al., 2018; Woodward and Finn, 2017; Pang et al., 2018; Padmakumar et al., 2018; Bachman et al., 2017; Liu et al., 2018), with the exception of Casanova et al. (2020), which labels batches of pixels for semantic segmentation tasks. Many existing works on reinforced active learning focus on the simpler task of stream-based active learning (Fang et al., 2017; Woodward and Finn, 2017) or are limited to binary tasks (Konyushkova et al., 2018; Pang et al., 2018; Liu et al., 2019), which limits their application to more general and difficult classification tasks. Haussmann et al. (2019) learn an acquisition function using a Bayesian neural network which is layered onto a bootstrapped existing heuristic. In this work we focus on medical image classification tasks, which have attracted attention in the field of deep active learning due to the cost of acquiring medical image data as well as the relatively small size of the datasets. Image segmentation tasks in active learning have traditionally used hand-crafted uncertainty-based acquisition functions (Wen et al., 2018; Smailagic et al., 2018; Konyushkova et al., 2016; Gal and Ghahramani, 2015; Gal et al., 2017; Yang et al., 2017; Ozdemir et al., 2018).
Generative adversarial network (GAN) based methods, used widely for image synthesis, have been used to add informative labelled data to limited training sets, which is directly applicable to active learning scenarios (Zhao et al., 2019; Mahapatra et al., 2018; Last et al., 2020). There has recently been some research applying meta-learning to medical image tasks in an active learning setting: MedSelect (Smit et al., 2021) uses reinforcement learning to label medical images, and Konyushkova et al. (2017b) use a regression model to learn a query strategy using greedy selection.

