BRIDGING BETWEEN POOL- AND STREAM-BASED ACTIVE LEARNING WITH TEMPORAL DATA COHERENCE

Anonymous authors
Paper under double-blind review

Abstract

Active learning (AL) reduces the amount of labeled data needed to train a machine learning model by intelligently choosing which instances to label. Classic pool-based AL requires all data to be present in a datacenter, which can be challenging given the increasing amounts of data needed in deep learning. However, AL on mobile devices and robots such as autonomous cars can filter the data from perception sensor streams before it even reaches the datacenter. In our work, we investigate AL for such image streams and propose a new concept exploiting their temporal properties. We define three methods using a pseudo uncertainty based on loss learning (Yoo & Kweon, 2019). The first considers the temporal change of uncertainty and requires 5% less labeled data than the vanilla approach. The second method extends it with the change of position in latent space. The third method, temporal distance loss stream (TDLS), combines both with submodular optimization. In our evaluation on an extension of the public Audi Autonomous Driving Dataset (Geyer et al., 2020), we outperform state-of-the-art approaches using 1% fewer labels. Additionally, we compare our stream-based approaches with existing approaches for AL in a pool-based scenario. Our experiments show that, although pool-based AL has more data access, our stream-based AL approaches need 0.5% fewer labels.

1. INTRODUCTION

Active learning (AL) is a technique to minimize the labeling effort, in which a machine learning model itself chooses the data to be labeled. It can be divided into two main scenarios, pool-based and stream-based AL (Settles, 2010). Pool-based AL is a cyclic process of selecting batches of the most promising samples from a pool of data based on a query function. The model is retrained after the selection to start the next iteration of the AL cycle. The data pool is stored such that all samples are always accessible. In contrast, stream-based AL assumes an inflow of samples as a stream, and the model decides whether a sample should be saved and labeled or discarded. In classic stream-based AL the model is trained with each selected sample (Settles, 2010). In deep learning, however, samples are usually selected in batches due to the long training time of the models. This comes with the risk of selecting samples with equal information gain. Most approaches ignore this fact or mitigate it by using a small selection batch size.

Besides the scenario, the selection method, also called querying strategy, is another important factor of AL methods. There are three main categories of AL algorithms: uncertainty-based, diversity-based, and learning-based AL (Ren et al., 2022). The first group comprises uncertainty-based AL methods, including, for example, Monte Carlo (MC) dropout methods (Gal & Ghahramani, 2016) or methods approximating the uncertainty using ensembles (Beluch et al., 2018). The second group comprises diversity-based methods like Coreset (Sener & Savarese, 2018) or diverse embedded gradients (Ash et al., 2020). These methods select samples based on dataset coverage. The third group comprises learning-based approaches. These methods, like loss learning (Yoo & Kweon, 2019), train an additional model that either predicts a value determining the usefulness of a sample or decides directly whether a sample should be selected.
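To make the pool-based cycle concrete, the query step of an uncertainty-based strategy can be sketched as follows. This is a minimal illustration, not any of the cited methods: it assumes softmax outputs are available for the whole pool, uses predictive entropy as the uncertainty score, and the function names are ours.

```python
import numpy as np

def predictive_entropy(probs):
    """Entropy of the softmax outputs, one score per pool sample.

    probs: array of shape (n_samples, n_classes), rows sum to 1.
    """
    eps = 1e-12  # avoids log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

def query_pool(probs, batch_size):
    """Return the indices of the `batch_size` most uncertain samples."""
    scores = predictive_entropy(probs)
    return np.argsort(scores)[::-1][:batch_size]

# toy pool: sample 1 is maximally uncertain, sample 0 is confident
probs = np.array([[0.90, 0.10],
                  [0.50, 0.50],
                  [0.60, 0.40]])
print(query_pool(probs, 2))  # -> [1 2]
```

After the selected samples are labeled, the model is retrained and the cycle repeats on the remaining pool.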
Recent learning-based approaches often include unlabeled data for unsupervised training. Other approaches taking diversity into account usually perform an optimization that requires constant access to the complete labeled and unlabeled dataset. This decreases the number of needed samples as intended, but the access to unlabeled data makes the transfer to a stream-based scenario impossible. A large body of research in the perception domain focuses on pool-based AL, which requires the transfer of all data to a datacenter. Especially in autonomous driving, AL is already an important research topic (Feng et al., 2019; Hekimoglu et al., 2022). However, data logistics and data preparation limit the possibilities to apply and scale this approach to open-world perception problems, where a lot of data is required. These perception tasks include autonomous driving, robotic perception, and environmental sensing. In contrast to pool-based AL, stream-based AL can run directly on the mobile devices used in these applications and enables data collection through a large number of agents without a prior transfer to the datacenter. By performing AL on a mobile robot, it can be applied directly to temporally coherent camera streams, which reduces preprocessing efforts. Based on these considerations, we focus on stream-based AL for temporally coherent data. Our contribution can be summarized as follows: We suggest a novel concept of incorporating temporal information into AL, especially stream-based AL. Our concept exploits the temporal change of uncertainty and distance in latent space. Based on this, we propose three methods and compare them with state-of-the-art methods on a classification task, the most commonly used task to benchmark AL. For this evaluation, we create an operational domain detection dataset by adding scene annotations to the Audi Autonomous Driving Dataset (A2D2) (Geyer et al., 2020).
Further, we give an overview of the necessary steps to transform a pool-based scenario into a stream-based scenario and perform, to the best of our knowledge, the first direct comparison between stream-based and pool-based AL methods.
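As an illustrative sketch of the temporal-change idea (our simplification, not the exact selection rule of the proposed methods), a stream-based selector could flag frames whose pseudo uncertainty changes strongly between consecutive frames; the threshold below is a hypothetical tuning parameter.

```python
def select_by_uncertainty_change(uncertainties, threshold):
    """Flag stream frames whose pseudo uncertainty jumps relative to
    the previous frame. The scores are consumed in stream order, so
    no access to past or future raw data is required."""
    selected = []
    prev = None
    for t, u in enumerate(uncertainties):
        if prev is not None and abs(u - prev) > threshold:
            selected.append(t)
        prev = u
    return selected

# temporally coherent stream: the scene changes around frame 3
stream = [0.10, 0.12, 0.11, 0.60, 0.58]
print(select_by_uncertainty_change(stream, threshold=0.2))  # -> [3]
```

On a temporally coherent stream, consecutive frames of an already well-understood scene yield stable uncertainty, so such a rule keeps redundant frames out of the selected batch.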

2. RELATED WORK

While a lot of authors have done extensive research in the field of pool-based AL, stream-based AL has become less popular with the rise of deep learning. However, as the number of vision sensors receiving constant data streams increases, so does the cost of transferring these data to a datacenter. This makes research on stream-based AL techniques interesting, as not all data can be transferred to the datacenter to perform pool-based AL.

2.1. POOL-BASED ACTIVE LEARNING

Sener & Savarese (2018) defined AL as a core set selection problem. The authors aim to select samples that minimize the maximum distance to the non-selected points. In this way, it can be formulated as a k-center problem. Solving this exactly is quite costly, so the authors suggested using a greedy algorithm to approximate the k-center problem. The method will be further denoted as Coreset. In batch active learning by diverse gradient embeddings (Badge), Ash et al. (2020) extended the diversity idea by taking the prediction uncertainty into account. The authors combined a distance representation of the latent space with pseudo labels based on the highest one-hot encoded value to generate gradients. These are created for every class such that the dimension of the embedding is higher than in the Coreset (Sener & Savarese, 2018) approach. The optimal set is estimated using greedy optimization algorithms. An uncertainty-based approach is MC dropout as a Bayesian approximation (Gal & Ghahramani, 2016). The method uses several dropout layers that are active during the prediction phase. By performing multiple forward passes, a distribution over the class predictions is generated, to which the authors applied the mutual information function in order to calculate the uncertainty of the samples. This is often combined with the Bayesian active learning by disagreement (Bald) metric (Houlsby et al., 2011), which considers the mutual information across the multiple forward passes. Their approach was modified by Kirsch et al. (2019) to take the diversity of the selected batch into account by calculating the joint mutual information. With their BatchBald approach, the authors reduced the number of selected samples with redundant information in a batch. In contrast to such sampling-based approaches, loss learning (Yoo & Kweon, 2019) is a learning-based approach that needs only one forward pass.
By adding a loss module to specific layers of the prediction network, the authors predicted the network's loss and used it as a pseudo uncertainty for sample selection. However, the loss module can only predict a relative loss. The authors showed the flexibility of the approach on several tasks, which makes it quite popular. Novel learning-based methods like variational adversarial active learning (VAAL) (Sinha et al., 2019) use the unlabeled data as well. An autoencoder is trained to learn a latent space representation of the data based on the labeled and unlabeled sets. Based on this latent space encoding, a discriminator model is trained to distinguish between labeled and unlabeled data.
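The greedy approximation of the k-center problem behind Coreset (Sener & Savarese, 2018) can be sketched as follows. This is a minimal NumPy illustration using Euclidean distances in latent space; the function and variable names are ours, and a real implementation would operate on learned network embeddings.

```python
import numpy as np

def greedy_k_center(features, labeled_idx, budget):
    """Greedy approximation of the k-center selection: repeatedly
    pick the pool point farthest from the current labeled set.

    features: (n, d) latent representations of all samples.
    labeled_idx: indices of the already labeled samples.
    budget: number of new samples to select.
    """
    # distance of every point to its nearest labeled point
    dists = np.full(len(features), np.inf)
    for i in labeled_idx:
        dists = np.minimum(dists, np.linalg.norm(features - features[i], axis=1))
    chosen = []
    for _ in range(budget):
        j = int(np.argmax(dists))  # farthest point = largest coverage gain
        chosen.append(j)
        dists = np.minimum(dists, np.linalg.norm(features - features[j], axis=1))
    return chosen

# toy latent space: two clusters, point 0 already labeled
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(greedy_k_center(feats, labeled_idx=[0], budget=1))  # -> [3]
```

Each greedy step adds the point with the current worst coverage, which yields a provable 2-approximation of the optimal k-center solution while avoiding the cost of solving the problem exactly.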

