ACTIVE LEARNING FOR OBJECT DETECTION WITH EVIDENTIAL DEEP LEARNING AND HIERARCHICAL UNCERTAINTY AGGREGATION

Abstract

Despite the huge success of object detection, training a detector still requires an immense amount of labeled data. Although various active learning solutions for object detection have been proposed, most existing works do not take advantage of epistemic uncertainty, which is an important metric for capturing the usefulness of a sample. In addition, previous works pay little attention to the attributes of each bounding box (e.g., nearest object, box size) when computing the informativeness of an image. In this paper, we propose a new active learning strategy for object detection that overcomes these shortcomings. To make use of epistemic uncertainty, we adopt evidential deep learning (EDL) and propose a new module, termed the model evidence head (MEH), that makes EDL highly compatible with object detection. Based on the computed epistemic uncertainty of each bounding box, we propose hierarchical uncertainty aggregation (HUA) for obtaining the informativeness of an image. HUA realigns all bounding boxes into multiple levels based on their attributes and aggregates uncertainties in a bottom-up order to effectively capture the context within the image. Experimental results show that our solution outperforms existing state-of-the-art methods by a considerable margin.

1. INTRODUCTION

Deep learning has driven major advances in computer vision problems such as semantic segmentation (Long et al., 2015; Ronneberger et al., 2015; Chen et al., 2018) and object detection (Liu et al., 2016; Lin et al., 2017; Redmon et al., 2016). However, training a deep neural network typically requires a large labeled dataset. Labeling data for complex vision problems demands intensive labor from human experts, which makes preparing for practical applications challenging. Active learning, which gradually labels a set of samples selected by their informativeness (e.g., uncertainty), is a promising solution to this problem due to its simplicity and high performance. Although active learning has been extensively studied for image classification, only a few prior works have focused on object detection (Yuan et al., 2021; Su et al., 2020; Haussmann et al., 2020; Yu et al., 2021) despite its practical importance.

Furthermore, existing works on active learning for object detection have two limitations. First, when computing the informativeness of an image, most previous works use only aleatoric uncertainty and do not take epistemic uncertainty into account. Epistemic uncertainty, also known as knowledge uncertainty, captures the model's lack of knowledge (caused by a lack of data) and can be reduced when large amounts of data are available. Aleatoric uncertainty, on the other hand, captures the noise inherent in the observed data and is irreducible. As stated in (Nguyen et al., 2022; Hafner et al., 2018; Hüllermeier & Waegeman, 2021), epistemic uncertainty can reflect the usefulness of samples and support active learning better than aleatoric uncertainty. Second, previous works on active learning for object detection generally ignore the attributes of bounding boxes (e.g., nearest object, box size) when computing the informativeness of an image: informativeness is often defined as the maximum or mean of the uncertainty values of all bounding boxes in the image. This is problematic because a cluttered image with many objects belonging to various categories can be assigned an uncertainty value similar to that of a simple image with only a few objects belonging to a single category.
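To make the second limitation concrete, the following is a minimal sketch of the max and mean aggregation rules described above. It is not taken from any of the cited works; the function name `image_informativeness` and the per-box uncertainty values are hypothetical and chosen only for illustration.

```python
import numpy as np


def image_informativeness(box_uncertainties, mode="max"):
    """Collapse per-box uncertainty values into one image-level score.

    Illustrative only: `mode` selects the max or mean rule mentioned in
    the text. No bounding-box attribute (size, neighbors, class) is used.
    """
    u = np.asarray(box_uncertainties, dtype=float)
    if u.size == 0:
        return 0.0  # assumption: an image with no detections scores zero
    if mode == "max":
        return float(u.max())
    if mode == "mean":
        return float(u.mean())
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical per-box uncertainties (not from any experiment).
cluttered = [0.75, 0.75, 0.75, 0.75, 0.75, 0.75]  # six uncertain objects
simple = [0.75]                                    # one uncertain object

for mode in ("max", "mean"):
    print(mode,
          image_informativeness(cluttered, mode),
          image_informativeness(simple, mode))
```

Under both rules the two hypothetical images receive the same score, even though labeling the cluttered image would resolve six uncertain detections rather than one. This is the kind of lost context that an attribute-aware aggregation is meant to recover.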

