WEAKLY SUPERVISED KNOWLEDGE TRANSFER WITH PROBABILISTIC LOGICAL REASONING FOR OBJECT DETECTION

Abstract

Training object detection models usually requires instance-level annotations, such as the positions and labels of all objects present in each image. Such supervision is unfortunately not always available and, more often, only image-level information is provided, also known as weak supervision. Recent works have addressed this limitation by leveraging knowledge from a richly annotated domain. However, the scope of weak supervision supported by these approaches has been very restrictive, preventing them from using all available information. In this work, we propose ProbKT, a framework based on probabilistic logical reasoning that allows training object detection models with arbitrary types of weak supervision. We empirically show on different datasets that using all available information is beneficial, as ProbKT leads to significant improvement on the target domain and better generalization compared to existing baselines. We also showcase the ability of our approach to handle complex logical statements as a supervision signal.

1. INTRODUCTION

Object detection is a fundamental ability of numerous high-level machine learning pipelines such as autonomous driving [4; 16], augmented reality [42] or image retrieval [17]. However, training state-of-the-art object detection models generally requires detailed image annotations, such as the box coordinates and labels of every object present in each image. While several large benchmark datasets with detailed annotations are available [26; 15], providing such detailed annotation for new, specific datasets comes at a significant cost that is often not affordable for many applications. More frequently, datasets come with only limited annotation, also referred to as weak supervision. This has sparked research in weakly supervised object detection approaches [25; 6; 40], using techniques such as multiple instance learning [40] or variations of class activation maps [3]. However, these approaches have been shown to significantly underperform their fully supervised counterparts in terms of robustness and accurate localization of objects [39]. An appealing and intuitive way to improve the performance of weakly supervised object detection is to transfer knowledge from an existing object detection model pre-trained on a fully annotated dataset [14; 46; 43]. This approach, also referred to as transfer learning or domain adaptation, consists in leveraging transferable knowledge from the pre-trained model (such as bounding box prediction capabilities) to the new, weakly supervised domain. This transfer has been embodied in different ways in the literature. Examples include simply fine-tuning the classifier of bounding box proposals of the pre-trained model [43], or iteratively relabeling the weakly supervised dataset and retraining a new full object detection model on the re-labeled data [46]. However, existing approaches are very restrictive in the type of weak supervision they are able to harness.
Indeed, some do not support new object classes in the new domain [20], while others can only use a label indicating the presence of an object class [46]. In practice, however, supervision on the new domain can come in very different forms. For instance, the count of each object class can be given, as in atom detection from molecule images where only the chemical formula might be available. Or, when many objects are present in an image, a range can be provided instead of an exact class count (e.g., "there are at least 4 cats in this image"). Crucially, this variety of potential supervisory signals on the target domain cannot be fully utilized by existing domain adaptation approaches. To address this limitation, we introduce ProbKT, a novel framework that generalizes knowledge transfer in object detection to arbitrary types of weak supervision using neural probabilistic logical reasoning [27]. This paradigm connects the probabilistic outputs of neural networks with logical rules and infers the resulting probability of particular queries. One can then evaluate the probability of a query such as "the image contains at least two animals" and differentiate through the probabilistic engine to train the underlying neural network. Our approach allows for arbitrarily complex logical statements and therefore supports weak supervision such as class counts or ranges, among others. To our knowledge, this is the first approach to allow such versatility in utilizing the available information on the new domain. To assess the capabilities of this framework, we provide an extensive empirical analysis on multiple object detection datasets. Our approach supports any type of object detection backbone architecture. We thus use two popular backbone architectures, DETR [7] and RCNN [34], and evaluate their performance in terms of accuracy, convergence, as well as generalization on out-of-distribution data.
Our experiments show that, due to its ability to use the complete supervisory signal, our approach outperforms previous works in a wide range of setups.
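To make the idea of differentiating through a count-based query concrete, the following is a minimal sketch (not the actual probabilistic-logic machinery used by ProbKT): given independent per-object class probabilities produced by a detection head, the probability of a query such as "at least two cats" can be computed exactly with a dynamic program over the count distribution. The function names and probability values below are hypothetical.

```python
def count_distribution(probs):
    """Return [P(count = 0), ..., P(count = N)] for the number of objects
    belonging to a class, given independent per-object probabilities."""
    dist = [1.0]  # with zero objects considered, the count is 0 surely
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)  # this object is not of the class
            new[k + 1] += q * p      # this object is of the class
        dist = new
    return dist

def prob_at_least(probs, k):
    """Probability of the weak-supervision query
    'at least k objects of the class are present in the image'."""
    return sum(count_distribution(probs)[k:])

# Hypothetical per-object probabilities of being a "cat",
# e.g. softmax outputs of a detection head:
p_cat = [0.9, 0.8, 0.1]
print(prob_at_least(p_cat, 2))  # ≈ 0.746
```

Written over autodiff tensors (e.g., in PyTorch) instead of Python floats, the same recursion is differentiable, so the negative log-probability of the query can serve directly as a training loss on the weakly supervised target domain.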

Key contributions:

(1) We propose a novel knowledge transfer framework for object detection, relying on probabilistic programming, that uniquely allows using arbitrary types of weak supervision on the target domain.

(2) We make our approach amenable to different levels of computational capability by proposing different approximations of ProbKT.

(3) We provide an extensive experimental setup to study the capabilities of our framework for knowledge transfer and out-of-distribution generalization.

2. RELATED WORKS

A comparative summary of related works is given in Table 1. We distinguish three main categories: (1) pure weakly supervised object detection (WSOD) methods that do not leverage a richly annotated source domain, (2) unsupervised knowledge transfer methods (DA, or domain adaptation, methods) that do not use any supervision on the target domain, and (3) weakly supervised knowledge transfer methods, such as ProbKT, that exploit weak supervision on the target domain.



Figure 1: ProbKT: Weakly supervised knowledge transfer with probabilistic logical reasoning. (Left) A model can be trained on the source domain using full supervision (labels, positions) but only on a limited set of shapes (cylinders and spheres). (Middle) The pre-trained model does not recognize the cubes from the target domain correctly. (Right) The model can adapt to the target domain after applying ProbKT and can recognize the cubes.

Code availability

Our code is available at https://github.com/molden/ProbKT 

