TRAINING IMAGE CLASSIFIERS USING SEMI-WEAK LABEL DATA

Abstract

This paper introduces a new semi-weak label learning paradigm that provides more information than weak-label classification. We define semi-weak label data as data for which we know not only the presence or absence of each class, but also the exact count of each class, as opposed to only the label proportions. We propose a three-stage framework for learning from semi-weak labels that leverages the fact that count information is naturally non-negative and discrete. Experiments are conducted on bags generated from CIFAR-10, and we compare our model against a fully supervised baseline, a MIL-based weakly supervised baseline, and a learning-from-label-proportions (LLP) baseline. Our framework not only outperforms both the weakly supervised and LLP baselines, but also gives results comparable to the fully supervised model. Further, we conduct thorough ablation studies across datasets, analyzing variation with batch size, losses, architectural changes, bag size, and regularization, thereby demonstrating the robustness of our approach.

1. INTRODUCTION

In a traditional fully supervised machine learning setting, training samples are "strongly" supervised, i.e., every training instance is labeled. In practice, though, strongly labeled data are expensive to collect. An alternate approach is to collect "weak" labels: labels which only indicate the presence or absence of instances of a class in sets of training samples. This form of labeling is particularly useful for data such as images or sounds. For instance, in an image it is relatively easy to annotate whether the image contains instances of (say) dogs, but much harder to tag the bounding box of every dog in the image. The former represents a weak label, while the latter is a strong label. Likewise, in sound recordings it is much easier to merely annotate whether a recording includes gun shots (weak label) than to identify the onset and offset times of each instance of a shot (strong label). At an abstract level, it is useful to think of data such as images or sound recordings as bags of candidate instances (e.g. candidate regions of the image or candidate sections of the recording). Labels are now assigned to bags, rather than instances. A negative label assigned to a bag indicates that the original data (image or recording) did not contain the target class(es), and hence none of the instances in the bag formed from it are positive for any class. On the other hand, a positive label assigned to a bag indicates that some instances in the bag are positive, although it is unknown which or how many. In the real world, it is much easier to collect such bag-level, or weak, labels. A number of algorithms have been proposed to train classifiers with such weak labels Shah et al. (2018); Pappas & Popescu-Belis (2014); Carbonneau et al. (2018); Vanwinckelen et al. (2016). However, there remains a gap between the performance of fully supervised models Kumar & Raj (2016); Shah et al. (2018) (trained from data where every instance is labeled) and that of weakly supervised models (trained from data with weak labels). This gap in performance limits the extent to which weak supervision (i.e., supervision with weak labels) can be relied upon.
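The bag-level labeling described above can be sketched in a few lines. The following is a minimal illustration (the function name and bag encoding are ours, not the paper's): a bag is a list of instance class labels, and its weak label is a presence/absence vector over classes.

```python
def weak_label(bag, num_classes):
    """Return a 0/1 presence vector: 1 if any instance of the class is in the bag."""
    presence = [0] * num_classes
    for cls in bag:
        presence[cls] = 1
    return presence

# A bag of 5 instances over 3 classes (e.g. 0 = dog, 1 = cat, 2 = bird).
bag = [0, 0, 2, 0, 2]
print(weak_label(bag, 3))  # [1, 0, 1]: dogs and birds present, no cats
```

Note that the weak label discards both the identity of the positive instances and their number; it records only which classes occur.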
In this paper, we introduce a middle ground between the two settings of full and weak supervision. In many settings, it is possible to annotate count information, which specifies the number of occurrences of individual classes in a bag, for not much extra effort over simple weak labeling that merely tags their presence or absence. For instance, it is often fairly straightforward to annotate how many dogs there are in an image if the annotation of exact bounding boxes is not required. Similarly, it is easy to count the number of gun shots in a recording without marking their onset and offset times.
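A semi-weak label can thus be read as a refinement of a weak label: where the weak label records only presence or absence per class, the semi-weak label records the exact per-class count. A hedged sketch (class names and bag construction are illustrative, not taken from the paper's code):

```python
from collections import Counter

# Strong supervision: every instance in the bag carries its own label.
bag = ["dog", "dog", "bird", "dog", "bird"]
classes = ["dog", "cat", "bird"]

counts = Counter(bag)
semi_weak = [counts[c] for c in classes]   # exact count per class
weak = [int(n > 0) for n in semi_weak]     # presence/absence only

print(semi_weak)  # [3, 0, 2] -> semi-weak label: counts
print(weak)       # [1, 0, 1] -> weak label: presence only
```

The semi-weak label determines the weak label (threshold the counts at zero) but not conversely, which is why it carries strictly more information at little extra annotation cost.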




