IMPROVING ASPECT RATIO DISTRIBUTION FAIRNESS IN FEW-SHOT DETECTOR PRETRAINING VIA COOPERATING RPN'S

Abstract

Region proposal networks (RPNs) are a key component of modern object detectors. An RPN identifies image boxes that are likely to contain objects and are therefore worth further investigation. An RPN false negative is unrecoverable, so the performance of an object detector can be significantly affected by RPN behavior, particularly in low-data regimes. The RPN for a few-shot detector is trained on base classes. Our experiments demonstrate that, if the distribution of box aspect ratios for base classes differs from that for novel classes, errors caused by the RPN failing to propose a good box become significant. This is predictable: for example, an RPN trained on base classes that are mostly square will tend to miss short, wide boxes. The effect has not been noticed to date because the (relatively few) standard base/novel class splits on current datasets do not display it, but changing the base/novel split highlights the problem. We describe datasets where the distribution shift is severe (the ARShift benchmarks), built from the PASCAL VOC, COCO, and LVIS datasets. We show that the effect can be mitigated by training multiple distinct but cooperating specialized RPNs. Each RPN specializes in a different aspect ratio, but cooperation constraints reduce the extent to which each RPN is tuned to its own aspect ratio. As a result, a box missed by one RPN has a good chance of being picked up by another. Experimental evaluation confirms that this approach yields substantial improvements on the ARShift benchmarks while remaining comparable to SOTA on conventional splits. Our approach applies to any few-shot detector and consistently improves detector performance.

1. INTRODUCTION

Most state-of-the-art object detectors follow a two-stage detection paradigm. A region proposal network (RPN) finds promising locations, and these are passed through a classifier to determine what, if any, object is present. In this architecture, if the RPN makes no proposal around an object, the object will not be detected. For a few-shot detector, one splits the classes into base and novel, then trains the RPN and classifier on base classes, fixes the RPN, and finally fine-tunes the classifier on novel classes using the RPN's predictions.

Objects in large-scale object detection datasets (e.g., COCO (Lin et al., 2014); LVIS (Gupta et al., 2019)) have typical aspect ratios that vary somewhat from instance to instance and often differ sharply from category to category. As a result, the few-shot training procedure has a built-in problem with distribution shift. This phenomenon is illustrated in Figure 1. Imagine that all base classes are roughly square, and all novel classes are either short and wide or tall and narrow. The RPN trained on the base classes can be expected to miss some novel class boxes. These missed boxes have two effects: the training data the classifier sees during fine-tuning is biased against the correct box shape, and, at run time, the detector may miss objects because of RPN failures. We refer to this problem as the bias (the RPN does not deal fairly with different aspect ratios). The bias occurs because the RPN sees few or no examples of the novel classes during training (Kang et al., 2019; Wang et al., 2020; Yan et al., 2019).

To date, this bias has not been remarked on. This is an accident of dataset construction: the standard base/novel splits in standard datasets do not result in a distribution shift. But other base/novel splits do result in a distribution shift large enough to have notable effects, and Section 3 shows our evidence.
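
The aspect-ratio shift between base and novel classes can be made concrete by binning ground-truth boxes by width/height ratio and comparing the two histograms. The following is a minimal sketch under stated assumptions, not the measurement used in this paper: the bin edges, the function names (`aspect_ratio_histogram`, `total_variation`), and the toy boxes are all hypothetical and chosen only to mirror the square-versus-elongated example above.

```python
from bisect import bisect_right

# Hypothetical bin edges on width/height (an assumption, not from the paper):
# tall: ratio < 0.5, near-square: 0.5 <= ratio < 2.0, wide: ratio >= 2.0
BIN_EDGES = (0.5, 2.0)


def aspect_ratio_histogram(boxes, edges=BIN_EDGES):
    """Return the fraction of boxes that fall in each aspect-ratio bin.

    Each box is (x_min, y_min, x_max, y_max) in pixels.
    """
    counts = [0] * (len(edges) + 1)
    for x_min, y_min, x_max, y_max in boxes:
        width, height = x_max - x_min, y_max - y_min
        if width <= 0 or height <= 0:
            continue  # skip degenerate boxes
        counts[bisect_right(edges, width / height)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no valid boxes")
    return [c / total for c in counts]


def total_variation(p, q):
    """Total variation distance between two histograms over the same bins."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Toy data: base-class boxes are roughly square, novel-class boxes are mostly elongated.
base_boxes = [(0, 0, 100, 100), (0, 0, 120, 100), (0, 0, 90, 110), (0, 0, 100, 80)]
novel_boxes = [(0, 0, 300, 60), (0, 0, 40, 200), (0, 0, 250, 50), (0, 0, 100, 100)]

base_hist = aspect_ratio_histogram(base_boxes)    # [tall, near-square, wide]
novel_hist = aspect_ratio_histogram(novel_boxes)
shift = total_variation(base_hist, novel_hist)
print(base_hist, novel_hist, shift)
```

On this toy split every base box is near-square while three of the four novel boxes are not, so the distance is large. A split in which base and novel classes share similar histograms would give a value near zero, which corresponds to the situation described above for the standard splits.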

