CLOSING THE GENERALIZATION GAP IN ONE-SHOT OBJECT DETECTION Anonymous

Abstract

Despite substantial progress in object detection and few-shot learning, detecting objects based on a single example -one-shot object detection -remains a challenge. A central problem is the generalization gap: Object categories used during training are detected much more reliably than novel ones. We here show that this generalization gap can be nearly closed by increasing the number of object categories used during training. Doing so allows us to beat the state-of-the-art on COCO by 5.4 %AP 50 (from 22.0 to 27.5) and improve generalization from seen to unseen classes from 45% to 89%. We verify that the effect is caused by the number of categories and not the amount of data and that it holds for different models, backbones and datasets. This result suggests that the key to strong fewshot detection models may not lie in sophisticated metric learning approaches, but instead simply in scaling the number of categories. We hope that our findings will help to better understand the challenges of few-shot learning and encourage future data annotation efforts to focus on wider datasets with a broader set of categories rather than gathering more samples per category.

Search Task

Generalization to Novel Categories A B 

1. INTRODUCTION

It's January 2021 and your long awaited household robot finally arrives. Equipped with the latest "Deep Learning Technology", it can recognize over 21,000 objects. Your initial excitement quickly vanishes as you realize that your casserole is not one of them. When you contact customer service they ask you to send some pictures of the casserole so they can fix this. They tell you that the fix will be some time, though, as they need to collect about a thousand images of casseroles to retrain the neural network. While you are making the call your robot knocks over the olive oil because the steam coming from the pot of boiling water confused it. You start filling out the return form ...



Figure 1: A. One-shot object detection: Identify and localize all objects of a certain category within a scene based on a (single) instructive example. B. Increasing the number of categories used during training reduces the generalization gap to novel categories presented at test time (in parenthesis: number of categories in each dataset).

