TASK REGULARIZED HYBRID KNOWLEDGE DISTILLATION FOR CONTINUAL OBJECT DETECTION

Anonymous

Abstract

Knowledge distillation has been used to overcome catastrophic forgetting in the Continual Object Detection (COD) task. Previous works mainly focus on combining different distillation methods, including feature, classification, location and relation distillation, into a mixed scheme to solve this problem. In this paper, we propose a task regularized hybrid knowledge distillation method for the COD task. First, we propose an image-level hybrid knowledge representation that combines instance-level hard and soft knowledge to use teacher knowledge critically. Second, we propose a task-based regularization distillation loss that takes account of loss and category differences to make continual learning more balanced between old and new tasks. We find that, under appropriate knowledge selection and transfer strategies, using only classification distillation can also relieve knowledge forgetting effectively. Extensive experiments conducted on MS COCO2017 demonstrate that our method achieves state-of-the-art results under various scenarios. We obtain an absolute improvement of 27.98 in RelGap under the most difficult five-task scenario. Code is in the attachment and will be available on GitHub.

1. INTRODUCTION

Existing object detection models (Ge et al., 2021) mainly adopt an overall learning paradigm, in which the annotations of all categories must be available before learning. This paradigm assumes that the data distribution is fixed or stationary (Yuan et al., 2021), while real-world data arrives dynamically with a non-stationary distribution. When a model learns from incoming data continually, new knowledge interferes with the old, leading to catastrophic forgetting (McCloskey & Cohen, 1989; Goodfellow et al., 2014). To solve this problem, continual learning has been proposed in recent years and has made progress in image classification (Zeng et al., 2019; Qu et al., 2021). In contrast, continual object detection (COD) is rarely studied. Knowledge distillation (Hinton et al., 2015) has proved to be an effective method for the COD task, in which the model trained on old classes acts as a teacher to guide the training of a student model on new classes. There are four kinds of distillation schemes: feature, classification, location and relation distillation. Most previous works combine feature distillation and classification distillation to construct their distillation methods (Li & Hoiem, 2018; Li et al., 2019; Yang et al., 2022b), while the latest work (Feng et al., 2022) combines classification distillation and location distillation to construct a response-based distillation method. In addition, various distillation losses, based on KL divergence, cross entropy and mean square error, have been proposed for knowledge transfer. In summary, the key questions in knowledge distillation are what knowledge should be selected from the teacher and how it is transferred to the student. The former requires a Knowledge Selection Strategy (KSS), while the latter requires a Knowledge Transfer Strategy (KTS). Continual object detection faces two problems. (1) The teacher outputs probability distributions as logits and converts them into one-hot labels as final predictions.
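The two forms of teacher output, and the KL-divergence-style classification distillation loss mentioned above, can be sketched as follows. This is a minimal illustration with hypothetical shapes and function names, not the paper's implementation: a teacher classification head emits per-proposal logits, from which soft knowledge (a probability distribution) and hard knowledge (a one-hot label) are derived.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the class axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_knowledge(teacher_logits, temperature=2.0):
    """Soft knowledge: the full probability distribution over categories,
    preserving confidence relations among classes (at the cost of fuzziness)."""
    return softmax(teacher_logits, temperature)

def hard_knowledge(teacher_logits):
    """Hard knowledge: a one-hot label from the teacher's top prediction,
    unambiguous but discarding inter-category confidence relations."""
    idx = teacher_logits.argmax(axis=-1)
    return np.eye(teacher_logits.shape[-1])[idx]

def kl_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Classification distillation via KL(teacher || student) on softened
    distributions, averaged over proposals (one generic variant of the
    KL-based losses cited in the text)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean())
```

For identical teacher and student logits the KL loss is zero; as the student drifts from the teacher on old categories, the loss grows, which is what makes classification distillation a brake on forgetting.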
Logits and one-hot labels are regarded as soft and hard knowledge, respectively. Soft knowledge contains confidence relations among categories, but inevitably brings knowledge fuzziness; hard knowledge has exactly the opposite properties. Therefore, how to design a KSS that balances the accuracy and ambiguity of knowledge is a key problem. (2) Continual learning should maintain old knowledge while learning new knowledge to overcome catastrophic forgetting; therefore, how to design a KTS that balances the stability of old knowledge and the plasticity of new knowledge is a key problem. This paper

