NON-INHERENT FEATURE COMPATIBLE LEARNING

Abstract

The need for Feature Compatible Learning (FCL) arises in many large-scale retrieval applications, where updating the entire library of embedding vectors is expensive. When an upgraded embedding model shows promise, it is desirable to obtain its benefits without refreshing the library. While progress has been made in this direction, existing approaches to feature compatible learning mostly rely on the old training data and classifier, which are unavailable in many industry settings. In this work, we introduce an approach to feature compatible learning that inherits neither the old classifier nor the old training data, i.e., Non-Inherent Feature Compatible Learning. Our approach requires only features extracted by the old model's backbone along with the new training data, and makes no assumption about the overlap between the old and new training sets. We propose a unified framework for FCL and extend it to the case where the old model is a black box. Specifically, we learn a simple pseudo classifier in lieu of the old one, and further enhance it with a random walk algorithm. As a result, embedding features produced by the new model can be matched against those from the old model without sacrificing performance. Experiments on the ImageNet ILSVRC 2012 and Places365 datasets demonstrate the efficacy of the proposed approach.

1. INTRODUCTION

In recent years, deep learning based methods have achieved huge success in various computer vision tasks, especially visual search, since they provide powerful feature representations. In a typical visual search system, the deployed deep model extracts features of both gallery and query images as discriminative representations. At retrieval time, gallery images are ranked by their feature distance (e.g., Euclidean distance) to the query image. In conventional approaches, the query and gallery features are generated by the same model. Once the deployed model of the retrieval system is updated, the entire set of gallery features needs to be 'backfilled' or 'reindexed' (Shen et al., 2020). As the gallery grows extremely large, backfilling becomes a painful process: millions or even billions of images must be re-processed by the new model, which is computationally expensive. A new mechanism is needed that processes gallery images and the query image with two different models while still maintaining retrieval accuracy. In other words, features extracted by the newly deployed model should be 'compatible' with the existing ones without sacrificing accuracy. This feature compatible learning problem is also known as 'Backward-Compatible Training' (Shen et al., 2020) or 'Asymmetric Metric Learning' (Budnik & Avrithis, 2020). Existing approaches to feature compatible learning assume significant overlap between the new and old training sets. In Shen et al. (2020), the training set for the new embedding model is a superset of the old one. In Budnik & Avrithis (2020), the training set for the large and small models is the same, which rules out obtaining the new model in an incremental way. Moreover, in Shen et al. (2020), the classifier of the old model is also needed to compute the influence loss, which is a strong requirement in real applications.
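The asymmetric retrieval setting described above can be sketched in a few lines: gallery features come from the old model, query features from the new one, and ranking is by Euclidean distance. The function name `retrieve` and the toy features are illustrative assumptions, not part of the paper's system.

```python
import numpy as np

def retrieve(query_feat, gallery_feats, top_k=5):
    """Rank gallery items by Euclidean distance to a single query feature.

    In the compatible-learning setting, `gallery_feats` would be produced
    by the old model and `query_feat` by the new model, with no backfill.
    """
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    return np.argsort(dists)[:top_k]
```

Without feature compatibility, mixing old gallery features and new query features in this way would degrade ranking quality, which is precisely what FCL aims to prevent.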
As an example, a model deployed in a recommendation system as a black-box API takes images as input and returns the processed features, but the parameters of the model are not accessible. In addition, neither its classifier and training details nor the form of its loss function is available. This setting is quite common, for various practical reasons, in search, recommendation, content understanding, and review applications. To address this limitation, we propose an approach for non-inherent feature compatible learning, which exploits only the old model's backbone and the new training data. Despite the absence of the old training data and classifier, the new model extracts compatible features without sacrificing accuracy. The proposed approach makes three contributions:

• Study and formulate the non-inherent setting of the FCL problem for the first time.
• Establish a baseline with a data-incremental approach, where performance degradation is prevented by regularizing the training process of the new model.
• Extend the baseline with a random walk algorithm that further improves accuracy.

Experiments conducted on several standard datasets validate the effectiveness of the proposed approach.
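The baseline idea of regularizing the new model's training so that its features stay usable against old gallery features can be illustrated with a generic compatibility term. This is a minimal sketch assuming a simple L2 tie between new and old features; the paper's actual regularizer (built on the pseudo classifier) is not reproduced here, and the name `compatibility_loss` is an assumption for illustration.

```python
import numpy as np

def compatibility_loss(new_feats, old_feats, lam=1.0):
    """Generic compatibility regularizer: penalize the squared distance
    between the new model's features and the frozen old backbone's
    features on the same images, averaged over the batch.

    Adding `lam * compatibility_loss(...)` to the new model's task loss
    discourages the new embedding space from drifting away from the old
    one, so old gallery features remain searchable.
    """
    return lam * np.mean(np.sum((new_feats - old_feats) ** 2, axis=1))
```

In practice this term would be computed per mini-batch, with the old backbone kept frozen and used only as a feature extractor, matching the black-box constraint.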

2. RELATED WORK

Our approach is most relevant to feature compatible learning (Shen et al., 2020; Budnik & Avrithis, 2020; Wang et al., 2020), which has drawn attention from the research community due to the increasing size of gallery sets and the heavy workload of re-generating gallery features. Most incremental learning approaches focus on the stability of classifier outputs, whereas feature compatible learning addresses the feature compatibility problem across different models.

2.1. FEATURE COMPATIBLE LEARNING

Shen et al. (2020) first formulated the 'backward-compatible' problem by deriving an influence loss from an empirical criterion, and solved it by utilizing the old model to regularize the optimization process. In Budnik & Avrithis (2020), the authors investigated the problem of asymmetric testing, where gallery images are represented by a teacher model and query images by a student model; a pair-based metric for instance-level image retrieval was proposed to achieve this goal. In Wang et al. (2020), the authors proposed Residual Bottleneck Transformation (RBT) blocks for transferring feature embeddings. Some previous works (Li et al., 2015; Yu et al., 2018) discussed the connection between features learned by different models. However, all of the methods above either cannot achieve model compatibility in a data-incremental way or need the old classifier for training. Our proposed method achieves compatibility without the old training data or any old classifier.

2.2. INCREMENTAL LEARNING

Incremental learning and life-long learning approaches (Rebuffi et al., 2017; Li & Hoiem, 2017) aim to stabilize model predictions when updating with new training data. Unlike the feature compatibility problem, incremental learning focuses on maintaining performance on 'old' classes after introducing 'new' ones. In Li & Hoiem (2017), the authors utilized knowledge distillation (Hinton et al., 2015) and teacher-student models to regularize features on new data, where model distillation served as a form of regularization when introducing new classes. In Rebuffi et al. (2017), the authors proposed using old class centers to regularize model learning when new classes were introduced.

2.3. EMBEDDING LEARNING

Our approach is also relevant to the embedding learning problem, which optimizes a distance metric to improve the discriminative power and robustness of embedded features. There have been efforts on designing powerful network architectures (He et al., 2016; Simonyan & Zisserman, 2014; Szegedy et al., 2015), discriminative loss functions (Deng et al., 2019; Wang et al., 2018), and model optimizers (Kingma & Ba, 2015), but few of them set FCL as part of their objectives.
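The knowledge distillation regularizer (Hinton et al., 2015) mentioned in the discussion of incremental learning can be sketched as a KL divergence between temperature-softened teacher and student distributions. This is a minimal numpy sketch for illustration; the function names and the temperature value are assumptions, not taken from the cited works.

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax over the last axis, with temperature T."""
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Hinton-style distillation: KL(teacher || student) on
    temperature-softened distributions, averaged over the batch and
    scaled by T^2 (the conventional gradient-magnitude correction)."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=1)
    return float(np.mean(kl) * T * T)
```

A regularizer of this kind keeps the updated model's predictions close to the old model's on shared inputs, which is the same stabilizing role the compatibility constraints play in FCL, but applied to class probabilities rather than embedding features.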

