FastFill: Efficient Compatible Model Update

Abstract

In many retrieval systems, the original high-dimensional data (e.g., images) are mapped to lower-dimensional features through a learned embedding model. The task of retrieving the data most similar to a given query from a gallery set is performed through a similarity comparison on features. When the embedding model is updated, it may produce features that are not comparable/compatible with the features already in the gallery, which were computed with the old model. Consequently, all features in the gallery need to be re-computed using the new embedding model, a computationally expensive process called backfilling. Recently, compatible representation learning methods have been proposed to avoid backfilling. Despite their relative success, there is an inherent trade-off between the new model's performance and its compatibility with the old model. In this work, we introduce FastFill: a compatible model update process that uses feature alignment and policy-based partial backfilling to promptly improve retrieval performance. We show that previous backfilling strategies suffer from decreased performance, and we demonstrate the importance of both the training objective and the ordering in online partial backfilling. We propose a new training method for feature alignment between old and new embedding models using uncertainty estimation. Compared to previous works, we obtain significantly improved backfilling results on a variety of datasets: mAP on ImageNet (+4.4%), Places-365 (+2.7%), and VGG-Face2 (+1.3%). Further, we demonstrate that when updating a biased model with FastFill, the accuracy gap for minority subgroups promptly vanishes with a small fraction of partial backfilling.

1. Introduction

Retrieval problems have become increasingly popular in many real-life applications such as face recognition, voice recognition, image localization, and object identification. In an image retrieval setup, we have a large set of images with predicted labels, called the gallery set, and a set of unknown query images. The aim of image retrieval is to match query images to related images in the gallery set, ideally of the same class/identity. In practice, we perform retrieval on low-dimensional feature vectors generated by a learned embedding model instead of on the original high-dimensional images. When we gain access to more or better training data, model architectures, or training regimes, we want to update the embedding model to improve the performance of the downstream retrieval task. However, different neural networks rarely learn to generate compatible features, even when they have been trained on the same dataset, with the same optimization method, and with the same architecture (Li et al., 2015). Hence, computing the query features with a new embedding model whilst keeping the old gallery features leads to poor retrieval results due to the incompatibility of the old and new embedding models (Shen et al., 2020). The vanilla solution is to replace the features in the gallery set that have been generated by the old model with features from the new model. This computationally expensive process is called backfilling.
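The retrieval-on-features setup described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function names `embed` and `retrieve` are our own, and `model` stands in for any learned embedding network:

```python
import numpy as np

def embed(model, images):
    """Map raw inputs to L2-normalized feature vectors.
    `model` is any callable (here a toy stand-in for an embedding network)."""
    feats = np.stack([np.asarray(model(x), dtype=float) for x in images])
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

def retrieve(query_feat, gallery_feats, gallery_labels):
    """Return the label of the gallery item most similar to the query.
    On normalized features, cosine similarity reduces to a dot product."""
    sims = gallery_feats @ query_feat
    return gallery_labels[int(np.argmax(sims))]
```

Compatibility between old and new models matters precisely because `retrieve` mixes a query feature from one model with gallery features from another: the dot product is only meaningful if both live in a shared embedding space.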
In practice, backfilling is carried out offline: we keep using the old gallery features and the old model for queries whilst computing new gallery features with the new model in the background (see Figure 1, right). We only switch to the new gallery features once the entire set has been updated. However, for large-scale systems this process is computationally expensive and can take months. In many real-world systems, the cost of backfilling blocks model updates, despite the availability of a more accurate model. This has given rise to the study of compatible representation learning: Shen et al. (2020); Budnik & Avrithis (2021); Zhang et al. (2021); Ramanujan et al. (2022); Hu et al. (2022); Zhao et al. (2022); and Duggal et al. (2021) all proposed methods to update the embedding model to achieve better performance whilst remaining compatible with features generated by the old model (see Figure 1, left-top). Despite relative success, compatibility learning is not perfect: performing retrieval with a mixture of old and new features achieves lower accuracy than replacing all the old features with new ones. In this work, we focus on closing this performance gap. Further, some previous methods degrade the new model's performance when making it more compatible with the old model (Shen et al., 2020; Hu et al., 2022), or they require side-information, i.e., extra features from a separate self-supervised model (Ramanujan et al., 2022), which may not be available for an existing system. We relax both constraints in this work. To benefit from the more accurate embedding model sooner and at lower cost, we can backfill some or all of the images in an online, continuous fashion: we run downstream retrieval tasks on a partially backfilled gallery set where part of the features are computed with the old model and part with the new model (see Figure 1, left-bottom).
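The online setting can be made concrete by tracing retrieval accuracy over a mixed gallery as items are backfilled one by one, which is exactly what a backfilling curve (as in Figure 1, right) plots. The sketch below is ours, not the paper's code; `backfill_curve` and its arguments are hypothetical names, and features are assumed to be L2-normalized with queries already embedded by the new model:

```python
import numpy as np

def backfill_curve(old_gal, new_gal, labels, queries_new, query_labels, order):
    """Top-1 retrieval accuracy after each backfill step, for a given
    backfilling order. `labels` and `query_labels` are NumPy arrays;
    accs[0] is the no-backfilling accuracy, accs[-1] the fully-backfilled one."""
    gallery = old_gal.copy()
    accs = []
    for step in range(len(order) + 1):
        if step > 0:
            i = order[step - 1]
            gallery[i] = new_gal[i]  # replace one old feature with its new version
        preds = labels[np.argmax(queries_new @ gallery.T, axis=1)]
        accs.append(float(np.mean(preds == query_labels)))
    return accs
```

Averaging `accs` over all steps gives a single number summarizing how quickly a given ordering reaches the new model's performance, which is the quantity a good backfilling policy should maximize.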
In practice, we consider two scenarios: 1) we backfill the entire gallery set over an extended period of time and want to maximize the average performance during the backfilling process (see Figure 1, right); 2) we have a fixed partial backfilling budget and want to reach optimal retrieval performance after backfilling the allowed number of images (e.g., we may backfill only 10% of the gallery). In both cases, we want to reach the highest possible performance by backfilling the fewest images possible. We demonstrate that the training objective as well as the order in which we backfill images are both crucial. In fact, if we use training losses proposed in the literature and choose a random ordering, we may even reduce performance before we see an improvement (see FCT-Random in Figure 2a) due to the occurrence of 'negative flips' (Zhang et al., 2021): images that were classified correctly when using the old model but are misclassified when using the new one.
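The negative-flip notion of Zhang et al. (2021) is easy to measure given predictions from both models. A minimal sketch, with a function name of our choosing:

```python
import numpy as np

def negative_flip_rate(old_preds, new_preds, labels):
    """Fraction of samples classified correctly by the old model but
    misclassified after the update ('negative flips', Zhang et al., 2021)."""
    old_preds, new_preds, labels = map(np.asarray, (old_preds, new_preds, labels))
    flips = (old_preds == labels) & (new_preds != labels)
    return float(np.mean(flips))
```

A model update can raise overall accuracy while still having a nonzero negative-flip rate, which is why a randomly ordered partial backfill can temporarily hurt retrieval even when the new model is strictly better on average.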



* Correspondence: florian.jaeckle@gmail.com & mpouransari@apple.com † Work completed during internship at Apple. 1 Code available at https://github.com/apple/ml-fct.



Figure 1: Left: Old and new features for a binary fruits vs. animals classification setup. Old and new features are somewhat compatible, but have a few mismatches shown by red crosses (top). We can improve retrieval performance by backfilling some of the old features to higher quality new features in a specific order (bottom). In a partial backfilling scenario the accuracy improvement depends on the order of backfilling. Right: Retrieval performance for ImageNet as a function of time when updating the embedding model with different backfilling strategies. A backfilling curve reaching the new model performance faster is better.
