ELODI: ENSEMBLE LOGIT DIFFERENCE INHIBITION FOR POSITIVE-CONGRUENT TRAINING

Anonymous

Abstract

Negative flips are errors introduced in a classification system when a legacy model is updated. Existing methods to reduce the negative flip rate (NFR) either do so at the expense of overall accuracy, by forcing the new model to imitate the old one, or rely on ensembles, which multiply inference cost prohibitively. We analyze the role of ensembles in reducing NFR and observe that the negative flips they remove are typically not close to the decision boundary, but often exhibit large deviations in their logits across models. Based on this observation, we present a method, called Ensemble Logit Difference Inhibition (ELODI), to train a classification system that achieves paragon performance in both error rate and NFR at the inference cost of a single model. The method distills a homogeneous ensemble into a single student model, which is used to update the classification system. ELODI also introduces a generalized distillation objective, Logit Difference Inhibition (LDI), which penalizes changes in the logits between the reference ensemble and the single student model. On multiple image classification benchmarks, model updates with ELODI demonstrate superior accuracy retention and NFR reduction.

1. INTRODUCTION

The rapid development of visual recognition in recent years has led to the need to frequently update existing models in production-scale systems. However, when replacing a legacy classification model, one has to weigh the benefit of a decreased error rate against the risk of introducing new errors that may disrupt post-processing pipelines (Yan et al., 2021) or cause friction with human users (Bansal et al., 2019). Positive-Congruent Training (PC-Training) refers to any training procedure that minimizes the negative flip rate (NFR) along with the error rate (ER). Negative flips are instances that are misclassified by the new model but correctly classified by the old one. They manifest in both visual and natural language tasks (Yan et al., 2021; Xie et al., 2021). They typically include not only samples close to the decision boundary, but also high-confidence mistakes that lead to a perceived "regression" in performance compared to the old model. They are present even between identical architectures trained from different initial conditions, with different data augmentations, or with different sampling of mini-batches. Yan et al. (2021) have shown that in state-of-the-art image classification models, where a 1% improvement is considered significant, NFR can be on the order of 4∼5% even across models with identical ER. These intriguing properties motivate us to investigate the causes of negative flips and the mechanisms by which they can be reduced, in order to establish a model update method that achieves cross-model compatibility (hence lower NFR) together with a lower error rate, i.e., better PC-Training.

Two questions. A naive approach to cross-model compatibility is to bias one model to mimic the other, as done in model distillation (Hinton et al., 2015). In this case, however, compatibility comes at the expense of accuracy (Yan et al., 2021; Bansal et al., 2019).
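The negative flip rate defined above can be made concrete with a short sketch. The predictions below are synthetic stand-ins for the outputs of an old and a new model on a shared test set; only the `negative_flip_rate` computation reflects the definition in the text.

```python
import numpy as np

def error_rate(labels, pred):
    """Fraction of samples a model misclassifies (ER)."""
    return (pred != labels).mean()

def negative_flip_rate(labels, old_pred, new_pred):
    """Fraction of samples the old model classified correctly
    but the new model gets wrong (negative flips)."""
    flips = (old_pred == labels) & (new_pred != labels)
    return flips.mean()

# Synthetic predictions standing in for two models' outputs;
# in practice these would come from running inference with each model.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)
old_pred = np.where(rng.random(1000) < 0.85, labels, rng.integers(0, 10, size=1000))
new_pred = np.where(rng.random(1000) < 0.90, labels, rng.integers(0, 10, size=1000))
```

Note that negative flips are a subset of the new model's errors, so NFR is always bounded above by the new model's ER; the two models can have identical ER yet a nonzero NFR, which is the phenomenon discussed above.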
On the other hand, averaging a number of models in a deep ensemble (Lakshminarayanan et al., 2017) can reduce NFR without a negative impact on accuracy (Yan et al., 2021), even though it does not explicitly optimize NFR or its surrogates. The role of ensembles in improving accuracy is widely known, which raises our first question: what is the role of ensembles in reducing NFR? Even though deep ensembles achieve state-of-the-art performance in reducing NFR (Yan et al., 2021), they are not viable in real applications at scale, since they multiply the cost of inference.
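A toy simulation can illustrate why averaging logits across a homogeneous ensemble helps: independent logit noise from members trained under different conditions partially cancels, so the averaged logits sit closer to the "clean" class scores. The additive-noise model below is an assumption made purely for illustration, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 500, 10, 5  # samples, classes, ensemble members

labels = rng.integers(0, k, size=n)
# Hypothetical logits: a clean score for the true class plus independent
# per-member noise, standing in for identically-architected models
# trained from different random seeds.
clean = np.eye(k)[labels] * 3.0
members = clean[None] + rng.normal(scale=2.0, size=(m, n, k))

single_pred = members[0].argmax(axis=1)            # one member's prediction
ensemble_pred = members.mean(axis=0).argmax(axis=1)  # averaged logits

single_err = (single_pred != labels).mean()
ensemble_err = (ensemble_pred != labels).mean()
```

Under this noise model, averaging m members shrinks the logit noise by a factor of roughly sqrt(m), so the ensemble's error (and, by the same mechanism, its disagreement with other models) drops relative to any single member.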

