EFFECTIVELY CLARIFY CONFUSION VIA VISUALIZED AGGREGATION AND SEPARATION OF DEEP REPRESENTATION

Abstract

Clarifying confusion is one of the most critical issues for improving classification performance. Current mainstream research mainly focuses on resolving confusion in specific cases, such as data insufficiency and class imbalance. In this paper, we propose a novel, simple, and intuitive Aggregation Separation Loss (ASLoss), an adjunct to the classification loss that clarifies confusion in common cases. The ASLoss aggregates the representations of same-class samples as close together as possible and separates the representations of different classes as far apart as possible. We use two image classification tasks with three simultaneous confounding characteristics, i.e., data insufficiency, class imbalance, and unclear class evidence, to demonstrate ASLoss. Representation visualization, confusion comparison, and detailed comparison experiments are conducted. The results show that the representations in deep spaces extracted with ASLoss are sufficiently clear and distinguishable, the confusion among different classes is significantly reduced, and the optimal network using ASLoss reaches the state-of-the-art level.

1. INTRODUCTION

Clarifying confusion is one of the most critical issues for improving classification performance. In fact, all prediction mistakes in classification are confusion, i.e., the model incorrectly considers samples of class "A" as class "B". Confusion occurs with almost all classification models but tends to be ignored in excellent-performing models because the mainstream datasets are artificially constructed to be nearly perfect. In this paper, we propose a novel, simple, and intuitive method called Aggregation Separation Loss (ASLoss), an adjunct to the classification loss. This loss can be adopted on any linear feature extraction layer as shown in Figure 2, constructing distinguishable representations in the geometric spaces of deep features to clarify confusion in common cases as shown in Figure 1. It aggregates the representations of same-class samples as close together as possible and separates the representations of different classes as far apart as possible, mining the commonalities within each class and the gaps among different classes. To interpret its effect, the distinguishable representations can be visualized by condensing the representation layers into two dimensions. We validate our method on two image classification tasks that simultaneously exhibit three common conditions that easily cause confusion: data insufficiency, class imbalance, and unclear evidence. The experimental results show that the representations in deep geometric spaces become sufficiently clear, the performance of various deep networks is efficiently improved, and the optimal network achieves state-of-the-art results. The code for this work is available on GitHub.
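To make the aggregation/separation idea concrete, the following is a minimal NumPy sketch of such a loss. The exact formulation used in the paper is not given in this section, so the function below is a hypothetical illustration: it averages intra-class pairwise distances (aggregation term) and penalizes inter-class pairs that fall inside an assumed scope `margin` (separation term). The name `as_loss` and the parameter `margin` are our own placeholders, not the paper's notation.

```python
import numpy as np

def as_loss(features, labels, margin=10.0):
    """Hypothetical aggregation-separation loss sketch.

    features: (n, d) array of deep representations for a batch.
    labels:   (n,) array of integer class labels.
    margin:   assumed scope inside which inter-class pairs are penalized.
    """
    n = len(labels)
    # Pairwise Euclidean distances between all representations.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Masks for same-class pairs (excluding self-pairs) and different-class pairs.
    same = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    other = labels[:, None] != labels[None, :]
    # Aggregation: mean distance between same-class samples (pull together).
    agg = dist[same].mean() if same.any() else 0.0
    # Separation: penalize inter-class pairs closer than the margin (push apart).
    sep = np.maximum(0.0, margin - dist[other]).mean() if other.any() else 0.0
    return agg + sep
```

Under this sketch, a batch whose classes form tight, well-separated clusters yields a lower loss than one whose classes are spread out and interleaved, which is the behavior the paper ascribes to ASLoss.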

2. RELATED WORK

Contrastive learning is a series of unsupervised methods that use an agent task to minimize the distances among varieties of the same sample and maximize the distances among varieties of different samples Chen et al. (2020); Gao et al. (2021); Wang et al. (2021b); Bachman et al. (2019). These excellent methods improve downstream classification through unsupervised pretraining followed by fine-tuning. The key is the agent task, which transforms a sample into many varieties. Our method is plug-and-play, requiring neither pretraining nor transformation, and ASLoss directly pulls and pushes the representations of samples rather than their varieties.

Metric learning also optimizes sample distances Wang et al. (2018b; 2018a; 2017a); Liu et al. (2017); Zhou et al. (2019); Wang et al. (2019a); Xu et al. (2019); Zheng et al. (2019); Wang et al. (2019b). Some of these methods propose better activation functions and others use linear transformations. Our method is similar to the linear-transformation series but applies the linear transformation on multiple layers, leading to more separable representations. Furthermore, our method does not set a fixed class interval but sets a scope to make the distance between classes as large as possible, which is more straightforward.

Triplet loss sets a triple of (anchor, positive, negative) to pull same-class samples together and push other-class samples away Yuan et al. (2020); Schroff et al. (2015). ASLoss is more flexible, calculating all

The link will be made public if our work is accepted.

Figure 1: Schematic of Aggregation Separation Loss.

Figure 2: Schematic of Using Aggregation Separation Loss.

