ERRORAUG: MAKING ERRORS TO FIND ERRORS IN SEMANTIC SEGMENTATION

Abstract

Figure 1: We propose ErrorAug as a simple and reliable approach for pixel-wise error detection. ErrorAug allows us to artificially generate more examples of errors which are also of a higher degree of difficulty. ErrorAug improves relative performance of key error detection metrics by over 7.8%/11.2% for in-domain/out-of-domain scenarios versus the previous state-of-the-art approach SynthCP.

1. INTRODUCTION

Understanding when machine learning models are producing inaccurate predictions is essential for improving the reliability of systems that build upon these models. Recent works in performance prediction have made strides in predicting the performance of classification systems in novel environments Garg et al. (2022); Chen et al. (2021); Guillory et al. (2021). However, for complex computer vision tasks like semantic segmentation, it is important to identify not only when a model produces an inaccurate prediction but also where the model's predictions have failed. For instance, in a robotics setting, a misclassification far away from the action space of the robot may be less relevant than one in the immediate pathway. As such, the task of pixel-wise error detection becomes increasingly important as we strive to produce AI systems that can safely interact with ever-changing real-world environments. We propose Error Augmentation (ErrorAug), a process for synthesizing challenging localization errors by applying data transformations on predicted class probabilities independent of any transformations on the input images, as a step for training high-quality pixel-wise error detectors.
In order to demonstrate the effectiveness of this process, we propose three swapping operations that, when applied to a segmentation map, allow us to treat error detection as a supervised learning task and directly train a model to predict pixel-wise error maps, ê = F(x, ŷ), for semantic segmentation systems.

Figure 3: The top row illustrates a semantic segmentation task where a segmentation map is predicted from an image. The difference between the top-row prediction and the top-row label is the error map, depicted in the bottom right. The goal of our approach, ErrorAug, is to accurately predict this error map from the image and the prediction. Despite the model's simplicity, ErrorAug does a much better job of locating large misclassified regions, like the moving truck in this example, than the previous state-of-the-art approach SynthCP.

Most prior works in this space have attempted to directly train error detectors, but due to their tendency to overfit in this problem setting, they are most frequently presented as naive baselines to motivate more complicated approaches. In this work we show that by incorporating our ErrorAug process into previous pipelines, we bypass the complications of previous approaches and can produce state-of-the-art results using the direct-prediction implementations of prior codebases. Our training procedure is illustrated in Figure 2.

The most prominent prior work in this space built upon the success of conditional generative models by using the predicted segmentation map to condition a model whose goal is to recreate the input image Xia et al. (2020); Di Biase et al. (2021). The discrepancies between this reconstruction and the original input image are then used to detect errors. These approaches are not only expensive, but are also overly sensitive to the training dataset, as the behavior of these generative models under changing domains remains poorly understood. Our approach does not require complicated auxiliary tasks or networks.
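Because the error detector is a single supervised network ê = F(x, ŷ), it only needs a joint encoding of the image and the predicted segmentation map. One plausible input encoding (an assumption for illustration; this excerpt does not specify how x and ŷ are combined) is to concatenate the image with a one-hot version of the prediction:

```python
import numpy as np

def detector_input(image, pred, num_classes):
    """Sketch: build the error detector's input by concatenating the image
    (H x W x 3) with a one-hot encoding of the predicted label map (H x W),
    so a single network F can map (x, y_hat) -> e_hat without any auxiliary
    reconstruction model. The one-hot encoding is one common choice, not
    necessarily the one used in the paper."""
    onehot = np.eye(num_classes, dtype=np.float32)[pred]  # H x W x num_classes
    return np.concatenate([image.astype(np.float32), onehot], axis=-1)
```

The resulting H x W x (3 + num_classes) tensor can be fed to any standard segmentation backbone that outputs a single-channel per-pixel error probability.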
We dramatically improve upon the state-of-the-art approach, SynthCP, by simplifying the modeling pipeline into a single neural network, optimizing with a standard binary cross-entropy loss, and extending the set of errors observed in training. Figures 1 and 3 illustrate the apples-to-apples comparison between SynthCP (SOTA) and ErrorAug (ours). Error augmentation, in the context of error detection, describes any data transformation applied independently to the label predictions in the training pipeline of an error detector that allows the derivation of novel target error maps. In this work, we explore three swapping data transformations (ShiftSwap, ClassSwap, and MapSwap) which generate different types of challenging errors.
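The exact definitions of the three swaps are not given in this excerpt, so the following is an illustrative interpretation of what such transformations could look like: ClassSwap exchanges two class ids, ShiftSwap spatially shifts the predicted map, and MapSwap substitutes the prediction from a different image; in every case the binary error target is rederived against the ground-truth label.

```python
import numpy as np

def class_swap(pred, class_a, class_b):
    """ClassSwap (sketch): exchange two class labels in the predicted map,
    turning every pixel of either class into a candidate error region."""
    out = pred.copy()
    out[pred == class_a] = class_b
    out[pred == class_b] = class_a
    return out

def shift_swap(pred, dy, dx):
    """ShiftSwap (sketch): spatially shift the predicted map so that object
    boundaries no longer align with the image, inducing localization errors."""
    return np.roll(pred, shift=(dy, dx), axis=(0, 1))

def map_swap(pred, other_pred):
    """MapSwap (sketch): replace the prediction with the predicted map of a
    different image, so that most pixels become errors."""
    return other_pred.copy()

def error_target(aug_pred, label):
    """Derive the binary per-pixel error map used as the supervision target."""
    return (aug_pred != label).astype(np.float32)
```

Because each transformation is applied to the prediction alone (never to the image), the pair (image, augmented prediction) with its rederived error map is a valid, and typically harder, training example for the detector.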



Figure 2: Error augmentation allows us to directly train an error detection network with supervised learning. Applying ErrorAug to the model's prediction allows us to create a challenging and diverse set of example errors. Training according to this pipeline leads to an error detection model which performs reliably on novel examples.
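The text states that the detector is optimized with a standard binary cross-entropy loss between its per-pixel output and the (augmented) binary error map. A minimal NumPy sketch of that objective, with a clipping epsilon added for numerical stability (the epsilon value is our assumption, not from the paper):

```python
import numpy as np

def bce_loss(pred_prob, error_map, eps=1e-7):
    """Per-pixel binary cross entropy between the detector's predicted error
    probability and the binary error map derived after ErrorAug. Probabilities
    are clipped away from 0 and 1 to avoid log(0)."""
    p = np.clip(pred_prob, eps, 1.0 - eps)
    return float(-np.mean(error_map * np.log(p)
                          + (1.0 - error_map) * np.log(1.0 - p)))
```

A training step would then consist of: apply a randomly chosen swap to the model's prediction, rederive the error map with respect to the label, and minimize this loss over the detector's parameters.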

