E-CRF: EMBEDDED CONDITIONAL RANDOM FIELD FOR BOUNDARY-CAUSED CLASS WEIGHTS CONFUSION IN SEMANTIC SEGMENTATION

Abstract

Modern semantic segmentation methods devote much effort to adjusting image feature representations to improve segmentation performance in various ways, such as architecture design, attention mechanisms, etc. However, almost all of these methods neglect the particularity of the class weights (in the classification layer) of segmentation models. In this paper, we observe that the class weights of categories that share many adjacent boundary pixels lack discrimination, thereby limiting performance. We call this issue Boundary-caused Class Weights Confusion (BCWC). We focus on this problem and propose a novel method named Embedded Conditional Random Field (E-CRF) to alleviate it. E-CRF innovatively fuses the CRF into the CNN network as an organic whole for more effective end-to-end optimization. The reasons are twofold. First, it utilizes the CRF to guide message passing between pixels in high-level features, purifying the feature representations of boundary pixels with the help of inner pixels belonging to the same object. More importantly, it enables optimizing the class weights in both scale and direction during backpropagation, which we support with detailed theoretical analysis. Besides, superpixels are integrated into E-CRF and serve as an auxiliary to exploit the local object prior for more reliable message passing. Finally, our proposed method yields impressive results on the ADE20K, Cityscapes, and Pascal Context datasets.

1. INTRODUCTION

Semantic segmentation plays an important role in practical applications such as autonomous driving, image editing, etc. Numerous CNN-based methods (Chen et al., 2014; Fu et al., 2019; Ding et al., 2019) have been proposed; they attempt to adjust the feature representation of the model itself so that each pixel is recognized correctly. However, almost all of these methods neglect the particularity of the class weights (in the classification layer), which play an important role in distinguishing pixel categories in segmentation models. It is therefore critical to keep the class weights discriminative. Unfortunately, CNN models have a natural defect in this regard. Generally speaking, the most discriminative higher layers of a CNN have large receptive fields, so pixels around the boundary may aggregate confusing features from both sides of the boundary. As a result, these ambiguous boundary pixels mislead the optimization direction of the model and make the class weights of categories that share many adjacent pixels indistinguishable. For convenience of illustration, we call this issue Boundary-caused Class Weights Confusion (BCWC). As an example, we train DeeplabV3+ (Chen et al., 2018a) on the ADE20K (Zhou et al., 2017) dataset. We then count the number of adjacent pixels for each class pair and, for each class, find the corresponding category with which it shares the most adjacent pixels. Fig. 1(a) shows the similarity of the class weights of these pairs, in descending order of the number of adjacent pixels. Clearly, the more adjacent pixels two categories share, the more similar their class weights tend to be, which indicates that BCWC makes class representations lack discrimination and damages overall segmentation performance.
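The measurement behind Fig. 1(a) can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: it assumes 4-connectivity when counting adjacent pixels of differing classes in a label map, and cosine similarity between rows of the classifier weight matrix `W` (shape `num_classes x feature_dim`); the function names are our own.

```python
import numpy as np

def count_adjacent_pixels(label_map, num_classes):
    # Count 4-connected adjacencies between pixels of differing classes.
    # label_map: (H, W) integer array of class ids.
    counts = np.zeros((num_classes, num_classes), dtype=np.int64)
    a_h, b_h = label_map[:, :-1].ravel(), label_map[:, 1:].ravel()   # horizontal pairs
    a_v, b_v = label_map[:-1, :].ravel(), label_map[1:, :].ravel()   # vertical pairs
    a = np.concatenate([a_h, a_v])
    b = np.concatenate([b_h, b_v])
    mask = a != b                      # keep only boundary (cross-class) pairs
    np.add.at(counts, (a[mask], b[mask]), 1)  # accumulate symmetrically
    np.add.at(counts, (b[mask], a[mask]), 1)
    return counts

def most_adjacent_pair_similarity(W, counts):
    # For each class, find the class it shares the most boundary pixels with,
    # then report the cosine similarity of their classifier weight vectors.
    Wn = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    partner = counts.argmax(axis=1)            # most-adjacent category per class
    sims = (Wn * Wn[partner]).sum(axis=1)      # cosine similarity per pair
    return partner, sims
```

Sorting the resulting pairs by their adjacency counts and plotting `sims` reproduces the kind of trend shown in Fig. 1(a): pairs with many shared boundary pixels exhibit higher weight similarity.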

