BOUNDARY EFFECTS IN CNNS: FEATURE OR BUG?

Abstract

Recent studies have shown that adding zero padding drives convolutional neural networks (CNNs) to encode a significant amount of absolute position information in their internal representations, while a lack of padding precludes position encoding. Additionally, various studies have used image patches on background canvases (e.g., to accommodate the requirement that inputs to CNNs be rectangular) without considering that different backgrounds may contain varying levels of position information depending on their color. These studies give rise to deeper questions about the role of boundary information in CNNs, which we explore in this paper: (i) What boundary heuristics (e.g., padding type, canvas color) enable optimal encoding of absolute position information for a particular downstream task? (ii) Where in the latent representations do boundary effects destroy semantic and location information? (iii) Does encoding position information affect the learning of semantic representations? (iv) Does encoding position information always improve performance? To answer these questions, we perform the largest case study to date on the role that padding and border heuristics play in CNNs. We first show that zero padding injects optimal position information into CNNs relative to other common padding types. We then design a series of novel tasks which allow us to accurately quantify boundary effects as a function of the distance to the border. A number of semantic objectives reveal the destructive effect of border handling on semantic representations. Further, we demonstrate that encoding position information improves the separability of learned semantic features. Finally, we demonstrate the implications of these findings on a number of real-world tasks, showing that position information can act as a feature or a bug.

1. INTRODUCTION

One of the main intuitions behind the success of CNNs for visual tasks such as image classification (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Szegedy et al., 2015; Huang et al., 2017), video classification (Karpathy et al., 2014; Yue-Hei Ng et al., 2015; Carreira & Zisserman, 2017), object detection (Ren et al., 2015; Redmon et al., 2016; He et al., 2017), generative image models (Brock et al., 2018), and semantic segmentation (Long et al., 2015; Noh et al., 2015; Chen et al., 2017; 2018), is that convolutions add to neural networks the visual inductive bias that objects can appear anywhere in the image. To accommodate the finite domain of images, manual heuristics (e.g., padding) have been applied to allow the convolutional kernel's support to extend beyond the border of an image and reduce the impact of boundary effects (Wohlberg & Rodriguez, 2017; Tang et al., 2018; Liu et al., 2018a; Innamorati et al., 2019; Liu et al., 2018b). Recent studies (Pérez et al., 2019; Islam et al., 2020; Kayhan & Gemert, 2020) have shown that zero padding allows CNNs to encode absolute position information despite the presence of pooling layers in their architecture (e.g., global average pooling). In our work, we argue that the relationship between boundary effects and absolute position information extends beyond zero padding and has major implications for a CNN's ability to encode confident and accurate semantic representations (see Fig. 1). An unexplored area related to boundary effects is the use of canvases (i.e., backgrounds) with image patches (see Fig. 1, top row). When using image patches in a deep learning pipeline involving CNNs, the user is required to paste the patch onto a background, due to the constraint that the input image must be rectangular.
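The patch-on-canvas setup described above can be sketched as follows. This is a minimal NumPy illustration; the 64×64 canvas size, the function name, and the returned segmentation mask are our own illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def paste_on_canvas(patch, canvas_size=64, canvas_value=0.0, rng=None):
    """Place an (H, W, C) image patch at a uniformly random location on a
    constant-valued square canvas, returning the canvas and a binary mask
    marking the image region (a possible segmentation target)."""
    if rng is None:
        rng = np.random.default_rng()
    h, w, c = patch.shape
    canvas = np.full((canvas_size, canvas_size, c), canvas_value,
                     dtype=patch.dtype)
    y = int(rng.integers(0, canvas_size - h + 1))
    x = int(rng.integers(0, canvas_size - w + 1))
    canvas[y:y + h, x:x + w] = patch
    mask = np.zeros((canvas_size, canvas_size), dtype=bool)
    mask[y:y + h, x:x + w] = True
    return canvas, mask

# e.g., a 32x32 CIFAR-10-sized patch on a black (0.0) or white (1.0) canvas
canvas, mask = paste_on_canvas(np.ones((32, 32, 3), dtype=np.float32),
                               canvas_value=0.0)
```

Varying `canvas_value` between 0.0 (black) and 1.0 (white) reproduces the canvas comparison sketched in Fig. 1.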
Canvases have been used in a wide variety of domains, such as image generation (Gregor et al., 2015; Huang et al., 2019), data augmentation (DeVries & Taylor, 2017), image inpainting (Demir & Unal, 2018; Yu et al., 2018), and interpretable AI (Geirhos et al., 2018; Esser et al., 2020). To the best of our knowledge, this paper contains the first analysis of canvas value selection; in other works, the canvas value is simply chosen based on the authors' intuition.

Figure 1: An illustration of how CNNs use position information to resolve boundary effects. We place CIFAR-10 images at random locations on a canvas of 0's (black) or 1's (white). We evaluate whether a ResNet-18, trained with or without padding for semantic segmentation, can segment the image region. Surprisingly, performance improves when either zero padding or a black canvas is used, implying that position information can be exploited from border heuristics to reduce the boundary effect. Colormap is 'viridis'; yellow is high confidence.

Given the pervasiveness of CNNs in a multitude of applications, it is of paramount importance to fully understand what the internal representations of these networks encode, as well as to isolate the precise reasons these representations are learned. This comprehension can also enable the effective design of architectures that overcome recognized shortcomings (e.g., residual connections (He et al., 2016) for the vanishing gradient problem). As boundary effects and position information in CNNs are still not fully understood, we aim to provide answers to the following hypotheses, which reveal fundamental properties of these phenomena:

Hypothesis I: Zero Padding Encodes Maximal Absolute Position Information: Does zero padding encode maximal position information compared to other padding types?
We evaluate the amount of position information in networks trained with different padding types and show that zero padding injects more position information than other common padding types, e.g., reflection, replicate, and circular.

Hypothesis II: Different Canvas Colors Affect Performance: Do different background values have an effect on performance? If the padding value at the boundary has a substantial effect on a CNN's performance and on the position information contained in the network, one should expect canvas values to have a similar effect.

Hypothesis III: Position Information Is Correlated with Semantic Information: Does a network's ability to encode absolute position information affect its ability to encode semantic information? If zero padding and certain canvas colors can affect performance on classification tasks due to increased position information, we expect position information to be correlated with a network's ability to encode semantic information. We demonstrate that encoding position information improves the robustness and separability of semantic features.

Hypothesis IV: Boundary Effects Occur at All Image Locations: Does a CNN trained without padding suffer in performance solely at the border, or at all image regions? How does performance change across image locations? Our analysis reveals strong evidence that the border effect impacts a CNN's performance at all regions of the input, contrasting with previous assumptions (Tsotsos et al., 1995; Innamorati et al., 2019) that border effects exist solely at the image border.

Hypothesis V: Position Encoding Can Act as a Feature or a Bug: Does absolute position information always correlate with improved performance? A CNN's ability to leverage position information from boundary information could hurt performance when a task requires translation invariance, e.g., texture recognition; however, it can also be useful when the task relies on position information, e.g., semantic segmentation.
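As a concrete illustration of the four padding types compared under Hypothesis I, NumPy's `np.pad` modes implement the same variants on a 1-D signal (the corresponding PyTorch `padding_mode` names are noted in the comments); this is an illustrative sketch rather than the paper's experimental code:

```python
import numpy as np

row = np.array([1, 2, 3, 4])
pad = 2  # pad two values on each side

padded = {
    "zero":      np.pad(row, pad, mode="constant"),  # PyTorch 'zeros'     -> [0 0 1 2 3 4 0 0]
    "reflect":   np.pad(row, pad, mode="reflect"),   # PyTorch 'reflect'   -> [3 2 1 2 3 4 3 2]
    "replicate": np.pad(row, pad, mode="edge"),      # PyTorch 'replicate' -> [1 1 1 2 3 4 4 4]
    "circular":  np.pad(row, pad, mode="wrap"),      # PyTorch 'circular'  -> [3 4 1 2 3 4 1 2]
}
```

Note that only zero padding introduces a constant that is independent of the signal content, one intuition for why it injects the most position information into the network.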
To answer these hypotheses (hereon referred to as H-X), we design a series of novel tasks, and use existing techniques, to quantify the location information contained in different CNNs under various settings. In particular, we introduce location-dependent experiments (see Fig. 2) which use a grid-based strategy to allow a per-location analysis of absolute position encoding and of performance on semantic tasks. The per-location analysis plays a critical role in characterizing boundary effects as a function of the distance to the image border. We also estimate the number of dimensions that encode position information in the latent representations of CNNs. Through these experiments we provide both quantitative and qualitative evidence that boundary effects influence CNNs in substantial and surprising ways, and we then demonstrate the practical implications of these findings on multiple real-world applications. Code will be made available for all experiments.
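The grid-based per-location analysis described above can be sketched as follows, assuming some per-pixel metric (e.g., a segmentation error map) is available; the function name and default grid size are our own illustrative assumptions:

```python
import numpy as np

def per_cell_mean(err_map, grid=4):
    """Average an (H, W) per-pixel metric over a grid x grid partition,
    so performance can be inspected as a function of distance to the
    image border (corner vs. edge vs. interior cells)."""
    H, W = err_map.shape
    gh, gw = H // grid, W // grid
    trimmed = err_map[:gh * grid, :gw * grid]  # drop remainder rows/cols
    # Reshape so axes (1, 3) index pixels within each cell, then average.
    return trimmed.reshape(grid, gh, grid, gw).mean(axis=(1, 3))
```

Comparing the border cells of the resulting grid against the interior cells is one simple way to test whether degradation is confined to the image border or spreads across all locations (H-IV).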

