BOUNDARY EFFECTS IN CNNS: FEATURE OR BUG?

Abstract

Recent studies have shown that the addition of zero padding drives convolutional neural networks (CNNs) to encode a significant amount of absolute position information in their internal representations, while a lack of padding precludes position encoding. Additionally, various studies have used image patches on background canvases (e.g., to accommodate that inputs to CNNs must be rectangular) without consideration that different backgrounds may contain varying levels of position information according to their color. These studies give rise to deeper questions about the role of boundary information in CNNs, which we explore in this paper: (i) What boundary heuristics (e.g., padding type, canvas color) enable optimal encoding of absolute position information for a particular downstream task? (ii) Where in the latent representations do boundary effects destroy semantic and location information? (iii) Does encoding position information affect the learning of semantic representations? (iv) Does encoding position information always improve performance? To answer these questions, we perform the largest case study to date on the role that padding and border heuristics play in CNNs. We first show that zero padding injects optimal position information into CNNs relative to other common padding types. We then design a series of novel tasks which allow us to accurately quantify boundary effects as a function of the distance to the border. A number of semantic objectives reveal the destructive effect of boundary handling on semantic representations. Further, we demonstrate that the encoding of position information improves the separability of learned semantic features. Finally, we examine the implications of these findings on a number of real-world tasks to show that position information can act as a feature or a bug.

1. INTRODUCTION

One of the main intuitions behind the success of CNNs for visual tasks such as image classification (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Szegedy et al., 2015; Huang et al., 2017), video classification (Karpathy et al., 2014; Yue-Hei Ng et al., 2015; Carreira & Zisserman, 2017), object detection (Ren et al., 2015; Redmon et al., 2016; He et al., 2017), generative image models (Brock et al., 2018), and semantic segmentation (Long et al., 2015; Noh et al., 2015; Chen et al., 2017; 2018), is that convolutions equip neural networks with a visual inductive bias: objects can appear anywhere in the image. To accommodate the finite domain of images, manual heuristics (e.g., padding) have been applied to allow the convolutional kernel's support to extend beyond the border of an image and reduce the impact of boundary effects (Wohlberg & Rodriguez, 2017; Tang et al., 2018; Liu et al., 2018a; Innamorati et al., 2019; Liu et al., 2018b). Recent studies (Pérez et al., 2019; Islam et al., 2020; Kayhan & Gemert, 2020) have shown that zero padding allows CNNs to encode absolute position information despite the presence of pooling layers in their architecture (e.g., global average pooling). In our work, we argue that the relationship between boundary effects and absolute position information extends beyond zero padding and has major implications for a CNN's ability to encode confident and accurate semantic representations (see Fig. 1).

An unexplored area related to boundary effects is the use of canvases (i.e., backgrounds) with image patches (see Fig. 1, top row). When using image patches in a deep learning pipeline involving CNNs, the user is required to paste the patch onto a background due to the constraint that the input image must be rectangular.
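The mechanism by which zero padding leaks absolute position can be illustrated with a minimal NumPy sketch (ours, not the experimental setup of the cited studies): convolving a constant image with a uniform kernel under zero padding yields activations that vary purely as a function of distance to the border, even though the image content is positionally uninformative.

```python
import numpy as np

def conv2d_same(x, k):
    """3x3 convolution with zero padding ('same' output size)."""
    p = np.pad(x, 1)  # pads with zeros by default
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * k)
    return out

x = np.ones((5, 5))  # constant image: no content-based position cue
k = np.ones((3, 3))  # uniform kernel
y = conv2d_same(x, k)
print(y)
# Interior responses are 9, edge responses 6, corner responses 4:
# the zeros injected at the boundary make each activation a function
# of its absolute position relative to the border.
```

A stack of such layers can propagate this border signal inward, which is consistent with the observation that deeper CNNs encode position over a larger spatial extent.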
Canvases have been used in a wide variety of domains, such as image generation (Gregor et al., 2015; Huang et al., 2019), data augmentation (DeVries & Taylor, 2017), image inpainting (Demir & Unal, 2018; Yu et al., 2018), and interpretable AI (Geirhos et al., 2018; Esser et al., 2020). To the best of our knowledge, this paper contains the first analysis of canvas value selection.
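The patch-on-canvas setup described above can be sketched as follows (a hypothetical helper of our own, not code from the cited works). Note that a zero-valued (black) canvas surrounds the patch with exactly the values a zero-padded convolution would inject, so the choice of canvas value plausibly modulates how much position information reaches the network.

```python
import numpy as np

def paste_on_canvas(patch, canvas_size, value):
    """Paste a patch at the center of a constant-valued canvas.

    A value of 0.0 mimics zero padding around the patch; other
    values give the network a different border statistic.
    """
    canvas = np.full(canvas_size, value, dtype=float)
    h, w = patch.shape
    top = (canvas_size[0] - h) // 2
    left = (canvas_size[1] - w) // 2
    canvas[top:top + h, left:left + w] = patch
    return canvas

c = paste_on_canvas(np.ones((2, 2)), (4, 4), 0.0)
print(c)  # 2x2 block of ones centered on a black 4x4 canvas
```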

