MIND THE PAD - CNNS CAN DEVELOP BLIND SPOTS

Abstract

We show how feature maps in convolutional networks are susceptible to spatial bias. Due to a combination of architectural choices, the activation at certain locations is systematically elevated or weakened. The major source of this bias is the padding mechanism. Depending on several aspects of convolution arithmetic, this mechanism can apply the padding unevenly, leading to asymmetries in the learned weights. We demonstrate how such bias can be detrimental to certain tasks such as small object detection: the activation is suppressed if the stimulus lies in the impacted area, leading to blind spots and misdetection. We propose solutions to mitigate spatial bias and demonstrate how they can improve model accuracy.

1. MOTIVATION

Convolutional neural networks (CNNs) serve as feature extractors for a wide variety of machine-learning tasks, yet little attention has been paid to the spatial distribution of activation in the feature maps a CNN computes. Our interest in analyzing this distribution was triggered by mysterious failure cases of a traffic-light detector: the detector successfully detects a small but visible traffic light in a road scene, yet fails completely to detect the same traffic light in the next frame captured by the ego-vehicle. The main difference between the two frames is a limited shift along the vertical dimension as the vehicle moves forward. This drastic difference in detection is surprising, given that CNNs are often assumed to have a high degree of translation invariance [8; 17].

The spatial distribution of activation in feature maps varies with the input. Nevertheless, by closely examining this distribution over a large number of samples, we found consistent patterns among them, often in the form of artifacts that do not resemble any input feature. This work analyzes the root cause of such artifacts and their impact on CNNs. We show that these artifacts are responsible for the mysterious failure cases described above, as they can induce 'blind spots' for the object-detection head.

Our contributions are:
• Demonstrating how the padding mechanism can induce spatial bias in CNNs (Section 2).
• Demonstrating how spatial bias can impair downstream tasks (Section 3).
• Identifying uneven application of 0-padding as a resolvable source of bias (Section 5).
• Relating the padding mechanism to the foveation behavior of CNNs (Section 6).
• Providing recommendations to mitigate spatial bias and demonstrating how this can prevent blind spots and boost model accuracy.
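Uneven application of 0-padding can follow directly from convolution arithmetic. As a minimal sketch, assuming the 'SAME' padding rule described in TensorFlow's convolution documentation (the helper name `same_padding` is hypothetical and not taken from the model analyzed here), the code below computes how many zero rows (or columns) are added before and after the input along one spatial axis:

```python
import math
from typing import Tuple


def same_padding(in_size: int, kernel: int, stride: int) -> Tuple[int, int]:
    """Return (pad_before, pad_after) for one spatial axis under TF-style 'SAME' padding.

    Assumption: this mirrors the rule in TensorFlow's documentation; it is an
    illustration, not code from the detector discussed in this paper.
    """
    out_size = math.ceil(in_size / stride)  # 'SAME' keeps ceil(in / stride) outputs
    pad_total = max((out_size - 1) * stride + kernel - in_size, 0)
    pad_before = pad_total // 2             # smaller half goes to the top/left
    pad_after = pad_total - pad_before      # any odd leftover goes to the bottom/right
    return pad_before, pad_after


for size in (224, 225):
    # 3x3 kernel with stride 2, a common downsampling layer configuration
    print(size, same_padding(size, kernel=3, stride=2))
# 224 (0, 1)
# 225 (1, 1)
```

Under this rule, a stride-2 layer applied to an even-sized input pads only the bottom (and, along the other axis, only the right), so the top and left borders receive no zeros at that layer, whereas an odd-sized input is padded symmetrically.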

2. THE EMERGENCE OF SPATIAL BIAS IN CNNS

Our aim is to determine the extent to which activation magnitude in CNN feature maps is influenced by location. We demonstrate our analysis on a publicly available traffic-light detection model [36]. This model implements the SSD architecture [26] in TensorFlow [1], using as a feature extractor. The model is trained on the BSTLD dataset [4], which annotates traffic lights in road scenes. Figure 1 shows two example scenes from the dataset. For each scene, we show two feature maps computed by two filters in the 11th convolutional layer. This layer contains 512 filters whose feature maps are used directly by the first box predictor in the SSD to detect small objects.
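Location can affect activation magnitude even before any learning takes place. As a minimal illustration (pure Python, stride 1, symmetric zero padding; `conv2d_zero_pad` is a hypothetical helper, not part of the model in [36]), convolving a perfectly uniform image with an all-ones 3x3 kernel gives weaker responses along the border, because part of the kernel reads padded zeros there:

```python
from typing import List

Grid = List[List[float]]


def conv2d_zero_pad(image: Grid, kernel: Grid) -> Grid:
    """Same-size 2D cross-correlation with symmetric zero padding (odd kernel, stride 1)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2  # padding per side for an odd-sized kernel
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    # Taps that fall outside the image read the zero pad and add nothing.
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


flat = [[1.0] * 5 for _ in range(5)]  # uniform input: no feature is location-specific
box = [[1.0] * 3 for _ in range(3)]   # all-ones 3x3 kernel
for row in conv2d_zero_pad(flat, box):
    print(row)
# [4.0, 6.0, 6.0, 6.0, 4.0]
# [6.0, 9.0, 9.0, 9.0, 6.0]
# [6.0, 9.0, 9.0, 9.0, 6.0]
# [6.0, 9.0, 9.0, 9.0, 6.0]
# [4.0, 6.0, 6.0, 6.0, 4.0]
```

The interior responds with 9, edges with 6, and corners with 4, although the input carries no spatial structure. In a trained network, the learned weights can amplify or counteract such border effects, which is why the analysis above inspects actual feature maps rather than relying on this toy case.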

