EXPLAINING REPRESENTATION BOTTLENECKS OF CONVOLUTIONAL DECODER NETWORKS

Abstract

In this paper, we prove representation bottlenecks of a cascaded convolutional decoder network, considering its capacity to represent different frequency components of an input sample. We conduct the discrete Fourier transform on each channel of the feature map in an intermediate layer of the decoder network. Then, we introduce the rule for the forward propagation of such intermediate-layer spectrum maps, which is equivalent to the forward propagation of feature maps through a convolutional layer. Based on this, we find that each frequency component in the spectrum map is forward propagated independently of the other frequency components. Furthermore, we prove two bottlenecks in representing feature spectra. First, we prove that the convolution operation, the zero-padding operation, and a set of other settings all make a convolutional decoder network more likely to weaken high-frequency components. Second, we prove that the upsampling operation generates a feature spectrum in which strong signals appear repetitively at certain frequencies. We will release all code when this paper is accepted.
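The second bottleneck claimed above can be illustrated with a minimal numerical sketch. Assuming the upsampling operation is zero-insertion upsampling by a factor of 2 (a simplifying assumption; the paper's exact operation may differ), the 2D DFT of the upsampled map is exactly the original spectrum tiled 2×2, i.e., strong signals repeat at shifted frequencies. The array sizes and random data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))   # a single hypothetical feature channel

# Zero-insertion upsampling by a factor of 2 in each spatial dimension
y = np.zeros((16, 16))
y[::2, ::2] = x

X = np.fft.fft2(x)
Y = np.fft.fft2(y)

# The spectrum of the upsampled map is the original spectrum tiled 2x2:
# Y[k1, k2] = X[k1 mod 8, k2 mod 8], so every peak in X reappears at
# four frequencies in Y.
assert np.allclose(Y, np.tile(X, (2, 2)))
```

This follows directly from the DFT definition: the zero-inserted samples contribute nothing to the sum, and the surviving exponents reduce modulo the original size.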

1. INTRODUCTION

Deep neural networks (DNNs) have exhibited superior performance in many tasks. However, in recent years, many studies have discovered theoretical defects of DNNs, e.g., the vulnerability to adversarial attacks (Goodfellow et al., 2014) and the difficulty of learning interactions of middle complexity (Deng et al., 2022). Besides, other studies have explained typical phenomena during the training of DNNs, e.g., the double-descent phenomenon (Nakkiran et al., 2019), the information bottleneck hypothesis (Tishby & Zaslavsky, 2015), and the lottery ticket hypothesis (Frankle & Carbin, 2018). In comparison, in this study, we propose a new perspective to investigate how a cascaded convolutional decoder network represents features at different frequencies. That is, when we apply the discrete Fourier transform (DFT) to each channel of a feature map or the input sample, we aim to prove which frequency components of each input channel are usually strengthened/weakened by the network. In this direction, previous studies (Xu et al., 2019a; Rahaman et al., 2019) claimed that DNNs were less likely to encode high-frequency components. However, those studies focused on a specific notion of frequency that took the landscape of the loss function over all input samples as the time domain. In comparison, we focus on an entirely different type of frequency, i.e., the frequency w.r.t. the DFT of an input image or a feature map.

Our contributions are summarized as follows.

• Reformulating forward propagation in the frequency domain. As the basis for the subsequent theoretical proofs, we discover that the traditional forward propagation of feature maps can be reformulated as a new forward propagation of feature spectra. We derive the rule that forward propagates the spectra of different channels through a cascaded convolutional network, which is mathematically equivalent to the forward propagation of feature maps through the network.

• Based on this reformulation of forward propagation, we prove the following conclusions.
(1) The layerwise forward propagation of each frequency component of the spectrum map is independent of the other frequency components. In the forward propagation process, each frequency component of the feature spectrum is forward propagated independently of the other frequency components, provided that the convolution operation does not change the size of the feature map in each channel. In



Here, the decoder represents a typical network whose feature map size is non-decreasing during the forward propagation.
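The independence of frequency components stated in conclusion (1) can be illustrated numerically. The following is a minimal sketch (not the paper's derivation), assuming circular (periodic) boundary conditions so that the 2D convolution theorem applies exactly, with the kernel zero-padded to the feature-map size so the convolution does not change the map size; the shapes and random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-channel feature map and convolution kernel,
# with the kernel zero-padded to the feature-map size (no size change).
H = W = 8
x = rng.standard_normal((H, W))
k = rng.standard_normal((H, W))

def circular_conv2d(x, k):
    """Direct circular convolution: y[u,v] = sum_{i,j} x[i,j] * k[(u-i)%H, (v-j)%W]."""
    H, W = x.shape
    y = np.zeros((H, W))
    for u in range(H):
        for v in range(W):
            for i in range(H):
                for j in range(W):
                    y[u, v] += x[i, j] * k[(u - i) % H, (v - j) % W]
    return y

# Spatial-domain propagation of the feature map through one convolution
y = circular_conv2d(x, k)

# Frequency-domain propagation: each frequency component of the feature
# spectrum is multiplied only by the kernel spectrum at the SAME frequency,
# i.e., distinct frequency components never mix.
Y = np.fft.fft2(x) * np.fft.fft2(k)

assert np.allclose(np.fft.fft2(y), Y)
```

Under real padding schemes (e.g., zero-padding) the equivalence is no longer exact, which is precisely where the bottlenecks proven later arise.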

