CONVOLUTION AND POOLING OPERATION MODULE WITH ADAPTIVE STRIDE PROCESSING EFFECT

Abstract

Convolutional neural networks are among the representative models of deep learning and have a wide range of applications. Convolution and pooling are the two key operations in these networks: they extract features from the input and map low-level semantic features to high-level semantic features. The stride is an important parameter of both operations; it is the distance that the convolution kernel (or pooling kernel) moves at each slide during the convolution (or pooling) operation. The stride determines the granularity of feature extraction and which features are selected (filtered), and thus affects the performance of the network. At present, when a convolutional neural network is trained, the contents of the convolution and pooling kernels can be determined by an optimization algorithm based on gradient descent. The stride, however, usually cannot be treated in the same way and must be chosen manually as a hyperparameter. Most existing work uses a fixed stride, for example 1. In fact, different tasks or inputs may require different strides for the model to process them well. This paper therefore views the role of the stride in convolution and pooling from the perspective of sampling, and proposes a convolution and pooling operation module with an adaptive stride processing effect. The distinguishing feature of the proposed module is that the feature map produced by convolution or pooling is no longer obtained by equal-interval downsampling (feature extraction) at a fixed stride, but is extracted adaptively according to changes in the input features. We apply the proposed module to many convolutional neural network models, including VGG, AlexNet, and MobileNet for image classification, YOLOX-S for object detection, and U-Net for image segmentation. Simulation results show that the proposed module can effectively improve the performance of existing models.

1. INTRODUCTION

Research on convolutional neural networks began in the 1980s and 1990s. The time delay network and LeNet-5 were the earliest convolutional neural networks. Since the start of the 21st century, with the advent of deep learning theory and improvements in computing hardware, convolutional neural networks have developed rapidly and have been applied in computer vision (Krizhevsky et al., 2012), natural language processing (Qiuqiang Kong, 2020), and other fields. The operators in a convolutional neural network include the convolution operator and the pooling operator. The elements of the convolution operator include the size of the convolution kernel, the values of the kernel weights, the stride of the convolution operation, and so on. The elements of the pooling operator include the stride, the padding, and so on. In a convolutional neural network, the stride is the distance that the convolution kernel or pooling operator moves at each slide across the region being convolved or pooled. Convolution and pooling both extract features from the input and downsample it; the stride determines the granularity of feature extraction and which features are kept or discarded (filtered), and thereby influences the performance of the network. In current convolutional neural networks, the stride of a convolution or pooling operation is manually selected as a hyperparameter and is fixed. A fixed stride means that the sliding distance of the convolution kernel (pooling kernel) is the same

