LATERAL INHIBITION-INSPIRED STRUCTURE FOR CONVOLUTIONAL NEURAL NETWORK ON IMAGE CLASSIFICATION

Anonymous

Abstract

Convolutional neural networks (CNNs) have become powerful and popular tools for image classification in computer vision since the emergence of deep learning. To improve recognition, both the depth and width dimensions have been explored, yielding convolutional neural networks with more layers and channels. Beyond these factors, neurobiology points to lateral inhibition (lateral antagonism, e.g., the Mach band effect), a phenomenon widespread in vision that increases the contrast and sharpness of nearby neuron excitation in the lateral direction and thereby aids recognition. However, this mechanism has not been well explored in the design of convolutional neural networks. In this paper, we explicitly explore the filter dimension in the lateral direction and propose a lateral inhibition-inspired (LI) structure. Our naive design uses a low-pass filter to mimic how the strength of lateral interaction from neighbors decays with distance. One learnable parameter per channel sets the amplitude of the low-pass filter by multiplication, which is flexible enough to model various lateral interactions, including lateral inhibition. The convolution result is then subtracted from the input, which can increase contrast and sharpness for better recognition. Finally, a learnable scaling factor and shift adjust the value after subtraction. Our lateral inhibition-inspired (LI) structure works on both plain convolutions and convolutional blocks with residual connections, and is compatible with existing modules. Preliminary results demonstrate clear improvements on the ImageNet dataset for AlexNet (7.58%) and ResNet-18 (0.81%), respectively, with little increase in parameters, indicating the effectiveness of our brain-inspired design in aiding feature learning for image classification from a different perspective.
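The steps above (low-pass filtering scaled by a learnable per-channel amplitude, subtraction from the input, then a learnable scale and shift) can be sketched in a minimal 1-D form. This is an illustrative simplification and not the paper's implementation: the function name, the kernel values, and the parameter names `alpha`, `gamma`, and `beta` are our own assumptions, and the real structure operates per channel on 2-D feature maps.

```python
import numpy as np

def lateral_inhibition(x, alpha, gamma=1.0, beta=0.0):
    """Hypothetical 1-D sketch of the LI structure.

    alpha: learnable amplitude of the low-pass filter (per channel in the paper)
    gamma, beta: learnable scale and shift applied after the subtraction
    """
    # Fixed low-pass kernel: lateral influence decays with distance from the center.
    kernel = np.array([1.0, 2.0, 4.0, 2.0, 1.0])
    kernel /= kernel.sum()
    # Neighbors' contribution, with a learnable amplitude on the kernel.
    lateral = np.convolve(x, alpha * kernel, mode="same")
    # Subtract the lateral term from the input, then scale and shift.
    return gamma * (x - lateral) + beta

# A step edge: lateral inhibition should sharpen it with
# Mach-band-like overshoot/undershoot around the transition.
x = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
out = lateral_inhibition(x, alpha=0.5)
```

On this step input, the response just after the edge exceeds the plateau value and the response just before the edge dips below zero, so the local contrast at the edge is larger than in the input, mirroring the contrast-enhancement role described above.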

1. INTRODUCTION

In recent years, convolutional neural networks (CNNs) (Hinton et al., 2012; Simonyan & Zisserman, 2015; Szegedy et al., 2015; He et al., 2016) have become powerful and popular tools for image classification in computer vision since the emergence of deep learning. They have achieved record-breaking performance and outperformed traditional methods (Quinlan, 1986; Cortes & Vapnik, 1995) built on hand-crafted features (Lowe, 1999; Dalal & Triggs, 2005) on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) (Deng et al., 2009). Today, convolutional neural networks still possess unique merits: they have been studied the most, and convolution has a strong connection with the human vision system and image processing, making CNNs good models for feature learning research. Different factors have been explored to improve the recognition performance of convolutional neural networks. VGGNet (Simonyan & Zisserman, 2015) applies a small convolution kernel size (3 × 3) to increase network depth, while ResNet (He et al., 2016) introduces deep residual learning to make training very deep networks feasible. The success of such networks indicates that depth is a crucial factor for recognition performance. Wide Residual Networks (Zagoruyko & Komodakis, 2016), on the other hand, demonstrate that width is another important factor for performance. In addition to these factors, neurobiology suggests that lateral inhibition (lateral antagonism, e.g., the Mach band effect, shown in Fig. 1), a widespread phenomenon that increases the contrast and sharpness of nearby neuron excitation in the lateral direction, is also important for feature learning.

