BASIC BINARY CONVOLUTION UNIT FOR BINARIZED IMAGE RESTORATION NETWORK

Abstract

Lighter and faster image restoration (IR) models are crucial for deployment on resource-limited devices. The binary neural network (BNN), one of the most promising model compression methods, can dramatically reduce the computation and parameter count of full-precision convolutional neural networks (CNNs). However, BNNs behave differently from full-precision CNNs, and experience gained from designing CNNs can hardly be transferred to developing BNNs. In this study, we reconsider the components of binary convolution, such as the residual connection, BatchNorm, the activation function, and structure, for IR tasks. We conduct systematic analyses to explain each component's role in binary convolution and discuss the pitfalls. Specifically, we find that the residual connection can reduce the information loss caused by binarization; BatchNorm can bridge the value-range gap between the residual connection and binary convolution; and the position of the activation function dramatically affects the performance of BNN. Based on our findings and analyses, we design a simple yet efficient basic binary convolution unit (BBCU). Furthermore, we divide IR networks into four parts and specially design variants of BBCU for each part to explore the benefit of binarizing these parts. We conduct experiments on different IR tasks, and our BBCU significantly outperforms other BNNs and lightweight models, which shows that BBCU can serve as a basic unit for binarized IR networks.

1. INTRODUCTION

Image restoration (IR) aims to restore a high-quality (HQ) image from its low-quality (LQ) counterpart corrupted by various degradation factors. Typical IR tasks include image denoising, super-resolution (SR), and compression artifact reduction. Due to its ill-posed nature and high practical value, image restoration is an active yet challenging research topic in computer vision. Recently, deep convolutional neural networks (CNNs) have achieved excellent performance on image restoration by learning a mapping from LQ to HQ image patches (Chen & Pock, 2016; Zhang et al., 2018a; Tai et al., 2017; Xia et al., 2023). However, most IR tasks require dense pixel prediction, and the strong performance of CNN-based models usually relies on increasing model size and computational complexity, which demands extensive computing and memory resources. Meanwhile, most hand-held devices and small drones are equipped with neither GPUs nor enough memory to store and run computationally expensive CNN models. Thus, to promote the deployment of IR models, it is essential to largely reduce their computation and memory cost while preserving performance. The binary neural network (BNN, also known as 1-bit CNN) (Courbariaux et al., 2016) has been recognized as one of the most promising neural network compression methods (He et al., 2017; Jacob et al., 2018; Zoph & Le, 2016) for deploying models onto resource-limited devices. BNN can achieve a 32× memory compression ratio and up to a 64× computational reduction on specially designed processors (Rastegari et al., 2016). Nowadays, research on BNN mainly concentrates on high-level tasks, especially classification (Liu et al., 2018; 2020), and BNN has not been fully explored in low-level vision, such as image denoising. Considering the great significance of BNN for the deployment of deep IR networks and the difference between high-level and low-level vision tasks, there is an urgent need to explore the properties of BNN on low-level vision tasks and to provide a simple, strong, universal, and extensible baseline for later research and deployment.

Recently, there have been several works exploring the application of BNN to image SR networks. Specifically, Ma et al. (2019) binarized the convolution kernel weights to decrease the SR network model size, but the computational complexity remains high due to the preservation of full-precision activations. BAM (Xin et al., 2020) then adopted a bit accumulation mechanism to approximate full-precision convolution for the SR network. Zhang et al. (2021b) designed a compact uniform prior to constrain the convolution weights into a highly narrow range centered at zero for better optimization. BTM (Jiang et al., 2021) further introduced knowledge distillation (Hinton et al., 2015) to boost the performance of binarized SR networks. However, the above-mentioned binarized SR networks can hardly realize the full potential of BNN. In this work, we explore the properties of three key components of BNN, namely the residual connection (He et al., 2016), BatchNorm (BN) (Ioffe & Szegedy, 2015), and the activation function (Glorot et al., 2011), and design a strong basic binary convolution unit (BBCU) based on our analyses.

(1) For IR tasks, we observe that the residual connection is quite important for binarized IR networks. This is because BNN binarizes the input full-precision activations to 1 or -1 before binary convolution (BC), losing a large amount of information about the value range of the activations. Adding a full-precision residual connection to each BC lets BNN reduce the effect of this value-range information loss. (2) Next, we explore BN for BBCU. BNN methods for classification (Liu et al., 2020) always adopt a BN in the BC. However, for IR tasks, EDSR (Lim et al., 2017) demonstrated that BN is harmful to SR performance. We find that BN in BNN for IR tasks is useful: it balances the value ranges of the residual connection and the BC. Specifically, as shown in Fig. 2 (b), the values of the full-precision residual connection mostly lie in the range -1 to 1, because the value range of input images is around 0 to 1 or -1 to 1, whereas the values of the BC without BN are large, ranging from about -15 to 15 due to the bit-counting operation. The value range of the BC is thus far larger than that of the residual connection, which buries the full-precision image information preserved in the residual connection and limits the performance of BNN. As shown in Fig. 2 (a), BN in BNN for image restoration aligns the value ranges of the residual connection and the BC; a minimal sketch of such a unit is given below.
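To make findings (1) and (2) concrete, the following is a minimal PyTorch sketch of a binary convolution unit with BN and a full-precision residual connection. The sign binarization with a clipped straight-through estimator and the per-channel weight scaling by mean(|w|) are common BNN practices assumed here for illustration; the names (BinarizeSTE, BinaryConv2d, BinaryUnitWithBN) are ours, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a clipped straight-through estimator (STE)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Note: sign(0) = 0 is left as-is for brevity; real BNNs map 0 to +1.
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients only where |x| <= 1 (standard clipped STE).
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)


class BinaryConv2d(nn.Module):
    """3x3 convolution with binarized activations and weights.

    Scaling the binarized weights per output channel by mean(|w|) follows
    common XNOR-Net-style practice (an assumption here)."""

    def __init__(self, channels, kernel_size=3, padding=1):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(channels, channels, kernel_size, kernel_size) * 0.01
        )
        self.padding = padding

    def forward(self, x):
        xb = BinarizeSTE.apply(x)                    # activations -> {-1, +1}
        alpha = self.weight.abs().mean(dim=(1, 2, 3), keepdim=True)
        wb = alpha * BinarizeSTE.apply(self.weight)  # scaled binary weights
        return F.conv2d(xb, wb, padding=self.padding)


class BinaryUnitWithBN(nn.Module):
    """Findings (1) + (2): binary conv -> BN, plus a full-precision residual.

    BN narrows the large output range of the binary conv (roughly -15..15
    from bit counting) so it no longer buries the -1..1 residual."""

    def __init__(self, channels):
        super().__init__()
        self.bconv = BinaryConv2d(channels)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        # The identity branch keeps the full-precision value-range
        # information that sign() discards in the binary branch.
        return self.bn(self.bconv(x)) + x
```

During training the binary branch still runs as ordinary floating-point conv2d over ±1 tensors; the advertised speedup arises only at inference on hardware that replaces it with XNOR and popcount.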
(3) Based on these findings, to remove BN, we propose a residual alignment (RA) scheme: we multiply the input image by an amplification factor k to enlarge the value range of the residual connection, rather than using BN to narrow the value range of the BC (Fig. 2 (c)). This scheme improves the performance of binarized IR networks and simplifies the BNN structure (Sec. 4.5). (4) As shown in Fig. 2 (d), different from BNNs for classification (Liu et al., 2020; 2018), we further move the activation function into the residual connection, which also improves performance (Sec. 4.5). The reason is that an activation function placed outside narrows the negative value range of the residual connection, whose information would then be buried by the next BC with its large negative value range (Fig. 2 (c)). (5) Furthermore, we divide IR networks into four parts: head, body, upsampling, and tail (Fig. 3 (a)). These four parts have different input and output channel numbers. Previous binarized SR networks (Xin et al., 2020; Jiang et al., 2021) merely binarize the body part. However, the upsampling part accounts for 52.3% of the total calculations and also needs to be binarized. Besides, binarizing the head and tail parts is also worth exploring. Thus, we design different variants of BBCU to binarize these four parts (Fig. 3 (b)); a toy sketch of the RA unit and this four-part decomposition follows the Figure 1 caption below. Overall, our contributions can be summarized as threefold: (i) we reconsider the components of binary convolution, including the residual connection, BatchNorm, and the activation function, and design a simple yet efficient basic binary convolution unit (BBCU) for IR tasks; (ii) we divide IR networks into four parts (head, body, upsampling, and tail) and design variants of BBCU for each part to explore the benefit of binarizing them; (iii) extensive experiments on different IR tasks show that BBCU significantly outperforms other BNNs and lightweight models and can serve as a basic unit for binarized IR networks.
Figure 1: Our BBCU achieves SOTA performance on IR tasks with efficient computation. (a) Super-resolution: SRResNet backbone, tested on 4× Urban100. (b) Denoising: DnCNN backbone, tested on Urban100 with σ = 15. (c) Deblocking: DnCNN3 backbone, tested on Classic5 with q = 10.

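Building on BinaryConv2d from the sketch above, here is a hedged sketch of findings (3)-(5): a BN-free unit using the residual alignment (RA) idea with the activation moved into the residual branch, composed into a toy four-part SR network. Applying PReLU only on the residual branch is one plausible reading of "moving the activation into the residual connection"; the amplification factor value, the PixelShuffle upsampler, all channel widths, and the names (BinaryUnitRA, FourPartIRNet) are illustrative assumptions rather than the paper's exact design.

```python
import torch.nn as nn


class BinaryUnitRA(nn.Module):
    """Findings (3) + (4): drop BN via residual alignment (RA) and move the
    activation into the residual branch.

    With RA, the network input is pre-amplified by a factor k, so the
    full-precision residual is large enough to stand next to the binary
    conv output without BN."""

    def __init__(self, channels):
        super().__init__()
        self.bconv = BinaryConv2d(channels)  # from the previous sketch
        self.act = nn.PReLU(channels)

    def forward(self, x):
        # Activation on the residual branch only; the main path keeps the
        # full negative range of the binary conv output.
        return self.bconv(x) + self.act(x)


class FourPartIRNet(nn.Module):
    """Finding (5): head / body / upsampling / tail decomposition (toy SR net).

    Depth, widths, the upsampler, and which parts stay full-precision are
    illustrative assumptions; k = 100.0 is an arbitrary placeholder."""

    def __init__(self, channels=64, num_blocks=4, scale=4, k=100.0):
        super().__init__()
        self.k = k  # RA amplification factor (value is an assumption)
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.body = nn.Sequential(
            *[BinaryUnitRA(channels) for _ in range(num_blocks)]
        )
        self.upsampling = nn.Sequential(  # the heaviest part in SR networks
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x):
        x = self.k * x                      # residual alignment at the input
        x = self.upsampling(self.body(self.head(x)))
        return self.tail(x) / self.k        # undo the amplification
```

In a fully binarized model, the head, upsampling, and tail convolutions would be replaced by the dedicated BBCU variants (Fig. 3 (b)); they stay full-precision here only to keep the sketch short.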

Code availability: https://github.com/

