FTBNN: RETHINKING NON-LINEARITY FOR 1-BIT CNNS AND GOING BEYOND

Anonymous authors

Abstract

Binary neural networks (BNNs), in which both weights and activations are binarized to 1 bit, have been widely studied in recent years due to their benefits of highly accelerated computation and a substantially reduced memory footprint, which appeal to the development of resource-constrained devices. In contrast to previous methods, which tend to reduce the quantization error when training BNN structures, we argue that the binarized convolution process becomes increasingly linear as this error is minimized, which in turn hampers the BNN's discriminative ability. In this paper, we re-investigate and tune proper non-linear modules to resolve that contradiction, leading to a strong baseline that achieves state-of-the-art performance on the large-scale ImageNet dataset in terms of both accuracy and training efficiency. Going further, we find that the proposed BNN model still has considerable potential to be compressed, without losing accuracy, by making better use of efficient binary operations. In addition, the limited capacity of the BNN model can be increased with the help of group execution. Based on these insights, we improve the baseline by an additional 4∼5% top-1 accuracy, even at a lower computational cost. Our code and all trained models will be made public.

1. INTRODUCTION

In the past decade, Deep Neural Networks (DNNs), and in particular Deep Convolutional Neural Networks (DCNNs), have revolutionized computer vision and been ubiquitously applied to various computer vision tasks, including image classification (Krizhevsky et al., 2012), object detection (Liu et al., 2020a) and semantic segmentation (Minaee et al., 2020). The top-performing DCNNs (He et al., 2016; Huang et al., 2017) are data and energy hungry, relying on cloud centers with clusters of power-hungry processors to speed up processing, which greatly impedes their deployment on ubiquitous edge devices such as smartphones, automobiles, wearables and IoT devices, which have very limited computing resources. Therefore, in the past few years, numerous research efforts have been devoted to developing DNN compression techniques that pursue a satisfactory trade-off between computational efficiency and prediction accuracy (Deng et al., 2020). Among these techniques, Binary Neural Networks (BNNs), which first appeared in the pioneering work of Hubara et al. (2016), have attracted increasing attention due to their favorable properties such as fast inference, low power consumption and memory savings. In a BNN, the weights and activations during inference are aggressively quantized to 1 bit (namely, two values), which can yield a 32× saving in memory footprint and up to a 64× speedup on CPUs (Rastegari et al., 2016). However, the main drawback of BNNs is that, despite recent progress (Liu et al., 2018; Gu et al., 2019; Kim et al., 2020b), they still trail the accuracy of their full-precision counterparts. This is because binarization inevitably causes serious information loss due to the limited representational capacity of such extreme discreteness. Additionally, the discontinuous nature of the binarization operation makes the optimization of the deep network difficult (Alizadeh et al., 2018).
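The CPU speedup cited above comes from the fact that a dot product between two {-1, +1} vectors can be computed with bitwise XNOR and popcount instead of multiply-accumulate operations. A minimal sketch (not code from the paper; `binary_dot` and the bit encoding are illustrative assumptions):

```python
# Illustrative sketch: 1-bit dot product via XNOR + popcount.
# Each {-1, +1} vector is packed into an integer, with bit value 1
# encoding +1 and bit value 0 encoding -1. For n-element vectors,
# dot(a, b) = (#positions where signs agree) - (#positions where they differ)
#           = 2 * popcount(XNOR(a, b)) - n.

def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed {-1, +1} vectors of length n."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask      # XNOR: bit set where signs agree
    matches = bin(agree).count("1")        # popcount
    return 2 * matches - n                 # agreements minus disagreements

# Two 4-element vectors that agree in 2 positions and differ in 2:
print(binary_dot(0b1011, 0b1101, 4))  # -> 0
```

One XNOR plus one popcount thus replaces n multiplications and n-1 additions, which is what enables the large reported speedups on bit-packed data.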
A popular direction for enhancing the predictive performance of a BNN is to make the binary operation mimic the behavior of its full-precision counterpart by reducing the quantization error caused by the binarization function. For example, XNOR-Net (Rastegari et al., 2016) first introduced scaling factors for both the binary weight and activation such that the output of the

