SPARSE BINARY NEURAL NETWORKS

Abstract

Quantized neural networks are gaining popularity thanks to their ability to solve complex tasks with accuracy comparable to that of full-precision Deep Neural Networks (DNNs), while reducing computational power and storage requirements and increasing processing speed. These properties make them an attractive alternative for the development and deployment of DNN-based applications on Internet-of-Things (IoT) devices. Among quantized networks, Binary Neural Networks (BNNs) have reported the largest speed-ups. However, they suffer from a fixed and limited compression factor that may prove insufficient for devices with very limited resources. In this work, we propose Sparse Binary Neural Networks (SBNNs), a novel model and training scheme that introduces sparsity in BNNs by using positive 0/1 binary weights, instead of the -1/+1 weights used by state-of-the-art binary networks. As a result, our method achieves a high compression factor and reduces the number of operations and parameters at inference time. We study the properties of our method through experiments on linear and convolutional networks over the MNIST and CIFAR-10 datasets. Experiments confirm that SBNNs can achieve high compression rates and good generalization, while further reducing the operations of BNNs, making them a viable option for deploying DNNs on very low-cost IoT devices and sensors.
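The difference between the two weight encodings can be sketched as follows. This is a minimal illustration, not the paper's actual training scheme: the threshold `tau` and the thresholding rule are assumptions made here purely to show why 0/1 weights admit sparsity while sign-based ±1 weights do not.

```python
import random

random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(8)]  # full-precision weights
x = [random.gauss(0.0, 1.0) for _ in range(8)]  # input activations

# Standard BNN binarization: sign(w) in {-1, +1}; every weight stays active,
# so the compression factor is fixed at 1 bit per weight.
w_bnn = [1 if wi >= 0 else -1 for wi in w]

# SBNN-style binarization (illustrative): threshold to {0, 1}.
# tau is a hypothetical hyperparameter chosen for this sketch.
tau = 0.5
w_sbnn = [1 if wi > tau else 0 for wi in w]

# With 0/1 weights the dot product collapses to a sum over the active inputs,
# so zero weights cost neither storage nor additions at inference time.
y = sum(xi for xi, wi in zip(x, w_sbnn) if wi == 1)
sparsity = w_sbnn.count(0) / len(w_sbnn)  # fraction of prunable weights
```

Under this sketch, the ±1 encoding always requires one accumulation per weight, whereas the 0/1 encoding scales with the number of nonzero weights, which is what allows the compression factor to go beyond the fixed 1-bit-per-weight limit of standard BNNs.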

1. INTRODUCTION

The term Internet-of-Things (IoT) became notable in the late 2000s under the idea of enabling internet access for electrical and electronic devices (Miraz et al., 2015), thus allowing them to collect and exchange data. Since its introduction, the number of connected devices has surpassed the number of humans connected to the internet (Evans, 2011). The increasing number of both mobile and embedded IoT devices has led to a sensor-rich world, capable of addressing a wide range of real-time applications, such as security systems, healthcare monitoring, environmental metering, factory automation, autonomous vehicles and many others, where both accuracy and time matter (Al-Fuqaha et al., 2015). At the same time, Deep Neural Networks (DNNs) have reached and surpassed state-of-the-art results in multiple tasks involving images and video (Krizhevsky et al., 2012), speech (Hinton et al., 2012) and language processing (Collobert & Weston, 2008). Thanks to their ability to process large volumes of complex, heterogeneous data and to extract the patterns needed to take autonomous decisions with high reliability (LeCun et al., 2015), DNNs have the potential to enable a myriad of new IoT applications. DNNs, however, suffer from high resource consumption in terms of computational power, memory and energy (Canziani et al., 2016). In contrast, most IoT devices are characterized by limited resources: they have limited processing power and small storage capabilities, they are not GPU-enabled, and they are powered by batteries of limited capacity that are expected to last over 10 years without being replaced or recharged (Global System for Mobile Communications, 2018). These constraints remain a major bottleneck towards deploying DNN models in IoT applications (Yao et al., 2018).
Deploying DNNs on IoT devices requires compressing them to fit the devices' limited resources, while enabling real-time "intelligent" interactions with the environment (Yao et al., 2018) and without degrading their accuracy. Sparsity, compression and quantization, i.e. replacing

