ENABLING BINARY NEURAL NETWORK TRAINING ON THE EDGE

Abstract

The ever-growing computational demands of increasingly complex machine learning models frequently necessitate the use of powerful cloud-based infrastructure for their training. Binary neural networks are known to be promising candidates for on-device inference due to their extreme compute and memory savings over higher-precision alternatives. In this paper, we demonstrate that they are also strongly robust to gradient quantization, thereby making the training of modern models on the edge a practical reality. We introduce a low-cost binary neural network training strategy exhibiting sizable memory footprint reductions and energy savings versus the standard approach of Courbariaux & Bengio. Against the latter, we see coincident drops in memory requirement and energy consumption of 2–6×, while reaching similar test accuracy in comparable time, across a range of small-scale models trained to classify popular datasets. We also showcase ImageNet training of ResNetE-18, achieving a 3.12× memory reduction over the aforementioned standard. Such savings will allow unnecessary cloud offloading to be avoided, reducing latency and increasing energy efficiency while also safeguarding privacy.

1. INTRODUCTION

Although binary neural networks (BNNs) feature weights and activations with just single-bit precision, many models are able to reach accuracy indistinguishable from that of their higher-precision counterparts (Courbariaux & Bengio, 2016; Wang et al., 2019b). Since BNNs are functionally complete, their limited precision does not impose an upper bound on achievable accuracy (Constantinides, 2019). BNNs represent the ideal class of neural networks for edge inference, particularly for custom hardware implementation, due to their use of XNOR for multiplication: a fast and cheap operation to perform. Their use of compact weights also suits systems with limited memory and increases opportunities for caching, providing further potential performance boosts. FINN, the seminal BNN implementation for field-programmable gate arrays (FPGAs), reached the highest CIFAR-10 and SVHN classification rates to date at the time of its publication (Umuroglu et al., 2017).

Despite featuring binary forward propagation, existing BNN training approaches perform backward propagation using high-precision floating-point data types, typically float32, often making training infeasible on resource-constrained devices. The high-precision activations retained between forward and backward propagation commonly constitute the largest proportion of the total memory footprint of a training run (Sohoni et al., 2019; Cai et al., 2020). Additionally, backward propagation with high-precision gradients is costly, challenging the energy limitations of edge platforms. An understanding of standard BNN training algorithms led us to ask two questions: why are high-precision weight gradients used when we are only concerned with weights' signs, and why are high-precision activations used when the computation of weight gradients only requires binary activations as input?
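To illustrate why XNOR-based multiplication is so cheap, consider vectors with elements in {−1, +1} packed into machine words, with bit 1 encoding +1 and bit 0 encoding −1. A dot product then reduces to a bitwise XNOR followed by a population count. The following is a minimal sketch under that packing convention (the function name `binary_dot` is illustrative, not from the paper):

```python
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-element {-1, +1} vectors packed into integers.

    Bit 1 encodes +1 and bit 0 encodes -1. Each matching bit pair
    contributes +1 to the dot product and each mismatch contributes -1,
    so the result is 2 * popcount(XNOR(a, b)) - n.
    """
    mask = (1 << n) - 1                # keep only the n valid bit positions
    xnor = ~(a_bits ^ b_bits) & mask   # 1 where the encoded elements match
    matches = bin(xnor).count("1")     # population count
    return 2 * matches - n

# Example: a = (+1, +1, -1), b = (+1, -1, -1)
# elementwise products: +1, -1, +1 -> dot product = 1
print(binary_dot(0b110, 0b100, 3))
```

Replacing multiply-accumulate with XNOR and popcount is what makes BNN inference so attractive for FPGAs and other custom hardware: both operations cost a handful of gates per bit.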
In this paper, we present a low-memory, low-energy BNN training scheme based on this intuition, featuring (i) the use of binary, power-of-two and 16-bit floating-point data types, and (ii) batch normalization modifications enabling the buffering of binary activations. By increasing the viability of learning on the edge, this work will reduce the domain mismatch between training and inference, particularly in conjunction with federated learning (McMahan et al., 2017; Bonawitz et al., 2019), and ensure privacy for sensitive applications (Agarwal et al., 2018). Via the aggressive energy and memory footprint reductions they facilitate, our proposals will enable
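As a rough illustration of the power-of-two data type mentioned above, the sketch below rounds a value to the nearest power of two in log space while preserving its sign (the function `po2_quantize` and the round-to-nearest rule are assumptions for illustration only; the paper's exact quantizer may differ):

```python
import math

def po2_quantize(x: float) -> float:
    """Round x to the nearest signed power of two; zero maps to zero.

    Illustrative sketch: rounding is performed on log2(|x|), so the
    result is sign(x) * 2^round(log2(|x|)).
    """
    if x == 0.0:
        return 0.0
    exponent = round(math.log2(abs(x)))
    return math.copysign(2.0 ** exponent, x)

print(po2_quantize(0.3))   # nearest power of two below/above 0.3 is 0.25
print(po2_quantize(-5.0))  # -5 rounds to -4 in log space
```

The appeal of such a representation is that multiplying by a power-of-two quantity reduces to an exponent addition, i.e. a bit shift on fixed-point hardware, which is far cheaper than a full floating-point multiply.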

