BIT-PRUNING: A SPARSE MULTIPLICATION-LESS DOT-PRODUCT

Abstract

Dot-product is a central building block in neural networks. However, multiplication (mult) in dot-product incurs substantial energy and area costs that challenge deployment on resource-constrained edge devices. In this study, we realize energy-efficient neural networks by exploiting a mult-less, sparse dot-product. We first reformulate a dot-product between an integer weight and activation into an equivalent operation comprised of additions followed by bit-shifts (add-shift-add). In this formulation, the number of add operations equals the number of nonzero bits of the integer weight in binary format. Leveraging this observation, we propose Bit-Pruning, which removes unnecessary bits in each weight value during training to reduce the energy consumption of add-shift-add. Bit-Pruning can be seen as a soft version of Weight-Pruning, as it prunes individual bits rather than whole weight elements. In extensive experiments, we demonstrate that sparse mult-less networks trained with Bit-Pruning show a better accuracy-energy trade-off than sparse mult networks trained with Weight-Pruning.
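To make the add-shift-add reformulation concrete, here is a minimal sketch (not code from the paper) of how an integer multiplication decomposes into one shifted addition per nonzero weight bit, assuming non-negative integer weights; the function names are illustrative only.

```python
def shift_add_mult(w: int, x: int) -> int:
    """Multiply a non-negative integer weight w by activation x
    using only bit-shifts and additions (add-shift-add idea):
    w * x = sum of (x << i) over the set bit positions i of w."""
    acc = 0
    i = 0
    while w >> i:                  # iterate over the bits of w
        if (w >> i) & 1:           # bit i of w is set
            acc += x << i          # one add per nonzero bit
        i += 1
    return acc

def shift_add_dot(weights, acts):
    """Dot-product expressed as a sum of shift-add partial products."""
    return sum(shift_add_mult(w, x) for w, x in zip(weights, acts))
```

Under this view, zeroing a bit of a weight (as Bit-Pruning does) directly removes one addition from the computation, whereas Weight-Pruning must zero the entire weight to remove its partial products.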

1. INTRODUCTION

Modern deep neural networks (DNNs) contain numerous dot-products between input features and weight matrices. However, it is well known that multiplication (mult) in dot-product incurs substantial energy and area costs, challenging the deployment of DNNs on resource-constrained edge devices. This has driven several attempts to build efficient DNNs by reducing the energy of mult.





