REDUCING THE NUMBER OF NEURONS OF DEEP RELU NETWORKS BASED ON THE CURRENT THEORY OF REGULARIZATION

Anonymous authors
Paper under double-blind review

Abstract

We introduce a new Reduction Algorithm which exploits the properties of ReLU neurons to significantly reduce the number of neurons in a trained Deep Neural Network. This algorithm is based on the recent theory of implicit and explicit regularization in Deep ReLU Networks from (Maennel et al., 2018) and the authors. We discuss two experiments which illustrate the efficiency of the algorithm in significantly reducing the number of neurons, with provably almost no change of the learned function within the training data (and therefore almost no loss in accuracy).



These results state that L2 weight regularization on parameter space is equivalent to L1-type P-functionals on function space under certain conditions. This implies that the optimal function can also be represented by finitely many neurons (Rosset et al., 2007). With the knowledge of these properties, we were able to design a reduction algorithm which can reduce infinitely wide (in practice: arbitrarily wide) layers in our architecture to much smaller layers. This allows us to reduce the number of neurons by 90% to 99% without introducing sparsity (which allows a more efficient GPU implementation (Gale et al., 2020)) and with almost no loss in accuracy. This can be of interest for deploying neural networks on small devices or for making predictions that are computationally less costly and less energy-consuming.
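One mechanism that makes such a reduction possible without changing the learned function is the positive homogeneity of ReLU: relu(lam * z) = lam * relu(z) for lam > 0, so two hidden neurons whose incoming parameters (w, b) are positive multiples of each other compute positively proportional activations and can be collapsed into a single neuron. The following is only a minimal illustrative sketch of that merging step, not the Reduction Algorithm of this paper; the function `merge_parallel_neurons` and its tuple representation of a layer are assumptions made for the example.

```python
import math

def relu(z):
    return max(z, 0.0)

def merge_parallel_neurons(neurons, tol=1e-9):
    """Merge hidden ReLU neurons of one layer whose incoming parameters
    (w, b) are positive multiples of each other.  Since ReLU is positively
    homogeneous,
        a1*relu(w.x + b) + a2*relu(lam*(w.x + b))
            == (a1 + lam*a2) * relu(w.x + b)   for lam > 0,
    the represented function is exactly unchanged by the merge.

    neurons: list of (w, b, a) with incoming weights w (list of floats),
             bias b (float) and outgoing weights a (list of floats).
    Returns the reduced list of neurons.
    """
    merged = []  # representative neurons kept so far
    for w, b, a in neurons:
        v = w + [b]
        norm = math.sqrt(sum(t * t for t in v))
        if norm < tol:
            continue  # (w, b) == 0 gives relu(0) == 0: dead neuron, drop it
        for k, (w2, b2, a2) in enumerate(merged):
            v2 = w2 + [b2]
            norm2 = math.sqrt(sum(t * t for t in v2))
            cos = sum(s * t for s, t in zip(v, v2)) / (norm * norm2)
            if cos > 1.0 - tol:  # positively parallel: fold into neuron k
                lam = norm / norm2  # scaling factor with v == lam * v2
                merged[k] = (w2, b2,
                             [x2 + lam * x for x2, x in zip(a2, a)])
                break
        else:
            merged.append((w, b, list(a)))
    return merged
```

For example, neurons with incoming parameters (w, b) = ([1, 0], 0.5) and ([2, 0], 1.0) are positively parallel and collapse into one neuron whose outgoing weight is the correspondingly rescaled sum.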

1.2. LITERATURE / LINK TO OTHER RESEARCH

Many papers have been written on the subject of reducing neural networks. One approach is weight pruning, which removes the least salient weights (LeCun et al., 1990; Hassibi & Stork, 1993; Han et al., 2015; Tanaka et al., 2020). A different technique is pruning neurons (Mariet & Sra, 2015; He et al., 2014; Srinivas & Babu, 2015), which does not introduce sparsity in the network by removing single weights, but reduces the number of neurons. For CNNs there are ways to prune the filters (Li et al., 2016). In transfer learning, one can prune the weights with decreasing magnitude (Sanh et al., 2020). All these techniques require the same steps: train a large network, prune and update the remaining weights or neurons, retrain. When too much is pruned, the accuracy of the pruned models drops significantly, and fine-tuning the pruned models is not always beneficial (Liu et al., 2018).
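The pruning step shared by several of the cited weight-pruning approaches approximates saliency by weight magnitude. The sketch below illustrates only this generic step under that assumption; the helper `magnitude_prune` is hypothetical and not taken from any of the cited papers.

```python
def magnitude_prune(weights, keep_fraction):
    """One pruning step of the train-prune-retrain pipeline: zero out
    the smallest-magnitude weights, keeping roughly `keep_fraction`
    of them (ties at the threshold may keep slightly more).

    weights: flat list of floats.
    Returns (pruned_weights, mask) where mask marks survivors.
    """
    n_keep = max(1, int(round(keep_fraction * len(weights))))
    # magnitude of the n_keep-th largest weight is the cutoff
    threshold = sorted((abs(w) for w in weights), reverse=True)[n_keep - 1]
    mask = [abs(w) >= threshold for w in weights]
    return [w if m else 0.0 for w, m in zip(weights, mask)], mask
```

After pruning, the surviving weights would be retrained (fine-tuned) with the mask held fixed, which is the step whose benefit Liu et al. (2018) question.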



In this work, we investigate a particular type of deep neural network. Its architecture (see section 2) can be better understood thanks to previous work on wide shallow neural networks (Neyshabur et al., 2014; Ongie et al., 2019; Savarese et al., 2019; Williams et al., 2019; Maennel et al., 2018; Heiss et al., 2019) and unpublished work of the authors on deep neural networks (with arbitrarily many inputs and outputs).

