DFPC: DATA FLOW DRIVEN PRUNING OF COUPLED CHANNELS WITHOUT DATA

Abstract

Modern, multi-branched neural network architectures often possess complex interconnections between layers that constrain the channels of the connected layers to be pruned together; we call such channels coupled channels (CCs). Structured pruning of CCs in these multi-branch networks is an under-researched problem, as most existing works are designed for pruning single-branch models such as VGG-nets. While these methods yield accurate subnetworks, the improvements in inference time when applied to multi-branch networks are comparatively modest, as these methods do not prune CCs, which we observe contribute significantly to inference time. For instance, layers with CCs as input or output take more than 66% of the inference time in ResNet-50. Moreover, pruning in the data-free regime, where data is not used for pruning, is gaining traction owing to privacy concerns and the computational costs associated with fine-tuning. Motivated by this, we study the problem of pruning CCs in the data-free regime. To facilitate the development of algorithms that prune CCs, we define Data Flow Couplings (DFCs) to enumerate the layers that constitute a coupled connection and the associated transformation. Additionally, saliencies for pruning CCs cannot be gauged in isolation, as conventional scoring strategies may yield discrepancies among the layerwise importances of a CC. This necessitates grouped saliencies that gauge the importance of all corresponding coupled elements in a network. We thus propose the Backwards Graph-based Saliency Computation (BGSC) algorithm, a data-free method that computes saliencies by estimating an upper bound on the reconstruction error of intermediate layers; we call this pruning strategy Data Flow driven Pruning of Coupled channels (DFPC). Finally, we show the efficacy of DFPC for models trained on standard datasets. Because DFPC prunes coupled channels, we achieve up to a 1.66x improvement in inference time for ResNet-101 trained on CIFAR-10 with a 5% accuracy drop without fine-tuning.
With access to the ImageNet training set, we achieve significant improvements over the data-free method and see an improvement of at least 47.1% in speedup for a 2.3% accuracy drop for ResNet-50 against our baselines.

1. INTRODUCTION

As computational resources have become significantly more powerful, deep learning models have grown correspondingly larger and more complex, with some models possessing billions of parameters (Sevilla et al., 2022). Moreover, many modern architectures are multi-branched networks due to layer skip connections, such as residual connections (He et al., 2016), that are used to avoid vanishing gradients. These large, complex architectures enable models to learn patterns in data with better performance in terms of optimization and generalization (Arora et al., 2019; Neyshabur et al., 2019; Zhang et al., 2021). The benefits of overparameterization come at the cost of increased memory and compute footprints, necessitating techniques to mitigate them. Techniques such as network pruning (Hoefler et al., 2021), quantization (Gholami et al., 2021), knowledge distillation (Gou et al., 2021), and low-rank decomposition (Jaderberg et al., 2014) make it possible to compress overparameterized models and thereby improve real-world performance metrics such as inference time and power consumption. Pruning involves discarding elements of a neural network after gauging the importance, or saliency, of these elements. Broadly, two categories of pruning techniques exist in the literature: unstructured pruning, which removes individual weights from the model (Han et al., 2015; LeCun et al., 1989; Tanaka et al., 2020), and structured pruning (also called channel pruning for CNNs), which removes entire neurons or channels (Ding et al., 2021; Luo et al., 2017; Prakash et al., 2019; Singh et al., 2019; Wang et al., 2021; He et al., 2017). In this work, we focus on structured pruning for multi-branched CNNs.
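As a concrete toy illustration of structured pruning, the sketch below scores each convolutional filter by its L1 norm and drops the lowest-scoring filters entirely; the magnitude criterion, random weights, and shapes are illustrative choices only, not the saliency mechanism proposed in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy conv layer: 8 filters, each of shape (3 input channels, 3x3 kernel).
weights = rng.normal(size=(8, 3, 3, 3))

# Structured (channel) pruning scores each *filter* as a whole, here by its
# L1 norm, and removes the least-salient filters outright.
saliency = np.abs(weights).reshape(8, -1).sum(axis=1)
keep = np.sort(np.argsort(saliency)[2:])  # drop the 2 lowest-scoring filters

pruned = weights[keep]
print(pruned.shape)  # (6, 3, 3, 3): whole output channels removed
```

Unstructured pruning, by contrast, would zero out individual entries of `weights`, which shrinks the parameter count but leaves the dense layer shapes (and hence inference time) largely unchanged.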
Due to the complicated interconnections that exist in multi-branched networks, pruning multi-branched neural networks such as ResNets and MobileNets raises unique challenges that do not arise when pruning single-branch networks such as VGG-nets (Simonyan & Zisserman, 2015). These complex connections, such as residual connections in ResNets, require the channels fed into the connection to be of the same dimensions, thus coupling the channels. Pruning such coupled channels (CCs) is generally not addressed in current works on structured pruning, such as Ding et al. (2021); Joo et al. (2021); Luo et al. (2017); Singh et al. (2019); Wang et al. (2021), which are designed for pruning single-branched networks; for example, in ResNets, these methods prune only the output channels of the first two layers of a residual block and ignore the channels that feed into the residual connections. Pruning CCs is challenging since failing to prune filters from all the associated layers would break the CNN. Furthermore, pruning CCs is crucial, as we observe that the layers associated with CCs take up a significant portion of the inference time: more than 66% in ResNet-50. The few methods currently available for pruning CCs generally rely on data-driven statistics of the output layer to infer saliencies and involve heavy fine-tuning (Chen et al., 2021; Liu et al., 2021; Luo & Wu, 2020; Shen et al., 2021). However, situations arise where models trained on proprietary datasets may be distributed but not the datasets themselves, for reasons such as privacy, security, and competitive disadvantage (Yin et al., 2020). Thus, pruning without data is an important challenge and an active area of research (Patil & Dovrolis, 2021; Srinivas & Babu, 2015; Tanaka et al., 2020). However, these techniques do not address pruning CCs, especially in the one-shot and data-free pruning regime, which remains an open problem (Hoefler et al., 2021). In this work, we aim to prune CCs with the additional challenge of doing so without access to data.
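The coupling constraint can be seen in a minimal sketch: if only the residual branch's output channels are pruned, the elementwise add fails, so the same channel indices must be removed from every layer the connection touches. The 1x1-convolution stand-in and all shapes below are hypothetical, chosen only to make the shape constraint visible.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8, 8))        # block input: (channels, H, W)

def conv1x1(inp, w):
    """Pointwise-convolution stand-in; w has shape (out_ch, in_ch)."""
    return np.einsum('oc,chw->ohw', w, inp)

w_branch = rng.normal(size=(16, 16))   # residual-branch layer feeding the add
w_next   = rng.normal(size=(4, 16))    # layer consuming the summed output
keep = np.arange(2, 16)                # try to drop coupled channels 0 and 1

# Pruning only the branch's output filters breaks the residual add:
# a (14, 8, 8) branch output cannot be added to the 16-channel skip path.
try:
    _ = conv1x1(x, w_branch[keep]) + x
    broke = False
except ValueError:
    broke = True
print("naive prune breaks the add:", broke)

# A coupled prune removes the same channels from every associated layer:
# the branch's filters (output *and* input, since the block input shrinks),
# the skip path, and the input channels of the next layer.
y = conv1x1(x[keep], w_branch[np.ix_(keep, keep)]) + x[keep]
z = conv1x1(y, w_next[:, keep])
print("pruned block output:", z.shape)  # (4, 8, 8)
```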
Towards answering the posed challenges, our contributions in this work are as follows.

1. Unlike single-branch networks, the CCs in multi-branched networks pose an additional challenge for structured pruning: identifying the associations between coupled layers, as well as the mappings between them, is a nontrivial task. To address this problem, we define Data Flow Couplings (DFCs) to abstract the notion of coupling in a network by enumerating both the layers and the transformations involved in the coupling. Two types of layers are associated with a DFC: feed-in and feed-out layers, whose outputs and inputs, respectively, are involved in the coupling.

2. Pruning involves measuring saliencies for the elements to be pruned, which, for CCs, is not straightforward due to the interconnections between layers. In Section 4, we investigate whether the saliencies of channels in a DFC can be inferred in isolation from the feed-in layers. To do so, we define Maximum Score Disagreement to quantify the disagreement in saliencies among the feed-in layers. We empirically observe significant disagreement among the saliencies assigned to channels by the feed-in layers, suggesting that the importance of such channels cannot be deduced in isolation. This leads us to propose grouped saliencies, with which we can rank the coupled elements of a DFC.

3. Measuring the effect of pruning coupled elements of a multi-branch network without data, and thus inferring filter saliencies, is a challenging task. For this, Theorem 1 proposes a saliency mechanism, using the transformations enumerated by the DFCs, that bounds the joint reconstruction error of the outputs of the DFC's feed-out layers without data. To compute these saliencies, we propose the Backwards Graph-based Saliency Computation (BGSC) algorithm. To mitigate the computational cost of this algorithm (in both time and memory) for CNNs, we provide a parallelized implementation, owing to its embarrassingly parallel nature.
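A minimal numerical sketch of the grouped-saliency idea: per-layer scores for the same coupled channels can imply different rankings, whereas a grouped saliency yields one ranking for the whole group. The summation used here is just one simple, hypothetical aggregation, not the BGSC saliency derived from the reconstruction-error bound.

```python
import numpy as np

rng = np.random.default_rng(1)

# Per-channel saliencies (e.g. filter norms) that two feed-in layers assign
# to the same 8 coupled channels of a shared residual connection.
s_a = rng.random(8)
s_b = rng.random(8)

rank_a = np.argsort(-s_a)              # ranking implied by layer A alone
rank_b = np.argsort(-s_b)              # ranking implied by layer B alone
print("layers agree on the least salient channel:", rank_a[-1] == rank_b[-1])

# A grouped saliency scores each coupled channel across *all* feed-in layers
# at once, giving a single consistent ranking for the coupled group.
s_group = s_a + s_b
prune = np.argsort(s_group)[:2]        # remove these channels from every
print("coupled channels to prune:", sorted(prune))  # layer in the coupling
```

When the per-layer rankings disagree, any choice made from one layer alone removes channels another layer considers important, which is the motivation for scoring the coupled group jointly.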
On ResNet-101 for the CIFAR-10 dataset, we obtain a 1.66x inference time speedup for a 5% accuracy drop without fine-tuning.




