THE CROSSWORD PUZZLE: SIMPLIFYING DEEP NEURAL NETWORK PRUNING WITH FABULOUS COORDINATES

Abstract

Pruning is a promising technique to shrink Deep Neural Network models with only negligible accuracy loss. Recent efforts rely on experience-derived metrics to guide the pruning procedure, which heavily limits how well pruning methods generalize. We propose The Crossword Puzzle, a new method that simplifies this procedure by automatically deriving pruning metrics. The key insight behind our method is that, for Deep Neural Network models, a pruning-friendly distribution of the model's weights can be obtained given a proper coordinate. We experimentally confirm this insight and denote the new coordinates as the Fabulous Coordinates. Our quantitative evaluation shows that the Crossword Puzzle can find a simple yet effective metric which outperforms state-of-the-art pruning methods, delivering no accuracy degradation on ResNet-56 (CIFAR-10) and ResNet-101 (ImageNet) while raising the pruning rate to 70% and 50% for the respective models.

1. INTRODUCTION

Pruning Deep Neural Network models promises to reduce the size of these models while keeping the same level of accuracy. Prior art focuses on the design of the pruning method, such as iterative pruning (Han et al. (2015a)), one-shot pruning (Lee et al. (2018)), pruning without training (Ramanujan et al. (2020)), etc. However, prior works craft their pruning metrics through additional, experience-based effort. Our goal in this work is to design a method that automatically searches for a proper metric for model pruning. Drawing on classic pipelines (e.g., the Genetic Algorithm (Mitchell (1998)) and Ant Colony Optimization (Dorigo & Di Caro (1999))), we first systematically summarize the three components such a method requires: ➊ basic building blocks of pruning criteria; ➋ an objective function to evaluate auto-generated pruning metrics; ➌ a heuristic searching process to guide the search. Prior works mainly address the first and third components (for instance, we can use the L1-norm (Li et al. (2016)) and the geometric median (He et al. (2018b)) as building blocks, and simulated annealing (Kirkpatrick et al. (1983)) as the search guide). It therefore remains unclear how an objective function should measure the quality of a given pruning metric (namely, the unfilled letters in our "crossword puzzle" analogy). This motivates us to examine the essential condition(s) of a good-quality pruning criterion. Based on a simple magnitude-based pruning method (Han et al. (2015b)) and the follow-up weight-distribution analysis (Liu et al. (2018)), we formalize one essential condition as follows: given a coordinate Ψ (the formal expression of a pruning criterion) and a neural network model M, Ψ is highly likely to be highly-qualified¹ if the distribution D(M) obtained from Ψ(M) satisfies the following requirement:

• Centralized distribution: the statistics concentrate around a single center of the distribution, which is an important symbol of over-parameterized neural networks.
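The centralized-distribution requirement can be checked numerically. The following is a minimal sketch, not the paper's implementation: `centralization_score`, the bin count, and the width of the central band are all illustrative assumptions. It measures what fraction of a coordinate's statistics fall inside a narrow band around the distribution's mode; a high score indicates a single-peaked, centralized distribution.

```python
import numpy as np

def centralization_score(values, bins=100, center_frac=0.1):
    """Fraction of statistics falling in a narrow band around the mode.

    Illustrative heuristic: histogram the values, locate the most
    populated bin, and sum the mass in the `center_frac` band of bins
    around it. Higher score => more centralized distribution.
    """
    hist, _ = np.histogram(values, bins=bins)
    mode_bin = int(np.argmax(hist))
    half = max(1, int(bins * center_frac / 2))
    lo, hi = max(0, mode_bin - half), min(bins, mode_bin + half + 1)
    return hist[lo:hi].sum() / len(values)

# Identity coordinate on a synthetic over-parameterized layer:
# weights of trained over-parameterized networks tend to cluster near zero.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=10_000)
print(centralization_score(weights))  # higher score = more centralized
```

Under this score, a peaked (e.g., near-Gaussian) weight distribution scores well above a flat one, which is one way an objective function could rank candidate coordinates.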

¹We refer to a coordinate as highly-qualified if we can use it to prune a neural network model with (almost) no accuracy drop under a relatively high pruning rate.
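The baseline against which such a coordinate is judged is simple magnitude-based pruning in the style of Han et al. (2015b). The sketch below is an illustrative rendering, not the authors' code: it zeroes out the fraction `rate` of weights with the smallest magnitude (the 70% rate mirrors the paper's ResNet-56 result).

```python
import numpy as np

def magnitude_prune(weights, rate):
    """Zero out the `rate` fraction of weights with smallest magnitude.

    Returns the pruned weight tensor and the boolean keep-mask.
    """
    flat = np.abs(weights).ravel()
    k = int(len(flat) * rate)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    # k-th smallest magnitude serves as the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(1)
layer = rng.normal(0.0, 0.05, size=(64, 64))   # synthetic conv layer weights
pruned, mask = magnitude_prune(layer, 0.70)    # 70% pruning rate
print(1.0 - mask.mean())                       # fraction removed, ~0.70
```

In the paper's terms, this corresponds to using the identity coordinate with an L1 (magnitude) criterion; a highly-qualified coordinate Ψ would allow the same procedure at a high rate with (almost) no accuracy drop.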

