LEARNING DEEPLY SHARED FILTER BASES FOR EFFICIENT CONVNETS

Abstract

Recently, inspired by the repetitive block structure of modern ConvNets such as ResNets, parameter sharing among repetitive convolution layers has been proposed to reduce the number of parameters. However, naive sharing of convolution filters poses many challenges, such as overfitting and vanishing/exploding gradients, resulting in worse performance than that of non-shared counterpart models. Furthermore, sharing parameters often increases computational complexity due to the additional operations required for re-parameterization. In this work, we propose an efficient parameter-sharing structure and an effective training mechanism for recursive ConvNets. In the proposed ConvNet architecture, convolution layers are decomposed into a filter basis, which can be shared recursively, and non-shared layer-specific parts. We conjecture that a shared filter basis combined with a small number of layer-specific parameters can retain, or even enhance, the representation power of individual layers if a proper training method is applied. We show both theoretically and empirically that potential vanishing/exploding gradient problems can be mitigated by enforcing orthogonality on the shared filter bases. Experimental results demonstrate that our scheme effectively reduces redundancy, saving up to 63.8% of parameters while consistently outperforming non-shared counterpart networks, even when a filter basis is shared by up to 10 repetitive convolution layers.

1. INTRODUCTION

Modern networks such as ResNets usually contain many identical convolution blocks, and recent analytic studies (Jastrzebski et al., 2018) show that these blocks perform similar iterative refinement rather than learning new features. Inspired by this repetitive block structure, recursive ConvNets that share weights across iterative blocks have been studied as a promising direction toward parameter-efficient ConvNets (Jastrzebski et al., 2018; Guo et al., 2019; Savarese & Maire, 2019). However, the repetitive use of parameters across many convolution layers incurs several challenges that limit the performance of such recursive networks. First, deep sharing of parameters can lead to the vanishing and exploding gradient problems often found in recurrent neural networks (RNNs) (Pascanu et al., 2013; Jastrzebski et al., 2018). Another challenge is that the overall representation power of the network might be limited by reusing the same filters across many convolution layers.

To address these challenges, in this paper we propose an effective and efficient parameter-sharing mechanism for modern ConvNets with many repetitive convolution blocks. In our work, convolution filters are decomposed into a fundamental, reusable unit, called a filter basis, and a layer-specific part, called coefficients. By sharing a filter basis, rather than whole convolution filters or layers, we can impose two desirable properties on the shared parameters: (1) resilience against vanishing/exploding gradients, and (2) representational expressiveness of the individual layers sharing parameters. We first show theoretically that a shared filter basis can cause vanishing and exploding gradient problems, and that this can be controlled to a large extent by making filter bases orthogonal.
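The decomposition above can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the shapes, names, and basis size `K` are illustrative assumptions. Each layer composes its convolution filters as linear combinations of a single shared filter basis, so only the small per-layer coefficient matrices are layer-specific:

```python
import numpy as np

def build_layer_filters(basis, coeffs):
    """Compose layer-specific conv filters as linear combinations of
    shared basis filters (illustrative sketch of the decomposition).

    basis:  (K, c_in, kh, kw) -- shared across all participating layers
    coeffs: (c_out, K)        -- layer-specific coefficients
    returns (c_out, c_in, kh, kw) filters for one layer
    """
    K, c_in, kh, kw = basis.shape
    return (coeffs @ basis.reshape(K, -1)).reshape(-1, c_in, kh, kw)

rng = np.random.default_rng(0)
K, c_in, c_out, k = 32, 64, 64, 3          # hypothetical sizes
n_layers = 10                              # layers sharing one basis

basis = rng.standard_normal((K, c_in, k, k))                     # stored once
coeffs = [rng.standard_normal((c_out, K)) for _ in range(n_layers)]

shared_params = basis.size + sum(c.size for c in coeffs)
unshared_params = n_layers * c_out * c_in * k * k
print(shared_params, unshared_params)      # 38912 vs 368640
```

With these illustrative sizes, ten shared-basis layers need 38,912 parameters instead of 368,640 for ten independent convolution layers, while each layer still realizes its own distinct set of filters through its coefficients.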
To enforce the orthogonality of filter bases, we propose an orthogonality regularization for training ConvNets with deeply shared filter bases. Our experimental results show that the proposed orthogonality regularization reduces redundancy not only in deeply shared filter bases but also in non-shared parameters, resulting in better performance than over-parameterized counterpart networks. Next, we make convolution layers with shared parameters more expressive using a hybrid approach to sharing filter bases, in which a small number of layer-specific non-shared parameters complement the shared basis.
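One common way to softly enforce such orthogonality is a penalty of the form ||BB^T - I||_F^2 on the flattened basis, added to the training loss. The sketch below assumes this standard form; the paper's exact regularizer may differ:

```python
import numpy as np

def orthogonality_penalty(basis):
    """Soft orthogonality regularizer ||B B^T - I||_F^2 on the shared
    filter basis, flattened so each basis filter is one row of B.
    A zero penalty means the basis filters are orthonormal."""
    K = basis.shape[0]
    B = basis.reshape(K, -1)          # (K, c_in * kh * kw)
    gram = B @ B.T                    # (K, K) Gram matrix
    return np.sum((gram - np.eye(K)) ** 2)

rng = np.random.default_rng(0)
K, d = 32, 64 * 3 * 3                 # hypothetical basis size

# An orthonormal basis (rows of Q^T) incurs essentially zero penalty,
# while a random Gaussian basis is heavily penalized.
Q, _ = np.linalg.qr(rng.standard_normal((d, K)))   # orthonormal columns
ortho_basis = Q.T.reshape(K, 64, 3, 3)
random_basis = rng.standard_normal((K, 64, 3, 3))
print(orthogonality_penalty(ortho_basis), orthogonality_penalty(random_basis))
```

In training, this penalty would be scaled by a weighting hyperparameter and minimized jointly with the task loss, steering the shared basis toward orthogonality rather than constraining it exactly.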

