DYNAMIC GRAPH: LEARNING INSTANCE-AWARE CONNECTIVITY FOR NEURAL NETWORKS

Abstract

One common practice when employing deep neural networks is to apply the same architecture to all input instances. However, a fixed architecture may not be representative enough for data with high diversity. To increase model capacity, existing approaches usually employ larger convolutional kernels or deeper network structures, which increases the computational cost. In this paper, we address this issue by proposing the Dynamic Graph Network (DG-Net). The network learns instance-aware connectivity, which creates different forward paths for different instances. Specifically, the network is initialized as a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths. We generate edge weights with a learnable module, called the router, and select the edges whose weights are larger than a threshold to adjust the connectivity of the network. Instead of routing every input through the same path, DG-Net aggregates features dynamically at each node, which gives the network more representational ability. To facilitate training, we represent the network connectivity of each sample as an adjacency matrix. The matrix is updated to aggregate features in the forward pass, cached in memory, and used for gradient computation in the backward pass. We verify the effectiveness of our method with several static architectures, including MobileNetV2, ResNet, ResNeXt, and RegNet. Extensive experiments on ImageNet classification and COCO object detection show the effectiveness and generalization ability of our approach.

1. INTRODUCTION

Deep neural networks have driven a shift from feature engineering to feature learning. The great progress largely comes from well-designed networks with increasing model capacity (He et al., 2016a; Xie et al., 2017; Huang et al., 2017; Tan & Le, 2019). To achieve superior performance, a common practice is to add more layers (Szegedy et al., 2015) or expand the size of existing convolutions (kernel width, number of channels) (Huang et al., 2019; Tan & Le, 2019; Mahajan et al., 2018). Meanwhile, the computational cost increases significantly, hindering the deployment of these models in realistic scenarios. Instead of adding a much larger computational burden, we prefer adding sample-dependent modules to networks, increasing model capacity by accommodating the variance of the data. Several existing works attempt to augment networks with sample-dependent modules. For example, the Squeeze-and-Excitation network (SENet) (Hu et al., 2018) learns to scale the activations in the channel dimension conditioned on the input. Conditionally Parameterized Convolution (CondConv) (Yang et al., 2019) uses over-parameterized weights and generates individual convolutional kernels for each sample. GaterNet (Chen et al., 2018) adopts a gate network to extract features and generate sparse binary masks that select filters in the backbone network based on the input. All these methods focus on adjusting the micro structure of neural networks, using a data-dependent module to influence the feature representation at the same level. Recalling the analogy between deep neural networks and the mammalian brain (Rauschecker, 1984): neurons are linked by synapses and are responsible for sensing different information, and the synapses are activated to varying degrees when the neurons perceive external information. This phenomenon inspires us to design a data-dependent network structure, so that different samples activate different network paths.
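To make the idea of a sample-dependent module concrete, the following is a minimal NumPy sketch of SENet-style channel gating: a global average pool squeezes the feature map, a small two-layer map produces per-channel sigmoid gates, and each channel is rescaled by its gate. All names and shapes here are illustrative, not the original implementation.

```python
import numpy as np

def se_gate(x, w1, w2):
    """Squeeze-and-Excitation-style channel gating (illustrative sketch).

    x  : (c, h, w) feature map for a single instance
    w1 : (r, c) weights of the reduction layer (r < c)
    w2 : (c, r) weights of the expansion layer
    """
    squeezed = x.mean(axis=(1, 2))                 # squeeze: global average pool -> (c,)
    hidden = np.maximum(w1 @ squeezed, 0.0)        # excitation: reduce + ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gates in (0, 1), one per channel
    return x * gates[:, None, None]                # rescale each channel by its gate

# The gates depend on the input, so two instances are rescaled differently.
rng = np.random.default_rng(1)
x = rng.normal(size=(6, 5, 5))
y = se_gate(x, rng.normal(size=(3, 6)), rng.normal(size=(6, 3)))
```

Because the gates lie in (0, 1), the module only attenuates channels; the attenuation pattern is what varies per instance.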
In this paper, we learn to optimize the connectivity of neural networks based on the input. Instead of using stacked or hand-designed topologies, we allow a more flexible selection of forward paths.
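The edge-selection mechanism described above can be sketched in a few lines of NumPy: a router maps a per-instance feature vector to one weight per candidate edge of the graph, and only edges whose sigmoid weight exceeds a threshold are kept in the per-sample adjacency matrix. The linear router, the upper-triangular DAG ordering, and all parameter shapes below are simplifying assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def route_edges(features, w_router, threshold=0.5):
    """Generate an instance-aware adjacency matrix for a DAG with n nodes.

    features : (d,) per-instance feature vector (e.g. pooled activations)
    w_router : (n*n, d) router weights producing one logit per candidate edge
    Returns an (n, n) matrix whose nonzero entries are the sigmoid edge
    weights that exceed the threshold; lower triangle and diagonal are
    zeroed so the kept edges respect a fixed acyclic node ordering.
    """
    n = int(np.sqrt(w_router.shape[0]))
    logits = w_router @ features                   # one logit per candidate edge
    weights = 1.0 / (1.0 + np.exp(-logits))        # sigmoid -> edge weights in (0, 1)
    adj = np.triu(weights.reshape(n, n), k=1)      # keep only forward (acyclic) edges
    return np.where(adj > threshold, adj, 0.0)     # prune weak edges per instance

# Different instances produce different adjacency matrices, hence
# different forward paths through the same set of nodes.
rng = np.random.default_rng(0)
w = rng.normal(size=(16, 8))                       # router for a 4-node graph, d = 8
adj_a = route_edges(rng.normal(size=8), w)
adj_b = route_edges(rng.normal(size=8), w)
```

Caching one such matrix per sample is what allows the forward pass to aggregate features along the selected edges and the backward pass to reuse the same connectivity for gradient computation.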

