A GENERAL COMPUTATIONAL FRAMEWORK TO MEASURE THE EXPRESSIVENESS OF COMPLEX NETWORKS USING A TIGHT UPPER BOUND OF LINEAR REGIONS

Anonymous

Abstract

The expressiveness of deep neural networks (DNNs) is one perspective for understanding their surprising performance. The number of linear regions, i.e. the number of pieces of the piece-wise linear function represented by a DNN, is generally used to measure expressiveness. Since the exact number is hard to obtain, an upper bound on the number of regions partitioned by a rectifier network is a more practical measurement of the expressiveness of a rectifier DNN. In this work, we propose a new and tighter upper bound on the number of regions. Inspired by the proof of this upper bound and the matrix-computation framework of Hinz & Van de Geer (2019), we propose a general computational approach to compute a tight upper bound on the number of regions for, in principle, any network structure (e.g. DNNs with all kinds of skip connections and residual structures). Our experiments show that our upper bound is tighter than existing ones, and explain why skip connections and residual structures can improve network performance.

1. INTRODUCTION

Deep neural networks (DNNs) (LeCun et al., 2015) have achieved great success in many fields such as computer vision, speech recognition and natural language processing (Krizhevsky et al., 2012; Hinton et al., 2012; Devlin et al., 2018; Goodfellow et al., 2014). However, it is not yet fully understood why DNNs perform well with satisfying generalization on different tasks. Expressiveness is one perspective used to address this open question. More specifically, one can theoretically study the expressiveness of DNNs using approximation theory (Cybenko, 1989; Hornik et al., 1989; Hanin, 2019; Mhaskar & Poggio, 2016; Arora et al., 2016), or measure the expressiveness of a given DNN. While sigmoid or tanh functions were employed as activation functions in early work on DNNs, rectified linear units (ReLU) and other piece-wise linear functions are more popular nowadays. Yarotsky (2017) proved that any DNN with piece-wise linear activation functions can be transformed into a DNN with ReLU activations, so the study of expressiveness usually focuses on ReLU DNNs. It is known that a ReLU DNN represents a piece-wise linear (PWL) function, which applies a different affine transform on each region of its input domain; the more regions, the more complex the PWL function and the stronger its expressive ability. Therefore, the number of linear regions is intuitively a meaningful measurement of expressiveness (Pascanu et al., 2013; Montufar et al., 2014; Raghu et al., 2017; Serra et al., 2018; Hinz & Van de Geer, 2019). Directly counting linear regions is difficult, if not impossible, so an upper bound on the number of linear regions is used in practice as a figure of merit to characterize expressiveness. Inspired by the computational framework of Hinz & Van de Geer (2019), we improve the upper bound of Serra et al. (2018) for multilayer perceptrons (MLPs) and extend the framework to more complex networks.
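To make the idea of an "upper bound on the number of linear regions" concrete, the following sketch computes a simple classical bound of the kind this line of work improves on: each layer of n ReLUs acting on an effective input dimension d defines n hyperplanes, which by Zaslavsky's theorem cut space into at most the sum of binomial coefficients shown below, and a crude network-level bound multiplies the per-layer bounds (this is an illustration in the spirit of Montufar et al. (2014) and Serra et al. (2018), not the tighter bound proposed in this paper; the function names are ours).

```python
from math import comb

def regions_one_layer(n_neurons: int, in_dim: int) -> int:
    """Zaslavsky's bound: n hyperplanes in R^d partition it into
    at most sum_{j=0}^{d} C(n, j) regions."""
    return sum(comb(n_neurons, j) for j in range(in_dim + 1))

def product_upper_bound(widths: list) -> int:
    """Crude upper bound on the linear regions of a ReLU MLP with
    widths [n_0, n_1, ..., n_{L-1}] (n_0 = input dimension): the
    product of per-layer arrangement bounds, with the effective
    dimension capped by all earlier widths."""
    d = widths[0]          # effective input dimension seen by each layer
    bound = 1
    for n in widths[1:]:
        bound *= regions_one_layer(n, d)
        d = min(d, n)      # rank cannot grow through a layer
    return bound

# 3 hyperplanes in R^2 give at most 1 + 3 + 3 = 7 regions
print(product_upper_bound([2, 3]))   # -> 7
```

Tighter bounds, such as the one developed in this paper, exploit how the regions of one layer constrain the dimensions available to the next, rather than taking a plain product.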
More importantly, we propose a general approach to construct a more accurate upper bound for almost any type of network. The contributions of this paper are listed as follows.

• Through a geometric analysis, we derive a recursive formula for γ, which is a key parameter in constructing a tight upper bound. Employing a better initial value, we propose a tighter upper bound for deep fully-connected ReLU networks. In addition, the recursive formula offers the potential to further improve the upper bound given an improved initial value.

• Different from Hinz & Van de Geer (2019), we not only consider deep fully-connected ReLU networks, but also extend the computational framework to more widely used network architectures, such as skip connections, pooling layers and so on. With this extension, the upper bound of U-Net (Ronneberger et al., 2015) or other common networks can be computed. By comparing the upper bounds of different networks, we show the relation between the expressiveness of networks with and without special structures.

• Our experiments show that novel network structures raise the upper bound in most cases. For cases in which the upper bound is barely raised by novel network settings, we explain this by analysing the partition efficiency and the practical number of linear regions.

2.1 RELATED WORK

There is a body of literature on the number of linear regions of ReLU DNNs. Pascanu et al. (2013) compare the numbers of linear regions of shallow networks by providing a lower bound. Montufar et al. (2014) give a simple but improved upper bound compared with Pascanu et al. (2013). Montúfar (2017) proposes an even tighter upper bound than Montufar et al. (2014), and Raghu et al. (2017) prove a similar result of the same order as Montúfar (2017). Later, Serra et al. (2018) propose a tighter upper bound and a method to count the practical number of linear regions. Furthermore, Serra & Ramalingam (2018) and Hanin & Rolnick (2019a;b) explore the properties of the practical number of linear regions. Finally, Hinz & Van de Geer (2019) employ matrix computation to erect a framework for computing the upper bound, which generalizes previous work (Montufar et al., 2014; Montúfar, 2017; Serra et al., 2018).

2.2 NOTATIONS, DEFINITIONS AND PROPERTIES

In this section, we introduce some definitions and propositions. Since the main computational framework is inspired by Hinz & Van de Geer (2019), some notations and definitions are similar. Assume a ReLU MLP has the following form:

$$f(x) = W^{(L)}\,\sigma\big(W^{(L-1)} \cdots \sigma\big(W^{(1)}x + b^{(1)}\big) \cdots + b^{(L-1)}\big) + b^{(L)},$$

where $x \in \mathbb{R}^{n_0}$, $W^{(i)} \in \mathbb{R}^{n_i \times n_{i-1}}$, $b^{(i)} \in \mathbb{R}^{n_i}$, and $\sigma(x) = \max(x, 0)$ denotes the ReLU function applied element-wise. $W^{(i)}$ is the weight matrix of the $i$th layer and $b^{(i)}$ is its bias vector. $f(x)$ can also be written as:

$$h_0(x) = x, \qquad h_i(x) = \sigma\big(W^{(i)} h_{i-1}(x) + b^{(i)}\big),\ 1 \le i < L, \qquad f(x) = h_L(x) = W^{(L)} h_{L-1}(x) + b^{(L)}.$$

For a PWL function $f$, the domain can be partitioned into different linear regions. We define a linear region in the following way.

Definition 1. For a PWL function $f(x): \mathbb{R}^{n_0} \to \mathbb{R}^{n_L}$, we say $D$ is a linear region if $D$ satisfies: (a) $D$ is connected; (b) $f$ is an affine function on $D$; (c) for any $D' \supsetneq D$, $f$ is not affine on $D'$.

Let $\mathcal{P}(f) = \{D_i \mid D_i \text{ is a linear region of } f;\ \forall\, D_i \ne D_j,\ D_i \cap D_j = \emptyset\}$ represent all the linear regions of $f$. We then define the activation pattern of ReLU DNNs as follows.

Definition 2. For any $x \in \mathbb{R}^{n_0}$, we define the activation pattern of $x$ in the $i$th layer, $s_{h_i}(x) \in \{0, 1\}^{n_i}$, as

$$s_{h_i}(x)_j = \begin{cases} 1, & \text{if } W^{(i)}_{j,:}\, h_{i-1}(x) + b^{(i)}_j > 0, \\ 0, & \text{if } W^{(i)}_{j,:}\, h_{i-1}(x) + b^{(i)}_j \le 0, \end{cases}$$

for $i \in \{1, 2, \ldots, L-1\}$, $j \in \{1, 2, \ldots, n_i\}$, where $W^{(i)}_{j,:}$ is the $j$th row of $W^{(i)}$ and $b^{(i)}_j$ is the $j$th component of $b^{(i)}$ (Hinz & Van de Geer, 2019).
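The activation pattern of Definition 2 can be probed empirically: every linear region corresponds to one activation pattern, so the number of distinct patterns observed on sampled inputs is a lower bound on the number of linear regions (the upper bounds discussed in this paper bound it from the other side). Below is a minimal pure-Python sketch; the 2-16-16 random network, the Gaussian weights and the sampling grid are our own illustrative choices, not from the paper.

```python
import random

def activation_pattern(weights, biases, x):
    """Concatenated binary activation pattern s_{h_i}(x) over all
    hidden layers of a ReLU MLP (Definition 2)."""
    pattern = []
    h = list(x)
    for W, b in zip(weights, biases):
        # pre-activations W^{(i)} h_{i-1}(x) + b^{(i)}
        pre = [sum(w * hj for w, hj in zip(row, h)) + bi
               for row, bi in zip(W, b)]
        pattern.extend(1 if p > 0 else 0 for p in pre)
        h = [max(p, 0.0) for p in pre]  # ReLU
    return tuple(pattern)

# Hypothetical random network with hidden widths 16 and 16 on R^2.
random.seed(0)
def rand_mat(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]
weights = [rand_mat(16, 2), rand_mat(16, 16)]
biases = [[random.gauss(0, 1) for _ in range(16)] for _ in range(2)]

# Sample a grid in [-2, 2]^2; distinct patterns lower-bound the
# number of linear regions intersecting that square.
grid = [(-2 + 4 * i / 60, -2 + 4 * j / 60)
        for i in range(61) for j in range(61)]
patterns = {activation_pattern(weights, biases, x) for x in grid}
print(len(patterns))
```

Refining the grid only increases the count, but it can never exceed the true number of regions, which is what makes computable upper bounds a useful complement.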

