Achieve the Minimum Width of Neural Networks for Universal Approximation

Abstract

The universal approximation property (UAP) of neural networks is fundamental for deep learning, and it is well known that wide neural networks are universal approximators of continuous functions within both the L^p norm and the continuous/uniform norm. However, the exact minimum width, w_min, for the UAP has not been studied thoroughly. Recently, using a decoder-memorizer-encoder scheme, Park et al. (2021) found that w_min = max(d_x + 1, d_y) for both the L^p-UAP of ReLU networks and the C-UAP of ReLU+STEP networks, where d_x, d_y are the input and output dimensions, respectively. In this paper, we consider neural networks with an arbitrary set of activation functions. We prove that both C-UAP and L^p-UAP for functions on compact domains share a universal lower bound on the minimal width; that is, w*_min = max(d_x, d_y). In particular, the critical width, w*_min, for L^p-UAP can be achieved by leaky-ReLU networks, provided that the input or output dimension is larger than one. Our construction is based on the approximation power of neural ordinary differential equations and the ability to approximate flow maps by neural networks. The cases of nonmonotone or discontinuous activation functions and of one-dimensional input/output are also discussed.

1. Introduction

The study of the universal approximation property (UAP) of neural networks is fundamental for deep learning and has a long history. Early studies, such as Cybenko (1989); Hornik et al. (1989); Leshno et al. (1993), proved that wide neural networks (even shallow ones) are universal approximators for continuous functions within both the L^p norm (1 ≤ p < ∞) and the continuous/uniform norm. Further research, such as Telgarsky (2016), indicated that increasing the depth can improve the expressive power of neural networks. If the budget of neurons is fixed, deeper neural networks have better expressive power (Yarotsky & Zhevnerchuk, 2020; Shen et al., 2022). However, this pattern does not hold if the width is below a critical threshold w_min. Lu et al. (2017) first showed that ReLU networks have the UAP for L^1 functions from R^{d_x} to R if the width is larger than d_x + 4, and that the UAP disappears if the width is less than d_x. Further research (Hanin & Sellke, 2017; Kidger & Lyons, 2020; Park et al., 2021) improved the minimum width bound for ReLU networks. In particular, Park et al. (2021) revealed that the minimum width is w_min = max(d_x + 1, d_y) for the L^p(R^{d_x}, R^{d_y}) UAP of ReLU networks and for the C(K, R^{d_y}) UAP of ReLU+STEP networks, where K is a compact domain in R^{d_x}.

For general activation functions, the exact minimum width w_min for the UAP is less studied. Johnson (2019) considered uniformly continuous activation functions that can be approximated by a sequence of one-to-one functions and gave a lower bound w_min ≥ d_x + 1 for C-UAP (meaning UAP for C(K, R^{d_y})). Kidger & Lyons (2020) considered continuous nonpolynomial activation functions and gave an upper bound w_min ≤ d_x + d_y + 1 for C-UAP. Park et al. (2021) improved the bound for L^p-UAP (meaning UAP for L^p(K, R^{d_y})) to w_min ≤ max(d_x + 2, d_y + 1). A summary of known upper/lower bounds on the minimum width for the UAP can be found in Park et al. (2021).

In this paper, we consider neural networks having the UAP with arbitrary activation functions. We give a universal lower bound, w_min ≥ w*_min = max(d_x, d_y), to approximate functions from a compact domain K ⊂ R^{d_x} to R^{d_y} in the L^p norm or continuous norm. Furthermore, we show that the critical width w*_min can be achieved by many neural networks, as listed in Table 1. Surprisingly, leaky-ReLU networks achieve the critical width for the L^p-UAP provided that the input or output dimension is larger than one. This result relies on a novel construction scheme proposed in this paper based on the approximation power of neural ordinary differential equations (ODEs) and the ability to approximate flow maps by neural networks.

Table 1. Summary of the known minimum width of feed-forward neural networks that have the universal approximation property.

Function class         | Activation       | Minimum width                    | Reference
L^p(R^{d_x}, R^{d_y})  | ReLU             | w_min = max(d_x + 1, d_y)        | Park et al. (2021)
C([0, 1], R^2)         | ReLU             | w_min = 3 = max(d_x, d_y) + 1    | Park et al. (2021)
C(K, R^{d_y})          | ReLU+STEP        | w_min = max(d_x + 1, d_y)        | Park et al. (2021)
L^p(K, R^{d_y})        | Conti. nonpoly ‡ | w_min ≤ max(d_x + 2, d_y + 1)    | Park et al. (2021)
L^p(K, R^{d_y})        | Arbitrary        | w_min ≥ max(d_x, d_y) =: w*_min  | Ours (Lemma 1)
                       | Leaky-ReLU       | w_min = max(d_x, d_y, 2)         | Ours (Theorem 2)
                       | Leaky-ReLU+ABS   | w_min = max(d_x, d_y)            | Ours (Theorem 3)
C(K, R^{d_y})          | Arbitrary        | w_min ≥ max(d_x, d_y) =: w*_min  | Ours (Lemma 1)
                       | ReLU+FLOOR       | w_min = max(d_x, d_y, 2)         | Ours (Lemma 4)
                       | UOE†+FLOOR       | w_min = max(d_x, d_y)            | Ours (Corollary 6)
C([0, 1], R^{d_y})     | UOE†             | w_min = d_y                      | Ours (Theorem 5)

‡ Continuous nonpolynomial ρ that is continuously differentiable at some z with ρ′(z) ≠ 0.
† UOE means a function having a universal ordering of extrema; see Definition 7.

1.1. Contributions

1) Obtained the universal lower bound of width w*_min for feed-forward neural networks (FNNs) that have universal approximation properties.
2) Achieved the critical width w*_min by leaky-ReLU+ABS networks and UOE+FLOOR networks. (UOE is a continuous function which has a universal ordering of extrema. It is introduced to handle C-UAP for one-dimensional functions; see Definition 7.)
3) Proposed a novel construction scheme from a differential geometry perspective that could deepen our understanding of UAP through topology theory.

1.2. Related work

To obtain the exact minimum width, one must verify the lower and upper bounds. Generally, the upper bounds are obtained by construction, while the lower bounds are obtained by counterexamples.

Lower bounds. For ReLU networks, Lu et al. (2017) utilized the disadvantage brought by the insufficient size of the dimensions and proved a lower bound w_min ≥ d_x for L^1-UAP; Hanin & Sellke (2017) considered the compactness of the level sets and proved a lower bound w_min ≥ d_x + 1 for C-UAP. For monotone activation functions or their variants, Johnson (2019) noticed that functions represented by networks with width d_x have unbounded level sets, and Beise & Da Cruz (2020) noticed that such functions on a compact domain K take their maximum value on the boundary ∂K. These properties allow one to construct counterexamples and give a lower bound w_min ≥ d_x + 1 for C-UAP. For general activation
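As an aside on the flow-map viewpoint behind the construction: a leaky-ReLU layer whose weight matrix is invertible is a homeomorphism of R^{d_x}, so a width-d_x leaky-ReLU network composes invertible maps, much like the flow map of an ODE. The following minimal pure-Python sketch (not code from the paper; the 2×2 matrix, slope 0.1, bias, and test point are illustrative choices) checks this invertibility numerically for d_x = 2:

```python
# Illustration: a width-2 leaky-ReLU layer with an invertible weight matrix
# is a bijection of R^2 -- we verify by composing the layer with its inverse.

ALPHA = 0.1  # leaky-ReLU slope on the negative axis

def leaky_relu(v):
    return [x if x >= 0 else ALPHA * x for x in v]

def leaky_relu_inv(v):
    # Exact inverse of leaky_relu (possible because ALPHA > 0).
    return [x if x >= 0 else x / ALPHA for x in v]

def mat_vec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

# Invertible weight matrix (det = 1) and its inverse, computed by hand.
W = [[2.0, 1.0], [1.0, 1.0]]
W_inv = [[1.0, -1.0], [-1.0, 2.0]]
b = [0.5, -0.3]

def layer(v):
    """One narrow leaky-ReLU layer: v -> leaky_relu(W v + b)."""
    return leaky_relu([w + bi for w, bi in zip(mat_vec(W, v), b)])

def layer_inv(v):
    """Exact inverse layer: v -> W^{-1} (leaky_relu^{-1}(v) - b)."""
    u = leaky_relu_inv(v)
    return mat_vec(W_inv, [x - bi for x, bi in zip(u, b)])

x = [-0.7, 1.2]
recovered = layer_inv(layer(x))
assert all(abs(r - xi) < 1e-9 for r, xi in zip(recovered, x))
```

Composing many such layers, as in an Euler discretization of a neural ODE, still yields an invertible map; approximating targets that are not invertible is part of what makes reaching the critical width nontrivial.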

