Achieve the Minimum Width of Neural Networks for Universal Approximation

Abstract

The universal approximation property (UAP) of neural networks is fundamental for deep learning, and it is well known that wide neural networks are universal approximators of continuous functions within both the L^p norm and the continuous/uniform norm. However, the exact minimum width, w_min, for the UAP has not been studied thoroughly. Recently, using a decoder-memorizer-encoder scheme, Park et al. (2021) found that w_min = max(d_x + 1, d_y) for both the L^p-UAP of ReLU networks and the C-UAP of ReLU+STEP networks, where d_x and d_y are the input and output dimensions, respectively. In this paper, we consider neural networks with an arbitrary set of activation functions. We prove that both C-UAP and L^p-UAP for functions on compact domains share a universal lower bound on the minimal width; that is, w*_min = max(d_x, d_y). In particular, the critical width, w*_min, for L^p-UAP can be achieved by leaky-ReLU networks, provided that the input or output dimension is larger than one. Our construction is based on the approximation power of neural ordinary differential equations and on the ability of neural networks to approximate flow maps. The cases of nonmonotone or discontinuous activation functions and of one-dimensional input and output are also discussed.

1. Introduction

The study of the universal approximation property (UAP) of neural networks is fundamental for deep learning and has a long history. Early studies, such as Cybenko (1989); Hornik et al. (1989); Leshno et al. (1993), proved that wide neural networks (even shallow ones) are universal approximators for continuous functions within both the L^p norm (1 ≤ p < ∞) and the continuous/uniform norm. Further research, such as Telgarsky (2016), indicated that increasing the depth can improve the expressive power of neural networks. If the budget of neurons is fixed, deeper neural networks have better expressive power (Yarotsky & Zhevnerchuk, 2020; Shen et al., 2022).

However, this pattern does not hold if the width is below a critical threshold w_min. Lu et al. (2017) first showed that ReLU networks have the UAP for L^1 functions from R^{d_x} to R if the width is larger than d_x + 4, and that the UAP disappears if the width is less than d_x. Further research (Hanin & Sellke, 2017; Kidger & Lyons, 2020; Park et al., 2021) improved the minimum width bound for ReLU networks. In particular, Park et al. (2021) revealed that the minimum width is w_min = max(d_x + 1, d_y) for the L^p(R^{d_x}, R^{d_y}) UAP of ReLU networks and for the C(K, R^{d_y}) UAP of ReLU+STEP networks, where K is a compact domain in R^{d_x}. For general activation functions, the exact minimum width w_min for the UAP is less studied. Johnson (2019) considered uniformly continuous activation functions that can be approximated by a sequence of one-to-one functions and gave a lower bound w_min ≥ d_x + 1 for C-UAP (i.e., UAP for C(K, R^{d_y})). Kidger & Lyons (2020) considered continuous nonpolynomial activation functions and gave an upper bound w_min ≤ d_x + d_y + 1 for C-UAP. Park et al. (2021) improved the bound for L^p-UAP (i.e., UAP for L^p(K, R^{d_y})) to w_min ≤ max(d_x +
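To make the width-constrained setting concrete, the following minimal NumPy sketch builds a deep, narrow leaky-ReLU network of constant width d = max(d_x, d_y), where each affine-plus-activation layer can be viewed as one discretized (explicit-Euler) step of a flow map, in the spirit of the neural-ODE viewpoint mentioned above. The specific width, depth, leaky slope, and random weights here are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    """Leaky ReLU: identity on positives, slope alpha on negatives."""
    return np.where(x > 0, x, alpha * x)

def narrow_network(x, weights, biases, alpha=0.1):
    """A constant-width leaky-ReLU network applied layer by layer.

    Each layer W @ h + b followed by the activation plays the role of
    one Euler step of a flow map h' = f(h); composing many such narrow
    layers is how depth substitutes for width.
    """
    h = x
    for W, b in zip(weights, biases):
        h = leaky_relu(W @ h + b, alpha)
    return h

rng = np.random.default_rng(0)
d = 3   # constant width = max(d_x, d_y), with d_x = d_y = 3 (illustrative)
L = 5   # depth: number of flow-map steps (illustrative)
# Weights near the identity, so each layer is a small perturbation of
# the identity map, as in a discretized flow.
weights = [np.eye(d) + 0.1 * rng.standard_normal((d, d)) for _ in range(L)]
biases = [0.1 * rng.standard_normal(d) for _ in range(L)]
y = narrow_network(rng.standard_normal(d), weights, biases)
```

Note that the hidden width never exceeds d at any layer, which is exactly the regime the minimum-width question concerns.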

