DEEP NEURAL TANGENT KERNEL AND LAPLACE KERNEL HAVE THE SAME RKHS

Abstract

We prove that the reproducing kernel Hilbert spaces (RKHS) of a deep neural tangent kernel and the Laplace kernel include the same set of functions, when both kernels are restricted to the sphere $\mathbb{S}^{d-1}$. Additionally, we prove that the exponential power kernel with a smaller power (making the kernel less smooth) leads to a larger RKHS, both when the kernel is restricted to the sphere $\mathbb{S}^{d-1}$ and when it is defined on the entire $\mathbb{R}^d$.

1. INTRODUCTION

One of the most seminal discoveries in the theory of neural networks in the past few years is the neural tangent kernel (NTK) (Jacot et al., 2018). Gradient flow on a normally initialized, fully connected neural network with a linear output layer, in the infinite-width limit, is equivalent to kernel regression with respect to the NTK. (This statement does not necessarily hold for a non-linear output layer, because the NTK is then non-constant (Liu et al., 2020).) Through the NTK, theoretical tools from kernel methods were introduced to the study of deep overparametrized neural networks. Theoretical results were thereby established regarding the convergence (Allen-Zhu et al., 2019; Du et al., 2019a;b; Zou et al., 2020), generalization (Cao & Gu, 2019; Arora et al., 2019b), and loss landscape (Kuditipudi et al., 2019) of overparametrized neural networks in the NTK regime.

While the NTK has proved to be a powerful theoretical tool, a recent work (Geifman et al., 2020) posed an important question: is the NTK significantly different from our repertoire of standard kernels? Prior work provided empirical evidence supporting a negative answer. For example, Belkin et al. (2018) showed experimentally that the Laplace kernel and neural networks have similar performance in fitting random labels. In the task of speech enhancement, exponential power kernels $K^{\exp}_{\gamma,\sigma}(x, y) = e^{-\|x-y\|^{\gamma}/\sigma}$, which include the Laplace kernel as the special case $\gamma = 1$, outperform deep neural networks with even shorter training time (Hui et al., 2019). The experiments in (Geifman et al., 2020) also exhibited similar performance of the Laplace kernel and the NTK.

The expressive power of a positive definite kernel can be characterized by its associated reproducing kernel Hilbert space (RKHS) (Saitoh & Sawano, 2016). The work (Geifman et al., 2020) considered the RKHS of the kernels restricted to the sphere $\mathbb{S}^{d-1} = \{x \in \mathbb{R}^d \mid \|x\|_2 = 1\}$ and presented a partial answer to the question by showing the following inclusion relation:
$$\mathcal{H}^{\mathrm{Gauss}}(\mathbb{S}^{d-1}) \subsetneq \mathcal{H}^{\mathrm{Lap}}(\mathbb{S}^{d-1}) = \mathcal{H}^{N_1}(\mathbb{S}^{d-1}) \subseteq \mathcal{H}^{N_k}(\mathbb{S}^{d-1}),$$
where the four spaces denote the RKHS associated with the Gaussian kernel, the Laplace kernel, and the NTK of two-layer and $(k+1)$-layer ($k \geq 1$) fully connected neural networks, respectively. All four kernels are restricted to $\mathbb{S}^{d-1}$. However, the relation between $\mathcal{H}^{\mathrm{Lap}}(\mathbb{S}^{d-1})$ and $\mathcal{H}^{N_k}(\mathbb{S}^{d-1})$ remained open in (Geifman et al., 2020).

We settle this problem and show that the RKHS of the Laplace kernel and that of the NTK with any number of layers include the same set of functions, when both are restricted to $\mathbb{S}^{d-1}$. In other words, we prove the following theorem.

Theorem 1. Let $\mathcal{H}^{\mathrm{Lap}}(\mathbb{S}^{d-1})$ and $\mathcal{H}^{N_k}(\mathbb{S}^{d-1})$ be the RKHS associated with the Laplace kernel $K^{\mathrm{Lap}}(x, y) = e^{-c\|x-y\|}$ ($c > 0$) and the neural tangent kernel of a $(k+1)$-layer fully connected neural network, respectively, both restricted to $\mathbb{S}^{d-1}$. Then $\mathcal{H}^{\mathrm{Lap}}(\mathbb{S}^{d-1})$ and $\mathcal{H}^{N_k}(\mathbb{S}^{d-1})$ include the same set of functions.
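To make the kernel families above concrete, the following is a minimal numerical sketch (not from the paper; the helper names `exp_power_kernel` and `to_sphere`, the data, and all parameter values are our own illustrative choices). It implements the exponential power kernel $K^{\exp}_{\gamma,\sigma}$, whose $\gamma = 1$ case is the Laplace kernel and $\gamma = 2$ case is the Gaussian kernel, on inputs normalized to $\mathbb{S}^{d-1}$, and runs kernel ridge regression with the Laplace kernel, a regularized variant of the kernel regression estimator to which infinite-width NTK training reduces.

```python
# A sketch of the exponential power kernel family restricted to the unit
# sphere, with the Laplace kernel (gamma = 1) used for kernel ridge regression.
import numpy as np

def exp_power_kernel(X, Y, gamma=1.0, sigma=1.0):
    """Gram matrix of K(x, y) = exp(-||x - y||^gamma / sigma)."""
    # Pairwise Euclidean distances between rows of X and rows of Y.
    dists = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return np.exp(-dists**gamma / sigma)

def to_sphere(X):
    """Project each row of X onto the unit sphere S^{d-1}."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

rng = np.random.default_rng(0)
d, n = 5, 200
X = to_sphere(rng.normal(size=(n, d)))               # training inputs on S^{d-1}
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=n)   # noisy target values

# Kernel ridge regression with the Laplace kernel (gamma = 1).
lam = 1e-3                                           # ridge regularization
K = exp_power_kernel(X, X, gamma=1.0, sigma=1.0)
alpha = np.linalg.solve(K + lam * np.eye(n), y)

X_test = to_sphere(rng.normal(size=(20, d)))
y_pred = exp_power_kernel(X_test, X, gamma=1.0, sigma=1.0) @ alpha
print("first predictions:", y_pred[:3])

# Setting gamma = 2.0 in exp_power_kernel gives the Gaussian kernel, whose
# RKHS on the sphere is strictly smaller, per the inclusion chain above.
```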

