APPROXIMATING ANY FUNCTION VIA CORESET FOR RADIAL BASIS FUNCTIONS: TOWARDS PROVABLE DATA SUBSET SELECTION FOR EFFICIENT NEURAL NETWORKS TRAINING

Abstract

Radial basis function neural networks (RBFNNs) are well known for their capability to approximate any continuous function on a closed, bounded set with arbitrary precision, given enough hidden neurons. A coreset is a small weighted subset of an input set of items that provably approximates their loss function for a given set of queries (models, classifiers, etc.). In this paper, we suggest the first coreset construction algorithm for RBFNNs, i.e., a small weighted subset that approximates the loss of the input data on any radial basis function network, and thus approximates any function defined by an RBFNN on the full input data. This is done by constructing coresets for radial basis and Laplacian loss functions. We use our coreset to suggest a provable data subset selection algorithm for training deep neural networks: since our coreset approximates every function, it should approximate the gradient of each weight in a neural network, as each such gradient is itself a function on the input. Experimental results on function approximation and dataset subset selection with popular network architectures and data sets demonstrate the efficacy and accuracy of our coreset construction.

1. INTRODUCTION

Radial basis function neural networks (RBFNNs) are artificial neural networks that generally have three layers: an input layer, a hidden layer with a radial basis function (RBF) as the activation function, and a linear output layer. In this paper, the input layer receives a d-dimensional vector x ∈ R^d of real numbers. The hidden layer consists of nodes representing RBFs, each computing ρ(∥x − c_i∥_2) := exp(−∥x − c_i∥_2^2), where c_i ∈ R^d is the center vector of neuron i among, say, N neurons in the hidden layer. The linear output layer then computes ∑_{i=1}^{N} α_i ρ(∥x − c_i∥_2), where α_i is the weight of neuron i in the linear output neuron. RBFNNs are thus feed-forward neural networks, since the edges between the nodes do not form a cycle, and they enjoy advantages such as simplicity of analysis, faster training time, and interpretability, compared to alternatives such as convolutional neural networks (CNNs) and even multi-layer perceptrons (MLPs) (Padmavati, 2011).

Function approximation via RBFNNs. RBFNNs are universal approximators in the sense that an RBFNN with a sufficient number of hidden neurons (large N) can approximate any continuous function on a closed, bounded subset of R^d with arbitrary precision (Park & Sandberg, 1991). That is, given a sufficiently large input set P of n points in R^d and its corresponding label function y : P → R, an RBFNN can be trained to approximate the function y. RBFNNs are therefore commonly used across a wide range of applications, such as function approximation (Park & Sandberg, 1991; 1993; Lu et al., 1997), time series prediction (Whitehead & Choate, 1996; Leung et al., 2001; Harpham & Dawson, 2006), classification (Leonard & Kramer, 1991; Wuxing et al., 2004; Babu & Suresh, 2012), and system control (Yu et al., 2011; Liu, 2013), due to their faster learning speed.
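For concreteness, the forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; the function name `rbfnn_forward` and the choice of fixed, given centers and output weights are our own assumptions for the sketch.

```python
import numpy as np

def rbfnn_forward(x, centers, alphas):
    """Forward pass of a Gaussian RBF network (illustrative sketch).

    Computes sum_{i=1}^{N} alpha_i * exp(-||x - c_i||_2^2), where
    centers is an (N, d) array of hidden-neuron centers c_i and
    alphas is an (N,) array of output-layer weights alpha_i.
    """
    # Squared Euclidean distances ||x - c_i||_2^2 to every center
    sq_dists = np.sum((centers - x) ** 2, axis=1)
    # Hidden-layer activations rho(||x - c_i||_2) = exp(-||x - c_i||_2^2)
    activations = np.exp(-sq_dists)
    # Linear output layer: weighted sum of hidden activations
    return float(np.dot(alphas, activations))
```

For example, when x coincides with all N centers, every activation equals exp(0) = 1, so the output is simply the sum of the weights α_i.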
For a given size of RBFNN (number of neurons in the hidden layer) and an input set, the aim of this paper is to compute a small weighted subset that approximates the loss of the input data on any radial basis function neural network of this size and thus approximates any function defined

