SMOOTH MATHEMATICAL FUNCTION FROM COMPACT NEURAL NETWORKS

Abstract

This paper addresses smooth function approximation by neural networks (NNs). Mathematical or physical functions can be replaced by NN models through regression. In this study, we obtain NNs that generate highly accurate and highly smooth functions from only a few weight parameters, by examining several aspects of regression. First, we reinterpret the inner workings of NNs used for regression; from this interpretation, we propose a new activation function, the integrated sigmoid linear unit (ISLU). Next, we discuss the special characteristics of metadata for regression, which differ from those of other data such as images or sound, and show how they can be used to improve the performance of neural networks. Finally, we present a simple hierarchical NN that generates models substituting for mathematical functions, and we introduce a new batch concept, the "meta-batch," which improves NN performance several-fold. The new activation function, the meta-batch method, the features of numerical data, meta-augmentation with metaparameters, and an NN structure that generates a compact multilayer perceptron (MLP) are the essential elements of this study.

1. INTRODUCTION

In many fields, such as astronomy, physics, and economics, one may want to obtain, through regression on numerical data, a general function that fits a dataset fairly accurately (Ferrari & Stengel (2005); Czarnecki et al. (2017); Raissi et al. (2019); Langer (2021)). The problem of smoothly approximating and inferring general functions using neural networks (NNs) has been considered in the literature. However, there is insufficient research on using NNs to completely replace ideal mathematical functions at high levels of smoothness, precisely enough to be problem-free when a simulation is performed. This study aims to completely replace such ideal mathematical functions. Suppose a model M(X) was developed by regression on a dataset using an NN. M(X) for input X can then be thought of as a replacement for a mathematical function f(X). In this study, such an NN, regarded as a mathematical function created by an NN, is called a "neural function" (NF). The components of an analytic mathematical function can be analyzed using a series expansion or other methods, whereas this is difficult for an NF. In this study, we created "highly accurate" and "highly smooth" NFs with "few parameters" using metadata. In particular, we combined a new activation function, a meta-batch method, and a weight-generating network (WGN) to realize the desired performance. The major contributions of this study can be summarized as follows.

• We dissected and interpreted the middle layers of NNs. The outputs of each layer are considered basis functions for the next layer; from this interpretation, we proposed a new activation function, the integrated sigmoid linear unit (ISLU), suitable for regression.

• The characteristics and advantages of metadata for regression problems were investigated. A training technique with fictitious metaparameters and data augmentation, which significantly improves performance, was introduced. It was also shown that for regression problems, the function values at specific locations can be used as metaparameters representing the characteristics of a task.

• NN structures that can generate compact NFs for each task from metaparameters were investigated, and a new batch concept, the "meta-batch," which can be used in the NFs, was introduced.

2. NNS FOR REGRESSION

Let us discuss an easy but uncommon interpretation of regression with a multilayer perceptron (MLP). What do the outputs of each layer of an MLP mean? They can be seen as basis functions that determine the function passed to the next layer. The input $x^{i+1}$ of the $(i+1)$th layer can be expressed as $x^{i+1}_j = \sum_k w^i_{j,k} M^i_k(x^0) + b^i_j$, where $x^0$ denotes the input of the first layer, $w^i_{j,k}$ denotes the weight connecting the $k$th node of the $i$th layer to the $j$th node of the $(i+1)$th layer, and $M^i_k$ denotes the model comprising the 0th to $i$th layers with the $k$th node of the $i$th layer as its output. This is similar to the expression $f(x) = \sum_j w_j \phi_j(x) + b$ of the radial basis function (RBF) kernel method. Clearly, the outputs of each layer act as basis functions for the next layer. Figure 2 shows the outputs of each layer of an MLP that learned the dataset $D = \{(x_i, y_i) \mid y = 0.2(x-1)x(x+1.5),\ x \in [-2, 2]\}$ with the exponential linear unit (ELU) activation function. To efficiently extract the final function, the output functions of the intermediate layers must be well developed. If the output functions of each layer are well developed, the desired final NF can be compact. In addition, for the final function of the NN to be infinitely differentiable, the output functions of the intermediate layers should also be infinitely differentiable. If the activation function is a rectified linear unit (ReLU), the output function bends sharply after every layer. If a one-dimensional regression problem is modeled with a simple MLP that has $(k+1)$ layers with nodes $[N_0, N_1, N_2, \ldots, N_k]$, the output function can bend more than $N_0 \cdot N_1 \cdots N_k$ times. The ELU activation function weakens such bending but does not smoothen it for the
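The basis-function view above can be checked directly: the network output is exactly a weighted sum of the last hidden layer's output functions. The following NumPy sketch (not code from the paper; random weights stand in for a trained MLP, with the layer widths of Figure 2) collects every hidden layer's outputs $M^i_k(x^0)$ and verifies the identity $x^{k+1}_j = \sum_k w^k_{j,k} M^k_k(x^0) + b_j$.

```python
import numpy as np

rng = np.random.default_rng(0)

def elu(x):
    # ELU with alpha = 1; np.minimum avoids overflow in the unused exp branch.
    return np.where(x > 0, x, np.exp(np.minimum(x, 0)) - 1.0)

# Layer widths as in Figure 2; random weights stand in for a trained network.
widths = [1, 30, 30, 30, 5, 1]
Ws = [rng.normal(size=(m, n)) / np.sqrt(m) for m, n in zip(widths, widths[1:])]
bs = [rng.normal(size=n) for n in widths[1:]]

def forward(x0):
    """Forward pass that also returns every hidden layer's outputs M^i(x0)."""
    h, hidden = x0, []
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = elu(h @ W + b)
        hidden.append(h)          # M^i_k(x0) for all nodes k of layer i
    return h @ Ws[-1] + bs[-1], hidden

x0 = np.linspace(-2.0, 2.0, 201).reshape(-1, 1)
y, hidden = forward(x0)

# The final output is a weighted sum of the last hidden layer's outputs --
# the same form as f(x) = sum_j w_j * phi_j(x) + b in the RBF kernel method.
basis = hidden[-1]                # shape (201, 5): five basis functions
y_from_basis = basis @ Ws[-1] + bs[-1]
assert np.allclose(y, y_from_basis)
```

Plotting the columns of each entry of `hidden` against `x0` reproduces the kind of per-layer basis-function graphs shown in Figure 2.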






Figure 1: Perspective on MLP

Figure 2: The output graphs of each layer of a trained MLP whose layer widths are [1, 30, 30, 30, 5, 1].

Figure 3: The graphs of ELU and ISLU(α = 0.5, β = 1)
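The smoothness contrast discussed in Section 2 can also be probed numerically. In the hypothetical sketch below (not from the paper), a small random MLP is evaluated on a fine grid and the discrete second derivative of its output is inspected: for ReLU the second difference is near zero everywhere except for spikes at the bend points, so almost all of its mass concentrates in a few grid points, whereas for ELU it varies smoothly and the mass is spread out.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def elu(x):
    # np.minimum avoids overflow in the exp branch that np.where still evaluates.
    return np.where(x > 0, x, np.exp(np.minimum(x, 0)) - 1.0)

# A small random MLP; weights stand in for a trained network.
widths = [1, 30, 30, 30, 1]
Ws = [rng.normal(size=(m, n)) / np.sqrt(m) for m, n in zip(widths, widths[1:])]
bs = [rng.normal(size=n) * 0.5 for n in widths[1:]]

def net(x, act):
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = act(h @ W + b)
    return (h @ Ws[-1] + bs[-1]).ravel()

x = np.linspace(-2.0, 2.0, 20001).reshape(-1, 1)

def spike_concentration(y, top_frac=0.01):
    """Fraction of |second difference| mass carried by the top 1% of points.

    Close to 1 for a piecewise-linear (ReLU) output, whose second difference
    is zero away from bends; much smaller for a smooth (ELU) output.
    """
    d2 = np.abs(np.diff(y, 2))
    k = max(1, int(len(d2) * top_frac))
    return np.sort(d2)[-k:].sum() / (d2.sum() + 1e-30)

c_relu = spike_concentration(net(x, relu))
c_elu = spike_concentration(net(x, elu))
print(f"ReLU concentration: {c_relu:.3f}, ELU concentration: {c_elu:.3f}")
```

Under this measure the ReLU output's bending is concentrated at isolated points, in line with the claim that the output bends sharply after every layer, while the ELU output only weakens the bending rather than producing a piecewise-linear function.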

