LEGENDRE DEEP NEURAL NETWORK (LDNN) AND ITS APPLICATION FOR APPROXIMATION OF NON-LINEAR VOLTERRA-FREDHOLM-HAMMERSTEIN INTEGRAL EQUATIONS

Abstract

Various phenomena in biology, physics, and engineering are modeled by differential equations. These differential equations, including partial differential equations and ordinary differential equations, can be converted to and represented as integral equations. In particular, Volterra-Fredholm-Hammerstein integral equations are a principal class of such integral equations, and researchers are interested in investigating and solving them. In this paper, we propose the Legendre Deep Neural Network (LDNN) for solving nonlinear Volterra-Fredholm-Hammerstein integral equations (V-F-H-IEs). LDNN uses Legendre orthogonal polynomials as activation functions in its deep structure. We present how LDNN can be used to solve nonlinear V-F-H-IEs, and we show that combining the Gaussian quadrature collocation method with LDNN yields a novel numerical solution for nonlinear V-F-H-IEs. Several examples are given to verify the performance and accuracy of LDNN.

1. INTRODUCTION

Deep neural networks are a central and valuable part of the machine learning family and are applied in various areas, including speech processing, computer vision, natural language processing, and image processing (LeCun et al., 2015; Krizhevsky et al., 2012). Function approximation is a significant branch of scientific computing, and success in this area has been pursued in several works (Tang et al., 2019; Hanin, 2019). Solving differential equations is another main branch of scientific computing in which neural networks and deep learning have shown success (Lample & Charton, 2019; Berg & Nyström, 2018; Raissi et al., 2019). Various phenomena in biology, physics, finance, neuroscience, and engineering are modeled by differential equations (Courant & Hilbert, 2008; Davis, 1961). In recent years, several researchers have studied solving differential equations, which include ordinary differential equations, partial differential equations, and integral equations, via deep learning or neural networks (Sirignano & Spiliopoulos, 2018; Lu et al., 2019; Meng et al., 2020). It is notable that various numerical methods have been applied to solving differential equations. The homotopy analysis method (HAM) (Liao, 2012) and the variational iteration method (VIM) (He & Wu, 2007) are known as analytical/semi-analytical methods. Spectral methods (Canuto et al., 2012), Runge-Kutta methods (Hairer et al., 2006), finite difference methods (FDM) (Smith, 1985), and finite element methods (FEM) (Johnson, 2012) are among the most popular numerical methods. When the complexity of a model does not allow us to obtain the solution explicitly, numerical methods are a proper choice for finding an approximate solution. Recently, machine learning methods have also been applied to solving differential equations.
Chakraverty & Mall (2017) introduced orthogonal neural networks, which use orthogonal polynomials in the structure of the network. Raja et al. (2019) applied a meta-heuristic optimization algorithm to neural networks to obtain the solution of differential equations. Moreover, other machine learning methods, such as support vector machines (Vapnik, 2013), have been used to approximate the solutions of such models; least squares support vector machines are considered in (Hajimohammadi et al., 2020; Mehrkanoon & Suykens, 2015). Baker et al. (2019) selected deep neural networks for solving differential equations. Pang et al. (2019) introduced a new network to find the solution of differential equations. Han et al. (2018) solved high-dimensional problems via deep networks. Also, Long et al. (2018) and Raissi et al. (2019) introduced classes of equations that are solved by deep learning. Furthermore, He et al. (2018) and Molina et al. (2019) investigated the effect of the activation function on networks. In this paper, we consider nonlinear Volterra-Fredholm-Hammerstein integral equations (V-F-H-IEs) and obtain their solutions via a deep neural network. We present a new machine learning approach that combines a deep neural network with the Legendre collocation method. This approach is useful for solving differential equations, and we apply it to nonlinear V-F-H-IEs. The Legendre collocation method is used to improve the numerical computations and enhance the performance of the network.

2. LEGENDRE DEEP NEURAL NETWORK (LDNN)

The main purpose of introducing LDNN is to apply it to solving differential models. Indeed, the purpose is to expand the use of deep learning networks in the field of scientific computing, especially for the solution of differential equations. This network combines the advantages of solving equations by deep learning with those of numerical methods, such as the collocation method, used to achieve better solutions of the equations. LDNN is a combination of a deep neural network and the Legendre collocation method. In fact, our network consists of two networks connected consecutively. The first network is a feed-forward neural network with an orthogonal Legendre layer. The second network includes operation nodes that create the desired computational model. In recent decades, numerical methods, especially the collocation method, have been popular for solving differential equations. In the collocation method, an approximation of the solution is first expanded as a sum of basis functions; the basis functions consist of orthogonal polynomials such as Legendre polynomials. This approximation is then substituted into the differential equation, and the unknown coefficients of the basis functions are determined so that the solution satisfies the equation at a set of candidate (collocation) points. The first network is applied to create the approximation of the solution; this approximation can be viewed as a scattered data interpolation problem. The second network is used to form the desired equation so that the solution satisfies it. The structure of LDNN is described in detail in the following. Consider the first network with M layers, defined as follows:

H_0 = x,  x ∈ R^d,
H_1 = L(W^(1) H_0 + b^(1)),
H_i = f(W^(i) H_{i-1} + b^(i)),  2 ≤ i ≤ M − 1,
H_M = W^(M) H_{M-1} + b^(M),

where H_0 is the input layer with dimension d.
H_i, 1 ≤ i ≤ M − 1, are the hidden layers; L = [L_0, L_1, ..., L_n]^T, where L_i is the Legendre orthogonal polynomial of degree i, so that H_1 is an orthogonal layer; f is the hyperbolic tangent or another commonly used activation function; W^(i), 1 ≤ i ≤ M, are the weight parameters and b^(i), 1 ≤ i ≤ M, are the bias parameters; and H_M is the output layer. It is notable that the second network is applied to obtain the desired differential model. This is made possible by operation nodes, including integrals, derivatives, etc., applied to the output of the first network. Moreover, automatic differentiation (AD) (Baydin et al., 2017) and Legendre-Gauss integration (Shen et al., 2011) are used in the network computations to obtain more accurate and faster calculations. How the network is trained and how its parameters are set are also important points. A supervised learning method is used to train the network, and the cost function for setting the parameters is defined as follows:

CostFun = min(y_t − y_p) + min(R_m),  (1)

where y_t is the exact value of the model and y_p is the value predicted by LDNN. The definition of R_m is explained in Section 3. The minimization of CostFun is carried out by the Adam algorithm (Kingma & Ba, 2015) and the L-BFGS method (Liu & Nocedal, 1989) on the mean squared errors over the training data set.
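As an illustration (this is a sketch, not code from the paper), the forward pass of the first network can be written in a few lines of NumPy, under the assumption that neuron i of the orthogonal layer applies the degree-i Legendre polynomial L_i to its own pre-activation:

```python
import numpy as np

def legendre_all(n, z):
    """Evaluate L_0(z)..L_n(z) pointwise via the three-term recurrence."""
    L = np.zeros((n + 1,) + z.shape)
    L[0] = 1.0
    if n >= 1:
        L[1] = z
    for k in range(1, n):
        L[k + 1] = ((2 * k + 1) * z * L[k] - k * L[k - 1]) / (k + 1)
    return L

def ldnn_first_network(x, Ws, bs):
    """Forward pass of the first (approximation) network.

    Layer 1 is the orthogonal layer: neuron i applies L_i to its
    pre-activation (an assumption about how L acts componentwise).
    Later hidden layers use tanh; the output layer is affine.
    """
    z = Ws[0] @ x + bs[0]                           # pre-activation of H_1
    L = legendre_all(len(z) - 1, z)
    h = np.array([L[i, i] for i in range(len(z))])  # H_1[i] = L_i(z_i)
    for W, b in zip(Ws[1:-1], bs[1:-1]):
        h = np.tanh(W @ h + b)                      # H_2 .. H_{M-1}
    return Ws[-1] @ h + bs[-1]                      # H_M

# Tiny deterministic example: d = 1, orthogonal layer of 3 neurons
# (L_0..L_2), one tanh layer of 2 neurons, scalar output.
Ws = [np.ones((3, 1)), np.ones((2, 3)), np.ones((1, 2))]
bs = [np.zeros(3), np.zeros(2), np.zeros(1)]
y = ldnn_first_network(np.array([0.5]), Ws, bs)
```

With these weights, the orthogonal layer produces [L_0(0.5), L_1(0.5), L_2(0.5)] = [1, 0.5, −0.125], so each tanh neuron receives 1.375.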

2.1. LEGENDRE POLYNOMIALS

Legendre polynomials (Shen et al., 2011) are a main series of orthogonal polynomials, denoted by L_n(η), and are defined as:

L_n(η) = (1/2^n) Σ_{ℓ=0}^{⌊n/2⌋} (−1)^ℓ (2n − 2ℓ)! / (ℓ! (n − ℓ)! (n − 2ℓ)!) η^{n−2ℓ}.  (2)

Legendre polynomials are defined on the domain [−1, 1] and satisfy the recurrence formula

(n + 1) L_{n+1}(η) = (2n + 1) η L_n(η) − n L_{n-1}(η),  n ≥ 1,  L_0(η) = 1,  L_1(η) = η.  (3)

The orthogonality relation for these polynomials is

∫_{−1}^{1} L_n(η) L_m(η) dη = γ_n δ_{n,m},  (4)

where δ_{n,m} is the Kronecker delta and γ_n = 2/(2n + 1). Their weight function is W(η) = 1. Some useful properties of Legendre polynomials are:

L_n(−η) = (−1)^n L_n(η),  (5)
|L_n(η)| ≤ 1,  ∀η ∈ [−1, 1],  n ≥ 0,  (6)
L_n(±1) = (±1)^n,  (7)
(2n + 1) L_n(η) = L′_{n+1}(η) − L′_{n-1}(η),  n ≥ 1.  (8)
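The recurrence (3) gives a stable way to evaluate the whole family numerically; the sketch below (illustrative, not from the paper) tabulates L_0..L_n on a grid, and its values can be checked against properties (5)-(7):

```python
import numpy as np

def legendre_table(n, eta):
    """Evaluate L_0(eta)..L_n(eta) with the three-term recurrence
    (k+1) L_{k+1} = (2k+1) eta L_k - k L_{k-1}."""
    eta = np.asarray(eta, dtype=float)
    L = np.zeros((n + 1,) + eta.shape)
    L[0] = 1.0
    if n >= 1:
        L[1] = eta
    for k in range(1, n):
        L[k + 1] = ((2 * k + 1) * eta * L[k] - k * L[k - 1]) / (k + 1)
    return L

# Tabulate L_0..L_6 on a fine grid over [-1, 1]
pts = np.linspace(-1.0, 1.0, 401)
L = legendre_table(6, pts)
```

For instance, the last column of L recovers L_n(1) = 1 and the first column recovers L_n(−1) = (−1)^n, in agreement with property (7).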

3. NONLINEAR VOLTERRA-FREDHOLM-HAMMERSTEIN INTEGRAL EQUATIONS AND LDNN

The general form of nonlinear Volterra-Fredholm-Hammerstein integral equations (V-F-H-IEs) is as follows:

y(x) = g(x) + ξ_1 ∫_0^x K_1(x, s) φ_1(s, y(s)) ds + ξ_2 ∫_0^1 K_2(x, s) φ_2(s, y(s)) ds,  x ∈ [0, 1],  (9)

where ξ_1, ξ_2 are fixed constants, g(x), K_1(x, s), and K_2(x, s) are given functions, and φ_1(s, y(s)), φ_2(s, y(s)) are nonlinear functions. The aim is to find the proper y(x). In order to use LDNN, Eq. (9) is reformulated in the following form:

R_m = −y(x) + g(x) + ξ_1 ∫_0^x K_1(x, s) φ_1(s, y(s)) ds + ξ_2 ∫_0^1 K_2(x, s) φ_2(s, y(s)) ds,  x ∈ [0, 1].  (10)

y(x) is approximated by the first network of LDNN: y(x) ≈ H_M. Furthermore, we apply the Legendre-Gauss integration formula (Shen et al., 2011):

∫_{−1}^{1} h(X) dX ≈ Σ_{j=0}^{N} ω_j h(X_j),  (11)

where {X_j}_{j=0}^{N} are the roots of L_{N+1} and the weights are ω_j = 2 / ((1 − X_j²) (L′_{N+1}(X_j))²). Here, we should transfer the [0, x] and [0, 1] domains onto [−1, 1]. This is possible by the transformations

t_1 = (2/x) s − 1 (so that s = x (t_1 + 1)/2),  t_2 = 2s − 1 (so that s = (t_2 + 1)/2).  (12)

Setting Z_1(x, s) = K_1(x, s) φ_1(s, y(s)) and Z_2(x, s) = K_2(x, s) φ_2(s, y(s)), we have

R_m = −y(x) + g(x) + (ξ_1 x / 2) ∫_{−1}^{1} Z_1(x, x (t_1 + 1)/2) dt_1 + (ξ_2 / 2) ∫_{−1}^{1} Z_2(x, (t_2 + 1)/2) dt_2.  (13)

By the Legendre-Gauss integration formula, the following form is obtained:

R_m = −y(x) + g(x) + (ξ_1 x / 2) Σ_{j=0}^{N_1} ω_{1j} Z_1(x, x (t_{1j} + 1)/2) + (ξ_2 / 2) Σ_{j=0}^{N_2} ω_{2j} Z_2(x, (t_{2j} + 1)/2).  (14)

The second network of LDNN and its nodes build R_m. The architecture of LDNN for solving nonlinear V-F-H-IEs is represented in Figure 1.

Figure 1: The architecture of LDNN for solving nonlinear V-F-H-IEs.

The first network approximates the solution y(x) of the integral equation. It has M layers and a feed-forward neural network structure. H_1 is the orthogonal layer, which consists of p neurons with the Legendre polynomials {L_i}_{i=0}^{p} as activation functions. The other hidden layers use f, the hyperbolic tangent, as their activation function.
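The substitutions in (12) together with the quadrature rule (11) can be sketched as follows; `volterra_term` and `fredholm_term` are hypothetical helper names (not from the paper), and NumPy's `leggauss` supplies the nodes t_j and weights ω_j:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def volterra_term(Z, x, N):
    """Approximate int_0^x Z(x, s) ds: substitute s = x (t + 1) / 2 so that
    t runs over [-1, 1], then apply (N + 1)-point Legendre-Gauss quadrature."""
    t, w = leggauss(N + 1)        # nodes and weights on [-1, 1]
    s = x * (t + 1.0) / 2.0
    return (x / 2.0) * np.dot(w, Z(x, s))

def fredholm_term(Z, x, N):
    """Approximate int_0^1 Z(x, s) ds with the substitution s = (t + 1) / 2."""
    t, w = leggauss(N + 1)
    s = (t + 1.0) / 2.0
    return 0.5 * np.dot(w, Z(x, s))

# Example: int_0^x s^2 ds = x^3 / 3, evaluated here at x = 0.7
approx = volterra_term(lambda x, s: s ** 2, 0.7, 10)
```

Gauss-Legendre quadrature with N + 1 points is exact for polynomial integrands up to degree 2N + 1, so the s² example is reproduced to machine precision.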
The second network, with its operation nodes, builds the desired model; its output is R_m (see Eq. (14)). The outputs of LDNN are y(x) and R_m.

4. NUMERICAL RESULTS

In order to present the accuracy and performance of LDNN for solving nonlinear V-F-H-IEs and justify the efficiency of the proposed method, several examples are given. The convergence behavior of LDNN is reported using the following quantities. The exact value y_t, the predicted value y_p, and the absolute error (Error) at some points of the test data are reported in the tables. The errors on the training and test sets are measured by

L2_train = ||y_t − y_p||_2 = [Σ_{j=1}^{m_tr} (y_t(x_j) − y_p(x_j))²]^{1/2},
L2_test = ||y_t − y_p||_2 = [Σ_{j=1}^{m_te} (y_t(x_j) − y_p(x_j))²]^{1/2}.  (15)

The structure of the network, M-Layers, is indicated by [d, N_L(1), N_L(2), ..., N_L(M−1), 1]: the network has a d-dimensional input layer, M − 1 hidden layers with N_L(ℓ), 1 ≤ ℓ ≤ M − 1, neurons in each layer, and one output which approximates y(x). All the experiments have 4 hidden layers. Table 1 lists the LDNN parameters for all the experiments.

Experiment     M-Layers                 m_1   (N_1, N_2)   L2_train       m_2   L2_test
Experiment 1   [1, 10, 30, 20, 10, 1]   500   (50, -)      3.937867e-09   100   4.015095e-09
Experiment 2   [1, 10, 30, 20, 10, 1]   500   (50, )                      100
Experiment 3   [1, 10, 30, 20, 10, 1]   500   (50, 50)     1.347132e-09   100   1.659349e-08
Experiment 4   [1, 10, 30, 20, 10, 1]   500   (50, 50)     9.182442e-09   100   1.107755e-09

The Tensorflow package of Python version 3.7.0 is applied for writing the code of all the experiments. The Adam algorithm is stopped when the number of iterations reaches 5000, and the L-BFGS method is stopped when it converges. The figures are obtained on the test data set.
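The train/test L2 measures above amount to the Euclidean norm of the error vector over a data set; a minimal helper (illustrative, not from the paper):

```python
import numpy as np

def l2_error(y_t, y_p):
    """L2 = [sum_j (y_t(x_j) - y_p(x_j))^2]^(1/2) over a data set."""
    y_t = np.asarray(y_t, dtype=float)
    y_p = np.asarray(y_p, dtype=float)
    return float(np.sqrt(np.sum((y_t - y_p) ** 2)))

# Example: error vector (0.3, 0.4) has L2 norm 0.5
err = l2_error([1.0, 2.0], [1.3, 1.6])
```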

4.1. EXPERIMENT 1

Suppose that we have the following model (Yousefi & Razzaghi, 2005):

y(x) = e^x − (1/3) e^{3x} + 1/3 + ∫_0^x y³(s) ds,  x ∈ [0, 1].

It has the exact solution y(x) = e^x. Table 2 represents the exact value, the predicted value, and the absolute error (Error) at several test points on the [0, 1] domain. 50 shifted Legendre quadrature points are applied for training LDNN. The number of train data is 500 and the number of test data is 100. Figure 2 shows the comparison between y_t and y_p.

4.2. EXPERIMENT 2

Suppose that we have the following model (Razzaghi & Ordokhani, 2002):

y(x) = 1 + sin²(x) + ∫_0^1 K(x, s) y²(s) ds,  x ∈ [0, 1],

where

K(x, s) = −3 sin(x − s) for 0 ≤ s ≤ x, and K(x, s) = 0 for x < s ≤ 1.

It has the exact solution y(x) = cos(x). The exact value, the predicted value, and the absolute error (Error) at several test points on the [0, 1] domain are reported in Table 3. 50 shifted Legendre quadrature points are applied for training LDNN. The number of train data is 500 and the number of test data is 100. Figure 3 shows the comparison between y_t and y_p.
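As an independent sanity check (not part of the paper's method), one can verify by Legendre-Gauss quadrature that the stated exact solutions make the residuals of Experiments 1 and 2 vanish; for Experiment 2 the kernel vanishes for s > x, so its Fredholm integral reduces to a Volterra integral over [0, x]:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def volterra(Z, x, N=50):
    """int_0^x Z(x, s) ds via s = x (t + 1) / 2 and N-point Gauss-Legendre."""
    t, w = leggauss(N)
    s = x * (t + 1.0) / 2.0
    return (x / 2.0) * np.dot(w, Z(x, s))

def residual_exp1(x):
    """Experiment 1 residual at the exact solution y(s) = e^s."""
    return (-np.exp(x) + np.exp(x) - np.exp(3 * x) / 3.0 + 1.0 / 3.0
            + volterra(lambda x, s: np.exp(s) ** 3, x))

def residual_exp2(x):
    """Experiment 2 residual at the exact solution y(s) = cos(s)."""
    return (-np.cos(x) + 1.0 + np.sin(x) ** 2
            + volterra(lambda x, s: -3.0 * np.sin(x - s) * np.cos(s) ** 2, x))

# Both residuals should vanish (to quadrature precision) on [0, 1]
xs = np.linspace(0.0, 1.0, 11)
r1 = max(abs(residual_exp1(x)) for x in xs)
r2 = max(abs(residual_exp2(x)) for x in xs)
```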

4.3. EXPERIMENT 3

Suppose that we have the following model (Babolian et al., 2007):

y(x) = g(x) + ∫_0^x (x − s) y²(s) ds + ∫_0^1 (x + s) y(s) ds,  x ∈ [0, 1],

where

g(x) = −(1/30) x⁶ + (1/3) x⁴ − x² + (5/3) x − 5/4.

It has the exact solution y(x) = x² − 2. The exact value, the predicted value, and the absolute error (Error) at several test points on the [0, 1] domain are reported in Table 4. Figure 4 shows the comparison between y_t and y_p.

4.4. EXPERIMENT 4

The model of this experiment has the exact solution y(x) = x² + 1/2. The exact value, the predicted value, and the absolute error (Error) at several test points on the [0, 1] domain are reported in Table 5. 50 shifted Legendre quadrature points are applied for training LDNN. The number of train data is 500 and the number of test data is 100. Figure 5 shows the comparison between y_t and y_p.

5. CONCLUSION

LDNN includes two networks. The first network approximates the solution y(x) of a nonlinear V-F-H-IE and has an M-layer feed-forward neural network structure; its first hidden layer is an orthogonal layer with Legendre polynomials as activation functions. The second network adjusts the output of the first network to fit the desired equation form. Better performance of the network has been obtained by using Legendre-Gauss integration and automatic differentiation. Several experiments on nonlinear V-F-H-IEs are given to investigate the reliability and validity of LDNN. The results show that this network is efficient and has high accuracy.



Figure 2: Results of Experiment 1. Exact solution y_t(x) = e^x, predicted solution y_p(x) by LDNN.

Figure 3: Results of Experiment 2. Exact solution y_t(x) = cos(x), predicted solution y_p(x) by LDNN.

Figure 4: Results of Experiment 3. Exact solution y_t(x) = x² − 2, predicted solution y_p(x) by LDNN.

Figure 5: Results of Experiment 4. Exact solution y_t(x) = x² + 1/2, predicted solution y_p(x) by LDNN.

Table 1: The LDNN parameters for all the experiments: the structure of the network M-Layers, the number of train data m_1, the number of Legendre quadrature points (N_1, N_2), L2_train, the number of test data m_2, and L2_test.

Table 2: The exact value, the predicted value, and the absolute error (Error) at several test points on the [0, 1] domain for Experiment 1.
x   exact value (y_t = e^x)   predicted value (y_p)   Error

Table 3: The exact value, the predicted value, and the absolute error (Error) at several test points on the [0, 1] domain for Experiment 2.
x   exact value (y_t = cos(x))   predicted value (y_p)   Error

Table 4: The exact value, the predicted value, and the absolute error (Error) at several test points on the [0, 1] domain for Experiment 3.
x   exact value (y_t = x² − 2)   predicted value (y_p)   Error

Table 5: The exact value, the predicted value, and the absolute error (Error) at several test points on the [0, 1] domain for Experiment 4.
x   exact value (y_t = x² + 1/2)   predicted value (y_p)   Error

