LAU: A NOVEL TWO-PARAMETER LEARNABLE LOGMOID ACTIVATION UNIT

Anonymous

Abstract

In this work, we propose a novel learnable Logmoid Activation Unit (LAU), f(x) = x ln(1 + α sigmoid(βx)), obtained by parameterizing the Logmoid function with two trainable parameters α and β that are optimized via the back-propagation algorithm. We design quasi-interpolation neural network operators with Logmoid-1 for approximating any continuous function on compact sets. Our simulations show that end-to-end training of deep neural networks with learnable Logmoids improves predictive performance beyond well-known activation functions on a range of tasks.

1. INTRODUCTION

In recent years, deep learning has achieved remarkable success in various classification problems (LeCun et al., 2015; Sriperumbudur et al., 2010). The main reason is the powerful ability of deep neural networks (DNNs) to represent and learn unknown structures. One of the most important components in DNNs is the activation function: a well-designed activation function can greatly improve predictive performance (Krizhevsky et al., 2017). This has spurred growing interest in exploring activation functions (Nwankpa et al., 2018; Liang & Srikant, 2016; Shen et al., 2019). Activation functions are generally classified into linear, nonlinear monotonic, and nonlinear non-monotonic functions. Although linear functions, including the Step function (Klein et al., 2009), Sign function (Huang & Babri, 1998), and Identity function, were widely used in early work, they are of limited practical use because of discontinuous derivatives, or a lack of biological motivation and classification ability. These problems were addressed by nonlinear monotonic functions such as the Sigmoid, Tanh, and ReLU families. However, the small derivatives of Sigmoid (Hassell et al., 1977) and Tanh (Kalman & Kwasny, 1992) may cause gradients to vanish (He & Xu, 2010; Klambauer et al., 2017); the Softplus and ReLU functions were designed to solve this problem. ReLU also offers reduced saturation, sparsity, efficiency, and ease of use, but a network may lose valid information because ReLU maps all negative values to zero. This has motivated new activation functions such as Leaky ReLU (Maas et al., 2013), RReLU (Xu et al., 2015), ELU (Clevert et al., 2015), and Swish (Ramachandran et al., 2017). In general, designing a good activation function remains an open question. One approach is to construct new activation functions by combining existing units, as in Mish (Misra, 2019) and TanhExp (Liu & Di, 2020).
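The vanishing-gradient problem mentioned above can be seen directly from the Sigmoid derivative, which peaks at 0.25 and decays toward zero for large |x|, whereas ReLU passes gradients through unattenuated on the positive side. A minimal numerical illustration (using only the standard library; the function names are ours, not from the paper):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # ReLU derivative: 1 for positive inputs, 0 for negative inputs
    return 1.0 if x > 0 else 0.0

# The Sigmoid gradient is at most 0.25 and nearly vanishes for large |x|,
# which is the saturation problem described above.
print(sigmoid_grad(0.0))   # 0.25
print(sigmoid_grad(10.0))  # ~4.5e-5
print(relu_grad(10.0))     # 1.0
```

Stacking many saturated Sigmoid layers multiplies such small factors, which is why deep networks trained with Sigmoid or Tanh activations can stop learning in early layers.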
A second approach is to parameterize well-known activation functions (Biswas et al., 2019) with learnable parameter(s) trained by the back-propagation algorithm (LeCun et al., 1989). In this work, we propose a new family of activation functions by parameterizing Logmoid with few trainable parameters per network layer. It is given by f(x; α, β) = x ln(1 + α Sigmoid(βx)), where the logarithmic operation reduces the range of the Sigmoid, and α and β are trainable parameters. The main contributions are summarized as follows: 1. A new family of activation functions, called the Logmoid family, is proposed.
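Since α and β are updated by back-propagation, the partial derivatives of f with respect to them are what the optimizer uses. A minimal sketch of the LAU forward pass and its α-gradient, verified against finite differences (the function names and the default initialization α = β = 1 are our assumptions, not stated in the source):

```python
import math

def logmoid(x, alpha=1.0, beta=1.0):
    """LAU: f(x; alpha, beta) = x * ln(1 + alpha * sigmoid(beta * x))."""
    s = 1.0 / (1.0 + math.exp(-beta * x))
    return x * math.log(1.0 + alpha * s)

def dlogmoid_dalpha(x, alpha=1.0, beta=1.0):
    # Partial derivative df/dalpha = x * sigmoid(beta*x) / (1 + alpha*sigmoid(beta*x)),
    # the quantity back-propagation uses to update alpha.
    s = 1.0 / (1.0 + math.exp(-beta * x))
    return x * s / (1.0 + alpha * s)

# Central finite-difference check of the analytic alpha-gradient at x = 2.0:
eps = 1e-6
num = (logmoid(2.0, 1.0 + eps) - logmoid(2.0, 1.0 - eps)) / (2 * eps)
print(abs(num - dlogmoid_dalpha(2.0)))  # near zero: analytic and numeric agree
```

In a deep-learning framework, α and β would simply be registered as per-layer trainable parameters (e.g., `nn.Parameter` in PyTorch) so that automatic differentiation produces these gradients without manual derivation.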



Such parameterized functions (Zhou et al., 2020) may outperform parameter-free functions. One example is Logish (Zhu et al., 2021), which exhibits improved performance and Top-1 accuracy. Another is the Padé Activation Unit (PAU) (Molina et al., 2020).

