MIRROR TRAINING FOR INPUT CONVEX NEURAL NETWORK

Abstract

The input convex neural network (ICNN) learns a convex function from the input to the output by using non-decreasing convex activation functions and non-negativity constraints on the weights of some layers. In practice, however, these non-negativity constraints on the hidden units cost the network some representation power, even though the design of the "passthrough" layers can partially address this problem. To resolve the issues caused by these non-negativity constraints, we use a duplicated-input-pair trick: the negation of the original input is concatenated with the input itself to form the new network input. This method preserves the convexity of the function from the original input to the output while mitigating the representation problem during training. In addition, we design a mirror unit that addresses this problem further, yielding the Mirror ICNN. Moreover, we propose a recurrent input convex neural network (RICNN) structure for time-series problems. The recurrent unit of this structure can be an ICNN or any other convex variant of the ICNN. The structure maintains convexity by constraining the mapping from the hidden output at time step t to the input at the next time step t + 1. Experiments on simple numerical curve fitting, regression on a power system hosting capacity dataset, and classification on the MNIST dataset support our design.

1. INTRODUCTION

The mathematical foundations of convex optimization have been studied for centuries, yet numerous recent developments have sparked new interest in the topic Hindi (2004). In machine learning, convexity and optimization typically refer to the optimization of the parameters or the minimization of the loss Bengio et al. (2005). The input convex neural network (ICNN) of Amos et al. (2017), however, offers a different perspective on the convexity of a neural network: convexity from the input to the output. Input convexity is preserved by using non-decreasing convex activation functions, such as the rectified linear unit (ReLU) Nair & Hinton (2010); Agarap (2018), together with non-negativity constraints on some of the hidden layers. These non-negative weights maintain the convexity from the input to the output but also cost the network representation power. Although the "passthrough" layers can partially restore this representation power, the constraint remains a challenge during ICNN training. To tackle this challenge, Chen et al. (2018) concatenates the negation of the original input with the input itself and feeds the pair to the network as the new input. In theory, this duplicated input pair provides more representation power, but in practice the training process does not converge as expected. The poor convergence stems from the additional non-negativity constraint placed on the "passthrough" layers. We therefore propose a modified trick, mirror training, to improve convergence. We prove that the non-negativity constraint on the "passthrough" layers leads to poor training results, and we show that the new designs in our model, the negated input pair and the mirror unit, preserve the convexity from the input to the output while improving training performance. Because of its convexity property, the ICNN has been widely used for different tasks.
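To make the mechanism above concrete, the following minimal NumPy sketch builds a small fully input convex network with non-negative hidden-to-hidden weights, unconstrained "passthrough" weights, and the duplicated input pair [x, -x], then numerically checks convexity along a line segment. The layer sizes and random weights are hypothetical illustrations, not the authors' actual architecture.

```python
import numpy as np

def relu(x):
    # ReLU is convex and non-decreasing, so composition preserves input convexity.
    return np.maximum(x, 0.0)

def icnn_forward(x, Wz_list, Wx_list, b_list):
    """Forward pass of a small fully input convex network.

    Layer rule: z_{k+1} = relu(Wz_k @ z_k + Wx_k @ x + b_k), where every
    Wz_k must be element-wise non-negative, while the "passthrough"
    weights Wx_k connecting the input to each layer are unconstrained.
    """
    z = relu(Wx_list[0] @ x + b_list[0])
    for Wz, Wx, b in zip(Wz_list, Wx_list[1:], b_list[1:]):
        z = relu(np.abs(Wz) @ z + Wx @ x + b)  # abs() enforces Wz >= 0
    return z

rng = np.random.default_rng(0)
d, h = 4, 8  # hypothetical input and hidden sizes
Wz_list = [rng.standard_normal((h, h)), rng.standard_normal((1, h))]
Wx_list = [rng.standard_normal((h, 2 * d)),
           rng.standard_normal((h, 2 * d)),
           rng.standard_normal((1, 2 * d))]
b_list = [rng.standard_normal(h), rng.standard_normal(h), rng.standard_normal(1)]

def f(x):
    # Duplicated input pair: feeding [x, -x] lets the network realize
    # decreasing directions in x without violating the non-negativity
    # constraints, since x -> [x, -x] is linear and preserves convexity.
    return icnn_forward(np.concatenate([x, -x]), Wz_list, Wx_list, b_list)[0]

# Numerical convexity check: f(t*a + (1-t)*b) <= t*f(a) + (1-t)*f(b).
a, b = rng.standard_normal(d), rng.standard_normal(d)
for t in np.linspace(0.0, 1.0, 11):
    assert f(t * a + (1 - t) * b) <= t * f(a) + (1 - t) * f(b) + 1e-9
```

Because the activation is convex and non-decreasing and the hidden-to-hidden weights are non-negative, each layer output is convex in the network input, so the convexity inequality holds at every point of the segment.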
Chen et al. (2020) applies the ICNN to voltage regulation. In addition,

