MIRROR TRAINING FOR INPUT CONVEX NEURAL NETWORK

Abstract

The input convex neural network (ICNN) aims to learn a convex function from the input to the output by using non-decreasing convex activation functions and non-negativity constraints on the weight parameters of some layers. In practice, however, these non-negativity constraints on the hidden-unit weights cost the network some representation power, even though the design of the "passthrough" layers can partially address this problem. To solve the issues caused by these non-negativity constraints, we use a duplication input pair trick: the negation of the original input is concatenated with the input itself to form the new input of our structure. This method preserves the convexity of the function from the original input to the output while tackling the representation problem in training. Additionally, we design a mirror unit to address this problem further, yielding the Mirror ICNN. Moreover, we propose a recurrent input convex neural network (RICNN) structure for time-series problems. The recurrent unit of the structure can be an ICNN or any other convex variant of the ICNN. The structure maintains convexity by constraining the mapping from the hidden output at time step t to the input at the next time step t + 1. Experiments support our design, including simple numerical curve fitting, regression on a power system hosting capacity dataset, and classification on the MNIST dataset.

1. INTRODUCTION

Convex optimization's mathematical foundations have been studied for centuries, yet numerous recent developments have sparked new interest in the topic Hindi (2004). In machine learning, convexity and optimization typically refer to the optimization of the parameters or the minimization of the loss Bengio et al. (2005). However, the input convex neural network (ICNN) of Amos et al. (2017) provides a different perspective on the convexity of the neural network: convexity from the input to the output. The input convexity of the ICNN is preserved by non-decreasing convex activation functions, such as the rectified linear unit (ReLU) Nair & Hinton (2010); Agarap (2018), and by a non-negativity constraint on some of the hidden layers. These non-negative weights maintain the convexity from the input to the output but also cause a lack of representation power. Although the "passthrough" layers can partially restore representation, this problem remains a challenge when training an input convex neural network. To tackle this challenge, Chen et al. (2018) concatenates the negation of the original input with the input itself, making the pair the new input of the network. This method can, in theory, gain representation power from the duplicated input pair, but in practice the training process does not converge as expected. The cause of the poor convergence is the additional non-negativity constraint on the "passthrough" layers. We therefore propose a modified trick, mirror training, to improve the convergence of the training. We prove that the non-negativity constraint on the "passthrough" layers leads to poor training results, and that the new designs in our model, the negated input pair and the mirror unit, preserve the convexity from the input to the output while improving training performance.

Chen et al. (2018) uses the ICNN for a building control task under both single-step and time-series scenarios. Similarly, Kolter & Manek (2019) extends the scope of the ICNN to dynamic models. Dynamic models and sequence models are topics that consistently attract attention. To combine the convexity of the ICNN with sequence models such as the RNN, we construct a loop using the ICNN as the recurrent unit, which yields the recurrent ICNN. The network is convex from the sequential input to the output because of a non-negativity constraint on the weights applied to the hidden output. Our model is more straightforward to modify than other sequence structures because the recurrent unit can be any convex variant of the basic ICNN. The remainder of this paper is organized as follows. Section 2 discusses related work. Section 3 illustrates the design of mirror training and the structure of the recurrent input convex neural network. Section 4 provides numerical validation on different datasets. Conclusions and discussion are in Section 5.
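As a concrete illustration, the two ingredients discussed above, the non-negative hidden-to-hidden weights and the duplicated input pair [x, -x] fed through unconstrained "passthrough" layers, can be sketched as follows. This is a minimal NumPy sketch under our own naming and shape assumptions, not the paper's implementation:

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def icnn_forward(x, Wx, Wz, b):
    """Sketch of an ICNN forward pass with the duplication input pair.

    Wx: "passthrough" weights from the augmented input to every layer,
        unconstrained in sign (illustrative name).
    Wz: hidden-to-hidden weights, elementwise non-negative so that each
        output coordinate stays convex in x.
    """
    u = np.concatenate([x, -x])           # duplication input pair [x, -x]
    z = relu(Wx[0] @ u + b[0])            # first layer: passthrough only
    for Wzi, Wxi, bi in zip(Wz, Wx[1:], b[1:]):
        assert np.all(Wzi >= 0)           # non-negativity preserves convexity
        z = relu(Wzi @ z + Wxi @ u + bi)  # non-decreasing convex composition
    return z
```

Because u = [x, -x] is linear in x, each layer is a ReLU of a non-negative combination of convex functions plus an affine term, so the output remains convex in x while the network can still represent directions of decrease through -x.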

2. RELATED WORK

Input convex neural network. The input convex neural network (ICNN) is proposed in Amos et al. (2017). Given a fully connected neural network, the model learns a convex function from the input to the output thanks to the non-negativity constraint on the weights of some hidden layers. However, this constraint reduces the representation power of the model, even though the "passthrough" layers provide some additional representation. The solution of Chen et al. (2018) addresses this problem with a duplicated input pair, but the resulting constraint on the "passthrough" layers causes convergence problems. We provide the mirror training technique to tackle this problem and prove that the new structure preserves convexity while gaining additional representation power.

Recurrent neural network. The recurrent neural network (RNN) is a class of sequence models widely used for various tasks, including time-series prediction Qin et al. (2017), machine translation Cho et al. (2014), and speech recognition Shewalkar (2019). The core design of the basic RNN is the recurrent unit, also known as an RNN "cell," whose output is connected back to its input, forming a cycle. Many variants of the basic RNN achieve state-of-the-art performance by modifying the cell structure. For example, the long short-term memory (LSTM) Hochreiter & Schmidhuber (1997) and the gated recurrent unit (GRU) Chung et al. (2014) use the concept of a "gate" to forget, select, and memorize the information flowing through the model, thereby learning time-series relationships. Our recurrent input convex neural network (RICNN) takes advantage of this recurrent structure and formulates the cell as a basic ICNN to capture convexity in time-series tasks. Chen et al. (2018) also proposed an input convex sequence model.
The differences between that model and our network are, first, that the recurrent unit of our model can be any form of convex network, while the cell of the model in Chen et al. (2018) has only a single fully connected layer; and second, that our recurrent network does not require all weights to be non-negative.

Hosting capacity analysis in the power system. Hosting capacity analysis is a popular topic in power system research. The analysis determines how many more distributed energy resources (DERs) the power grid can host without causing technical issues Wu et al. (2022). Traditional hosting capacity analysis can be treated as an optimization problem Yuan et al. (2022). Nazir & Almassalkhi (2019) presents hosting capacity analysis as a convex inner approximation of the optimal power flow problem. Data-driven hosting capacity analysis is typically formulated as a time-series problem to observe how the hosting capacity value changes over time Rylander et al. (2018). In this paper, we use the proposed recurrent input convex neural network to account for the convexity of this analysis while capturing the temporal correlation of the hosting capacity values.
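The recurrent construction described above, an ICNN-style cell unrolled over time with a non-negative map from the hidden output into the next step, can be sketched as follows. This is an illustrative NumPy sketch under our own assumed names and shapes, not the RICNN implementation itself:

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def ricnn_step(u_t, h_prev, Wu, Wh, b):
    """One sketched RICNN cell step.

    Wh (hidden output of step t into step t+1) is elementwise non-negative,
    so composing cells over time keeps the output convex in the whole input
    sequence; Wu acts on the per-step input and is unconstrained.
    """
    assert np.all(Wh >= 0)  # constraint on the hidden-to-next-step mapping
    return relu(Wu @ np.concatenate([u_t, -u_t]) + Wh @ h_prev + b)

def ricnn_forward(seq, Wu, Wh, Wo, b, bo):
    """Unroll over a sequence; Wo (non-negative) reads out the final hidden state."""
    h = np.zeros(Wh.shape[0])
    for u_t in seq:
        h = ricnn_step(u_t, h, Wu, Wh, b)
    assert np.all(Wo >= 0)
    return Wo @ h + bo
```

By induction, each hidden state is coordinate-wise convex in the inputs seen so far (non-negative combination of convex functions plus an affine term in the current input, passed through a non-decreasing convex activation), so the final read-out is convex in the full sequence. The cell body could be replaced by any convex variant of the ICNN without changing this argument.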

