REMOVING DIMENSIONAL RESTRICTIONS ON COMPLEX/HYPER-COMPLEX CONVOLUTIONS

Abstract

It has been shown that the core reasons complex- and hypercomplex-valued neural networks offer improvements over their real-valued counterparts are that aspects of their algebra force multi-dimensional data to be treated as a single entity (forced local relationship encoding), with the added benefit of a reduced parameter count through weight sharing. However, both are constrained to a fixed number of dimensions: two for complex numbers and four for quaternions. These observations motivate us to introduce novel vector map convolutions, which capture both of the properties provided by complex/hypercomplex convolutions while dropping the unnatural dimensionality constraints their algebra imposes. This is achieved by introducing a system that mimics the unique linear combination of input dimensions produced by the Hamilton product using a permutation function, together with batch normalization and weight initialization for the system. We perform three experiments using three different network architectures to show that these novel vector map convolutions appear to capture all the benefits of complex and hyper-complex networks, such as their ability to capture internal latent relations, while avoiding the dimensionality restriction.

1. INTRODUCTION

While the large majority of work in the area of machine learning (ML) has been done using real-valued models, there has recently been an increase in the use of complex and hyper-complex models (Trabelsi et al., 2018a; Parcollet et al., 2020). These models have been shown to handle multidimensional data more effectively and to require fewer parameters than their real-valued counterparts. For tasks with two-dimensional input vectors, complex-valued neural networks (CVNNs) are a natural choice. For example, in audio signal processing, the magnitude and phase of the signal can be encoded as a complex number. Since CVNNs treat the magnitude and phase as a single entity, a single activation captures their relationship, in contrast to real-valued networks. CVNNs have been shown to outperform or match real-valued networks, sometimes with a lower parameter count (Trabelsi et al., 2018b; Aizenberg & Gonzalez, 2018). However, most real-world data has more than two dimensions, such as the color channels of images or anything in the realm of 3D space. The quaternion number system extends the complex numbers. These hyper-complex numbers are composed of one real and three imaginary components, making them ideal for three- or four-dimensional data. Quaternion neural networks (QNNs) have enjoyed a surge in recent research and show promising results (Takahashi et al., 2017; Bayro-Corrochano et al., 2018; Gaudet & Maida, 2018; Parcollet et al., 2017a; b; 2018a; b; 2019). Quaternion networks have been shown to be effective at capturing relations within multidimensional data of four or fewer dimensions. For example, an image processing network needs to capture the cross-channel relationships among the red, green, and blue color channels, as these contain important information that supports good generalization (Kusamichi et al., 2004; Isokawa et al., 2003). Real-valued networks treat the color channels as independent entities, unlike quaternion networks. Parcollet et al.
(2019) showed that a real-valued encoder-decoder fails to reconstruct unseen color images because it cannot capture local (color) and global (edges and shapes) features independently, while a quaternion encoder-decoder can. Their conclusion is that the Hamilton product of the quaternion algebra allows the quaternion network to encode the color relation because it treats the colors as a single entity. Another example is 3D spatial coordinates for robotics and human-pose estimation. Pavllo et al. (2018) showed improved short-term prediction on the Human3.6M dataset using a network that encoded rotations as quaternions rather than Euler angles. The prevailing view is that the main reason these complex networks outperform real-valued networks is their underlying algebra, which treats the multidimensional data as a single entity (Parcollet et al., 2019). This allows complex networks to capture the relationships between the dimensions without trading off the learning of global features. However, only the Hamilton product appears to be needed to capture this property; the other aspects of the algebra merely impose dimensionality constraints. Therefore, the present paper proposes: 1) to create a system that mimics the concepts of complex and hyper-complex numbers for neural networks, which treats multidimensional input as a single entity and incorporates weight sharing, but is not constrained to a fixed number of dimensions; 2) to increase their local learning capacity by introducing a learnable parameter inside the multidimensional dot product. Our experiments herein show that these novel vector map convolutions appear to capture all the benefits of complex and hyper-complex networks, while improving their ability to capture internal latent relations and avoiding the dimensionality restriction.
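The Hamilton product referenced above is standard quaternion multiplication; a minimal numerical sketch may make the "single entity" claim concrete (the function name `hamilton_product` is ours, not from the paper):

```python
import numpy as np

def hamilton_product(q, p):
    """Hamilton product of quaternions q = (a, b, c, d) and p = (e, f, g, h).

    Each output component is a distinct linear combination of ALL four
    input components, so the four dimensions are processed as one entity
    rather than independently.
    """
    a, b, c, d = q
    e, f, g, h = p
    return np.array([
        a*e - b*f - c*g - d*h,   # real part
        a*f + b*e + c*h - d*g,   # i component
        a*g - b*h + c*e + d*f,   # j component
        a*h + b*g - c*f + d*e,   # k component
    ])

# i * i = -1, as expected from the quaternion algebra.
print(hamilton_product(np.array([0.0, 1.0, 0.0, 0.0]),
                       np.array([0.0, 1.0, 0.0, 0.0])))
```

Note how every output row mixes all four inputs: a real-valued layer would need to learn such cross-dimension couplings, whereas here the algebra enforces them.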

2. MOTIVATION FOR VECTOR MAP CONVOLUTIONS

Nearly all data used in machine learning is multidimensional and, to achieve good performance, models must capture both the local relations within the input features (Tokuda et al., 2003; Matsui et al., 2004) and non-local features, for example edges or shapes composed by a group of pixels. Complex and hyper-complex models have been shown not only to capture these local relations better than real-valued models, but also to do so with a reduced parameter count due to their weight-sharing property. However, as stated earlier, these models are constrained to two or four dimensions. Below we detail the work showing how hyper-complex models capture these local features, as well as the motivation to generalize them to any number of dimensions. Consider the most common representation of an image: three 2D matrices, one per color channel. Traditional real-valued networks treat this input as a group of uni-dimensional elements that may be related to one another; not only must they learn that relation, they must also learn global features such as edges and shapes. By encoding the color channels into a quaternion, each pixel is treated as a whole entity whose color components are strongly related. It has been shown that the quaternion algebra is responsible for allowing QNNs to capture these local relations. As noted above, Parcollet et al. (2019) showed that a real-valued encoder-decoder fails to reconstruct unseen color images because it cannot capture local (color) and global (edges and shapes) features independently, while a quaternion encoder-decoder can, and concluded that the Hamilton product allows the quaternion network to encode the color relation by treating the colors as a single entity. The Hamilton product forces a different linear combination of the internal elements to create each output element. This is seen in Fig.
1 from Parcollet et al. (2018a), which shows how a real-valued model looks when converted to a quaternion model. Notice that the real-valued model treats local and global weights at the same level, while the quaternion model learns the local relations through the Hamilton product. This is because the output units share the weights and are therefore forced to discover joint correlations within the input dimensions. The weight-sharing property can also be seen in that each element of the weight is used four times, reducing the parameter count by a factor of four relative to the real-valued model. The advantages of hyper-complex networks on multidimensional data seem clear, but what about cases with more than four dimensions? Examples include applications that must ingest extra channels of information beyond RGB for image processing, such as satellite images, which have several bands. To overcome this limitation we introduce vector map convolutions, which attempt to generalize the benefits of hyper-complex networks to any number of dimensions. We also add a learnable set of parameters that modify the linear combination of internal elements, allowing the model to decide how important each dimension is in calculating the others.
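The four-fold weight sharing can be seen by writing left-multiplication by a weight quaternion as a real 4x4 matrix; a short sketch (the helper name `quaternion_weight_matrix` is ours, assuming the standard Hamilton-product convention):

```python
import numpy as np

def quaternion_weight_matrix(w):
    """Real 4x4 matrix implementing left-multiplication by the weight
    quaternion w = (r, x, y, z) via the Hamilton product.

    All 16 entries are drawn from the same 4 weights (up to sign), so the
    layer stores 4x fewer parameters than an unconstrained real 4x4
    matrix, yet every output axis still mixes every input axis.
    """
    r, x, y, z = w
    return np.array([
        [r, -x, -y, -z],
        [x,  r, -z,  y],
        [y,  z,  r, -x],
        [z, -y,  x,  r],
    ])

W = quaternion_weight_matrix(np.array([0.1, 0.2, 0.3, 0.4]))
print(np.unique(np.abs(W)).size)  # only 4 unique magnitudes, not 16
```

This is the matrix-form analogue of the layer in Fig. 1: each of the four stored weights appears in all four rows, which is exactly the weight reuse described above.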

3. VECTOR MAP COMPONENTS

This section presents the work done to obtain a working vector map network, including the vector map convolution operation and the weight initialization used.

3.1. VECTOR MAP CONVOLUTION

Vector map convolutions use a mechanism similar to that of complex (Trabelsi et al., 2018b) and quaternion (Gaudet & Maida, 2018) convolutions, but drop the other constraints imposed by the hyper-complex algebra. We begin by observing the quaternion-valued layer from Fig. 1. Our goal is to capture the properties of weight sharing and of each output axis being composed of a linear combination of all the input axes, but for an arbitrary number of dimensions $D_{vm}$. For the derivation we choose $D_{vm} = N$. Let $V_{in}^N = [v_1, v_2, \ldots, v_N]$ be an $N$-dimensional input vector and $W^N = [w_1, w_2, \ldots, w_N]$ be an $N$-dimensional weight vector. Note that in the complex and quaternion cases the output vector is a set of different linear combinations in which each input vector element is multiplied by each weight vector element a total of
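One way to realize this construction for arbitrary $N$ is to build each output row from a permuted copy of the single shared weight vector. The sketch below assumes a plain cyclic shift as the permutation function and omits any sign pattern or the learnable scaling parameter mentioned earlier; the helper name `vector_map_matrix` is ours:

```python
import numpy as np

def vector_map_matrix(w):
    """Sketch of an N x N vector map weight matrix built from one shared
    N-dimensional weight vector w, assuming a cyclic shift as the
    permutation function (signs and learnable scales omitted).

    Row i is w cyclically shifted by i, so every output dimension is a
    different linear combination of all input dimensions while only N
    parameters are stored (vs. N*N for an unconstrained real matrix).
    """
    n = len(w)
    return np.stack([np.roll(w, i) for i in range(n)])

w = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # N = 5: no complex/quaternion analogue
print(vector_map_matrix(w))
```

As in the quaternion case, the shared weights force each output to be a distinct mixture of all inputs, but here $N$ is unconstrained.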

