REMOVING DIMENSIONAL RESTRICTIONS ON COMPLEX/HYPER-COMPLEX CONVOLUTIONS

Abstract

It has been shown that the core reason complex- and hyper-complex-valued neural networks offer improvements over their real-valued counterparts is that aspects of their algebra force multi-dimensional data to be treated as a single entity (forced local relationship encoding), with the added benefit of a reduced parameter count through weight sharing. However, both are constrained to a fixed number of dimensions: two for complex numbers and four for quaternions. These observations motivate us to introduce novel vector map convolutions, which capture both of these properties of complex/hyper-complex convolutions while dropping the unnatural dimensionality constraints their algebra imposes. This is achieved by introducing a system that mimics the unique linear combination of input dimensions performed by the Hamilton product using a permutation function, along with batch normalization and weight initialization schemes for the system. We perform three experiments using three different network architectures to show that these novel vector map convolutions appear to capture all the benefits of complex and hyper-complex networks, such as their ability to capture internal latent relations, while avoiding the dimensionality restriction.

1. INTRODUCTION

While the large majority of work in the area of machine learning (ML) has been done using real-valued models, recently there has been an increase in the use of complex and hyper-complex models (Trabelsi et al., 2018a; Parcollet et al., 2020). These models have been shown to handle multidimensional data more effectively and to require fewer parameters than their real-valued counterparts. For tasks with two-dimensional input vectors, complex-valued neural networks (CVNNs) are a natural choice. For example, in audio signal processing the magnitude and phase of the signal can be encoded as a complex number. Since CVNNs treat the magnitude and phase as a single entity, a single activation captures their relationship, unlike in real-valued networks. CVNNs have been shown to outperform or match real-valued networks, sometimes with a lower parameter count (Trabelsi et al., 2018b; Aizenberg & Gonzalez, 2018). However, most real-world data has more than two dimensions, such as the color channels of images or anything in the realm of 3D space. The quaternion number system extends the complex numbers. These hyper-complex numbers are composed of one real and three imaginary components, making them ideal for three- or four-dimensional data. Quaternion neural networks (QNNs) have enjoyed a surge in recent research and show promising results (Takahashi et al., 2017; Bayro-Corrochano et al., 2018; Gaudet & Maida, 2018; Parcollet et al., 2017a; b; 2018a; b; 2019). Quaternion networks have been shown to be effective at capturing relations within multidimensional data of four or fewer dimensions. For example, an image-processing network must capture the cross-channel relationships among the red, green, and blue color channels, as these relationships carry important information that supports good generalization (Kusamichi et al., 2004; Isokawa et al., 2003). Unlike quaternion networks, real-valued networks treat the color channels as independent entities. Parcollet et al. (2019) showed that a real-valued encoder-decoder fails to reconstruct unseen color images because it cannot capture local (color) and global (edges and shapes) features independently, while a quaternion encoder-decoder can do so. Their conclusion is that the Hamilton product of the quaternion algebra allows the quaternion network to encode the color relation, since it treats the colors as a single entity. Another example is 3D spatial coordinates for robotics and human-pose estimation. Pavllo et al. (2018) showed improved short-term prediction on the Human3.6M dataset using a network that encodes rotations as quaternions rather than Euler angles. The prevailing view is that the main reason these complex networks outperform real-valued networks is their underlying algebra, which treats the multidimensional data as a single entity (Parcollet et al., 2019). This allows complex networks to capture the relationships between the dimensions without trading off the ability to learn global features. However, only the Hamilton product appears to be needed to capture this property; the other aspects of the algebra merely impose dimensionality constraints. Therefore, the present paper proposes: 1) to create a system that mimics

