CLIFFORD NEURAL LAYERS FOR PDE MODELING

Abstract

Partial differential equations (PDEs) see widespread use in sciences and engineering to describe the simulation of physical processes as scalar and vector fields interacting and coevolving over time. Due to the computationally expensive nature of their standard solution methods, neural PDE surrogates have become an active research topic to accelerate these simulations. However, current methods do not explicitly take into account the relationship between different fields and their internal components, which are often correlated. Viewing the time evolution of such correlated fields through the lens of multivector fields allows us to overcome these limitations. Multivector fields consist of scalar and vector components, as well as higher-order components such as bivectors and trivectors. Their algebraic properties, such as multiplication, addition, and other arithmetic operations, can be described by Clifford algebras. To our knowledge, this paper presents the first usage of such multivector representations together with Clifford convolutions and Clifford Fourier transforms in the context of deep learning. The resulting Clifford neural layers are universally applicable and will find direct use in the areas of fluid dynamics, weather forecasting, and the modeling of physical systems in general. We empirically evaluate the benefit of Clifford neural layers by replacing convolution and Fourier operations in common neural PDE surrogates by their Clifford counterparts on 2D Navier-Stokes and weather modeling tasks, as well as 3D Maxwell equations. For similar parameter count, Clifford neural layers consistently improve generalization capabilities of the tested neural PDE surrogates. Source code for our PyTorch implementation is available at https://microsoft.github.io/cliffordlayers/.

We introduce important mathematical concepts and discuss three Clifford algebras, Cl_{2,0}(R), Cl_{0,2}(R), and Cl_{3,0}(R), which we later use for the layers introduced in Section 3. A more detailed introduction, as well as connections to complex numbers and quaternions, is given in Appendix A.

1. INTRODUCTION

Most scientific phenomena are described by the evolution and interaction of physical quantities over space and time. The concept of fields is one widely used construct to continuously parameterize these quantities over chosen coordinates (McMullin, 2002). Prominent examples include (i) fluid mechanics, which has applications in domains ranging from mechanical and civil engineering to geophysics and meteorology, and (ii) electromagnetism, which provides mathematical models for electric, optical, or radio technologies. The underlying equations of these examples are famously described in various forms of the Navier-Stokes equations and Maxwell's equations. For the majority of these equations, solutions are analytically intractable, and obtaining accurate predictions necessitates falling back on numerical approximation schemes, often with prohibitive computation costs. Deep learning's success in various fields has led to a surge of interest in scientific applications, especially at augmenting and replacing numerical solving schemes in fluid dynamics with neural networks (Li et al., 2020; Kochkov et al., 2021; Lu et al., 2021; Rasp & Thuerey, 2021; Keisler, 2022; Weyn et al., 2020; Sønderby et al., 2020; Pathak et al., 2022). Taking weather simulations as our motivating example to ground our discussion, two different kinds of fields emerge: scalar fields such as temperature or humidity, and vector fields such as wind velocity or pressure gradients.

Figure 1: Fields of the Earth's shallow water model. Vector components of the wind velocities (right) are strongly related, i.e. they form a vector field. Additionally, the wind vector field and the scalar pressure field (left) are related, since the gradient of the pressure field causes air movement and subsequently influences the wind components. We therefore aim to describe scalar and vector fields as one multivector field, which models the dependencies correctly.

Current deep learning based approaches treat different vector field components the same as scalar fields, and stack all scalar fields along the channel dimension, thereby omitting the geometric relations between different components, both within vector fields as well as between individual vector and scalar fields. This practice leaves out important inductive bias information present in the input data. For example, wind velocities in the x- and y-directions are strongly related, i.e. they form a vector field. Additionally, the wind vector field and the scalar pressure field are related, since the gradient of the pressure field causes air movement and subsequently influences the wind components. In this work, we therefore build neural PDE surrogates which model the relation between different fields (e.g. wind and pressure fields) and field components (e.g. the x- and y-components of the wind velocities). Figure 1 shows an example of a wind vector field as per the Earth's shallow water model in two dimensions, and the related scalar pressure field.

Clifford algebras (Suter, 2003; Hestenes, 2003; 2012; Dorst et al., 2010; Renaud, 2020) are at the core intersection of geometry and algebra, introduced to simplify spatial and geometrical relations between many mathematical concepts. For example, Clifford algebras naturally unify real numbers, vectors, complex numbers, quaternions, exterior algebras, and many more. Most notably, in contrast to standard vector analysis, where the primitives are scalars and vectors, Clifford algebras have additional spatial primitives for representing plane and volume segments. An expository example is the cross product of two vectors in 3 dimensions, which naturally translates to a plane segment spanned by these two vectors. The cross product is often represented as a vector due to its 3 independent components, but the cross product has a sign flip under reflection that a true vector does not.
In Clifford algebras, different spatial primitives can be summarized into objects called multivectors, as illustrated in Figure 2. In this work, we replace operations over feature fields in deep learning architectures by their Clifford algebra counterparts, which operate on multivector feature fields. Operations on, and mappings between, multivectors are defined by Clifford algebras. For example, we will endow a convolutional kernel with multivector components, such that it can convolve over multivector feature maps.

2. BACKGROUND: CLIFFORD ALGEBRAS

Clifford algebras. Consider the vector space R^n with standard Euclidean product ⟨., .⟩, where n = p + q, and p and q are non-negative integers. A real Clifford algebra Cl_{p,q}(R) is an associative algebra generated by p + q orthonormal basis elements e_1, ..., e_{p+q} of the generating vector space R^n, such that the following quadratic relations hold:

e_i^2 = +1 for 1 ≤ i ≤ p ;   e_j^2 = -1 for p < j ≤ p + q ;   e_i e_j = -e_j e_i for i ≠ j .   (1)

(For example, the vector space of Cl_{3,0}(R) is spanned by the basis {1, e_1, e_2, e_3, e_1e_2, e_3e_1, e_2e_3, e_1e_2e_3}.) The pair (p, q) is called the signature and defines a Clifford algebra Cl_{p,q}(R), together with the basis elements that span the vector space G^{p+q} of Cl_{p,q}(R). Vector spaces of Clifford algebras have scalar elements and vector elements, but can also have elements consisting of multiple basis elements of the generating vector space R^n, which can be interpreted as plane and volume segments.
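The quadratic relations in Equation 1 fully determine products of basis blades. As an illustration (our own sketch, not part of the paper's implementation), the following function encodes basis blades of Cl_{p,q}(R) as bitmasks and computes their geometric product, including the sign from reorderings and from the signature; the name `blade_product` is ours.

```python
def blade_product(a: int, b: int, p: int, q: int):
    """Geometric product of two basis blades of Cl_{p,q}(R).

    Blades are bitmasks: bit i set means basis vector e_{i+1} is present.
    Returns (sign, blade)."""
    sign = 1
    for i in range(p + q):
        if b & (1 << i):
            # each basis vector of b passes all higher-index vectors of a,
            # picking up a factor -1 per swap (e_i e_j = -e_j e_i)
            if bin(a >> (i + 1)).count("1") % 2:
                sign = -sign
    # repeated basis vectors square to +1 (the first p) or -1 (the last q)
    common = a & b
    for i in range(p, p + q):
        if common & (1 << i):
            sign = -sign
    return sign, a ^ b

# the relations of Equation (1) for Cl_{2,0}(R):
assert blade_product(0b01, 0b01, 2, 0) == (1, 0)      # e1 e1 = +1
assert blade_product(0b01, 0b10, 2, 0) == (1, 0b11)   # e1 e2 = e1e2
assert blade_product(0b10, 0b01, 2, 0) == (-1, 0b11)  # e2 e1 = -e1e2
```

The same routine reproduces facts used later in the text, e.g. that the pseudoscalars square to -1 (i_2^2 = i_3^2 = -1) and the Cl_{3,0}(R) dual pair e_1 ↔ e_2e_3.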
Exemplary low-dimensional Clifford algebras are: (i) Cl_{0,0}(R), a one-dimensional algebra spanned by the basis element {1}, and therefore isomorphic to R, the field of real numbers; (ii) Cl_{0,1}(R), a two-dimensional algebra with vector space G^1 spanned by {1, e_1}, where the basis vector e_1 squares to -1, and which is therefore isomorphic to C, the field of complex numbers; (iii) Cl_{0,2}(R), a 4-dimensional algebra with vector space G^2 spanned by {1, e_1, e_2, e_1e_2}, where e_1, e_2, e_1e_2 all square to -1 and anti-commute. Thus, Cl_{0,2}(R) is isomorphic to the quaternions H.

Grade, dual, geometric product. The grade of a Clifford algebra basis element is the dimension of the subspace it represents. For example, the basis elements {1, e_1, e_2, e_1e_2} of the vector space G^2 of the Clifford algebra Cl_{2,0}(R) have the grades {0, 1, 1, 2}. Using the concept of grades, we can divide Clifford algebras into linear subspaces made up of elements of each grade. The grade subspace of smallest dimension is M^0, the subspace of all scalars (elements with 0 basis vectors of the generating vector space). Elements of M^1 are called vectors, elements of M^2 bivectors, and so on. In general, a vector space G^{p+q} of a Clifford algebra Cl_{p,q}(R) can be written as the direct sum of all of these subspaces: G^{p+q} = M^0 ⊕ M^1 ⊕ ... ⊕ M^{p+q}. The elements of a Clifford algebra are called multivectors, containing elements of the subspaces, i.e. scalars, vectors, bivectors, ..., k-vectors. The basis element with the highest grade is called the pseudoscalar, which in R^2 corresponds to the bivector e_1e_2, and in R^3 to the trivector e_1e_2e_3. The dual a* of a multivector a is defined as a* = a i_{p+q}, where i_{p+q} represents the respective pseudoscalar of the Clifford algebra. This definition allows us to relate different multivectors to each other, which is a useful property when defining Clifford Fourier transforms.
For example, for Clifford algebras in R^2 the dual of the scalar is the bivector, and in R^3 the dual of the scalar is the trivector. Finally, the geometric product is a bilinear operation on multivectors. For arbitrary multivectors a, b, c ∈ G^{p+q} and scalar λ, the geometric product has the following properties: (i) closure, i.e. ab ∈ G^{p+q}; (ii) associativity, i.e. (ab)c = a(bc); (iii) commutative scalar multiplication, i.e. λa = aλ; (iv) distributivity over addition, i.e. a(b + c) = ab + ac. The geometric product is in general non-commutative, i.e. ab ≠ ba. Note that Equation 1 describes the geometric product specifically between basis elements of the generating vector space.

Clifford algebras Cl_{2,0}(R) and Cl_{0,2}(R). The 4-dimensional vector spaces of these Clifford algebras have the basis vectors {1, e_1, e_2, e_1e_2}, where e_1, e_2 square to +1 for Cl_{2,0}(R) and to -1 for Cl_{0,2}(R). For Cl_{2,0}(R), the geometric product of two multivectors a = a_0 + a_1 e_1 + a_2 e_2 + a_12 e_1e_2 and b = b_0 + b_1 e_1 + b_2 e_2 + b_12 e_1e_2 is given by:

ab = (a_0 b_0 + a_1 b_1 + a_2 b_2 - a_12 b_12) 1
   + (a_0 b_1 + a_1 b_0 - a_2 b_12 + a_12 b_2) e_1
   + (a_0 b_2 + a_1 b_12 + a_2 b_0 - a_12 b_1) e_2
   + (a_0 b_12 + a_1 b_2 - a_2 b_1 + a_12 b_0) e_1e_2 ,

which can be derived by collecting terms that multiply the same basis elements, see Appendix A. A vector x = (x_1, x_2) ∈ R^2 with standard Euclidean product ⟨., .⟩ can be related to x_1 e_1 + x_2 e_2 ∈ R^2 ⊂ G^2. Clifford multiplication of two vectors x, y ∈ R^2 ⊂ G^2 yields the geometric product xy:

xy = (x_1 e_1 + x_2 e_2)(y_1 e_1 + y_2 e_2) = x_1 y_1 e_1^2 + x_2 y_2 e_2^2 + x_1 y_2 e_1e_2 + x_2 y_1 e_2e_1 = ⟨x, y⟩ + x ∧ y ,

where ∧ is the exterior or wedge product. The antisymmetric quantity x ∧ y = -y ∧ x is associated with the bivector, which can be interpreted as an oriented plane segment as shown in Figure 3.
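As a sanity check on the product rule above, here is a small sketch (ours, not from the paper's codebase) that implements the Cl_{2,0}(R) geometric product on 4-tuples (a_0, a_1, a_2, a_12) and verifies that the product of two pure vectors splits into an inner-product scalar plus a wedge bivector:

```python
def gp_cl20(a, b):
    """Geometric product in Cl_{2,0}(R); multivectors as (a0, a1, a2, a12)."""
    a0, a1, a2, a12 = a
    b0, b1, b2, b12 = b
    return (a0*b0 + a1*b1 + a2*b2 - a12*b12,   # scalar
            a0*b1 + a1*b0 - a2*b12 + a12*b2,   # e1
            a0*b2 + a1*b12 + a2*b0 - a12*b1,   # e2
            a0*b12 + a1*b2 - a2*b1 + a12*b0)   # e1e2

# two pure vectors x = 1 e1 + 2 e2 and y = 3 e1 + 4 e2
x, y = (0, 1, 2, 0), (0, 3, 4, 0)
# scalar part <x, y> = 1*3 + 2*4 = 11, bivector part x1 y2 - x2 y1 = -2
assert gp_cl20(x, y) == (11, 0, 0, -2)
```

The scalar slot carries ⟨x, y⟩ and the e_1e_2 slot carries x ∧ y, exactly as in the displayed equation.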
A unit bivector i_2, spanned by the (orthonormal) basis vectors e_1 and e_2, is determined by the product:

i_2 = e_1 e_2 = ⟨e_1, e_2⟩ + e_1 ∧ e_2 = e_1 ∧ e_2 = -e_2 ∧ e_1 = -e_2 e_1 ,   (4)

where ⟨e_1, e_2⟩ = 0, and which squared yields i_2^2 = -1. Thus, i_2 represents a geometric √-1. From Equation 4 it follows that e_2 = e_1 i_2 = -i_2 e_1 and e_1 = i_2 e_2 = -e_2 i_2. Using the pseudoscalar i_2, the dual of a scalar is a bivector and the dual of a vector is again a vector. The dual pairs of the basis elements are 1 ↔ e_1e_2 and e_1 ↔ e_2. For Cl_{2,0}(R), these dual pairs allow us to write an arbitrary multivector a as

a = a_0 + a_1 e_1 + a_2 e_2 + a_12 e_1e_2 = 1 (a_0 + a_12 i_2) + e_1 (a_1 + a_2 i_2) ,

which can be regarded as two complex-valued parts: the spinor part 1 (a_0 + a_12 i_2), which commutes with the basis element 1, i.e. 1 i_2 = i_2 1, and the vector part e_1 (a_1 + a_2 i_2), which anti-commutes with the respective basis element e_1, i.e. e_1 i_2 = e_1 e_1 e_2 = -e_1 e_2 e_1 = -i_2 e_1. For Cl_{0,2}(R), the vector part changes to e_1 (a_1 - a_2 i_2). This decomposition will be the basis for Clifford Fourier transforms.

The Clifford algebra Cl_{0,2}(R) is isomorphic to the quaternions H, which are an extension of complex numbers and are commonly written in the literature as a + b î + c ĵ + d k̂. Quaternions also form a 4-dimensional algebra spanned by {1, î, ĵ, k̂}, where î, ĵ, k̂ all square to -1. The algebra isomorphism to Cl_{0,2}(R) is easy to verify since e_1, e_2, e_1e_2 all square to -1 and anti-commute. The basis element 1 is called the scalar part and the basis elements î, ĵ, k̂ the vector part of a quaternion. Quaternions have practical uses in applied mathematics, particularly for expressing rotations, which we will use to define the rotational Clifford convolution layer in Section 3.

Clifford algebra Cl_{3,0}(R). The 8-dimensional vector space G^3 of the Clifford algebra Cl_{3,0}(R) has the basis vectors {1, e_1, e_2, e_3, e_1e_2, e_3e_1, e_2e_3, e_1e_2e_3}, i.e.
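The spinor/vector split can be checked numerically. The following sketch (our own illustration; it restates the Cl_{2,0}(R) product formula from the previous equations) confirms that multivectors with only scalar and bivector components multiply exactly like complex numbers a_0 + a_12 i, and that the vector basis element e_1 anti-commutes with i_2:

```python
def gp_cl20(a, b):
    """Geometric product in Cl_{2,0}(R); multivectors as (a0, a1, a2, a12)."""
    a0, a1, a2, a12 = a
    b0, b1, b2, b12 = b
    return (a0*b0 + a1*b1 + a2*b2 - a12*b12,
            a0*b1 + a1*b0 - a2*b12 + a12*b2,
            a0*b2 + a1*b12 + a2*b0 - a12*b1,
            a0*b12 + a1*b2 - a2*b1 + a12*b0)

def spinor(z: complex):
    """Embed a complex number z as the spinor 1*(Re z) + i2*(Im z)."""
    return (z.real, 0.0, 0.0, z.imag)

z1, z2 = 2 + 3j, -1 + 4j
assert gp_cl20(spinor(z1), spinor(z2)) == spinor(z1 * z2)  # spinors multiply like C

e1, i2 = (0, 1, 0, 0), (0, 0, 0, 1)
assert gp_cl20(e1, i2) == (0, 0, 1, 0)    # e1 i2 = e2
assert gp_cl20(i2, e1) == (0, 0, -1, 0)   # i2 e1 = -e2, i.e. they anti-commute
```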
it consists of one scalar, three vectors {e_1, e_2, e_3}, three bivectors {e_1e_2, e_3e_1, e_2e_3}, and one trivector e_1e_2e_3. The trivector is the pseudoscalar i_3 of the algebra. The geometric product of two multivectors is defined analogously to the geometric product of Cl_{2,0}(R), see Appendix A. The dual pairs of Cl_{3,0}(R) are: 1 ↔ e_1e_2e_3 = i_3, e_1 ↔ e_2e_3, e_2 ↔ e_3e_1, and e_3 ↔ e_1e_2. An intriguing example of the duality of the multivectors of Cl_{3,0}(R) emerges when writing the electromagnetic field F in terms of an electric vector field E and a magnetic vector field B, such that F = E + B i_3, where E = E_x e_1 + E_y e_2 + E_z e_3 and B = B_x e_1 + B_y e_2 + B_z e_3. In this way, the electromagnetic field F decomposes into an electric vector part and a magnetic bivector part via the pseudoscalar i_3 (Hestenes, 2003). For example, for the base component B_x e_1 of B it holds that B_x e_1 i_3 = B_x e_1 e_1 e_2 e_3 = B_x e_2 e_3, which is a bivector and the dual to the base component E_x e_1 of E. Consequently, the multivector representing F consists of three vectors (the electric field components) and three bivectors (the magnetic field components multiplied by i_3). This viewpoint gives Clifford neural layers a natural advantage over their default counterparts, as we will see in Section 4.

3. CLIFFORD NEURAL LAYERS

Here, we introduce 2D Clifford convolution and 2D Clifford Fourier transform layers. Appendix B contains extensions to 3 dimensions. Related literature is discussed in Appendices B and D, most notably complex (Bassey et al., 2021) and quaternion neural networks (Parcollet et al., 2020).

Clifford CNN layers. Regular convolutional neural network (CNN) layers take as input feature maps f : Z^2 → R^{c_in} and convolve them with a set of c_out filters {w_i}_{i=1}^{c_out} with w_i : Z^2 → R^{c_in}:

[f ⋆ w_i](x) = Σ_{y∈Z^2} ⟨f(y), w_i(y - x)⟩ = Σ_{y∈Z^2} Σ_{j=1}^{c_in} f_j(y) w_{i,j}(y - x) ,   (6)

which can be interpreted as an inner product of the input feature maps with the corresponding filters at every point x ∈ Z^2. By applying c_out filters, the output feature maps can be interpreted as c_out-dimensional feature vectors at every point x ∈ Z^2. We now extend CNN layers such that the element-wise product of scalars f_j(y) w_{i,j}(y - x) is replaced by the geometric product of multivector inputs and multivector filters, where the chosen signature of Cl is reflected in the geometric product. We replace the feature maps f : Z^2 → R^{c_in} by multivector feature maps f : Z^2 → (G^2)^{c_in} and convolve them with a set of c_out multivector filters {w_i}_{i=1}^{c_out} : Z^2 → (G^2)^{c_in}:

[f ⋆ w_i](x) = Σ_{y∈Z^2} Σ_{j=1}^{c_in} f_j(y) w_{i,j}(y - x) ,   with the geometric product f_j w_{i,j} : G^2 × G^2 → G^2 .   (7)

Figure 4: Schematic sketch of a Clifford convolution layer: multivector input fields f(x) are convolved with multivector kernels w(x) via the geometric product, yielding multivector output fields f′(x).

In an implementation, {b_0, b_1, b_2, b_12} → {w^{i,j}_0, w^{i,j}_1, w^{i,j}_2, w^{i,j}_12} correspond to 4 different kernels representing one 2D multivector kernel, i.e. 4 different convolution layers, and {a_0, a_1, a_2, a_12} → {f^j_0, f^j_1, f^j_2, f^j_12} correspond to the scalar, vector, and bivector parts of the input multivector field. The channels of the different layers represent different stacks of scalars, vectors, and bivectors. Analogously, we can implement a Cl_{3,0}(R) CNN layer using Equation 42 in Appendix B. A schematic sketch of a Clifford convolution layer is shown in Figure 4.

Rotational Clifford CNN layers. Here we introduce an alternative parameterization to the Clifford CNN layer of Equation 7 by using the isomorphism of the Clifford algebra Cl_{0,2}(R) to the quaternions. We take advantage of the fact that a quaternion rotation can be realized by a matrix multiplication (Jia, 2008; Kuipers, 1999; Schwichtenberg, 2015). Using the isomorphism, we can represent the feature maps f_j and filters w_{i,j} as quaternions: f_j = f^j_0 + f^j_1 î + f^j_2 ĵ + f^j_3 k̂ and w_{i,j} = w^{i,j}_0 + w^{i,j}_1 î + w^{i,j}_2 ĵ + w^{i,j}_3 k̂. We can now devise an alternative parameterization of the product between the feature map f_j and the filter w_{i,j}. To be more precise, we introduce a composite operation that results in a scalar quantity and a quaternion rotation, where the latter acts on the vector part of the quaternion f_j and only produces nonzero expansion coefficients for the vector part of the quaternion output. A quaternion rotation w_{i,j} f_j (w_{i,j})^{-1} acts on the vector part (î, ĵ, k̂) of f_j, and can be algebraically manipulated into a vector-matrix operation R_{i,j} f_j, where R_{i,j} : H → H is built up from the elements of w_{i,j} (Kuipers, 1999).
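A Cl_{2,0}(R) Clifford convolution can be assembled from four ordinary real convolutions, one per kernel component, combined according to the geometric product. The sketch below is our own illustration (single channel, naive "valid" cross-correlation in NumPy rather than the paper's PyTorch layers):

```python
import numpy as np

def corr2d(f, w):
    """Naive 'valid' 2D cross-correlation of one feature map with one kernel."""
    kh, kw = w.shape
    H, W = f.shape[0] - kh + 1, f.shape[1] - kw + 1
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(f[i:i+kh, j:j+kw] * w)
    return out

def clifford_corr2d_cl20(f, w):
    """f, w: dicts with multivector components "0", "1", "2", "12".

    The four real correlations are combined exactly like the coefficients
    of the Cl_{2,0}(R) geometric product (input on the left)."""
    c = lambda fk, wk: corr2d(f[fk], w[wk])
    return {
        "0":  c("0", "0") + c("1", "1") + c("2", "2") - c("12", "12"),
        "1":  c("0", "1") + c("1", "0") - c("2", "12") + c("12", "2"),
        "2":  c("0", "2") + c("1", "12") + c("2", "0") - c("12", "1"),
        "12": c("0", "12") + c("1", "2") - c("2", "1") + c("12", "0"),
    }
```

With a 1 × 1 kernel, each output pixel is just the pointwise geometric product of the input multivector with the kernel multivector, which makes the construction easy to unit-test; the paper's layers additionally sum over c_in input channels and stack c_out such filters.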
In other words, one can transform the vector part (î, ĵ, k̂) of f_j ∈ H via a rotation matrix R_{i,j} that is built from the scalar and vector parts of w_{i,j} ∈ H. Altogether, a rotational multivector filter {w^rot_i}_{i=1}^{c_out} : Z^2 → (G^2)^{c_in} acts on the multivector feature map f : Z^2 → (G^2)^{c_in} through a rotational transformation R_{i,j}(w^{i,j}_{rot,0}, w^{i,j}_{rot,1}, w^{i,j}_{rot,2}, w^{i,j}_{rot,12}) acting on the vector and bivector parts of the feature map, and an additional scalar response of the multivector filters:

[f ⋆ w^rot_i](x) = Σ_{y∈Z^2} Σ_{j=1}^{c_in} ( f_j(y) w^rot_{i,j}(y - x) )_0 + R_{i,j}(y - x) • ( f^j_1(y), f^j_2(y), f^j_12(y) )^T ,   (8)

where the scalar output ( f_j(y) w^rot_{i,j}(y - x) )_0 = f^j_0 w^rot_{i,j,0} - f^j_1 w^rot_{i,j,1} - f^j_2 w^rot_{i,j,2} - f^j_12 w^rot_{i,j,12}, i.e. the scalar part of the geometric product of Cl_{0,2}(R) as in Equation 34. A detailed description of the rotational multivector filters R_{i,j}(y - x) is outlined in Appendix B. While in principle the Clifford CNN layer of Equation 7 and the rotational Clifford CNN layer of Equation 8 are equally flexible, our experiments in Section 4 show that rotational Clifford CNN layers lead to better performance. Clifford convolutions satisfy the property of equivariance under translation of the multivector inputs, as shown in Theorem 1 in Appendix B. Analogous to Theorem 1, translation equivariance can be derived for rotational Clifford CNN layers.

Clifford Fourier layers. The discrete Fourier transform of an n-dimensional complex signal f(x) = f(x_1, ..., x_n) : R^n → C at M_1 × ... × M_n grid points is defined as:

F{f}(ξ_1, ..., ξ_n) = Σ_{m_1=0}^{M_1-1} ... Σ_{m_n=0}^{M_n-1} f(m_1, ..., m_n) e^{-2πi (m_1 ξ_1 / M_1 + ... + m_n ξ_n / M_n)} ,   (9)

where (ξ_1, ..., ξ_n) ∈ Z_{M_1} × ... × Z_{M_n}.
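The rotation-matrix view of quaternion conjugation can be verified directly. The sketch below (our own helper names, plain Python, not the paper's implementation) builds the standard 3 × 3 rotation matrix from a unit quaternion and checks it against the conjugation w q w^{-1} = w q w* :

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw*qw - px*qx - py*qy - pz*qz,
            pw*qx + px*qw + py*qz - pz*qy,
            pw*qy - px*qz + py*qw + pz*qx,
            pw*qz + px*qy - py*qx + pz*qw)

def qconj(q):
    return (q[0], -q[1], -q[2], -q[3])

def rot_matrix(q):
    """Rotation matrix of a unit quaternion; rotates v like q (0, v) q*."""
    w, x, y, z = q
    return [[1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]]

n = math.sqrt(30.0)
w = (1/n, 2/n, 3/n, 4/n)                  # unit quaternion
v = (0.3, -1.2, 2.0)
rotated = qmul(qmul(w, (0.0,) + v), qconj(w))[1:]   # conjugation, vector part
via_matrix = tuple(sum(R_i[j] * v[j] for j in range(3)) for R_i in rot_matrix(w))
assert all(abs(a - b) < 1e-9 for a, b in zip(rotated, via_matrix))
```

In the rotational Clifford layer, one such matrix R_{i,j} is built per filter tap from the filter components and applied to the vector/bivector part of the feature map, alongside the scalar response.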
In Fourier Neural Operators (FNOs) (Li et al., 2020), discrete Fourier transforms on real-valued input fields and the respective back-transforms, implemented as Fast Fourier Transforms on real-valued inputs (RFFTs), are interleaved with a multiplication by a complex weight matrix of shape c_in × c_out for each mode, which results in a complex-valued weight tensor of the form W ∈ C^{c_in × c_out × (ξ^max_1 × ... × ξ^max_n)}, where Fourier modes above the cut-off frequencies (ξ^max_1, ..., ξ^max_n) are set to zero. Additionally, a residual connection is usually implemented as a convolution layer with kernel size 1. A sketch of an FNO layer is shown in Figure 5a.

For Cl_{2,0}(R), the Clifford Fourier transform (Ebling & Scheuermann, 2005; Ebling, 2006; Hitzer, 2012) of a multivector-valued function f(x) : R^2 → G^2, with vectors x, ξ ∈ R^2, is defined as:

f̂(ξ) = F{f}(ξ) = (1 / 2π) ∫_{R^2} f(x) e^{-2π i_2 ⟨x, ξ⟩} dx ,  ∀ξ ∈ R^2 ,   (10)

provided that the integral exists. In contrast to standard Fourier transforms, f(x) and f̂(ξ) represent multivector fields in the spatial and the frequency domain, respectively. Furthermore, the pseudoscalar i_2 = e_1e_2 is used in the exponent. Inserting the definition of multivector fields, we can rewrite Equation 10 as:

F{f}(ξ) = (1 / 2π) ∫_{R^2} [ 1 (f_0(x) + f_12(x) i_2) + e_1 (f_1(x) + f_2(x) i_2) ] e^{-2π i_2 ⟨x, ξ⟩} dx
        = 1 F{f_0 + f_12 i_2}(ξ) + e_1 F{f_1 + f_2 i_2}(ξ) ,   (11)

where the two bracketed terms are the spinor and vector parts. We obtain a Clifford Fourier transform by applying two standard Fourier transforms to the dual pairs f̂_0 = f_0(x) + f_12(x) i_2 and f̂_1 = f_1(x) + f_2(x) i_2, which can both be treated as complex-valued signals f̂_0, f̂_1 : R^2 → C. Consequently, f(x) can be understood as an element of C^2. The 2D Clifford Fourier transform is thus the linear combination of two classical Fourier transforms. Discrete versions of Equation 11 are obtained analogously to Equation 9, see Appendix B.
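Concretely, the discrete 2D Clifford Fourier transform for Cl_{2,0}(R) amounts to two complex FFTs, one over the spinor pair f_0 + f_12 i_2 and one over the vector pair f_1 + f_2 i_2. A minimal NumPy sketch (ours; the paper's implementation uses PyTorch), together with its inverse for a round trip:

```python
import numpy as np

def cfft2_cl20(f0, f1, f2, f12):
    """2D Clifford Fourier transform of a Cl_{2,0}(R) multivector field,
    realized as two standard complex FFTs over the dual pairs."""
    spinor_hat = np.fft.fft2(f0 + 1j * f12)   # F{f0 + f12 i2}
    vector_hat = np.fft.fft2(f1 + 1j * f2)    # F{f1 + f2 i2}
    return spinor_hat, vector_hat

def icfft2_cl20(spinor_hat, vector_hat):
    """Inverse transform; unpacks the complex fields back into components."""
    s = np.fft.ifft2(spinor_hat)
    v = np.fft.ifft2(vector_hat)
    return s.real, v.real, v.imag, s.imag     # f0, f1, f2, f12

rng = np.random.default_rng(0)
comps = [rng.standard_normal((8, 8)) for _ in range(4)]
back = icfft2_cl20(*cfft2_cl20(*comps))
assert all(np.allclose(a, b) for a, b in zip(comps, back))
```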
Similar to FNO, multivector weight tensors W ∈ (G^2)^{c_in × c_out × (ξ^max_1 × ξ^max_2)} are applied, where again Fourier modes above the cut-off frequencies (ξ^max_1, ξ^max_2) are set to zero. In doing so, we point-wise modify the Clifford Fourier modes f̂(ξ) = F{f}(ξ) = f̂_0(ξ) + f̂_1(ξ) e_1 + f̂_2(ξ) e_2 + f̂_12(ξ) e_1e_2 via the geometric product. The Clifford Fourier modes follow naturally when combining the spinor and vector parts of Equation 11. Finally, the residual connection is replaced by a Clifford convolution with multivector kernel k. A schematic sketch is shown in Figure 5b. For Cl_{3,0}(R), Clifford Fourier transforms follow a similarly elegant construction, where we apply four separate Fourier transforms to

f̂_0(x) = f_0(x) + f_123(x) i_3 ,   f̂_1(x) = f_1(x) + f_23(x) i_3 ,   f̂_2(x) = f_2(x) + f_31(x) i_3 ,   f̂_3(x) = f_3(x) + f_12(x) i_3 ,

i.e. scalar/trivector and vector/bivector components are combined into complex fields and then subjected to a Fourier transform.
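Putting the pieces together, a Clifford Fourier layer transforms the multivector field, modifies each mode by a multivector weight via the geometric product, and transforms back. The following is a rough single-channel NumPy sketch under our own conventions (real multivector weights, no mode truncation); it is meant to convey the data flow, not to reproduce the paper's CFNO layer:

```python
import numpy as np

def gp_cl20(a, b):
    """Pointwise Cl_{2,0}(R) geometric product; a, b have shape (4, ...)."""
    a0, a1, a2, a12 = a
    b0, b1, b2, b12 = b
    return np.stack([a0*b0 + a1*b1 + a2*b2 - a12*b12,
                     a0*b1 + a1*b0 - a2*b12 + a12*b2,
                     a0*b2 + a1*b12 + a2*b0 - a12*b1,
                     a0*b12 + a1*b2 - a2*b1 + a12*b0])

def clifford_spectral_layer(f, w_hat):
    """f: real multivector field of shape (4, H, W) ordered (f0, f1, f2, f12);
    w_hat: real multivector weights of shape (4, H, W), applied per mode."""
    s = np.fft.fft2(f[0] + 1j * f[3])                   # spinor pair f0 + f12 i2
    v = np.fft.fft2(f[1] + 1j * f[2])                   # vector pair f1 + f2 i2
    f_hat = np.stack([s.real, v.real, v.imag, s.imag])  # multivector modes
    g = gp_cl20(f_hat, w_hat)                           # geometric product per mode
    s_out = np.fft.ifft2(g[0] + 1j * g[3])
    v_out = np.fft.ifft2(g[1] + 1j * g[2])
    return np.stack([s_out.real, v_out.real, v_out.imag, s_out.imag])

# with the identity weight (scalar 1, all other components 0),
# the layer should reproduce its input up to floating-point error
rng = np.random.default_rng(1)
f = rng.standard_normal((4, 8, 8))
w_id = np.zeros((4, 8, 8))
w_id[0] = 1.0
out = clifford_spectral_layer(f, w_id)
```

The identity-weight check works because the geometric product with the scalar multivector 1 leaves every mode unchanged, and the two FFT/inverse-FFT pairs cancel.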

4. EXPERIMENTS

We assess Clifford neural layers for different architectures in three experimental settings: the incompressible Navier-Stokes equations, shallow water equations for weather modeling, and 3-dimensional Maxwell's equations. We replace carefully designed baseline architectures by their Clifford counterparts.

Figure 5: Sketch of (a) an FNO layer and (b) a Clifford FNO layer. The real-valued Fast Fourier transform (RFFT) over real-valued scalar input fields f(x) is replaced by the complex Fast Fourier transform (FFT) over the complex-valued dual parts v(x) and s(x) of multivector fields f(x). Pointwise multiplication in Fourier space via the complex weight tensor W is replaced by the geometric product in Clifford Fourier space via the multivector weight tensor W. Additionally, the convolution path is replaced by Clifford convolutions with multivector kernels w.

Baseline ResNet architectures comprise 8 residual blocks, each consisting of two convolution layers with 3 × 3 kernels, shortcut connections, group normalization (Wu & He, 2018), and GeLU activation functions (Hendrycks & Gimpel, 2016). Baseline Fourier Neural Operator (FNO) architectures consist of 8 (4) FNO blocks with GeLU activations and no normalization scheme, using 16 (8) Fourier modes for the 2- and 3-dimensional equations, respectively. For Clifford networks, we change convolutions and Fourier transforms to their respective Clifford operations, and substitute normalization techniques and activation functions with Clifford counterparts, keeping the number of parameters similar. We evaluate different training set sizes, and report losses for scalar and vector fields. All datasets share the common trait of containing multiple input and output fields.
More precisely, one scalar and one 2-dimensional vector field in case of the Navier-Stokes and the shallow water equations, and a 3-dimensional (electric) vector field and its dual (magnetic) bivector field in case of Maxwell's equations.

Navier-Stokes in 2D. The incompressible Navier-Stokes equations (Temam, 2001) conserve the velocity flow fields v : X → R^2, where X ∈ R^2, via:

∂v/∂t = -(v • ∇)v + μ ∇^2 v - ∇p + f ,   ∇ • v = 0 ,

where (v • ∇)v is the convection, i.e. the rate of change of v along v, μ∇^2 v the viscosity, i.e. the diffusion or net movement of v, ∇p the internal pressure, and f an external force, which in our case is a buoyancy force. The additional incompressibility constraint ∇ • v = 0 yields mass conservation of the Navier-Stokes equations. In addition to the velocity field, we introduce a scalar field representing a scalar quantity, i.e. smoke, that is being transported via the velocity field. The scalar field is advected by the vector field, i.e. as the vector field changes, the scalar field is transported along with it, whereas the scalar field influences the vector field only via an external force term. We call this weak coupling between vector and scalar fields. We implement the 2D Navier-Stokes equations using ΦFlow (Holl et al., 2020), obtaining data on a grid with spatial resolution of 128 × 128 (∆x = 0.25, ∆y = 0.25) and temporal resolution of ∆t = 1.5 s. Results for one-step loss and rollout loss on the test set are shown in Figure 7a. For ResNet-like architectures, we observe that both CResNet and CResNet_rot improve upon the ResNet baseline. Additionally, we observe that rollout losses are also lower for the two Clifford-based architectures, which we attribute to better and more stable models that do not overfit to one-step predictions as easily. Lastly, while in principle CResNet and CResNet_rot based architectures are equally flexible, CResNet_rot ones in general perform better than CResNet ones.
For FNO and the respective Clifford Fourier based (CFNO) architectures, the loss is in general much lower than for ResNet based architectures. CFNO architectures improve upon FNO architectures for all dataset sizes, and for one-step as well as rollout losses.

Shallow water equations. This set of coupled equations (Vreugdenhil, 1994) can be derived by integrating the incompressible Navier-Stokes equations in cases where the horizontal length scale is much larger than the vertical length scale. As such, the equations model a thin layer of fluid of constant density in hydrostatic balance, bounded from below by the bottom topography and from above by a free surface, via 3 coupled PDEs describing the velocity in the x-direction, the velocity in the y-direction, and the scalar pressure field. The shallow water equations can therefore be used as a simplified weather model, as done in this work and exemplified in Figure 1. The relation between vector and scalar components is relatively strong (strong coupling due to the 3 coupled PDEs). We obtain data for the 2D shallow water equations on a grid with spatial resolution of 192 × 96 (∆x = 1.875°, ∆y = 3.75°) and temporal resolution of ∆t = 6 h. We observe similar results to those of the Navier-Stokes experiments. For low numbers of trajectories, ResNet architectures seem to lack expressiveness, where arguably some data smoothing is learned first. Thus, ResNets need significantly more trajectories than (C)FNO architectures to obtain reasonable loss values, which seems to go hand in hand with Clifford layers gaining an advantage. In general, performance differences between baseline and Clifford architectures are even more pronounced, which we attribute to the stronger coupling of the scalar and the vector fields.

Maxwell's equations in matter in 3D.
In isotropic media, Maxwell's equations (Griffiths, 2005) propagate solutions of the displacement field D, which is related to the electric field via D = ε_0 ε_r E, where ε_0 is the permittivity of free space and ε_r the relative permittivity of the medium, and the magnetization field H, which is related to the magnetic field via B = μ_0 μ_r H, where μ_0 is the permeability of free space and μ_r the relative permeability of the medium. The electromagnetic field F has the intriguing property that the electric field E and the magnetic field B are dual pairs, thus F = E + B i_3, i.e. there is strong coupling between the electric field and its dual (bivector) magnetic field. This duality also holds for D and H. Concretely, the fields of interest are the vector-valued D-field (D_x, D_y, D_z) and the vector-valued H-field (H_x, H_y, H_z). We obtain data for the 3D Maxwell's equations on a grid with spatial resolution of 32 × 32 × 32 (∆x = ∆y = ∆z = 5 × 10^-7 m) and temporal resolution of ∆t = 50 s. We randomly place 18 different light sources outside a cube, which emit light with different amplitudes and different phase shifts, causing the resulting D and H fields to interfere. The wavelength of the emitted light is 10^-5 m. We test FNO based architectures and their respective Clifford counterparts (CFNO). Due to the vector-bivector character of the electric and magnetic field components, Maxwell's equations are an ideal playground to stress-test the inductive bias advantages of Clifford-based architectures. Results for one-step loss and rollout loss on the test set are shown in Figure 8. CFNO architectures improve upon FNO architectures, especially for low numbers of trajectories. The results demonstrate the much stronger inductive bias of Clifford-based 3-dimensional Fourier layers, and their general applicability to 3-dimensional problems, which are structurally even more interesting than 2-dimensional ones.

5. CONCLUSION

We introduced Clifford neural layers that handle the various scalar (e.g. charge density), vector (e.g. electric field), bivector (e.g. magnetic field), and higher-order fields as proper geometric objects organized as multivectors. This geometric algebra perspective allowed us to naturally generalize convolutions and Fourier transforms to their Clifford counterparts, providing an elegant rule for designing new neural network layers. The multivector viewpoint confers an inductive bias advantage, leading to a better representation of the relationship between fields and their individual components, which is prominently demonstrated by the fact that our Clifford layers significantly outperformed equivalent standard neural PDE surrogates.

Limitations. One limitation is the current speed of Fast Fourier Transform (FFT) operations on machine learning accelerators like GPUs. While this is an active area of research, currently available versions of cuFFT kernels wrapped in PyTorch (Paszke et al., 2019) are not yet as heavily optimized, especially for the gradient pass. In contrast to FNO layers, which operate on real-valued signals, Clifford Fourier layers use complex-valued FFT operations, where the backward pass is approximately twice as slow. For similar parameter counts, inference times of FNO and CFNO networks are similar. Similar to Grassucci et al. (2021), who investigated the speed of geometric convolution layers, we found that Clifford convolutions are more parameter efficient since they share parameters among filters, with the downside that the net number of operations is larger, resulting in training times increased by a factor of about 2. Finally, from a PDE point of view, the presented approaches to obtain PDE surrogates are limited since the neural networks have to be retrained for different equation parameters or, e.g., a different ∆t.

Future work. Besides the modeling of PDEs, weather, and fluid dynamics, we see potential applications of Clifford layers for e.g.
MRI or radar data, and for neural implicit representations (Xie et al., 2022; Rella et al., 2022). Extensions towards graph networks and attention-based models will be useful to explore. Furthermore, custom multivector GPU kernels can overcome many of the speed issues, as the compute density of Clifford operations is much higher, which suits hardware accelerators well (Hoffmann et al., 2020). The use of a just-in-time compiled language with better array abstractions, such as Julia (Bezanson et al., 2017), could significantly simplify the interface. Finally, combining the ideas of multivector modeling with various physics-informed neural network approaches (Raissi et al., 2019; Lutter et al., 2018; Gupta et al., 2019; Cranmer et al., 2020; Zubov et al., 2021) is an attractive next step.

REPRODUCIBILITY AND ETHICAL STATEMENT

Reproducibility statement. We have included error bars and ablation studies wherever we found it necessary and appropriate. We have described our architectures in Section 4 and provided further implementation details in Appendix Section C. We have further included pseudocode for the newly proposed layers in Appendix Section B.6. We open-sourced our PyTorch implementation at https://microsoft.github.io/cliffordlayers/ for others to use, and we aim to develop this codebase further in the future. Ethical statement. Neural PDE surrogates will play an important role in modeling many natural phenomena, and developing them further might provide shortcuts or alternatives for computationally expensive simulations. Used as such, PDE surrogates can help advance different fields of research, especially in the natural sciences; examples related to this paper are fluid dynamics and weather modeling. PDE surrogates might therefore directly or indirectly contribute to reducing the carbon footprint of such simulations. On the downside, relying on simulations always requires rigorous cross-checks and monitoring, especially when we "learn to simulate".

A MATHEMATICAL BACKGROUND

This appendix supports Section 2 of the main paper. We give a more detailed explanation of real Clifford algebras and take a closer look at Cl 2,0 (R), Cl 0,2 (R), and Cl 3,0 (R). For a detailed introduction to Clifford algebras we recommend Suter (2003); Hestenes (2003; 2012); Dorst et al. (2010); Renaud (2020).

A.1 CLIFFORD ALGEBRAS

Vector spaces and algebras over a field. A vector space over a field F is a set V together with two binary operations that satisfy the axioms of vector addition and scalar multiplication. The axioms of addition ensure that adding two elements of V again yields an element of V. The elements of F are called scalars; examples of a field F are the real numbers R and the complex numbers C. Although it is common practice to refer to the elements of a general vector space V as vectors, to avoid confusion we reserve this term for the more specific case of elements of R^n. As we will see below, general vector spaces can consist of more complicated, higher-order objects than scalars, vectors, or matrices. An algebra over a field consists of a vector space V over a field F together with an additional bilinear law of composition V × V → V; that is, if a and b are any two elements of V, then their product ab is again an element of V, satisfying the pair of distributive laws a(λ_1 b + λ_2 c) = λ_1 ab + λ_2 ac and (λ_1 a + λ_2 b)c = λ_1 ac + λ_2 bc for λ_1, λ_2 ∈ F and a, b, c ∈ V. Note that general vector spaces do not come with a bilinear operation defined on their elements. Clifford algebras over R. In this manuscript we focus on Clifford algebras over R; for a more general exposition on Clifford algebras over different fields the reader is referred to Lounesto (1986). A real Clifford algebra is generated by the n-dimensional vector space R^n through a set of relations that hold for the basis elements of R^n.
Let us denote the basis elements of R^n by e_1, ..., e_n, and without loss of generality choose these basis elements to be mutually orthonormal. Taking two nonnegative integers p and q such that p + q = n, a real Clifford algebra Cl_{p,q}(R) with "signature" (p, q) is generated through the following relations, which define how the bilinear product of the algebra acts on the basis elements of R^n:

e_i^2 = +1 for 1 ≤ i ≤ p ,  (14)
e_j^2 = -1 for p < j ≤ p + q ,  (15)
e_i e_j = -e_j e_i for i ≠ j .  (16)

Through these relations we can generate a basis for the vector space of the Clifford algebra, which we denote by G. Equations 14 and 15 show that the product of a basis vector with itself yields a scalar. According to the aforementioned definition of an algebra over a field, a Clifford algebra with vector space G is equipped with a bilinear product G × G → G that combines two elements of G and yields another element of the same space G. Therefore, both scalars and vectors must be elements of G. Equation 16 shows that, besides scalar and vector elements, higher-order elements consisting of a combination of two basis elements, such as e_i e_j and e_j e_i, are also part of G. Finally, by combining Equations 14, 15, and 16 we can create even higher-order elements such as e_i e_j e_k for i ≠ j ≠ k, or e_1 e_2 ··· e_{p+q}, which must all be part of G. To determine the basis elements that span the vector space G of Cl_{p,q}(R), we note that e_{σ(1)} e_{σ(2)} ··· e_{σ(k)} and e_1 e_2 ··· e_k are related by a scalar multiplicative factor of plus or minus one, depending on the sign of the permutation σ. It therefore suffices to consider the unordered combinations of basis elements of R^n: the basis of the vector space G is given by {1, e_1, e_2, ..., e_{p+q}, e_1 e_2, ..., e_{p+q-1} e_{p+q}, ..., e_1 e_2 ··· e_{p+q}}. In summary, we have introduced two different vector spaces.
First, the vector space R^n, which generates the Clifford algebra, and second the vector space G, which is spanned by the basis elements of the Clifford algebra Cl_{p,q}(R). The convention is to denote the vector space of a real Clifford algebra with a superscript n giving the dimension of the generating vector space, yielding G^n for a generating vector space R^n. Note that the dimension of the vector space G^n is 2^n = 2^{p+q}. Exemplary low-dimensional Clifford algebras are: (i) Cl_{0,0}(R), a one-dimensional algebra spanned by {1} and therefore isomorphic to R, the field of real numbers; (ii) Cl_{0,1}(R), a two-dimensional algebra with vector space G^1 spanned by {1, e_1}, where the basis vector e_1 squares to -1, and which is therefore isomorphic to C, the field of complex numbers; (iii) Cl_{0,2}(R), a 4-dimensional algebra with vector space G^2 spanned by {1, e_1, e_2, e_1 e_2}, where e_1, e_2 square to -1 and anti-commute. Thus, Cl_{0,2}(R) is isomorphic to the quaternions H.
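As a small sanity check of the dimension counting above, the basis of Cl(p,q) can be enumerated as unordered index subsets of {1, ..., p+q}. This is a minimal illustrative sketch, not the paper's cliffordlayers code:

```python
from itertools import combinations

def clifford_basis(p, q):
    """Enumerate the basis of Cl(p,q): one element per subset of {1,...,p+q}.

    The empty tuple () is the scalar 1, (1,) is e1, (1, 2) is e1e2, etc.
    """
    n = p + q
    basis = []
    for k in range(n + 1):
        basis.extend(combinations(range(1, n + 1), k))
    return basis

# The vector space G^n of Cl(p,q) has dimension 2^(p+q).
assert len(clifford_basis(2, 0)) == 4   # {1, e1, e2, e1e2}
assert len(clifford_basis(0, 1)) == 2   # {1, e1}, isomorphic to C
assert len(clifford_basis(3, 0)) == 8   # scalar, 3 vectors, 3 bivectors, trivector
```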

Definition 1: Grade of Clifford algebra element

The grade of a Clifford algebra basis element is the dimension of the subspace it represents. For example, the basis elements {1, e_1, e_2, e_1 e_2} of the Clifford algebras Cl_{0,2}(R) and Cl_{2,0}(R) have the grades {0, 1, 1, 2}. Using the concept of grades, we can divide the vector spaces of Clifford algebras into linear subspaces made up of elements of each grade. The grade subspace of smallest dimension is M^0, the subspace of all scalars (elements with 0 basis vectors). Elements of M^1 are called vectors, elements of M^2 bivectors, and so on. In general, the vector space G^{p+q} of a Clifford algebra Cl_{p,q} can be written as the direct sum of all of these subspaces:

G^{p+q} = M^0 ⊕ M^1 ⊕ ... ⊕ M^{p+q} .  (17)

The elements of a Clifford algebra are called multivectors, containing elements of the different subspaces, i.e. scalars, vectors, bivectors, trivectors, etc. The basis element with the highest grade is called the pseudoscalar, which in R^2 corresponds to the bivector e_1 e_2, and in R^3 to the trivector e_1 e_2 e_3. The pseudoscalar is often denoted by the symbol i_{p+q}. From here on, only multivectors will be denoted with boldface symbols. Geometric product. Using Equations 14, 15, and 16, we have seen how basis elements of the vector space G^{p+q} of the Clifford algebra are formed from basis elements of the generating vector space V. We now look at how elements of G^{p+q} are combined, i.e. how multivectors are bilinearly operated on. The geometric product is the bilinear operation on multivectors in Clifford algebras. For arbitrary multivectors a, b, c ∈ G^{p+q} and scalar λ, the geometric product has the following properties:

ab ∈ G^{p+q}  (closure) ,  (18)
(ab)c = a(bc)  (associativity) ,  (19)
λa = aλ  (commutative scalar multiplication) ,  (20)
a(b + c) = ab + ac  (distributivity over addition) .  (21)

The geometric product is in general non-commutative, i.e. ab ≠ ba.
As we describe later, the geometric product is made up of two parts: an inner product, which captures similarity, and an exterior (wedge) product, which captures difference.
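The properties above can be verified numerically. The following sketch (assumed component layout, not the paper's code) implements the geometric product of Cl(2,0) on multivectors stored as tuples (a_0, a_1, a_2, a_12) and checks associativity, non-commutativity, and commutative scalar multiplication:

```python
import numpy as np

def gp20(a, b):
    """Geometric product in Cl(2,0) on components (a0, a1, a2, a12)."""
    a0, a1, a2, a12 = a
    b0, b1, b2, b12 = b
    return np.array([
        a0*b0 + a1*b1 + a2*b2 - a12*b12,          # scalar part
        a0*b1 + a1*b0 - a2*b12 + a12*b2,          # e1 part
        a0*b2 + a1*b12 + a2*b0 - a12*b1,          # e2 part
        a0*b12 + a1*b2 - a2*b1 + a12*b0,          # e1e2 (bivector) part
    ])

a = np.array([1., 2., 0., 1.])
b = np.array([0., 1., 3., 2.])
c = np.array([2., 0., 1., 1.])
s = np.array([3., 0., 0., 0.])                     # a pure scalar

assert np.allclose(gp20(gp20(a, b), c), gp20(a, gp20(b, c)))  # associativity
assert not np.allclose(gp20(a, b), gp20(b, a))                # non-commutative
assert np.allclose(gp20(s, a), gp20(a, s))                    # scalars commute
```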

Definition 2: Dual of a multivector

The dual a^* of a multivector a is defined as a^* = a i_{p+q}, where i_{p+q} is the respective pseudoscalar of the Clifford algebra. This definition allows us to relate different multivectors to each other, which is a useful property when defining Clifford Fourier transforms. For example, for Clifford algebras in R^2 the dual of a scalar is a bivector, and for Clifford algebras in R^3 the dual of a scalar is a trivector.

A.2 EXAMPLES OF LOW-DIMENSIONAL CLIFFORD ALGEBRAS

A.2.1 CLIFFORD ALGEBRA Cl 0,1 (R)

The Clifford algebra Cl_{0,1}(R) is a two-dimensional algebra with vector space G^1 spanned by {1, e_1}, where the basis vector e_1 squares to -1. Cl_{0,1}(R) is thus algebra-isomorphic to C, the field of complex numbers. This becomes more obvious if we identify the basis element of highest grade, i.e. e_1, with the pseudoscalar i_1, which plays the role of the imaginary unit of the complex numbers. The geometric product of two multivectors a = a_0 + a_1 e_1 and b = b_0 + b_1 e_1 is therefore isomorphic to the product of two complex numbers:

ab = a_0 b_0 + a_0 b_1 e_1 + a_1 b_0 e_1 + a_1 b_1 e_1 e_1 = (a_0 b_0 - a_1 b_1) + (a_0 b_1 + a_1 b_0) e_1 .

A.2.2 CLIFFORD ALGEBRA Cl 2,0 (R)

For Cl_{2,0}(R), the geometric product of two multivectors a = a_0 + a_1 e_1 + a_2 e_2 + a_12 e_1 e_2 and b = b_0 + b_1 e_1 + b_2 e_2 + b_12 e_1 e_2 follows from the relations e_1 e_1 = 1, e_2 e_2 = 1, and e_i e_j = -e_j e_i for i ≠ j, from which it also follows that e_1 e_2 e_1 e_2 = -1. We obtain:

ab = a_0 b_0 + a_1 b_1 + a_2 b_2 - a_12 b_12
+ (a_0 b_1 + a_1 b_0 - a_2 b_12 + a_12 b_2) e_1
+ (a_0 b_2 + a_1 b_12 + a_2 b_0 - a_12 b_1) e_2
+ (a_0 b_12 + a_1 b_2 - a_2 b_1 + a_12 b_0) e_1 e_2 .  (25)

A vector x ∈ R^2 ⊂ G^2 is identified with x_1 e_1 + x_2 e_2. Clifford multiplication of two vectors x, y ∈ R^2 ⊂ G^2 yields the geometric product xy:

xy = (x_1 e_1 + x_2 e_2)(y_1 e_1 + y_2 e_2) = x_1 y_1 e_1^2 + x_2 y_2 e_2^2 + x_1 y_2 e_1 e_2 + x_2 y_1 e_2 e_1 = ⟨x, y⟩ + x ∧ y ,  (26)

i.e. the sum of an inner product ⟨x, y⟩ and an outer (wedge) product x ∧ y. The antisymmetric quantity x ∧ y = -y ∧ x is associated with the now often mentioned bivector, which can be interpreted as an oriented plane segment. Equation 26 can be rewritten to express the (symmetric) inner product and the (anti-symmetric) outer product in terms of the geometric product:

x ∧ y = 1/2 (xy - yx) ,  (27)
⟨x, y⟩ = 1/2 (xy + yx) .  (28)

Of the basis elements {1, e_1, e_2, e_1 e_2} of the vector space G^2 of the Clifford algebra Cl_{2,0}(R), probably the most interesting is e_1 e_2.
We therefore take a closer look at the unit bivector i_2 = e_1 e_2, which represents the plane spanned by e_1 and e_2 and is determined by the geometric product:

i_2 = e_1 e_2 = ⟨e_1, e_2⟩ + e_1 ∧ e_2 = e_1 ∧ e_2 = -e_2 ∧ e_1 = -e_2 e_1 ,  (29)

where the inner product ⟨e_1, e_2⟩ vanishes due to the orthogonality of the basis vectors. Squaring the bivector yields i_2^2 = -1, and thus i_2 represents a true geometric √-1. From Equation 29 it follows that

e_2 = e_1 i_2 = -i_2 e_1 ,  e_1 = i_2 e_2 = -e_2 i_2 .  (30)

Using Definition 2, the dual of a multivector a ∈ G^2 is defined via the bivector as i_2 a. Thus, the dual of a scalar is a bivector, and the dual of a vector is again a vector. The dual pairs of the basis elements are 1 ↔ e_1 e_2 and e_1 ↔ e_2. These dual pairs allow us to write an arbitrary multivector a = a_0 + a_1 e_1 + a_2 e_2 + a_12 e_12 as

a = 1 (a_0 + a_12 i_2) [spinor part] + e_1 (a_1 + a_2 i_2) [vector part] ,

which can be regarded as two complex-valued parts: the spinor part, which commutes with i_2, and the vector part, which anti-commutes with i_2.

A.2.3 CLIFFORD ALGEBRA Cl 0,2 (R)

The Clifford algebra Cl_{0,2}(R) is a 4-dimensional algebra with vector space G^2 spanned by the basis elements {1, e_1, e_2, e_1 e_2}, where e_1, e_2 square to -1. The Clifford algebra Cl_{0,2}(R) is algebra-isomorphic to the quaternions H, which are commonly written in the literature (Schwichtenberg, 2015) as a + b î + c ĵ + d k̂, where the (imaginary) basis elements î, ĵ, and k̂ fulfill the relations:

î^2 = ĵ^2 = -1 ,  î ĵ = k̂ ,  ĵ î = -k̂ ,  k̂^2 = î ĵ î ĵ = -î ĵ ĵ î = î î = -1 .

Quaternions thus also form a 4-dimensional algebra spanned by {1, î, ĵ, k̂}, where î, ĵ, k̂ all square to -1. The basis element 1 is often called the scalar part, and the basis elements î, ĵ, k̂ the vector part of a quaternion.
The geometric product of two multivectors a = a_0 + a_1 e_1 + a_2 e_2 + a_12 e_1 e_2 and b = b_0 + b_1 e_1 + b_2 e_2 + b_12 e_1 e_2 follows from the relations e_1 e_1 = -1, e_2 e_2 = -1, and e_i e_j = -e_j e_i for i ≠ j, from which it also follows that e_1 e_2 e_1 e_2 = -1. We obtain:

ab = a_0 b_0 - a_1 b_1 - a_2 b_2 - a_12 b_12
+ (a_0 b_1 + a_1 b_0 + a_2 b_12 - a_12 b_2) e_1
+ (a_0 b_2 - a_1 b_12 + a_2 b_0 + a_12 b_1) e_2
+ (a_0 b_12 + a_1 b_2 - a_2 b_1 + a_12 b_0) e_1 e_2 .  (34)

A.2.4 CLIFFORD ALGEBRA Cl 3,0 (R)

The Clifford algebra Cl_{3,0}(R) is an 8-dimensional algebra with vector space G^3 spanned by the basis elements {1, e_1, e_2, e_3, e_1 e_2, e_1 e_3, e_2 e_3, e_1 e_2 e_3}, i.e. one scalar, three basis vectors {e_1, e_2, e_3}, three bivectors {e_1 e_2, e_1 e_3, e_2 e_3}, and one trivector e_1 e_2 e_3. The trivector is the pseudoscalar i_3 of the algebra. The geometric product of two multivectors is defined analogously to the geometric product of Cl_{2,0}(R); the associative and bilinear multiplication of multivectors follows from:

e_i^2 = 1 for i = 1, 2, 3 ,  e_i e_j = -e_j e_i for i, j = 1, 2, 3, i ≠ j .

Using Definition 2, the dual pairs of Cl_{3,0} are:

1 ↔ e_1 e_2 e_3 = i_3  (37)
e_1 ↔ e_2 e_3  (38)
e_2 ↔ e_3 e_1  (39)
e_3 ↔ e_1 e_2 .  (40)

Expanding the product ab of two multivectors term by term, where minus signs appear due to the reordering of basis elements, the geometric product for Cl_{3,0}(R) simplifies to

ab = a_0 b_0 + a_1 b_1 + a_2 b_2 + a_3 b_3 - a_12 b_12 - a_13 b_13 - a_23 b_23 - a_123 b_123
+ (a_0 b_1 + a_1 b_0 - a_2 b_12 - a_3 b_13 + a_12 b_2 + a_13 b_3 - a_23 b_123 - a_123 b_23) e_1
+ (a_0 b_2 + a_1 b_12 + a_2 b_0 - a_3 b_23 - a_12 b_1 + a_13 b_123 + a_23 b_3 + a_123 b_13) e_2
+ (a_0 b_3 + a_1 b_13 + a_2 b_23 + a_3 b_0 - a_12 b_123 - a_13 b_1 - a_23 b_2 - a_123 b_12) e_3
+ (a_0 b_12 + a_1 b_2 - a_2 b_1 + a_3 b_123 + a_12 b_0 - a_13 b_23 + a_23 b_13 + a_123 b_3) e_1 e_2
+ (a_0 b_13 + a_1 b_3 - a_2 b_123 - a_3 b_1 + a_12 b_23 + a_13 b_0 - a_23 b_12 - a_123 b_2) e_1 e_3
+ (a_0 b_23 + a_1 b_123 + a_2 b_3 - a_3 b_2 - a_12 b_13 + a_13 b_12 + a_23 b_0 + a_123 b_1) e_2 e_3
+ (a_0 b_123 + a_1 b_23 - a_2 b_13 + a_3 b_12 + a_12 b_3 - a_13 b_2 + a_23 b_1 + a_123 b_0) e_1 e_2 e_3 .  (42)
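The isomorphism between Cl(0,2) and the quaternions can be checked numerically. The following sketch (assumed component layout (a_0, a_1, a_2, a_12), not the paper's code) implements the Cl(0,2) geometric product and verifies the quaternion relations under the identification î → e_1, ĵ → e_2, k̂ → e_1 e_2:

```python
import numpy as np

def gp02(a, b):
    """Geometric product in Cl(0,2) on components (a0, a1, a2, a12)."""
    a0, a1, a2, a12 = a
    b0, b1, b2, b12 = b
    return np.array([
        a0*b0 - a1*b1 - a2*b2 - a12*b12,          # scalar part
        a0*b1 + a1*b0 + a2*b12 - a12*b2,          # e1 part
        a0*b2 - a1*b12 + a2*b0 + a12*b1,          # e2 part
        a0*b12 + a1*b2 - a2*b1 + a12*b0,          # e1e2 part
    ])

one = np.array([1., 0., 0., 0.])
i   = np.array([0., 1., 0., 0.])   # î  -> e1
j   = np.array([0., 0., 1., 0.])   # ĵ  -> e2
k   = np.array([0., 0., 0., 1.])   # k̂  -> e1e2

assert np.allclose(gp02(i, i), -one)   # î² = -1
assert np.allclose(gp02(j, j), -one)   # ĵ² = -1
assert np.allclose(gp02(i, j), k)      # îĵ = k̂
assert np.allclose(gp02(j, i), -k)     # ĵî = -k̂
assert np.allclose(gp02(k, k), -one)   # k̂² = -1
```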

B CLIFFORD NEURAL LAYERS

This appendix supports Section 3 of the main paper. Clifford convolutions are related to the work on complex networks by Trabelsi et al. (2017), and closely related to work on quaternion neural networks (Zhu et al., 2018; Parcollet et al., 2018a; Gaudet & Maida, 2018; Parcollet et al., 2018b; 2019; 2020; Nguyen et al., 2021). The most closely related works are (i) Zang et al. (2022), who build geometric algebra convolution networks to process spatial and temporal data, and (ii) Spellings (2021), who builds rotation- and permutation-equivariant graph network architectures based on geometric algebra products of node features, where higher-order information is built from the available node inputs.

B.1 CLIFFORD CONVOLUTION LAYERS

We derive the implementation of translation equivariant Clifford convolution layers for multivectors in G^2, i.e. multivectors of Clifford algebras generated by the 2-dimensional vector space R^2. Finally, we make the extension to Clifford algebras generated by the 3-dimensional vector space R^3. Regular CNN layers. Regular convolutional neural network (CNN) layers take as input feature maps f : Z^2 → R^{c_in} and convolve them with a set of c_out filters {w^i}_{i=1}^{c_out} : Z^2 → R^{c_in}:

[f ⋆ w^i](x) = Σ_{y ∈ Z^2} ⟨f(y), w^i(y - x)⟩  (44)
= Σ_{y ∈ Z^2} Σ_{j=1}^{c_in} f_j(y) w_{i,j}(y - x) .  (45)

Equation 44 can be interpreted as an inner product of the input feature maps with the corresponding filters at every point x ∈ Z^2. By applying c_out filters, the output feature maps can be interpreted as c_out-dimensional feature vectors at every point x ∈ Z^2. We now want to extend convolution layers such that the elementwise products of scalars f_j(y) w_{i,j}(y - x) are replaced by geometric products of multivector inputs and multivector filters f_j(y) w_{i,j}(y - x). Clifford CNN layers. We replace the feature maps f : Z^2 → R^{c_in} by multivector feature maps f : Z^2 → (G^2)^{c_in} and convolve them with a set of c_out multivector filters {w^i}_{i=1}^{c_out} : Z^2 → (G^2)^{c_in}:

[f ⋆ w^i](x) = Σ_{y ∈ Z^2} Σ_{j=1}^{c_in} f_j(y) w_{i,j}(y - x) ,  f_j w_{i,j} : G^2 × G^2 → G^2 .  (46)

Theorem 1: Translation equivariance of Clifford convolutions. Let f : Z^2 → (G^2)^{c_in} be a multivector feature map and let w : Z^2 → (G^2)^{c_in} be a multivector kernel. Then, for Cl(2,0)(R), [[L_t f] ⋆ w](x) = [L_t [f ⋆ w]](x), where L_t denotes translation by t. Proof.
[[L_t f] ⋆ w](x) = Σ_{y ∈ Z^2} Σ_{j=1}^{c_in} f(y - t) w(y - x)
= Σ_{y ∈ Z^2} Σ_{j=1}^{c_in} f_0(y - t) w_0(y - x) + f_1(y - t) w_1(y - x) + f_2(y - t) w_2(y - x) - f_12(y - t) w_12(y - x)
+ ( f_0(y - t) w_1(y - x) + f_1(y - t) w_0(y - x) - f_2(y - t) w_12(y - x) + f_12(y - t) w_2(y - x) ) e_1
+ ( f_0(y - t) w_2(y - x) + f_1(y - t) w_12(y - x) + f_2(y - t) w_0(y - x) - f_12(y - t) w_1(y - x) ) e_2
+ ( f_0(y - t) w_12(y - x) + f_1(y - t) w_2(y - x) - f_2(y - t) w_1(y - x) + f_12(y - t) w_0(y - x) ) e_1 e_2

(substituting y → y - t)

= Σ_{y ∈ Z^2} Σ_{j=1}^{c_in} f_0(y) w_0(y - (x - t)) + f_1(y) w_1(y - (x - t)) + f_2(y) w_2(y - (x - t)) - f_12(y) w_12(y - (x - t))
+ ( f_0(y) w_1(y - (x - t)) + f_1(y) w_0(y - (x - t)) - f_2(y) w_12(y - (x - t)) + f_12(y) w_2(y - (x - t)) ) e_1
+ ( f_0(y) w_2(y - (x - t)) + f_1(y) w_12(y - (x - t)) + f_2(y) w_0(y - (x - t)) - f_12(y) w_1(y - (x - t)) ) e_2
+ ( f_0(y) w_12(y - (x - t)) + f_1(y) w_2(y - (x - t)) - f_2(y) w_1(y - (x - t)) + f_12(y) w_0(y - (x - t)) ) e_1 e_2

= [L_t [f ⋆ w]](x) .

Implementation of Cl 2,0 (R) and Cl 0,2 (R) layers. We can implement a Cl(2,0)(R) Clifford CNN layer using Equation 25, where {b_0, b_1, b_2, b_12} → {w_0^{i,j}, w_1^{i,j}, w_2^{i,j}, w_12^{i,j}} correspond to 4 different kernels representing one 2D multivector kernel, i.e. 4 different convolution layers, and {a_0, a_1, a_2, a_12} → {f_0^j, f_1^j, f_2^j, f_12^j} correspond to the scalar, vector, and bivector parts of the input multivector field. The channels of the different layers represent different stacks of scalars, vectors, and bivectors. All kernels have the same number of input and output channels (the number of input and output multivectors), and thus channel mixing occurs for the different terms of Equations 25 and 42 individually. Lastly, usually not all parts of the multivectors are present in the input fields. This can easily be accounted for by omitting the respective parts of Equations 25 and 42; a similar reasoning applies to the output fields. For Cl(0,2)(R), the signs within the geometric product change slightly.
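The implementation described above can be sketched as follows. This is a minimal NumPy illustration (single input and output multivector channel, "valid" correlation, direct loops), not the released cliffordlayers code, which stacks components into standard strided convolutions; the component combinations follow Equation 25:

```python
import numpy as np

def corr2d(f, w):
    """Plain 2D 'valid' cross-correlation of a scalar field f with kernel w."""
    H, W = f.shape
    kh, kw = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(f[i:i+kh, j:j+kw] * w)
    return out

def clifford_conv2d(f, w):
    """Cl(2,0) Clifford convolution: f, w stacked as (scalar, e1, e2, e1e2)."""
    f0, f1, f2, f12 = f
    w0, w1, w2, w12 = w
    o0  = corr2d(f0, w0)  + corr2d(f1, w1)  + corr2d(f2, w2)  - corr2d(f12, w12)
    o1  = corr2d(f0, w1)  + corr2d(f1, w0)  - corr2d(f2, w12) + corr2d(f12, w2)
    o2  = corr2d(f0, w2)  + corr2d(f1, w12) + corr2d(f2, w0)  - corr2d(f12, w1)
    o12 = corr2d(f0, w12) + corr2d(f1, w2)  - corr2d(f2, w1)  + corr2d(f12, w0)
    return o0, o1, o2, o12

rng = np.random.default_rng(0)
f = rng.normal(size=(4, 8, 8))
w = rng.normal(size=(4, 3, 3))
out = clifford_conv2d(f, w)
assert out[0].shape == (6, 6)

# Sanity check: with a 1x1 kernel the layer reduces to the pointwise geometric product.
w1x1 = rng.normal(size=(4, 1, 1))
o = clifford_conv2d(f, w1x1)
a, b = f[:, 0, 0], w1x1[:, 0, 0]
assert np.isclose(o[0][0, 0], a[0]*b[0] + a[1]*b[1] + a[2]*b[2] - a[3]*b[3])
```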

B.1.2 ROTATIONAL CLIFFORD CNN LAYERS

Here we introduce an alternative parameterization to the Clifford CNN layer introduced in Equation 7, using the isomorphism of the Clifford algebra Cl_{0,2}(R) to the quaternions. We take advantage of the fact that a quaternion rotation can be realized by a matrix multiplication (Jia, 2008; Kuipers, 1999; Schwichtenberg, 2015). Using the isomorphism, we can represent the feature maps f^j and filters w^{i,j} as quaternions: f^j = f_0^j + f_1^j î + f_2^j ĵ + f_3^j k̂ and w^{i,j} = w_0^{i,j} + w_1^{i,j} î + w_2^{i,j} ĵ + w_3^{i,j} k̂. Leveraging this quaternion representation, we can devise an alternative parameterization of the product between the feature map f^j and the filter w^{i,j}. More precisely, we introduce a composite operation that results in a scalar quantity and a quaternion rotation, where the latter acts on the vector part of the quaternion f^j and only produces nonzero expansion coefficients for the vector part of the quaternion output. A quaternion rotation w^{i,j} f^j (w^{i,j})^{-1} acts on the vector part (î, ĵ, k̂) of f^j and can be algebraically manipulated into a vector-matrix operation R^{i,j} f^j, where R^{i,j} : H → H is built from the elements of w^{i,j} (Kuipers, 1999). In other words, one can transform the vector part (î, ĵ, k̂) of f^j ∈ H via a rotation matrix R^{i,j} that is built from the scalar and vector parts (1, î, ĵ, k̂) of w^{i,j} ∈ H.
Altogether, a rotational multivector filter {w_rot^i}_{i=1}^{c_out} : Z^2 → (G^2)^{c_in} acts on the multivector feature map f : Z^2 → (G^2)^{c_in} through a rotational transformation R^{i,j}(w_rot,0^{i,j}, w_rot,1^{i,j}, w_rot,2^{i,j}, w_rot,12^{i,j}) acting on the vector and bivector parts, together with an additional scalar response of the multivector filters:

[f ⋆ w_rot^i](x) = Σ_{y ∈ Z^2} Σ_{j=1}^{c_in} f^j(y) w_rot^{i,j}(y - x)
= Σ_{y ∈ Z^2} Σ_{j=1}^{c_in} ( f^j(y) w_rot^{i,j}(y - x) )_0 [scalar output] + R^{i,j}(y - x) • ( f_1^j(y), f_2^j(y), f_12^j(y) )^T ,

where ( f^j(y) w_rot^{i,j}(y - x) )_0 = f_0^j w_rot,0^{i,j} - f_1^j w_rot,1^{i,j} - f_2^j w_rot,2^{i,j} - f_12^j w_rot,12^{i,j}, which is the scalar output of Equation 34. Written out, the rotation matrix R^{i,j}(y - x) reads:

R^{i,j} =
( 1 - 2(ŵ_rot,2^2 + ŵ_rot,12^2)          2(ŵ_rot,1 ŵ_rot,2 - ŵ_rot,0 ŵ_rot,12)   2(ŵ_rot,1 ŵ_rot,12 + ŵ_rot,0 ŵ_rot,2)
  2(ŵ_rot,1 ŵ_rot,2 + ŵ_rot,0 ŵ_rot,12)  1 - 2(ŵ_rot,1^2 + ŵ_rot,12^2)          2(ŵ_rot,2 ŵ_rot,12 - ŵ_rot,0 ŵ_rot,1)
  2(ŵ_rot,1 ŵ_rot,12 - ŵ_rot,0 ŵ_rot,2)  2(ŵ_rot,2 ŵ_rot,12 + ŵ_rot,0 ŵ_rot,1)  1 - 2(ŵ_rot,1^2 + ŵ_rot,2^2) ) ,

where ŵ_rot^{i,j}(y - x) = ŵ_rot,0^{i,j}(y - x) + ŵ_rot,1^{i,j}(y - x) e_1 + ŵ_rot,2^{i,j}(y - x) e_2 + ŵ_rot,12^{i,j}(y - x) e_12 is the normalized filter with ∥ŵ_rot^{i,j}∥ = 1. The superscripts i,j and the dependency on (y - x) are omitted inside the rotation matrix for clarity. Normalization layers (Ioffe & Szegedy, 2015; Ba et al., 2016; Wu & He, 2018; Ulyanov et al., 2017) are standard components of deep architectures, but their usual formulation applies only to real values. Simply translating and scaling multivectors such that their mean is 0 and their variance is 1 is insufficient because it does not ensure equal variance across all components. Trabelsi et al. (2017) extended the batch normalization formulation to complex values. We build on the same principles to propose an appropriate batch normalization scheme for multivectors, similar to the work of Gaudet & Maida (2018) for quaternions. For 2D multivectors of the form a = a_0 + a_1 e_1 + a_2 e_2 + a_12 e_1 e_2, we can formulate the problem of batch normalization as that of whitening 4D vectors:
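The construction of R^{i,j} from a normalized filter can be sketched numerically. This is an illustrative helper (hypothetical name `rotation_matrix`, not the paper's code); for a unit quaternion the resulting matrix is a proper rotation, which we verify:

```python
import numpy as np

def rotation_matrix(w):
    """Rotation matrix built from filter components (w0, w1, w2, w12),
    normalized to a unit quaternion as in the rotational Clifford layer."""
    w0, w1, w2, w12 = w / np.linalg.norm(w)
    return np.array([
        [1 - 2*(w2**2 + w12**2), 2*(w1*w2 - w0*w12),     2*(w1*w12 + w0*w2)],
        [2*(w1*w2 + w0*w12),     1 - 2*(w1**2 + w12**2), 2*(w2*w12 - w0*w1)],
        [2*(w1*w12 - w0*w2),     2*(w2*w12 + w0*w1),     1 - 2*(w1**2 + w2**2)],
    ])

R = rotation_matrix(np.array([0.3, -1.2, 0.5, 2.0]))
assert np.allclose(R @ R.T, np.eye(3))       # orthogonal
assert np.isclose(np.linalg.det(R), 1.0)     # proper rotation (det = +1)
```

Applying R to the stacked (f_1, f_2, f_12) components then rotates the vector and bivector parts of the feature map, while the scalar response is computed separately.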

Batch normalization

ã = V^{-1/2} (a - E[a]) ,

where the covariance matrix V is

V = ( V_{a_0 a_0}   V_{a_0 a_1}   V_{a_0 a_2}   V_{a_0 a_12}
      V_{a_1 a_0}   V_{a_1 a_1}   V_{a_1 a_2}   V_{a_1 a_12}
      V_{a_2 a_0}   V_{a_2 a_1}   V_{a_2 a_2}   V_{a_2 a_12}
      V_{a_12 a_0}  V_{a_12 a_1}  V_{a_12 a_2}  V_{a_12 a_12} ) .  (51)

Now, for a transformation T acting on the grid and on the multivector components, with transformed feature map f'^j(y) = T f^j(T^{-1} y) and filters satisfying w^{i,j}(T(y - x)) = T w^{i,j}(y - x),

[f' ⋆ w^i](Tx) = Σ_{y ∈ Z^2} Σ_{j=1}^{c_in} f'^j(y) w^{i,j}(y - Tx)
= Σ_{y ∈ Z^2} Σ_{j=1}^{c_in} f'^j(y) w^{i,j}(T(T^{-1} y - x))
= Σ_{Ty' ∈ Z^2} Σ_{j=1}^{c_in} f'^j(Ty') w^{i,j}(T(y' - x)) ,  y' = T^{-1} y
= Σ_{y' ∈ Z^2} Σ_{j=1}^{c_in} f'^j(Ty') w^{i,j}(T(y' - x))
= Σ_{y' ∈ Z^2} Σ_{j=1}^{c_in} T f^j(y') T w^{i,j}(y' - x)
= Σ_{y' ∈ Z^2} Σ_{j=1}^{c_in} T ( f^j(y') w^{i,j}(y' - x) )
= T Σ_{y' ∈ Z^2} Σ_{j=1}^{c_in} f^j(y') w^{i,j}(y' - x) = T [f ⋆ w^i](x) ,

where in the third line we transform variables y → y', in the fourth line we use the invariance of the summation "measure" under T, in the fifth line we use the transformation property of f and the equivariance of w^i, in the sixth line we use the product property of multivectors, and in the last line we use the linearity of T.
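The whitening step can be sketched in NumPy. This is a minimal illustration of the statistics computation (hypothetical helper name; a trainable layer would additionally keep running estimates and learnable affine parameters, as in standard batch normalization):

```python
import numpy as np

def whiten_multivector(a, eps=1e-5):
    """Whiten stacked multivector components a of shape (4, N):
    subtract the mean and multiply by the inverse matrix square root
    of the 4x4 component covariance, following Equation 51."""
    mean = a.mean(axis=1, keepdims=True)
    centered = a - mean
    V = centered @ centered.T / a.shape[1] + eps * np.eye(4)
    # V^(-1/2) via eigendecomposition of the symmetric covariance
    evals, evecs = np.linalg.eigh(V)
    V_inv_sqrt = evecs @ np.diag(evals**-0.5) @ evecs.T
    return V_inv_sqrt @ centered

rng = np.random.default_rng(1)
# Components with very different scales, as for physical fields
a = rng.normal(size=(4, 2048)) * np.array([[1.0], [5.0], [0.2], [3.0]])
aw = whiten_multivector(a)
# After whitening, the component covariance is (approximately) the identity.
assert np.allclose(aw @ aw.T / a.shape[1], np.eye(4), atol=1e-2)
```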

B.5 CLIFFORD FOURIER LAYERS

We derive the implementation of Clifford Fourier layers for multivectors in G^2 and G^3, i.e. multivectors of Clifford algebras generated by the 2-dimensional vector space R^2 and the 3-dimensional vector space R^3. Classical Fourier transform. In arbitrary dimension n, the Fourier transform f̂(ξ) = F{f}(ξ) of a continuous n-dimensional complex-valued signal f(x) = f(x_1, ..., x_n) : R^n → C is defined as:

f̂(ξ) = F{f}(ξ) = 1/(2π)^{n/2} ∫_{R^n} f(x) e^{-2πi⟨x,ξ⟩} dx ,  ∀ξ ∈ R^n ,  (58)

provided that the integral exists, where x and ξ are n-dimensional vectors and ⟨x, ξ⟩ is the contraction of x and ξ. Usually, ⟨x, ξ⟩ is the inner product, and ξ is an element of the dual vector space R^{n⋆}. The inversion theorem states the back-transform from the frequency domain into the spatial domain:

f(x) = F^{-1}{F{f}}(x) = 1/(2π)^{n/2} ∫_{R^n} f̂(ξ) e^{2πi⟨x,ξ⟩} dξ ,  ∀x ∈ R^n .  (59)

We can rewrite the Fourier transform of Equation 58 in coordinates:

f̂(ξ_1, ..., ξ_n) = F{f}(ξ_1, ..., ξ_n) = 1/(2π)^{n/2} ∫_{R^n} f(x_1, ..., x_n) e^{-2πi(x_1 ξ_1 + ... + x_n ξ_n)} dx_1 ... dx_n .  (60)

Discrete/Fast Fourier transform. The discrete counterpart of Equation 58 transforms an n-dimensional complex signal f(x) = f(x_1, ..., x_n) : R^n → C at M_1 × ... × M_n grid points into its complex Fourier modes via:

f̂(ξ_1, ..., ξ_n) = F{f}(ξ_1, ..., ξ_n) = Σ_{m_1=0}^{M_1-1} ... Σ_{m_n=0}^{M_n-1} f(m_1, ..., m_n) e^{-2πi (m_1 ξ_1 / M_1 + ... + m_n ξ_n / M_n)} ,  (61)

where (ξ_1, ..., ξ_n) ∈ Z_{M_1} × ... × Z_{M_n}. Fast Fourier transforms (FFTs) (Cooley & Tukey, 1965; Van Loan, 1992) immensely accelerate the computation of Equation 61 by factorizing the discrete Fourier transform matrix into a product of sparse (mostly zero) factors.
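The discrete transform of Equation 61 (one-dimensional case) can be checked against a library FFT; NumPy's `fft` uses the same sign and normalization convention as the sum above:

```python
import numpy as np

def dft1d(f):
    """Equation 61 in one dimension: f_hat[xi] = sum_m f[m] exp(-2πi m xi / M)."""
    M = len(f)
    m = np.arange(M)
    return np.array([np.sum(f * np.exp(-2j * np.pi * m * xi / M))
                     for xi in range(M)])

rng = np.random.default_rng(2)
f = rng.normal(size=16) + 1j * rng.normal(size=16)
assert np.allclose(dft1d(f), np.fft.fft(f))   # matches the FFT result
```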

B.5.1 2D CLIFFORD FOURIER TRANSFORM

Analogous to Equation 58, for Cl(2,0)(R) the Clifford Fourier transform (Ebling & Scheuermann, 2005; Hitzer, 2012) and the respective inverse transform for multivector-valued functions f(x) : R^2 → G^2 and vectors x, ξ ∈ R^2 are defined as:

f̂(ξ) = F{f}(ξ) = 1/(2π) ∫_{R^2} f(x) e^{-2π i_2 ⟨x,ξ⟩} dx ,  ∀ξ ∈ R^2 ,  (62)
f(x) = F^{-1}{F{f}}(x) = 1/(2π) ∫_{R^2} f̂(ξ) e^{2π i_2 ⟨x,ξ⟩} dξ ,  ∀x ∈ R^2 ,  (63)

provided that the integrals exist. The differences to Equations 58 and 59 are that f(x) and f̂(ξ) represent multivector fields in the spatial and the frequency domain, respectively, and that the pseudoscalar i_2 = e_1 e_2 is used in the exponent. Inserting the definition of multivector fields, we can rewrite Equation 62 as:

F{f}(ξ) = 1/(2π) ∫_{R^2} f(x) e^{-2π i_2 ⟨x,ξ⟩} dx
= 1/(2π) ∫_{R^2} [ 1 (f_0(x) + f_12(x) i_2) [spinor part] + e_1 (f_1(x) + f_2(x) i_2) [vector part] ] e^{-2π i_2 ⟨x,ξ⟩} dx
= 1/(2π) ∫_{R^2} 1 (f_0(x) + f_12(x) i_2) e^{-2π i_2 ⟨x,ξ⟩} dx + 1/(2π) ∫_{R^2} e_1 (f_1(x) + f_2(x) i_2) e^{-2π i_2 ⟨x,ξ⟩} dx
= 1 F{f_0(x) + f_12(x) i_2}(ξ) + e_1 F{f_1(x) + f_2(x) i_2}(ξ) .  (64)

We obtain a Clifford Fourier transform by applying two standard Fourier transforms to the dual pairs f_0 = f_0(x) + f_12(x) i_2 and f_1 = f_1(x) + f_2(x) i_2, both of which can be treated as complex-valued signals f_0, f_1 : R^2 → C. Consequently, f(x) can be understood as an element of C^2, and the 2D Clifford Fourier transform is the linear combination of two classical Fourier transforms. The discretized version for the spinor/vector part f̂_{s/v} reads analogously to Equation 61:

f̂_{s/v}(ξ_1, ξ_2) = F{f_{s/v}}(ξ_1, ξ_2) = Σ_{m_1=0}^{M_1-1} Σ_{m_2=0}^{M_2-1} f_{s/v}(m_1, m_2) e^{-2π i_2 (m_1 ξ_1 / M_1 + m_2 ξ_2 / M_2)} ,  (65)

where again (ξ_1, ξ_2) ∈ Z_{M_1} × Z_{M_2}. Similar to Fourier Neural Operators (FNOs), where weight tensors are applied pointwise in Fourier space, we apply multivector weight tensors W ∈ (G^2)^{c_in × c_out × (ξ_1^max × ξ_2^max)} point-wise. Fourier modes above the cut-off frequencies (ξ_1^max, ξ_2^max) are set to zero.
In doing so, we modify the Clifford Fourier modes

f̂(ξ) = F{f}(ξ) = f̂_0(ξ) + f̂_1(ξ) e_1 + f̂_2(ξ) e_2 + f̂_12(ξ) e_12  (66)

via the geometric product. The Clifford Fourier modes follow naturally when combining the spinor and vector parts of Equation 64. Analogously to FNOs, higher-order modes are cut off. Finally, the residual connections used in FNO layers are replaced by a multivector weight matrix realized as a Clifford convolution, ideally a Cl_{2,0}(R) convolution layer. A schematic sketch of a Clifford Fourier layer is shown in Figure 5b of the main paper. For Cl(0,2)(R), the vector part changes to e_1 (f_1(x) - f_2(x) i_2).
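The dual-pair construction behind the 2D Clifford Fourier transform can be sketched directly: pack the spinor pair f_0 + f_12 i_2 and the vector pair f_1 + f_2 i_2 as complex fields, apply two standard complex FFTs, and recover the multivector with the inverse transforms. This is an illustrative sketch (hypothetical function names), not the paper's layer, which additionally multiplies the retained modes by multivector weights:

```python
import numpy as np

def clifford_fft2(f0, f1, f2, f12):
    """2D Clifford Fourier transform for Cl(2,0) via two complex FFTs."""
    spinor = np.fft.fft2(f0 + 1j * f12)   # dual pair f0 + f12 * i2
    vector = np.fft.fft2(f1 + 1j * f2)    # dual pair f1 + f2  * i2
    return spinor, vector

def clifford_ifft2(spinor, vector):
    """Inverse transform: unpack the complex fields back into components."""
    s = np.fft.ifft2(spinor)
    v = np.fft.ifft2(vector)
    return s.real, v.real, v.imag, s.imag   # f0, f1, f2, f12

rng = np.random.default_rng(4)
f0, f1, f2, f12 = rng.normal(size=(4, 16, 16))
rec = clifford_ifft2(*clifford_fft2(f0, f1, f2, f12))
assert all(np.allclose(r, x) for r, x in zip(rec, (f0, f1, f2, f12)))
```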

B.5.2 2D CLIFFORD CONVOLUTION THEOREM

In contrast to Ebling & Scheuermann (2005), we prove the 2D Clifford convolution theorem for multivector-valued filters applied from the right, such that the filter operations are consistent with Clifford convolution layers. We first need to show that the Clifford Fourier kernel commutes with the spinor part and anti-commutes with the vector part of multivectors. We can write the product a e^{i_2 s} for every scalar s ∈ R and multivector a ∈ G^2 as

a e^{i_2 s} = a (cos(s) + i_2 sin(s)) .  (67)

For the basis of the spinor part we obtain 1 i_2 = i_2 1, and for the basis of the vector part e_1 i_2 = e_1 e_1 e_2 = -e_1 e_2 e_1 = -i_2 e_1. Thus, the Fourier kernel e^{-2π i_2 ⟨x,ξ⟩} commutes with the spinor part and anti-commutes with the vector part of a, both for Cl(2,0)(R) and Cl(0,2)(R). We therefore prove the convolution theorem separately for the commuting spinor part and the anti-commuting vector part of a. Theorem 2: 2D Clifford convolution theorem. Let the field f : R^2 → G^2 be multivector valued, the filter k_s : R^2 → G^2 be spinor valued, and the filter k_v : R^2 → G^2 be vector valued, and let F{f}, F{k_s}, F{k_v} exist. Then

F{f ⋆ k_s}(ξ) = F{f}(ξ) • F^†{k_s}(ξ) ,
F{f ⋆ k_v}(ξ) = F{f}(ξ) • F{k_v}(ξ) ,

where F^†{k_s}(ξ) = F{k_s}(-ξ) and F^†{k_v}(ξ) = F{k_v}(-ξ). Proof.

F{f ⋆ k_s}(ξ) = 1/(2π)^2 ∫_{R^2} ( ∫_{R^2} f(y) k_s(y - x) dy ) e^{-2π i_2 ⟨x,ξ⟩} dx
= 1/(2π)^2 ∫_{R^2} f(y) ( ∫_{R^2} k_s(y - x) e^{-2π i_2 ⟨x,ξ⟩} dx ) dy
= 1/(2π)^2 ∫_{R^2} f(y) ( ∫_{R^2} k_s(x) e^{-2π i_2 ⟨y - x,ξ⟩} dx ) dy  [using F^†{k_s}(ξ) e^{-2π i_2 ⟨y,ξ⟩} = e^{-2π i_2 ⟨y,ξ⟩} F^†{k_s}(ξ)]
= ( 1/(2π) ∫_{R^2} f(y) e^{-2π i_2 ⟨y,ξ⟩} dy ) F^†{k_s}(ξ)
= F{f}(ξ) • F^†{k_s}(ξ) .  (68)

F{f ⋆ k_v}(ξ) = 1/(2π)^2 ∫_{R^2} ( ∫_{R^2} f(y) k_v(y - x) dy ) e^{-2π i_2 ⟨x,ξ⟩} dx
= 1/(2π)^2 ∫_{R^2} f(y) ( ∫_{R^2} k_v(y - x) e^{-2π i_2 ⟨x,ξ⟩} dx ) dy
= 1/(2π)^2 ∫_{R^2} f(y) ( ∫_{R^2} k_v(x) e^{-2π i_2 ⟨y - x,ξ⟩} dx ) dy  [using F^†{k_v}(ξ) e^{2π i_2 ⟨y,ξ⟩} = e^{-2π i_2 ⟨y,ξ⟩} F{k_v}(ξ), where -ξ → ξ]
= ( 1/(2π) ∫_{R^2} f(y) e^{-2π i_2 ⟨y,ξ⟩} dy ) F{k_v}(ξ)
= F{f}(ξ) • F{k_v}(ξ) .  (69)

B.5.3 3D CLIFFORD FOURIER TRANSFORM

For Cl(3,0)(R), analogous to Equation 58, the Clifford Fourier transform (Ebling & Scheuermann, 2005) and the respective inverse transform for multivector-valued functions f : R^3 → G^3 and vectors x, ξ ∈ R^3 are defined as:

f̂(ξ) = F{f}(ξ) = 1/(2π)^{3/2} ∫_{R^3} f(x) e^{-2π i_3 ⟨x,ξ⟩} dx ,  ∀ξ ∈ R^3 ,  (70)
f(x) = F^{-1}{F{f}}(x) = 1/(2π)^{3/2} ∫_{R^3} f̂(ξ) e^{2π i_3 ⟨x,ξ⟩} dξ ,  ∀x ∈ R^3 ,  (71)

provided that the integrals exist. A multivector-valued function f : R^3 → G^3,

f = f_0 + f_1 e_1 + f_2 e_2 + f_3 e_3 + f_12 e_12 + f_13 e_13 + f_23 e_23 + f_123 e_123 ,  (72)

can be expressed via the pseudoscalar i_3 = e_1 e_2 e_3 as:

f = (f_0 + f_123 i_3) 1 + (f_1 + f_23 i_3) e_1 + (f_2 + f_31 i_3) e_2 + (f_3 + f_12 i_3) e_3 .  (73)

We obtain a 3-dimensional Clifford Fourier transform by applying four standard Fourier transforms to the four dual pairs f_0 = f_0(x) + f_123(x) i_3, f_1 = f_1(x) + f_23(x) i_3, f_2 = f_2(x) + f_31(x) i_3, and f_3 = f_3(x) + f_12(x) i_3, all of which can be treated as complex-valued signals f_0, f_1, f_2, f_3 : R^3 → C. Consequently, f(x) can be understood as an element of C^4. The 3D Clifford Fourier transform is the linear combination of four classical Fourier transforms:

F{f}(ξ) = 1/(2π)^{3/2} ∫_{R^3} f(x) e^{-2π i_3 ⟨x,ξ⟩} dx
= 1/(2π)^{3/2} ∫_{R^3} [ 1 (f_0(x) + f_123(x) i_3) + e_1 (f_1(x) + f_23(x) i_3) + e_2 (f_2(x) + f_31(x) i_3) + e_3 (f_3(x) + f_12(x) i_3) ] e^{-2π i_3 ⟨x,ξ⟩} dx
= 1 F{f_0(x) + f_123(x) i_3}(ξ) + e_1 F{f_1(x) + f_23(x) i_3}(ξ) + e_2 F{f_2(x) + f_31(x) i_3}(ξ) + e_3 F{f_3(x) + f_12(x) i_3}(ξ) .  (74)

Analogous to the 2-dimensional Clifford Fourier transform, we apply multivector weight tensors W ∈ (G^3)^{c_in × c_out × (ξ_1^max × ξ_2^max × ξ_3^max)} point-wise.
Fourier modes above the cut-off frequencies (ξ_1^max, ξ_2^max, ξ_3^max) are set to zero. In doing so, we modify the Clifford Fourier modes

f̂(ξ) = F{f}(ξ) = f̂_0(ξ) + f̂_1(ξ) e_1 + f̂_2(ξ) e_2 + f̂_3(ξ) e_3 + f̂_12(ξ) e_12 + f̂_31(ξ) e_31 + f̂_23(ξ) e_23 + f̂_123(ξ) e_123  (75)

via the geometric product. The Clifford Fourier modes follow naturally when combining the four parts of Equation 74. Finally, the residual connections used in FNO layers are replaced by a multivector weight matrix realized as a Clifford convolution, ideally a Cl_{3,0}(R) convolution layer. For other 3-dimensional Clifford algebras, the signs of the dual pairs in Equation 73 change accordingly.
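The packing of the four dual pairs of Equation 73 into complex signals can be sketched as follows (illustrative function name, not the paper's code). Note that the component list of Equation 72 stores f_13 for the basis element e_13 = e_1 e_3, while the dual pair of e_2 uses f_31 = -f_13, hence the sign flip in the third pair:

```python
import numpy as np

def clifford_fft3(f):
    """3D Clifford Fourier transform for Cl(3,0) via four complex FFTs
    on the dual pairs of Equation 73. f stacks the eight components
    (f0, f1, f2, f3, f12, f13, f23, f123) along the first axis."""
    f0, f1, f2, f3, f12, f13, f23, f123 = f
    return (np.fft.fftn(f0 + 1j * f123),   # 1-part:  f0 + f123 i3
            np.fft.fftn(f1 + 1j * f23),    # e1-part: f1 + f23  i3
            np.fft.fftn(f2 - 1j * f13),    # e2-part: f2 + f31  i3 = f2 - f13 i3
            np.fft.fftn(f3 + 1j * f12))    # e3-part: f3 + f12  i3

f = np.random.default_rng(5).normal(size=(8, 8, 8, 8))
modes = clifford_fft3(f)
# The inverse FFT of the first dual pair recovers f0 (real) and f123 (imaginary).
assert np.allclose(np.fft.ifftn(modes[0]).real, f[0])
assert np.allclose(np.fft.ifftn(modes[0]).imag, f[7])
```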

B.5.4 3D CLIFFORD CONVOLUTION THEOREM

This theorem is adapted from Ebling & Scheuermann (2005). First, we again check whether the Clifford kernel commutes with the different parts of multivectors. We can write the product $a e^{i_3 s}$ for every scalar $s \in \mathbb{R}$ and multivector $a \in G^3$ as
$$a e^{i_3 s} = a \big( \cos(s) + i_3 \sin(s) \big)\ . \qquad (76)$$
Next, we check whether the different basis vectors of the Fourier transforms of Equation 74 commute with the pseudoscalar $i_3$:
$$1\, i_3 = i_3\, 1\ \checkmark \qquad e_1 i_3 = i_3 e_1\ \checkmark \qquad e_2 i_3 = i_3 e_2\ \checkmark \qquad e_3 i_3 = i_3 e_3\ \checkmark$$
In contrast to the 2-dimensional Clifford Fourier transform, now all four parts of the multivector of Equation 73 commute with $i_3$. This holds for all 3-dimensional Clifford algebras.

Theorem 3: 3D Clifford convolution theorem. Let the field $f: \mathbb{R}^3 \to G^3$ and the filter $k_a: \mathbb{R}^3 \to G^3$ be multivector-valued, and let $\mathcal{F}\{f\}$, $\mathcal{F}\{k_a\}$ exist. Then
$$\mathcal{F}\{f \star k_a\}(\xi) = \mathcal{F}\{f\}(\xi) \cdot \mathcal{F}^{\dagger}\{k_a\}(\xi)\ , \qquad \text{where } \mathcal{F}^{\dagger}\{k_a\}(\xi) = \mathcal{F}\{k_a\}(-\xi)\ .$$
Proof. Substituting $x \to y - x$ in the inner integral:
$$\begin{aligned}
\mathcal{F}\{f \star k_a\}(\xi) &= \frac{1}{(2\pi)^3} \int_{\mathbb{R}^3} \int_{\mathbb{R}^3} f(y)\, k_a(y - x)\, dy\; e^{-2\pi i_3 \langle x, \xi \rangle}\, dx \\
&= \frac{1}{(2\pi)^3} \int_{\mathbb{R}^3} f(y) \int_{\mathbb{R}^3} k_a(y - x)\, e^{-2\pi i_3 \langle x, \xi \rangle}\, dx\, dy \\
&= \frac{1}{(2\pi)^3} \int_{\mathbb{R}^3} f(y) \underbrace{\int_{\mathbb{R}^3} k_a(x)\, e^{-2\pi i_3 \langle y - x, \xi \rangle}\, dx}_{\mathcal{F}^{\dagger}\{k_a\}(\xi)\, e^{-2\pi i_3 \langle y, \xi \rangle} \;=\; e^{-2\pi i_3 \langle y, \xi \rangle}\, \mathcal{F}^{\dagger}\{k_a\}(\xi)}\, dy \\
&= \frac{1}{(2\pi)^{3/2}} \int_{\mathbb{R}^3} f(y)\, e^{-2\pi i_3 \langle y, \xi \rangle}\, dy\; \mathcal{F}^{\dagger}\{k_a\}(\xi) \\
&= \mathcal{F}\{f\}(\xi) \cdot \mathcal{F}^{\dagger}\{k_a\}(\xi)\ . \qquad \square
\end{aligned}$$
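Because every dual pair behaves like a single complex-valued signal and $i_3$ commutes with all parts, the convolution theorem can be sanity-checked component-wise with classical discrete Fourier transforms. The following sketch verifies the discrete 1D analogue, with $\mathcal{F}^{\dagger}\{k\}(\xi)$ realized as the FFT of $k$ evaluated at $-\xi$ (all names are illustrative):

```python
import numpy as np

# Discrete 1D analogue of Theorem 3, checked per complex component:
# for the circular cross-correlation (f * k)[n] = sum_m f[m] k[m - n],
# its DFT equals F{f} times F{k} evaluated at the reflected frequency.
rng = np.random.default_rng(0)
N = 8
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
k = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# circular cross-correlation, computed directly from the definition
corr = np.array([sum(f[m] * k[(m - n) % N] for m in range(N))
                 for n in range(N)])

F = np.fft.fft(f)
K_dag = np.fft.fft(k)[(-np.arange(N)) % N]  # F†{k}(ξ) = F{k}(−ξ)
lhs, rhs = np.fft.fft(corr), F * K_dag
```

The check passes for complex signals as well, which matters here: unlike for real signals, $\mathcal{F}\{k\}(-\xi)$ is generally not the complex conjugate of $\mathcal{F}\{k\}(\xi)$.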

B.5.5 IMPLEMENTATION OF CLIFFORD FOURIER LAYERS

We implement a 2D Clifford Fourier layer by applying two standard Fourier transforms to the dual pairs of Equation 11, which can be treated as complex-valued inputs. Similarly, we implement a 3D Clifford Fourier layer by applying four standard Fourier transforms to the dual pairs of e.g. $\mathrm{Cl}_{3,0}$ (Equation 37 - Equation 40). Since the Clifford convolution theorems hold for the vector and spinor parts of $\mathrm{Cl}_{2,0}$ and for the four dual pairs of $\mathrm{Cl}_{3,0}$, respectively, we multiply the modes in Fourier space using the geometric product. Finally, we apply an inverse Fourier transform and reassemble the multivectors in the spatial domain.
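The geometric product used to multiply Fourier modes can be sketched for $\mathrm{Cl}_{2,0}$ multivectors stored as four stacked components; the function name and array layout below are illustrative choices, not the paper's implementation:

```python
import numpy as np

def geometric_product_cl20(a, b):
    """Pointwise geometric product in Cl(2,0).

    a, b: arrays of shape (4, ...) holding multivector components
    (scalar, e1, e2, e12), with e1^2 = e2^2 = 1 and e12^2 = -1.
    Works for real or complex entries, so it can be applied directly
    to Clifford Fourier modes.
    """
    a0, a1, a2, a12 = a
    b0, b1, b2, b12 = b
    return np.stack([
        a0 * b0 + a1 * b1 + a2 * b2 - a12 * b12,  # scalar part
        a0 * b1 + a1 * b0 - a2 * b12 + a12 * b2,  # e1 part
        a0 * b2 + a1 * b12 + a2 * b0 - a12 * b1,  # e2 part
        a0 * b12 + a1 * b2 - a2 * b1 + a12 * b0,  # e12 part
    ])
```

For example, the product of the basis vectors $e_1 e_2$ yields the bivector $e_{12}$, and $e_{12} e_{12} = -1$, which the sketch reproduces.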

B.6 PSEUDOCODE

Algorithm 1 sketches the implementation of a Clifford convolution, Algorithm 2 of a rotational Clifford convolution, and Algorithm 3 of a Clifford Fourier layer.

1: function CLIFFORDKERNEL2D(W)
2:     kernel ← [ W[0]   W[1]   W[2]  −W[3]
                  W[1]   W[0]  −W[3]   W[2]
                  W[2]   W[3]   W[0]  −W[1]
                  W[3]   W[2]  −W[1]   W[0] ]
3:     return kernel
4: function CLIFFORDCONV2D(W, x)
5:     kernel ← CLIFFORDKERNEL2D(W)
6:     x ← VIEW AS REALVECTOR(x)
7:     output ← CONV2D(kernel, x)
8:     return VIEW AS MULTIVECTOR(output)

Algorithm 1: Pseudocode for 2D Clifford convolution using Cl 2,0 (R).

1: function CLIFFORDKERNEL2D ROT(W)
2:     sq12 ← W[1]² + W[2]²
3:     sq13 ← W[1]² + W[3]²
4:     sq23 ← W[2]² + W[3]²
5:     sumsq ← W[0]² + W[1]² + W[2]² + W[3]² + ϵ
6:     rot12 ← W[0]W[1]/sumsq
7:     rot13 ← W[0]W[2]/sumsq
8:     rot14 ← W[0]W[3]/sumsq
9:     rot23 ← W[1]W[2]/sumsq
10:    rot24 ← W[1]W[3]/sumsq
11:    rot34 ← W[2]W[3]/sumsq
12:    kernel ← [ W[0]  −W[1]                 −W[2]                 −W[3]
                  W[5]   W[4](1.0 − sq23)      W[4](rot23 − rot14)   W[4](rot24 + rot13)
                  W[5]   W[4](rot23 + rot14)   W[4](1.0 − sq13)      W[4](rot34 − rot12)
                  W[5]   W[4](rot24 − rot13)   W[4](rot34 + rot12)   W[4](1.0 − sq12) ]
13:    return kernel
14: function CLIFFORDCONV2D ROT(W, x)
15:    kernel ← CLIFFORDKERNEL2D ROT(W)
16:    x ← VIEW AS REALVECTOR(x)
17:    output ← CONV2D(kernel, x)
18:    return VIEW AS MULTIVECTOR(output)

Algorithm 2: Pseudocode for 2D rotational Clifford convolution using Cl 0,2 (R).

1: function CLIFFORDSPECTRALCONV2D(W, x, m1, m2)
2:     xv, xs ← VIEW AS DUAL PARTS(x)
3:     f(xv) ← FFT2(xv)    ▷ complex 2D FFT of vector part
4:     f(xs) ← FFT2(xs)    ▷ complex 2D FFT of scalar (spinor) part
5:     f*(xv) ← [ f(xv)[..., :m1, :m2]   f(xv)[..., :m1, −m2:]
                  f(xv)[..., −m1:, :m2]  f(xv)[..., −m1:, −m2:] ]    ▷ vector modes
6:     f*(xs) ← [ f(xs)[..., :m1, :m2]   f(xs)[..., :m1, −m2:]
                  f(xs)[..., −m1:, :m2]  f(xs)[..., −m1:, −m2:] ]    ▷ scalar modes
7:     f*(x) ← f*(xs).r + f*(xv).r + f*(xv).i + f*(xs).i    ▷ multivector Fourier modes
8:     f*(x) ← f*(x) W    ▷ geometric product in Fourier space
9:     xv ← IFFT2(f*(x)[1] + i f*(x)[2])    ▷ inverse 2D FFT of vector part
10:    xs ← IFFT2(f*(x)[0] + i f*(x)[3])    ▷ inverse 2D FFT of scalar part
11:    x ← VIEW AS MULTIVECTOR(xv, xs)
12:    return x
13: function CLIFFORDFOURIERLAYER2D(Wf, Wc, x)
14:    y1 ← CLIFFORDSPECTRALCONV2D(Wf, x, m1, m2)
15:    x2 ← VIEW AS REALVECTOR(x)
16:    y2 ← CLIFFORDCONV2D(Wc, x2)
17:    y2 ← VIEW AS MULTIVECTOR(y2)
18:    out ← ACTIVATION(y1 + y2)
19:    return out

Algorithm 3: Pseudocode for 2D Clifford Fourier layer using Cl 2,0 (R).

Fourier Neural Operator architectures. For 2-dimensional experiments, our FNO architectures overall use 128 hidden channels. We used GeLU activation functions (Hendrycks & Gimpel, 2016). Additional shortcut connections or normalization techniques, such as batch norm or group norm, did not improve performance; neither did larger numbers of hidden channels nor more FNO blocks. Overall, this resulted in roughly 140 million parameters for FNO-based architectures. For 3-dimensional Maxwell experiments, we used 3-dimensional Fourier Neural Operators (FNOs) consisting of 4 FNO blocks, two embedding and two output layers. Each FNO block comprised a 3D convolution path with a 1 × 1 kernel and an FFT path. We used 6 Fourier modes (for the x, y, and z components) for point-wise weight multiplication, and overall used 96 hidden channels. Interestingly, using more layers or more Fourier modes degraded performance. As in the 2D experiments, we applied GeLU activation functions and used neither shortcut connections nor normalization techniques such as batch norm or group norm. Overall, this resulted in roughly 65 million parameters for FNO-based architectures. Clifford Fourier Neural Operator architectures. 
For every FNO-based experiment, we replaced each layer of the fine-tuned FNO architecture with its Clifford counterpart. To keep the number of weights similar, the resulting architectures have 48 multivector channels instead of 128 channels, resulting in roughly the same number of parameters. Additionally, GeLU activation functions are replaced with Clifford GeLU activation functions. Using Clifford initialization techniques did not improve results. For 3-dimensional Maxwell experiments, we replaced each 3D Fourier transform layer with a 3D Clifford Fourier layer and each 3D convolution with a respective Clifford convolution. We also use 6 Fourier modes (for the x, y, and z components) for point-wise weight multiplication, and overall used 32 hidden multivector channels, which results in roughly the same number of parameters (roughly 55 million). In contrast to the 2-dimensional implementations, Clifford initialization techniques proved important for 3-dimensional architectures. Most notably, too-large initial values of the weights of Clifford convolution layers hindered gradient flow through the Clifford Fourier operations.
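The kernel assembly behind the Clifford convolutions of Algorithm 1 can be sketched as follows. In the actual layers, every entry W[i] is a full (c_out × c_in × k × k) filter bank and the 4 × 4 sign pattern acts block-wise on stacked component channels; this standalone version uses plain scalars:

```python
import numpy as np

def clifford_kernel_2d(w):
    """Assemble the 4x4 weight pattern of Algorithm 1 from a Cl(2,0)
    multivector weight w = (w0, w1, w2, w12).

    Applied to the stacked components of a multivector x, the matrix
    reproduces the geometric product of x with w, which is why a
    Clifford convolution can reuse one standard real-valued convolution.
    """
    w0, w1, w2, w3 = w
    return np.array([
        [w0,  w1,  w2, -w3],
        [w1,  w0, -w3,  w2],
        [w2,  w3,  w0, -w1],
        [w3,  w2, -w1,  w0],
    ])

# sanity check: for w = e1 and x = e2, the product e2 e1 equals -e12
kernel = clifford_kernel_2d(np.array([0.0, 1.0, 0.0, 0.0]))
out = kernel @ np.array([0.0, 0.0, 1.0, 0.0])
```

The repeated weight blocks also make the parameter efficiency visible: the 4 × 4 pattern is determined by only 4 free scalars instead of 16.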

C.3 TRAINING AND MODEL SELECTION.

We optimized models using the Adam optimizer (Kingma & Ba, 2014) with learning rates [10⁻⁴, 2 · 10⁻⁴, 5 · 10⁻⁴] for 50 epochs and minimized the summed mean squared error (SMSE) outlined in Equation 79. We used cosine annealing with a linear warmup as learning rate scheduler (Loshchilov & Hutter, 2016). For baseline ResNet models, we optimized the number of layers, the number of channels, and the normalization procedures, and further tested different activation functions. For baseline FNO models, we optimized the number of layers, the number of channels, and the number of Fourier modes. Larger numbers of layers or channels did not improve performance for either ResNet or FNO models. For the respective Clifford counterparts, we exchanged convolution and Fourier layers for Clifford convolution and Clifford Fourier layers, used Clifford normalization schemes, and decreased the number of layers to obtain similar numbers of parameters. We could have optimized the Clifford architectures slightly more, e.g. by using different numbers of hidden layers than the baseline models. However, this would (i) somewhat undercut the argument for "plug-and-play" replacement layers, and (ii) would have added considerable computational overhead. Finally, we are quite confident that the architectures used are very close to the optimum for the current tasks. Computational resources. All FNO and CFNO experiments used 4 × 16 GB NVIDIA V100 machines for training. All ResNet and Clifford ResNet experiments used 8 × 32 GB NVIDIA V100 machines. Average training times varied between 3 h and 48 h, depending on the task and the number of trajectories. Clifford runs on average took twice as long to train for equivalent architectures and epochs.
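The learning-rate schedule (cosine annealing with a linear warmup) can be sketched as a plain function; the base learning rate below matches one of the tested values, while the warmup length is an illustrative choice, not a tuned value from our experiments:

```python
import math

def lr_at(step, total_steps, base_lr=2e-4, warmup_steps=500):
    """Learning rate with linear warmup followed by cosine annealing.

    Warms up linearly to base_lr over warmup_steps, then decays along
    a half cosine to zero at total_steps (an illustrative sketch).
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))
```

In practice, PyTorch's built-in scheduler classes can compose the same shape; the closed form above just makes the shape explicit.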

C.4 NAVIER-STOKES IN 2D

The incompressible Navier-Stokes equations are built upon momentum and mass conservation of fluids. Momentum conservation yields, for the velocity field $v$,
$$\frac{\partial v}{\partial t} = -v \cdot \nabla v + \mu \nabla^2 v - \nabla p + f\ , \qquad (80)$$
where $v \cdot \nabla v$ is the convection term, $\mu \nabla^2 v$ the viscosity term, $\nabla p$ the internal pressure gradient, and $f$ an external force. Convection is the rate of change of a vector field along a vector field (here, along itself); viscosity is the diffusion of a vector field, i.e. the net movement from regions of higher to lower concentration, with $\mu$ the viscosity coefficient. The incompressibility constraint yields mass conservation via
$$\nabla \cdot v = 0\ .$$
In addition to the velocity field $v(x)$, we introduce a scalar field $s(x)$ representing a scalar quantity that is transported through the velocity field. For example, $v$ might represent the velocity of air inside a room, and $s$ the concentration of smoke. As the vector field changes, the scalar field is transported along it, i.e. advected by the vector field. Similar to convection, advection is the transport of a scalar field along a vector field:
$$\frac{ds}{dt} = -v \cdot \nabla s\ .$$
We implement the 2D Navier-Stokes equations using ΦFlow (Holl et al., 2020). Solutions are propagated in time by solving for the pressure field and subtracting its spatial gradients afterwards. Semi-Lagrangian advection (convection) is used for $v$, and MacCormack advection for $s$. Additionally, we express the external buoyancy force $f$ in Equation 80 as a force acting on the scalar field. Solutions are obtained using the Boussinesq approximation (Kleinstreuer, 1997), which ignores density differences except where they appear in terms multiplied by the acceleration due to gravity. The essence of the Boussinesq approximation is that the difference in inertia is negligible, but gravity is sufficiently strong to make the specific weight appreciably different between the two fluids. Equation details. 
We obtain data for the 2D Navier-Stokes equations on a grid with spatial resolution 128 × 128 (∆x = ∆y = 0.25) and temporal resolution ∆t = 1.5 s. The equations are solved on a closed domain with Dirichlet boundary conditions (v = 0) for the velocity and Neumann boundaries ∂s/∂x = 0 for the scalar smoke field. The viscosity parameter is set to ν = 0.01, and a buoyancy factor of (0, 0.5)ᵀ is used. The scalar field is initialized with random Gaussian noise fluctuations, and the velocity field is initialized to 0. We run the simulation for 21 s and sample every 1.5 s; trajectories thus contain scalar and vector fields at 14 different time points. Results. Results are summarized in Figures 9 and 10 and detailed in Table 1. Figure 11 displays examples of Navier-Stokes rollouts of scalar and vector fields obtained by Clifford Fourier surrogates and contrasts them with ground-truth trajectories. For ResNet-like architectures, we observe that both CResNet and CResNet rot improve upon the ResNet baseline. Rollout losses are also lower for the two Clifford-based architectures, which we attribute to better and more stable models that do not overfit to one-step predictions as easily. Lastly, while CResNet and CResNet rot based architectures are in principle equally flexible, CResNet rot architectures generally perform better. For FNO and the respective Clifford Fourier based (CFNO) architectures, the loss is in general much lower than for ResNet-based architectures. CFNO architectures improve upon FNO architectures for all dataset sizes, for one-step as well as rollout losses. 
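For illustration, the semi-Lagrangian advection step described above can be sketched on a periodic grid as follows; this is a didactic NumPy version, and the actual ΦFlow implementation differs in boundary handling and interpolation details:

```python
import numpy as np

def semi_lagrangian_advect(s, vx, vy, dt, dx=1.0):
    """One semi-Lagrangian advection step for a scalar field s.

    For each grid point, trace back along the velocity (vx, vy) to the
    departure point and sample s there with bilinear interpolation,
    wrapping around periodically.
    """
    ny, nx = s.shape
    yy, xx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    xb = xx - vx * dt / dx          # departure points, in grid units
    yb = yy - vy * dt / dx
    x0, y0 = np.floor(xb).astype(int), np.floor(yb).astype(int)
    fx, fy = xb - x0, yb - y0       # fractional offsets for interpolation

    def at(iy, ix):
        return s[iy % ny, ix % nx]  # periodic lookup

    return ((1 - fy) * ((1 - fx) * at(y0, x0) + fx * at(y0, x0 + 1))
            + fy * ((1 - fx) * at(y0 + 1, x0) + fx * at(y0 + 1, x0 + 1)))
```

With a uniform unit velocity in x and dt = 1, the step reduces to a circular shift of the field by one grid cell, which makes the scheme easy to sanity-check.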
Table 1: One-step losses (scalar, vector, and combined) and rollout losses on 2D Navier-Stokes for varying numbers of training trajectories.

| Method | # Traj. | One-step (scalar) | One-step (vector) | One-step (combined) | Rollout |
|---|---|---|---|---|---|
| — | — | 0.00234 ± 0.00014 | 0.00857 ± 0.00066 | 0.01087 ± 0.00074 | 0.094 ± 0.00071 |
| ResNet | 10400 | 0.00321 ± 0.00004 | 0.01337 ± 0.00044 | 0.01653 ± 0.00048 | 0.138 ± 0.00223 |
| CResNet | 10400 | 0.00315 ± 0.00006 | 0.01162 ± 0.00019 | 0.01473 ± 0.00018 | 0.106 ± 0.00246 |
| CResNet rot | 10400 | 0.00201 ± 0.00020 | 0.00719 ± 0.00074 | 0.00917 ± 0.00090 | 0.100 ± 0.00229 |
| ResNet | 15600 | 0.00342 ± 0.00003 | 0.01379 ± 0.00079 | 0.01716 ± 0.00091 | 0.130 ± 0.00379 |
| CResNet | 15600 | 0.00285 ± 0.00019 | 0.01076 ± 0.00051 | 0.01357 ± 0.00063 | 0.103 ± 0.00269 |
| CResNet rot | 15600 | 0.00204 ± 0.00014 | 0.00736 ± 0.00069 | 0.00938 ± 0.00087 | 0.097 ± 0.00139 |
| FNO | 2080 | 0.00318 ± 0.00021 | 0.00613 ± 0.00044 | 0.00931 ± 0.00064 | 0.042 ± 0.00300 |
| CFNO | 2080 | 0.00266 ± 0.00002 | 0.00484 ± 0.00006 | 0.00749 ± 0.00008 | 0.034 ± 0.00031 |
| FNO | 5200 | 0.00204 ± 0.00004 | 0.00332 ± 0.00011 | 0.00536 ± 0.00015 | 0.026 ± 0.00067 |
| CFNO | 5200 | 0.00189 ± 0.00001 | 0.00293 ± 0.00002 | 0.00482 ± 0.00003 | 0.024 ± 0.00012 |
| FNO | 10400 | 0.00156 ± 0.00003 | 0.00220 ± 0.00007 | 0.00375 ± 0.00010 | 0.020 ± 0.00042 |
| CFNO | 10400 | 0.00148 ± 0.00001 | 0.00205 ± 0.00001 | 0.00353 ± 0.00002 | 0.018 ± 0.00006 |

C.5 SHALLOW WATER EQUATIONS

The shallow water equations (Vreugdenhil, 1994) describe a thin layer of fluid of constant density in hydrostatic balance, bounded from below by the bottom topography and from above by a free surface. For example, the deep-water propagation of a tsunami can be described by the shallow water equations, and so can a simple weather model. The shallow water equations read:
$$\frac{\partial v_x}{\partial t} + v_x \frac{\partial v_x}{\partial x} + v_y \frac{\partial v_x}{\partial y} + g \frac{\partial \eta}{\partial x} = 0\ ,$$
$$\frac{\partial v_y}{\partial t} + v_x \frac{\partial v_y}{\partial x} + v_y \frac{\partial v_y}{\partial y} + g \frac{\partial \eta}{\partial y} = 0\ ,$$
$$\frac{\partial \eta}{\partial t} + \frac{\partial}{\partial x}\Big[(\eta + h)\, v_x\Big] + \frac{\partial}{\partial y}\Big[(\eta + h)\, v_y\Big] = 0\ ,$$
where $v_x$ is the velocity in the x-direction (zonal velocity), $v_y$ the velocity in the y-direction (meridional velocity), $g$ the acceleration due to gravity, $\eta(x, y)$ the vertical displacement of the free surface (subsequently used to derive pressure fields), and $h(x, y)$ the topography of the earth's surface. 
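A single explicit time step of these equations can be sketched with central differences on a periodic grid. This is a didactic version only; the dataset itself is generated with SpeedyWeather.jl's spectral solver:

```python
import numpy as np

def swe_step(vx, vy, eta, h, g, dt, dx, dy):
    """One explicit Euler step of the 2D shallow water equations.

    Central differences on a periodic grid; eta is the free-surface
    displacement and h the (static) bottom topography.
    """
    def ddx(f):
        return (np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1)) / (2 * dx)

    def ddy(f):
        return (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / (2 * dy)

    dvx = -(vx * ddx(vx) + vy * ddy(vx) + g * ddx(eta))
    dvy = -(vx * ddx(vy) + vy * ddy(vy) + g * ddy(eta))
    deta = -(ddx((eta + h) * vx) + ddy((eta + h) * vy))
    return vx + dt * dvx, vy + dt * dvy, eta + dt * deta
```

Two easy sanity checks: a fluid at rest over flat topography stays at rest, and the discrete divergence form of the continuity equation conserves the total surface displacement on a periodic grid.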
We modify the implementation in SpeedyWeather.jl (Klöwer et al., 2022) to further randomize initial conditions when generating our dataset. SpeedyWeather.jl combines the shallow water equations with spherical harmonics for the linear terms and a Gaussian grid for the non-linear terms, with the appropriate spectral transforms. It internally uses a leapfrog time scheme with a Robert-Asselin-Williams filter to dampen the computational modes and achieve 3rd-order accuracy. SpeedyWeather.jl is based on the atmospheric general circulation model SPEEDY in Fortran (Molteni, 2003; Kucharski et al., 2013). Equation details. We obtain data for the 2D shallow water equations on a grid with spatial resolution 192 × 96 (∆x = 1.875°, ∆y = 3.75°) and temporal resolution ∆t = 6 h. The equations are solved on a closed domain with periodic boundary conditions. We roll out the simulation for 20 days and sample every 6 h; here, 20 days refers to the simulated physical time, not the wall-clock time of the simulation. Trajectories contain scalar pressure and wind vector fields at 84 different time points. Results. Results are summarized in Figures 12, 13, and 14, and detailed in Tables 2 and 3. Figure 15 displays examples of shallow water equation rollouts of scalar pressure and vector wind fields obtained by Clifford Fourier surrogate models and contrasts them with ground-truth trajectories. The predictions are fairly indistinguishable from the ground-truth trajectories. We observe similar results as for the Navier-Stokes experiments; however, performance differences between baseline and Clifford architectures are even more pronounced, which we attribute to the stronger coupling of the scalar and vector fields.

C.6 MAXWELL EQUATIONS IN 3D

Electromagnetic simulations play a critical role in understanding light-matter interaction and designing optical elements. Neural networks have already been successfully applied to inverse-designing photonic structures (Ma et al., 2021b; Lim & Psaltis, 2022). 
Maxwell's equations in matter read:
$$\nabla \cdot D = \rho \qquad \text{Gauss's law} \qquad (84)$$
$$\nabla \cdot B = 0 \qquad \text{Gauss's law for magnetism} \qquad (85)$$
$$\nabla \times E = -\frac{\partial B}{\partial t} \qquad \text{Faraday's law of induction} \qquad (86)$$
$$\nabla \times H = \frac{\partial D}{\partial t} + j \qquad \text{Ampère's circuital law} \qquad (87)$$
In isotropic media, the displacement field $D$ is related to the electric field via $D = \epsilon_0 \epsilon_r E$, where $\epsilon_0$ is the permittivity of free space and $\epsilon_r$ the relative permittivity of the medium. Similarly, the magnetizing field $H$ in isotropic media is related to the magnetic field $B$ via $B = \mu_0 \mu_r H$, where $\mu_0$ is the permeability of free space and $\mu_r$ the relative permeability of the medium. Lastly, $j$ is the electric current density and $\rho$ the total electric charge density. We propagate the solution of Maxwell's equations in matter using a finite-difference time-domain (FDTD) method, where the discretized Maxwell's equations are solved in a leapfrog manner: first, the electric field vector components in a volume of space are solved at a given instant in time; second, the magnetic field vector components in the same spatial volume are solved at the next instant in time. Equation details. We obtain data for the 3D Maxwell's equations on a grid with spatial resolution of 32 × 32 × 32 (∆x = ∆y = ∆z = 5 · 10⁻⁷ m), and temporal resolution of ∆t = 50 s. Results. Results are summarized in Figure 17 and detailed in Table 4.

D RELATED WORK

Clifford neural networks. Clifford neural networks were first explored by Pearson & Bisset (1994), and later by Pearson (2003). These works put the emphasis on the geometric perceptron (Melnyk et al., 2021), i.e. how to recast vanilla multilayer perceptrons (MLPs) as Clifford MLPs. Similarly, Hoffmann et al. (2020) generalized from complex numbers and quaternions to a set of alternative algebras. Besides Clifford MLPs, Clifford algebras have been used in recurrent neural networks (RNNs) (Kuroe, 2011) and to formulate quantum neural networks (Trindade et al., 2022). 
Their applicability to neural computing has been studied in general (Buchholz & Sommer, 2001; Buchholz, 2005), including global exponential stability of Clifford MLPs with time-varying delays and impulsive effects. Probably the most closely related works are: (i) Zang et al. (2022), who build geometric algebra convolution networks to process spatial and temporal data of 3D traffic data; multidimensional traffic parameters are encoded as multivectors, which allows modeling correlations between traffic data in both spatial and temporal domains. (ii) Spellings (2021), who builds rotation- and permutation-equivariant graph network architectures based on geometric algebra products of node features; higher-order information is built from the available node inputs. In contrast to previous works, we are the first to introduce the multivector viewpoint of field components, which allows us to effectively connect Clifford neural layers with the geometric structure of the input data. We further connect neural Clifford convolutions on multivectors with various works on complex numbers and quaternions, and we are the first to introduce neural Clifford Fourier transforms. Complex and quaternion neural networks. Hypercomplex Fourier transform (Sangwine & Ell, 2000) and hypercomplex correlation (Moxey et al., 2003) techniques were introduced for color image processing. Quaternion-based deep learning architectures are a natural extension of complex neural networks. In quaternion neural networks (Zhu et al., 2018; Parcollet et al., 2018a; Gaudet & Maida, 2018; Parcollet et al., 2018b; 2019; 2020; Nguyen et al., 2021; Moya-Sánchez et al., 2021), concepts such as complex convolution, complex batch norm, and complex initialization are transferred from the complex numbers C, which are algebra-isomorphic to Cl 0,1 (R), to Cl 0,2 (R), which is algebra-isomorphic to the quaternions H. Although Hoffmann et al. 
(2020) generalized these concepts from complex numbers and quaternions to a set of alternative algebras, their tasks did not really leverage any multivector structure in the data. Fourier Neural Operators. Fourier Neural Operators (FNOs) (Li et al., 2020) have had tremendous impact on improving neural PDE surrogates. Efficient implementations of FNO layers come as physics-informed neural operators (PINO) (Li et al., 2021b), as U-shaped network architectures (UNO) (Rahman et al., 2022b), as spectral surrogates for vision transformer architectures (Rao et al., 2021; Guibas et al., 2021), as Markov neural operators (MNO) for chaotic systems (Li et al., 2021a), and as generative adversarial neural operators (GANOs) (Rahman et al., 2022a). Applications range from weather forecasting (Pathak et al., 2022), CO2-water multiphase problems (Wen et al., 2022), multiscale methods for crystal plasticity (Liu et al., 2022), seismic wave propagation (Yang et al., 2021), photoacoustic wave propagation (Guan et al., 2021), and PDE-constrained control problems (Hwang et al., 2022), to the thermochemical curing of composites (Chen et al., 2021). Recently, FNOs have been successfully applied to PDEs on general geometries (Li et al., 2022b). Furthermore, universal approximation and error bounds have been studied for FNOs (Kovachki et al., 2021). Neural PDE solvers/surrogates. The intersection of PDE solving, deep learning, fluid dynamics, and weather forecasting has developed into a very active hub of research lately (Thuerey et al., 2021). We roughly group recent approaches to learning neural PDE surrogates and solvers into three categories: (i) hybrid approaches, where neural networks augment numerical solvers or replace parts of them; (ii) direct approaches, (a) where the mapping from an initial state to a solution is learned, i.e. 
the solution function of the underlying PDE is approximated; and (b) where the mapping from an initial state to a final state of an underlying PDE is learned, i.e. the solution operator, ideally as a mapping between function spaces so as to generalize across e.g. parameters. Ad (i): Neural networks augment numerical solvers by learning data-driven discretizations for PDEs (Bar-Sinai et al., 2019) or by controlling learned approximations inside standard numerical solvers used for computational fluid dynamics (Kochkov et al., 2021). Ad (ii): Li et al. (2020) proposed learning the mapping from parameter space to solution space, which had tremendous impact on improving neural PDE surrogates. In parallel, Lu et al. (2021) introduced DeepONet, which learns mappings between function spaces and was successfully applied to many parametric ODEs and PDEs. Both FNOs and DeepONets have been combined with PINNs and trained in a physics-informed style (Li et al., 2022b; Wang et al., 2021). A comprehensive comparison of these two neural operator approaches is given by Lu et al. (2022). Other directions include modeling PDE solution operators via latent-space models, transformers, and graph neural networks (GNNs). Wu et al. (2022) model the system dynamics in a latent space of fixed dimension, where the latent modeling is done via MLPs and the encoding and decoding via CNNs, which can also be replaced by graph neural networks (GNNs). Cao (2021) proposes the Galerkin transformer, a simple attention-based operator learning method without softmax normalization, and LOCA (Learning Operators with Coupled Attention) (Kissas et al., 2022) is a further attention-based operator learning approach. Data efficiency can also be improved, e.g. through Lie point symmetry data augmentation (Brandstetter et al., 2022a) or by building equivariant PDE surrogates (Wang et al., 2020b). Our current work in a way also improves data efficiency by capturing the inductive bias appropriate for multivector fields. 
Overall, we believe hybrids of such approaches will be necessary to make neural PDE surrogates practically useful in many domains. Neural PDE surrogates for fluid flow and weather forecasting applications are gaining momentum. In weather forecasting, Pathak et al. (2022) introduced FourCastNet, a high-resolution weather model built on Adaptive Fourier Neural Operators (Guibas et al., 2021); Keisler (2022) successfully applied a graph neural network based approach to weather forecasting; Rasp & Thuerey (2021) achieved data-driven medium-range weather prediction with a ResNet pretrained on climate simulations; and Weyn et al. (2020) improved data-driven global weather prediction using deep convolutional networks on a cubed sphere. This list of neural PDE surrogates shows that many of the architectures are based on convolutional or Fourier layers. For these two, Clifford layers are applicable as a drop-in replacement in almost all cases. For graph neural network and attention-based architectures, we leave the implementation of respective Clifford counterparts to future work. Geometric deep learning. The core idea of geometric deep learning (Bronstein et al., 2017; 2021) is to exploit the underlying low-dimensionality and structure of the physical world in order to design deep learning models that can better learn in high-dimensional spaces. Incorporating underlying symmetries is one way to achieve this; if done correctly, it can drastically shrink the search space, which has proven quite successful in multiple scenarios. The most obvious examples are CNNs (Fukushima & Miyake, 1982; LeCun et al., 1998), where the convolution operation commutes with the shift operator and thus equips layers, and subsequently networks, with translation-equivariant operations. Group convolution networks (Cohen & Welling, 2016a; Kondor & Trivedi, 2018; Cohen et al., 2019) generalize equivariant layers beyond translations, i.e. 
provide a concept of how to build general layers that are equivariant to a broader range of groups, such as rotation groups. An appealing way to build such group-equivariant layers is via so-called steerable basis functions (Hel-Or & Teo, 1998), which allow transformations by specific groups to be written as linear combinations of a fixed, finite set of basis functions. This concept leads to steerable group convolution approaches (Cohen & Welling, 2016b; Worrall et al., 2017). Two concrete examples are: (i) circular harmonics, the basis functions for building layers equivariant to the group SO(2), the rotation group in 2 dimensions (Worrall et al., 2017; Weiler & Cesa, 2019); and (ii) spherical harmonics, the basis functions for building layers equivariant to the group SO(3), the rotation group in 3 dimensions (Weiler et al., 2018; Geiger & Smidt, 2022; Brandstetter et al., 2021). The similarity to multivector fields becomes more obvious on closer inspection of spherical harmonics, which are defined via homogeneous polynomials of degree l, where l = 0 corresponds to scalars, l = 1 to vectors, and l ≥ 2 to higher-order objects. Finally, Jenner & Weiler (2021) built steerable PDE operators such as curl or divergence as equivariant neural network components. Grouped convolution. In their seminal work, Krizhevsky et al. (2012) introduced filter grouping, which allowed them to reduce the number of parameters in CNNs. The resulting grouped convolutions (not to be confused with group convolutions) divide the filter maps along the channel dimension, since the number of channels typically increases strongly in deeper layers and thus dominates the parameter count. Subsequent work showed that it is beneficial to additionally shuffle the channels of each filter group (Zhang et al., 2018) and to adaptively recalibrate channel-wise feature responses (Hu et al., 2018). 
All these approaches can be seen in the wider spectrum of efficient model scaling (Tan & Le, 2019; Sandler et al., 2018). Clifford convolutions, in contrast, do not group along the channel dimension but instead group elements together as multivectors. In a Clifford convolution, the Clifford kernel is therefore a constrained object in which weight blocks appear multiple times (due to the nature of the geometric product). Thus, Clifford convolutions are more parameter-efficient than standard convolutions, and all tricks of efficient model scaling could in principle be applied on top of them. The findings of Hoffmann et al. (2020) with respect to the higher compute density of alternative algebras apply to our work as well.

E GLOSSARY

This short appendix summarizes the notation used throughout the paper (Table 5) and contrasts the most fundamental concepts that arise when using Clifford algebras.

Table 5: Notation used throughout the paper.

| Notation | Meaning |
|---|---|
| e1, e2, e3 | Basis vectors of the generating vector space of the Clifford algebra. |
| ei ∧ ej | Wedge (outer) product of basis vectors ei and ej. |
| ei · ej = ⟨ei, ej⟩ | Inner product of basis vectors ei and ej. |
| e1e2, e3e1, e2e3 | Basis bivectors of the vector space of the Clifford algebra. |

Geometric, exterior, and Clifford algebras. A geometric algebra is a Clifford algebra over the real numbers. Since we only use Cl 2,0 (R), Cl 0,2 (R), and Cl 3,0 (R), we are effectively working with geometric algebras. The exterior (or Grassmann) algebra is built from the same concepts of scalars, vectors, bivectors, . . . , k-vectors, but only exterior (wedge) products exist; the exterior algebra therefore has a zero quadratic form (all basis vectors square to zero). Clifford algebras are a generalization thereof with nonzero quadratic forms. Complex numbers, quaternions, hypercomplex numbers. Hypercomplex numbers are elements of finite-dimensional algebras over the real numbers that are unital, i.e. contain a multiplicative identity element, but are not necessarily associative or commutative. Elements are generated from a basis {î, ĵ, . . .} such that î², ĵ², . . . ∈ {-1, 0, 1}. Complex numbers, quaternions, and octonions are all hypercomplex numbers that can be characterized by different Clifford algebras; the bivectors, trivectors (and higher objects) of the Clifford algebras directly translate into basis elements of the respective algebras. For example, the quaternions (which are of the form a + bî + cĵ + dk̂, where î² = ĵ² = k̂² = -1) are isomorphic to the Clifford algebra Cl 0,2 (R), where the basis elements e1, e2, and e1e2 directly translate to î, ĵ, k̂. Spinor. 
Spinors arise naturally in discussions of the Lorentz group, the group describing transformations in special relativity; one could say that a spinor is the most basic sort of mathematical object that can be Lorentz-transformed. In essence, a spinor is a complex two-component vector-like quantity for which rotations and Lorentz boosts (relativistic translations) are built into the overall formalism. More generally, spinors are elements of complex vector spaces that can be associated with Euclidean vector spaces. However, unlike vectors, spinors transform to their negative when the space is rotated by 360°. In this work, the even subalgebra Cl⁰ 2,0 (R), spanned by the even-graded basis elements of Cl 2,0 (R), i.e. 1 and e1e2, determines the space of spinors via linear combinations of 1 and e1e2; it is thus isomorphic to the field of complex numbers C. Most notably, spinors of Cl 2,0 (R) commute with the Fourier kernel, whereas vectors do not. For a detailed introduction to spinors, we recommend Steane (2013) and the comprehensive physics book by Schwichtenberg (2015). Pseudoscalar. A pseudoscalar - unlike a scalar - changes sign when a coordinate axis is inverted. The easiest example of a pseudoscalar is the scalar triple product of three arbitrary vectors x, y, z ∈ R³. The scalar triple product becomes negative under any parity inversion, i.e. when swapping any two of the three operands: x · (y × z) = -x · (z × y) = -y · (x × z) = -z · (y × x). Scalar field, vector field. A field is any (physical) quantity that takes on different values at different points in space (or space-time). A scalar field is a map D → R, where D ⊆ Rⁿ; a vector field is a map D → Rⁿ, where D ⊆ Rⁿ. For example, n = 2 yields a vector field in the plane, and n = 3 a vector field in space. For an interesting history of the evolution of the concept of fields in physics, we recommend Mirowski (1991) and McMullin (2002). 
In Table 6, we list various important vector and scalar fields for comparison.



Operations of addition and multiplication are associative. In contrast to scalars, pseudoscalars change sign under reflections. Spinors are elements of a complex vector space that can be associated with Euclidean space; unlike vectors, spinors transform to their negative when rotated by 360°. The bivector e1e3 has negative orientation. In deep learning, a convolution operation in the forward pass is implemented as cross-correlation. Note that the expansion coefficients for the feature map f j and filters w i,j in terms of the basis elements of G 2 and in terms of the quaternion elements î, ȷ and k are the same. The FFT of a real-valued signal is Hermitian-symmetric, so the output contains only the positive frequencies below the Nyquist frequency for the last spatial dimension. We could not find neural rotational quaternion convolutions in the existing literature; we did, however, use the codebase of https://github.com/Orkis-Research/Pytorch-Quaternion-Neural-Networks as inspiration. The default PyTorch initialization of linear and convolution layers is He Uniform initialization (He et al., 2015) for 2-dimensional problems; the gain is calculated for LeakyReLU activation functions with negative slope √5, which effectively results in Glorot Uniform initialization.
https://github.com/tum-pbs/PhiFlow
https://developer.nvidia.com/cufft
For alternative efficient GPU-accelerated multidimensional FFT libraries see e.g. https://github.com/DTolm/VkFFT
https://github.com/milankl/SpeedyWeather.jl
https://github.com/flaport/fdtd



Figure 2: Multivector components of Clifford algebras.

e1 ∧ e2 = -e2 ∧ e1

Figure 3: Antisymmetry of bivector exterior (wedge) product.

Figure 5: Sketch of Fourier Neural Operator (FNO) and Clifford Fourier Operator (CFNO) layers. The real-valued Fast Fourier transform (RFFT) over real-valued scalar input fields f (x) is replaced by the complex Fast Fourier transform (FFT) over the complex-valued dual parts v(x) and s(x) of multivector fields f (x). Pointwise multiplication in the Fourier space via the complex weight tensor W is replaced by the geometric product in the Clifford Fourier space via the multivector weight tensor W . Additionally, the convolution path is replaced by Clifford convolutions with multivector kernels w.
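The swap described in the caption, transforming paired real components with one complex FFT instead of transforming each real field with an RFFT, can be illustrated in a few lines of numpy. This is a hedged sketch: the array names s and v merely stand in for the "dual parts" of a multivector field, not the paper's exact tensors.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal((32, 32))   # e.g. a scalar component
v = rng.standard_normal((32, 32))   # e.g. its dual (pseudoscalar) component

# Pack two real fields into one complex array and take a full complex FFT.
packed = np.fft.fft2(s + 1j * v)

# By linearity of the FFT, the packed spectrum is exactly the sum of the
# individual spectra, so no information is lost by the packing.
assert np.allclose(packed, np.fft.fft2(s) + 1j * np.fft.fft2(v))

# An RFFT of a single real field keeps only about half the frequencies
# (Hermitian symmetry), hence the reduced last dimension:
assert np.fft.rfft2(s).shape == (32, 17)
```

The packed complex spectrum has no Hermitian symmetry to exploit, which is why the CFNO path uses the full FFT over all frequencies.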

Figure 6: Example input and target fields for the Navier-Stokes experiments. Input fields comprise a t = 2 timestep history.

Figure 7: Results for ResNet based (left) and Fourier based (right) architectures on the 2-dimensional Navier-Stokes and shallow water experiments. One-step and rollout losses are shown.

Figure 8: Results for Fourier based architectures on the Maxwell equations.

… algebras
A.2 Examples of low-dimensional Clifford algebras
A.2.1 Clifford algebra Cl 0,1 (R)
A.2.2 Clifford algebra Cl 2,0 (R)
A.2.3 Clifford algebra Cl 0,2 (R)
A.2.4 Clifford algebra Cl 3,0 (R)
A.3 The electromagnetic field in 3 dimensions
B Clifford neural layers
B.1 Clifford convolution layers
B.1.1 Translation equivariance of Clifford convolutions
B.1.2 Rotational Clifford CNN layers
B.1.3 3D Clifford convolution layers
B.2 Clifford normalization
B.3 Clifford initialization
B.4 Equivariance under rotations and reflections
B.5 Clifford Fourier layers
B.5.1 2D Clifford Fourier transform
B.5.2 2D Clifford convolution theorem
B.5.3 3D Clifford Fourier transform
B.5.4 3D Clifford convolution theorem
B.5.5 Implementation of Clifford Fourier layers
B.6 Pseudocode
C Experiments
C.1 Loss function and metrics

A.2.2 CLIFFORD ALGEBRA Cl 2,0 (R)

The Clifford algebra Cl 2,0 (R) is a 4-dimensional algebra with vector space G 2 spanned by the basis elements {1, e1, e2, e1e2}, where e1 and e2 square to +1. The geometric product of two multivectors a = a0 + a1e1 + a2e2 + a12e1e2 and b = b0 + b1e1 + b2e2 + b12e1e2 is defined via:

ab = a0b0 + a0b1 e1 + a0b2 e2 + a0b12 e1e2
+ a1b0 e1 + a1b1 e1e1 + a1b2 e1e2 + a1b12 e1e1e2
+ a2b0 e2 + a2b1 e2e1 + a2b2 e2e2 + a2b12 e2e1e2
+ a12b0 e1e2 + a12b1 e1e2e1 + a12b2 e1e2e2 + a12b12 e1e2e1e2 ,

which, using e1e1 = e2e2 = 1 and e2e1 = -e1e2, simplifies to

ab = (a0b0 + a1b1 + a2b2 - a12b12) + (a0b1 + a1b0 - a2b12 + a12b2) e1 + (a0b2 + a2b0 + a1b12 - a12b1) e2 + (a0b12 + a12b0 + a1b2 - a2b1) e1e2 .

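The Cl 2,0 (R) geometric product is small enough to implement directly from its simplified component form. The following is a minimal sketch over coefficient 4-tuples (a0, a1, a2, a12), using e1e1 = e2e2 = 1 and e2e1 = -e1e2; the function name is illustrative.

```python
def gp_cl20(a, b):
    """Geometric product in Cl(2,0) on coefficient tuples (a0, a1, a2, a12)."""
    a0, a1, a2, a12 = a
    b0, b1, b2, b12 = b
    return (a0*b0 + a1*b1 + a2*b2 - a12*b12,   # scalar part
            a0*b1 + a1*b0 - a2*b12 + a12*b2,   # e1 part
            a0*b2 + a2*b0 + a1*b12 - a12*b1,   # e2 part
            a0*b12 + a12*b0 + a1*b2 - a2*b1)   # e1e2 part

e1, e2, e12 = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
assert gp_cl20(e1, e1) == (1, 0, 0, 0)      # e1 squares to +1
assert gp_cl20(e1, e2) == e12               # e1 e2 = e1e2
assert gp_cl20(e2, e1) == (0, 0, 0, -1)     # e2 e1 = -e1e2 (anticommute)
assert gp_cl20(e12, e12) == (-1, 0, 0, 0)   # (e1e2)^2 = -1
```

The last assertion reflects why the even subalgebra spanned by {1, e1e2} is isomorphic to the complex numbers: e1e2 plays the role of the imaginary unit.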

For Cl 3,0 (R), the geometric product of two multivectors a = a0 + a1e1 + a2e2 + a3e3 + a12e1e2 + a13e1e3 + a23e2e3 + a123e1e2e3 and b (defined analogously) expands term by term as:

ab = a0b0 + a0b1 e1 + a0b2 e2 + a0b3 e3 + a0b12 e1e2 + a0b13 e1e3 + a0b23 e2e3 + a0b123 e1e2e3
+ a1b0 e1 + a1b1 e1e1 + a1b2 e1e2 + a1b3 e1e3 + a1b12 e1e1e2 + a1b13 e1e1e3 + a1b23 e1e2e3 + a1b123 e1e1e2e3
+ a2b0 e2 + a2b1 e2e1 + a2b2 e2e2 + a2b3 e2e3 + a2b12 e2e1e2 + a2b13 e2e1e3 + a2b23 e2e2e3 + a2b123 e2e1e2e3
+ a3b0 e3 + a3b1 e3e1 + a3b2 e3e2 + a3b3 e3e3 + a3b12 e3e1e2 + a3b13 e3e1e3 + a3b23 e3e2e3 + a3b123 e3e1e2e3
+ a12b0 e1e2 + a12b1 e1e2e1 + a12b2 e1e2e2 + a12b3 e1e2e3 + a12b12 e1e2e1e2 + a12b13 e1e2e1e3 + a12b23 e1e2e2e3 + a12b123 e1e2e1e2e3
+ a13b0 e1e3 + a13b1 e1e3e1 + a13b2 e1e3e2 + a13b3 e1e3e3 + a13b12 e1e3e1e2 + a13b13 e1e3e1e3 + a13b23 e1e3e2e3 + a13b123 e1e3e1e2e3
+ a23b0 e2e3 + a23b1 e2e3e1 + a23b2 e2e3e2 + a23b3 e2e3e3 + a23b12 e2e3e1e2 + a23b13 e2e3e1e3 + a23b23 e2e3e2e3 + a23b123 e2e3e1e2e3
+ a123b0 e1e2e3 + a123b1 e1e2e3e1 + a123b2 e1e2e3e2 + a123b3 e1e2e3e3 + a123b12 e1e2e3e1e2 + a123b13 e1e2e3e1e3 + a123b23 e1e2e3e2e3 + a123b123 e1e2e3e1e2e3 ,

which simplifies to the component form of the geometric product by repeatedly applying eiei = 1 and ejei = -eiej for i ≠ j.
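Rather than simplifying all 64 terms by hand, the signs can be computed mechanically: each swap of two distinct generators flips the sign, and repeated generators contract with eiei = +1. A hedged generic sketch (data layout and names are illustrative), representing each basis blade as a sorted tuple of generator indices:

```python
def blade_mul(A, B):
    """Product of two Cl(3,0) basis blades, e.g. (1, 2) for e1e2, () for 1."""
    idx, sign = list(A) + list(B), 1
    # Bubble sort the concatenated indices; every swap of distinct
    # anticommuting generators flips the sign.
    for _ in range(len(idx)):
        for j in range(len(idx) - 1):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    # Contract adjacent equal generators: e_i e_i = +1 in Cl(3,0).
    out, k = [], 0
    while k < len(idx):
        if k + 1 < len(idx) and idx[k] == idx[k + 1]:
            k += 2
        else:
            out.append(idx[k]); k += 1
    return sign, tuple(out)

def gp(a, b):
    """Geometric product of multivectors given as {blade: coefficient} dicts."""
    out = {}
    for A, ca in a.items():
        for B, cb in b.items():
            s, C = blade_mul(A, B)
            out[C] = out.get(C, 0) + s * ca * cb
    return {k: v for k, v in out.items() if v != 0}

i3, e1 = {(1, 2, 3): 1}, {(1,): 1}
assert gp(i3, i3) == {(): -1}            # the pseudoscalar squares to -1
assert gp(i3, e1) == gp(e1, i3)          # i3 is central in Cl(3,0)
assert gp({(1, 2): 1}, {(1, 2): 1}) == {(): -1}
```

The two pseudoscalar identities checked here (i3² = -1 and centrality of i3) are exactly the properties used by the Clifford Fourier construction later in the appendix.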

Translation equivariance of Clifford convolutions

B.1.3 3D CLIFFORD CONVOLUTION LAYERS

Implementation of Cl 3,0 (R) layers. Analogously to the 2-dimensional case, we can implement a 3D Clifford CNN layer using Equation 42, where {b0, b1, b2, b3, b12, b13, b23, b123} correspond to 8 different kernels representing one 3D multivector kernel, i.e. 8 different convolution layers, and {a0, a1, a2, a3, a12, a13, a23, a123} correspond to the scalar, vector, bivector, and trivector parts of the input multivector field. Convolution layers for different 3-dimensional Clifford algebras change the signs in the geometric product.

B.2 CLIFFORD NORMALIZATION

Different normalization schemes have been proposed to stabilize and accelerate training of deep neural networks
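The bookkeeping a 3D Clifford layer needs, which of the 8 kernel components combines with which of the 8 input components, into which output component and with which sign, can be generated programmatically rather than written out by hand. A hedged sketch (blade representation and names are illustrative):

```python
from itertools import combinations

# All 8 basis blades of Cl(3,0): scalar, 3 vectors, 3 bivectors, 1 trivector.
BLADES = [()] + [c for r in (1, 2, 3) for c in combinations((1, 2, 3), r)]

def blade_mul(A, B):
    """Sign and resulting blade of the product of two basis blades."""
    idx, sign = list(A) + list(B), 1
    for _ in range(len(idx)):               # bubble sort with sign flips
        for j in range(len(idx) - 1):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    out, k = [], 0
    while k < len(idx):                     # contract e_i e_i = +1
        if k + 1 < len(idx) and idx[k] == idx[k + 1]:
            k += 2
        else:
            out.append(idx[k]); k += 1
    return sign, tuple(out)

# For every (kernel blade, input blade) pair: output blade and sign.
table = {(A, B): blade_mul(A, B) for A in BLADES for B in BLADES}
assert len(table) == 64
assert table[((1, 2), (1,))] == (-1, (2,))       # e1e2 * e1 = -e2
assert table[((1,), (2, 3))] == (1, (1, 2, 3))   # e1 * e2e3 = e123
```

Switching to a different 3-dimensional Clifford algebra only changes the contraction rule (the metric signature), which is exactly the "change the signs in the geometric product" noted above.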

e1 i3 = e1e1e2e3 = -e1e2e1e3 = e1e2e3e1 = i3 e1 ✓
e2 i3 = e2e1e2e3 = -e1e2e2e3 = e1e2e3e2 = i3 e2 ✓
e3 i3 = e3e1e2e3 = -e1e3e2e3 = e1e2e3e3 = i3 e3 ✓ (77)

6: input ← VIEW_AS_REALVECTOR(x)
7: output ← CONV2D(kernel, input)
8: return VIEW_AS_MULTIVECTOR(output)

Algorithm 1: Pseudocode for 2D Clifford convolution using Cl 2,0 .

1: function CLIFFORDKERNEL2D_ROT(W)
2:
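The view-as-real-vector / convolve / view-as-multivector pattern of Algorithm 1 can be sketched in plain numpy. This is a single-channel illustration, not the paper's batched PyTorch implementation: the four component planes of a Cl 2,0 multivector feature map are cross-correlated with the four component planes of a multivector kernel and recombined according to the geometric product (e1e1 = e2e2 = 1, e2e1 = -e1e2).

```python
import numpy as np

def xcorr2d(f, w):
    """Naive 'valid' 2D cross-correlation (what deep learning calls conv)."""
    kh, kw = w.shape
    out = np.empty((f.shape[0] - kh + 1, f.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(f[i:i + kh, j:j + kw] * w)
    return out

def clifford_conv2d(f, w):
    """f, w: arrays of 4 planes (scalar, e1, e2, e1e2); Cl(2,0) product mix."""
    f0, f1, f2, f12 = f
    w0, w1, w2, w12 = w
    return np.stack([
        xcorr2d(f0, w0) + xcorr2d(f1, w1) + xcorr2d(f2, w2) - xcorr2d(f12, w12),
        xcorr2d(f0, w1) + xcorr2d(f1, w0) - xcorr2d(f2, w12) + xcorr2d(f12, w2),
        xcorr2d(f0, w2) + xcorr2d(f2, w0) + xcorr2d(f1, w12) - xcorr2d(f12, w1),
        xcorr2d(f0, w12) + xcorr2d(f12, w0) + xcorr2d(f1, w2) - xcorr2d(f2, w1),
    ])

rng = np.random.default_rng(0)
f = rng.standard_normal((4, 8, 8))
w = np.zeros((4, 3, 3))
w[0] = rng.standard_normal((3, 3))        # scalar-only kernel
out = clifford_conv2d(f, w)
assert out.shape == (4, 6, 6)
# A scalar-only kernel reduces to 4 independent real convolutions:
assert np.allclose(out[1], xcorr2d(f[1], w[0]))
```

In the actual layers, each of the four component "convolutions" is itself a full multi-channel CONV2D, which is why the view-as-real-vector step exists.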

Figure 9: Results on Navier-Stokes equations obtained by ResNet based architectures. Unrolled loss, one-step loss, scalar loss and vector loss are reported for ResNet, CResNet, and CResNet rot architectures. Models are trained on training sets with increasing number of trajectories. ResNet based architectures have a much higher loss than FNO based architectures in the low data regime, where possibly smearing and averaging operations are learned first.

Figure 16: An example propagation of the displacement field D and the magnetization field H. Shown are the field components for an arbitrary slice of the x-y plane.

Figure 17: Results on the Maxwell equations obtained by Fourier based architectures using a two timestep history input. Rollout loss, one-step loss, displacement field D loss, and magnetization field H loss are reported for FNO and CFNO architectures. Models are trained on four training sets with increasing number of trajectories.

use CNNs on a cubed sphere for global weather prediction, Weyn et al. (2021) forecast weather sub-seasonally with a large ensemble of deep-learning weather prediction models, Arcomano et al. (2020) build a reservoir computing-based, low-resolution, global prediction model, and MetNet (Sønderby et al., 2020) takes radar and satellite data as input to forecast probabilistic precipitation maps. Finally, data assimilation is improved by deep learning techniques in Frerix et al. (2021) and Maulik et al. (2022). Similarly, in fluid dynamics, Ma et al. (2021a) applied U-Nets (Ronneberger et al., 2015) to achieve physics-driven learning of steady Navier-Stokes equations, Stachenfeld et al. (2021) learned coarse models for turbulence simulations, and TF-Net (Wang et al., 2020a) introduced domain-specific variations of U-Nets along with trainable spectral filters in a coupled model of Reynolds-averaged Navier-Stokes and Large Eddy Simulation.

e1e2e3: Basis trivector of the vector space of the Clifford algebra.
i2 = e1e2: Pseudoscalar for Clifford algebras of grade 2.
i3 = e1e2e3: Pseudoscalar for Clifford algebras of grade 3.
x: Euclidean vector ∈ Rⁿ.
x ∧ y: Wedge (outer) product of Euclidean vectors x and y.
x • y = ⟨x, y⟩: Inner product of vectors x and y.
a: Multivector.
ab: Geometric product of multivectors a and b.
î, ȷ, k: Basis elements of quaternions.

Model comparison on four different metrics for neural PDE surrogates which are trained on Navier-Stokes training datasets of varying size. Error bars are obtained by running experiments with three different initial seeds.

For ResNet-like architectures, CResNet and CResNet rot improve upon the ResNet baseline, rollout losses are much lower for the two Clifford based architectures, and CResNet rot based architectures in general perform better than CResNet based ones. For Fourier based architectures, the loss is in general much lower than for ResNet based architectures (a training set size of 56 trajectories yields similar (C)FNO test set performance to a training set size of 896 trajectories for ResNet based architectures). CFNO architectures improve upon FNO architectures for all dataset sizes, for one-step as well as rollout losses, which is especially pronounced for low numbers of training trajectories.

Results on the shallow water equations obtained by ResNet based architectures using a two timestep history input. Rollout loss, one-step loss, scalar loss and vector loss are reported for ResNet, CResNet, and CResNet rot architectures. Models are trained on training sets with increasing number of trajectories. ResNet based architectures have a much higher loss than FNO based architectures in the low data regime, where possibly smearing and averaging operations are learned first.

Model comparison on four different metrics for neural PDE surrogates which are trained on the shallow water equations training datasets of varying size. Results are obtained by using a two timestep history input. Error bars are obtained by running experiments with three different initial seeds.

Model comparison on four different metrics for neural PDE surrogates which are trained on the shallow water equations training datasets of varying size. Results are obtained by using a four timestep history input. Error bars are obtained by running experiments with three different initial seeds.

Model comparison on four different metrics for neural PDE surrogates which are trained on the Maxwell equations training datasets of varying size. Results are obtained by using a two timestep history input. Error bars are obtained by running experiments with three different initial seeds.

This appendix supports detailed discussions of how our work relates to complex and quaternion neural networks, to work on Clifford algebras and Clifford Fourier transforms in computer vision, to Fourier Neural Operators, equivariant neural networks and geometric deep learning approaches, and to neural operator learning and neural PDE surrogates. Clifford algebras and Clifford Fourier transforms are already deployed to solve PDEs numerically in Alfarraj & Wei (2022); more precisely, the Clifford-Fourier transform is used to solve the mode decomposition process in PDE transforms.

Clifford neural networks. Neural networks in the Clifford domain were proposed as early as 1994 by

Trabelsi et al. (2017) introduced the key components for complex-valued deep neural networks. More precisely, they introduced convolutional (LeCun et al., 1998) feed-forward and convolutional LSTM (Shi et al., 2015; Hochreiter & Schmidhuber, 1997) networks, together with complex batch normalization and complex weight initialization strategies. Quaternions are a natural extension of complex neural networks. Already in classical computer vision, quaternions as hypercomplex convolution (Sangwine

In Greenfeld et al. (2019), a prolongation is learned which maps from discretized PDE solutions to multigrid solutions, Hsieh et al. (2019) learn to modify the updates of an existing solver, Praditia et al. (2021) adopt the numerical structure of Finite Volume Methods (FVMs), and Um et al. (2020) learn a correction function of conventional PDE solvers to improve accuracy. All these approaches are hybrid approaches (Garcia Satorras et al., 2019), where the computational graph of the solver is preserved and heuristically-chosen parameters are predicted with a neural network. A different flavor of hybrid approaches can be found in the works of Sanchez-Gonzalez et al. (2020); Pfaff et al. (2020); Mayr et al. (2021), who predict accelerations of particles/meshes to numerically update the respective positions. Finally, PauliNet (Hermann et al., 2020) and FermiNet (Pfau et al., 2020) approximate wave functions of many-electron systems, and thus replace the hand-crafted ansatz which is conventionally used in variational quantum Monte Carlo methods.

maps the input functions to a finite set of features and attends to them by output query locations, and Li et al. (2022a) propose a transformer which provides a flexible way to implicitly exploit the patterns within inputs. Brandstetter et al. (2022b) formulated a message passing neural network approach that representationally contains several conventional numerical PDE solving schemes. Further GNN based approaches are Lötzsch et al. (2022), who learn the operator for boundary value problems on finite element method (FEM) (Brenner et al., 2008) ground truth data, and Lienen & Günnemann (2022), who derive their GNN models from FEM in a principled way.

A practical use case for neural PDE surrogates is replacing expensive classical PDE solvers. There is, however, a major chicken-and-egg problem here (Brandstetter et al., 2022a; Shi et al., 2022): obtaining high quality ground truth training data for neural PDE surrogates often requires using these expensive solvers. Minimizing this data requirement is beginning to be approached in recent works. Geneva & Zabaras (2020); Wandel et al. (2020; 2022) achieve "data-free" training in various settings, where "data-free" refers to the self-supervised training steps, which are done without ground truth data. The current state-of-the-art generic approach is introduced in Shi et al. (2022) as the mean squared residual (MSR) loss constructed by the discretized PDE itself. However, numerical solvers are still needed, e.g. for generating realistic initial conditions. Pestourie et al. (2021) identify how incorporating limited physical knowledge in the form of a low-fidelity "coarse" solver can allow training PDE surrogate models with an order of magnitude less data. Another direction to improve data efficiency is to exploit the Lie point symmetries of the underlying PDEs, either via data augmentation


Through the lens of Cl 3,0 (R), an intriguing example of the duality of multivectors is found when writing the electromagnetic field F in terms of an electric vector field E and a magnetic vector field B (Hestenes & Sobczyk, 2012; Hestenes, 2003), such that F = E + i3 B (Equation 43). Both the electric field E and the magnetic field B are described by Maxwell's equations (Griffiths, 2005). The two fields are strongly coupled, e.g. temporal changes of electric fields induce magnetic fields and vice versa. Probably the most illustrative co-occurrence of electric and magnetic fields is the propagation of light. In standard vector algebra, E is a vector while B is a pseudovector, i.e. the two kinds of fields are distinguished by a difference in sign under space inversion. Equation 43 naturally decomposes the electromagnetic field into vector and bivector parts via the pseudoscalar i3. For example, for the base component Bx e1 of B it holds that Bx e1 i3 = Bx e1e1e2e3 = Bx e2e3, which is a bivector and the dual to the base component e1 of E. Geometric algebra reveals that a pseudovector is nothing else than a bivector represented by its dual, so the magnetic field B in Equation 43 is fully represented by the complete bivector Bi3, rather than B alone. Consequently, the multivector representing F consists of three vectors (the electric field components) and three bivectors e1i3 = e2e3, e2i3 = e3e1, e3i3 = e1e2 (the magnetic field components multiplied by i3).

Published as a conference paper at ICLR 2023.

The shift parameter β is a multivector with 4 learnable components and the scaling parameter γ is a 4 × 4 positive matrix. The multivector batch normalization is defined in Equation 50. When batch sizes are small, it can be more appropriate to use Group Normalization or Layer Normalization; these can be derived by applying Eq. 50 along the appropriate tensor dimensions.
As such, batch, layer, and group normalization can be easily extended to 3-dimensional Clifford algebras. However, 3-dimensional implementations necessitate much smaller initialization values (a factor of 1/8).
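A whitening-style multivector normalization of the kind described above can be sketched in numpy for Cl 2,0 (R): the four blade components of a channel are jointly whitened with the inverse square root of their 4 × 4 covariance, then rescaled by a learnable 4 × 4 matrix γ and shifted by a learnable multivector β. This is a hedged, single-channel sketch; the layout (batch, 4, H, W) and function name are illustrative, not the paper's exact API.

```python
import numpy as np

def multivector_batchnorm(x, gamma, beta, eps=1e-5):
    """x: (batch, 4, H, W); gamma: (4, 4); beta: (4,) multivector shift."""
    b, k, h, w = x.shape
    flat = x.transpose(1, 0, 2, 3).reshape(k, -1)   # 4 components x samples
    centered = flat - flat.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / centered.shape[1] + eps * np.eye(k)
    # Inverse matrix square root of the 4x4 covariance (jointly whitens
    # scalar, vector, and bivector components).
    evals, evecs = np.linalg.eigh(cov)
    cov_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    white = cov_inv_sqrt @ centered
    out = gamma @ white + beta[:, None]
    return out.reshape(k, b, h, w).transpose(1, 0, 2, 3)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 5, 5)) * 3.0 + 1.0
y = multivector_batchnorm(x, np.eye(4), np.zeros(4))

# With identity gamma and zero beta the output is whitened:
flat = y.transpose(1, 0, 2, 3).reshape(4, -1)
assert np.allclose(flat.mean(axis=1), 0, atol=1e-6)
assert np.allclose(flat @ flat.T / flat.shape[1], np.eye(4), atol=1e-2)
```

Group or layer variants follow by computing the same statistics along different tensor dimensions, as noted in the text.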

B.4 EQUIVARIANCE UNDER ROTATIONS AND REFLECTIONS

Clifford convolutions satisfy the property of equivariance under translation of the multivector inputs, as shown in Appendix B. However, the current definition of Clifford convolutions is not equivariant under multivector rotations or reflections. Here, we derive a general kernel constraint which allows us to build generalized Clifford convolutions that are equivariant w.r.t. rotations or reflections of the multivectors. That is, we would like to prove equivariance of a Clifford layer under rotations and reflections (i.e. orthogonal transformations) if the multivector filters {w i } cout i=1 : Z 2 → (G) cin satisfy the respective kernel constraint for 0 ≤ j < c in. We first define an orthogonal transformation on a multivector by T f = ±u f u⁻¹, where u and f are multivectors which are multiplied using the geometric product. The minus sign is picked up by reflections but not by rotations, i.e. it depends on the parity of the transformation. This construction is called a "versor" product; it can be found in e.g. Suter (2003) for vectors, together with its extension to arbitrary multivectors. The construction makes it immediately clear that T(f g) = (Tf )(Tg). When we write T x, we mean an orthogonal transformation of a Euclidean vector (which can in principle also be defined using versors). To show equivariance, we wish to prove, for multivector feature maps f : Z 2 → (G) cin and a set of c out multivector filters, that the Clifford convolution commutes with T. Equations 54 and 55 yield exactly this: if the input multivector field transforms as a multivector, and the kernel satisfies the stated equivariance constraint, then the output multivector field also transforms properly as a multivector. Note that T might act differently on the various components (scalars, vectors, pseudoscalars, pseudovectors) under rotations and/or reflections.
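The two properties the argument rests on, that a rotor implements a rotation via the versor product and that the versor product distributes over the geometric product, can be sanity-checked numerically in Cl 2,0 (R). A minimal self-contained sketch (coefficient tuples and names are illustrative); here the rotor convention R = cos(t/2) - sin(t/2) e1e2 is assumed to rotate vectors by angle t:

```python
import math

def gp(a, b):
    """Cl(2,0) geometric product on coefficient tuples (a0, a1, a2, a12)."""
    a0, a1, a2, a12 = a
    b0, b1, b2, b12 = b
    return (a0*b0 + a1*b1 + a2*b2 - a12*b12,
            a0*b1 + a1*b0 - a2*b12 + a12*b2,
            a0*b2 + a2*b0 + a1*b12 - a12*b1,
            a0*b12 + a12*b0 + a1*b2 - a2*b1)

def versor(u, u_inv, f):
    """Versor product T f = u f u^{-1} (rotation case, no parity sign)."""
    return gp(gp(u, f), u_inv)

t = 0.7
R = (math.cos(t/2), 0.0, 0.0, -math.sin(t/2))
R_inv = (math.cos(t/2), 0.0, 0.0, math.sin(t/2))

# The rotor rotates the vector e1 by angle t in the e1-e2 plane:
e1 = (0.0, 1.0, 0.0, 0.0)
rot = versor(R, R_inv, e1)
assert math.isclose(rot[1], math.cos(t)) and math.isclose(rot[2], math.sin(t))

# The versor product distributes over the geometric product, T(fg) = (Tf)(Tg):
f, g = (0.3, 1.0, -2.0, 0.5), (1.5, 0.2, 0.7, -1.0)
lhs = versor(R, R_inv, gp(f, g))
rhs = gp(versor(R, R_inv, f), versor(R, R_inv, g))
assert all(math.isclose(p, q, abs_tol=1e-12) for p, q in zip(lhs, rhs))
```

The distributivity check is the numerical counterpart of T(fg) = (Tf)(Tg), which follows from u⁻¹u = 1 when the sandwich products are chained.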

C EXPERIMENTS

This appendix supports Section 4 of the main paper.

C.1 LOSS FUNCTION AND METRICS

We report the summed MSE (SMSE) loss, defined in Equation 79, where u is the target, û the model output, N fields comprises scalar fields as well as individual vector field components, and N y is the total number of spatial points. Equation 79 is used for training with N t = 1, and further allows us to define four metrics:

• One-step loss, where N t = 1 and N fields comprises all scalar and vector components.
• Vector loss, where N t = 1 and N fields comprises only vector components.
• Scalar loss, where N t = 1 and N fields comprises only the scalar field.
• Rollout loss, where N t = 5 and N fields comprises all scalar and vector components.

For Maxwell's equations, electric and magnetic losses are defined analogously to the vector and scalar losses of the Navier-Stokes and shallow water experiments.
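The four metrics above differ only in which fields and how many timesteps enter the sum. A hedged numpy sketch (the array layout (N_t, N_fields, N_y) and field ordering are illustrative, not the paper's exact implementation):

```python
import numpy as np

def smse(u, u_hat):
    """Summed MSE: mean over spatial points, summed over fields and timesteps."""
    return ((u - u_hat) ** 2).mean(axis=-1).sum()

rng = np.random.default_rng(0)
u = rng.standard_normal((5, 3, 64))       # 5 steps; 1 scalar + 2 vector comps
u_hat = rng.standard_normal((5, 3, 64))

one_step = smse(u[:1], u_hat[:1])              # N_t = 1, all fields
scalar_loss = smse(u[:1, :1], u_hat[:1, :1])   # scalar field only
vector_loss = smse(u[:1, 1:], u_hat[:1, 1:])   # vector components only
rollout = smse(u, u_hat)                       # N_t = 5, all fields

# The one-step loss decomposes into its scalar and vector parts,
# and the rollout loss includes the first step:
assert np.isclose(one_step, scalar_loss + vector_loss)
assert rollout >= one_step
```

The decomposition in the first assertion is what makes the scalar and vector losses directly comparable diagnostics for the same trained model.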

C.2 MODELS

We experiment with two architecture families: ResNet models (He et al., 2016) and Fourier Neural Operators (FNOs) (Li et al., 2020). All baseline models are fine-tuned for all individual experiments with respect to number of blocks, number of channels, number of modes (FNO), learning rates, normalization and initialization procedures, and activation functions. The best models are reported; for the reported Clifford results, each convolution layer is substituted with a Clifford convolution, each Fourier layer with a Clifford Fourier layer, each normalization with a Clifford normalization, and each non-linearity with a Clifford non-linearity. A Clifford non-linearity in this context is the application of the corresponding default non-linearity to the different multivector components.

ResNet architectures. For Navier-Stokes and shallow water experiments, we use ResNet architectures with 8 residual blocks, each consisting of two convolution layers with 3×3 kernels, shortcut connections, group normalization (Wu & He, 2018), and GeLU activation functions (Hendrycks & Gimpel, 2016). We further use two embedding and two output layers, i.e. the overall architectures could be classified as Res-20 networks. In contrast to standard residual networks for image classification, we do not use any down-projection techniques, e.g. convolution layers with strides larger than 1 or pooling layers; instead, the spatial resolution stays constant throughout the network. We therefore also use the same number of hidden channels throughout the network, that is 128 channels per layer. Overall this results in roughly 2.4 million parameters. Increasing the number of residual blocks or the number of channels did not increase the performance significantly.

Clifford ResNet architectures.
For every ResNet-based experiment, we replaced the fine-tuned ResNet architectures with two Clifford counterparts: each CNN layer is replaced (i) with a Clifford CNN layer, and (ii) with a rotational Clifford CNN layer. To keep the number of weights similar, the resulting architectures have 64 multivector channels instead of 128 channels, resulting in roughly 1.6 million floating point parameters. Additionally, for both architectures, GeLU activation functions are replaced with Clifford GeLU activation functions and group normalization is replaced with Clifford group normalization. Using Clifford initialization techniques did not improve results.

Fourier Neural Operator architectures. For Navier-Stokes and shallow water experiments, we used 2-dimensional Fourier Neural Operators (FNOs) consisting of 8 FNO blocks, two embedding and two output layers. Each FNO block comprises a convolution path with a 1 × 1 kernel and an FFT path. We used 16 Fourier modes (for x and y components) for point-wise weight multiplication,
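The FFT path of an FNO block can be sketched in numpy to make the mode truncation concrete. This is a hedged, single-channel illustration: channel mixing, the parallel 1 × 1 convolution path, and separate weights for the positive and negative frequency blocks are omitted, and the function name is illustrative.

```python
import numpy as np

def spectral_layer(f, weights, modes=16):
    """One FNO-style spectral path: FFT, keep low modes, weight, inverse FFT."""
    f_hat = np.fft.rfft2(f)                      # (n, n//2 + 1), complex
    out_hat = np.zeros_like(f_hat)
    # Keep only the lowest `modes` frequencies along each axis; rows wrap,
    # so the low negative frequencies sit at the end of the first axis.
    out_hat[:modes, :modes] = f_hat[:modes, :modes] * weights
    out_hat[-modes:, :modes] = f_hat[-modes:, :modes] * weights
    return np.fft.irfft2(out_hat, s=f.shape)

rng = np.random.default_rng(0)
f = rng.standard_normal((64, 64))
w = rng.standard_normal((16, 16)) + 1j * rng.standard_normal((16, 16))
g = spectral_layer(f, w)
assert g.shape == f.shape       # resolution is preserved
assert np.isrealobj(g)          # irfft2 returns a real field
```

Because only the retained modes carry learnable weights, the parameter count is governed by the number of modes rather than the spatial resolution, which is what makes FNOs resolution-agnostic.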

