FILTRA: RETHINKING STEERABLE CNN BY FILTER TRANSFORM

Abstract

Steerable CNNs build prior knowledge of transformation invariance or equivariance into the network architecture, which improves robustness to geometric transformations of the data and reduces overfitting. Filter transform has been an intuitive and widely used technique for constructing steerable CNNs over the past decades. Recently, group representation theory has been used to analyze steerable CNNs, revealing the function space structure of steerable kernel functions. However, it is not yet clear how this theory relates to the filter transform technique. In this paper, we show that kernels constructed by filter transform can also be interpreted within group representation theory. We also show that filter-transformed kernels can be used to convolve input and output features in different group representations. This interpretation helps complete the puzzle of steerable CNN theory and provides a novel and simple approach to implementing steerable convolution operators. Experiments are conducted on multiple datasets to verify the feasibility of the proposed approach.

1. INTRODUCTION

Beyond the well-known property of equivariance under translation, there has been substantial recent interest in CNN architectures that are equivariant with respect to other transformation groups, e.g. reflection and rotation. Applications of such architectures cover scenarios where object orientation may vary, including OCR, aerial image processing, 3D point cloud processing, medical image processing and texture analysis.

Previous works on constructing equivariant CNNs can be coarsely divided into two categories. The first category designs special steerable filters so that the convolutional output is hard-wired to transform accordingly when the input is reflected or rotated. Many works develop this idea through filter rotation, including hand-crafted filters (Oyallon & Mallat, 2015) and learned filters (Laptev et al., 2016; Zhou et al., 2017; Cheng et al., 2018; Marcos et al., 2017). TI-Pooling (Laptev et al., 2016) produces invariant output as the input rotates. ORN (Zhou et al., 2017) and RotDCF (Cheng et al., 2018) produce output that is circularly shifted as the input rotates. Since each dimension of such permutable output corresponds to a uniformly discretized rotation angle, RotEqNet (Marcos et al., 2017) proposes to extract the rotation angle from the permutable features. Another approach to constructing steerable filters is to linearly combine a set of steerable bases. These bases can be solved in a discrete function space (Cohen & Welling, 2014; 2016) or a continuous function space (Worrall et al., 2017; Weiler & Cesa, 2019). Weiler & Cesa (2019) comprehensively summarize works on steerable bases using polar Fourier bases.

The second category applies specific transforms to the input. Spatial Transformer Network (STN) is a well-known representative, which predicts an affine matrix to transform its input to a canonical form. Tai et al. (2019) build on this idea to design an equivariant network.
Another choice of transform is to the polar coordinate system (Henriques & Vedaldi, 2017; Esteves et al., 2018). Since a 2D rotation in the Cartesian coordinate system corresponds to a 2D translation in the polar coordinate system, rotation equivariance can be achieved by a conventional translation-equivariant CNN.

The approach proposed in this paper falls into the first category. Weiler & Cesa (2019) prove that every steerable convolution operator can be expressed as a combination of a specific set of polar Fourier bases. However, it is not yet clear how this interpretation relates to the widely used filter transform scheme. In this paper, we aim to establish the missing connection between the group representation based analysis of steerable filters and the filter transform scheme. To this end, we propose a new approach (FILTRA) that uses filter transform to establish steerability between features in different group representations of the cyclic group $C_N$ and the dihedral group $D_N$. We verify the feasibility of FILTRA for classification and regression tasks on different datasets.
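The remark above that a rotation in Cartesian coordinates becomes a translation in polar coordinates can be checked numerically. The following is a minimal sketch, not part of the proposed method; the helper names `to_polar` and `rotate`, the sample points and the angle are illustrative assumptions.

```python
import numpy as np

def to_polar(points):
    """Convert an (n, 2) array of Cartesian points to (radius, angle) arrays."""
    radius = np.hypot(points[:, 0], points[:, 1])
    angle = np.arctan2(points[:, 1], points[:, 0])
    return radius, angle

def rotate(points, theta):
    """Rotate (n, 2) Cartesian points counter-clockwise by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s], [s, c]])
    return points @ rotation.T

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 2))   # arbitrary sample points (assumption)
theta = 2 * np.pi / 8              # arbitrary rotation angle (assumption)

r0, a0 = to_polar(points)
r1, a1 = to_polar(rotate(points, theta))

# The radius is unchanged, and every angle is shifted by theta (modulo 2*pi).
radius_unchanged = np.allclose(r0, r1)
angle_shift = np.mod(a1 - a0, 2 * np.pi)
angle_translated = np.allclose(angle_shift, theta)
```

Both flags evaluate to true: the rotation acts as a pure shift along the angular axis, which is why a translation-equivariant CNN applied in polar coordinates inherits rotation equivariance.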

2. PRELIMINARIES

We make use of several NumPy and SciPy functions in equations, including roll, flipud and circulant. We sometimes omit the argument in brackets by writing κ * * = κ * * (g) and K * * = K * * (φ).

2.1. STEERABLE CNN

We recapitulate the basic concepts of steerable CNNs that are used frequently in this paper. For a comprehensive introduction, readers can refer to Weiler & Cesa (2019). We mainly consider the 2D image case and denote a pixel coordinate by $x \in \mathbb{R}^2$. We use a vector field $f(x) \in \mathbb{R}^C$ to denote a general multi-channel image, where $C$ is the number of channels. Typical examples of $f(x)$ include an RGB image $f(x) \in \mathbb{R}^3$ and a gradient image $f(x) \in \mathbb{R}^2$. Consider a group $G$ of transformations and an element $g \in G$. Examples of $G$ include rotation, translation and flip. A vector field $f(x)$ obeys the following rule under the action $\pi(g)$ of a group element $g$:

$$\pi(g) \bullet f = \rho(g) f(g^{-1} x),$$

where $\rho(g)$ is a group representation related to the vector field $f$. Fig. 1 shows an example of different types of $\rho$ for RGB images and gradient images under a rotation element $g$. The group representation of an RGB image is $\rho(g) \equiv 1$, while for a gradient image $\rho(g)$ is a 2D rotation matrix that also rotates the vector $f(x)$ by $g$. In the context of convolutional neural networks, a convolution operator $f \to \kappa \bullet f$ is considered steerable if it satisfies

$$\kappa \bullet [\pi_1(g) f] = \pi_2(g) [\kappa \bullet f],$$

i.e. the output vector field transforms equivariantly under $g$ when the input is transformed by $g$.
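The steerability condition above can be checked numerically in its simplest setting. The sketch below assumes scalar input and output fields (both representations trivial, $\rho \equiv 1$), the group of 90° rotations, and a kernel made rotation-invariant by averaging a random 3x3 kernel over its four rotations. The names `act`, `convolve` and `base` are illustrative, and the construction is only an illustration of the constraint, not the FILTRA kernel.

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)

# Random 3x3 kernel, then averaged over the four rotations of C_4
# so that rotating the kernel by 90 degrees leaves it unchanged.
base = rng.normal(size=(3, 3))
kernel = sum(np.rot90(base, k) for k in range(4)) / 4.0

image = rng.normal(size=(8, 8))  # single-channel (scalar field) input

def act(f):
    """Group action on a scalar field: rotate the grid by 90 degrees."""
    return np.rot90(f)

def convolve(f, k):
    """Cross-correlation without padding, so no border effects arise."""
    return correlate2d(f, k, mode="valid")

# Steerability: filtering the rotated input equals rotating the filtered input.
symmetric_is_equivariant = np.allclose(
    convolve(act(image), kernel), act(convolve(image, kernel))
)

# A generic (non-symmetrized) kernel does not satisfy the constraint.
generic_is_equivariant = np.allclose(
    convolve(act(image), base), act(convolve(image, base))
)
```

Here `symmetric_is_equivariant` is true while `generic_is_equivariant` is false. For non-trivial representations, such as the gradient image of Fig. 1, the channel dimension must additionally be mixed by $\rho(g)$ on both sides.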

2.2. REFLECTION GROUP, CYCLIC GROUP AND DIHEDRAL GROUP

We consider steerable filters on the reflection group $(\{\pm 1\}, *)$, the cyclic group $C_N$ and the dihedral group $D_N = (\{\pm 1\}, *) \ltimes C_N$. To unify the notation in derivations, we interpret $C_N = (\{1\}, *) \ltimes C_N$ and $(\{\pm 1\}, *) = (\{\pm 1\}, *) \ltimes C_1 = D_1$, so that an element of any of these three groups can always be denoted as a pair $g = (i_0, i_1)$, whose range is $\mathbb{Z}_2 \times \mathbb{Z}_1$ for the reflection group, $\mathbb{Z}_1 \times \mathbb{Z}_N$ for the cyclic group and $\mathbb{Z}_2 \times \mathbb{Z}_N$ for the dihedral group. Each element of $C_N$ corresponds to the rotation angle $\theta_{i_1} = \frac{2 i_1 \pi}{N}$.
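The unified pair notation can be made concrete by enumerating the elements as 2x2 matrices. The sketch below is one possible realization under stated assumptions: the flip index $i_0$ is stored as 0 or 1 rather than ±1, the reflection is taken about the horizontal axis, and the flip is applied before the rotation. None of these conventions is fixed by the text above, and the helper names are illustrative.

```python
import numpy as np

def rotation(theta):
    """2D counter-clockwise rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Assumed reflection axis: flip the second coordinate.
FLIP = np.array([[1.0, 0.0], [0.0, -1.0]])

def element(i0, i1, n):
    """Matrix of g = (i0, i1): optional flip, then rotation by 2*pi*i1/n."""
    theta = 2.0 * np.pi * i1 / n
    return rotation(theta) @ np.linalg.matrix_power(FLIP, i0)

def build_group(n_flips, n_rotations):
    """All pairs (i0, i1) with i0 < n_flips and i1 < n_rotations."""
    return {
        (i0, i1): element(i0, i1, n_rotations)
        for i0 in range(n_flips)
        for i1 in range(n_rotations)
    }

def is_closed(group):
    """Check that the product of any two elements is again an element."""
    mats = list(group.values())
    return all(
        any(np.allclose(a @ b, c) for c in mats) for a in mats for b in mats
    )

N = 4
reflection_group = build_group(2, 1)   # index range Z_2 x Z_1
cyclic_group = build_group(1, N)       # index range Z_1 x Z_N
dihedral_group = build_group(2, N)     # index range Z_2 x Z_N
```

With `N = 4` the three dictionaries hold 2, 4 and 8 elements respectively, and `is_closed` returns true for each, so the same indexing scheme covers all three groups.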

2.3. GROUP REPRESENTATION

A linear representation $\rho$ of a group $G$ on a vector space $\mathbb{R}^n$ is a group homomorphism from $G$ to the general linear group $GL(n)$, denoted as

$$\rho : G \to GL(n) \quad \text{s.t.} \quad \rho(g g') = \rho(g) \rho(g'), \quad \forall g, g' \in G. \tag{3}$$
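The homomorphism property in Eq. (3) can be verified directly for the cyclic group $C_N$, whose elements compose by addition of indices modulo $N$. The sketch below checks three representations that are standard for this group: the trivial one, the 2D rotation matrices, and the regular (permutation) representation built with `scipy.linalg.circulant`. The function names and the choice `N = 8` are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import circulant

N = 8  # order of the cyclic group (assumption for this example)

def rho_trivial(i):
    """Trivial representation: every element maps to the 1x1 identity."""
    return np.ones((1, 1))

def rho_rotation(i):
    """2D rotation by the angle 2*pi*i/N."""
    theta = 2.0 * np.pi * i / N
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def rho_regular(i):
    """Regular representation: permutation matrix that cyclically shifts by i."""
    return circulant(np.eye(N)[i % N])

def is_homomorphism(rho):
    """Check rho(g g') == rho(g) rho(g') for all pairs of elements of C_N."""
    return all(
        np.allclose(rho((i + j) % N), rho(i) @ rho(j))
        for i in range(N)
        for j in range(N)
    )

results = {
    "trivial": is_homomorphism(rho_trivial),
    "rotation": is_homomorphism(rho_rotation),
    "regular": is_homomorphism(rho_regular),
}

# The regular representation acts on a feature vector as a circular shift.
v = np.arange(N, dtype=float)
shift_matches_roll = np.allclose(rho_regular(3) @ v, np.roll(v, 3))
```

All three entries of `results` are true. The last check shows that the regular representation acts on a vector exactly as `numpy.roll` does, i.e. as a circular shift.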



roll: https://numpy.org/doc/stable/reference/generated/numpy.roll.html
flipud: https://numpy.org/doc/stable/reference/generated/numpy.flipud.html
circulant: https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.circulant.html



Figure 1: Examples of images (feature maps) with different group representations ρ. Both images undergo a 90° rotation. The upper row is an RGB image, whose three color channels remain the same when the image is rotated. The lower row is a gradient image, whose two channel values must be rotated in the same way when the image is rotated.

