BISPECTRAL NEURAL NETWORKS

Abstract

We present a neural network architecture, Bispectral Neural Networks (BNNs), for learning representations that are invariant to the actions of compact commutative groups on the space over which a signal is defined. The model incorporates the ansatz of the bispectrum, an analytically defined group invariant that is complete: that is, it preserves all signal structure while removing only the variation due to group actions. Here, we demonstrate that BNNs are able to simultaneously learn groups, their irreducible representations, and corresponding equivariant and complete-invariant maps purely from the symmetries implicit in data. Further, we demonstrate that the completeness property endows these networks with strong invariance-based adversarial robustness. This work establishes Bispectral Neural Networks as a powerful computational primitive for robust invariant representation learning.

1. INTRODUCTION

A fundamental problem of intelligence is to model the transformation structure of the natural world. In the context of vision, translation, rotation, and scaling define symmetries of object categorization: the transformations that leave perceived object identity invariant. In audition, pitch and timbre define symmetries of speech recognition. Biological neural systems have learned these symmetries from the statistics of the natural world, either through evolution or accumulated experience. Here, we tackle the problem of learning symmetries in artificial neural networks.

At the heart of the challenge lie two requirements that are frequently in tension: invariance to transformation structure and selectivity to pattern structure. In deep networks, operations such as max or average pooling are commonly employed to achieve invariance to local transformations. Such operations are invariant to many natural transformations; however, they are also invariant to unnatural transformations that destroy image structure, such as pixel permutations. This lack of selectivity may contribute to failure modes such as susceptibility to adversarial perturbations [1], excessive invariance [2], and selectivity for textures rather than objects [3]. Thus, there is a need for computational primitives that selectively parameterize natural transformations and facilitate robust invariant representation learning.

An ideal invariant would be complete: one that preserves all pattern information and is invariant only to the specific transformations relevant to a task. Transformations in many datasets arise from the geometric structure of the natural world. The mathematics of groups and their associated objects gives us the machinery to precisely define, represent, and parameterize these transformations, and hence the problem of invariant representation learning.

In this work, we present a novel neural network primitive based on the bispectrum, a complete invariant map rooted in harmonic analysis and group representation theory [4]. Bispectral Neural Networks flexibly parameterize the bispectrum for arbitrary compact commutative groups, enabling both the group and the invariant map to be learned from data. The architecture is remarkably simple, consisting of two layers: a single learnable linear layer, followed by a fixed collection of triple products computed from the output of the previous layer. BNNs are trained with an objective function consisting of two terms: one that collapses all transformations of a pattern to a single point in the output (invariance), and another that prevents information collapse in the first layer (selectivity).

We demonstrate that BNNs trained to separate orbit classes in augmented data learn the group, its Fourier transform, and the corresponding bispectrum purely from the symmetries implicit in the data (Section 4.1). Because the model has learned the fundamental structure of the group, we show that it generalizes to novel, out-of-distribution classes with the same group structure and facilitates downstream group-invariant classification (Section 4.2). Further, we demonstrate that the trained network inherits the completeness of the analytical model, which endows it with strong adversarial robustness (Section 4.3). Finally, we demonstrate that the weights of the network can be used to recover the group Cayley table, the fundamental signature of a group's structure (Section 4.4). Thus, an explicit model of the group can be learned and extracted from the network weights.
To our knowledge, our work is the first to demonstrate that either a bispectrum or a group Cayley table can be learned from data alone. Our results lay the foundation for a new computational primitive for robust and interpretable representation learning.
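The following PyTorch sketch illustrates this two-layer structure for 1D signals. It is a minimal sketch rather than the exact formulation used in our experiments: it assumes the classical commutative-bispectrum indexing z_i z_j conj(z_{i+j}) for the fixed triple products, and uses a simple first-layer power penalty as the collapse-prevention (selectivity) term; the class and function names are purely illustrative.

```python
import torch


class BispectralNet(torch.nn.Module):
    """Two-layer sketch: a learnable complex linear map followed by fixed triple products."""

    def __init__(self, n: int):
        super().__init__()
        # Learnable linear layer W, playing the role of a Fourier transform for the learned group.
        self.W = torch.nn.Parameter(torch.randn(n, n, dtype=torch.cfloat) / n ** 0.5)
        self.n = n

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        z = f.to(torch.cfloat) @ self.W.T                    # (batch, n) first-layer output
        idx = (torch.arange(self.n)[:, None] + torch.arange(self.n)[None, :]) % self.n
        # Fixed triple products, here indexed as beta_ij = z_i * z_j * conj(z_{i+j}).
        beta = z[:, :, None] * z[:, None, :] * z[:, idx].conj()
        return beta.reshape(f.shape[0], -1)


def objective(model: BispectralNet, f_a: torch.Tensor, f_b: torch.Tensor) -> torch.Tensor:
    """f_a and f_b are batches of signals drawn pairwise from the same orbits (augmented pairs)."""
    # Invariance term: collapse all transformations of a pattern to the same output.
    invariance = ((model(f_a) - model(f_b)).abs() ** 2).mean()
    # Selectivity term (illustrative): keep first-layer power from collapsing to zero.
    z = f_a.to(torch.cfloat) @ model.W.T
    selectivity = ((z.abs() ** 2).mean(dim=0) - 1.0).pow(2).mean()
    return invariance + selectivity
```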

1.1. RELATED WORK

The great success and efficiency of convolutional neural networks owe much to built-in equivariance to the group of 2D translations. In recent years, an interest in generalizing convolution to non-Euclidean domains such as graphs and 3D surfaces has led to the incorporation of additional group symmetries into deep learning architectures [5], [6]. In another line of work, Kakarala [7] and Kondor [8] pioneered the use of the analytical group-invariant bispectrum in signal processing and machine learning contexts. Both of these approaches require specifying the group of transformations a priori and explicitly building its structure into the network architecture. However, the groups that structure natural data are often either unknown or too complex to specify analytically, for example, when the structure arises from the interaction of many groups, or when the group acts on latent features in the data.

Rather than building in groups by hand, another line of work has sought to learn underlying group structure solely from the symmetries contained in data. The majority of these approaches use structured models that parameterize irreducible representations or Lie algebra generators, which act on the data through the group exponential map [9]-[14]. In these models, the objective is typically to infer the group element that acts on a template to generate the observed data, with inference accomplished through Expectation Maximization or other Bayesian approaches. A drawback of these models is that both the exponential map and the inference schemes are computationally expensive, making them difficult to integrate into large-scale systems. A recent feed-forward approach [15] learns distributions over the group of 2D affine transformations in a deep learning architecture. However, the group is not learned in its entirety, as it is restricted to a distribution over a pre-specified group.

Here, we present a novel approach for learning groups from data in an efficient, interpretable, and fully feed-forward model that requires no prior knowledge of the group, no computation of the exponential map, and no Bayesian inference. The key insight in our approach is to harness the generality of the form of the group-invariant bispectrum, which can be defined for arbitrary compact groups. Here, we focus on the class of compact commutative groups.
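Before formalizing the bispectrum in Section 2, the following minimal NumPy sketch previews why the commutative case is tractable: for the cyclic group acting on 1D signals by translation, the classical bispectrum beta_ij = F_i F_j conj(F_{i+j}) (with F the discrete Fourier transform of the signal) is unchanged by any cyclic shift, yet, unlike the power spectrum, it still distinguishes patterns that are not related by a shift.

```python
import numpy as np


def bispectrum(f):
    """Classical bispectrum under cyclic translation: beta_ij = F_i * F_j * conj(F_{i+j})."""
    F = np.fft.fft(f)
    i = np.arange(len(f))
    return F[:, None] * F[None, :] * np.conj(F[(i[:, None] + i[None, :]) % len(f)])


rng = np.random.default_rng(0)
f = rng.standard_normal(16)
shifted = np.roll(f, 5)            # a different point on the same translation orbit
reversed_f = f[::-1]               # a different pattern, not a cyclic shift of f (generically)

power = lambda x: np.abs(np.fft.fft(x)) ** 2

assert np.allclose(bispectrum(f), bispectrum(shifted))         # invariant to the group action
assert np.allclose(power(f), power(reversed_f))                 # power spectrum cannot tell these apart
assert not np.allclose(bispectrum(f), bispectrum(reversed_f))   # bispectrum remains pattern-selective
```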

2. THE BISPECTRUM

The theory of groups and their representations provides a natural framework for constructing computational primitives for robust machine learning systems. We provide a one-page introduction to these mathematical foundations in Appendix A. Extensive treatment of these concepts can be found in the textbooks of Hall [16] and Gallier & Quaintance [17]. We now define the concepts essential to this work: invariance, equivariance, orbits, complete invariance, and irreducible representations. Let G be a group, X a space on which G acts, and f a signal defined on the domain X. The orbit of a signal is the set generated by acting on the domain of the signal with each group element, i.e. {f(gx) : g ∈ G}. In the context of image transformations, this is the set of all transformed versions of a canonical image; for example, if G is the group SO(2) of 2D rotations, then the orbit contains all rotated versions of that image. A function ϕ : X → Y is G-equivariant if ϕ(gx) = g′ϕ(x) for all g ∈ G and x ∈ X, with g′ ∈ G′ homomorphic to G. That is, a group transformation on the input space results in a corresponding

