

Abstract

Recent work has highlighted several advantages of enforcing orthogonality in the weight layers of deep networks, such as maintaining the stability of activations, preserving gradient norms, and enhancing adversarial robustness by enforcing low Lipschitz constants. Although numerous methods exist for enforcing the orthogonality of fully-connected layers, those for convolutional layers are more heuristic in nature, often focusing on penalty methods or limited classes of convolutions. In this work, we propose and evaluate an alternative approach to directly parameterize convolutional layers that are constrained to be orthogonal. Specifically, we propose to apply the Cayley transform to a skew-symmetric convolution in the Fourier domain, so that the inverse convolution needed by the Cayley transform can be computed efficiently. We compare our method to previous Lipschitz-constrained and orthogonal convolutional layers and show that it indeed preserves orthogonality to a high degree even for large convolutions. Applied to the problem of certified adversarial robustness, we show that networks incorporating the layer outperform existing deterministic methods for certified defense against ℓ2-norm-bounded adversaries, while scaling to larger architectures than previously investigated. Code is available at https://github.com/locuslab/orthogonal-convolutions.

1. Introduction

Encouraging orthogonality in neural networks has proven to yield several compelling benefits. For example, orthogonal initializations allow extremely deep vanilla convolutional neural networks to be trained quickly and stably (Xiao et al., 2018; Saxe et al., 2013). And initializations that remain closer to orthogonality throughout training seem to learn faster and generalize better (Pennington et al., 2017). Unlike Lipschitz-constrained layers, orthogonal layers are gradient-norm-preserving (Anil et al., 2019), discouraging vanishing and exploding gradients and stabilizing activations (Rodríguez et al., 2017). Orthogonality is thus a potential alternative to batch normalization in CNNs and can help to remember long-term dependencies in RNNs (Arjovsky et al., 2016; Vorontsov et al., 2017). Constraints and penalty terms encouraging orthogonality can improve generalization in practice (Bansal et al., 2018; Sedghi et al., 2018), improve adversarial robustness by enforcing low Lipschitz constants, and allow deterministic certificates of robustness (Tsuzuku et al., 2018).

Despite evidence for the benefits of orthogonality constraints, and while there are many methods to orthogonalize fully-connected layers, the orthogonalization of convolutions has posed challenges. More broadly, current Lipschitz-constrained convolutions rely on spectral normalization and kernel reshaping methods (Tsuzuku et al., 2018), which only allow loose bounds and can cause vanishing gradients. Sedghi et al. (2018) showed how to clip the singular values of convolutions and thus enforce orthogonality, but relied on costly alternating projections to achieve tight constraints. Most recently, Li et al. (2019) introduced the Block Convolution Orthogonal Parameterization (BCOP), which cannot express the full space of orthogonal convolutions. In contrast to previous work, we provide a direct, expressive, and scalable parameterization of orthogonal convolutions.
Our method relies on the Cayley transform, which is well-known for parameterizing orthogonal matrices in terms of skew-symmetric matrices, and can be easily extended to 1
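To make the underlying identity concrete, the classical (matrix) Cayley transform maps any skew-symmetric matrix A to an orthogonal matrix Q = (I - A)(I + A)^{-1}. The sketch below illustrates only this basic matrix version in NumPy; it is not the paper's Fourier-domain convolutional construction, and the function and variable names are our own:

```python
import numpy as np

def cayley(A: np.ndarray) -> np.ndarray:
    """Cayley transform of a skew-symmetric matrix A.

    Returns Q = (I - A) @ (I + A)^{-1}. Skew-symmetry (A^T = -A)
    makes the eigenvalues of A purely imaginary, so I + A is always
    invertible and Q is orthogonal (Q^T Q = I).
    """
    n = A.shape[0]
    I = np.eye(n)
    # Solve (I + A)^T X = (I - A)^T instead of forming an explicit inverse.
    return np.linalg.solve((I + A).T, (I - A).T).T

# Any parameter matrix W yields a skew-symmetric A = W - W^T,
# so the parameterization is unconstrained in W.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
A = W - W.T
Q = cayley(A)

assert np.allclose(A, -A.T)                  # A is skew-symmetric
assert np.allclose(Q.T @ Q, np.eye(4))       # Q is orthogonal
```

Parameterizing A as W - W^T is what lets gradient-based training operate on an unconstrained matrix W while the resulting layer stays exactly orthogonal.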

