MULTIPLICATIVE FILTER NETWORKS

Abstract

Although deep networks are typically used to approximate functions over high-dimensional inputs, recent work has spurred interest in neural networks as function approximators for low-dimensional-but-complex functions, such as representing images as functions of pixel coordinates, solving differential equations, or representing signed distance functions or neural radiance fields. Key to these recent successes has been the use of new elements such as sinusoidal nonlinearities or Fourier features in positional encodings, which vastly outperform simple ReLU networks. In this paper, we propose and empirically demonstrate that an arguably simpler class of function approximators can work just as well for such problems: multiplicative filter networks. In these networks, we avoid traditional compositional depth altogether, and simply multiply together (linear functions of) sinusoidal or Gabor wavelet functions applied to the input. This representation has the notable advantage that the entire function can be viewed as a linear function approximator over an exponential number of Fourier or Gabor basis functions, respectively. Despite this simplicity, we show that multiplicative filter networks largely outperform or match recent approaches that use Fourier features with ReLU networks or sinusoidal activation networks on the domains highlighted in those works.

1. INTRODUCTION

Neural networks are most commonly used to approximate functions over high-dimensional input spaces, such as functions that operate on images or long text sequences. However, there has been growing recent interest in using neural networks to approximate low-dimensional-but-complex functions: for example, one can represent a continuous image as a function f : R^2 → R^3, where the input specifies the (x, y) coordinates of a location in the image and the output specifies the RGB value of the pixel at that location. Two recent papers in particular have argued that specific architectural changes are required to make (fully-connected) deep networks suitable for this task: Sitzmann et al. (2020) employ sinusoidal activation functions within a multi-layer network (the SIREN architecture), and Tancik et al. (2020) propose feeding random Fourier features of the input to a traditional ReLU-based network. Both papers show that the resulting networks can approximate these low-dimensional functions much better than simple feedforward ReLU networks, and achieve striking results in representing fairly complex functions (e.g. 3D signed distance fields or neural radiance fields) with a high degree of fidelity. However, the precise benefit of sinusoidal bases or a first layer of Fourier features is difficult to characterize, and it remains unclear why such representations work well for these tasks.

In this paper, we argue and empirically demonstrate that an arguably simpler class of functions can work as well as or better than these previously-proposed networks on this task. Specifically, we propose an architecture we call the multiplicative filter network (MFN). Unlike a traditional multi-layer network that achieves representational power through compositional depth, the MFN simply repeatedly applies nonlinear filters (such as sinusoid or Gabor wavelet functions) to the network's input, then multiplies together linear functions of these features.
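To make this construction concrete, the following is a minimal NumPy sketch of a forward pass through a sinusoidal MFN: each stage multiplies a linear function of the current features elementwise with a fresh sinusoidal filter of the raw input. The layer widths, initialization scales, and names here are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def fourier_filter(x, omega, phi):
    """Sinusoidal filter g(x; theta) = sin(omega @ x + phi)."""
    return np.sin(omega @ x + phi)

def mfn_forward(x, filters, weights):
    """Forward pass of a sinusoidal MFN for a single input x.

    filters: k pairs (omega, phi) of filter parameters
    weights: k pairs (W, b); the first k-1 are hidden linear layers,
             the last maps the hidden width to the output dimension.
    """
    z = fourier_filter(x, *filters[0])
    for (W, b), theta in zip(weights[:-1], filters[1:]):
        # elementwise (Hadamard) product of a linear function of the
        # features with a new filter applied directly to the input
        z = (W @ z + b) * fourier_filter(x, *theta)
    W_out, b_out = weights[-1]
    return W_out @ z + b_out

# Tiny illustrative example: map 2-D coordinates to a 3-channel output.
rng = np.random.default_rng(0)
hidden, k = 32, 3
filters = [(rng.normal(size=(hidden, 2)), rng.uniform(0, 2 * np.pi, hidden))
           for _ in range(k)]
weights = [(rng.normal(size=(hidden, hidden)) / np.sqrt(hidden), np.zeros(hidden))
           for _ in range(k - 1)]
weights.append((rng.normal(size=(3, hidden)) / np.sqrt(hidden), np.zeros(3)))
rgb = mfn_forward(np.array([0.3, 0.7]), filters, weights)  # shape (3,)
```

Replacing `fourier_filter` with a Gabor function (a sinusoid modulated by a Gaussian envelope) yields the wavelet variant of the architecture.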
The notable advantage of this representation is that, owing to the multiplicative properties of Fourier and Gabor filters, the entire function is ultimately just a linear function of (an exponential number of) Fourier or Gabor features of the input. Indeed, we can express the exact linear form of these MFNs, which can make their analysis considerably simpler than that of deep networks, where compositions of nonlinear activations make the entire function difficult to characterize. In this work, we show that despite this simplicity, the proposed networks often perform as well as or better than the previously proposed SIREN or Fourier feature networks. Specifically, we compare our approach, using networks with comparable numbers of parameters, on the exact benchmarks proposed in the SIREN and Fourier features papers, and show that MFNs benefit more than these baselines from increases in network depth or width. Despite this, we do emphasize that SIREN networks, in particular, appear to retain some notable advantages over MFNs, such as a bias towards smoothness in the represented function and its gradients. However, especially given the fact that MFNs ultimately just correspond to a linear Fourier or wavelet representation of a low-dimensional function, we believe they should be considered a standard benchmark for future work on such problems, to indicate where the compositional depth of typical deep networks can provide a substantial benefit.
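To see why the multiplicative structure stays within a linear Fourier representation, recall the standard product-to-sum identity, where \omega_i and \varphi_i denote the frequency and phase parameters of two sinusoidal filters:

\sin(\omega_1^\top x + \varphi_1)\,\sin(\omega_2^\top x + \varphi_2) = \tfrac{1}{2}\left[\cos\!\left((\omega_1 - \omega_2)^\top x + (\varphi_1 - \varphi_2)\right) - \cos\!\left((\omega_1 + \omega_2)^\top x + (\varphi_1 + \varphi_2)\right)\right]

Since each cosine is itself a phase-shifted sinusoid, every elementwise multiplication in the network combines frequencies additively, so the output remains a linear combination of sinusoids over a set of frequencies that grows exponentially with the number of multiplicative layers.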

2. BACKGROUND AND RELATED WORK

Our approach is related to many previous works on Fourier and Wavelet transforms, random Fourier features, and implicit neural representations. We explore the connections among these areas below.

Fourier and Wavelet transforms.

Transforming time- or space-domain signals to the frequency domain using transforms such as the Fourier and Wavelet transforms has been at the heart of many developments in image processing, signal processing, and computer vision. In particular, the Fourier transform (Bracewell & Bracewell, 1986; Vetterli et al., 2014) and its various forms have found use in myriad applications, such as spectroscopy, quantum mechanics, and signal processing. Wavelet transforms, which in particular aid multi-scale analysis, have proven especially useful in data compression, JPEG2000 (Rabbani, 2002) being one example. Rahimi & Recht (2008) demonstrate the power of the Fourier transform in machine learning applications: simply projecting the original dataset onto random Fourier bases vastly improves the expressiveness of models, as it approximates kernel computations. Many subsequent works apply Fourier features and their variations (Rahimi & Recht, 2009; Le et al., 2013; Yu et al., 2016) to improve machine learning performance in many domains, including classification (Sun et al., 2018; Rawat et al., 2019), regression (Avron et al., 2017; Brault et al., 2016), clustering (Chitta et al., 2012; Liu et al., 2019), online learning (Lin et al., 2014; Hu et al., 2015), and deep learning (Xue et al., 2019; Mehrkanoon & Suykens, 2018; Rick Chang et al., 2016; Mairal et al., 2014; Jacot et al., 2018; Tancik et al., 2020).
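As an illustration of the Rahimi & Recht (2008) construction, the sketch below approximates a Gaussian kernel by an inner product of random Fourier features; the feature count, bandwidth, and test points are illustrative assumptions, not values from the cited works.

```python
import numpy as np

def random_fourier_features(X, W, b):
    """Map inputs to random Fourier features phi(x) = sqrt(2/D) cos(Wx + b).

    For W with rows drawn from N(0, I) and b ~ Uniform[0, 2*pi], the inner
    product phi(x) . phi(y) approximates the Gaussian kernel
    exp(-||x - y||^2 / 2) (Rahimi & Recht, 2008).
    """
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

rng = np.random.default_rng(0)
d, D = 3, 20000                       # input dim, number of random features
W = rng.normal(size=(D, d))           # random frequencies
b = rng.uniform(0, 2 * np.pi, size=D) # random phases

x = np.array([0.2, -0.1, 0.4])
y = np.array([0.5, 0.3, -0.2])
approx = (random_fourier_features(x[None], W, b)
          @ random_fourier_features(y[None], W, b).T)
exact = np.exp(-np.linalg.norm(x - y) ** 2 / 2)
```

With enough features D, the Monte Carlo error of the approximation shrinks at the usual O(1/sqrt(D)) rate, which is why a linear model on these features behaves like a kernel machine.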

Implicit neural representations.

A recent line of work that represents signals as continuous functions parameterized by neural networks (instead of using the traditional discrete representation) is gaining popularity. This strategy has been used to represent different objects such as images (Nguyen et al., 2015; Stanley, 2007), shapes (Park et al., 2019; Genova et al., 2019; Chen & Zhang, 2019; Chabra et al., 2020), scenes (Mildenhall et al., 2020; Sitzmann et al., 2019; Jiang et al., 2020; Niemeyer et al., 2020), and textures (Oechsle et al., 2019; Henzler et al., 2020). Most of these applications use a standard multi-layer perceptron architecture with ReLU activation functions. Recently, motivated by the success of the Fourier transform in machine learning, a few papers have suggested architectural changes that integrate periodic nonlinearities into the network. Mildenhall et al. (2020); Zhong et al. (2020); Tancik et al. (2020) propose sinusoidal mappings of the input features (Rahimi & Recht, 2008), realized via positional encodings or projections with Gaussian random matrices. Others (Klocek et al., 2019; Sitzmann et al., 2020) propose sinusoidal activation functions within a multi-layer perceptron architecture.
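As a sketch of the sinusoidal input mapping used in this line of work (following the Gaussian random mapping of Tancik et al. (2020)), assuming an illustrative bandwidth and feature count:

```python
import numpy as np

def fourier_feature_mapping(v, B):
    """Gaussian Fourier feature mapping gamma(v) = [cos(2*pi*Bv), sin(2*pi*Bv)].

    B is an (m, d) matrix with entries drawn from N(0, sigma^2); sigma
    controls the bandwidth of the encoding (Tancik et al., 2020).
    """
    proj = 2 * np.pi * (v @ B.T)
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

rng = np.random.default_rng(0)
B = rng.normal(scale=10.0, size=(256, 2))      # sigma = 10, 2-D coordinates
coords = rng.uniform(size=(100, 2))            # 100 (x, y) locations in [0, 1]^2
features = fourier_feature_mapping(coords, B)  # shape (100, 512)
```

The resulting features are then fed to a standard ReLU MLP in place of the raw coordinates.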

