MULTIPLICATIVE FILTER NETWORKS

Abstract

Although deep networks are typically used to approximate functions over high-dimensional inputs, recent work has spurred growing interest in neural networks as function approximators for low-dimensional-but-complex functions, such as representing images as a function of pixel coordinates, solving differential equations, or representing signed distance functions or neural radiance fields. Key to these recent successes has been the use of new elements such as sinusoidal nonlinearities or Fourier features in positional encodings, which vastly outperform simple ReLU networks. In this paper, we propose and empirically demonstrate that an arguably simpler class of function approximators can work just as well for such problems: multiplicative filter networks. These networks avoid traditional compositional depth altogether, and instead simply multiply together (linear functions of) sinusoidal or Gabor wavelet functions applied to the input. This representation has the notable advantage that the entire function can be viewed as a linear function approximator over an exponential number of Fourier or Gabor basis functions, respectively. Despite this simplicity, we show that multiplicative filter networks largely match or outperform recent approaches that use Fourier features with ReLU networks or sinusoidal activation networks, on the domains highlighted in those past works.
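The multiplicative structure described above can be sketched concretely: a first sinusoidal filter of the input is repeatedly multiplied, elementwise, by further sinusoidal filters after a linear layer. The following is a minimal NumPy sketch under assumed parameter shapes and initialization scales (the variable names, layer sizes, and frequency scale are illustrative choices, not the paper's exact settings, and a real network would train these parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_filter(x, omega, phi):
    # Sinusoidal filter g(x) = sin(x @ omega + phi), applied elementwise.
    return np.sin(x @ omega + phi)

d_in, d_hidden, d_out, n_layers = 2, 64, 3, 3

# Random filter frequencies/phases and linear-layer weights (untrained sketch).
omegas = [rng.normal(scale=10.0, size=(d_in, d_hidden)) for _ in range(n_layers)]
phis   = [rng.uniform(-np.pi, np.pi, size=d_hidden) for _ in range(n_layers)]
Ws     = [rng.normal(scale=d_hidden ** -0.5, size=(d_hidden, d_hidden))
          for _ in range(n_layers - 1)]
bs     = [np.zeros(d_hidden) for _ in range(n_layers - 1)]
W_out  = rng.normal(scale=d_hidden ** -0.5, size=(d_hidden, d_out))
b_out  = np.zeros(d_out)

def mfn(x):
    # z_1 = g(x); z_{i+1} = (W_i z_i + b_i) * g_{i+1}(x); output = W_k z_k + b_k.
    # Each step multiplies a linear function of the hidden state by a fresh
    # sinusoidal filter of the raw input, never composing nonlinearities.
    z = fourier_filter(x, omegas[0], phis[0])
    for W, b, omega, phi in zip(Ws, bs, omegas[1:], phis[1:]):
        z = (z @ W + b) * fourier_filter(x, omega, phi)
    return z @ W_out + b_out

coords = rng.uniform(-1, 1, size=(5, d_in))  # e.g. (x, y) pixel coordinates
print(mfn(coords).shape)  # one RGB-sized output per coordinate
```

Because products of sinusoids expand into sums of sinusoids, unrolling this recurrence expresses the whole network as a linear combination of exponentially many sinusoidal basis functions of the input, which is the linearity property the abstract refers to.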

1. INTRODUCTION

Neural networks are most commonly used to approximate functions over high-dimensional input spaces, such as functions that operate on images or long text sequences. However, there has been growing recent interest in neural networks used to approximate low-dimensional-but-complex functions: for example, one can represent a continuous image as a function f : R^2 → R^3, where the input specifies the (x, y) coordinates of a location in the image and the output specifies the RGB value of the pixel at that location. Two recent papers in particular have argued that specific architectural changes are required to make (fully-connected) deep networks suitable for this task: Sitzmann et al. (2020) employ sinusoidal activation functions within a multi-layer network (the SIREN architecture), and Tancik et al. (2020) feed random Fourier features of the input to a traditional ReLU-based network. Both papers show that the resulting networks can approximate these low-dimensional functions much better than simple feedforward ReLU networks, and achieve striking results in representing fairly complex functions (e.g. 3D signed distance fields or neural radiance fields) with a high degree of fidelity. However, the precise benefit of sinusoidal bases or a first layer of Fourier features seems difficult to characterize, and it remains unclear why such representations work well for these tasks.
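To make the second baseline concrete, a random Fourier feature mapping of the style proposed by Tancik et al. (2020) lifts a low-dimensional coordinate into a high-dimensional sinusoidal embedding before any ReLU layers are applied. A minimal NumPy sketch, where the frequency scale and embedding width are hypothetical choices (in practice these are tuned per task):

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_features(x, B):
    # Map coordinates x to [cos(2*pi*x@B.T), sin(2*pi*x@B.T)],
    # a random sinusoidal embedding of the input.
    proj = 2.0 * np.pi * (x @ B.T)
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

# B holds random frequencies; scale=10.0 is an illustrative choice.
B = rng.normal(scale=10.0, size=(128, 2))

coords = rng.uniform(0, 1, size=(4, 2))  # (x, y) coordinates in [0, 1]^2
feats = fourier_features(coords, B)
print(feats.shape)  # embedding then fed to a standard ReLU MLP
```

The 256-dimensional embedding, rather than the raw 2-dimensional coordinate, is what the subsequent ReLU network consumes; this is the "first layer of Fourier features" whose benefit the paper sets out to characterize.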

