FUNKNN: NEURAL INTERPOLATION FOR FUNCTIONAL GENERATION

Abstract

Can we build continuous generative models that generalize across scales, can be evaluated at any coordinate, admit calculation of exact derivatives, and are conceptually simple? Existing MLP-based architectures generate worse samples than grid-based generators with favorable convolutional inductive biases. Models that focus on generating images at different scales do better, but employ complex architectures not designed for continuous evaluation of images and derivatives. We take a signal-processing perspective and treat continuous image generation as interpolation from samples. Indeed, correctly sampled discrete images contain all information about the low spatial frequencies. The question is then how to extrapolate the spectrum in a data-driven way while meeting the above design criteria. Our answer is FunkNN, a new convolutional network which learns how to reconstruct continuous images at arbitrary coordinates and can be applied to any image dataset. Combined with a discrete generative model, it becomes a functional generator which can act as a prior in continuous ill-posed inverse problems. We show that FunkNN generates high-quality continuous images and exhibits strong out-of-distribution performance thanks to its patch-based design. We further showcase its performance in several stylized inverse problems with exact spatial derivatives. Our implementation is available at https://github.com/swing-research/.

1. INTRODUCTION

Deep generative models are effective image priors in applications from ill-posed inverse problems (Shah & Hegde, 2018; Bora et al., 2017) to uncertainty quantification (Khorashadizadeh et al., 2022) and variational inference (Rezende & Mohamed, 2015). Since they approximate distributions of images sampled on discrete grids, they can only produce images at the resolution seen during training. But natural, medical, and scientific images are inherently continuous. Generating continuous images would enable a single trained model to drive downstream applications that operate at arbitrary resolutions. If this model could also produce exact spatial derivatives, it would open the door to generative regularization of many challenging inverse problems for partial differential equations (PDEs).

There has recently been considerable interest in learning grid-free image representations. Implicit neural representations (Tancik et al., 2020; Sitzmann et al., 2020; Martel et al., 2021; Saragadam et al., 2022) have been used for mesh-free image representations in various inverse problems (Chen et al., 2021; Park et al., 2019; Mescheder et al., 2019; Chen & Zhang, 2019; Vlašić et al., 2022; Sitzmann et al., 2020). An implicit network f_θ(x), often a multi-layer perceptron (MLP), directly approximates the image intensity at a spatial coordinate x ∈ R^D. While f_θ(x) represents only a single image, different works incorporate a latent code z in f_θ(x, z) to model distributions of continuous images. These approaches perform well on simple datasets, but their performance on complex data such as human faces is far inferior to that of conventional grid-based generative models built on convolutional neural networks (CNNs) (Chen & Zhang, 2019; Dupont et al., 2021; Park et al., 2019); this holds even when they are evaluated at the resolution they were trained on. One reason for their limited performance is that these implicit models use MLPs, which are not well suited to modelling image data.
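To make the implicit-network baseline concrete, below is a minimal sketch of a representation f_θ(x), here a tiny random-Fourier-feature MLP in the spirit of Tancik et al. (2020). The architecture, layer sizes, and (random) weights are purely illustrative, not the construction of any of the cited works; in practice the weights would be fit to an image by gradient descent on a reconstruction loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random Fourier feature encoding gamma(x) (Tancik et al., 2020):
# lifts a 2-D coordinate to 2*m features before the MLP.
m, sigma = 16, 10.0
B = rng.normal(scale=sigma, size=(m, 2))  # fixed projection matrix

def gamma(x):
    """x: (N, 2) coordinates in [0, 1]^2 -> (N, 2m) Fourier features."""
    proj = 2.0 * np.pi * x @ B.T
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=1)

# A tiny two-layer MLP f_theta: weights are random here for illustration.
W1 = rng.normal(scale=0.1, size=(2 * m, 64)); b1 = np.zeros(64)
W2 = rng.normal(scale=0.1, size=(64, 1));     b2 = np.zeros(1)

def f_theta(x):
    h = np.maximum(gamma(x) @ W1 + b1, 0.0)   # ReLU hidden layer
    return h @ W2 + b2                         # scalar image intensity

# The representation is mesh-free: any coordinate can be queried.
coords = rng.uniform(size=(5, 2))
print(f_theta(coords).shape)  # (5, 1)
```

Note that x enters only through the global feature map gamma; nothing in this design exploits the local, translation-equivariant structure of images, which is the weakness FunkNN targets.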
In this paper, we address the above challenges with a new mesh-free convolutional image generator that can faithfully learn the distribution of continuous image functions. The key component of the proposed framework is our novel patch-based continuous super-resolution network, FunkNN, which takes a discrete image at any resolution and super-resolves it to generate image intensities at arbitrary spatial coordinates. As shown in Figure 1, our approach combines a traditional discrete image generator with FunkNN, resulting in a deep generative model that can produce images at arbitrary coordinates or resolutions. FunkNN can be combined with any off-the-shelf pre-trained image generator (or trained jointly with one), including the highly successful GAN architectures (Karras et al., 2019; 2020), normalizing flows (Kingma & Dhariwal, 2018; Kothari et al., 2021), and score-matching generative models (Song & Ermon, 2019; Song et al., 2020). This naturally enables us to learn complex image distributions. Unlike prior works (Chen & Zhang, 2019; Chen et al., 2021; Dupont et al., 2021; Park et al., 2019), FunkNN neither requires a large encoder to generate latent codes nor uses any MLPs. This is possible thanks to FunkNN's unique way of integrating the coordinate x with image features. The key idea is that resolving the image intensity at a coordinate x should depend only on its neighborhood. Therefore, instead of generating a code for the entire image and then combining it with x using an MLP, FunkNN simply crops a patch around x in the low-resolution image obtained from a traditional generator. This patch is then fed to a small convolutional neural network that generates the image intensity at x. The cropping is performed differentiably with a spatial transformer network (Jaderberg et al., 2015). We experimentally show that FunkNN reliably learns to resolve images to resolutions much higher than those seen during training.
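The patch-cropping idea can be sketched as follows. This is a hypothetical minimal NumPy version of a sub-pixel crop via bilinear sampling (the paper performs the same operation differentiably with a spatial transformer network); the function names, the patch size k, and the coordinate convention in [0, 1]^2 are our own illustrative choices.

```python
import numpy as np

def bilinear(img, y, x):
    """Bilinearly sample img (H, W) at a continuous location (y, x)."""
    H, W = img.shape
    y = np.clip(y, 0, H - 1); x = np.clip(x, 0, W - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * img[y0, x0] + (1 - wy) * wx * img[y0, x1]
            + wy * (1 - wx) * img[y1, x0] + wy * wx * img[y1, x1])

def crop_patch(img, coord, k=9):
    """Crop a k x k patch centred at coord in [0, 1]^2, at sub-pixel
    accuracy -- the role played by the spatial transformer in FunkNN."""
    H, W = img.shape
    cy, cx = coord[0] * (H - 1), coord[1] * (W - 1)
    offs = np.arange(k) - (k - 1) / 2.0
    return np.array([[bilinear(img, cy + dy, cx + dx) for dx in offs]
                     for dy in offs])

rng = np.random.default_rng(0)
low_res = rng.uniform(size=(32, 32))            # image from the generator
patch = crop_patch(low_res, np.array([0.4, 0.7]))
print(patch.shape)  # (9, 9): input to the small CNN that predicts u(x)
```

Because the query coordinate x only determines which local window is read, the downstream CNN sees a fixed-size patch regardless of the image's global resolution, which is what makes the approach resolution-agnostic.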
In fact, it performs comparably to state-of-the-art continuous super-resolution networks (Chen et al., 2021) despite having only a fraction of their trainable parameters. Unlike traditional learning-based methods, our approach can also super-resolve images drawn from distributions different from those seen during training, a benefit of patch-based processing, which reduces the risk of overfitting to global image features. In addition, we show that our overall generative framework can produce high-quality image samples at any resolution. With a continuous, differentiable map between spatial coordinates and image intensities, we gain access to spatial image derivatives at arbitrary coordinates and can use them to solve inverse problems.
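To illustrate what such derivative access buys, here is a toy example of ours (not from the paper): a closed-form map u(x, y) stands in for the learned coordinate-to-intensity network, its analytic gradient stands in for what automatic differentiation would return exactly, and we compare against the central finite differences one would be forced to use with a purely grid-based generator.

```python
import numpy as np

# Toy continuous "image" u(x, y); FunkNN learns such a map, and automatic
# differentiation through the network yields exact spatial derivatives.
def u(x, y):
    return np.sin(3.0 * x) * np.cos(2.0 * y)

def grad_u(x, y):
    """Analytic gradient, standing in for autodiff through the network."""
    return np.array([3.0 * np.cos(3.0 * x) * np.cos(2.0 * y),
                     -2.0 * np.sin(3.0 * x) * np.sin(2.0 * y)])

def fd_grad(x, y, h=1e-5):
    """Central differences: step-size-dependent, O(h^2) accurate only."""
    return np.array([(u(x + h, y) - u(x - h, y)) / (2 * h),
                     (u(x, y + h) - u(x, y - h)) / (2 * h)])

x0, y0 = 0.3, 0.8
print(np.max(np.abs(grad_u(x0, y0) - fd_grad(x0, y0))))  # O(h^2) gap
```

For PDE-constrained inverse problems, this difference matters: exact derivatives avoid tuning a step size h and the associated truncation error.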

2. REPRESENTATION

Let u ∈ L^2(R^D)^c denote a continuous signal of interest with c ≥ 1 channels, and let u be its discretized version supported on a fixed grid with n × n elements, n ∈ N^*. For simplicity we consider the case D = 2; our discussion naturally extends to higher dimensions.
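As an illustrative instance of this setup (our toy example; the Gaussian bump and the pixel-centre grid convention are not from the paper), a continuous single-channel signal with D = 2 and c = 1 can be discretized on an n × n grid:

```python
import numpy as np

# Continuous single-channel signal u: R^2 -> R (c = 1, D = 2),
# here a Gaussian bump centred in the unit square.
def u_cont(x, y):
    return np.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2) / 0.1)

# Discretization on a fixed n x n grid over [0, 1]^2 (pixel centres).
n = 64
grid = (np.arange(n) + 0.5) / n
X, Y = np.meshgrid(grid, grid, indexing="xy")
u_disc = u_cont(X, Y)                          # the grid-sampled image
print(u_disc.shape)  # (64, 64)
```

The generator of Figure 1 produces arrays like u_disc; FunkNN's job is to recover values of the underlying continuous u at coordinates off this grid.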



Figure 1: The proposed architecture. The generative model (orange) produces a fixed-resolution image, which FunkNN differentiably queries to produce the image intensity at any location (blue).


