NERN: LEARNING NEURAL REPRESENTATIONS FOR NEURAL NETWORKS

Abstract

Neural representations have recently been shown to effectively reconstruct a wide range of signals, from 3D meshes and shapes to images and videos. We show that, when adapted correctly, neural representations can be used to directly represent the weights of a pre-trained convolutional neural network, resulting in a Neural Representation for Neural Networks (NeRN). Inspired by the coordinate inputs of previous neural representation methods, we assign a coordinate to each convolutional kernel in our network based on its position in the architecture, and optimize a predictor network to map coordinates to their corresponding weights. Similar to the spatial smoothness of visual scenes, we show that incorporating a smoothness constraint over the original network's weights guides NeRN toward a better reconstruction. In addition, since slight perturbations in pre-trained model weights can result in a considerable accuracy loss, we employ techniques from the field of knowledge distillation to stabilize the learning process. We demonstrate the effectiveness of NeRN in reconstructing widely used architectures on CIFAR-10, CIFAR-100, and ImageNet. Finally, we present two applications of NeRN, demonstrating the capabilities of the learned representations.

1. INTRODUCTION

In the last decade, neural networks have proven to be very effective at learning representations over a wide variety of domains. Recently, NeRF (Mildenhall et al., 2020) demonstrated that a relatively simple neural network can directly learn to represent a 3D scene. This is done using the general method for neural representations, where the task is modeled as a prediction problem from some coordinate system to an output that represents the scene. Once trained, the scene is encoded in the weights of the neural network, and thus novel views can be rendered for previously unobserved coordinates. NeRFs outperformed previous view synthesis methods, but more importantly, offered a new view on scene representation.

Following the success of NeRF, there have been various attempts to learn neural representations in other domains as well. SIREN (Sitzmann et al., 2020) shows that neural representations can successfully model images when adapted to handle high frequencies. NeRV (Chen et al., 2021) utilizes neural representations for video encoding, where the video is represented as a mapping from a timestamp to the pixel values of that specific frame.

In this paper, we explore the idea of learning neural representations for pre-trained neural networks. We consider representing a Convolutional Neural Network (CNN) using a separate predictor neural network, resulting in a neural representation for neural networks, or NeRN. We model this task as a problem of mapping each weight's coordinates to its corresponding values in the original network. Specifically, our coordinate system is defined as a (Layer, Filter, Channel) tuple, denoted by (l, f, c), where each coordinate corresponds to the weights of a k × k convolutional kernel. NeRN is trained to map each input coordinate back to the original kernel weights. One can then reconstruct the original network by querying NeRN over all possible coordinates.
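The reconstruction procedure described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the `predictor` function here is a deterministic random stand-in for the trained NeRN (an assumption for illustration), but the enumeration over all (layer, filter, channel) coordinates and the assembly of each k × k kernel into a layer's weight tensor mirror the described querying process.

```python
import numpy as np

K = 3  # spatial size of each convolutional kernel (k x k)

def predictor(coord):
    """Stand-in for the learned NeRN predictor: maps a (layer, filter,
    channel) coordinate to the k*k weights of that kernel. A real NeRN
    would run an MLP over a positional embedding of the coordinate."""
    rng = np.random.default_rng(abs(hash(coord)) % (2**32))
    return rng.standard_normal(K * K)

def reconstruct_layer(layer_idx, n_filters, n_channels):
    """Reconstruct one convolutional layer's weight tensor by querying
    the predictor over every (l, f, c) coordinate of that layer."""
    weights = np.empty((n_filters, n_channels, K, K))
    for f in range(n_filters):
        for c in range(n_channels):
            kernel = predictor((layer_idx, f, c))
            weights[f, c] = kernel.reshape(K, K)
    return weights

# Reconstruct a hypothetical layer with 16 filters over 8 input channels.
W = reconstruct_layer(layer_idx=0, n_filters=16, n_channels=8)
print(W.shape)  # (16, 8, 3, 3)
```

Because the predictor is a pure function of the coordinate, querying it twice yields identical weights, so the original network can be rebuilt on demand without storing the weights themselves.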
While a larger predictor network can trivially learn to overfit a smaller original network, we show that successfully creating a compact implicit representation is not trivial. To achieve this, we propose methods for introducing smoothness over the learned signal, i.e., the original network's weights,

