IMAGES AS WEIGHT MATRICES: SEQUENTIAL IMAGE GENERATION THROUGH SYNAPTIC LEARNING RULES

Abstract

Work on fast weight programmers has demonstrated the effectiveness of key/value outer product-based learning rules for sequentially generating the weight matrix (WM) of a neural net (NN) by another NN or by the NN itself. However, the weight generation steps are typically not visually interpretable by humans, because the contents stored in the WM of an NN are not. Here we apply the same principle to generate natural images. The resulting Fast Weight Painters (FPAs) learn to execute sequences of delta learning rules to sequentially generate images as sums of outer products of self-invented keys and values, one rank at a time, as if each image were the WM of an NN. We train our FPAs in the generative adversarial networks framework and evaluate them on various image datasets. We show how these generic learning rules can generate images of respectable visual quality without any explicit inductive bias for images. While the performance largely lags behind that of specialised state-of-the-art image generators, our approach allows for visualising how synaptic learning rules iteratively produce complex connection patterns that yield human-interpretable, meaningful images. Finally, we also show that an additional convolutional U-Net (now popular in diffusion models) at the output of an FPA can learn one-step "denoising" of FPA-generated images to further enhance their quality. Our code is public [1].

1. INTRODUCTION

A Fast Weight Programmer (Schmidhuber, 1991a; 1992) is a neural network (NN) that can learn to continually generate and rapidly modify the weight matrix (i.e., the program) of another NN, in response to a stream of observations, to solve the task at hand (reviewed in Sec. 2.1). At the heart of the weight generation process lies an expressive yet scalable parameterisation of update rules (or learning rules, or programming instructions) that iteratively modify the weight matrix to obtain arbitrary weight patterns/programs suitable for solving the given task. Several recent works (Schlag et al., 2021a; Irie et al., 2021; 2022c; b) have demonstrated that outer products with the delta rule (Widrow & Hoff, 1960; Schlag et al., 2021b) are an effective mechanism for weight generation. In particular, the delta rule has been shown to outperform the purely additive Hebbian update rule (Hebb, 1949) used in Linear Transformers (Katharopoulos et al., 2020; Choromanski et al., 2021) in various settings, including language modelling (Schlag et al., 2021a), time series prediction (Irie et al., 2022b), and reinforcement learning for playing video games (Irie et al., 2021). However, despite its intuitive equations, which treat the fast weight matrix as a key/value associative memory, the effective "actions" of these learning rules on the "contents" stored in the weight matrix remain opaque, because in general the values stored in a weight matrix are not easily interpretable by humans. Now what if we let a fast weight programmer generate a "weight matrix" that corresponds to some human-interpretable data? While outer product-based pattern generation may have a good inductive bias for generating the weight matrix of a linear layer [2], it can also be seen as a generic mechanism for iteratively generating any high-dimensional data. So let us apply the same principle to generate and incrementally refine natural images.
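The two update rules contrasted above can be sketched in a few lines. The following is a minimal NumPy illustration (the function and variable names are ours, not from the paper's code); with a unit-norm key and a learning rate of 1, the delta rule exactly overwrites the value stored under that key, whereas the Hebbian rule only adds to it.

```python
import numpy as np

def delta_rule_step(W, k, v, beta):
    """One delta-rule update: W <- W + beta * (v - W k) k^T.

    The correction term (v - W k) first reads out the value currently
    associated with key k, then moves it towards the target value v,
    scaled by the learning rate beta.
    """
    v_bar = W @ k  # value currently stored under key k
    return W + beta * np.outer(v - v_bar, k)

def hebbian_step(W, k, v, beta):
    """Purely additive outer-product (Hebbian) update, for comparison."""
    return W + beta * np.outer(v, k)
```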
We treat a colour image as three weight matrices representing the synaptic connection weights of a fictive NN, and generate them iteratively through sequences of delta learning rules whose key/value patterns and learning rates are produced by an actual NN that we train. The resulting Fast Weight Painters (FPAs) learn to sequentially generate images, as sums of outer products, one rank at a time, through sequential applications of delta learning rules. Intuitively, the delta rule allows a painter to look into the currently generated image in a computationally efficient way, and to apply a change to the image at each painting step. We empirically observe that the delta rules largely improve the quality of the generated images compared to the purely additive outer product rules. We train our FPAs in the framework of Generative Adversarial Networks (GAN; Goodfellow et al. (2014); Niemitalo (2010); Schmidhuber (1990); reviewed in Sec. 2.2). We evaluate our model on six standard image generation datasets (CelebA, LSUN-Church, Metfaces, AFHQ-Cat/Dog/Wild; all at a resolution of 64x64), and report both qualitative image quality and the commonly used Fréchet Inception Distance (FID) evaluation metric (Heusel et al., 2017). Performance is compared to that of the state-of-the-art StyleGAN2 (Karras et al., 2020b;a) and the speed-optimised "lightweight" GAN (LightGAN; Liu et al. (2021)). While the performance still largely lags behind that of StyleGAN2, we show that our generic models can generate images of respectable visual quality without any explicit inductive bias for image processing (e.g., no convolution is used in the generator). This confirms and illustrates that generic learning rules can effectively produce complex weight patterns that, in our case, yield natural images in various domains. Importantly, we can visualise each step of such weight generation in the human-interpretable image domain.
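The painting loop described above can be sketched as follows. This is our own simplified, single-channel illustration, not the paper's trained model: in the actual FPA, the keys, values, and learning rates are produced step by step by a trained slow network, whereas here we take them as given inputs just to show the shape of the computation. After T steps the image is a sum of T rank-one terms.

```python
import numpy as np

H = 64       # image height (rows, i.e., "output" dimension of the fictive layer)
W_DIM = 64   # image width (columns, i.e., "input" dimension)
T = 32       # number of painting steps = maximum rank of the final image

def paint(keys, values, betas):
    """Generate an H x W_DIM image by T sequential delta-rule updates.

    keys:   array of shape (T, W_DIM)
    values: array of shape (T, H)
    betas:  array of shape (T,), the per-step learning rates
    """
    img = np.zeros((H, W_DIM))
    for k, v, beta in zip(keys, values, betas):
        correction = v - img @ k           # delta rule: read the current image
        img = img + beta * np.outer(correction, k)  # apply a rank-one change
    return img
```

For a colour image, the same loop would simply be run for each of the three channel matrices.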
This is a unique feature of our work, since learning rules are typically not visually meaningful to humans in the standard weight generation scenario; see the example shown in Figure 1 (weight generation for few-shot image classification). Clearly, our goal is not to build the best possible image generator (for that, much better convolutional architectures exist). Instead, we use natural images to visually illustrate the behaviour of an NN that learns to execute sequences of learning rules. Nevertheless, it is also interesting to see how a convolutional NN can further improve the quality of FPA-generated images. For this purpose, we conduct an additional study in which we add, at the FPA's output, a convolutional U-Net (Ronneberger et al., 2015; Salimans et al., 2017), now popular as the standard architecture (Ho et al., 2020; Song et al., 2021; Dhariwal & Nichol, 2021) for denoising diffusion models (Sohl-Dickstein et al., 2015). The image-to-image transforming U-Net learns (in this case) one-step "denoising" of FPA-generated images and effectively improves their quality.



[1] https://github.com/IDSIA/fpainter
[2] In a linear layer, the weight matrix is multiplied with an input vector to produce an output vector. Consequently, assuming that the output vector is used to compute some scalar loss function, the gradient of the loss w.r.t. the weights is expressed as an outer product (between the input and the gradient of the loss w.r.t. the output).
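The claim in the second footnote is easy to verify numerically. The sketch below (our own example; the scalar loss L = ||y||^2 is an arbitrary choice) forms the gradient of the loss w.r.t. the weights as an outer product of the output gradient and the input, which can be checked entry-wise against a finite-difference estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))  # weights of a linear layer y = W x
x = rng.standard_normal(5)       # input vector

def loss(W):
    y = W @ x
    return float(y @ y)          # example scalar loss L = ||y||^2

y = W @ x
grad_y = 2.0 * y                 # dL/dy for L = ||y||^2
grad_W = np.outer(grad_y, x)     # footnote's claim: dL/dW = (dL/dy) x^T
```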



Figure 1: An illustration of the hardly human-interpretable standard weight generation process through sequences of delta rules in an FWP (DeltaNet) trained for 5-way 5-shot image classification on Mini-ImageNet (Vinyals et al., 2016; Ravi & Larochelle, 2017). The model is trained with the public code of Irie et al. (2022c) and achieves a test accuracy of 62.5%. The input to the model (shown at the bottom) is a sequence of images with their labels (except for the last one, which is to be predicted), processed from left to right, one image/label pair per step. The model has four layers with 16 heads each and a hidden layer size of 256. Each head generates a 16x16-dimensional fast weight matrix. Here we visualise the weight generation of two heads, head '16' in layer 1 and head '12' in layer 4, as examples. In each case, the top row shows the rank-one update term (the last term in Eq. 2), and the bottom row shows their cumulative sum, i.e., the fast weight matrix W_t of Eq. 2, generated at the corresponding step.

