IMAGES AS WEIGHT MATRICES: SEQUENTIAL IMAGE GENERATION THROUGH SYNAPTIC LEARNING RULES

Abstract

Work on fast weight programmers has demonstrated the effectiveness of key/value outer product-based learning rules for sequentially generating a weight matrix (WM) of a neural net (NN) by another NN or itself. However, the weight generation steps are typically not visually interpretable by humans, because the contents stored in the WM of an NN are not. Here we apply the same principle to generate natural images. The resulting Fast Weight Painters (FPAs) learn to execute sequences of delta learning rules to sequentially generate images as sums of outer products of self-invented keys and values, one rank at a time, as if each image were a WM of an NN. We train our FPAs in the generative adversarial networks framework and evaluate them on various image datasets. We show how these generic learning rules can generate images of respectable visual quality without any explicit inductive bias for images. While the performance largely lags behind that of specialised state-of-the-art image generators, our approach allows for visualising how synaptic learning rules iteratively produce complex connection patterns, yielding human-interpretable, meaningful images. Finally, we also show that an additional convolutional U-Net (now popular in diffusion models) at the output of an FPA can learn one-step "denoising" of FPA-generated images to enhance their quality. Our code is public.

1. INTRODUCTION

A Fast Weight Programmer (Schmidhuber, 1991a; 1992) is a neural network (NN) that can learn to continually generate and rapidly modify the weight matrix (i.e., the program) of another NN in response to a stream of observations to solve the task at hand (reviewed in Sec. 2.1). At the heart of the weight generation process lies an expressive yet scalable parameterisation of update rules (or learning rules, or programming instructions) that iteratively modify the weight matrix to obtain arbitrary weight patterns/programs suitable for solving the given task. Several recent works (Schlag et al., 2021a; Irie et al., 2021; 2022c; b) have demonstrated outer products with the delta rule (Widrow & Hoff, 1960; Schlag et al., 2021b) to be an effective mechanism for weight generation. In particular, this has been shown to outperform the purely additive Hebbian update rule (Hebb, 1949) used in Linear Transformers (Katharopoulos et al., 2020; Choromanski et al., 2021) in various settings, including language modelling (Schlag et al., 2021a), time series prediction (Irie et al., 2022b), and reinforcement learning for playing video games (Irie et al., 2021). However, despite its intuitive equations (treating the fast weight matrix as a key/value associative memory), the effective "actions" of these learning rules on the "contents" stored in the weight matrix remain opaque, because in general the values stored in a weight matrix are not easily interpretable by humans. Now what if we let a fast weight programmer generate a "weight matrix" that corresponds to some human-interpretable data? While outer product-based pattern generation may have a good inductive bias for generating a weight matrix of a linear layer[1], it can also be seen as a generic mechanism for iteratively generating any high dimensional data. So let us apply the same principle to generate and
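To make the delta-rule mechanism concrete, the sketch below iterates the standard update W_t = W_{t-1} + beta_t (v_t - W_{t-1} k_t) k_t^T, adding one rank-1 pattern per step, as if the matrix being "painted" were an image. The random keys/values here are placeholders for the self-invented ones an FPA would generate; this is a minimal illustration, not the paper's full model.

```python
import numpy as np

rng = np.random.default_rng(0)
H, D, T = 8, 8, 4            # matrix ("image") height/width, number of update steps

# The "image", treated as a fast weight matrix, starts as a blank canvas.
W = np.zeros((H, D))

for t in range(T):
    # Self-invented key/value pair (random here; an FPA would generate these).
    k = rng.normal(size=D)
    k /= np.linalg.norm(k)   # unit-norm key
    v = rng.normal(size=H)   # value: a rank-1 "stroke" pattern
    beta = 1.0               # per-step learning rate (also FPA-generated in general)

    # Delta rule: correct what W currently maps k to, towards v.
    W = W + beta * np.outer(v - W @ k, k)

# With beta = 1 and a unit-norm key, the last update makes W map k exactly to v:
# W_new @ k = W @ k + (v - W @ k) * (k @ k) = v.
assert np.allclose(W @ k, v)
```

Note that with beta = 1 the delta rule overwrites the association for k, whereas the purely additive Hebbian rule (W += beta * outer(v, k)) can only accumulate, which is one intuition for why the delta rule is the more expressive programming instruction.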



https://github.com/IDSIA/fpainter

[1] In a linear layer, the weight matrix is multiplied with an input vector to produce an output vector. Consequently, assuming that the output vector is used to compute some scalar loss function, the gradient of the loss w.r.t. the weights is expressed as an outer product (between the input and the gradient of the loss w.r.t. the output).
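The footnote's claim can be checked numerically: for y = W x and a scalar loss L(y), the gradient dL/dW equals the outer product of dL/dy and x. A small sketch with a quadratic loss (my choice for illustration), verified against finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 5))   # weights of a linear layer
x = rng.normal(size=5)        # input vector
y = W @ x                     # output vector

# Scalar loss L = 0.5 * ||y||^2, so dL/dy = y.
dL_dy = y

# Analytic gradient w.r.t. W: outer product of the output-gradient and the input.
grad_outer = np.outer(dL_dy, x)

# Numerical check via central finite differences.
eps = 1e-6
grad_num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy(); Wp[i, j] += eps
        Wm = W.copy(); Wm[i, j] -= eps
        grad_num[i, j] = (0.5 * np.sum((Wp @ x) ** 2)
                          - 0.5 * np.sum((Wm @ x) ** 2)) / (2 * eps)

assert np.allclose(grad_outer, grad_num, atol=1e-5)
```

This rank-1 structure of the gradient is what makes outer-product update rules a natural parameterisation for generating linear-layer weights.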

