SLOT MACHINES: DISCOVERING WINNING COMBINATIONS OF RANDOM WEIGHTS IN NEURAL NETWORKS

Abstract

In contrast to traditional weight optimization in a continuous space, we demonstrate the existence of effective random networks whose weights are never updated. By selecting a weight from a fixed set of random values for each individual connection, our method uncovers combinations of random weights that match the performance of traditionally-trained networks of the same capacity. We refer to our networks as "slot machines," where each reel (connection) contains a fixed set of symbols (random values). Our backpropagation algorithm "spins" the reels to seek "winning" combinations, i.e., selections of random weight values that minimize the given loss. Quite surprisingly, we find that allocating just a few random values to each connection (e.g., 8 values per connection) yields highly competitive combinations despite being dramatically more constrained compared to traditionally learned weights. Moreover, finetuning these combinations often improves performance over the trained baselines. A randomly initialized VGG-19 with 8 values per connection contains a combination that achieves 90% test accuracy on CIFAR-10. Our method also achieves 98.1% test accuracy on MNIST for neural networks containing only random weights.

1. INTRODUCTION

Innovations in how deep networks are trained have played an important role in the remarkable success deep learning has produced in a variety of application areas, including image recognition (He et al., 2016), object detection (Ren et al., 2015; He et al., 2017), machine translation (Vaswani et al., 2017), and language modeling (Brown et al., 2020). Learning typically involves either optimizing a network from scratch (Krizhevsky et al., 2012), finetuning a pre-trained model (Yosinski et al., 2014), or jointly optimizing the architecture and weights (Zoph & Le, 2017). Against this predominant background, we pose the following question: can a network instantiated with only random weights achieve competitive results compared to the same model using optimized weights? For a given task, an untrained, randomly initialized network is unlikely to produce good performance. However, we demonstrate that given sufficient random weight options for each connection, there exist selections of these random weight values whose generalization performance is comparable to that of a traditionally-trained network with the same architecture. More importantly, we introduce a method that finds these high-performing randomly weighted configurations consistently and efficiently. Furthermore, we show empirically that a small number of random weight options per connection (e.g., 2-8 values) is sufficient to obtain accuracy comparable to that of the traditionally-trained network.

Instead of updating the weights, our algorithm simply selects for each connection a weight value from a fixed set of random options. We use the analogy of "slot machines" to describe how our method operates. Each reel in a slot machine has a fixed set of symbols. The reels are jointly spun in an attempt to find winning combinations. In our context, each connection has a fixed set of random weight values. Our algorithm "spins the reels" in order to find a winning combination of symbols, i.e., it selects a weight value for each connection so as to produce an instantiation of the network that yields strong performance. While in physical slot machines the spinning of the reels is governed by a fully random process, in our slot machines the selection of the weights is guided by a method that optimizes the given loss at each spinning iteration.
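The selection mechanism described above can be sketched in a few lines. The following is a minimal, hypothetical NumPy illustration, not the authors' implementation: each connection of a linear layer holds K fixed random weight options that are never updated, and a per-option score (the quantity that training would adjust) determines which option is used in the forward pass. All names and shapes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes and number of random options per connection.
in_dim, out_dim, K = 4, 3, 8

# Fixed random weight options: in a slot machine these are never updated.
options = rng.standard_normal((out_dim, in_dim, K))

# Per-option scores: in training, only these would receive gradient updates.
scores = rng.standard_normal((out_dim, in_dim, K))

def effective_weights(options, scores):
    """For each connection, select the option with the highest score."""
    idx = scores.argmax(axis=-1)  # (out_dim, in_dim) index of chosen symbol
    return np.take_along_axis(options, idx[..., None], axis=-1)[..., 0]

def forward(x):
    # The network's effective weights are a combination of random values.
    W = effective_weights(options, scores)  # (out_dim, in_dim)
    return W @ x

x = rng.standard_normal(in_dim)
y = forward(x)
print(y.shape)  # (3,)
```

Training would backpropagate the loss into `scores` (e.g., with a straight-through estimator past the hard `argmax`), so each "spin" re-selects the combination of fixed random values that currently minimizes the loss.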

