LEARNING TO ACT THROUGH ACTIVATION FUNCTION OPTIMIZATION IN RANDOM NETWORKS

Abstract

Biological neural networks are characterised by a high degree of neural diversity, a trait that artificial neural networks (ANNs) generally lack. Additionally, learning in ANNs is typically synonymous with modifying only the strengths of connection weights. However, there is much evidence from neuroscience that different classes of neurons each play a crucial role in the information processing done by the network. In nature, each neuron is a dynamical system that is a powerful information processor in its own right. In this paper we ask: how well can ANNs learn to perform reinforcement learning tasks through the optimization of neural activation functions alone, without any weight optimization? We demonstrate the viability of the method and show that the neural parameters are expressive enough to allow learning three different continuous control tasks without weight optimization. These results open up new possibilities for synergies between synaptic and neural optimization in ANNs in the future. Code is available from [anonymised].

1. INTRODUCTION

Artificial neural networks (ANNs) have been shown to be able to learn a wide variety of tasks (Schmidhuber, 2015). With inspiration from their biological counterparts (Hassabis et al., 2017), ANNs have dramatically pushed the boundaries of what is achievable for artificial intelligence technologies. ANNs are trained by tuning a large number of parameters, each of which provides a small contribution to the final output of the network. Likewise, most, but not all (Titley et al., 2017), learning and behavioral changes are manifested in the biological brain as long- or short-term potentiation or depression of synapses between neurons (Stiles, 2000). Neurons of the human brain are characterised by a high degree of diversity (Lillien, 1997; Soltesz et al., 2006), and different classes of neurons respond differently to incoming signals (Izhikevich, 2003). A single biological neuron is a sophisticated processor in its own right (Izhikevich, 2007; Poirazi et al., 2003), with information processing occurring at several stages in its dendrites (Beaulieu-Laroche et al., 2018; Magee, 2000), cell body, and axon terminals (Kamiya & Debanne, 2020; Rama et al., 2018). Neurons of various classes interconnect in intricate circuits (Breedlove & Watson, 2013; Kandel et al., 2000). This suggests that at least part of the explanation behind the impressive ability of biological networks to learn and retain knowledge must be found in the interplay between the abundance of different neuron types (Nusser, 2018). While the diversity of biological neurons is well documented, in ANNs it is common for a single activation function to be shared by all hidden neurons.
Intrigued by the interesting properties of randomly-initialised networks in both machine learning (Gaier & Ha, 2019; Najarro & Risi, 2020; Ulyanov et al., 2018) and neuroscience (Lindsay et al., 2017), we are interested in the computational expressivity of optimizing only parameterized neural activation functions, without any weight optimization. As described below, our approach allows every neuron in our ANNs to be a unique dynamical system. We apply our method to three diverse continuous control tasks: the simpler CartPoleSwingUp task (Gaier & Ha, 2019), the locomotion of a bipedal robot (Brockman et al., 2016), and a vision-based car racing task with procedurally generated tracks (Brockman et al., 2016). The results show that the method performs well on all three tasks, outperforming weight-optimized networks with a similar number of adjustable parameters in two out of three tasks. Surprisingly, optimized activation functions in random networks even outperform a weight-optimized network with many more adjustable parameters in the more challenging CarRacing-v0 environment. While previous work exists on optimizing activation function parameters (Agostinelli et al., 2014; Sipper, 2021; Bingham & Miikkulainen, 2022), to the best of our knowledge, we show here for the first time that optimizing expressive activation functions alone allows solving challenging RL tasks. We hope that our results inspire further research into approaches that do not see ANN optimization solely as the optimization of weight parameters, and that challenge some of our assumptions about what it means for such systems to learn.

Figure 1: Illustration of the proposed neural activation function. The parameters p_i are optimized in order to achieve an expressive function. These parameters are used to integrate the input with a neural state and a bias term through a vector-matrix multiplication. As input, the neuron takes a value propagated from the previous layer, its neural state, and a constant value of one as a bias term. It outputs a value to be propagated to the next layer, as well as its own new state. For further details see Section 3.
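The neuron described above can be sketched in a few lines of Python. This is a hypothetical illustration: the 3x2 layout of the parameter matrix and the tanh squashing are assumptions made for concreteness, not the exact parameterization of the paper (see Section 3 for that).

```python
import math

def neuron_step(p, x, state):
    """One forward step of a parameterized, stateful neuron.

    p     : 3x2 nested list of per-neuron parameters (the optimized p_i);
            rows correspond to (input, state, bias), columns to
            (output, new_state).
    x     : value propagated from the previous layer.
    state : the neuron's current internal state.

    The vector (input, state, bias=1) is integrated through the
    parameter matrix via a vector-matrix multiplication; tanh is an
    assumed squashing nonlinearity.
    """
    inputs = [x, state, 1.0]
    out, new_state = (
        math.tanh(sum(v * p[r][c] for r, v in enumerate(inputs)))
        for c in (0, 1)
    )
    return out, new_state
```

Each hidden neuron would carry its own parameter matrix; optimizing all such matrices (for example with an evolution strategy) while keeping the random connection weights fixed is the setting studied in this paper.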

2. RELATED WORK

Neurocentric Optimization. Biases in neural networks are an example of neurocentric parameters. When an ANN is optimized to solve a task, the values of the network's weights and biases are gradually tuned until a functional network has been achieved. The network most commonly has one bias value per neuron, and the weight parameters thus greatly outnumber these neurocentric bias parameters. It is well known that the function of biases is to translate the activations in the network (Benítez et al., 1997) and to ease the optimization of the network. Another well-known example of neurocentric parameters is found in the PReLU activation function (He et al., 2015), where a parameter is learned to determine the slope of the function for negative inputs. Introducing this per-neuron customization of the activation function was shown to improve the performance of networks at little extra computational cost. Neurocentric parameter optimization can also be found within the field of plastic neural networks. In one setting of their experiments, Urzelai & Floreano (2001) optimized plasticity rules for each neuron (they referred to this as 'node encoding'), such that each incoming synapse to a node was adapted by a common plasticity rule. The idea of neurocentric parameters is therefore far from new. However, in contrast to earlier work, in this paper we explore the potential of solely optimizing the activation functions of a randomly initialized network without ever adapting the weights.

Activation Functions in Neuroevolution. Not all ANNs use a single activation function for all hidden neurons. Some versions of NeuroEvolution of Augmenting Topologies (NEAT) (Stanley & Miikkulainen, 2002; Papavasileiou et al., 2021) allow for a different activation function on each neuron. The NEAT algorithm searches through networks of increasing complexity over the course of evolution.
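The PReLU activation mentioned above is a compact example of a neurocentric parameter: each neuron learns one extra scalar, the slope applied to negative inputs. A minimal sketch of the function itself (not of the training procedure in He et al. (2015)):

```python
def prelu(x, a):
    """Parametric ReLU: identity for non-negative inputs; negative
    inputs are scaled by the per-neuron learned slope a.
    With a = 0 this reduces to ReLU; with a small fixed a, to leaky ReLU."""
    return x if x >= 0.0 else a * x
```

During training, a is updated alongside the weights, letting each neuron shape its own response to negative pre-activations at the cost of a single extra parameter.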
Starting from a simple network structure, each new network has a chance of adding a new neuron. When a new neuron is added, it can be allocated a random activation function from a number of predetermined functions. In newer versions of NEAT, mutations allow a neuron's activation function to be changed even after the neuron was initially added (Hagg et al., 2017). This resulted

