NEURAL NETWORK SURGERY: COMBINING TRAINING WITH TOPOLOGY OPTIMIZATION

Abstract

With ever-increasing computational capacities, neural networks become more and more proficient at solving complex tasks. However, picking a sufficiently good network topology usually relies on expert human knowledge. Neural architecture search aims to reduce the extent of expertise needed. Modern architecture search techniques often rely on immense computational power or apply trained meta-controllers for decision making. We develop a framework for a genetic algorithm that is both computationally cheap and makes decisions based on mathematical criteria rather than trained parameters. It is a hybrid approach that fuses training and topology optimization into one process. Structural modifications include adding or removing layers of neurons, with some re-training applied to compensate for the incurred change in input-output behaviour. Our ansatz is tested on both the SVHN and (augmented) CIFAR-10 datasets with limited computational overhead compared to training only the baseline. The algorithm can achieve a significant increase in accuracy compared to a fully trained baseline, rescue insufficient topologies that in their current state can only learn to a limited extent, and dynamically reduce network size without loss of achieved accuracy.

1. INTRODUCTION

A common problem for any given machine learning task to be addressed with artificial neural networks (ANNs) is how to choose a sufficiently good network topology. Picking one that is too small may not yield acceptable prediction accuracy. To improve results, one can keep adding structural elements to the network until the desired accuracy is reached. At the same time, however, too large a network may cause an explosion in computational cost for both training and evaluation. Where the sweet spot in between lies is unclear and heavily dependent on the given task. A priori optimization is not easily possible, since reliable estimates of network behaviour already require training results, and no generalization exists for which topology will fit which problem. Researchers have applied a number of search strategies such as random search (Li & Talwalkar, 2019), Bayesian optimization (Kandasamy et al., 2018), reinforcement learning (Zoph & Le, 2017), and gradient-based methods (Dong & Yang, 2019). Another technique, applied since at least Miller et al. (1989), is so-called (neuro-)evolutionary algorithms. These algorithms serve to evolve the network architecture, often also training network weights at the same time (Elsken et al., 2019). In this paper we propose a novel training regime incorporating a genetic algorithm that reduces computational cost compared to state-of-the-art approaches of this kind (Dong & Yang, 2019; Li & Talwalkar, 2019). We achieve this by re-using network weights for competing modification candidates instead of retraining each net from scratch, branching off modification candidates during training, and letting them compete against each other until a new main branch is selected. This fuses the evolutionary optimization paradigm with ANN training into an integrated framework that folds both processes into a single training/topology-optimization hybrid. As such, evolutionary steps are not carried out by a meta-controller or other black-box-like implementations. We make use of mathematical tools such as singular value decomposition (SVD) and the Bayesian information criterion (BIC) (Schwarz, 1978) for network weight analysis, decision making, and structural modifications.
Network modifications are performed by adapting existing weights so as to incur minimal changes to input-output behaviour. Our framework for combined ANN training and neural architecture search consists of three main components: a module that can perform a number of minimally invasive network operations ("surgeries"), a module that analyses network weights and gives recommendations on which modifications are most likely to increase (validation) accuracy, and finally a module that serves as a genetic algorithm (the "Surgeon"), containing the former two while gradually evolving any given starting network. With the Surgeon, we are able to evolve and improve models for the SVHN (Netzer et al., 2011) and CIFAR-10 (Krizhevsky, 2009) datasets for varying starting topologies. We achieve particularly good results on starting topologies that would a posteriori have proven to be suboptimal. A great benefit of our approach is that it adds topology optimization to the ML training at very limited additional computational cost. Convergence is reached for all test cases within a few CPU hours. This paper contributes a computationally cheap ansatz for a genetic neural architecture search algorithm that makes evolutionary decisions based on mathematical analysis.
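To illustrate the kind of minimally invasive, SVD-based weight adaptation described above, the following is a minimal numpy sketch (not the paper's actual implementation): a dense layer is factored into two thinner layers via a truncated SVD, which changes the topology while exactly preserving input-output behaviour at full rank (and approximating it when the rank is truncated).

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))   # weights of a dense layer: y = W @ x
x = rng.normal(size=16)

# Truncated SVD: W ≈ (U_k * s_k) @ Vt_k splits the layer into two
# thinner layers, modifying topology with minimal behaviour change.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 8                          # keep all singular values -> exact
W1 = Vt[:k, :]                 # new first layer: (k, 16)
W2 = U[:, :k] * s[:k]          # new second layer: (8, k)

y_old = W @ x
y_new = W2 @ (W1 @ x)
assert np.allclose(y_old, y_new)
```

Choosing `k` smaller than `min(8, 16)` turns the same factorization into a pruning step, with the discarded singular values bounding the incurred change in behaviour.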

2. RELATED WORK

Neural architecture search (NAS) has been an increasingly popular research topic for many years (Elsken et al., 2019), starting as early as Miller et al. (1989), who presented one of the earliest neuroevolutionary algorithms to search for suitable network topologies. Recent approaches by Dong & Yang (2019); Li & Talwalkar (2019), and Zoph & Le (2017) reach competitive performance on benchmark datasets such as CIFAR-10. However, this often comes at the cost of vast computational resources, with Zoph & Le (2017) making use of up to 800 GPUs for several weeks. Cai et al. (2018) attempt to reduce computational costs by re-using network weights, as well as training and applying a reinforcement meta-controller for structural decisions. They make use of a number of function-preserving transformations (net2net) introduced by Chen et al. (2016), and extend them to also allow non-sequential network structures, such as DenseNet (Huang et al., 2017). DiMattina & Zhang (2010) introduce and rigorously prove conditions under which gradual changes to the parametrization of a neural network are possible while keeping the input-output behaviour constant. There are also a number of neural architecture search strategies that do not depend on manual network modifications. Dong & Yang (2019) represent the search space as a directed acyclic graph and propose an efficient search algorithm. İrsoy & Alpaydın (2020) learn the network structure via so-called "budding perceptrons", in which an extra parameter is learned for each layer that indicates whether any given node needs to branch out again or be removed altogether. Their method focuses on growing the network to the required size from a minimal starting topology. Frankle & Carbin (2019) present a method to identify particularly good network initializations that can train sparse networks to competitive accuracy. Another approach in NAS is to prune down from a larger starting topology (Blalock et al., 2020). Popular pruning techniques include applying SVD to existing network weights (Girshick, 2015; Denton et al., 2014; Xue et al., 2013).
The novelty of our research lies in combining existing tools such as net2net (Chen et al., 2016) and SVD with a genetic algorithm that modifies the given network in a decision-based process instead of utilizing a black-box-like decision module. To the best of our knowledge, no such method has yet been proposed.
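The function-preserving net2net transformations referenced above can be illustrated with a small numpy sketch of Net2WiderNet-style widening (Chen et al., 2016): a hidden unit is duplicated and the outgoing weights of both copies are halved, so the widened network computes exactly the same function.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3))   # input -> hidden (4 units)
W2 = rng.normal(size=(2, 4))   # hidden -> output
x = rng.normal(size=3)
relu = lambda z: np.maximum(z, 0)

# Net2WiderNet-style widening: duplicate hidden unit 0 and halve the
# outgoing weights of both copies, leaving the function unchanged.
W1_wide = np.vstack([W1, W1[0:1, :]])   # hidden now has 5 units
W2_wide = np.hstack([W2, W2[:, 0:1]])   # copy the outgoing column
W2_wide[:, 0] *= 0.5
W2_wide[:, 4] *= 0.5

y_old = W2 @ relu(W1 @ x)
y_new = W2_wide @ relu(W1_wide @ x)
assert np.allclose(y_old, y_new)
```

Because the transformation preserves input-output behaviour, subsequent training starts from the parent's learned function rather than from scratch, which is the weight re-use idea this paper builds on.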

3. METHODS

This work introduces and utilizes three main modules:
• modification module: performs network modifications ("surgeries") so as to incur minimal changes to input-output behaviour.
• recommendation module: analyzes network weights and gives recommendations on which operations are most likely to improve network accuracy.
• "the Surgeon": a genetic algorithm that links the above two modules and gradually evolves a given starting network.
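How the three modules interact can be sketched as a generic genetic loop. This is a hypothetical skeleton, not the paper's implementation: the function names, the toy "network" (a dict with a depth field), and the stand-in fitness score are all illustrative assumptions.

```python
import random

random.seed(0)

def recommend(net):
    """Recommendation module: propose candidate surgeries (illustrative)."""
    return ["add_layer", "remove_layer", "identity"]

def operate(net, surgery):
    """Modification module: apply a surgery with minimal behaviour change."""
    child = dict(net)
    child["depth"] += {"add_layer": 1, "remove_layer": -1, "identity": 0}[surgery]
    return child

def retrain_and_score(net):
    """Stand-in for short re-training + validation accuracy.
    Toy fitness peaks at depth 5; noise is smaller than the score gap."""
    return -abs(net["depth"] - 5) + random.random() * 0.1

# The Surgeon: branch off candidates, let them compete, keep the winner
# as the new main branch.
net = {"depth": 2}
for generation in range(10):
    candidates = [operate(net, s) for s in recommend(net)]
    net = max(candidates, key=retrain_and_score)

assert net["depth"] == 5
```

In the actual framework the candidates share the parent's weights and receive a short burst of re-training before selection, so the extra cost per generation stays far below retraining each candidate from scratch.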


