NEURON ACTIVATION ANALYSIS IN MULTI-JOINT ROBOT REINFORCEMENT LEARNING

Abstract

Recent experiments indicate that pre-training end-to-end Reinforcement Learning neural networks on general tasks can speed up training for specific robotic applications. However, it remains open whether these networks form general feature extractors and a hierarchical organization that are reused, as is apparent e.g. in Convolutional Neural Networks. In this paper we analyze the intrinsic neuron activation of networks trained for target reaching with robot manipulators of increasing joint number in a vertical plane. We analyze the activity distribution of individual neurons in the network, introduce a pruning algorithm that reduces network size while maintaining performance, and, with these dense network representations, identify correlations of neuron activity patterns among networks trained for robot manipulators with different joint numbers. We show that the input and output network layers exhibit more distinct neuron activation than the inner layers. Our pruning algorithm reduces network size significantly and increases the distance between neuron activations while maintaining high performance in training and evaluation. Our results demonstrate that neuron activity can be mapped among networks trained for robots of different complexity: robots with a small difference in joint number show higher layer-wise projection accuracy, whereas more dissimilar robots mostly show projections to the first layer.

1. INTRODUCTION

Convolutional Neural Networks (CNN) are well known to demonstrate a strong general feature extraction capability in lower network layers. In these networks, feature kernels can not only be visualized; pre-trained general feature extractors can also be reused for efficient network learning. Recent examples demonstrate such reusability experimentally for Reinforcement Learning neural networks as well: networks are pre-trained on similar tasks and training is then continued on the goal application. Reusing (sub)networks that can be re-assembled for an application never seen before can reduce network training time drastically. A better understanding of uniform or inhomogeneous network structures also improves the evaluation of network performance and unveils opportunities for the interpretability of networks, which is crucial for the application of machine learning algorithms, e.g. in industrial scenarios. Finally, methodologies and metrics estimating intra- and inter-network correlations in artificial neural networks may also enhance the understanding of biological learning. Eickenberg et al. (2017) recently demonstrated that layers serving as feature extractors in CNNs can actually be found in the human visual cortex by correlating artificial networks with biological recordings. Successful experiments re-using end-to-end learned networks for similar tasks leave open whether such networks also self-organize into feature extractors or, in the dynamical domain, motion primitives. Here, we analyze neuron activation in networks in order to investigate activation distributions and mappings between different networks trained on similar robot reaching tasks. In this paper we consider a standard robot manipulator operating in a vertical plane with a variable number of revolute joints as the test setup for target-reaching end-to-end Reinforcement Learning (RL) experiments.
We introduce metrics to evaluate individual neuron activation over time and compare activity within individual networks both all-to-all (every neuron is correlated with every other neuron in the network) and layer-wise (only correlations between neurons on the same layer are inspected). These metrics are utilized to set up a pruning procedure that maximizes the information density in learned neural networks and reduces redundancy as well as unused network nodes. Exploiting this optimization procedure, we train various neural networks of variable dimensions on robot manipulators with two to four joints, representing two to four Degrees of Freedom (DOF), in order to analyze similarities between network activation patterns. As a result, we demonstrate experimentally that the introduced pruning process reduces network size efficiently while keeping the performance loss within bounds, and thereby builds a valid basis for network analysis. We show that the networks trained and iteratively pruned on the robot manipulators form distinct neuron activations. Analyzing neuron activation correlations between different networks of various sizes, mappings between neurons trained on different manipulators can be found. A layer-wise interpretation reveals that networks trained for the same task build similar structures, but we can also discover partially similar structures between networks trained on 3- and 4-joint manipulators.
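The two comparison modes described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's exact metric: it assumes neuron outputs have been recorded over an evaluation rollout and uses plain Pearson correlation (function names and array layouts are ours):

```python
import numpy as np

def activation_correlations(activations):
    """All-to-all comparison: `activations` is an (n_neurons, n_steps)
    array of neuron outputs recorded over time, stacked across all layers.
    Returns the (n_neurons, n_neurons) Pearson correlation matrix, so
    every neuron is correlated with every other neuron in the network."""
    return np.corrcoef(activations)

def layerwise_correlations(layers):
    """Layer-wise comparison: `layers` is a list of (n_neurons_l, n_steps)
    arrays, one per layer; only neurons within the same layer are compared."""
    return [np.corrcoef(a) for a in layers]
```

Either matrix can then feed a redundancy criterion (e.g. flagging neuron pairs whose absolute correlation stays near one) for the pruning procedure.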

2. RELATED WORK

The capability of feature extraction in CNNs, alongside a variety of analysis and visualization tools, serves as a motivation for this work on training, analysis and pruning of networks trained with RL. Analysis methods for CNNs range from region-based methods, e.g. image occlusion Zeiler & Fergus (2014), which aim to expose the region of an image most relevant for classification, to feature-based methods, e.g. deconvolution Zeiler & Fergus (2014)

3. EXPERIMENTAL SETUP

In this paper we focus on a robot manipulator whose operation is limited to a vertical plane. A neural network is trained with end-to-end Reinforcement Learning to reach predefined locations in 2D space without prior knowledge of either the robot dynamics or the environment. Hereby, end-to-end refers to a mapping from sensory feedback, in terms of the actual joint positions in Cartesian space and the desired goal location, to output actions as joint position commands. We apply Deep Q-learning, as proposed in Mnih et al. (2015), to predict Q-values; an action is selected by means of a softmax exploration policy, and gradient descent on the network weights is handled by the Adam solver Kingma & Ba (2014). For performance reasons our experiments are executed within a simplified simulation environment as shown conceptually in Figure 1 (right), but exemplary behaviors have been successfully trans-
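The softmax (Boltzmann) exploration step used for action selection can be sketched as follows. This is a generic illustration under our own naming and an illustrative temperature value, not the exact configuration of the experiments:

```python
import numpy as np

def softmax_action(q_values, temperature=1.0, rng=None):
    """Sample an action index with probability proportional to
    exp(Q(s, a) / temperature). A lower temperature behaves more
    greedily; temperature -> infinity approaches uniform exploration."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()                 # subtract max for numerical stability
    p = np.exp(z)
    p /= p.sum()
    return rng.choice(len(p), p=p)
```

With a small temperature the selection is nearly greedy, e.g. `softmax_action([0.0, 1.0, 5.0], temperature=0.01)` almost surely returns action 2.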



or guided backpropagation Springenberg et al. (2014). Methods combining the described techniques are for example introduced as Grad-CAM in Selvaraju et al. (2017). These networks demonstrate class discrimination for features of deeper network layers (Zeiler & Fergus (2014)) as a basis to apply such general feature extractors to different applications after pre-training. Pre-trained networks such as ResNet He et al. (2016), which has been trained on the ImageNet data set, speed up training drastically by initializing CNNs applied to similar tasks. Köpüklü et al. (2019) demonstrated that even reusing individual layers in the same network can lead to a performance increase. Recent advances pushed RL agents to reach super-human performance in playing Atari video games Bellemare et al. (2013); Mnih et al. (2015), Chess Silver et al. (2017) and Go Silver et al. (2016). These results were extended to cope with continuous action spaces, e.g. in Lillicrap et al. (2015), and demonstrated great performance on highly dynamic multi-actuated locomotion learning tasks such as the NIPS 2017 Learning to Run challenge Kidziński et al. (2018). Vuong et al. (2019) and Eramo et al. (2020) demonstrate experimentally that knowledge learned by a neural network can be reused for other tasks in order to speed up training, and hereby translate modularity concepts from CNNs to RL frameworks. Hierarchical Reinforcement Learning incorporates these ideas, utilizing the concept of subtask solving in neural networks, e.g. in Andreas et al. (2016) for question answering. A successful example of transfer learning to build up a general knowledge base with RL was demonstrated on Atari games in Parisotto et al. (2016). Gaier & Ha (2019) emphasize the importance of neural architectures that can perform well even without weight learning. With the main motivation of improving learning efficiency and reducing computational requirements, network pruning has been introduced for various network architectures.
Early work in LeCun et al. (1990) utilizes second-derivative information as a heuristic to decrease network size; recent work in Livne & Cohen (2020) introduces network pruning for Deep Reinforcement Learning based on redundancy detection in an iterative process. Li et al. (2018)
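A minimal sketch of redundancy-based pruning of the kind described above: neurons whose activation is nearly constant, or nearly a duplicate of another kept neuron in the same layer, are removed along with their weight rows. Function name, array layouts and thresholds are our own illustration, not the exact procedure of any cited work or of our algorithm in Section 1:

```python
import numpy as np

def prune_layer(weights_in, weights_out, activations,
                var_eps=1e-6, corr_thresh=0.95):
    """Illustrative redundancy pruning for one hidden layer.
    `activations`: (n_neurons, n_steps) recorded outputs;
    `weights_in`:  (n_neurons, n_inputs) incoming weight matrix;
    `weights_out`: (n_outputs, n_neurons) outgoing weight matrix.
    Returns the reduced weight matrices and the kept neuron indices."""
    keep = []
    for i, a in enumerate(activations):
        if a.var() < var_eps:        # dead / constant neuron: prune
            continue
        if any(abs(np.corrcoef(a, activations[j])[0, 1]) > corr_thresh
               for j in keep):       # near-duplicate of a kept neuron: prune
            continue
        keep.append(i)
    return weights_in[keep], weights_out[:, keep], keep
```

Applied iteratively (prune, then retrain briefly, then prune again), such a criterion yields the dense network representations used for the activation analysis.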

