RESPERFNET: DEEP RESIDUAL LEARNING FOR REGRESSIONAL PERFORMANCE MODELING OF DEEP NEURAL NETWORKS

Abstract

The rapid advancement of computing technology facilitates the development of diverse deep learning applications. Unfortunately, the efficiency of parallel computing infrastructures varies widely across neural network models, which hinders the exploration of the design space to find high-performance neural network architectures on specific computing platforms for a given application. To address this challenge, we propose a deep learning-based method, ResPerfNet, which trains a residual neural network with representative datasets obtained on the target platform to predict the performance of a deep neural network. Our experimental results show that ResPerfNet can accurately predict the execution time of individual neural network layers and full network models on a variety of platforms. In particular, ResPerfNet achieves a mean absolute percentage error of 8.4% for LeNet, AlexNet and VGG16 on the NVIDIA GTX 1080Ti, which is substantially lower than that of previously published works.

1. INTRODUCTION

Deep learning (DL) has been applied with great success to many application domains, such as image recognition and object detection. As a result, human experts have designed high-accuracy neural network architectures for a variety of applications. However, for Internet of Things (IoT) applications, large neural network models cannot fit into resource-constrained devices. On the other hand, a system designer often needs to find a suitable computing platform or deep learning accelerator (DLA) to execute a DL application with acceptable responsiveness. An exhaustive way to optimize the system design is to evaluate the cost and performance of the desired DL models on all available hardware/software options, but doing so is tedious, costly, and time-consuming in practice. Moreover, since DL frameworks and accelerators evolve rapidly, and even slight changes can significantly impact the performance of DL applications, the performance models may need to be updated frequently. Therefore, a systematic and efficient approach is needed to produce accurate performance models whenever such changes occur.

While several works (Qi et al.; Justus et al. (2018); Wang et al.) have been proposed to estimate the delivered performance of a given DL model on a specific computing platform, so as to rapidly evaluate design alternatives, the estimates from these efforts are not very accurate. For example, the mean absolute percentage error (MAPE) for estimating full neural network models such as LeNet (LeCun et al. (1998)), AlexNet (Krizhevsky et al. (2012)) and VGG16 (Simonyan & Zisserman) on the NVIDIA GTX 1080Ti is as high as 24% in Wang et al., whose accuracy is the best among the previous works but still leaves room for improvement.

In this paper, we propose a deep residual network architecture, called ResPerfNet, to efficiently and accurately model the performance of DL models running on a wide range of DL frameworks and DLAs. It is based on the residual function approach proposed by He et al. (2016) and inspired by the prior works (Liu & Yang (2018); Jha et al. (2019); Wan et al. (2019)), which use residual neural networks to solve regression problems. The proposed model can be trained with performance data collected from many system configurations to establish a unified performance predictor which assists the users in selecting the DL model, the DL framework, and the DLA for their applications. Extensive experiments have been done to show that our unified approach not only provides more accurate performance estimates than the previous works, but also enables the users to quickly pre-
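To make the two key ingredients above concrete, the following is a minimal NumPy sketch of (a) the MAPE metric used to compare predictors and (b) a single fully connected residual block of the kind a residual regression network stacks. This is an illustrative sketch under assumed layer widths and function names, not the paper's actual ResPerfNet implementation.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error (in %), the accuracy metric
    used to compare performance predictors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def residual_block(x, W1, b1, W2, b2):
    """One dense residual block: ReLU(x + F(x)), where F is two
    fully connected layers. The identity shortcut assumes the input
    and output widths match (illustrative choice, not from the paper)."""
    h = np.maximum(0.0, x @ W1 + b1)   # first dense layer + ReLU
    f = h @ W2 + b2                    # second dense layer (residual branch)
    return np.maximum(0.0, x + f)      # shortcut connection, then ReLU

# Example: predictions of 11 ms and 18 ms against true latencies of
# 10 ms and 20 ms give a MAPE of 10%.
print(mape([10.0, 20.0], [11.0, 18.0]))  # -> 10.0
```

The shortcut connection is what distinguishes a residual block from a plain multilayer perceptron: the branch only needs to learn the correction F(x) on top of the identity, which eases optimization in deep stacks.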

