MORE OR LESS: WHEN AND HOW TO BUILD CONVOLUTIONAL NEURAL NETWORK ENSEMBLES

Abstract

Convolutional neural networks are used to solve increasingly complex problems with ever more data. As a result, researchers and practitioners seek to scale the representational power of such models by adding more parameters. However, increasing the number of parameters demands additional critical resources, namely memory and compute, leading to higher training and inference cost. Thus, a persistent challenge is to obtain the highest possible accuracy within a given parameter budget. As neural network designers navigate this complex landscape, they are guided by conventional wisdom informed by past empirical studies. We identify a critical part of this design space that is not well understood: how to decide between expanding a single convolutional network and increasing the number of networks in the form of an ensemble. We study this question in detail across various network architectures and data sets. We build an extensive experimental framework that captures the many ways a new set of parameters can be used in a model, and we consider a holistic set of metrics, including training time, inference time, and memory usage. The framework provides a robust assessment by controlling for the total number of parameters. Contrary to conventional wisdom, we show that under such a holistic and robust assessment, there is a wide design space where ensembles provide better accuracy, train faster, and deploy at a speed comparable to single convolutional networks with the same total number of parameters.

1. INTRODUCTION

Scaling capacity of deep learning models. Convolutional neural network models are becoming as accurate as humans on perceptual tasks. They are now used in numerous and diverse applications such as drug discovery, data compression, and automated gameplay. These models increasingly grow in size, with more parameters and layers, driven by two major trends. First, there is a continuous rise in data complexity and size in many applications (Shazeer et al., 2017). Second, there is an increasing need for higher accuracy as models are deployed in more critical applications, such as self-driving cars and medical diagnosis (Grzywaczewski, 2017). This effect is especially pronounced in computer vision and natural language processing: model sizes are three orders of magnitude larger than they were just three years ago (Sanh et al., 2019). With bigger models, the time, computation, and memory needed to train and deploy them also increase. Thus, it is a consistent challenge to design models that maximize accuracy while remaining practical with respect to the resources they require (Lee et al., 2015; Huang et al., 2017b). In this paper, we study the following question: Given a number of parameters (neurons), how should one design a convolutional neural network to optimize holistically for accuracy, training cost, and inference cost?

The holistic design space is very complex. Designers of convolutional neural network models navigate a complex design landscape to address this question: First, they need to decide on a network architecture. Then, they have to consider whether to use a single network or build an ensemble model with multiple networks. Additionally, they have to decide how many neural networks to use and their individual designs, i.e., the depth, width, and number of networks in their model. Modern

