OPTIMAL HYPERPARAMETER OPTIMIZATION MODELS TO GENERATE EFFICIENT ENSEMBLE DEEP LEARNING

Abstract

Ensemble Deep Learning improves accuracy over a single model by combining the predictions of multiple models. It has established itself as the core strategy for tackling the most difficult problems, such as winning Kaggle challenges. Because there is no consensus on how to design a successful deep learning ensemble, we introduce Hyperband-Dijkstra, a new workflow that automatically explores neural network designs with Hyperband and efficiently combines them with Dijkstra's algorithm. This workflow has the same training cost as a standard Hyperband run, except that sub-optimal solutions are stored and become candidates in the ensemble-selection step (recycling). Then, to predict on new data, the user gives Dijkstra the maximum number of models wanted in the ensemble, controlling the trade-off between accuracy and inference time. Hyperband is a very efficient algorithm that allocates exponentially more resources to the most promising configurations. It is also capable of proposing diverse models thanks to its pure-exploration nature, which allows the Dijkstra algorithm, through a smart combination of diverse models, to achieve a strong reduction in variance and bias. The exploding number of possible combinations generated by Hyperband increases the probability that Dijkstra finds an accurate combination which fits the dataset and generalizes to new data. Two experiments, on CIFAR100 and on our unbalanced microfossil dataset, show that our new workflow generates an ensemble far more accurate than any ensemble of ResNet models from ResNet18 to ResNet152.

1. INTRODUCTION

Ensemble machine learning is a popular method that combines the predictions of several models into a single, more accurate classification. Its success is visible in Kaggle competitions: all top-5 solutions published in the last seven image-recognition challenges use at least one ensemble method, and the average and median number of individual models per ensemble is between 7 and 8. Appendix A summarizes these 17 solutions. Despite this recent popularity among practitioners, there is no consensus on how to apply ensembling to deep neural networks. Most work on (non-deep) ensemble machine learning was carried out in the 1990s and 2000s, whereas GPU implementations of deep learning appeared less than ten years ago. The advent of multi-GPU servers makes it possible to train and evaluate many neural networks simultaneously, and also to deploy ensemble deep architectures. Another recent trend to improve accuracy is transfer learning, i.e., exploiting an external, similar data source Kolesnikov et al. (2019). Instead, we search for a new model-oriented method that can be applied to new kinds of problems where no similar dataset exists. Hyperband-Dijkstra is an innovative way to benefit from this increasing computing power. It unifies two approaches that have each proven efficient but are contradictory: hyperparameter optimization (HPO) and ensembling. The first explores and trains models until it finds the optimal solution, discarding the sub-optimal ones, while the second uses a population of trained models to predict more accurately.
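The recycling idea can be illustrated with a minimal sketch, not the authors' code: a Hyperband-style successive-halving loop in which every trained configuration, including the sub-optimal ones that standard HPO would discard, is kept as a candidate for the later ensemble-selection step. The `train` function below is a hypothetical stand-in for actually training a network.

```python
# Sketch (with assumed names) of successive halving that recycles
# every trained model as an ensemble candidate instead of keeping
# only the single best configuration.
import random

def train(config, budget):
    """Stand-in for training configuration `config` for `budget` epochs.
    Returns a mock validation loss; lower is better."""
    random.seed(hash((config, budget)))     # deterministic mock result
    return random.random() / budget

def successive_halving(configs, min_budget=1, eta=3):
    """Inner loop of Hyperband: keep the best 1/eta fraction of
    configurations at each round and multiply their budget by eta.
    Returns ALL evaluated (config, budget, loss) triples so that
    sub-optimal models can be recycled into an ensemble."""
    candidates = list(configs)
    budget = min_budget
    evaluated = []
    while len(candidates) > 1:
        scored = [(train(c, budget), c) for c in candidates]
        evaluated.extend((c, budget, loss) for loss, c in scored)
        scored.sort()                       # best (lowest loss) first
        candidates = [c for _, c in scored[:max(1, len(scored) // eta)]]
        budget *= eta                       # exponentially more resources
    return evaluated

pool = successive_halving(range(27))
print(len(pool))   # 27 + 9 + 3 = 39 trained models available for selection
```

With 27 starting configurations and `eta = 3`, the loop trains 27, then 9, then 3 models, so 39 trained models are stored overall, which is the combinatorial pool the ensemble-selection step searches.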

