ROMUL: SCALE ADAPTATIVE POPULATION BASED TRAINING

Abstract

In most pragmatic settings, data augmentation and regularization are essential and require hyperparameter search. Population based training (PBT) is an effective tool for efficiently finding good hyperparameter values, as well as schedules over them. In this paper, we compare existing PBT algorithms and contribute a new one: ROMUL, for RObust MULtistep search, which adapts its stepsize over the course of training. We report competitive results with standard models on CIFAR (image classification) as well as Penn Treebank (language modeling), both of which depend on heavy regularization. We also open-source hoptim, a PBT library agnostic to the training framework, which is simple to use, reentrant, and provides good defaults with ROMUL.

1. INTRODUCTION

Hyperparameter tuning is essential for good performance in most machine learning tasks, and poses numerous challenges. First, optimal hyperparameter values can change over the course of training (schedules), e.g. for learning rate, fine tuning phases, or data augmentation. Hyperparameter values are also rarely independent from each other (e.g. the magnitude of individual data augmentations depends on the number of data augmentations applied), and the search space grows exponentially with the number of hyperparameters. All of this search has to be performed within a computational budget, and sometimes even within a wall-clock time budget (e.g. for models that are frequently retrained on new data), requiring efficient parallelization.

In practice, competitive existing methods range from random search (Bergstra & Bengio, 2012) to more advanced methods (that aim at being more compute-efficient) like sequential search (Bergstra et al., 2011; 2013; Li et al., 2018), population based training (PBT, e.g. Jaderberg et al. (2017); Ho et al. (2019)), and search structured by the space of the hyperparameters (Liu et al., 2018; Cubuk et al., 2019b). A major drawback of advanced hyperparameter optimization methods is that they themselves require attention from the user to reliably outperform random search.

In this work, we empirically study the different training dynamics of data augmentation and regularization hyperparameters across vision and language modeling tasks, in particular for multistep (sequential) hyperparameter search. A common failure mode (i) is due to hyperparameters that have a different effect on the validation loss in the short and long terms; for instance, using a smaller dropout often leads to faster but worse convergence. Another common problem (ii) is that successful searches are contingent on adequate "hyper-hyperparameters" (such as value ranges or the search policy used, which in current methods include non-adaptive mutation steps).
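To make problem (ii) concrete, the following is a minimal, generic sketch of one PBT exploit/explore round (it is not ROMUL, and `pbt_step` and its parameters are illustrative names, not the hoptim API). The fixed `mutate_factor` is exactly the kind of non-adaptive "hyper-hyperparameter" discussed above: too small and the search barely moves, too large and it overshoots good values.

```python
import random

def pbt_step(population, mutate_factor=1.2, frac=0.25):
    """One exploit/explore round of a minimal, generic PBT scheme.

    population: list of dicts with keys "score" (higher is better)
    and "hparams" (e.g. {"dropout": 0.3}). The fixed mutate_factor
    is a non-adaptive "hyper-hyperparameter".
    """
    ranked = sorted(population, key=lambda w: w["score"], reverse=True)
    n = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:n], ranked[-n:]
    for worker in bottom:
        parent = random.choice(top)
        # Exploit: copy the parent's hyperparameters (a real system
        # would also copy the parent's model weights).
        worker["hparams"] = dict(parent["hparams"])
        # Explore: perturb each hyperparameter by a fixed factor,
        # randomly up or down.
        for k in worker["hparams"]:
            up = random.random() < 0.5
            worker["hparams"][k] *= mutate_factor if up else 1.0 / mutate_factor
    return population
```

Here each round discards the bottom fraction of the population, clones the top fraction, and perturbs the clones; the success of the whole search hinges on `mutate_factor` being well chosen in advance.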
Our contributions can be summarized as follows:
• We present a robust algorithm for leveraging population based training for hyperparameter search: ROMUL (RObust MULtistep) search, which addresses (i) and (ii). We empirically study its benefits and limitations, and show that it provides good defaults that compare favorably to existing methods.
• We open-source hoptim, a simple library for sequential hyperparameter search, that provides multiple optimizers (including ROMUL), as well as toy benchmarks showcasing hyperparameter optimization problems we identified empirically and standard datasets.
