DEEP RANKING ENSEMBLES FOR HYPERPARAMETER OPTIMIZATION

Abstract

Automatically optimizing the hyperparameters of Machine Learning algorithms is one of the primary open questions in AI. Existing work in Hyperparameter Optimization (HPO) trains surrogate models that approximate the response surface of hyperparameters as a regression task. In contrast, we hypothesize that the optimal strategy for training surrogates is to preserve the ranks of the performances of hyperparameter configurations, treating surrogate fitting as a Learning to Rank problem. As a result, we present a novel method that meta-learns neural network surrogates optimized for ranking the configurations' performances while modeling their uncertainty via ensembling. In a large-scale experimental protocol comprising 12 baselines, 16 HPO search spaces, and 86 datasets/tasks, we demonstrate that our method achieves new state-of-the-art results in HPO.

1. INTRODUCTION

Hyperparameter Optimization (HPO) is a crucial ingredient in training state-of-the-art Machine Learning (ML) algorithms. The three popular families of HPO techniques are Bayesian Optimization (Hutter et al., 2019), Evolutionary Algorithms (Awad et al., 2021a), and Reinforcement Learning (Wu & Frazier, 2019; Jomaa et al., 2019). Among these paradigms, Bayesian Optimization (BO) stands out as the most popular approach for guiding the HPO search. At its core, BO fits a parametric function (called a surrogate) to estimate the evaluated performances (e.g. validation error rates) of a set of hyperparameter configurations. Fitting the surrogate to the observed data points is treated as probabilistic regression, where the common choice of surrogate is a Gaussian Process (GP) (Snoek et al., 2012). BO then uses the probabilistic predictions of the configurations' performances to explore the search space of hyperparameters. For an introduction to BO, we refer the interested reader to Hutter et al. (2019).

In this paper, we highlight that the current BO approach of training surrogates through a regression task is sub-optimal. We furthermore hypothesize that fitting a surrogate to evaluated configurations is instead a learning-to-rank (L2R) problem (Burges et al., 2005). The evaluation criterion in HPO is the performance of the top-ranked configuration, whereas the regression loss measures the surrogate's ability to estimate all observed performances and pays no special attention to the top-performing configuration(s). We propose that BO surrogates should instead be trained to estimate the ranks of the configurations, with special emphasis on correctly predicting the ranks of the top-performing ones. Unfortunately, the current BO machinery cannot be naively extended to L2R, because Gaussian Processes are not directly applicable to ranking.
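To make the regression-based baseline concrete, the following is a minimal, self-contained sketch of the standard BO loop described above: a GP surrogate is fit to observed (configuration, performance) pairs as a probabilistic regression, and its predictive mean and variance drive the search via Expected Improvement. The 1-D toy objective, kernel length-scale, and candidate grid are our own illustrative assumptions, not taken from the paper.

```python
import numpy as np
from math import erf, sqrt, pi

rng = np.random.default_rng(0)

def objective(x):
    # toy 1-D "validation error" surface standing in for a real HPO response
    return np.sin(3.0 * x) + 0.1 * x ** 2

def kernel(a, b, ls=0.5):
    # squared-exponential kernel on scalar hyperparameter values
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(x_tr, y_tr, x_q, noise=1e-6):
    # exact GP regression posterior: mean and per-point standard deviation
    K = kernel(x_tr, x_tr) + noise * np.eye(len(x_tr))
    Ks = kernel(x_tr, x_q)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - (v ** 2).sum(axis=0)   # k(x, x) = 1 for this kernel
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # EI for minimization; sigma floored to avoid division by zero
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + np.array([erf(t / sqrt(2.0)) for t in z]))
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2.0 * pi)
    return (best - mu) * cdf + sigma * pdf

X = rng.uniform(-2.0, 2.0, size=5)      # initial random design
y = objective(X)
candidates = np.linspace(-2.0, 2.0, 200)

for _ in range(10):
    mu, sigma = gp_posterior(X, y, candidates)   # probabilistic *regression*
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

print(float(y.min()))
```

Note that the regression loss implicit in the GP fit treats all observations equally; nothing in this loop gives special weight to ranking the best configurations correctly, which is exactly the gap the paper targets.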
In this paper, we propose a novel paradigm to train probabilistic surrogates for learning to rank in HPO with neural network ensembles. Our networks are trained to minimize listwise L2R losses (Cao et al., 2007), and the ensemble's uncertainty is modeled by training diverse networks via the Deep Ensemble paradigm (Lakshminarayanan et al., 2017). While there have been a few HPO-related works using flavors of basic ranking losses (Bardenet et al., 2013; Wistuba & Pedapati, 2020; Öztürk et al., 2022), ours is the first
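The two ingredients named above can be sketched in plain NumPy: a ListNet-style listwise loss (cross-entropy between the softmax distribution of predicted scores and that of the true utilities) trains each small MLP surrogate to rank rather than regress, and a Deep Ensemble of independently initialized members provides uncertainty through their disagreement. The tiny architecture, the synthetic 2-D search space, and the temperature sharpening of the target distribution are our own illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def train_ranking_net(X, utility, rng, steps=1000, lr=0.1, hidden=16, temp=8.0):
    # One small MLP surrogate trained with the ListNet top-one listwise loss:
    # cross-entropy between softmax(scores) and softmax(temp * utility).
    # (The temperature is an assumption we add to sharpen the toy target.)
    d = X.shape[1]
    W1 = rng.normal(0.0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, hidden);      b2 = 0.0
    q = softmax(temp * utility)            # target top-one distribution
    for _ in range(steps):
        z = X @ W1 + b1
        h = np.maximum(z, 0.0)             # ReLU hidden layer
        s = h @ w2 + b2                    # one score per configuration
        ds = softmax(s) - q                # grad of listwise CE w.r.t. scores
        dw2 = h.T @ ds; db2 = ds.sum()
        dz = np.outer(ds, w2) * (z > 0)    # backprop through the ReLU
        dW1 = X.T @ dz; db1 = dz.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        w2 -= lr * dw2; b2 -= lr * db2
    return lambda Xq: np.maximum(Xq @ W1 + b1, 0.0) @ w2 + b2

rng = np.random.default_rng(0)
configs = rng.uniform(0.0, 1.0, (40, 2))          # toy 2-D search space
utility = -((configs - 0.5) ** 2).sum(axis=1)     # higher = better config

# Deep Ensemble: independently initialized members; disagreement = uncertainty
ensemble = [train_ranking_net(configs, utility, np.random.default_rng(s))
            for s in range(5)]
preds = np.stack([f(configs) for f in ensemble])
mean, std = preds.mean(axis=0), preds.std(axis=0)
print(int(mean.argmax()))
```

The ensemble mean induces a ranking over candidate configurations, while the per-candidate standard deviation plays the role the GP posterior variance plays in regression-based BO, enabling the usual exploration-exploitation trade-off on top of a ranking surrogate.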

Our code is available in the following repository: https://github.com/releaunifreiburg/DeepRankingEnsembles

