TARGETED HYPERPARAMETER OPTIMIZATION WITH LEXICOGRAPHIC PREFERENCES OVER MULTIPLE OBJECTIVES

Abstract

Motivated by various practical applications, we propose a novel and general formulation of targeted multi-objective hyperparameter optimization. Our formulation allows a clear specification of an automatable optimization goal using lexicographic preference over multiple objectives. We then propose a randomized directed search method named LexiFlow to solve this problem. We demonstrate the strong empirical performance of the proposed algorithm in multiple hyperparameter optimization tasks.

1. INTRODUCTION

Hyperparameter optimization (HPO) of machine learning models, as a core component of AutoML, is the process of finding a hyperparameter configuration that optimizes the model's "performance". In practical ML systems, there is typically more than one metric of model "performance" that one desires to optimize. For instance, latency (He et al., 2018), fairness (Brookhouse & Freitas, 2022), and explainability (Gonzalez et al., 2021) are important complementary metrics of interest in addition to prediction accuracy in many application scenarios. Typical multi-objective HPO (MO-HPO) approaches (Knowles, 2006; Daulton et al., 2020) seek to find wide-spread Pareto fronts for users to choose from. This type of method can only establish a partial ordering of the configurations. The final choice of which point on the Pareto front to use is typically made manually and is opaque to the optimization algorithm. We call such optimization "untargeted". An automated approach is desirable, especially in repetitive tuning scenarios such as continuous integration and delivery (CI/CD) of machine learning models, or MLOps in general (Garg et al., 2021; Mäkinen et al., 2021; Symeonidis et al., 2022). This automation is possible if the criterion for selecting the final choice is specified explicitly. In this scenario, untargeted HPO can be inefficient, as the optimization algorithm may waste resources on finding parts of the Pareto front that are far from the desired final choice, i.e., the target. In this work, we consider a targeted HPO scenario: practitioners have a priority order over the objectives, which enables a total ordering of all the configurations. We formalize this general notion of priority order rigorously as a lexicographic preference (Fishburn, 1975) over multiple objectives in an HPO task.
It allows users to specify a clear optimization target across multiple objectives before the optimization starts and removes the need for manual post hoc selection. Such a priority structure is found in HPO tasks from various application domains. For example, in many bioinformatics applications, besides the primary objective of finding model hyperparameter configurations with low prediction error, minimizing the number of features via a feature selection step helps avoid overfitting and discover relevant features for domain experts, and is thus suggested as an auxiliary objective in HPO (Bommert et al., 2017; Gonzalez et al., 2021). When both objectives are included, the auxiliary objective is considered less important than the minimization of the prediction error, which naturally forms a lexicographic structure.

Figure 1: Results in an XGBoost tuning task to find accurate and fair models, in which validation loss is specified as an objective of higher priority, and DSP (a fairness-related objective) of lower priority. Lexi-Target* = l* + 0.05, in which l* is the optimal loss value, unknown before the optimization starts, and 0.05 is the user-specified tolerance on loss degradation. The circles represent proposed configurations from different methods in the objective space (darker color indicates a later iteration). Both objectives are smaller-the-better. SO-HPO easily achieves the target loss value but performs poorly regarding fairness. MO-HPO achieves better fairness than SO-HPO; however, it also wastes resources on finding points outside the desired loss target as it seeks a wide spread of the Pareto front. Compared to MO-HPO, a larger fraction of the configurations proposed by LexiFlow are within the loss target. This allows LexiFlow to try more configurations within the desired loss range and achieve better fairness.

Despite its appealing practical importance, we find this type of targeted HPO problem remarkably under-explored. In this work, we first provide a rigorous problem formulation for the targeted HPO task with lexicographic preference over multiple objectives. This formulation provides a general and flexible way for users to specify customized targets, expressed via a priority order on the objectives and a list of optional goals and tolerances on the objectives. Based on this problem formulation, we propose an algorithm named LexiFlow as a general solution. Specifically, LexiFlow conducts the optimization by leveraging pairwise comparisons between hyperparameter configurations in a randomized direct search framework. The pairwise comparisons are supported by a suite of targeted lexicographic relations, which allow us to navigate toward the more promising regions of the search space while respecting the lexicographic structure in the objective space. By doing so, the algorithm is able to efficiently optimize the objectives with strong anytime performance. We perform an extensive empirical evaluation on four machine learning model tuning tasks: finding accurate and fast/small neural network models, finding accurate and fair XGBoost models, tuning a random forest combined with feature selection for gene expression prediction, and mitigating overfitting. Our method has promising performance on all the evaluated tasks, which verifies its unique advantages. We illustrate the different behaviors of our method LexiFlow, a single-objective method (SO-HPO), and a multi-objective method (MO-HPO) in Figure 1.
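To make the idea concrete, the following minimal sketch shows one way a tolerance-aware lexicographic comparison between two configurations could look. It is our own illustration based on the description above and on the setup in Figure 1, not LexiFlow's actual implementation: the function name and the rule that two values both meeting the target (best value plus tolerance) are treated as tied on that objective are assumptions.

```python
# Illustrative sketch (not the paper's implementation) of a lexicographic
# comparison with per-objective tolerances. Objectives are ordered from
# highest to lowest priority, and all are assumed to be minimized.

def targeted_lexico_compare(obj_a, obj_b, best_so_far, tolerances):
    """Return -1 if obj_a is preferred, 1 if obj_b is preferred, 0 if tied.

    For each objective (in priority order), values within the target
    region (at most best + tolerance) are treated as equally good, and
    the comparison moves on to the next, lower-priority objective.
    """
    for a, b, best, tol in zip(obj_a, obj_b, best_so_far, tolerances):
        target = best + tol  # e.g., l* + 0.05 for the loss in Figure 1
        if a <= target and b <= target:
            continue  # both meet the target: defer to lower priorities
        if a != b:
            return -1 if a < b else 1
    return 0

# Loss target: 0.30 + 0.05 = 0.35. Both candidate losses meet it,
# so the lower-priority fairness objective (DSP) decides.
best, tols = [0.30, 0.0], [0.05, 0.0]
print(targeted_lexico_compare([0.34, 0.02], [0.31, 0.08], best, tols))  # -1
# With zero tolerance on the loss, the lower loss wins directly.
print(targeted_lexico_compare([0.34, 0.02], [0.31, 0.08], best, [0.0, 0.0]))  # 1
```

Note how the tolerance turns the strict priority order into a usable search signal: without it, the lower-priority objective would only ever matter on exact ties of the higher-priority one.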

1.1. RELATED WORK

A number of works address MO-HPO for machine learning models, including evolutionary algorithms (Deb et al., 2002; Srinivas & Deb, 1994; Zhang & Li, 2007; Binder et al., 2020), Bayesian optimization (Knowles, 2006; Daulton et al., 2020; Emmerich & Klinkenberg, 2008; Hernández-Lobato et al., 2016), and multi-fidelity methods (Schmucker et al., 2020; 2021). All the aforementioned methods treat all objectives as equally important and seek an approximation of the Pareto front in the objective space. Some recent work proposes to incorporate different types of user preferences into MO-HPO. Paria et al. (2020) allow users to encode their preferences as a prior on a weight vector used to scalarize the multiple objectives. The prior induces a posterior distribution over the set of Pareto-optimal values. Setting a proper prior in practice is non-trivial, as the relation between the prior and the posterior is difficult, if not impossible, for an average practitioner to derive. Abdolshah et al. (2019) regard preferences as the stability of objectives and use a constrained Bayesian optimization method. An earlier multi-objective optimization method (Zitzler et al., 2008) incorporates preference information into a multi-objective evolutionary framework by defining relations between different populations. However, this preference information is defined over populations rather than individual configurations.
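To illustrate the preference-as-prior approach discussed above, the sketch below draws a weight vector from a prior and scalarizes the objectives. The Dirichlet prior and the weighted-sum scalarization are illustrative assumptions on our part, not the exact formulation of Paria et al. (2020).

```python
# Minimal sketch of preference-encoded random scalarization: each HPO
# iteration samples a weight vector from a user-specified prior and
# optimizes the resulting scalar score. Prior and scalarization choices
# here are assumptions for illustration only.

import random

def sample_weights(alpha, rng):
    # Draw a weight vector from a Dirichlet(alpha) prior via Gamma sampling.
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

def scalarize(objectives, weights):
    # Collapse multiple objectives into one scalar score (to be minimized).
    return sum(w * f for w, f in zip(weights, objectives))

rng = random.Random(0)
# A prior concentrated on the first objective expresses its higher importance.
weights = sample_weights([8.0, 1.0], rng)
score = scalarize([0.2, 0.9], weights)
print(abs(sum(weights) - 1.0) < 1e-9)  # weights form a valid convex combination
```

The difficulty noted in the text is visible even in this toy setting: the user must pick the concentration parameters (here [8.0, 1.0]) without a direct handle on where the induced posterior over Pareto-optimal values will concentrate.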

