FEDHPO-BENCH: A BENCHMARK SUITE FOR FEDERATED HYPERPARAMETER OPTIMIZATION

Abstract

Hyperparameter optimization (HPO) is crucial for machine learning algorithms to achieve satisfactory performance. Research progress in HPO has been boosted by existing HPO benchmarks. Nonetheless, existing benchmarking efforts all focus on HPO for traditional centralized learning while ignoring federated learning (FL), a promising paradigm for collaboratively learning models from dispersed data. In this paper, we first identify several unique characteristics of HPO for FL algorithms. Due to this uniqueness, existing HPO benchmarks no longer satisfy the need to compare HPO methods in the FL setting. To facilitate research on HPO in the FL setting, we propose and implement a benchmark suite, FEDHPO-BENCH, that incorporates comprehensive FedHPO problems, enables flexible customization of the function evaluations, and eases continuing extensions. We also conduct extensive experiments based on FEDHPO-BENCH to provide the community with more insights into FedHPO. FEDHPO-BENCH is open-sourced at https://github.com/FedHPO-Bench/FedHPO-Bench-ICLR23.

1. INTRODUCTION

Most machine learning (ML) algorithms expose many design choices, which can drastically impact the ultimate performance. Hyperparameter optimization (HPO) (Feurer & Hutter, 2019) aims at making the right choices without human intervention. To this end, HPO methods usually attempt to solve min_{λ∈Λ_1×···×Λ_K} f(λ), where each Λ_k corresponds to the candidate choices of a specific hyperparameter, e.g., taking the learning rate from Λ_1 = [0.01, 1.0] and the batch size from Λ_2 = {16, 32, 64}. For each specified λ, f(λ) is the output result (e.g., validation loss) of executing the considered algorithm configured by λ. A solution λ* found for such a problem is expected to endow the considered algorithm with superior generalization performance. Research in this line has been facilitated by HPO benchmarks (Gijsbers et al., 2019; Eggensperger et al., 2021; Pineda-Arango et al., 2021), which prepare many HPO problems so that different HPO methods can be effortlessly compared, encouraging fair, reliable, and reproducible empirical studies. However, existing HPO benchmarks all focus on traditional learning paradigms, where the functions to be optimized correspond to centralized learning tasks. Federated learning (FL) (McMahan et al., 2017; Li et al., 2020a), a privacy-preserving paradigm for collaboratively learning a model from distributed data, has not been considered. Indeed, along with society's increasing privacy concerns, FL has been gaining more attention from academia and industry. Meanwhile, HPO for FL algorithms (denoted by FedHPO from now on) is identified as a critical and promising open problem in FL (Kairouz et al., 2019). In this paper, we first elaborate on several differences between FedHPO and traditional HPO (see Section 2.2), which essentially come from FL's distributed nature and the heterogeneity among FL's participants.
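To make the generic HPO objective concrete, the sketch below runs a minimal random-search loop over the example search space mentioned above (learning rate in [0.01, 1.0], batch size in {16, 32, 64}). It is an illustration of the abstract problem min_{λ} f(λ), not part of FEDHPO-BENCH; `toy_objective` is a hypothetical stand-in for the expensive function evaluation (training the configured algorithm and returning a validation loss).

```python
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Minimal random-search HPO loop: sample configurations lambda from the
    search space and keep the one with the lowest objective value f(lambda)."""
    rng = random.Random(seed)
    best_lam, best_val = None, float("inf")
    for _ in range(n_trials):
        lam = {
            "lr": rng.uniform(*space["lr"]),                # continuous range
            "batch_size": rng.choice(space["batch_size"]),  # categorical set
        }
        val = objective(lam)  # stands in for a full training/validation run
        if val < best_val:
            best_lam, best_val = lam, val
    return best_lam, best_val

# Hypothetical cheap surrogate for f(lambda); a real objective would train
# the algorithm configured by lambda and report its validation loss.
def toy_objective(lam):
    return (lam["lr"] - 0.1) ** 2 + 0.001 * abs(lam["batch_size"] - 32)

space = {"lr": (0.01, 1.0), "batch_size": [16, 32, 64]}
best, val = random_search(toy_objective, space)
```

In the FL setting discussed next, each evaluation of `objective` would itself involve many rounds of communication among distributed clients, which is one source of FedHPO's distinct cost structure.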
These differences make existing HPO benchmarks inappropriate for studying FedHPO and, in particular, unusable for comparing FedHPO methods. Consequently, several recently proposed FedHPO methods (Zhou et al., 2021; Dai et al., 2020; Khodak et al., 2021; Zhang et al., 2021; Guo et al., 2022) are evaluated on their respective problems and have been neither uniformly implemented in one FL framework nor properly benchmarked. Motivated by FedHPO's uniqueness and the successes of existing HPO benchmarks, we propose and implement FEDHPO-BENCH, a dedicated benchmark suite, to facilitate the research and application of FedHPO. FEDHPO-BENCH satisfies the following desiderata:

