MULTI-OBJECTIVE OPTIMIZATION VIA EQUIVARIANT DEEP HYPERVOLUME APPROXIMATION

Abstract

Optimizing multiple competing objectives is a common problem across science and industry. The inherent, inextricable trade-off between those objectives leads one to the task of exploring their Pareto front. A meaningful quantity for this purpose is the hypervolume indicator, which is used in Bayesian Optimization (BO) and Evolutionary Algorithms (EAs). However, the computational complexity of calculating the hypervolume scales unfavorably with an increasing number of objectives and data points, which restricts its use in those common multi-objective optimization frameworks. To overcome these restrictions, previous work has focused on approximating the hypervolume using deep learning. In this work, we propose a novel deep learning architecture to approximate the hypervolume function, which we call DeepHV. For better sample efficiency and generalization, we exploit the fact that the hypervolume is scale-equivariant in each of the objectives as well as permutation-invariant w.r.t. both the objectives and the samples, by using a deep neural network that is equivariant w.r.t. the combined group of scalings and permutations. We show through an ablation study that including these symmetries leads to significantly improved model accuracy. We evaluate our method against exact and approximate hypervolume methods in terms of accuracy, computation time, and generalization. We also apply and compare our methods to state-of-the-art multi-objective BO methods and EAs on a range of synthetic and real-world benchmark test cases. The results show that our methods are promising for such multi-objective optimization tasks.

1. INTRODUCTION

Imagine, while listening to a lecture, you also quickly want to check out the latest news on your phone, so you can appear informed during lunch. As an experienced listener, who knows which lecture material is important, and an excellent reader, who knows how to scan the headlines, you are confident in your abilities in each of those tasks. So you continue listening to the lecture while scrolling through the news. Suddenly you realize you need to split your focus. You face the unavoidable trade-off between properly listening to the lecture while slowly reading the news, or missing important lecture material while fully processing the news. Since this is not your first rodeo, you have learned over time how to transition between these competing objectives while still being optimal under those trade-off constraints. Since you do not want to stop listening to the lecture, you decide to listen as closely as possible while still making some progress in reading the news. Later, during lunch, you propose an AI that can read the news while listening to a lecture. The question remains how to train an AI to excel at different, possibly competing, tasks or objectives and to make deliberate, well-calibrated trade-offs between them whenever necessary. The challenges involved in this setting differ from those in the single-objective case. If we are confronted with only a single objective function f, the task is to find a point x* that maximizes f, i.e. x* ∈ argmax_{x∈X} f(x), solely by iteratively proposing input points x_n, evaluating f on x_n, and observing the output values f(x_n). Since the usual input space X = R^D is uncountably infinite, we can never be certain whether the finite set of points x_1, ..., x_N contains a global maximizer x* of f. This is an instance of the exploration-vs-exploitation trade-off inherent to Bayesian optimization (BO) (Snoek et al., 2012), active learning (Burr, 2009), and reinforcement learning (Kaelbling et al., 1996).
In the multi-objective (MO) setting, where we have M ≥ 2 objective functions f_1, ..., f_M, it is desirable, but usually not possible, to find a single point x* that maximizes all objectives at the same time:

x* ∈ ⋂_{m=1}^{M} argmax_{x∈X} f_m(x).    (1)

A maximizer x*_1 of f_1 might lead to a non-optimal value of f_2, etc. So the best we can do in this setting is to find the set of Pareto points of F = (f_1, ..., f_M), i.e. those points x ∈ X that cannot be improved in any of the objectives f_m, m = 1, ..., M, without lowering one of the other values:

X* := {x ∈ X | ∄ x' ∈ X. F(x) ≺ F(x')},

where y ⪯ y' means that y_m ≤ y'_m for all m = 1, ..., M, and y ≺ y' means that y ⪯ y' but y ≠ y'. In conclusion, in the multi-objective setting we are rather concerned with the exploration of the Pareto front P* := F(X*) ⊆ R^M =: Y, which often is an (M−1)-dimensional subspace of Y = R^M. Success in this setting is measured by how "close" the empirical Pareto front based on the previously chosen points x_1, ..., x_N,

P̂_N := {F(x_n) | n ∈ [N], ∄ n' ∈ [N]. F(x_n) ≺ F(x_n')} ⊆ Y,

is to the 'true' Pareto front P*, where [N] := {1, ..., N}. This is illustrated in Fig. 1, where the blue points form the empirical Pareto front and the black line depicts the true Pareto front. Since the values F(x_n) can never exceed the values of P* w.r.t. ≺, one way to quantify the closeness of P̂_N to P* is by measuring its hypervolume HV_r^M(P̂_N) inside Y w.r.t. a jointly dominated reference point r ∈ Y. This suggests the multi-objective optimization strategy of picking the next point x_{N+1} ∈ X in such a way that it leads to a maximal improvement of the previously measured hypervolume HV_r^M(P̂_N). In Fig. 1, the hypervolume (i.e., the area in 2D) is shown for the empirical Pareto front (blue dots); adding an additional point y_8 to the empirical Pareto front increases the hypervolume by the green area. Unfortunately, known algorithms for computing the hypervolume scale unfavorably with the number of objective functions M and the number of data points N. Nonetheless, finding a fast and scalable computational method that approximates the hypervolume reliably would have far-reaching consequences, and is an active field of research.

Computation of the hypervolume has complexity O(2^N N M) when computed in a naive way; however, more efficient exact algorithms have been proposed, such as WFG (O(N^{M/2} log N), While et al. (2012)) and HBDA (O(N^{M/2}), Lacour et al. (2017)). These computational complexities are still deemed impractical for application in EAs (Tang et al., 2020), where computational overhead typically is required to be low. Also in BO, where one is typically less concerned with the computational overhead, faster hypervolume methods would be beneficial. For instance, the state-of-the-art expected hypervolume improvement (qEHVI) (Daulton et al., 2020) depends on many hypervolume computations, greatly restricting its use to settings with low M and N. In addition, the authors of the recently proposed MORBO (Daulton et al., 2022), the current state of the art in terms of sample efficiency and scalability to high M and N, identified the computational complexity of the hypervolume as a limitation. Therefore, efforts have been made to approximate the hypervolume. FPRAS (Bringmann & Friedrich, 2010) provides an efficient Monte Carlo (MC) based method to approximate the hypervolume, of complexity O(N M/ϵ), with an error of ±√ϵ. In addition, at the time of this work, a hypervolume approximation based on DeepSets (Zaheer et al., 2017) was proposed, called HV-Net (Shang et al., 2022a). HV-Net uses a deep neural network with permutation invariance (i.e., its output is invariant under permutations of the solutions in the input set).

Simultaneous optimization of multiple, possibly competing, objectives is not just a challenge in our daily routines; it also finds widespread application in many fields of science, for instance in machine learning (Wu et al., 2019; Snoek et al., 2012), engineering (Liao et al., 2007; Oyama et al., 2018), and chemistry (O'Hagan et al., 2005; Koledina et al., 2019; MacLeod et al., 2022; Boelrijk et al., 2021; 2023; Buglioni et al., 2022).

Figure 1: Illustration of Pareto front and Hypervolume.
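To make the notions of empirical Pareto front and hypervolume concrete, the following is a minimal NumPy sketch, illustrating only the definitions above and not the DeepHV model; the function names `pareto_front` and `hypervolume_2d` are ours, and the exact sweep applies only to M = 2:

```python
import numpy as np

def pareto_front(Y):
    """Return the non-dominated points of Y (maximization convention)."""
    Y = np.asarray(Y, dtype=float)
    keep = []
    for i, y in enumerate(Y):
        # y is dominated if some other point is >= in every objective
        # and strictly > in at least one (the relation y ≺ y').
        dominated = any(
            np.all(y <= yp) and np.any(y < yp)
            for j, yp in enumerate(Y) if j != i
        )
        if not dominated:
            keep.append(i)
    return Y[keep]

def hypervolume_2d(front, ref):
    """Exact hypervolume (area) dominated by a 2D front w.r.t. `ref`."""
    # Sweep the mutually non-dominated points in descending f1 order,
    # accumulating the rectangle each point adds above `ref`.
    area, prev_y2 = 0.0, ref[1]
    for y1, y2 in sorted(map(tuple, np.asarray(front)), reverse=True):
        area += (y1 - ref[0]) * (y2 - prev_y2)
        prev_y2 = y2
    return area

Y = [[1.0, 4.0], [2.0, 3.0], [1.5, 2.0], [3.0, 1.0]]
front = pareto_front(Y)  # drops (1.5, 2.0), dominated by (2.0, 3.0)
print(hypervolume_2d(front, ref=(0.0, 0.0)))  # → 8.0
```

Picking the candidate whose addition to `front` maximizes the increase of this quantity is exactly the hypervolume-improvement strategy described above.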


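The Monte Carlo idea behind such approximations can be sketched in a few lines: sample uniformly in the box spanned by the reference point and the componentwise maximum of the front, and rescale the dominated fraction by the box volume. This is a simplified illustration of the general MC principle, not the FPRAS algorithm of Bringmann & Friedrich (2010); `mc_hypervolume` is our own name:

```python
import numpy as np

def mc_hypervolume(front, ref, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the hypervolume of `front` w.r.t. `ref`."""
    front = np.asarray(front, dtype=float)
    ref = np.asarray(ref, dtype=float)
    upper = front.max(axis=0)                 # componentwise maximum
    box_volume = float(np.prod(upper - ref))
    rng = np.random.default_rng(seed)
    u = rng.uniform(ref, upper, size=(n_samples, front.shape[1]))
    # A sample is dominated iff some front point is >= it in every objective.
    dominated = (u[:, None, :] <= front[None, :, :]).all(axis=2).any(axis=1)
    return box_volume * dominated.mean()

# Two points in 2D whose exact hypervolume w.r.t. (0, 0) is 5.0.
print(mc_hypervolume([[3.0, 1.0], [1.0, 3.0]], ref=(0.0, 0.0)))
```

The estimate concentrates around the exact value as `n_samples` grows. Note also the symmetries the abstract refers to: scaling objective m by a factor λ_m scales the hypervolume by ∏_m λ_m, while permuting the objectives or the samples leaves it unchanged.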