DOE2VEC: REPRESENTATION LEARNING FOR EXPLORATORY LANDSCAPE ANALYSIS

Abstract

We propose DoE2Vec, a variational autoencoder (VAE)-based methodology to learn optimization landscape characteristics for downstream meta-learning tasks, e.g., automated selection of optimization algorithms. Principally, using large training data sets generated with a random function generator, DoE2Vec self-learns an informative latent representation for any design of experiments (DoE). Unlike the classical exploratory landscape analysis (ELA) method, our approach does not require any feature engineering and is easily applicable to high-dimensional search spaces. For validation, we inspect the quality of latent reconstructions and analyze the latent representations using different experiments. The latent representations not only show promising potential in identifying similar (cheap-to-evaluate) surrogate functions, but can also significantly boost performance when used complementary to the ELA features in classification tasks.

1. INTRODUCTION

Solving real-world black-box optimization problems can be extremely complicated, particularly if they are strongly nonlinear and require expensive function evaluations. As suggested by the no free lunch theorem of Wolpert & Macready (1997), there is no single best optimization algorithm capable of optimally solving all kinds of problems. The task of identifying the most time- and resource-efficient optimization algorithm for each specific problem, also known as the algorithm selection problem (ASP) (see Rice (1976)), is tedious and challenging, even with domain knowledge and experience. In recent years, landscape-aware algorithm selection has gained increasing attention from the research community, where fitness landscape characteristics are exploited to explain the effectiveness of an algorithm across different problem instances (see van Stein et al. (2013); Simoncini et al. (2018)). Beyond that, it has been shown that landscape characteristics are sufficiently informative for reliably predicting the performance of optimization algorithms, e.g., using machine learning approaches (see Bischl et al. (2012); Dréo et al. (2019); Kerschke & Trautmann (2019a); Jankovic & Doerr (2020); Jankovic et al. (2021); Pikalov & Mironovich (2021)). In other words, the expected performance of an optimization algorithm on an unseen problem can be estimated once the corresponding landscape characteristics have been identified. Interested readers are referred to Muñoz et al. (2015b;a); Kerschke et al. (2019); Kerschke & Trautmann (2019a); Malan (2021).

Exploratory landscape analysis (ELA), for instance, considers six classes of expertly designed features, including y-distribution, level set, meta-model, local search, curvature and convexity, to numerically quantify the landscape complexity of an optimization problem, such as multimodality, global structure, separability, plateaus, etc. (see Mersmann et al. (2010; 2011)). Each feature class consists of a set of features, which can be computed relatively cheaply. Beyond typical ASP tasks, ELA has shown great potential in a wide variety of applications, such as understanding the underlying landscape of neural architecture search problems in van Stein et al. (2020) and classifying the Black-Box Optimization Benchmarking (BBOB) problems in Renau et al. (2021).

Recently, ELA has been applied not only to analyze the landscape characteristics of crashworthiness optimization problems from the automotive industry, but also to identify appropriate cheap-to-evaluate functions as representatives of the expensive real-world problems (see Long et al. (2022)). While ELA is well established in capturing optimization landscape characteristics, we would like to raise our concerns regarding the following aspects:

1. Many of the ELA features are highly correlated and redundant, particularly those within the same feature class (see Škvorc et al. (2020)).
2. Some of the ELA features are insufficiently expressive in distinguishing problem instances (see Renau et al. (2019)).
3. Since ELA features are manually engineered by experts, their computation might be biased towards capturing certain landscape characteristics (see Seiler et al. (2020)).
4. ELA features are less discriminative for high-dimensional problems (see Muñoz & Smith-Miles (2017)).

Instead of improving the ELA method directly, e.g., by searching for more discriminative landscape features, we approach these problems from a different perspective. In this paper, we introduce an automated self-supervised representation learning approach that characterizes optimization landscapes by exploiting information in the latent space. Essentially, a deep variational autoencoder (VAE) model is trained to extract an informative feature vector from a design of experiments (DoE); this vector is a generic low-dimensional representation of the optimization landscape, hence the name of our approach: DoE2Vec. While the functionality of our approach is fully independent of ELA, experimental results reveal that its performance can be further improved when combined with ELA (and vice versa). To the best of our knowledge, a comparable application of VAEs to learning optimization landscape characteristics is still lacking. Section 2 briefly introduces the state-of-the-art representation learning of optimization landscapes as well as the concepts of (variational) autoencoders. This is followed by the description of our methodology in Section 3. Next, we explain and discuss our experimental results in Section 4. Lastly, conclusions and outlooks are given in Section 5.

2. REPRESENTATION OF OPTIMIZATION LANDSCAPE

In the conventional ELA approach, landscape features are computed primarily using a DoE of sample points W = {w_1, ..., w_n} evaluated on an objective function f, i.e., f : R^d → R, with w_i ∈ R^d, where n represents the sample size and d the function dimension. The objective function values f(w_i), i ∈ {1, ..., n}, are the inputs of the VAE models in DoE2Vec. In this work, we consider ELA features similar to those in Long et al. (2022), which do not require additional sampling, and compute them with the package flacco by Kerschke & Trautmann (2019b;c). These features include commonly used dimensionality reduction approaches such as Principal Component Analysis (PCA) (Abdi & Williams (2010)), a number of simple surrogate models, and many others.

To overcome the drawbacks of the ELA approach, attention has been focused on developing algorithm selection approaches that do not require landscape features. For example, Prager et al. (2022) proposed two feature-free approaches using deep learning, where optimization landscapes are represented through either 1) image-based fitness maps or 2) graph-like fitness clouds. In the first approach, convolutional neural networks were employed to project data sets onto two-dimensional fitness maps, using different dimensionality reduction techniques. In the second approach, data sets were embedded into point clouds using modified point cloud transformers, which can accurately capture the global landscape characteristics. Nonetheless, the fitness map approach suffers from the curse of dimensionality, while the fitness cloud approach is limited to a fixed training sample size. Additional relevant work can be found in Alissa et al. (2019); Seiler et al. (2020; 2022); Prager et al. (2021). Unlike these approaches, which were used directly as classifiers, the latent feature sets generated by our proposed approach can easily be combined with other features, such as ELA features, for classification tasks. In our work, we do not propose to replace conventional ELA features, but to extend them with autoencoder (AE) based latent-space features. Since the implementation of both approaches mentioned above is not available, a comparison with our work in terms of classifying high-level properties is only feasible by directly comparing their results on an identical experimental setup. Accordingly, results from the downstream tasks in this work can partially be compared to the results reported in Seiler et al. (2022), including the standard Principal Component Analysis (PCA), reduced Multiple Channel (rMC) and a transformer-based approach (Transf.), taking into account that additional hyperparameter tuning was involved in their classification experiments with ELA features.
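To make the input representation concrete, the vector of objective values f(w_i) computed on a shared DoE can be assembled as follows. This is an illustrative sketch, not the authors' implementation: the function names, the uniform random DoE, and the min-max rescaling of the objective values are our assumptions for the example.

```python
import numpy as np

def doe_input_vector(f, doe):
    """Evaluate f on a fixed DoE and rescale the values to [0, 1].

    The rescaled vector of function values is the kind of fixed-length
    sample a DoE2Vec-style VAE would consume (illustrative sketch).
    """
    y = np.apply_along_axis(f, 1, doe)          # f(w_i) for every DoE point
    y_min, y_max = y.min(), y.max()
    return (y - y_min) / (y_max - y_min + 1e-12)  # scale-invariant input

rng = np.random.default_rng(42)
d, n = 2, 64                        # function dimension and sample size
doe = rng.random((n, d))            # one shared DoE in [0, 1]^d

sphere = lambda w: np.sum((w - 0.5) ** 2)   # toy objective function
x = doe_input_vector(sphere, doe)
print(x.shape)                       # (64,)
```

Because the same DoE is reused for every function, the resulting vectors are directly comparable across functions, which is what allows a single model to learn landscape structure from them.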


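For readers unfamiliar with the VAE machinery referenced above, the forward pass that maps a DoE's function-value vector to a latent feature vector can be sketched as follows. This is a minimal, untrained NumPy sketch; the layer sizes, activations, and linear decoder are illustrative assumptions and do not reflect the trained deep VAE used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

def init(shape):
    # Small random weights; a real model would learn these by training.
    return rng.normal(scale=0.1, size=shape)

class TinyVAE:
    """Minimal VAE forward pass mapping an n-dimensional vector of DoE
    function values to a k-dimensional latent feature vector."""

    def __init__(self, n_in, n_hidden, n_latent):
        self.We, self.be = init((n_in, n_hidden)), np.zeros(n_hidden)
        self.Wmu, self.bmu = init((n_hidden, n_latent)), np.zeros(n_latent)
        self.Wlv, self.blv = init((n_hidden, n_latent)), np.zeros(n_latent)
        self.Wd, self.bd = init((n_latent, n_in)), np.zeros(n_in)

    def encode(self, x):
        h = np.tanh(x @ self.We + self.be)
        mu = h @ self.Wmu + self.bmu          # latent mean
        logvar = h @ self.Wlv + self.blv      # latent log-variance
        return mu, logvar

    def forward(self, x):
        mu, logvar = self.encode(x)
        eps = rng.standard_normal(mu.shape)
        z = mu + np.exp(0.5 * logvar) * eps   # reparameterization trick
        x_hat = z @ self.Wd + self.bd         # linear decoder (sketch)
        return z, x_hat

vae = TinyVAE(n_in=64, n_hidden=32, n_latent=8)
x = rng.random(64)               # stand-in for rescaled DoE function values
z, x_hat = vae.forward(x)
print(z.shape, x_hat.shape)      # (8,) (64,)
```

After training with the usual reconstruction-plus-KL objective, the latent vector z (or mu) would serve as the landscape feature vector that can be concatenated with ELA features for downstream classification.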