BOPTFORMER: BEYOND TRANSFORMER FOR BLACK-BOX OPTIMIZATION

Abstract

We design a novel Transformer for continuous unconstrained black-box optimization, called BOptformer. Inspired by the similarity between Vision Transformer and evolutionary algorithms (EAs), we modify Transformer's multi-head self-attention layer, feed-forward network, and residual connection to implement the functions of the crossover, mutation, and selection operators. Moreover, we devise an iterated mode that, like EAs, generates promising solutions and lets the fittest survive. BOptformer learns optimization strategies from the target task automatically, without human intervention, which addresses the poor generalization of human-designed EAs when given a new task. Compared with baselines such as EAs, Bayesian optimization, and a learning-to-optimize (L2O) method, BOptformer achieves the best performance on six black-box functions and two real-world applications. We also find that an untrained BOptformer can perform well on simple tasks, and that deeper BOptformer models outperform shallow ones. We thus provide a new and efficient Transformer-based black-box optimization framework for the L2O and EA communities.

1. INTRODUCTION

Many tasks, such as neural architecture search (Elsken et al., 2019) and hyperparameter optimization (Hutter et al., 2019; Golovin et al., 2017), can be abstracted as black-box optimization problems: although we can evaluate f(x) for any x ∈ X, we have no access to any other information about f, such as gradients or the Hessian. A series of hand-designed algorithms, such as evolutionary algorithms (EAs) (Mitchell, 1998; Khadka & Tumer, 2018; Zhang & Li, 2007), Bayesian optimization (Snoek et al., 2012; Mutny & Krause, 2018; Li et al., 2017; Kandasamy et al., 2015; Balandat et al., 2020), and evolution strategies (ES) (Wierstra et al., 2014; Hansen & Ostermeier, 2001; Auger & Hansen, 2005; Salimans et al., 2017), have been designed to solve such problems. Recently, the learning-to-optimize (L2O) framework (Chen et al., 2022) has given a new perspective on optimization by using a recurrent neural network (RNN), a long short-term memory network (LSTM) (Chen et al., 2020; Andrychowicz et al., 2016; Chen et al., 2017; Li & Malik, 2016; Wichrowska et al., 2017; Bello et al., 2017), or a multilayer perceptron (MLP) (Metz et al., 2019) as the optimizer, aiming to reduce the laborious iterations of hand engineering (Sun et al., 2018; Vicol et al., 2021; Flennerhag et al., 2021; Li & Malik, 2016; Sun et al., 2018). Most of these methods, however, do not focus on black-box optimization. The core of L2O is constructing a strong mapping from the initial solutions to the optimal solution. Although several efforts (Cao et al., 2019; Chen et al., 2017) have addressed black-box problems, their effectiveness may be hindered by the limited representational capabilities of RNNs, LSTMs, and MLPs. In EAs, hand-designed crossover, mutation, and selection operators drive the initial population toward the optimal solution; this update scheme has stood the test of time.
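The black-box setting and the crossover/mutation/selection loop described above can be sketched as follows. This is a minimal, generic EA for illustration only, not any specific algorithm from the literature; the sphere function stands in for an arbitrary black box, and the blend crossover, Gaussian mutation, and decaying step size are illustrative choices.

```python
# Minimal sketch of an evolutionary algorithm for black-box minimization:
# only f(x) evaluations are used, never gradients or Hessians.
import numpy as np

def sphere(x):                      # toy stand-in for a black-box objective
    return np.sum(x ** 2, axis=-1)

def evolve(f, dim=5, pop_size=20, generations=100, sigma=0.3, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5.0, 5.0, size=(pop_size, dim))   # initial population
    for _ in range(generations):
        # crossover: blend random pairs of parents
        parents = pop[rng.integers(pop_size, size=(pop_size, 2))]
        alpha = rng.uniform(size=(pop_size, 1))
        children = alpha * parents[:, 0] + (1 - alpha) * parents[:, 1]
        # mutation: additive Gaussian noise
        children += sigma * rng.standard_normal(children.shape)
        # selection: per slot, keep the fitter of parent vs. child
        keep = f(children) < f(pop)
        pop = np.where(keep[:, None], children, pop)
        sigma *= 0.97                                    # decay mutation step
    return pop[np.argmin(f(pop))]

best = evolve(sphere)
print("best fitness:", sphere(best))
```

Because selection only accepts improvements, the best fitness in the population is non-increasing over generations; designing good crossover and mutation operators for a given landscape is exactly the expert effort this paper seeks to automate.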
Because the evolutionary operators must be tailored to maximize performance on the target task, human-designed EAs generalize poorly to new black-box problems. Most notably, limited expert knowledge restricts how much target-function information enters EA design, which makes it difficult to adapt to the target task. Learning optimization strategies from the target task itself is the key to overcoming this limitation. This paper designs a novel L2O framework, termed BOptformer, that combines the advantages of Vision Transformer (Dosovitskiy et al., 2021) and EAs to overcome the above limitations. Transformer (Han et al., 2022) has strong representational ability, yet no prior work applies it to optimization. Inspired by the similarity between EAs and Transformer (Zhang et al., 2021; 2022), BOptformer revises the critical parts of Transformer to realize the mapping from a random population to the optimal one. To generate promising individuals that approach the optimal solution, we first design a self-attention (SA)-based crossover module (SAC) to simulate the crossover operator of EAs; the output of this module is then fed into the proposed feed-forward network (FFN)-based mutation module (FM) to perform mutation. Moreover, the residual and selection module (RSSM) is designed to let the fittest individuals survive: RSSM performs pairwise fitness comparisons between the outputs of SAC and FM and the input population. We design a BOptformer block (OB) consisting of SAC, FM, and RSSM, and construct BOptformer by stacking OBs to simulate the generations of EAs. Finally, to cope with black-box optimization, we establish a function set to train BOptformer in an unsupervised mode: we construct a set of differentiable functions with properties similar to the targeted black-box problems, and the training set contains pairs of an initial population and a designed function.
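The block structure just described (SAC, then FM, then RSSM) can be sketched as below. This is a minimal numpy illustration under explicit assumptions, not the paper's exact parameterization: single-head attention, the weight shapes, the ReLU nonlinearity, and the particular residual placement are all assumed for clarity.

```python
# Sketch of one BOptformer block (OB): self-attention-based crossover (SAC),
# FFN-based mutation (FM), and a residual/selection module (RSSM) that keeps
# the fitter of the input individual vs. its generated offspring.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class OBBlock:
    def __init__(self, dim, hidden, rng):
        s = 1.0 / np.sqrt(dim)
        self.Wq = rng.standard_normal((dim, dim)) * s
        self.Wk = rng.standard_normal((dim, dim)) * s
        self.Wv = rng.standard_normal((dim, dim)) * s
        self.W1 = rng.standard_normal((dim, hidden)) * s
        self.W2 = rng.standard_normal((hidden, dim)) / np.sqrt(hidden)

    def sac(self, pop):
        # SAC: attention mixes individuals across the population (crossover)
        q, k, v = pop @ self.Wq, pop @ self.Wk, pop @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(pop.shape[1]))
        return attn @ v

    def fm(self, pop):
        # FM: a position-wise FFN perturbs each individual (mutation)
        return np.maximum(pop @ self.W1, 0.0) @ self.W2

    def forward(self, pop, f):
        # RSSM: residual path plus pairwise fitness-based survival
        offspring = pop + self.fm(self.sac(pop))
        keep = f(offspring) < f(pop)
        return np.where(keep[:, None], offspring, pop)

rng = np.random.default_rng(0)
f = lambda x: np.sum(x ** 2, axis=-1)      # toy fitness for illustration
pop = rng.uniform(-3.0, 3.0, size=(16, 4))
block = OBBlock(dim=4, hidden=8, rng=rng)
new_pop = block.forward(pop, f)
```

Stacking several such blocks mimics successive EA generations; by construction, the selection step guarantees each individual's fitness never worsens through a block.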
Because the surrogate functions are differentiable, we can use gradient-based methods to train BOptformer. We tested BOptformer on six standard functions, the protein docking problem (Cao & Shen, 2020), and the planar mechanical arm problem (Wang et al., 2021). The experimental results demonstrate that BOptformer ranks first and exhibits strong representational power compared with three population-based baselines, Bayesian optimization, and one learning-to-optimize method (Cao et al., 2019). We also analyze the effects of the learning rate, network depth, and weight sharing between OBs. The highlights of this paper are summarized as follows: 1) We contribute a solid Transformer-based L2O framework for black-box problems to the L2O community, and demonstrate its benefit over standard black-box optimization methods, particularly over the existing L2O-based method. 2) BOptformer efficiently uses the target black-box function's information to develop its optimization strategy; compared with human-designed EAs, it fits the target task substantially better.
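The unsupervised training setup described above, pairs of a random initial population and a differentiable surrogate function, can be sketched as follows. The function family (shifted quadratics) and the training loss (mean fitness of the output population) are assumptions for illustration; the paper's actual function set is only required to share properties with the target black-box problems.

```python
# Sketch of building the unsupervised training set: each sample pairs a
# random initial population with a randomly drawn differentiable function.
import numpy as np

def make_training_pair(dim, pop_size, rng):
    shift = rng.uniform(-2.0, 2.0, size=dim)          # random optimum location
    f = lambda x: np.sum((x - shift) ** 2, axis=-1)   # differentiable surrogate
    pop = rng.uniform(-5.0, 5.0, size=(pop_size, dim))
    return pop, f

rng = np.random.default_rng(1)
dataset = [make_training_pair(dim=4, pop_size=16, rng=rng) for _ in range(8)]

# During training, each population would pass through the stacked OB blocks,
# and the mean f-value of the final population would be minimized end-to-end.
pop0, f0 = dataset[0]
loss = f0(pop0).mean()   # the quantity a gradient-based trainer would reduce
```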

2. RELATED WORK

Transformer The Transformer architecture has achieved significant progress in machine translation (Vaswani et al., 2017), computer vision (Dosovitskiy et al., 2021), time series modeling (Zhou et al., 2021), and so on. Many improved models have been proposed with great success (Han et al., 2022). Transformer-based efforts for optimization problems, a topic crucial to the machine learning community, remain scarce. One work (Vaswani et al., 2017) proposed a meta-learning hyperparameter optimization framework with Transformers to learn both policy and function priors from data across different search spaces. The BOptformer proposed in this paper further expands the application scope of Transformer and can effectively handle general black-box optimization. The basic modules of Transformer are shown in Appendix A.1.

Evolutionary Algorithm Inspired by the evolution of species, EAs have delivered surprising performance on black-box optimization (Mitchell, 1998). The basic modules of EAs are shown in Appendix A.2. Many influential variants have been proposed for different problems (Das & Suganthan, 2010; Wu & Liu, 2019), but at their core they address two questions: 1) recombination and mutation: how to produce excellent solutions; 2) selection: how to choose the best individuals among parents and offspring. Accordingly, many algorithmic components have been designed for different tasks. Algorithm performance varies across tasks, since diverse landscapes may require different optimization strategies. Current methods manually tune genetic operators' hyperparameters and design combinations of operators (Kerschke et al., 2019; Tian et al., 2020) to map a random population to the optimal solution. Given a new black-box optimization task, an expert must design or choose the evolutionary operators to maximize performance on the target task, which negatively impacts generalization ability.
Moreover, limited expert knowledge restricts how much target-function information enters EA design, making it difficult to adapt to the target task. The proposed BOptformer replaces the manually designed crossover, mutation, and selection operators with a Transformer framework, so the genetic operators are designed automatically by the learned Transformer rather than by a human expert. BOptformer efficiently

