ADADGS: AN ADAPTIVE BLACK-BOX OPTIMIZATION METHOD WITH A NONLOCAL DIRECTIONAL GAUSSIAN SMOOTHING GRADIENT

Abstract

The local gradient points to the direction of the steepest slope in an infinitesimal neighborhood. An optimizer guided by the local gradient is often trapped in local optima when the loss landscape is multi-modal. A directional Gaussian smoothing (DGS) approach was recently proposed in (Zhang et al., 2020) and used to define a truly nonlocal gradient, referred to as the DGS gradient, for high-dimensional black-box optimization. Promising results show that replacing the traditional local gradient with the DGS gradient can significantly improve the performance of gradient-based methods in optimizing highly multi-modal loss functions. However, the optimal performance of the DGS gradient may rely on fine tuning of two important hyper-parameters, i.e., the smoothing radius and the learning rate. In this paper, we present a simple yet efficient adaptive approach for optimization with the DGS gradient, which removes the need for hyper-parameter fine tuning. Since the DGS gradient generally points to a good search direction, we perform a line search along the DGS direction to determine the step size at each iteration. The learned step size in turn informs us of the scale of the function landscape in the surrounding area, based on which we adjust the smoothing radius for the next iteration. We present experimental results on high-dimensional benchmark functions, an airfoil design problem, and a game content generation problem. The AdaDGS method has shown superior performance over several state-of-the-art black-box optimization methods.
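The adaptive loop described above can be sketched in a few lines. The sketch below is illustrative only: it substitutes a standard Monte Carlo Gaussian-smoothing gradient for the DGS gradient (whose full construction with Gauss-Hermite quadrature appears later in the paper), uses a simple grid line search, and ties the next smoothing radius to the learned step size; the sample count, step grid, and radius floor are assumptions, not values from the paper.

```python
import numpy as np

def gs_gradient(F, x, sigma, n_samples=200):
    """Monte Carlo estimate of the Gaussian-smoothed gradient at x.

    This is the standard GS estimator E[F(x + sigma*u) u] / sigma,
    used here as a stand-in for the DGS gradient of the paper.
    """
    d = x.size
    u = np.random.randn(n_samples, d)
    fx = np.array([F(x + sigma * ui) for ui in u])
    return (fx[:, None] * u).mean(axis=0) / sigma

def adadgs_step(F, x, sigma, step_grid=np.logspace(-3, 1, 20)):
    """One iteration of the adaptive scheme sketched in the abstract.

    1) Estimate a nonlocal search direction from the smoothed gradient.
    2) Choose the step size by a line search along that direction.
    3) Reuse the learned step size as a proxy for the local landscape
       scale, and set the next smoothing radius from it (the exact
       update rule here is an illustrative assumption).
    """
    g = gs_gradient(F, x, sigma)
    direction = -g / (np.linalg.norm(g) + 1e-12)
    # grid line search along the descent direction
    losses = [F(x + t * direction) for t in step_grid]
    t_best = step_grid[int(np.argmin(losses))]
    # the learned step size informs the next smoothing radius
    sigma_next = max(t_best, 1e-3)
    return x + t_best * direction, sigma_next
```

Because the step size is re-learned at every iteration and the radius follows it, neither a learning rate nor a smoothing radius needs to be hand-tuned, which is the point of the adaptive scheme.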

1. INTRODUCTION

We consider the problem of black-box optimization, where we search for the optima of a loss function F : R d → R given access to only its function queries. This type of optimization finds applications in many machine learning areas where the loss function's gradient is inaccessible or unhelpful, for example, in optimizing neural network architecture (Real et al., 2017), reinforcement learning (Salimans et al., 2017), design of adversarial attacks (Chen et al., 2017), and searching the latent space of a generative model (Sinay et al., 2020). The local gradient, i.e., ∇F (x), is the most commonly used quantity to guide optimization. When ∇F (x) is inaccessible, we usually reformulate ∇F (x) as a functional of F (x). One class of methods for this reformulation is Gaussian smoothing (GS) (Salimans et al., 2017; Liu et al., 2017; Mania et al., 2018). GS first smooths the loss landscape with a d-dimensional Gaussian convolution and represents ∇F (x) by the gradient of the smoothed function, where Monte Carlo (MC) sampling is used to estimate the Gaussian convolution. It is known that the local gradient ∇F (x) points to the direction of the steepest slope in an infinitesimal neighborhood around the current state x. An optimizer guided by the local gradient is often trapped in local optima when the loss landscape is non-convex or multi-modal. Despite subsequent improvements (Maggiar et al., 2018; Choromanski et al., 2018; 2019; Sener & Koltun, 2020; Maheswaranathan et al., 2019; Meier et al., 2019), GS did not address the challenge of applying the local gradient to global optimization, especially in high-dimensional spaces. The nonlocal Directional Gaussian Smoothing (DGS) gradient, originally developed in (Zhang et al., 2020), shows strong potential to alleviate this challenge. The key idea of the DGS gradient is to conduct 1D nonlocal explorations along d orthogonal directions in R d , each of which defines a non-

