COVARIANCE MATRIX ADAPTATION MAP-ANNEALING

Abstract

Single-objective optimization algorithms search for the single highest-quality solution with respect to an objective. Quality diversity (QD) algorithms, such as Covariance Matrix Adaptation MAP-Elites (CMA-ME), search for a collection of solutions that are both high-quality with respect to an objective and diverse with respect to specified measure functions. However, CMA-ME suffers from three major limitations highlighted by the QD community: prematurely abandoning the objective in favor of exploration, struggling to explore flat objectives, and having poor performance for low-resolution archives. We propose a new quality diversity algorithm, Covariance Matrix Adaptation MAP-Annealing (CMA-MAE), that addresses all three limitations. We provide theoretical justifications for the new algorithm with respect to each limitation. Our theory informs our experiments, which support the theory and show that CMA-MAE achieves state-of-the-art performance.

1. INTRODUCTION

Consider an example problem of searching for celebrity faces in the latent space of a generative model. As a single-objective optimization problem, we specify an objective f that targets a celebrity such as Tom Cruise. A single-objective optimizer, such as CMA-ES (Hansen, 2016), will converge to a single solution of high objective value, an image that looks like Tom Cruise as much as possible. However, this objective has ambiguity. How old was Tom Cruise in the photo? Did we want the person in the image to have short or long hair? By instead framing the problem as a quality diversity optimization problem, we additionally specify a measure function m_1 that quantifies age and a measure function m_2 that quantifies hair length. A quality diversity algorithm (Pugh et al., 2015; Chatzilygeroudis et al., 2021), such as CMA-ME (Fontaine et al., 2020), can then optimize for a collection of images that are diverse with respect to age and hair length, but all look like Tom Cruise.

While previous work (Fontaine et al., 2020; 2021a; b; Earle et al., 2021) has shown that CMA-ME solves such QD problems efficiently, three important limitations of the algorithm have been discovered. First, on difficult-to-optimize objectives, variants of CMA-ME will abandon the objective too soon (Tjanaka et al., 2022), and instead favor exploring the measure space, the vector space defined by the measure function outputs. Second, the CMA-ME algorithm struggles to explore flat objective functions (Paolo et al., 2021). Third, CMA-ME works well on high-resolution archives, but struggles to explore low-resolution archives (Cully, 2021; Fontaine & Nikolaidis, 2021a). We note that the chosen archive resolution affects the performance of all current QD algorithms.

We propose a new algorithm, CMA-MAE, that addresses these three limitations. To address the first limitation, we derive an algorithm that smoothly blends between CMA-ES and CMA-ME. First, consider how CMA-ES and CMA-ME differ.
At each step, CMA-ES's objective ranking maximizes the objective function f by approximating the natural gradient of f at the current solution point (Akimoto et al., 2010). In contrast, CMA-ME's improvement ranking moves in the direction of the natural gradient of f - f_A at the current solution point, where f_A is a discount function equal to the objective of the best solution found so far with the same measure values as the current solution point. The function f - f_A quantifies the gap between a candidate solution and the best solution so far at the candidate solution's position in measure space.

Our key insight is to anneal the function f_A by a learning rate α. We observe that when α = 0, the discount function f_A never increases and our algorithm behaves like CMA-ES. When α = 1, the discount function always maintains the best solution for each region in measure space and our algorithm behaves like CMA-ME. For 0 < α < 1, CMA-MAE smoothly blends between the two algorithms' behavior, allowing the algorithm to spend more time optimizing f before transitioning to exploration. Figure 1 is an illustrative example of varying the learning rate α.

Our proposed annealing method naturally addresses the flat objective limitation. Observe that both CMA-ES and CMA-ME struggle on a flat objective f, as the natural gradient becomes 0 in this case and each algorithm will restart. However, we show that, when CMA-MAE optimizes f - f_A for 0 < α < 1, the algorithm becomes a descent method on the density histogram defined by the archive.

Finally, CMA-ME's poor performance on low-resolution archives is likely caused by the nonstationary objective f - f_A changing too quickly for the adaptation mechanism to keep up. Our archive learning rate α controls how quickly f - f_A changes.
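The annealing of f_A admits a compact sketch. The following minimal Python helper is our own illustration, not the paper's reference implementation: the names (`update_archive`, `min_f`) and the dictionary-based archive are assumptions. When a candidate's objective exceeds its cell's threshold f_A, the elite is replaced and the threshold moves a fraction α of the way toward the candidate's objective, so α = 0 keeps f_A fixed (CMA-ES-like ranking on f) while α = 1 jumps f_A straight to the elite's objective (CMA-ME-like improvement ranking).

```python
def update_archive(archive, thresholds, solution, f_val, cell, alpha, min_f=0.0):
    """Candidate insertion with threshold annealing (illustrative sketch).

    archive: dict mapping cell index -> current elite solution.
    thresholds: dict mapping cell index -> acceptance threshold f_A.
    alpha: archive learning rate in [0, 1].
    min_f: initial threshold for empty cells (assumed finite lower bound on f).
    Returns the improvement value f - f_A used for ranking candidates.
    """
    t = thresholds.get(cell, min_f)
    improvement = f_val - t  # the quantity f - f_A that CMA-MAE ranks by
    if f_val > t:
        archive[cell] = solution
        # Anneal the threshold toward f: f_A <- (1 - alpha) * f_A + alpha * f.
        thresholds[cell] = (1 - alpha) * t + alpha * f_val
    return improvement
```

Note that under this rule the threshold of a cell increases monotonically but, for α < 1, stays strictly below the best objective seen in that cell, which is what lets repeated visits to the same region keep yielding positive improvement.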
We derive a conversion formula that yields equivalent values of α for different archive resolutions. This conversion formula guarantees that CMA-MAE is the first QD algorithm invariant to archive resolution.

Overall, our work shows how a simple algorithmic change to CMA-ME addresses all three major limitations affecting CMA-ME's performance and robustness. Our theoretical findings justify the aforementioned properties and inform our experiments, which show that CMA-MAE outperforms state-of-the-art QD algorithms and maintains robust performance across different archive resolutions.

2. PROBLEM DEFINITION

Quality Diversity. We adopt the quality diversity (QD) problem definition from Fontaine & Nikolaidis (2021a). A QD problem consists of an objective f : R^n → R that maps n-dimensional solution parameters to a scalar value denoting the quality of the solution, and k measures m_i : R^n → R or, as a vector function, m : R^n → R^k, that quantify behavior or attributes of each solution.¹ The range of m forms a measure space S = m(R^n). The QD objective is to find, for each s in S, a solution θ ∈ R^n such that m(θ) = s and f(θ) is maximized. The measure space S is continuous, but solving algorithms need to produce a finite collection of solutions. Therefore, QD algorithms in the MAP-Elites (Mouret & Clune, 2015; Cully et al., 2015) family relax the QD objective by discretizing the space S. Given a tessellation T of S into M cells, the QD objective becomes to find a solution θ_i for each of the i ∈ {1, . . . , M} cells, such that each θ_i maps to the cell corresponding to m(θ_i) in the tessellation T. The QD objective is then to maximize the sum of objective values over all cells: max Σ_{i=1}^{M} f(θ_i). The differentiable quality diversity (DQD) problem (Fontaine & Nikolaidis, 2021a) is a special case of the QD problem where both the objective f and measures m_i are first-order differentiable.
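The relaxed QD objective above can be sketched concretely. This is a minimal illustration under our own assumptions: a uniform grid tessellation with identical scalar bounds for every measure dimension, and hypothetical helper names (`cell_index`, `qd_score`) that do not appear in the paper. Real MAP-Elites variants may use other tessellations, such as centroidal Voronoi tessellations.

```python
import numpy as np

def cell_index(measures, lower, upper, resolution):
    """Map a point m(theta) in measure space S to a cell of the grid tessellation T.

    Each of the k measure dimensions is split into `resolution` uniform bins
    over [lower, upper]; points outside the bounds are clipped to border cells.
    """
    frac = (np.asarray(measures) - lower) / (upper - lower)
    idx = np.clip((frac * resolution).astype(int), 0, resolution - 1)
    return tuple(idx)

def qd_score(archive, f):
    """The relaxed QD objective: the sum of f over the elites of occupied cells."""
    return sum(f(theta) for theta in archive.values())
```

With k measure dimensions, this grid has M = resolution^k cells, which is why the choice of resolution affects every MAP-Elites-style algorithm.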



¹ In agent-based settings, such as reinforcement learning, the measure functions are sometimes called behavior functions and the outputs of each measure function are called behavioral characteristics or behavior descriptors.



Figure 1: An example of how different α values affect the function f - f_A optimized by CMA-MAE after a fixed number of iterations. Here f is a bimodal objective where mode X is harder to optimize than mode Y, requiring more optimization steps, and modes X and Y are separated by measure m_1. For α = 0, the objective f is equivalent to f - f_A, as f_A remains constant. For larger values of α, CMA-MAE discounts region Y in favor of prioritizing the optimization of region X.

