AUGMENTATIVE TOPOLOGY AGENTS FOR OPEN-ENDED LEARNING

Abstract

In this work, we tackle the problem of open-ended learning by introducing a method that simultaneously evolves agents and increasingly challenging environments. Unlike previous open-ended approaches that optimize agents using a fixed neural network topology, we hypothesize that generalization can be improved by allowing agents' controllers to become more complex as they encounter more difficult environments. Our method, Augmentative Topology EPOET (ATEP), extends the Enhanced Paired Open-Ended Trailblazer (EPOET) algorithm by allowing agents to evolve their own neural network structures over time, adding complexity and capacity as necessary. Empirical results demonstrate that ATEP produces general agents capable of solving more environments than a fixed-topology baseline. We also investigate mechanisms for transferring agents between environments and find that a species-based approach further improves the performance and generalization of agents.

1. INTRODUCTION

Machine learning has successfully been used to solve numerous problems, such as classifying images (Krizhevsky et al., 2012), writing news articles (Radford et al., 2019; Schick & Schütze, 2021), or playing games such as Atari (Mnih et al., 2015) and chess (Silver et al., 2018). While impressive, these approaches still largely follow a traditional paradigm in which a human specifies a task that is subsequently solved by the agent. In most cases, this is the end of the agent's learning: once it can solve the required task, no further progression takes place. The field of open-ended learning emerged from the observation that humans have always learned and innovated in an open-ended manner (Stanley et al., 2017). For instance, humans did not invent microwave ovens in order to heat food, but while studying radar; likewise, vacuum tubes and electricity were invented for very different reasons, yet we stumbled upon computers through them (Stanley, 2019). From the perspective of an agent, open-ended learning aims not to converge to a specific goal, but rather to obtain an ever-growing set of diverse and interesting behaviors (Stanley et al., 2017). One approach is to allow both the agents and the environments to change, evolve and improve over time (Brant & Stanley, 2017; Wang et al., 2019). This has the potential to discover a large collection of useful and reusable skills (Quessy & Richardson, 2021), as well as interesting and novel environments (Gisslén et al., 2021). Open-ended learning is also a more promising route to truly general agents than the traditional single-task paradigm (Team et al., 2021). The concept of open-ended evolution has been part of artificial life (ALife) research for decades, spawning numerous artificial worlds (Ray, 1991; Ofria & Wilke, 2004; Spector et al., 2007; Yaeger & Sporns, 2006; Soros & Stanley, 2014).
These worlds consist of agents with various goals, such as survival, predation, or reproduction. Recently, open-ended algorithms have received renewed interest (Stanley, 2019), with Stanley et al. (2017) proposing the paradigm as a path towards human-level artificial intelligence. A major breakthrough in open-ended evolution was NeuroEvolution of Augmenting Topologies (NEAT) (Stanley & Miikkulainen, 2002), which was capable of efficiently solving complex reinforcement learning tasks. Its key idea was to allow the structure of the network to evolve alongside the weights, starting with a simple network and adding complexity as the need arises. This inspired subsequent research on evolving networks open-endedly and indefinitely (Soros & Stanley, 2014). In particular, novelty search (Lehman et al., 2008) used novelty, rather than a traditional objective, to drive evolution. This in turn led to the emergence of quality diversity (QD) algorithms (Lehman & Stanley, 2011; Mouret & Clune, 2015; Ecoffet et al., 2019; Nilsson & Cully, 2021), which combine novelty with an objective sense of progress, with the goal of obtaining a collection of diverse and high-performing individuals. While QD has successfully been used in numerous domains, such as robotic locomotion (Cully et al., 2015; Mouret & Clune, 2015; Tarapore et al., 2016), video game playing (Ecoffet et al., 2019) and procedural content generation (Khalifa et al., 2018; Earle et al., 2022), it is still not completely open-ended, in the sense of running indefinitely while continually creating novel artifacts. One reason is that the search space of phenotypical behavior characteristics (or behavioral descriptors) remains fixed (Mouret & Clune, 2015). A second is that in many cases the environment also remains fixed, which limits the open-endedness of the algorithm (Wang et al., 2019).
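NEAT's complexification mechanism can be illustrated with a minimal sketch: a genome starts as a fully connected input-to-output network and grows via two structural mutations, adding a connection or splitting an existing connection with a new node. The class and method names below are illustrative only (they are not the API of any NEAT library), and the sketch omits innovation numbers, crossover, and speciation, which a full NEAT implementation requires.

```python
import random

class Genome:
    """Minimal NEAT-style genome: node ids plus weighted connections.

    Hypothetical sketch of structural mutation only; a real NEAT genome
    also tracks innovation numbers for crossover and speciation.
    """

    def __init__(self, num_inputs, num_outputs):
        self.nodes = list(range(num_inputs + num_outputs))
        self.next_node = num_inputs + num_outputs
        # Start minimal: fully connect inputs to outputs, as NEAT does.
        self.connections = {
            (i, num_inputs + o): random.uniform(-1.0, 1.0)
            for i in range(num_inputs)
            for o in range(num_outputs)
        }

    def mutate_add_connection(self):
        # Link a randomly chosen pair of nodes that is not yet connected.
        candidates = [
            (a, b) for a in self.nodes for b in self.nodes
            if a != b and (a, b) not in self.connections
        ]
        if candidates:
            self.connections[random.choice(candidates)] = random.uniform(-1.0, 1.0)

    def mutate_add_node(self):
        # Split an existing connection: A->B becomes A->new->B.
        (a, b), weight = random.choice(list(self.connections.items()))
        del self.connections[(a, b)]
        new = self.next_node
        self.next_node += 1
        self.nodes.append(new)
        self.connections[(a, new)] = 1.0      # identity-like, preserves behavior
        self.connections[(new, b)] = weight   # carry over the old weight

random.seed(0)
g = Genome(num_inputs=2, num_outputs=1)
g.mutate_add_node()        # 3 nodes -> 4; 2 connections -> 3
g.mutate_add_connection()  # 3 connections -> 4
print(len(g.nodes), len(g.connections))  # -> 4 4
```

The `mutate_add_node` convention of assigning weight 1.0 to the incoming edge and the old weight to the outgoing edge follows NEAT's design of minimizing the behavioral disruption caused by a structural mutation.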
A way to circumvent this is to co-evolve problems and solutions, as is done by Minimal Criterion Coevolution (MCC) (Brant & Stanley, 2017). This coevolutionary pressure allowed more complex mazes to develop and better agents to emerge to solve them, giving rise to an open-ended process. However, MCC has some limitations; for instance, it only admits new problems that are already solvable by individuals in the current population. This leads to only slight increases in difficulty, and complexity arises only by chance. Taking this into account, Paired Open-Ended Trailblazer (POET) (Wang et al., 2019) builds upon MCC but additionally allows currently unsolvable environments to exist, provided it is likely that some individuals could quickly learn to solve them. POET further innovates by transferring agents between different environments, increasing the likelihood of solving hard problems. While POET obtained state-of-the-art results, its rate of innovation slows as it runs for longer. Enhanced POET (Wang et al., 2020) adds improved algorithmic components to the base POET method, resulting in superior performance and less stagnation. Enhanced POET, however, uses agents with fixed-topology neural network controllers. While this approach works well for simple environments, it places an eventual limit on the complexity of tasks that can be solved: beyond some level of complexity, fixed-topology agents may not have sufficient capacity to solve the environments. To address this issue, we propose Augmentative Topology Enhanced POET (ATEP), which uses NEAT to evolve agents with variable, and potentially unbounded, network topologies. We argue that fixed-topology agents will cease to solve environments after a certain level of complexity, and we empirically show that ATEP outperforms Enhanced POET (EPOET) in a standard benchmark domain. Finally, we find that using NEAT results in improved exploration and better generalization compared to Enhanced POET.
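The POET-style loop of environment mutation, a minimal criterion, local optimization, and cross-environment transfer can be sketched in toy form. In the sketch below, an environment is reduced to a scalar difficulty, an agent to a scalar skill, and the score to a simple function of their gap; all names, constants, and the scoring rule are illustrative assumptions, not POET's actual implementation.

```python
import random

def poet_sketch(steps, mc_low=0.1, mc_high=0.9):
    """Toy sketch of a POET-style loop on scalar 'environments'.

    Environments and agents are numbers in [0, 1]; the score measures how
    well an agent's skill covers an environment's difficulty. Everything
    here is a hypothetical simplification for illustration.
    """
    def score(agent, env):
        # Higher skill relative to difficulty -> higher score, clipped to [0, 1].
        return max(0.0, min(1.0, 1.0 - (env - agent)))

    pairs = [(0.0, 0.0)]  # start with a trivial environment paired with an agent
    for _ in range(steps):
        # 1. Mutate environments; keep only children passing the minimal
        #    criterion: neither trivially easy nor currently unsolvable.
        children = []
        for env, agent in pairs:
            child = min(1.0, env + random.uniform(0.0, 0.2))
            if mc_low <= score(agent, child) <= mc_high:
                children.append((child, agent))
        pairs.extend(children)
        # 2. Local optimization: each paired agent improves on its own environment.
        pairs = [(env, min(1.0, agent + 0.05)) for env, agent in pairs]
        # 3. Transfer: adopt another pair's agent if it scores better here.
        for i, (env, agent) in enumerate(pairs):
            best = max(pairs, key=lambda p: score(p[1], env))[1]
            if score(best, env) > score(agent, env):
                pairs[i] = (env, best)
        # Keep the population bounded, preferring harder environments.
        pairs = sorted(pairs, key=lambda p: p[0])[-10:]
    return pairs

random.seed(1)
final_pairs = poet_sketch(50)
print(max(env for env, _ in final_pairs))  # difficulty grows over time
```

Even in this toy form, the interplay the paper describes is visible: the minimal criterion admits environments slightly beyond current agents, and transfer lets progress made in one environment unlock others.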

2. RELATED WORK

POET (Wang et al., 2019) and EPOET (Wang et al., 2020) are foundational algorithms in the field of open-ended learning, building upon prior approaches such as MCC (Brant & Stanley, 2017). They have led to an explosion of new use cases, such as PINSKY (Dharna et al., 2020; 2022), which extends POET to generate 2D Atari video game levels alongside agents that solve those levels. Quessy & Richardson (2021) use unsupervised skill discovery (Campos et al., 2020; Eysenbach et al., 2019; Sharma et al., 2020) in the context of POET to discover a large repertoire of useful skills. Meier & Mujika (2022) also investigate unsupervised skill discovery, through reward functions learned by neural networks. Other uses of POET include the work of Zhou & Vanschoren (2022), who obtain diverse skills in a 3D locomotion task. POET has also been shown to aid in evolving robot morphologies (Stensby et al., 2021), avoiding the premature convergence that often results from handcrafted curricula. Norstein et al. (2022) use MAP-Elites (Mouret & Clune, 2015) to open-endedly create a structured repertoire of various terrains and virtual creatures. Hejna III et al. (2021) introduce TAME, which evolves morphologies without tasks, potentially enabling a system of open-ended morphology evolution. Adversarial approaches are also commonly adopted when developing open-ended algorithms: Dennis et al. (2020) propose PAIRED, a learning algorithm in which an adversary produces environments based on the difference in performance between an antagonist and a protagonist agent.

