AUGMENTATIVE TOPOLOGY AGENTS FOR OPEN-ENDED LEARNING

Abstract

In this work, we tackle the problem of open-ended learning by introducing a method that simultaneously evolves agents and increasingly challenging environments. Unlike previous open-ended approaches that optimize agents using a fixed neural network topology, we hypothesize that generalization can be improved by allowing agents' controllers to become more complex as they encounter more difficult environments. Our method, Augmentative Topology EPOET (ATEP), extends the Enhanced Paired Open-Ended Trailblazer (EPOET) algorithm by allowing agents to evolve their own neural network structures over time, adding complexity and capacity as necessary. Empirical results demonstrate that ATEP yields general agents capable of solving more environments than a fixed-topology baseline. We also investigate mechanisms for transferring agents between environments and find that a species-based approach further improves the performance and generalization of agents.

1. INTRODUCTION

Machine learning has successfully been used to solve numerous problems, such as classifying images (Krizhevsky et al., 2012), writing news articles (Radford et al., 2019; Schick & Schütze, 2021), or playing games such as Atari (Mnih et al., 2015) and chess (Silver et al., 2018). While impressive, these approaches still largely follow a traditional paradigm in which a human specifies a task that is subsequently solved by the agent. In most cases, this is the end of the agent's learning: once it can solve the required task, no further progression takes place. Motivated by the fact that humans have always learnt and innovated in an open-ended manner, the field of open-ended learning emerged (Stanley et al., 2017). For instance, humans did not invent microwaves to heat food; they emerged from the study of radar. Likewise, vacuum tubes and electricity were developed for very different reasons, yet through them we stumbled upon computers (Stanley, 2019). From an agent's perspective, open-ended learning is a research field in which the aim is not to converge to a specific goal, but rather to obtain an ever-growing set of diverse and interesting behaviors (Stanley et al., 2017). One approach is to allow both the agents and the environments to change, evolve, and improve over time (Brant & Stanley, 2017; Wang et al., 2019). This has the potential to discover a large collection of useful and reusable skills (Quessy & Richardson, 2021), as well as interesting and novel environments (Gisslén et al., 2021). Open-ended learning is also a much more promising way to obtain truly general agents than the traditional single-task paradigm (Team et al., 2021). The concept of open-ended evolution has been a part of artificial life (ALife) research for decades, spawning numerous artificial worlds (Ray, 1991; Ofria & Wilke, 2004; Spector et al., 2007; Yaeger & Sporns, 2006; Soros & Stanley, 2014).
These worlds consist of agents with various goals, such as survival, predation, or reproduction. Recently, open-ended algorithms have received renewed interest (Stanley, 2019), with Stanley et al. (2017) proposing the paradigm as a path towards the goal of human-level artificial intelligence. A major breakthrough in open-ended evolution was NeuroEvolution of Augmenting Topologies (NEAT) (Stanley & Miikkulainen, 2002), which was capable of efficiently solving complex reinforcement learning tasks. Its key idea was to allow the structure of the network to evolve alongside its weights, starting with a simple network and adding complexity as the need arises. This inspired

