POWDERWORLD: A PLATFORM FOR UNDERSTANDING GENERALIZATION VIA RICH TASK DISTRIBUTIONS

Abstract

One of the grand challenges of reinforcement learning is the ability to generalize to new tasks. However, general agents require a set of rich, diverse tasks to train on. Designing a 'foundation environment' for such tasks is tricky: the ideal environment would support a range of emergent phenomena, an expressive task space, and fast runtime. To take a step towards addressing this research bottleneck, this work presents Powderworld, a lightweight yet expressive simulation environment running directly on the GPU. Within Powderworld, two motivating challenge distributions are presented, one for world-modelling and one for reinforcement learning. Each contains hand-designed test tasks to examine generalization. Experiments indicate that increasing the environment's complexity improves generalization for world models and certain reinforcement learning agents, yet may inhibit learning in high-variance environments. Powderworld aims to support the study of generalization by providing a source of diverse tasks arising from the same core rules. Try an interactive demo at kvfrans.com/static/powder

1. INTRODUCTION

One of the grand challenges of reinforcement learning (RL), and of decision-making in general, is the ability to generalize to new tasks. RL agents have shown incredible performance on single-task settings (Berner et al., 2019; Lillicrap et al., 2015; Mnih et al., 2013), yet frequently stumble when presented with unseen challenges. Single-task RL agents are largely overfit on the tasks they are trained on (Kirk et al., 2021), limiting their practical use. In contrast, a general agent, which can robustly perform well on a wide range of novel tasks, can then be adapted to solve downstream tasks and unseen challenges. General agents greatly depend on a diverse set of tasks to train on. Recent progress in deep learning has shown that as the amount of data increases, so do the generalization capabilities of trained models (Brown et al., 2020; Ramesh et al., 2021; Bommasani et al., 2021; Radford et al., 2021). Agents trained on environments with domain randomization or procedural generation capabilities transfer better to unseen test tasks (Cobbe et al., 2020). However, as creating training tasks is expensive and challenging, most standard environments are inherently over-specific or limited by their focus on a single task type, e.g. robotic control or gridworld movement. As the need to study the relationships between training tasks and generalization increases, the RL community would benefit greatly from a 'foundation environment' supporting diverse tasks arising from the same core rules. The benefits of expansive task spaces have been showcased in Unsupervised Environment Design (Wang et al., 2019; Dennis et al., 2020; Jiang et al., 2021; Parker-Holder et al., 2022), but gridworld domains fail to display how such methods scale up. Previous works have proposed specialized task distributions for multi-task training (Samvelyan et al., 2021; Suarez et al., 2019; Fan et al., 2022; Team et al., 2021), each focusing on a specific decision-making problem.
To further investigate generalization, it is beneficial to have an environment where many variations of training tasks can easily be compared. As a step toward lightweight yet expressive environments, this paper presents Powderworld, a simulation environment geared to support procedural data generation, agent learning, and multi-task generalization. Powderworld aims to efficiently provide environment dynamics by running directly on the GPU. Elements (e.g. sand, water, fire) interact in a modular manner within local neighborhoods, allowing for efficient runtime. The free-form nature of Powderworld enables construction of tasks ranging from simple manipulation objectives to complex multi-step goals. Powderworld aims to 1) be modular and supportive of emergent interactions, 2) allow for expressive design capability, and 3) support efficient runtime and representations. Additionally presented are two motivating frameworks for defining world-modelling and reinforcement learning tasks within Powderworld. World models trained on increasingly complex environments show superior transfer performance. In addition, models trained over more element types show stronger fine-tuning on novel rulesets, demonstrating that a robust representation has been learned. In the reinforcement learning case, increases in task complexity benefit generalization up to a task-specific inflection point, at which performance decreases. This point may mark when variance in the resulting reward signal becomes too high, inhibiting learning. These findings provide a starting point for future directions in studying generalization using Powderworld as a foundation.

Figure 1: Examples of tasks created in the Powderworld engine. Powderworld provides a physics-inspired simulation over which many distributions of tasks can be defined. Pictured above are human-designed challenges where a player must construct unstable arches, transport sand through a tunnel, freeze water to create a bridge, and draw a path with plants. Tasks in Powderworld create challenges from a set of core rules, allowing agents to learn generalizable knowledge. Try an interactive Powderworld simulation at kvfrans.com/static/powder
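The local-neighborhood structure described above is what makes the simulation GPU-friendly: each cell's next state depends only on nearby cells, so all cells can be updated in parallel with array operations. As a minimal sketch (not Powderworld's actual implementation; the element IDs and the single falling-sand rule below are invented for illustration), a vectorized local update might look like:

```python
import numpy as np

# Hypothetical element IDs for this sketch only.
EMPTY, SAND = 0, 1

def step(grid):
    """One simulation step: sand falls into the empty cell directly below.
    The rule is purely local, so it applies to every cell simultaneously,
    the same pattern a GPU implementation would batch across the grid."""
    below = np.roll(grid, -1, axis=0)          # the cell underneath each cell
    falls = (grid == SAND) & (below == EMPTY)  # sand with empty space below it
    falls[-1, :] = False                       # the bottom row never falls
    new = grid.copy()
    new[falls] = EMPTY                         # sand leaves its old cell...
    new[np.roll(falls, 1, axis=0)] = SAND      # ...and lands one row down
    return new

# Tiny demo: a single grain of sand falls one row per step.
grid = np.zeros((4, 3), dtype=np.int8)
grid[0, 1] = SAND
grid = step(grid)
```

Richer elements (water flowing sideways, fire spreading, plants growing) can be expressed as additional local rules of the same shape, which is what keeps the engine both modular and fast.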

2. RELATED WORK

Task Distributions for RL. Video games are a popular setting for studying multi-task RL, and environments have been built off NetHack (Samvelyan et al., 2021; Küttler et al., 2020), Minecraft (Fan et al., 2022; Johnson et al., 2016; Guss et al., 2019), Doom (Kempka et al., 2016), and Atari (Bellemare et al., 2013). Team et al. (2021); Yu et al. (2020); Cobbe et al. (2020) describe task distributions focused on meta-learning, and Fan et al. (2022); Suarez et al. (2019); Hafner (2021); Perez-Liebana et al. (2016) detail more open-ended environments containing multiple task types. Most similar to this work may be ProcGen (Cobbe et al., 2020), a platform that supports infinite procedurally generated environments. However, while ProcGen games each have their own rulesets, Powderworld aims to share core rules across all tasks. Powderworld focuses specifically on runtime and expressivity, taking inspiration from online "powder games" where players build ranges of creations out of simple elements (bal; pow; Bittker).

Generalization in RL. Multi-task reinforcement learning agents are generally valued for their ability to perform well on unseen tasks (Packer et al., 2018; Kirk et al., 2021). The sim2real problem requires agents to generalize to out-of-distribution real-world domains (Tobin et al., 2017; Sadeghi & Levine, 2016). The platforms cited above also target generalization, often within the context of solving unseen levels within a game. This work aims to study generalization within a physics-inspired simulated setting, and creates out-of-distribution challenges by hand-designing a set of unseen test tasks.





