THE GAME OF HIDDEN RULES: A NEW CHALLENGE FOR MACHINE LEARNING

Anonymous authors
Paper under double-blind review

Abstract

Systematic examination of learning tasks remains an important but understudied area of machine learning (ML) research. To date, most ML research has focused on measuring performance on new tasks or surpassing state-of-the-art performance on existing tasks. These efforts are vital but do not explain why some tasks are more difficult than others. Understanding how task characteristics affect difficulty is critical to formalizing ML's strengths and limitations; a rigorous assessment of which types of tasks are well-suited to a specific algorithm and, conversely, which algorithms are well-suited to a specific task would mark an important step forward for the field. To assist researchers in this effort, we introduce a novel learning environment designed to study how task characteristics affect measured difficulty for the learner. This tool frames learning tasks as a "board-clearing game," which we call the Game of Hidden Rules (GOHR). In each instance of the game, the researcher encodes a specific rule, unknown to the learner, that determines which moves are allowed at each state of the game. The learner must infer the rule through play. We detail the game's expressive rule syntax and show how it gives researchers granular control over learning tasks. We present example rules, an example ML algorithm, and methods to assess algorithm performance. Separately, we provide additional benchmark rules, a public leaderboard for performance on these rules, and documentation for installing and using the GOHR environment.

1. INTRODUCTION

Learning computational representations of rules has been one of the main objectives of the field of machine learning (ML) since its inception. In contrast to pattern recognition and classification (the other main domains of ML), rule learning is concerned with identifying a policy or computational representation of the hidden process by which data have been generated. Such learning tasks are common in applications of ML to real-world settings such as biological research (Khatib et al., 2011), imitation learning (Hussein et al., 2018), and game play (Mnih et al., 2015; Silver et al., 2018). Since this process involves sequential experimentation with the system, much of the recent work on rule learning has focused on using reinforcement learning (RL) to learn rules as optimal policies of Markov decision processes.

An important question is whether certain characteristics make particular rules easier or harder to learn by a specific algorithm (or in general). To date, this has been a difficult question to answer, since many rules of interest in the real world are multifaceted and not well characterized. For instance, while there are effective RL algorithms that can play backgammon, chess, and go, these games differ in significant ways, and it is not clear how much each structural variation contributes to differences in overall difficulty for the learner. To investigate these questions, new ways of generating rules and data must be devised that allow researchers to examine these characteristics in a controlled environment.

In this paper, we propose a new data environment called the Game of Hidden Rules (GOHR), which aims to help researchers in this endeavor. The main component of the environment is a game played on a 6 × 6 board with game pieces of different shapes and colors.
The task of the learner is to clear the board in each round by moving the game pieces to "buckets" at the corners of the board according to a hidden rule, known to the researcher but not to the learner. Our environment allows researchers to express a hidden rule using a rich syntax that can map to many current tasks of interest in both the classification and RL settings. A key advantage of our environment is that researchers can control each aspect of the hidden rules and test them at a granular level, allowing for experiments that determine exactly which characteristics make some learning tasks harder than others and which algorithms are better at learning specific types of rules.

The rest of the paper proceeds as follows. We first describe how our environment relates to other data-generating environments from the literature in Section 2. In Section 3, we describe the GOHR and its rule syntax, explaining how the environment can be used to investigate the effects of rule structure. We introduce our ML competition and refer readers to benchmark rules, instructions, and documentation available at our public site in Section 4. In Section 5, we present example rules and analysis for an example algorithm. Finally, we conclude with some discussion on the implications of our results and on other questions that can be answered by the GOHR environment in Section 6.

2. LITERATURE REVIEW

Games have historically served as rich benchmark environments for RL, with novel challenges in each game environment spurring progress for the field as a whole. RL has tackled increasingly complex classical board games, such as backgammon, chess, shogi, and go (Tesauro, 1994; Campbell et al., 2002; Silver et al., 2016; 2017; 2018), eventually surpassing the performance of human experts in each. Of late, video-game environments have also become drivers of progress in RL. Beginning with human-level performance in Atari 2600 games (Mnih et al., 2015; Badia et al., 2020), machine players have become competitive with humans in a variety of environments, including real-time-strategy and multiplayer games such as Quake III, StarCraft II, and Dota 2 (Jaderberg et al., 2019; OpenAI et al., 2019; Vinyals et al., 2017; 2019). Instrumental in this progress has been the growing space of benchmark environments and supporting tools, such as the Arcade Learning Environment (Bellemare et al., 2013), General Video Game AI (Perez-Liebana et al., 2016), OpenAI Gym (Brockman et al., 2016), Gym Retro (Nichol et al., 2018), Obstacle Tower (Juliani et al., 2019), Procgen (Cobbe et al., 2020), and NetHack environments (Küttler et al., 2020; Samvelyan et al., 2021). Taken together, these represent an impressive range of new tasks for RL agents, bridging many gaps and challenges in achieving aspects of artificial intelligence.

Benchmarks such as classic board and video games offer human-relatable assessments of ML but are not readily configurable. This makes it challenging to use them for the systematic study of task characteristics and their impact on learning. Recent literature has proposed variations on these environments, such as alternative forms of chess (Tomašev et al., 2022). While these environments provide additional insights, they are variations of already complex systems and do not provide the granularity needed to study the impact of task characteristics on learning.
We propose the use of the GOHR environment to study these questions since it is constructed specifically for configurability. GOHR distinguishes itself as a useful environment in four important ways. First, each hidden rule represents a deterministic mapping between the game's pieces and the four available buckets, allowing for clear distinctions between the characteristics of each learning task. Second, the game's rule syntax introduces a vast space of hidden rules for study, ranging from trivial to complex. Third, the rule syntax allows for fine variations in task definition, enabling experiments that study controlled differences in learning tasks. Fourth, encoding different hidden rules does not affect the learning environment. In contrast to existing benchmarking tools, this decouples the study of different learning tasks from associated changes to the environment itself, making it possible to compare the effects of task characteristics under identical environmental conditions.
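To illustrate the first and third properties above, consider two hidden rules that differ in a single attribute. The following sketch is hypothetical (it does not use the GOHR rule syntax; the piece representation and function names are illustrative), but it shows how each rule is a deterministic predicate over (piece, bucket) pairs, and how minimally differing tasks can be constructed:

```python
# Hypothetical piece representation: a (shape, color) tuple.
# Buckets are numbered 0-3, one per corner of the board.

def rule_by_shape(piece, bucket):
    """Deterministic rule: the bucket is determined by the piece's shape alone."""
    return {"circle": 0, "square": 1, "star": 2, "triangle": 3}[piece[0]] == bucket

def rule_by_color(piece, bucket):
    """Minimal variation: identical mapping structure, but keyed on color instead."""
    return {"red": 0, "blue": 1, "green": 2, "yellow": 3}[piece[1]] == bucket

# The two tasks share the board, the buckets, and the mapping structure;
# only the attribute the learner must attend to differs.
print(rule_by_shape(("circle", "red"), 0))  # → True: circles go to bucket 0
print(rule_by_color(("circle", "red"), 1))  # → False: red pieces go to bucket 0
```

Because the environment itself is unchanged between the two rules, any difference in measured learning difficulty can be attributed to the rule's structure rather than to the setting.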

3. GAME OF HIDDEN RULES

In this section we describe the GOHR's structure, its rule syntax, and the expressivity of the rule language. In each episode of our game, the player is presented with a game board containing game pieces, each drawn from a configurable set of shapes and colors. The player's objective is to remove game pieces from play by dragging and dropping them into buckets located at each corner of the game board. A hidden rule, unknown to the player, determines which pieces may be placed into which buckets at a given point in game play. For instance, a rule might assign game pieces to specific buckets based on their shape or color. If the player makes a move permitted by the rule, the corresponding game piece is removed from play; otherwise, it remains in its original location. The episode concludes once all pieces have been removed from the board.
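The interaction described above can be sketched as a simple episode loop. The code below is a minimal illustration under simplified assumptions, not the actual GOHR implementation: the piece representation, the `hidden_rule` mapping, and the random learner are all hypothetical, and the learner receives only accept/reject feedback on each attempted move:

```python
import random

# Illustrative piece representation: (shape, color, row, col) on a 6x6 board.
SHAPES = ["circle", "square", "star", "triangle"]
COLORS = ["red", "blue", "green", "yellow"]
BUCKETS = [0, 1, 2, 3]  # one bucket at each corner of the board

def hidden_rule(piece, bucket):
    """Example hidden rule: each shape maps to a fixed bucket."""
    shape_to_bucket = {"circle": 0, "square": 1, "star": 2, "triangle": 3}
    return shape_to_bucket[piece[0]] == bucket

def play_episode(rule, n_pieces=5, seed=0):
    """Clear the board with a purely random learner; return the move count."""
    rng = random.Random(seed)
    board = [(rng.choice(SHAPES), rng.choice(COLORS),
              rng.randrange(6), rng.randrange(6)) for _ in range(n_pieces)]
    attempts = 0
    while board:
        piece = rng.choice(board)      # the learner picks a piece...
        bucket = rng.choice(BUCKETS)   # ...and a bucket to try
        attempts += 1
        if rule(piece, bucket):        # move accepted: the piece is removed
            board.remove(piece)
        # otherwise the piece stays in place and play continues
    return attempts

print(play_episode(hidden_rule))  # moves the random learner needed to clear the board
```

A learning agent would replace the random choices with a policy updated from the accept/reject feedback; the number of attempts needed to clear boards over successive episodes then serves as a natural measure of how quickly the hidden rule is being inferred.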

