HALMA: HUMANLIKE ABSTRACTION LEARNING MEETS AFFORDANCE IN RAPID PROBLEM SOLVING

Abstract

Humans learn compositional and causal abstraction, i.e., knowledge, in response to the structure of naturalistic tasks. When presented with a problem-solving task involving some objects, toddlers would first interact with these objects to figure out what they are and what can be done with them. Leveraging these concepts, they can understand the internal structure of the task without seeing all of the problem instances. Remarkably, they further build cognitively executable strategies to rapidly solve novel problems. To empower a learning agent with a similar capability, we argue there should be three levels of generalization in how an agent represents its knowledge: perceptual, conceptual, and algorithmic. In this paper, we devise the first systematic benchmark that offers joint evaluation covering all three levels. This benchmark is centered around a novel task domain, HALMA, for visual concept development and rapid problem solving. Uniquely, HALMA has a minimum yet complete concept space, upon which we introduce a novel paradigm to rigorously diagnose and dissect learning agents' capability in understanding and generalizing complex and structural concepts. We conduct extensive experiments on reinforcement learning agents with various inductive biases and carefully report their proficiencies and weaknesses.

1. INTRODUCTION

Have you ever heard of Super Halma, a fast-paced variant of Halma? In case you have not played Halma or its fast-paced variant before, we briefly introduce both of them here. Halma is a strategic board game, also known as Chinese checkers. The rules of Halma are minimal; they can be perspicuously explained using basic concepts of numbers and arithmetic. To win the game, one needs to transport the pawns initially in one's own camp into the target camp. In each turn, a player can either move a pawn into an empty adjacent hole and end the play, or jump over an adjacent pawn, land on the opposite side of the jumped pawn, and recursively apply this jump rule until the end of the play. While the standard rules allow hopping over only a single adjacent occupied position at a time, Super Halma allows pieces to catapult over multiple adjacent occupied positions in a line when hopping; see an illustration in Fig. 1. We will use the term Halma to specifically refer to Super Halma in the remainder of the paper. Now, imagine you are teaching your preschool cousin, Ada, to play Halma. Since she has not yet formed a complete notion of natural numbers or arithmetic, verbally explaining the rules to her would be in vain. Alternatively, you can play with her while providing scarce supervision, e.g., whether a move is allowed; you can even reward her when she successfully moves a pawn to the target camp. By the time Ada can independently and rapidly solve unseen scenarios, we would know she has mastered the game. How many scenarios do you think Ada has to play before achieving this goal? This Halma-playing task is quintessential in the open-ended world; its environment is a minimal yet complete playground to test the rapid problem-solving capability of a learning agent.
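The movement rules above can be sketched in a few lines of code. The following is a minimal illustration on a one-dimensional board (all function and variable names are ours, not from any official implementation), assuming a Super Halma hop clears a contiguous run of one or more occupied cells and lands on the first empty cell immediately beyond the run; the recursive chaining of hops within a single turn is omitted for brevity.

```python
# Minimal sketch of move legality in Super Halma on a 1-D board.
# Hypothetical representation: board is a tuple of booleans, True = occupied.

def step_moves(board, pos):
    """Simple moves: slide into an empty adjacent cell."""
    moves = []
    for d in (-1, 1):
        to = pos + d
        if 0 <= to < len(board) and not board[to]:
            moves.append(to)
    return moves

def hop_moves(board, pos):
    """Super Halma hops: catapult over a line of adjacent occupied cells.

    Assumption (hedged): the piece lands on the first empty cell beyond the
    jumped run. A run of length 1 recovers the standard Halma single hop.
    """
    moves = []
    for d in (-1, 1):
        run = 0
        cur = pos + d
        while 0 <= cur < len(board) and board[cur]:
            run += 1
            cur += d
        # Need at least one jumped pawn and an in-bounds empty landing cell.
        if run >= 1 and 0 <= cur < len(board) and not board[cur]:
            moves.append(cur)
    return moves

# Pawn at cell 0; cells 1 and 2 occupied; cells 3 and 4 empty.
board = (True, True, True, False, False)
print(step_moves(board, 0))  # → []  (no empty adjacent cell)
print(hop_moves(board, 0))   # → [3] (catapult over the run at cells 1-2)
```

Note that `hop_moves` with a run of length 1 also covers the standard single-pawn hop, so the standard and the fast-paced variants differ only in whether `run >= 1` is replaced by `run == 1`.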
Under limited exposure to the underlying structure of the complex and immense concept space, we humans, by observing and interacting with entities, can form abstract concepts of "what it is" and "what can be done with it." The former is dubbed semantics (Jackendoff, 1983) and the latter affordance (Gibson, 1986). These abstract concepts, once accepted as knowledge, generalize robustly over scenarios; they are considered milestones of human evolution in abstract reasoning and general problem solving (Holyoak et al., 1996). In the case of the Halma-playing task, Ada would be able to solve unseen scenarios in no time if she were able to master (i) the abstract concept of natural numbers, emerging from and grounded to visual stimuli, (ii) both valid and invalid actions, and (iii) causal relations and potential outcomes arising from the grounded natural numbers and valid actions. What is the proper machinery to learn these generalizable concepts from scarce supervision? By scarce supervision, we mean the way supervision is provided is akin to how you teach Ada; one only provides sparse and indirect feedback without direct rules or dense annotations. By generalizable concepts, we emphasize more than the competence of memorization and interpolation; the learned representation ought to appropriately extrapolate and generalize in out-of-distribution scenarios. Such a superb generalization capability is often regarded as one of the celebrated signatures of human intelligence (Lake et al., 2015; Marcus, 2018; Lake & Baroni, 2018); it is attributed to the rich compositional and causal structures in the human mind (Fodor et al., 1988). Inspired by these observations, in this work, we quest for a computational framework to learn abstract concepts that emerge in challenging and interactive problem-solving tasks, with a humanlike generalization capability: The learned abstract knowledge should be easily transferable to out-of-distribution scenarios.
The general context of interactive problem solving poses extra challenges over classic settings of concept learning; beyond merely emerging concepts, it further demands that the learning agent leverage such emerged concepts for decision-making and planning. Ada, after understanding semantics and affordance in Halma, can effortlessly perceive and parse novel scenarios (Zhu et al., 2020). Yet, she would still struggle to play the game strategically, as she needs to decide among multiple affordable moves. In essence, the central question is: If conceptual knowledge can generalize as such, what meta-benefits does it offer for solving unseen problems (Schmidhuber et al., 1996)? The classic decision-making account of these meta-benefits would be: Leveraging knowledge, we can develop cognitively executable strategies with high planning efficiency (Sanner, 2008). Recent work (2019) hypothesizes that modern reinforcement learning agents, incentivized by these meta-benefits, have already discovered such algorithms. However, to date, this argument remains speculative, since these agents have not been evaluated in tasks with rich internal structures under limited exposure (Lake et al., 2017; Kansky et al., 2017). A diagnostic benchmark for generalization capability is thus in demand to bridge the communities of concept development and decision-making.

The main contribution of this paper is a Halma-inspired competence benchmark: Humanlike Abstraction Learning Meets Affordance (HALMA). We rigorously devise HALMA with three levels of generalization in visual concept development and rapid problem solving; see details in Section 2. HALMA is unique in its minimum yet complete concept spaces, a miniature of the compositional and causal structures in human knowledge. It dynamically generates test problems to informatively evaluate learning agents' capability in out-of-distribution scenarios under limited exposure.
We conduct extensive experiments with reinforcement learning agents to benchmark their proficiencies and weaknesses.

2. THREE LEVELS OF GENERALIZATION

Our motivations might seem, prima facie, bold. To convince readers and support our optimism, we summarize some recent progress in this section. In particular, we provide a taxonomy of three levels of generalization on a competency basis. Generalization is indeed a multifaceted phenomenon. Previous evaluations of generalization were predominantly defined in a statistical sense, following the classical paradigm of a random train-evaluation-test split (Cobbe et al., 2019) while ignoring internal structures. However, we argue that this classical paradigm should not be the only lens through which we assess whether agents can or should generalize beyond their experience (Barrett et al., 2018), especially if our goal is to construct humanlike general-purpose problem-solving agents (Lake et al., 2017).
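The contrast between a statistical random split and a structured, out-of-distribution evaluation can be sketched in a few lines. The (shape, count) concept items below are invented purely for illustration and are not HALMA's actual concept space; the point is that a structured split holds out an entire region along one axis, so testing probes extrapolation rather than interpolation.

```python
# Hypothetical illustration: i.i.d. random split vs. a structured
# out-of-distribution split over a small compositional concept space.
import random

# Invented concept space: every (shape, count) combination.
concepts = [(s, n) for s in ("circle", "square", "star") for n in range(1, 6)]

# Classical paradigm: random split -- test items resemble training items.
random.seed(0)
shuffled = concepts[:]
random.shuffle(shuffled)
iid_train, iid_test = shuffled[:10], shuffled[10:]

# Structured split: hold out all items with count >= 4, so the test set
# lies outside the training distribution along the count axis.
ood_train = [c for c in concepts if c[1] < 4]
ood_test = [c for c in concepts if c[1] >= 4]

assert all(n < 4 for _, n in ood_train)
assert all(n >= 4 for _, n in ood_test)
```

Under the random split, every test combination has near neighbors in training; under the structured split, success requires generalizing the count concept itself, which is closer to the evaluation regime this paper advocates.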



Footnotes:
1. We will make HALMA and tested agents publicly accessible upon publication.
2. See https://en.wikipedia.org/wiki/Chinese_checkers#Variants for details.



Figure 1: Illustration of the Super Halma playing task. By playing the game under scarce supervision, Ada should be able to learn basic concepts of numbers and arithmetic, such as which actions (jumps) are (a) valid and (b) invalid.

