META-LEARNING OF STRUCTURED TASK DISTRIBUTIONS IN HUMANS AND MACHINES

Abstract

In recent years, meta-learning, in which a model is trained on a family of tasks (i.e., a task distribution), has emerged as an approach to training neural networks to perform tasks that were previously assumed to require structured representations, making strides toward closing the gap between humans and machines. However, we argue that evaluating meta-learning remains a challenge, and can miss whether meta-learning actually uses the structure embedded within the tasks. These meta-learners might therefore still differ significantly from human learners. To demonstrate this difference, we first define a new meta-reinforcement learning task in which a structured task distribution is generated using a compositional grammar. We then introduce a novel approach to constructing a "null task distribution" with the same statistical complexity as this structured task distribution but without the explicit rule-based structure used to generate the structured tasks. We train a standard meta-learning agent, a recurrent network trained with model-free reinforcement learning, and compare it with human performance across the two task distributions. We find a double dissociation in which humans do better in the structured task distribution whereas agents do better in the null task distribution, despite comparable statistical complexity. This work highlights that multiple strategies can achieve reasonable meta-test performance, and that careful construction of control task distributions is a valuable way to understand which strategies meta-learners acquire, and how they might differ from humans.

1. INTRODUCTION

While machine learning has supported tremendous progress in artificial intelligence, a major weakness, especially in comparison to humans, has been its relative inability to learn structured representations, such as compositional grammar rules, causal graphs, and discrete symbolic objects (Lake et al., 2017). One way that humans acquire these structured forms of reasoning is via "learning-to-learn," in which we improve our learning strategies over time to give rise to better reasoning strategies (Thrun & Pratt, 1998; Griffiths et al., 2019; Botvinick et al., 2019). Inspired by this, researchers have renewed investigations into meta-learning. Under this approach, a model is trained on a family of learning tasks based on structured representations such that it achieves better performance across the task distribution. This approach has demonstrated the acquisition of sophisticated abilities including model-based learning (Wang et al., 2016), causal reasoning (Dasgupta et al., 2019), compositional generalization (Lake, 2019), linguistic structure (McCoy et al., 2020), and theory of mind (Rabinowitz et al., 2018), all in relatively simple neural network models. The meta-learning approach, along with interaction with designed environments, has also been suggested as a general way to automatically generate artificial intelligence (Clune, 2019). These approaches have made great strides, and have great promise, toward closing the gap between human and machine learning. However, in this paper, we argue that significant challenges remain in how we evaluate whether structured forms of reasoning have indeed been acquired. There are often multiple strategies that can result in good meta-test performance, and there is no guarantee a priori that meta-learners will learn the strategies we intend when generating the training distribution. Previous work on meta-learning structured representations does partially acknowledge this.
In this paper, we highlight these challenges more generally. At the end of the day, meta-learning is simply another learning problem. Like any vanilla learning algorithm, meta-learners themselves have inductive biases (which we term meta-inductive biases). Note that meta-learning is a way to learn inductive biases for vanilla learning algorithms (Grant et al., 2018). Here, we consider the fact that meta-learners themselves have inductive biases that influence the kinds of strategies (and inductive biases) they prefer to learn. In this work, the kind of structure we study is that imposed by compositionality, where simple rules can be recursively combined to generate complexity (Fodor et al., 1988). Previous work demonstrates that some aspects of compositionality can be meta-learned (Lake, 2019). Here, we introduce a broader class of compositionally generated task environments using an explicit generative grammar, in an interactive reinforcement learning setting. A key contribution of our work is to also develop control task environments that are not generated using the same simple recursively applied rules, but are comparable in statistical complexity. We provide a rigorous comparison between human and meta-learning agent behavior in tasks performed in distributions of environments of each type. We show through three different analyses that human behavior is consistent with having learned the structure that results from our compositional rules in the structured environments. In contrast, despite training on distributions that contain this structure, standard meta-learning agents instead prefer (i.e., have a meta-inductive bias toward) more global statistical patterns that are a downstream consequence of these low-dimensional rules. Our results show that simply doing well at meta-test on tasks in a distribution of structured environments does not necessarily indicate meta-learning of that structure.
We therefore argue that architectural inductive biases still play a crucial role in the kinds of structure acquired by meta-learners, and simply embedding the requisite structure in a training task distribution may not be adequate.

2. EMBEDDING STRUCTURE IN A TASK DISTRIBUTION

In this work, we define a broad family of task distributions in which tasks take place in environments generated from abstract compositional structures, by recursively composing those environments using simple, low-dimensional rules. Previous work on such datasets (Lake & Baroni, 2018; Johnson et al., 2017) focuses primarily on language. Here we instead directly consider the domain of structure learning. This is a fundamental aspect of human cognition and has been linked to how humans learn quickly in novel environments (Tenenbaum et al., 2011; Mark et al., 2020). Structure learning is required in a vast range of domains: from planning (understanding an interrelated sequence of steps for cooking) and category learning (the hierarchical organization of biological species) to social inference (understanding a chain of command at the workplace, or social cliques in a high school). A task distribution based on structure learning can therefore be embedded into several domains relevant for machine learning. Kemp & Tenenbaum (2008) provide a model for how people infer such structure. They present a probabilistic context-free graph grammar that produces a space of possible structures, over which humans do inference. A grammar consists of a start symbol S, terminal and non-terminal symbols Σ and V, as well as a set of production rules R. Different structural forms arise from recursively applying these production rules. This framework allows us to specify abstract structures (via the grammar) and to produce various instantiations of each abstract structure (via the noisy generation process), naturally producing different families of task environments, henceforth referred to as task distributions. We consider three structures: chains, trees, and loops. These exist in the real world across multiple domains. Chains describe objects on a one-dimensional spectrum, like people on the left-right political spectrum.
Trees describe objects organized in hierarchies, like evolutionary trees. Loops describe cycles, like the four seasons. Here we embed these structures into a grid-based task. Exploration on a grid is an extensively studied problem in machine learning, particularly in reinforcement learning. Further, it is a task that is easy for humans to perform on online crowdsourcing platforms, but not trivially so. This allows us to directly compare human and machine performance on the same task. Fig. 1 displays the symbols of the grammar we use and the production rules that give rise to grids of different structural forms.
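To make the generative process concrete, the sketch below generates the three structural forms as edge lists by recursively applying simple growth rules, in the spirit of the graph grammar of Kemp & Tenenbaum (2008). This is a minimal, hypothetical illustration: the function names, the bounded-branching tree rule, and all parameters are our own choices, not the exact grammar or production rules used in the paper.

```python
import random

def generate_chain(n):
    # Chain rule: repeatedly attach a new node to the end of the chain.
    return [(i, i + 1) for i in range(n - 1)]

def generate_tree(n, branching=2, seed=0):
    # Tree rule: attach each new node to a randomly chosen existing node
    # that still has spare capacity; bounded branching keeps the graph
    # a hierarchy rather than a hub.
    rng = random.Random(seed)
    edges = []
    num_children = {0: 0}  # node -> number of children attached so far
    for new in range(1, n):
        candidates = [p for p, c in num_children.items() if c < branching]
        parent = rng.choice(candidates)
        edges.append((parent, new))
        num_children[parent] += 1
        num_children[new] = 0
    return edges

def generate_loop(n):
    # Loop rule: a chain whose final node reconnects to the start.
    return generate_chain(n) + [(n - 1, 0)]
```

A chain or tree over n nodes yields n - 1 edges, while a loop yields n; each sampled edge list can then be instantiated as a concrete environment (e.g., painted onto a grid), giving many noisy realizations of the same abstract form.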

