CONCEPTUAL SCAN: LEARNING WITH AND ABOUT RULES

Anonymous authors
Paper under double-blind review

Abstract

The ability to learn from a mix of rules and examples and to reflect on the learned abstractions is an important aspect of human intelligence. At the same time, there is a lack of benchmarks that systematically test for this ability, which makes it hard to evaluate the degree to which it is present in state-of-the-art ML architectures. We introduce a novel task format for such benchmarks by using an example structure that allows us to explicitly provide and ask about rules that are relevant for the given task. We present a simple dataset illustrating this format, and we use it to analyze the performance of a variety of T5-based ML models. We identify three challenge areas in this setup: maintaining consistency between learned rules and their application, scaling to larger rule sets, and compositional generalization.

1. INTRODUCTION

Machine learning algorithms are typically designed to learn functions from examples. This is a very general paradigm, but it does not explicitly capture some aspects of human learning. Humans, in contrast, are able to learn both by being shown examples of the task to accomplish and by being told rules or instructions about this task. They can even provide relevant rules to others once they have learned the task from examples. As a realistic illustration of this ability, consider the task of a personal assistant who, among other things, is expected to make movie recommendations based on the age and interests of a user. Even for a task such as this, which would currently be considered a standard use case for an example-based recommender system, we as humans do not learn how to perform it exclusively by observing examples of movie recommendations. Instead, we can accomplish the task much more efficiently by also taking into account relevant knowledge in the form of rules that have been communicated to us explicitly, i.e., by "learning with rules". For recommending a movie to a girl called Anna, we may, among others, use the rules (and facts, which we consider a special case of rules) illustrated on the left side of Figure 1.

Figure 1: Personal assistants answer questions using knowledge consisting of rules and facts. Note that the last knowledge bullet point can also be construed as an example of the underlying "movie recommendation" task, while the other bullet points represent other relevant knowledge. The first bullet point is a traditional "rule" that states conditional knowledge that can apply to many different movies. The second is a concept definition, which can equivalently be construed as a rule relating two different pieces of information about a person. The other bullet points are facts stated at varying levels of granularity.

In addition to the ultimate goal of providing movie recommendations (e.g., "What movie could Anna watch?"), we would also expect a human to be able to answer the intermediate questions shown on the right side of Figure 1, i.e., to "learn about rules" that are relevant to the ultimate task, and we would expect these questions to be answered consistently w.r.t. the provided knowledge and the ultimate recommendation. These questions allow us to introspect the understanding of the assistant, e.g., to debug why a movie recommendation was not as expected.

Similar interactions between learning from examples and learning with and about rules can also be observed in simpler synthetic tasks. Consider, for instance, the SCAN task of Lake & Baroni (2017), which our work builds on. This task requires the learner to translate natural language commands (such as "jump left twice") into corresponding action sequences (such as "LTURN JUMP LTURN JUMP"). The learner is presented with a subset of several thousand (command, action sequence) pairs during training and is then expected to translate unseen commands. This focus on learning purely from examples, while typical of most traditional ML tasks, differs from the way one would "teach" a human the task, and indeed from how the authors of the SCAN paper "teach" the task to their readers. On the one hand, while humans are also adept at guessing rules from examples, they can often grasp the relevant rule from just a handful of examples rather than requiring thousands (Lake et al., 2015), as we as readers may find ourselves doing when seeing the handful of illustrative SCAN examples provided in the paper's figures. More fundamentally, however, rather than expecting readers to learn the translation function purely from examples, the authors provide this function in a much more direct and efficient fashion using a set of interpretation rules like those in Figure 2.
The explicit nature of the provided rules has the additional advantage that it allows us to deduce the translation of a given command by applying the translation rules, rather than always having to speculatively induce the translation by generalizing from a set of examples. In this paper, we introduce conceptual learning tasks (CLTs), a type of learning task specifically designed to evaluate the combination of such inductive and deductive learning, and we make the following contributions:

• We define the notion of a CLT (Section 2).
• We present a first simple instance of a CLT called Conceptual SCAN (cSCAN), a synthetically constructed conceptual learning variation of SCAN (Section 3).
• We formalize metrics to measure a learner's performance on cSCAN, including a novel measurement of consistency between learned rules and their application (Section 4).
• We analyze the performance of baseline ML architectures on cSCAN and identify three challenge areas: consistency, rule set size, and compositional generalization (Section 6).
• We provide the code used for generating cSCAN, constructing compositional generalization splits, and calculating consistency from experiment results.[1]

2. CONCEPTUAL LEARNING TASKS (CLTS)

2.1 DESIRED PROPERTIES

Motivated by the use case from the introduction and our goal of evaluating learning with and about rules, we are interested in tasks with the following properties, which we formalize in the following section.

1. Context. The learner answers requests based on some explicit knowledge (context), which consists of "examples" that directly determine the replies to certain requests and "rules" that provide indirect information for doing so. Part of this knowledge varies across contexts (e.g., transient knowledge about Anna's preferences or country-specific rules about movie ratings).

2. Request. In addition to requests corresponding to the ultimate goal (e.g., "What movie could Anna watch?"), we can ask the learner whether an intermediate rule holds given the context (e.g., "Is Jerry Maguire appropriate for 14-year-olds?"). This allows us to test whether the learner "understands" the rules by checking for consistency between rules that the learner claims to be true (or false) and their application to answering the ultimate requests.
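The consistency requirement in point 2 can be made concrete with a small sketch: if the learner asserts that a rule does not hold, its answer to the ultimate request should not contradict that assertion. The following is a minimal illustration only; the question strings, answer dictionary, and `consistent` helper are hypothetical and are not the paper's actual cSCAN formalization (which appears in Section 4).

```python
# Hypothetical answers produced by a learner in one context. Both the
# question phrasings and the helper below are illustrative assumptions,
# not part of the cSCAN formalization.
answers = {
    "How old is Anna?": "14",
    "What movie could Anna watch?": "Jerry Maguire",
    "Is Jerry Maguire appropriate for 14-year-olds?": "no",
}

def consistent(answers: dict) -> bool:
    """Check that the recommended movie is one the learner itself
    claims is appropriate for the user's stated age."""
    recommended = answers["What movie could Anna watch?"]
    age = answers["How old is Anna?"]
    verdict = answers[f"Is {recommended} appropriate for {age}-year-olds?"]
    return verdict == "yes"

print(consistent(answers))  # False: the rule answer contradicts the recommendation
```

The learner above is inconsistent: it recommends a movie that it simultaneously claims is inappropriate, which is exactly the kind of contradiction the intermediate rule queries are designed to expose.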



[1] To be released on GitHub upon paper acceptance. For an overview, see Appendices F and G.



walk = WALK
x1 left = LTURN x1
x1 twice = x1 x1
...

Figure 2: Examples of SCAN interpretation rules from Lake & Baroni (2017).
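To make the deductive use of such interpretation rules concrete, the sketch below applies a partial reconstruction of them to translate a command like "jump left twice". The rule set is deliberately incomplete (it omits, e.g., "and", "after", "opposite", and "around" from the full SCAN grammar), so this is an illustrative assumption rather than the complete translation function of Lake & Baroni (2017).

```python
# Minimal sketch of deductive translation with SCAN-style interpretation
# rules (cf. Figure 2). Only a fragment of the SCAN grammar is covered.

PRIMITIVES = {"walk": "WALK", "run": "RUN", "jump": "JUMP", "look": "LOOK"}

def interpret(command: str) -> str:
    tokens = command.split()
    # Rule: "x1 twice" = x1 x1 ; "x1 thrice" = x1 x1 x1
    if tokens[-1] == "twice":
        inner = interpret(" ".join(tokens[:-1]))
        return " ".join([inner, inner])
    if tokens[-1] == "thrice":
        inner = interpret(" ".join(tokens[:-1]))
        return " ".join([inner] * 3)
    # Rule: "x1 left" = LTURN x1 ; "x1 right" = RTURN x1
    if tokens[-1] == "left":
        return "LTURN " + interpret(" ".join(tokens[:-1]))
    if tokens[-1] == "right":
        return "RTURN " + interpret(" ".join(tokens[:-1]))
    # Rule: primitive commands map directly to actions
    return PRIMITIVES[tokens[0]]

print(interpret("jump left twice"))  # LTURN JUMP LTURN JUMP
```

Because each rule is applied by matching the outermost modifier first and recursing on the remainder, the translation of an unseen command is deduced mechanically, with no generalization from examples required.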

