OCIM: OBJECT-CENTRIC COMPOSITIONAL IMAGINATION FOR VISUAL ABSTRACT REASONING

Abstract

A long-sought property of machine learning systems is the ability to compose learned concepts in novel ways, enabling them to make sense of new situations. Such a capacity for imagination, a core aspect of human intelligence, has not yet been attained by machines. In this work, we show that object-centric inductive biases can be leveraged to derive an imagination-based learning framework that achieves compositional generalization on a series of tasks. Our method, denoted Object-centric Compositional Imagination (OCIM), decomposes visual reasoning tasks into a sequence of primitives applied to objects, without relying on a domain-specific language. We show that these primitives can be recomposed to generate new imaginary tasks. By training on such imagined tasks, the model learns to reuse previously learned concepts and to generalize systematically at test time. We evaluate our model on a series of arithmetic tasks in which it must infer the sequence of operations (the program) applied to a series of inputs. We find that imagination is key for the model to find the correct solution for unseen combinations of operations.
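The core idea of recombining learned primitives into imagined tasks can be illustrated with a minimal sketch. This is not the paper's implementation: the primitive names (`add1`, `sub1`, `double`) and the helper functions `run_program` and `imagine_programs` are hypothetical stand-ins for whatever primitives and composition mechanism OCIM actually learns.

```python
# Illustrative sketch only: arithmetic primitives composed into
# "programs", and new imagined programs formed by recombining
# primitives from previously seen programs.
import itertools

# Hypothetical primitive set; the actual learned primitives may differ.
PRIMITIVES = {
    "add1": lambda x: x + 1,
    "sub1": lambda x: x - 1,
    "double": lambda x: x * 2,
}

def run_program(program, x):
    """Apply a sequence of primitive names to an input value."""
    for name in program:
        x = PRIMITIVES[name](x)
    return x

def imagine_programs(seen_programs, length=2):
    """Recombine primitives from seen programs into unseen combinations."""
    seen = {tuple(p) for p in seen_programs}
    used = {name for p in seen_programs for name in p}
    return [combo
            for combo in itertools.product(sorted(used), repeat=length)
            if combo not in seen]

seen = [("add1", "double"), ("sub1", "add1")]
novel = imagine_programs(seen)  # unseen recombinations of seen primitives
```

Training on the `novel` programs, in addition to the `seen` ones, is what would let a model answer correctly for combinations of operations it never observed in the original data.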

1. INTRODUCTION

Humans have the remarkable ability to adapt to new, unseen environments with little experience (Lake et al., 2017). In contrast, machine learning systems are sensitive to distribution shifts (Arjovsky et al., 2019; Su et al., 2019; Engstrom et al., 2019). One of the key aspects that makes human learning so robust is the ability to produce or acquire new knowledge by composing a few learned concepts in novel ways, an ability known as compositional generalization (Fodor and Pylyshyn, 1988; Lake et al., 2017). Although the question of how to achieve such compositional generalization in brains or machines is an active area of research (Ruis and Lake, 2022), a promising hypothesis, formalized as the Overfitted Brain Hypothesis (OBH) (Hoel, 2021), is that dreams are a crucial element. Both imagination and abstraction are core to human intelligence. Objects in particular are an important representation used by the human brain when applying analogical reasoning (Spelke, 2000). For instance, we can infer the properties of a new object by transferring our knowledge of these properties from similar objects (Mitchell, 2021). This realization has inspired a recent body of work that focuses on learning models that discover objects in a visual scene without supervision (Eslami et al., 2016b; Kosiorek et al., 2018; Greff et al., 2017; van Steenkiste et al., 2018; Greff et al., 2019; Burgess et al., 2019; van Steenkiste et al., 2019; Locatello et al., 2020). These works propose inductive biases that lead to a decomposition of a visual scene into its constituent objects. The expectation is that such an object-centric decomposition leads to better generalization, since it better reflects the underlying structure of the physical world (Parascandolo et al., 2018). To the best of our knowledge, the effect of object-centric representations on systematic generalization in visual reasoning tasks remains largely unexplored.
While abstractions, like objects, allow for reasoning and planning beyond direct experience, novel configurations of experienced concepts are made possible through imagination. Hoel (2021) goes even further and posits that dreaming, which is a form of imagination, improves the generalization and robustness of learned representations. Dreams do so by producing new perceptual events composed of concepts experienced and learned during wake time. These perceptual events can be described in terms of two knowledge types (Goyal et al., 2020; 2021b): declarative knowledge encoding object states (e.g., the entities that constitute the dream), and procedural knowledge encoding how these entities behave and interact with each other (e.g., how they are processed to form the percep-

