LEARNING ALGEBRAIC REPRESENTATION FOR ABSTRACT SPATIAL-TEMPORAL REASONING

Anonymous authors
Paper under double-blind review

Abstract

Is intelligence realized by connectionist or classicist approaches? While connectionist approaches have achieved superhuman performance on specific tasks, there is growing evidence that such task-specific superiority is particularly fragile under systematic generalization. This observation lies at the heart of the central debate between connectionists and classicists (Fodor et al., 1988; Fodor & McLaughlin, 1990), wherein the latter continually advocate an algebraic treatment in cognitive architectures. In this work, we follow the classicists' call and propose a hybrid approach to improve systematic generalization in reasoning. Specifically, we showcase a prototype with algebraic representations for the abstract spatial-temporal reasoning task of Raven's Progressive Matrices (RPM) and present the ALgebra-Aware Neuro-Semi-Symbolic (ALANS²) learner. The ALANS² learner is motivated by abstract algebra and representation theory. It consists of a neural visual perception frontend and an algebraic abstract reasoning backend: the frontend summarizes the visual information from object-based representations, while the backend transforms it into an algebraic structure and induces the hidden operator on the fly. The induced operator is then executed to predict the answer's representation, and the choice most similar to the prediction is selected as the solution. Extensive experiments show that, by incorporating an algebraic treatment, the ALANS² learner outperforms various pure connectionist models in domains requiring systematic generalization. We further show that the learned algebraic representation can be decoded via isomorphism and used to generate an answer.

1. INTRODUCTION

"Thought is in fact a kind of Algebra." -William James (James, 1891) Imagine you are given two alphabetical sequences of "c, b, a" and "d, c, b", and asked to fill in the missing element in "e, d, ?". In nearly no time will one realize the answer to be c. However, more surprising for human learning is that, effortlessly and instantaneously, we can "freely generalize" (Marcus, 2001) the solution to any partial consecutive ordered sequences. While believed to be innate in early development for human infants (Marcus et al., 1999) , such systematic generalizability has constantly been missing and proven to be particularly challenging in existing connectionist models (Lake & Baroni, 2018; Bahdanau et al., 2019) . In fact, such an ability to entertain a given thought and semantically related contents strongly implies an abstract algebra-like treatment (Fodor et al., 1988) ; in literature, it is referred to as the "language of thought" (Fodor, 1975) , "physical symbol system" (Newell, 1980), and "algebraic mind" (Marcus, 2001) . However, in stark contrast, existing connectionist models tend only to capture statistical correlation (Lake & Baroni, 2018; Kansky et al., 2017; Chollet, 2019) , rather than providing any account for a structural inductive bias where systematic algebra can be carried out to facilitate generalization. This contrast instinctively raises a question-what constitutes such an algebraic inductive bias? We argue that the foundation of the modeling counterpart to the algebraic treatment in early human development (Marcus, 2001; Marcus et al., 1999) lies in algebraic computations set up on mathematical axioms, a form of formalized human intuition and the starting point of modern mathematical reasoning (Heath et al., 1956; Maddy, 1988) . Of particular importance to the basic building blocks of algebra is the Peano Axiom (Peano, 1889). 
In the Peano Axiom, the essential components of algebra, the algebraic set and the corresponding operators over it, are governed by three statements: (1) the existence of at least one element in the field to study (the "zero" element), (2) a successor function that is recursively applied to all elements and can therefore span the entire field, and (3) the principle of mathematical induction. Building on such a fundamental axiom, we begin to form the notion of an algebraic set and induce the operator along with it to construct an algebraic structure. We hypothesize that such a treatment of algebraic computations set up on fundamental axioms is essential for a model's systematic generalizability, the lack of which will only make it sub-optimal. To demonstrate the benefits of such an algebraic treatment in systematic generalization, we showcase a prototype for Raven's Progressive Matrices (RPM) (Raven, 1936; Raven & Court, 1998), an exemplar task for abstract spatial-temporal reasoning (Santoro et al., 2018; Zhang et al., 2019a). In this task, an agent is given an incomplete 3×3 matrix consisting of eight context panels with the last one missing, and asked to pick one answer from a set of eight choices that best completes the matrix. Humans' capability in solving this abstract reasoning task has commonly been regarded as an indicator of "general intelligence" (Carpenter et al., 1990) and "fluid intelligence" (Spearman, 1923; 1927; Hofstadter, 1995; Jaeggi et al., 2008). Despite the task ideally requiring abstraction, algebraization, induction, and generalization (Raven, 1936; Raven & Court, 1998; Carpenter et al., 1990), recent endeavors unanimously propose pure connectionist models that attempt to circumvent these intrinsic cognitive requirements (Santoro et al., 2018; Zhang et al., 2019a;b; Wang et al., 2020; Zheng et al., 2019; Hu et al., 2020; Wu et al., 2020).
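As a purely illustrative aside (not the paper's implementation), the three Peano components described above admit a direct toy rendering in code: a zero element, a successor function that spans the set by repeated application, and operators defined by induction on top of them:

```python
# Toy rendering of the three Peano components (illustrative only):
# a "zero" element, a successor function, and definitions by induction.

ZERO = 0

def succ(n):
    return n + 1

def span(k):
    """Span the first k elements of the set by repeated succession."""
    elems, n = [], ZERO
    for _ in range(k):
        elems.append(n)
        n = succ(n)
    return elems

def add(m, n):
    """Addition defined by induction on n: m + 0 = m, m + s(n) = s(m + n)."""
    return m if n == ZERO else succ(add(m, n - 1))

print(span(5))    # [0, 1, 2, 3, 4]
print(add(2, 3))  # 5
```

The point of the sketch is that the entire set and its operators are generated from the axiomatic components alone, which is exactly the property the proposed algebraic representation is designed to exploit.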
However, these methods' deficiency in systematic generalization is also evident: they struggle to extrapolate to domains beyond training, as pointed out in (Santoro et al., 2018; Zhang et al., 2019b) and shown later in this paper. To address the issue, we introduce the ALgebra-Aware Neuro-Semi-Symbolic (ALANS²) learner. At a high level, the ALANS² learner is embedded in a general neuro-symbolic architecture (Yi et al., 2018; Mao et al., 2019; Han et al., 2019; Yi et al., 2020) but possesses on-the-fly operator learnability and is hence semi-symbolic. Specifically, it consists of a neural visual perception frontend and an algebraic abstract reasoning backend. For each RPM instance, the neural visual perception frontend first slides a window over each panel to obtain an object-based representation (Kansky et al., 2017; Wu et al., 2017) for every object. A belief inference engine later aggregates all object-based representations in each panel to produce its probabilistic belief state. The algebraic abstract reasoning backend then takes the belief states of the eight context panels, treats them as snapshots of an algebraic structure, lifts them into a matrix-based algebraic representation built on the Peano Axiom and representation theory (Humphreys, 2012), and induces the hidden operator in the algebraic structure by solving an inner optimization (Colson et al., 2007; Bard, 2013). The algebraic representation of the answer is predicted by executing the induced operator; its corresponding set element is decoded via the isomorphism established in representation theory, and the final answer is selected as the choice most similar to the prediction. The ALANS² learner enjoys several benefits in abstract reasoning with an algebraic treatment: 1.
Unlike previous monolithic models, the ALANS² learner offers a more interpretable account of the entire abstract reasoning process: the neural visual perception frontend extracts object-based representations and produces panel belief states by explicit probabilistic inference, whereas the algebraic abstract reasoning backend induces the hidden operator in the algebraic structure. The representation of the final answer is obtained by executing the induced operator, and the choice panel with the minimum distance to it is selected. This process closely resembles the top-down/bottom-up strategy in human reasoning: humans reason by inducing the hidden relation, executing it to generate a feasible solution in mind, and choosing the most similar available answer (Carpenter et al., 1990). Such a strategy is missing in recent literature (Santoro et al., 2018; Zhang et al., 2019a;b; Wang et al., 2020; Zheng et al., 2019; Hu et al., 2020; Wu et al., 2020). 2. While retaining the semantic interpretability and end-to-end trainability of existing neuro-symbolic frameworks (Yi et al., 2018; Mao et al., 2019; Han et al., 2019; Yi et al., 2020), ALANS² is what we call semi-symbolic in the sense that symbolic operators can be learned and concluded on the fly without manually defining each of them. Such an inductive ability also enables a greater degree of the desired generalizability. 3. By decoding the predicted representation in the algebraic structure, we can also generate an answer that satisfies the hidden relation in the context.
This work makes three major contributions: (1) We propose the ALANS² learner. Compared to existing monolithic models, the ALANS² learner adopts a neuro-semi-symbolic design, where the problem-solving process is decomposed into neural visual perception and algebraic abstract reasoning. (2) To demonstrate the efficacy of incorporating an algebraic treatment in abstract spatial-temporal reasoning, we show the superior systematic generalization of the proposed ALANS² learner in various extrapolatory RPM domains. (3) We present analyses of both the neural visual perception and the algebraic abstract reasoning. We also show the generative potential of ALANS².
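To make the induce-then-execute workflow of the backend concrete, the following sketch is built on our own simplifying assumptions, not the paper's exact bilevel formulation: panel belief states are stood in for by small matrices, the hidden operator is linear, and the first two panels of a row are bound additively. Under these assumptions, the operator is induced from the two complete rows by least squares, executed on the incomplete row, and the nearest candidate is selected:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical matrix representations of a 3x3 RPM grid. A hidden operator
# T* maps each row's first two panels to its third: M3 = T* @ (M1 + M2).
# This is a simplified stand-in for the paper's algebraic structure.
d = 4
T_true = rng.normal(size=(d, d))
rows = []
for _ in range(3):
    m1, m2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
    rows.append((m1, m2, T_true @ (m1 + m2)))

# Inner optimization: induce T from the two complete context rows by
# least squares, i.e., minimize ||T A - B||_F over T.
A = np.hstack([rows[0][0] + rows[0][1], rows[1][0] + rows[1][1]])  # (d, 2d)
B = np.hstack([rows[0][2], rows[1][2]])
T_hat = B @ np.linalg.pinv(A)

# Execute the induced operator on the incomplete row to predict the answer.
pred = T_hat @ (rows[2][0] + rows[2][1])

# Select the candidate with minimum distance to the prediction; here the
# correct panel is placed last among seven random distractors.
candidates = [rng.normal(size=(d, d)) for _ in range(7)] + [rows[2][2]]
best = min(range(8), key=lambda i: np.linalg.norm(candidates[i] - pred))
print(best)  # 7, the correct (last) candidate
```

The key design choice mirrored here is that nothing about the specific relation is hard-coded: the operator is fit per instance, which is what allows the same machinery to extrapolate to unseen relation parameters.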

