NEURAL-SYMBOLIC RECURSIVE MACHINE FOR SYSTEMATIC GENERALIZATION

Abstract

Despite their tremendous success, existing machine learning models still fall short of human-like systematic generalization: learning compositional rules from limited data and applying them to unseen combinations across domains. We propose the Neural-Symbolic Recursive Machine (NSR) to tackle this deficiency. The core representation of NSR is a Grounded Symbol System (GSS) with combinatorial syntax and semantics, which emerges entirely from training data. NSR implements a modular design comprising neural perception, syntactic parsing, and semantic reasoning, jointly learned by a deduction-abduction algorithm. We prove that NSR is expressive enough to model various sequence-to-sequence tasks. Superior systematic generalization is achieved via the inductive biases of equivariance and recursiveness embedded in NSR. In experiments, NSR achieves state-of-the-art performance on three benchmarks from different domains: SCAN for semantic parsing, PCFG for string manipulation, and HINT for arithmetic reasoning. Specifically, NSR attains 100% generalization accuracy on SCAN and PCFG and outperforms state-of-the-art models on HINT by about 23%. NSR demonstrates stronger generalization than pure neural networks owing to its symbolic representation and inductive biases, and better transferability than existing neural-symbolic approaches because it requires less domain-specific knowledge.

1. INTRODUCTION

A remarkable property underlying human intelligence is its systematic compositionality: the algebraic capacity to interpret an infinite number of novel combinations from finite known components (Chomsky, 1957), the "infinite use of finite means" (Chomsky, 1965). This type of compositionality is central to the human ability to generalize from limited data to novel combinations (Lake et al., 2017). Recently, several datasets have been proposed to test the systematic generalization of machine learning models: SCAN (Lake & Baroni, 2018), PCFG (Hupkes et al., 2020), CFQ (Keysers et al., 2020), and HINT (Li et al., 2021), to name a few. While conventional neural networks fail dramatically on these datasets, certain inductive biases have been explored to improve systematic generalization. Csordás et al. (2021) and Ontanón et al. (2022) improve Transformers' generalization by using relative positional encoding and sharing weights between layers. Chen et al. (2020) introduce a neural-symbolic stack machine to achieve nearly perfect accuracy on SCAN-like datasets. Despite the improved performance, these neural-symbolic methods often require domain-specific knowledge to design non-trivial symbolic components and are difficult to transfer to other domains.

To achieve human-like systematic generalization across a wide range of domains, we propose the Neural-Symbolic Recursive Machine (NSR), which integrates the joint learning of perception, syntax, and semantics in a principled framework. The core representation of NSR is a Grounded Symbol System (GSS) (see Fig. 1), which emerges entirely from training data without domain-specific knowledge. NSR implements a modular design for neural perception, syntactic parsing, and semantic reasoning. Specifically, we first utilize a neural network as the perception module to ground symbols in the raw inputs.
Next, the symbols are parsed into a syntax tree of the GSS by a transition-based neural dependency parser (Chen & Manning, 2014). Finally, we adopt functional programs to realize the semantic meaning of symbols (Ellis et al., 2021). Theoretically, we show that the proposed NSR is expressive enough to model various sequence-to-sequence tasks. Critically, the inductive biases of equivariance and recursiveness, encoded in each module, enable NSR to break a long input into small components, process them progressively, and compose the results, encouraging the model to learn meaningful symbols and their compositional rules. Such inductive biases are the crux of NSR's superb systematic generalization.

It is challenging to optimize NSR in an end-to-end fashion, since annotations for the internal GSS are oftentimes unavailable and NSR is not fully differentiable. To tackle this issue, we present a probabilistic learning framework and derive a novel deduction-abduction algorithm to coordinate the joint learning of the modules. In the learning phase (see Fig. 2), the model first performs greedy deduction over the modules to propose an initial GSS, which may yield incorrect results. Next, a search-based abduction is applied top-down to explore the neighborhood of the initial GSS for possible solutions; the abduction revises the GSS until it generates the correct result. As a plausible solution, the revised GSS provides pseudo supervision to train each module, facilitating the learning of individual components in NSR.

We evaluate NSR on three benchmarks from various domains to study systematic generalization: (1) SCAN (Lake & Baroni, 2018), mapping natural language commands to action sequences; (2) PCFG (Hupkes et al., 2020), predicting the output sequences of string manipulation commands; and (3) HINT (Li et al., 2021), predicting the results of handwritten arithmetic expressions.
All these datasets include multiple splits for evaluating different aspects of systematic generalization. NSR achieves state-of-the-art performance on all of them: 100% generalization accuracy on SCAN and PCFG, and an improvement of about 23% over the previous state of the art on HINT. Result analyses reveal that NSR possesses stronger generalization than pure neural networks due to its symbolic representation and inductive biases, and better transferability than existing neural-symbolic approaches due to the smaller amount of domain-specific knowledge required. We also evaluate NSR on a proof-of-concept machine translation task from Lake & Baroni (2018); the results demonstrate the promise of applying NSR to realistic domains.
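The interplay between greedy deduction and search-based abduction can be illustrated with a minimal sketch. This toy example replaces the neural perception module with a hand-coded candidate table over a tiny arithmetic domain and fixes the syntax to a single infix pattern; all names (`CANDIDATES`, `deduce`, `abduce`, etc.) are illustrative, not the authors' actual API. In NSR proper, abduction also searches over syntax trees and symbol semantics, not just perceptual groundings.

```python
import itertools

# Hypothetical perception output: each raw glyph maps to candidate symbols,
# ordered by (stubbed) perception confidence. The perception network may
# confuse visually similar glyphs, which abduction must recover from.
CANDIDATES = {
    "two": ["2", "7"],
    "three": ["3", "8"],
    "plus": ["+", "*"],
}

# Toy semantic module: each operator symbol is realized as a program.
SEMANTICS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def deduce(raw):
    """Greedy deduction: take the top-1 grounding for every glyph."""
    return [CANDIDATES[g][0] for g in raw]

def evaluate(symbols):
    """Semantic reasoning over a fixed infix (arg, op, arg) syntax."""
    a, op, b = symbols
    return SEMANTICS[op](int(a), int(b))

def abduce(raw, target):
    """Search the neighborhood of the greedy grounding for a symbol
    sequence whose evaluation matches the target result. The sequence
    found serves as pseudo supervision for training perception."""
    for symbols in itertools.product(*(CANDIDATES[g] for g in raw)):
        if evaluate(list(symbols)) == target:
            return list(symbols)
    return None  # no grounding in the neighborhood explains the target

raw = ["two", "plus", "three"]
print(deduce(raw))        # greedy grounding: ['2', '+', '3']
print(abduce(raw, 16))    # revised grounding consistent with target 16
```

When the supervision signal is `16`, the greedy grounding `2 + 3` evaluates to `5` and is rejected; abduction revises it to `2 * 8`, which then acts as pseudo-labels for the modules. The real algorithm replaces this exhaustive product with a top-down search guided by module probabilities.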

2. RELATED WORK

There has been an increasing interest in studying the systematic generalization of deep neural networks. Starting with the SCAN dataset (Lake & Baroni, 2018), multiple benchmarks across various domains have been proposed, including semantic parsing (Keysers et al., 2020; Kim & Linzen, 2020), string manipulation (Hupkes et al., 2020), visual question answering (Bahdanau et al., 2019), grounded language understanding (Ruis et al., 2020), and mathematical reasoning (Saxton et al., 2018; Li et al., 2021). These datasets serve as test beds for evaluating different aspects of generalization, including systematicity and productivity. A line of research has developed techniques for these datasets by injecting various inductive biases into deep neural networks. We categorize previous approaches into three classes by how they inject the inductive bias.

Architectural Prior. The first class of methods explores different architectures of deep neural networks for compositional generalization. Dessì & Baroni (2019) found that convolutional networks are significantly better than recurrent networks on the "jump" split of SCAN. Russin et al. (2019) improved standard RNNs by learning separate modules for syntax and semantics. Gordon et al. (2019) proposed the equivariant seq2seq model, incorporating convolution operations into RNNs to achieve local equivariance over permutation symmetries of interest, which are provided beforehand. Csordás et al. (2021) and Ontanón et al. (2022) observed that relative positional encoding and sharing weights across layers significantly improve the systematic generalization of Transformers.

Data Augmentation. The second class of methods designs schemes to generate auxiliary training data that encourage compositional generalization. Andreas (2020) performed data augmentation by replacing fragments of training samples with fragments from similar samples, and Akyürek et al. (2020) trained a generative model to recombine and resample training data. The meta sequence-to-sequence model (Lake, 2019) and the rule synthesizer (Nye et al., 2020) are trained with samples drawn from a meta-grammar whose format is close to the SCAN grammar.

Symbolic Scaffolding. The third class of methods bakes symbolic components into neural architectures to improve compositional generalization. Liu et al. (2020) connected a memory-augmented neural model with analytical expressions, simulating the reasoning process. Chen et al. (2020) integrated a symbolic stack machine into a seq2seq framework and learned a neural controller to operate the machine. Kim (2021) learned latent neural grammars for both the encoder

