SOFTENED SYMBOL GROUNDING FOR NEURO-SYMBOLIC SYSTEMS

Abstract

Neuro-symbolic learning generally consists of two separate worlds, i.e., neural network training and symbolic constraint solving, whose success hinges on symbol grounding, a fundamental problem in AI. This paper presents a novel, softened symbol grounding process that bridges the gap between the two worlds and yields an effective and efficient neuro-symbolic learning framework. Technically, the framework features (1) modeling of symbol solution states as a Boltzmann distribution, which avoids expensive state searching and facilitates mutually beneficial interactions between network training and symbolic reasoning; (2) a new MCMC technique leveraging projection and SMT solvers, which efficiently samples from disconnected symbol solution spaces; (3) an annealing mechanism that can escape sub-optimal symbol groundings. Experiments with three representative neuro-symbolic learning tasks demonstrate that, owing to its superior symbol grounding capability, our framework successfully solves problems well beyond the reach of existing proposals.

1. INTRODUCTION

Neuro-symbolic systems have been proposed to connect neural network learning and symbolic constraint satisfaction (Garcez et al., 2019; Marra et al., 2021; Yu et al., 2021; Hitzler, 2022). In these systems, the neural network component first recognizes the raw input as symbols, which are then fed into the symbolic component to produce the final output (Yi et al., 2018; Li et al., 2020; Liang et al., 2017). Such a neuro-symbolic paradigm has shown unprecedented capability and achieved impressive results in many tasks, including visual question answering (Yi et al., 2018; Vedantam et al., 2019; Amizadeh et al., 2020), vision-language navigation (Anderson et al., 2018; Fried et al., 2018), and math word problem solving (Hong et al., 2021; Qin et al., 2021), to name a few. As exemplified by Figure 1, to maximize generalizability, such problems are usually cast in a weakly-supervised setting (Garcez et al., 2022): the final output of the neuro-symbolic computation, rather than the labels of the intermediate symbols, is provided as supervision during training. The lack of direct supervision for network training calls for an effective and efficient approach to the symbol grounding problem, i.e., establishing a feasible and generalizable mapping from the raw inputs to the latent symbols. Note that bypassing symbol grounding (by, e.g., regarding the problem as learning with logic constraints) is possible, but fails to achieve satisfactory performance (Manhaeve et al., 2018; Xu et al., 2018; Pryor et al., 2022). Existing methods that incorporate symbol grounding into network learning rely heavily on a good initial model and perform poorly when starting from scratch (Dai et al., 2019; Li et al., 2020; Huang et al., 2021). A key challenge of symbol grounding lies in the semantic gap between neural learning, which is stochastic and continuous, and symbolic reasoning, which is deterministic and discrete. To bridge this gap, we propose to soften the symbol grounding.
That is, instead of directly searching for a deterministic input-symbol mapping, we optimize a Boltzmann distribution over symbol states, with an annealing strategy that gradually converges to a deterministic mapping. Intuitively, the softened Boltzmann distribution provides a playground where the search for input-symbol mappings can be guided by the neural network, and the network training can in turn be supervised by samples drawn from the distribution. Game theory provides theoretical support for this strategy (Conitzer, 2016): the softening turns the learning process into a series of mixed-strategy games during annealing, which encourages stronger interactions between the neural and symbolic worlds. The remaining challenge is how to efficiently sample feasible input-symbol mappings. Specifically, feasible solutions are extremely sparse in the entire symbol space and different solutions are poorly connected, which prevents Markov chain Monte Carlo (MCMC) sampling from efficiently exploring the solution space. To overcome this deficiency, we leverage the projection technique to accelerate the random walk for sampling (Feng et al., 2021b), aided by satisfiability modulo theories (SMT) solvers (Nieuwenhuis et al., 2006; Moura & Bjørner, 2008). The intuition is that solutions disconnected in a high-dimensional space may become connected when projected onto a low-dimensional space, yielding a rapid mixing time for the MCMC sampling (Feng et al., 2021a). The SMT solver, called on demand, serves as a generic approach to computing the inverse projection. Although MCMC sampling and SMT solvers may introduce bias, our theoretical result confirms that this bias can be offset by the proposed stochastic gradient descent algorithm.
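The annealed Boltzmann sampling idea can be illustrated with a minimal sketch. The three-state toy problem, the independence proposal, and the temperature schedule below are all invented for illustration; the paper's actual sampler operates over structured symbol spaces using projection and SMT solvers rather than an explicit list of feasible states.

```python
import math
import random

def boltzmann_weight(log_p, temperature):
    """Unnormalized Boltzmann weight exp(log P_theta(z|x) / T)."""
    return math.exp(log_p / temperature)

def metropolis_step(z, feasible, log_p, temperature, rng):
    """One Metropolis step restricted to the feasible set S_y.

    `feasible` lists the states satisfying the symbolic constraint;
    `log_p` maps a state to the network's log-probability log P_theta(z|x).
    """
    z_new = rng.choice(feasible)  # simple independence proposal
    accept = min(1.0, boltzmann_weight(log_p(z_new), temperature)
                      / boltzmann_weight(log_p(z), temperature))
    return z_new if rng.random() < accept else z

def anneal_sample(feasible, log_p, temps, steps_per_temp=100, seed=0):
    """Draw a state from the annealed Boltzmann distribution.

    As the temperature decreases, the chain concentrates on the feasible
    state with the highest network probability, i.e., the softened
    distribution hardens into a deterministic symbol grounding.
    """
    rng = random.Random(seed)
    z = rng.choice(feasible)
    for t in temps:
        for _ in range(steps_per_temp):
            z = metropolis_step(z, feasible, log_p, t, rng)
    return z

# Toy problem: three feasible states; the network prefers state 2.
feasible = [0, 1, 2]
log_p = {0: -3.0, 1: -2.0, 2: -0.5}.get
print(anneal_sample(feasible, log_p, temps=[2.0, 1.0, 0.1]))
```

At high temperature the chain wanders over all feasible states (supervising the network with diverse groundings); at low temperature it settles on the state the network prefers most.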

2. SOFTENING SYMBOL GROUNDING

Throughout this paper, we refer to X as the input space of the neuro-symbolic system, and Z as its symbol space or state space (e.g., all legal and illegal arithmetic expressions in the HWF task). We consider the neuro-symbolic computing task which first trains a neural network (parameterized by θ) that maps a raw input x ∈ X to some latent state z ∈ Z with a (variational) probability distribution P_θ(z|x). The state z is further fed into a predefined symbolic reasoning procedure to produce the final output y. The training data contain only the inputs x and the corresponding outputs y, which casts the problem into the so-called weakly-supervised setting. In general, we formulate the predefined symbolic reasoning procedure and the output y as a set of symbolic constraints S_y on the symbol space. For instance, in Figure 1, the constraint specifies that the arithmetic expression must evaluate to 42. We say a state z is feasible, or satisfies the symbolic constraint, if z ∈ S_y. The major challenge in this neuro-symbolic learning paradigm lies in the symbol grounding problem, i.e., establishing a mapping h : X → Z from the raw input to a feasible state that satisfies the symbolic constraint. Specifically, an effective mapping h should enable the model to explain as many observations as possible. As a result, the symbol grounding problem on a given dataset {(x_i, y_i)}_{i=1,...,N} can be formulated as (1)



Figure 1: An example neuro-symbolic system for handwritten formula evaluation. It takes a handwritten arithmetic expression x as input and evaluates the expression to output y. The neural network component M_θ recognizes the symbols z (i.e., digits and operators) in the expression, and the symbolic component evaluates the recognized formula by, e.g., the Python function 'eval'. The challenge in training M_θ comes from the lack of explicit z to bridge the gap between the neural world (x to z) and the symbolic world (z to y). Through softened symbol grounding, the model training and the constraint satisfaction join forces to resolve the latent z to fit both the given x and y.
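As a concrete illustration of the constraint set S_y from Figure 1, the following sketch enumerates short candidate symbol sequences and keeps those whose expressions evaluate to the target. The vocabulary, sequence length, and target value are illustrative choices, not the paper's benchmark setup.

```python
# Minimal sketch of the symbolic component: S_y contains every
# arithmetic expression that evaluates to the observed output y.
from itertools import product

def satisfies(tokens, y):
    """Check z in S_y: the token sequence must form a legal expression
    whose value equals the observed output y."""
    expr = "".join(tokens)
    try:
        return eval(expr) == y  # the paper's example symbolic reasoner
    except (SyntaxError, ZeroDivisionError):
        return False  # illegal expressions fall outside S_y

vocab = list("0123456789+-*/")
# Enumerate all length-3 candidate states and keep the feasible ones.
feasible = [z for z in product(vocab, repeat=3) if satisfies(z, 42)]
print(("6", "*", "7") in feasible)  # → True
```

The exhaustive enumeration above only works for tiny state spaces; it makes concrete why feasible states are extremely sparse, which is precisely the sampling difficulty the proposed MCMC technique addresses.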

min_{h,θ} L(h, θ) := -∑_{i=1}^{N} log P_θ(z_i | x_i)   s.t.   z_i = h(x_i) ∈ S_{y_i},  i = 1, . . . , N.   (1)
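The objective above can be computed as follows for a toy batch, using one simple (greedy) choice of the grounding h that picks the most likely feasible state per example. The state names and probabilities are fabricated for illustration; the paper's framework instead optimizes a softened distribution over groundings rather than committing to a single h.

```python
import math

def objective(batch):
    """batch: list of (probs, feasible) pairs, where probs maps each
    state z to P_theta(z|x_i) and feasible is the constraint set S_{y_i}.
    Grounds each input to its most likely feasible state and sums the
    negative log-likelihoods, as in objective (1)."""
    total = 0.0
    for probs, feasible in batch:
        z = max(feasible, key=lambda s: probs[s])  # h(x_i) in S_{y_i}
        total += -math.log(probs[z])
    return total

batch = [
    ({"a": 0.7, "b": 0.2, "c": 0.1}, {"b", "c"}),  # grounds to "b"
    ({"a": 0.1, "b": 0.1, "c": 0.8}, {"c"}),       # grounds to "c"
]
print(round(objective(batch), 4))  # → 1.8326
```

Note how the first example is forced away from its highest-probability state "a" because "a" violates the constraint; the interplay between network preference and constraint satisfaction is exactly what the grounding h must negotiate.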

