LEARNING WITH LOGICAL CONSTRAINTS BUT WITHOUT SHORTCUT SATISFACTION

Abstract

Recent studies have explored the integration of logical knowledge into deep learning via encoding logical constraints as an additional loss function. However, existing approaches tend to vacuously satisfy logical constraints through shortcuts, failing to fully exploit the knowledge. In this paper, we present a new framework for learning with logical constraints. Specifically, we address the shortcut satisfaction issue by introducing dual variables for logical connectives, encoding how the constraint is satisfied. We further propose a variational framework where the encoded logical constraint is expressed as a distributional loss that is compatible with the model's original training loss. The theoretical analysis shows that the proposed approach bears salient properties, and the experimental evaluations demonstrate its superior performance in both model generalizability and constraint satisfaction.

1. INTRODUCTION

There has been renewed interest in equipping deep neural networks (DNNs) with symbolic knowledge such as logical constraints/formulas (Hu et al., 2016; Xu et al., 2018; Fischer et al., 2019; Nandwani et al., 2019; Li & Srikumar, 2019; Awasthi et al., 2020; Hoernle et al., 2021). Typically, existing work first translates the given logical constraint into a differentiable loss function, and then incorporates it as a penalty term in the original training loss of the DNN. The benefits of this integration have been well demonstrated: it not only improves performance, but also enhances interpretability by regulating the model's behavior to satisfy particular logical constraints.

Despite this encouraging progress, existing approaches tend to suffer from the shortcut satisfaction problem, i.e., the model overfits to a particular (easy) satisfying assignment of the given logical constraint. However, not all satisfying assignments reflect the truth, and different inputs may require different assignments to satisfy the same constraint. An illustrative example is given in Figure 1. Essentially, the example considers a logical constraint P → Q, which holds when (P, Q) = (T, T), (F, F), or (F, T). However, it is observed that existing approaches tend to satisfy the constraint simply by assigning F to P for all inputs, even when the intended reading of the constraint is arguably (P, Q) = (T, T) for certain inputs (e.g., class '6' in the example). To escape the trap of shortcut satisfaction, we propose to consider how a logical constraint is satisfied by distinguishing between different satisfying assignments of the constraint for different inputs. The challenge here is the lack of direct supervision on how a constraint is satisfied beyond its truth value.
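To make the shortcut concrete, consider a common way of softening P → Q: rewrite it as the clause ¬P ∨ Q and take its soft truth value under product semantics, (1 − p) + p·q, where p and q are the model's predicted probabilities for P and Q. The sketch below (the function name and the choice of product semantics are illustrative assumptions, not necessarily the exact translation used by any cited approach) shows that driving p toward 0 makes the constraint loss vanish for every input, regardless of q:

```python
import math

def implication_loss(p, q):
    """Soft loss for P -> Q, encoded as the clause ¬P ∨ Q.

    Under product (probabilistic) semantics the soft truth value of
    the clause is (1 - p) + p * q; the loss is its negative log.
    p, q are the model's softened truth values for P and Q.
    """
    return -math.log((1 - p) + p * q)

# Shortcut: predicting P ≈ false drives the loss to zero for any q,
# so the constraint is "satisfied" without Q ever being learned.
for q in (0.9, 0.5, 0.1):
    print(round(implication_loss(0.01, q), 4))  # all close to zero
```

The minimizer is free to pick this trivial assignment because the loss only measures the truth value of the constraint, not which satisfying assignment realizes it.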
However, our insight is that, by addressing this "harder" problem, we can make more room for the reconciliation between the logic information and the training data, achieving better model performance and logic satisfaction at the same time. To this end, when translating a logical constraint into a loss function, we introduce a dual variable for each operand of the logical connectives in the conjunctive normal form (CNF) of the constraint. The dual variables, together with the softened truth values of the logical variables, provide a working interpretation of how the logical constraint is satisfied. Take the example in Figure 1: for the satisfaction of P → Q, we consider its CNF ¬P ∨ Q and introduce two variables τ1 and τ2 to indicate the weights of the

