DEEP ADAPTIVE SEMANTIC LOGIC (DASL): COMPILING DECLARATIVE KNOWLEDGE INTO DEEP NEURAL NETWORKS

Abstract

We introduce Deep Adaptive Semantic Logic (DASL), a novel framework for automating the generation of deep neural networks that incorporates user-provided formal knowledge to improve learning from data. We provide formal semantics demonstrating that our knowledge representation captures all of first order logic and that finite sampling from infinite domains converges to correct truth values. DASL's representation improves on prior neuro-symbolic work by avoiding vanishing gradients, allowing deeper logical structure, and enabling richer interactions between the knowledge and learning components. We illustrate DASL through a toy problem in which we add structure to an image classification task and demonstrate that knowledge of that structure reduces data requirements by a factor of 1000. We then apply DASL to a visual relationship detection task and demonstrate that the addition of commonsense knowledge improves performance by 10.7% in conditions of data scarcity.

1. INTRODUCTION

Early work on Artificial Intelligence focused on Knowledge Representation and Reasoning (KRR) through the application of techniques from mathematical logic (Genesereth & Nilsson, 1987). The compositionality of KRR techniques provides expressive power for capturing expert knowledge in the form of rules or assertions (declarative knowledge), but these techniques are brittle and do not generalize or scale well. More recent work has focused on Deep Learning (DL), in which the parameters of complex functions are estimated from data (LeCun et al., 2015). DL techniques learn to recognize patterns not easily captured by rules and generalize well from data, but they often require large amounts of training data and in most cases do not reason at all (Yang et al., 2017; Garcez et al., 2012; Marcus, 2018; Weiss et al., 2016).

In this paper we present Deep Adaptive Semantic Logic (DASL), a framework that exploits the complementary strengths of KRR and DL by fitting a model simultaneously to data and declarative knowledge. DASL enables robust abstract reasoning and the application of domain knowledge to reduce data requirements and control model generalization. DASL represents declarative knowledge as assertions in first order logic. The relations and functions that make up the vocabulary of the domain are implemented by neural networks that can have arbitrary structure. The logical connectives in the assertions compose these networks into a single deep network that is trained to maximize the truth of the assertions. Figure 1 provides an example network that implements a simple rule set through composition of network components performing image classification. The logical quantifiers "for all" and "there exists" generate subsamples of the data on which the network is trained. DASL treats labels as assertions about data, removing any distinction between knowledge and data.
This provides a mechanism by which supervised, semi-supervised, unsupervised, and distantly supervised learning can take place simultaneously in a single network under a single training regime.

The field of neuro-symbolic computing (Garcez et al., 2019) focuses on combining logical and neural network techniques in general, and the approach of Serafini & Garcez (2016) may be the closest prior work to DASL. To obtain differentiable functions that support backpropagation, these approaches replace the pure Boolean values 0 and 1 for False and True with continuous values from [0, 1] and implement the Boolean connectives with fuzzy logic operators. These operators generally employ maximum or minimum functions, which remove all gradient information at the limits, or else a product, which drives derivatives toward 0 so that there is very little gradient for learning (see subsection A.7). DASL circumvents these issues by using a logit representation of truth values, whose range is all real numbers.

Approaches to knowledge representation, both in classical AI and in neuro-symbolic computing, often restrict the language to fragments of first order logic (FOL) in order to reduce computational complexity. We demonstrate that DASL captures full FOL, with arbitrarily nested quantifiers, function symbols, and equality, by providing a single formal semantics that unifies DASL models with classical Tarski-style model theory (Chang & Keisler, 1973). We show that DASL is sound and complete for full FOL. FOL requires infinite models in general, but we show that iterated finite sampling converges to correct truth values in the limit.

In this paper we apply DASL to learning from small amounts of data in two computer vision problems. The first is an illustrative toy problem based on the MNIST handwritten digit classification task. The second is the well-known challenge problem of detecting visual relationships in images.
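To see why working with truth values in an unbounded (log-space) representation matters, consider a minimal numerical sketch of the vanishing-gradient argument. The operators below are illustrative assumptions, not DASL's actual connectives: a conjunction of k soft truth values is computed once as a product of probabilities and once as a sum of log-probabilities of the same logits, and the gradient with respect to one conjunct's logit is compared.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

k = 50     # number of conjuncts in a deep conjunction
x = -2.0   # shared logit for every conjunct (each "mostly false")

# (a) Product t-norm on probabilities: AND = p^k with p = sigmoid(x).
# d(AND)/dx_i = sigmoid'(x) * p^(k-1), which shrinks geometrically in k.
p = sigmoid(x)
grad_prob = p * (1.0 - p) * p ** (k - 1)

# (b) Log-space conjunction: log(AND) = k * log p.
# d(log AND)/dx_i = sigmoid(-x), independent of k.
grad_log = sigmoid(-x)

print(f"product-space gradient: {grad_prob:.3e}")  # ~1e-46, effectively zero
print(f"log-space gradient:     {grad_log:.3f}")   # ~0.881, usable for learning
```

The product-space gradient underflows toward zero as the conjunction deepens, while the log-space gradient stays bounded away from zero, which is the qualitative behavior the paper attributes to its logit representation.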
In both cases, we demonstrate that the addition of declarative knowledge improves the performance of a vanilla DL model. This paper makes the following contributions:

1. The novel framework DASL, which compiles a network from declarative knowledge and bespoke, reusable, domain-specific component networks, enabling gradient-based learning of model components;
2. Grounding of the proposed framework in model theory, formally proving its soundness and completeness for full first order logic;
3. A logit representation of truth values that avoids vanishing gradients and allows deep logical structure in neural-symbolic systems;
4. Syntactic extensions that allow (i) restricted quantification over predicates and functions without violating first order logic constraints, and (ii) novel hybrid network architectures;
5. Evaluation on two computer vision problems with limited training data, demonstrating that knowledge reduces the data required for learning deep models: a factor of 1000 for the MNIST toy problem and a 10.7% accuracy improvement for visual relationship detection in conditions of data scarcity.
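The treatment of quantifiers as data subsampling described in the introduction can be sketched as follows. The function name `forall`, the mean aggregation of per-element truth values, and the batch size are hypothetical simplifications for illustration, not DASL's actual implementation:

```python
import random

def forall(domain, predicate, batch_size=32):
    """Approximate a universal quantifier by evaluating the predicate's
    soft truth value on a random mini-batch and aggregating (here: mean).
    In training, the quantifier thus acts as a subsampling node whose
    output is maximized toward 'true'."""
    batch = random.sample(domain, min(batch_size, len(domain)))
    return sum(predicate(x) for x in batch) / len(batch)

# Trivially true assertion over a small finite domain:
digits = list(range(10))
truth = forall(digits, lambda d: 1.0 if d < 10 else 0.0)
print(truth)  # 1.0
```

With an infinite (or merely large) domain, each evaluation sees only a finite sample; the paper's convergence result is what licenses trusting such iterated finite samples in the limit.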



Figure 1: DASL integrates user-provided expert knowledge with training data to learn DNNs. It achieves this by compiling a DNN from knowledge, expressed in first order logic, and domain-specific neural components. This DNN is trained using backpropagation, fitting both the data and the knowledge. Here DASL applies commonsense knowledge to the visual relationship detection task. ∧ and → denote the 'and' and 'implies' connectives, respectively.

