NEURAL PROBABILISTIC LOGIC PROGRAMMING IN DISCRETE-CONTINUOUS DOMAINS

Abstract

Neural-symbolic AI (NeSy) methods allow neural networks to exploit symbolic background knowledge. NeSy has been shown to aid learning in the limited data regime and to facilitate inference on out-of-distribution data. Neural probabilistic logic programming (NPLP) is a popular NeSy approach that integrates probabilistic models with neural networks and logic programming. A major limitation of current NPLP systems, such as DeepProbLog, is their restriction to discrete and finite probability distributions, e.g., binary random variables. To overcome this limitation, we introduce DeepSeaProbLog, an NPLP language that supports discrete and continuous random variables on (possibly) infinite and even uncountable domains. Our main contributions are 1) the introduction of DeepSeaProbLog and its semantics, 2) an implementation of DeepSeaProbLog that supports inference and gradient-based learning, and 3) an experimental evaluation of our approach.

1. INTRODUCTION

Neural-symbolic AI (NeSy) (Garcez et al., 2002; De Raedt et al., 2021) focuses on the integration of symbolic and neural methods. The advantage of NeSy methods is that they combine the reasoning power of logical representations with the learning capabilities of neural networks. Such methods have been shown to converge faster during learning and to be more robust (Rocktäschel and Riedel, 2017; Xu et al., 2018; Evans and Grefenstette, 2018). The challenge of NeSy lies in combining discrete symbols with continuous and differentiable neural representations. So far, this has been accomplished by interpreting the outputs of neural networks as the weights of Boolean variables. These weights can be given either a fuzzy semantics (Donadello et al., 2017; Diligenti et al., 2017) or a probabilistic semantics (Manhaeve et al., 2018; Yang et al., 2020). The latter is also used in neural probabilistic logic programming (NPLP) (De Raedt et al., 2019), where neural networks parametrize probabilistic logic programs. A shortcoming of traditional probabilistic NeSy approaches is that they fail to capture models that integrate continuous random variables and neural networks, a feature that has already been achieved with mixture density networks (Bishop, 1994) and, more generally, within the deep probabilistic programming (DPP) setting (Tran et al., 2017; Bingham et al., 2019). Despite the expressiveness of these methods, they have so far focused on efficient probabilistic inference in continuous domains, e.g., via Hamiltonian Monte Carlo or variational inference. It is unclear whether they can be generalised to enable logical and relational reasoning. This exposes a gap between DPP and NeSy, as reasoning is, after all, a fundamental component of the latter. We close the DPP-NeSy gap by introducing DeepSeaProbLog ('Sea' stands for the letter C, as in continuous random variable). DeepSeaProbLog is an NPLP language with support for discrete-continuous random variables that retains logical and relational reasoning capabilities.
More concretely, we allow neural networks to parametrize arbitrary differentiable probability distributions. We achieve this using the reparameterization trick (Ruiz et al., 2016) and continuous relaxations (Petersen et al., 2021). This stands in contrast to DeepProbLog (Manhaeve et al., 2018), where only finite categorical distributions are supported. Our main contributions are (1) the well-defined probabilistic semantics of DeepSeaProbLog, a differentiable discrete-continuous NPLP language, (2) an implementation of inference and gradient-based learning algorithms, and (3) an experimental evaluation showing the necessity of discrete-continuous reasoning and the efficacy of our approach.
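To make the role of the reparameterization trick concrete, the following self-contained sketch (our own illustration, not the paper's implementation) estimates the gradient of an expectation over a normal distribution by sampling through a deterministic, differentiable transformation of parameter-free noise:

```python
import random
import statistics

def reparameterized_samples(mu, sigma, n, seed=0):
    """Draw X ~ N(mu, sigma^2) via X = mu + sigma * eps with eps ~ N(0, 1).

    The sample is a deterministic, differentiable function of (mu, sigma);
    all randomness lives in the parameter-free noise eps.
    """
    rng = random.Random(seed)
    return [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]

mu, sigma = 1.5, 0.3
xs = reparameterized_samples(mu, sigma, 100_000)

# Pathwise gradient of d/dmu E[X^2]: since dX/dmu = 1, d(X^2)/dmu = 2X,
# so averaging 2X over the samples estimates the gradient. Analytically,
# E[X^2] = mu^2 + sigma^2, hence d/dmu E[X^2] = 2*mu = 3.0 here.
grad_estimate = statistics.fmean(2.0 * x for x in xs)
print(round(grad_estimate, 1))  # close to 2*mu = 3.0
```

In an NPLP setting, mu and sigma would be outputs of a neural network, and the same pathwise construction lets gradients flow from a query's probability back into the network parameters.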

2. LOGIC PROGRAMMING CONCEPTS

A term t is either a constant c, a variable V, or a structured term of the form f(t1,...,tK), where f is a functor and each ti is a term. Atoms are expressions of the form q(t1,...,tK), where q/K is a predicate of arity K and each ti is a term. A literal is an atom or the negation of an atom, ¬q(t1,...,tK). A definite clause (also called a rule) is an expression of the form h :- b1,...,bK, where h is an atom and each bi is a literal. Within the context of a rule, h is called the head and the conjunction of the bi's is referred to as the body of the rule. Rules with an empty body are called facts. A logic program is a finite set of definite clauses. If an expression does not contain any variables, it is called ground. Ground expressions are obtained from non-ground ones by means of substitution. A substitution θ = {V1 = t1, ..., VK = tK} is a mapping from variables Vi to terms ti. Applying a substitution θ to an expression e (denoted eθ) replaces each occurrence of Vi in e with the corresponding ti.

While pure Prolog (or definite clause logic) is defined using the concepts above, practical implementations of Prolog extend definite clause logic with an external arithmetic engine (Sterling and Shapiro, 1994, Section 8). Such engines enable the use of system-specific routines in order to handle numeric data efficiently. Analogous to the standard terms of definite clause logic defined above, we introduce numeric terms. A numeric term ni is either a numeric constant (a real, an integer, a float, etc.), a numeric variable Ni, or a numerical functional term, i.e., an expression of the form φ(n1,...,nK) where φ is an externally defined numerical function. The difference between a standard logical term and a numerical term is that ground numerical terms are evaluated and yield a numeric constant. For instance, if add is a function, then add(3, add(5, 0)) evaluates to the numeric constant 8.
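The evaluation of ground numerical terms can be mimicked with a small interpreter. The representation below (nested tuples of a functor name and its arguments) is our own illustrative choice, not the paper's:

```python
# Externally defined numerical functions available to the arithmetic engine.
FUNCTIONS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def evaluate(term):
    """Evaluate a ground numeric term to a numeric constant.

    A term is either a number (a numeric constant) or a tuple
    (functor, arg1, ..., argK) denoting a numerical functional term.
    """
    if isinstance(term, (int, float)):
        return term
    functor, *args = term
    return FUNCTIONS[functor](*(evaluate(a) for a in args))

# add(3, add(5, 0)) evaluates to the numeric constant 8.
print(evaluate(("add", 3, ("add", 5, 0))))  # -> 8
```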
Lastly, numeric constants can be compared to each other using a built-in binary comparison operator ▷◁ ∈ {<, =<, >, >=, =:=, =\=}. Here we use Prolog syntax for comparison operators; they correspond to {<, ≤, >, ≥, =, ≠} in standard mathematical notation. Comparisons appear in the body of a rule, take two arguments, and are generally written as φl(nl,1,...,nl,K) ▷◁ φr(nr,1,...,nr,K). Assuming everything is ground, they evaluate their left-hand and right-hand sides and subsequently compare the results. If the stated comparison holds, the logic program interprets the comparison as true, and otherwise as false.
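Extending the same illustrative interpreter idea, a comparison evaluates both of its ground sides and then applies the stated operator (the operator table mapping Prolog syntax to Python is our own stand-in for the external engine):

```python
import operator

# Prolog comparison operators mapped to their mathematical counterparts.
COMPARISONS = {
    "<": operator.lt, "=<": operator.le, ">": operator.gt,
    ">=": operator.ge, "=:=": operator.eq, "=\\=": operator.ne,
}

def evaluate(term):
    """Evaluate a ground numeric term (number or (functor, args...) tuple)."""
    if isinstance(term, (int, float)):
        return term
    functor, *args = term
    funcs = {"add": lambda a, b: a + b}
    return funcs[functor](*(evaluate(a) for a in args))

def compare(left, op, right):
    """Evaluate both sides, then apply the comparison operator."""
    return COMPARISONS[op](evaluate(left), evaluate(right))

print(compare(("add", 3, 5), "=:=", 8))   # -> True
print(compare(("add", 3, 5), "=\\=", 8))  # -> False
```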

3. DEEPSEAPROBLOG

3.1 SYNTAX

While facts in pure Prolog are deterministically true, facts in probabilistic logic programs are annotated with the probability with which they are true. These are the so-called probabilistic facts (De Raedt et al., 2007). When working in discrete-continuous domains, we need the more general concept of distributional facts (Zuidberg Dos Martires, 2020), inspired by the distributional clauses of Gutmann et al. (2011).

Definition 3.1 (Distributional fact). Distributional facts are expressions of the form x ~ distribution(n1,...,nK), where x denotes a term, the ni's are numeric terms, and distribution expresses the probability distribution according to which x is distributed. The meaning of a distributional fact is that every ground instance xθ serves as a random variable distributed according to distribution(n1,...,nK)θ. All variables appearing on the right-hand side of a distributional fact must also appear on its left-hand side.

Definition 3.2 (Neural distributional fact). A neural distributional fact (NDF) is a distributional fact in which a subset {f1,...,fL} ⊆ {n1,...,nK} of the numeric terms is implemented by neural networks that depend on a set of neural parameters {λ1,...,λL}.

Example 3.1 (DeepSeaProbLog program). Consider the DeepSeaProbLog program below, where humid(Data) denotes a Bernoulli random variable that takes the value 1 with probability p given by the output of a neural network humidity_detector, and temp(Data) denotes a normally distributed variable whose parameters are predicted by a network temperature_predictor. The program further contains two rules that deduce whether we have good weather or not.
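The program itself is missing from this copy of the text. The following ProbLog-style sketch is merely consistent with the description above: the predicate and network names come from the text, while the exact DeepSeaProbLog syntax, the rule bodies, and the temperature threshold are our assumptions.

```prolog
% Neural distributional facts: network outputs parameterize the distributions.
humid(Data) ~ bernoulli(humidity_detector(Data)).
temp(Data) ~ normal(temperature_predictor(Data)).

% Two rules deducing good or bad weather; the comparison with 20 (degrees)
% is purely illustrative.
good_weather(Data) :- humid(Data) =:= 0, temp(Data) > 20.
bad_weather(Data) :- \+ good_weather(Data).
```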




