TEACHING TEMPORAL LOGICS TO NEURAL NETWORKS *

Abstract

We study two fundamental questions in neuro-symbolic computing: can deep learning tackle challenging problems in logics end-to-end, and can neural networks learn the semantics of logics? In this work we focus on linear-time temporal logic (LTL), as it is widely used in verification. We train a Transformer to directly predict a solution, i.e. a satisfying trace, for a given LTL formula. The training data is generated with classical solvers, which, however, provide only one of many possible solutions to each formula. We demonstrate that it is sufficient to train on those particular solutions, and that Transformers can predict solutions even for formulas from literature benchmarks on which the classical solver timed out. Transformers also generalize to the semantics of the logics: while they often deviate from the solutions found by the classical solvers, they still predict correct solutions to most formulas.

1. INTRODUCTION

Machine learning has revolutionized several areas of computer science, such as image recognition (He et al., 2015), face recognition (Taigman et al., 2014), translation (Wu et al., 2016), and board games (Moravcík et al., 2017; Silver et al., 2017). For complex tasks that involve symbolic reasoning, however, deep learning techniques are still considered insufficient. Applications of deep learning to logical reasoning problems have therefore focused on sub-problems within larger logical frameworks, such as computing heuristics in solvers (Lederman et al., 2020; Balunovic et al., 2018; Selsam & Bjørner, 2019) or predicting individual proof steps (Loos et al., 2017; Gauthier et al., 2018; Bansal et al., 2019; Huang et al., 2018). Recently, however, the assumption that deep learning is not yet ready to tackle hard logical questions was called into question. Lample & Charton (2020) demonstrated that Transformer models (Vaswani et al., 2017) perform surprisingly well on symbolic integration, Rabe et al. (2020) showed that self-supervised training leads to mathematical reasoning abilities, and Brown et al. (2020) showed that large-enough language models learn basic arithmetic despite being trained on mostly natural language sources. This raises the question whether other problems that are thought to require symbolic reasoning also lend themselves to a direct learning approach. We study the application of Transformer models to challenging logical problems in verification. We thus consider linear-time temporal logic (LTL) (Pnueli, 1977), which is widely used in the academic verification community (Dwyer et al., 1998; Li et al., 2013; 2014; Duret-Lutz et al., 2016; Rozier & Vardi, 2007; Schuppan & Darmawan, 2011; Schwendimann, 1998) and is the basis for industrial hardware specification languages like the IEEE standard PSL (IEEE-Commission et al., 2005). LTL specifies infinite sequences and is typically used to describe system behaviors.
For example, LTL can specify that some proposition P must hold at every point in time (□P) or that P must hold at some future point in time (◇P). By combining these operators, one can specify that P must occur infinitely often (□◇P). In this work, we apply a direct learning approach to the fundamental problem for LTL: finding a satisfying trace for a given formula. In applications, solutions to LTL formulas can represent (counter)examples for a specified system behavior, and over the last decades, generations of advanced algorithms have been developed to solve this question automatically. We start from the standard benchmark distribution of LTL formulas, consisting of conjunctions of patterns typically encountered in practice (Dwyer et al., 1998). We then use classical algorithms, notably the tool spot (Duret-Lutz et al., 2016), which implements competitive classical algorithms, to generate solutions to formulas from this distribution, and train a Transformer model to predict these solutions directly. Relatively small Transformers perform very well on this task: our models predict correct solutions to 96.8% of the formulas from a held-out test set (see Figure 1). Impressively, Transformers still predict correct solutions in 83% of the cases even for formulas on which spot timed out. This means that, already today, direct machine learning approaches may be useful to augment classical algorithms in logical reasoning tasks. We also study two generalization properties of the Transformer architecture that are important for logical problems. First, we present detailed analyses of the generalization to longer formulas. It turns out that Transformers trained with tree-positional encodings (Shiv & Quirk, 2019) generalize to much longer formulas than they were trained on, while Transformers trained with the standard positional encoding, as expected, do not generalize to longer formulas.
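The tree-positional encodings of Shiv & Quirk (2019) replace sequential positions with the root-to-node path in the formula's syntax tree: each step on the path is one-hot encoded by child index and the result is zero-padded to a fixed maximum depth. The sketch below is illustrative only; the nested-tuple formula representation and the function names are our assumptions, not the paper's actual implementation.

```python
def tree_positional_encoding(path, max_depth=8, branching=2):
    """Encode a root-to-node path (a tuple of child indices) as a flat
    vector of length max_depth * branching, one-hot per tree level."""
    vec = [0.0] * (max_depth * branching)
    for depth, child in enumerate(path):
        vec[depth * branching + child] = 1.0
    return vec

def walk(tree, path=()):
    """Traverse a formula given as nested tuples, e.g.
    ('U', 'a', ('G', 'b')) for "a U (G b)", yielding (token, path)."""
    if isinstance(tree, str):          # atomic proposition (leaf)
        yield tree, path
        return
    op, *children = tree
    yield op, path
    for i, child in enumerate(children):
        yield from walk(child, path + (i,))

# Each formula token is then paired with its path encoding:
formula = ('U', 'a', ('G', 'b'))
encoded = [(tok, tree_positional_encoding(p)) for tok, p in walk(formula)]
```

Because the encoding depends only on the path within the tree, not on the token's index in the flattened sequence, a subformula keeps the same relative encoding wherever it appears, which is one plausible explanation for the better length generalization reported above.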
The second generalization property studied here is the question whether Transformers merely learn to imitate the generator of the training data, or whether they learn to solve formulas according to the semantics of the logic. This distinction is possible because most formulas admit many satisfying traces. In Figure 1 we highlight that our models often predict traces that satisfy the formula but differ from the trace found by the classical algorithm with which we generated the data. Especially when testing the models out-of-distribution, we observed that almost no predicted trace equals the solution proposed by the classical solver. To demonstrate that these generalization behaviors are not specific to the benchmark set of LTL formulas, we also present experimental results on random LTL formulas. Further, we rule out that spot, the tool with which we generate example traces, is responsible for these behaviors by repeating the experiments on propositional formulas, for which we generate the solutions with SAT solvers.
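Judging a prediction as "semantically correct but different from the solver's output" requires evaluating LTL semantics on the predicted trace, which is ultimately periodic: a finite prefix followed by a loop. A minimal illustrative checker is sketched below; the nested-tuple formula encoding and the trace representation (a list of sets of true propositions, with `loop` giving the index the trace jumps back to after its last step) are our assumptions, not the verification machinery the authors actually used.

```python
def holds(f, trace, loop, i=0):
    """Check LTL formula f (nested tuples; strings are atomic
    propositions) at position i of the ultimately periodic trace
    trace[0], trace[1], ..., trace[-1], trace[loop], trace[loop+1], ..."""
    if isinstance(f, str):                         # atomic proposition
        return f in trace[i]
    op = f[0]
    succ = lambda j: j + 1 if j + 1 < len(trace) else loop
    # Positions visited from i onward: the rest of the prefix plus the loop.
    reachable = sorted(set(range(i, len(trace))) | set(range(loop, len(trace))))
    if op == '!':
        return not holds(f[1], trace, loop, i)
    if op == '&':
        return holds(f[1], trace, loop, i) and holds(f[2], trace, loop, i)
    if op == '|':
        return holds(f[1], trace, loop, i) or holds(f[2], trace, loop, i)
    if op == 'X':                                  # next
        return holds(f[1], trace, loop, succ(i))
    if op == 'G':                                  # globally (□)
        return all(holds(f[1], trace, loop, j) for j in reachable)
    if op == 'F':                                  # finally (◇)
        return any(holds(f[1], trace, loop, j) for j in reachable)
    if op == 'U':                                  # f[1] until f[2]
        # Scanning the path up to one full loop traversal suffices,
        # since positions repeat afterwards.
        path = list(range(i, len(trace))) + list(range(loop, len(trace)))
        for j in path:
            if holds(f[2], trace, loop, j):
                return True
            if not holds(f[1], trace, loop, j):
                return False
        return False
    raise ValueError(f"unknown operator {op}")

# Example: the trace ({} {a})^ω satisfies □◇a even though a
# solver might instead have produced the trace ({a})^ω.
assert holds(('G', ('F', 'a')), [set(), {'a'}], loop=0)
```

With such a checker, syntactic accuracy (exact match with the solver's trace) and semantic accuracy (any satisfying trace) can be measured separately, which is the distinction drawn in Figure 1.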



Figure 1: Performance of our best models trained on practical pattern formulas. The x-axis shows the formula size. Syntactic accuracy, i.e., instances where the Transformer agrees with the generator, is displayed in dark green. Instances where the Transformer deviates from the generator's output but still provides a correct solution are displayed in light green; incorrect predictions in orange.

Funding

* Partially supported by the European Research Council (ERC) Grant OSARES (No. 683300) and the Collaborative Research Center "Foundations of Perspicuous Software Systems" (TRR 248, 389792660).

