FORMAL SPECIFICATIONS FROM NATURAL LANGUAGE

Abstract

We study the generalization abilities of language models when translating natural language into formal specifications with complex semantics. In particular, we fine-tune language models on three datasets consisting of English sentences and their corresponding formal representations: 1) regular expressions (regex), frequently used in programming and search; 2) first-order logic (FOL), commonly used in software verification and theorem proving; and 3) linear-time temporal logic (LTL), which forms the basis for industrial hardware specification languages. Our experiments show that, across these diverse domains, the language models retain their pre-trained knowledge of natural language and generalize, e.g., to new variable names or operator descriptions. Additionally, they achieve competitive performance, and even outperform the state of the art at translating into regular expressions, while being easy to access, efficient to fine-tune, and free of any particular need for domain-specific reasoning.

1. INTRODUCTION

Translating natural language into formal languages is a long-standing goal of artificial intelligence research dating back to the 1960s (e.g., Weizenbaum (1966); Winograd (1971)). Due to recent progress in deep learning (especially Vaswani et al. (2017)) and the development of language models (LMs), the field has seen significant improvements, for instance, in the translation from natural language into coding languages or formal mathematics (e.g., Lewkowycz et al. (2022); Chowdhery et al. (2022); Chen et al. (2021); Wu et al. (2022)). In this paper, we study the generalization abilities of a pre-trained LM when translating natural language into formal specification languages. Formal specification languages are used across computer science to describe a system's desired behavior, including in systems design, requirements analysis, and automated reasoning. Examples include specification languages based on logics, such as Alloy (Jackson, 2002) and LTL (Pnueli, 1977), system specification languages based on state charts, such as SDL (Fonseca i Casas et al., 2013), and text processing specifications based on regular languages, omega-regular languages, and automata theory (Aho, 1991; Thomas, 1990). Compared to natural language, the benefit of a formal specification language is its unambiguous semantics, which makes it accessible to algorithmic work that relies on a specification as input. Examples are high-performance SAT and SMT solvers (e.g., Sorensson & Een (2005); Biere et al. (2013); Audemard & Simon (2018); Moura & Bjørner (2008); Barrett et al. (2011)), planning tools (LaValle, 2006), model checkers (e.g., Cimatti et al. (2002); Holzmann (1997); Behrmann et al. (2006)), hardware synthesis tools (e.g., Bohy et al. (2012); Faymonville et al. (2017); Meyer et al. (2018)), and automatic theorem provers (e.g., Bertot & Castéran (2013); Nipkow et al. (2002)). Despite their benefits and various application areas, formal specification languages are still almost exclusively used by domain experts, as their application requires significant domain-specific knowledge and extensive manual work. With the success of LMs, the goal of making the techniques mentioned above available to a broader user base, and thereby increasing the correctness, trust, and assurance in computer systems, is finally getting closer.

So far, efforts in utilizing deep learning to translate natural language into formal specifications have relied on training (often over-engineered) neural networks from scratch (e.g., Singh et al. (2020); He et al. (2022)). Such approaches are naturally limited in their generalization capabilities. Two natural questions arise: 1) Can off-the-shelf LMs achieve competitive performance when fine-tuned on this challenging translation task? 2) How well do they generalize with their pre-trained knowledge of natural language? In this work, we initiate a study on this topic by fine-tuning the open-source transformer language model T5 (Raffel et al., 2020). The transformer architecture (Vaswani et al.,
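To make the three target formalisms concrete, the following Python sketch pairs an English description with a formal counterpart in each domain and checks it against its semantics. The example pairs and helper names (`holds_forall_implies`, `globally`, `eventually`) are illustrative choices of ours, not drawn from the paper's datasets, and the LTL semantics is deliberately simplified to finite trace prefixes (standard LTL is interpreted over infinite traces):

```python
# Illustrative English -> formal-specification pairs in the three domains
# studied here; the concrete formulas are hypothetical examples, not
# samples from the training data.
import re

# 1) Regular expressions: "lines containing the word 'error'"
regex = r".*error.*"
assert re.fullmatch(regex, "fatal error in line 3")
assert not re.fullmatch(regex, "all tests passed")

# 2) First-order logic over a finite domain:
# "every even number has an even square", i.e. forall x. P(x) -> Q(x)
def holds_forall_implies(domain, P, Q):
    return all((not P(x)) or Q(x) for x in domain)

assert holds_forall_implies(range(10),
                            lambda x: x % 2 == 0,
                            lambda x: (x * x) % 2 == 0)

# 3) LTL, simplified to finite trace prefixes: a trace is a list of sets
# of atomic propositions holding at each step.
def globally(p, trace):
    """G p: p holds at every position of the trace."""
    return all(p in step for step in trace)

def eventually(p, trace):
    """F p: p holds at some position of the trace."""
    return any(p in step for step in trace)

# "every request is eventually followed by a grant": G (req -> F grant)
def req_implies_eventually_grant(trace):
    return all(eventually("grant", trace[i:])
               for i, step in enumerate(trace) if "req" in step)

trace = [{"req"}, {"req", "grant"}, {"grant"}]
assert req_implies_eventually_grant(trace)
assert not globally("grant", trace)
```

In each domain, the formal artifact has a machine-checkable semantics, which is precisely what makes automatically translated specifications usable as input to the downstream tools discussed above.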

