LEARNING TO INFER RUN-TIME INVARIANTS FROM SOURCE CODE

Abstract

Source code is notably different from natural language in that it is meant to be executed. Experienced developers infer complex "invariants" about run-time state while reading code, which helps them to constrain and predict program behavior. Knowing these invariants can be helpful; yet developers rarely encode these explicitly, so machine-learning methods don't have much aligned data to learn from. We propose an approach that adapts cues within existing if-statements regarding explicit run-time expectations to generate aligned datasets of code and implicit invariants. We also propose a contrastive loss to inhibit generation of illogical invariants. Our model learns to infer a wide vocabulary of invariants for arbitrary code, which can be used to detect and repair real bugs. This is entirely complementary to established approaches, which either use logical engines that scale poorly, or run-time traces that are expensive to obtain; when present, that data can complement our tool, as we demonstrate in conjunction with Daikon, an existing tool. Our results show that neural models can derive useful representations of run-time behavior directly from source code.

1. INTRODUCTION

Software maintenance requires reading a lot of code. Experienced developers are adept at this, garnering rich semantics just from this "static" (viz., without running the code) inspection to find complex bugs, predict a function's outputs from its inputs, and learn new coding patterns. They rely strongly on generic assumptions about the program's run-time behavior; e.g., that a list index never escapes the list bounds and strictly increases. Such "invariants" capture general, yet relevant, constraints on the program's expected run-time behavior. Automatically inferring invariants can help both developers and tools: first, they can be used to detect bugs where explicit assumptions are incorrect or implicit ones ought to be made explicit; second, invariants can guide myriad other tools, such as test-case generators (Artzi et al., 2006). However, inferring invariants is not tractable in general, and sound approximations don't scale beyond very small programs. Instead, popular tools either use dynamic trace data from real executions (esp. Daikon (Ernst et al., 2007)), which requires costly instrumentation, or focus on highly constrained cases such as loops (Sharma et al., 2013a; Padhi et al., 2016).

Figure 1: A snippet that demonstrates how explicitly guarded code is often equivalent to code with salient implicit, invariant-like conditions. The code on the right was a real bug that was patched by adding the conditional check on the left. We synthesize such samples to train our model by selectively removing if-statements. Our model correctly predicted this repair.

Yet this scalability obstacle may be largely artificial. Practical programs rarely take on an exponential range of values (e.g., integers tend to come in a bounded range), and developers seem able to make such inferences without undertaking a project-scale analysis. Rather, they reliably extract them from a local context, using their past experience and cues from the code itself.
Consider the snippet in Figure 1: the program on the right uses a time variable, returned from one method and passed to another. Not only is 'time' generally non-negative; in this particular case, we should also not update a position (using moments dx, dy) if no time has passed. This inference, and many more, can quickly be made from reading just these lines of code. Other times, such implicit inferences should be made explicit: this snippet was later repaired by adding the guard on the left. Based on this observed symmetry between explicitly guarded code and implicit run-time assumptions about code, we propose a model that learns invariants directly from static code. As developers rarely "assert" invariants in their code, we train this model using a proxy, by automatically converting explicitly guarded code to its implicitly guarded counterpart across millions of functions. The generated programs are constrained to be similar to real functions and used to train a large model with a new loss function that is aware of logical constraints. Our model, BODYGUARD, predicts a rich vocabulary of conditions about arbitrary code from new projects, and can be used to find & fix real missing-guard bugs, such as the one in Figure 1, with over 69% (repair) precision at 10% inspection cost. It also predicts more than two-thirds of Daikon's invariants that could previously only be inferred with run-time data, as well as some entirely new ones that can be validated automatically with trace data. Our work presents a significant next step in learned static analysis, being the first to reliably produce natural invariants from arbitrary code alone. More broadly, we show that learned models can implicitly represent behavioral semantics, just from code.

2. OVERVIEW

Inferring invariants for arbitrary programs is NP-hard. Sound approaches using theorem provers are therefore constrained to restricted settings, such as simple loops (Sharma et al., 2013a), or ones with known inputs (Pham et al., 2017). Such approaches generally don't scale: needing SMT solvers limits tools to the few program points where invariants can be proven, and ground-truth inputs typically need to be constructed by hand. An alternative is to use execution traces (Ernst et al., 2007): when realistic workloads are available (e.g., from test suites), they generally span entire systems. However, genuinely representative workloads are rare, so trace-based tools often generate poor invariants (Kim & Petersen). A key concern is that none of these approaches has a notion of relevance, or naturalness, of the actual statements (Hellendoorn et al., 2019a). To address these gaps, we propose a learned invariant generator that predicts directly from code, trained with realistic examples. Our central claim is that the natural distribution of programs includes many groups of similar functions, some of which assert run-time assumptions explicitly, and in much detail, while others vary along these dimensions. As Figure 1 highlights, it is common for code not to state salient conditions (time > 0, on the right) that developers may naturally intuit, while at other times (e.g., in a later revision, on the left), such conditions are explicitly checked. If this distributional assumption holds in general, then we can use explicit conditional checks that guard blocks in functions to teach our models about the implicit invariants of unguarded blocks in similar functions. Furthermore, we conjecture that in such comparable samples, the condition is both salient (since it is checked explicitly) and natural (since it is written by humans). Learning from such examples thus provides a very appropriate training signal for inferring practically useful invariants.
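The explicit-to-implicit conversion behind this training signal can be sketched as a toy extraction pass. This is a simplified, regex-based illustration of the idea only: the function name and the single-line-guard pattern are our own assumptions, and the actual pipeline parses Java ASTs and checks that removing a guard leaves a well-formed, realistic function.

```python
import re

def extract_guard_pairs(function_src: str):
    """Toy extraction of (unguarded code, condition) training pairs.

    Finds single-line `if (cond) { body }` guards and emits the function
    with the guard stripped (the implicit version) paired with the
    guard's condition (the target invariant to predict).
    """
    pairs = []
    pattern = re.compile(r"if \((?P<cond>[^)]*)\) \{ (?P<body>[^}]*) \}")
    for m in pattern.finditer(function_src):
        # Keep the guarded body, drop the explicit check around it.
        implicit = function_src[:m.start()] + m.group("body") + function_src[m.end():]
        pairs.append((implicit, m.group("cond")))
    return pairs

src = "void step(float t) { if (t > 0) { pos += v * t; } }"
print(extract_guard_pairs(src))
# -> [('void step(float t) { pos += v * t; }', 't > 0')]
```

The model is then trained on the reverse mapping: given the unguarded function, predict the condition that a developer would have checked.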
Figure 2 illustrates our data generation: we find explicitly guarded blocks in functions that can be removed without substantially distorting the program, and convert these checked cases to implicit ones (Section 3.1). We thereby garner a large aligned dataset and learn to predict the reverse of this mapping, training a Transformer-style model for code, augmented with a loss that encourages sampling logical conditions (Section 3.2). This model, nicknamed BODYGUARD, works on any (Java) function, quickly adapting to the local vocabulary and semantics, and has a natural inclination to generate realistic, salient invariants that are often valid (Section 4). This result fits in a long line of observations that programming is remarkably predictable, including in its syntax (Hindle et al., 2012) and execution values (Tsimpourlas et al., 2020), likely by developers' design, to control the complexity of the task (Casalnuovo et al., 2019). Yet none of these relate code and its execution directly, as we do through translating the former into general, intuitively meaningful statements about the latter.

