Decompilation into Logic

What?

By decompilation into logic, or simply decompilation, we mean proof-certified translation of concrete machine code into functional programs defined in the logic of a theorem prover.

Why?

Decompilation is easily coded up as a tool, a decompiler, which significantly simplifies verification of machine-code programs. Instead of having to read and reason about machine code, the verifier can, with the aid of a decompiler, reason about relatively clean functional programs. Properties proved of these functional programs carry over to the real machine code via theorems that the decompiler proves when it extracts the functional programs: any behaviour of the extracted functions is also the behaviour of the machine code.

Decompilation can also be used to implement proof-producing synthesis tools and binary translators. Synthesis tools take as input a functional program and synthesise machine code which it proves has exactly the behaviour of the given input function. Binary translators take machine code for one architecture as input and produce machine code for a different architecture as output, with the output being guaranteed to have exactly the same behaviour of the original machine code.

How?

We implement decompilation as a fully automatic proof tool within an interactive theorem prover (HOL4 in our case). The proofs it produces are based on detailed models of the instruction set architecture (ISA) of the machine language. We parametrise the decompiler by the underlying ISA model, i.e. the same decompiler can be used for multiple different ISAs.

Our initial approach to decompilation was based on automating proofs in a general-purpose Hoare logic for machine-code programs. Since then, we have refined the approach and made it significantly simpler, faster and more generally applicable. The new and improved version is described in the following paper.

Magnus O. Myreen, Michael J. C. Gordon and Konrad Slind.
Decompilation into Logic — Improved.
Submitted, 2012.

Abstract

This paper presents improvements to a technique which aids verification of machine-code programs. This technique, called decompilation into logic, allows the verifier to only deal with tractable extracted models of the machine code rather than the concrete code itself. Our improvements make decompilation simpler, faster and more generally applicable. In particular, the new technique allows the verifier to always avoid tedious reasoning directly in the underlying machine-code Hoare logic or the model of the instruction set architecture. The method described in this paper has been implemented in the HOL4 theorem prover.

Benchmarks

The implementation that was used to produce the benchmarks presented in this paper are available in benchmarks.zip with instructions on how to build them.

More?

For more information, contact Magnus Myreen.