Guaranteeing the correctness of implementations is not straightforward. There are various standards that specify a combination of design process and verification technology for achieving a spectrum of degrees of assurance. The highest assurance requires some use of formal methods. For example, security level 4 of the Federal Information Processing Standard (FIPS 140-2) and Evaluation Assurance Level 7 (EAL7) of the Common Criteria (CC) require some use of formal specification and proof.
Both FIPS Level 4 and CC EAL7 are intended to be achievable by current commercial contractors and are weaker than the state-of-the-art in research. In particular, neither requires formal verification of the "low-level design", i.e. of the implementation that is actually deployed. An even higher level of assurance would require formally verified implementations, but using current implementation technologies this is impractical. The reason is that industry standard high-level programming languages are too complex to have tractable formal definitions, and thus it is impossible to rigorously prove properties about designs expressed in them. The best that has been done so far is to:
What is needed is a high-level programming language with a tractable semantics (i.e. a semantics that supports non-trivial reasoning) together with an implementation method guaranteeing that the semantics corresponds exactly to the actual behaviour when programs are run. Existing high-level languages have neither a tractable semantics nor formally verified implementations.
Some languages do have a formal semantics, notably the denotational semantics of Scheme and the celebrated formal operational semantics of ML, but I think it is fair to say that these semantics do not provide a practical basis for formal reasoning about programs in the languages. They are, however, extremely valuable as reference documents and for proving meta-theorems (such as the type preservation theorem for ML).
A grand challenge would be to design a high-level programming language that realistically competes with C++, C# and Java for implementing industrial strength systems, and which has (i) a tractable semantics that supports formal verification of non-trivial programs, (ii) a highly assured implementation guaranteeing that the semantics is correct, and (iii) a value proposition that ensures it gets used in the real world. This challenge is too grand -- it is probably impossible.
Even if the ultimate goal is impossible, it can still be valuable to work on partial solutions. An analogy: many properties of computer programs are undecidable, but building decision procedures for useful domains is an active and important research area, with many practical applications.
Logic is manifestly a language that has maximum tractability for formal reasoning. We propose a challenge that is much less ambitious than the grand challenge mentioned above, but is nevertheless pretty challenging.
This challenge can be split into three parts:
There are at least two ways one might execute logic programs: compile them, via a verified compiler, to an assured execution platform, or use a theorem prover to obtain the result of each program by formal deduction. The first is verified compilation; the second is verified interpretation. Each may well have its uses.
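To make the verified-compilation route concrete, here is a minimal illustrative sketch (all names and the toy language are hypothetical, not drawn from any of the systems mentioned): a reference interpreter plays the role of the semantics, a compiler targets a tiny stack machine playing the role of the execution platform, and the correctness property to be verified is that running compiled code agrees with the semantics.

```python
def interp(e):
    """Reference semantics: e is an int literal or ('add', e1, e2)."""
    if isinstance(e, int):
        return e
    op, a, b = e
    return interp(a) + interp(b)

def compile_expr(e):
    """Compile to stack-machine code: ('PUSH', n) and ('ADD',) instructions."""
    if isinstance(e, int):
        return [('PUSH', e)]
    op, a, b = e
    return compile_expr(a) + compile_expr(b) + [('ADD',)]

def run(code):
    """Execute stack-machine code; the result is the value left on the stack."""
    stack = []
    for instr in code:
        if instr[0] == 'PUSH':
            stack.append(instr[1])
        else:  # ADD
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack[0]

e = ('add', 1, ('add', 2, 3))
# Compiler correctness: compiled-and-run result equals the semantics.
assert run(compile_expr(e)) == interp(e)
```

In a verified compiler such as the one for micro-Gypsy, the analogue of this `assert` is not a runtime check but a theorem, proved once and for all over every source program.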
There are already verified implementation platforms, such as Piton and the ARM610, that could be the target for verified compilation. The CLI Stack provides a verified compiler for a higher level language, micro-Gypsy. This work comes closest to already achieving our challenge, and J Moore's own Grand Challenge has many similarities with what is proposed here.
Two ways one might perform verified interpretation by theorem proving are: (i) use a uniform proof procedure, such as rewriting and beta-reduction for functions and resolution for relations, or (ii) write a proof script in a meta-language to execute via domain-specific deduction. The first of these is how traditional logic programming works; the second is how Robin Milner's LCF system works.
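Style (i) can be illustrated with a small sketch (entirely hypothetical, for illustration only) of executing a function by rewriting: Peano addition is defined by the equations add(0, y) = y and add(s(x), y) = s(add(x, y)), and "running" the program means applying these equations as left-to-right rewrite rules until a normal form is reached. In a verified interpreter each rewrite step would be a formal deduction from the defining equations.

```python
def rewrite_add(t):
    """Normalize a Peano term by rewriting with the equations for add.

    Terms are '0', ('s', t), or ('add', t1, t2)."""
    if isinstance(t, tuple) and t[0] == 'add':
        x, y = rewrite_add(t[1]), rewrite_add(t[2])
        if x == '0':                              # add(0, y) -> y
            return y
        if isinstance(x, tuple) and x[0] == 's':  # add(s(x), y) -> s(add(x, y))
            return ('s', rewrite_add(('add', x[1], y)))
        return ('add', x, y)                      # stuck term: no rule applies
    if isinstance(t, tuple) and t[0] == 's':
        return ('s', rewrite_add(t[1]))
    return t

two = ('s', ('s', '0'))
one = ('s', '0')
# 2 + 1 rewrites to the numeral 3.
assert rewrite_add(('add', two, one)) == ('s', ('s', ('s', '0')))
```

Executing relations by resolution is analogous, with unification of goals against clause heads taking the place of left-to-right matching against equations.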