# Formal Verification of Machine Code Anthony Fox, Mike Gordon and Magnus Myreen

### **Instruction Sets**

Microprocessors are ubiquitous — they are used to control servers, laptops, tablets, phones, TVs, transportation and a vast range of other digital devices. The behaviour of microprocessors is controlled by low-level software or *machine code*. An *instruction set* is a defined collection of machine code instructions, as implemented by a class of processors. Families of instruction sets include: ARM, x86, Power, MIPS and SPARC.

Instruction set architectures (ISAs) are often extremely complex — consisting of hundreds of low-level instructions, each altering a processor's *registers* and *memory* in a wide variety of different ways.

# Formal Specification and Verification

There are applications where software assurance extremely important. Errors in software can have significant repercussions, with a single bug having the potential to cause huge corporate and/or personal loss.

Using mathematic models, it is possible to verify that software will always behave as required. Our work involves formally specifying the *semantics* of instruction set architectures and using this as the basis for formally verifying the correctness of machine code programs.

#### **Disassembly of section .text:**

#### 00000000 <hypotenuse>:

| 0:         | e92d4008 | <pre>push {r3, lr}</pre>                              |
|------------|----------|-------------------------------------------------------|
| 4:         | ee607aa0 | vmul.f32 s15, s1, s1                                  |
| 8:         | ee407a00 | vmla.f32 s15, s0, s0                                  |
| <b>C</b> : | eef70ae7 | vcvt.f64.f32 d16, s15                                 |
| 10:        | eef10be0 | vsqrt.f64 d16, d16                                    |
| 14:        | eef40b60 | vcmp.f64 d16, d16                                     |
| 18:        | eef1fa10 | vmrs APSR_nzcv, fpscr                                 |
| <b>1c:</b> | 0a000002 | <pre>beq 2c <hypotenuse+0x2c></hypotenuse+0x2c></pre> |
| 20:        | eeb70ae7 | vcvt.f64.f32 d0, s15                                  |
| 24:        | ebffffe  | bl 0 <sqrt></sqrt>                                    |
| 28:        | eef00b40 | vmov.f64 d16, d0                                      |
| 2c:        | eeb70be0 | vcvt.f32.f64 s0, d16                                  |
| 30:        | e8bd8008 | pop {r3, pc}                                          |
|            |          |                                                       |

ARM machine code — as produced by the GCC compiler

## HOL4: Interactive Theorem Proving

Interactive theorem provers are software tools that provide assistance in constructing formal proofs. The HOL4 proof assistant derives from Robin Milner's LCF theorem prover, which was initially developed in the 1970s. The logical foundation of HOL4 is Higher-Order Logic. HOL4 provides a excellent framework in which to write tools for the formal verification of machine code programs.

### L3: ISA Specification

The L3 language has been designed to ease the task of constructing ISA models in theorem provers. In particular, L3 acts as an authoring tool for HOL4 ISA specifications. We have L3 written specifications for the ARM and x86-64 ISAs. Advanced tools have been developed in HOL4 for working with these ISA models — these include a decompiler and web interface for exploring the semantics of ARM instructions.

-- BL<c> <label>
-- BLX<c> <label>
define Branch > BranchLinkExchangeImmediate
 ( targetInstrSet :: InstrSet,
 imm32 :: bits(32) )
 =
 {
 if CurrentInstrSet() == InstrSet\_ARM then
 LR <- PC - 4
 else
 LR <- PC<31:1> : '1';
 targetAddress =
 if targetInstrSet == InstrSet\_ARM
 then Align (PC, 4) + imm32

|- SPEC (STATE arm\_proj,NEXT\_REL \$= NextStateARM,arm\_instr,\$=) (cond (Aligned (pc,4) Λ Aligned (r13,4)) \* (arm\_exception NoException \* arm\_undefined und \* arm\_CurrentCondition cond \* arm\_Encoding enc \* arm\_CPSR\_ arm\_CPSR\_E F \* arm\_CPSR\_T F \* arm\_CPSR\_M 16w \* arm\_REG RName\_SPusr r13 \* arm\_REG RName\_3usr r3 \* arm\_REG RName\_LRusr r14 \* arm\_REG RName\_PC pc \* arm\_MEM (r13 - 8w) b0 \* arm\_MEM (r13 - 8w + 1w) b1 \*  $arm_MEM$  (r13 - 8w + 2w) b2 \*  $arm_MEM$  (r13 - 8w + 3w) b3 arm\_MEM (r13 - 8w + 4w) b4 \* arm\_MEM (r13 - 8w + 4w + 1w arm\_MEM (r13 - 8w + 4w + 2w) b6 \*  $arm_MEM (r13 - 8w + 4w + 3w) b7)) {(pc,0xE92D4008w)}$ (arm\_exception NoException \* arm\_undefined F \* arm\_CurrentCondition 14w \* arm\_Encoding Encoding\_ARM \* arm\_CPSR\_J F \* arm\_CPSR\_E F \* arm\_CPSR\_T F \* arm\_CPSR\_M 16w \* arm\_REG RName\_SPusr (r13 - 8w) \* arm\_REG RName\_3usr r3 \* arm\_REG RName\_LRusr r14 \* arm\_REG RName\_PC (pc + 4w) \*

| -           | <image/>                                                                                                                                                                                                                                                                                                                             |         |
|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|
|             | ● ● ● ● bl +#12                                                                                                                                                                                                                                                                                                                      |         |
|             | ◄ ► Svr-acjf3-armie.cl.cam.ac.uk/main.cgi                                                                                                                                                                                                                                                                                            | C Reade |
|             | ARM instruction evaluator         Item   About   Learn more   Feedback           Enter an instruction to run         Architecture:         ARMV7-A ÷         Instruction set:         ARM ÷         Processor mode:         Usr ÷         Byte order:         If-Then block:         Outside         Assembly code:         Di + #12 |         |
|             | Lookup an instruction                                                                                                                                                                                                                                                                                                                |         |
|             | Mnemonic: - +                                                                                                                                                                                                                                                                                                                        |         |
| JF*         | bl +#12<br>machine code: EB000001                                                                                                                                                                                                                                                                                                    |         |
| *<br>) b5 * | <pre>architecture = ARMv7-A not sctlr.V sctlr.A not sctlr.U cpsr.IT = 0 not cpsr.E</pre>                                                                                                                                                                                                                                             |         |







not cpsr.J

not cpsr.T

cpsr.M = 0x10