Model Checking a CAN network of PIC CPUs
David Greaves, MJ Nam, Univ Cambridge, Computer Laboratory
Keywords: MicroChip, CBG CSEW, PIC, Model Checking, CAN Bus, MEH.
In this work, the instruction set and programmers' model for a number of small microprocessors, taken from the MicroChip Pic range, together with a model of the Controller Area Network (CAN) bus were read into a model checker for equivalence and property checking.
The instruction set and architecture for each microprocessor were specified declaratively as XML files (available on links below) and the assembler code to run on the processor cam from MicroChip AN215 (linked below). This code implements CAN nodes suitable for a wide variety of sensors and actuators, as first implemented by Bosch and now widely used in automotive and related industries. An assembler, disassembler and emulator for each processor are automatically generated from the XML descriptions and these are integrated into a complete system emulation using a top-level XML description. A full model of the CAN controller chip, with its SPI interface, the MCP2510, was not available, and so a simplified model was created by hand, also in XML, and memory mapped into the PIC addressable space, with the SPI interface routines in the PIC assembler recoded.
The system emulation is currently implemented entirely within a simple, locally-developed, interpreted language, called Lambda. Lambda uses C++ objects to model each component, but these objects have no significant native executable code. Instead, they contain Lambda abstract syntax trees and each method is an application of the interpreter to that tree. Lambda can be read and written to/from XML files and, in the future, it can also be written out in other forms, such as synthesisable RTL or SystemC. Hence FPGA implementations of the emulated system are envisaged. We do not code the functionality directly in C++ since this could not be walked to generate derived models.
To model check the system we first generate lifted finite state machines from each processor and its code. These lifted FSMs can execute, in one step, a number of sequential instructions executed by the real processor, provide the processor does not loop inside the sequence or interact with externally observable signals. This is achieved by symbolic application of the next state function a number of times. A simple composition of the lifted machines can be checked in lock-step, a model that is rather unrealistic, or with every machine non-deterministically and fairly choosing whether or not to advance at each global model step, so that all possible interleavings are checked while liveness is preserved.
Pic Check Flow
The instruction set for each PIC processor was converted to an XML file from the Microchip datasheet(s) using a set of macros and the C pre-processor. A separate XML file was created for each device family, where the main distinction between the families is the opcode width: 12, 14 or 16 bits. Within a family, devices mainly vary in terms of their memory map and on-chip resources. These aspects were captured by hand and entered into an XML file for each device. Here are some of the C macros and XML files ZIP.
From the description of a PIC processor, Unix executable programs are generated that implement an assembler, symbolic disassembler and emulator. These outputs are perfectly usable but otherwise uninteresting.
An Intel Hex file containing firmware for a processor can be read in and resultant, closed, system saved as a single XML file that encapsulates the combined behaviour. The behaviour of the closed system can be compared against one of those stored earlier using a pair of checkers, or alternatively safety and liveness properties of the current system, specified as simple assertions, can be checked.
The closed system can be written out as a synthesiable Verilog RTL file giving a direct hardware implementation of the closed system. This uses the same number of clock cycles as the original, but can be smaller when registers that are unused in the system (i.e. have constant values) are deleted.
The closed design is converted to a MEH (memoized expression heap) so that symbolic model checking and symbolic bisimulation using a variation of the Bouali and Simone algorithm, where clock edges that produce no observable output are treated as tau transitions.
The closed system can instead be converted to a fully-binary form using standard expansion techniques, labled as 'Pandex' in the diagram. The resulting design can be saved as a Verilog netlist that instantiates common gates, such as AND/OR/DFF and MUX2.
A memomized expression heap (MEH) holds a number of expressions using exactly one copy of any sub-expression. In addition, the expressions are held in a normal form, provided they can be converted to that normal form without exceeding linear memory use. The MEH can represent all monadic, diadic and triadic operators found in C/C++ and Verilog (except division). It uses four bit look up tables (LUTs) to represent logical operators as described in [ANDERSEN97]. Apart from the LUT's the only other operators are OR-REDUCE, ARRAYSUBS, LSHIFT, ADD and MULT. A comprehensive set of distributive laws ensure that shifts are folded to the bottom leaves of the trees so that they apply directly to variables and that ADD and MULT float to the top of the tree. Right shift is represented by an LSHIFT of a negative ammount. Subtraction and all comparison predicates, such as $>=$ are represented using ADD of a negated value. To determine the normal representation for all commutative operators, including the commuatative LUTs, ADD and MULT, every variable is given an index in a linear ordering and every subexpression may be sorted with respect to another by comparing indexes. A negative-normal form is approached by preferring certain nestings of LUT values, such as rewriting an XNOR of an XOR the other way around. Compared with BDDs (binary decision diagrams) [BRYANT86]}, the MEH has three major advantages: it supports bit vectors instead of just bits, creating one does not have any significant time or memory cost and the original structure of the input is generally preserved. Its disadvantages are that tautology checking has cost that grows with the number of primary inputs (rather than being instant for a BDD) and that a heuristic approach to expression simplification must be used. We use calls to the transduction method [Muroga89] deployed after each intermediate step of the more-complex operations, such as forsome and forall.
The reduced form is the basis for the second comparison, where the number of clock cycles between observable equivalence states is unimportant. The reduced form may be written out as a synthesisable Verilog file that consumes fewer clock cycles than the original PIC design.
The links to the project outputs are being slowly added here (2006/7). For members of the department, the project source code is held in the CSEW tool using CVS under /usr/groups/han/cvs/mixersew.
The CAN emulation is also being used as a testbed for code generated from native-compiled Pushlogic.
This paper shows that embedded processors containing a kilobyte or so of firmware can be viewed fully as hardware systems from the point of view of validation and model extraction. Property checking algorithms used in the EDA industry can be directly applied. Equally, the combined processor and firmware can be converted into a synthesisable netlist form, where the registers that contain fixed values or whose bits are direct functions of other bits can be deleted. Finally, clock cycle reduction is possible, a technique that is especially applicable when the original clock frequency was unconstrained (e.g. depended on integrated RC oscillator manufacturing variations).
In the final version of this paper, we will give the actual figures for number of registers eliminated, and gates and clock cycles used.
Further work will be to include the CAN bus model. The CAN bus itself is know to work and some verification work has been done by others (see below). Prolema/Spin was elsewhere used to check the higher-layer CAN protocols, in a horizontal slice, as used for infotainment: e.g. for traffic announcements channel setup/teardown of connections and block data transfer. To continue our vertical slice, we must include the relative complexity of CAN MAC devices, that can include multiple receive filters, and additional protocols such as SPI between the MAC and the microprocessor. Indeed, it would be advisable to include all of this complexity for checking and interesting to see how much can be eliminated by bi-simulation before synthesis. A drop-in equivalent for the PIC MCP2510 CAN PHY, used in this project, is being written in Verilog to explore how much complexity it adds to the system: in essence a lot of bit-serial operations will be introduced and then eroded by the bi-simulation.
The machine code is arranged in ROM with low density: ROM SIZE = 3811 words of 14 bits, ones=2030, density=3%.
This code can be stored in a BDD using under 1000 nodes. ROM to BDD result
The combined code and ISS for the PIC 12c67x device forms a BDD/MEH with 426 inputs (including primed inputs) and 10823 nodes.
(C) 2005-6 DJ Greaves.