LG5 Notes: ESL - Assertion-Based Design

ABD versus the main alternative: Simulation
Formally Synthesised Bus Monitor
PSL Assertion, General Structure
PSL Extended Regular Expressions
Naive pattern to RTL Automaton
PSL Overall Layered Architecture
A Simple Model Checker
Boolean Equivalence Checker
Sequential Logic Equivalence and Simplification
Automated Stimulus Generation

Assertion-based design is an approach that encourages writing assertions as early as possible, even before coding starts.

The Z notation is quite well known and suitable for describing properties of data structures, as are the Object Constraint Language (OCL), Alloy and so on.

Assertions should be machine readable and machine provable, as far as possible. There's even a school of thought that assertions should be executable, whenever possible, thereby generating example output that conforms to what is specified.

Assertions are (combinations of):

Imperative safety checks (like the assert macro from assert.h in C and C++), which must hold whenever the flow of control reaches them.
Declarative safety properties, which always hold, such as 'Never are both the inner and outer door of the airlock open at once unless we are on the ground'. Declarative safety properties normally use the keywords never or always.
Strong properties (also declarative) about liveness and deadlock. Strong, in the language of PSL, means that the property cannot be checked by simulation, only by a static formal method that checks all possible executions.
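As an illustration of a declarative safety property, here is a minimal Python sketch that checks the airlock rule over one recorded trace of states. The signal names (inner_open, outer_open, on_ground) and the helper always are assumptions made for this example. Checking a single finite trace is what a simulator does; it is not a proof over all executions.

```python
def always(pred, trace):
    """Return the index of the first state violating pred, or None if pred holds throughout."""
    for i, state in enumerate(trace):
        if not pred(state):
            return i
    return None

def airlock_safe(s):
    # Never are both doors open at once unless we are on the ground.
    return not (s["inner_open"] and s["outer_open"]) or s["on_ground"]

# A hypothetical trace: one dictionary of signal values per clock cycle.
trace = [
    {"inner_open": False, "outer_open": False, "on_ground": False},
    {"inner_open": True,  "outer_open": False, "on_ground": False},
    {"inner_open": True,  "outer_open": True,  "on_ground": False},  # violation
]
first_violation = always(airlock_safe, trace)
```

Here first_violation is the cycle number of the first unsafe state, which a checker would report in its diagnostic output.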

All three can potentially be proved by automated provers, such as model checkers. SoC design only uses fully automated provers, whereas research in specification and verification often uses manually-guided provers, where the computer may make suggestions about proof steps but its main role is to check that the result has been derived without a false step.

Declarative assertions are written either as assertions about the current state or about state trajectories (i.e. sequences of states). In general, a trajectory expression can be compiled into a checker automaton (or RTL sub-circuit) and a state assertion can then be applied to the output function (terminal) of the automaton (RTL module).

A declarative assertion about the current state is intrinsically universally quantified over time, so really it is about all states. Applying the same reasoning to a trajectory assertion implies that the property holds for all occurrences of the trajectory, whether overlapping or not.

Assertions can be imported from previous designs or other parts of the same design for global consistency.

ABD shows up corner case problems not encountered in simulation. And in some cases, such as medical life-support systems, a formally-verified result may be required by the customer.

ABD versus the main alternative: Simulation

Simulation is effective at finding many early bugs in a design. It can sometimes find safety violations and sometimes find deadlock but it cannot prove liveness.

Once the early, low-hanging bugs are fixed, formal proof can be more effective at finding the remainder. These tend to lurk in unusual corner cases, where particular alignment of events is not handled correctly.

If a bug has a one in ten million chance of being found by simulation, then it will likely be missed, since fewer than that number of clock cycles might typically be simulated in any run. However, given a clock frequency of just 10 MHz, the bug might show up in the real hardware in one second!
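The arithmetic can be made concrete with a short Python sketch. It assumes the one-in-ten-million figure is a per-clock-cycle probability and that cycles are independent; both are simplifying assumptions for illustration, and the names are made up for this example.

```python
p_per_cycle = 1e-7   # assumed: chance that the bug is triggered on any one clock cycle
clock_hz = 10e6      # 10 MHz

def p_missed(cycles, p=p_per_cycle):
    """Probability that a run of the given length never triggers the bug."""
    return (1.0 - p) ** cycles

# Mean waiting time of a geometric distribution, converted to seconds of real hardware time.
expected_seconds_to_bug = 1.0 / (p_per_cycle * clock_hz)
```

Under these assumptions a one-million-cycle simulation run misses the bug about nine times in ten, whereas the hardware is expected to hit it after about one second.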

Simulation is generally easier to understand. Simulation gives performance results. Simulation can give a golden output that can be compared against a stored result to give a pass/fail result. A large collection of golden outputs is normally built up and the current version of the design is compared against them every night to spot regressions.

Simulation test coverage is expressed as a percentage. Given any set of simulations, only a certain subset of the states will be entered. Only a certain subset of the possible state-to-state transitions will be executed. Only a certain number of the disjuncts in the guard of an IF statement may hold. Only a certain number of paths through the block-structured behavioural RTL may be taken.

There are many ways of defining coverage: for instance do we have to know the reachable state space before defining the state space coverage, or can we use all possible states as the denominator in the fraction?
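The effect of the choice of denominator can be seen in a small Python sketch. The three-bit design, its reachable set and the visited set are all hypothetical numbers chosen for illustration.

```python
def state_coverage(visited, denominator_states):
    """Percentage of the denominator states that were entered during simulation."""
    hit = set(visited) & set(denominator_states)
    return 100.0 * len(hit) / len(denominator_states)

all_states = set(range(8))    # three flip-flops give 2**3 possible encodings
reachable = {0, 1, 2, 3, 4}   # assumed reachable set, e.g. a modulo-5 counter
visited = {0, 1, 2}           # states actually entered by the simulations

cov_reachable = state_coverage(visited, reachable)
cov_all = state_coverage(visited, all_states)
```

The same set of simulations scores 60 percent against the reachable states but only 37.5 percent against all encodings, so a coverage figure means little unless the denominator is stated.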

In general software, a common coverage metric is the percentage of lines of code that are executed.

Formally Synthesised Bus Monitor

Busses and I/O ports often behave according to a protocol. Such protocols are usefully defined and checked using a formal method.

By 'formal' we mean a machine-readable description of what is correct or incorrect behaviour. A complete specification might describe all allowable behaviours and prohibit all remaining behaviours, but most formal definitions today are not complete in this sense. For instance, a definition that consists of a list of safety assertions and a few liveness assertions might still allow all sorts of behaviours that the designer knows are wrong. He can go on adding more assertions, but when does he stop?

A bus monitor connects to the net-level bus. It is an instantiated component that looks like any other component. It can be left in for actual fabrication or, more typically, used only for simulation and removed for fabrication.

A bus monitor can keep statistics as well as detect protocol violations. It is useful to know how much data was transferred, as well as other figures, such as the average size of a data block or the percentage of time more than one initiator was contending for the bus.

Can it be synthesised from a formal spec? Yes, the internals of the bus monitor can be normal RTL that was synthesised from a formal specification. Transactors.

For safety violations, the monitor can print an error as soon as one is detected. Liveness properties cannot be proved in simulation, but the monitor can record whether each has occurred and print the number of occurrences at the end of the simulation. If a liveness property has occurred once, it is likely that it might happen infinitely often in the future.
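A behavioural sketch of such a monitor, written in Python rather than RTL, is given below. The req/ack protocol and its single safety rule are invented for illustration and do not correspond to any particular bus standard.

```python
class BusMonitor:
    """Per-clock monitor for a hypothetical req/ack bus."""

    def __init__(self):
        self.cycle = 0
        self.transfers = 0     # statistic: cycles with req and ack both high
        self.violations = []   # cycle numbers at which the safety rule failed

    def clock(self, req, ack):
        """Call once per clock edge with the sampled net values."""
        if ack and not req:    # assumed safety rule: never ack without req
            self.violations.append(self.cycle)
        if req and ack:        # also serves as an occurrence counter
            self.transfers += 1
        self.cycle += 1

mon = BusMonitor()
for req, ack in [(0, 0), (1, 0), (1, 1), (0, 1), (1, 1)]:
    mon.clock(req, ack)
```

After this run the monitor has recorded one violation (at cycle 3) and two transfers. The transfer count plays the role of the end-of-simulation occurrence report described above.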

There are no well-accepted coverage metrics for formal specifications. Could we measure what percentage of rule disjuncts held as dominators (on their own)? There is no clear definition of 100 percent coverage.

PSL Assertion, General Structure

The general structure of a PSL assertion has the following parts:

A name or label that can be used for diagnostic output.
A verification directive, such as assert.
When to check, such as always or eventually!.
The property to be checked: a state expression or a temporal logic expression.
A qualifying guard, such as a clock edge or enable signal at which time we expect the assertion to hold.

always a; // Predicate a holds all the time.
never a; // The same as always not(a);
eventually! a; // Predicate a will hold at some point in the future
next a ; // Predicate a will hold in the next clock cycle (or other interval as qualified by the guard).

The always operator is the most frequently used one. It specifies that the following property expression should be checked on every clock cycle.

PSL Extended Regular Expressions

Regular expressions can describe regular languages. If a new character of the language appears each clock edge (or other qualified interval) then they describe time sequences.

PSL defines an enhanced set of regular expression operators to define SEREs (Sequential Extended Regular Expressions, originally Sugar Extended Regular Expressions), which are written inside curly braces and called sequences.

Sequence elements are state predicates from Modelling and Boolean layers. Core operators: Disjunction, Concatenation, Arbitrary repetition.

{ A ; B }     Semicolon denotes sequence concatenation
{ A [*] }     Postfix asterisk for arbitrary repetition
{ A | B }     Vertical bar (stile) for alternation

Make richer with additional operators:

{ A [+] }     One or more occurrences: { A ; A[*] }
{ A [*n] }    Repeat n times
{ A [=n] }    Repeat n times non-consecutively
{ A : B }     Fusion concatenation (last of A occurs during first of B)

Further repetition operators denote repeat count ranges. Repeat counts must be compile-time constant (in most implementations).

For ease of expression, PSL allows one to define properties and macros. PSL defines some simple path-to-state macros:

rose(X) means { !X; X }
fell(X) means { X; !X }

Others are easy to define:

stable(X) can be defined as { X; X } | { !X; !X }
changed(X) can be defined as { X; !X } | { !X; X }
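These four macros each look at a two-state window of the trajectory. A minimal Python sketch follows, assuming a signal is recorded as a list with one value per clock cycle and that each macro is evaluated at cycle t against cycle t-1 (so all four are false at cycle 0, where there is no previous value).

```python
def rose(x, t):
    return t > 0 and not x[t - 1] and bool(x[t])        # { !X; X }

def fell(x, t):
    return t > 0 and bool(x[t - 1]) and not x[t]        # { X; !X }

def stable(x, t):
    return t > 0 and bool(x[t - 1]) == bool(x[t])       # { X; X } | { !X; !X }

def changed(x, t):
    return t > 0 and bool(x[t - 1]) != bool(x[t])       # { X; !X } | { !X; X }

x = [0, 1, 1, 0]   # a hypothetical waveform, one sample per clock
```

For this waveform, rose(x, 1), stable(x, 2) and fell(x, 3) all hold, and changed(x, t) holds exactly where stable(x, t) does not for t > 0.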

Naive pattern to RTL Automaton

It is relatively easy to compile large parts of PSL to hardware circuits. These circuits can serve as checker automata that monitor a state trajectory to see whether it satisfies a constraint.

The ML fragment gen_pattern_matcher on the slides handles concatenation, fusion concatenation, alternation, arbitrary repetition and n-times repetition. However, this generates a one-hot automaton; far more efficient procedures are used in practice and given in the literature.
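The flavour of such a naive matcher can be shown in Python. This is not the ML fragment from the slides: it is a separate sketch that interprets a pattern over a recorded trace instead of emitting RTL, and the tuple encoding ("seq", "alt", "star", "rep", "fuse") and the helper names are assumptions made here. The set of positions passed around plays the role of the one-hot tokens in the hardware version.

```python
def ends(sere, trace, starts):
    """Indices (one past the last state consumed) at which a match of `sere`
    that begins at any index in `starts` can finish."""
    if callable(sere):                       # a state predicate consumes one state
        return {i + 1 for i in starts if i < len(trace) and sere(trace[i])}
    op = sere[0]
    if op == "seq":                          # { A ; B }
        return ends(sere[2], trace, ends(sere[1], trace, starts))
    if op == "alt":                          # { A | B }
        return ends(sere[1], trace, starts) | ends(sere[2], trace, starts)
    if op == "star":                         # { A [*] }: iterate to closure
        done, frontier = set(starts), set(starts)
        while frontier:
            frontier = ends(sere[1], trace, frontier) - done
            done |= frontier
        return done
    if op == "rep":                          # { A [*n] }
        cur = set(starts)
        for _ in range(sere[2]):
            cur = ends(sere[1], trace, cur)
        return cur
    if op == "fuse":                         # { A : B }: B starts on A's last state
        out = set()
        for s in starts:
            for j in ends(sere[1], trace, {s}):
                if j > s:                    # A must consume at least one state
                    out |= {e for e in ends(sere[2], trace, {j - 1}) if e > j - 1}
        return out
    raise ValueError(op)

def sym(ch):
    """Predicate that holds when the state equals the given character."""
    return lambda state: state == ch

a, b, c = sym("a"), sym("b"), sym("c")
abc_pattern = ("seq", a, ("seq", ("star", b), c))    # { a ; b[*] ; c }
```

For example, ends(abc_pattern, "abbc", {0}) yields {4}: the pattern started at cycle 0 completes after four states. Traces here are strings with one character per clock purely for brevity.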

A harder operator to compile is the length-matching conjunction, since care is needed when each side contains arbitrary repetition and can declare success or failure at a number of possible times.

It is important to note that putting a SERE as the body of an always statement probably does not have the desired effect: it does not imply that the contents occur sequentially. Owing to the overlapping occurrences interpretation, such an always statement distributes over sequencing and so implies that every element of the sequence occurs at all times. Therefore, it is recommended always to use a SERE as part of a suffix implication or with some other temporal operator.
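The difference can be seen on a short trace. The Python sketch below assumes a simplified finite-trace reading (the final cycle, whose successor is not yet observable, is not checked) and uses two hypothetical signals a and b; it is an illustration, not the formal PSL semantics.

```python
def always_seq_ab(trace):
    """always {a; b}: a match of {a; b} must begin at every cycle."""
    return all(trace[t]["a"] and trace[t + 1]["b"] for t in range(len(trace) - 1))

def always_a_implies_next_b(trace):
    """always ({a} |=> {b}): whenever a holds, b holds in the next cycle."""
    return all((not trace[t]["a"]) or trace[t + 1]["b"] for t in range(len(trace) - 1))

# a is seen once, and b follows it in the next cycle.
trace = [
    {"a": True,  "b": False},
    {"a": False, "b": True},
    {"a": False, "b": False},
]
```

On this trace the suffix implication holds, but the bare sequence under always fails at cycle 1, because it demands that a holds in every cycle.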

PSL Temporal Layer Operators

{ P |-> Q } P is followed by Q (one state overlapping)
{ P |=> Q } P is followed by Q (immediately afterwards)
{ P && Q } P and Q occur at once (length matching)
{ P & Q } P and Q succeed at once
{ P within Q } P occurred at some point during Q
{ P until Q } P held at all times until Q started
{ P before Q } P held before Q held

PSL Overall Layered Architecture

The PSL standard defines four layers to the language.

Since the language is embedded in the concrete syntax of several other languages, such as Verilog, SystemVerilog and VHDL, its syntactic details vary. In particular, creating state predicates involves expressions that range over the nets and variables of the host language. The precise means for this is defined by the MODELLING LAYER that allows one to create state properties using RTL.
All high-level languages and RTLs have their own syntax for boolean operators and this can be used within the modelling layer. However boolean combinations can also be formed using the PSL BOOLEAN LAYER.

The PSL TEMPORAL LAYER allows one to define named sub-expressions and properties that use the temporal operators. For example:

   -- Sequence definition
   sequence s1 is {pkt_sop; (not pkt_xfer_en_n [*1 to 100]); pkt_eop};

   sequence s2 is {pkt_sop; (not pkt_xfer_en_n [*1 to 100]); pkt_aborted};

   -- Property definition
   property p1 is reset_cycle_ended |=> {s1; s2};

   -- Property p1 uses previously defined sequences s1 and s2.

The PSL VERIFICATION LAYER implements the declarative language itself. It includes the main keywords, such as assert.

A Simple Model Checker

The PSL strong assertions need to be checked with a formal proof tool. Model checking is normally used.

A model checker explores every possible execution route of a finite-state system by exploring the behaviour over all possible input patterns.

There are two major classes of model checker: explicit state and symbolic. Explicit state checkers actually visit every possible state and store the history in a very concise bit array. If the bit array becomes too big they use probabilistic and hashing techniques. The main example is Spin. Symbolic model checkers manipulate expressions that describe the reachable state space and these were famously implemented as BDDs in the SMV checker. There are also other techniques, such as bounded model checking, but the internal details of model checkers are beyond the scope of this course.

The most basic model checker only checks state properties. To check a path property it can be compiled into an automaton and included as part of the system itself. We then become interested in a state property predicated on the output of the checker.

To check safety over all reachable states, one can either find the reachable state space and then see if all of it is safe, or one can check the safety predicate after each step in creating the reachable state space. The algorithm for the reachable state space, given on the slide, is simply to start with the initial state and repeatedly add any successors until closure.
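A minimal explicit-state version of this closure computation can be sketched in Python, checking the safety predicate as each state is explored. The function name, the successor-function interface and the example counter are assumptions made for illustration.

```python
def check_safety(initial, successors, safe):
    """Explore all reachable states. Return (True, reachable_set) if all are safe,
    otherwise (False, first_unsafe_state_found)."""
    reached = {initial}
    frontier = [initial]
    while frontier:                       # repeat until closure
        s = frontier.pop()
        if not safe(s):
            return False, s
        for t in successors(s):           # one successor per possible input pattern
            if t not in reached:
                reached.add(t)
                frontier.append(t)
    return True, reached

def counter_successors(s):
    """Hypothetical modulo-5 counter with an enable input: hold or increment."""
    return {s, (s + 1) % 5}

ok, reachable = check_safety(0, counter_successors, lambda s: s != 7)
```

Here the unused encoding 7 is never reached, so the property holds and reachable is {0, 1, 2, 3, 4}. Changing the predicate to forbid state 3 makes the check return a counterexample state instead.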

Boolean Equivalence Checker

Often we have two implementations to check for equivalence, for instance, when RTL is turned into a gate-level netlist by synthesis we have:

RTL version: pre-synthesis, and
Gate-level version: post-synthesis.

After place and route operations, it is common to extract the netlist out from the masks and check that for correctness, so this is another source of the same netlist.

There are two main sources of potential errors: 1. manual edits at any point may upset correctness. 2. EDA tools used in the flow may have bugs.

The boolean equivalence problem asks whether two functions produce the same output. However, are we interested in all input combinations? No, normally we are only interested in a subset of input combinations (because of don't-care conditions).

The standard method is to create a mitre of the two designs: a disjunction of XOR gates, one per pair of corresponding outputs, so that the mitre output is one exactly when the designs disagree. Then feed the mitre to a SAT solver to see if it can find any input condition that produces a one on the mitre output.

In the worst case, SAT solving is a matter of trying all input combinations: the problem is NP-complete and all known algorithms have exponential worst-case cost. However, modern solvers such as zChaff exploit the intrinsic structure of the problem so that they are normally quite quick at finding the answer.

Result: if there are no input combinations that make the mitre indicate a functionality difference, then the designs are equivalent.
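The mitre construction can be sketched in a few lines of Python. Exhaustive enumeration stands in for the SAT solver here, which is only feasible for a handful of inputs; the function names, the care predicate and the three example designs are all made up for illustration.

```python
from itertools import product

def mitre(f, g, inputs):
    """OR of XORs of corresponding outputs: True where the two designs differ."""
    return any(x != y for x, y in zip(f(*inputs), g(*inputs)))

def find_difference(f, g, n_inputs, care=lambda *inputs: True):
    """Return an input vector that sets the mitre, or None if the designs are
    equivalent on every input combination that we care about."""
    for inputs in product([False, True], repeat=n_inputs):
        if care(*inputs) and mitre(f, g, inputs):
            return inputs
    return None

def rtl(a, b, c):
    return ((a and b) or c,)                     # pre-synthesis behaviour

def gates(a, b, c):
    return (not (not (a and b) and not c),)      # the same function in NAND form

def buggy(a, b, c):
    return (a and (b or c),)                     # a plausible mis-edit
```

Here find_difference(rtl, gates, 3) finds nothing, whereas find_difference(rtl, buggy, 3) returns a counterexample. If the environment guarantees that input a is always high, passing care=lambda a, b, c: a makes the buggy version acceptable, which is the effect of a don't-care condition.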

Commercial example: Formality (Synopsys).

Sequential Logic Equivalence and Simplification

Different implementations of a circuit may vary in their state encoding or even in the amount of state they keep in terms of bits. One might be simpler or better than the other for a given purpose. At times we need to check the equivalence of designs.

For a synchronous clock domain, if two designs are known to have the same state encodings, then the problem degenerates to that of combinational equivalence checking of their respective next-state functions. For each D-type flip-flop we need to check the combinational equivalence of its sourcing circuits in the two designs. However, even if the state encodings are the same, there can be unused portions of the state space which must be treated as don't-cares for this check.

If the state encoding is known to be changed, then what can be compared? Perhaps we can compare the trajectory of states between two designs, building up our own mapping between the encodings in each design. Generally, two designs will have the same set of output terminals and so the basis of the mapping is equivalence classes formed around each possible setting of the output terminals.

This leads to the concept of full equivalence in terms of externally observable behaviour. Do the two designs behave the same as each other in black box testing, that is, without any knowledge of the internal behaviour? If so, they are said to bi-simulate each other.
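For deterministic Moore machines, observable equivalence can be checked by exploring the reachable pairs of states of the two designs run in lock-step. The Python sketch below assumes each machine is given as a triple (initial state, next-state function, output function); that representation and the example machines are assumptions made here.

```python
def observably_equivalent(m1, m2, inputs):
    """True if no input sequence makes the outputs of the two machines differ."""
    (init1, next1, out1), (init2, next2, out2) = m1, m2
    seen = {(init1, init2)}
    work = [(init1, init2)]
    while work:
        s1, s2 = work.pop()
        if out1(s1) != out2(s2):
            return False
        for x in inputs:
            pair = (next1(s1, x), next2(s2, x))
            if pair not in seen:      # builds the mapping between the two encodings
                seen.add(pair)
                work.append(pair)
    return True

# A toggle flip-flop with enable, in binary and in one-hot state encodings.
binary = (0, lambda s, en: s ^ en, lambda s: s)
one_hot = ((1, 0), lambda s, en: (s[1], s[0]) if en else s, lambda s: s[1])
# A different design that toggles on every clock regardless of the enable.
free_running = (0, lambda s, en: s ^ 1, lambda s: s)
```

The set seen is exactly the mapping between encodings mentioned above: each pair relates a state of one design to the state of the other that it must be indistinguishable from.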

Again, not all of the reachable state space may be used. The circuit might always be wired up externally so that one input is a delayed version of one output. Therefore the question arises: do a pair of designs follow the same state trajectory when interfacing with a specified reactive automaton?

Commonly, the number of clock cycles of delay through a sub-system (its latency) is not important and perhaps can be disregarded in equivalence checking. This leads to the concept of temporally floating ports (not lectured in 2008/9), where a pair of designs are equivalent if the timing behaviour inside a port (a subset of the external connections) appears equivalent, even though we would see differences if we looked at the relative timing of operations with respect to other ports. For example, the precise order in which an Ethernet hub chip sends a broadcast packet to each of its output ports does not matter, as long as it is actually delivered to each port, and from any given port this ordering cannot be perceived without peeking at the other ports. This sort of floating is similar to the temporal decoupling idea in the loosely-timed models.

Two other variations in the problem definition arise for systems where the exact number of clock cycles is not considered important. This clearly also applies to asynchronous systems without a clock.

When a pair of sub-systems are wired to each other, the composition may be a synchronous or asynchronous (turn-taking) composition. The algorithms used for making the different combinations may vary (a research topic of DJG) and the possible behaviours may also be different. For instance, it might only be possible to enter a given state (on the next clock edge) without the simultaneous change of two communicating nets, but if these are sourced in different components that share the same clock this might never happen. This is rather like the two main types of auction: participants either bid in turn, with knowledge of the previous bid, or all bid together without knowledge (competitive tender), with another round being used when tie-breaks are needed.
Strong or weak bi-simulation: a pair of designs might differ in the number of clock cycles they take to produce a response to a given input. As mentioned, this can be hidden by putting the input and output in different temporal groups, or it can be considered a difference (strong bi-simulation) or it can be considered not a difference if no nets other than the clock have changed in the meantime: a stuttering equivalence (a form of weak bi-simulation).

Sequential Logic Simplification Algorithms

Combinational logic is simplified using the Quine–McCluskey or Espresso algorithms, or enhanced variants thereof, that generate multi-output, multi-level logic with minimised area, speed, power or testability.

Looking at state re-coding in general, we find similar or the same metrics that might be minimised, as well as some notion of the total number of state-holding flip-flops that we seek to reduce. Converting to one-hot coding can improve speed at the expense of area and converting to a binary encoding reduces state at the expense of speed. However, this sort of re-coding is not actually state minimisation.

A finite-state machine may have more states than it needs to perform its observable function because some states are totally equivalent to others in terms of output function and subsequent behaviour. Note that one-hot coding does not increase the reachable state space and so is not an example of that sort of redundancy.

Sequential logic minimisation involves finding classes of equivalent states and re-writing the next state function to use just one member of each equivalence class.

A Moore machine can be simplified by the following baseline procedure:

1. Partition all of the state space into blocks of states where the observable outputs are the same for all members of a block.
2. Repeat until nothing changes (i.e. until closure). For each input setting:
- 2a. Choose two blocks, B1 and B2.
- 2b. Split B1 into two blocks consisting of those states with and without a transition into B2 under that input setting.
- 2c. Discard any empty blocks.
- 2c. Discard any empty blocks.
3. The final blocks are the new states.
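The procedure can be sketched in Python. Note that this sketch swaps the pairwise block splitting for a signature-based refinement: in each round every state is re-labelled by its current block together with the blocks of its successors, which splits all blocks at once and reaches the same final partition. The table-based machine representation and the five-state example are assumptions made for illustration.

```python
def minimise_moore(states, inputs, nxt, out):
    """Return the equivalence classes of states of a Moore machine.
    nxt maps (state, input) to a state; out maps a state to its output."""
    block = {s: out[s] for s in states}      # step 1: partition by observable output
    while True:
        ids, refined = {}, {}
        for s in states:
            # Signature: own block plus the block reached under each input setting.
            sig = (block[s], tuple(block[nxt[s, x]] for x in inputs))
            refined[s] = ids.setdefault(sig, len(ids))
        stable = len(ids) == len(set(block.values()))   # no block was split
        block = refined
        if stable:
            break
    classes = {}
    for s in states:
        classes.setdefault(block[s], []).append(s)
    return sorted(sorted(c) for c in classes.values())  # step 3: the new states

states = ["A", "B", "C", "D", "E"]
out = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 0}
nxt = {("A", 0): "B", ("A", 1): "C", ("B", 0): "A", ("B", 1): "D",
       ("C", 0): "D", ("C", 1): "A", ("D", 0): "C", ("D", 1): "B",
       ("E", 0): "E", ("E", 1): "A"}
blocks = minimise_moore(states, [0, 1], nxt, out)
```

For this machine the five states collapse to three: A and B are equivalent, as are C and D, while E is split off from A and B in the first round because its successor under input 1 has a different output.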

Alternative algorithm: start with one partition per state and repeatedly conglomerate. Recent algorithms use a mixture of the two approaches.

One future use of this sort of procedure might be to generate an instruction set simulator for a processor from its full RTL implementation. This sort of de-pipelining would give a non-cycle-accurate, higher-level model that runs much faster in simulation.

Automated Stimulus Generation

Simulations and test programs require stimulus. This is a sequence of input signals, including clock and reset, that exercise the design.

Given that formal specifications for many of the input port protocols might exist, one can consider automatic generation of the stimulus, from random sources, within the envelope defined by the formal specification. Several commercial products do this, including Verisity's Specman Elite and Synopsys Vera.

Here is an example of some code in Specman's own language, called 'e', that defines a frame format used in networking. Testing will be inside the envelope defined by the keep statement.

struct frame {
  llc: LLCHeader;
  destAddr: uint (bits:48);
  srcAddr: uint (bits:48);
  size: int;
  payload: list of byte;
  keep payload.size() in [0..size];
};

Sequences of bits that conform to the frame structure are accepted at an input port of the design under test.

A hierarchy of specifications and constraints is supported. One can compose and extend one specification to reduce its possible behaviours:

  extend frame { keep size == 0;  };
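The idea of constrained random generation can be mimicked in Python. This is only a sketch of the concept, not of how Specman solves constraints: the keep constraint is satisfied by construction, extra constraints are handled by naive rejection sampling, the llc field is omitted because LLCHeader is not defined in these notes, and the bound max_size and the non-negative size are assumptions.

```python
import random

def gen_frame(rng, max_size=64, extra=()):
    """Generate one random frame inside the envelope of the constraints."""
    while True:                                   # retry until extra constraints hold
        size = rng.randint(0, max_size)
        payload_len = rng.randint(0, size)        # keep payload.size() in [0..size]
        frame = {
            "destAddr": rng.getrandbits(48),
            "srcAddr": rng.getrandbits(48),
            "size": size,
            "payload": bytes(rng.getrandbits(8) for _ in range(payload_len)),
        }
        if all(constraint(frame) for constraint in extra):
            return frame

rng = random.Random(1)                            # fixed seed gives repeatable stimulus
frame = gen_frame(rng)
# Counterpart of: extend frame { keep size == 0; };
empty_frame = gen_frame(rng, extra=[lambda f: f["size"] == 0])
```

Each added constraint narrows the set of frames that can be generated, which is the sense in which extending a specification reduces its possible behaviours.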

Conclusion

ABD today is often focussed on safety and liveness properties of systems and formal specifications of the protocols at the ports of a system. However, there are many other useful properties we might want to ensure or reason about, such as those involving counting and/or data conservation. These are less well embodied in contemporary tools.

Formal methods are taking over from simulation, with the percentage of bugs being found by formal methods growing. However, there is a lack of formal design entry. Low-level languages such as Verilog do not seamlessly mix with automatic synthesis from formal specification and so double-entry of designs is common.