Orangepath/HPRLS Project: Hardware and Embedded Software Synthesis from Executable Specifications.
Direct Synthesis of Asynchronous Logic

# Direct Synthesis of Asynchronous Logic from Statecharts via Fundamental Mode Untimed Automata

A number of design techniques have been published over the years that generate an asynchronous netlist from an unclocked or fundamental mode automata. Indeed, edge-triggered flip-flops for use in synchronous logic can be designed in this way, as can whole sub-systems.

The HPRLS H2 compiler [HPRLS] generates internal psudocode in the form of statecharts. (Note: Statecharts in the user's source code are internally converted to assertions for scheduling and it is the output from this schedule that is the statechart psudocode mentioned here.) These statecharts can be readily converted to software or synchronous hardware, but generating gate-level asynchronous hardware is a more complex problem. In this paper we describe how we have tackled this problem by first converting the statechart code to a Boolean Network.

#### Boolean Network

A Boolean network is a completely general view of a subsystem of a digital logic system [Soulié 87].

A Boolean network consists of a set of N nodes, where many of these nodes are driven inside the Boolean network using combinatorial Boolean functions of the other nodes, whereas the remainder are externally driven. Such a network can have state-holding properties and/or include ring oscillators.

When there are N nodes, with k driven internally and r=N-k being driven externally, there are 2**N states that the network may potentially be in. Many of these states are transitory because the network will immediately advance to another state as a result of the internal driving functions. Many others will never be entered at all owing to the nature of the wiring. We will forbid oscillators completely. This means all states are unreachable, transitory or stable. Certain networks may contain cliques of reachability, and so reachability must be defined with respect to a given starting state. Unreachable states are also known as dead states.

Leading from each stable state, there are r implicit arcs, corresponding to the r possible changes in the external inputs with respect to the current input vector. Most of these arcs will lead first to transit states, and these in turn lead, under the control of the internal driving functions, eventually, to stable states. The arcs that lead out of a transit state are called delta arcs (after Paul Dirac). In the absence of oscillators, a state is stable if it has no delta arc leading from it.

Where more than one delta arc emanates from a single state, the eventual stable state reached may depend on which of these arcs is selected. This is known as a race or hazard and a good design has none.

A delta arc which is created as described so far, as a Boolean combinatorial function of the nodes, will run parallel to the axis it drives, when looking at the Boolean Network as an N-dimensional binary space. However, in the sequel, we need to consider Boolean Networks generated from other sources. If an arc's start and end nodes are different in a given dimension, we say the arc projects onto that dimension. Note that no delta arc projects on to a dimension indexed by an input node. We use the term `diagonal' to describe an arc that projects onto more than one dimension.

A Boolean Network may be parallel, permuted or localised. In a parallel network, each driving function is assumed to execute in parallel, and so all driven values are computed in one generation from the values of the previous generation. In a permuted network, each driving function is computed in turn according to a set ordering or permutation. For real hardware systems, certain small cliques are engineered to operate in local parallelism and the overall design is race-free if it achieves the same outcome for all permutations. We here limit ourselves to designing small sub-circuits that have this property and will rely on users of these circuits to instantiate them in ways that, in turn, do no violate this property.

For completeness, and so that we can make experiments with existing, well-known netlists, we present first a method of converting a netlist to a Boolean Network. In a later section, we will turn to netlist synthesis which is the reverse direction and more interesting.

Rather than considering composing a pair of Boolean networks under a general algorithm, we need here only consider adding a single combinatorial function to an existing Boolean network. This directly corresponds to adding a new gate to an existing netlist. In either view, one of the externally driven inputs becomes an internally driven node. The new driving function will add a delta arc between all pairs of nodes that differ in the driven bit position. The direction of the arc is determined by the output value of the combinatorial function (or can be bi-directional if the function generates don't cares in certain places).

The elementary Boolean Network consists of a single state. N=k=r=0.

An additional (unused) input can be added to a Boolean network by duplicating each state (thereby doubling the number of states).

To add, for example, a two-input NOR gate to the elementary Boolean network, we need to add three nodes, of which two are inputs and one is an internally driven node. This gives us a total of eight nodes and no arcs.

To then add the gate itself, we add straightforward arcs according to its logic function. Four unstable states now have delta arcs to four stable states. A slash on a state indicates it is stable (as does the lack of emanating delta arcs).

#### Adding a new input and a second gate

An RS-latch may be formed by first adding a new input C, which duplicates the existing state map, giving now sixteen states in total. Then, the addition of a second NOR2 gate to drive what was external input b, adds further delta arcs, leaving just five stable states: set, clear, setting, clearing and violated. Note that two states at the top left are metastable in that they may jump to either of the two stable states.

### Converting From A State Chart to an Asynchronous Netlist

We now present the main contribution of this monograph: an algorithm to synthesise a netlist from a state chart using the following steps:

There are two broad semantics for imperative behavioural code. Both have the concept of a program counter or current 'point' of execution. In a conventional program, commands are processed in order and flow advances by itself to the next label. On the other hand, in a statechart, control has no implied flow: it sits at the current state with all commands at that point being active, with the conditional ones being ready to fire when circumstances match their condition.

• 1. As a first step we rewrite the statechart in a more simple form by expanding block-structured commands such as `while' and `wait' into lower-level forms with gotos to additional labels.

• 2. As a second step we rewrite the statechart so that it is speed independent. This means that one part of the resulting circuit does not require another part to operate at a given speed. The D-type example below illustrates the effects of including and not including this step. This rewrite fails and produces an error if the design includes a reachable oscillator. An oscillator is a ring of points connected by delta arcs or arcs that are satisfied by a common setting of the input terminals. An extension might be to collapse such rings into a single point, but in general, rules would then have to be added to decide how to handle the actions at points along the ring.

• 3. In the third step, we convert the statechart to a Boolean Network.

• 4. In the fourth step, we convert the Boolean Network to a Netlist.

We present these steps using a simple example: the synthesis of a D-type flip-flop.

#### Stage 1: Simplify Statechart

In fundamental mode, the D-type can be described with the following state chart:

```
statechart()
{
start:
wait(~clk); wait(clk);
q = d;
goto start;
}
```

After statechart elaboration, we have the following declarations:

```
start: [ A(~clk, n1) ]
n1: [ A(clk, n2) ]
n2: [ B(q, d), A(1, start) ]
```
The declarations consist of a `point' followed by a list of `elabs'. The points n1 and n2 are generated by a macro expansion of the wait command.

Using a one-hot coding for the point labels, there are six terminal nodes for initial consideration : clk, d, q, start, n1, and n2. Two are externally driven inputs: clk and d. There are 2**6=64 states in the Boolean Network.

This definitions of the A, B and C elabs is given using the following examples.

• The line "start: [ A(~clk, n1) ]" can be read as "add a delta arc from all states that contain start and not clk and not n1 to their companion states where n1 is set and start is clear". These are diagonal arcs.
• The line "n2: [ B(q, d) ]" would be read as "add a delta arc from all states containing n2 where q is different from d to the companion states with a different q that is the same as d."
• The C elab is used for encoded variables. The declaration "v: [ C(clk, c1) ]" is read as "add a delta arc from all states where clk and the nodes that hold point contain v to the companion states where the nodes where the states that hold point contain c1.

#### Stage 2: Rewrite as a Speed-Independent Statechart

The above statechart, if allowed to feed on to stage 3, would generate a poor D-type: it would consist of a transparent latch from d to q with a glitch generator for its enable input. This will require the glitch to be longer than the window time of the transparent latch. In this stage we remove this problem.

Looking at it in terms of the above abstracted fragment of the Boolean Network, we see a clear race. When the clock goes high, we move to n2, but then we may not move horizontally (in the q dimension) before we move down to the start, and whether we do or not alters the final value of q. This feature can be deduced in that an assignment takes place inside a point that is transitorary.

#### Improved Design of D-type

As is well known, a much better approach is the explicit master slave arrangement. Our implementation of this uses a `while' construct that is converted to gotos during elaboration in the normal way that HLLs are compiled to assembler. The output of that elaboration is shown on the right.

```
statechart()                      statechart()
{                                   {
start:                              start:
while(~clk) m = d;                  m = d;
while(clk) q = m;                   if (~clk) goto start;
goto start;                       n1:
}                                     q = m;
if (clk) goto n1;
goto start;
}
```

During elaboration, the conditional branches end up with a contrary appearance: the implied flow of control from one point to the next requires an explicit elab whereas a goto to the current point is the default action and does not require an elab. Hence that master-slave D-type looks like this:

```
start: [ B(m, d), A(clk, n1) ]
n1: [ B(q, m), A(~clk, start) ]
```

The significant aspect of the master-slave form is that there is no assignment made during a transient state. Instead, all of the assigns take place in stable states. The form takes advantage of the fact that the D input is stable for an epoch before the active clock edge.

#### Automatic Insertion of Master/Slave Design

It is possible to convert any statechart into a form where all the assigns effectively take place inside such while loops. The conversion process we use embodies the same concept of introducing the master variable (or variables) as used in a master-slave flip-flop.

In general terms, we can describe each point in the input statechart as either stable or transitory (ignoring unreachable) for each particular setting of the input terminals and state variables. We call a setting of the input variables a scenario. As mentioned above, we forbid input designs that contain oscillators (alternatively we can forbid scenarios which cause oscillations). A point path is a sequence of points that are executed on the change of scenario. All paths lead from a point that was stable in the previous scenario to a point that is stable in the current scenario. These are never the same point since otherwise the starting point would not have been an end point in the previous scenario. (Actually this is possible where state variables are changed along the path, so need to write this a bit more clearly!)

Where there is an assignment, B(lhs, rhs), during a transient point on a point path that is not also made at the terminating stable point we make the following modification to the statechart. This modification is done independently for each lhs. Basically, we first introduce a new master variable, m, that is assigned B(m, rhs) in the stable point at the start of the point path and is assigned B(lhs, m) instead of the transient point assign and at all subsequent points along the point path to the end stable point of the point path.

Now point paths corresponding to different scenarios may intersect or merge, so the situation is slightly more complex in practice. To handle this, the slave assign is replaced with B(lhs, ME) where ME is a master expression, built up as the point path is traversed. A different master variable is allocated for each starting point of a point path (some of these can be shared during later optimisation). The starting ME is just a reference to the corresponding master variable but ME's are combined with a 2-to-1 multiplexor at each merge point of a pair of point paths for a given lhs. The control input to the multiplexor is any expression that distinguishes the two scenarios, and so, in naive form, is just a partial decode of the input terminals. Where a pair of point paths (for the same lhs but different scenarios) intersect at a point or or run confluently and then diverge, this fact can be ignored and the paths processed independently at transient points.

Once the above modification to the assign elabs has been made, we modify the goto elabs so that all points that are transitorary under a given scenario are bypassed in favour of their final, stable destination at the end of the point path. A point that is bypassed for all scenarios becomes unreachable and is deleted entirely. There will then be no delta arcs between different settings of points in the resulting Boolean Network.

Returning to the flip-flop running example, please note that although it is possible for a poorly-implemented master-slave flip-flop to have a brief moment of transparency, where both the master and slave are enabled, this does not constitute a race condition in the fundamental-mode diagram of the flip-flop. A race only arises with such a flip-flop when it is part of an external, nominally synchronous, feedback loop. In this work, we rely on the delay through the external subsystems being much greater than any possible moment of transparency. An extension would be to automatically generate timing/layout directives that ensure this case is avoided entirely. Instead of doing this in great detail, we take a blanket approach suitable for simulation: we simply introduce a small, explicit delay in our Verilog/VHDL netlist at the output of every output net from the resulting circuit. These are visible (TODO) in the netlists shown below.

Optimisations.... The text for the final part of this section is withheld until publication.

This has six nodes, two externally driven.

#### Stage 3: Convert Statechart to Boolean Network

With one-hot coding of point, the nodes from the statechart are directly the nodes in the Boolean Network, whereas in a binary encoding of point, the encoded nodes replaced the point labels.

The arcs for the Boolean Network are added by taking each elab in turn and converting it to one or more arcs for the Boolean Network.

Each elab of the form `p: A(x, y)' generates an arc from each state where p and x is true and y is false to the equivalent state where p and y is true and x is false. This will in general be a diagonal arc. It will be a straight arc only in the rare case that p and y are the only two values of a binary-encoded two-point chart.

Each elab of the form `p: B(x, y)' generates an arc from each state where p and x are true and y is false to the equivalent state where x is false. It also generates an arc from each state were p and y are true and x is false to the equivalent state where x is true.

Each elab of the form `p: C(x, y)' generates arcs in the same way as the type `A' arcs, except that it is not necessary to make x false because of the encoding.

As an another example, consider a flip-flop that clocks off both edges of a net. This can be described as follows:

```
statechart()
{
start:
wait(~clk); q = d;
wait(clk);  q = d;
goto start;
}
```
This generates two master variables, one being assigned while the clock is high and the other while it is low. In general, signals that are changed on edges of multiple signals or multiple edges of single signals will have multiple master sections.

#### Stage 4: Convert Boolean Network to Netlist

The input to this stage is a Boolean Network, consisting of a list of nodes, flagged as to which are externally driven, and the set of delta arcs describing the behaviour. It is the task of this phase to work out a set of combinatorial functions that form the internal drivers and obey the arcs. These functions are then converted to a netlist, with some functions becoming a single combinatorial gate or buffer and the remainder becoming RS-latches.

There are many possible netlists that correspond to a given Boolean Network. This is true even if the netlist is composed entirely of two-input NOR gates. Our procedure is to search for a sequence of steps that exactly reverse a composition sequence formed of the operations defined above for composition. The sequence ends when the elemental Boolean Network is reached.

One rule we use is where the Boolean Network can be divided into a pair of identical networks whose internal arcs are indifferent to one of the input values and whose transverse arcs (i.e. between the two networks) may be divided by this input. We may then invoke the exact reverse procedure to adding an input described above, which, in reverse form, is to discard one of the two halves and all of the transverse arcs.

The other decompositional rule can be applied when all of the arcs in the system are consistent with a modification where a certain combinatorial function is selected to drive one of the internal nodes where if we nominally place this driving function outside the network then it allows the first decomposition to be executed.

To use these steps, in detail our procedure is as follows:

• 1. If we have the elemental Binary Network then we have finished and stop.

• 2. If there is a dimension which projects onto no arcs, then that dimension can be discarded. Goto 1.

• 3. If we have any dimensions over which there are no diagonal arcs then we generate a driver for one of these dimensions and remove its arcs. We pick the dimension that produces the most simple driver under some heuristic, such as the dimension with the most arcs. Goto 1.

• 4. While no dimension without diagonals exists, we reduce the diagonality of the dimension with the lowest number of diagonals by either diverting an arc or adding a bypass through an extra dimension. An arc may be diverted to flow to an alternative destination node that already has a delta arc to the original destination. Otherwise, we add a dimension and divert the arc perpendicularly to the new dimension and take a Manhattan route through the new dimension and drop down vertically into the old plane at the required destination. goto 1. Note that the route through the new dimension will not project onto any input node.

The motivation for adding the `extra dimension' is simply the generation of an intermediate gate whose output fans out to serve more than one purpose.

#### Generating a Driver

In the step 3 above, we generate a driver for a node from a non-diagonal set of arcs. The driver is initially considered to be an RS latch, but it may be found to degenerate to a single gate, inverter or buffer. The RS latch has a setting condition which is the disjunction of all the state decodes where an arc leaves a state without the the node set to where it is set. The clearing condition is the disjunction of all the state decodes where an arc leaves a state with the node set to where it is clear. A state decode is a logic AND function on the state space to select a particular state. If there are states that are at neither end of an arc for our node, then the full RS latch is needed. On the other hand, if all states participate in an arc for our node, then the setting condition is the complement of the clearing condition and the RS latch simplifies to a memoryless combinatorial logic function. Since all arcs are parallel to the driven node, the driven node itself may always be removed by a logic simplification step from both the setting and clearing disjunctions: hence it does not finally appear in either the setting or clearing conditions. A commonly occuring case of this is ... (explain about additional labels producing delta arcs).

The RS latch occurs in principal as shown above. But in general, for real designs, a vast amount of logic minimisation of the setting and clearing conditions occurs and frequently the whole driver should turn into a single NAND or NOR gate.

## Examples

These examples are being completed for the final version of this paper.

### Example 1 - Muller C Element

The Muller C element is a favourite component of asynchronous logic designers. The simplest form has two inputs and one output and is described for input to our tool either with the following statechart (left) or declarations (right).

```
statechart()                         hprls()
{                                   {
start:                                node q;
if (a & b) goto s;                  input a, b;
if (!a & !b) goto c;                always (a & b) -> q;
goto start;                         always (!a /\ !b) -> !q;
}
s:
q = 1;
goto start;

c:
q = 0;
goto start;
}
```

The following netlist is generated for the Muller element. It uses four gates, whereas a 3-gate design is more widely used.

### Example - A Master Slave D-type

There are two broad semantics for imperative behavioural code: In a conventional program, commands are processed in order and flow advances by itself to the next label. On the other hand, in a statechart, control has no implied flow: it sits at the current state with all commands at that point being active, with the conditional ones being ready to fire when circumstances match their condition.

Here is the AST for the statechart of a master slave D-type.

```val mychart = [
Olabel("start"),
Oe_as_c(Oassign(Onet("m"), Onet("d"))),
Oif(Onet("clk"), Ogoto("n1"), NONE),

Olabel("n1"),
Oe_as_c(Oassign(Onet("q"), Onet("m"))),
Oif(Olognot(Onet("clk")), Ogoto("start"), NONE)
]
;
```

Gives output:

```
CBG Orangepath Asynchronous Compiler.
Input contained 6 items
Input contained 2 twigs
Input contained 4 nodes
start n1 q m  were the Boolean Network nodes
Iterate on 10 arcs
start  3
n1  3
q  2
m  2
were straight
were diagonal
Creating driver for q
Iterate on 8 arcs
start  3
n1  3
m  2
were straight
were diagonal
Creating driver for m
Iterate on 6 arcs
start  3
n1  3
were straight
were diagonal
Creating driver for start
Iterate on 3 arcs
n1  3
were straight
were diagonal
Creating driver for n1
Iterate on 0 arcs
were straight
were diagonal
All driven
Created 25 gates.

// CBG Orangepath HPRLS System

module circuit(d, clk, q);
output q ;
input clk ;
input d ;
wire b127 ;
wire qb126 ;
wire qq125 ;
wire g124 ;
wire g123 ;
wire g122 ;
wire b121 ;
wire qb120 ;
wire qq119 ;
wire g118 ;
wire g117 ;
wire g116 ;
wire g115 ;
wire b114 ;
wire qb113 ;
wire qq112 ;
wire g111 ;
wire g110 ;
wire g109 ;
wire b108 ;
wire qb107 ;
wire qq106 ;
wire g105 ;
wire g104 ;
wire g103 ;
BUF ib127(n1, qq125);
NOR2 iqq125(qq125, qb126, g115);
NOR2 iqb126(qb126, qq125, g124);
OR2 ig124(g124, g122, g123);
AND2 ig123(g123, start, clk);
AND2 ig122(g122, n1, clk);
BUF ib121(start, qq119);
NOR2 iqq119(qq119, qb120, clk);
NOR2 iqb120(qb120, qq119, g118);
OR2 ig118(g118, g116, g117);
AND2 ig117(g117, start, g115);
AND2 ig116(g116, n1, g115);
INV ig115(g115, clk);
BUF ib114(m, qq112);
NOR2 iqq112(qq112, qb113, g111);
NOR2 iqb113(qb113, qq112, g109);
AND2 ig111(g111, start, g110);
INV ig110(g110, d);
AND2 ig109(g109, start, d);
BUF ib108(q, qq106);
NOR2 iqq106(qq106, qb107, g105);
NOR2 iqb107(qb107, qq106, g103);
AND2 ig105(g105, n1, g104);
INV ig104(g104, m);
AND2 ig103(g103, n1, m);
endmodule
```

Close inspection of this flip-flop shows it to contain master and slave transparent latches interconnected in the normal way, but there are two more transparent latches to generate the start and n1 values.

### Example - Dual Edge Flop

The Dual Edge-Triggered Flip-Flop is a useful building-block that sets on the active edge of its set input and is cleared on the active edge of its clear input.

In fundamental mode, it can be described with the following state chart:

```
statechart()
{
start:
wait(~s); wait(s);
q = 1;
wait(~r); wait(r);
q = 0;
goto start;
}
```

This is one of the standard circuits for this component from the 74 series MSI databook.

### Example 3 - Protocol Handshake

This example will be placed on a separate page: IEEE-488/GPIB Transcoding.

## Conclusions

A good deal of prior work has been done on asynchronous logic generation and the work presented here is little more than a simple place holder. Ideas from this prior work can be used instead of or incorporated in the above. In particular, a variety of basic forms for holding `point' can be explored. However, this place holder is sufficient to generate asynchronous interconnection logic between synchronous sections. As a next step we will concentrate how the higher layers of the compiler interact with this code generator and the APIs needed for optimisations.

This page describes work in progress and is not intended to be definitive.

## References

[Soulié 87] Françoise Fogelman Soulié. Boolean Networks. Automata Networks in Computer Science. Manchester University Press. ISBN 0 7190 2209.pp20-34.

TAST : Tool for Asynchronous cicruits SynThesis http://tima.imag.fr/cis/

Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers. IEICE Transactions on Information and Systems, Vol. E80-D, No. 3, March 1997, pages 315-325. www.lsi.upc.es/~jordic/petrify/petrify.html.

http://www.esterel-technologies.com/v3/

http://www.cs.man.ac.uk/async/tools/index.html

Balsa

• (C) January 2003 DJ Greaves. Home.

PAGE UNDER CONSTRUCTION. MARCH 03. MODIFIED JAN 05.