Orangepath/HPRLS Project: Hardware and Embedded Software Synthesis from Executable Specifications.
H2 Transactor Generation

On this web page we will illustrate how to synthesise various transactors, bus monitors and bus-bridges, all from the same source code. We also illustrate that the source code can have a wide variety of forms, ranging from C-like imperative code, through PSL specifications and SysML-like state charts.

HPR Orangepath Transactor Synthesis.

White Paper: FDL'10, Southampton: `Synthesis of Glue Logic, Transactors, Multiplexors and Serialisors from Protocol Specifications.' Whitepaper PDF. Slides PDF.

Including Synthesis of Bus Monitors and Bus Bridges

An (ESL) Electronic System Level transactor converts from a hardware to a software style of component representation. A hardware style uses shared variables to represent each net, whereas a software style uses callable methods and up-calls. Transactors are frequently required for busses and I/O ports. Fortunately, formal specifications of such busses and ports are becoming commonly available, so synthesising a transactor from the specification is a natural thing to do.

A transactor tends to have two ports, one being a net-level interface and the other with a thread-oriented (TLM) interface defined by a number of method signatures. The thread-oriented interface may be a target that accepts calls from an external client or it may be an initiator that make calls to a remote client. The calls may typically be blocking to implement flow control.

There are four forms of transactor for a given bus protocol. Either side may be an initiator or a target, giving four possibilities.

The initiator of a net-level interface is the one that asserts the command signals that take the interface out of its starting or idle state. The initiator for an ESL/TLM interface is the side that makes a subroutine or method call and the target is the side that provides the entry point to be called.

Note that all four forms of transactors share a common vocabulary of transactions. In particular, each can perform both read and write transactions, and hence the actual flow of data can be from the left to the right or right to the left in each form. In other words, all of them do both push and pull transactions, where push denotes that data travels from the initiator to the target and pull the reverse.

Of the four transactors, those that are initiator on one side and target on the other are the most common and useful. The transactor that is a target on both sides is known as a mailbox or maildrop. The transactor that is an initiator on both sides accords with the pump design pattern. To be useful it must leave the idle state of its own accord (or as a result of some covert stimulus) to 'poll' its co-respondents. Having multiple TLM upcalls (such as the read, locked-read and write illustrated) is not normally appropriate since it would not know which one to first call.

One goal of TLM modelling in the EDA industry is to allow high-level models of systems to be implemented entirely in software, with the firmware linked directly against executable models of the hardware. However, there is commonly a need to generate hybrid models, using lower-level models of some parts of the design: transactors are one way to provide 'shims' to connect to these lower-level models.

Bus Monitors

A bus monitor does not drive the bus, and hence all of its net-level connections are inputs.

A bus monitor checks that the bus follows a protocol and reports errors on violations. Errors may be reported by logging to the console or by up calling to user-provided logging methods.

Violations are either of safety or liveness. A safety violation can be notified as soon as it occurs whereas liveness can only be fully checked with offline model checking. However, an entry point is provided on the monitor that the user may call at any time and any liveness conditions that have not so-far been encountered can be reported at that time.

Generating a transactor using H2

The H2 language enables one to define an interface and a protocol on that interface. In addition, the HPR synthesis core behind the H2 language has functions for generating executable code from specifications, such as the protocol definition. From the same definition, it is possible to create a master or slave transactor for a connection, using the 'forwards' and 'backwards' keywords of the language to indicate the pin directions.

Example: BVCI

BVCI is part of the OCP bus specification from the OCP International Partnership (Open Core Protocol Specification Release 1.0). See VSI Alliance Virtual Component Interface Standard Version 2 (OCB 2 2.0) that can be downloaded from www.ocpip.org.

The BVCI bus uses the following physical pins, defined in H2 as:

interface BVCI()
{

  neutral:
    node in : clk;

  // In the forward direction of an H2 interface, 'in' means inputs to a net-level client.
  //  The directions of all signals are reversed for an initiator.
  forward:


  // Command port
  node [ADDR_WIDTH-1:0] out :   address;
  node [LANES-1:0] out :        be;
  node [BVCI_CMD_HI:0] out:     cmd;
  node out:                     eop;
  node [PLEN_HI:0] out :        plen;
  node [DATA_WIDTH-1:0] out :   wdata;
  node out:                     cmdval;
  node in:                      cmdack;

  // Response port
  node in:                      rspval;
  node out:                     rspack;
  node in:                      bgt_rerror;
  node [DATA_WIDTH-1:0] in:     rdata;
  node in:                      reop;
  node in:                      rerror;
}

The protocol for a read operation is defined in a close-to-executable way with the following fragment of H2. Note, however, that instead of assignments, equality predicates are used. This enables the H2 tool to treat either the lhs or rhs of each expression as the value to be assigned according to whether an initiator or target is being synthesised.

action bvci_read(rdaddr : [31:0], rdlen : [255..0], rdbuf : [31:0] [])
{
  fundef min (a1, b1) 
  {
    return (a1<:b1) ? a1: b1;
  }

  node integer : read_cmdpos, read_rsppos;
  process(rose rclock)
  {
    read_cmdpos := 0; read_rsppos := 0;
    cmdval == 1; eop == 0;
    while (read_cmdpos <: rdlen) // can this be worked backwards! Quite surprising!
    {
      hpr_pause(1);
      if (cmd == BVCI_CMD_READ && address==rdaddr && plen == rdlen &&
         be==BE_FULL && cmdack && rdlen > 0)
           {
              rdaddr += DATA_BYTES;
              read_cmdpos += DATA_BYTES;
              eop == (read_cmdpos >= rdlen-DATA_BYTES);
           }
       else {     eop == 0; }
    }
    cmdval == 0;
    // The following wait statement prohibits multiple outstanding transactions
    // from this thread, but can this code be re-entrant ? 
    waituntil(read_cmdpos==read_rsppos); // This can just force one to be the other!
    return 0; // Successful return value
   }


  //separate thread  - enabled while parent is enabled
  //todo : need a keyword to make it live on (borrow word from sysml?)  (eg on write).
  process(rose rclock)
  {
    hpr_pause(1);
    while (rspval && rspack)
    {
      hpr_pause();      
      rdbuf[read_rsppos/DATA_BYTES] == rdata;
      read_rsppos += DATA_BYTES;
      if (reop) break;
    }
  }
}

A similar piece of code defines the write protocol and the read locked protocol.

We used the following compile-time parameters to define a D32/A32 instance of the bus:

  const ADDR_WIDTH = 32;        // Address width
  const DATA_WIDTH = 32;        // Data width
  const LANES = DATA_WIDTH/4;   // Byte lane enable msb (width-1).
  const PLEN_HI = 7;            // Maximum packet length (log of)
  const BVCI_CMD_HI = 3;        // Command bus width


  // Command values
  const BVCI_CMD_READ = 1;
  const BVCI_CMD_WRITE = 2;
  const BVCI_CMD_LOCKED_READ = 3;

Stimulus Generator

The transactor can be synthesised into a data generator that creates an infinite stream of transactions.

section TRANSGEN()
{
  node initiator BVCI_XTOR: genport;

  node [31:0] [256] : genrbuffer;
  node integer : addros, counter, back_ptr;
  process (rose rclock)
  {
    counter := 100;
    back_ptr := 10;
    while(1)
    {
        counter = counter + 1;
        hpr_pause(1);
        addros := counter * 32;
        genport.read(addros, 12, genrbuffer);
        back_ptr := back_ptr + 2;
        hpr_pause(1);
        back_ptr := back_ptr + 3;
    }
  }
}

Command line to compile the generator:

sand.cl.cam.ac.uk: h2comp bvci.h2 -no-clear-arrays -root TRANSGEN  -vnl transgen.v  -preserve-sequencer 1
sand.cl.cam.ac.uk: wc transgen.v 
 163  485 7379 transgen.v

Simulation plot from the generator:

Transactors: Output Forms

A number of output forms are possible. We can select from initiator and target, client and server and we can select the output language. The HPR tool can readily generate Verilog, C++ and assembler for the UIA processor family. Output forms for a transactor that just performs read transactions are shown below. Server side transactors were generated for C++ and assembler forms, but Verilog does not have a means to pass threads between Verilog modules, so a client was generated that makes a PLI upcall on each transaction.

The "-xtor style" command line flag instructs the tool to make a transactor. The style can be initiator, target or bus monitor.

SC_MODULE(stimulus)
{
  void run()
  {
    wait(100);
    for (int addr = 0; addr < 0x80; addr += 0x10)
      {
        unsigned int buffer[0x10];
        printf("Start TLM : read from addr=%X\n", addr);
        initiator0.read(addr, 0x10, buffer, 0);
        printf("TLM call returned %08X %08X ...\n", buffer[0], buffer[1]);
      }
    while (1) wait();
  }

  SC_CTOR(stimulus)
  {
    SC_THREAD(run);
  }
};



// Instantiate the structure...   
struct BVCI bvci0("bvci0");
bvci_initiator initiator0("initiator0");
bvci_target target0("target0");



int main()
{
			     
  //  Make both transactors talk to the same pin-level bus...
  initiator0.netside = &bvci0;
  target0.netside = &bvci0;

  // Install upcalls for the target transactor to call			      
  target0.read = read_upcall;
  target0.write = write_upcall;
  mainrun bsystem0;
  stimulus stimulus("stimulus0");
  slaver slaver("slaver0");
     ... 
  return 0;
}

Bus Router/Bridge Synthesis

Given definitions of transactors we can readily synthesise bus bridges with any number of ports. The HPR tool will generate Verilog and SystemC output for these designs.

Nominally, the example bridge consists of four transactors and some central glue logic. However, these are fully flattened into each other during the HPR synthesis process. The glue logic contains mutex flags that ensure only one output transaction is taking place on an output port at one time.

The transactors used in the earlier example above provide three entry points on the ESL side. However, for use in a router or bridge, the three entry points are combined into a single, general-purpose transactor with a command variable, ranging over an enumeration type, to select the operation. When initiating a transaction, the value of the command variable is known beforehand, whereas at a target the value is not known beforehand and it must be parsed from the observed behaviour on the net-level interface.

//
// Combine the three ESL entry points into a single 'generic' action that
// uses a local enumeration type to define which lower-level transaction to invoke.
//
typedef bvci_cmd_t = enum { E_READ, E_WRITE, E_LREAD };
action bvci_generic(cmd : bvci_cmd_t, Maddr : [31:0], Mlen : [255..0], Mbuffer : [31:0] [])
{
  arb	{ read(Maddr, Mlen, Mbuffer) && cmd == CMD_READ; },
  	{ lread(Maddr, Mlen, Mbuffer) && cmd == CMD_LREAD; },
  	{ write(Maddr, Mlen, Mbuffer) && cmd == CMD_WRITE; };
}
//
// The following section provides a suitable root for a two-port bridge/switch.
// 
// Each target port (A or B) can receive transactions and pass them
// on to the appropriate initiator ports (C or D), depending on the
// value of the address field.
//
// This fragment lacks locks and a few details that are now being added.
section BVCI_BRIDGE_FOURPORT()
{
  node in : clk;
  node target reverse BVCI_XTOR_CORE: port_inA;
  node target reverse BVCI_XTOR_CORE: port_inB;

  node initiator BVCI_XTOR_CORE :port_outC;
  node initiator BVCI_XTOR_CORE: port_outD;


  node [31:0] [256] : mbuf_A;
  node [255..0]     : mlen_A;
  node  [31:0]      : maddr_A;
  node bvci_cmd_t   : cmd_A, cmd1_A;
  process(rose clk)
  {
    // Service input port inA
    while(1) {
      port_inA.generic(cmd_A, maddr_A, mlen_A, mbuf_A);
      cmd1_A := (cmd_A == CMD_READ) ? CMD_WRITE:  CMD_READ;
      if (mbuf_A & ROUTING) port_outC.generic(cmd1_A, maddr_A, mlen_A, mbuf_A);
      else port_outD.generic(cmd1_A, maddr_A, mlen_A, mbuf_A);
    }
  }

  node [31:0] [256] : mbuf_B;
  node [255..0]     : mlen_B;
  node  [31:0]      : maddr_B;
  node bvci_cmd_t   : cmd_B, cmd1_B;
  const ROUTING = 0xC00000;
  process(rose clk)
  {
    // Service input port inB
    while(1) {
      port_inB.generic(cmd_B, maddr_B, mlen_B, mbuf_B);
      cmd1_B := (cmd_B == CMD_READ) ? CMD_WRITE:  CMD_READ;
      if (mbuf_B & ROUTING) port_outC.generic(cmd1_B, maddr_B, mlen_B, mbuf_B);
      else port_outD.generic(cmd_B, maddr_B, mlen_B, mbuf_B);
    }
  }


// Exclusion locks: each port is able to do one transaction at a time.
// Implement with a synchronised keyword on the actions.
// 
}

The generated Verilog for this design is here FOUR PORT BRIDGE VERILOG.

Other Input Forms

As well as providing the C-like imperative language, an H2 file may contain SysML hierarchic state charts and PSL assertions. Any mix of these can be combined to describe the above examples, since the HPR tool forms the union of all input specifications, whatever input form.

Statechart Input Form

The stategraph (or statechart) defines a finite state machine, where each state has a state name. A top-level stategraph is always active, meaning it is in exactly one state. On the other hand, a stategraph that is instanced as a child stategraph within a state in another stategraph is inactive (not in any state) unless its parent is in that instantiating state. A state may instantiate any number of child stategraphs but recursion is not allowed. The full semantics are given in the H2 manual.

TODO: Include the spec

PSL Input Form

PSL consists of temporal logic path and tree assertions. The PSL input form is compiled into monitor automata that range over state assertions. Standard techniques are used for this.

TODO: Include the spec and more details

Internal Tool Flow

Tool Flow: Flip and NDFA Synthesis Algorithms

We now present some details of the inner working of the compilation process.

Tool Flow: Input Processing

The concrete syntax grammars for the three different input forms were combined in a single bison front end, to enable free and easy mixing of the forms in a single source file. However, as shown, the corresponding parts of the resulting AST are dispatched to separate front-end compilers to handle each language.

The different input forms are all converted to a common input form that is held in an array. This is a set of non-deterministic finite-state automata with assertions and actions in each state. The collection is stored as an HPR virtual machine so that it can be operated on using various functions from the HPR/LS Orangepath library. The array resembles a conventional machine memory, with each location holding an instruction. The instructions vocabulary is: state assertion, assignment, wait for clock, conditional branch and fork. Side information includes the starting address for each thread and clock net information. Unlike a conventional machine, the expressions occurring in the instructions can be arbitrarily rich and complex, encompassing any strict, non-blocking, referentially transparent functions. Console output is represented using a state assertion predicate that always holds yet which has side effects: it is an implementation of printf that returns true.

Tool Flow: Flip and NDFA Synthesis Algorithms

The NDFA code is converted to a DFA by first performing flip synthesis and then using an NDFA to FDA reduction algorithm inspired by SLR grammar parsing techniques.

The basic idea for the flip synthesis algorithm is that each assertion is converted to executable code by creating a sequence of assignments that makes it hold. In the common, simple case, of an assertion like a==b, one side of the operator is a mutable expression and the other is not. A mutable expression is generally a variable or output signal whereas an immutable expression is a constant or input signal. Expressions may need to be re-arranged ... we simply fail on overly complex assertions and ask the user to rewrite his code. Underspecified, ... Choice over assignments problem...

The output from the flip synthesis algorithm is a similar HPR machine, except that the state assertions are generally all replaced with executable code. The exception is user assertions that ... are finally wanted as runtime monitors in the output code.

Like SLR parser generator algorithm, used by bison, the NDFA synthesis algorithm operates using the notion of a current set of possible configurations all running in parallel with all but one finally being rejected. The generated DFA can be thought of as a parser that follows the behaviour of a set of nets, making reductions according to a grammar that is defined by a protocol.

Tool Flow: Output Generation

The final code generation stages are all provided by the HPR/LS library and are described elsewhere ...

Summary

The paper that will describe this work will show three different implementations of an on-chip bus protocol, prove them equivalent using model checking and demonstrate their usefulness for synthesis of transactors, bus monitors and a bus bridge in Verilog and SystemC. The generated designs were tested as part of the design flow for an ASIC in industry.

The three implementations use, respectively, property-specification language (PSL), SysML statecharts and a stylised C-like imperative form that can be interpreted either as an input behaviour or an output behaviour.

A new flip synthesis algorithm was presented, that can convert an executable specification for protocols into either the initiator, target or monitor for that protocol. A novel aspect of this work is that the input can be any mix of the three different languages and these are converted to a common form before processing by the algorithm.

Under construction: Dec 2007... DJG.

NB, the exclusion calls to hpr_mutexset are missing in this example code and also the byte lane enable logic is not fully specified.

References

Krste Asanovic. "Transactors for Parallel Hardware and Software Co-Design". International High Level Design Validation and Test Workshop, IEEE, November, 2007.

A formal semantics of UML statecharts by model transition systems (2002) by Daniel Varro in Proceedings ICGT 2002: International Conference on Graph Transformation, Lecture Notes in Computer Science Varro PDF.

Sysml: Prevostin PDF