Orangepath/HPR Logic Synthesis Project: Hardware and Embedded Software Synthesis from Executable Specifications.
Compilation from .net CIL Bytecode (second example)

Kiwi Scientific Acceleration: Multi-FPGA Designs, Incremental Compilation and Remote Procedure Call (via IP-XACT)

In this example, the following C# program was compiled to .net CIL bytecode. The program has a second entry point that is callable from a separate compilation unit.

Note: This page currently shows an old demo and will shortly be replaced with our HPR System Integrator approach that uses either IP-XACT alone or instancing via Kiwi C#. HPR System Integrator can instantiate imperative and pure functional modules created by KiwiC or sourced from third parties, provided they are accompanied by an XML meta-data file that indicates their primary features.

Features include fixed/variable latency, precision, initiation interval, clobbers, stateless ...

Once blocks are separately compiled, they can be combined using this tool: Multi-FPGA logic partitioner, instantiator and structural wiring generator: HPR System Integrator.

I/O Paradigms

Low-level RTL essentially uses a shared variable paradigm for net-level interconnection between compilation units. The protocol is not manifest in the design and must be kept and checked elsewhere. Kiwi supports this for interconnection with specific hardware standards, normally in Hard Pause mode, but it is not the preferred approach.

Object-oriented software sends threads between compilation units to perform actions. Synthesisable Verilog and VHDL do not allow threads to be passed between separately-compiled circuits: instead, additional I/O ports must be added to each circuit and then wired together at the top level. Accordingly, we mark up methods that are to be called from separate compilations with a remote attribute.

  [ Kiwi.Remote("client1-port", "parallel: four-phase") ]
  public return_type entry_point(int a1, bool a2, ...)
  { ... }

When an implemented or up-called method is marked as 'Remote', a protocol is given and KiwiC generates additional I/O terminals on the generated RTL that implement a stub for the call. The currently implemented protocol is asynchronous, using a four-phase handshake and a wide bus that carries all of the arguments in parallel. Another bus, in the reverse direction, conveys the result when the method is non-void. Further protocols can be added to the compiler in future, but we would prefer to lift them so that they can be specified with assertions in C# itself.
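The signalling order of a four-phase handshake can be sketched in ordinary C#. The following is a minimal software model, not KiwiC output: the classes FourPhaseChannel, FourPhaseServer and FourPhaseClient are hypothetical names introduced here for illustration, and the argument bus is assumed to carry a single int.

```csharp
using System;
using System.Collections.Generic;

// Illustrative software model only: these classes are hypothetical and are
// not part of Kiwi. Field names echo the generated ports (req, ack, retval).
public class FourPhaseChannel
{
    public bool Req;    // driven by the client
    public bool Ack;    // driven by the server
    public int Args;    // wide bus carrying the arguments (one int here)
    public int Retval;  // reverse-direction bus carrying the result
}

public class FourPhaseServer
{
    private readonly FourPhaseChannel ch;
    private readonly Func<int, int> body;

    public FourPhaseServer(FourPhaseChannel ch, Func<int, int> body)
    {
        this.ch = ch;
        this.body = body;
    }

    // One evaluation step of the server side of the handshake.
    public void Step()
    {
        if (ch.Req && !ch.Ack)
        {
            ch.Retval = body(ch.Args);  // run the method body once per call
            ch.Ack = true;              // phase 2: result is valid
        }
        else if (!ch.Req && ch.Ack)
        {
            ch.Ack = false;             // phase 4: return to idle
        }
    }
}

public static class FourPhaseClient
{
    // Performs one complete call and logs each signal transition in order.
    public static int Call(FourPhaseChannel ch, FourPhaseServer server,
                           int arg, List<string> trace)
    {
        ch.Args = arg;
        ch.Req = true;                    // phase 1: arguments valid, raise req
        trace.Add("req=1");
        server.Step();
        trace.Add(ch.Ack ? "ack=1" : "ack=0");
        int result = ch.Retval;           // sample the result while ack is high
        ch.Req = false;                   // phase 3: drop req
        trace.Add("req=0");
        server.Step();
        trace.Add(ch.Ack ? "ack=1" : "ack=0");
        return result;
    }
}
```

Each call returns both signals to zero before the next call can start, which is why this style of protocol is simple but has limited throughput.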

Protocol selection: Four-phase Handshake

KiwiC can generate several protocols for unit interconnection. Here we use the HSIMPLE four-phase protocol. (2016 note: AXI streaming is now the preferred protocol.) Reference material on the four-phase handshake protocol is available HERE.

Other protocols supported by KiwiC are BVCI and HFAST. HFAST has higher throughput. BVCI supports multiple outstanding requests in order. External bus multiplexors can map a number of in-order HFAST clients onto a single AXI or other out-of-order bus protocol.

C# Source Code: Remote Procedure Call

The C# code has both a 'Main' entry point that is mentioned on the KiwiC command line and a secondary entry point, defined by the Kiwi.Remote attribute, that is compiled by KiwiC so as to be invoked by a separately-compiled hardware component. We can also use this style of interface when calling from software on a soft-core to hardware generated by KiwiC.

// Remote procedure call test (H/W port generation).
using System;

class test10
{
    static int limit = 10;
    static int jvar;

    // Secondary entry point, callable from a separate compilation unit.
    [ Kiwi.Remote("client1-port", "parallel: four-phase") ]
    public static int bumper(int delta)
    {
        jvar += delta;
        return jvar;
    }

    public static void Main()
    {
        Console.WriteLine("Test 10 Limit=" + limit);
        for (jvar=1; jvar<=limit; jvar+=2)
            Console.Write(jvar + " ");
        Console.WriteLine(" Test 10 finished.");
    }
}

A problem with the above code is a structural hazard: the variable jvar can be updated twice in the same clock cycle, once by the Main method and once by the remote invoker, which gives rise to a race.
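The effect of two writers hitting jvar in one clock cycle can be sketched with a small C# model. This is an illustration under stated assumptions, not a description of KiwiC internals: it assumes registered updates in which both right-hand sides read the old value and only one pending assignment survives the clock edge. The class RaceModel and the flag mainWins are hypothetical names.

```csharp
// Hypothetical cycle-level model of a doubly-written register.
public static class RaceModel
{
    // Returns the value of jvar after one clock edge.
    // mainFires / remoteFires say which writers are active in this cycle;
    // mainWins models the arbitrary choice of which assignment survives.
    public static int ClockEdge(int jvar, bool mainFires, bool remoteFires,
                                int delta, bool mainWins)
    {
        int fromMain = jvar + 2;        // loop increment in Main
        int fromRemote = jvar + delta;  // update made by bumper(delta)

        if (mainFires && remoteFires)
            return mainWins ? fromMain : fromRemote;  // one update is lost
        if (mainFires)
            return fromMain;
        if (remoteFires)
            return fromRemote;
        return jvar;                    // no writer active: value holds
    }
}
```

Starting from jvar = 3 with delta = 5, a serialised execution would give 10, whereas the model gives 5 or 8 depending on which writer wins. Either way one of the two updates disappears.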

KiwiC detects the race, issues the following cryptic warning, and chooses an arbitrary resolution:

VNL.v_1 incompatible assigns: consistency check
  lhs=pcnet105  g=true

Output code

KiwiC will generate hardware both for the client and the server as separate RTL files. In more realistic examples, there will be multiple files: one is the top level and contains client calls to some of the others, which in turn make client calls to others, while the leaf modules of the design hierarchy are servers only.

One can also envision leaf modules in the design hierarchy making upcalls to parents, but this is not currently implemented in Kiwi.

For the above example, the generated server RTL looks as follows:

// Verilog output file generated at Thu Sep 18 23:43:12 BST 2008
// KiwiC: HPR Orange IL/.net front end: Version alpha 21: 16-Sep-08
// -root test10 -vnl VNL.v -preserve-sequencer 1 
module VNL(reset, clk, bumper_delta, bumper_ack, bumper_req, bumper_retval);
  input reset;
  input clk;
  input bumper_delta;
  output bumper_ack;
  input bumper_req;
  output bumper_retval;
  reg [1:0] testtest10pc;
  reg [2:0] testtest11pc;
  reg bumper_ack;
  integer test10_jvar;
  integer test10_limit;
   always @(posedge clk) begin //Start HPR test10_test13
          if (reset) testtest10pc <= 0;
          case (testtest10pc)
          0:  begin testtest10pc <= 1;
                 test10_jvar <= 1;
                 $display("%s%d", "Test 10 Limit=", 10);
                 $write("%d%s", 1, " ");
                 end
          1:  begin if (9<=test10_jvar) testtest10pc <= 2;
                 if (test10_jvar<9) test10_jvar <= test10_jvar+2;
                 if (9<=test10_jvar) 
                   begin test10_jvar <= test10_jvar+2;
                      $display(" Test 10 finished.");
                       end
                 if (test10_jvar<9) $write("%d%s", test10_jvar+2, " ");
                 end
          2:  begin 
                  end endcase
          //End HPR test10_test13

          //Start HPR test10_test12
          if (reset) testtest11pc <= 0;
          case (testtest11pc)
          0: testtest11pc <= 1;
          1: if (!bumper_req) testtest11pc <= 2;
          2:  begin testtest11pc <= 3;
                 bumper_ack <= 0;
                 end
          3: if (bumper_req) testtest11pc <= 4;
          4:  begin testtest11pc <= 5;
                 test10_jvar <= test10_jvar+bumper_delta;
                 end
          5:  begin 
                  end endcase
          //End HPR test10_test12
          end

  // Start delx test10_test13
  // End delx test10_test13

  // Start delx test10_test12
  // End delx test10_test12

endmodule
// eof (hprls verilog)

Fixing Races Using a Mutex

To overcome the race condition, each update to the shared variable must be guarded.

We can add explicit mutex variables to the code or use the C# lock primitive.

Here is one of the access sites augmented with explicit synchronisation primitives:

    [ Kiwi.Remote("client1-port", "parallel: four-phase") ]
    public static int bumper(int delta)
    {
        while(!mutex1) hpr_testandset(ref mutex1, 1);  // acquire the mutex
        jvar += delta;
        hpr_testandset(ref mutex1, 0);                 // release the mutex
        return jvar;
    }

Alternatively, we can use C#'s lock keyword. A lock is often applied to the current object instance, but for a static method we do not have an instance, so we create a dummy static object just to lock on.

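   static Object my_mutex = new Object();

   [ Kiwi.Remote("client1-port", "parallel: four-phase") ]
   public static int bumper(int delta)
   {
      int result;
      // Read the updated value inside the lock so the other writer cannot intervene.
      lock(my_mutex) { jvar += delta; result = jvar; }
      return result;
   }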

Applying the mutex to the for loop requires a little restructuring. It is an awkward example that probably has little real-world use:

   bool bb = true;
   lock(my_mutex) { bb = jvar<=limit; jvar+=2; }
   if (bb) ...
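For reference, the restructured loop can be written out in full as plain C#. This is a sketch under assumptions rather than the code that was compiled for this page: the class name Test10Locked and the method RunLoop are hypothetical, the Kiwi.Remote attribute is omitted so that the fragment stands alone, output is collected in a string instead of being printed, and the increment is made conditional so that jvar finishes at the same value as in the original for loop.

```csharp
using System.Text;

// Hypothetical stand-alone version of test10 with every access to jvar guarded.
public static class Test10Locked
{
    static readonly object my_mutex = new object();
    static int limit = 10;
    static int jvar;

    // Guarded update, as performed by the remote entry point.
    public static int bumper(int delta)
    {
        lock (my_mutex) { jvar += delta; return jvar; }
    }

    // The for loop from Main, with test and increment in one critical section.
    public static string RunLoop()
    {
        var output = new StringBuilder();
        lock (my_mutex) { jvar = 1; }
        while (true)
        {
            bool bb;
            int current;
            lock (my_mutex)
            {
                current = jvar;         // value the loop body will print
                bb = current <= limit;  // loop test
                if (bb) jvar += 2;      // loop increment
            }
            if (!bb) break;
            output.Append(current).Append(' ');
        }
        return output.ToString();
    }
}
```

Capturing jvar in the local variable current inside the critical section means the loop body never reads the shared variable unguarded, even if bumper is invoked between iterations.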


We have seen how to control KiwiC compilation manually, making calls between compilation units so that a resource can be shared instead of being expanded in line.

When multiple components are assembled with sharing of ports, whether it is the DRAM interface or multiple clients invoking one server, multiplexors and arbiters are required. KiwiC uses a standard set of protocols, so only a finite set of arbiters and multiplexors is needed. Another note will discuss their relative performance and the level of automation available for instantiating them.
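As an illustration of the kind of arbiter needed when several clients invoke one server, here is a minimal round-robin sketch in C#. The arbitration policy is an assumption made for this example, since this page does not say which policy the standard arbiters use. The class name RoundRobinArbiter is hypothetical.

```csharp
// Hypothetical round-robin arbiter: at most one client is granted per call.
public class RoundRobinArbiter
{
    private readonly int clients;
    private int last;  // index of the most recently granted client

    public RoundRobinArbiter(int clients)
    {
        this.clients = clients;
        this.last = clients - 1;  // so that client 0 has priority first
    }

    // req[i] is true when client i is requesting the shared server.
    // Returns the index of the granted client, or -1 if nobody is requesting.
    public int Grant(bool[] req)
    {
        for (int i = 1; i <= clients; i++)
        {
            int candidate = (last + i) % clients;  // search onwards from last grant
            if (req[candidate])
            {
                last = candidate;
                return candidate;
            }
        }
        return -1;
    }
}
```

With two of three clients requesting continuously, successive calls to Grant alternate between them, so neither client can starve the other.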

(C) DJ Greaves and S Singh, April 2008. Being updated June 2017.