Orangepath/HPR Logic Synthesis Project: Hardware and Embedded Software Synthesis from Executable Specifications.
Compilation from .net CIL Bytecode (second example)

Kiwi: Synthesis from .net CIL Bytecode: Simple RPC Call

In this example, the following C# program was compiled to .net CIL bytecode. The program has a second entry point that is callable from a separate compilation unit.

The Need for Separate Compilation

Separate and incremental compilation is needed in all large projects: it handles scale, enables component reuse and revision control, and is the basis for project management. It can also be a basis for parallelism. Here we use it as a mechanism to trade execution time against silicon space using the hardware server concept.

Moreover, the default approach of Kiwi is one of 'flattening', where the leaf components are largely replicated at each instantiation within a hierarchic design. KiwiC reuses ALUs and local variable registers in both the spatial and time domains, but tends to generate the largest and fastest circuit it can, subject to the per-thread ALU instance count limits set in the recipe. For larger designs, to manage complexity, it is desirable to designate a multi-ALU resource as a hardware server and to have manual control over the number of instances generated. KiwiC will then share such server instances in the time domain rather than instantiate as many as it needs (subject to ALU count limits).

I/O Paradigms

Low-level RTL essentially uses a shared variable paradigm for net-level interconnection between compilation units. The protocol is not manifest in the design and must be kept and checked elsewhere. Kiwi supports that for interconnection with specific hardware standards, but it is not the preferred approach.

Object-oriented software sends threads between compilation units to perform actions. Synthesisable Verilog and VHDL do not allow threads to be passed between separately-compiled circuits: instead, additional I/O ports must be added to each circuit and then wired together at the top level. Accordingly, we mark up methods that are to be called from separate compilations with a remote attribute.

  [Kiwi.Remote("parallel:four-phase")]
  public return_type entry_point(int a1, bool a2, ...)
  { ... }

When an implemented or up-called method is marked as 'Remote', a protocol is given and KiwiC generates additional I/O terminals on the generated RTL that implement a stub for the call. The currently implemented protocol is asynchronous, using a four-phase handshake and a wide bus that carries all of the arguments in parallel. Another bus, in the reverse direction, conveys the result where the return type is non-void. Further protocols can be added to the compiler in future, but we would instead like to lift them so they can be specified with assertions in C# itself.
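The handshake just described can be modelled in ordinary software. The following is a minimal sketch, not KiwiC output: two threads stand in for the separately-compiled client and server, and the names FourPhaseModel, req, ack, argBus, resultBus, Serve and Call are all hypothetical. It assumes the usual four-phase ordering (request rises, acknowledge rises, request falls, acknowledge falls), with the argument bus held stable while the request is high.

```csharp
using System;
using System.Threading;

// Software model of a four-phase handshake between a client and a server.
// All names are hypothetical; this is not KiwiC output.
public class FourPhaseModel
{
    static volatile bool req;   // driven by the client
    static volatile bool ack;   // driven by the server
    static int argBus;          // argument bus, client to server
    static int resultBus;       // result bus, server to client
    static int total;           // server-side state

    // Server side: handle a fixed number of calls, then stop.
    static void Serve(int calls)
    {
        for (int i = 0; i < calls; i++)
        {
            while (!req) Thread.Yield();  // wait for the request to rise
            total += argBus;              // arguments are stable while req is high
            resultBus = total;
            ack = true;                   // result is valid: raise ack
            while (req) Thread.Yield();   // wait for the request to fall
            ack = false;                  // return to idle
        }
    }

    // Client stub: one complete four-phase transaction per call.
    static int Call(int delta)
    {
        argBus = delta;                   // place the argument on the bus
        req = true;                       // raise the request
        while (!ack) Thread.Yield();      // wait for the acknowledge
        int result = resultBus;           // result is valid while ack is high
        req = false;                      // release the request
        while (ack) Thread.Yield();       // wait for the server to go idle
        return result;
    }

    public static int[] RunDemo()
    {
        req = false; ack = false; total = 0;
        int[] deltas = { 5, 3, 2 };
        var server = new Thread(() => Serve(deltas.Length));
        server.Start();
        var results = new int[deltas.Length];
        for (int i = 0; i < deltas.Length; i++) results[i] = Call(deltas[i]);
        server.Join();
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", RunDemo()));  // prints "5 8 10"
    }
}
```

Each call costs four signal transitions, which is why a four-phase scheme is simple but not the highest-throughput choice.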

Protocol selection: Four-phase Handshake

KiwiC can generate several protocols for unit interconnection. Here we use HSIMPLE, a four-phase handshake. Reference material on the four-phase handshake protocol is described separately.

Other protocols supported by KiwiC are BVCI and HFAST. HFAST has higher throughput. BVCI supports multiple outstanding requests in order. External bus multiplexors can map a number of in-order HFAST clients onto a single AXI or other out-of-order bus protocol.

C# Source Code: Remote Procedure Call

The C# code has both a 'Main' entry point that is mentioned on the KiwiC command line and a secondary entry point, defined by the Kiwi.Remote attribute, that is compiled by KiwiC so as to be invoked by a separately-compiled hardware component. We can also use this style of interface when calling from software on a soft-core to hardware generated by KiwiC.

//
// Remote procedure call test (H/W port generation).
//
using System;

class test10
{
    static int limit = 10;
    static int jvar;

    [ Kiwi.Remote("client1-port", "parallel: four-phase") ]
    public static int bumper(int delta)
    {
        jvar += delta;
        return jvar;
    }

    public static void Main()
    {
        Console.WriteLine("Test 10 Limit=" + limit);
        for (jvar = 1; jvar <= limit; jvar += 2)
        {
            Console.Write(jvar + " ");
            Kiwi.Pause();
        }
        Console.WriteLine(" Test 10 finished.");
    }
}

A problem with the above code is a structural hazard that leads to a possible race when the variable jvar is potentially updated twice on the same clock cycle: once by the Main method and once by the remote invoker.
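To make the hazard concrete, here is a small cycle-level sketch of what happens when both writers fire on the same clock edge. It is an illustrative model only, with hypothetical names (LostUpdateModel, ClockEdge), and it assumes register-style semantics in which every right-hand side is evaluated from the old register value and only one of the competing writes survives. Which write survives is an assumption of the model; here it is simply the later one.

```csharp
using System;
using System.Collections.Generic;

// Cycle-level model of two updates to one register on the same clock edge.
// Hypothetical names; the "last write wins" rule is an assumption of this model.
public class LostUpdateModel
{
    // Returns the register value after one clock edge on which both writers fire.
    public static int ClockEdge(int jvar, int mainStep, int delta)
    {
        var pending = new List<int>();
        pending.Add(jvar + mainStep);  // Main loop:   jvar += 2
        pending.Add(jvar + delta);     // bumper call: jvar += delta

        // Both right-hand sides were computed from the old value,
        // so whichever write is kept, the other update is lost.
        int next = jvar;
        foreach (int v in pending) next = v;
        return next;
    }

    public static void Main()
    {
        int after = ClockEdge(3, 2, 10);
        Console.WriteLine(after);       // prints 13
        Console.WriteLine(3 + 2 + 10);  // prints 15, the value if both updates applied
    }
}
```

Under these assumptions the loop increment of 2 disappears entirely, which is the kind of silent lost update that the guard around the shared variable is meant to prevent.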

KiwiC detects the race, gives the following cryptic warning, and chooses an arbitrary resolution:

VNL.v_1 incompatible assigns: consistency check
  lhs=pcnet105  g=true
  v=1:pcIS1:AG
 v'=4:pcIS4:AG

Output code

KiwiC will generate hardware both for the client and the server as separate RTL files. In more-realistic examples, there will be multiple files, with one being the top-level that contains client calls to some of the others which in turn make client calls to others, with the leaf modules in the design hierarchy being servers only.

One can also envision leaf modules in the design hierarchy making upcalls to parents, but this is not currently implemented in Kiwi.

For the above example, the generated server RTL looks as follows:

// Verilog output file generated at Thu Sep 18 23:43:12 BST 2008
// KiwiC: HPR Orange IL/.net front end: Version alpha 21: 16-Sep-08
// -root test10 -vnl VNL.v -preserve-sequencer 1 
module VNL(reset, clk, bumper_delta, bumper_ack, bumper_req, bumper_retval);
  input reset;
  input clk;
  input bumper_delta;
  output bumper_ack;
  input bumper_req;
  output bumper_retval;
  reg [1:0] testtest10pc;
  reg [2:0] testtest11pc;
  reg bumper_ack;
  integer test10_jvar;
  integer test10_limit;
   always @(posedge clk) begin //Start HPR test10_test13
          if (reset) testtest10pc <= 0;
           else 
          case (testtest10pc)
          
          0:  begin testtest10pc <= 1;
                 test10_jvar <= 1;
                 $display("%s%d", "Test 10 Limit=", 10);
                 
                 $write("%d%s", 1, " ");
                 
                 
                  end 
          1:  begin if (9<=test10_jvar) testtest10pc <= 2;
                 if (test10_jvar<9) test10_jvar <= test10_jvar+2;
                 if (9<=test10_jvar) 
                   begin test10_jvar <= test10_jvar+2;
                      $display(" Test 10 finished.");
                      
                      
                       end if (test10_jvar<9) $write("%d%s", test10_jvar+2, " ");
                 
                 
                  end 
          2:  begin 
                  end endcase
          //End HPR test10_test13


          //Start HPR test10_test12
          if (reset) testtest11pc <= 0;
           else 
          case (testtest11pc)
          
          0: testtest11pc <= 1;
          
          1: if (!bumper_req) testtest11pc <= 2;
          
          2:  begin testtest11pc <= 3;
                 bumper_ack <= 0;
                 
                  end 
          3: if (bumper_req) testtest11pc <= 4;
          
          4:  begin testtest11pc <= 5;
                 test10_jvar <= test10_jvar+bumper_delta;
                 
                  end 
          5:  begin 
                  end endcase
          //End HPR test10_test12
           end 
          // Start delx test10_test13
  // End delx test10_test13


  // Start delx test10_test12
  // End delx test10_test12


  endmodule

// eof (hprls verilog)

Fixing Races Using a Mutex

To overcome the race condition, each update to the shared variable must be guarded.

We can add explicit mutex variables to the code or use the C# lock primitive.

Here is one of the access sites augmented with explicit synchronisation primitives:

    [ Kiwi.Remote("client1-port", "parallel: four-phase") ]
    public static int bumper(int delta)
    {
        while(!mutex1) hpr_testandset(ref mutex1, 1);
        jvar += delta;
        hpr_testandset(ref mutex1, 0);
        return jvar;
    }

Alternatively, we can use C#'s lock keyword. A lock is often applied to the current object instance, but a static method has no instance, so we create a static dummy object just to lock on.

   static Object my_mutex = new Object();

   [ Kiwi.Remote("client1-port", "parallel: four-phase") ]
   public static int bumper(int delta)
   {
      lock(my_mutex) { jvar += delta; }
      return jvar;
   }
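The effect of the guard can be tried out in ordinary software, without KiwiC. The sketch below is a plain C# program with hypothetical names (LockDemo, Bumper, Run) and with the Kiwi.Remote attribute omitted; several threads play the part of concurrent callers. As a variation on the listing above, it returns the value from inside the lock, so the caller sees the result of its own update rather than a later one.

```csharp
using System;
using System.Threading;

// Software-only demonstration of guarding a shared variable with lock.
// Hypothetical names; the Kiwi attribute is deliberately left out.
public class LockDemo
{
    static readonly object my_mutex = new object();
    static int jvar;

    public static int Bumper(int delta)
    {
        lock (my_mutex)
        {
            jvar += delta;
            return jvar;   // read the result while still holding the lock
        }
    }

    // Runs threadCount callers, each bumping by 1 callsPerThread times.
    public static int Run(int threadCount, int callsPerThread)
    {
        lock (my_mutex) { jvar = 0; }
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < callsPerThread; i++) Bumper(1);
            });
            threads[t].Start();
        }
        foreach (var th in threads) th.Join();
        return jvar;
    }

    public static void Main()
    {
        Console.WriteLine(Run(4, 10000));  // prints 40000
    }
}
```

With the lock in place every increment is applied, so the final count equals the total number of calls.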

Applying the mutex to the for loop requires a little restructuring, since the loop is a most unpleasant example that probably has little real-world use:

   jvar=1;
   bool bb = true;
   while(bb)
    {
       lock(my_mutex) { bb = jvar<=limit; jvar+=2; } 
       if (bb) ...
    }

Conclusion

We have seen how to control KiwiC compilation manually, making calls between compilation units so that a resource can be shared instead of being expanded in line.

When multiple components are assembled with sharing of ports, whether it is the DRAM interface or multiple clients invoking one server, multiplexors and arbiters are required. KiwiC uses a standard set of protocols, so only a finite set of arbiters and multiplexors is needed. Another note will discuss their relative performance and the level of automation available for instantiating them.


(C) DJ Greaves and S Singh, April 2008.