Kiwi Developers' Guide and Compiler Internal Operation

KiwiC Internal Operation

Figure 3: The main components of the default KiwiC flow using the default recipe (KiwiC00.rcp) in the KiwiC tool.

Figure 4: The internal flow of the KiwiC front-end recipe stage.

KiwiC is a compiler for the Kiwi project. It aims to produce an RTL design out of a named sub-program of a C# program.

KiwiC does not currently invoke the C# compiler: instead it reads a CIL portable assembly language file (.exe or .dll) generated by a Microsoft or Mono C# compiler.

Figure 3 shows key components of the main flow through the tool as set up with the provided recipe file (KiwiC00.rcp). The full recipe contains ten or so stages and the obj folder created by running the tool contains the log files and intermediate forms for each stage. Other output flows and formats can be deployed by changing the recipe. The dotted line shows that using the simvnl command line option the internal simulator (Diosim) can be applied to the RTL after it has been round-tripped through Verilog. For debugging, Diosim can be applied to any HPR machine intermediate form, by varying the recipe. (There's also a shortcut `-conerefine=disable -repack=disable -verilogen=disable' that will cause diosim to run the original VM generated by the KiwiC front end without conversion to hardware). This is needed for the profile-directed feedback.

The .NET executable bytecode is read using the Mono.Cecil front end. Any needed libraries, including Kiwi.dll and Kiwic.dll are also read in. These are combined with some canned (hardwired in the front end) system libraries. The result is a large CIL abstract syntax tree. This can be output for tracing/debugging if desired (using the kiwic-cil-dump flag).

The KiwiC front end (IL Elaborate stage) converts the .net AST to the internal representation used by the core HPR/LS library. This is the HPR VM2 machine.

The VM code emitted by KiwiC front end is a set of parallel `DIC' blocks. These are `directly indexed code' arrays of imperative commands and there is one for each user thread. They are placed in parallel using the PAR construct. Each DIC array is indexed by a program counter for that thread. There is no stack or dynamic storage allocation. The statements are: assign, conditional branch, exit and calls to certain built-in functions, including hpr_testandset, hpr_printf and hpr_barrier. The expressions occurring in branch conditions, r.h.s. of assignment and function call arguments still use all of the arithmetic and logic operators found in the IL input form. In addition, limited string handling, including a string concat function are handled, so that console output from the CIL input is preserved as console output in the generated forms (eg. $display in Verilog RTL).

Memory disambiguation and partitioning into statically-sized memories and DRAM is done by the repack receipe stage (§29) . The KiwiC front end has labelled every storage operation with a storage class. Repack conglomorates classes that are assigned between and then uses arithmetic pointer analysis rules for alias analysis. Its input is an HPR VM where every variable and array location has a virtual address (hidx) in a so-called wondarray. A wondarray is allocated for every dotnet datatype (except structs). The wondarray contains $2^64$ words of that datatype but only the words on integer multiples of the datatype's size in bytes are used. The output from repack has had all of these mapped to scalars or to smaller 1-D arrays and each is branded with an identifier. Some input variables to repack have been allocated a reserved `unadressable' hidx which means they are scalar and do not have their address taken. These go through repack without modification and appear as identical scalars in the repack output. In Kiwi use, these correspond to static variables.

The conerefine recipe stage deletes unused parts of the design. A part of the design is unused if it generates no output. Outputs include PLI calls like Console.WriteLine or net-level outputs flagged with Kiwi.OutputWordPort or similar. Object and array handles that are not manipulated actively by the program are removed.

The conversion from imperative code to FSM is performed, normally, by bevelab, described in §24. This allocates work to clock cycles based on the Kiwi.Pause() statements manually embedded by the designer or automatically inserted by the KiwiC front end. The bevelab output is an HPR machine where every statement from every thread nominally operates in parallel -- i.e. pure RTL. However, some PC-like annotations are retained for easily projection (and re-encoding) in FSM form. FSM re-encoding for thread's controller will later typically be done by the FPGA tools to simplify the controller output decode function.

The restructure recipe stage §30) binds and schedules operations and storage to physical resources. Storage decisions are made as to which vectors and scalars to place in what type of component (flip-flops, unregistered SRAM, registered SRAM, DP SRAM or off-chip in DRAM) and which structural instance thereof to use. ALU's and other primitives are also instantiated and bindings of program operations are made. Owing to the FSM annotations preserved by bevelab, the binder can easily determine which RTL statements are disjoint. Each state in the input FSM potentially becomes multiple, so-called, microstates in the output as structural hazards on memory ports are avoided and pipelined ALU operations are composed. Allocation decisions are based on heuristic rules parametrised by command-line flags and recipe file values, such as the number of floating-point multipliers per thread.

The output forms available include Verilog RTL, which we have used for FPGA layout. The stylised output from the FSM generation stage is readily converted to a list of Verilog non-blocking assignments.

Background: HPR/LS Library (aka Orangepath)

HPR L/S (aka Orangepath) is a library and framework designed for synthesis and simulation of a broad class of computer systems, protocols and interfaces in hardware and software forms. The Orangepath library provides facilities for a number of experimental compilers.

The primary internal representation (IR) is a so-called HPR VM2 virtual machine. The framework consists of a number of plugins that each input and output components represented in this IR. Hence plugins can be applied in any order. Hence, in type terms at least, all operations are `src-to-src'. But in practice, certain forms cannot be used in certain places: for instance a VM2 containing RTL code cannot be rendered directly as C++ (it would have to be passed through the bevelab plugin first).

HPR virtual machines and the operations are stored in a standard opath command format to be executed by an Orangepath recipe (program of commands).

A characteristic feature of Orangepath is that plugins can potentially, always be applied in any order and often have inverses. For instance a plugin that outputs RTL is reveresed by a plugin that reads in RTL. A plugin that performs HLS from behavioural code to RTL would be reversed that by a plugin that gives a single-threaded imperative program from a large body of parallel RTL code.

A simulator plugin, called diosim, is able to simulate the IR in any form and, in particular, is able to simulate interactions between parts of the system defined in different styles. For instance it can simulate a pair of CPU cores communicating with each other where one is modelled in RTL and the other as a cycle-callable ISS. Asynchronous I/O and network hardware is also modellable with these primitives.

A so-called recipe, which is an XML file, invokes the plugins in a particular order, supplying parameters to them. The input and output of each recipe stage is a so-called HPR VM2 machine. Loops in the recipe can be user to repeat a step until a property holds. The opath core provides command line handling so that parameters from the recipe and the command line are combined and fed to the plugin components as they are invoked. The opath core also processes a few `early args' that must be at the start of the command line. These enable the recipe file to be specified and the logging level to be set.

The Orangepath library has plugins that support a variety of external input and output formats.

An HPR VM2 machine contains scalar and 1-D array declarations, imperative code sections and assertions.

Values are signed and unsigned integers of any width and floating point of any width is also supported in the framework but library components currently only work for IEEE 32 and 64 bit formats. Enumeration types are also supported, the most important being the boolean type. For all enumerations, an exclusion principle is applied, in that if an expression is known not to be any but one of the values of enumeration, then it must be that one value. Booleans are held differently from other enumerations internally but all expressions on enumerations are only stored in minimised form (using Espresso or otherwise). The library supports a great deal of constant folding and identity operation elimination (such as multiplying by zero or one). It has limited handling for strings and string constants, which are either treated in the same way that they are handled in Verilog, which is as an expression or register of width 8 times the string length in characters, or as a special string handle type (where widtho=-1). But the Kiwi front-end and the repack stage can map a fixed set of strings to an enumeration type of a suitable width with the strings stored only once and indexed by the enumeration.

Expressions are held memoised, and in a normal form, as far as possible, that makes identity checking and common sub-expression reuse easier. This is especially useful to be able to rapdily confirm, as often as possible, index expression equality or inequality, to avoid name alias RaW/WaW dependencies on arrays and loop value forwarding for sequential access patterns.

The imperative code is in any mix of RTL and DIC forms. RTL contains register transfer assignments, partitioned into clock domains, where all assignments in a clock domain run in parallel on the active edge of the clock. There is also a combinational domain that has no clock. The DIC imperative form (directly indexed code) is an array of statements indexed by a program counter, where the main statements are: scalar assignment, 1-D array assignment, library call and conditional branch within the array. Code sections can be in series or parallel with each other, using CSP/Occam-like SER and PAR blocks. Assertions are coded in temporal logic and associated with a clock domain, just like PSL (property specification language). And a dataflow/transport-triggered IR form is being implemented at the moment.

Dynamic storage allocation is also being added.

HPR L/S (aka Orangepath) represents a system as an hierarchy of abstract machines. Each machine is a collection of declarations, executable code and assertions/goals. The goals are assertions about the system behaviour, input directly, or generated from compilation of temporal logic and data conservation rules into automata. Executable code can pass through the system unchanged, but any undriven internal nodes are provided with driver code that ensures the system meets its goals.

For Kiwi use, the opath default recipe file is KiwiC00.rcp.

In this manual, we concentrate almost entirely on the .NET CIL input format and the Verilog RTL output format.

Figure 5: The main flow implemented in the KiwiC tool (same as figure [*]).

Internal Working of the KiwiC front end recipe stage

The IL Elaborate stage is implemented by the the FSharp files kiwipro/kiwic/src/*.fs. It reads in CIL code and writes out HPR `dic' form code. Internally it converts from CIL to, so-called, kcode, before generating HPR code. The kcode can be rendered to a file for debugging/inspection using the kiwic-kcode-dump flag. The dotnet VM is a stack machine and the dotnet code is stack code. The stack is removed during the conversion to kcode. Kcode is neither stack or register code: all data is instead stored in wondarrays or global static variables.

CIL code is the assembly language used by the mono and .NET projects. Like other assembly languages, it has an assembler and disassembler for converting between binary and human-readable forms. KiwiC originally read the assembly using a bison parser but now reads the binary using the mono.cecil libraries.

Front end flow steps are:

  1. Perform first pass of each invoked method body in isolation.

  2. Perform a symbolic execution of each thread at the CIL basic block level and emit kcode for each block. CIL branch instructions and CIL label names that are branch destinations define the basic block boundaries. This inlines all dotnet method applications.

  3. Optimise the kcode within each thread using constant folding.

  4. Analyse kcode to find the end of static elaboration point in each thread's lasso structure.

  5. Perform register allocation (colouring) for the run-time part of each lasso.

  6. Prefix start-up code from static class and method constructors to the lasso stem of the main thread.

  7. Perform symbolic evaluation of the kcode and emit HPR code. Further thread starts may be detected, which causes recursive activation of most of the steps above. Each thread becomes a separate HPR dic.

  8. Perform dataflow analysis of the kcode to establish and conglomorate label region names (storeclasses) and points-at relationships.

The front end peforms a first pass of every method body that will be needed. This finds the basic block boundaries and the dotnet stack depth at every branch or jump. It gives a symbolic name to every code site where a type is needed. It symbolically executes the code using types without data and ignoring the control flow. Basic blocks that commence or resume with values on the dotnet stack are modified to avoid this situation by defining additional local variables, known as spills, and byprefixing with loads and postfixing with stores. These spill variables are frequently optimised away within the front end, but if they hold data over a Kiwi.Pause() they may appear in the output RTL. All return statements within a method are replaced with a branch to the end of the method. This sets up all the ground work for removing the dot net stack, on the fly, each time the method is called.

A -root command line flag or HardwareEntryPoint attribute enables the user to select a number of methods or classes for compilation. The argument is a list of heriarchic names, separated by semicolons. Other items present in the CIL input code are ignored, unless called from the root items.

Where a class is selected as the root, its contents are converted to an RTL module with IO terminals consisting of various resets and clocks that are marked up in the CIL with custom attributes (see later, to be written). The constructors of the class are interpreted at compile time and all assignments made by these constructors are interpreted as initial values for the RTL variables. Where the values are not further changed at run time, the variables turn into compile-time constants and disappear from the object code.

Where a class is selected as a root, all of the methods in that class will be compiled as separate entry points and it is not normally appropriate for one to call another: calls should generally be to methods of other classes.

Where a method is given as a root component, its parameters are added to the formal parameter list of the RTL module created. Where the method code has a preamble before entering an infinite loop, the actions of the preamble are treated in the same way as constructors of a class, viz. interpreted at compile-time to give initial or reset values to variables. Where a method exits and returns a non-void value, an extra parameter is added to the RTL module formal parameter list.

The VM code can be processed by the HPR tool in many ways, but of interest here is the 'convert_to_rtl' operation that is activated by the '-vnl' command line option. (NB: This is now on by default in the KiwiC00 recipe, disable with -verilog-gen=disable).

   KiwiC TimesTable.exe -root `TimesTable;TimesTable.Main' -vnl TimesTable.v

More than one portable assembly (CIL/PE) file can be given on the command line and KiwiC will aggregate them. The file name of the last file listed will be used to name the compilation outputs by default (in the absence of other command line flags).

(At some point, KiwiC might be extended to also invoke the C# compiler if given a C# file.)

David Greaves 2016-12-05