Orangepath/HPR/LS Project: Hardware and Embedded Software Synthesis from Executable Specifications.
H2 Front-end Language Description.

The Orangepath HPR/LS Framework.

Overview

HPR L/S (aka Orangepath) is a library and framework designed for synthesis and simulation of a broad class of computer systems, protocols and interfaces in hardware and software forms. The Orangepath library provides facilities for a number of experimental compilers.

The aim is to 'seamlessly' model both hardware and software in a form that suits easy co-synthesis and co-simulation. It also includes some temporal logic for assertions. Software can exist as both machine code/assembler and a high-level, block-structured, AST form.

VM2 and Plugin

The primary internal representation (IR) is a so-called HPR VM2 virtual machine.

The framework consists of a number of plugins that each input and output components represented in this IR. Hence plugins can be applied in any order. Hence, in type terms at least, all operations are `src-to-src'. But in practice, certain forms cannot be used in certain places: for instance a VM2 containing RTL code cannot be rendered directly as C++ (it would have to be passed through the bevelab plugin first).

A characteristic feature of Orangepath is that plugins can potentially, always be applied in any order and often have inverses. For instance a plugin that outputs RTL is reversed by a plugin that reads in RTL. A plugin that performs HLS from behavioural code to RTL would be reversed that by a plugin that gives a single-threaded imperative program from a large body of parallel RTL code.

A simulator plugin, called diosim, is able to simulate the IR in any form and, in particular, is able to simulate interactions between parts of the system defined in different styles. For instance it can simulate a pair of CPU cores communicating with each other where one is modelled in RTL and the other as a cycle-callable ISS. Asynchronous I/O and network hardware is also modellable with these primitives.

A so-called recipe, which is an XML file, invokes the plugins in a particular order, supplying parameters to them. The input and output of each recipe stage is a so-called HPR VM2 machine. Loops in the recipe can be user to repeat a step until a property holds. The opath core provides command line handling so that parameters from the recipe and the command line are combined and fed to the plugin components as they are invoked. The opath core also processes a few `early args' that must be at the start of the command line. These enable the recipe file to be specified and the logging level to be set.

The Orangepath library has plugins that support a variety of external input and output formats.

An HPR VM2 machine contains scalar and 1-D array declarations, imperative code sections and assertions. And a dataflow/transport-triggered IR form is being implemented at the moment.

Values are signed and unsigned integers of any width and floating point of any width is also supported in the framework but library components currently only work for IEEE 32 and 64 bit formats. Enumeration types are also supported, the most important being the boolean type. For all enumerations, an exclusion principle is applied, in that if an expression is known not to be any but one of the values of enumeration, then it must be that one value. Booleans are held differently from other enumerations internally but all expressions on enumerations are only stored in minimised form (using Espresso or otherwise). The library supports a great deal of constant folding and identity operation elimination (such as multiplying by zero or one). It has limited handling for strings and string constants, which are mostly treated in the same way that they are handled in Verilog, which is as an expression or register of width 8 times the string length in characters. (But the Kiwi front-end can map a fixed set of strings to an enumeration type of a suitable width with the strings stored only once and indexed by the enumeration.)

Expressions are held memoised, and in a normal form, as far as possible, that makes identity checking and common sub-expression reuse easier. This is especially useful to be able to rapdily confirm, as often as possible, index expression equality or inequality, to avoid name alias RaW/WaW dependencies on arrays and loop value forwarding for sequential access patterns.

The imperative code is in any mix of RTL and DIC forms. RTL contains register transfer assignments, partitioned into clock domains, where all assignments in a clock domain run in parallel on the active edge of the clock. There is also a combinational domain that has no clock. The DIC imperative form (directly indexed code) is an array of statements indexed by a program counter, where the main statements are: scalar assignment, 1-D array assignment, library call and conditional branch within the array. Code sections can be in series or parallel with each other, using CSP/Occam-like SER and PAR blocks. Assertions are coded in temporal logic and associated with a clock domain, just like PSL (property specification language).

Dynamic storage allocation is also being added.

Message-passing, CSP-like channels are another thing that should perhaps be added as a primitive form but currently these are constructed out of thread synchronisation primitves in a shared-variable paradigm.

Variable Declarations

Variables are signed and unsigned integers of various precisions, single and double precision floating point and 1-D arrays of such variables. A small amount of string handling is also provided. All variable are static (no dynamic storage) and must be unique in a single namespace that spans the system. The variables are declared inside a given VM and may be global or local. Global variables may be accessed by code and assertions in any VM and local ones should (not enforced) only be accessed in locally (or in son machines?).

Expressions

Expressions commonly use the hexp_t form and commands use the hbev_t form. Single-bit variables have hbexp_t form. A library of 'ix_xxx' primitives can be called as functions or procedures from hexp_t, hbexp_t and hbev_t respectively. Expressions are all stored in a memoising heap using weak pointers.

Imperative Code Forms

The executable code of a VM has several basic forms (dic, asm, rtl, cmd, fsm). All code and assertions access the variables for read and write (but assertions don't tend to write!) regardless of form.

DIC - Directly-indexed array:  Imperative program (assign/conditional branch/builtin call) stored in an array indexed by a PC.
ASM - Assembler for a local family of microprocessors
RTL - Register transfer-level code - a set of parallel assignments to be executed on an event.
CMD - Abstract syntax tree of a behavioural program (for/while/break/continue/assign/if etc) or single assigment statement.
FSM - Finite state machine form - like RTL but collated into disjoint sets

The executable code may be clocked or non-clocked. Fragments may be put in serial or parallel using the SP_par and SP_seq combinators. There are two variants of SP_par, for lockstep and asynchronous composition.

Dataflow/transport-triggered Code Forms

... TBD ...


Further executable forms, just being added are executable dataflow graphs:

  VSDG - a dataflow graph for a single basic block with additional state edges representing memory order constraints.
  VSFG - an executable form of the VSDG where back edges in the control flow graph are represented using nested graphs.

Plugins

Other descriptive text ...

Under Construction/Editing
A system is a tree of VMs where each may be the root of a tree of VMs.


The library is structured as a number of components that operate on a
VM to return another VM.  The opath (orangepath) mini-language enables
a 'recipe' to be run that invokes a sequence of library operations in
turn.  An opath recipe is held in an XML file.


Loops in the recipe can be used to repeat a step until a property holds.

In principle it is possible to load and save VMs to disk (serialised in XML) and so incremental compilation
at intermediate points in the opath recipe is a future option.

The opath core provides command line handling so that parameters from
the recipe and the command line are fed to the components. The opath
core also processes a few 'early args' that must be at the start
of the command line. These enable the recipe file to be specified and
the logging level to be set.

Source Files and Plugins

HPR L/S is written in FSharp.

Main/core files describing the VM:

  hprls_hdr.fs       - main expressions and commands
  abstract_hdr.fs    - main VM structures
  abstracte.fs       - Contains a compiler to convert block-structured forms (such as 'if' and 'while') into dic basic/fortran style code.
  meox.fs             - memoizing heap with simple algebraic tautologies eliminated.

Synthesis/Elaborate Plugins:

  compose.fs          - Combine separate FSMs together and reorganise ready for RTL generation.

Behavioural elaborators: convert threaded programs to a more formal FSM form (used to generate a 'pure RTL' parallel form but now that step is achieved by calling compose on just one machine ?)
  bevctrl.fs
  bevelab.fs

Cone of influence plugin - trims code that has not effect on observable outputs
  conerefine.fs

Plugin to overcome structural hazards (c.f. dpmap)
  restructure.fs -  * ALU and RAM instances are created and structural hazards resolved.
  systolic.fs ?

Gate-level synthesis plugin (normally not used - RTL output is fine normally)
  gbuild.fs
  genericgates.fs

Datapath mapper pluging (experimental)
  dpmap.fs


Measurement plugins

Static timing analyser
  stimer.fs

Simulator
  diosim.fs

Output plugins:

Performance predictor output renderer (currently boiled inside restructure.fs but needs splitting out)

Area predictor output renderer (currently boiled inside verilog_gen.fs but needs splitting out)



SystemC output plugin
   cpp_render.fs

Xml input and output:
  hprxml.fs

Plugin Generate assembler or machine code output for a microcontroller
  microcode_hdr.fs

Verilog output plugin
   verilog_gen.fs - this also bit-blasts to a gate-level net list if needed. It can also compute area estimates for circuits.
   verilog_hdr.fs
   verilog_render.fs

Misc libraries and framework infrastructure

Opath is main framework:
   opath.fs
   opath_hdr.fs
   opath_interface.fs
   linepoint_hdr.fs
   linprog.fs
   yout.fs  - Logging code
   moscow.fs - misc library (compatibility shim for moscow ML)
   tableprinter.fs
   dotreport.fs

There are a few other plugins coded still in Moscow ML that are not ported to the FSharp version, such as Verilog RTL input and model checker I/O ...

Semantics

Expression Widths

One of the critical aspects of semantics to get right relates to integer promotion. Verilog and C++ differ widely over their implicit rules.

Semantics of bit-widths in Verilog expressions are a little bit unusual. Expressions are either self-determining in width or context-determined.

Jonathan Bromley (Dulous) writes: All operands are widened to the same width as the widest - including the left-hand side. Then, do the arithmetic in that width. Then assign the result to the left-hand side, dropping more-significant bits if necessary. If ALL the right-hand side operands are signed, then widening is performed by sign extension rather than by zero-fill. If ANY right-hand side operand is unsigned, then ALL widening and arithmetic is unsigned. This behaviour is NOT affected by whether the left-hand side is signed or unsigned.

Self-determined expressions are those inside a cat (hence a+b will typically drop the carry inside a cat), on the rhs of shifts and the operands to comparison predicates and unary reductions where the context is a bool and would be meaningless.

A problem can arise with "UInt32 value = ((UInt32)key << 24);" where key is a byte. Unlike C/C++, Verilog does not have integer promotion and the rhs operand to a shift, being 32 bits wide does not make the expression compute in 32 bits. The left shifting a byte by 24 bits makes it zero in Verilog semantics.

A related problem is demonstrated with operators that can cause 'carries' such as addition, multiplication, subtraction and negate.

The HPR L/S operators are defined to work using integer promotion, as per C++. CSharp requires explicit casts in all places anyway, and these are manifest in the hexp_t syntax tree.

Memory Sequential Consistency

There is no central 'main' memory intrinsic to the Orangepath model. Instead, any VM2 can have any number of 1-D arrays of any size and word width. One of these is typically used to model each bank of system DRAM etc.. A 1-D array of any VM2 can be remotely read and written by any other VM and each VM itself can contain concurrent activities.

The simulation semantics implemented by diosim model are complete sequential consistency. This is an accurate model for real hardware at the lowest level too. Other sequential consistency models and fence mechanisms arise only from delay and buffering present in interconnection networks between a reference point (such as the front-side bus of a CPU core or other active bus master) and the physical memory component (DRAM chip or instantiated SRAM). These interconnection networks (caches, busses, NoC and write buffers) are modelled inside the VM2 by the user when appropriate. They are not built-in.

(A high-performance, multicore implementation of diosim may have issues providing high-performance sequential consistency in the future, but the current diosim implementations is single-threaded.)

HPR L/S Library Primitives

... TBD ... See the table at the top of

// Built-in primitive functions:
let hpr_native_table:(int * string * native_fun_signature_t) list =
  [
      //
    (0, "hpr_pause",   { g_default_native_fun_sig with  biosnumber=0; rv=g_void_prec; args=[g_default_prec]; yielding=true; nonref=true; eis=true } );
    (1, "hpr_testandset", { g_default_native_fun_sig with  biosnumber=1; rv=g_bool_prec; args=[g_bool_prec; g_bool_prec]; yielding=true; nonref=true; eis=true } );
    (2, fst g_hpr_abs_fgis, snd g_hpr_abs_fgis);
    (3, "hpr_sysexit", { g_default_native_fun_sig with  biosnumber=3; eis=true; yielding=true; nonref=true; rv=g_void_prec; args=[g_default_prec]});
    (4, "hpr_writeln", { g_default_native_fun_sig with  biosnumber=4; eis=true; needs_printf_formatting=true; nonref=true; rv=g_void_prec; args=[]});
    (6, "hpr_write",   { g_default_native_fun_sig with  biosnumber=6; eis=true; needs_printf_formatting=true; nonref=true; rv=g_void_prec; args=[]});
    (7, "hpr_concat",  { g_default_native_fun_sig with  biosnumber=7; eis=false; yielding=false;  nonref=false; rv=g_string_prec; args=[g_string_prec; g_string_prec]});
    // Note: some of these table entries are canned in meox too.
    (8, fst g_hpr_max_fgis, snd g_hpr_max_fgis);
    (9, fst g_hpr_min_fgis, snd g_hpr_min_fgis);
    (10, "hpr_select",  { g_default_native_fun_sig with  biosnumber=10; eis=false; yielding=false; nonref=true; rv=g_absmm_prec; args=[]});    

    (11, "hpr_KppMark", { g_default_native_fun_sig with  biosnumber=11; eis=true; yielding=false; nonref=true; rv={g_default_prec with widtho=None; }; args=[g_bool_prec]});
    (12, "hpr_alloc",   { g_default_native_fun_sig with  biosnumber=12; eis=true; yielding=false; nonref=true; rv={g_default_prec with widtho=None; }; args=[g_default_prec;g_default_prec]});

    (13, "hpr_pause_control",  { g_default_native_fun_sig with  biosnumber=13; eis=true; yielding=false; nonref=true; rv={g_default_prec with widtho=Some 32; }; args=[g_default_prec]});        
    (14, fst g_hpr_flt2dbl_fgis, snd g_hpr_flt2dbl_fgis);  // Convert between floating point precisions
    (15, fst g_hpr_dbl2flt_fgis, snd g_hpr_dbl2flt_fgis);

    (16, fst g_hpr_flt2int32_fgis, snd g_hpr_flt2int32_fgis);  // Convert integer forms to/from floating point forms.
    (17, fst g_hpr_flt2int64_fgis, snd g_hpr_flt2int64_fgis);
    (18, fst g_hpr_dbl2int32_fgis, snd g_hpr_dbl2int32_fgis);
    (19, fst g_hpr_dbl2int64_fgis, snd g_hpr_dbl2int64_fgis);

    (20, fst g_hpr_flt_from_int32_fgis, snd g_hpr_flt_from_int32_fgis); // Further convert integer forms to/from floating point forms.
    (21, fst g_hpr_flt_from_int64_fgis, snd g_hpr_flt_from_int64_fgis);
    (22, fst g_hpr_dbl_from_int32_fgis, snd g_hpr_dbl_from_int32_fgis);
    (23, fst g_hpr_dbl_from_int64_fgis, snd g_hpr_dbl_from_int64_fgis);
    (24, fst g_hpr_toChar_fgis, snd g_hpr_toChar_fgis);

    (25, fst g_hpr_bitsToDouble_fgis, snd g_hpr_bitsToDouble_fgis); // Get access to floating point as raw bits.
    (26, fst g_hpr_bitsToSingle_fgis, snd g_hpr_bitsToSingle_fgis);
    (27, fst g_hpr_doubleToBits_fgis, snd g_hpr_doubleToBits_fgis);
    (28, fst g_hpr_singleToBits_fgis, snd g_hpr_singleToBits_fgis);
    
  ]

Links

  • Kiwi Scientific Acceleration (FPGA High-Level Synthesis from dotnet/CIL ByteCode): Kiwi and Kiwic.

  • Bluespec compiler:Toy BlueSpec Compiler

  • Experimental specification language (old) H2Spec.

  • UIA Processor Use and Synthesis.