The HPR library contains a number of output code generators. All of these write out a representation of an internal HPR machine. Not all forms of HPR machine can be written out in all output forms, but, where this is not possible, a synthesis engine should be available that can be applied to the internal HPR machine to convert it.
Certain output formats can encode both an RTL/hardware style and a software/threaded style. For instance, a C-like input file can be rendered out again in threaded C style, or as a list of non-blocking assignments using the SystemC library.
The following output formats may be created:
The net-based output architecture is suitable for direct implementation as a custom SoC (system on chip). H2 defines its own microcontroller and we use the term MPU to denote an H2 microcontroller with an associated firmware ROM. The net-based architecture consists of RTL logic and some number of MPUs. However, by requesting that all output is as C code for a single MPU, the net-based output degenerates to a single file of portable C code.
Additional output files include log files and synthesisable and high-level models of the UISA microprocessor that executes IMP machine code.
The tool expects the user to supply definitions of various leaf cells, alongside the output from the tool, as input to the RTL synthesis step. A number of suitable definitions are included in cvgates.v and cv_fpgates.v, and it is commonly sufficient just to include these two files in the RTL compilation.
The leaf cell names follow a few conventions:
For variable-latency leaf cells, the handshake protocol is as follows:
Note: The above is for on-chip devices instantiated directly by the tool. Off-chip RAM connections use a separate protocol (HSIMPLE, HFAST, AXI, BVCI).
The RTL backend uses built-in RTL operators for adders and subtractors. For multipliers, dividers and modulus with non-constant arguments it instantiates specific units, such as CV_INT_VL_DIVIDER_US. Very small multipliers are rendered with the RTL asterisk infix operator and left to the FPGA tools, as per the adders and subtractors.
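The selection rule just described can be sketched as a small decision function. This is only an illustration of the prose, not code from the HPR library: the function name, the 8-bit cut-off for a "very small" multiplier, and the treatment of constant arguments as infix operators are all assumptions.

```python
# Hypothetical cut-off for what counts as a "very small" multiplier.
SMALL_MULTIPLIER_BITS = 8

def render_choice(op, width_a, width_b, has_constant_arg=False):
    """Return "infix" for a built-in RTL operator or "instance" for a leaf cell."""
    if op in ("add", "sub"):
        # Adders and subtractors are always left to the RTL tools.
        return "infix"
    if op == "mul" and max(width_a, width_b) <= SMALL_MULTIPLIER_BITS:
        # Very small multipliers are rendered with the asterisk operator.
        return "infix"
    if op in ("mul", "div", "mod"):
        # Assumption: a constant argument avoids the need for a separate unit.
        return "infix" if has_constant_arg else "instance"
    raise ValueError(f"unknown operator: {op}")
```

For instance, `render_choice("div", 32, 32)` selects an instantiated unit, whereas `render_choice("add", 64, 64)` leaves the operator to the RTL tools.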
Kiwi generally calls out to variable-latency dividers and fixed-latency multipliers. It uses an estimate of the variable-latency computation time in its schedules. When using a fixed latency, it increases the latency requested for larger parameter widths. Whether a component is fixed or variable latency is indicated in the component kind name. Instantiated components cope with any argument width, as specified by parameter overrides.
Kiwi does not currently generate the fixed-point ALU implementations and it may request one that is not in the provided cvgates.v baseline library, in which case the poor user must provide their own implementation. For example, an extreme design might call for a 512 by 1024 fixed-latency multiplier with a latency of 5 clock cycles.
Recipe parameters alter the points at which the library enlarges the provisioned latency.
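As a rough illustration of how width-dependent latency provisioning might behave, the sketch below maps the wider operand to a requested latency through a table of breakpoints. The breakpoint values, the overflow latency and the function name are hypothetical; the real breakpoints are set by recipe parameters and are not reproduced here.

```python
from bisect import bisect_left

# Illustrative breakpoints only: (largest operand width in bits, latency in cycles).
DEFAULT_BREAKPOINTS = [(16, 1), (32, 2), (64, 3)]
# Hypothetical latency requested once every breakpoint is exceeded.
OVERFLOW_LATENCY = 5

def provisioned_latency(width_a, width_b,
                        breakpoints=DEFAULT_BREAKPOINTS,
                        overflow=OVERFLOW_LATENCY):
    """Pick the fixed latency to request for a two-operand unit."""
    widest = max(width_a, width_b)
    limits = [limit for limit, _ in breakpoints]
    # Index of the first breakpoint whose width limit covers the operand.
    i = bisect_left(limits, widest)
    if i == len(breakpoints):
        return overflow
    return breakpoints[i][1]
```

Passing a different `breakpoints` list plays the role of changing the recipe: the same design then requests deeper or shallower units without any change to the source program.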
Floating-point ALUs follow the pattern of fixed-point ALUs, except that add and subtract are also always instantiated ALUs and the RTL compiler is not expected to handle them. A different set of recipe parameters controls their structure (fixed/variable latency and expected/required latency).
Only 32-bit and 64-bit IEEE standard floating point is currently used by default. A future extension will provide for custom-width floating point, since this is a very powerful feature of HLS that can save a lot of energy and area. The extension will give the same behaviour on mono WD as on RTLSIM and FPGA.
A core set of floating-point ALUs is provided in cv_fpgates.v. These are soft macros that the RTL tools are expected to map to whatever is available in the target FPGA or ASIC library. Specific shims and bindings to assist with Altera and Xilinx are likely to be added to the distro in the near future.
No budget limit on the number of convertors is currently imposed.
The convertors normally required are:
    CV_FP_CVT_FL2_F32_I32 // Integer 32 to float 32 with fixed latency of 2
    CV_FP_CVT_FL2_F32_I64 // Integer 64 to float 32 with fixed latency of 2
    CV_FP_CVT_FL2_F64_I32 // Integer 32 to float 64 with fixed latency of 2
    CV_FP_CVT_FL2_F64_I64 // Integer 64 to float 64 with fixed latency of 2
    CV_FP_CVT_FL2_I32_F32 // Integer 32 from float 32 with fixed latency of 2
    CV_FP_CVT_FL2_I32_F64 // Integer 32 from float 64 with fixed latency of 2
    CV_FP_CVT_FL2_I64_F32 // Integer 64 from float 32 with fixed latency of 2
    CV_FP_CVT_FL2_I64_F64 // Integer 64 from float 64 with fixed latency of 2
    CV_FP_CVT_FL0_F32_F64 // Float 32 from float 64 (FL=0 implies combinational)
    CV_FP_CVT_FL0_F64_F32 // Float 64 from float 32 (FL=0 implies combinational)
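The convertor names appear to follow a regular pattern: an FL field giving the fixed latency, then the destination type, then the source type. The sketch below decodes a name on that basis. The pattern is inferred from the names listed here rather than taken from cv_fpgates.v, and the helper names are hypothetical.

```python
import re

# Pattern inferred from the listed names: CV_FP_CVT_FL<latency>_<dest>_<source>.
_NAME = re.compile(r"^CV_FP_CVT_FL(\d+)_([FI])(32|64)_([FI])(32|64)$")
_KIND = {"F": "float", "I": "integer"}

def decode_convertor(name):
    """Split a convertor cell name into (destination, source, latency)."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"not a recognised convertor name: {name}")
    latency, dest_kind, dest_width, src_kind, src_width = m.groups()
    dest = f"{_KIND[dest_kind]} {dest_width}"
    src = f"{_KIND[src_kind]} {src_width}"
    return (dest, src, int(latency))

def describe(name):
    """Produce a one-line description; a latency of 0 means combinational."""
    dest, src, latency = decode_convertor(name)
    timing = "combinational" if latency == 0 else f"fixed latency of {latency}"
    return f"{dest} from {src}, {timing}"
```

A helper of this kind can be useful when checking which convertor definitions a generated design needs before handing it to the RTL tools.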
A set of standard static RAM cells is provided in cvgates.v. These are parameterisable in width, length and number of lanes by overrides. Single-ported and dual-ported variants are provided, with latencies of 0, 1 and 2 clock cycles.
Kiwi and other tools built in the HPR library generate instances of these RAMs.
RTL tools are expected to map these to appropriate structures, such as LUT RAM and block RAM on FPGA.
RAM instances are also generated with no write ports and with static initialisations using Verilog initial statements. RTL tools will treat these as ROMs. Unlike RAMs, where the user is expected to manually couple a definition from cvgates.v or elsewhere to their RTL synthesis step input, ROMs are embedded in the main RTL output files from a run of the tool.
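To give a feel for the general shape of such an embedded ROM, the sketch below generates Verilog text for a read-only memory whose contents are set in an initial block. It does not reproduce the tool's actual output: the module name, port names and the single-cycle read latency are all assumptions made for illustration.

```python
def rom_verilog(name, width, contents):
    """Return Verilog text for a ROM: no write port, contents set in an initial block."""
    if not contents:
        raise ValueError("a ROM needs at least one entry")
    if any(v < 0 or v >= (1 << width) for v in contents):
        raise ValueError("entry does not fit in the word width")
    depth = len(contents)
    # Address bits needed to index every entry (at least one bit).
    addr_bits = max(1, (depth - 1).bit_length())
    lines = [
        # Hypothetical port list: clock, address in, registered data out.
        f"module {name}(input clk, input [{addr_bits - 1}:0] addr, "
        f"output reg [{width - 1}:0] rdata);",
        f"  reg [{width - 1}:0] mem [0:{depth - 1}];",
        "  initial begin",
    ]
    for i, value in enumerate(contents):
        lines.append(f"    mem[{i}] = {width}'h{value:x};")
    lines += [
        "  end",
        "  always @(posedge clk) rdata <= mem[addr];",
        "endmodule",
    ]
    return "\n".join(lines)
```

Calling `rom_verilog("EXAMPLE_ROM", 8, [0x10, 0x20, 0x30, 0x40])` yields a module with a two-bit address and four initialised entries. Because the module has no write port, RTL tools can infer a ROM from it.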