HPR L/S (aka Orangepath) Facilities

HPR L/S (aka Orangepath) is a library and framework designed for synthesis and simulations of a broad class of computer systems, protocols and interfaces in hardware and software forms.

The HPR L/S library provides facilities for a number of experimental compilers. This part of the manual describes the core features, not all of which will be used in every flow.

FILES and DIRECTORIES

Recipe

Orangepath tools read a recipe file which instructs the sequence of steps used: compilation, analysis, optimisation, output generation and so on.

As well as specific output files generated by components of the recipe (such as RTL or Systemc), a log file is created for each recipe stage and an overall report file is made that any recipe stage can write to.

When an Orangepath tool is run, it creates a directory for temporary and report files. This is the obj directory. The name can be changed from the default ('obj' in the current directory) with the early arg -log-dir-name.

A similar early arg sets the name of the default directory for the specific outputs generated by a recipe stage.

The obj directory contains a sub-directory for each recipe stage. This is the stage report directory.

The output code from a recipe stage can be printed to a report file in its stage report directory if the -report-each-step flag is given.

the -cfg-plot-each-step flag

the -bevelab-cfg-dotreport=enable flag

The .plt files are plot files that can be viewed using diogif, either on an X display or converted to .gif files.

Environment Variables and IncDir Search Paths

Tools must load various files from the filesystem and must know where to look.

Environment variables can provide places to look.

An HPR L/S tool itself will expect to have all of its dlls on the system search path or else in the folder accessed by ../lib from where its binary file (such as kiwic.exe) is stored.

A user can specify additional folders to search for loadable files, such as previous outputs from incremental compilation steps and standard IP blocks. These are defined by the incdir path. The HPRLS_IP_INCDIR environment variable and the -ip-incdir command line or recipe flag can be set to a string that contains a colon-separated (semicolon on Windows) list of search folders. This is the incdir path. Most earlier outputs are described in IP-XACT and it is these metafiles that need to be found in this way, with the actual IP being held in a file named in the IP-XACT xml `files' section. Where those filenames are non-absolute, they will be looked up in the incdir path.

The HPRLS environment variable may be used to specify another search path for core parts of the system, but this would needs better documentation ...

Espresso

The traditional unix espresso tool is not needed for Fsharp implementation of HPR L/S since this has its own internal implementation.

The Moscow ML implementation of the Orangepath tool required Espresso to be installed in /usr/local or else the ESPRESSO environment variable to point to the binary. If set to the ASCII string NULL then the optimiser is not used.

The -no-espresso flag can also be used to disable call outs to this optimiser. Internal code may be used instead.

Cone Refine

The cone refine optimiser deletes parts of the design that have no observable output. It can be disabled using the flag -cone-refine=disable.

It may also be programmed to retain other named features of interest.

HPR Command Line Flags

The very first args to an HPR/Orangepath tool are the early args that enable the receipe file to be selected and the logging level and location to be set.

The first argument to an HPR/Orangepath tool, such as h2comp or KiwiC, is a source file name. Everything else that follows is an option. Options are now described in turn.

The HPR/LS logger makes an object directory and writes log files to it.

Flag -verboselevel=n turns on diversion of log file content to be mirrored on the standard output. 0 is the default and 10 makes everything also come out on the console. Console writes are flushed after each line and this is also a means of viewing the final part of a log that has not been flushed owing to stdio buffering.

Flag -verbose turns on a level of console reporting. Certain lines that are written to the obj/log files appear also on the console.

Flag -verbose2 turns on a further level of console reporting. Certain lines that are written to the obj/log files appear also on the console.

Flag -recipe fn.xml sets the file name for the recipe that will be followed.

Flag -loglevel n sets the logging level with 100 being the maximum n that results in the most output.

Flag -give-backtrace prevents interceptions of HPR backtraces and will therefore give a less processed, raw error output from mono.

The developer mode flag, -devx, enables internal messages from the toolchain that are for the benefit of developers of the tool. Setting the environment variable HPRLS_DEVX=1 performs the same action.

NOTE: Many of the command line flags listed here have a different command line syntax using the FSharp version of KiwiC. This manual is still being updated. To get their effect one must currently either make manual edits to the recipe xml file (e.g. kiwici00.rcp) or else simply list then on the command line using the form -flagname value

If the special name -GLOBALS is specified as a root, then the outermost scope of the assembly, covering items such as the globals found in the C language, is scanned for variable declarations.

Flag -preserve-sequencer structures output code with an explicit case or switch statement for each finite-state machine.

Synthcontrol -bevelab-repack-pc=disable creates sequencer encodings where the PC ranges directly over the h2 line numbers: easier for cross-referencing when debugging. Otherwise it defaults to a packed binary or unary coding depending on -bevelab-onehot-pc.

Option -array-scalarise all converts all arrays to register files. Other forms allows names to be specifically listed. See § .

 -vnl-resets=none
 -vnl-resets=synchronous
 -vnl-resets=asynchronous

or change this XML line in the file /distro/lib/recipes/KiwiC00.rcp

 <defaultsetting> resets none </defaultsetting>

When doing RTL simulation of the KiwiC-generated RTL output, one can sometimes encounter a `lock up' where the design makes no further progress. Tracing the `pc' variable in the output code will reveal it is stuck when trying to make a conditional branch whose predicate evaluates to dont-care owing to un-initialised registers or disconnected inputs.

HPR (KiwiC) (by default) does not generate initialisation code to set static variables to their default values (zero for integers and floats and false for booleans). The same goes for RAM contents.

For RAM contents, with KiwiC, the user code must contain an explicit clear operation in a C# loop.

To overcome the problem with uninitialised registers, we can potentially use -vnl-resets=synchronous or -vnl-resets=asynchronous. This will make the RTL simulate properly and overcomes most lockup problems. But we get additional wiring in the output that can repeat the FPGA's own hardwired or global reset mechanisms.

Clearly the design can be synthesised separately with and without resets. But to avoid the duplication of effort, hence with a common RTL file (one synthesis run only), one must take one of the following five routes, where the first two use a KiwiC compile with the default -vnl-resets=none.

use an RTL simulator option that has an option where all registers start as zero instead of X,
add a set of additional initial statements to the generated RTL that are ignored for FPGA synthesis (HPR vnl could generate these automatically but does not at the moment),
request a reset input to the generated sub-system (using -vnl-resets=synchronous) but tie this off to the inactive state at the FPGA instantiation of that subsystem and expect the FPGA tools to strip it out as redundant logic so that it does not consume FPGA resource.
trust the FPGA tools to detect a synchronous reset net as such (by boolean dividing FPGA D-input expressions by it) and map it to the FPGA hardwired reset mechanisms so that it does not consume FPGA resource.
use -vnl-resets=asynchronous and trust the FPGA tools to map this to the hardware global reset net.

Note, the vnl output stage always generates subsystems with a reset input but this is (mostly) ignored under the default option of -vnl-resets=none.

See § .

 "-subexps=off"

The subexps flag turns off sub-expression commoning-up in the backend.

 -vnl-rootmodname name

Use the -vnl-rootmodname flag to set the output module name in Verilog RTL output files.

 -vnl-roundtrip name= [ enable | disable ]

Converts generated Verilog back to internal VM form for further processing.

When enabled, generated RTL will be converted back again before (for example) being simulated with diosim. When disabled, the input to the verilog generate (vnl) recipe stage will be passed on unchanged and a typical recipe will then simulate that directly.

 "-ifshare=on"
 "-ifshare=none"
 "-ifshare=simple"

The default ifshare operation is that guards are tally counted and the most frequently used guard expressions are placed outermost in a nested tree of if statements.

The ifshare flag turns off if-block generation in output code. If set to 'none' then ever statement has its own 'if' statement around it. If it is set to 'simple' then minimal processing is performed. The default setting is 'on'.

 "-dpath=on"
 "-dpath=none"
 "-dpath=simple"

When dpath=on, with the preserve sequencer options for a thread, a separate 'datapath' engine is split out per threads and shared over all data operations by that thread.

Synthcontrol cone-refine-keep=a,b,c accepts a comma-separated list of identifiers names as an argument and instructs the cone-refine optimiser/trimmer to retain logic that supports those nets.

-xtor mode specifies the generation of TLM transactors and bus monitors. The mode may be initiator, target or monitor.

-render-root rootname specifies the root facet for output from the the current run. If not specified, the root facet is used. This has effect for interface synthesis where the root module is not actually what is wanted as the output from the current run.

-ubudget n specifies a budget number of basic blocks to loop unwind when generating RTL style outputs.

The -finish={true false} flag controls what happens when the main thread exits. Supplying this flag causes generated output code to exit to the simulation environment rather than hanging forever. When running under a simulator such as Modelsim, or when generating SystemC, it is helpful to exit the simulation but certain design compiler and FPGA tools will not accept input code that finishes since there is no gate-level equivalent (no self-destruct gate).

Other output formats

The -smv flag causes the tool to generate a nuSMV output file.

The -ucode flag causes generation of UIA microprocessor code for the design.

-vnl fn.v specifies to generate a Verilog model and write it to file fn.v.

-gatelib NAME requests that the Verilog output is in gate netlist format instead of RTL. The identifier NAME specifies the cell library and is currently ignored: a default CAMHDL cell library is used.

-gatelib NAME requests that the Verilog output is in gate netlist format. This takes precedence over -vnl that causes RTL output.

General Command Line Flags

The -version flag give tool version and help string.

The -help flag give tool version and help string.

HPR L/S (aka Orangepath) FAQ

Q. I get the error

Error: Could not load file or assembly 'FSharp.Core, Version=4.4.0.0

A. This is not related to any missing files in the Kiwi distro. Instead it is do with FSharp version incompatibilities. The FSharp.Core is part of the FSharp system. If you are using pre-built dll files then the version of mono or FSharp on your system may be incompatible with the pre-built dll files and you would have to change version or else regenerate the dll files by compiling the FSharp source code with the 'fsharpc' compiler on your system.

You may wish to just compile a trivial 'Hello World' FSharp program on your system to check that FSharp is all set up ok.

HPR System Integrator

Q. I cannot see how to start using System Integrator?

A. SystemIntegrator is a standalone program, written in FSharp and using the HPR library. It is in its own folder (and the binary is hprls/system-integrator/distro/lib/sysint.exe). Examples are currently missing.

**Figure 10:** Example of multi-compilation structural assembly with internal and external instances.

The HPR System Integrator compiler/generator takes a set of HPR VMs and generates SP_RTL constructs to wire up their ports following the VM instantiation pattern or an input IP-XACT document. It will instantiate protocol adaptors and glue logic based on pre-defined rules.

The resulting system can then be emitted without the actual instances using other recipe stages, such as SystemC, RTL or IP-XACT. These output files will typically be combined with the instantiated components in external tools, such as FPGA logic synthesis.

The resulting system can also be passed on to the Diosim simulator for execution within Orangepath, for auditing tools to run, or for any other purpose.

Figure 10 illustrates a typical structural set-up arising from multiple compilation units assembled on a single FPGA. In detail, the figure shows a top-level application (primary IP block) that instantiates a separately-compiled child component that, in turn, instantiates three grand children of two different types. The children and grand children are subsidiary IP blocks. They do not do anything unless commanded by a primary IP block. Each compilation unit connects to its child by an arg/result port that is of a custom design for the current application. It is application-specific (A/S).

In addition, each child component requires access to RAM resources. In this particular example, the top-level module did not require RAM access (although it could well have its own BRAM privately instantiated).

Finally, every component has a directorate port for error reporting. The primary IP block also receives its run/stop control via this port.

The HPR System Integrator compiler takes a set of HPR VMs and generates an hierarchic netlist to wire up their ports using pre-defined rules that are based on the concept of domains of connection. It will instantiate as many protocol adaptors, bus switches and arbiters as is needed. The resulting structure is typically rendered as RTL. In the future it can invoke Greaves/Nam glue logic synthesis or other generators and then instantiate the glue in the netlist.

The resulting system can also be passed on to the Diosim simulator for execution within Orangepath, for auditing tools to run, or for any other purpose.

Its internal datastructure, prior to rendering the output, is in a form that can be output as IP-XACT spirit:design document.

A future facility to read in and obey IP-XACT spirit:design documents could easily be added, but there are plenty of third-party tools offering that service.

HPR System Integrator supports:

Creating inter-module wiring structures with tie-off of unused ports.
Working both at the TLM level and structural net list level.
Glue logic insertion in the form of instantiated adapators from the library are readily inserted automatically using rules based on interface type differences.
Allocation of AXI tag numbers.
Custom glue logic from the Greaves/Nam cross-product technique can also be rendered.
Outputs are rendered in Verilog, IP-XACT, SystemC TLM, SystemC behavioural and SystemC RTL-styles depending on the subsequent recipe stage the output is passed to.
Server farm mode supporting dynamic dispatch will be added during 2017.

A component instance can be internal or external. The distinction is more pronounced for RTL than SystemC. External instantiation is where the instance is inside the current (instantiating) module, in the style of a traditional hierarchic design. An external instance is instead formed outside the current module, resulting in additional bindings in the signature of the current module. External instantiation leads to a flatter design. Its principle advantage is where the instantiated component has a number of service ports whose bindings would instead need to be conveyed through the current instance signature.

**Figure 11:** System Integrator Tool: Basic Auxiliary Components.

The HPR System Integrator rule engine understands the following types of component:

Primary IP Block -- a top-level component of the design, such as a primary output from Kiwi HLS, that embodies an algorithm or processes and generates work for the all the other components.
Subsidiary IP Block -- an IP-block with slave ports that performs an operation. Examples are RAMs, ALUs and HLS outputs from earlier parts of an incremental compilation process.
External Port -- a connection to an externally-instantiated resource, such as a DRAM bank, inter-FPGA bridge or Ethernet port.
Aggregators -- for arbitrating and dispatching between initiators and demultiplexing based on addressed target
Concentrators -- for tagged multiplexing and demultiplexing over a shared channel.
Protocol Adaptor -- for converting between bus standards

Every block is accompanied with non-functional meta-info that gives an area, latency, throughput and energy cost using IP-XACT extensions.

Every external block port and port on a primary IP block must also be manually given a so-called domain name. The standing rules used by HPR System Integrator endeavour, for each domain, to wire everything together, thereby achieving conservation of data. There will generally be at least one domain name for each connection between separately-compiled modules in an incremental compilation. Also, there will be domains associated with each disjoint memory map/space and one for the debug/directing logic.

The system synthesis is guided by a goal function, which is a scalar metric that factors area, delay and energy according to a weights that the user can adjust as desired.

The automatic generation axioms are:

The number of primary IP blocks and external ports is set in the initial configuration, together with their instance names. Their plurality may not not be adjusted by HPR System Integrator.
The plurality of all other components may be freely adjusted by HPR System Integrator, but it may not replicate state-bearing components (unless they have mirror rules defined in the future).
Except for broadcast connections (that have no reverse direction signals), such as clocks, resets and status codes, all initiating ports must be connected to a matching target port with a one-to-one direct connection.
The IP-XACT max-masters and max-slaves attributes Ports are either multicast or one-to-one. They may have to be connected or may be left disconnected. All initiating ports must be connected to a matching target port with a one-to-one direct connection.
The resulting design should give a low value for the goal function.
This will tend to minimise the number of additionally instantiated components and typically causes them to be wired in tree-like structures to minimise latency.

Per domain metric functions and upper bounds

Algorithm: for each domain name, while there is an unconnected initiator, create a connection for it to a suitable serving resource. If the serving resource is an external port that is currently disconnected, a direct connection can be made. But if the external port is already bound, an additional bus switch will be instantiated or the arity of an existing one will be increased.

If the serving resource would be an instance of replicatable IP block, ...

If the serving resource would be an instance of mirrorable IP block, ...

Memory Map Management (Link Editing)

A shared memory resource that is serving a plurality of disjoint requirements needs memory management to statically or dynamically allocate disjoint memory to each component. This is essentially a link editing problem.

Kiwi solves this in two ways. For static allocation in each bank, HPR System Integrator reads in from IP-XACT how much static memory is required and supplies a base address as an RTL parameter to each instantiated component. This base address is promolgated into the core of the logic by constant propagation in the logic synthesiser (FPGA tool) that is applied to the KiwiC output.

For dynamic allocation, an allocator component, coded in C# must implement a free pointer or equivalent policy, be instantiated once, and serve out memory blocks. This will require unsafe C# in each client (or shim thereof) to cast the address to the required struct or object type. Only the alloc/dealloc requests need be sent to the shared component: the data read and write transfers themselves are transferred over a general the AXI switch fabric that can provide as much spatial diversity as is appropriate.

For genuinely shared pools there will inevitibly be a C# module that directs the requests for WD development and this must be separately compiled and connected to by multiple parent IP blocks.

For multiple address spaces it is convenient to add extra fantom bits ...

Deadlock and Combinational Paths

... TBD

Constructive Placement

**Figure 12:** System Integrator Tool: Inputs and Outputs and flow between the two stages.

The general flow for the tool is illustrated in Fig. 12. Its inputs are the name of a primary IP block for the top-level, a search path for lookup of the so-called subsidiary and auxiliary IP blocks, and a description of the target platform described in a file blade-manifest.xml. The tool operates in two stages. The first is a planner that makes floorplanning and memory layout decisions and instantiates subsidiary and auxiliary blocks as needed. The resulting high-level design is written out as an IP-XACT design report, a graphical plot and a human-readable report that tabulates utilisation metrics. The second stage compiles the design to a structural netlist. This writes out a master RTL file for each FPGA.

The blade manifest lists the number of FPGAs available on the platform, describing their size, interconnection pattern and hardened IP ports and capabilities. It is an XML file crafted by hand or using an XML editor.

The tool can potentially use any standard optimisation procedure to minimise its global cost metric. The current implementation uses a constructive placer that is run about 50 times using different pseudo-random seeds with the best solution and spread being reported. A critical consideration is whether any IP blocks themselves are good candidates for consequential re-synthesis. There are three reasons for re-synthesising a component:

General time/space fold: Standard HLS tools have considerable freedom to produce large and fast designs or smaller designs that require a greater number of clock cycles.
Degree of Port Mirroring: Where a subsidiary block can be mirrored, the parent needs to be synthesised with a determined number of master ports when these are connected one-to-one with the children. Moreover, the number of load, store and load/store stations on the component can also be manually controlled with our tool.
Move to variable-latency handshakes: Where a block instantiates a fixed-latency child connection, but then that connection has to be converted to variable-latency owing to inter-FPGA bridges (or perhaps being in a server farm in the future).

The System Integrator's main job is to generate a design that includes the primary IP block and all the support it needs. Starting from the primary IP block, it adds the subsidiary IP blocks referred to in its port list. These may have further application-specific ports (as shown in Figure 10) that in turn need to be supported. Hence it iterates at this stage. Using its constructive placer, it puts each block on a named FPGA where there is sufficient area remaining. Connections that span multiple dies have their necessary protocol adaptors instantiated straightaway. Where a bridge link is shared between bus connections, concentrators are added (addressing tags are later created in a global colouring step). Any placement attempt where any hard limit is breached is aborted without further study. Hard limits include any FPGA being full, as just mentioned, or a guaranteed throughput or latency (sequential or combinational) cannot be met.

As illustrated in Figure 10, there are three forms of bus connection understood by System Integrator:

A Primary Application-Specific Interface enables a component to invoke functions using a custom bus structure on a child component that has a reverse interface of the same type. In our HLS system, such bus specifications are emitted automatically as augmented IP-XACT bus abstraction documents. The same file is emitted when either side is compiled, with the second simply overwriting the first. When the boundary reflects a class definition in the high-level language, the file name and interface name are the same as the class name. Such a class can have any number of methods and each method will use some set of the busses (or `ports' as they are called in IP-XACT) making up the interface. This sort of connection is also used for connections to the standard libraries of maths functions.
A Service Interface provides access to main memory resources for the component. The component is free to instantiate its own RAMs where it wishes, such as FPGA block RAM, but larger regions need wiring to DRAM resources. These are either statically instantiated on the server blade or else accessed over AXI or PCIe on some platforms.
A Directing Interface provides start/stop control of the primary application and collects status and abnormal end codes from subsidiary blocks. It may also provide debug inspection.

A connection between two components is valid when all of the following conditions hold:

Kind Name: the protocol kinds have the same name. Differences in the other three IP-XACT naming attributes, vendor, version and library name, are warned about but otherwise ignored.
Connection Rule: A one-to-one connection must have two peers: one an initiator and the other a target. A multicast connection must have exactly one initiator.
Parameters Match: IP-XACT parameters are key/value pairs, and these must match apart from any that the user specifically annotates (on the command line) as allowed to mismatch. This ensures, for instance, that a 32-bit data bus is not connected to 64-bit data bus. To overcome simple mismatches of any complexity, one side needs to be manually renamed by the user and an additional protocol adapator added on the search path that encompasses the adaption, such as ignoring unused address bits. Automation of this is expected in the future.
Unified Domains: The connection domains must either already match under the current unification or else a fresh, non-contradictorary, unification is added for the remainder of the design construction.

The domain unifier operates over equivalence classes that contain at most one domain constant and any number of domain variables that are unified to that constant or just to each other when a domain constant is not present.

When a protocol adaptor is instantiated, it is given a fresh domain variable that is allocated to both ports.

Multi-FPGA designs

HPR System Integrator can allocate logic between FPGA chips.

**Figure:** Inter-FPGA bridge structure: typical setup. The SERDES instances, as described manually in the blade manifest, are utilised by the System Integrator's instantiation of protocol adaptors and concentrators as required.

As illustrated by the SERDES pair in Figure 13, inter-FPGA bridges are bi-directional and have four ports for binding by the System Integrator as it creates an inter-FPGA network. The two ends of each simplex channel have the same domain name, but the bandwidth and latency for the two channels can be described differently in the associated IP-XACT description. Each of the four bus interfaces is AXI streaming with a specified word width, giving the lossless FIFO paradigm. Each direction of the pair is kept matched by the System Integrator, as it adapts the hardware resource to its needs. The adaption steps are just the same as may be freely used elsewhere in the assembly: they are inserting a protocol adaptor pair on each side or inserting a concentrator pair consisting of a tagging mux and an inverse de-multiplexing component that processes and removes the tags. There is a set of standard protocol adaptors corresponding to all basic method signatures of up to 3 arguments with and without a result in our standard distribution. Others can be created by hand as needed and added to the library, or they can be macro-generated on demand in the future. Glue logic for these purposes can also be synthesised from a non-deadlocking, data-conserving product of protocol state machines by known techniques, such as [2].

Mux and Demux Blocks

**Figure 14:** Three forms of multiplexing/demultiplexing where the demultiplexors respectively uses tags, addresses and utilisation to make a routing decision.

Figure 14 shows three use cases involving tagged multiplexing and differing demultiplexing approaches. The arrows in the Figure indicate direction of initiation, but each underlying bus can normally carry data in either direction according to the whether read-style or write-style operations are currently being conveyed.

The left of the Figure shows straightforward concentration, where multiple logical channels are conveyed over a shared physical channel. A tagging multiplexor is matched with a detagging demultiplexor. The tags inserted at the top are removed at the bottom and are private to the configuration. This configuration provides perfect data conservation with respect to the mulitplexed channels from the point of entry at the top to the point of exit at the bottom.

The centre of the Figure shows shared access to a logical address space by a number of initiators where the address space is served by an aggregation of physical memory resources. The demultiplexor operates using address ranges. The multiplexor still inserts tags, but this time these are removed again also by the multiplexor. These tags are only examined by the multiplexor that created them: it removes them when the result is forwarded upwards to the originator. The tags are conveyed opaquely within all lower components. Two degenerate forms of the aggregation configuration arise: 1. when there is only a single client for an aggregated resource, the multiplexor is not needed; and 2. when the resource is monolithic the demultiplexor is not needed.

The right of the Figure illustrates the server farm configuration, that again uses a tagging multiplexor, but the demultiplexor operates on a load-balancing basis. The server farm is not currently natively supported by HPR System Integrator. Instead, the user must implement this paradigm by writing their own implementations of the multiplexor and demultiplexor. This is easy to do in C# for synthesis to RTL by KiwiC. If the C# is marked up for separate synthesis of the relevant components, the HPR System Integrator will then assemble the system, treating the farming blocks as subsidiary IP blocks to be assembled as normal. An example will be placed here ... TBD.

Non-uniform Memory Access (NUMA)

**Figure 15:** Illustration of non-uniform memory access designs: Left is current implementation style and right is preferred style.

It is desirable for traffic to take the shortest route between layout zones. The HPR System Integrator implements Warshall's algorithm to find available routes and to price design solutions that use them. However, Figure 15 shows, on the left, the typical structure that arises when static resources in two different zones are aggregated and then shared by clients where the clients (A, B, C and D) are also distributed over the zones.

The right-hand side of the figure illustrates a preferred design that is typically exploited in non-uniform memory architectures (NUMA). Although this has a little more logic, the average access latency for Zone 1 is improved.

The HPR System Integrator operates by first creating the required data paths as a rats' nest without regard to layout zone. As mentioned in §, it then inserts bridges and concentrators as it maps that network onto the layout zones. This leads to the left-hand style of design. To achieve the preferred design, greater smartness is needed: a spatially aware design is needed from the outset. We aim to address this in a subsequent release of the tool.

Warshall's algorithm is also applied to protocol adaptors in the library, to see what can be connected to what in principle and the best pattern of adaptors, giving each adaptor a unit cost at this time. // We must avoid building wandering chains that convert backwards and forwards between protocols, but as Warshall considers each protocol a node in a multi-hop journey, it will only instantiated at most one of each type of adaptor in a path.

Network On Chip (NoC)

It is interesting to examine whether HPR System Integrator can be said to be synthesising a Network-on-Chip (NoC).

Although there may be no absolute definition of what constitutes a NoC, the following defining principles can be identified:

Connectivity: Data can be sent, in principle, from ingress node to any exit node.
Sharing: Traffic for different purposes uses a common bus infrastructure with sharing in the time domain.
Route Diversity: Traffic may take various routes from a source to a destination determined by some static or run-time decision or policy.

The HPR System Integrator will make a custom mesh network as it instantiates concentrators to exploit shared inter-zone bridges. So it does sometimes generate a NoC using the `sharing' principle. Where the inter-zone bridges are arranged just as a physical ring, then the resulting network is a ring network (being a degenerate form of mesh). The ring is bi-directional or uni-directional, in terms of instantiation, according to the same property in the pattern of the available bridges. But each bridge is bi-directional in data terms, in that responses are carried in the reverse direction over the bridge that carried the request. Overall, there is currently no route diversity.

In the future, for large FPGAs, it is sensible perhaps to divide them into several layout zones, perhaps with fluid boundaries where area can be vired between zones. It will then be neccessary to instantiate inter-zone bridges in the blade manifest between these zones. Such bridges will be nothing more than point-to-point wiring, which will be totally reified by the back-end logic synthesis tool, so there is no run-time overhead. The advantage is that the pattern of concentrators will closer resemble a fine-grained NoC and the generated wiring will resemble ...

Bus Definitions

Bus definitions in IP-XACTare split over two files. The definition and the abstraction. ....

All IP-XACTfiles needed will be search for using the path specified with ip-incdir=. This is a colon or semicolon (on Windows) separated list of folders.

Sewing Kit for Miscellaneous Nets

Any hardware design will have a few extra nets (wires) that System Integrator does not need to understand. Since the tool emits the top-level design file on each run, the simplistic approach would be to reinsert such logic with a sed or perl script. This is a bit messy. Nonetheless, there are some SED_BIND_POINT tokens emitted as comments in the rendered output to assist with such an approach.

A more general support mechanism called the Sewing Kit will be added to direct System Integrator to emit miscellaneous nets in the future.

System Integrator Example Run

Under construction May/Sept 2017...

The HPR System Integrator application is invoked from the command line using the shell script found in HPRLS/system_integrator/dist/bin called sysint that invokes the sysint.exe portable assembly under mono.

You need to also minimally supply

   -blade-manifest=my-manifest.xml 
   -root=rootipname
   -ip-incdir=folder1:folder2:folder3

For a single zone, you can set the Verilog output file name with vnl=filename.v but for multiple zones, you are better setting the output folder name with -obj-dir=outputfolder and allowing the tool to create its own output files. The output files follow a built-in naming scheme that extends a root name which defaults to `roger' and which can be set to something else with `-outroot=fred' etc..

All flags can also be set instead from the Organgepath recipe by extending the default sysint00.rcp file and redirecting to the extended copy with -recipe=myrecipe.rcp which needs to be (pretty much) the first argument after the mono sysint.exe part of the overall command line.

The HPR System Integrator generates (by default) a graphical plot like this early example:

**Figure:** Draft. Output via `dot' of the inter-IP block wiring generated in an example System Integrator run. Two layout zones were used, corresponding to two FPGAs.

The primary and subsidiary I/P blocks are in black. Concentrators are in green. Adaptors are in brown.

For complex designs, a separate plot for each layout zone is helpful. This is achieved with ...

Diosim Simulator

The HPR L/S library provides a built-in simulator called Diosim. It is intended to be able to execute any mixture of intermediate codes since all have executable semantics.

Diosim is invoked by the recipe. Typically a recipe may invoke it on the same intermediate form that is being rendered as RTL or SystemC etc..

The Orangepath system contains its own simulator called Diosim. Since the target is output from the compiler as portable code to be fed into third-party C and Verilog compilers, it is not strictly necessary to use the Orangepath simulator. However, the simulator provides a self-contained means of evaluating a generated target without using external tools.

The simulator accepts a hierarchical set of VM2 machines and simulates them and their interactions.

The simulator will dynamically validate all safety assertion rules that contain no temporal logic operators. Other safety and all liveness assertions are ignored.

Non-deterministic choices are made on the basis of a PRBS that the user may seed.

The PRBS is also used for synthetic input generation from plant machines or external inputs. PRBS values used for external inputs are checked against plant safety assertions and rejected if they would violate.

Output is to files. Several files are generated:

A log file where individual events are visible if logging level is set high enough, eg. with diosim-tl=100.
A plot file. The plot file is currently in diogif plot format.
A VCD file - viewable with gtkwave and/or modelsim etc..
A console spool file, typically called diosim.out.

Simulation Control Command Line Flags

As well as providing simulation output in VCD and console form, diosim can collect statistics and help with profile generating. However, it is fairly slow and it is best to collect profiles from faster execution engines, such as via Verilator.

The statistics that diosim can collect range from net-level switching activity to higher-level statistics like imperative DIC instructions executed, RTL sequential and combinational assignment counts.

Only the two Verilog output forms, RTL and gatelevel, support conversion back into HPR machine form for post generation simulation.

-sim n specifies to simulate the system using the builtin HPR event-driven simulator for n cycles. The output is written to t.plt for viewing. The -traces flag provides a list of net patterns to trace in the simulator.

The -title title flag names the diosim plot title.

The -diosim-techno=enable flag causes print statements from the simulator to include ANSI colour escape codes for various highlighting options.

The -plot plotfile flag causes plot file output of the diosim simulation to a named plot file in diogif format.

The plot file can be viewed under X-windows and/or converted to a gif using the diogif program.

The -diosim-vcd=filename.vcd flag causes diosim to write a Verilog Change Dump (vcd) report to the named file.

Detailed logging can be found in the obj/log files. If a program prints the string 'diosim:traceon' or 'diosim:traceoff' the level of logging is changed dynamically.

If a program prints 'diosim:exit' then diosim will exit a though builtin function hpr_exit() were called.

KiwiC using C++ instead of C#

Visual Basic, Visual C++ and gcc4cil will generate dotnet portable assemblies from C++ code.

Using the gcc4cil compiler you should find a binary called "cil32-gcc" in the <path_to_cross_compiler>/bin directory. To create a CIL file use this compiler with the -S option.

Getting gcc4cil.

1. Get Gcc4Cil from the svn-repository that is mentioned on the
Gcc4Cil website (http://www.mono-project.com/Gcc4cil)
"svn co svn://gcc.gnu.org/svn/gcc/branches/st/cli"

2. As Gcc4Cil wants to compile files for the Mono-platform, you
need the Mono-project installed on your system. The easiest way to
install it is to use "Linux installer for x86" that can be found
under http://www.mono-project.com/Downloads . Installation
instructions are available under
http://www.mono-project.com/InstallerInstructions .

3. It may be possible that you need to install the portable .NET
project. During the manual compilation of gcc4cil I got errors, that
made me install this project. However I could not find a line in the
automatic generated Makefile that has a reference to the p.net path
in my home-dir. If you get the impression that you need it, you can
find it here: http://www.gnu.org/software/dotgnu/pnet-install.html

4. Because I did not know that there was a automatic script for this, I did a
<path_to_gcc4cil>/configure using the following options
--prefix=<where it should be installed to>
--with-mono=<install_dir_of_mono>
--with-gmp=<install_dir_of_glib>

I then did a make bootstrap-lean and installed the following libraries because
of compile errors:
- bison-2.3.tar.gz*
- glib-2.12.9.tar.gz
- pkg-config-0.22.tar.gz

I think it is likely that you may want so skip this step, as
this step DOES_NOT generate a compiler for cil but for boring x86
code (what I learned after I did this). However I set up paths to the
installed libraries in this step, so I mention it. I do not know for
sure if all those paths are needed in the end. As it works for me
now, I wont remove them:

setenv HOST_MONOLIB "/home/petero/mono-1.2.5.1/lib"
setenv HOST_MONOINC "/home/petero/mono-1.2.5.1/include/mono-1.0:/home/petero/mono-1.2.5.1/include/mono-1.0/mono:/home/petero/mono-1.2.5.1/include/mono-1.0/mono/cil:/home/petero/mono-1.2.5.1/include/mono-1.0/mono/jit:/home/petero/mono-1.2.5.1/include/mono-1.0/mono/metadata"
setenv CIL_AS "/home/petero/p.net/lib:/home/petero/p.net/bin"

5. in the directory where you put the gcc4cil source code, you can
find a shell script called "cil32-crosstool.sh". Execute this and the
crosscompiler for C-to-CIL compilation hopefully now gets compiled.

Nov 2016 note: The main gcc4cil problem was a lack of any sort of linker, as I recall. I do not recall why a linker was critical since KiwiC and dotnet are both happy to accept multiple dll files. Perhaps there was a related problem with .h files. I don't know whether gcc4cil maintenance is now abandoned.

Of course Visual C++ produces dotnet code that should work pretty much as well as the recent Visual Basic demo. I don't know how much Visual C++ resembles standard C++ or whether it can only be compiled on windows.

All of the HPR recipe stages except for the first, kiwife, are independent of dotnet. The intermediate HPR VM forms between recipe stages are all supposed to be serialisable to disk: you use recipe files that start and end with a load and save of VM code. But that facility has not been used recently. It might become important again to help overcome long monolithic compile times.

Subsections

David Greaves 2019-11-14