FAQ and Bugs

Note: Do not use Console.WriteLine or Console.Write with four or more arguments after the format string, since MCS converts these calls to a different style that KiwiC does not support.

Q. My design takes forever to compile but seems to make more progress with -repack=disable.

A. -repack=disable will cause all arrays to be of size 2**64 words. The only thing you can usefully do with repack disabled is run the internal simulator, Diosim. Diosim models enormous arrays as dictionary-based sparse structures. It is nice to see the Diosim output, but the resulting RTL will break most back-end simulation or synth flows (unless they too are able to handle arrays like that).

Q. Can I use Kiwi for Visual Basic?

A. Kiwi has not been directed to address Visual Basic but there is a little trial/demo on the following link:


Q. If I multiply by a constant floating-point number, will specialist FP ALUs be made, or will KiwiC use a standard FP multiplier with a tied-off argument?

A. Currently it is the latter, although the argument may not be tied off in all cases: the multiplier will generally be shared between various operations, with its arguments multiplexed. Also, where it is tied off, the FPGA tools will typically perform some (considerable?) constant folding.

Q. I am converting from C code that contains legacy unions ...

A. KiwiC is not set up to handle unsafe unions at all. It mostly works on the basis that the input code is strongly typed, but there is a small backdoor, the FastBitConvert attribute, for floating-point operations. The standard GetBytes forms in BitConverter should also work, but they generate a lot of intermediate code that passes all the way down the KiwiC recipe before, hopefully, almost entirely disappearing through load/store elision in the final output.

From test56: adding the FastBitConvert attribute makes KiwiC ignore the bodies of functions such as the following and replace each body with its own fast-path identity code, based only on the function's signature.

  // Requires: using System;  (for BitConverter)
  // Each of these methods should carry the FastBitConvert attribute (see above);
  // natively, under mono, the BitConverter bodies below are what actually run.
  static ulong fast_from_double(double darg)
  {
    byte [] asbytes = BitConverter.GetBytes(darg);
    return BitConverter.ToUInt64(asbytes, 0);
  }

  static double fast_to_double(ulong farg)
  {
    byte [] asbytes = BitConverter.GetBytes(farg);
    double rr = BitConverter.ToDouble(asbytes, 0);
    return rr;
  }

  static uint fast_from_float(float darg)
  {
    byte [] asbytes = BitConverter.GetBytes(darg);
    return BitConverter.ToUInt32(asbytes, 0);
  }

  static float fast_to_float(uint farg)
  {
    byte [] asbytes = BitConverter.GetBytes(farg);
    float rr = BitConverter.ToSingle(asbytes, 0);
    return rr;
  }

Q. KiwiC stops with an incomprehensible error. How can I tell how far KiwiC is getting through my compilation?

A. The simplest approach, with a fragile tool, is to build up your application slowly and check that KiwiC still compiles it successfully as you go. Visibility can be gained by adding command line flags that write out the disassembled PE file and the intermediate kcode. The PE file can be found in obj/ast.cil if you add the flag -kiwic-kcode-dump=enable. You should get one kcode listing file for each thread of your design, in files such as obj/kcode.T403.gt4.txt. These contain low-level imperative code generated from the C# method bodies.

If the full Kiwife recipe stage runs successfully, you should see a file called obj/h02_kiwife/report-full, which is the input to the HLS toolchain implemented by HPR in its subsequent recipe stages. You may need to add -report-each-step to get each report file written. There are also several verbose logging modes that can be enabled from the command line with flags called loglevel, which should be set to zero for maximum output.

Q. Can we have two [Kiwi.HardwareEntryPoint()] attributes in the same class? Are the threads translated into different always blocks in the Verilog?

A. There are three ways to make new threads.

1. I normally create a second thread from the first using the standard C# approach, as used in some of the tests such as test44.cs:

   Thread threadx = new Thread(new ThreadStart(reader.ReceiveProcess));
   threadx.Start();

2. Having more than one hardware entry point attribute should also work fine.

3. So should giving more than one entry to the -root command line flag.

The threads do not have to be in different classes, but techniques 2 and 3 can only be applied to static methods.

Note: Join is not supported at the moment.

Regarding the number of always blocks that result, I am not sure offhand. The compose recipe stage combines updates from different VM2s and this should perhaps ensure there is only one. But most designs, I think, run the same and/or compile faster with -compose=disable, so I think the verilog_gen stage is also doing the same trick. Certainly, in the standard synthesisable Verilog subset, a shared variable must be written by only one always block, or, if it is an on-chip RAM, by at most two threads, owing to the dual-port RAMs available in FPGAs.

Q. ... but the compiler exhausts all of the memory and the machine crashes ...

A. Which stage is taking all the time? Can you tell from the relative creation timestamps of the various folders in the obj folder?

Are you in hard pause mode, and is all the time being taken in kiwife or bevelab? If so, make sure that every control flow path in your non-unwound loops contains a Kiwi.Pause(). You should also be able to set the unwind budgets to smaller values to make the compiler give up earlier. The defaults are large:


Q. I got another 2 warnings:

+++ precision failure? ::: diadic_promote_and_resolve did not know
 what to do with CT_cr(Emu/debug_operands, <<NONE>>) V_minus
 CTL_net(false, 32, Signed,[native])

+++ precision failure? ::: diadic_promote_and_resolve did not know
 what to do with &(CT_arr(CTL_net(false, 64, Unsigned,[native]),
 <unspec>)) V_bitor CTL_net(false, 64, Signed,[native])

A. The first one is a subtraction of a 32-bit integer from a class reference (object pointer). The second one looks like a bitwise OR of a 64-bit value with the address of an array.

Neither of these is allowed in safe C#, although you can do them in unsafe C#. KiwiC does not support these operations: Kiwi only supports comparisons, multiplexing and assignment of array bases.

Q. If I want to multiply a pair of 32-bit numbers to get a 64-bit result I would typically use something like

 int a, b;
 long p = ((long)a) * b;

but won't this instantiate a 64-bit multiplier component?

A. The multipliers that KiwiC (restructure2) instantiates from cvgates.v, such as CV_INT_FL3_MULTIPLIER_S, are just soft macros that the FPGA tools will flatten and optimise on a use-case basis. If that multiplier is used just for the one multiplication, the FPGA tools will trim the internal logic of the multiplier to handle only 32-bit inputs, using fewer DSP slices. If the instantiated multiplier has been scheduled for use at other sites that use higher-order input or output bits, the multiplier will be trimmed less. However, the latency allocated to the 64-bit multiplier will be a couple of cycles more than for the smaller one, and the FPGA tools do not, of course, retime the design such that this can be reclaimed.
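The effect of the cast in the question can be checked natively (under mono or .NET, not KiwiC) with a small sketch; the class and method names are illustrative only. It compares widening before the multiply with widening after a 32-bit multiply.

```csharp
using System;

public static class WideningMultiply
{
    // 32 x 32 -> 64-bit product, as in the question: a is widened first,
    // so the multiply itself is performed in 64 bits.
    public static long Product64(int a, int b)
    {
        return ((long)a) * b;
    }

    // Without the cast the multiply is done in 32 bits (wrapping, since it
    // is unchecked), and only the wrapped result is widened to 64 bits.
    public static long Product32ThenWiden(int a, int b)
    {
        return unchecked(a * b);
    }
}
```

With a = b = 100000 the first method returns 10000000000, while the second returns the wrapped 32-bit value 1410065408. Since int.MinValue squared is 2^62, a signed 32 x 32 product never needs more than 64 bits.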

Q. I get a PostScript file called 'nolayout.eps'. What is this?

A. The HPR library contains a constructive placer that writes a floor plan to an eps file. This is used for net-length power analysis on the output RTL. The floor plan is also being used by the placer to decide how best to colour registers and bind functional units such as ALUs.

Q. Do you have any Xillybus or JetStream (Manchester) demos?

A. No, but we expect these to be contributed soon ...

Q. KiwiC is generating a circuit with too many output terminals to fit in my FPGA. Why is this?

A. You may be directly instantiating the Kiwi-generated RTL as the top level of your FPGA. This is not a normal design route: you should most likely be using a standard Kiwi substrate for your FPGA, and it is the substrate that instantiates the Kiwi code. The problem most likely arises from the Waypoint outputs. These are only for simulation purposes and they can be safely ignored. If they are left disconnected in the component that instantiates the Kiwi-generated RTL, the FPGA tools will delete the logic that drives them instead of attempting to route them to a lot of output pads (IO BLOCKS).

   output reg [639:0] KppWaypoint0,
   output [639:0] KppWaypoint1,

You can also use command line flag -vnl-keep-waypoints=disable to turn off their rendering.

Q. What IP-XACT support does Kiwi have?

A. None at the moment, but the debug access port that Kiwi should provide in the future will likely have its structure reported in IP-XACT (§10.4).

Q. I tried more ideas for one-liners, such as:

         exist = Array.IndexOf(LUT, tmp) > -1 ? true : false;

but it didn't work.

A. Since Kiwi imports very little of the standard C# libraries, the IndexOf method of the Array class is most likely missing. For 2-D and greater arrays, Kiwi uses an implementation in Kiwic.cs, and it is easily possible to add an implementation of IndexOf to those in C# source form; it should then work. For 1-D arrays, the bulk of the implementation is hardcoded inside KiwiC, but there should be potential to extend the hardcoding with additional C# code and place that, ultimately, in Kiwic.cs as well. It is a matter of knowing what to put in there. In short, you should easily be able to contribute your own implementation of such things.
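If you want to avoid Array.IndexOf entirely, a hand-written linear search is one possible replacement. This is a sketch only: LutSearch and its methods are hypothetical names, the table is assumed to be a uint[], and how well KiwiC compiles this particular form has not been checked. Note also that the "? true : false" in the question is redundant, because the comparison already yields a bool.

```csharp
using System;

public static class LutSearch
{
    // Returns the index of the first element equal to key, or -1 if absent.
    public static int IndexOf(uint[] table, uint key)
    {
        for (int i = 0; i < table.Length; i++)
        {
            if (table[i] == key) return i;
        }
        return -1;
    }

    // The one-liner from the question, without the redundant conditional.
    public static bool Exists(uint[] table, uint key)
    {
        return IndexOf(table, key) > -1;
    }
}
```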

Q. Why do I get the KiwiC error 'do not update your formal parameters for now'?

A. This message results from storing to, or otherwise modifying, a formal parameter of a function, which was missing functionality. Just copy your formal into a local variable at the start of the function body for now. Fixed in version 2.16 onwards, August 2016.

void myfun (int fp)
{
  int copied_fp = fp;
  copied_fp += 1;   // Do not directly modify your call-by-value formals before Sept 2016 (pass by reference works fine).
}

Q. What does this mean: System.Exception: CV_INT_FL2_MULTIPLIER_S unrecognised gate for presim: arity=6

A. This is from the built-in simulator, diosim. The design has used a multiplier component with a fixed latency of 2 (from cvgates.v or elsewhere), but the simulator does not know how to simulate it. Restructure2 should have included its own simulation model for each component it deploys, but one workaround is not to apply diosim to this design (omit the -sim=nnnn flag), since the generated RTL should be OK.

Q. How can I get meaningful line numbers in my error messages from KiwiC?

A. Line numbers are hard to track through the C# front end, but errors should be reported on a per-method basis. There is a fairly detailed log file written to the obj/h02_kiwic folder, but it is hard to understand. Increasingly, you can get a finer cross-reference with the source code by embedding waypoints in your source file (§10.2).

Q. Why are bools using 32 bits, even in arrays?

A. The C# compiler may compile them this way: CIL has no run-time bool class. It may be best to write your own bit-packed array class with suitable overloads if you want to exploit bit-level storage.
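A minimal sketch of such a class, assuming 32 flags are packed into each uint word. PackedBoolArray and its members are hypothetical names, not part of the Kiwi libraries; the indexer overload lets it be used much like a bool[].

```csharp
using System;

public class PackedBoolArray
{
    private readonly uint[] words;   // 32 flags per word
    public readonly int Length;      // number of logical bool entries

    public PackedBoolArray(int length)
    {
        Length = length;
        words = new uint[(length + 31) / 32];
    }

    // Indexer overload: bit (i & 31) of word (i >> 5) holds entry i.
    public bool this[int i]
    {
        get { return (words[i >> 5] & (1u << (i & 31))) != 0; }
        set
        {
            uint mask = 1u << (i & 31);
            if (value) words[i >> 5] |= mask;
            else words[i >> 5] &= ~mask;
        }
    }

    // Number of 32-bit storage words actually used.
    public int WordCount { get { return words.Length; } }
}
```

A 100-entry instance uses (100 + 31) / 32 = 4 storage words rather than 100. Whether KiwiC turns the shifts and masks into efficient bit-level storage in your design is worth checking in the generated RTL.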

Q. Can I generate a VCD using the built-in simulator, diosim?

A. Yes, use the "-sim=nnnn" argument to set the number of cycles to simulate for and add "-diosim-vcd=myvcd.vcd" to set the output file name. The "-recipe=recipes/simkcode.rcp" command line flag is also useful for just running the KiwiC front end in a software-like simulation.

Q. Why is the reset input not used in the generated RTL?

A. See §35. The reset net is disconnected unless you indeed add


or change this XML line in the file /distro/lib/recipes/KiwiC00.rcp

 <defaultsetting> resets none </defaultsetting>

Q. Why does the type of the output result end up as: reg [31:0] FIFO_FIFO2_result; instead of reg FIFO_FIFO2_result; ?

A. In Verilog, integers are signed and registers are not. You can alter this by adjusting the definition of result. Recent Verilog standards also allow signed registers to be defined.

Q. I have lots of X (uncertain) values in my simulation.

A. Is the source of X from flip-flops that are not cleared at reset, or from floating inputs? Did you add -vnl-resets=synchronous? You do not always need this for FPGA designs, since FPGA flops are self-resetting, but you may need it with the associated simulator.

It is good to trace the pc10nz program counter (or similar name) generated by KiwiC for each thread. This normally starts at zero. You can cross check that with the dot graphviz output or the tables appended to the back of the .v file (also present in the obj/h08_restructure/s00... file).

Q. I thought I would have a go at synthesizing the ... However, the Verilog finish statement gets in the way. Should there really be a finish command in synthesizable Verilog?

A. If the main entry point to the C# program allows its thread to exit, then a finish will be put in the output code. This is indeed not synthesisable. Quite often one wants the program to exit when run natively but not when synthesised. The solution is to place the main body of the program in a subroutine that is called from the Main method (i.e. the entry point). The same subroutine is also called from a second method where it is enclosed in an infinite while loop. This second method can then be named as the root to KiwiC, and this will avoid a finish statement in the generated code.

Whether a finish statement is generated when the main thread exits can also be controlled with the command line flag -kiwic-finish:

  -kiwic-finish= [ enable | disable ]

Another solution is to mark up the main body subroutine with the Kiwi.Remote() attribute. This places it in an infinite loop, and adds handshaking wires to start and stop its execution.

Another solution is to put an infinite loop in the main entry point (perhaps including a Kiwi.Pause() statement in the loop if there is other complexity to ensure KiwiC spends less time working out that it is infinite).
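The first solution above can be sketched as follows. App, Body, NativeEntry and HardwareRoot are hypothetical names: in a real program NativeEntry's role is played by Main, and HardwareRoot is the method you would name as the KiwiC root with the -root flag. The Kiwi.Pause() call is shown only as a comment so that the sketch compiles without the Kiwi library.

```csharp
using System;

public static class App
{
    public static int Counter;   // stands in for real application state

    // The main body of the program: one unit of work.
    public static void Body()
    {
        Counter += 1;
        // ... real application work goes here ...
    }

    // Native entry point (Main in a real program): runs the body once and
    // then lets the thread exit, which is what you want under mono.
    public static void NativeEntry()
    {
        Body();
    }

    // Hardware root: the same body enclosed in an infinite loop, so the
    // generated code never exits and no finish statement is needed.
    public static void HardwareRoot()
    {
        while (true)
        {
            // Kiwi.Pause();   // under KiwiC, a pause would typically go here
            Body();
        }
    }
}
```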

Q. I get the error 'kiwife: ran out of lasso steps, please increase fe unwind budget' ?

A. If your program has no input, compiling it is the same thing as interpreting it. KiwiC is probably trying to run the whole program at compile time. To give it something to do at run time, a Kiwi.Pause() should be inserted before you enter the main outer loop of your application.

Q. I get the following strange error message even when I am sure my program is not allocating fresh memory inside the thread lasso loop: 'Bad form heap pointer for obj_alloc (already allocated a variable sized object ?)'.

A. Check whether you are allocating local arrays on the stack. If these are just constant lookup tables, make them statically allocated: C# does not accept the keyword const on an array, so declare them as static readonly fields of the class instead.
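A minimal sketch of the difference; Lookup, Square and SquareLocal are hypothetical names. The static readonly table is initialised once by the class's static initialiser, whereas the local form allocates a fresh array every time the method runs.

```csharp
using System;

public static class Lookup
{
    // Statically allocated: created once, not on every call.
    private static readonly byte[] SquaresTable = { 0, 1, 4, 9, 16, 25, 36, 49 };

    public static byte Square(int i)
    {
        return SquaresTable[i & 7];
    }

    // Problematic form: a new array is allocated each time this runs.
    public static byte SquareLocal(int i)
    {
        byte[] table = { 0, 1, 4, 9, 16, 25, 36, 49 };
        return table[i & 7];
    }
}
```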

Q. I get an error like [ERROR] FATAL UNHANDLED EXCEPTION: System.Exception: threadstart//T403/Main/t55_2: Creating class instance this/uid token=System/Action`2/$star1$/@/16/SS/TX1/SINT/TX0: Bad form heap pointer for obj_alloc of type System/Action`2/$star1$/@/16/SS/TX1/SINT/TX0 post end of elaboration point (or have already allocated a runtime variable sized object ?). storemode=STOREMODE_compiletime_heap, sbrk=/tend:nota_const constant_fold_meets entry_point=0

A. This is a Kiwi 1 restriction - most heap objects need to be allocated before the end of static elaboration. Consider moving the code that allocates the heap object to the class constructor or else to another method that you call earlier. (For allocate-once items, this code migration will become automatic soon.)
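A sketch of moving an allocation out of run-time code and into a constructor. Worker, WorkerHost and SumOfSquares are hypothetical names; the assumption is that the Worker object itself is also created before the run-time loop is entered (here via a static field initialiser), so that both allocations happen during static elaboration.

```csharp
using System;

public class Worker
{
    private readonly int[] scratch;

    // The buffer is allocated once, in the constructor.
    public Worker(int size)
    {
        scratch = new int[size];
    }

    // Run-time method: reuses the buffer instead of calling new.
    public int SumOfSquares(int n)
    {
        int total = 0;
        for (int i = 0; i < n && i < scratch.Length; i++)
        {
            scratch[i] = i * i;
            total += scratch[i];
        }
        return total;
    }
}

public static class WorkerHost
{
    // Constructed during static initialisation, before any run-time loop.
    public static readonly Worker TheWorker = new Worker(8);
}
```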

Q. Can I use the struct data type in Kiwi?

A. Kiwi aims to support static and dynamic classes well. Structs in C# are slightly odd things and Kiwi has little properly tested support for them. This is being fixed 4Q2016. Normally you should use classes, but if you have a good reason to use structs we can see how well it is currently working.

Q. What string formatting is supported in Console.Write or WriteLine?

A. Up to three arguments are supported. String, integer decimal, integer hex and floating point should all work. String concatenation is also supported, provided it is done at KiwiC compile time.
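One way to stay within the three-argument limit is to split a long line of output across several calls. A sketch, with Report and PrintStats as hypothetical names: four values are printed using one three-argument Console.Write followed by a one-argument Console.WriteLine.

```csharp
using System;

public static class Report
{
    // Produces the same text as a single
    //   Console.WriteLine("count={0} min={1} max={2} sum={3}", ...)
    // would, but no call uses more than three format arguments.
    public static void PrintStats(int count, int min, int max, int sum)
    {
        Console.Write("count={0} min={1} max={2}", count, min, max);
        Console.WriteLine(" sum={0}", sum);
    }
}
```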

Q. I get an FPGA or RTL simulation error saying that CV_SP_SSRAM_FL1 is missing.

A. This is a single-ported synchronous static RAM with a fixed latency of 1 read cycle. It will most likely be mapped to block RAM by the FPGA tools. There are a number of such components that KiwiC instantiates. Please include a Kiwi technology library such as distro/lib/cvgates.v in your back-end compile.

Q. Does Kiwi support the keyword `break'?

A. Yes. All control flow constructs like for/while/continue/break are handled by the C# compiler and just appear as gotos in the CIL dot net code input to KiwiC.

Q. What Console.Write formatting is supported?

A. Examples (all are standard dot net):
  {0} - arg 0 in decimal or floating
  {1} - arg 1
  {2} - arg 2
  {1:x} - arg 1 in hex
  {1:X} - arg 1 in upper case hex
  {1:3} - field width of 3 decimal
  {1:03X} - field width of 3, hex with leading zeros

Q. If I instantiate static ulong[] buffer = new ulong[10], KiwiC generates registers. In simulation I noticed that I got not 10 registers but 18. I also tried static ulong[] buffer = new ulong[5] and got 8 registers.

A. A short array of 10 entries is most likely to be mapped to 10 separate registers, especially if you only use constant subscripts. If your subscripts can be determined not to use the whole range, or only to use multiples of some constant, or to fall in disjoint regions, you will get other patterns. Quite how it gets allocated depends on the pattern of subscripts you use. The figure of 18 you quote is presumably inflation on top of that from other aspects of the design. Kiwi does not replicate and mirror storage at the moment (but this is being added for ROM mirrors), although this could possibly be useful under some circumstances. Ditto for 5 to 8. It also depends on how many times you assign to buffer and how many different calls to new you make. I assume you have just one assignment outside of any loop or re-entrant code.

Q. I try to instantiate 2 ulong[256] arrays. In the RTL there are two memories, one A_64_US...[255:0][63:0] and one A_64_US...[2047][63:0]. I checked also the verilog file and I noticed that the address of the second array, whenever there is an operation, is multiplied by 8. Is it because of some optimization?

A. The byte address of a u64 array will be a factor of 8 different from the word address. Also, if you only used every 8th location in an array, the repack recipe stage might notice this and divide each address by 8 to save space. The addresses on the input to the repack recipe stage are byte addresses. The addresses afterwards should be efficiently packed addresses, which would be the byte address divided by 64 if you used only every 8th word, owing to both effects acting.
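The address arithmetic can be illustrated with a small sketch. AddressMath and its methods are hypothetical names, and this is not KiwiC's actual repack code. For a ulong[256], element indices 0 to 255 map to byte addresses 0 to 2040 in steps of 8, so the byte-addressed range spans 256 x 8 = 2048 locations, matching the [2047] bound in the question.

```csharp
using System;

public static class AddressMath
{
    const int BytesPerULong = sizeof(ulong);   // 8

    // Byte address of element 'index' of a ulong[] (before repacking).
    public static int ByteAddress(int index)
    {
        return index * BytesPerULong;
    }

    // Packed address if only every 'stride'-th word is touched: both the
    // byte-to-word factor and the stride are divided out.
    public static int PackedAddress(int byteAddress, int stride)
    {
        return byteAddress / (BytesPerULong * stride);
    }
}
```

With a stride of 8 the combined divisor is 64, which is the /64 case described above.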

Q. KiwiC seems to be deleting most of my design. Is this correct?

A. The processing stage called conerefine deletes unused parts of the design. A part of the design is unused if it generates no output. Outputs include PLI calls like Console.WriteLine, or net-level outputs flagged with Kiwi.OutputWordPort or similar. Adding -conerefine=disable to the command line suppresses the associated trimming, resulting in a larger RTL or other output file, although occasionally this may lead to elements being present at the code generation stage that cannot be sensibly rendered in the output language.

Also, certain keeps can be marked up on the command line so that conerefine uses these as roots.

Another common cause of an empty or near-empty RTL file is that no compilation roots were specified. This can be spotted when the file obj/h02_kiwife/report-full contains no executable code. You then need to add something like -root=MyApp.MyMain. You will also see in KiwiC.rpt that no root was processed, except perhaps for the odd class constructor.

Q. If I want a net-level I/O bus wider than 64 bits (the size of a ulong), what can I do?

A. There is some support for this, which needs documenting, where an array is passed as I/O. The colourbars example illustrates this style, but it is not in the repo and has not been tested for a while. Alternatively, having a static C# struct (not a class) as an I/O ought to work, although C# struct support in KiwiC is not mature. We can easily fix a few basic cases now, however. See test51.

Q. KiwiC is taking a very long time to compile and then fails. It says it has run out of unwind steps. Why is this?

A. If you are in a soft pause mode, KiwiC will infer Kiwi.Pause() statements where it feels necessary to allocate work to clock cycles. In hard pause mode KiwiC is not free to insert such pauses. If you have an infinite loop without a pause in it, KiwiC will fail to unwind the loop. Check that all control paths (PC trajectories) inside infinite loops have at least one Kiwi.Pause() inside them. Also, try setting the unwind attempt limits (cil-unwind-budget, bevelab-ubudget, etc.) to smaller values to discover the error earlier or to larger values if you think the effort is warranted.

Q. KiwiC is trying to start wine and creating file paths with backslashes in them, even though I am running on Linux. It also reports it is running on NT 5.2 when there is no windows machine anywhere involved.

A. On recent Linux systems, on encountering a .exe the shell will start wine and try to open windows and so on. The KiwiC shell scripts let you define MONO, and you should set this in your environment to `mono' or `/usr/bin/mono'. If this still does not fix the problem, set your shell environment variable MONO_OS_OVERRIDE to something beginning with 'l', such as linux64, and KiwiC will override the installed path combiner and related options.

Q. I get the following warning for my 16-bit output port, declared as shown below. What does it mean?

+++ checking failed:
Factorial_fac[15:0]:OUTPUT::Unsigned{init=0, io_output=true, HwWidth=16, storage=32}
  := Factorial_fac*FTFT4FactorialCircuit_V_0: assignment may wrap differently
  : rhs/w=32, lhs/w=16, store/w=32

  [Kiwi.OutputWordPort(15, 0)] static uint fac = 1;

Q. Hi, I was looking at the Kiwi project for compiling C# programs into FPGAs. Does the tool convert the C# program to a logic circuit? Is there a way to visualise the logic circuit associated with a program?

A. You can look at the circuit in the FPGA tools' schematic viewer. But the generated circuit is typically very large indeed, and you need to look at a block diagram of the datapath and a flowchart of the controller relating to each thread. The controller flowcharts are rendered with GraphViz dot, but a flowchart is often too large for that tool if it has 1000 or so codepoints. Graphical output for the datapath is being worked on at the moment as part of the new spatially-aware register colouring system, which tries to minimise wiring and multiplexor complexity.

Q. Can I use Xilinx FIFOs? pg057-fifo-generator.pdf

A. To use them in Kiwi I would probably (currently) split the code for the source and sink units such that each can be separately compiled by Kiwi but so that the composite design can also be run as a mono program where the FIFO functionality is supplied by a fifo.dll generated from C#. For the FPGA implementation I would read the separate Verilog outputs from the two Kiwi compiles into the FPGA tools along with an implementation of the FIFO. My first implementation would be some simple hand-crafted RTL and then later I would replace this with the output of the Xilinx FIFO generator. The two stages are to retain ease of debugging and design portability, where an RTL simulation of the system without Xilinx IP remains possible.
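For the native (mono) run of the composite design, the FIFO functionality could be modelled by something like the following. WordFifo is a hypothetical name, the 64-bit word type is an assumption, and the lock is there because source and sink would run as separate threads natively. The FPGA build would not use this class: the hand-crafted RTL FIFO, and later the Xilinx generator output, takes its place.

```csharp
using System;

public class WordFifo
{
    private readonly ulong[] slots;
    private readonly object gate = new object();
    private int head;    // index of the oldest entry
    private int count;   // number of entries currently held

    public WordFifo(int capacity)
    {
        slots = new ulong[capacity];
    }

    // Returns false, rather than blocking, when the FIFO is full.
    public bool TryEnqueue(ulong value)
    {
        lock (gate)
        {
            if (count == slots.Length) return false;
            slots[(head + count) % slots.Length] = value;
            count++;
            return true;
        }
    }

    // Returns false, rather than blocking, when the FIFO is empty.
    public bool TryDequeue(out ulong value)
    {
        lock (gate)
        {
            if (count == 0) { value = 0; return false; }
            value = slots[head];
            head = (head + 1) % slots.Length;
            count--;
            return true;
        }
    }

    public int Count
    {
        get { lock (gate) { return count; } }
    }
}
```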

Q. The burning question for me is, what options are available for exploiting parallelism that are not explicitly referred to from the C code? Does your converter alleviate the Von Neumann ALU bottleneck from critical paths or is an imperative C description unsuitable for substantial acceleration opportunities?

A. With KiwiC, all the standard HLS limits on parallelism apply. This means a program that can be executed in one clock cycle will be executed in one clock cycle, provided sufficient budgets are set on hardware resource use and on the amount of logic in a clock cycle.

There is no intrinsic parallelisation limit arising from a single-threaded, imperative description, but limits arise in practice from data and control dependencies/hazards.

Regarding data dependencies: where array subscript comparison is undecidable at compile time (name aliases), the hardware design that results from trying to go massively parallel is generally dominated by spurious multiplexing paths and is not a good design. When comparing array subscripts at compile time, KiwiC can spot common paradigms, such as identical expressions, constant expressions and manifestly unequal expressions like x and x+1. Computability theory tells us that no fixed set of rules can decide every such equality, so there will always be cases outside those KiwiC is programmed to decide.

Regarding control dependencies, the current KiwiC elaboration algorithms do not dynamically unwind outer loops when inner loops are still being unwound - this will be addressed in the VSFG replacement to bevelab. But a programming style where the loop exit predicate is determinable near the head of the loop body always helps in sequencer modes, as it does with Von Neumann computers, and the compilers always try to hoist it. There is no problem, of course, with data-independent loop control.

All object fields and static variables are currently strictly updated in program order. Additional annotation or policy control marking fields or static variables as 'non-architectural' or 'relaxed' may be supported in the near future. These will enable KiwiC to do more speculative execution but make debugging harder, because program order will not be followed. To help with this, architectural 'slave' registers may be added for debug viewing; these can simply be deleted by the FPGA toolchain if they are not being monitored in any way.

Q. What endianness does Kiwi use? I need this for unsafe bit conversion routines.

A. KiwiC supports only little-endian operation. There are various dot net API calls that you can make to interrogate this at run time, and Kiwi's libraries provide this information. For your code to remain portable you should invoke this API; KiwiC will propagate the constants accordingly, discarding any code for big-endian support.
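A sketch of the portable style described above, assuming BitConverter.IsLittleEndian (the standard .NET property for this) is one of the calls in question. Under KiwiC, where the answer is always little-endian, the else branch should be discarded as dead code; natively it keeps the routine correct on a big-endian host. Endian and FromLittleEndianBytes are hypothetical names.

```csharp
using System;

public static class Endian
{
    // Reassemble a uint from four bytes stored least-significant first.
    public static uint FromLittleEndianBytes(byte[] b)
    {
        if (BitConverter.IsLittleEndian)
        {
            // Host byte order already matches the data.
            return BitConverter.ToUInt32(b, 0);
        }
        else
        {
            // Portable fallback for big-endian hosts.
            return (uint)b[0] | ((uint)b[1] << 8) | ((uint)b[2] << 16) | ((uint)b[3] << 24);
        }
    }
}
```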

Q. Sorry to take your time again, but I'm new to this and I want to be sure of something: is what is implemented on the FPGA a processor that runs the program, or directly a representation of the program as a logic circuit?

A. There are various compilation styles. The fully-pipelined accelerator will run the whole program every clock tick, accepting new data every clock cycle, albeit with some number of clock cycles of latency between a particular input arriving and its result appearing at the output. Sequencer mode will generate a custom datapath made up of RAMs, ALUs and external DRAM connections, and fold the program onto this structure using some small number of clock cycles for each iteration of the inner loops. Compilation directives alter the trade-off between silicon used and the number of clock cycles needed. No standard processor is used. High-level synthesis of this kind is used in your mobile phone and enables it to compress motion video from the camera without instantly flattening the battery.

For larger programs, a good deal of the code tends to be start up and reporting code that is executed far less frequently than the main inner loops. This code can be placed on a standard processor and coupled to the HLS-generated hardware or else the datapath for the higher-performance parts can also be used as an unoptimised datapath for the less-commonly-executed code.

A. Regarding the Factorial_fac 'assignment may wrap differently' warning further up: these are warnings that the generated RTL will behave differently from the dot net version if overflow occurs in the custom bit-width fields.

You defined the output port to be a sixteen-bit register but used the 'uint' dot net valuetype to model it in the dll. You are performing an operation on this field that is sensitive to its width. The warning is that there might be a difference in behaviour if, for example, you increment this value so that it goes above 65535.
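The divergence can be reproduced natively with a small sketch; WrapDemo and its methods are hypothetical names. The dll model keeps the factorial in a 32-bit uint, while the RTL output port Factorial_fac is only 16 bits wide, so once the value exceeds 65535 the two disagree.

```csharp
using System;

public static class WrapDemo
{
    // Factorial held in a 32-bit uint, as in the dll model.
    public static uint FactorialUInt(int n)
    {
        uint fac = 1;
        for (int i = 2; i <= n; i++) fac *= (uint)i;
        return fac;
    }

    // The same value as seen through a 16-bit output port:
    // only the low 16 bits survive.
    public static ushort AsSixteenBitPort(uint value)
    {
        return (ushort)(value & 0xFFFF);
    }
}
```

8! = 40320 fits in 16 bits, so both views agree, but 9! = 362880 appears on a 16-bit port as 362880 mod 65536 = 35200.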

David Greaves 2017-02-25