Orangepath/HPR-L/S Project: Hardware and Embedded Software Synthesis from Executable Specifications.
KiwiC .net Compilation to Hardware.

Kiwi HLS and Kiwi Scientific Acceleration

Kiwi is a High-Level Synthesis (HLS) system that primarily generates FPGA designs from C# source code.

Compared with existing high-level synthesis tools, KiwiC supports a wider subset of standard programming language features. In particular, it supports multi-dimensional arrays, threading, file-server I/O, object management and limited recursion. Release 1 of KiwiC supports static heap management, where all memory structures are allocated at compile time and permanently mapped to on-FPGA RAM or external DRAM. Release 2 of KiwiC, which has already passed some tests, supports arbitrary heap allocation at run time but does not implement garbage collection.
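As a rough illustration of the static style described for Release 1, the following plain C# sketch allocates every array exactly once, with sizes fixed at compile time, and afterwards only reads and writes those structures. The class and member names are hypothetical and the sketch is not taken from the Kiwi distribution; which allocation patterns each KiwiC release actually accepts is defined by the Kiwi manual, not by this example.

```csharp
using System;

public static class StaticAllocationSketch
{
    // Hypothetical sizes, known at compile time.
    const int Rows = 4;
    const int Cols = 8;

    // Allocated exactly once, before any processing starts, so a compiler
    // could in principle map each array to a fixed RAM.
    static readonly int[,] grid = new int[Rows, Cols];
    static readonly int[] rowSums = new int[Rows];

    // The processing loop performs no allocation: it only reads and
    // writes the structures declared above.
    public static int FillAndSum()
    {
        int total = 0;
        for (int r = 0; r < Rows; r++)
        {
            rowSums[r] = 0;
            for (int c = 0; c < Cols; c++)
            {
                grid[r, c] = r * Cols + c;
                rowSums[r] += grid[r, c];
            }
            total += rowSums[r];
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(FillAndSum());  // sum of 0..31, i.e. 496
    }
}
```

By contrast, a loop that called new with a size only known at run time would fall into the dynamic allocation style attributed above to Release 2.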

The Kiwi performance predictor is another important design tool, enabling HPC users to explore the expected speed-up of their application as they modify it, without having to wait for multi-hour FPGA compilations in each development iteration.

This web page summarises various KiwiC examples and use cases. HPC-focussed work is presented more fully here: Kiwi Scientific Acceleration: New Site for Scientific Users: Kiwi HPC.

Kiwi was developed at the University of Cambridge Computer Laboratory and Microsoft Research Limited, headed by David Greaves (UoCCL) and Satnam Singh (MRL). Kiwi continues at the Computer Laboratory as part of a logic synthesis project called HPR L/S and it uses the Orangepath core library.

Kiwi Use Cases

Kiwi can be used for generating custom accelerators to be embedded in other systems or it can be used as an execution platform for a complete scientific application.

The KiwiC compiler operates in several design styles. These vary in the amount of control the user has over the mapping of work to hardware clock cycles and the rate at which the resulting system can accept new input arguments.

Classical HLS generates a custom datapath and controlling sequencer for an application. The application may run once and exit or be organised as a server that goes busy when given new input data.

At the other extreme, we can generate a fully-pipelined, fixed-latency stream processor that tends not to have a controlling sequencer, but which instead relies on predicated execution and a small amount of backward and forward data forwarding along its pipeline.
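Predicated execution can be pictured in ordinary C#. The sketch below is a hand-written illustration, not KiwiC output, and all of its names are hypothetical. The first method uses data-dependent control flow, the kind of thing a sequencer would normally steer. The second always computes the candidate result and lets a predicate select the value to return, so every input follows the same fixed path.

```csharp
using System;

public static class PredicationSketch
{
    // Control-flow form: which statements execute depends on the data.
    public static int SaturatingAddBranching(int a, int b, int limit)
    {
        int sum = a + b;
        if (sum > limit)
        {
            return limit;
        }
        return sum;
    }

    // Predicated form: the sum is always computed, and a predicate
    // selects between it and the limit. No statement is skipped.
    public static int SaturatingAddPredicated(int a, int b, int limit)
    {
        int sum = a + b;               // always computed
        bool overLimit = sum > limit;  // the predicate
        return overLimit ? limit : sum;
    }

    public static void Main()
    {
        Console.WriteLine(SaturatingAddBranching(100, 50, 120));   // 120
        Console.WriteLine(SaturatingAddPredicated(100, 50, 120));  // 120
        Console.WriteLine(SaturatingAddPredicated(10, 20, 120));   // 30
    }
}
```

Both methods return the same values; the difference lies only in how the choice is expressed.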

In all use cases, the user's application is first coded in C# and can be developed and tested on the user's workstation using Visual Studio or Mono.

When high performance is required, the very same binary file is further compiled by KiwiC for FPGA programmable hardware.

FPGAs can use as little as 1/1000th of the energy of a standard workstation and can run as much as 100 times faster. An everyday example of a hardware accelerator is MPEG compression on a smartphone: done in software instead, it would flatten the battery almost instantly!

Kiwi Hardware Server and Real-Time Accelerator Synthesis

When generating a real-time accelerator, a C# function (a method with arguments and a return value) is designated by the user as the target root, using either a C# attribute or a command line flag to the KiwiC compiler. The user may also state the maximum processing latency. The user will also typically state the reissue frequency, which could be as high as once per clock cycle, and whether stalls (flow control) are allowed.

For a real-time accelerator, multiple 'calls' to the designated function are evaluated concurrently in the generated hardware. Operations on mutable state, including static RAMs and DRAM, are allowed, but care must be taken over the way multiple executions appear to be interleaved, just as care is needed with re-entrant, multithreaded software operating on shared variables. Local variables are private to each invocation.
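The software analogy can be made concrete with a small workstation sketch. Here ordinary .net threads stand in for overlapping hardware invocations, and all names are hypothetical. The sketch says nothing about how KiwiC itself treats these primitives in real-time mode; it only shows why locals are safe and shared state needs care.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ReentrancySketch
{
    // Shared mutable state, visible to every in-flight invocation.
    static int sharedTotal = 0;

    // Uses only its arguments and locals, so overlapping invocations
    // cannot interfere with one another.
    public static int Scale(int x, int gain)
    {
        int product = x * gain;  // local: private to this invocation
        return product + 1;
    }

    // Updates shared state. The read-modify-write is made atomic so that
    // interleaved invocations cannot lose updates.
    public static int Accumulate(int x)
    {
        return Interlocked.Add(ref sharedTotal, x);
    }

    // Runs many overlapping invocations and returns the final total.
    public static int RunOverlapped(int calls)
    {
        Interlocked.Exchange(ref sharedTotal, 0);
        Parallel.For(0, calls, i => { Accumulate(Scale(i, 0)); });
        return sharedTotal;
    }

    public static void Main()
    {
        // Scale(i, 0) is always 1, so the total equals the call count.
        Console.WriteLine(RunOverlapped(1000));  // 1000
    }
}
```

Replacing the atomic add with a plain `sharedTotal += x` would allow updates to be lost when invocations interleave, which is the hazard described above.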

Note: real-time server mode is being implemented in 3Q16. The prior 'hardware server' RPC-based implementation (LINK) is not re-entrant and does not enforce hard real time.

Accelerating CPU-bound Applications

Applications that do not involve much I/O are always good candidates for FPGA execution, since FPGA I/O performance and facilities have generally been inferior to those of standard processor platforms, such as an x86 motherboard.

Additionally, applications that do not require a great deal of memory are also ideal for FPGA, since FPGA DRAM controllers tend to be a little behind those on the latest x86 motherboards. However, many FPGAs now have considerable on-chip static RAM and fairly high-bandwidth connections to larger co-located static RAMs, so memory footprint is less of an issue than it was.

Kiwi works very well on intensive CPU-bound applications, particularly those that perform many bit-level operations and not much floating point. Typical examples are encryption, hashing and linear programming. The latest FPGAs now have much better floating-point support, and this has recently extended the class of applications that benefit.
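To give a flavour of such a bit-level kernel, here is a bitwise CRC-32 in plain C#. It is a generic textbook routine chosen as a stand-in, not an example from the Kiwi distribution, and the class name is hypothetical. The inner loop consists only of shifts, masks and exclusive-ORs on a 32-bit word.

```csharp
using System;
using System.Text;

public static class BitLevelSketch
{
    // Bitwise CRC-32 using the reflected polynomial 0xEDB88320,
    // processing one bit per inner-loop iteration.
    public static uint Crc32(byte[] data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
        {
            crc ^= b;
            for (int bit = 0; bit < 8; bit++)
            {
                bool lsbSet = (crc & 1u) != 0;
                crc >>= 1;
                if (lsbSet)
                {
                    crc ^= 0xEDB88320u;
                }
            }
        }
        return ~crc;
    }

    public static void Main()
    {
        // "123456789" is the conventional CRC check string.
        uint check = Crc32(Encoding.ASCII.GetBytes("123456789"));
        Console.WriteLine(check.ToString("X8"));  // CBF43926
    }
}
```

On a processor each inner iteration costs several instructions, whereas in logic the same shifts and exclusive-ORs are just wiring and a few gates, which is why kernels of this shape tend to suit FPGAs.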

Accelerating Data-Intensive Applications Using Networked FPGA

FPGAs can stream big data to and from fileservers via Ethernet, QPX, or the AXI bus on the Zynq platform.

In the future, FPGA platforms may become a standard offering in Cloud Computing. Microsoft has installed FPGAs in many of its cloud blades but has not yet opened up the FPGA resource for public programming.

Kiwi is developing a methodology for accelerating scientific applications and algorithms using parallel programming. It uses the parallel constructs of the C# language and the dotnet runtime. Specifically, Kiwi consists of two parts: a run-time library that supports both native simulation of hardware descriptions within C# and FPGA execution of algorithms expressed in C#, and the KiwiC compiler, which converts constrained/stylised dotnet bytecode into Verilog RTL for further compilation for FPGA execution. In the future, custom domain-specific front ends that generate dotnet bytecode could also be used.

The Kiwi technology has many potential uses, but some of note are:

Kiwi Talks

Kiwi Online Resources

  • Early hardware-centric demonstrations: Demo Pages.

  • Kiwi Scientific Acceleration Manual (web and PDF): PDF, HTML.

  • Kiwi Scientific Acceleration: New Site: Larger Demos for Scientific Users (under development 2Q16).

  • KiwiC is built on the HPR L/S library and so shares a lot of internal implementation with the H2 compiler, the JinJoin Glue Logic Synthesiser, a toy BSV compiler and so on: PDF, HTML.
    (C) 2007-2016 David J Greaves.