CBG CtoV Manual: Ansi C to Verilog Compiler (Draft 11th July 2000)


Motivation

Synthesis from conventional hardware design languages uses a large number of process loops, each with a nominal thread, but restricts variables to be updated by one thread, the number of such threads being determined at compile time. There is no support for mutex variables. The lack of signed arithmetic is also quite a severe problem for DSP designs in Verilog.

On the other hand, the software approaches to system design use a thread where there is a sequence of actions to be done, and this thread can flow from one module to another, using up and down calls, and updating variables inside those modules under mutex locking. Threads can be dynamically created and terminated, and both thread and routine handles can be stored in variables.

With these observations in mind, it is not surprising that software has been easier to create than hardware. CtoV allows the designer to use the tranche of software techniques when designing hardware.

CBG CtoV Compiler

CBG CtoV is a compiler that reads ANSI C source code and generates a synthesisable Verilog design which performs the same function. The places where clock cycles are consumed are either inferred by the compiler or correspond to places where the programmer created a call to the built-in routine `cx_barrier'.

A given input design can be compiled and run as a normal C program, using gcc, or can be compiled with CtoV to a chosen hardware architecture. The architecture is characterised by the number of RAMs, adders and multipliers allowed: the compiler works out how to use these resources on a cycle-by-cycle basis.

It is intended that CtoV is normally run from inside a makefile and that, for large designs, separate compilations from CtoV (and other Verilog producing compilers) are combined as instances in a higher-level Verilog module.

Special features of CtoV are:

  • A design can be compiled with parameters which minimise clock cycle consumption or else minimise the number of RAMs, adders and multipliers generated. In RF-RELEASE, no parameters can be set.

  • Advanced SRAM mapping functions determine access schedules for synchronous SRAMs of various sizes and number of access ports.

  • Scalar variables and arrays from the source program may be packed together into memory arrays in the hardware design.

  • Array variables in the source program can be unpacked into registers in the hardware design.

  • Low-level design is possible, where the exact structure of the resulting gate-level net-list can be controlled from the source file.

  • The source C code can be run as normal using the TENOS libraries to produce an emulation of the hardware.

  • The source C code may be multi-threaded and use the pthreads library to create threads and for mutexes.

Designs can be compiled back and forwards between C and Verilog using alternately VTOC and CtoV.

Language Restrictions

It should not be surprising that CtoV cannot compile most existing, large C programs into hardware. CtoV has various restrictions, the most severe being that many of the functions of the POSIX libraries and system call interface are not available. There are a couple of restrictions on the actual C language code as well, and these are described in this section.

Using CtoV, the exact number of bits required for data storage must be determined at compile time. Therefore the C code for compilation may not use recursive routines or malloc for heap access. All pointer variables must be compile-time resolvable to specific arrays or functions. The user can define an array to act as `main memory' and write his own malloc routine to assign areas within it if desired.
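A user-written allocator of the kind mentioned above might look like the following sketch. The names `main_memory`, `MAIN_MEMORY_WORDS`, `cx_alloc`, `cx_alloc_reset`, `mem_read` and `mem_write` are hypothetical and are not part of the CtoV distribution. The sketch hands out array indices rather than pointers, so that every access stays a subscript into one array whose size is known at compile time. Whether CtoV accepts exactly this form has not been checked.

```c
#include <assert.h>

#define MAIN_MEMORY_WORDS 64   /* hypothetical size, fixed at compile time */
#define CX_ALLOC_FAIL (-1)

static int main_memory[MAIN_MEMORY_WORDS];  /* the array acting as `main memory' */
static int next_free = 0;                   /* index of the first unallocated word */

/* Reserve nwords consecutive words; return the base index or CX_ALLOC_FAIL. */
int cx_alloc(int nwords)
{
    int base;
    if (nwords <= 0 || nwords > MAIN_MEMORY_WORDS - next_free)
        return CX_ALLOC_FAIL;
    base = next_free;
    next_free += nwords;
    return base;
}

/* Release everything at once; this sketch has no per-block free. */
void cx_alloc_reset(void)
{
    next_free = 0;
}

/* All accesses go through subscripts on the one static array. */
void mem_write(int idx, int value) { main_memory[idx] = value; }
int mem_read(int idx) { return main_memory[idx]; }
```

A simple bump allocator was chosen here because it needs no recursion and no bookkeeping beyond a single counter.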

In version 2, the restriction to non-recursive functions will be lifted.

Floating point arithmetic is not supported at run time in the generated hardware. All floating point arithmetic is performed at compile time and then the result is truncated to an integer in the object code.
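As a sketch of how a design can work within this restriction, a coefficient can be written as a floating point constant expression that folds to an integer, so that no floating point value survives to run time. The names `COEFF_Q8` and `scale_q8` are illustrative, and the 8-bit fixed-point scaling is an assumption chosen for this example.

```c
#include <assert.h>

/* 0.7071 scaled by 2^8: the product 181.0176 is truncated to 181. */
#define COEFF_Q8 ((int)(0.7071 * 256))

/* Multiply a non-negative x by roughly 0.7071 using integer arithmetic only. */
int scale_q8(int x)
{
    return (x * COEFF_Q8) >> 8;
}
```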

CtoV: Basic Operation

CtoV is run from a command line or makefile. It reads one or more C files and special versions of the standard C library files and include files. The include files are in the cxincludes directory which forms part of the distribution.

In a given compilation, the C routine whose name is given by the command line flag -iroot and all routines called by it are compiled. The output is a single Verilog module whose name is given by the -oroot command line option.

The inputs and outputs to the generated hardware are the formal parameters to the top level module and any free variables which have been declared to be I/O using an IOMAP macro. SRAM blocks can be placed outside the generated target module, which is useful for large RAMs in the FPGA environment, in which case the connections to such SRAM blocks may also be present as I/O ports to the generated module.

A pair of additional inputs, `clk' and `cx_reset', are generated by the compiler for all designs except fully combinatorial ones. These are the master clock and the master synchronous, active-high reset. If these signals are already present in the signature of the top-level routine then the existing signals are used. The input design should list them in the order clk, cx_reset as the first two arguments. Note that this is the order generated by the companion VTOC program.

Other files: report and control

Apart from the main input C file(s) and the main output Verilog file, an additional `.ctl' file may be read in to control the compilation and a `.rpt' file is written out.

The control file name may be specified with the `-ctl fn' command line flag. The report file name may be specified with the `-rpt fn' command line flag.

If no control file is specified, the top `.c' source file is read as the control file. Owing to the form of control commands, they may be embedded by the user inside comments in the `.c' file.

If no report file is specified, the default name is the same as the `.c' source file but with the suffix changed to `.rpt'.

Note that the report file generated by one compilation is suitable for direct use as the control file for a subsequent compilation. However, in practice, users might paste selected lines from a report file into a manually-maintained control file using a text editor.

Control File Syntax

So that the control file commands can be easily extracted when they are embedded in source files or report files, each command starts with the string `$TTSET' and ends with the string `$'. The command consists of a number of alphanumeric strings separated by white space or other punctuation characters, including the vertical bar used in tables.
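The delimiter convention makes extraction straightforward. The following sketch shows one way a tool could pull the next command out of arbitrary text, such as a C comment. The function `next_ttset` is hypothetical and is not how CtoV itself is known to parse control files. It returns the raw text between the markers and leaves the splitting on white space and punctuation to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Find the next command in text: it starts at "$TTSET" and ends at the
   following '$'. The text between the markers is copied into out
   (NUL-terminated, truncated to outlen-1 characters). Returns a pointer
   just past the closing '$', or NULL if no complete command remains. */
const char *next_ttset(const char *text, char *out, size_t outlen)
{
    static const char marker[] = "$TTSET";
    const char *start = strstr(text, marker);
    const char *end;
    size_t n;

    if (start == NULL || outlen == 0)
        return NULL;
    start += strlen(marker);
    end = strchr(start, '$');
    if (end == NULL)
        return NULL;
    n = (size_t)(end - start);
    if (n >= outlen)
        n = outlen - 1;
    memcpy(out, start, n);
    out[n] = '\0';
    return end + 1;
}
```

Calling the function repeatedly on the returned pointer walks through every command in a file image.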

Link Editing

CtoV will accept exactly one C filename from the command line. Access to source code from multiple files in one compilation is either via system libraries or via #include directives in the C file first specified.

Multiple compilations from CtoV will lead to multiple output Verilog modules. These can be stitched together, manually, in the Verilog domain by instantiating them in higher-level modules.

Three modes of compilation

The compiler will automatically select one of the following three modes of compilation. These are demonstrated in the examples section of this manual.

  • Fully combinatorial
  • One cycle (simple RTL)
  • Sequenced.

A fully combinatorial design needs no clock and so the compiler will not automatically add the clock and reset nets to the signature port list.

A one cycle design corresponds to a design with no requirement for a run-time program counter. All assignments to variables can be made in parallel using Verilog RTL constructs.

A sequenced design requires a run-time program counter. The compiler generates RTL Verilog (as opposed to behavioural Verilog) and an additional variable called `tt_statereg' which acts as a program counter and as a sequencer to schedule accesses to RAM arrays. For multi-threaded designs where the number of threads does not change at run-time, one such state register is created per thread. When the number of threads may change, the compiler generates multiple DFFs and uses one hot coding to record the thread state where necessary.

The compilation mode and the mapping of barrier statements to values of tt_statereg is shown in the report file.
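To make the role of a run-time program counter concrete, here is a hand-written C emulation of a routine that contains two barriers, where each call to `seq_step` stands for one clock cycle. This is only a mental model: the names `seq_state`, `seq_step` and `seq_acc` are hypothetical, and the state encoding is invented for illustration rather than taken from CtoV output.

```c
#include <assert.h>

static int seq_state = 0;  /* plays the role of a state register */
static int acc = 0;        /* working variable held in a register */

/* Advance the emulated design by one clock cycle. */
void seq_step(int a, int b)
{
    switch (seq_state) {
    case 0:                /* code up to the first barrier */
        acc = a;
        seq_state = 1;
        break;
    default:               /* code between the first and second barrier */
        acc = acc + b;
        seq_state = 0;
        break;
    }
}

/* Observe the register between cycles. */
int seq_acc(void) { return acc; }
```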

Variable Declaration

The user can use the normal C datatypes: char, short, int and long int. These may be qualified as signed or unsigned. In addition, the include file <cxtypes.h> has definitions of types with explicit widths in bits. For instance, a u5 is a five-bit unsigned quantity. For normal C compilation, these macro definitions expand to the next largest standard C type, but for CtoV compilation, the compiler takes note of the detailed width. This can lead to variations in program function if a design relies on the bits which fall between a CtoV sized variable and the next largest C variable.
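The width hazard can be seen in ordinary C. The sketch below assumes that a five-bit type expands to unsigned char under normal C compilation; the real definitions in the include file are not reproduced here, and `u5_host`, `inc_unmasked` and `inc_masked` are hypothetical names.

```c
#include <assert.h>

typedef unsigned char u5_host;   /* assumed host expansion of a 5-bit type */

/* Under normal C compilation the increment wraps at 256, not at 32. */
u5_host inc_unmasked(u5_host v) { return (u5_host)(v + 1); }

/* Masking to five bits behaves the same whether the variable is held
   in 8 bits on the host or in 5 bits in hardware. */
u5_host inc_masked(u5_host v) { return (u5_host)((v + 1) & 0x1F); }
```

Writing the mask explicitly is one way to keep the two compilations in agreement.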

All of the variables in the source C program are mapped to static addresses in the generated hardware, either in register variables or in RAM arrays. Nonetheless, the keyword static has a meaning when a routine is compiled by CtoV. Taking this example

  void xfun(... )
  {
    static int x = 3; int y = 2;
    ...
  }
the variable x will be initialised to 3 in clock cycles where `cx_reset' is active whereas y will be initialised to 2 each time a thread enters xfun. Each thread will see its own copy of y whereas there is one x.
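The same distinction is visible when the code is run as a normal C program, where initialisation at program start takes the place of initialisation under `cx_reset'. The routine `xfun_demo` below is a hypothetical variant of the example, extended so that the difference shows in its return value.

```c
#include <assert.h>

/* Returns 10*x + y after incrementing both variables. */
int xfun_demo(void)
{
    static int x = 3;   /* one copy, initialised once */
    int y = 2;          /* fresh copy, re-initialised on every entry */
    x++;
    y++;
    return 10 * x + y;
}
```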

Variables in different scopes with the same textual name are given unique names by appending an integer to the textual name. This action can be seen in the report file. Where a given subroutine is called from a large number of places, the local variables will by default be mapped to unique register names (unless they are declared static). Making them share the same register is a backtracking optimisation (see backtracking notes).

Formal parameters to subroutines will generally evaporate in the compilation process unless they are used also as working variables inside the body of that subroutine (i.e. are assigned to).

Addresses of Variables

If a program takes the address of a variable, this can safely be used as a pointer to that variable later on. Where the values of these pointers do not evaporate at compile time, numeric values are invented by CtoV and used at run time. The numeric values are shown in the report file.

The normal C sequencing of variables in memory space is not preserved by CtoV V1, so performing arithmetic on pointers to anything other than array subscripts is not supported. This rule defeats any attempt to implement `varargs.h'. The addressing of consecutive components of a structure or union is also not implemented.
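The following sketch illustrates the kind of pointer use that stays within these rules as described: the pointer can only ever refer to one of two named arrays and is used purely with array subscripts. The names `bank_a`, `bank_b` and `read_bank` are hypothetical, and the sketch has not been run through CtoV.

```c
#include <assert.h>

static int bank_a[4] = {1, 2, 3, 4};
static int bank_b[4] = {10, 20, 30, 40};

/* p resolves to bank_a or bank_b and is only ever subscripted.
   By contrast, stepping a pointer from one scalar variable to its
   neighbour, e.g. (&x)[1], relies on memory layout and is the kind
   of arithmetic the text rules out. */
int read_bank(int use_b, int i)
{
    int *p = use_b ? bank_b : bank_a;
    return p[i & 3];    /* mask keeps the subscript inside the array */
}
```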

Structures and Unions

Memories

Memories must be on chip or offchip. CtoV will generate instances of memories and the signals to connect to them, but the user must provide the actual memories in the target technology environment.

The name of a memory always starts `SSRAM' and is then extended with width, length and port descriptions encoded into part of the name.

All memories have a clock input and data is stored in a memory via a write port if the associated write enable signal is high on the positive edge of the clock. Memories are L words long and W bits wide and L must be a power of 2. Write ports have a write enable signal and a W-bit wide input bus. Read ports have a W-bit wide read bus.

A memory may have any number of ports. Each port has an address bus of log2 L bits. Each port must fit one of the following forms:

  • RW - read or write, half duplex.
  • RAW - read and write (at once, of one location)
  • RO - read only
  • WO - write only
  • WD - write, but destructive of existing contents at the write address for reads on other ports during the same cycle
  • RWD - read or write, half duplex, but destructive of existing contents.

The RW and RWD ports have two sets of data wires, one for reading and one for writing, but only one set is used during any clock cycle. The two sets can be merged into a tri-state bus if the user places suitable tri-state buffers in a wrapper file (pad ring) which instantiates the module(s) generated by CtoV run(s).
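A behavioural model in C can help when reasoning about a memory with a single half-duplex RW port. The sketch below is an assumption-laden illustration, not the model produced by any CtoV tool: the names `ram_rw_model`, `ram_rw_clock` and `addr_bits` are hypothetical, and the choice to register the read data on the clock edge is an assumption, since the text above does not state the read timing.

```c
#include <assert.h>

#define RAM_L 8   /* length in words; a power of 2 */

typedef struct {
    unsigned int cell[RAM_L];
    unsigned int rdata;      /* value currently on the read bus */
} ram_rw_model;

/* Number of address bits needed for a memory of len words (len a power of 2). */
int addr_bits(unsigned len)
{
    int n = 0;
    while ((1u << n) < len)
        n++;
    return n;
}

/* One positive clock edge on the RW port: either a write (wen high)
   or a read, never both in the same cycle. */
void ram_rw_clock(ram_rw_model *m, unsigned addr, int wen, unsigned wdata)
{
    addr &= RAM_L - 1;       /* only log2(L) address bits exist */
    if (wen)
        m->cell[addr] = wdata;
    else
        m->rdata = m->cell[addr];
}
```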

Each output array needs to be declared and then the ports can be defined afterwards.

The following command will define an output array called `holya' of width 32 bits and length 5 words. The minus sign is shorthand for `onchip'. The alternative allowable value is `offchip'.

      declare_output_array  -  holya 32  5

This declaration will make an entry in the Ram Array Report as follows. The number of ports and other information is added after the closing dollars marker and so does not form part of the declaration command.

CtoV Ram Array Report
*-----------------------------+------+-------+-------+-----------+---------*
| Array command               | Mode | Name  | Width | Locations | # Ports |
*-----------------------------+------+-------+-------+-----------+---------*
| $TTSET declare_output_array | -    | holya | 32    | 5 $       | 2       |
*-----------------------------+------+-------+-------+-----------+---------*
Flags: static=1, register=2, unsigned=4, voltaile=8, const=64



Ports are added to a declared output array using commands like the following, which adds the second port (0 is the first) of type read only to output array `holya'.

         declare_array_port holya 1 RO 
Output array holya, width=32, length=5
Ports
*---------------------------------+-------------+-------*
| Port command                    | Port Number | Class |
*---------------------------------+-------------+-------*
| $TTSET declare_array_port holya | 1           | RO $  |
| $TTSET declare_array_port holya | 0           | WO $  |
*---------------------------------+-------------+-------*

Mapping to Output Array

An input scalar or array can be manually mapped to a location in an output array provided the output array is sufficiently wide. The locations selected for the mapping should not be in use for other purposes.

The following command will map input scalar `mm' to location 5 of output array `holya'. If there is more than one variable in the source code with this name, all but one will be renamed by addition of a suffix. The new names can be found in the report file, but may change from one compilation to another, so if mapto is to be used, it is best to start with unique names.

TODO explain about multiple instances and scoping.

     $TTSET  mapto_output_array mm holya 5 $

The variable which is mapped (`mm') could equally be an input array, in which case the number given becomes the base address of a sequence of consecutive locations used for the input array in the output array.
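The effect of mapping an input array at a base address can be pictured in plain C as an index translation. In this sketch the input array is a hypothetical four-element array `vec` mapped at base address 5, and the output array length `HOLYA_LEN` is chosen only so that the example fits; none of these values come from a real compilation.

```c
#include <assert.h>

#define HOLYA_LEN 16   /* hypothetical length of the output array */
#define VEC_BASE 5     /* base address given in the mapto command */
#define VEC_LEN 4      /* length of the hypothetical input array vec */

static unsigned int holya[HOLYA_LEN];

/* vec[i] in the source program lives at holya[VEC_BASE + i]. */
void vec_write(int i, unsigned v) { holya[VEC_BASE + i] = v; }
unsigned vec_read(int i) { return holya[VEC_BASE + i]; }

/* Direct view of the output array, for checking the mapping. */
unsigned holya_read(int addr) { return holya[addr]; }
```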

The number '5' could be replaced with a minus sign, in which case the compiler will choose an otherwise unused location in the output array. TODO this autoselection is not currently implemented.

The output array name can be replaced with a minus sign, in which case the compiler will choose an output array to keep the scalar in. It is inadvisable to allow the compiler to select if low compilation effort is selected since a poor result is possible.

The final chosen mapping of input variables is shown in the `CtoV Input Register Report' table of the report file. This table is for information only since it does not contain embedded control commands.

Note that the compiler cannot support generation of logic where a combinatorial output depends on the value in a RAM array.

TODO explain how to map part of an input array to an output array.

Default Array Creation

Uninitialised, static arrays in the C source file will default to onchip SSRAMs with a single RW port.

Initialised arrays in the C source file are converted to ROMs and cannot be changed at runtime. This is a source language restriction in CtoV version 1.

If an array has no ports declared, a single RW port is assumed.

Ram Simulation Models

If needed, a Verilog simulation model of a RAM produced by CtoV can be automatically generated by the CtoV_RAMGEN program, which can be run from the command line.

Parameter Passing

CtoV V1 operates by expanding all subroutine calls into a flat structure before further compilation. For both the top-level routine and those called below it, input and output can be via free variables (globals in C) or the parameters. Call-by-value parameters can only be used as inputs since changes made to the formal parameter inside a routine do not have side-effects on the outside environment. For output, a value can be returned by a routine and call-by-reference can be used to pass out other values.
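The parameter conventions above can be sketched with a small routine that has two call-by-value inputs, a returned value and one call-by-reference output. The routine `add8` is a hypothetical example and is not taken from the CtoV distribution.

```c
#include <assert.h>

/* a and b are call-by-value inputs. The 8-bit sum is the returned value
   and the carry leaves through a call-by-reference parameter. */
unsigned add8(unsigned a, unsigned b, unsigned *carry)
{
    unsigned full = (a & 0xFF) + (b & 0xFF);
    *carry = full >> 8;
    return full & 0xFF;
}
```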

Major Design Styles

CtoV supports three modes of compilation and these support more than three major design styles.

Purely Combinatorial

A purely combinatorial design requires no clock or reset input and has no internal state. This is illustrated here with an AND gate.

Here is a section of C which makes a single AND gate. It is important for practical hardware design that an engineer can instruct the tools to produce gate-level features where required.

    void and2gate(u1 *y, u1 a, u1 b)
    {
       *y = a && b;
    }
In the example, the AND output is via a call by reference parameter instead of a returned value. An alternative would be as follows
    u1 and2gate(u1 a, u1 b)
    {
       return a & b;
    }
The switch between logical AND (&&) and bitwise AND (&) makes no difference in this example. The output from compiling the first version of this simple gate is as follows. The second version does not make sense to compile as a top level routine since the CtoV compiler discards the value returned from the top level routine.



Note that the combinatorial output from the simple AND gate is equivalent to a Mealy output from the sequential examples presented next.

A Simple Sequential Example

We can generate a four bit counter with clock enable, synchronous reset and combinatorial indication that the output is 15, with the following section of C.

      //
      // A four bit counter, implemented in C, for compilation to Verilog.
      //

      #include <cxctypes.h>

      void ctr4(u1 clk, u1 cx_reset, u1 cen, u4 *coutout, u1 *strobe)
      {
	static u4 state = 0;

	if (cx_reset) state = 0;
	else if (cen) state = state + 1;

	*coutout = state;   // Moore output
	cx_barrier(clk);
	*strobe = (state == 15) && !cx_reset; // Mealy output
      }

In this example, the first two terminals given by the user are called clk and cx_reset. Therefore the compiler will not need to add these to the output Verilog module, meaning the output module will have the same number of connections as the input module.

The most striking aspect of this section of code is the call to the subroutine `cx_barrier'. The assignment to strobe after the barrier makes this a Mealy output that is a combinatorial function of the current state and the input values, whereas coutout, assigned from state before the barrier, is a registered (Moore) output.
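Since a design can also be compiled and run as a normal C program, the counter can be exercised on the host. The sketch below repeats the counter with stand-in definitions: the typedefs for u1 and u4, the no-op `cx_barrier` stub and the helper `run_cycles` are all assumptions made for this illustration and are not the definitions shipped with CtoV. It treats each call to ctr4 as one clock cycle, and masks the increment to four bits because the host type is wider than the hardware register.

```c
#include <assert.h>

typedef unsigned char u1;   /* assumed host stand-ins for the CtoV width types */
typedef unsigned char u4;

/* Hypothetical host stub: on the host a barrier consumes no time. */
static void cx_barrier(u1 clk) { (void)clk; }

void ctr4(u1 clk, u1 cx_reset, u1 cen, u4 *coutout, u1 *strobe)
{
    static u4 state = 0;

    if (cx_reset) state = 0;
    else if (cen) state = (u4)((state + 1) & 0xF);  /* host u4 is 8 bits wide */

    *coutout = state;                       /* Moore output */
    cx_barrier(clk);
    *strobe = (state == 15) && !cx_reset;   /* Mealy output */
}

/* Call ctr4 n times with count enable high; return the final count. */
u4 run_cycles(int n, u1 *strobe)
{
    u4 count = 0;
    int i;
    for (i = 0; i < n; i++)
        ctr4(0, 0, 1, &count, strobe);
    return count;
}
```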

If the barrier statement were missing.... TODO

The output Verilog created from this compilation is ... TODO

Sequential Example that creates a State Register for User States

Here is an example run of CtoV .

Sequential Example that creates a State Register for Array Sequencing

Compilation Effort and Metrics

There is no unique, optimal solution to non-trivial compilations. The compiler makes heuristic choices at many points within its algorithms. The user can influence the number of choices available and the number of solutions explored. This alters the trade off between quality of results and compile time.

The decisions made by CtoV during a particular compilation are reflected in the report file in many places, but particularly in a concise decision report table. The information in the decision report table is not intended to be fully comprehensible by the user but it can be read in as part of the control file for a subsequent recompilation, perhaps after a minor modification. The user needs to be aware that decision information from previous compilations may not be helpful if there have been significant changes to the source program or if it is intended to recompile with greater effort or different metrics. In these circumstances, the user should delete or otherwise avoid feeding the decision information into subsequent compilations.

C Library Routines

A set of basic, standard C library routines is provided in the directory cxlibc which is part of the standard distribution. These libraries include basic string routines and ctypes and so on. The routines are written in standard C.

Backtracking Optimisations

CtoV has to make a large number of decisions about how to map the program to hardware. Choices exist in the order of evaluation of dyadic operators and the degree to which ALU logic elements and anonymous registers are replicated or reused. The backtracking options will allow the user to ask CtoV to explore various parts of this space.

Installation and Directory Structure

The system currently requires that VTOC has been installed and that a soft link to the cv3core binary, with the name `ctov', is placed in a directory listed in the PATH shell variable.

The TTCtoV shell variable should point to a directory containing the CtoV distribution, including the following directories:

  • cxinclude - the include files used instead of /usr/include during compilation
  • cxlibc - versions of basic C library functions such as `strcmp'.
  • cxccfe - the C front end executable

Note that the C preprocessor has the flag TTCtoV defined when invoked by CtoV and this may be used by the designer to control compilation if desired. One additional flag can be passed in from the CtoV command line using the `-CD flag' argument.

The CVMETAPATH variable must include the directory CtoV/sml ahead of the VTOC entries.

A valid license file called LICENSE.cv3 should be in one of the directories on the CVMETAPATH. Contact us for license information.

Once installed and setup, the following help message should be displayed upon executing `ctov'.

CBG/TT CVA EDA CORE (Core release 1.3n shep pg ). Nov 15 1999
usenix:
ctov  [ options ] -read fn [ options ]
Normal options are:
    -about               print version and license information
    -o               specify output filename
    -f   read further command arguments from file
    -files         short for -f files
    -I          override CVPATH directory list
    +libdir+    add to CVPATH directory list
    -M          override CVMETAPATH directory list
    -vm n                tell garbage collector the target VM use to n in Mbytes
    -hardvm n            set hard limit on VM use to n in Mbytes
        -vndebug                 cause large backtrace on any error
        -CtoV    <.>     override CtoV root environment setting
        -rpt     <.>     give name of report file
        -ctl     <.>     give name of mapping and rolling info file
	-pathlen  n      set the pathlength (default 300)
        -iroot   <.>     give name of top level c routine to compile (default is main)
        -oroot   <.>     give name of output verilog module
        -quiet   <.>     disable progress messages during long compilations
        -read    <.>     give file name to be processed (with or without .c suffix)
        -verbose                 verbose style progress reports

Pathlen

The pathlen parameter is a limit on the number of barrier steps from the top of the flow graph derived from the input C files to any point that is actually executed. If the compiler exits with `pathlen exceeded' error, then the user can try to increase this limit. This situation may occur when a great deal of loop unwinding is needed or when the input program is long in the sense of a linear chain of barriers.


DJ Greaves, 2002.