Department of Computer Science and Technology

Course pages 2017–18

ECAD and Architecture Practical Classes

Clarvi - a RISC-V processor

Clarvi ('Computer LAboratory RISC-V Implementation') is a simple, in-order, 6-stage pipeline implementation of a processor in SystemVerilog. It implements the base 32-bit RISC-V instruction set (RV32I) with minimal supervisor mode support. It can use a shared external instruction and data memory but has no caches. Additionally it can communicate with other peripherals in a system using a simple memory-mapped I/O bus, which we will use later on. Clarvi is described in more detail in the Computer Design lectures.

The full specification of the RISC-V instruction set can be found on RiscV.org, and we also have a copy of the RISC-V Green Card instruction set summary (also included in your Computer Design handout). You may also wish to consult our Assembly Programming Guide for tips on assembly language programming.

Clarvi tour

To fetch the Clarvi code, download clarvi.zip.

Inside you will find several files:

clarvi.sv
The main processor description
riscv.svh
RISC-V instruction set definitions
clarvi_debug.sv
Debugging $display statements for the processor in simulation, in a separate file for clarity
clarvi_sim.sv
A toplevel testbench for the processor in simulation
bram.sv
The shared instruction/data memory for use in simulation
clarvi_avalon.sv
A wrapper for Clarvi when used on FPGA
clarvi_test.do
A Tcl script to compile and configure a Clarvi simulation
clarvi_hw.tcl
A script to generate a Qsys component for Clarvi

Glance through clarvi.sv, riscv.svh and clarvi_sim.sv and familiarise yourself with the key parts of the code (this will be covered in lectures).

Programming Clarvi

We will begin by writing RISC-V assembly code, and then later link our assembly with code written in C. We compile code into a memory image that we can either load into a Modelsim simulation of Clarvi or build into Clarvi running on FPGA.

To generate the memory contents we use a C compiler and assembler, using the following process:

Compiler flowchart

First, source files are compiled into matching object files, which contain the instructions for each function but do not yet fix where those instructions will go in memory. Then all the object files, together with any additional libraries that might be used (none in this example), are linked together into a program binary, using a linker script to indicate where in memory all the parts should go. In Clarvi's case the binary is in the ELF format.

Each binary is made up of a number of sections, including program instructions (.text), read-only data (.rodata), pre-initialised writable data (.data), and data the program has declared but not pre-defined (.bss). In a traditional operating system the binary would be loaded into memory using the operating system's loader or runtime linker, which would also allocate memory, maybe start a new process, and jump to the loaded code.
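As a rough illustration of the sections listed above, the following C sketch shows where GCC typically places different kinds of object. The names are hypothetical and not part of the course files, and the exact placement depends on the compiler and on the linker script, so treat the comments as the usual case rather than a guarantee.

```c
/* Illustration only: names are hypothetical, placement is the typical GCC case. */

const int lookup[4] = {1, 2, 4, 8};  /* constant data: typically .rodata */
int counter = 42;                    /* initialised, writable: typically .data */
int buffer[16];                      /* declared but not initialised: typically .bss */

/* The instructions of a function go into .text. */
int sum_lookup(void)
{
    int total = 0;
    for (int i = 0; i < 4; i++)
        total += lookup[i];
    return total;
}
```

Running objdump with the -h option on the resulting object file should list the sections and their sizes, which is one way to check where each object actually ended up.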

We are using Clarvi bare metal, i.e. with no operating system, so we use a much simpler technique. We extract the relevant sections from the ELF binary and simply bundle them all together into one memory image which we arrange to be in memory when the processor starts. This way the first instruction Clarvi executes is the first instruction of your program - it doesn't have to worry about how to load code or initialise the program's data.

To manage the multiple steps of this build process, we use a Makefile. Make is a command-line tool that calculates build dependencies and does the minimal number of steps necessary. A Makefile is a series of rules describing how to build one type of file from another type of file. For example, if you want to build sourcefile.o, Make will look for a source file called sourcefile.c or sourcefile.s and try and build it, using the rules provided in the Makefile. Make will do the minimal amount of work necessary - if it seems that the date on sourcefile.c has not changed since the last time it compiled it, it will not build it again but instead use the existing sourcefile.o. If you type 'make' alone, it will look for a file of rules called Makefile, and attempt to execute the first rule it finds in it. You can also follow 'make' with a rule name, for example 'make clean'.

The RISC-V toolchain

For creating RISC-V programs that can run on the Clarvi, there is a cross-compiler installed, with tools prefixed riscv32-unknown-elf-. To compile a simple program, you can invoke riscv32-unknown-elf-gcc. To deal with specific details of the Clarvi processor, a Makefile, linker script (link.ld) and example program framework are provided.

Download: software.zip. Navigate to the example-asm directory in your terminal and type "make". You should see output something like this:

$ make
mkdir -p build
riscv32-unknown-elf-gcc -c -o build/init.o src/init.s -O0 -march=rv32i
riscv32-unknown-elf-gcc -c -o build/main.o src/main.s -O0 -march=rv32i
riscv32-unknown-elf-gcc -o build/program.elf build/init.o build/main.o -T link.ld -nostdlib
riscv32-unknown-elf-objcopy -O binary --only-section=.data* --only-section=.text* build/program.elf build/mem.bin
hexdump -v -e '"%08x\n"' build/mem.bin > build/mem.txt
python txt2hex.py build/mem.txt build/mem.hex 4
riscv32-unknown-elf-objdump -S -s build/program.elf > build/program.dump

In this case the Makefile will search the src directory for files ending in .c and .s and attempt to build them. After creating a directory to hold the output, you can see it assembles two source files main.s and init.s into .o files. These are linked (using the .ld script) into an ELF file, which is then converted into a flat binary image (.bin). Then we generate memory images (in two different hexadecimal formats, .txt for Modelsim, .hex for Quartus). Finally a disassembly dump is generated.
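To make the .txt step more concrete, here is a minimal C sketch of what the hexdump command above appears to do for each group of four bytes in mem.bin: combine them into one 32-bit word and print it as eight hex digits. It assumes little-endian byte order (as used by RISC-V and by hexdump on a typical x86 host); the function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Combine four consecutive bytes of the flat image into one 32-bit word,
   treating the first byte as the least significant (little-endian). */
uint32_t word_from_bytes(const uint8_t bytes[4])
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

/* Write the word as eight hex digits into out (room for 9 chars needed),
   matching the "%08x" format used in the hexdump command. */
void format_word(const uint8_t bytes[4], char out[9])
{
    snprintf(out, 9, "%08x", (unsigned)word_from_bytes(bytes));
}
```

For example, the bytes 13 05 c0 00 (the encoding of addi a0, zero, 12) would become the line 00c00513.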

Have a look at build/program.dump. See if you can follow how the information in link.ld, init.s and main.s was used to build the program and how the program is structured.

To save time while compiling, make will try hard not to recompile files you didn't change. In case you want to start afresh, the clean target inside the Makefile will delete all the compiler output files. Typing make clean will force the next run of make to rebuild everything.

Simulating the Clarvi

Now we have a program, we need to simulate the Clarvi processor.

Make a new directory exercise3 and put the clarvi and example-asm directories inside it.

To simulate the Clarvi, start Modelsim from a terminal in the clarvi directory. When started, type:

do clarvi_test.do ../example-asm/build/mem.txt TRACE

in the transcript window. This should compile the Clarvi sources, set up the waveforms and initialise the clock signal. Now you can use the run buttons to start simulation, or type run 1us in the transcript window to run for a short length of time.

In addition to waveforms, Clarvi also outputs an instruction trace in the transcript window. Besides tracing program flow, we can use this trace output to track both intermediate and return values of a function by looking at the register and memory writes. See here for an explanation of what is happening in the trace.

In simulation, the basic Clarvi has no input/output devices. We have added a simple debugging printout to display a 32-bit value. Control/Status Register (CSR) 0x7b2 is the 'dscratch' register, a temporary register used by the debugging unit. We have wired it so that any write to this register prints the register name and value to the simulation log.

Register xN can be printed by:

	csrw 0x7b2, xN
		

Because this construction is a bit clumsy to use regularly, at the top of your assembly source you can define an assembler macro to give it a nicer name:

	.macro  DEBUG_PRINT     reg
	csrw 0x7b2, \reg
	.endm
		

And then you can print register t1 with:

	DEBUG_PRINT	t1
		

You can also see the value appear on the debug_scratch wires in your simulation. If the instruction trace makes it hard to find your debug printouts, you can turn it off by removing TRACE from the clarvi_test.do command line.

Exercise

Write a RISC-V assembly program to perform integer division.

Division is slow and expensive in hardware, and we use it relatively rarely. Instead we can use a subroutine to perform division in software, and call this instead of a hardware divide instruction.

Wikipedia gives the long division algorithm as follows (in Pascal-like pseudo-code):

Q := 0                 -- initialize quotient and remainder to zero
R := 0                     
for i = n-1...0 do     -- where n is number of bits in N
  R := R << 1          -- left-shift R by 1 bit
  R(0) := N(i)         -- set the least-significant bit of R equal to bit i of the numerator
  if R >= D then
    R := R - D
    Q(i) := 1
  end
end

In assembly, write a function div that for two numbers in registers a0 and a1, calculates (a0/a1) and returns the quotient in a0 and remainder in a1. Use the provided example-asm framework as a starting point, and put your function at the top of main.s. Call it from the main() function. You might want to think about how your function should behave when given 0 as a denominator: the above pseudo-code would lead to a quotient with n 1's as the least significant bits, and a remainder equal to the numerator. A viable alternative might be to throw an exception (with ECALL).
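Before writing the assembly, it can help to have a reference model to predict the values your function should produce. The following is a minimal C sketch of the pseudo-code above for n = 32, intended to run on your workstation rather than on Clarvi. The names long_divide and struct divmod are hypothetical, and the zero-denominator behaviour is simply what the pseudo-code gives, not a recommendation.

```c
#include <stdint.h>

/* Result pair; a hypothetical type used only by this host-side model. */
struct divmod {
    uint32_t quotient;
    uint32_t remainder;
};

/* Bit-serial long division following the pseudo-code, with n = 32. */
struct divmod long_divide(uint32_t numerator, uint32_t denominator)
{
    struct divmod result = {0, 0};
    for (int i = 31; i >= 0; i--) {
        /* R := R << 1; R(0) := N(i) */
        result.remainder = (result.remainder << 1) | ((numerator >> i) & 1u);
        if (result.remainder >= denominator) {
            result.remainder -= denominator;      /* R := R - D */
            result.quotient |= (uint32_t)1 << i;  /* Q(i) := 1 */
        }
    }
    return result;
}
```

With this model, 12/4 gives quotient 3 and remainder 0, 93/7 gives 13 and 2, and 0x12345678/255 gives 0x12469D and 0x15. A zero denominator gives a quotient of 0xFFFFFFFF and a remainder equal to the numerator.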

Verify that the program you just wrote behaves as expected. For this, you will simulate the Clarvi processor in Modelsim. Use this template to invoke your code:

	.macro  DEBUG_PRINT     reg
	csrw 0x7b2, \reg
	.endm

        addi    a0, zero, 12    # a0 <- 12
        addi    a1, zero, 4     # a1 <- 4
        call    div
        DEBUG_PRINT a0          # display the quotient
        DEBUG_PRINT a1          # display the remainder

        addi    a0, zero, 93    # a0 <- 93
        addi    a1, zero, 7     # a1 <- 7
        call    div
        DEBUG_PRINT a0          # display the quotient
        DEBUG_PRINT a1          # display the remainder

        lui     a0, (0x12345000>>12)
        ori     a0, a0, 0x678   # a0 <- 0x12345678
        addi    a1, zero, 255   # a1 <- 255
        call    div
        DEBUG_PRINT a0          # display the quotient
        DEBUG_PRINT a1          # display the remainder
		

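Follow the build procedure above (run make in your project directory) to convert your source code into a memory image (build/mem.txt), and preload the Clarvi memory with it by passing that file to clarvi_test.do. Carefully analyse the program trace to compare the implementation's output with your expectation.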

C on Clarvi

Clarvi implements enough of the RISC-V instruction set to be targetable by a C compiler. C enables us to build larger programs and makes it easier to port them to different CPUs.

Copy example-c from software.zip which you downloaded earlier into the exercise3 directory. Change to the example-c directory.

This is similar to the example-asm project, except with C functions. C programs always begin at the main() function, which we have provided in main.c. As before, init.s sets up the environment so that we can begin execution of the C program.

The Makefile works in a similar way - run 'make' to build, and then simulate using the name of the mem.txt file inside the example-c/build directory. Currently the code will not compile because we have not defined the debug_print function, but we will define it in a second.

C syntax is similar to Java. See An Introduction to C for differences.

The GCC C compiler has a number of optimisation levels that set how hard it works to make the output assembler faster or smaller. Try adding one of -O0 (no optimisation) to -O3 to CFLAGS in the Makefile to optimise increasingly for speed, or -Os to optimise for size, and look at the assembly code it generates. A full list of options can be found in the GCC manual. (-Os will try to use functions not present in our environment and so fail to link, but you can still look at the generated .s file.)

Mixing C and assembler

You can write some parts of your program in C and some in assembler. To do this, we need to arrange for the C to put the parameter values into the correct registers for the assembler to pick up, and vice versa. This is defined by the processor's calling convention. On the RISC-V the calling convention is to use registers 10-17 (named a0-a7) for function arguments and 10-11 (a0-a1) for return values. a0-a7 correspond to the first 8 arguments provided to a C function, and the return value from C should be provided in a0. Note that C only provides access to a single return value.

For example, an addition function might look like:

	# int add(int a, int b): add two numbers together
	# parameters supplied in a0 and a1
	# doesn't call any other functions, so no need to store return address register (ra)
	# doesn't corrupt any callee-save registers, so no need to use the stack

	.global add		# export the function symbol so the linker can find it
	add:
	        # add the two parameters, returning the result in the return register a0
	       	add		a0,a0,a1
	        
	        ret
		

To call from C code we need a function prototype to tell the C compiler the types and parameters of the function, without defining it. For example, if we have an assembler function taking two integers and returning another, in a file called asmfunctions.h (or some other name) enter:

		int add(int a, int b);
		

Then your C code can:

		#include "asmfunctions.h"
		

to make the declaration available to this C file. You can then call add() anywhere below the #include. The advantage of .h header files is that you can easily include them in multiple files, to enable calling your function from multiple places.

Note that, while C is a type-checked language, the assembler will not type check you. So there are no safeguards if your declaration of add() does not match your usage of registers in assembler.
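Putting the pieces together, a slightly fuller sketch might look like the following. The include guard and the add_three caller are our additions for illustration, and the C body of add is only a stand-in so the caller can be tried on a workstation; on Clarvi you would leave it out and link the assembler version instead.

```c
/* asmfunctions.h (sketch): the guard name is our own choice */
#ifndef ASMFUNCTIONS_H
#define ASMFUNCTIONS_H

int add(int a, int b);   /* implemented in assembler on Clarvi */

#endif /* ASMFUNCTIONS_H */

/* Host-only stand-in for the assembler function. */
int add(int a, int b)
{
    return a + b;
}

/* Hypothetical caller: for each call the compiler places the first argument
   in a0 and the second in a1, then reads the result back from a0. */
int add_three(int x, int y, int z)
{
    return add(add(x, y), z);
}
```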

Exercise 3

Make a copy of your division code and modify it into a function rem that returns the remainder in a0. Write a C function prototype for it.

In C, write a program using div and rem to display a digital clock in minutes, seconds and centiseconds. You'll produce the output in Binary Coded Decimal, i.e. 17m 34.89s is 0x00173489.
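As a sketch of the arithmetic involved (one possible approach, not the required solution), the C below converts a tick count into the BCD layout 0x00MMSSCC. The helpers udiv and urem are hypothetical host-side stand-ins for your assembly div and rem functions, and the code assumes the number of ticks per second is a multiple of 100.

```c
#include <stdint.h>

/* Host-side stand-ins for the assembly div and rem functions (hypothetical names). */
static uint32_t udiv(uint32_t n, uint32_t d) { return n / d; }
static uint32_t urem(uint32_t n, uint32_t d) { return n % d; }

/* Pack a value 0..99 into two BCD digits, e.g. 34 -> 0x34. */
static uint32_t bcd2(uint32_t value)
{
    return (udiv(value, 10) << 4) | urem(value, 10);
}

/* Convert a tick count to 0x00MMSSCC; ticks_per_second must be a multiple of 100. */
uint32_t clock_bcd(uint32_t ticks, uint32_t ticks_per_second)
{
    uint32_t centis_total  = udiv(ticks, udiv(ticks_per_second, 100));
    uint32_t seconds_total = udiv(centis_total, 100);
    uint32_t centis  = urem(centis_total, 100);
    uint32_t seconds = urem(seconds_total, 60);
    uint32_t minutes = urem(udiv(seconds_total, 60), 100);  /* keep two digits */
    return (bcd2(minutes) << 16) | (bcd2(seconds) << 8) | bcd2(centis);
}
```

With 1,000 ticks per second, a count of 1,054,890 ticks corresponds to 17m 34.89s and so yields 0x00173489.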

To read the time, we'll use the Clarvi's internal cycle counter. Make a new file cycles.s:

.section .text
.global get_time

get_time:
        csrr a0, time
        ret

This defines a function with the prototype:

int get_time(void);

The get_time function will now return the number of CPU cycles since power on (as this is a 32-bit value, it will wrap around roughly every 86 seconds at 50 MHz). To make the simulation faster, for now you should assume there are 1,000 ticks of this value per second. Using the actual value (100,000,000 in simulation) would require you to simulate for a very long time to test even the seconds counter of your clock. This highlights how slow simulation can be compared to FPGA, which could run this in real time.
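The wraparound matters if you compute elapsed time from two readings. A minimal sketch, assuming you treat the value returned by get_time as an unsigned 32-bit number: unsigned subtraction is performed modulo 2^32, so the difference stays correct across a single wrap. The function names are hypothetical, and seconds_until_wrap uses 64-bit division, so it is meant for checking the figure on a workstation rather than for running on Clarvi.

```c
#include <stdint.h>

/* Ticks elapsed between two readings of a free-running 32-bit counter.
   Unsigned subtraction is modulo 2^32, so one wraparound is handled. */
uint32_t ticks_elapsed(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}

/* Whole seconds before a 32-bit counter wraps at the given clock rate in Hz. */
uint32_t seconds_until_wrap(uint32_t clock_hz)
{
    return (uint32_t)(((uint64_t)1 << 32) / clock_hz);
}
```

For a 50 MHz clock, seconds_until_wrap returns 85, consistent with the figure of roughly 86 seconds (2^32 / 50,000,000 is about 85.9).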

The Modelsim log window will only display a limited number of lines. It also outputs the log to a file called transcript in the directory you started Modelsim from. You can open this in an editor, or the UNIX command
grep Debug transcript
will print all the lines containing the string Debug. You can also run without TRACE to not display the instruction trace.

Output the value by writing to address 0x04000080, the address of the hex LEDs in the simulator. This will cause a message in the log. You can use this function:

void hex_output(int value)
{
	volatile int *hex_leds = (volatile int *) 0x04000080;  // pointer to the memory-mapped register; volatile stops the compiler optimising away or merging writes
	*hex_leds = value;                   // write the value to that address
}

Note also that we don't have a standard library, so functions like printf, malloc or memset do not exist (unless we write them). The Clarvi CPU also does not have multiply - if the compiler cannot optimise a constant multiply to adds and shifts, it will call a multiply function which we don't have (unless you write one).
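If you do need a multiply, one common software approach is shift-and-add, which uses only operations present in RV32I. The sketch below is one such routine; the name soft_mul is hypothetical. With GCC, the missing function is usually a libgcc helper (we believe __mulsi3 for 32-bit multiplies), but treat that name as an assumption and check the linker error message for the symbol it actually wants.

```c
#include <stdint.h>

/* Shift-and-add multiplication using only shifts, adds and bit tests.
   The result is the low 32 bits of the product, as for a C unsigned multiply. */
uint32_t soft_mul(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    while (b != 0) {
        if (b & 1u)
            product += a;   /* add the shifted multiplicand for each set bit of b */
        a <<= 1;
        b >>= 1;
    }
    return product;
}
```

The loop runs once per bit of b up to its highest set bit, so passing the smaller operand as b keeps it short.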

To use the debug port from C you can use the following C function (an example of inline assembly):

void debug_print(int value)
{
	asm ("csrw	0x7B2, %0" : : "r" (value) );
}

Continuously read and display the time so that you can check your system correctly displays transitions from centiseconds to seconds and seconds to minutes.