ECAD and Architecture Practical Classes
Tutorial 5 - RISC-V
The RISC-V Instruction-Set Architecture (ISA) is covered in the Introduction to Computer Architecture lectures. The following tutorial introduces our own Clarvi RISC-V core, the standard Spike RISC-V ISA simulator, how to assemble code for a RISC-V processor, and how to run this code on Spike (ISA-only simulation) and on Clarvi (full cycle-accurate hardware simulation).
Clarvi - a RISC-V processor
Clarvi ('Computer LAboratory RISC-V Implementation') is a simple, in-order, 6-stage pipeline implementation of a processor in SystemVerilog. It implements the base 32-bit RISC-V instruction set (RV32I) with minimal supervisor mode support. It can use a shared external instruction and data memory but has no caches. Additionally it can communicate with other peripherals in a system using a simple memory-mapped I/O bus, which we will use later on.
The full specification of the RISC-V instruction set can be found on RiscV.org, and we also have a copy of the RISC-V Green Card instruction set summary (also included in your Introduction to Computer Architecture lecture handout). You may also wish to consult our Assembly Programming Guide (see below) for tips on assembly language programming.
Clarvi tour
Find the Clarvi code in ecad-distribution2025/riscv/clarvi/
It contains several files:
- clarvi.sv: the main processor description
- riscv.svh: RISC-V instruction set definitions
- clarvi_debug.sv: debugging $display statements for the processor in simulation, kept in a separate file for clarity
- clarvi_sim.sv: a top-level testbench for the processor in simulation
- bram.sv: the shared instruction/data memory used in simulation
- clarvi_avalon.sv: a wrapper for Clarvi when used on FPGA
- clarvi_hw.tcl: a script to generate a Qsys component for Clarvi (not currently used in these ticks)
Look through clarvi.sv, riscv.svh and clarvi_sim.sv and familiarise yourself with the key parts of the code. A key design aim was to keep the code short and simple to understand.
The Spike instruction-set simulator
The Spike ISA simulator is supported by RISC-V International, the custodians of the RISC-V architecture. See Routes/Setup for access to Spike. ISA simulators allow the functionality of a processor to be simulated but without implementation detail. Thus, Spike can run software compiled for the RISC-V ISA but it provides no information about processor cycle time, cache misses, etc.
Programming RISC-V
We will be writing RISC-V assembly code, which we compile into a memory image that can be loaded either into Spike or into a Verilator hardware simulation of Clarvi.
To generate the memory contents we use an assembler as part of the following process:
For completeness, this diagram includes compilation of C code as well as assembler, but for the next two ticks we're going to focus on assembler.
First, each source file is compiled into a matching object file, which contains the instructions for each function but does not yet fix where they will go in memory. Then all the object files, together with any additional libraries that might be used (none in this example), are linked together into a program binary, using a linker script to indicate where in memory each part should go. In our case the binary is in the ELF format. Since the target architecture we are generating code for (32-bit RISC-V) doesn't match the architecture the compiler is running on (e.g. 64-bit x86 Linux), this process is called cross-compiling.
Each binary is made up of a number of sections, including program instructions (.text), read-only data (.rodata), pre-initialised writable data (.data), and data the program has declared but not initialised (.bss). In a traditional operating system the binary would be loaded into memory by the operating system's loader or runtime linker, which would also allocate memory, perhaps start a new process, and jump to the loaded code.
We are using RISC-V processors bare metal, i.e. with no operating system, so we use a much simpler technique. We extract the relevant sections from the ELF binary and simply bundle them all together into one memory image which we arrange to be in memory when the processor starts. This way the first instruction the CPU executes is the first instruction of your program - it doesn't have to worry about how to load code or initialise the program's data.
To manage the multiple steps of this build process, we use a Makefile. make is a command-line tool that works out build dependencies and performs only the steps that are necessary. A Makefile is a series of rules describing how to build one type of file from another. For example, if you ask for sourcefile.o, make will look for a source file such as sourcefile.c or sourcefile.s and try to build the object file from it, using the rules provided in the Makefile. If sourcefile.o already exists and is newer than sourcefile.c, make concludes that the source has not changed since it was last compiled and reuses the existing sourcefile.o instead of rebuilding it. If you type make alone, it looks for a file of rules called Makefile and builds the first target it finds in it. You can also follow make with a target name, for example make clean to remove any previously generated/built files.
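The timestamp check that lets make skip work can be sketched in a few lines of Python. This is a simplified model, not how GNU make is implemented: it assumes a target is rebuilt when it does not exist or when any prerequisite has a newer modification time. The helpers `mtime_or_none` and `needs_rebuild` are hypothetical names for this sketch.

```python
import os
from typing import Iterable, Optional


def mtime_or_none(path: str) -> Optional[float]:
    """Modification time of a file, or None if it does not exist yet."""
    return os.path.getmtime(path) if os.path.exists(path) else None


def needs_rebuild(target_mtime: Optional[float],
                  prereq_mtimes: Iterable[float]) -> bool:
    """Simplified make rule check: rebuild when the target is missing or
    when any prerequisite was modified after the target was last built."""
    if target_mtime is None:
        return True
    return any(t > target_mtime for t in prereq_mtimes)


# Hypothetical usage for a rule "build/main.o: src/main.c":
# needs_rebuild(mtime_or_none("build/main.o"), [os.path.getmtime("src/main.c")])
```

Real make applies this check recursively along the dependency chain, so editing src/main.c triggers a rebuild of build/main.o and then of everything that depends on it, such as build/program.elf.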
RISC-V toolchain usage
To create programs that can run on a RISC-V CPU, a cross-compiler toolchain is installed whose tools are prefixed riscv32-unknown-elf-. To compile a simple program, you can invoke riscv32-unknown-elf-gcc. To deal with the specific details of our processor, a Makefile, a linker script (link.ld) and an example program framework are provided.
Navigate to the ecad-distribution2025/riscv/assembly/ directory in your terminal and type make. You should see output something like this:
$ make
mkdir -p build
riscv32-unknown-elf-gcc -c -o build/init.o src/init.s -O0 -march=rv32i
riscv32-unknown-elf-gcc -c -o build/div.o src/div.s -O0 -march=rv32i
riscv32-unknown-elf-gcc -c -o build/main.o src/main.c -O0 -march=rv32i
riscv32-unknown-elf-gcc -o build/program.elf -O0 -march=rv32i -static -fvisibility=hidden -nostdlib -nostartfiles -T link.ld build/init.o build/div.o build/main.o
riscv32-unknown-elf-objcopy -O binary --only-section=.data* --only-section=.text* build/program.elf build/mem.bin
hexdump -v -e '"%08x\n"' build/mem.bin > build/mem.txt
python3 txt2hex.py build/mem.txt build/mem.hex 4
riscv32-unknown-elf-objdump -S -s build/program.elf > build/program.dump
In this case the Makefile searches the src directory for files ending in .c and .s and attempts to build them. After creating a directory to hold the output, it compiles main.c and assembles init.s and div.s into .o files. These are linked (using the link.ld script) into an ELF file, whose .text and .data sections are extracted into a flat binary image (mem.bin). The image is then converted into a text listing of 32-bit words (mem.txt) and a memory file (mem.hex). Finally a disassembly dump (program.dump) is generated.
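The hexdump step that turns mem.bin into mem.txt can be mimicked in Python to show what the memory image looks like as words. This is a sketch and assumes a little-endian host such as x86. On such a host, hexdump's `%08x` format reads each group of 4 bytes as a little-endian 32-bit word, which matches RISC-V's byte order. Padding a ragged tail with zeros is an assumption of this sketch. `bin_to_words` is a hypothetical helper.

```python
import struct
from typing import List


def bin_to_words(image: bytes) -> List[str]:
    """Render a flat memory image as 8-digit hex words, one per 4 bytes,
    mimicking the Makefile's hexdump step on a little-endian host."""
    if len(image) % 4:
        # Assumption: a ragged tail is padded with zero bytes.
        image += bytes(4 - len(image) % 4)
    # "<I" = little-endian unsigned 32-bit, matching RISC-V's byte order.
    words = struct.unpack(f"<{len(image) // 4}I", image)
    return [f"{w:08x}" for w in words]


# Hypothetical usage, producing the equivalent of build/mem.txt:
# with open("build/mem.bin", "rb") as f:
#     print("\n".join(bin_to_words(f.read())))
```

For example, the four bytes `13 00 00 00` become the word `00000013`, which is the encoding of `addi x0, x0, 0` (the canonical `nop`). Lines like this should also appear in build/program.dump.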
Have a look at build/program.dump. See if you can follow how the information in link.ld, init.s, div.s and main.c was used to build the program and how the program is structured.
To save time while compiling, make will try hard not to recompile files you didn't change. In case you want to start afresh, the clean target inside the Makefile will delete all the compiler output files. Typing make clean will force the next run of make to rebuild everything.
Simulating using Spike
In the files bundle we have provided a Makefile to simulate the code using Spike. Navigate to the ecad-distribution2025/riscv/assembly/ directory and run:
make spike-log
You can also use make spike-debug, which provides an interactive debugger that can single-step through the assembly instructions (type 'h' for help). make spike runs the program without printing the instruction trace.
Simulating Clarvi
To generate a cycle-accurate simulation of the Clarvi SystemVerilog code, we use Verilator to produce a C++ model of the hardware and then compile that into a binary. To build the simulator, navigate to ecad-distribution2025/riscv/clarvi and in a terminal type:
make
This will create a new directory obj_dir/ and in it the simulation binary: ./obj_dir/Vclarvi_sim
Back in ecad-distribution2025/riscv/assembly/, we can then use the Makefile to run the code on the simulator:
make clarvi-sim
This produces a simulation trace as output showing the instructions executed, the time taken and any pipeline bubbles.
Debugging your code
In simulation, the basic Clarvi and Spike have no input/output devices such as LEDs or displays, so we have added a simple debugging printout that displays a 32-bit value. We have implemented Control/Status Register (CSR) 0x800 so that any write to this register prints the register name and value to the simulation log.
Register xN can be printed by:
csrw 0x800, xN
Because this construction is a bit clumsy to use regularly, in the assembly source we have defined an assembler macro to give it a nicer name:
.macro DEBUG_PRINT reg
    csrw 0x800, \reg
.endm
And then you can print register t1 with:
DEBUG_PRINT t1
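To make the macro less magical, the Python sketch below computes the machine word the assembler should emit for `csrw 0x800, xN`. It relies on the standard encoding: `csrw csr, rs` is a pseudo-instruction for `csrrw x0, csr, rs`. That instruction places the CSR number in bits 31:20, rs1 in bits 19:15, funct3 = 001 in bits 14:12, rd = x0 in bits 11:7, and the SYSTEM opcode 1110011 in bits 6:0. The register table is a small subset chosen for this sketch, and `encode_csrw` is a hypothetical helper.

```python
# Subset of ABI register names -> register numbers, for this sketch only.
REGS = {"zero": 0, "ra": 1, "sp": 2, "t0": 5, "t1": 6, "t2": 7,
        "a0": 10, "a1": 11}

SYSTEM_OPCODE = 0b1110011  # major opcode shared by ecall and the CSR instructions
FUNCT3_CSRRW = 0b001       # selects csrrw within the SYSTEM opcode


def encode_csrw(csr: int, rs: str) -> int:
    """Encode `csrw csr, rs` (i.e. `csrrw x0, csr, rs`) as a 32-bit word."""
    if not 0 <= csr < (1 << 12):
        raise ValueError("CSR numbers are 12 bits")
    rd = 0  # csrw discards the old CSR value by writing it to x0
    return (csr << 20) | (REGS[rs] << 15) | (FUNCT3_CSRRW << 12) | (rd << 7) | SYSTEM_OPCODE


print(f"{encode_csrw(0x800, 't1'):08x}")  # prints 80031073
```

You can compare the result against the corresponding line of build/program.dump after adding `DEBUG_PRINT t1` to a program.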
Assembler programming guide
Some tips on assembly language programming for RISC-V...
Registers
Rather than local variables, we have registers to store temporary values and pass data to and from functions. The RISC-V calling convention dictates whether each register must be preserved by the caller or the callee, and which have special purposes. In particular, it specifies which registers are used for function arguments and return values.
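The register roles defined by the standard RISC-V calling convention (the RISC-V ELF psABI) can be summarised as a lookup table, sketched here in Python. "caller" means a called function may overwrite the register, so the caller must save it if it still needs the value. "callee" means a function must restore the register before returning. zero, gp and tp are marked "-" because they are not ordinary save/restore registers. a0 and a1 also carry return values.

```python
from typing import Dict, Tuple


def abi_registers() -> Dict[int, Tuple[str, str]]:
    """Map each integer register number N (xN) to (ABI name, who preserves it)."""
    table = {0: ("zero", "-"), 1: ("ra", "caller"), 2: ("sp", "callee"),
             3: ("gp", "-"), 4: ("tp", "-"),
             8: ("s0", "callee"), 9: ("s1", "callee")}
    for k, n in enumerate(range(5, 8)):      # t0-t2: temporaries
        table[n] = (f"t{k}", "caller")
    for k, n in enumerate(range(10, 18)):    # a0-a7: arguments (a0/a1 also return values)
        table[n] = (f"a{k}", "caller")
    for k, n in enumerate(range(18, 28)):    # s2-s11: saved registers
        table[n] = (f"s{k + 2}", "callee")
    for k, n in enumerate(range(28, 32)):    # t3-t6: more temporaries
        table[n] = (f"t{k + 3}", "caller")
    return table


for n, (name, saver) in sorted(abi_registers().items()):
    print(f"x{n:<2} {name:<5} {saver}")
```

In practice this means a simple leaf function can freely use t0-t6 and a0-a7 without touching the stack. It only needs to save s-registers (or ra, if it calls another function) that it overwrites.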
To preserve data we can use the stack. This just involves decrementing the stack pointer `sp` appropriately and then storing data to the location that `sp` points to. Since you will only be writing simple assembly functions this may not be needed.
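The decrement-then-store pattern can be modelled in Python to make the address arithmetic concrete. This toy model is an assumption-laden sketch, not part of the provided framework. It rounds each frame up to a multiple of 16 bytes because the standard RISC-V ABI keeps `sp` 16-byte aligned. The simple tick programs may not depend on that, and the class and method names are hypothetical.

```python
from typing import Dict, List


class Stack:
    """Toy model of the RISC-V stack: it grows towards lower addresses and
    sp holds the lowest address currently in use."""

    def __init__(self, sp: int) -> None:
        self.sp = sp
        self.mem: Dict[int, int] = {}  # byte address -> 32-bit word

    def save(self, values: List[int]) -> int:
        """Model of `addi sp, sp, -frame` followed by one `sw vK, 4*K(sp)` per value."""
        frame = (4 * len(values) + 15) // 16 * 16  # round up to keep sp 16-byte aligned
        self.sp -= frame
        for k, v in enumerate(values):
            self.mem[self.sp + 4 * k] = v & 0xFFFFFFFF
        return frame

    def restore(self, count: int, frame: int) -> List[int]:
        """Model of one `lw` per saved word followed by `addi sp, sp, frame`."""
        values = [self.mem[self.sp + 4 * k] for k in range(count)]
        self.sp += frame
        return values


s = Stack(0x1000)
frame = s.save([0x1234, 42])   # e.g. saving ra and s0 on entry
print(frame, hex(s.sp))        # prints 16 0xff0
```

Restoring must use the same frame size and offsets as saving, in the reverse order of operations. Loads come first, then `sp` is incremented, so nothing is read from memory that has already been released.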
Instructions
The RISC-V Green Card has a full listing (note that our architecture is RV32I). Here are the instructions you will likely find most useful:
beq  rs1, rs2, label     # branch if rs1 == rs2
                         # similarly: bne (!=), blt (<), bge (>=); blt and bge compare as signed
add  rd, rs1, rs2        # rd := rs1 + rs2
sub  rd, rs1, rs2        # rd := rs1 - rs2
                         # similarly: xor, or, and
sll  rd, rs1, rs2        # rd := rs1 << rs2 (only the low 5 bits of rs2 are used)
srl  rd, rs1, rs2        # rd := rs1 >> rs2 (logical shift fills the top bits with zeros)
sra  rd, rs1, rs2        # rd := rs1 >>> rs2 (arithmetic shift sign-extends the top bits of rs1)
slt  rd, rs1, rs2        # if rs1 < rs2 then rd := 1 else rd := 0 (signed comparison)
addi rd, rs1, immediate  # rd := rs1 + immediate. The immediate is a 12-bit signed value (-2048 to 2047).
                         # similarly: xori, ori, andi, slli, srli, srai, slti (shift amounts are 0 to 31)
sw   rs2, offset(rs1)    # *(rs1 + offset) := rs2. Store the value in rs2 to the address (rs1 + offset).
lw   rd, offset(rs1)     # rd := *(rs1 + offset). Load the value at address (rs1 + offset) into rd.
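The difference between the logical and arithmetic shifts, signed comparison, and the 12-bit immediate limit is easiest to see on concrete 32-bit values. The Python sketch below models these RV32I instructions on 32-bit register patterns. It is an illustration only, and the function names simply mirror the mnemonics. `sltu`, the unsigned form of `slt`, is included for contrast.

```python
MASK = 0xFFFFFFFF  # RV32I registers are 32 bits wide


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def sll(rs1: int, rs2: int) -> int:
    return (rs1 << (rs2 & 31)) & MASK  # only the low 5 bits of rs2 are used


def srl(rs1: int, rs2: int) -> int:
    return (rs1 & MASK) >> (rs2 & 31)  # zeros shifted into the top bits


def sra(rs1: int, rs2: int) -> int:
    return (to_signed(rs1) >> (rs2 & 31)) & MASK  # copies of the sign bit shifted in


def slt(rs1: int, rs2: int) -> int:
    return int(to_signed(rs1) < to_signed(rs2))  # signed comparison


def sltu(rs1: int, rs2: int) -> int:
    return int((rs1 & MASK) < (rs2 & MASK))  # unsigned comparison, for contrast


def addi(rs1: int, imm: int) -> int:
    """The immediate must fit in 12 signed bits: -2048 to 2047."""
    if not -2048 <= imm <= 2047:
        raise ValueError(f"immediate {imm} does not fit in 12 signed bits")
    return (rs1 + imm) & MASK


print(hex(srl(0x80000000, 4)), hex(sra(0x80000000, 4)))  # prints 0x8000000 0xf8000000
```

Note that `slt` treats `0xFFFFFFFF` as -1, so it is less than 1, whereas `sltu` treats it as the largest unsigned value.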
The assembler also provides a variety of pseudo-instructions, each of which expands to one or two real instructions but makes life easier for the programmer. You may find the following useful:
li   rd, immediate       # rd := immediate (any 32-bit value)
mv   rd, rs              # rd := rs
bgt  rs1, rs2, label     # branch to label if rs1 > rs2
ble  rs1, rs2, label     # branch to label if rs1 <= rs2
beqz rs, label           # branch to label if rs == 0
                         # similarly: bnez, blez, bgez, bltz, bgtz
j    label               # unconditional jump to label
call label               # call subroutine (jump to label and store the return address in register ra)
ret                      # return from subroutine (jump to the address in register ra)
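Because `addi` only takes a 12-bit immediate, `li` with a large constant typically expands into `lui` (which sets bits 31:12) followed by `addi`. There is one subtlety: `addi` sign-extends its immediate. If bit 11 of the constant is set, the lower part is therefore negative, and the upper part must be rounded up by one to compensate. The Python sketch below shows that split. The helper names are hypothetical, and the assembler's exact choice of sequence may differ. For instance, it can use a single `addi` when the upper part is zero.

```python
from typing import Tuple


def split_li(value: int) -> Tuple[int, int]:
    """Split a 32-bit constant into (upper20, lower12) so that
    `lui rd, upper20` followed by `addi rd, rd, lower12` rebuilds it."""
    value &= 0xFFFFFFFF
    lower = value & 0xFFF
    if lower >= 0x800:        # addi would sign-extend this as a negative number
        lower -= 0x1000
    upper = ((value - lower) >> 12) & 0xFFFFF  # rounded up when lower is negative
    return upper, lower


def rebuild(upper: int, lower: int) -> int:
    """What the CPU computes: lui puts upper in bits 31:12, then addi adds lower."""
    return ((upper << 12) + lower) & 0xFFFFFFFF


upper, lower = split_li(0x12345FFF)
print(hex(upper), lower)  # prints 0x12346 -1
```

So `li t0, 0x12345FFF` can become `lui t0, 0x12346` followed by `addi t0, t0, -1`.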
We have also provided some special functionality for debugging:
ecall                    # stops the simulation of Clarvi
csrw 0x800, rs           # output a debug message in the trace containing the value of rs
Control flow
In high-level languages `goto` is to be avoided, but in assembler branches and jumps (which are essentially gotos) are the only control flow mechanism. This means we must translate for-loops and if-else statements into linear code that uses branches instead.
For instance, a conditional statement in C such as:
if (i > 42) {
    // true path
} else {
    // false path
}
Might become, in assembly with i held in register t0:
    li t1, 42            # initialize state for condition
    ble t0, t1, else     # if condition not met, go to false path
    ...                  # true path
    j end
else:
    ...                  # false path
end:
    ...
And a simple for loop in C:
for (int i = 0; i < 10; i++) {
    // something terribly clever
}
Becomes in assembly:
    li t0, 0             # initialize the loop variable
    li t1, 10            # initialize the condition variable
loop:
    bge t0, t1, end      # if the loop condition is not met, exit
    ...                  # something terribly clever
    addi t0, t0, 1       # increment the loop variable
    j loop               # loop
end:
    ...
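One way to check a hand translation is to execute it functionally, which is what an ISA simulator such as Spike does. The Python sketch below is a toy interpreter for a tiny subset of RV32I assembly: `li`, `addi`, `add`, `bge`, `blt`, `ble`, `j` and `ecall`. It tracks register values but has no notion of cycles. It is not Spike, and for brevity it ignores 32-bit wrap-around. Here the loop body is a hypothetical stand-in that sums i into t2. The increment must target t0, the loop variable.

```python
from typing import Dict, List, Tuple


def run(source: str, max_steps: int = 10_000) -> Dict[str, int]:
    """Functionally execute a tiny RV32I subset, ISA-simulator style:
    register values are tracked, cycles are not. Registers hold plain
    Python ints, so 32-bit wrap-around is ignored for brevity."""
    program: List[Tuple[str, List[str]]] = []
    labels: Dict[str, int] = {}
    for raw in source.splitlines():
        line = raw.split("#")[0].strip()          # drop comments
        if not line:
            continue
        if line.endswith(":"):                    # a label on its own line
            labels[line[:-1]] = len(program)
            continue
        op, *rest = line.split(None, 1)
        args = [a.strip() for a in rest[0].split(",")] if rest else []
        program.append((op, args))

    regs: Dict[str, int] = {}

    def read(r: str) -> int:
        return 0 if r == "zero" else regs.get(r, 0)

    compare = {"bge": lambda a, b: a >= b,
               "blt": lambda a, b: a < b,
               "ble": lambda a, b: a <= b}
    pc = 0
    for _ in range(max_steps):
        if pc >= len(program):                    # fell off the end: halt
            return regs
        op, args = program[pc]
        pc += 1
        if op == "li":
            regs[args[0]] = int(args[1], 0)
        elif op == "addi":
            regs[args[0]] = read(args[1]) + int(args[2], 0)
        elif op == "add":
            regs[args[0]] = read(args[1]) + read(args[2])
        elif op in compare:
            if compare[op](read(args[0]), read(args[1])):
                pc = labels[args[2]]
        elif op == "j":
            pc = labels[args[0]]
        elif op == "ecall":                       # stops the simulation, as on Clarvi
            return regs
        else:
            raise ValueError(f"unsupported instruction: {op}")
    raise RuntimeError("step limit reached: is the loop variable updated?")


LOOP = """
    li   t0, 0          # i = 0
    li   t1, 10         # loop bound
    li   t2, 0          # sum = 0 (stand-in for the loop body's work)
loop:
    bge  t0, t1, end    # exit once i >= 10
    add  t2, t2, t0     # the loop body: sum += i
    addi t0, t0, 1      # increment the loop variable t0
    j    loop
end:
    ecall
"""

print(run(LOOP)["t2"])  # prints 45
```

If the increment is changed to `addi t1, t1, 1`, t0 stays at 0 while the bound keeps growing. The exit branch is then never taken, and the sketch stops with its step-limit error instead of looping forever.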
In many cases the number of branches required can be reduced by updating the loop variable at the start of the loop or checking the condition at the end. Extra care must be taken when there are `break` or `continue` statements within the loop to ensure that the loop variable is updated exactly as the original C code would update it.
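To quantify the saving from checking the condition at the end, the Python sketch below counts the control-flow instructions each loop shape executes for n iterations. The top-tested shape is the one used in the for-loop example above. The bottom-tested shape is an assumed rotation: one guard branch before the loop, then a branch back such as `blt t0, t1, loop` at the bottom, which removes the `j`. The function names are hypothetical.

```python
def top_tested(n: int) -> int:
    """Branches/jumps executed when the test is at the top of the loop
    and an unconditional `j loop` closes each iteration."""
    count, i = 0, 0
    while True:
        count += 1            # bge t0, t1, end
        if i >= n:
            return count
        i += 1                # loop body + addi t0, t0, 1
        count += 1            # j loop


def bottom_tested(n: int) -> int:
    """Rotated shape: one guard before the loop, then the test at the
    bottom branches back to the start, so no `j` is needed."""
    count, i = 1, 0           # guard: bge t0, t1, end
    if i >= n:
        return count
    while True:
        i += 1                # loop body + addi t0, t0, 1
        count += 1            # blt t0, t1, loop
        if not i < n:
            return count


print(top_tested(10), bottom_tested(10))  # prints 21 11
```

When the trip count is known to be non-zero, as with the constant 10 here, the guard can be dropped as well.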
`do`-`while` and `while` loops are implemented similarly.
RISC-V ticks
The next two tabs present ticks that cover:
- Tick 3 - Assembler programming and simulation on Spike
- Tick 4 - Performance analysis running code on hardware