ECAD and Architecture Practical Classes

Tutorial 5 - RISC-V

The RISC-V Instruction-Set Architecture (ISA) is covered in the Introduction to Computer Architecture lectures. This tutorial introduces our own Clarvi RISC-V core, the standard Spike RISC-V ISA simulator, how to compile assembler code for a RISC-V processor, and how to run this code on Spike (ISA-only simulation) and Clarvi (full cycle-accurate hardware simulation).

Clarvi - a RISC-V processor

Clarvi ('Computer LAboratory RISC-V Implementation') is a simple, in-order, 6-stage pipeline implementation of a processor in SystemVerilog. It implements the base 32-bit RISC-V instruction set (RV32I) with minimal supervisor mode support. It can use a shared external instruction and data memory but has no caches. Additionally it can communicate with other peripherals in a system using a simple memory-mapped I/O bus, which we will use later on.

The full specification of the RISC-V instruction set can be found on RiscV.org, and we also have a copy of the RISC-V Green Card instruction set summary (also included in your Introduction to Computer Architecture lecture handout). You may also wish to consult our Assembly Programming Guide (see below) for tips on assembly language programming.

Clarvi tour

Find the Clarvi code in ecad-distribution2025/riscv/clarvi/

It contains several files:

clarvi.sv: The main processor description
riscv.svh: RISC-V instruction set definitions
clarvi_debug.sv: Debugging $display statements for the processor in simulation, in a separate file for clarity
clarvi_sim.sv: A toplevel testbench for the processor in simulation
bram.sv: The shared instruction/data memory for use in simulation
clarvi_avalon.sv: A wrapper for Clarvi when used on FPGA
clarvi_hw.tcl: A script to generate a Qsys component for Clarvi (not currently used in these ticks)

Look through clarvi.sv, riscv.svh and clarvi_sim.sv and familiarise yourself with the key parts of the code. A key design aim was to keep the code short and simple to understand.

The Spike instruction-set simulator

The Spike ISA simulator is supported by RISC-V International, the custodians of the RISC-V architecture. See Routes/Setup for access to Spike. ISA simulators allow the functionality of a processor to be simulated but without implementation detail. Thus, Spike can run software compiled for the RISC-V ISA but it provides no information about processor cycle time, cache misses, etc.

Programming RISC-V

We will be writing RISC-V assembly code. We compile the code into a memory image that can be loaded into either Spike or a Verilator hardware simulation of Clarvi.

To generate the memory contents we use an assembler, using the following process:

For completeness, this diagram includes compilation of C code as well as assembler, but for the next two ticks we're going to focus on assembler.

First, source files are compiled into matching object files, which contain the instructions for each function but do not yet fix where those instructions will go in memory. Then all the object files, together with any additional libraries that might be used (none in this example), are linked together into a program binary, using a linker script to indicate where in memory all the parts should go. In our case the binary is in the ELF format. Since the target architecture we are generating code for (32-bit RISC-V) doesn't match the architecture the compiler is running on (e.g. 64-bit x86 Linux), this process is called cross-compiling.

Each binary is made up of a number of sections, including program instructions (.text), read-only data (.rodata), pre-initialised writable data (.data), and data the program has declared but not pre-defined (.bss). In a traditional operating system the binary would be loaded into memory using the operating system's loader or runtime linker, which would also allocate memory, maybe start a new process, and jump to the loaded code.
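As a rough illustration of the section names above, the following C sketch annotates where a typical GCC build would place each object. The names lookup, counter, scratch and bump are hypothetical and not part of the provided sources, and the final placement always depends on the compiler options and on the linker script.

```c
#include <assert.h>
#include <stdint.h>

/* Read-only data: typically placed in .rodata. */
const uint32_t lookup[4] = {1, 2, 4, 8};

/* Writable data with an initial value: typically placed in .data. */
uint32_t counter = 42;

/* Declared but not given a value: typically placed in .bss (starts as zero). */
uint32_t scratch[16];

/* The machine instructions for a function: placed in .text. */
uint32_t bump(uint32_t i)
{
    assert(i < 4);            /* stay inside the lookup table */
    counter += lookup[i];     /* reads .rodata, updates .data */
    scratch[i] = counter;     /* writes .bss */
    return counter;
}
```

Comparing a program like this against its disassembly dump is one way to see which addresses each section was given.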

We are using RISC-V processors bare metal, i.e. with no operating system, so we use a much simpler technique. We extract the relevant sections from the ELF binary and simply bundle them all together into one memory image which we arrange to be in memory when the processor starts. This way the first instruction the CPU executes is the first instruction of your program - it doesn't have to worry about how to load code or initialise the program's data.

To manage the multiple steps of this build process, we use a Makefile. make is a command-line tool that calculates build dependencies and does the minimal number of steps necessary. A Makefile is a series of rules describing how to build one type of file from another type of file. For example, if you want to build sourcefile.o, make will look for a source file called sourcefile.c or sourcefile.s and try to build it, using the rules provided in the Makefile. make will do the minimal amount of work necessary: if it sees that sourcefile.c has not been modified since the last time it was compiled, it will not build it again but will instead use the existing sourcefile.o. If you type make alone, it will look for a file of rules called Makefile and attempt to execute the first rule it finds in it. You can also follow make with a rule name, for example make clean to remove any previously generated/built files.

RISC-V toolchain usage

To create programs that can run on a RISC-V CPU, a cross-compiler is installed whose tools are prefixed riscv32-unknown-elf-. To compile a simple program, you can invoke riscv32-unknown-elf-gcc. To deal with the specific details of our processor, a Makefile, a linker script (link.ld) and an example program framework are provided.

Navigate to the ecad-distribution2025/riscv/assembly/ directory in your terminal and type make. You should see output something like this:

$ make
mkdir -p build
riscv32-unknown-elf-gcc -c -o build/init.o src/init.s -O0 -march=rv32i
riscv32-unknown-elf-gcc -c -o build/div.o src/div.s -O0 -march=rv32i
riscv32-unknown-elf-gcc -c -o build/main.o src/main.c -O0 -march=rv32i
riscv32-unknown-elf-gcc -o build/program.elf -O0 -march=rv32i -static -fvisibility=hidden -nostdlib -nostartfiles -T link.ld build/init.o build/div.o build/main.o
riscv32-unknown-elf-objcopy -O binary --only-section=.data* --only-section=.text* build/program.elf build/mem.bin
hexdump -v -e '"%08x\n"' build/mem.bin > build/mem.txt
python3 txt2hex.py build/mem.txt build/mem.hex 4
riscv32-unknown-elf-objdump -S -s build/program.elf > build/program.dump

In this case the Makefile will search the src directory for files ending in .c and .s and attempt to build them. After creating a directory to hold the output, it compiles or assembles three source files (main.c, init.s and div.s) into .o files. These are linked (using the link.ld script) into an ELF file, which is then converted into a flat binary image (mem.bin) and from there into text and hex memory images (mem.txt and mem.hex). Finally a disassembly dump (program.dump) is generated.

Have a look at build/program.dump. See if you can follow how the information in link.ld, init.s, div.s and main.c was used to build the program and how the program is structured.

To save time while compiling, make will try hard not to recompile files you didn't change. In case you want to start afresh, the clean target inside the Makefile will delete all the compiler output files. Typing make clean will force the next run of make to rebuild everything.

Simulating using Spike

In the files bundle we have provided a Makefile to simulate the code using Spike. Navigate to the ecad-distribution2025/riscv/assembly/ directory and run:

make spike-log

You can also use make spike-debug which provides an interactive debugger which can single-step through the assembly instructions (type 'h' for help). Using make spike will run without the instruction trace being printed.

Simulating Clarvi

To generate a cycle-accurate simulation of the Clarvi SystemVerilog code, we are going to use Verilator to produce a C++ model of the hardware and then compile that into a binary. To build the simulator, navigate to ecad-distribution2025/riscv/clarvi and in a terminal type:

make

This will create a new directory obj_dir/ and in it the simulation binary: ./obj_dir/Vclarvi_sim

Navigating back to ecad-distribution2025/riscv/assembly/ we can then use the Makefile to run the code on the simulator viz:

make clarvi-sim

This produces a simulation trace as output showing the instructions executed, the time taken and any pipeline bubbles.

Debugging your code

In simulation, the basic Clarvi and Spike have no input/output devices like LEDs or displays. We have added a simple debugging printout to display a 32-bit value: we have implemented Control/Status Register (CSR) 0x800 so that any write to this register prints the register name and value to the simulation log.

	csrw 0x800, xN

Because this construction is a bit clumsy to use regularly, in the assembly source we have defined an assembler macro to give it a nicer name:

	.macro  DEBUG_PRINT     reg
	csrw 0x800, \reg
	.endm

And then you can print register t1 with:

	DEBUG_PRINT	t1

Assembler programming guide

Some tips on assembly language programming for RISC-V...

Registers

Rather than local variables, we have registers to store temporary values and pass data to and from functions. The RISC-V calling convention dictates whether each register must be preserved by the caller or the callee, and which have special purposes. In particular, it specifies which registers are used for function arguments and return values.

To preserve data we can use the stack. This just involves decrementing the stack pointer `sp` appropriately and then storing data to the location that `sp` points to. Since you will only be writing simple assembly functions this may not be needed.

Instructions

The RISC-V Green Card has a full listing (note that our architecture is RV32I). Here are the instructions you will likely find most useful:

    beq  rs1, rs2, label    # branch if rs1 == rs2
                            # similarly: bne (!=), blt (<), bge (>=)

    add  rd, rs1, rs2       # rd := rs1 + rs2
    sub  rd, rs1, rs2       # rd := rs1 - rs2
                            # similarly: xor, or, and

    sll  rd, rs1, rs2       # rd := rs1 << rs2
    srl  rd, rs1, rs2       # rd := rs1 >> rs2
    sra  rd, rs1, rs2       # rd := rs1 >>> rs2 (arithmetic shift sign-extends the top bits of rs1)

    slt  rd, rs1, rs2       # if rs1 < rs2 then rd := 1 else rd := 0

    addi rd, rs1, immediate # rd := rs1 + immediate. The immediate must fit in 12 bits (signed, -2048 to 2047).
                            # similarly xori, ori, andi, slli, srli, srai, slti

    sw   rs2, offset(rs1)   # *(rs1 + offset) := rs2. Store the value in rs2 to the address (rs1 + offset).
    lw   rd, offset(rs1)    # rd := *(rs1 + offset). Load the value at address (rs1 + offset) into rd.
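
To make the difference between the two right shifts, the signed comparison and the 12-bit immediate range concrete, here is a small C model of a few of the instructions above. It is only a sketch: the function names rv_srl, rv_sra, rv_slt and rv_addi are made up for illustration and are not part of Clarvi or Spike. It relies on the RV32I rule that register shifts use only the low 5 bits of rs2.

```c
#include <assert.h>
#include <stdint.h>

/* srl: logical right shift, zeros are shifted in at the top. */
uint32_t rv_srl(uint32_t rs1, uint32_t rs2)
{
    return rs1 >> (rs2 & 31);
}

/* sra: arithmetic right shift, copies of the sign bit are shifted in. */
uint32_t rv_sra(uint32_t rs1, uint32_t rs2)
{
    uint32_t sh = rs2 & 31;
    uint32_t shifted = rs1 >> sh;
    if ((rs1 & 0x80000000u) && sh != 0)
        shifted |= ~(0xFFFFFFFFu >> sh);   /* fill the vacated top bits with 1s */
    return shifted;
}

/* slt: signed less-than. Flipping the sign bit of both operands lets an
   unsigned comparison give the signed answer without implementation-defined casts. */
uint32_t rv_slt(uint32_t rs1, uint32_t rs2)
{
    return (rs1 ^ 0x80000000u) < (rs2 ^ 0x80000000u) ? 1u : 0u;
}

/* addi: the immediate is a signed 12-bit value, so -2048..2047. */
uint32_t rv_addi(uint32_t rs1, int32_t imm)
{
    assert(imm >= -2048 && imm <= 2047);
    return rs1 + (uint32_t)imm;            /* wraps modulo 2^32 like the hardware */
}
```

For example, rv_srl(0x80000000, 4) gives 0x08000000 whereas rv_sra(0x80000000, 4) gives 0xF8000000.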

The assembler also provides a variety of pseudo-instructions, which it expands into one or two real instructions but which make life easier for the programmer. You may find the following useful:

    li   rd, immediate      # rd := immediate
    mv   rd, rs             # rd := rs

    bgt  rs1, rs2, label    # branch to label if rs1 > rs2
    ble  rs1, rs2, label    # branch to label if rs1 <= rs2
    beqz rs, label          # branch to label if rs == 0
                            # similarly: bnez, blez, bgez, bltz, bgtz

    j    label              # unconditional jump to label
    call label              # call subroutine (jump to label and store the return address into register ra)
    ret                     # return from subroutine (jump to the address in register ra)
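
As an example of how a pseudo-instruction can turn into two real ones, a 32-bit constant that does not fit in 12 bits is typically loaded with a lui (upper 20 bits) followed by an addi (signed lower 12 bits). The C sketch below shows the arithmetic of that split; split_li and rebuild_li are hypothetical helper names, and the exact sequence the assembler emits should be checked in the disassembly dump.

```c
#include <assert.h>
#include <stdint.h>

/* Split imm into the value for lui (hi20) and the signed immediate for addi (lo12). */
void split_li(uint32_t imm, uint32_t *hi20, int32_t *lo12)
{
    int32_t lo = (int32_t)(imm & 0xFFFu);
    if (lo >= 2048)
        lo -= 4096;                        /* addi sign-extends its immediate */
    *lo12 = lo;
    /* Compensate in the upper part for whatever the addi will add or subtract. */
    *hi20 = ((imm - (uint32_t)lo) >> 12) & 0xFFFFFu;
}

/* Model of "lui rd, hi20" followed by "addi rd, rd, lo12". */
uint32_t rebuild_li(uint32_t hi20, int32_t lo12)
{
    assert(hi20 <= 0xFFFFFu);
    assert(lo12 >= -2048 && lo12 <= 2047);
    return (hi20 << 12) + (uint32_t)lo12;
}
```

Note that when bit 11 of the constant is set, the upper part has to be one larger than the constant's top 20 bits, because the addi then subtracts.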

We have also provided some special functionality for debugging:

    ecall                   # stops the simulation of Clarvi
    csrw  0x800, rs         # output a debug message in the trace containing the value of rs

Control flow

In high level languages `goto` is to be avoided, but in assembler that is the only control flow mechanism. This means we must translate for-loops and if-else statements into linear versions using branches instead.

For instance, a conditional statement in C such as:

    if (i > 42) {
        // true path
    } else {
        // false path
    }

Might become in assembly (assuming i is held in t0):

        li   t1, 42         # initialize state for condition
        ble  t0, t1, else   # if condition not met, go to false path
        ...                 # true path
        j    end
    else:
        ...                 # false path
    end:
        ...
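
One way to check a translation like this before writing the assembly is to rewrite the C with `goto` first, so that each line corresponds to one branch or jump. The sketch below does this for the conditional above; the function names and the values 1 and 0 standing in for the true and false paths are made up for illustration.

```c
#include <assert.h>

/* Structured form of the conditional. */
int pick_structured(int i)
{
    if (i > 42)
        return 1;      /* true path */
    else
        return 0;      /* false path */
}

/* The same logic in the branch-only shape the assembly uses. */
int pick_lowered(int i)
{
    int result;
    if (i <= 42) goto else_path;   /* ble t0, t1, else: inverted condition */
    result = 1;                    /* true path */
    goto end;                      /* j end */
else_path:
    result = 0;                    /* false path */
end:
    return result;
}

/* Sanity check that both forms agree on either side of the threshold. */
void check_forms_agree(void)
{
    for (int i = 40; i <= 45; i++)
        assert(pick_structured(i) == pick_lowered(i));
}
```

The branch tests the inverse of the C condition because it has to skip over the true path rather than enter it.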

And a simple for loop in C:

    for (int i = 0; i < 10; i++) {
        // something terribly clever
    }

Becomes in assembly:

        li   t0, 0          # initialize the loop variable
        li   t1, 10         # initialize the condition variable
    loop:
        bge  t0, t1, end    # if the loop condition is not met, exit
        ...                 # something terribly clever
        addi t0, t0, 1      # increment the loop variable
        j    loop           # loop
    end:
        ...

In many cases the number of branches required can be reduced by updating the loop variable at the start of the loop or checking the condition at the end. Extra care must be taken when there are `continue` statements within the loop to ensure that the loop variable is always updated.

Do and while loops are implemented similarly.
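
The idea of moving the test to the end of the loop can be sketched in C with `goto`, where each `goto` stands for one branch or jump instruction. Both functions below sum the integers from 0 to n-1; the function names are hypothetical and the register names in the comments assume i is in t0 and n is in t1.

```c
#include <assert.h>

/* Test at the top: one conditional branch plus one jump per iteration. */
int sum_top_test(int n)
{
    int i = 0, sum = 0;
loop:
    if (i >= n) goto end;      /* bge t0, t1, end */
    sum += i;                  /* loop body */
    i++;                       /* addi t0, t0, 1 */
    goto loop;                 /* j loop */
end:
    return sum;
}

/* Test at the bottom: a single conditional branch per iteration. */
int sum_bottom_test(int n)
{
    int i = 0, sum = 0;
    if (i >= n) goto end;      /* guard: skip the loop if it should run zero times */
loop:
    sum += i;                  /* loop body */
    i++;                       /* addi t0, t0, 1 */
    if (i < n) goto loop;      /* blt t0, t1, loop */
end:
    return sum;
}

/* Both shapes should compute the same result, including for empty loops. */
void check_loops_agree(void)
{
    for (int n = -2; n <= 12; n++)
        assert(sum_top_test(n) == sum_bottom_test(n));
}
```

The bottom-tested shape is also the natural form of a do-while loop, which simply omits the initial guard.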

RISC-V ticks

The next two tabs present ticks that cover:

Tick 3 - Assembler programming and simulation on Spike
Tick 4 - Performance analysis running code on hardware