Course pages 2015–16

ECAD and Architecture Practical Classes

Yarvi - a RISC-V processor

Yarvi ('Yet Another RISC-V Implementation') is a simple, in-order, 3-stage pipelined processor written in SystemVerilog. It implements the base 32-bit RISC-V instruction set (RV32I). It has 4KB of both instruction and data memory, but no caches. The version that we use also features an Avalon master port, so that it can communicate with other peripherals in a system over a simple memory-mapped I/O bus; we will use this later on. Yarvi is described in more detail in the Computer Design lectures.

The full specification of the RISC-V instruction set can be found on RiscV.org, and we also have a copy of the RISC-V Green Card instruction set summary.

Yarvi tour

To fetch the Yarvi code, download yarvi.zip.

Inside you will find several files:

yarvi.sv: The main processor description
yarvi_sim.svp: Debugging $display statements for the processor in simulation, in a separate file for clarity
riscv.h: RISC-V instruction set definitions
yarvi_test.sv: A toplevel testbench for the processor in simulation
initregs.txt: Initial register contents
yarvi_test.do: A Tcl script to compile and configure a Yarvi simulation
yarvi_avalon.sv, yarvi_avalon.c, yarvi_avalon.h, avalon_addr.h, displayemul.c, Makefile: An emulator for FPGA components, using an interface between C and Modelsim, to emulate peripherals without simulating their Verilog (more on this later)

Glance through yarvi.sv and yarvi_test.sv and familiarise yourself with the key parts of the code (this will be covered in lectures)

Programming Yarvi

Yarvi has separate instruction and data memories, which means that it fetches instructions from a different memory to the one it loads and stores data in (and thus it cannot modify its own code). Each memory is initialised from a hex file: a simple list of hexadecimal numbers, one per line, with the data for successive addresses on successive lines. The contents are spread across several files, which are included in Yarvi when you simulate or synthesise:

imem.txt: Hexadecimal representation of the program instructions
dmem.txt: Hexadecimal representation of data memory contents as 32 bit words
dmem.N.txt (for 0 ≤ N ≤ 3): Hexadecimal representation of the Nth byte of each data memory word

Because of the way Yarvi is designed, its data memory is stored in 8 bit chunks, one for each byte in a 32 bit word. Thus we generate dmem.txt and then split it into four dmem.N.txt chunks.
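The split into byte lanes can be sketched in C. This is a minimal illustration and not part of the Yarvi sources: the names byte_lane and print_lanes are hypothetical, and the lane numbering (lane 0 is the least-significant byte) is an assumption inferred from the cut commands in the build output shown later, which take hex digits 7-8 of each word for dmem.0.txt.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: return byte lane n (0..3) of a 32-bit data word.
 * Assumes lane 0 is the least-significant byte of the word. */
uint8_t byte_lane(uint32_t word, unsigned n)
{
    return (uint8_t)(word >> (8u * n));
}

/* Print one line in the style of dmem.txt, followed by the matching
 * line for each of the four dmem.N.txt files. */
void print_lanes(uint32_t word)
{
    printf("dmem.txt:   %08x\n", (unsigned)word);
    for (unsigned n = 0; n < 4; n++)
        printf("dmem.%u.txt: %02x\n", n, (unsigned)byte_lane(word, n));
}
```

Under that assumption, the word 12345678 in dmem.txt contributes 78 to dmem.0.txt and 12 to dmem.3.txt.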

To generate the memory contents we use a C compiler and assembler, using the following process:

First, source files are compiled into matching object files, which contain the instructions for each function but do not yet fix where those instructions will go in memory. Then all the object files, together with any additional libraries that might be used (none in this example), are linked together into a program binary, using a linker script to indicate where in memory all the parts should go. In Yarvi's case the binary is in the ELF format.

Each binary is made up of a number of sections, including program instructions (.text), read-only data (.rodata), pre-initialised writable data (.data), and data the program has declared but not pre-defined (.bss). In a traditional operating system the binary would be loaded into memory using the operating system's loader or runtime linker, which would also allocate memory, maybe start a new process, and jump to the loaded code.

We are using Yarvi bare metal, i.e. with no operating system, so we use a much simpler technique. From the ELF binary we extract the program instructions (the .text section) and write them into the imem.txt file. When the Yarvi starts, its instruction memory will already contain the program, ready to run. Similarly, we take any predefined data in the program and first output dmem.txt, and then split it into the dmem.N.txt files.

To manage the multiple steps of this build process, we use a Makefile. Make is a command-line tool that calculates build dependencies and does the minimal number of steps necessary. A Makefile is a series of rules describing how to build one type of file from another type of file. For example, if you want to build sourcefile.o, Make will look for a source file called sourcefile.c or sourcefile.s and try and build it, using the rules provided in the Makefile. Make will do the minimal amount of work necessary - if it seems that the date on sourcefile.c has not changed since the last time it compiled it, it will not build it again but instead use the existing sourcefile.o. If you type 'make' alone, it will look for a file of rules called Makefile, and attempt to execute the first rule it finds in it. You can also follow 'make' with a rule name, for example 'make clean' (which we define to delete all the generated output files).

The RISC-V toolchain

For creating RISC-V programs that can run on the Yarvi, there is a cross-compiler installed, with its tools prefixed riscv32-unknown-elf-. To compile a simple program, you can invoke riscv32-unknown-elf-gcc. To deal with specific details of the Yarvi processor, a Makefile, linker script (prog-link.ld) and example program framework are provided.

Download: software.zip. Navigate to the example-asm directory in your terminal and type "make". You should see output something like this:

$ make
riscv32-unknown-elf-as -c -o main.o main.s 
Assembler messages:
Warning: cannot compress debug sections (zlib not installed)
riscv32-unknown-elf-as -c -o init.o init.s 
Assembler messages:
Warning: cannot compress debug sections (zlib not installed)
if [ ! -d "program" ]; then \
		mkdir program; \
	fi
riscv32-unknown-elf-ld -o program/program.bin -T prog-link.ld init.o main.o
riscv32-unknown-elf-objcopy -O binary --only-section=.text program/program.bin program/program.imem
hexdump -v -e '"%08x\n"' program/program.imem > program/imem.txt
riscv32-unknown-elf-objcopy -O binary --pad-to=0x10001000 --only-section=DATA program/program.bin program/program.dmem
hexdump -v -e '"%08x\n"' program/program.dmem > program/dmem.txt
cut -b 7-8 program/dmem.txt > program/dmem.0.txt
cut -b 5-6 program/dmem.txt > program/dmem.1.txt
cut -b 3-4 program/dmem.txt > program/dmem.2.txt
cut -b 1-2 program/dmem.txt > program/dmem.3.txt
riscv32-unknown-elf-objdump -d program/program.bin > program/program.dump

You can see it first assembles two source files main.s and init.s into .o files. Then it creates a directory to hold the output files, and links (using the 'ld' linker tool and .ld script) these into a binary image. Then the various sections are extracted and hex files generated. Finally a disassembly dump is generated.

Have a look at program/program.dump. See if you can follow how the information in prog-link.ld, init.s and main.s was used to build the program and how the program is structured.

Simulating the Yarvi

Now we have a program, we need to simulate the Yarvi processor.

Make a new directory exercise2. Put yarvi and example-asm directories inside. Then, in a terminal:

cd exercise2/yarvi
rm -f *mem*.txt
ln -s ../example-asm/program/*mem*.txt .

Make sure you copy the last line carefully, including the dot on the end. This makes symbolic links ('symlinks') from the Yarvi processor directory to the hex files in your software project. This means you can recompile your software project and the hex files seen by the processor will be immediately updated. To change to another software project, simply re-run these commands with a different directory name in place of 'example-asm'.

To simulate the Yarvi, start Modelsim from the yarvi directory. When it has started, type "do yarvi_test.do" in the transcript window. This should compile the Yarvi sources, set up the waveforms and initialise the clock, reset and pause signals. Now you can use the run buttons to start simulation, or type run 1us in the transcript window to run for a short length of time.

In addition to waveforms, Yarvi also outputs an instruction trace in the transcript window. Besides tracing program flow, we can use this trace output to track both intermediate and return values of a function by looking at the register and memory writes.

The basic Yarvi has no input/output devices. We have added a simple debugging printout to display a 32 bit value. Any word written to address 0x87000000 is printed in the Modelsim trace. In your code, add:

	lui    t2, (0x87000000>>12)		# load address into a register
	sw     a0, 0(t2)			# store value of a0 to 0x87000000

This will print DEBUG PRINT followed by the value in a0.

Exercise

Write a RISC-V assembly program to perform integer division.

Division is slow and expensive in hardware, and we use it relatively rarely. Instead we can use a subroutine to perform division in software, and call this instead of a hardware divide instruction.

Wikipedia gives the long division algorithm as follows (in Pascal-like pseudo-code):

Q := 0                 -- initialize quotient and remainder to zero
R := 0                     
for i = n-1...0 do     -- where n is number of bits in N
  R := R << 1          -- left-shift R by 1 bit
  R(0) := N(i)         -- set the least-significant bit of R equal to bit i of the numerator
  if R >= D then
    R := R - D
    Q(i) := 1
  end
end
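For checking expected values on a host machine, the pseudo-code above can be transcribed into C. This is only a sketch of a reference model: the names long_divide and struct divmod are hypothetical, and it assumes unsigned 32-bit operands and a non-zero divisor.

```c
#include <stdint.h>

/* Quotient and remainder of an unsigned 32-bit division. */
struct divmod {
    uint32_t quotient;
    uint32_t remainder;
};

/* Hypothetical reference model of the long division pseudo-code.
 * Assumes unsigned operands and a non-zero divisor d. */
struct divmod long_divide(uint32_t n, uint32_t d)
{
    struct divmod result = { 0, 0 };               /* Q := 0, R := 0 */

    for (int i = 31; i >= 0; i--) {                /* one step per bit of N */
        /* R := R << 1, then R(0) := N(i) */
        result.remainder = (result.remainder << 1) | ((n >> i) & 1u);
        if (result.remainder >= d) {
            result.remainder -= d;                 /* R := R - D */
            result.quotient |= (uint32_t)1 << i;   /* Q(i) := 1 */
        }
    }
    return result;
}
```

For example, long_divide(93, 7) yields a quotient of 13 and a remainder of 2.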

In assembly, write a function div that for two numbers in registers a0 and a1, calculates (a0/a1) and returns the quotient in a0 and remainder in a1. Use the provided example-asm framework as a starting point, and put your function at the top of main.s. Call it from the main() function.

Part 2

Verify that the program you just wrote behaves as expected. For this, you will simulate the Yarvi processor in Modelsim. Use this template to invoke your code:

        lui     s11, (0x87000000>>12)   # location of debug output register

        addi    a0, zero, 12    # a0 <- 12
        addi    a1, zero, 4     # a1 <- 4
        call    div
        sw      a0, 0(s11)      # display the quotient
        sw      a1, 0(s11)      # display the remainder

        addi    a0, zero, 93    # a0 <- 93
        addi    a1, zero, 7     # a1 <- 7
        call    div
        sw      a0, 0(s11)      # display the quotient
        sw      a1, 0(s11)      # display the remainder

        lui     a0, (0x12345000>>12)
        ori     a0, a0, 0x678   # a0 <- 0x12345678
        addi    a1, zero, 255   # a1 <- 255
        call    div
        sw      a0, 0(s11)      # display the quotient
        sw      a1, 0(s11)      # display the remainder

Follow the procedure above to convert your source code into hex files and preload the Yarvi instruction and data memories with them. Carefully analyse the program trace and compare the implementation's output with what you expect.

C on Yarvi

Yarvi implements enough of the RISC-V instruction set to be targetable by a C compiler. C enables us to build larger programs and makes it easier to port them to different CPUs.

Change to the example-c directory.

This is similar to the example-asm project, except with C functions. C programs always begin at the main() function, which we have provided in main.c. As before, init.s sets up the environment so that we can begin execution of the C program.

The Makefile works in a similar way - run 'make' to build, and then generate the symlinks in your yarvi directory to configure the Yarvi to execute this program.

C syntax is similar to Java. See An Introduction to C for differences.

The GCC C compiler has a number of optimisation levels, which set how hard it works to make the output assembler faster or smaller. Try adding one of -O0 (no optimisation) to -O3 (most aggressive speed optimisation), or -Os to optimise for size, to CFLAGS in the Makefile, and look at the assembly code it generates. A full list of options can be found in the GCC manual.

Mixing C and assembler

You can write some parts of your program in C and some in assembler. To do this, we need to arrange for the C to put the parameter values into the correct registers for the assembler to pick up, and vice versa. This is defined by the processor's calling convention. On the RISC-V the calling convention is to use registers 10-17 (named a0-a7) for function arguments and 10-11 (a0-a1) for return values. a0-a7 correspond to the first 8 arguments provided to a C function, and the return value from C should be provided in a0. Note that C only provides access to a single return value.

For example, an addition function might look like:

	# int add(int a, int b): add two numbers together
	# parameters supplied in a0 and a1
	# doesn't call any other functions, so no need to store return address register (ra)
	# doesn't corrupt any callee-save registers, so no need to use the stack

	.global add		# export the function symbol so the linker can find it
	add:
	        # add the two parameters, returning the result in the return register a0
	       	add		a0,a0,a1
	        
	        ret

To call from C code we need a function prototype to tell the C compiler the types and parameters of the function, without defining it. For example, if we have an assembler function taking two integers and returning another, in a file called asmfunctions.h (or some other name) enter:

		int add(int a, int b);

Then your C code can:

		#include "asmfunctions.h"

to make the declaration available to this C file. You can then call add() anywhere below the #include. The advantage of .h header files is that you can easily include them in multiple files, to enable calling your function from multiple places.

Note that, while C is a type-checked language, the assembler will not type check you. So there are no safeguards if your declaration of add() does not match your usage of registers in assembler.
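As a sketch of how the pieces fit together, the following C fragment declares add and calls it. The caller sum3 is a hypothetical name. On Yarvi the body of add would come from the assembler file; the C definition below is only a stand-in, included on the assumption that you want to compile and try the fragment on a host machine, and it should be left out when linking against the assembler version.

```c
/* Normally supplied by: #include "asmfunctions.h" */
int add(int a, int b);

/* Host-side stand-in for the assembler routine (an assumption for
 * illustration; omit it when linking against the assembler add). */
int add(int a, int b)
{
    return a + b;
}

/* Hypothetical caller. On RISC-V, the first argument of each call
 * travels in a0 and the second in a1, and the result comes back in a0. */
int sum3(int x, int y, int z)
{
    return add(add(x, y), z);
}
```

Because the result of the inner call is returned in a0, it is already in the right register to be the first argument of the outer call.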

If you make a new file, either rem.s for assembly or rem.c for C code, you will need to add rem.o to the OBJECT_FILES line in the Makefile to tell Make to build it. Make will then see that it needs to build the object file, and work out how it can build it from rem.s or rem.c.

In the bare metal environment, there is nothing to return to when your program has finished, so init.s simply ends in an infinite loop. In a real operating system we would either return to the environment or stop the CPU to save power.

Exercise

Part 3

Make a copy of your division code and modify it into a function rem that returns the remainder in a0. Write a C function prototype for it.

In C, write a program using rem to check which of the first 100 numbers are prime, using a sieve based on trial division:

Allocate array N of size 100
Initialise each element of N to 1
For i in 2..100:
  If N[i] is 1:
    For j in i+1..100:
      If N[j] is 1:
        If rem(j, i) is 0:
          N[j] = 0

The Modelsim log window will only display a limited number of lines. Modelsim also writes the log to a file called transcript in the directory you started it from. You can open this in an editor, or use the UNIX command
grep DEBUG transcript
to print all the lines containing the string DEBUG.

We are programming the Yarvi bare-metal, so we do not have any memory allocation functions. Instead, use the following in your main() function to allocate an (uninitialised) 100-element array:

		int array[100];

Note also that we don't have a standard library, so functions like malloc or memset do not exist (unless we write them). The Yarvi CPU also does not have a multiply instruction: if the compiler cannot optimise a constant multiply into adds and shifts, it will call a multiply function which we don't have (unless you write one).
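If you do need multiplication, one option is a shift-and-add routine. The sketch below is an illustration rather than part of the course material, and soft_mul is a hypothetical name. GCC conventionally calls a libgcc helper named __mulsi3 for 32-bit multiplies on targets without a multiply instruction, so a routine like this could in principle be provided under that name, but treat that as an assumption and check program/program.dump to see what your build actually calls.

```c
#include <stdint.h>

/* Hypothetical shift-and-add multiply. It uses only shifts, adds and
 * bit tests, all of which exist in RV32I. The result wraps modulo 2^32. */
uint32_t soft_mul(uint32_t a, uint32_t b)
{
    uint32_t product = 0;

    while (b != 0) {
        if (b & 1u)
            product += a;   /* add the shifted multiplicand for each set bit of b */
        a <<= 1;            /* next bit of b is worth twice as much */
        b >>= 1;
    }
    return product;
}
```

The loop runs once per significant bit of b, so passing the smaller operand as b keeps the iteration count down.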

Output the prime numbers you find on the debug port - you can use the following function:

		void output(int x)
		{
			volatile int *debug = (volatile int *) 0x87000000;
			debug[0] = x;
		}

Computer Laboratory
