Department of Computer Science and Technology

ECAD and Architecture Practical Classes

Course pages 2020–21

ModelSim Clarvi instruction trace

The example-asm directory initially contains all the assembly code required to initialise Clarvi and call the function myfunction, which is initially empty. Running this code in ModelSim should produce the following trace:

# 0xxxxxxxxx:   ---
# 0xxxxxxxxx:   ---
# 0xxxxxxxxx:   ---
# 0xxxxxxxxx:   ---
# 0xxxxxxxxx:   ---

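Clarvi has a six-stage pipeline, and an instruction is printed only when it reaches the last stage (write-back), when its effects are fully known. These first five clock cycles are the ones in which the pipeline is "filling up": no instruction has reached write-back yet, so each cycle prints a bubble (---) with an unknown address.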
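The fill-up behaviour can be reproduced with a toy model. The Python sketch below is not part of Clarvi: it assumes an idealised six-stage pipeline with no stalls, and the names STAGES and trace are made up for illustration.

```python
# Toy model of an in-order pipeline that only prints at the final stage.
STAGES = 6  # Clarvi's pipeline depth, as stated in the text


def trace(program, cycles):
    """Return what the write-back stage would print on each clock cycle."""
    # pipeline[0] is the first fetch stage, pipeline[-1] is write-back.
    pipeline = [None] * STAGES
    fetch_queue = list(program)
    lines = []
    for _ in range(cycles):
        # Fetch the next instruction (if any) and advance everything one stage.
        next_instr = fetch_queue.pop(0) if fetch_queue else None
        pipeline = [next_instr] + pipeline[:-1]
        retiring = pipeline[-1]
        lines.append(retiring if retiring is not None else "---")
    return lines


if __name__ == "__main__":
    for line in trace(["AUIPC", "ADDI", "AUIPC"], cycles=8):
        print(line)
```

With three instructions and eight cycles, the model prints five --- lines followed by the three mnemonics, which is the same shape as the start of the trace.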

We first execute the code, assembled from init.s, that sets up the stack and the exception handler:

# 0x00000000:   AUIPC	sp, 0x00010000		sp := 0x00010000 

The number before the colon shows the address of the instruction currently in write-back. Note that instructions are one word (i.e. four bytes) long, and so the address increases by 4 for subsequent instructions. The trace then shows the instruction mnemonic and arguments the same way they would be specified in an assembly file. The effects of the instruction are shown after this, i.e. here setting the register sp to 0x00010000. Assignments are shown with a := whereas values which are read by the instruction have their original values displayed with a =.
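If you want to check a long trace mechanically, the line format described above is regular enough to parse. The following is a minimal sketch, not a tool shipped with the course: parse_trace_line is a hypothetical helper, and it assumes the instruction text and its effects are separated by two tab characters, as they appear to be in the listing on this page.

```python
import re

# "# 0x00000004:   ADDI ..."; the address is all 'x' while the pipeline fills.
LINE_RE = re.compile(r"^#\s*0x(?P<addr>[0-9a-fA-FxX]{8}):\s+(?P<body>.*?)\s*$")


def parse_trace_line(line):
    """Split one trace line into address, mnemonic, operands, writes and reads."""
    m = LINE_RE.match(line)
    if m is None or not m.group("body"):
        return None  # not a trace line
    addr_text = m.group("addr").lower()
    addr = None if "x" in addr_text else int(addr_text, 16)
    body = m.group("body")
    if body == "---":
        return {"addr": addr, "bubble": True}
    # Assumption: instruction text and effects are separated by two tabs.
    instr, _, effects = body.partition("\t\t")
    parts = instr.split(None, 1)
    mnemonic = parts[0]
    operands = [o.strip() for o in parts[1].split(",")] if len(parts) > 1 else []
    items = [e.strip() for e in effects.split(", ") if e.strip()]
    return {
        "addr": addr,
        "bubble": False,
        "mnemonic": mnemonic,
        "operands": operands,
        "writes": [e for e in items if ":=" in e],      # assignments
        "reads": [e for e in items if ":=" not in e],   # values read
    }
```

Lines that do not match the format return None, so the helper can be mapped over the whole ModelSim transcript and the None entries filtered out.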

AUIPC (add upper immediate to pc) takes a 20-bit immediate, places it in the 20 most significant bits of a 32-bit value (the low 12 bits are zero), adds the program counter (i.e. the address of the instruction currently being executed), and stores the result in the target register. sp is a register reserved for the stack pointer. This allows us to create the stack relative to the location of the code in memory. In our case, the code starts at 0x0, so adding the pc has no effect.
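The AUIPC calculation can be written out as a short Python sketch. It assumes the standard RV32I definition (a 20-bit immediate shifted left by 12 bits, then added to the pc); the function name auipc is only for illustration. Note that the trace appears to print the immediate after the shift, so 0x00010000 in the trace would correspond to a 20-bit field value of 0x10.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def auipc(pc, imm20):
    """Value written to rd by AUIPC: pc + (imm20 << 12), wrapped to 32 bits."""
    upper = (imm20 & 0xFFFFF) << 12  # immediate occupies bits 31..12
    return (pc + upper) & MASK32


if __name__ == "__main__":
    # First instruction of the trace: pc = 0x0, immediate field 0x10.
    print(hex(auipc(0x00000000, 0x10)))
    # The same instruction placed at a different (hypothetical) address.
    print(hex(auipc(0x00002000, 0x10)))
```

The second call illustrates the point of using AUIPC: if the code were loaded at a different address, the stack pointer would move with it.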

# 0x00000004:   ADDI	sp, sp, -32		sp := 0x0000ffe0, sp = 0x00010000 

This allocates 32 bytes on the stack, which grows downwards. We don't actually store anything at the top address: the instruction just guards against an incorrect stack push running off the end of memory (e.g. by writing before decrementing sp).

# 0x00000008:   AUIPC	t0, 0x00000000		t0 := 0x00000008 
# 0x0000000c:   ADDI	t0, t0, 20		t0 := 0x0000001c, t0 = 0x00000008 
# 0x00000010:   CSRRW	zero, MTVEC, t0 

This sets up exception handling. t0 is used as a temporary to build up the address to jump to when an exception occurs. The first instruction again makes this address relative to the position of the code. The second instruction adds 20, placing the exception handler at offset 20 from 0x8, i.e. at 0x1c from 0x0.

The final instruction writes this address to MTVEC, a control and status register (CSR) that determines the jump target after an ECALL instruction. The instruction at the target address is an unconditional jump to itself, so that in the event of an exception (which we also use to indicate that the program has finished), a synthesised FPGA will spin doing nothing, while a simulator can detect the ECALL instruction and terminate.
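The address arithmetic performed by the AUIPC/ADDI pair can be checked with a small Python sketch. It assumes standard RV32I semantics (ADDI sign-extends a 12-bit immediate); the helper names, including handler_address, are hypothetical and are not taken from the Clarvi sources.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign) - sign


def auipc(pc, imm20):
    return (pc + ((imm20 & 0xFFFFF) << 12)) & MASK32


def addi(rs1, imm12):
    return (rs1 + sign_extend(imm12, 12)) & MASK32


def handler_address(auipc_pc, offset):
    """Address left in t0 by 'AUIPC t0, 0' at auipc_pc followed by 'ADDI t0, t0, offset'."""
    t0 = auipc(auipc_pc, 0)   # t0 := address of the AUIPC itself
    return addi(t0, offset)   # t0 := that address + offset


if __name__ == "__main__":
    print(hex(handler_address(0x00000008, 20)))
    # ADDI also handles negative immediates, as in 'ADDI sp, sp, -32'.
    print(hex(addi(0x00010000, -32)))
```

Because the offset is added to the address of the AUIPC rather than to a fixed constant, the computed handler address would shift by the same amount if the whole program were loaded somewhere other than 0x0.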

# 0x00000014:   JAL	ra, 32		ra := 0x00000018, target = 0x00000034 
# 0x00000018:   ---
# 0x0000001c:   ---
# 0x00000020:   ---

Jump to main, saving the address of the following instruction (the current address + 4) to ra, which is reserved for the return address a function should use when it is complete. Note the three wasted cycles due to the control hazard in Clarvi: jumps and branches are resolved in the execute stage, by which point the first instruction fetch stage is requesting an incorrect instruction, the second instruction fetch stage is receiving an incorrect instruction, and decode is decoding an incorrect instruction. We therefore invalidate these three stages, producing three "bubbles" which pass through the pipeline.
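The values printed for a JAL, and the addresses attached to the bubbles that follow it, can be reproduced with a few lines of Python. This is a sketch assuming the standard RV32I meaning of JAL (a signed byte offset relative to the address of the jump); the function names are illustrative, and the default of three bubbles is taken from the description above rather than derived from the hardware.

```python
MASK32 = 0xFFFFFFFF


def jal(pc, offset):
    """Return (link, target) for a JAL at address pc with a signed byte offset."""
    link = (pc + 4) & MASK32         # return address written to rd
    target = (pc + offset) & MASK32  # jump target is pc-relative
    return link, target


def squashed_addresses(jump_pc, bubbles=3):
    """Sequential addresses already in the pipeline when the jump resolves."""
    return [(jump_pc + 4 * (i + 1)) & MASK32 for i in range(bubbles)]


if __name__ == "__main__":
    link, target = jal(0x00000014, 32)
    print(hex(link), hex(target))
    print([hex(a) for a in squashed_addresses(0x00000014)])
```

For the jump at 0x14 this gives a link of 0x18 and a target of 0x34, with bubbles at 0x18, 0x1c and 0x20, matching the trace. The same functions applied to the JAL at 0x3c with offset -28 give a link of 0x40 and a target of 0x20.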

# 0x00000034:   ADDI	sp, sp, -32		sp := 0x0000ffc0, sp = 0x0000ffe0 
# 0x00000038:   SW	ra, 0(sp)		mem[0000ffc0] := 0x00000018, sp = 0x0000ffc0, ra = 0x00000018 
# 0x0000003c:   JAL	ra, -28		ra := 0x00000040, target = 0x00000020 
# 0x00000040:   ---
# 0x00000044:   ---
# 0x00000048:   ---

main saves its return address to the stack and then calls myfunction, incurring another 3-cycle branch delay.

# 0x00000020:   ADDI	sp, sp, -32		sp := 0x0000ffa0, sp = 0x0000ffc0 
# 0x00000024:   SW	ra, 0(sp)		mem[0000ffa0] := 0x00000040, sp = 0x0000ffa0, ra = 0x00000040 

myfunction saves the return address so that it is free to store other values to ra, for example if it calls another function itself.

The code you write to do actual work will be executed here.

# 0x00000028:   LW 	ra, 0(sp)		ra := 0x00000040 = mem[0x0000ffa0], sp = 0x0000ffa0 
# 0x0000002c:   ADDI	sp, sp, 32		sp := 0x0000ffc0, sp = 0x0000ffa0 
# 0x00000030:   JALR	zero, ra, 0		, ra = 0x00000040, target = 0x00000040 
# 0x00000034:   ---
# 0x00000038:   ---
# 0x0000003c:   ---

Restore the return address and jump to it. Note that the JALR instruction jumps to an address relative to a register argument (ra here), rather than relative to the current pc.
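The difference from JAL can be made concrete with a short Python sketch of the JALR target calculation. It assumes the standard RV32I definition, in which the sign-extended immediate is added to the register value and the least significant bit of the result is cleared; the function name jalr is illustrative only.

```python
MASK32 = 0xFFFFFFFF


def jalr(pc, rs1_value, imm):
    """Return (link, target) for JALR: the target is register-relative, not pc-relative."""
    link = (pc + 4) & MASK32
    target = (rs1_value + imm) & MASK32 & ~1  # bit 0 of the target is cleared
    return link, target


if __name__ == "__main__":
    # 'JALR zero, ra, 0' at 0x30 with ra = 0x40, as in the trace.
    link, target = jalr(0x00000030, 0x00000040, 0)
    print(hex(target))
```

Since the destination register in the trace is zero, the link value is discarded and the instruction acts as a plain return. The pc of the JALR only affects the link, never the target.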

# 0x00000040:   LW 	ra, 0(sp)		ra := 0x00000018 = mem[0x0000ffc0], sp = 0x0000ffc0 
# 0x00000044:   ADDI	sp, sp, 32		sp := 0x0000ffe0, sp = 0x0000ffc0 
# 0x00000048:   JALR	zero, ra, 0		, ra = 0x00000018, target = 0x00000018 
# 0x0000004c:   ---
# 0x00000050:   ---
# 0x00000054:   ---

Similarly, return from main.
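The nested save and restore of ra traced above can be replayed in a few lines of Python. This is only a sketch of the calling convention as it appears in the trace: registers and memory are modelled as dictionaries, and push_frame, pop_frame and replay are hypothetical names.

```python
FRAME = 32  # bytes allocated by each 'ADDI sp, sp, -32'


def push_frame(regs, mem):
    """Function prologue: ADDI sp, sp, -32 then SW ra, 0(sp)."""
    regs["sp"] -= FRAME
    mem[regs["sp"]] = regs["ra"]


def pop_frame(regs, mem):
    """Function epilogue: LW ra, 0(sp) then ADDI sp, sp, 32; returns the JALR target."""
    regs["ra"] = mem[regs["sp"]]
    regs["sp"] += FRAME
    return regs["ra"]


def replay():
    # State just after 'JAL ra, 32' at 0x14 has jumped to main.
    regs = {"sp": 0x0000FFE0, "ra": 0x00000018}
    mem = {}
    push_frame(regs, mem)          # main prologue
    regs["ra"] = 0x00000040        # 'JAL ra, -28' at 0x3c calls myfunction
    push_frame(regs, mem)          # myfunction prologue
    first = pop_frame(regs, mem)   # myfunction returns to main
    second = pop_frame(regs, mem)  # main returns to init code
    return first, second, regs["sp"], mem


if __name__ == "__main__":
    first, second, sp, mem = replay()
    print(hex(first), hex(second), hex(sp))
```

The two return targets come out as 0x40 and 0x18, and sp ends where it started at 0xffe0, which is the sequence of values shown in the trace. Each function restores exactly what it saved, so the calls can nest to any depth the stack allows.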

# 0x00000018:   ECALL 

Throw an exception, which the simulator interprets as "stop".