Computer Laboratory

Course pages 2016–17

ECAD and Architecture Practical Classes

Exercise 4: Etch A Sketch in simulation

Functional modelling

In a typical design project, there are often several teams working in parallel. For instance, in a smartphone project the chip design team may be working on the hardware, the OS team on the operating system and drivers, the applications team on the user interface, the mechanical team on the packaging, and the manufacturing team on a production line to build and test the product cost-effectively. To reduce time-to-market, these teams frequently start their work concurrently: there is no time to perfect the hardware before writing the software.

To allow concurrent development we can simulate parts that do not yet exist. For example, we can develop drivers on a simulation of the chip before silicon is available. However, a full-scale simulation of the chip is very slow and unnecessarily detailed for the driver writer, who must still wait for a large chunk of the chip design to be completed before proceeding.

One approach to address this is top-down refinement. We can write functional models for components that do not yet exist. These express the functionality required, before the implementation details have been determined. They are then made available for developers in other teams to start their work. Once the real implementations of the components exist they can replace the functional models, but we may retain the models as they are often much quicker to simulate than the real thing. Functional models may make use of features not available in the target environment to aid debugging: for example, simulation models can be written in software rather than hardware, and mechanical models might use rapidly 3D-printed plastic instead of metal parts.

By the end of the course we are going to synthesise an FPGA design containing the Clarvi processor and your SystemVerilog peripherals and download it onto an FPGA board. FPGA synthesis is a computationally hard problem and so can take up to 20 minutes to run. To shorten development time you will test your software in the simulator against a functional model of the display and rotary dials on the FPGA board. The functional model is written in C and communicates over a TCP socket with your Clarvi SystemVerilog running in Modelsim, displaying in a window what you would see on the LCD on the FPGA board. By running in software simulation you are able to see the full behaviour of the CPU, which would not be possible when it is running in hardware.

The Etch A Sketch

The goal of this exercise is to build a system to implement an Etch A Sketch in simulation. An Etch A Sketch is a 1950s mechanical toy bearing two knobs, which allows the user to draw patterns on the screen. The unit (mechanically) maintains an X,Y coordinate, the pixel of which is continually painted black. When the left knob is rotated the X coordinate increases or decreases; when the right knob is rotated the Y coordinate increases or decreases. This allows lines to be drawn. Moving both knobs at the same time enables drawing diagonal lines, and shaking the unit clears the display.

Avalon Memory-Mapped (Avalon MM) is Altera's on-FPGA memory interconnect. Clarvi can act as a master to peripherals on the Avalon interconnect, which we will configure for the FPGA in Qsys.

In our case we have two dials, a 'clear' button, and a 480x272 pixel display to show the picture. These are mapped into Clarvi memory space as follows: (the symbols are defined in avalon_addr.h)

Component            Clarvi Base  Length                          Access      Symbol
Display framebuffer  0x08000000   480*272*16 bits = 261120 bytes  Read/write  FRAMEBUFFER_BASE
LED output           0x04000000   32 bits                         Write only  PIO_LED_BASE
Hex digits output    0x04000080   32 bits                         Write only  PIO_HEX_BASE
Left dial counter    0x04000100   32 bits                         Read only   PIO_ROTARY_L
Right dial counter   0x04000200   32 bits                         Read only   PIO_ROTARY_R
Display buttons      0x04000300   16 bits                         Read only   PIO_BUTTONS
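
The framebuffer length in the table can be checked with a little arithmetic. The sketch below is only an illustration: the FB_* names and both helper functions are hypothetical and are not part of avalon_addr.h, and it assumes 16 bits per pixel means exactly 2 bytes per pixel with no padding between rows.

```c
/* Values copied from the memory map table above.
   The FB_* names are hypothetical; the course header uses FRAMEBUFFER_BASE. */
#define FB_BASE            0x08000000u
#define FB_WIDTH           480u
#define FB_HEIGHT          272u
#define FB_BYTES_PER_PIXEL 2u	/* 16 bits per pixel, assumed unpadded */

/* Total size of the framebuffer window in bytes. */
unsigned int fb_size_bytes(void)
{
	return FB_WIDTH * FB_HEIGHT * FB_BYTES_PER_PIXEL;
}

/* Return 1 if a byte address lies inside the framebuffer window, 0 otherwise. */
int addr_in_framebuffer(unsigned int address)
{
	return address >= FB_BASE && (address - FB_BASE) < fb_size_bytes();
}
```

With these values the window runs from 0x08000000 up to, but not including, 0x0803FC00, so it does not overlap the PIO registers at 0x04000000 and above.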

We provide a display emulator program to implement these in software that Modelsim is able to connect to. You can run the emulator as follows:

Errata: to fix a bug in the VM, before you proceed fetch some missing packages in a terminal with:
sudo apt-get install gcc-multilib libc6-dev:i386
  • First, download a copy of the Clarvi source with emulator components and a fresh example-c, and put them in a directory called exercise4.
  • Open a new terminal, source your usual setup script, and change to the directory containing your new Clarvi source.
  • Type: make
    This will build the emulator for your local machine.
  • Type: ./displayemul
    This will run the emulator, which will then listen for connections from Modelsim. It will only accept one connection at a time.
  • A black window will appear. This shows the display framebuffer, which you can write to from the Clarvi. You can also use the following keypresses:
    Key           Action                        Bit in button register
    Cursor left   Left dial anticlockwise       n/a
    Cursor right  Left dial clockwise           n/a
    Cursor up     Right dial anticlockwise      n/a
    Cursor down   Right dial clockwise          n/a
    N             Left dial click               1
    M             Right dial click              2
    A             Button A press                14
    B             Button B press                15
    X             Button X press                12
    Y             Button Y press                13
    S             Navigation stick left         6
    F             Navigation stick right        5
    E             Navigation stick up           7
    C             Navigation stick down         4
    D             Navigation stick centre click 3
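
The bit numbers in the key table can be turned into tests on the value read from the button register. This is a minimal sketch under two assumptions that the text does not state: bit 0 is the least significant bit, and a bit reads as 1 while the corresponding control is pressed. The BTN_BIT_* names and both functions are hypothetical.

```c
/* Bit positions copied from the key table above (names are hypothetical). */
#define BTN_BIT_DIAL_L_CLICK 1
#define BTN_BIT_DIAL_R_CLICK 2
#define BTN_BIT_NAV_CENTRE   3
#define BTN_BIT_X            12
#define BTN_BIT_Y            13
#define BTN_BIT_A            14
#define BTN_BIT_B            15

/* Return 1 if the given bit of the button register value is set.
   Assumes a set bit means "pressed" (active high). */
int button_is_set(unsigned int buttons, int bit)
{
	return (int)((buttons >> bit) & 1u);
}

/* Return 1 if either rotary dial is currently clicked. */
int dial_clicked(unsigned int buttons)
{
	unsigned int mask = (1u << BTN_BIT_DIAL_L_CLICK) | (1u << BTN_BIT_DIAL_R_CLICK);
	return (buttons & mask) != 0;
}
```

If the hardware turns out to be active low, inverting the register value before calling these helpers would be enough.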

Now run a Clarvi simulation in Modelsim. You should see:

# Display emulator setup...
# Successful connection to display emulator

to indicate a connection has been successful. When you read or write emulated devices, for example:

lui  s10, (0x04000000>>12)
addi s10, s10, 0x300
lw a0, 0(s10)

you should see logs in both the ModelSim window:

#          800: pc=0x0000000c:   LUI	s10, 0x04000000		s10 := 0x04000000 
# Read of Avalon[0x4000300] returned 00000000
#          810: pc=0x00000010:   ADDI	s10, s10, 768		s10 := 0x04000300, s10 = 0x04000000 
#          820: pc=0x00000014:   LW 	a0, 0(s10)		a0 := 0x00000000 = mem[0x04000300], s10 = 0x04000300 

and on the terminal in which you are running the display emulator:

Read address=0x4000300, byteenable=0xf
---- Buttons PIO <- 0x00000000 (byte enable 0xffffffff)
Read address=0x4000300, byteenable=0xf. returned=0. response=d 0

Note that the memory access messages are slightly out of sync with the instruction trace. This is because they are printed at the start of the cycle they occur, while the trace is printed when instructions commit as they exit the pipeline (the instruction may fail to commit if an exception happens in between).

When running longer programs the messages can slow down the simulation. You can turn most of them off by commenting out the line that defines VERBOSE in displayemul.c and not including TRACE when you run the simulation in Modelsim.

In the functions below, volatile is an instruction to the C compiler that it should not attempt to optimise away memory accesses for this type. This is because we are accessing a hardware register that may change, rather than memory which we expect to be constant if we do not write to it. If we did not declare it volatile, the compiler could optimise away multiple reads and instead return stale data.

The following C functions will enable access to the peripherals:

	// this file contains the locations of the registers as described in the table above
	// copy it into your example-c/src folder from the clarvi distribution
	#include "avalon_addr.h"

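	// read a 32-bit word from a memory-mapped peripheral register
	int avalon_read(unsigned int address)
	{
		volatile int *pointer = (volatile int *) address;
		return pointer[0];
	}

	// write a 32-bit word to a memory-mapped peripheral register
	void avalon_write(unsigned int address, int data)
	{
		volatile int *pointer = (volatile int *) address;
		pointer[0] = data;
	}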

C's types 'int', 'long', 'long long', 'short' and 'char' can vary in size depending on the compiler and CPU architecture. While we know the RISC-V GCC uses 'int' (32 bits) and 'short' (16 bits), normally it is better to #include <stdint.h>, which provides types like uint16_t for a 16-bit unsigned integer. We don't do this here, to keep the code easier to read by avoiding explicit casts that force conversion of one type to another.

For example, you can read the left dial position with:

	int left;

	left = avalon_read(PIO_ROTARY_L);

When doing so, you should see both a log in the Modelsim trace and a message in the display emulator window.

You can plot a pixel with the following function:

	// our pixel format in memory is 5 bits of red, 6 bits of green, 5 bits of blue
	// (macro arguments are parenthesised so that expressions can be passed safely)
	#define PIXEL16(r,g,b) ((((b) & 0x1F)<<11) | (((g) & 0x3F)<<5) | (((r) & 0x1F)<<0))
	// ... but for ease of programming we refer to colours in 8/8/8 format and discard the lower bits
	#define PIXEL24(r,g,b) PIXEL16(((r)>>3), ((g)>>2), ((b)>>3))

	#define PIXEL_WHITE PIXEL24(0xFF, 0xFF, 0xFF)
	#define PIXEL_BLACK PIXEL24(0x00, 0x00, 0x00)
	#define PIXEL_RED   PIXEL24(0xFF, 0x00, 0x00)
	#define PIXEL_GREEN PIXEL24(0x00, 0xFF, 0x00)
	#define PIXEL_BLUE  PIXEL24(0x00, 0x00, 0xFF)

	#define DISPLAY_WIDTH	480
	#define DISPLAY_HEIGHT	272

	void vid_set_pixel(int x, int y, int colour)
	{
		// derive a pointer to the framebuffer described as 16 bit integers
		volatile short *framebuffer = (volatile short *) (FRAMEBUFFER_BASE);

		// make sure we don't go past the edge of the screen
		if ((x<0) || (x>DISPLAY_WIDTH-1))
			return;
		if ((y<0) || (y>DISPLAY_HEIGHT-1))
			return;

		// pixels are stored row by row, DISPLAY_WIDTH pixels per row
		framebuffer[x+y*DISPLAY_WIDTH] = colour;
	}
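
To make the pixel format concrete, here is a small host-runnable sketch that mirrors the PIXEL16 and PIXEL24 macros and the framebuffer index calculation as plain functions. The function names pack_pixel16, pack_pixel24 and pixel_index are hypothetical and exist only for this illustration. They follow the bit positions used by the macros above, where red sits in the lowest 5 bits and blue in the highest.

```c
/* Pack 5/6/5-bit components using the same bit positions as PIXEL16. */
unsigned int pack_pixel16(unsigned int r, unsigned int g, unsigned int b)
{
	return ((b & 0x1Fu) << 11) | ((g & 0x3Fu) << 5) | (r & 0x1Fu);
}

/* Take 8/8/8 components and discard the low bits, as PIXEL24 does. */
unsigned int pack_pixel24(unsigned int r, unsigned int g, unsigned int b)
{
	return pack_pixel16(r >> 3, g >> 2, b >> 3);
}

/* Index of pixel (x, y) in a 480x272 framebuffer of 16-bit entries,
   or -1 if the coordinate is off screen. */
int pixel_index(int x, int y)
{
	if (x < 0 || x > 480 - 1)
		return -1;
	if (y < 0 || y > 272 - 1)
		return -1;
	return x + y * 480;
}
```

Under these definitions pure red packs to 0x001F, pure green to 0x07E0, pure blue to 0xF800 and white to 0xFFFF. The last pixel on the screen, (479, 271), has index 130559.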


Write some C code to implement the Etch A Sketch. Your software should read the dial inputs and set the relevant pixel to white. When the left or right dial is clicked, the screen should clear to black. You'll find clearing the screen is slow in the emulator, so don't clear it on startup.

One problem you will have to solve is that the rotary encoders return 8-bit values, while the screen is 480 pixels wide and 272 pixels high. Your code should handle wrapping so you can draw on the full area of the screen.
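
One possible way to approach the wrapping problem is to track the change in the counter between two reads rather than its absolute value. The sketch below is not the required solution, only an illustration. It assumes the counter value occupies the low 8 bits of the register, and that the dial moves by fewer than 128 counts between two consecutive reads. The function names dial_delta and wrap_coord are hypothetical.

```c
/* Signed change between two 8-bit counter readings, allowing for wraparound.
   Assumes the true movement between reads is in the range -128..127. */
int dial_delta(unsigned int previous, unsigned int current)
{
	int d = (int)((current - previous) & 0xFFu);	/* 0..255 */
	if (d >= 128)
		d -= 256;				/* map 128..255 to -128..-1 */
	return d;
}

/* Wrap a coordinate into the range 0..limit-1, so that moving off one
   edge of the screen re-enters from the opposite edge. */
int wrap_coord(int position, int limit)
{
	position %= limit;
	if (position < 0)
		position += limit;
	return position;
}
```

For example, if the counter reads 250 and then 3, dial_delta reports a movement of +9 rather than -247. The running X position could then be updated with something like x = wrap_coord(x + delta, 480). Clamping at the screen edge instead of wrapping is an equally reasonable design choice.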

Optional exercise for starred Tick

Take a copy of your Etch A Sketch code and modify it to implement Pong or some other similar game.

Pong differs from Etch A Sketch in a key way: Etch A Sketch is time-invariant, while Pong becomes unplayable if gameplay is too fast or too slow. To remedy this, first write a delay function that spins for 100 microseconds, and calibrate this with Modelsim. While you are running the code in simulation you should omit delays, but you will need to insert delays in the code when it runs on the FPGA. (We can also use the cycle counter for real-world timing but, since the simulation speed depends on workload, it is difficult to calibrate gameplay in simulation.)
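
A spin delay could be structured along the following lines. This is a sketch only: the clock frequency and the number of cycles per loop iteration are placeholders that you would need to measure in Modelsim, and the function names loops_per_100us and spin are hypothetical.

```c
/* Number of loop iterations that take roughly 100 microseconds, given the
   clock frequency in Hz and the measured cycles per loop iteration.
   Both inputs are assumptions to be calibrated, not known values. */
unsigned int loops_per_100us(unsigned int clock_hz, unsigned int cycles_per_loop)
{
	unsigned int cycles = clock_hz / 10000u;	/* 100 us is 1/10000 of a second */
	return cycles / cycles_per_loop;
}

/* Busy-wait for the given number of iterations and return how many ran.
   The counter is volatile so the compiler cannot delete the empty loop. */
unsigned int spin(unsigned int loops)
{
	volatile unsigned int i;
	for (i = 0; i < loops; i++)
		;
	return i;
}
```

As a purely illustrative calculation, a 50 MHz clock with 5 cycles per iteration would give loops_per_100us(50000000, 5) = 1000 iterations. The real figures depend on the Clarvi pipeline and on how the compiler translates the loop, which is why calibration against the Modelsim trace is needed.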