Lab 4


  Prime number generator

In this practical, you'll create a project in Quartus II that will use the ARM processor on the Excalibur chip to generate a series of prime numbers. This will allow you to try out your assembly programming skills and gain experience of using a remote debugger. The ARM is part of the Excalibur Embedded Stripe - this is a hard-wired part of the board. It contains the ARM, Dual-Ported RAM, SRAM, and a serial port.

In order for Quartus to understand that we want to use the ARM in the stripe, a MegaFunction has to be defined. This is a computer-generated Verilog module that the compiler knows is actually the ARM processor. A small piece of Verilog then joins up the pins that are needed, in a similar manner to any other Verilog Module.

You might want to read through the ARM Quick Reference and the copy of the instruction set reference provided, and maybe also look back through lecture four of the Computer Design notes on the ARM Procedure Calling Standard. One or two of the instructions listed on the instruction quick reference may not work if there is something in the Greek-looking symbol column, so it is a good idea to check (the ARM922T is an ARMv4 architecture).



  Making a new Project

First create a new directory in your file space, and download the following files into it:

  • main.v - The Verilog file that links the ARM to the Excalibur's pins
  • main.s - Assembler code for the ARM
  • serialio.s - Assembler library to allow communication via the serial port. There is some documentation for each function inside the file.

Open Quartus II from the Start menu, and select File | New Project Wizard.... Enter the directory, the project name as 'primes' and the top-level entity as 'main'. On the next screen, add 'main.s', 'main.v', and 'serialio.s' to the project files, and set the rest of the options as Lab 1 suggested. Now, go to the TCL Console and type 'source excalibur_pins.tcl' as in the first Lab.

Now, the final part of setting up the project is creating the ARM Stripe MegaFunction.

  • Select Tools | MegaWizard Plug-In Manager..., and choose Create a new MegaFunction Variation.
  • On the next screen, select ARM-Based Excalibur from the list on the left, EXCALIBUR_ARM for the device family, Verilog HDL for the type of output file, and for the output filename, your project directory\arm.v.
  • On the next page, make sure the device is EPXA1, and its Endianness is set to 'Little'. All the tickboxes apart from UART should be unticked. Outputs and Inputs should be left at their default values.
  • On the next page, untick everything.
  • On the page after, for external reference frequency, use 25 MHz. Bypass PLL1 should be ticked, with the two AHB frequencies at their default values, and the tickbox for using an EEPROM should be unticked.
  • The next page is where you select your memory map, which determines where each device appears in the ARM's address space. Select 16K for SRAM0 and SRAM1, and set their addresses to 00000000 and 00004000 respectively. Each 16K block occupies 4000h bytes (the 'h' suffix means 'hexadecimal'), so SRAM0 spans 0000h to 3FFFh and SRAM1 spans 4000h to 7FFFh, giving 32K of contiguous SRAM from address 0 to 7FFFh.
  • DPRAM is dual-ported RAM and should be OFF for this practical, and the registers should be left as they are.
  • Click next, and then finish, and you should be able to compile your project using the standard Compile button.

Unfortunately, you're not quite done yet, because Quartus II needs telling that it should also compile main.s and create the files for it.

  • Go to Assignments | Settings...
  • Select Software build settings in the left-hand pane.
  • Set Embedded processor architecture to ARM922T, and Software toolset to ADS Standard Tools. ADS stands for ARM Developer Suite.
  • Set Byte order to Little endian.
  • Set Output file format to .hex and the file name to "prime.hex".
  • Under Programming file generation, select passive configuration and set the file name to "main.psof".
  • Now, go to Software Build Settings | Linker in the left-hand pane, and set Link Type to Simple. Tick all three boxes, and set them to "0". This tells the Linker that it must fit everything into memory starting at address 0.
  • You also need to ensure that main.s appears first in the linked image. To do this add "-first main.o" to the command line arguments. If we had been running from FLASH or non-writable memory, we probably would have set these up differently.
  • Now go to Software Build Settings | Toolset directories in the left-hand pane.
  • Finally, select ADS Standard Tools from the list. The toolset directory should be correctly set. If toolset directory is not set, set it to "y:\armds1.2\bin" and click change.
  • Click on OK and click Build on the toolbar to build the program.
  • After this is done, you can upload to the FPGA. Go to Tools | Programmer and click the button marked 'Auto Detect'. This will detect all devices on the JTAG chain - there should be 2 of them (if not, change the jumper on the EPXA1 board marked JSELECT to the 2-3 position and try again). Right-click on the one marked EPXA1 and choose 'Change File'. Select main.sof, and tick the 'Program/Configure' box.

    When you click Start Programming the image will be uploaded to the FPGA, and a message will be output on the Serial port. To see this, plug your serial cable into the connector on the short edge of the EPXA1 board (the one that protrudes through a hole in the box), and load TeraTerm from the start menu. Set it up to connect via the serial port your cable is plugged into at the computer (the top one is COM2 and the bottom one is COM1), and then go to Setup | Serial port... and set it to 38400 baud, 8 data bits, no parity, 1 stop bit, and no flow control. After uploading the file, you should see the following:

    **************************************
    *  ECAD + Arch Lab 4 - Prime Number  *
    *              Generator             *
    **************************************
    
    Finished! :)
    


  Remote Debugging

The ARM Debugger (AXD) is provided for remote debugging via the MasterBlaster interface. The debugger allows you to step through your code whilst monitoring registers and memory contents.

AXD should be in the usual place in the Start menu.

When AXD loads, it may ask you to choose a target. The target is the hardware or emulator on which your code is running. In this case you need to select Altera-RDI to connect to the EPXA1 board. If this does not appear, click on the Add button, and open the file called Y:\Quartus40\quartus\bin\Altera-RDI.dll. If the Choose Target window does not appear, select Options | Configure Target. If AXD fails to connect to the EPXA1, try choosing the target again.

The debugger allows you to watch your program being executed instruction by instruction. You can also see the contents of the registers and memory changing. To see all of these, you will need to open a sub-window for each:

  • To see the contents of the registers, select Processor Views | Registers and expand the "Current" sub-tree to create a small window showing the current register values.
  • To see the contents of the memory, select Processor Views | Memory. Set the Start address to 0, as this is where your program will reside.
  • To see the instructions being executed, select Processor Views | Disassembly. Note that there is a blue arrow pointing to one instruction. This represents the current execution point.

If you have just loaded the sample code into the board, you should find that the processor has reached the infinite loop (at address 0x18). Now set the PC to 00 (Right click in the Disassembly window and select Set PC...).

  • To execute the program, press F5 or click the corresponding toolbar icon.
  • To step through the program, press F8 (or click the corresponding toolbar icon) a few times. Try this after setting the PC back to 0. This will advance the blue arrow and execute the next assembler instruction, following subroutine calls. Notice how the register values and memory change.
  • When you get to a bl instruction, pressing F8 again will step you into a subroutine. Using F10 (or its toolbar icon) instead will still advance one instruction at a time, but will not 'step' into the subroutine.
  • Running an entire program like this could get very tedious, so it will be useful to set breakpoints. These allow the program to run freely until the processor reaches a specific instruction, at which point it stops and the registers and memory are displayed. To set a breakpoint, place the cursor by the instruction and press F9 (or click the corresponding toolbar icon, or double-click the instruction in the disassembly). Now press F5; the program will run until it reaches the breakpoint.
  • The Run To Cursor button (F7 or its toolbar icon) does exactly what it says...

In summary, "step in" will do one instruction at a time; "step over" will do one instruction or subroutine at a time; and breakpoints halt the run whenever they are hit. They provide different granularities of control over debugging.


  Task: Prime Number Generator

Write a program to produce a table of the first 1000 primes in hexadecimal. The table should be neatly printed on the console and should include an index number by each prime. You might like to use one of the following two methods.

Repeated Division Method

One method for determining if N is prime is to try dividing it by each prime less than or equal to sqrt(N); if none of them divides N exactly, N is prime. Note that the ARM does not have instructions that perform division and remainder operations. However, you can use the code from the Computer Design notes. It expects r0 to contain the dividend (the value to be divided) and r1 to contain the divisor. The quotient is left in r1 and the remainder in r0, but r2 and r3 are corrupted in the process. It might also be useful to know that if you divide something by 0, the routine gets stuck in an endless loop.

DivStart				; enter with numbers in r0 and r1
	mov	r2,#1			; bit to control the division
Div1	cmp	r1,#0x80000000		; shift r1 left until top bit set...
	cmpcc	r1,r0			; ...or r1>=r0
	movcc	r1,r1,LSL#1		; shift r1 left if required
	movcc	r2,r2,LSL#1		; shift r2 left if r1 is shifted
	bcc	Div1			; repeat whilst more shifting required
	mov	r3,#0			; zero r3
Div2	cmp	r0,r1			; test for possible subtraction
	subcs	r0,r0,r1		; subtract if r0>=r1
	addcs	r3,r3,r2		; put relevant bit into result
	movs	r2,r2,LSR#1		; shift control bit
	movne	r1,r1,LSR#1		; halve unless finished
	bne	Div2			; loop if there is more to do
					; divide result in r3
					; remainder in r0
	mov	r1,r3			; divide result (r0/r1) now in r1
	mov	r15,r14			; return from subroutine
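
As a sketch of how this routine might be called, the fragment below tests whether one number divides another exactly. The use of r4 for the candidate and r5 for the trial divisor, and the label NotPrime, are assumptions made for illustration and are not part of the supplied code. DivStart only touches r0 to r3, so values kept in r4 and r5 survive the call. Remember that bl overwrites r14, so save it first if this fragment is itself inside a subroutine.

```
	mov	r0,r4			; r0 = dividend (candidate N, assumed to be kept in r4)
	mov	r1,r5			; r1 = divisor (trial prime in r5, must be non-zero)
	bl	DivStart		; quotient -> r1, remainder -> r0; r2, r3 corrupted
	cmp	r0,#0			; a zero remainder means r5 divides r4 exactly
	beq	NotPrime		; hypothetical label: the candidate is composite
```

If the branch is not taken, the candidate survived this trial divisor and the next prime can be tried.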

Sieve of Eratosthenes

Alternatively, you could use the Sieve of Eratosthenes. The accompanying diagram shows this algorithm working. From left to right are numbers, and from top to bottom are steps of computation. In the first step, every multiple of 2 beyond 2 itself is marked as non-prime. The next unmarked number (3) is the next prime, and every further multiple of 3 is ticked off, and so on. This leaves only prime numbers unmarked (which are shown in red).

For the sieve, you need some memory that can be used to store the table of values. If you insert the following code at the end of the file (just before END), the assembler will allocate 1024 bytes of memory. Due to the ARM only being able to directly access 4k of memory relative to the program counter, and the use of the LDR r#,=... macro (which in some cases allocates memory to store constants too), allocating more than around 2000 bytes may cause the assembler to complain.

	ALIGN			; ensure allocated memory is word aligned
primes_table	%	1024	; e.g. allocated 1024 bytes
				; to load the address of this into a register, use:
				; ldr r0,=primes_table

It is recommended that you use entire bytes (and use the ldrb and strb instructions) to store values for the Sieve of Eratosthenes.
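
The marking step of the sieve could be sketched as follows, assuming one byte per number, with 0 meaning "possibly prime" and 1 meaning "not prime". The register choices (the current prime p in r4) and the labels MarkLoop and MarkDone are assumptions for illustration. The sketch also assumes the table was cleared beforehand. Note that with one byte per number, a 1024-byte table only covers the numbers 0 to 1023.

```
	ldr	r0,=primes_table	; r0 = base address of the flag table
	mov	r5,#1024		; r5 = table size in bytes (matches the % 1024 above)
	mov	r2,#1			; r2 = flag value meaning "not prime"
	add	r1,r4,r4		; r1 = 2*p, the first multiple to mark (p assumed in r4)
MarkLoop
	cmp	r1,r5			; have we run past the end of the table?
	bcs	MarkDone		; yes (r1 >= 1024): stop marking
	strb	r2,[r0,r1]		; table[r1] = 1, i.e. r1 is not prime
	add	r1,r1,r4		; advance to the next multiple of p
	b	MarkLoop
MarkDone
```

After this has been run for a prime p, the next prime is the next number above p whose byte is still 0.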

Data Output

Use the UART_TXD_DECIMAL routine to output an index number and the UART_TXD_HEX routine to output the prime. These take the value in r0 and output it to the serial port (r1,r2 and r3 are changed in the process). To output a space character to separate the numbers, use the following code (32 is the character code for space):

	mov	r0,#32
	bl	UART_TXD

Similarly, to end the line, output a carriage-return character (13) followed by a line-feed character (10):

	mov	r0,#13
	bl	UART_TXD
	mov	r0,#10
	bl	UART_TXD

To output a single character, you can either use the ASCII code as in the examples above, or you can use the character itself:

	mov	r0,#'a'
	bl	UART_TXD
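
Putting these pieces together, one row of the table might be printed as sketched below. The choice of r4 for the index and r5 for the prime is an assumption for illustration. It relies on the serial routines leaving r4 and r5 untouched, which the description above suggests but which is worth checking in serialio.s. Each bl overwrites r14, so save it first if this fragment is itself inside a subroutine.

```
	mov	r0,r4			; r4 = index of this prime (assumed register choice)
	bl	UART_TXD_DECIMAL	; print the index in decimal
	mov	r0,#32			; space to separate the two numbers
	bl	UART_TXD
	mov	r0,r5			; r5 = the prime itself (assumed register choice)
	bl	UART_TXD_HEX		; print the prime in hexadecimal
	mov	r0,#13			; carriage return
	bl	UART_TXD
	mov	r0,#10			; line feed
	bl	UART_TXD
```

Reloading r0 before every call keeps the fragment correct whether or not a routine changes r0.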


  Assessment

Questions

  1. What are the first ten primes in hexadecimal?
  2. What is the 1000th prime in hexadecimal?
  3. What is the computational complexity of your implementation?

Ticking Criteria

  • A sequence of primes up to 1000 should be produced on the RS-232 port. Any more is a bonus.
  • There should be no numbers that are not primes; in particular, the number 1 should not be displayed. Primes below 10 may be hard-coded if it would be hard for your technique to produce them.
  • The 100th prime should be numbered (counting from 2 being the first prime) - it is best to number every prime using UART_TXD_DECIMAL.
  • You must give a live demonstration of your solution.
  • Your ARM code must be cleanly formatted and commented.
  • Answers to the questions for the workshop must be added to the end of your code.
  • The following header must be added to all assembly code submitted:
    ;    -----------------------------------
    ;  --=== ECAD and Architecture Lab 4 ===--
    ;    -----------------------------------
    ;  Your name
    ;  Your college
    ;  Your CRSid
    ;  Date
    
  • There is no need to submit any Verilog, arm.s, or serialio.s for this practical.

Ticking procedure

  1. Show your work to one of the demonstrators (on screen or paper). They will award you with a tick if the work is up to standard.
  2. Print out your final work and add it to your portfolio to be submitted as instructed in the Head of Department notice.