Department of Computer Science and Technology

Advanced Computer Architecture Supervision 2

Recommended reading

Computer Architecture: A Quantitative Approach (5th edition) by Hennessy and Patterson:

  • Chapter 3: Instruction-Level Parallelism and Its Exploitation

(Previous editions are also fine but may have different chapter names/numbers.)

Exercises

  1. Why would it be difficult to build and exploit a superscalar processor that fetched and issued 16 instructions per clock cycle?

  2. The out-of-order execution of ALU instructions in a superscalar processor is constrained only by the availability of functional units and true data dependencies. Why must the out-of-order execution of memory instructions (e.g. load and store instructions) be constrained further?

  3. 2021 Paper 9 Question 4, parts (a) and (b) only.

  4. How does loop unrolling help the compiler to expose greater amounts of ILP?

  5. Some VLIW processors contain additional hardware to permit memory reference speculation.
    1. What optimisations does memory reference speculation permit?
    2. Briefly describe the additional hardware required to support this type of speculation.

  6. The fragment of code listed below is an example of memory reference speculation. The code is shown both before and after memory reference speculation has been performed. What “fix-up” code must be executed if the speculative load is unsuccessful (i.e. a memory-carried dependency between the store and load instructions is discovered at run time)?
    *** Code before speculation ***
    ...
    store [r3]=r8 ; a store instruction
    ld r1=[r2]    ; a load instruction
    r5=r1+r4
    
    *** Code after speculation ***
    ld.a r1=[r2]
    ...
    r5=r1+r4
    ...
    store [r3]=r8
    chk.a r1, fixup
    
    back:
    ...
    
    fixup:
    ????
    ????
    ????
    

  7. Why might it be possible to achieve better performance with software pipelining than with loop unrolling when using a VLIW machine?

  8. In which situations can we improve performance by duplicating instructions?

  9. Case study: compare and contrast an "old" (pre-2000) and a "modern" high-performance processor of your choice. Make notes on anything you have covered in the two Computer Architecture courses: number of transistors, branch predictor, caches, pipeline depth, support for out-of-order execution, instruction set(s), parallelism, etc.

    Here are some suggestions to get you started. Note that much of the requested information is not published by the manufacturers, so you might need to dig deeper to find it out. Please spend 90 minutes to 2 hours on this.