Advanced Computer Architecture Supervision 2
Recommended reading
Computer Architecture: A Quantitative Approach (5th edition) by Hennessy and Patterson:
- Chapter 3: Instruction-Level Parallelism and Its Exploitation
(Previous editions are also fine but may have different chapter names/numbers.)
Exercises
- Why would it be difficult to build and exploit a superscalar processor that fetched and issued 16 instructions per clock cycle?
- The out-of-order execution of ALU instructions in a superscalar processor is constrained only by the availability of functional units and true data dependencies. Why must the out-of-order execution of memory instructions (e.g. load and store instructions) be constrained further?
- 2021 Paper 9 Question 4, parts a) and b) only
- How does loop unrolling help the compiler to expose greater amounts of ILP?
- Some VLIW processors contain additional hardware to permit memory reference speculation.
- What optimisations does memory reference speculation permit?
- Briefly describe the additional hardware required to support this type of speculation.
- The fragment of code listed below is an example of memory reference speculation. The code before and after memory reference speculation has been performed is provided. What “fix-up” code must be executed if the speculative load is unsuccessful (i.e. a memory-carried dependency is discovered at run-time between the store and load instructions)?
    *** Code before speculation ***
    ...
    store [r3]=r8     ; a store instruction
    ld r1=[r2]        ; a load instruction
    r5=r1+r4

    *** Code after speculation ***
    ld.a r1=[r2]
    ...
    r5=r1+r4
    ...
    store [r3]=r8
    chk.a r1, fixup
    back:
    ...
    fixup:
    ????
    ????
    ????
- Why might it be possible to achieve better performance with software pipelining than with loop unrolling when using a VLIW machine?
- In which situations can we improve performance by duplicating instructions?
- Case study: compare and contrast an "old" (pre-2000) and a "modern" high-performance processor of your choice. Make notes on anything you have covered in the two Computer Architecture courses: number of transistors, branch predictor, caches, pipeline depth, support for out-of-order execution, instruction set(s), parallelism, etc.
Here are some suggestions to get you started. Note that much of the requested information is not published by the manufacturers, so you might need to dig deeper to find it out.
- Old: Alpha 21264, Intel Pentium 4, MIPS R10K
- Modern: AMD Zen 2, Apple M1/A14, Intel Alder Lake, RISC-V BOOM (unusual: AMD Bulldozer)
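
For the loop unrolling exercise above, it may help to have a concrete picture of the transformation before reasoning about its effect on ILP. The following is only a minimal sketch in C, not taken from the course material: the function names `sum_rolled` and `sum_unrolled4`, the unroll factor of four, and the use of separate partial sums are all assumptions chosen for illustration. A real compiler would apply the transformation to its intermediate representation and pick the factor itself.

```c
#include <stddef.h>

/* Rolled loop: one element, one add and one loop branch per iteration. */
long sum_rolled(const int *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

/* The same reduction unrolled by four (hypothetical factor).
 * Each iteration of the main loop handles four elements, using four
 * partial sums (an illustrative choice, not the only way to unroll). */
long sum_unrolled4(const int *a, size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    /* Main unrolled body: runs while at least four elements remain. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    long sum = s0 + s1 + s2 + s3;

    /* Clean-up loop for the remaining n % 4 elements. */
    for (; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```

When answering the exercise, it is worth comparing the two loop bodies in terms of the number of branches executed per element and the dependencies between the add instructions.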