Advanced Computer Architecture Supervision 2

Exercises

Why would it be difficult to build and exploit a superscalar processor that fetched and issued 16 instructions per clock cycle?

The out-of-order execution of ALU instructions in a superscalar processor is only constrained by the availability of functional units and true data dependencies. Why must the out-of-order execution of memory instructions (e.g. load and store instructions) be constrained further?

Some VLIW processors contain additional hardware to permit memory reference speculation.
1. What optimisations does memory reference speculation permit?
2. Briefly describe the additional hardware required to support this type of speculation.

Why might it be possible to achieve better performance with software pipelining than with loop unrolling when using a VLIW machine?
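
As a reminder of what the two transformations look like, here is a minimal sketch for a three-stage loop body (load, multiply, store). The function names and the scaling loop are hypothetical and chosen only for illustration. Python executes these statements sequentially, so the sketch shows only how the operations are reordered, not how a VLIW compiler would pack them into wide instruction words.

```python
def scale_reference(a, k):
    """Original loop: each iteration does load, multiply, store in order."""
    out = [0] * len(a)
    for i in range(len(a)):
        x = a[i]       # load
        y = x * k      # multiply
        out[i] = y     # store
    return out


def scale_unrolled(a, k):
    """Loop unrolled by a factor of 2, with a cleanup step for odd lengths."""
    n = len(a)
    out = [0] * n
    i = 0
    while i + 1 < n:
        # Two independent copies of the body, grouped stage by stage.
        x0, x1 = a[i], a[i + 1]        # loads
        y0, y1 = x0 * k, x1 * k        # multiplies
        out[i], out[i + 1] = y0, y1    # stores
        i += 2
    if i < n:                          # leftover element when n is odd
        out[i] = a[i] * k
    return out


def scale_software_pipelined(a, k):
    """Software-pipelined form: each kernel iteration mixes stages
    belonging to three different iterations of the original loop."""
    n = len(a)
    if n < 2:
        return scale_reference(a, k)   # too short to fill the pipeline
    out = [0] * n
    # Prologue: fill the pipeline.
    x = a[0]           # load for iteration 0
    y = x * k          # multiply for iteration 0
    x = a[1]           # load for iteration 1
    # Kernel: store for i, multiply for i+1, load for i+2.
    for i in range(n - 2):
        out[i] = y
        y = x * k
        x = a[i + 2]
    # Epilogue: drain the pipeline.
    out[n - 2] = y
    y = x * k
    out[n - 1] = y
    return out
```

All three functions compute the same result. When answering, consider how the prologue, kernel and epilogue of the pipelined form compare with the repeated start-up and wind-down of each unrolled block.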

Case study: compare and contrast an "old" (circa 2000 or earlier) and a "modern" high-performance processor of your choice. Make notes on anything you have covered in the two Computer Architecture courses: number of transistors, branch predictor, caches, pipeline depth, support for out-of-order execution, instruction set(s), parallelism, etc.

Here are some suggestions to get you started. Note that manufacturers do not publish much of the requested information, so you may need to dig deeper to find it.
- Old: Alpha 21264, Intel Pentium 4, MIPS R10K
- Modern: AMD Zen 2, Apple M1/A14, Intel Alder Lake, RISC-V BOOM (unusual: AMD Bulldozer)
Please spend 90 minutes to 2 hours on this.