Advanced Computer Architecture Supervision 3
Recommended reading
Computer Architecture: A Quantitative Approach (5th edition) by Hennessy and Patterson:
- Chapter 2: Memory Hierarchy Design
- Appendix G: Vector Processors in More Depth
(Previous editions are also fine but may have different chapter names/numbers.)
Exercises
- Briefly describe the differences between coarse-grained, fine-grained and simultaneous multithreading.
- Describe one technique for reducing the thread switch penalty in a coarse-grained multithreaded processor.
- What are the advantages of exploiting Thread-Level Parallelism (TLP) in addition to Instruction-Level Parallelism (ILP)?
- 2020 Paper 9 Question 4, parts b) and c) only
- 2014 Paper 7 Question 5, all except part d)
- 2008 Paper 7 Question 5
- A naive programmer writes the following code for performing the matrix multiply-add function C=AB+C on square matrices:
for (i=0; i<N; ++i) {
    for (j=0; j<N; ++j) {
        for (k=0; k<N; ++k) {
            C[k][i] = C[k][i] + ( A[k][j] * B[j][i] );
        }
    }
}
(Where X[v][u] refers to the element in row v, column u. Arrays are stored in memory row by row, i.e. X[0][0], X[0][1], X[0][2], ... X[0][N-1], X[1][0], ... etc.)
- When used to multiply very large matrices, performance of the programmer’s algorithm is very poor. Explain what is happening.
- The algorithm can be improved simply by changing the order of the loops. Demonstrate how and why.
- Why can vector processors be particularly energy efficient when executing some types of program? In which situations might a vector processor perform worse than a simple pipelined processor?
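For the matrix multiply-add exercise above, it can help to measure the naive loop nest before reasoning about it. The following is a minimal timing sketch, not part of the original question. It assumes the matrices are held as flat row-major arrays of double, so that X[v][u] becomes x[v*n + u]. The names madd_fn, madd_naive and time_madd are hypothetical, and clock() reports CPU time rather than wall-clock time.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Signature shared by every kernel variant under test (hypothetical name). */
typedef void (*madd_fn)(size_t n, const double *a, const double *b, double *c);

/* The worksheet's loop nest, loop order unchanged, on flat row-major
   n x n arrays: X[v][u] is stored at x[v*n + u] (an assumption). */
static void madd_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            for (size_t k = 0; k < n; ++k)
                c[k*n + i] = c[k*n + i] + (a[k*n + j] * b[j*n + i]);
}

/* Allocates and fills three n x n matrices, runs the kernel once, and
   returns the elapsed CPU time in seconds (-1.0 if allocation fails).
   The sum of all elements of C is written to *checksum so that
   different kernel variants can be checked against each other. */
static double time_madd(size_t n, madd_fn kernel, double *checksum)
{
    assert(kernel != NULL && checksum != NULL);

    double *a = malloc(n * n * sizeof *a);
    double *b = malloc(n * n * sizeof *b);
    double *c = malloc(n * n * sizeof *c);
    if (!a || !b || !c) {
        free(a); free(b); free(c);
        return -1.0;
    }

    /* Constant fill values keep the expected result easy to predict. */
    for (size_t idx = 0; idx < n * n; ++idx) {
        a[idx] = 1.0;
        b[idx] = 2.0;
        c[idx] = 0.5;
    }

    clock_t start = clock();
    kernel(n, a, b, c);
    clock_t stop = clock();

    *checksum = 0.0;
    for (size_t idx = 0; idx < n * n; ++idx)
        *checksum += c[idx];

    free(a); free(b); free(c);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_madd(n, madd_naive, &sum) from a main function for a range of n (for example 256, 512, 1024, 2048) shows how the running time grows. A reordered kernel written with the same signature can be passed in place of madd_naive and compared using the checksum. With these fill values, every element of C should end up as 0.5 + 2n.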