
Department of Computer Science and Technology

Advanced Computer Architecture Supervision 3

Recommended reading

Computer Architecture: A Quantitative Approach (5th edition) by Hennessy and Patterson:

Chapter 2: Memory Hierarchy Design
Appendix G: Vector Processors in More Depth

(Previous editions are also fine but may have different chapter names/numbers.)

Exercises

Briefly describe the differences between coarse-grained, fine-grained and simultaneous multithreading.

Describe one technique for reducing the thread switch penalty in a coarse-grained multithreaded processor.

What are the advantages of exploiting Thread-Level Parallelism (TLP) in addition to Instruction-Level Parallelism (ILP)?

2020 Paper 9 Question 4, parts b) and c) only

2014 Paper 7 Question 5, all except part d)

2008 Paper 7 Question 5

A naive programmer writes the following code for performing the matrix multiply-add function C=AB+C on square matrices:
```
for (i=0; i<N; ++i) {
  for (j=0; j<N; ++j) {
    for (k=0; k<N; ++k) {
      C[k][i] = C[k][i] + ( A[k][j] * B[j][i] );
    }
  }
}
```
(Here X[v][u] refers to the element in row v, column u. Arrays are stored in memory row by row, i.e. X[0][0], X[0][1], X[0][2], ..., X[0][N-1], X[1][0], etc.)
1. When used to multiply very large matrices, performance of the programmer’s algorithm is very poor. Explain what is happening.
2. The algorithm can be improved simply by changing the order of the loops. Demonstrate how and why.
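
For experimenting with the two parts above, the loop nest can be wrapped in a small function. The following is only a sketch under stated assumptions: the matrices hold `double` elements in flat arrays, and the names `idx` and `matmul_add_naive` are hypothetical helpers, not part of the question. `idx` makes the row-by-row layout explicit, and the loop order is deliberately left exactly as the naive programmer wrote it.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major offset of X[v][u] in an n x n matrix held in a flat array
 * (hypothetical helper: rows are stored one after another). */
static size_t idx(size_t n, size_t v, size_t u) {
    assert(v < n && u < n);
    return v * n + u;
}

/* The loop nest from the exercise, loop order unchanged, computing C = AB + C.
 * Assumes A, B and C each point to n * n doubles. */
static void matmul_add_naive(size_t n, const double *A, const double *B,
                             double *C) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            for (size_t k = 0; k < n; ++k) {
                C[idx(n, k, i)] += A[idx(n, k, j)] * B[idx(n, j, i)];
            }
        }
    }
}
```

Calling `matmul_add_naive` from a timing harness with increasing `n`, or printing the values `idx` returns for successive inner-loop iterations, is one way to observe the behaviour the question asks about.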

Why can vector processors be particularly energy efficient when executing some types of program? In which situations might a vector processor perform worse than a simple pipelined processor?

© 2022 Department of Computer Science and Technology, University of Cambridge