Department of Computer Science and Technology

Advanced Computer Architecture Supervision 3

Recommended reading

Computer Architecture: A Quantitative Approach (5th edition) by Hennessy and Patterson:

(Previous editions are also fine but may have different chapter names/numbers.)

Exercises

  1. Briefly describe the differences between coarse-grained, fine-grained and simultaneous multithreading.

  2. Describe one technique for reducing the thread switch penalty in a coarse-grained multithreaded processor.

  3. What are the advantages of exploiting Thread-Level Parallelism (TLP) in addition to Instruction-Level Parallelism (ILP)?

  4. 2020 Paper 9 Question 4, parts b) and c) only

  5. 2014 Paper 7 Question 5, all except part d)

  6. 2008 Paper 7 Question 5

  7. A naive programmer writes the following code for performing the matrix multiply-add function C=AB+C on square matrices:
    for (i=0; i<N; ++i) {
      for (j=0; j<N; ++j) {
        for (k=0; k<N; ++k) {
          C[k][i] = C[k][i] + ( A[k][j] * B[j][i] );
        }
      }
    }
    
    (X[v][u] refers to the element in row v, column u. Arrays are stored in memory row by row, i.e. X[0][0], X[0][1], X[0][2], ..., X[0][N-1], X[1][0], etc.)
    1. When used to multiply very large matrices, performance of the programmer’s algorithm is very poor. Explain what is happening.
    2. The algorithm can be improved simply by changing the order of the loops. Demonstrate how and why.
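
For exercise 7, it may help to run the loop nest rather than only reason about it. The following is a minimal test harness sketch, not part of the original question. It wraps the programmer's loop nest unchanged, checks it against the textbook definition of C=AB+C, and times it. The matrix size `N`, the fill values, and the helper names (`init_matrices`, `reference_multiply_add`, `naive_matches_reference`, `time_naive`, `print_report`) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

#define N 256  /* assumed size; raise it (e.g. to 2048) to make the effect visible */

static double A[N][N], B[N][N], C[N][N], expected[N][N];

/* Fill the inputs with small integer values so every sum is exact in double. */
static void init_matrices(void) {
    for (int r = 0; r < N; ++r) {
        for (int c = 0; c < N; ++c) {
            A[r][c] = (double)((r + c) % 7);
            B[r][c] = (double)((2 * r + c) % 5);
            C[r][c] = 1.0;
            expected[r][c] = 1.0;
        }
    }
}

/* The loop nest from the exercise, unchanged. */
static void naive_multiply_add(void) {
    int i, j, k;
    for (i = 0; i < N; ++i) {
        for (j = 0; j < N; ++j) {
            for (k = 0; k < N; ++k) {
                C[k][i] = C[k][i] + (A[k][j] * B[j][i]);
            }
        }
    }
}

/* Textbook definition: expected[r][c] += sum over m of A[r][m] * B[m][c]. */
static void reference_multiply_add(void) {
    for (int r = 0; r < N; ++r) {
        for (int c = 0; c < N; ++c) {
            double sum = 0.0;
            for (int m = 0; m < N; ++m) {
                sum += A[r][m] * B[m][c];
            }
            expected[r][c] += sum;
        }
    }
}

/* Returns 1 if the naive loop nest agrees with the reference, 0 otherwise. */
static int naive_matches_reference(void) {
    init_matrices();
    naive_multiply_add();
    reference_multiply_add();
    for (int r = 0; r < N; ++r) {
        for (int c = 0; c < N; ++c) {
            if (C[r][c] != expected[r][c]) {
                return 0;
            }
        }
    }
    return 1;
}

/* Runs the naive loop nest once on fresh inputs and returns CPU seconds. */
static double time_naive(void) {
    init_matrices();
    clock_t start = clock();
    naive_multiply_add();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Checks correctness first, then prints one timing line. */
static void print_report(void) {
    assert(naive_matches_reference());
    printf("N=%d naive: %.3f s\n", N, time_naive());
}
```

Calling `print_report()` from `main` prints one timing line. Repeating the measurement for several values of `N`, and again after reordering the loops in a copy of `naive_multiply_add`, gives numbers to support the answers to both parts. Timings depend on the machine, compiler and optimisation flags, so they are best treated as indicative only.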

  8. Why can vector processors be particularly energy efficient when executing some types of program? In which situations might a vector processor perform worse than a simple pipelined processor?