Computer Laboratory

Course pages 2015–16

Computer Design

Here I'll post questions that have been asked, along with answers that may be useful to everyone.

If there will be more specialisation in computer architecture, will we see a return to more CISC-like ISAs?

This is unlikely. RISC instruction sets became popular because they make the job of designing the processor's pipeline and associated structures simpler. In fact, instructions from CISC ISAs, like Intel's x86, are commonly decoded internally into RISC-like micro-ops at the start of the pipeline, to keep the complexity of the pipeline implementation low.

What we may well see with specialisation are additional accelerator units that can be added or removed depending on the processor's / SoC's target market segment. Those computing domains that don't require the additional functionality can choose to leave it out. This is quite timely because today ARM announced the introduction of a new processor, the Cortex-A35, which can be implemented with or without NEON (ARM's SIMD extensions), cryptography instructions, and other features. See the end of this article on AnandTech for more information.

Within the MSI cache coherence protocol, can you have multiple caches holding a block with status M?

No, this is not possible. If one cache wants to write to a cache line, the protocol prevents any other cache from holding the data in M or S state. A cache wishing to write to a line must issue a BusRdX transaction on the snoopy bus. Considering slide 39 from lecture 14, you can see that when a cache holding the data in state M sees this, it must flush the data and move to I state. Any cache holding the data in state S must simply move to I state when it sees this transaction.
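The snoop-side transitions described above can be sketched as a small state machine. This is only an illustration of the rules as stated (the state names M/S/I and the transaction names BusRd/BusRdX come from the lectures; the function itself is invented for this sketch):

```python
def snoop(state, transaction):
    """Return (next_state, must_flush) for a cache observing a bus transaction.

    Toy model of the snoop side of MSI: states are "M", "S" or "I";
    transactions are "BusRd" (another cache reads) or "BusRdX"
    (another cache wants to write).
    """
    if transaction == "BusRdX":
        # Another cache is about to write: M must flush its dirty copy,
        # and both M and S must invalidate, so no second M copy can exist.
        return ("I", state == "M")
    if transaction == "BusRd":
        if state == "M":
            # Supply the dirty data and downgrade to Shared.
            return ("S", True)
        return (state, False)
    return (state, False)

# A cache in M observing BusRdX flushes and invalidates:
print(snoop("M", "BusRdX"))  # -> ('I', True)
# A cache in S observing BusRdX simply invalidates:
print(snoop("S", "BusRdX"))  # -> ('I', False)
```

Because every other cache moves to I before the writer gains M, at most one cache can ever hold a line in M state.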

So do we have to do a full cache access each time we snoop the bus for cache coherence?

We don't have to do a full cache access, but we do have to do a tag check and, if there is a match, query the status bits for that line so that we know what action we need to take, if any. Although I showed the status bits appended to the data part of the cache line in lecture 14, slide 28, this is only the logical view on the cache line. A real implementation might split them off to another structure that is independent of the data part of the lines, so that the status bits can be queried more power efficiently.
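To make this concrete, here is a toy direct-mapped lookup in which the tags and status bits live in their own small arrays, separate from the data array, so a snoop never touches the data. All of the names and sizes here are invented for illustration; note also that a failed tag check simply returns 'I', which is the point made in the next answer about not needing explicit Invalid bits:

```python
NUM_SETS = 4       # toy cache: 4 sets, direct-mapped
OFFSET_BITS = 2    # 4-byte lines
INDEX_BITS = 2     # log2(NUM_SETS)

tags = [None] * NUM_SETS    # tag array
status = ["I"] * NUM_SETS   # MSI status bits, held separately from the data

def split(addr):
    """Split an address into (set index, tag)."""
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return index, tag

def install(addr, state):
    """Bring a line into the cache in the given MSI state."""
    index, tag = split(addr)
    tags[index] = tag
    status[index] = state

def snoop_lookup(addr):
    """Snoop-side tag check: consults only tags and status, never the data."""
    index, tag = split(addr)
    if tags[index] == tag:
        return status[index]  # hit: the line's coherence state
    return "I"                # no tag match: the line is implicitly Invalid
```

For example, after `install(0x40, "M")`, a snoop on 0x40 returns "M" while a snoop on any uncached address returns "I", without the data array ever being read.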

Do we actually need to have status bits to indicate 'Invalid' state?

No, we don't. All addresses that are not in the cache are in I state. If the tag check doesn't find a match, the address is not in the cache and the status is 'Invalid' - we don't need to query any bits to find this out.

Why do we read data into the cache if we're doing a store, since we're just going to write over it?

We're assuming a write-allocate cache here, so on a store that misses in the cache, we bring the data in first, then do the write. The reason that we have to load it all in is that (for an L1 cache) we are only writing to part of the cache line. Recall from lecture 10 that cache lines contain multiple words of data, to take advantage of spatial locality. Typically the processor will only store a single word of data or less (e.g. a byte). We load the whole line into the cache and then overwrite the part we want to change, so that we keep the hardware simple and don't have to track which bytes within each line have changed and which still need to be loaded in.