HOME UP PREV NEXT (Other Expression forms: Hardware Construction Languages)

Other Models of Computation: Channel Communication

Using shared variables to communicate between threads is a low-level model of computation that:

requires that the user abide by self-imposed protocol conventions and hence can be hazard prone,
may not be apparent to the toolchain and hence optimisations may be missed,
requires cache consistency and sequential consistency,
has become (unfortunately) the primary parallel communications paradigm in today's chip multiprocessors (CMPs),
is generally better avoided (so say many at least)!

Mutable variables suffer from RaW, WaR and WaW hazards!

CSP and Kahn-like process networks are an important model of computation based on channels. In computation theory terms, they might be viewed as a set of Turing machines connected via one-way tapes. »CSP»Kahn Process Networks»pn tool for Process Networks.

Disadvantage: deadlock is possible.

Some languges, such as Handel-C, Occam, Erlang and the best Bluespec coding styles completely ban shared variables and enforce use of CSP-like channels »(LINK: Handel-C.pdf)

Handel-C uses explicit Kahn/Occam/CSP-like channels ('!' to write, '?' to read):

   // Generator (src)     // Processor          // Consumer (sink)
   while (1)              while(1)              while(1)
   {                      {                     {
     ch1 ! (x);             ch2 ! (ch1? + 2)      $display(ch2?);
     x += 3;              }                     }
   }

Using channels makes concurrency explicit and allows synthesis to re-time the design.

In CSP and Kahn-like networks, all communication is via blocking read and lossless write to/from unbound FIFOs.

A variation on the basic paradigm is whether or not a reader can peek into the FIFO and make random access removal. Another variation: a chordal dequeue may typically be supported, where an atomic (pattern-matching) read of multiple input channels is made at once, as in the join calculus. Atomic write multiple is also sensible to support at the same time. A basic system based on this paradigm requires the user code in this manner and renders RTL circuits correspondingly. Each channel has some physical FIFO or one-place buffer manifestation with the handshake wires being automatically synthesised.

But advanced compilers can:

Automatically convert standard code to this form (every shared variable or array becomes a process with exlicit read and write messages)
Automatically determine the FIFO depth needed at each point (or remove the FIFO entirely if deadlock will provably not arise),
Automatically conglomerate and collapse such forms during code generation, with handshaking wires internal to a module or that are always ready disappearing during synthesis.

A systolic array is similar to a CSP/Kahn network, but the processing elements operate in lock-step instead of having FIFO queues between the nodes. A number of HLS compilers target systolic arrays instead of (or as well as) the classical sequencer approach. Exercise for the sheet: It is argued that the systolic array approach is superior to a Kahn network when there will be no deviations from a static schedule. Consider a matrix multiplication or CNN application: is the FIFO between nodes needed for CNN accelerators?