Parallelism: The key to high performance.
Transistors are abundant, and deploying a lot of hardware is not in itself a problem (we discussed the Power Wall elsewhere).
Three forms of parallel speed-up are well-known for classical imperative parallel programming:
- Task-Level Parallelism: partition the input data over nodes and run the same program on each node, with no inter-node communication (aka embarrassingly parallel).
- Programmer-defined, Thread-Level Parallelism: the programmer uses constructs such as POSIX pthreads or the C# Parallel.For method to explicitly mark local regions of concurrent activity, which typically communicate through shared variables.
- Instruction-Level Parallelism: the imperative program (or a local region of it) is converted to dataflow form, in which all ALU operations can potentially run in parallel, but operands remain prerequisites of their results, and load/store operations on a given mutable object must respect program order.
A major (yet sadly less popular) alternative to thread-level parallelism is programmer-defined channel-based communication, which bans mutable shared variables (examples: Erlang, Occam, Handel-C, Kahn process networks).