HOME UP PREV NEXT (NOTITLE)

Discovering Parallelism: Classical HLS Paradigms

The well-known map-reduce paradigm allows the map part to be done in parallel.

An associative reduction operator gives the same answer under diverse bracketings.

Common associative operators are addition, multiplication, maximum and bitwise-OR. Our first example uses xor.

Here we look at some examples where the bodies of an iteration may or may not be run in parallel.

  public static int associative_reduction_example(int starting)
  {
    int vr = 0;
    for (int i=0;i<15;i++) // or also i+=4
      {
	int vx = (i+starting)*(i+3)*(i+5);
	vr ^= ((vx & 128)>0 ? 1:0);
      }
    return vr;
  }

Where the loop variable evolves linearly, variables and expressions in the body that depend linearly on the loop variable are termed linear induction variables/expressions and can be ignored for loop classification since their values will always be independently available in each loop body with minimal overhead.

A loop-carried dependency means that parallelisation of loop bodies is not possible.

Often the loop body can be split into a part that is and is-not dependent on the previous iteration, with the is-not parts run in parallel at least.

  public static int loop_carried_example(int seed)
  {
    int vp = seed;
    for (int i=0;i<5;i++)
      {
	vp = (vp + 11) * 31241/121;
      }
    return vp;
  }

A value read from an array in one iteration can be loop forwarded from one iteration to another using a holding register to save having to read it again.

  static int [] foos = new int [10];
  static int ipos = 0;
  public static int loop_forwarding_example(int newdata)
  {
    foos[ipos ++] = newdata;
    ipos %= foos.Length;
    int sum = 0;
    for (int i=0;i<foos.Length-1;i++)
      {
	sum += foos[i]^foos[i+1];
      }
    return sum;
  }

Data-dependent exit conditions also limit parallelisation, although a degree of speculation can be harmless. How early in a loop body the exit condition can be determined is an important consideration and compilers will pull this to the start of the block schedule (as illustrated earlier).

When speculating we continue looping but provide a mechanism to discard unwanted side effects.

  public static int data_dependent_controlflow_example(int seed)
  {
    int vr = 0;
    int i;
    for (i=0;i<20;i++)
      {
	vr += i*i*seed;
	if (vr > 1111) break;
      }
    return i;
  }

The above examples have been demonstrated using Kiwi HLS on the following link »Kiwi Common HLS Paradigms Demonstrated. »Wikipedia:Loop Dependence