HOME UP PREV NEXT (Polyhedral Address Mapping)

Discovering Parallelism: Classical HLS Paradigms

The map-reduce paradigm allows the map part to be done in parallel.

An associative reduction operator gives the same answer under diverse bracketings.

Common associative operators are addition, multiplication, maximum and bitwise-OR. Our first example uses xor.

Here we look at some examples where the bodies of an iteration may or may not be run in parallel.

  public static int associative_reduction_example(int starting)
  {
    int vr = 0;
    for (int i=0;i<15;i++) // or also i+=4
      {
	int vx = (i+starting)*(i+3)*(i+5);
	vr ^= ((vx & 128)>0 ? 1:0);
      }
    return vr;
  }

Where the loop variable evolves linearly, variables and expressions in the body that depend linearly on the loop variable are termed linear induction variables/expressions and can be ignored for loop classification since their values will always be independently available in each loop body with minimal overhead.

A loop-carried dependency means that parallelisation of loop bodies is not possible.

Often the loop body can be split into a part that is and is-not dependent on the previous iteration, with the is-not parts run in parallel at least.

  public static int loop_carried_example(int seed)
  {
    int vp = seed;
    for (int i=0;i<5;i++)
      {
	vp = (vp + 11) * 31241/121; // (fixed)
      }
    return vp;
  }

A value read from an array in one iteration can be loop forwarded from one iteration to another using a holding register to save having to read it again.

  static int [] foos = new int [10];
  static int ipos = 0;
  public static int loop_forwarding_example(int newdata)
  {
    foos[ipos ++] = newdata;
    ipos %= foos.Length;
    int sum = 0;
    for (int i=0;i<foos.Length-1;i++)
      {
	sum += foos[i]^foos[i+1];
      }
    return sum;
  }

Data-dependent exit conditions also limit parallelisation, although a degree of speculation can be harmless. How early in a loop body the exit condition is determined is an important consideration.

When speculating we continue looping but provide a mechanism to discard unwanted side effects.

  public static int data_dependent_controlflow_example(int seed)
  {
    int vr = 0;
    int i;
    for (i=0;i<20;i++)
      {
	vr += i*i*seed;
	if (vr > 1111) break;
      }
    return i;
  }

The above examples have been demonstrated using Kiwi HLS on the following link »Kiwi Common HLS Paradigms Demonstrated. »Wikipedia:Loop Dependence

Compared with software compilers, like gcc, HLS tools are currently relatively immature.

Here is an example from Vivado HLS of a false positive on loop-carried dependency. The following refuses to compile. »LINK

//directive PIPELINE set
uint8_t process_image(uint8_t pxVal)
{
    static unsigned int x = 0;
    static uint8_t lines[16][WIDTH];
       
         //directive UNROLL set
    for(int i=0; i < 15; i++) {
        lines[i][x] = lines[i + 1][x];
    }
    lines[15][x] = pxVal;
    uint8_t result = lines[1][x];
    x = (x + 1) == WIDTH ? 0 : (x + 1);
    return result;
}

Why does this fail? When we convert from a 2-D to a 1-D array by mutiplying up the indecies we loose information that the subscripts are manifestly distinct.