All of these tricks are intended to save a factor of at least two (and often ten or more) by local changes to code. Of course, algorithmic improvement can provide much greater efficiency gains, so I am assuming that you have done all you can in this respect BEFORE using these tricks.
int parity(unsigned8 x) {
    /* The first MULT makes 3 copies of x; the AND then selects each bit of x */
    /* exactly once, 3 bit positions apart. The second MULT adds all of them  */
    /* up in bit 21 (cf. long multiplication); this is OK since there can be  */
    /* no carry in to bit 21. Unsigned constants keep the high-order overflow */
    /* of the second MULT well defined. */
    return ((((x * 0x10101u) & 011111111u) * 011111111u) >> 21) & 1;
}
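As a sanity check on the argument in the comments, here is a minimal sketch that compares the multiply-based parity with a plain bit loop over all 256 inputs. It assumes a C99 compiler with <stdint.h>; the names parity_mul, parity_loop and parity_selfcheck are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply-based parity of an 8-bit value, done in unsigned arithmetic. */
int parity_mul(uint8_t x) {
    /* 3 copies of x, then keep every third bit: each bit of x survives once. */
    uint32_t spread = ((uint32_t)x * 0x10101u) & 011111111u;
    /* The second multiply sums the 8 surviving bits into bit 21. */
    return (int)(((spread * 011111111u) >> 21) & 1u);
}

/* Reference version: XOR the bits together one at a time. */
int parity_loop(uint8_t x) {
    int p = 0;
    while (x) {
        p ^= x & 1;
        x >>= 1;
    }
    return p;
}

/* Exhaustive comparison over every 8-bit input. */
void parity_selfcheck(void) {
    for (unsigned v = 0; v < 256; v++)
        assert(parity_mul((uint8_t)v) == parity_loop((uint8_t)v));
}
```

The intermediate product does not fit in 32 bits, which is why the sketch does everything in unsigned arithmetic: the wrap-around only discards bits above bit 21.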
int haszerobyte(unsigned32 x) { return ((x - 0x01010101) & (~x) & 0x80808080) != 0; }
int haszerobyte(unsigned64 x) { return ((x - 0x0101010101010101) & (~x) & 0x8080808080808080) != 0; }
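A hedged sketch of the zero-byte test using the standard fixed-width types, with a byte-by-byte reference to compare against. Plain C has no overloading, so the two widths get separate names here; has_zero_byte32, has_zero_byte64, has_zero_byte_loop and has_zero_byte_mismatches are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero result iff at least one of the 4 bytes of x is zero. */
int has_zero_byte32(uint32_t x) {
    return ((x - 0x01010101u) & ~x & 0x80808080u) != 0;
}

/* Same test on 8 bytes; note the 16 hex digits in each constant. */
int has_zero_byte64(uint64_t x) {
    return ((x - 0x0101010101010101ull) & ~x & 0x8080808080808080ull) != 0;
}

/* Reference version: inspect the low nbytes bytes one at a time. */
int has_zero_byte_loop(uint64_t x, int nbytes) {
    assert(nbytes >= 1 && nbytes <= 8);
    for (int i = 0; i < nbytes; i++)
        if (((x >> (8 * i)) & 0xFFu) == 0)
            return 1;
    return 0;
}

/* Counts disagreements with the reference over a pseudo-random sample. */
int has_zero_byte_mismatches(void) {
    int bad = 0;
    uint64_t v = 0x0123456789abcdefull;
    for (int i = 0; i < 100000; i++) {
        v = v * 6364136223846793005ull + 1442695040888963407ull; /* LCG step */
        uint32_t w = (uint32_t)(v >> 32);
        if (has_zero_byte64(v) != has_zero_byte_loop(v, 8)) bad++;
        if (has_zero_byte32(w) != has_zero_byte_loop(w, 4)) bad++;
    }
    return bad;
}
```

The test works because subtracting 1 from a byte sets its top bit only if the byte was zero or was already 0x81 or larger, and the `& ~x` term rules out the second case.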
int count_ones(unsigned36 x) { return ((x * 01001001001) & 0x111111111) % 15; } /* x must be a 9 bit quantity */
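The 36-bit type above is not available on most machines, so here is a minimal sketch of the same 9-bit population count in a 64-bit word, checked against a bit loop. The names count_ones9, count_ones_loop and count_ones9_mismatches are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 1 bits in a 9-bit quantity. */
int count_ones9(uint64_t x) {
    assert(x < 512);                      /* only valid for 9-bit inputs */
    /* 4 copies of x (9 bits apart), then keep every 4th bit, so that    */
    /* each bit of x lands alone in its own hexadecimal digit.           */
    uint64_t digits = (x * 01001001001ull) & 0x111111111ull;
    /* Casting out 15s adds up the hexadecimal digits (total is <= 9).   */
    return (int)(digits % 15);
}

/* Reference version: count the bits one at a time. */
int count_ones_loop(uint64_t x) {
    int n = 0;
    while (x) {
        n += (int)(x & 1);
        x >>= 1;
    }
    return n;
}

/* Counts disagreements with the reference over all 512 inputs. */
int count_ones9_mismatches(void) {
    int bad = 0;
    for (uint64_t v = 0; v < 512; v++)
        if (count_ones9(v) != count_ones_loop(v)) bad++;
    return bad;
}
```

The remainder step is exact because 16 is congruent to 1 modulo 15, so the value of the masked word modulo 15 is just the sum of its hexadecimal digits.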
;if A is a 9 bit quantity, B gets number of 1's (Schroeppel)
        IMUL A,[1001001001]     ;4 copies
        AND A,[42104210421]     ;every 4th bit
        IDIVI A,17              ;casting out 15.'s in hexadecimal

;if A is 6 bit quantity, B gets 6 bits reversed (Schroeppel)
        IMUL A,[2020202]        ;4 copies shifted
        AND A,[104422010]       ;where bits coincide with reverse repeated base 2^8
        IDIVI A,377             ;casting out 2^8 - 1's

;reverse 7 bits (Schroeppel)
        IMUL A,[10004002001]    ;4 copies sep by 000's base 2 (may set arith. o'flow)
        AND A,[210210210010]    ;where bits coincide with reverse repeated base 2^8
        IDIVI A,377             ;casting out 377's

;reverse 8 bits (Schroeppel)
        MUL A,[100200401002]    ;5 copies in A and B
        AND B,[20420420020]     ;where bits coincide with reverse repeated base 2^10
        ANDI A,41               ;"
        DIVI A,1777             ;casting out 2^10 - 1's
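The three bit-reversal routines can be carried over to C by reusing the octal constants unchanged and letting a 64-bit word stand in for the PDP-10 registers. What follows is a sketch under that assumption, not a literal translation: the function names are invented here, and in the 8-bit case the two masks applied to A and B are merged into one 64-bit mask, on the assumption that B holds the low 35 bits of the double-word product.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse a 6-bit quantity. */
unsigned reverse6(uint64_t x) {
    assert(x < 64);
    return (unsigned)(((x * 02020202ull) & 0104422010ull) % 0377);
}

/* Reverse a 7-bit quantity; 64 bits leave room for the 37-bit product. */
unsigned reverse7(uint64_t x) {
    assert(x < 128);
    return (unsigned)(((x * 010004002001ull) & 0210210210010ull) % 0377);
}

/* Reverse an 8-bit quantity. */
unsigned reverse8(uint64_t x) {
    assert(x < 256);
    /* High-word mask 41 sits above the 35 low bits; equals 0x10884422010. */
    const uint64_t mask = ((uint64_t)041 << 35) | 020420420020ull;
    return (unsigned)(((x * 0100200401002ull) & mask) % 01777);
}

/* Reference version: move bit i to bit nbits-1-i. */
unsigned reverse_loop(unsigned x, int nbits) {
    unsigned r = 0;
    for (int i = 0; i < nbits; i++)
        r |= ((x >> i) & 1u) << (nbits - 1 - i);
    return r;
}

/* Counts disagreements with the reference over every valid input. */
int reverse_mismatches(void) {
    int bad = 0;
    for (unsigned v = 0; v < 64; v++)
        if (reverse6(v) != reverse_loop(v, 6)) bad++;
    for (unsigned v = 0; v < 128; v++)
        if (reverse7(v) != reverse_loop(v, 7)) bad++;
    for (unsigned v = 0; v < 256; v++)
        if (reverse8(v) != reverse_loop(v, 8)) bad++;
    return bad;
}
```

In each case the mask keeps one copy of every input bit at a position whose index, taken modulo 8 (or modulo 10 for the 8-bit version), is the bit's reversed position, so the final remainder simply folds the kept bits into place.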