## ¶SIMD averaging without SIMD hardware

If you need to average two sets of bitfields using larger word operations, there is a trick derived from a hardware algorithm for constructing adders that can come in handy. The basic comes from the design of a *carry-save adder*, which splits apart carry and sum signals to allow multiple values to be added with only one carry propagation pass at the end:

a+b = ((a & b) << 1) + (a ^ b)

(& is the C operator for bitwise AND, ^ is bitwise exclusive OR, and << is a left shift.)

To convert this into an average, propagate through the necessary shift:

(a+b) >> 1 = (a & b) + ((a ^ b) >> 1)

...and to apply this to bitfields, just mask off the least significant bits from each bitfield sum to avoid cross-field contamination:

(pel0+pel1) >> 1 = (pel0 & pel1) + (((pel0 ^ pel1) & 0xfefefefe)>>1)

This can be useful even if you have SIMD hardware that has built-in averaging. For instance, SSE has support for unsigned byte averages, but the above algorithm can be used to average four 565 pixels at a time by using the mask 0xf7def7def7def7de.

Note that the above algorithm always rounds down, which isn't always appropriate; MPEG motion prediction, for instance, requires rounding up. There is a trick that can be used to average up without using any additional operations.

The trick is to recognize that the addition can also be formulated using inclusive OR and subtraction:

a+b = ((a & b) << 1) + (a^b) = ((a|b) << 1) - (a^b)

Shifting out the LSB from the exclusive OR result then has the effect of rounding up instead of rounding down, giving:

(a+b+1) >> 1 = (a|b) - ((a^b)>>1)

(pel0+pel1+1)>>1 = (pel0 | pel1) - (((pel0 ^ pel1) & 0xfefefefe)>>1)