§ Alpha blending without SIMD support

Now that we've covered averaging bitfields, how do you efficiently alpha blend with a factor other than one-half?

Alpha blending is normally done using an operation known as linear interpolation, or lerp:

lerp(a, b, f) = a*(1-f) + b*f = a + (b-a)*f

...where a and b are the values to be blended, and f is a blend factor from 0-1 where 0 gives a and 1 gives b. To blend a packed pixel, you could just expand all channels of the source and destination pixels and do the blend in floating point, but it really hurts to see this, since the code turns out nasty on practically any platform. Unless you've got a platform that gets pixels in and out of a floating-point vector really easily, you should use integer math for fast alpha blending.

So how do you blend quickly without resorting to per-channel operations?

First, if you are dealing with an alpha channel instead of a constant alpha value, chances are that the alpha value ranges from 0 to 2^N-1, which is not a convenient factor for division. You could cheat and just divide by 2^N, but that leads to the unpleasant result of either the fully transparent or opaque case not working correctly (sloppy). Conditionally adding one to the alpha value fixes this at the cost of introducing a tiny amount of error. I used to add the high bit, thus mapping [128,255] to [129,256] for 8-bit values; I'm told that shifting [1,255] to [2,256] leads to better accuracy. Either way will prevent the glaring error cases, though.
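To make the two remappings concrete, here is a minimal C sketch for 8-bit alpha. The function names are mine and purely illustrative. Both variants map 0 to 0 and 255 to 256, so a later shift by 8 handles the fully transparent and fully opaque cases exactly.

```c
#include <stdint.h>

/* Hypothetical helper: add the high bit, so [128,255] becomes [129,256]. */
uint32_t alpha_add_high_bit(uint32_t a) {
    return a + (a >> 7);
}

/* Hypothetical helper: add one to any nonzero alpha, so [1,255] becomes [2,256]. */
uint32_t alpha_add_nonzero(uint32_t a) {
    return a + (a != 0);
}
```

Both assume the input is already in the range 0-255; neither checks for larger values.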

The next step is to reformulate the blend equation in terms of integer math:

lerp(a, b, f) = a + (((b-a) * f + round) >> shift)

where round = 1 << (shift - 1).
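As a rough sketch of that formula for a single 8-bit channel, assuming shift = 8 and a factor f that runs from 0 to 256 (the name lerp_u8 is hypothetical):

```c
#include <stdint.h>

/* Blend one 8-bit channel: f = 0 returns a, f = 256 returns b.
   shift = 8, so round = 1 << 7 = 128. */
uint32_t lerp_u8(uint32_t a, uint32_t b, uint32_t f) {
    /* (b - a) wraps around when b < a. The wrapped high bits are
       discarded by the final mask, so the low 8 bits stay correct. */
    return (a + (((b - a) * f + 128u) >> 8)) & 0xffu;
}
```

Using unsigned arithmetic here is a deliberate choice: it avoids right-shifting a negative signed value, which C leaves implementation-defined.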

To eliminate some of the pack/unpack work, realize that you can alpha blend a channel in place as long as you isolate it and have enough headroom above it in the machine word to accommodate the intermediate result of the multiply. In other words, instead of extracting red = (pixel >> 16)&0xff, blending that, and then shifting it back up, simply blend (pixel & 0x00ff0000).

Now, the magic: you can actually do more than one bitfield this way as long as you have enough space between them. If you have two non-overlapping bitfields combined as (a << shift1) + (b << shift2), multiplying their combined form by an integer gives the same result as splitting them apart, multiplying each, and then recombining. For a 565 pixel, you could thus blend red and blue in the following manner (remember that the red/green/blue masks for 565 are 0xf800, 0x07e0, and 0x001f, respectively):

rbsrc = src & 0xf81f
rbdst = dst & 0xf81f
rbout = ((rbsrc * f + rbdst * (32-f) + 0x8010) >> 5) & 0xf81f

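Which leads to the surprising result that you can safely subtract one set of packed bitfields from the other and scale the difference without any fancy SIMD bitfield support. A negative difference in one field borrows from the field above it, but with wrapping unsigned arithmetic the borrow is undone when rbdst is added back, and the final mask discards anything left above the top field.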

rbout = (rbdst + (((rbsrc - rbdst) * f + 0x8010) >> 5)) & 0xf81f
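Here is a small C sketch of both forms next to a plain per-channel reference, so the equivalence can be checked. It assumes 32-bit unsigned arithmetic and f in the range 0-32; the function names are made up for this example.

```c
#include <stdint.h>

/* Weighted-sum form: red and blue of a 565 pixel blended in one multiply each. */
uint32_t blend_rb565_sum(uint32_t dst, uint32_t src, uint32_t f) {
    uint32_t rbsrc = src & 0xf81f;
    uint32_t rbdst = dst & 0xf81f;
    return ((rbsrc * f + rbdst * (32 - f) + 0x8010) >> 5) & 0xf81f;
}

/* Difference form: one multiply, relies on unsigned wraparound for borrows. */
uint32_t blend_rb565_diff(uint32_t dst, uint32_t src, uint32_t f) {
    uint32_t rbsrc = src & 0xf81f;
    uint32_t rbdst = dst & 0xf81f;
    return (rbdst + (((rbsrc - rbdst) * f + 0x8010) >> 5)) & 0xf81f;
}

/* Per-channel reference: unpack, blend each channel, repack. */
uint32_t blend_rb565_ref(uint32_t dst, uint32_t src, uint32_t f) {
    uint32_t rs = (src >> 11) & 31, bs = src & 31;
    uint32_t rd = (dst >> 11) & 31, bd = dst & 31;
    uint32_t r = (rs * f + rd * (32 - f) + 16) >> 5;
    uint32_t b = (bs * f + bd * (32 - f) + 16) >> 5;
    return (r << 11) | b;
}
```

The case where red increases while blue decreases (for example dst = 0x001f, src = 0xf800) is the one that exercises the borrow between fields.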

The remaining green channel is easy. Doing it this way does limit precision in the blend factor, since you're limited to the number of bits of headroom you have, but five bits for 565 is decent. If you also have an alpha channel to blend, you can do so, although you might need to temporarily shift down green and alpha together to make headroom at the top of the machine word if you're dealing with a big pixel.
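A sketch of both cases, under the assumption of 32-bit unsigned math. The first function blends the 565 green channel in place with f from 0 to 32. The second handles an 8888 pixel with a constant factor f from 0 to 256, shifting alpha and green down by 8 bits to make headroom as described. All names are illustrative.

```c
#include <stdint.h>

/* Green of a 565 pixel, blended where it sits.
   The rounding constant is 16 moved up to green's position (bit 5). */
uint32_t blend_g565(uint32_t dst, uint32_t src, uint32_t f) {
    uint32_t gsrc = src & 0x07e0;
    uint32_t gdst = dst & 0x07e0;
    return (gdst + (((gsrc - gdst) * f + 0x0200) >> 5)) & 0x07e0;
}

/* 8888 pixel, f = 0 gives dst and f = 256 gives src.
   Red/blue are blended in place; alpha/green are shifted down by 8
   first so the multiply has headroom at the top of the word. */
uint32_t blend8888(uint32_t dst, uint32_t src, uint32_t f) {
    uint32_t rbsrc = src & 0x00ff00ff;
    uint32_t rbdst = dst & 0x00ff00ff;
    uint32_t agsrc = (src >> 8) & 0x00ff00ff;
    uint32_t agdst = (dst >> 8) & 0x00ff00ff;

    uint32_t rb = (rbdst + (((rbsrc - rbdst) * f + 0x00800080) >> 8)) & 0x00ff00ff;
    uint32_t ag = (agdst + (((agsrc - agdst) * f + 0x00800080) >> 8)) & 0x00ff00ff;

    return rb | (ag << 8);
}
```

Note that blend8888 interpolates the alpha channel like any other channel; whether that is the right thing to store in the output alpha depends on the application.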

What if you don't have a hardware multiply, or the one you have is very slow? Well, you might use lookup tables, then. Ideally, though, you'd like to avoid inserting and extracting the channels again. One dirty trick you can use revolves around the fact that multiplication distributes over addition, and a bitfield value is just the sum of its bits, so the lookup tables can be indexed off the raw bytes instead of the channels:

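// Assumes unsigned is 32 bits wide.
unsigned blend565tab[33][2][256];

void init() {
    for(unsigned alpha=0; alpha<=32; ++alpha) {
        unsigned f = alpha;

        for(unsigned i=0; i<256; ++i) {
            // High byte: red in bits 3-7, upper three bits of green in bits 0-2.
            // Only half of the rounding constant is stored, because this table
            // is looked up twice per blend (once for src, once for dst).
            blend565tab[alpha][1][i] = (((i & 0xf8)*f) << 19) + (((i & 0x07)*f) <<  3) + (0x04008010 >> 1);

            // Low byte: lower three bits of green in bits 5-7, blue in bits 0-4.
            blend565tab[alpha][0][i] = (((i & 0xe0)*f) >>  5) + (((i & 0x1f)*f) << 11);
        }
    }
}

// dst and src are 16-bit 565 pixels; alpha ranges from 0 (all dst) to 32 (all src).
unsigned blend565(unsigned dst, unsigned src, unsigned alpha) {
    unsigned ialpha = 32-alpha;
    unsigned sum = blend565tab[alpha][0][src & 0xff] + blend565tab[ialpha][0][dst & 0xff]
                 + blend565tab[alpha][1][src >> 8] + blend565tab[ialpha][1][dst >> 8];

    // Red and blue end up in the upper 16 bits, green in the lower 16 bits.
    sum &= 0xf81f07e0;
    return (sum & 0xffff) + (sum >> 16);
}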

It may look odd because we're actually splitting the green bitfield between the two lookup tables, but it works -- essentially, it's combining partial products from the lower and upper halves of the green bitfield. I've also thrown the rounding constant into the tables to save an addition. The table's rather big at 67K, but if you are doing alpha blending off of a constant, you can cache pointers to the two pertinent rows and then only 4K of tables are used, which is much nicer on the cache. The shifting/masking in the table lookups is also unnecessary if you load the source pixels as pairs of bytes instead of as words.
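For the constant-alpha case, the row caching might look something like the sketch below. It rebuilds the same table layout as the listing above so that it stands alone; blend565_span and init_blend565_tables are hypothetical names, and the pixel buffers are assumed to be arrays of 16-bit 565 values.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t blend565tab[33][2][256];

/* Same table layout as the listing above. */
void init_blend565_tables(void) {
    for (uint32_t f = 0; f <= 32; ++f) {
        for (uint32_t i = 0; i < 256; ++i) {
            blend565tab[f][1][i] = (((i & 0xf8) * f) << 19) + (((i & 0x07) * f) << 3) + (0x04008010 >> 1);
            blend565tab[f][0][i] = (((i & 0xe0) * f) >> 5) + (((i & 0x1f) * f) << 11);
        }
    }
}

/* Blend n source pixels over dst with one constant alpha (0-32).
   Only the two cached rows (2K each) are touched inside the loop. */
void blend565_span(uint16_t *dst, const uint16_t *src, size_t n, uint32_t alpha) {
    uint32_t (*srow)[256] = blend565tab[alpha];
    uint32_t (*drow)[256] = blend565tab[32 - alpha];

    for (size_t i = 0; i < n; ++i) {
        uint32_t s = src[i], d = dst[i];
        uint32_t sum = srow[0][s & 0xff] + srow[1][s >> 8]
                     + drow[0][d & 0xff] + drow[1][d >> 8];
        sum &= 0xf81f07e0;
        dst[i] = (uint16_t)((sum & 0xffff) + (sum >> 16));
    }
}
```

init_blend565_tables has to be called once before the first blend; the span routine does not validate alpha.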

Incidentally, if you think about it, this trick can also be used to convert any bitfield-based 16-bit packed pixel format to any other bitfield-based pixel format up to 32 bits with a single routine, just by changing 2K of tables. This generally isn't worthwhile if you have a SIMD multiplier -- Intel's MMX application notes describe how you can abuse MMX's pmaddwd instruction to convert 8888 to 565 at about 2.1 clocks/pixel -- but it can be handy if you find yourself without a hardware multiplier or even a barrel shifter.
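As an illustration of the conversion idea, here is a sketch that turns 565 into 8888 with two 256-entry tables (2K total). Expanding 5 and 6 bits to 8 by replicating the top bits is my own choice for the example, not something the trick requires, and the names are made up. The alpha byte of the result is left at zero.

```c
#include <stdint.h>

static uint32_t cvt_lo[256];  /* indexed by the low byte of the 565 pixel  */
static uint32_t cvt_hi[256];  /* indexed by the high byte of the 565 pixel */

void init_565_to_8888(void) {
    for (uint32_t i = 0; i < 256; ++i) {
        /* High byte: red in bits 3-7, upper three bits of green in bits 0-2. */
        uint32_t r5  = i >> 3;
        uint32_t ghi = i & 7;
        uint32_t r8  = (r5 << 3) | (r5 >> 2);
        /* Green's top three bits go to bits 5-7 of the 8-bit result, and
           its top two bits are replicated into bits 0-1. */
        uint32_t g8hi = (ghi << 5) | (ghi >> 1);
        cvt_hi[i] = (r8 << 16) | (g8hi << 8);

        /* Low byte: lower three bits of green in bits 5-7, blue in bits 0-4. */
        uint32_t glo = i >> 5;
        uint32_t b5  = i & 0x1f;
        uint32_t b8  = (b5 << 3) | (b5 >> 2);
        cvt_lo[i] = ((glo << 2) << 8) | b8;
    }
}

/* The two partial results occupy disjoint bits, so a plain add combines them. */
uint32_t convert_565_to_8888(uint32_t p) {
    return cvt_hi[(p >> 8) & 0xff] + cvt_lo[p & 0xff];
}
```

Targeting a different output format would only mean filling the two tables differently; the conversion routine itself stays the same.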

Comments

I'm a bit confused, can you show how blending is done between an ARGB (foreground) and RGB32 (background, no alpha data, same bit positions for RGB values)?

Blight - 08 07 06 - 16:44


unsigned blend2(unsigned src, unsigned dst) {
    unsigned alpha = src >> 24;
    alpha += (alpha > 0);

    unsigned srb = src & 0xff00ff;
    unsigned sg = src & 0x00ff00;
    unsigned drb = dst & 0xff00ff;
    unsigned dg = dst & 0x00ff00;

    unsigned orb = (drb + (((srb - drb) * alpha + 0x800080) >> 8)) & 0xff00ff;
    unsigned og = (dg + (((sg - dg ) * alpha + 0x008000) >> 8)) & 0x00ff00;

    return orb+og;
}

Phaeron - 08 07 06 - 17:01


Another nice trick in this area is using premultiplied inverse alpha when you need to stack together a lot of images with alpha channel before blending them on top of the destination image/video.

Haali - 08 07 06 - 18:22


Thanks Phaeron, quite informative.

Also, you can skip the blend entirely if alpha = 0 or alpha = 255 (just keep the destination or copy the source unchanged), which actually speeds things up considerably.

Blight - 09 07 06 - 07:08


Thanks for this great information!
I have a question: this is the integer math version of the formula:

lerp(a, b, f) = a + (((b-a) * f + round) >> shift)

where round = 1

dudez - 26 07 06 - 17:40


Where does "round" come from?!

dudez - 26 07 06 - 18:26


It's just the fixed-point version of 0.5. The shift rounds toward negative infinity, so the bias is necessary to minimize average error.

Phaeron - 27 07 06 - 02:18


Thank you. I thought it was the fixed-point version of 0.5, but it's better that you confirmed it ;)

dudez - 27 07 06 - 10:04


Hi,

lerp(a, b, f) = a + (((b-a) * f + round) >> shift)

Don't you think you'll be screwing with the color channels by doing (b-a)? Say, if Blue(b) < Blue(a), then a carry from RED(b) will screw up GREEN(b)-GREEN(a).

reprobate - 28 09 09 - 18:56


Yes, in the general case. That's kind of the point of this post -- you can do multiple channels at a time via bit masks if you have enough padding bits between the bitfields to accommodate the extra precision produced by the multiplication.

Phaeron - 29 09 09 - 15:17
