§ Beware of Huffyuv's "Predict median" Mode

Ben Rudiak-Gould's Huffyuv is a fairly common lossless video codec on Windows. Its main advantages are fast compression speeds, lossless support for both YCbCr and RGB video, and of course, it's free. Because it's lossless, the compression ratios that can be obtained are limited, but generally you can get around 1.6:1 to 2.2:1 on average, and that's a big help when you can use it during real-time video capture.

There is, however, a bear trap in its settings.

If you bring up the configuration dialog for the video codec (in VirtualDub: Video > Compression, select Huffyuv, then Configure), there are two combo boxes at the top of the dialog which set the prediction modes for YUY2 and RGB video. The prediction mode changes the way that Huffyuv attempts to detect patterns in the video, and thus the speed and effectiveness of the compression. Most of the modes are fast for both compression and decompression, but you should be aware of the Predict Median mode on the YUY2 side. I've made the mistake of picking this mode before, and while it's fast for compression, it's unexpectedly slow for decompression, because the Predict Median predictor goes through a non-vectorized scalar code path on the decompression side. This won't affect the contents of your video, of course, and if decompression turns out to be too slow for your post-processing work, you can always recompress with another Huffyuv predictor thanks to the lossless nature of the codec.

I should note that I only have experience with the last official version of Huffyuv, 2.1.1. There have been some unofficial updates to Huffyuv to add workarounds for compatibility issues with applications and to add YV12 support, but I don't know if those include any speed improvements to the predictor. I looked into speeding up the median predictor at one point but was unable to do so, and I'm guessing that it's a hard problem because it has a lot in common with the Paeth predictor in PNG, which is also inherently serial and hard to parallelize. I also looked into rewriting the bitstream parser in C++, but I tried using compiler intrinsics and of course ended up hanging both the VC8 and VC9 compilers.
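To see why a median predictor resists vectorization on the decoding side, here is a minimal Python sketch for a single 8-bit channel. It assumes the commonly described form of the predictor (the median of the left sample, the sample above, and the gradient left + above - above-left, wrapped to 8 bits). The function names and the treatment of the first row and column are illustrative assumptions, not Huffyuv's actual code or frame layout.

```python
def median3(a, b, c):
    """Return the middle value of three integers."""
    return max(min(a, b), min(max(a, b), c))


def encode_rows(rows):
    """Turn samples into residuals. Every prediction uses original samples,
    all of which are known up front, so pixels can be processed in any order."""
    out = []
    for y, row in enumerate(rows):
        res = []
        for x, v in enumerate(row):
            # Assumption: neighbors outside the frame are treated as 0.
            left = row[x - 1] if x else 0
            above = rows[y - 1][x] if y else 0
            above_left = rows[y - 1][x - 1] if x and y else 0
            pred = median3(left, above, (left + above - above_left) & 0xFF)
            res.append((v - pred) & 0xFF)
        out.append(res)
    return out


def decode_rows(residuals):
    """Invert encode_rows. 'left' is the sample produced by the previous
    iteration, so each pixel must wait for its neighbor: a serial chain."""
    rows = []
    for y, res in enumerate(residuals):
        row = []
        for x, r in enumerate(res):
            left = row[x - 1] if x else 0
            above = rows[y - 1][x] if y else 0
            above_left = rows[y - 1][x - 1] if x and y else 0
            pred = median3(left, above, (left + above - above_left) & 0xFF)
            row.append((pred + r) & 0xFF)
        rows.append(row)
    return rows
```

In this sketch the encoder reads only original samples, while the decoder cannot form the prediction for a pixel until the pixel to its left has been fully reconstructed. That asymmetry is at least consistent with compression being fast and decompression being slow.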

The MultimediaWiki has a description of the Huffyuv compression algorithm: http://wiki.multimedia.cx/index.php?title=HuffYUV. Unfortunately, it seems to be missing a few critical details, most notably the exact nature of the VLC bitstream: 32-bit words in little endian format, codewords placed starting from the MSB, Huffman codes allocated longest codeword first, and with a maximum codeword length of 31 bits.
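The word and bit ordering described above can be sketched as a small bit reader. This is an illustration based only on the details listed here (32-bit little-endian words, codewords packed starting from the most significant bit). The class name `BitReader` and its interface are hypothetical, and the Huffman table lookup itself is omitted.

```python
import struct


class BitReader:
    """Reads bits MSB-first from a buffer of 32-bit little-endian words."""

    def __init__(self, data):
        if len(data) % 4:
            raise ValueError("buffer must be a whole number of 32-bit words")
        # '<I' decodes each 4-byte group as a little-endian unsigned integer.
        self.words = struct.unpack("<%dI" % (len(data) // 4), data)
        self.pos = 0  # absolute bit position from the start of the stream

    def read(self, nbits):
        """Return the next nbits bits (1 to 31) as an unsigned integer."""
        if not 1 <= nbits <= 31:
            raise ValueError("nbits must be between 1 and 31")
        if self.pos + nbits > 32 * len(self.words):
            raise EOFError("read past end of bitstream")
        index, offset = divmod(self.pos, 32)
        # Join the current word with the next one so a field may straddle them.
        nxt = self.words[index + 1] if index + 1 < len(self.words) else 0
        window = (self.words[index] << 32) | nxt
        self.pos += nbits
        # Bit 63 of the window is the MSB of the current word.
        return (window >> (64 - offset - nbits)) & ((1 << nbits) - 1)
```

For example, the bytes 78 56 34 12 form the word 0x12345678, so reading 4 bits and then 8 bits from that buffer yields 0x1 and then 0x23. A decoder written for byte-oriented big-endian streams would see those same bytes in the wrong order.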

Incidentally, just like with Avisynth, I've noticed that people like to butcher the name of this codec for some reason. It's Huffyuv, first letter capitalized, rest in lowercase. It's been that way since 1.0.0 and appears that way in the documentation, dialogs, and source code. Yet, for some reason, people keep misnaming this codec HuffYUV.

Comments

Comments posted:


I don't find "HuffYUV" objectionable, but I do see people type "HuffyUV" a lot. People also tend to butcher my handle, typing "stickyboy" instead of "stickboy". Go figure.

As for AviSynth, although that's not the capitalization BenRG originally used, it is the capitalization used by the people who picked up the project after BenRG.

James - 26 05 08 - 21:43


Did you try something like this: http://forum.doom9.org/showthread.php?p=..

I gave it a shot but speed ended up being the same.

squid - 28 05 08 - 01:42


Ew, yuck. You definitely DON'T want to do that since you'll get killed on the transpose due to cache misses. Huffyuv has a low ALU-to-memory count, so it's pretty sensitive to issues like that (read: it's fast because it doesn't do much). Also, as the guy noted, you can't do that with the stock Huffyuv format due to the cylindrical dependency.

It is possible to decode U and V in parallel, but I don't know how useful that is with Y being the bottleneck. I guess if the operation is bottlenecked just by pure ALU op count then it could give a ~30% boost or so.

I'm currently working on implementing support for some of the extensions that have been introduced by others post 2.1.1, and I think I want to beat the person who introduced them. (Was it you?) The annoying addition is dynamic Huffman tables, which wouldn't be a problem except that they're ENDIAN SWAPPED. Basically, someone wrote their own decoder/encoder that worked by swizzling the entire frame from Huffyuv ordering (32-bit LE words) to big endian so standard JPEG/MPEG style bitstream routines would work, and then they added the Huffman table on the front and reswizzled the whole buffer. This means that not only is the Huffman table swizzled, but it also misaligns the rest of the frame so that it can't be decoded with the original bitstream decoder! Endian swizzling the entire buffer is not a good idea for speed. I think I can work around this by priming the decoder, but what a pain.

(In case you haven't guessed, I'm adding a Huffyuv decoder to VirtualDub.)

Phaeron - 28 05 08 - 04:24


Avery, for fast endian swap on Core 2 Duo and Penryn you can use PSHUFB instruction to swap 16 bytes at a time (best if unrolled to cache line size of course). I agree that HuffYUV is a mess.

As for the capitalization, it is simple -- Huffman + YUV, only tech savvy people will capitalize it like that. I suppose that those who write Huffyuv follow standard English capitalization rules.

Igor Levicki (link) - 28 05 08 - 12:06


The version I’ve been using for a long time is “HuffYUV revisited” 2.2.0
http://forum.doom9.org/showthread.php?t=..

BugsBunny - 28 05 08 - 16:08


I did add yv12, dynamic tables and a directshow encoder interface but I didn't release it into the wild. Any encodes using advanced features use the fourcc ADHF to avoid conflicts. For dynamic tables I stored the tables (in the same format as what goes in extradata) before the compressed frame data and they were extracted using the existing huffuv functions. The frequency of table recalculation could be set in the codec config so there were delta frames with virtually no speed loss when seeking as long as the app used the ICDECOMPRESS_PREROLL flag properly.
AFAIK the only other dynamic tables implementation comes from ffdshow/libavcodec...

squid - 28 05 08 - 16:26


Also, isn't the maximum codelength limited to 27 bits, with the remaining 5 bits used to store the length for faster encoding? I'm not sure about this but seem to recall something like it somewhere...

squid - 28 05 08 - 18:42


@Igor:
Yeah, but I doubt that endian swap would be ALU bound even if you just did BSWAP, which is single clock. SSE2 isn't so bad either (PSHUFLW + PSHUFHW + PSRLW + PSLLW + POR).

The main objection I have is that the second group of people decided to rename the frigging codec for no good reason.

@squid:
Might be a restriction of the 2.1.1 encoder, but the decoder should be able to decode anything up to 31 bits. (It ORs in the LSB in order to avoid having to test the flags from the BSR instruction.) I'm not sure what ffmpeg does, although when I tested ffvfw it just put the same Huffman tables in all frames for all channels, which isn't very dynamic.

This certainly has been more interesting than I had expected -- got pretty quick and decent responses from the VC++ team about the 64-bit codegen bugs I hit.

Phaeron - 28 05 08 - 23:55


They type HuffYUV to avoid confusion with ♥Huffyluv♥

user - 29 05 08 - 02:42


Avery, AFAIK HuffYUV actually works on R-G, G, and B-G for prediction so the name is a bit off anyway.

As for BSWAP, it is 1 clock but it can swap only 4 bytes at a time while PSHUFB shuffles 16 bytes per clock, and on Core i7 (Nehalem) CPUs PSHUFB has two 128-bit shuffler units so it executes at a rate of 0.5 clocks.

Igor Levicki (link) - 02 11 08 - 03:49


If you're willing to change the HuffYUV bitstream format, it's possible to make Median SIMDable by reordering the pixel scan in a diagonal manner, so that you can decode 16 independent pixels at the same time, thus allowing SIMD median. This would also speed up the other prediction modes.

Dark Shikari - 06 01 09 - 12:39
