§ VirtualDub 1.8.1 released

I've finally pushed out 1.8.1 as a stable release, which promotes the new features added in 1.8.0 to stable status. If you are interested in features that were previously only in the 1.8.0 experimental version or are having trouble with the previous 1.7.8 stable version, I recommend checking out 1.8.1 to see if your issue has been fixed. This version was supposed to go out last week, but I was stymied by a login problem in the SourceForge upload system.

There are a couple of new features in this release, the main one being the distributed job support, which allows using different instances of VirtualDub to create and run jobs, and you can even use multiple machines if you get all the paths and plugins synchronized. It uses a filesystem-based sharing model, as it's designed for ease of setup with a few machines rather than a whole cluster farm. There is also now support for running the video compressor in a second thread for better dual-core performance, and now that I have a readily available 64-bit station again, I'm starting to bring the AMD64 build up to feature parity with the 32-bit version.

I've been really busy at work lately, so I'm afraid my progress on VirtualDub will be substantially slowed for the next few months (more so than usual), but I do have a few things already in flight for 1.8.2/1.8.3. Nothing big, but one of the things I've been doing is reversing some of my assembly-only features to C so I have an easily maintained baseline and so that the features can be enabled for AMD64. I've also been doing some prototypes here and there for the next big change, but nothing concrete's surfaced yet.


§ 3D graphics acceleration over Remote Desktop

A friend of mine tipped me off to an interesting post on DIRECTXDEV about GPU acceleration over Remote Desktop. After some investigation with a couple of machines, I discovered that yes, it is actually true that you can run 3D apps over Remote Desktop (Terminal Server). The key is that the machine you are remoting into must be running Windows Vista with a WDDM driver. Beyond that, it actually works as expected, although a bit slow. Some details:

Now, the downside: it's slow. Really slow. So slow that I'd say it's basically unusable unless you're on a LAN, and a fast one at that. It looks like Terminal Services tries to send over the data from every Present(), and it blocks the app until that happens, instead of just skipping frames. It eats a lot of CPU in the process, too, instead of just waiting. I was just barely able to push 640x360 video at 24 fps with VirtualDub's D3D9 display minidriver, which was throwing about 10MB/sec across the gigabit LAN here -- probably about the best that the consumer-level hub can do. Readback doesn't seem to be a problem, at least not on the GeForce 6 and 8 cards I have here, because everything speeds back up if you cover enough of the 3D window.

Basically, this means that 3D support is usable with apps that just use it for 2D acceleration or otherwise static rendering, but anything dynamic like a video player or a game is going to be far too slow. It might be more viable if Microsoft had implemented frame dropping, but it looks like VNC may still be better for that. What this is definitely good for, though, is running GPU accelerated apps remotely. On XP, this isn't possible over Remote Desktop because as soon as you log in all your apps are pushed onto the software-only driver. On Vista, though, they could continue to run on the server-side GPU, and performance isn't a problem if the app isn't displaying the result continuously.


§ A bit of an unfortunate icon mixup

Windows caches icons in several places, one being the shell, and another being in VRAM by the display driver. Whenever any of these caches goofs up, the wrong icons show up. This is a particularly unfortunate mixup:

[Firefox with IE icon]

Looks like the browser war is escalating inside my computer....


§ Does VirtualDub do a color conversion when converting I420 to YV12?

No, current versions of VirtualDub do lossless conversions between I420 and YV12.

I420 and YV12 are FOURCCs for uncompressed video formats that use the YCbCr color space with 4:2:0 sampling and with the images stored as back-to-back, non-interleaved planes. The only difference between them is that the Cb and Cr planes are swapped in the encoding order. Parts of VirtualDub's video processing pipeline were made YCbCr-aware starting in 1.6.0, and this was extended over later versions up to 1.8.0, which makes the video filter pipeline YCbCr-capable as well. The video pipeline will still do format conversions as necessary, but when converting between YCbCr formats it is capable of simply reinterleaving the data or resampling the chroma planes without touching the luma plane. In the specific case of I420 and YV12, the two video formats map to the same internal format (YUV420_Planar), but with the two secondary plane entries swapped. Conversion between the two simply involves swapping the planes or doing three plane copies.
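To make the plane swap concrete, here is a minimal Python sketch. It is not VirtualDub's code; the function name `swap_chroma_planes` is hypothetical, and it assumes a tightly packed frame with no row padding, where I420 stores Y, Cb, Cr and YV12 stores Y, Cr, Cb.

```python
def swap_chroma_planes(frame: bytes, width: int, height: int) -> bytes:
    """Convert I420 <-> YV12 by exchanging the two chroma planes.

    Assumes tightly packed planes (no stride padding): a full-size luma
    plane followed by two chroma planes subsampled 2x in each direction.
    """
    y_size = width * height
    c_size = ((width + 1) // 2) * ((height + 1) // 2)
    if len(frame) != y_size + 2 * c_size:
        raise ValueError("frame size does not match a packed 4:2:0 planar layout")

    luma = frame[:y_size]                        # copied through untouched
    first = frame[y_size:y_size + c_size]        # Cb in I420, Cr in YV12
    second = frame[y_size + c_size:]             # Cr in I420, Cb in YV12
    return luma + second + first                 # three plane copies, no arithmetic
```

Because no sample value is ever modified, applying the function twice returns the original frame byte for byte, which is what makes the conversion lossless.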

Sharp users will notice that the Video Color Depth dialog only allows you to choose YV12 and not I420. Well, that was a mistake on my part. When I originally reworked the video pipeline to allow YCbCr formats, I made the mistake of exposing the internal format as the setting in the pipeline configuration. As a result, when you select YV12 in that dialog, you're actually selecting the internal 4:2:0 planar format, and this later gets mapped in the back end to YUV420_Planar variant 0, which is YV12. I420 is variant 1, which you can't get to because the pipeline hardcodes variant 0. The same goes for Y800 vs. Y8. I suppose it wouldn't be too hard to push out the variant setting and add compatibility code in the script layer, but I haven't heard many requests for it (read: zero).


§ A not so good way to decode Huffman codes

Having made a working Huffyuv decoder, I took a shot at making it faster than the 2.1.1 decoder... and failed.

I guess I'll present the algorithm here as a curiosity. Perhaps someone can suggest a variant that's actually worthwhile.

Huffyuv encodes its main bitstream as a series of interleaved Huffman-encoded codes, with the order being a repeating Y-Cb-Y-Cr for the YUY2 mode, and the output being a series of byte values or byte deltas, which then possibly go through the predictor. There are thus three variable length decoders involved. The traditional way to handle this is to break down each code into a series of tables and select the right table according to the next N bits in the stream, either by a branch tree of comparisons against the window value, or by some faster way of looking up the ranges. In particular, the Bit Scan Forward (BSF) and Bit Scan Reverse (BSR) instructions on x86 are attractive for this.

I decided to try a different tack, which was to build a state machine out of all three of the tables simultaneously, with the form: (current state, input byte) -> (next state, advance_input, advance_output, output_byte). There are a few advantages to doing this, one being that no bit manipulations are needed on the input stream since it is always consumed a byte at a time, and no branches at all are required. All three decoders live in the same state machine, so in the end, the decoding loop looks like this:

decloop:
movzx edx, byte ptr [ecx] ;read next byte
mov   eax, [eax + edx*4]  ;read next state
mov   [ebx], al           ;write output byte
and   eax, 0ffffff00h     ;remove output byte
add   eax, eax            ;shift out input_advance and compute next_state*2
adc   ecx, 0              ;advance input pointer
add   eax, eax            ;shift out output_advance and compute next_state*4
adc   ebx, 0              ;advance output pointer
add   eax, edi            ;compute next state address
cmp   ebx, ebp            ;check for end of decoding
jne   decloop             ;if not, decode more bytes
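
For readers who would rather see the table construction than the inner loop, here is a minimal Python model of the same idea. It is a sketch under assumptions, not the original implementation: it uses a single prefix code instead of three interleaved ones, it represents a state as a hypothetical (partial code bits, bit offset in the current byte) pair, and it assumes the code is complete, as a Huffman code is. The names `build_state_table`, `step` and `decode` are made up for the example.

```python
def step(by_bits, state, byte):
    """Return (next_state, advance_input, advance_output, output_byte)."""
    prefix, offset = state
    bits = format(byte, "08b")                   # MSB-first view of the input byte
    for pos in range(offset, 8):
        prefix += bits[pos]
        if prefix in by_bits:                    # a whole code has been recognized
            at_byte_end = pos == 7
            next_state = ("", 0 if at_byte_end else pos + 1)
            return next_state, int(at_byte_end), 1, by_bits[prefix]
    # Byte exhausted in the middle of a code: carry the partial code forward.
    return (prefix, 0), 1, 0, 0


def build_state_table(codes):
    """codes maps output byte -> bit string and must be a complete prefix code."""
    by_bits = {bits: sym for sym, bits in codes.items()}
    table = {}
    pending = [("", 0)]
    while pending:
        state = pending.pop()
        if state in table:
            continue
        table[state] = [step(by_bits, state, byte) for byte in range(256)]
        pending.extend(entry[0] for entry in table[state])
    return table


def decode(table, data, count):
    out = bytearray(count)
    state, ip, op = ("", 0), 0, 0
    while op != count:                           # same exit test as cmp ebx, ebp
        state, adv_in, adv_out, sym = table[state][data[ip]]
        out[op] = sym                            # always stored, like mov [ebx], al
        ip += adv_in                             # pointers advance conditionally
        op += adv_out
    return bytes(out)
```

Every state carries a row of 256 entries. In the assembly version each entry is four bytes, so a state costs 1 KB, which gives a feel for how a table built from real Huffyuv codes can grow to megabytes.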

Now, one of the annoying issues with trying to optimize a Huffman decoder, or a decoder for any sort of variable-length prefix code for that matter, is that it's an inherently serial procedure. You can't begin decoding the second code in parallel until you know where the first one ends. (Okay, you could if you had a parallel decoder for every bit position, but that's typically impractical.) That means the decoding speed is determined by the length of the critical dependency path, which in this case is 9 instructions (movzx, mov, mov, and, add, add, add, cmp, jne). I suspect I could trim that down a little bit if I hardcoded the base of the table and rearranged the code somewhat, but it turns out to be irrelevant. For the standard Huffyuv 2.1.1 YUY2 tables, the state machine turns out to be 1.8MB in size, and that means the decoding routine spends most of its time in cache misses on the instruction that fetches the state entry, the second instruction. Rats.

In the end, it did work, but it was around 20-30% slower than a SHLD+BSR based decoder, at least on a T9300. That doesn't count the bitstream swizzle and prediction passes, but those take almost no time in comparison. The approach might pay off better for a smaller variable-length code, or for one where the minimum code length is 8 bits and the conditional input advance could be dropped.

In general, it seems pretty hard to beat a SHLD+BSR decoder. This is particularly the case on a PPro-based architecture where BSR is very fast, two clocks for PPro/P2/P3/PM/Core/Core2, and one clock for enhanced Core 2 (!). The P4s seem a bit screwed in general, because while BSR is slow, so's practically everything else. Athlons are a bit weird -- they have slow BSR/BSF like the original Pentium and slow scalar<->vector register moves, but they're fast at scalar ops. That probably explains why I saw a branch tree instead of BSR the last time someone sent me a crash dump in Huffyuv 2.2.0's decoder....

I'm tempted to try putting a first-level table check on the decoder to see if that helps. The way this works is that you pick a number of bits, k, and determine how many codes are of length k or less. You encode those directly in a table of size 2^k as (code, length) pairs and everything else goes to more tables or an alternate decoder. Ideally, the effectiveness of this table is determined simply by how many entries are encoded directly in it, e.g. if 1800/2048 entries can use the fast path then you'll hit the fast path about 88% of the time. In practice, this can vary depending on how well the distribution of the encoded data matches the distribution used to create the prefix code tree; they may not always match well when the tree is static, as is the case in vanilla Huffyuv. It's also questionable in this case because the advantage over the BSR is slight and the need for a fallback from the table requires adding an asymmetric but poorly predicted branch.
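
A first-level table of this kind can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, the fallback is a deliberately naive search over longer code lengths, and the table stores (symbol, length) pairs for every k-bit window that starts with a code of length k or less.

```python
def build_first_level(codes, k):
    """codes maps symbol -> bit string. Entries left as None need the slow path."""
    table = [None] * (1 << k)
    for sym, bits in codes.items():
        n = len(bits)
        if n <= k:
            base = int(bits, 2) << (k - n)       # left-justify the code in k bits
            for fill in range(1 << (k - n)):     # every value of the unused low bits
                table[base + fill] = (sym, n)
    return table


def peek_bits(data, bitpos, n):
    """Read n bits MSB-first starting at bitpos, reading zeros past the end."""
    value = 0
    for i in range(bitpos, bitpos + n):
        byte = data[i >> 3] if (i >> 3) < len(data) else 0
        value = (value << 1) | ((byte >> (7 - (i & 7))) & 1)
    return value


def decode_with_table(data, count, codes, k):
    table = build_first_level(codes, k)
    by_bits = {bits: sym for sym, bits in codes.items()}
    max_len = max(len(bits) for bits in by_bits)
    out, bitpos, fast_hits = bytearray(), 0, 0
    for _ in range(count):
        entry = table[peek_bits(data, bitpos, k)]
        if entry is not None:                    # fast path: one table lookup
            sym, n = entry
            fast_hits += 1
        else:                                    # fallback: try each longer length
            window = format(peek_bits(data, bitpos, max_len), "0%db" % max_len)
            n = next(m for m in range(k + 1, max_len + 1) if window[:m] in by_bits)
            sym = by_bits[window[:n]]
        out.append(sym)
        bitpos += n
    return bytes(out), fast_hits
```

The returned `fast_hits` count makes it easy to compare the share of table entries that are filled against the share of symbols that actually take the fast path for a given input.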

As a final note, most of these optimizations are only possible when Huffman codes are placed in the bitstream starting from the MSB of each word, as Huffyuv does, and as many standard encodings do, including JPEG and MPEG. Deflate's a pain in the butt to decode in comparison because it places codes starting at the LSB, which means you can't easily treat the bitstream as a huge binary fraction and use numeric comparisons, as the methods I've described above do.
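
The "binary fraction" view can be illustrated with a short Python sketch. It is an assumption-laden model rather than any particular decoder: each code is left-justified into a fixed-width window, the resulting values are sorted, and a binary search stands in for the comparison tree or BSR lookup. The names `build_ranges` and `decode_msb_first` are hypothetical, `width` must be at least the longest code length, and the code must be complete so that every window value falls in some range.

```python
import bisect


def build_ranges(codes, width):
    """Left-justify each code into `width` bits and sort by that value."""
    ranges = sorted(
        (int(bits, 2) << (width - len(bits)), len(bits), sym)
        for sym, bits in codes.items()
    )
    starts = [start for start, _, _ in ranges]
    return starts, ranges


def decode_msb_first(data, count, codes, width=16):
    starts, ranges = build_ranges(codes, width)
    padded = data + bytes(width // 8 + 1)        # zero padding keeps the window in range
    stream = int.from_bytes(padded, "big")       # the whole stream as one big number
    total = 8 * len(padded)
    mask = (1 << width) - 1
    out, bitpos = bytearray(), 0
    for _ in range(count):
        window = (stream >> (total - bitpos - width)) & mask
        # The code in use is the one whose range start is the largest value <= window.
        _, n, sym = ranges[bisect.bisect_right(starts, window) - 1]
        out.append(sym)
        bitpos += n
    return bytes(out)
```

This only works because an MSB-first code occupies the high bits of the window, so each code owns one contiguous numeric range. With LSB-first packing, as in Deflate, the bits of a code land at the low end and the ranges are no longer contiguous.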


§ Gee, thanks a lot

It might surprise some of you to learn that I've never ripped a DVD. The main reason is that I don't really have an interest in doing so; my DVD collection is fairly small and I've already watched Alias and Slayers a thousand times. I don't travel much, either, so the portability issue hasn't cropped up.

I was tempted to learn today, though.

As I was walking out of the local neighborhood tech store, I spotted a DVD version of Office Space, which is a movie I like, but haven't seen in a while and never bothered to get, so I grabbed it and bought it along with the rest of the stuff I had. When I got home and started populating my new hard drive, I popped the DVD into the DVD player for background entertainment and waited for the standard FBI screen to expire so I could get to the main menu.

Only to be subjected to another clip comparing DVD piracy to shoplifting and other kinds of theft and some rather horrible music. That was practically a full minute long.

Now, I'm going to warn you that I really don't care to get into any arguments about whether copying a DVD is piracy or theft or copyright infringement or whether it is legal, ethical, moral, social, methyl, episcopal or whatever. I really don't care to have such a flamewar on my blog, and if anyone posts a comment about that I'm just going to delete it on the spot. I will say, though, that my thinking upon seeing this clip was more along these lines:


I haven't bought a DVD in a while and I had no idea they were now doing this. Putting in a 1-2 second FBI warning is one thing, but why should I buy a DVD ever again if they're going to force this irritating, condescending garbage down my throat every time I want to watch a movie that I legitimately purchased? Is it any wonder that people pirate movies now with this stupidity going on?


I received a request by email to tone down the language in this post. I refuse to do this. I am very offended by the contempt shown by the addition of this video and I want to make it very clear that I consider the addition of such a video to a product to be very inappropriate. A friend of mine brought up the very good point that he has videos intended for children that have this video prepended, which starts with a clip of a woman having her purse stolen. I don't condone piracy -- but this kind of treatment is NEVER called for.
