A shader compiler for VirtualDub, part 3

Okay, back to something VirtualDub related. :)

I went ahead and did some more work on the vdshader filter, and thus I present version 1.2:

http://www.virtualdub.org/downloads/vdshader-1.2.zip
http://www.virtualdub.org/downloads/vdshader-1.2-src.zip

This version has two major enhancements:

The combined effect of these two changes is that you can now make new VirtualDub filters entirely in vdshader, without writing any C++ code. The individual .fx files show up in the VirtualDub filter list, have their own config dialog, work in batch mode, and can be used just like any other VirtualDub filter written as a regular DLL.

Now, the performance of such a video filter won't be quite as good as a well-optimized C++/asm filter, but I've made some changes on that front as well. First, vdshader now has an SSE2 JIT which transforms shaders into SoA form and performs register allocation, thus running shaders much faster than the scalar x87 JIT. Second, the optimizer has been beefed up substantially and performs more optimizations, such as:

As an example, in the expression tex2D(src, float2(uv.x, pos*4-1)), the optimizer will identify that (pos*4-1) is an invariant, and hoist both it and the texture V axis clamp/wrap checks out of the loop. These optimizations are particularly useful for tunable parameters, where you can perform complex preconditioning on the parameters and the optimizer will ensure that the calculations are done only once before entering the pixel loop.
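To make the hoisting concrete, here is a rough C++ sketch of the before and after for one row of pixels. It is not vdshader's generated code: the `Texture` struct, `clampIndex`, and the two `shadeRow` functions are hypothetical stand-ins, and the sampler is assumed to be single-channel, nearest-neighbor, with clamp addressing.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical single-channel texture with clamp addressing and
// nearest-neighbor sampling. A stand-in, not vdshader's real sampler.
struct Texture {
    int width;
    int height;
    std::vector<float> texels; // row-major, width * height entries
};

// Map a normalized coordinate to a texel index, clamped to the texture.
inline int clampIndex(float coord, int size) {
    const int i = static_cast<int>(coord * static_cast<float>(size));
    return std::max(0, std::min(i, size - 1));
}

// Naive form: the V coordinate and its clamp are recomputed for every pixel.
void shadeRowNaive(const Texture& src, float pos, std::vector<float>& out) {
    assert(src.width > 0 && src.height > 0);
    const int n = static_cast<int>(out.size());
    for (int x = 0; x < n; ++x) {
        const float u = (static_cast<float>(x) + 0.5f) / static_cast<float>(n);
        const float v = pos * 4.0f - 1.0f;        // same value every iteration
        const int ty = clampIndex(v, src.height); // same value every iteration
        const int tx = clampIndex(u, src.width);
        out[x] = src.texels[ty * src.width + tx];
    }
}

// Hoisted form: the invariant expression and the V-axis clamp are computed
// once, leaving only the U-axis work inside the pixel loop.
void shadeRowHoisted(const Texture& src, float pos, std::vector<float>& out) {
    assert(src.width > 0 && src.height > 0);
    const int n = static_cast<int>(out.size());
    const float v = pos * 4.0f - 1.0f;
    const int ty = clampIndex(v, src.height);
    const float* row = src.texels.data() + ty * src.width;
    for (int x = 0; x < n; ++x) {
        const float u = (static_cast<float>(x) + 0.5f) / static_cast<float>(n);
        out[x] = row[clampIndex(u, src.width)];
    }
}
```

Both functions produce the same row; the second simply does the `pos`-dependent work once, which is the effect described above for tunable parameters.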

There are several other improvements:

An example of a custom shader follows.

We'll use this file as an example:

// Name: RGB Scale
// Author: Avery Lee
// Description: Scales individual RGB channels.
float red <
 bool vd_tunable = true;
 float vd_tunablemin = 0;
 float vd_tunablemax = 2;
 float vd_tunablesteps = 200;
> = 1;
float green <
 bool vd_tunable = true;
 float vd_tunablemin = 0;
 float vd_tunablemax = 2;
 float vd_tunablesteps = 200;
> = 1;
float blue <
 bool vd_tunable = true;
 float vd_tunablemin = 0;
 float vd_tunablemax = 2;
 float vd_tunablesteps = 200;
> = 1;
extern sampler src : register(s0);
float4 main(float2 uv : TEXCOORD0) : COLOR0 {
 return tex2D(src, uv) * float4(red, green, blue, 0);
}

With vdshader.vdf in the plugins or plugins32 subdirectory under VirtualDub.exe, and with this file saved as FXFilters\test.fx underneath that, this filter will then show up in the filter list:

[Video filter dialog with RGB Scale (VDFX)]

Add this filter to the list and hit Configure, and vdshader automatically creates a configuration dialog based on the tunable parameters:

[Filter configuration dialog]

The configuration dialog exposes each tunable float parameter as a slider, and supports live preview functionality as well. (As I write this, I notice that I forgot to add code to change the dialog caption from "Dialog." Whoops.)
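As a rough illustration of how annotations like vd_tunablemin, vd_tunablemax, and vd_tunablesteps could translate into slider behavior, here is a small C++ sketch. The linear mapping is an assumption made for illustration; the post does not say how vdshader converts slider positions to values, and `TunableRange` and `sliderToValue` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical mirror of the three annotations on a tunable float.
struct TunableRange {
    float minValue; // vd_tunablemin
    float maxValue; // vd_tunablemax
    int steps;      // vd_tunablesteps
};

// Assumed mapping: slider position 0..steps maps linearly onto
// minValue..maxValue. Out-of-range positions are clamped.
float sliderToValue(const TunableRange& range, int position) {
    assert(range.steps > 0);
    if (position < 0) position = 0;
    if (position > range.steps) position = range.steps;
    const float t = static_cast<float>(position) / static_cast<float>(range.steps);
    return range.minValue + (range.maxValue - range.minValue) * t;
}
```

Under that assumption, the red, green, and blue parameters above (0 to 2 in 200 steps) would move in increments of 0.01, with the midpoint of the slider at the default value of 1.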

And finally, choosing the base vdshader filter brings up the IDE as usual, where you can interactively edit shader .fx files.

[VirtualDub shader editor]

Comments

Comments posted:


I guess you have no problem finding a job ;-) You must have become a total expert during the time you spent developing vdub. Just one question: what does "SoA form" mean? I couldn't find anything about it.

Tobias Rieper - 02 11 08 - 20:18


SoA stands for "structure of arrays." When processing data, it refers to a style of computation where each vector contains a single component across a number of data items. The opposite is the more traditional AoS (array of structures), where each vector contains multiple components from the same data item.

As an example, with a pixel that contains three components (Y/Cb/Cr or R/G/B), 4-way vector processing already loses 25% of its power in AoS form, because each vector contains the RGB or YCbCr components from a single pixel and the fourth lane is unused. With SoA processing, you would process four pixels at a time, with one vector holding all of the red components, another with all the green components, etc.
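The two layouts can be sketched in C++ roughly as follows. This is only an illustration: `PixelAoS`, `BlockSoA`, and the two scale functions are hypothetical names, and plain arrays of four floats stand in for 4-wide SIMD registers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// AoS: one 4-wide vector per pixel. The fourth lane is padding.
struct PixelAoS {
    float r, g, b, unused;
};

// SoA: one 4-wide vector per channel, covering four pixels at once.
struct BlockSoA {
    std::array<float, 4> r;
    std::array<float, 4> g;
    std::array<float, 4> b;
};

// AoS scaling: four "vector multiplies" for four pixels, each with one
// wasted lane (3 of 4 lanes do useful work).
void scaleAoS(std::array<PixelAoS, 4>& pixels, float rs, float gs, float bs) {
    for (PixelAoS& p : pixels) {
        p.r *= rs;
        p.g *= gs;
        p.b *= bs;
        // p.unused rides along and contributes nothing.
    }
}

// SoA scaling: three "vector multiplies" for the same four pixels,
// with every lane doing useful work.
void scaleSoA(BlockSoA& block, float rs, float gs, float bs) {
    for (std::size_t lane = 0; lane < 4; ++lane) block.r[lane] *= rs;
    for (std::size_t lane = 0; lane < 4; ++lane) block.g[lane] *= gs;
    for (std::size_t lane = 0; lane < 4; ++lane) block.b[lane] *= bs;
}
```

Four pixels cost four vector operations in the AoS version and three in the SoA version, which is where the 25% figure comes from.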

In a pixel shader, SoA processing is attractive because shaders can do a number of cross-component operations that are slow to do on current CPUs, including swizzles, write masks, and dot products. These require fixup operations that add latency and slow down the operation, whereas with SoA form you don't have to deal with that. It's also a lot easier to compile for SoA because all you need to do is break down all the vector operations to scalar operations and then apply a traditional scalar optimizer, whereas vector optimizers are more difficult to write.
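Here is a small C++ sketch of why cross-component operations get cheap in SoA form. `Lane4` is a hypothetical stand-in for one 4-wide SIMD register, and the helper names are made up for illustration; a real JIT would emit the equivalent packed multiply and add instructions.

```cpp
#include <array>
#include <cstddef>

// Stand-in for one 4-wide SIMD register. Each lane belongs to a different pixel.
using Lane4 = std::array<float, 4>;

// Lane-wise multiply by a scalar (one packed multiply in real SIMD code).
Lane4 mulScalar(const Lane4& a, float s) {
    Lane4 out{};
    for (std::size_t i = 0; i < 4; ++i) out[i] = a[i] * s;
    return out;
}

// Lane-wise add (one packed add in real SIMD code).
Lane4 add(const Lane4& a, const Lane4& b) {
    Lane4 out{};
    for (std::size_t i = 0; i < 4; ++i) out[i] = a[i] + b[i];
    return out;
}

// dot(rgb, weights) for four pixels at once. Every operation is lane-wise,
// so no horizontal add or shuffle is needed.
Lane4 dot3(const Lane4& r, const Lane4& g, const Lane4& b,
           float wr, float wg, float wb) {
    return add(add(mulScalar(r, wr), mulScalar(g, wg)), mulScalar(b, wb));
}

// Three channel registers for a block of four pixels.
struct Rgb4 {
    Lane4 r, g, b;
};

// A swizzle such as .bgr only relabels which register plays which channel.
// The copies here are an artifact of the sketch; a compiler working on SoA
// code can treat this as a rename with no instructions at all.
Rgb4 swizzleBgr(const Rgb4& c) {
    return Rgb4{c.b, c.g, c.r};
}
```

In AoS form the same dot product needs a multiply followed by a horizontal sum inside one register, which is the kind of fixup the paragraph above refers to.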

SoA processing is not without its own issues. One problem is that you have to process four items at a time, so you have to deal with odd counts at the end. Another problem is that SoA form tends to require a lot more temporary registers. The third problem is that inputs and outputs are often in AoS form, requiring expensive transpose operations in and out of the SoA routine.
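The first and third issues can be sketched together in C++: converting packed pixels into per-channel planes, four at a time, with a separate loop for the leftovers. This is a sketch under assumptions, not VirtualDub's code: the R, G, B, X byte order is assumed for illustration, and `Planes`, `unpackPixel`, and `deinterleave` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// SoA destination: one plane per channel.
struct Planes {
    std::vector<float> r, g, b;
};

// Copy pixel i from the packed buffer (assumed R, G, B, X byte order)
// into the three planes.
inline void unpackPixel(const std::vector<std::uint8_t>& rgbx,
                        std::size_t i, Planes& planes) {
    planes.r[i] = rgbx[i * 4 + 0];
    planes.g[i] = rgbx[i * 4 + 1];
    planes.b[i] = rgbx[i * 4 + 2];
}

// AoS -> SoA conversion ("transpose") for a row of packed pixels.
Planes deinterleave(const std::vector<std::uint8_t>& rgbx) {
    assert(rgbx.size() % 4 == 0); // four bytes per pixel
    const std::size_t count = rgbx.size() / 4;

    Planes planes;
    planes.r.resize(count);
    planes.g.resize(count);
    planes.b.resize(count);

    // Main loop: whole blocks of four pixels, which is what a 4-wide SIMD
    // kernel would handle in one pass.
    std::size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        for (std::size_t lane = 0; lane < 4; ++lane) {
            unpackPixel(rgbx, i + lane, planes);
        }
    }

    // Tail loop: the zero to three pixels left over when the count is not
    // a multiple of four.
    for (; i < count; ++i) {
        unpackPixel(rgbx, i, planes);
    }
    return planes;
}
```

A matching SoA-to-AoS pass is needed on the way out, so the conversion cost is paid twice per routine.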

Phaeron - 02 11 08 - 22:18


That was a great explanation; even I understood it.

Something I've been wondering about: could a video compressor or decompressor be translated to run on the GPU, I guess in a shader language or something like OpenGL? Even if it was slower, it would let the GPU act as another coprocessor, good for batch operations.

Well, I guess the real question is.. is it so much work that no-one has done it yet, or what?

Bajan13k (link) - 08 11 08 - 23:06


People are probably experimenting with it, but I'm out of the loop so I couldn't tell you how far along they are. I tend to do things on my own. :)

There are several fairly big problems with doing this on commonly available hardware. The first one is accuracy. You need to control error in the encoding pipeline to reduce mismatches against the decoder; if it's excessive, you get pulsing, gritting, and other ugly artifacts. In the MPEG-1 days you could get away with a lot, but as the video compression schemes have gotten higher and higher quality, the tolerances have gotten tighter, to the point that I believe many or all algorithms in the formats are now exactly specified. Unfortunately, all pre-DX10 hardware is floating point, and fairly loose (non-IEEE compliant) at that, so you're already forced to take a bit of error just trying to use a pixel shader. DX10 class hardware can do integer, but some integer operations are slower and less available than their floating-point counterparts.
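As a small illustration of the accuracy concern, this C++ sketch shows that even a well-behaved IEEE 32-bit float cannot carry every 32-bit integer exactly, since it has only 24 bits of significand. The function names are hypothetical, and this says nothing about any particular GPU; per the paragraph above, pre-DX10 shader hardware may be looser than this.

```cpp
#include <cstdint>

// Push a 32-bit integer through a 32-bit float and back.
std::int64_t throughFloat(std::int32_t value) {
    // volatile forces a real 32-bit store, so the compiler cannot keep the
    // value in a wider register.
    volatile float f = static_cast<float>(value);
    return static_cast<std::int64_t>(f);
}

// True if the integer survives the round trip unchanged.
bool survivesFloat(std::int32_t value) {
    return throughFloat(value) == static_cast<std::int64_t>(value);
}
```

Integers up to 2^24 (16777216) make the trip intact, but 16777217 comes back as 16777216, so an exactly specified integer algorithm cannot simply be run in floating point without checking its intermediate ranges.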

The second problem you run into is an old one, which is the rate at which data can be pulled back from the video card to system memory. This has gotten much, MUCH better in recent years, but for the relatively low amounts of data and low latencies you'd need for a well-performing decoder, it's still a problem. Readback rates are pretty good at the 512x512 size (~1GB/sec IIRC), but it starts tapering off pretty badly below 64x64, where the overhead of initiating the transfer dominates. Performance is better with OpenGL and asynchronous readback with pixel buffer objects, but it's still not going to be great at small sizes.
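The tapering at small sizes falls out of a simple cost model: each readback pays a fixed startup cost plus a size-dependent transfer time. The C++ sketch below assumes that model; the function name is hypothetical, and the overhead and bandwidth are parameters to be measured on real hardware, not figures from this post.

```cpp
#include <cassert>

// Effective throughput (bytes per second) of a single readback, assuming
// time = fixed overhead + bytes / peak bandwidth.
double effectiveThroughput(double bytes, double overheadSeconds,
                           double peakBytesPerSecond) {
    assert(bytes > 0.0 && overheadSeconds >= 0.0 && peakBytesPerSecond > 0.0);
    const double seconds = overheadSeconds + bytes / peakBytesPerSecond;
    return bytes / seconds;
}
```

With any nonzero overhead, a 64x64 transfer achieves a smaller fraction of peak bandwidth than a 512x512 one, because the fixed cost is spread over 64 times fewer bytes.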

The third problem is the CPU cost. Fact is, dispatching draw calls to the video card is expensive -- it takes a few hundred microseconds each, amortized, and involves pushing a couple of kilobytes of data through the command buffer. That's CPU power that could have been used to encode directly. By the time you throw in locks and memory traffic -- remember that transferring data to and from the video card requires hitting memory, not just the cache -- things start to slow down considerably.

The fourth problem is parallelism. Things like color conversion and motion block search, I can see doing on the GPU. Bitrate regulation and stream encoding are much more sequential operations and thus difficult to parallelize. The standard shader model is hideously restrictive for this; CUDA is much more flexible, but still wants lots and lots of parallelism. Fact is, if you've got an inherently sequential operation, a Core 2 or dedicated bitstream hardware is going to kick the GPU's butt.

I suspect the final problem is just the amount of work involved in setting up the partial GPU acceleration and how invasive it would be in an optimized CPU encoding engine. It'd probably be difficult for a commercial encoding company to justify given the unknown benefit and the time it would take away from making the encoder run faster on an 8-way Xeon. As for the open source community, the problem is likely a paucity of people with heavy experience in both graphics hardware and video encoding (and who aren't already employed along those lines).

Phaeron - 08 11 08 - 23:54


I have to comment on this GPU encoder stuff.

First, AMD/ATI has just announced that with Catalyst 8.12 (that is December release) they will ship their AVIVO video transcoder which will enable encodes that take 3hrs to be done in less than 20min on ATI Radeon HD 4850 video card.

Second, latest GPUs have support for double precision as well.

As for the CPU encoding, I had a chance to test DivX on a 24-core server (4x 6-core Dunnington CPUs) -- the result was disappointing because it only used 8 threads and the FPS was just a bit better than on my dual-core.

Igor Levicki (link) - 14 11 08 - 15:05


Thanks for the detailed explanation and relevant information.

It's given me a lot to think about....

Bajan13k (link) - 16 11 08 - 01:20
