§ A shader editor for VirtualDub, part 2

A while ago, I introduced a video filter for VirtualDub that compiles a subset of High Level Shader Language (HLSL) and thus lets you write simple video filters without needing to use C++ or another native code language. Well, after digging around in WPF's pixel shader emulator and after a couple of feature requests, I had the urge to revisit my compiler, and so I present version 1.1:

http://www.virtualdub.org/downloads/vdshader-1.1.zip
http://www.virtualdub.org/downloads/vdshader-1.1-src.zip

The main change was adding support for several critical ps2.0 instructions, notably abs, cmp, log, exp, frc, min, and max. This allowed me to greatly expand the list of supported intrinsics, as well as add relational and ternary operators. The ternary operator is particularly important as I don't support if() right now, so it's the only way you can do branching. Technically, it's syntactic sugar, as the way you implement branching in ps2.0 is by evaluating all the paths and doing a cmp instruction at the end to merge the results, but I would like to add if() support at some point. That aside, the shader engine should now support enough functionality that you can do most of the pixel processing you'd do in a real ps2.0 hardware shader on a GPU, and more importantly, the kinds that are actually useful for video. I've added example sRGB-to-linear and linear-to-sRGB functions to the preset list, if anyone wants to play around with linearizing algorithms.
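To illustrate the "evaluate all the paths, then merge with cmp" idea, here is a minimal Python sketch of a branch-free sRGB-to-linear conversion for one channel. It is not the preset shipped with the filter: the function names are made up for this example, `cmp` is only a scalar model of the ps2.0 instruction (select the second operand where the first is >= 0, otherwise the third), and the constants are the standard sRGB ones.

```python
import math

def cmp(src0, src1, src2):
    """Scalar model of ps2.0 cmp: src1 where src0 >= 0, else src2."""
    return src1 if src0 >= 0.0 else src2

def srgb_to_linear(c):
    """Branch-free sRGB decode for one channel value in [0, 1]."""
    # Both paths are always evaluated, as a ps2.0 shader would do.
    low = c / 12.92
    # pow(x, 2.4) written in terms of log2/exp2, mirroring the log/exp ops.
    high = 2.0 ** (2.4 * math.log2((c + 0.055) / 1.055))
    # Same result as the ternary: c <= 0.04045 ? low : high
    return cmp(0.04045 - c, low, high)
```

In HLSL the last line would be written with the ternary operator; the compiler is then free to lower it to a single cmp.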

One of the things I realized while working on this again, by the way, is the inadequacy of the HLSL documentation. As an example, one of the issues I ran into was how to handle argument matching for intrinsics that have overloads. Well, Microsoft didn't document that. Nor did they document the rules for conversions between scalar/vector/matrix types, as all I could find was this statement on casts:

An expression preceded by a type name in parenthesis is an explicit type cast. A type cast converts the original expression to the data type of the cast. In general, the simple data types can be cast to the more complex data types (with a promotion cast), but only some complex data types can be cast into simple data types (with a demotion cast).

That's quite inadequate, particularly since HLSL's behavior with regard to promoting matrices is to zero extend. This can lead to counterintuitive behavior when a 4x3 or 3x4 matrix is implicitly promoted to 4x4 with a zero in the (3,3) entry and doesn't work as intended. I've also found in general that HLSL has become a bit of a mess with the DX10 additions, since the docs mix DX9 and DX10-only material all over the place and don't indicate DX10-only functionality well at all. The HLSL changes themselves are a bit messy too -- on the one hand, we have intrinsics like asint() and asuint(), and on the other hand we have the abomination GetRenderTargetSamplePosition(). In the end, I just ended up chucking the HLSL docs and using NVIDIA's Cg specification, which is much more thorough and exact and contains precise rules for type conversions and function overload resolution.
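The zero-extension pitfall can be shown with a small Python sketch. It only models the behavior described above (new entries filled with zero); the helper names and the row-vector convention are assumptions made for this example.

```python
def zero_extend_to_4x4(m):
    """Pad a matrix (list of rows) to 4x4, filling new entries with 0.0."""
    rows, cols = len(m), len(m[0])
    return [[m[r][c] if r < rows and c < cols else 0.0 for c in range(4)]
            for r in range(4)]

def mul_row_vector(v, m):
    """Multiply a 4-component row vector by a 4x4 matrix."""
    return [sum(v[r] * m[r][c] for r in range(4)) for c in range(4)]

# 4x3 matrix: identity rotation with a translation of (10, 20, 30) in the last row.
m4x3 = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
        [10.0, 20.0, 30.0]]

m4x4 = zero_extend_to_4x4(m4x3)
point = [1.0, 2.0, 3.0, 1.0]
result = mul_row_vector(point, m4x4)  # w comes out as 0.0, not 1.0
```

With the (3,3) entry at zero, the transformed point loses its w = 1, which is usually not what was intended when treating the matrix as an affine transform.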

A dream I have is to polish this up and integrate it into VirtualDub itself as a standard "custom filter" feature. Obviously it would need some streamlining -- the technique/pass/sampler abstraction from D3DFX is extraneous for most purposes -- but I think it's a fairly powerful mechanism, and the vector nature of HLSL makes it concise compared to straight r/g/b scalar expressions. There are a lot of other things I might want to play around with that aren't strictly necessary for that, too, such as:

As usual, though, I have a lot more ideas than time, especially since I still have to maintain and evolve the main program. I honestly don't know how much more work I'll put into this or when I might get bored enough again, but if you try out this filter and have some comments or ideas on how you'd like it to evolve, I'm all ears. I figure I've gotten far enough on this that I might as well try to make it useful.

Detailed changelist after the jump.

Changelist:

Comments

Comments posted:


I see you process one pixel at a time for the software JIT code path. This works great in scalar mode, but it could become less than ideal once you implement an SSE/SSE2 JIT, as you need to deal with swizzling and masking for pretty much every individual opcode. Also, stuff like 'add r0.g,r1.g,r2.g' will stay a scalar operation in SSE unless you implement the merger you talk about.

One solution is to sample 4 pixels at a time, transpose them (_MM_TRANSPOSE4_PS), process them and transpose them again before writing the result back to memory. That way you can do the exact same operation on 4 reds, 4 greens etc. using SSE without having to worry about swizzling and masking. There will be no need anymore to merge vector instructions by components. Also, sampling 4 pixels at a time can be greatly optimized with SSE assembly or intrinsics. All of this might be a major undertaking with your current architecture though.

Inforhix - 01 10 08 - 20:32
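The transpose-process-transpose scheme suggested in the comment above can be sketched in plain Python. This is only a model of the data layout, not SSE code: the function names are hypothetical, and each inner list of four floats stands in for one SSE register.

```python
def aos_to_soa(pixels):
    """Transpose 4 RGBA pixels into 4 per-channel lanes (R, G, B, A)."""
    return [list(lane) for lane in zip(*pixels)]

def soa_to_aos(lanes):
    """Transpose 4 per-channel lanes back into 4 RGBA pixels."""
    return [list(pixel) for pixel in zip(*lanes)]

def add_green(lanes_a, lanes_b):
    """'add r0.g, r1.g, r2.g' for 4 pixels at once: one full-width add on the green lane."""
    out = [lane[:] for lane in lanes_a]
    out[1] = [a + b for a, b in zip(lanes_a[1], lanes_b[1])]
    return out

pixels_a = [[0.1, 0.2, 0.3, 1.0], [0.4, 0.5, 0.6, 1.0],
            [0.7, 0.8, 0.9, 1.0], [0.0, 0.25, 0.5, 1.0]]
pixels_b = [[0.0, 0.5, 0.0, 0.0]] * 4

summed = soa_to_aos(add_green(aos_to_soa(pixels_a), aos_to_soa(pixels_b)))
```

After the transpose, a single-component instruction touches one whole lane, so no swizzle or write mask is needed for it.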


Yup, I'd considered that possibility, but hadn't gotten to it yet. The slowest parts of the generated code aren't actually the ALU ops. In practice, the texture fetch itself is expensive, and if you have any log or exp ops, those easily dominate the execution time by far. rcp/rsq are also expensive, although those are much faster if reduced precision is allowed and they are done in SSE.

Actually, if you think about it, there is not much difference between a scalar JIT and an SOA4 vector JIT, other than the added complexity in the load and store paths. The SOA4 JIT basically does the same ALU operations as the scalar JIT on parallel 4-vectors. If you look at the way that the filter works, it's actually structured as follows:

- Compile HLSL to shader asm.
- Optimize shader asm (copy propagation, dead store elimination).
- Convert shader asm to scalar form.
- Optimize scalar asm (copy propagation, dead store elimination, register allocation).
- Convert scalar asm to x86+x87.

All that needs to happen to implement SOA4 SSE/SSE2 mode is just to replace the back-end. The same is true for x64.

Phaeron - 02 10 08 - 02:55
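Two of the stages listed in the comment above, conversion to scalar form and dead store elimination, can be sketched on a toy instruction list. This is not the filter's actual code: the tuple format `(op, "reg.mask", ["reg.swizzle", ...])` is hypothetical, and each source swizzle is assumed to have the same length as the destination mask.

```python
def scalarize(instr):
    """Split one vector instruction into per-component scalar instructions."""
    op, dst, srcs = instr
    dst_reg, dst_mask = dst.split(".")
    out = []
    for i, comp in enumerate(dst_mask):
        scalar_srcs = []
        for src in srcs:
            reg, swizzle = src.split(".")
            scalar_srcs.append(f"{reg}.{swizzle[i]}")
        out.append((op, f"{dst_reg}.{comp}", scalar_srcs))
    return out

def eliminate_dead_stores(instrs, live_out):
    """Backward pass: keep only instructions whose result is read later."""
    live = set(live_out)
    kept = []
    for op, dst, srcs in reversed(instrs):
        if dst in live:
            live.discard(dst)   # this write satisfies the later read
            live.update(srcs)   # its inputs must now be kept alive
            kept.append((op, dst, srcs))
    kept.reverse()
    return kept

program = [
    ("mul", "r0.rgb", ["r1.rgb", "r2.rgb"]),
    ("add", "r3.rg", ["r0.rg", "r1.gr"]),
]
scalar = [s for instr in program for s in scalarize(instr)]
optimized = eliminate_dead_stores(scalar, live_out={"r3.r", "r3.g"})
```

Here the blue component of the mul is never read, so it disappears once the program is in scalar form; at the vector level the whole mul would have looked live. A different back-end would consume the same optimized scalar list.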


And how about using GEGL? The Gimp seems to enjoy that...

Mitch 74 - 02 10 08 - 09:19


An important criterion I have for this is that it needs to be compatible with a real 3D API so hardware acceleration is straightforward. GEGL doesn't appear to be amenable to this. Right now, I'm getting close to parity with Direct3D PS2.0, which makes things a lot more interesting.

Phaeron - 04 10 08 - 04:01


Whoops, forget about it - it was a bad link trying to link GEGL and OpenGL's GLSL, but it actually won't work (due to lack of precision in data returned by GLSL).

Mitch 74 - 04 10 08 - 04:50


Don't see why it wouldn't, actually, since any graphics card that supports shaders supports at least 8 bits/component, which is enough for baseline graphics and video work. Most PS2.0 graphics cards support half float (16F) and full float (32F) to some extent, too. I imagine a naive translation of GEGL to GLSL would create a lot of passes, though.

Phaeron - 04 10 08 - 14:07
