A while ago, I introduced a video filter for VirtualDub that compiles a subset of High Level Shader Language (HLSL) and thus lets you write simple video filters without needing to use C++ or another native code language. Well, after digging around in WPF's pixel shader emulator and after a couple of feature requests, I had the urge to revisit my compiler, and so I present version 1.1:
The main change was adding support for several critical ps2.0 instructions, notably abs, cmp, log, exp, frc, min, and max. This allowed me to greatly expand the list of supported intrinsics, as well as add relational and ternary operators. The ternary operator is particularly important because I don't support if() right now, so it's the only way you can do branching. Technically, if() would only be syntactic sugar, as the way you implement branching in ps2.0 is by evaluating all the paths and doing a cmp instruction at the end to merge the results, but I would like to add if() support at some point. That aside, the shader engine should now support enough functionality that you can do most of the pixel processing that you'd do in a real ps2.0 hardware shader on a GPU, and more importantly, the kinds of processing that are actually useful for video. I've added example sRGB-to-linear and linear-to-sRGB functions to the preset list, if anyone wants to play around with linearizing algorithms.
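To make the evaluate-everything-then-cmp idea concrete, here is a rough Python emulation of it, applied to the sRGB conversions. This is only a sketch, not the preset code that ships with the filter: the function names are made up for illustration, the constants are the standard sRGB ones, and I'm assuming inputs are clamped to [0,1]. The only ps2.0 fact it relies on is that cmp picks its second operand when the first is >= 0 and its third otherwise.

```python
import math

def cmp(src0, src1, src2):
    """Emulate ps2.0 cmp on one component: src1 if src0 >= 0, else src2."""
    return src1 if src0 >= 0.0 else src2

def select_less_equal(a, b, if_true, if_false):
    """Lower `a <= b ? if_true : if_false` onto cmp, using a <= b <=> b - a >= 0."""
    return cmp(b - a, if_true, if_false)

def srgb_to_linear(c):
    # Assumption: clamp to [0,1] so the power function never sees a negative base.
    c = min(max(c, 0.0), 1.0)
    # Both "branches" are always computed; cmp only merges the results.
    low = c / 12.92
    high = math.pow((c + 0.055) / 1.055, 2.4)
    return select_less_equal(c, 0.04045, low, high)

def linear_to_srgb(c):
    c = min(max(c, 0.0), 1.0)
    low = c * 12.92
    high = 1.055 * math.pow(c, 1.0 / 2.4) - 0.055
    return select_less_equal(c, 0.0031308, low, high)
```

In shader source, the select would just be written with the ?: operator; the point of the sketch is that neither side of the conditional is ever skipped.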
One of the things I realized while working on this again, by the way, is the inadequacy of the HLSL documentation. As an example, one of the issues I ran into was how to handle argument matching for intrinsics that have overloads. Well, Microsoft didn't document that. Nor did they document the rules for conversions between scalar/vector/matrix types, as all I could find was this statement on casts:
An expression preceded by a type name in parenthesis is an explicit type cast. A type cast converts the original expression to the data type of the cast. In general, the simple data types can be cast to the more complex data types (with a promotion cast), but only some complex data types can be cast into simple data types (with a demotion cast).
That's quite inadequate, particularly since HLSL's behavior with regard to promoting matrices is to zero extend. This can lead to counterintuitive behavior when a 4x3 or 3x4 matrix is implicitly promoted to 4x4: the (3,3) entry becomes zero rather than the one you'd need for an affine transform, so the result doesn't work as intended. I've also found in general that HLSL has become a bit of a mess with the DX10 additions, since the docs mix DX9 and DX10-only material all over the place and don't indicate DX10-only functionality well at all. The HLSL changes themselves are a bit messy too -- on the one hand, we have intrinsics like asint() and asuint(), and on the other hand we have the abomination GetRenderTargetSamplePosition(). In the end, I just ended up chucking the HLSL docs and using NVIDIA's Cg specification, which is much more thorough and exact and contains precise rules for type conversions and function overload resolution.
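Here's a small Python model of why the zero extension bites. It's a sketch of the promotion rule as described above, not of any actual compiler code; the helper names and the sample matrix are made up, and I'm assuming the row-vector convention (vector times matrix) with zero-based indices.

```python
def promote_zero_extend(m, rows=4, cols=4):
    """Pad a smaller matrix (a list of rows) out to rows x cols with zeros."""
    return [
        [m[r][c] if r < len(m) and c < len(m[r]) else 0.0 for c in range(cols)]
        for r in range(rows)
    ]

def mul_row_vector(v, m):
    """Row vector times matrix."""
    return [sum(v[r] * m[r][c] for r in range(len(v))) for c in range(len(m[0]))]

# Hypothetical 4x3 affine transform: identity plus a translation of (10, 20, 30).
affine_4x3 = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [10.0, 20.0, 30.0],
]

promoted = promote_zero_extend(affine_4x3)
point = [1.0, 2.0, 3.0, 1.0]
result = mul_row_vector(point, promoted)
# x, y, z come out translated as expected, but w is 0 instead of 1
# because the whole added column, including entry (3,3), is zero.
```

Anything downstream that expects w to still be 1, such as a second transform or a perspective divide, then goes wrong.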
A dream I have is to polish this up and integrate it into VirtualDub itself as a standard "custom filter" feature. Obviously it would need some streamlining -- the technique/pass/sampler abstraction from D3DFX is extraneous for most purposes -- but I think it's a fairly powerful mechanism, and the vector nature of HLSL makes it concise compared to straight r/g/b scalar expressions. There are a lot of other things I might want to play around with that aren't strictly necessary for that, too, such as:
- improving the vector optimizer: a big problem is that it currently can't split or merge vector instructions by component
- improving the scalar optimizer: better constant folding, common subexpression elimination, loop invariant hoisting, simple arithmetic transformations
- adding register allocation to the existing x87 JITter (it always spills to memory right now)
- adding an SSE/SSE2 JITter
- adding the rest of the HLSL intrinsics
- supporting sampler states, particularly filtering
- supporting inline assembly
- merging vdshader with the GPU accelerated filter
- reading and using raw Direct3D ps2.0 compiled bytecode
- direct YCbCr support
As usual, though, I have a lot more ideas than time, especially since I still have to maintain and evolve the main program. I honestly don't know how much more work I'll put into this or when I might get bored enough again, but if you try out this filter and have some comments or ideas on how you'd like it to evolve, I'm all ears. I figure I've gotten far enough on this that I might as well try to make it useful.
Detailed changelist after the jump.
- More intrinsics supported: sin(), cos(), abs(), frac(), min(), max(), rsqrt(), sqrt(), cross(), log2(), exp2(), pow(), and lerp().
- More operators supported: ==, !=, <, >, <=, >=, &&, ||, and ?:.
- The execution engines now support nearly the full pixel shader 2.0 instruction set. The main notable exception is texkill, which doesn't make much sense in this context anyway. The dp2 instruction has been removed and replaced with a proper dp2add instruction.
- The execution engines now handle the entire horizontal pixel loop, including horizontal interpolator stepping.
- The vector-to-scalar engine has been updated to handle simple loop carried dependencies and can now eliminate unused interpolator components.
- The compilation output pane shows the output of all compilation phases (and yeah, I should clean this up).
- The V axis for textures has been flipped to match Direct3D's top-down convention.
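For anyone curious what a horizontal pixel loop with interpolator stepping looks like, here is a rough Python sketch. It is not the engine's actual code: the function name and the `shader` callback are hypothetical, and sampling at pixel centers is my assumption for the illustration. The idea is just that the interpolated coordinate is advanced by a constant delta per pixel instead of being recomputed, with v increasing downward to match the top-down convention.

```python
def run_scanline(shader, width, y, height):
    """Run shader(u, v) across one scanline, stepping u incrementally."""
    du = 1.0 / width        # constant per-pixel step of the u interpolator
    u = 0.5 * du            # assumption: start at the center of the first pixel
    v = (y + 0.5) / height  # constant along the scanline; y = 0 is the top row
    out = []
    for _ in range(width):
        out.append(shader(u, v))
        u += du             # step the interpolator instead of recomputing it
    return out

# Usage: a trivial "shader" that just returns its texture coordinates.
coords = run_scanline(lambda u, v: (u, v), 4, 0, 2)
```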