A while ago, I introduced a video filter for VirtualDub that compiles a subset of High Level Shader Language (HLSL) and thus lets you write simple video filters without needing to use C++ or another native code language. Well, after digging around in WPF's pixel shader emulator and after a couple of feature requests, I had the urge to revisit my compiler, and so I present version 1.1:
The main change was adding support for several critical ps2.0 instructions, notably abs, cmp, log, exp, frc, min, and max. This allowed me to greatly expand the list of supported intrinsics, as well as add relational and ternary operators. The ternary operator is particularly important as I don't support if() right now, so it's the only way you can do branching. Technically, if() would just be syntactic sugar, as the way you implement branching in ps2.0 is by evaluating all the paths and doing a cmp instruction at the end to merge the results, but I would like to add if() support at some point. That aside, the shader engine should now support enough functionality that you can do most of the pixel processing that you'd do in a real ps2.0 hardware shader on a GPU, and more importantly, the kinds that are actually useful for video. I've added example sRGB-to-linear and linear-to-sRGB functions to the preset list, if anyone wants to play around with linearizing algorithms.
One of the things I realized while working on this again, by the way, is the inadequacy of the HLSL documentation. As an example, one of the issues I ran into was how to handle argument matching for intrinsics that have overloads. Well, Microsoft didn't document that. Nor did they document the rules for conversions between scalar/vector/matrix types, as all I could find was this statement on casts:
An expression preceded by a type name in parenthesis is an explicit type cast. A type cast converts the original expression to the data type of the cast. In general, the simple data types can be cast to the more complex data types (with a promotion cast), but only some complex data types can be cast into simple data types (with a demotion cast).
That's quite inadequate, particularly since HLSL's behavior with regard to promoting matrices is to zero extend. This can lead to counterintuitive behavior when a 4x3 or 3x4 matrix is implicitly promoted to 4x4 with a zero in the (3,3) entry and doesn't work as intended. I've also found in general that HLSL has become a bit of a mess with the DX10 additions, since the docs mix DX9 and DX10-only material all over the place and don't indicate DX10-only functionality well at all. The HLSL changes themselves are a bit messy too -- on the one hand, we have intrinsics like asint() and asuint(), and on the other hand we have the abomination GetRenderTargetSamplePosition(). In the end, I just ended up chucking the HLSL docs and using NVIDIA's Cg specification, which is much more thorough and exact and contains precise rules for type conversions and function overload resolution.
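As a rough illustration of why zero extension bites, here is a small C sketch that promotes a 3-row by 4-column matrix to 4x4 by filling the missing row with zeros and then transforms a point with it. The function names and the column-vector convention are assumptions made for the example; this models the behavior described above rather than quoting any compiler.

```c
#include <string.h>

/* Promote a 3x4 matrix (3 rows, 4 columns) to 4x4 by zero-extending:
   the missing fourth row is all zeros, including the (3,3) entry. */
static void promote_zero_extend(float m34[3][4], float out[4][4]) {
    memset(out, 0, sizeof(float) * 16);
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 4; ++c)
            out[r][c] = m34[r][c];
}

/* result = m * v, treating v as a column vector (an assumed convention). */
static void mul_mat4_vec4(float m[4][4], const float v[4], float result[4]) {
    for (int r = 0; r < 4; ++r) {
        result[r] = 0.0f;
        for (int c = 0; c < 4; ++c)
            result[r] += m[r][c] * v[c];
    }
}
```

Feed the point (x, y, z, 1) through the promoted matrix and x, y, and z come out as expected, but w comes out as 0 instead of 1, which is what you would have gotten had the extension put a 1 in the (3,3) entry.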
A dream I have is to polish this up and integrate it into VirtualDub itself as a standard "custom filter" feature. Obviously it would need some streamlining -- the technique/pass/sampler abstraction from D3DFX is extraneous for most purposes -- but I think it's a fairly powerful mechanism, and the vector nature of HLSL makes it concise compared to straight r/g/b scalar expressions. There are a lot of other things I might want to play around with that aren't strictly necessary for that, too, such as:
- improving the vector optimizer: a big problem is that it currently can't split or merge vector instructions by component
- improving the scalar optimizer: better constant folding, common subexpression elimination, loop invariant hoisting, simple arithmetic transformations
- adding register allocation to the existing x87 JITter (it always spills to memory right now)
- adding an SSE/SSE2 JITter
- adding the rest of the HLSL intrinsics
- supporting sampler states, particularly filtering
- supporting inline assembly
- merging vdshader with the GPU accelerated filter
- reading and using raw Direct3D ps2.0 compiled bytecode
- direct YCbCr support
As usual, though, I have a lot more ideas than time, especially since I still have to maintain and evolve the main program. I honestly don't know how much more work I'll put into this or when I might get bored enough again, but if you try out this filter and have some comments or ideas on how you'd like it to evolve, I'm all ears. I figure I've gotten far enough on this that I might as well try to make it useful.
Detailed changelist after the jump.
VirtualDub 1.8.6 is out and is a stable release containing bug fixes for issues reported by users. Notable bug fixes include errors handling audio in NTSC DV type-1 files, several crashes, and a few glitches in batch mode (job control).
Those of you on the forums know that I've been pushing out experimental features as "1.8.X2" test releases. Chances are at this point that I will rename that to 1.9.0, because there is enough in it that I wouldn't want to pollute the 1.8.x branch in case there are enough fixes to warrant 1.8.7. After that I'm kind of screwed with respect to major version numbers, although I guess I'll deal with that when I get there.
Windows Presentation Foundation (WPF) gained an interesting feature in .NET Framework 3.5 SP1, which is the ability to execute pixel shader effects in software via a just-in-time (JIT) compiler. Issues with introducing features in service packs aside, this is a cool addition, since it allows the same pixel shader code to run on the GPU and the CPU with reasonable performance on the latter. It's certainly better than the old effects system, which only supported software mode and required you to write a custom routine in a separate DLL instead.
Of course, being a sort of graphics guy but not a .NET kind of person, I had to dig into the shader jitter....
I cringe whenever I see people implement YCbCr to RGB conversion like this:
y = 1.164 * (y - 16.0 / 256.0);
r = y + 1.596 * (cr - 0.5);
g = y - 0.813 * (cr - 0.5) - 0.391 * (cb - 0.5);
b = y + 2.018 * (cb - 0.5);
What's wrong with this, you say? Too slow? No, that's not the problem. The problem is that the bias constants are wrong. The minor error is the 16/256 luma bias, which should be 16/255. That's a 0.02% error over the full range, so we can grudgingly let that slide. What isn't as excusable are the 0.5 chroma bias constants. If you're working with 8-bit channels, the chroma center is placed at 128, which when converted to float is 128/255 rather than exactly one-half. This is an error of 0.5/255, which would also be barely excusable, except for one problem. The coefficients for converting chroma red and chroma blue are greater than 1 in magnitude, so they'll actually amplify the error. If you're converting from 8-bit YCbCr to 8-bit RGB, this basically guarantees that you'll frequently be off by one in one or more components. Even more fun is that the green channel errors won't coincide with the red and blue errors and will be in the opposite direction, which then leads to ugliness like blacks that aren't black and have color to them.
In other words, please use 16/255 and 128/255 when those are actually where your luma and chroma values are based.
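If you want to see the off-by-one for yourself, here is a small C sketch that runs the same conversion with the biases passed in as parameters, so the sloppy and correct constants can be compared side by side. The helper names (to_u8, ycbcr_to_rgb8) and the round-to-nearest step are my own choices for the illustration; the coefficients are the ones from the snippet above.

```c
/* Convert a normalized [0,1] value back to an 8-bit code, clamping
   and rounding to nearest. */
static int to_u8(double v) {
    if (v < 0.0) v = 0.0;
    if (v > 1.0) v = 1.0;
    return (int)(v * 255.0 + 0.5);
}

/* 8-bit YCbCr to 8-bit RGB through floating point, the way a shader sees it.
   luma_bias and chroma_bias are parameters so both variants can be tried. */
static void ycbcr_to_rgb8(int y8, int cb8, int cr8,
                          double luma_bias, double chroma_bias, int rgb[3]) {
    double y  = 1.164 * (y8 / 255.0 - luma_bias);
    double cb = cb8 / 255.0 - chroma_bias;
    double cr = cr8 / 255.0 - chroma_bias;
    rgb[0] = to_u8(y + 1.596 * cr);
    rgb[1] = to_u8(y - 0.813 * cr - 0.391 * cb);
    rgb[2] = to_u8(y + 2.018 * cb);
}
```

Running black, (Y, Cb, Cr) = (16, 128, 128), through this with biases of 16/256 and 0.5 gives RGB (1, 0, 1): red and blue are pushed up by one, while green is pushed the other way and clamps at zero. With 16/255 and 128/255 the result is (0, 0, 0).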
(I have to confess that the reason this came to mind is that I found this issue in some prototype code of mine when I started actually hacking it into shape. Needless to say, the production code doesn't have this problem.)
You might be thinking that it's a bit weird to be doing a conversion on 8-bit components with floating point, and you'd be correct. The place where this frequently comes up is in 3D pixel shaders. A pixel shader is a natural place to do color conversion, and is also where you'd frequently encounter 8-bit components converted to floating point. Unlike a CPU-based converter, however, there is basically no extra cost to using the correct constants. The only time you'd be better off using 0.5 in a shader instead of 128/255 is if you're on a GeForce 3, in which case (a) you're already skating on thin ice precision-wise and (b) the hardware is 9-bit signed fixed point so you're going to get 128/255 anyway. Otherwise, it's kind of sloppy to be displaying 720p video on-screen with a shader that doesn't get colors right just because someone was too lazy to type in the right constants.