I spent part of last weekend tracking down an annoying problem in 1.7.2's video display code. One of my current obsessions is field display in Windows -- now that I have a very small and convenient video capture device, it annoys me that most programs in Windows still display video as if it were progressive, which leads to a poor quality live display. For some reason, DScaler has abnormally high latency with my USB 2.0 device, so it's back to rolling my own. I also want to make use of 3D hardware acceleration, because (a) it's extremely CPU intensive to fill a 1920x1200 display at 60fps, and (b) I'm lazy and it's easier to experiment with pixel shaders than highly optimized SSE2 code.
(As I've said in the past, nearly all features in VirtualDub are tied to some sort of video game or anime series. The non-interlaced field display code got me through Lunar 2. Interlaced field display is for Valkyrie Profile 2.)
Now, the problem with doing 60 fps field display with 3D acceleration is that with a 60Hz refresh rate, you must hit every frame exactly, or at least close enough that the glitches are at least several seconds apart. This is hard enough once you take into account the need to avoid tearing, by not switching frames/fields in the middle of the screen, and it is harder still in windowed mode. DirectX is lame and doesn't give you any sort of vertical blank event or interrupt -- well, actually, it's IBM's fault for reportedly making the VBI optional for VGA -- and so the only option is to poll. I tried just letting Direct3D do this with D3DPRESENT_INTERVAL_ONE, and not only did it do a poor job of avoiding the beam in windowed mode, but it burned up a lot of CPU time doing so and also blocked my message loop for unacceptable periods, which caused the latency on the DirectShow graph to skyrocket. So, I had to resort to another method.
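To make "avoiding the beam" a bit more concrete, here is a minimal sketch of the kind of check a polling loop could make before switching fields. It is not the actual VirtualDub code: the RasterStatus struct, the safeToFlip() name and the guard margin are all made up for illustration. On Direct3D 9 the two inputs could be filled in from IDirect3DDevice9::GetRasterStatus(), which reports whether the display is in vertical blank and which scan line is being drawn.

```cpp
#include <cstdint>

// Hypothetical snapshot of the beam position. With Direct3D 9 these two
// values could be copied from D3DRASTER_STATUS (InVBlank and ScanLine).
struct RasterStatus {
    bool inVBlank;
    std::uint32_t scanLine;
};

// Returns true if switching the displayed field now should not tear.
// windowTop/windowBottom are the screen rows covered by the video window
// (bottom is exclusive). guardLines is an assumed safety margin for the time
// the blit itself takes when the beam is still above the window.
bool safeToFlip(const RasterStatus& rs,
                std::uint32_t windowTop,
                std::uint32_t windowBottom,
                std::uint32_t guardLines) {
    if (rs.inVBlank)
        return true;                          // nothing is being scanned out
    if (rs.scanLine >= windowBottom)
        return true;                          // beam is already past the window
    return rs.scanLine + guardLines < windowTop;  // beam is well above the window
}
```

The polling thread would call something like this repeatedly and only present the next field once it returns true. The guard margin is a guess that would have to be tuned per machine.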
What I ended up with was moving the entire display window to another thread, so that it could poll in peace at high priority. A persistent problem that kept cropping up here was the display thread taking 100% of the CPU, even though I had a MsgWaitForMultipleObjects() loop with a 1ms timeout. I tracked the problem down to that function constantly returning WAIT_OBJECT_0, meaning that a message was available, without there actually being one -- which meant that PeekMessage() was getting called in a tight loop. I hacked in a Sleep(1) as a temporary workaround, but then I had the weird problem of the UI becoming totally unresponsive to input, while still repainting, even though the CPU was idle 80-90% of the time. Even weirder, when I took the Sleep() out, VTune showed an abnormally high amount of time being spent in the kernel (ring 0) in functions like "win32k!xxxWindowHitTest."
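For reference, the shape of that loop is roughly as follows. This is a sketch, not the real code: Wake, runDisplayIteration(), waitWin32() and peekWin32() are names invented here, the wait is assumed to have no handles (so that "input available" is exactly WAIT_OBJECT_0), and WM_QUIT handling is left out. The loop body is written against plain callables so that the logic stands apart from the Win32 calls, which are wired in at the bottom.

```cpp
// Why the wait returned: the timeout expired, or the queue claims to have input.
enum class Wake { Timeout, Input };

// One pass of the display thread's loop. Returns the number of messages that
// were actually dispatched. A return of 0 after an Input wake is the
// "phantom wakeup" symptom: the wait says there is a message, PeekMessage()
// finds none, and the loop spins.
template <class WaitFn, class PeekFn, class PollFn>
int runDisplayIteration(WaitFn wait, PeekFn peekAndDispatch, PollFn pollBeam) {
    int dispatched = 0;
    if (wait() == Wake::Input) {
        while (peekAndDispatch())
            ++dispatched;
    }
    pollBeam();  // check the beam position and flip fields if appropriate
    return dispatched;
}

#ifdef _WIN32
#include <windows.h>

// Waits up to 1 ms for any queued input; no handles are waited on (assumption).
inline Wake waitWin32() {
    DWORD r = MsgWaitForMultipleObjects(0, nullptr, FALSE, 1, QS_ALLINPUT);
    return r == WAIT_OBJECT_0 ? Wake::Input : Wake::Timeout;
}

// Removes and dispatches one message; returns false when the queue is empty.
inline bool peekWin32() {
    MSG msg;
    if (!PeekMessage(&msg, nullptr, 0, 0, PM_REMOVE))
        return false;
    TranslateMessage(&msg);
    DispatchMessage(&msg);
    return true;
}
#endif
```

Counting dispatched messages per iteration is a cheap way to spot this kind of problem: a long run of Input wakes with zero messages behind them means something other than a real queued message is waking the thread.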
It wasn't until I looked at the ReactOS and Wine source code that I discovered the culprit.
The problem was a WM_NCHITTEST handler I had put in to accommodate the cropping UI. The cropping UI needs mouse clicks to go through the display, so the display code returns HTTRANSPARENT so that all mouse input propagates to the parent window. There is a warning in MSDN saying that this only applies to windows within the same thread, and it turns out that returning HTTRANSPARENT when your parent is on a different thread is indeed a very bad idea. What happens is that Windows has problems determining which window "owns" the mouse message, and keeps bouncing it back and forth between the threads, resending WM_NCHITTEST to the transparent window each time. In Wine, this appears to be caused by a WindowFromPoint() call after the thread hop, which doesn't seem to return faithful results for transparent windows. Somehow in the real Windows this doesn't cause the threads to lock together, so the threads do idle, but the loop still blocks input messages, giving you a set of windows that repaints properly but doesn't respond to input. This also likely explains the phantom returns from MsgWaitForMultipleObjects(), which were probably caused by some sort of internal callback.
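If you do want to keep a pass-through handler around, one defensive option is to only return HTTRANSPARENT when the parent really is on the same thread. This is a sketch of that guard, not what I actually did (I just removed the handler); chooseHitTest() and onNcHitTest() are hypothetical names, and the two constants simply mirror the HTCLIENT and HTTRANSPARENT values from the Windows headers.

```cpp
#include <cstdint>

// Same numeric values as HTCLIENT and HTTRANSPARENT in the Windows headers.
constexpr long kHitClient = 1;
constexpr long kHitTransparent = -1;

// Pass mouse input through to the parent only when both windows belong to
// the same thread; otherwise claim the hit as ordinary client area.
long chooseHitTest(std::uint32_t windowThreadId, std::uint32_t parentThreadId) {
    return windowThreadId == parentThreadId ? kHitTransparent : kHitClient;
}

#ifdef _WIN32
#include <windows.h>

// Intended to be called from the window procedure's WM_NCHITTEST case.
inline LRESULT onNcHitTest(HWND hwnd) {
    DWORD self = GetWindowThreadProcessId(hwnd, nullptr);
    HWND parent = GetParent(hwnd);
    DWORD owner = parent ? GetWindowThreadProcessId(parent, nullptr) : 0;
    return chooseHitTest(self, owner);
}
#endif
```

Note that in the cross-thread case the clicks then land on the display window itself, so anything like the cropping UI would need them forwarded to it some other way.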
Removing the WM_NCHITTEST handler gave silky smooth 60Hz video, which freed me to solve some evil jumping puzzles in VP2. :)
The next problem I have to solve is coming up with a pixel shader that does better than bicubic interpolation with motion-detection-based weave/bob switching and gamma correction, but that's less enigmatic, at least.