SSE warning in 1.7.0
Many of you have reported a warning like this showing up in the experimental 1.7.0 release:
[E] Internal error: SSE state was bad before entry to external code at
(.sourcew32videocodecpack.cpp:670). This indicates an uncaught bug
either in an external driver or in VirtualDub itself that could cause
application instability. Please report this problem to the author!
(MXCSR = 00009fc0)
In all of the cases I've seen, this warning is harmless and can be ignored. I'll be tweaking the codebase in 1.7.1 to prevent these from appearing in such circumstances.
From what I can tell, the reason for the warnings is that modules compiled with Intel C/C++ with fast math optimizations enabled cause the runtime to turn on the Flush to Zero (FTZ) and Denormals Are Zero (DAZ) bits in the CPU's SSE control register (MXCSR). SSE stands for Streaming SIMD Extensions and refers to a streamlined vector math instruction set added starting with the Intel Pentium III and AMD Athlon XP CPUs. In my opinion, the runtime really shouldn't be flipping these settings, because they affect math precision in other code running in the thread, and I'm pretty sure it's against the Win32 calling convention to call external code with those bits set. It's definitely against the Windows x64 calling convention, which explicitly defines them as part of nonvolatile state and as normally disabled. Nevertheless, it appears that there are several video codecs and video filters that are compiled in this manner, and thus trip the problem.
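To make the MXCSR value in the warning easier to read, here is a minimal sketch that decodes the control bits by hand. The bit positions follow the Intel architecture manuals; the helper names (mxcsr_rounding_name, mxcsr_describe) are made up for illustration and are not part of VirtualDub.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* MXCSR layout: bits 0-5 are sticky exception flags, the rest are controls. */
#define MXCSR_DAZ      0x0040u  /* bit 6: denormals-are-zero */
#define MXCSR_MASK_ALL 0x1F80u  /* bits 7-12: the six exception masks */
#define MXCSR_RC_MASK  0x6000u  /* bits 13-14: rounding control */
#define MXCSR_FTZ      0x8000u  /* bit 15: flush-to-zero */
#define MXCSR_DEFAULT  0x1F80u  /* reset value: all masked, round to nearest */

/* Hypothetical helper: name of the rounding mode encoded in bits 13-14. */
const char *mxcsr_rounding_name(uint32_t mxcsr) {
    static const char *const names[4] = {
        "nearest-even", "down", "up", "truncate"
    };
    return names[(mxcsr & MXCSR_RC_MASK) >> 13];
}

/* Hypothetical helper: print a human-readable breakdown of an MXCSR value. */
void mxcsr_describe(uint32_t mxcsr, FILE *out) {
    assert(out != NULL);
    fprintf(out, "MXCSR = %08x\n", (unsigned)mxcsr);
    fprintf(out, "  FTZ (bit 15): %s\n", (mxcsr & MXCSR_FTZ) ? "on" : "off");
    fprintf(out, "  DAZ (bit 6):  %s\n", (mxcsr & MXCSR_DAZ) ? "on" : "off");
    fprintf(out, "  rounding:     %s\n", mxcsr_rounding_name(mxcsr));
    fprintf(out, "  exceptions:   %s\n",
            (mxcsr & MXCSR_MASK_ALL) == MXCSR_MASK_ALL ? "all masked"
                                                       : "some unmasked");
}
```

Calling mxcsr_describe(0x00009fc0, stdout) on the value from the warning reports FTZ and DAZ on, round-to-nearest-even, and all exceptions masked. In other words, the value differs from the default 0x1f80 only in the FTZ and DAZ bits.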
I added this check in 1.7.0 because of an additional vulnerability that I gained when moving to Visual Studio 2005: sensitivity to SSE math settings. There has always been a problem with third-party DLLs screwing around with x87 floating point state, like changing precision and unmasking exceptions, which causes all sorts of mayhem such as crashes and calculations going haywire. For this reason, VirtualDub monitors the x87 state in and out of all external calls to video codecs and filters, and fixes the x87 state whenever it detects a violation. However, it didn't check the SSE control register, because it didn't use any SSE math.
1.7.0 is different, however, because starting with Visual Studio 2005, the C runtime library will use SSE2-optimized versions of math functions such as pow() when possible. These implementations use sequences of primitive operations (add/multiply/etc.) instead of microcoded transcendental math instructions in the FPU. This is often faster, but an unfortunate side effect is that it can be a lot more inaccurate when the rounding mode in the FPU is inappropriately set.
Shortly before shipping 1.7.0, I discovered that the resize video filter had a long-standing bug where it would switch the FPU from round-to-nearest-even to truncate, and not restore it properly. This was harmless in 1.6.x because the filter code auto-fixed the x87 state, but in Visual Studio 2005 the _control87() function also changes the SSE state. As a result, the levels filter started showing speckling errors, which I tracked down to a bizarre result of something like pow(0.997, 0.998) = 1.001, which in turn was caused by the bad rounding mode. Thus, after fixing the resize filter, I added code to check for and fix the analogous SSE violations. Unfortunately, I didn't have any video filters or codecs installed that were compiled with Intel C/C++ aggressive optimization settings, so I missed the warning problem. There was also a bug in the startup code which caused the SSE check to be enabled too late, so any video filters which tripped this problem showed up as an internal error instead of properly tagging the violator.
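The general shape of a check-and-fix wrapper around an external call can be sketched as follows. This is not VirtualDub's actual code, only an illustration of the idea. The names external_fn, call_external_checked, polite_filter and truncating_filter are hypothetical. On SSE targets it uses the _mm_getcsr/_mm_setcsr intrinsics from xmmintrin.h; elsewhere it falls back to a plain variable so the sketch still builds.

```c
#include <assert.h>
#include <stdio.h>

#if defined(__SSE__) || defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>
unsigned read_mxcsr(void)    { return _mm_getcsr(); }
void write_mxcsr(unsigned v) { _mm_setcsr(v); }
#else
/* Assumption: no SSE here, so emulate the register with a variable. */
static unsigned fake_mxcsr = 0x1F80u;
unsigned read_mxcsr(void)    { return fake_mxcsr; }
void write_mxcsr(unsigned v) { fake_mxcsr = v; }
#endif

/* DAZ, exception masks, rounding control and FTZ (bits 6-15).
   Bits 0-5 are sticky status flags and are deliberately excluded. */
#define MXCSR_CONTROL_BITS 0xFFC0u

/* Hypothetical stand-in for a codec or filter entry point. */
typedef void (*external_fn)(void *ctx);

/* Calls fn(ctx). If the callee returns with different control bits,
   reports it under the name `who`, restores the caller's bits, and
   returns 1. Returns 0 if the callee behaved. */
int call_external_checked(external_fn fn, void *ctx, const char *who) {
    unsigned before, after;
    assert(fn != NULL);
    before = read_mxcsr() & MXCSR_CONTROL_BITS;
    fn(ctx);
    after = read_mxcsr() & MXCSR_CONTROL_BITS;
    if (after == before)
        return 0;
    fprintf(stderr, "%s changed MXCSR control bits: %04x -> %04x\n",
            who, before, after);
    /* Put the control bits back, keeping any status flags that were raised. */
    write_mxcsr((read_mxcsr() & ~MXCSR_CONTROL_BITS) | before);
    return 1;
}

/* Example callees, both hypothetical. */
void polite_filter(void *ctx) { (void)ctx; }

void truncating_filter(void *ctx) {
    (void)ctx;
    /* Switch rounding to truncate and "forget" to restore it. */
    write_mxcsr(read_mxcsr() | 0x6000u);
}
```

Passing the callee's name into the wrapper is what lets the report point at the offending module, rather than surfacing as an anonymous internal error.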
FTZ (flush-to-zero) and DAZ (denormals-are-zero) are flags which, when set, tell the FPU to allow slight violations of IEEE math rules in exchange for speed. The numbers in question are denormals, which are numbers so tiny that they are missing the leading 1 bit normally implicit in IEEE-encoded floating point numbers; for a float, these are smaller in magnitude than about 1.2*10^-38. The FPU normally handles these special cases by grinding the pipeline to a halt and executing special microcode. Most applications won't need the additional accuracy provided by denormals, though, so setting these bits can increase performance slightly by treating the tiny numbers as zero. It's not that huge of a deal on the x86 architecture because microcoded execution is still hardware support, whereas on some RISC CPUs denormals actually cause a trap to a software emulation handler, which is thousands of times slower than the hardware unit.
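For a concrete feel for where denormals begin, here is a small sketch using only standard C (float.h and the fpclassify macro from math.h). The helper names is_denormal and show_denormal_boundary are made up for illustration. It assumes FTZ and DAZ are off, which is the normal default.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdio.h>

/* Returns 1 if x is a denormal (subnormal) float, 0 otherwise. */
int is_denormal(float x) {
    return fpclassify(x) == FP_SUBNORMAL;
}

/* Hypothetical helper: print the values around the normal/denormal edge. */
void show_denormal_boundary(FILE *out) {
    /* volatile keeps the division at run time instead of compile time. */
    volatile float smallest_normal = FLT_MIN;   /* 2^-126 */
    volatile float half = smallest_normal / 2.0f;

    assert(out != NULL);
    fprintf(out, "FLT_MIN     = %g (denormal: %d)\n",
            (double)smallest_normal, is_denormal(smallest_normal));
    fprintf(out, "FLT_MIN / 2 = %g (denormal: %d)\n",
            (double)half, is_denormal(half));
}
```

With FTZ enabled, a result like FLT_MIN / 2 would be flushed to zero instead of being stored as a denormal. With DAZ enabled, a denormal input would be read as zero.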
The change in accuracy caused by enabling FTZ and DAZ is very minor compared to flipping precision or rounding modes; I was unable to find any computations not involving denormals which were affected by their absence. As a result, 1.7.1 will simply ignore those bits in the external code check.
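One way such a relaxed check could look is a comparison that masks out the bits being tolerated. This is only a sketch of the idea, not the actual 1.7.1 change, and the function name mxcsr_violated is hypothetical. The bit positions are the documented MXCSR ones.

```c
#include <assert.h>
#include <stdint.h>

#define MXCSR_DAZ          0x0040u  /* bit 6: denormals-are-zero */
#define MXCSR_FTZ          0x8000u  /* bit 15: flush-to-zero */
#define MXCSR_CONTROL_BITS 0xFFC0u  /* bits 6-15; bits 0-5 are status flags */

/* Returns 1 if any control bit outside `ignore` differs between the
   MXCSR value before the external call and the value after it. */
int mxcsr_violated(uint32_t before, uint32_t after, uint32_t ignore) {
    uint32_t relevant;
    /* The ignore set should only name control bits. */
    assert((ignore & ~MXCSR_CONTROL_BITS) == 0);
    relevant = MXCSR_CONTROL_BITS & ~ignore;
    return ((before ^ after) & relevant) != 0;
}
```

With an empty ignore set, going from 0x1f80 to 0x9fc0 counts as a violation. With MXCSR_FTZ | MXCSR_DAZ ignored, it does not, while a changed rounding mode (for example 0x7f80) is still caught.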