Beware of the CPU-specific optimizations
Every once in a while, I get a crash report whose diagnosis looks like this:
10116789: 0ff6c1 psadbw mm0, mm1 <-- FAULT
An integer SSE (Pentium III/Athlon) instruction not supported by the CPU was executed in module '****'...
...while decompressing video frame 0 with "********* Codec" [biCompression=********] (VideoSource.cpp:1772).
I blocked out the codec ID information so as to not single out a video codec manufacturer.
A crash like this usually means that the video codec you were attempting to use was compiled with CPU-specific optimizations that your CPU doesn't support. In other words, your CPU is below the minimum requirements for the codec. Unlike ordinary CPU speed requirements, a missing instruction doesn't mean the codec will run really slowly -- it means the codec won't work at all. There's nothing I can do about this in VirtualDub; if you're seeing something like this and are indeed below the minimum spec, you need to either upgrade or beg the codec vendor to support your CPU (if it is actually fast enough to handle the video format).
Note that crashing here also means that the codec didn't properly check for the availability of special CPU instructions before attempting to use them. This is unwise from a customer support standpoint, and I would encourage adding CPU detection code and an error dialog instead. One trap that a lot of coders fall into is using the Pentium Pro conditional move instructions (CMOVcc) on the assumption that they are always available, since no one will be using a CPU below 300MHz. Unfortunately, the AMD K6 series of CPUs doesn't support these instructions and is available at least as high as 400MHz, so this is a bad assumption. Similarly, Athlons exist as fast as 1GHz that don't support SSE. Also, you would be surprised how slow a system people will try your code on; I got a crash report recently from someone who tried using modern video codecs on a Pentium without MMX!
Embrace the CPUID instruction. CPUID is your friend. If you are targeting the integer SSE instructions -- such as pshufw, psadbw, pavgb, pavgw, and movntq -- remember to check for either the SSE bit (for PIII and Athlon XP or higher) or the 3DNow! extensions bit (for the original Athlon).
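The check described above could look roughly like the following C sketch. It is not taken from VirtualDub or from any codec. It assumes the documented CPUID layout (leaf 1 EDX bit 15 for CMOV, bit 23 for MMX, bit 25 for SSE; AMD extended leaf 0x80000001 EDX bit 22 for the AMD extensions to MMX and bit 30 for the 3DNow! extensions) and the cpuid.h helper shipped with GCC and Clang. The names cpu_features, decode_features, and query_features are made up for illustration, and accepting either AMD bit as proof of the integer SSE subset is an assumption of this sketch.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#define HAVE_CPUID_H 1
#endif

/* CPUID leaf 1, EDX: standard feature flags. */
#define STD_CMOV (1u << 15)
#define STD_MMX  (1u << 23)
#define STD_SSE  (1u << 25)

/* CPUID leaf 0x80000001, EDX: AMD-defined extended feature flags. */
#define EXT_MMXEXT   (1u << 22)  /* AMD extensions to MMX */
#define EXT_3DNOWEXT (1u << 30)  /* 3DNow! extensions */

/* Hypothetical result type; each field is 0 or 1. */
typedef struct {
    int cmov;
    int mmx;
    int sse;          /* full SSE, including the floating-point part */
    int integer_sse;  /* pshufw, psadbw, pavgb, pavgw, movntq, ... */
} cpu_features;

/* Pure decoding step, kept separate so it can be tested with canned values. */
static cpu_features decode_features(uint32_t std_edx, uint32_t ext_edx)
{
    cpu_features f;
    f.cmov = (std_edx & STD_CMOV) != 0;
    f.mmx  = (std_edx & STD_MMX) != 0;
    f.sse  = (std_edx & STD_SSE) != 0;
    /* Assumption: either AMD bit is accepted as proof of the integer subset. */
    f.integer_sse = f.sse || (ext_edx & (EXT_MMXEXT | EXT_3DNOWEXT)) != 0;
    return f;
}

/* Reads the real flags where cpuid.h is available; otherwise reports nothing. */
static cpu_features query_features(void)
{
    uint32_t std_edx = 0, ext_edx = 0;
#ifdef HAVE_CPUID_H
    unsigned int a, b, c, d;
    /* __get_cpuid returns 0 if the requested leaf is not supported. */
    if (__get_cpuid(1, &a, &b, &c, &d))
        std_edx = d;
    if (__get_cpuid(0x80000001u, &a, &b, &c, &d))
        ext_edx = d;
#endif
    return decode_features(std_edx, ext_edx);
}
```

A codec could call query_features() once at load time and put up an error dialog if a flag it needs is 0, rather than faulting on the first unsupported instruction.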
In VirtualDub, I generally write scalar C versions of processing routines and then keep those around even after writing CPU-specific assembly optimized versions. I do this for two reasons: one is for compatibility with all CPUs, and another is so that I don't have to write the AMD64 version immediately. (The AMD64 compiler doesn't support inline assembly, which was a pain at least during the initial port.) The scalar code also serves as a reference test for the optimized code. VirtualDub queries for CPU capabilities at the start of an operation and automatically chooses the appropriate optimized routine.
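The pattern in the paragraph above (keep the scalar routine, pick an optimized one at run time, and use the scalar one as a reference test) can be sketched in C as follows. This is an illustration only, not VirtualDub's code: the sum-of-absolute-differences routine, the names sad_scalar, sad_unrolled, choose_sad, and matches_reference, and the has_integer_sse flag are all hypothetical, and sad_unrolled is a plain C stand-in for what would really be a CPU-specific assembly routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Common signature shared by every implementation of the routine. */
typedef uint32_t (*sad_fn)(const uint8_t *a, const uint8_t *b, size_t n);

static uint32_t abs_diff(uint8_t x, uint8_t y)
{
    return (uint32_t)(x > y ? x - y : y - x);
}

/* Scalar reference version: runs on every CPU. */
static uint32_t sad_scalar(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += abs_diff(a[i], b[i]);
    return sum;
}

/* Stand-in for an optimized version. It walks 8 bytes per step, the way a
   psadbw loop on MMX registers would, but it is ordinary C. */
static uint32_t sad_unrolled(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        for (size_t j = 0; j < 8; ++j)
            sum += abs_diff(a[i + j], b[i + j]);
    for (; i < n; ++i)            /* leftover bytes */
        sum += abs_diff(a[i], b[i]);
    return sum;
}

/* Chosen once at the start of an operation from the detected CPU flags. */
static sad_fn choose_sad(int has_integer_sse)
{
    return has_integer_sse ? sad_unrolled : sad_scalar;
}

/* Reference test: the candidate must agree with the scalar code.
   Returns 1 on agreement, 0 otherwise. n is clamped to the buffer size. */
static int matches_reference(sad_fn candidate, size_t n)
{
    uint8_t a[256], b[256];
    if (n > sizeof a)
        n = sizeof a;
    for (size_t i = 0; i < n; ++i) {
        a[i] = (uint8_t)(i * 37u + 11u);
        b[i] = (uint8_t)(i * 101u + 3u);
    }
    return candidate(a, b, n) == sad_scalar(a, b, n);
}
```

Because every variant shares one function pointer type, adding a new CPU-specific version only touches choose_sad, and a port to a new platform can start with the scalar routine alone.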
The AMD K6 series reached 550MHz. You can still buy a 533MHz K6-2 on Pricewatch.
Cyberia - 17 12 04 - 21:09
Not quite so simple. CPUID SSE detection is flawed on Athlon processors because the motherboard can disable SSE support.
A bit of a history lesson... Many years ago, when 3DNow! was popular and SSE was the new kid on the block, some programs had code paths for both instruction sets. In general, most applications use the newest instruction set available: if a processor supports SSE, that code path is used; otherwise the program checks for 3DNow!, then MMX+, then MMX, and gracefully falls back to that support level.
Some astute motherboard manufacturers found that on the newest Athlons the 3DNow! path was faster than SSE in certain programs, and that they could get a small but noticeable improvement in the common benchmarks by disabling SSE in the CPU. In other words, they could appear to have a competitive advantage over other manufacturers by killing this feature set.
All of a sudden you end up in a situation where there are three classes of Athlon BIOS: those that leave SSE alone at all times (enabled), those that provide a user option to toggle SSE support, and those that force SSE off and give the user no option. In that last case the only option is to use a program like WCPUID to enable it each and every time the PC boots. Many companies have support pages to cover this, such as: http://www.adobe.com/support/techdocs/32..
Anyway, this issue has been a pain for many developers big and small, not to mention pissing AMD off too. Fast forward to 2004, where 3DNow! is dead (removed in 64-bit) and there is little reason for developers to implement it. So instead they just do an SSE path that should cover all processors from the last 4 or 5 years. And all of a sudden their support gets a flood of irate emails from people with brand new Athlon 2GHz+ machines that are being told they are missing features...
So the question you now have to ask yourself is how many VirtualDub users with Athlon CPUs have had their performance crippled over the past few years because they've been forced to use older code paths. Maybe you should add an "Always assume SSE" option to the program ;)
Ok maybe it's not that bad. You can ignore the feature bits and build your own tables based upon the CPU manufacturer/family/model/stepping. However I certainly understand why some companies have just put the onus on the end user to make sure they meet minimum specs.
anon - 19 12 04 - 02:24
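The table-based approach suggested in the comment above might start from something like this C sketch. It assumes the documented layout of the CPUID leaf 1 EAX signature (stepping in bits 0-3, model in bits 4-7, family in bits 8-11, extended model in bits 16-19, extended family in bits 20-27). The names cpu_signature, decode_signature, and table_says_sse are hypothetical, and the single table entry (AMD family 6, model 6 and later, i.e. the Athlon XP generation onward) is only an illustrative assumption, not a complete or vetted table.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded form of the CPUID leaf 1 EAX signature. */
typedef struct {
    unsigned family;
    unsigned model;
    unsigned stepping;
} cpu_signature;

static cpu_signature decode_signature(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xFu;
    unsigned base_model  = (eax >> 4) & 0xFu;
    unsigned ext_family  = (eax >> 20) & 0xFFu;
    unsigned ext_model   = (eax >> 16) & 0xFu;
    cpu_signature s;

    s.stepping = eax & 0xFu;
    s.family   = base_family;
    s.model    = base_model;
    /* The extended fields apply when the base family is 0xF. (Intel also
       applies the extended model for family 6; that case is omitted here.) */
    if (base_family == 0xFu) {
        s.family += ext_family;
        s.model  |= ext_model << 4;
    }
    return s;
}

/* Returns 1 or 0 when the illustrative table has an opinion,
   or -1 when the caller should fall back to the CPUID feature bits. */
static int table_says_sse(const char *vendor, cpu_signature s)
{
    /* Illustrative entry: AMD family 6 parts from model 6 onward. */
    if (strcmp(vendor, "AuthenticAMD") == 0 && s.family == 6)
        return s.model >= 6;
    return -1;
}
```

The vendor string would come from CPUID leaf 0. A table like this says only what the silicon can do; it cannot tell whether the operating system has enabled the feature.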
Interesting, I didn't know this. How lame. Fortunately, VirtualDub has very little floating-point SSE code, which is where the SSE/3DNow conflict occurs; it has some code that uses the integer set, which CPUID will still flag as available as long as 3DNow Extensions is enabled, but that is only a minor improvement over the MMX path. Floating-point SSE simply has low throughput compared to integer MMX. In the end, I'm not too worried about it.
Btw, it isn't safe to assume SSE support just from CPU model identification, because the OS has to set a bit in the CPU to indicate that it supports context-switching the extra registers. You also have to actually execute an SSE instruction and possibly catch the exception. Checking CPUID or the model ID alone isn't enough because the OS may not have support for saving/restoring the registers (some of us aren't lazy and can still run on Windows 95), and test-executing an SSE FP instruction alone isn't enough because older CPUs might simply mis-execute the instructions, as they look like older instructions with a size override or REPZ/REPNZ prefix.
Phaeron - 19 12 04 - 02:47
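The two-step check described in the comment above (CPUID first, then a guarded test execution) could be sketched as follows. This is a POSIX-flavoured illustration using sigaction and sigsetjmp with GCC-style inline assembly; a Windows program would catch the illegal instruction exception with structured exception handling instead. The names try_execute_sse and sse_usable are hypothetical, and on non-x86 builds the probe simply reports 0.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for sigaction and sigsetjmp */
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;

/* SIGILL handler: jump back out of the faulting probe. */
static void on_illegal_instruction(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Executes one SSE instruction and reports whether it survived.
   Only meaningful when CPUID already claims SSE support. */
static int try_execute_sse(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    struct sigaction sa, old;
    volatile int ok = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_illegal_instruction;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return 0;

    /* Saving the signal mask (second argument 1) lets siglongjmp
       unblock SIGILL again after the handler has run. */
    if (sigsetjmp(probe_env, 1) == 0) {
        __asm__ volatile ("xorps %%xmm0, %%xmm0" : : : "xmm0");
        ok = 1;
    }

    sigaction(SIGILL, &old, NULL);  /* restore the previous handler */
    return ok;
#else
    return 0;  /* not an x86 build: nothing to probe */
#endif
}

/* CPUID must say yes before the probe runs, because an older CPU might
   mis-execute the instruction instead of faulting. */
static int sse_usable(int cpuid_reports_sse)
{
    if (!cpuid_reports_sse)
        return 0;
    return try_execute_sse();
}
```

Ordering matters here: the probe is skipped entirely when CPUID says no, so it never runs on a CPU that could silently decode the bytes as something else.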
Another example, BTW, is PAE. You'd think that, since all Intel processors since the Pentium Pro have supported PAE, CPUs that do not support PAE would be completely obsolete nowadays. Except that AMD's CPUs did not implement PAE until the Athlon, VIA's CPUs did not support PAE until the VIA C7, and Transmeta's older Crusoe CPU did not support PAE; only the newer Efficeon did. In fact, even AMD's own Geode GX and LX, and versions of Intel's own Pentium M and Celeron M without NX support, did not support PAE! I think NX was what pushed VIA and Transmeta to add PAE support to their CPUs. Also, even if the CPU problem were solved, I am sure there are some buggy BIOSes that can crash when PAE is on. That is why most Linux distros do not make the PAE kernel the default just to take advantage of NX. Windows by default automatically selects a kernel based on whether PAE is going to be on or not; Linux can't do this. BTW, see this blog article for more info on the relationship between PAE and NX in Windows XP SP2 and later:
Yuhong Bao - 09 02 08 - 18:31
Also, older versions of VMware and even current versions of Microsoft Virtual PC and Parallels do not support PAE.
Yuhong Bao - 13 02 08 - 22:18