After thinking a bit about the SSE4.1 problem, I decided that the best route for taking advantage of SSE4.1 would be through external assembly, and that the best way to do that was to ditch MASM. There are various workarounds to try to kludge MASM into assembling alternate opcodes, but after being unable to quickly find a workable SSE4.1 macro set and after a failed attempt to make one myself (as well as rediscovering the mess that is MASM's macro facilities), I decided to go for plan B. I'd been wanting to do this for a while for a couple of reasons. One reason is the half-hearted way in which Microsoft has been maintaining MASM, including:
- Making trivial and undocumented breaking changes to MASM that keep requiring me to modify my assembly language code, like changing the parameter sizes for the memory argument of MOVD and MOVQ.
- Chopping functionality out wholesale in ML64, the AMD64 (x64) version.
- Sorely incomplete documentation in MSDN. (OPATTR returns a low byte identical to .TYPE. For information on .TYPE, see OPATTR.)
The other reason is MASM's availability, since it's normally only available in Professional Edition or higher. Oh, but you can sometimes get a download for non-commercial use, when they remember to update it, and you can grab it out of the Windows SDK, but you have to make sure not to mix up the bin paths, since the SDK compiler isn't always compatible with the VC++ headers. Ugh. MASM is the only thing that prevents the 32-bit version of VirtualDub from being built with VC++ Express, a restriction I've wanted to lift even though I don't use Express for that purpose myself.
So I spent some time last night porting all of my assembly language to YASM.
The reason I chose YASM is that it supports x64 Windows and, more importantly, VC-compatible line number debugging information. I had looked at NASM before, but dropped the idea due to its lack of debug info support. YASM, on the other hand, has it, and it looks like a lot of other work has gone into making it VC-friendly, such as support for emitting errors in VC-compatible form. YASM shares NASM syntax, and while that syntax is far closer to Intel syntax than GAS syntax is, it's unfortunately gratuitously different enough to make the conversion non-trivial. I was able to do a lot of the conversion with an ugly Visual Studio macro (why oh why do I have to use Visual Basic?!?), but I still had to fix up a lot of assembly by hand. Among the changes I had to make:
- .model flat, .code, .686, .mmx, .xmm: bye bye, gone.
- PROC doesn't exist, so all instances of "_foo proc public" had to change to "global _foo / _foo:" and all other labels had to become local labels.
- "ptr" is forbidden: dword ptr [eax] -> dword [eax]
- Had to add brackets around all absolute memory accesses: mov eax, foo -> mov eax, [foo]
- xmmword -> oword
- Structures and macros had to be rewritten, although I have to say that NASM/YASM macros make a lot more sense.
- Commented out frame pointer omission (.FPO) statements, since YASM doesn't support emitting FPO debug records.
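The purely mechanical rules in the list above can be sketched as a handful of regular expressions. This is not the Visual Studio macro the post mentions. It is a hypothetical Python stand-in (the name `masm_to_yasm` and the exact patterns are assumptions) that covers only the rules listed. It deliberately leaves alone two rules: bracketing absolute memory operands and turning labels into local labels. Both need to know which names are data symbols or procedure-scoped labels, and a line-by-line rewrite doesn't have that information.

```python
import re

# Hypothetical sketch of a line-by-line MASM -> YASM rewrite; patterns are assumptions.
# MASM directives that YASM doesn't need (from the list above); dropped outright.
_DROP = re.compile(r'^\s*\.(model\s+flat|code|686|mmx|xmm)\s*(;.*)?$', re.I)
# "_foo proc public" becomes "global _foo" plus a plain "_foo:" label.
_PROC = re.compile(r'^\s*(\w+)\s+proc\s+public\b', re.I)
# "_foo endp" has no YASM equivalent.
_ENDP = re.compile(r'^\s*\w+\s+endp\b', re.I)
# .FPO records can't be emitted by YASM, so the line is commented out.
_FPO = re.compile(r'^(\s*)(\.fpo\b.*)$', re.I)
_XMMWORD = re.compile(r'\bxmmword\b', re.I)
# "dword ptr [eax]" -> "dword [eax]"; runs after xmmword has become oword.
_PTR = re.compile(r'\b(byte|word|dword|qword|oword)\s+ptr\b', re.I)


def masm_to_yasm(source: str) -> str:
    """Apply the mechanical conversions; anything else is left for hand fixing."""
    out = []
    for line in source.splitlines():
        if _DROP.match(line) or _ENDP.match(line):
            continue
        m = _PROC.match(line)
        if m:
            out.append(f'\tglobal {m.group(1)}')
            out.append(f'{m.group(1)}:')
            continue
        m = _FPO.match(line)
        if m:
            out.append(f'{m.group(1)};{m.group(2)}')
            continue
        line = _XMMWORD.sub('oword', line)
        line = _PTR.sub(r'\1', line)
        out.append(line)
    return '\n'.join(out)


if __name__ == '__main__':
    src = (".686\n.mmx\n.model flat\n.code\n"
           "_foo proc public\n"
           "\tmovdqa xmm0, xmmword ptr [eax]\n"
           "\tmov ecx, dword ptr [esp+4]\n"
           "\t.fpo (0,1,0,0,0,0)\n"
           "\tret\n"
           "_foo endp\n")
    print(masm_to_yasm(src))
```

On the sample input this produces `global _foo`, a `_foo:` label, `oword [eax]` and `dword [esp+4]` operands, and a commented-out `.fpo` line. A memory access like `mov eax, foo` would pass through unchanged, which is exactly the kind of thing that still has to be fixed by hand.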
It took me about four hours to convert all 700K of assembly language, after which I actually did end up with a working build of VirtualDub. I even got it hooked in cleanly via a custom build rule. So far, so good. However, as you might have guessed, that means I now have an enormous test coverage problem, because that 700K of asm covers about fifty different features in VirtualDub and only a small fraction of them have test cases. Worse yet, there are about five different CPU levels involved (scalar, MMX, SSE, ISSE, SSE2). Writing all of the test cases necessary for complete coverage would probably take a very long time, so for now I'm probably just going to run DUMPBIN /DISASM on both the MASM- and YASM-based builds and see if I can verify them by automated diff.
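Diffing raw disassembly from two builds would drown in noise, because every address shifts. Here is a minimal sketch of a normalize-then-diff pass. It assumes DUMPBIN's `address: bytes   mnemonic   operands` line layout, and the helpers `normalize` and `diff_listings` are hypothetical names.

The sketch keeps only instruction lines and drops the address and raw opcode bytes, since relative branch displacements differ between builds anyway. It collapses whitespace, masks 8- or 16-digit hex numbers as probable absolute addresses, and hands the result to Python's `difflib`.

```python
import difflib
import re

# Assumed DUMPBIN /DISASM instruction line: "  00401001: 8B EC      mov   ebp,esp"
_INSN = re.compile(r'^\s*[0-9A-Fa-f]+:\s((?:[0-9A-Fa-f]{2} )+)\s*(\S.*)$')
# 8- or 16-digit hex numbers (optionally with an 'h' suffix) are treated as addresses.
_ADDR = re.compile(r'\b[0-9A-Fa-f]{8}(?:[0-9A-Fa-f]{8})?h?\b')


def normalize(listing: str) -> list[str]:
    """Reduce a disassembly listing to address-independent 'mnemonic operands' lines."""
    out = []
    for line in listing.splitlines():
        m = _INSN.match(line)
        if m:
            text = ' '.join(m.group(2).split())
            out.append(_ADDR.sub('ADDR', text))
    return out


def diff_listings(masm: str, yasm: str) -> list[str]:
    """Unified diff of the normalized listings; an empty list means no differences."""
    return list(difflib.unified_diff(normalize(masm), normalize(yasm),
                                     fromfile='masm', tofile='yasm', lineterm=''))


if __name__ == '__main__':
    a = ("  00401000: 55                 push        ebp\n"
         "  00401001: 8B EC              mov         ebp,esp\n")
    b = ("  00501000: 55                 push        ebp\n"
         "  00501001: 8B EC              mov         ebp,esp\n")
    print(diff_listings(a, b) or 'identical')
```

Because the opcode bytes are discarded, this sketch would not flag two assemblers picking different encodings for the same instruction, which is arguably what you want when checking a syntax port. The flip side is that it can't catch a genuinely wrong operand-size encoding that disassembles to the same text.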
Haven't actually gotten to writing any SSE4.1 code yet, but I'm getting there....