AMD64 and frustrations of an assembly language programmer

I recently discovered a bug in current versions of VirtualDub that shows up as occasional massive audio sync errors when working with large frame start offsets, around frame 60000 or higher. It is caused by an overflow in the fraction scaling routines, which are used to compute the audio offset, since AVI sample rates are 32:32 rational fractions. Whether or not you hit this bug depends on how reduced the fraction is in the AVI file; most files have small numbers here and thus the bug requires a very large frame offset, but you can occasionally hit it in a 1hr+ file if the file has a really large or unnormalized fraction. This is being fixed for the future 1.6.6 version by adding 96-bit intermediate arithmetic. What is annoying, however, is that the lack of both sufficient intrinsics and inline assembly support in the AMD64 compiler forces a rather silly assembly language function:

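VDFractionScale64 proc public
    mov  rax, rcx       ; rax = first argument (64-bit multiplicand)
    mul  rdx            ; rdx:rax = rax * second argument (128-bit product)
    div  r8             ; rax = quotient, rdx = remainder of rdx:rax / third argument
    mov  [r9], edx      ; store low 32 bits of the remainder through the fourth argument
    ret                 ; quotient is returned in rax
VDFractionScale64 endp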

For some reason, the VS2005 compiler for AMD64 has intrinsics to do 128-bit multiply and shift operations, but no add, subtract, or divide operations. Gee, thanks.
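For comparison, here is a minimal C++ sketch of the same kind of scaling done with a 96-bit intermediate and nothing but 64-bit operations. It is not the actual VirtualDub code: the names NaiveScale64 and FractionScale96 are made up for illustration, and it assumes a 64-bit multiplicand with a 32-bit multiplier and divisor, which is where the 96 bits come from. The naive version is included only to show the overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical naive version: the 64-bit product silently wraps once
// a * b reaches 2^64, which is the kind of overflow described above.
uint64_t NaiveScale64(uint64_t a, uint32_t b, uint32_t c) {
    return a * b / c;
}

// Hypothetical helper, not the VirtualDub routine: returns floor(a * b / c)
// using a 96-bit intermediate product built from 64-bit operations only.
// The remainder is written to *remainder. Assumes c != 0 and that the
// quotient fits in 64 bits.
uint64_t FractionScale96(uint64_t a, uint32_t b, uint32_t c, uint32_t *remainder) {
    assert(c != 0);

    // Two 32x32 -> 64 partial products.
    uint64_t lo = (a & 0xFFFFFFFFu) * b;   // contributes to bits 0..63
    uint64_t hi = (a >> 32) * b;           // contributes to bits 32..95

    // Bits 32..95 of the 96-bit product. This sum cannot overflow 64 bits.
    uint64_t upper = hi + (lo >> 32);
    uint32_t low32 = static_cast<uint32_t>(lo);

    // Schoolbook long division by the 32-bit divisor, upper part first.
    uint64_t qUpper = upper / c;
    uint64_t carry = upper % c;            // < c, so it fits in 32 bits

    uint64_t tail = (carry << 32) | low32;
    uint64_t qLower = tail / c;            // < 2^32 because carry < c

    assert(qUpper <= 0xFFFFFFFFu);         // quotient must fit in 64 bits
    *remainder = static_cast<uint32_t>(tail % c);
    return (qUpper << 32) | qLower;
}
```

With a = 2^40 and b = c = 2^30, the naive product wraps to zero, while the 96-bit version returns 2^40 as expected. It costs two 64-bit divides instead of one 128-bit divide, which is part of why the four-instruction assembly routine is tempting.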

On the bright side, the AMD64-compatible version of the Platform SDK is out for public download!

http://www.microsoft.com/downloads/details.aspx?FamilyID=d8eecd75-1fc4-49e5-bc66-9da2b03d9b92&DisplayLang=en

What's in the Platform SDK for AMD64

First of all, the Windows Server 2003 SP1 SDK does include the optimizing x64 compiler. It is version 14.00.40310.41, which is much newer than the DDK compiler; however, it appears to be older than the Visual Studio 2005 Beta 1 compiler, which is build 14.00.40607.16. The Beta 1 compiler was rather stable and I expect the PSDK compiler to be the same. 64-bit MASM (ML64) is also included (yay!), and of course, LINK. DUMPBIN is not included, but this is just an alias for LINK /DUMP anyway, and even if that weren't the case, the VC7.1 DUMPBIN is, surprisingly, capable of disassembling AMD64 object files. Sadly, NMAKE is not included. Can't give away everything, I suppose.

Far more interesting, though, is a little file in the Bin\win64\x86\AMD64 directory called "SWConventions.doc." This documents the Application Binary Interface (ABI) for x64 Windows, and is finally a real specification, not a handful of scattered pages stuck deep in MSDN Library. (Try reading the docs for Structured Exception Handling in the Core SDK sometime; they're a sad joke.) Among the interesting parts of this document, relative to the previously available kernel ABI docs:

x87 and MMX registers are now officially preserved across context switches and sanctioned in userspace; except for FPUCW, they simply have no calling convention defined and the compiler doesn't touch them. Which, for us asm coders, is perfect. :)
All 128 bits of XMM6-XMM15 must now be preserved, not just the low 64 bits. This invalidates my trick of stacking XMM registers with MOVLHPS/MOVHLPS, which I think was illegal from an exception handling standpoint anyway.
Tentatively, all functions must begin with an instruction that is at least two bytes, and have at least six bytes of unused space available before them. If the function is a nested function that begins with a PUSH, the instruction must be the two byte (modrm) form, and if it is a leaf function, a NOP must be used. The reason for this is so that the first instruction can safely be hot-patched by a short jump backwards to a long jump, which in turn jumps to a replacement function. This is really cool, as hotpatching in this manner is tougher and unsafe on x86, due to the possibility of a function starting with a one-byte instruction followed by another one-byte instruction that is a branch target.
There is a ksamd64.inc file that declares MASM macros for easier authoring of exception-safe prolog and epilog code in assembly language.
malloc() must return 16-byte aligned memory. This means _aligned_malloc() is no longer necessary for objects containing SSE 4-vectors (yay!).
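The malloc() point is easy to check from C++. The following is only a sketch: Vec4, IsAligned16, and AllocVectors are hypothetical names, and it assumes a 64-bit target whose malloc() makes the 16-byte guarantee described above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical 4-float vector type of the kind SSE code loads with MOVAPS.
struct Vec4 {
    float x, y, z, w;
};

// True if p sits on a 16-byte boundary.
bool IsAligned16(const void *p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15) == 0;
}

// Allocates n vectors with plain malloc(). On a platform whose malloc()
// guarantees 16-byte alignment (assumed here: a 64-bit target), the result
// can be used for aligned SSE loads without _aligned_malloc().
Vec4 *AllocVectors(std::size_t n) {
    Vec4 *v = static_cast<Vec4 *>(std::malloc(n * sizeof(Vec4)));
    assert(v == nullptr || IsAligned16(v));
    return v;
}
```

The assert documents the assumption rather than enforcing anything; on a 32-bit build, where malloc() typically guarantees only 8 bytes, it could fire and _aligned_malloc() would still be needed.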

It's too late to do so tonight, but I hope to compile 64-bit VirtualDub on the new PSDK tomorrow. The DDK includes and compiler are somewhat scrambled and it'd be nice to back out some of the hacks.

5 comments | Apr 27, 2005 at 03:38 | default
