Current version

v1.10.4 (stable)


Main page
Archived news
Plugin SDK
Knowledge base
Contact info
Other projects


Blog Archive

FPO and a callee-pops parameter passing convention makes perfect stack walks impossible

There's a bit of discussion over at Larry Osterman's blog about the Frame Pointer Omission (FPO) optimization in the Visual C++ compiler and how it affects stack walking, which I've been participating in. I figured I'd expound it a bit more here.

The basic problem to be solved when doing a stack walk is finding the locations of return addresses in the stack, which are also the locations of the stack pointer upon entry to each function in the call stack. If you can somehow determine how much local data is present at each stack frame, you can maintain a virtual stack pointer and hop from stack frame to stack frame until the call stack is determined. On x86, the steps involved are generally as follows:

  1. Obtain the instruction pointer (EIP) and the stack pointer (ESP) of the thread.
  2. Look up the current virtual EIP in debugging information to determine the current function.
  3. Obtain the base of the stack frame, either by reading the frame pointer on the stack, or offsetting ESP if there is no frame pointer. This is now the new virtual ESP.
  4. Read the return address into the virtual EIP.
  5. Go to step 2.

The trick is trying to determine the base each stack frame. When EBP frame pointers are present, this is easy -- just keep following the saved frame pointers next to each return address. What's not so easy is the FPO case, where ESP is used directly, because the offset from ESP to the return address depends on how much local variable space is allocated, and how many parameters for called functions are present.

I claim that it is impossible to reliably stack walk in the general case with the __stdcall or thiscall calling convention and FPO involved -- even with full debugging information! And no, the code doesn't have to be that weird.

Consider the following function disassembly:

  00000000: 8B 01              mov         eax,dword ptr [ecx]
  00000002: 6A 02              push        2
  00000004: 6A 01              push        1
  00000006: FF D0              call        eax
  00000008: FF D0              call        eax
  0000000A: C3                 ret

What would be the appropriate debug information for this function? Ideally, you would want to encode an ESP-to-return-address offset for each instruction, so based on the instruction pointer you could unambiguously determine the offset from every possible instruction that could crash. In some cases, you wouldn't even need to encode this information, if you could walk the instruction stream and update a virtual ESP based on executed instructions. This is frequently possible with compiler-generated code, since the compiler uses well-defined and simple patterns to maintain the stack. This is often done with RISC CPUs that have very easy to parse instruction streams. It's also done on X64, with the help of restrictions on prolog/epilog code and unwind bytecode. X86, however, has neither of these advantages.

Let's say the second CALL instruction in the above code crashes, due to EAX=0 -- which would mean that the first function call returned a null pointer. What would the proper offset to add to ESP to get to the return address? You can't tell from the called function, because the call is indirect and you don't know which function was called.

If you answered 0, you were wrong. If you answered 8, you were wrong. In fact, no matter what value you picked, you would be wrong.

Here's the source code that produced the above machine code, when compiled with Visual Studio 2005 SP1 at /O2:

typedef void *(__stdcall *StateFunction0)();
typedef void *(__stdcall *StateFunction1)(int a);
typedef void *(__stdcall *StateFunction2)(int a, int b);
struct IState { virtual void RunState() = 0; };
struct State0 : public IState { virtual void RunState(); StateFunction0 fn; };
struct State1 : public IState { virtual void RunState(); StateFunction1 fn; };
void State0::RunState() { ((StateFunction2)fn())(1, 2); }
void State1::RunState() { ((StateFunction1)fn(1))(2); }

(The unusual casting is due to C++'s inability to create a recursive typedef. Returning a function pointer as a void* is common when programming state machines, for this reason.)

You might say, aren't there two methods there? Yes, but they compile to the exact same code, and the Visual C++ linker will collapse two functions that have the same code even if they do completely different things. Essentially, the correct ESP offset at the second CALL instruction can be either +8 or +4, depending on whether State0::RunState() or State1::RunState() was executing. Both of these are implementations of the same virtual call on the same interface, so knowing the parent function doesn't help; the only way you could tell is by examining the type of this by checking the vtable pointer, and unfortunately after the first CALL instruction this is no longer available (ECX is a volatile register in the thiscall calling convention). I'm pretty sure that this is unsolvable in the general case except by knowing the entire execution history of the program up to this point.

Moral of this story: Callee-pops calling conventions are absolutely evil with regard to accurate stack walks.


This blog was originally open for comments when this entry was first posted, but was later closed and then removed due to spam and after a migration away from the original blog software. Unfortunately, it would have been a lot of work to reformat the comments to republish them. The author thanks everyone who posted comments and added to the discussion.