An x86 CPU has eight main registers in its scalar register file in 32-bit mode: EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP. All of these have various special uses, but the eighth, ESP, has the most special status of all as the stack pointer.
I did say eight general purpose registers.
It is possible, in some cases, to temporarily reuse ESP as a general purpose register. Since the x86 architecture is so register-starved, adding an eighth register can eliminate the need to spill variables to memory and boost the speed of a critical inner loop. I've used this in a couple of places in VirtualDub when I really needed it, and some have asked if it is actually safe to do this. The answer is yes.
How to do it
Simply reusing the stack pointer register is easy, as you just modify and use ESP directly. You do have to save and restore it, unless you expect to run for all eternity with no stack (good luck). The easiest way to do this is to just use a global variable:
mov stacksave, esp
; ... body that uses ESP as a general purpose register ...
mov esp, stacksave
Forget about using any high-level language within this scope — you will be writing the body in assembly language, whether it's inline assembly or external.
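Put together, a complete routine following this pattern might look like the sketch below. It is an illustration only: the routine name sum_dwords, its cdecl signature, and the MASM directives are assumptions, and a loop this small does not really need an eighth register. What it shows is the ordering: arguments are read while ESP still addresses the stack, and nothing between the save and the restore may push, pop, call, or return.

```asm
; Minimal MASM sketch (32-bit). sum_dwords is a hypothetical routine:
;   int __cdecl sum_dwords(const int *src, int count);   // count must be > 0
.686
.model flat, c

.data
stacksave dd 0                      ; single global slot, so not reentrant

.code
sum_dwords proc
        mov     edx, [esp+4]        ; src: read arguments while ESP is still a stack pointer
        mov     ecx, [esp+8]        ; count
        mov     stacksave, esp      ; save the real stack pointer

        lea     esp, [edx+ecx*4]    ; ESP is now plain data: one past the last element
        xor     eax, eax            ; running sum
sumloop:
        add     eax, [edx]          ; no push/pop/call/ret allowed in this region
        add     edx, 4
        cmp     edx, esp            ; ESP serves as the end-of-array pointer
        jne     sumloop

        mov     esp, stacksave      ; restore before touching the stack again
        ret
sum_dwords endp
end
```

From C, this hypothetical routine would be declared as `extern int sum_dwords(const int *src, int count);`.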
This works, but has the serious disadvantage of making your routine non-reentrant, as only one executing instance of the function can save its stack. (This is really only a problem for multithreaded scenarios, because if you are reusing the stack pointer, reentrancy is probably not a problem.) One way around this is to stash it in an MMX register, but that assumes you have MMX and don't need one of the registers. What I do on Win32 is to stash the stack pointer in the Structured Exception Handling (SEH) chain instead:
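push 0                      ; handler field of the dummy node (NULL)
push fs:dword ptr [0]       ; link field: current head of the SEH chain
mov fs:dword ptr [0], esp   ; make the dummy node the new head
; ... body that uses ESP as a general purpose register ...
mov esp, fs:dword ptr [0]   ; recover the stack pointer
pop fs:dword ptr [0]        ; unlink the dummy node
add esp, 4                  ; discard the NULL handler field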
The SEH chain is a linked list of active exception handling scopes in the current thread; it is used both for C++ exceptions and for system exceptions, and is pointed to by the first location in the thread environment block (TEB). The TEB is in turn pointed to by the FS: selector. We link a dummy node into the SEH chain to hold the stack pointer, and since there is a unique TEB for each thread, this allows our routine to run concurrently on multiple threads.
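As a sketch of how this fits into a full routine, here is a small loop with the stack pointer parked in a dummy SEH node. The routine name sum_dwords_mt and its signature are hypothetical, and the loop is only there to give ESP something to do. The `assume fs:nothing` line is needed because MASM's flat model otherwise treats references through FS as an error. The node is two dwords: the link to the previous head of the chain at offset 0, and the handler pointer at offset 4, which is left NULL.

```asm
; Minimal MASM sketch (32-bit Win32). sum_dwords_mt is a hypothetical routine:
;   int __cdecl sum_dwords_mt(const int *src, int count);   // count must be > 0
.686
.model flat, c
assume fs:nothing                       ; flat model marks FS as ERROR by default

.code
sum_dwords_mt proc
        mov     edx, [esp+4]            ; src: read arguments before ESP moves
        mov     ecx, [esp+8]            ; count

        push    0                       ; node+4: handler = NULL
        push    dword ptr fs:[0]        ; node+0: link to the previous head of the chain
        mov     dword ptr fs:[0], esp   ; node becomes the head; ESP is now recoverable per thread

        lea     esp, [edx+ecx*4]        ; ESP is now plain data: one past the last element
        xor     eax, eax                ; running sum
sumloop:
        add     eax, [edx]              ; no push/pop/call/ret allowed in this region
        add     edx, 4
        cmp     edx, esp
        jne     sumloop

        mov     esp, dword ptr fs:[0]   ; ESP = address of our node again
        pop     dword ptr fs:[0]        ; unlink: restore the previous head
        add     esp, 4                  ; discard the NULL handler slot
        ret
sum_dwords_mt endp
end
```

Because each thread has its own TEB, two threads running this routine at once each recover their own stack pointer, which the global-variable version cannot guarantee.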
Now you can reuse ESP.
Aren't you screwed if an interrupt occurs?
Those of you who have programmed in DOS are likely squirming at this point about the possibility of interrupts. Ordinarily, reusing the stack pointer like this is a really bad idea because you have no idea when an interrupt might strike, and when one does, the CPU dutifully pushes the current program counter and flags onto the stack. If you have reused ESP, this would cause random data structures to be trashed. In this kind of environment, ESP must always point to valid and sufficient stack space to service an interrupt, and whenever this does not hold, interrupts must be disabled. Running with interrupts disabled for a long time lowers system responsiveness (lost interrupts and bad latency), and isn't practical for a big routine.
However, we're running in protected mode here.
When running in user space in Win32, interrupts do not push onto the user stack, but onto a kernel stack instead. If you think about it, it isn't possible for the user stack to be used. If the thread were out of stack space, or even just had an invalid stack, when the CPU tried to push EIP and EFLAGS, it would page fault, and you can't page fault in an interrupt handler. Thus, the scheduler can do any number of context switches while a no-stack routine is running, and any data structures that are being pointed to by ESP will not be affected.
There is one case where the OS will try to push data onto the invalid stack, and that is if an exception occurs. The most likely exception is an access violation (C0000005). The good news is that this will never happen for several reasons:
- The exception handling routine in the dummy SEH node we pushed above is invalid (NULL).
- The OS will detect that the stack isn't within the stack bounds pointed to by the TEB.
- Even if it is within the stack bounds, the stack pointer is unlikely to be below the node that we pushed. If anything, it would be higher, to point to a parameter.
The bad news is that violating any one of these is enough to make the application toast when an exception occurs on that thread. Whenever an exception unwind fails like this, Windows will simply kill the task outright and the application disappears. No unhandled exception handler, no crash dialog, nothing. This means that you had better make very sure that your routine is debugged before you ship it out to users, because no amount of in-process exception trapping is going to be able to fire a report if the routine dies. The good news is that a debugger can intercept such exceptions before the OS tries to resolve them in-thread, so you can still debug a stackless routine without problems.