§ Eight general purpose registers on x86

An x86 CPU has eight main registers in its scalar register file in 32-bit mode: EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP. All of these have various special uses, but of them, the eighth, ESP, has the most special status as the stack pointer.

I did say eight general purpose registers.

It is possible, in some cases, to temporarily reuse ESP as a general purpose register. Since the x86 architecture is so register-starved, adding an eighth register can eliminate the need to spill variables to memory and boost the speed of a critical inner loop. I've used this in a couple of places in VirtualDub when I really needed it, and some have asked if it is actually safe to do this. The answer is yes.

How to do it

Simply reusing the stack pointer register is easy, as you just modify and use ESP directly. You do have to save and restore it, unless you expect to run for all eternity with no stack (good luck). The easiest way to do this is to just use a global variable:

mov stacksave, esp
...
mov esp, stacksave

Forget about using any high-level language within this scope; you will be writing the body in assembly language, whether it's inline assembly or external.
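As a rough illustration, here is what a complete routine built around the global-variable save might look like in MASM syntax. The routine name sum_bytes, its C-style signature, and the label are made up for this example, and it assumes a 32-bit cdecl build with count greater than zero. It is a sketch of the save/restore pattern, not code taken from VirtualDub. Note that the arguments are fetched before ESP is repurposed, because [esp+n] no longer refers to the stack frame afterward.

```asm
        .686
        .model  flat, c

        .data
stacksave dd    0                       ; holds the real stack pointer (not thread-safe)

        .code
; Hypothetical example routine:
;   int sum_bytes(const unsigned char *src, int count);   assumes count > 0
sum_bytes proc
        push    ebx                     ; callee-saved register
        mov     edx, [esp+8]            ; src   (return address is at [esp+4])
        mov     ecx, [esp+12]           ; count
        mov     stacksave, esp          ; save the real stack pointer
        mov     esp, edx                ; ESP is now the source pointer
        xor     eax, eax                ; running sum
sumloop:
        movzx   ebx, byte ptr [esp]     ; load the next byte through ESP
        add     eax, ebx
        inc     esp
        dec     ecx
        jnz     sumloop
        mov     esp, stacksave          ; the stack is usable again from here
        pop     ebx
        ret
sum_bytes endp
        end
```

A loop this small does not actually need the extra register; it only shows the mechanics. ESP works as a base register or as a plain value, but the x86 addressing encoding does not allow it as a scaled index, so it is best suited to holding a pointer or an end marker.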

This works, but has the serious disadvantage of making your routine non-reentrant, as only one executing instance of the function can save its stack pointer at a time. (This is really only a problem for multithreaded scenarios, because if you are reusing the stack pointer, you are unlikely to be making calls that could reenter the routine on the same thread.) One way around this is to stash it in an MMX register, but that assumes you have MMX and don't need one of the registers. What I do on Win32 is to stash the stack pointer in the Structured Exception Handling (SEH) chain instead:

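push 0                      ; dummy handler field for the new node
push fs:dword ptr [0]       ; link to the current head of the chain
mov fs:dword ptr [0], esp   ; the new node is now the head; its address is the saved ESP
...
mov esp, fs:dword ptr [0]   ; recover the saved stack pointer
pop fs:dword ptr [0]        ; restore the previous head of the chain
pop eax                     ; discard the dummy handler field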

The SEH chain is a linked list of active exception handling scopes in the current thread; it is used both for C++ exceptions and for system exceptions, and is pointed to by the first location in the thread environment block (TEB). The TEB is in turn pointed to by the FS: selector. We link a dummy node into the SEH chain to hold the stack pointer, and since there is a unique TEB for each thread, this allows our routine to run concurrently on multiple threads.
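One way to package the SEH-chain save is as a pair of MASM macros wrapped around the stackless section. This is only a sketch: the macro names STACKLESS_BEGIN and STACKLESS_END and the routine xlat_bytes are invented for illustration, and the code assumes a 32-bit Win32 cdecl build with count greater than zero. The restore sequence puts the previous chain head back first and then drops the dummy handler field, so the chain is left exactly as it was found.

```asm
        .686
        .model  flat, c
        assume  fs:nothing              ; flat model otherwise treats FS references as errors

; Hypothetical helper macros; the names are illustrative only.
STACKLESS_BEGIN macro
        push    0                       ; dummy handler field of the node
        push    fs:dword ptr [0]        ; next field = previous chain head
        mov     fs:dword ptr [0], esp   ; chain head = our node = saved ESP
endm

STACKLESS_END macro
        mov     esp, fs:dword ptr [0]   ; recover the real stack pointer
        pop     fs:dword ptr [0]        ; unlink: restore the previous chain head
        lea     esp, [esp+4]            ; drop the dummy handler field, flags untouched
endm

        .code
; Hypothetical example routine:
;   void xlat_bytes(unsigned char *dst, const unsigned char *src,
;                   const unsigned char *table, int count);   assumes count > 0
xlat_bytes proc
        push    ebx
        push    esi
        push    edi
        mov     edi, [esp+16]           ; dst   (fetch arguments while ESP is still valid)
        mov     esi, [esp+20]           ; src
        mov     ebx, [esp+24]           ; table
        mov     ecx, [esp+28]           ; count
        STACKLESS_BEGIN
        lea     esp, [esi+ecx]          ; ESP = one past the end of src
xloop:
        movzx   eax, byte ptr [esi]     ; fetch a source byte
        mov     al, [ebx+eax]           ; translate it through the table
        mov     [edi], al
        inc     esi
        inc     edi
        cmp     esi, esp                ; ESP serves as the end marker
        jne     xloop
        STACKLESS_END
        pop     edi
        pop     esi
        pop     ebx
        ret
xlat_bytes endp
        end
```

Because the saved pointer lives in the per-thread chain rather than in a global, two threads can be inside xlat_bytes at once without stepping on each other's saved ESP.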

Now you can reuse ESP.

Aren't you screwed if an interrupt occurs?

Those of you who have programmed in DOS are likely squirming at this point about the possibility of interrupts. Ordinarily, reusing the stack pointer like this is a really bad idea because you have no idea when an interrupt might strike, and when one does, the CPU dutifully pushes the current program counter and flags onto the stack. If you have reused ESP, this would cause random data structures to be trashed. In this kind of environment, ESP must always point to valid and sufficient stack space to service an interrupt, and whenever this does not hold, interrupts must be disabled. Running with interrupts disabled for a long time lowers system responsiveness (lost interrupts and bad latency), and isn't practical for a big routine.

However, we're running in protected mode here.

When running in user space in Win32, interrupts do not push onto the user stack, but onto a kernel stack instead. If you think about it, it isn't possible for the user stack to be used. If the thread were out of stack space, or even just had an invalid stack, when the CPU tried to push EIP and EFLAGS, it would page fault, and you can't page fault in an interrupt handler. Thus, the scheduler can do any number of context switches while a no-stack routine is running, and any data structures that are being pointed to by ESP will not be affected.

There is one case where the OS will try to push data onto the invalid stack, and that is if an exception occurs. The most likely exception is an access violation (C0000005). The good news is that this will never happen for several reasons:

The bad news is that violating any one of these is enough to make the application toast when an exception occurs on that thread. Whenever an exception unwind fails like this, Windows will simply kill the task outright and the application disappears. No unhandled exception handler, no crash dialog, nothing. This means that you had better make very sure that your routine is debugged before you ship it out to users, because no amount of in-process exception trapping is going to be able to fire a report if the routine dies. The good news is that a debugger can intercept such exceptions before the OS tries to resolve them in-thread, so you can still debug a stackless routine without problems.

Comments

Comments posted:


And all this for just one more register to use?

Murmel - 23 01 06 - 15:00


Sure! By the time you use one for the loop counter, a second one for the destination pointer, a third for a lookup table, and a fourth for temporary ops, you've only got three other registers available for source pointers and other arithmetic. Considering that you also have to pipeline in software to try to help the hardware scheduler, that's pretty tight. An extra register definitely helps.

Eight more registers help even more, which is one reason that AMD64 can result in faster code even if the 64-bit arithmetic is not used.

Ordinarily some of these issues can be avoided by dynamically generating code so that some of the data pointers are encoded as relative offsets from other pointers or even just hardcoded addresses. Problem is, any immediate or displacement values outside of +/-32K can require more than one trace cache entry and slow down the front end, so this isn't always as attractive on a P4 as on other CPUs.

Phaeron - 24 01 06 - 02:17


This could probably be done on relatively newer Linux systems, too. Stash ESP in TLS (same concept, but gs instead of fs, and allocate the space with __thread), and set up a separate signal stack with sigaltstack(). I'm not sure if there are any other issues.

Glenn Maynard - 28 01 06 - 20:02


And if I use VirtualDub with Windows Vista. Can you guarantee that this 'evil hack' will still work? If not - What are the consequences? Will my whole system just die or will there be naked Bills dancing all over my desktop screaming at me?

Murmel - 30 01 06 - 16:23


I'm in the beta program for Windows Vista and haven't seen this to be a problem. The technique has been around for a long time, at least back to the time of the Pentium, and has been used in a lot of shipping products, so I would think that Microsoft can't afford to break the technique at this point.

As for Bills running all over your desktop, you probably won't get those unless you run the Windows port of xbill.

Phaeron - 31 01 06 - 00:49


Seriously Dude, you should go work for AMD/INTEL or something. Better still, go work for Microsoft. With your know-how of this stuff maybe my computer will stop crashing whenever I try to install x64 (Those Bloody Microsoft programmers don't know shit...).
BTW - What are your opinions of Visual Studio 2005 in terms of the compiler and inline assembly? I remember some time ago you were bitching about how it generated inefficient code and stuff.

dre - 01 02 06 - 17:46


"When running in user space in Win32, interrupts do not push onto the user stack, but onto a kernel stack instead. If you think about it, it isn't possible for the user stack to be used."

Unless a privilege transition is not done, and it is possible that the code running at interrupt time is in the same privilege level as the interrupt code, but in Windows NT, all of them are in kernel mode. Also don't forget CS in the list of registers pushed on an interrupt.

Yuhong Bao - 30 07 06 - 23:26


Probably so -- I haven't written an OS kernel -- but you would still have to page-lock the stack. Also, using a user-space stack would be a security hole, given that you could twiddle the return address from another thread on a different CPU while the interrupt was running.

Phaeron - 03 08 06 - 12:42
