Why the Visual Studio debugger occasionally locks up the entire Windows GUI
A few days ago, I finally solved a mystery that had been annoying the heck out of me for years.
Ever since I moved to Windows XP, I had been seeing a weird problem where occasionally, when a program that I had been working on had crashed with an access violation, the Visual C++ 6.0 debugger would stop responding after I dismissed the exception dialog. Soon thereafter, nearly everything else would also lock up, except for the Alt+Tab popup and console windows (particularly command prompts). The GUI programs weren't completely dead, but they ran really slowly, to the point that I could wait over ten minutes just for Visual Studio to redraw. CPU load was not the problem, or else the laptop fans would have gone on. Killing the debuggee process didn't work; the command would go through, but nothing would happen. I only knew three solutions to the problem: spamming Shift+F5 into the debugger (took way too long), killing the debugger process using TASKKILL /F /PID (lost work), and logging off (took too long and lost work). Very frustrating.
At times I thought that DirectShow or Spy++ were to blame, since the problem seemed to occur more often when those were involved... but I couldn't nail anything down. I also thought that it was an issue with Visual C++ 6.0, since it seemed to happen less frequently with Visual Studio .NET 2003, but I had it happen on that version too. I even dragged out the kernel debugger at one point and hard broke into the system when it happened, but couldn't see anything out of the ordinary. So, basically, it was one of those rarely occurring but intensely annoying bugs that I couldn't resolve.
Then... it happened with Visual Studio 2005. Target process, then debugger, and finally the whole system frozen. What was unusual this time was that the app that broke was HTML Help, since that's what I have VS launch when I compile VirtualDub's help file... and it hadn't crashed! By chance I thought to attach NTSD to devenv.exe (ntsd -p -pv <pid>), which worked since NTSD is a console-mode app... and after dumping the thread stacks and running Sysinternals Process Explorer veeeerrrryyy sloooowwly I finally figured out what had been pissing me off all this time.
In short, the problem is caused by shared mutexes in Windows system DLLs.
The type of hang-up I ran into consistently turned out to be caused by that crappy Text Services Framework that comes with Office and Windows XP. It maintains a bunch of per-user, interprocess mutex objects with names like CTF.LBES.MutexDefaultS-1-5-21-790525478-1715562821-839522115-1003 to arbitrate access to shared memory structures. What happens is that one thread in the debuggee process happens to grab some of these mutex objects to draw text, and in the meantime, another thread hits an exception. All threads in the debuggee process, including the one holding the mutexes, are then suspended by the debugger. The debugger then decides to draw some text in its editor, and it hangs trying to get the mutexes... and other processes try to draw text, and they hang too. Except for the command prompts, which are handled by good old csrss.exe and apparently don't use either the same mutexes or the same framework. And all the rest of the processes just sit tight until the timeout on the mutex wait expires. Kill the debugger, and the problem goes away because that unblocks the debuggee, and when a thread is killed the NT kernel makes sure any mutexes it held are released.
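To make that sequence concrete, here is a minimal Python model of it. This is not Windows code and does not touch the real Text Services Framework: a `threading.Lock` stands in for the named interprocess mutex, threads stand in for processes, and every name in it (`simulate_frozen_holder`, `try_draw_text`, the 0.2 second timeout) is invented for illustration. It only shows the shape of the problem: a frozen owner makes every timed wait run to its full timeout.

```python
import threading
import time


def simulate_frozen_holder(wait_timeout=0.2):
    """Model a lock owner that is frozen while other parties wait with a timeout."""
    shared_mutex = threading.Lock()       # stand-in for the named cross-process mutex
    holder_ready = threading.Event()      # set once the "debuggee" owns the lock
    resume_debuggee = threading.Event()   # set when the "debugger" lets it run again
    results = {}

    def debuggee_text_thread():
        # Takes the lock to "draw text", then is frozen while still owning it,
        # much like a thread suspended by a debugger.
        with shared_mutex:
            holder_ready.set()
            resume_debuggee.wait()

    def try_draw_text(who):
        # Every other party needs the same lock and waits up to wait_timeout.
        start = time.monotonic()
        acquired = shared_mutex.acquire(timeout=wait_timeout)
        results[who] = (acquired, time.monotonic() - start)
        if acquired:
            shared_mutex.release()

    holder = threading.Thread(target=debuggee_text_thread)
    holder.start()
    holder_ready.wait()

    # The "debugger" and an unrelated "app" both stall behind the frozen owner.
    waiters = [threading.Thread(target=try_draw_text, args=(name,))
               for name in ("debugger", "other_app")]
    for waiter in waiters:
        waiter.start()
    for waiter in waiters:
        waiter.join()

    # Unfreezing the owner releases the lock, and the next attempt succeeds at once.
    resume_debuggee.set()
    holder.join()
    try_draw_text("after_resume")
    return results


if __name__ == "__main__":
    for who, (acquired, waited) in simulate_frozen_holder().items():
        print(f"{who}: acquired={acquired}, waited {waited:.2f}s")
```

In the model, the two waiters fail only after burning the whole timeout, which is the "runs really slowly but isn't dead" behavior: each text draw costs one full mutex timeout until the owner is allowed to run again.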
The second hang-up was a bit more esoteric. HTML Help, the process I was debugging, needed to check the user Internet Zones permissions information before loading up the initial page in the help file it was viewing. To do this, it grabbed a mutex protecting the permissions data, which is held in a shared memory window. While this was happening, though, some DLLs were loaded into the hh.exe process -- possibly from another thread -- for which Visual Studio 2005 didn't have symbols. So it decided to contact the Microsoft public symbol server -- and instantly blocked on the same mutex trying to set up the HTTP query.
Working around the first one is easy: Disable Text Services Framework in Regional and Language Settings. You probably don't need it, and CTFMON.EXE is not particularly known for contributing to system stability anyway. The second one can be worked around by unchecking the Microsoft symbol server in VS2005's symbol server options after the DLL symbols have been downloaded; in that case, it will still check the local symbol cache, just not download new PDBs. This is a good idea anyway as the symbol server support has a habit of repeatedly trying to download symbols for DLLs that don't have public symbols or aren't even made by Microsoft.
Unfortunately, I don't see a good way to truly fix this problem; launching the application as a different user than the debugger should work since the mutexes involved seem to be user-specific so far, but I don't know of a good way to do that in Visual Studio. It'd be nice if programmers would stop using shared mutable memory like this, as it punches holes in the protected memory system with regard to isolating crashes, but somehow I think that otherwise they'd just add a service instead, which would be even worse. There are enough background tasks running on the average Windows system as it is.
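The reasoning behind the different-user idea can be sketched with a small Python model. It assumes, as the mutex name quoted earlier suggests, that the name is simply a fixed prefix followed by the user's SID; the dictionary below is a stand-in for the kernel's named-object namespace, and the helper names and SIDs are made up for the example.

```python
import threading

# Prefix taken from the mutex name quoted in the post; the SID is appended to it.
MUTEX_PREFIX = "CTF.LBES.MutexDefault"

# Stand-in for the kernel's named-object namespace: one lock per distinct name.
_named_locks = {}
_registry_guard = threading.Lock()


def open_named_mutex(name):
    """Return the lock registered under name, creating it on first use."""
    with _registry_guard:
        return _named_locks.setdefault(name, threading.Lock())


def text_mutex_for(user_sid):
    """Hypothetical helper: the per-user mutex a text-drawing process would open."""
    return open_named_mutex(MUTEX_PREFIX + user_sid)


def can_draw_text(user_sid):
    """Try to take the user's mutex without waiting; True if it was free."""
    mutex = text_mutex_for(user_sid)
    if mutex.acquire(blocking=False):
        mutex.release()
        return True
    return False


if __name__ == "__main__":
    debuggee_user = "S-1-5-21-1-2-3-1003"   # made-up SID for the debuggee's account
    debugger_user = "S-1-5-21-1-2-3-1004"   # made-up SID for the debugger's account

    frozen = text_mutex_for(debuggee_user)
    frozen.acquire()                         # the suspended debuggee owns its user's mutex

    print("same user can draw:", can_draw_text(debuggee_user))
    print("other user can draw:", can_draw_text(debugger_user))
    frozen.release()
```

Because the two accounts resolve to different names, a lock held under one SID never blocks a process running under the other. That only holds as long as every mutex involved really is per-user, which the post only observes to be the case so far.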
The Win32 API, when it comes to the USER and GDI parts, is bloated like hell. This is what we get when Microsoft carries the Windows 3.x API forward for compatibility. There will come a time when the Win32 API reaches the point of unmaintainability, and I think we already saw that with the initial development of Longhorn (Vista). I hope that with Vista they introduce a new windowing system separate from the current one (the Win32 one should sit on top of that). Looking at the Windows NT native API, it's a very well designed API, so they should start from there.
Just my 2cents.
nugget - 09 07 06 - 10:23
Good detective job, Avery..
..yet I've always wondered what's with the startup delay when you have Event Log and Task Scheduler disabled.. but I guess I still have to polish my windbg skills :).
GrayShade - 09 07 06 - 16:31
Microsoft is trying to introduce a new windowing system... but only for .NET code. :(
Phaeron - 09 07 06 - 18:09
That windowing system (avalon) communicates heavily with USER. It is not as reliant on GDI. MSFT needs to trash GDI for new development and provide back-compat through virtualization.
nksingh - 10 07 06 - 04:40
In a way, they are doing that -- when Aero Glass is enabled, all GDI rendering drops to software rasterization. Unfortunately, they don't provide a viable replacement for accelerated 2D rendering in native apps, because the milcore API isn't published.
Phaeron - 10 07 06 - 04:53
The Speech hang is well known and currently under investigation (at last!) by Systems. The Help one doesn't look familiar to me.
Global mutexes are STUPID. Sadly this is not well understood by everyone (MS is not the only offender here).
The other popular way for the debugger to hang is when an errant app calls SendMessage(HWND_TOPMOST) and one of the receiving apps is stopped in the debugger. SendMessageTimeout is your friend here. Clipboard ops can also hang the debugger (as the System decides to ask the app that is stopped for its clipboard, which won't ever return). Naturally everyone blames the debugger.
Andy Pennell (link) - 12 07 06 - 20:09
Yup, I've run into the clipboard one, too. Fortunately I don't usually need to work with apps that have clipboard input support.
WinDbg has grown on me over the last couple of years; it's rougher to use than Visual Studio, but it has lots of interesting and powerful features, and it's a lot simpler than Visual Studio so it's much less likely to trip on problems like these.
Phaeron - 13 07 06 - 00:08
Unfortunately, if you use other languages, such as Japanese, you can't really disable Text Services Framework, as it kills the language bar and such. Whee. I wonder if this is why starting VS 2003 would occasionally cause Explorer to stop responding for a bit, and why VS would take much longer to start up from time to time. :-
Coderjoe - 13 07 06 - 15:03
I also noticed that behaviour with VS, but also with Delphi...
Mike - 23 07 06 - 13:02
I installed WinDbg 2 days ago to look at some running processes using sysinternals processviewer. Then yesterday I started up Visual Studio to do some coding. Everything is fine until I run the app in managed mode and the system goes crazy! No updates on the screen EXCEPT the mouse. No excessive CPU usage. No keyboard input is accepted. I run the app outside of VS and it works fine. I try different projects and they all crash! I figure it is the debugger. I uninstall WinDbg and it still does it. I still haven't fixed it. Today I will reinstall VS2005. It is a pain because I think I am going to have to reinstall my 3rd party components as well. Initially, I thought it was them that were causing the crash. Then I ran a project that didn't have any 3rd party components and it still crashed. What a pain in the ass! I do have Office installed but I wasn't using it at the time. I will try your workarounds to see if they fix the problem but I am afraid to try anything because I am doing a hard reset every time it happens.
brickbat - 02 10 06 - 04:16
"crappy Text Services Framework"
"Disable Text Services Framework in Regional and Language Settings. You probably don't need it, and CTFMON.EXE is not particularly known for contributing to system stability anyway."
I don't believe the TSF is that crappy, and you need it if you use non-English languages, because it hosts the IMEs, and things like that.
Yuhong Bao - 09 11 08 - 15:50
I'd call a framework that contributes to global system instability crappy.
Phaeron - 09 11 08 - 15:54
"I'd call a framework that contributes to global system instability crappy"
And I agree, but the TSF does not contribute to instability in my experience.
Can you be more specific? Because maybe this needs to be debugged.
Yuhong Bao - 09 01 09 - 15:46
Did you even read the blog entry? The global mutex allows hangs in one program to affect others.
Phaeron - 10 01 09 - 01:50
This is a security vulnerability. The desktop (in my case IE) will suffer a DoS whenever an attacker squats the mutex.
Jeffrey Walton - 23 08 09 - 09:31
> This is a security vulnerability. The desktop (in my case IE) will suffer a DoS whenever an attacker squats the mutex.
I don't think this is relevant. The mutex is a local resource of the window station, so if an attacker has access to the mutex, they could also just run a high priority thread at 100% CPU and have a similar effect.
Phaeron - 23 08 09 - 10:18
> I don't think this is relevant.
One man's feature is another man's security vulnerability. On a fresh XP image (US/English installation) without Office 2003, the mutexes are not present and the desktop does not suffer.
In an Enterprise, users typically do not have the right to elevate thread priorities (Local Security Settings, 'Increase scheduling priorities'). But the Enterprise usually has Office installed.
Jeffrey Walton - 23 08 09 - 14:20
Or an attacker could keep activating a full-screen Direct3D device, or create a top-most full screen window, or monopolize the disk with non-buffered I/O, or exhaust the system-wide supply of USER handles....
There are far too many actions that can be done at the local code level without special privileges that can effectively make a desktop session useless on Windows to worry about this one.
Phaeron - 23 08 09 - 16:06
I have a similar problem which annoys the hell out of me: apparently at random, Visual Studio 2005 (SP1) freezes almost the entire computer (Windows 2003 Ent. x64 Ed SP2). I can still use the remote desktop sessions or some other non-Microsoft applications, but even the command prompts freeze on any command I enter. Trying to kill devenv processes does not help either. At one point I waited more than half an hour with no luck. My only option is a hard reset :(
I have read this post and I will try to see if it still hangs after turning off the text services.
dan - 30 03 10 - 06:31