Beware of QueryPerformanceCounter()
When it comes to high-precision timing on Windows, many have gotten used to using the CPU's time stamp counter (TSC). The TSC is a 64-bit counter that was added to most x86 CPUs starting around the Pentium era and that counts up at the CPU's clock rate. It is generally readable from user mode via the RDTSC instruction, making it the fastest, easiest, and most precise time base available on modern machines.
Alas, it is rather unsafe to use.
The first problem you quickly run into is that there is no easy way to accurately and reliably determine the clock speed of the CPU, short of calibrating over a fairly long period of time. Sometimes you don't need high accuracy, or you only need timing ratios, in which case this doesn't matter. However, you're still screwed when you discover that on CPUs with speed switching, the TSC's count rate changes whenever the CPU speeds up or slows down, so the rate swings all over the place. And as if that weren't enough, the TSC is not always synchronized across the CPUs of dual-core or SMP systems, so TSC readings can jump back and forth by as much as 0.2ms as the kernel moves your thread between CPUs. Programs without adequate safeguards may be surprised when time momentarily runs backwards.
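One common safeguard against time running backwards is to clamp each reading so that it never falls below the largest value already handed out. Below is a minimal sketch in C, assuming a single thread and a raw 64-bit counter value supplied by the caller; `monotonic_guard` and `monotonic_filter` are hypothetical names. Note that this only hides backward jumps: it cannot fix a counter whose rate is wrong, and during a backward jump time simply stalls until the raw reading catches up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard state: the largest timestamp returned so far.
   Not thread-safe; assumes one guard per thread or external locking. */
typedef struct {
    uint64_t last;
} monotonic_guard;

/* Returns the raw reading if it moved forward, otherwise the previous
   value, so callers never observe time running backwards when a thread
   migrates between CPUs whose counters are not synchronized. */
static uint64_t monotonic_filter(monotonic_guard *g, uint64_t raw)
{
    assert(g != NULL);
    if (raw > g->last)
        g->last = raw;
    return g->last;
}
```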
For reasons like these, Microsoft now recommends that you use QueryPerformanceCounter() to do high-precision timing. What they don't tell you, though, is that QPC() is equally broken.
The documentation for QueryPerformanceFrequency() says that not all systems have a high-performance counter. Truth be told, I've never seen a system that didn't support QPF/QPC, including ones running Windows 98, NT4, and XP. However, the timer that is used can vary widely between systems. On Win9x systems that I've seen, QPF() returns 1193181 -- which looks suspiciously like the clock rate of the venerable 8253/8254 timer. On a PIII-based Windows 2000 system, I got 3579545, which happens to be the frequency of the NTSC color subcarrier, but is probably just a common clock crystal frequency divided down for some chipset timer. And I've also seen the CPU clock speed show up, or the CPU clock divided by 3.
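Because the QPF() value varies so much between machines, raw counts always have to be scaled by it. Multiplying a tick count by 1000000 before dividing can overflow 64 bits within hours of uptime when the frequency is a CPU clock in the GHz range, so a common approach is to split off whole seconds first. A minimal sketch, assuming a nonzero frequency well below about 1.8e13 Hz; `ticks_to_us` is a hypothetical helper, not a Windows API:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a tick count from a counter running at `freq` ticks per second
   (for example a QPF() value such as 1193181 or 3579545) into
   microseconds. Dividing out whole seconds first keeps the intermediate
   products small enough to fit in 64 bits. */
static uint64_t ticks_to_us(uint64_t ticks, uint64_t freq)
{
    assert(freq > 0);
    uint64_t whole_seconds = ticks / freq;
    uint64_t remainder = ticks % freq; /* always < freq */
    return whole_seconds * 1000000u + remainder * 1000000u / freq;
}
```

On Windows, `ticks` would typically be the difference between two QueryPerformanceCounter() readings and `freq` the QueryPerformanceFrequency() result.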
Some of these timers used for QPC also have bugs.
When I was looking at some anomalous capture logs from one of my systems, I noticed that the global_clock values recorded in the capture log by the capture subsystem occasionally jumped forward or backward by a few seconds relative to the video capture clock. (While video capture drivers are notoriously flaky, there were no gaps in the video, and I'm pretty sure my PlayStation 2 didn't burp for three seconds.) When I tried Windows XP x64 Edition, the HAL used the CPU's TSC for QueryPerformanceCounter() without realizing that Cool & Quiet would cause it to run at half its normal speed. And recently, I've had the pleasure of seeing a dual-core system where the use of the TSC exposed QPC-based programs to the same cross-CPU mismatch bug that affects direct RDTSC use. So, realistically, using QPC() actually exposes you to all of the existing problems of the time stamp counter AND some other bugs.
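A jump like this is only visible because two independent clocks were recorded side by side. A cheap way to catch this class of bug is to measure the same interval with both clocks and flag large disagreements. A minimal sketch, assuming both elapsed times have already been converted to microseconds; `clocks_disagree` and the tolerance value are hypothetical and would need tuning for the clocks involved:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Compare the elapsed time reported by two independent clocks over the
   same interval. Returns true if they differ by more than tol_us, which
   suggests that one of them jumped forward or backward. */
static bool clocks_disagree(int64_t elapsed_a_us, int64_t elapsed_b_us,
                            int64_t tol_us)
{
    assert(tol_us >= 0);
    int64_t diff = elapsed_a_us - elapsed_b_us;
    if (diff < 0)
        diff = -diff;
    return diff > tol_us;
}
```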
So, what to do?
I switched VirtualDub's capture subsystem from QueryPerformanceCounter() to timeGetTime(). I had to give up microsecond precision for millisecond precision, but it's more reliable. If you don't really need high precision, you can use GetTickCount(), which has terrible precision on Win9x (55ms) but is reliable and fast, since it just reads a counter in memory. If you're a user suffering from this problem, you can try adding /usepmtimer to the end of the OS entry in boot.ini, which switches QPC() to an alternate timer (usual disclaimers apply; back up data before trying; no purchase necessary; void where prohibited).
I thought that was what was going on in Unreal Tournament (the original one). On my laptop, if I have the processor speed set to adaptive, it runs far too fast to be playable.
I'm guessing that it works out the CPU speed, and then the CPU bumps its clock speed up from 1/5th normal to full speed. Still, I've seen worse ones: there are a couple of games I've got that assume the processor is clocked at whatever the average was when they were released, and hence are unplayable on modern computers.
Thomas (link) - 07 06 06 - 06:29
Remember Wing Commander? Totally CPU performance based. It had NO timer, as the processors then were not powerful enough to run a timer with such a sophisticated game.
TechMage89 - 07 06 06 - 10:02
That might explain why Unreal 2 had some problems running on a dual core: frames that go back and forth... AND the game running at 50% speed because it used only one core. Strangely enough, most games (or maybe programs in general) that have similar problems can be fixed by setting Windows compatibility mode to Win98... don't ask why, it works!
Simbou - 07 06 06 - 13:19
timeGetTime() is 32-bit, so it wraps around after about 50 days (49.7 days of uptime, not runtime); applications need to cope with that.
Related: gettimeofday() in Linux reads the system clock, so it can move all over the place. clock_gettime(CLOCK_MONOTONIC) fixes this; if you use it, watch out for timed thread operations (e.g. pthread_cond_timedwait), since the pthread_*_setclock API needs to be used.
Glenn Maynard - 07 06 06 - 13:54
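For the Linux side of this, here is a minimal sketch of reading CLOCK_MONOTONIC with clock_gettime(), assuming a POSIX system; `monotonic_ns` is a hypothetical helper. CLOCK_MONOTONIC is not stepped when the wall-clock time is set, although NTP may still slew its rate slightly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds from CLOCK_MONOTONIC. Unlike gettimeofday(), this clock
   does not jump when someone changes the system time, so differences
   between two readings are safe to use as elapsed time. */
static int64_t monotonic_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc; /* silence unused-variable warnings when NDEBUG is set */
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}
```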
Same with Counter-Strike 1.6 (I think it was) running on Windows 2000 on a dual Celeron 400 MHz. Overclocking the processor to 550 MHz from within Windows using bp6fsb increased the player's movement speed, causing some complaints about cheating when outrunning the other players.
tsp - 07 06 06 - 15:01
I believe that if you set your call between "timeBeginPeriod" and "timeEndPeriod" with a value of 1ms, the accuracy of GetTickCount goes down to 1ms, which is good enough for most things.
blight - 07 06 06 - 16:00
Sorry, that's just lazy. Either using a counter driven by the timer interrupt (the default BIOS int 08h handler does this for you) or reading timer 0 directly is cheap if done once a frame, and you can still use CPU-based timing if you recalibrate your loops against real time once in a while.
Everyone should be using differences anyway. :) I suppose that some C# programmers may get caught if they have checked arithmetic enabled and don't wrap the expression in unchecked(), though.
Do keep in mind what that does, though: it causes the kernel timer interrupt to run at 1 kHz, which slows down the system. It's not too bad on modern systems, but I remember the overhead being noticeable on Windows 98, to the point that I had a tiny DOS program that I ran to reset the timer whenever some program forgot to do so.
Phaeron - 07 06 06 - 23:41
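Taking differences is what makes the 32-bit wraparound of timeGetTime() harmless: unsigned subtraction in C is performed modulo 2^32, so an interval measured across the wrap still comes out right, while comparing raw values (for example `now >= start + timeout`) breaks near the wrap. A minimal sketch, assuming both readings come from a 32-bit millisecond counter such as timeGetTime() and that the true interval is shorter than about 49.7 days; `elapsed_ms` and `timed_out` are hypothetical helpers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Milliseconds between two readings of a 32-bit millisecond counter.
   Unsigned subtraction wraps modulo 2^32, so this stays correct across
   the counter rolling over, provided the real interval is < 2^32 ms. */
static uint32_t elapsed_ms(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Wrap-safe timeout check: compare the difference, never raw values. */
static bool timed_out(uint32_t start, uint32_t now, uint32_t timeout_ms)
{
    return elapsed_ms(start, now) >= timeout_ms;
}
```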
Pthreads timeout operations (e.g. pthread_cond_timedwait) use a "give up at this time" interface, not "give up after this amount of time". This means you have to be sure to use the clock that it expects, or configure it to use the one you want.
(Clock selection aside, the interface has its advantages. It makes accurate repeat timeouts much cleaner, which is handy when you want to timeout on a condition variable that might be triggered for other reasons, for example.)
Glenn Maynard - 08 06 06 - 13:23
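A minimal POSIX sketch of the approach described above: the condition variable is configured with pthread_condattr_setclock() to use CLOCK_MONOTONIC, the deadline is computed once as an absolute time, and the predicate is rechecked in a loop so that spurious wakeups neither return early nor extend the timeout. `event_t`, `event_init`, `event_wait`, and `event_signal` are hypothetical names, and error handling for the init calls is omitted for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical event object: a flag guarded by a mutex and signalled
   through a condition variable whose timeouts use CLOCK_MONOTONIC. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool signalled;
} event_t;

static void event_init(event_t *ev)
{
    pthread_condattr_t attr;
    pthread_condattr_init(&attr);
    /* Without this, the absolute deadline passed to
       pthread_cond_timedwait() is measured against CLOCK_REALTIME,
       which jumps when the system time is changed. */
    pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    pthread_cond_init(&ev->cond, &attr);
    pthread_condattr_destroy(&attr);
    pthread_mutex_init(&ev->lock, NULL);
    ev->signalled = false;
}

/* Waits until the event is signalled or timeout_ms has elapsed.
   Returns true if the event was signalled. */
static bool event_wait(event_t *ev, long timeout_ms)
{
    assert(timeout_ms >= 0);

    /* Compute the absolute deadline once, on the same clock the
       condition variable was configured with. */
    struct timespec deadline;
    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&ev->lock);
    /* Recheck the predicate after every wakeup; a spurious wakeup just
       waits again until the same deadline. Any nonzero result (normally
       ETIMEDOUT) ends the wait. */
    while (!ev->signalled) {
        if (pthread_cond_timedwait(&ev->cond, &ev->lock, &deadline) != 0)
            break;
    }
    bool result = ev->signalled;
    pthread_mutex_unlock(&ev->lock);
    return result;
}

static void event_signal(event_t *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->signalled = true;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}
```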
Well, that does raise the question of why you would need such an accurate timeout. With or without a timeout, the code surrounding the synchronization call should always recheck the predicate in order to be robust against spurious wakeups. I would also argue that using the wait timeout as a timer is goofy, and that you should use an actual timer; Win32 at least offers waitable timers. Usually, the only good reason I see for using a non-zero timeout is either as a safety measure against synchronization accidents or as a workaround for poorly designed synchronization interfaces. Admittedly, though, the lack of a multiple-wait in pthreads is a problem (Win32 at least has WaitForMultipleObjects, even if it might not be very fast).
Phaeron - 09 06 06 - 00:45
A lot of games have this issue too. You can set Cool'n'Quiet or EIST on laptops/newer CPUs to run at a lower rate, load up your game, then set it to full speed. This will make your movement faster and give you other unfair bonuses.
Supjohndog - 09 06 06 - 16:23
Just saw this today, and it reminded me of this entry: http://www.reghardware.co.uk/2006/07/04/..
squid - 05 07 06 - 05:22
Out of curiosity, I looked into how this is usually handled, and the Linux kernel at least runs a short calibration loop with an adaptive algorithm to sync the TSCs manually. While this can get very close, it seems that you can still get errors upwards of 500 clocks between CPUs. Given very fast accesses to shared memory, I can still see the potential for anachronisms to appear. Synchronization algorithms in particular would be very intolerant of seeing time momentarily regress.
So, while running such a utility would greatly diminish the impact of the problem, I still think it's a good idea to avoid QPC() when feasible. Besides, not everyone who has an affected dual-core or SMP system will run the program.
Phaeron - 06 07 06 - 00:15
"So, realistically, using QPC() actually exposes you to all of the existing problems of the time stamp counter AND some other bugs."
Not all at the same time. It is like claiming Netscape 8 is insecure because it exposes you to the bugs of both IE and Firefox.
Yuhong Bao - 07 08 08 - 22:57
Also, at least the OS can fix the problem, while RDTSC is just an assembly instruction that the OS cannot do anything about. The /usepmtimer switch is a good example: it cannot fix programs that use RDTSC, but it will fix programs that use QPC().
Yuhong Bao - 07 08 08 - 23:00
This article on another blog reminds me of this:
Yuhong Bao - 08 09 08 - 19:32
Jason (link) - 30 09 08 - 04:44
Just recently, I was looking for a more accurate timing method (for a 3D rendering engine we're working on), but I was not aware of these issues until I stumbled upon this page. There's a lot of good information here on your blog; thanks for this great article!
Marlon (link) - 20 01 09 - 17:08
blight wrote on 07 06 06 - 16:00:
> I believe that if you set your call between "timeBeginPeriod" and "timeEndPeriod" with a value of 1ms, the accuracy of GetTickCount goes down to 1ms, which is good enough for most things.
IIRC, it only has an effect on the timeGetTime() mmsystem function.
bohan - 17 02 09 - 08:40
Hi guys, I am researching a method for timing input/output control for a neuroscience experiment, and we are experiencing some problems with Windows and jitter. A delay between the input trigger and the output being sent to our other system is okay, because we can subtract it out, but the card's input-to-output time must vary by at most 5 microseconds so that we can reliably sync it with our neural activity data. So far the card itself seems to have almost no jitter; it all seems to come from our software.
I found this article really helpful and I am going to try these commands out, but if you have spare time and any ideas and would like to help advance some brain science, please e-mail email@example.com
Brian - 20 04 09 - 18:21
I am writing an animation engine - this bug was giving me subtle stutter and shake on playback, thought I must have a bug until I saw QueryPerformanceCounter running backwards! Thanks for the tips (And thanks for Virtualdub too - I love it).
Aartform Games (link) - 09 10 09 - 08:07
Using QueryPerformanceCounter(), timeGetTime(), or GetTickCount() is now discouraged by Microsoft. Developers should be using Multimedia Timers instead:
Anonymous (link) - 31 10 11 - 13:00
Uh, I don't see where in the link you posted that it says that. In fact, timeGetTime() is within the Multimedia Timers section.
Phaeron - 31 10 11 - 15:21
By now QueryPerformanceCounter has been fixed for all supported OS/machine versions, and it is no longer discouraged; instead, it is now the recommended way to measure high-precision time.
Zarat - 23 02 15 - 19:54