Over the past couple of years, several of us have dedicated a lot of time to Chrome’s timer system. Because we do things a little differently, this has raised some eyebrows. Here is why and what we did.
Goal
Our goal was to have fast, precise, and reliable timers. By “fast”, I mean that the timers should fire repeatedly with a low period. Ideally we wanted microsecond timers, but we eventually settled for millisecond timers. By “precise”, I mean we wanted the timer system to work without drift – you should be able to monitor timers over short or long periods of time and still have them be precise. And by “reliable”, I mean that timers should fire consistently at the right times; if you set a 3.67ms timer, it should be able to fire repeatedly at 3.67ms without significant variance.
Why?
It may be surprising to hear that we had to do any work to implement these types of timers. After all, timers are a fundamental service provided by all operating systems. Lots of browsers use simpler mechanisms and they seem to work just fine. Unfortunately, the default timers really are too slow.
Specifically, Windows timers by default will only fire with a period of ~15ms. While processor speeds have increased from 500MHz to 3GHz over the past 15 years, the default timer resolution has not changed. And at 3GHz, 15ms is an eternity: roughly 45 million clock cycles.
This problem does affect web pages in a very real way. Internally, browsers schedule time-based tasks to run a short distance in the future, and if the clock can’t tick faster than 15ms, the application will sleep for at least that long. To demonstrate, Erik Kay wrote a nice visual sorting test. Due to how JavaScript and HTML interact in a web page, applications such as this sorting test use timers to balance execution of the script with responsiveness of the web page.
John Resig at Mozilla also wrote a great test for measuring the scalability, precision, and variance of timers. He conducted his tests on the Mac, but here is a quick test on Windows.
In this chart, we’re looking at the performance of IE8, which is similar to what Chrome’s timers looked like prior to our timer work. As you can see, the timers are slow and highly variable. They can’t fire faster than ~15ms.
A Seemingly Simple Solution
Internally, Windows applications are often architected on top of Event Loops. If you want to schedule a task to run later, you must queue up the task and wake your process later. On Windows, this means you’ll eventually land in the function WaitForMultipleObjects(), which is able to wait for UI events, file events, timer events, and custom events. (Here is a link to Chrome’s central message loop code.) By default, the internal timer for all wait-event functions in Windows is 15ms. Even if you set a 1ms timeout on these functions, they will only wake up once every 15ms (unless non-timer-related events are pumped through them).
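To make this concrete, here is a minimal standalone sketch (not Chrome code) of the kind of wait a message loop performs. It asks WaitForMultipleObjects for a 1ms timeout and then measures how long the wait actually took; on a machine running at the default clock rate, the wait is only serviced at a clock interrupt, so the measured delay is usually far longer than the 1ms that was requested.

#include <windows.h>
#include <cstdio>

int main() {
  // An event that is never signaled, so every wait falls through to the timeout.
  HANDLE never_signaled = CreateEvent(nullptr, TRUE, FALSE, nullptr);

  LARGE_INTEGER freq, start, end;
  QueryPerformanceFrequency(&freq);

  for (int i = 0; i < 10; ++i) {
    QueryPerformanceCounter(&start);
    // Ask to wake up after 1ms...
    WaitForMultipleObjects(1, &never_signaled, FALSE, 1);
    QueryPerformanceCounter(&end);
    // ...but the wait is only serviced on a clock interrupt, so the measured
    // time reflects the system timer resolution rather than the request.
    double elapsed_ms = (end.QuadPart - start.QuadPart) * 1000.0 / freq.QuadPart;
    printf("requested 1ms, actually waited %.2fms\n", elapsed_ms);
  }

  CloseHandle(never_signaled);
  return 0;
}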
To change the default timer, applications must call timeBeginPeriod(), which is part of the multimedia timers API. This function changes the clock frequency and is close to what we want. Its lowest granularity is still only 1ms, but that is a lot better than 15ms. Unfortunately, it also has a couple of seriously scary side effects. The first side effect is that it is system wide. When you change this value, you’re impacting global thread scheduling among all processes, not just yours. Second, this API also affects the system’s ability to get into its lowest-power sleep states.
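Mechanically, the call itself is trivial; the hard part is the policy around it. Here is a minimal sketch (the RAII wrapper and its name are my own, not Chrome code) showing the pairing that matters: every successful timeBeginPeriod must eventually be matched by a timeEndPeriod.

#include <windows.h>
#include <mmsystem.h>              // timeBeginPeriod / timeEndPeriod
#pragma comment(lib, "winmm.lib")  // MSVC: link against winmm

// Raises the global interrupt frequency to 1ms for as long as this object
// lives. Remember the side effects described above: this is system-wide.
class ScopedHighResTimer {
 public:
  ScopedHighResTimer() {
    raised_ = (timeBeginPeriod(1) == TIMERR_NOERROR);
  }
  ~ScopedHighResTimer() {
    if (raised_)
      timeEndPeriod(1);  // Restore the default ~15ms clock.
  }
 private:
  bool raised_ = false;
};

Scoping the call this way makes it much harder to forget the matching timeEndPeriod.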
Because of these two side effects, we were reluctant to use this API within Chrome. We didn’t want to impact any process other than a Chrome process, and all of the possible impacts of the API were nebulous. Unfortunately, there are no other APIs which could make our message loop work quickly. Although Windows does have a high-performance cycle counter API, that API is slow to execute [1], has bugs on some AMD hardware [2], and has no effect on the system-wide wait functions.
Justifying timeBeginPeriod
At one point during our development, we were about to give up on using the high resolution timers, because they just seemed too scary. But then we discovered something. Using WinDbg to monitor Chrome, we discovered that every major multimedia browser plugin was already using this API. This included Flash [3], Windows Media Player, and even QuickTime. Once we discovered this, we stopped worrying about Chrome’s use of the API. After all – what percentage of the time is Flash open when your browser is open? I don’t have an exact number, but it’s a lot. And since this API affects the system globally, most browsers are already running in this mode.
We decided to make this the default behavior in Chrome. But we hit another roadblock for our timers.
Browser Throttles and Multi-Process
With the high-resolution timer in place, we were now able to set events quickly for Chrome’s internals. Most internal delayed tasks are long timers and didn’t need this feature, but there are a half dozen or so short timers in the code, and these did materially benefit. Nonetheless, the one that matters most, the timer behind the browser’s setTimeout and setInterval functions, did not yet benefit. This is because our WebKit code (and other browsers do this too) was intentionally preventing any timer from sustaining a tick faster than 10ms.
There are probably several reasons for the 10ms minimum in browsers. One is simply convention. Another is that some websites are poorly written and will set timers to run like crazy. If the browser attempts to service all of those timers, it can spin the CPU, and who gets the bug report when the browser is spinning? The browser vendor, of course. It doesn’t matter that the real bug is in the website rather than the browser; the browser still has to address the issue.
But the third, and probably most critical, reason is that most single-process browser architectures can become non-responsive if you allow websites to loop excessively with 0-millisecond delays in their JavaScript. Remember that browsers are generally written on top of Event Loops. If the slow JavaScript interpreter is constantly scheduling a wakeup through a 0ms timer, it clogs the Event Loop that also processes mouse and keyboard events. The user is left with not just a spinning CPU, but a basically hung browser. While I was able to reproduce this behavior in single-process browsers, Chrome turned out to be immune, and the reason was Chrome’s multi-process architecture. Chrome puts the website into a separate process (called a “renderer”) from the process that handles the browser’s keyboard and mouse input. Even if we spin the CPU in a renderer, the browser remains completely responsive, and unless the user is checking her Task Manager, she might not even notice.
So the multi-process architecture was the enabler. We wrote a simple test page to measure the fastest time through the setTimeout call and verified that a tight loop would not damage Chrome’s responsiveness. Then we modified WebKit to reduce the throttle from 10ms to 1ms and shipped the world’s peppiest beta browser: the Chrome 1.0 beta.
Real World Problems
Our biggest fear in shipping the product was that we would find some website which was spinning the CPU and annoying users. We did identify a couple of these, but they were relatively obscure sites. Finally, we found one which mattered – a small newspaper known as the New York Times. The NYTimes is a well-constructed site – they just ran into a little bug with a popular script called prototype.js, and this hadn’t been an issue before Chrome cranked up the clock. We filed a bug, but we had to change Chrome too. At this point, with a little experimentation, we found that increasing the minimum timer from 1ms to 4ms seemed to work reasonably well on most machines. Indeed, to this day, Chrome still uses a 4ms minimum tick.
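The resulting clamp is tiny; it amounts to something like this (an illustration only, not WebKit’s actual code):

// Illustration: any requested interval shorter than the floor is rounded up.
constexpr double kMinTimerIntervalMs = 4.0;

double ClampTimerInterval(double requested_ms) {
  return requested_ms < kMinTimerIntervalMs ? kMinTimerIntervalMs
                                            : requested_ms;
}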
Soon, a second problem emerged. Engineers at Intel pointed out that Chrome was causing laptops to consume a lot more power. This was a far more serious problem and harder to fix. We were not as concerned about the impact on desktops, because Flash, Windows Media Player, and QuickTime were already keeping the clock cranked up there. But for laptops, this was a big problem. To mitigate it, we started tapping into the Windows Power APIs to monitor when the machine is running on battery power. So before Chrome 1.0 shipped out of beta, we modified it to turn off the fast timers if it detects that the system is running on batteries. Since we implemented this fix, we haven’t heard many complaints.
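The battery check itself is straightforward. A sketch of the idea (not Chrome’s actual code; the helper name is mine) using the GetSystemPowerStatus API:

#include <windows.h>

// Returns true if the machine appears to be running on battery power.
// ACLineStatus is 0 when on battery, 1 when plugged in, 255 when unknown.
bool IsRunningOnBattery() {
  SYSTEM_POWER_STATUS status;
  if (!GetSystemPowerStatus(&status))
    return false;  // If we can't tell, assume AC power.
  return status.ACLineStatus == 0;
}

When a check along these lines reports battery power, the fast timers stay off and the system clock is left at its default rate.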
Results
Overall, we’re pretty happy with the results. First off, we can look at John Resig’s timer performance test. In contrast to the default implementation, Chrome has very smooth, consistent, and fast timers:
Finally, here is the result of the Visual Sorting Test mentioned above. With a faster clock in hand, performance doubles.
Future Work
We’d still like to eliminate the use of timeBeginPeriod. It is unfortunate that it has such side effects on the system. One solution might be to create a dedicated timer thread, built atop the machine cycle counter (despite the problems with QueryPerformanceCounter), which wakes message loops based on self-calculated, sub-millisecond timers. This sounds trivial, but if we forget any operating system call which is stuck in a wait and don’t manually wake it, we’ll have janky timers. We’d also like to bring the current 4ms timer back down to 1ms. We may be able to do this if we better detect when web pages are accidentally spinning the CPU.
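For the curious, a stripped-down sketch of that dedicated-thread idea (this is not Chrome code; a real version would have to find and wake every wait in the process, which is exactly the hard part):

#include <windows.h>

// Wait for a sub-millisecond deadline measured with the cycle counter, then
// wake a message loop by signaling the event it is blocked on. A real
// implementation would run this on its own thread and balance spinning
// against sleeping instead of yielding in a tight loop.
void WakeAfterMicroseconds(HANDLE loop_wake_event, long long delay_us) {
  LARGE_INTEGER freq, start, now;
  QueryPerformanceFrequency(&freq);
  QueryPerformanceCounter(&start);
  const long long deadline = start.QuadPart + delay_us * freq.QuadPart / 1000000;
  do {
    Sleep(0);  // Yield the rest of this scheduler quantum.
    QueryPerformanceCounter(&now);
  } while (now.QuadPart < deadline);
  SetEvent(loop_wake_event);  // Manually wake the blocked WaitFor* call.
}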
From the operating system side, we’d like to see sub-millisecond event waits built in by default which don’t use CPU interrupts or otherwise prevent CPU sleep states. A millisecond is a long time.
1. Although written in 2003, the data in this article is still relatively accurate: Win32 Performance Measurement Options.
2. http://developer.amd.com/assets/TSC_Dual-Core_Utility.pdf
3. Note: The latest versions of Flash (10) no longer use timeBeginPeriod.
NOTE: This article is my own view of events and does not reflect the views of my employer.
Great, great article. I love the detail. Some questions:
1. timeBeginPeriod is only on Windows. What’s the story for Mac? Linux?
2. If Flash changes timeBeginPeriod globally, then if you re-run Resig’s test with Flash enabled are the results better?
3. Similarly, if Flash changes timeBeginPeriod, is it bad for my laptop battery to have Flash enabled? (Clearly, running Flash for an hour is bad, but it seems like what you’re saying is if I load Flash it changes the setting globally, so after 1 minute when I stop using Flash then Windows is still hammering the laptop battery because of the faster clock frequency.)
I agree with Steve: this is one of the best articles I’ve read in a long time. And those pesky timers: always causing havoc! 🙂
I’d also like to find out how timers are handled on Mac OS X and Linux; it would make for a very insightful, real-world comparison of the major platforms’ timers.
Thanks!
This article is about time on Windows. I don’t know of any OS other than Windows that has such problems. On Mac and Linux we just call the one-liner API for getting a high res time, and everything works. There are no global side effects, and using the time routines doesn’t cause the hardware to use more power. You could argue that Windows has a legacy, but Unix is older than Windows 🙂
Regarding Resig’s test if Flash were running: no, it would not change the results for IE. I didn’t mention this in the article, but the time routine that you pick matters; you need to use one which is backed by a finer-grained clock.
Finally, yes, any program which uses timeBeginPeriod will drain your battery faster on Windows. This includes Windows Media Player, QuickTime, and old versions of Flash. The latest version of Flash (10.x) doesn’t use it.
Here’s a picture of John’s test on Linux.
http://i.imgur.com/xHkaW.png
(Other runs after I took the screenshot were more bumpy, but not much.)
As for how it works: you’ll need a bit more background, unfortunately.
First, to be clear, Mike was posting about two separate things. (1) Windows timers aren’t hi-res by default; (2) browser timers have a minimum period that’s rather low-res. The latter probably historically came from the former but the two aren’t necessarily related.
Regarding (1): the timeBeginPeriod business is only set up for a UI message loop, so I was initially confused because within a renderer on Win/Linux we use our “default” message loop (basically, a fallback message loop for bits of the code that aren’t interacting with UI or IO).
But the Windows “default” message loop seems to also use the WaitFor* APIs, so I guess it benefits from the fact that hi-res timers are globally set by Chrome’s UI thread. (Our third loop type, “IO”, uses a different API. Maybe Mike can comment on whether we would need this timeBeginPeriod call if we were using an IO loop within a renderer.)
The “default” message loop on Linux/Mac (the code is shared) is just the normal pthreads API (pthread_cond_wait), which doesn’t have any resolution limitation. (On Linux, as I understand it the kernel just reacts to scaling dynamically; you can read about it here: http://kerneltrap.org/node/6750 .)
One final important caveat: only on Mac do we actually use a UI message loop even within the renderer, which means it’s actually going through the higher-level OS timer code. The code says this is needed to make rendering work with Cocoa. I glanced at the Mac UI loop code, but it’s a bunch of CF stuff and I know even less about Mac than I do about Windows. I would be pretty surprised if they had a timeout limitation like Windows, though.
To summarize, I think all the timer-adjusting workarounds are Windows-specific and it Just Works on other platforms, though I know very little about Mac. And the web-level timer adjustment is cross-platform code.
Ugh, I totally lied in my fourth paragraph above. Chrome just always puts the system into high-res timer mode (modulo Mike’s battery remarks), so the type of loop is irrelevant. (I’m still curious whether the IO loop implementation needs the timeBeginPeriod call, though.)
Evan – thanks for the extra details. The Tickless Kernel Timers are nothing short of awesome on Linux – especially given the amazing granularity.
As for Windows – if you’re going into any of the kernel waits, I think you eventually bump up against the system clock. It simply won’t wake you up at a lower interval.
Maybe someone from the Windows Kernel Team will reply on here.
So how does one get IE to do better on Resig’s test? I’ve tried the trick of doing the test with Windows Media Player running at the same time, but that doesn’t seem to work.
I can’t say for sure what IE9 does internally. I’d guess they use a low-resolution clock timer API. But I’m speculating. Maybe someone from the IE9 team will reply here.
For those curious about Mac and Linux: Chrome has hundreds of lines of code in its Windows implementation to deal with the complexities of high resolution timers. On Mac and Linux, it’s literally a one-liner.
Probably a shot in the dark, but what about IE 6/7/8?
Or, do you know someone who might know more on IE internals?
Fantastic article. It seems the performance of the pattern I blog about here
http://www.arbingersys.com/2010/06/im-believer-chrome-javascript-fast.html
is directly related to the Chrome timer implementation.
Very interesting post. One approach to minimizing delay for well-behaved apps while still throttling poorly-behaved ones would be to use a token bucket, where a token represents permission to schedule a sub-10ms timer:
http://en.wikipedia.org/wiki/Token_bucket
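Something along these lines, for example (a sketch only; the class name, rate, and capacity are made up, and this is not code from any browser):

#include <algorithm>
#include <chrono>

// Token bucket gating sub-10ms timers. A page that only occasionally asks
// for a short timer always finds a token available; a page that hammers
// setTimeout(0) drains the bucket and falls back to the normal clamp.
class ShortTimerBucket {
 public:
  using Clock = std::chrono::steady_clock;

  ShortTimerBucket(double tokens_per_second, double capacity)
      : rate_(tokens_per_second), capacity_(capacity), tokens_(capacity),
        last_refill_(Clock::now()) {}

  // Returns true if a sub-10ms timer may be scheduled right now.
  bool TryConsume() {
    Refill();
    if (tokens_ >= 1.0) {
      tokens_ -= 1.0;
      return true;
    }
    return false;  // Caller should clamp this timer to 10ms instead.
  }

 private:
  void Refill() {
    Clock::time_point now = Clock::now();
    double elapsed_seconds = std::chrono::duration<double>(now - last_refill_).count();
    last_refill_ = now;
    tokens_ = std::min(capacity_, tokens_ + elapsed_seconds * rate_);
  }

  double rate_;
  double capacity_;
  double tokens_;
  Clock::time_point last_refill_;
};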
Seems like I’m 6 months late to the party, but I disagree on some of the points being made there.
What you are basically doing is making poorly written pages run faster at the expense of the rest of the system (both the performance of other tasks, and energy consumption).
First there’s the fact that modern processors like to be in sleep states as long as possible. You are already aware of this. OS vendors are doing a lot of work to try and accomplish this. There’s a Microsoft presentation where they say they’re looking to enforce low wakeup rates for background applications (on the order of 100ms) on future OSs, because there are so many mis-behaved applications.
Then there’s cache efficiency. It might not show up in “CPU usage” statistics (these are not very precise anyway), but touching memory 1000 times a second pollutes the CPU L2 cache. Context switches are expensive and they clear the L1 cache completely.
However, the real question is why would you want to wake up 1000 times a second? Or 250? This is unjustifiable for 99.99% of all applications, and certainly all web pages existing today. You don’t even gain any perceived performance; the screen doesn’t update that fast.
Look at the IE9 performance demos. They’re capped at 60fps. If you watch them in other browsers, the system timer still runs at low granularity (16ms). Yet they run smoothly at 60fps. Running them faster would not improve them a bit.
Using 1ms timers usually means the person writing the code doesn’t know what he’s doing. He’s using a timer as a synchronization primitive, and has just picked the lowest value that doesn’t cause 100% CPU load.
If some artificial benchmarks bother you that much, just lump timer firings together and fire them back-to-back every 16ms or so (so if the page requests 1ms, fire it 16 times every 16ms). Same end result, much better on the system.
Firefox has some adaptive correction where if they see a timer fired late, the next one gets adjusted to compensate. That’s the reason they can do consistent 60fps without calling timeBeginPeriod(). They don’t fire multiple times though so they’re limited to that speed (apart from the 10ms limit).
Finally, relying on timers firing at some frequency for visual animations is wrong (and arguably the root of the problems you’re trying to fix). As wrong as the first PC games from the early 80s whose speed depended on CPU MHz. The right way to do it is to check the amount of time elapsed and update the animation accordingly. For example:
desired_updates = 60
t = current_time - last_time
new_position = old_position + (speed * t)
setTimeout(1000 / desired_updates * 2 - t)
This runs at 60fps if possible, but if not, the animation will still advance at the same speed (just less smoothly). This is what any decent real-time application does nowadays. Otherwise you run into problems sooner or later.
Juan –
This is not a tradeoff that makes other applications run more slowly. If it is, please provide a benchmark to demonstrate it. I’d also like to see a benchmark demonstrating the L2 cache effects. 1000 interrupts per second is a mere 1kHz! Surely you appreciate that 1kHz is in the noise on a modern 2GHz system. If you do have benchmarks showing that these are real problems, I’ll be happy to work on new solutions.
As for the rest, I don’t disagree with much that you said. Trust me, we’d rather not do this. And we only have to do it on Windows. Mac and Linux don’t have this problem.
Other systems can dynamically adjust the hardware tick intervals to match the requirements of the application. (read up on tickless timers for more info) I’ve spoken to Microsoft about the issue, and they have been non-committal on fixing the underlying problem.
The latest algorithm employed by Chrome is a user-level implementation to do precisely what Mac and Linux do natively. If no apps are requesting high precision timers, we don’t use them. But if an app asks for it, we enable it.
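Roughly, the shape of it is this (a sketch of the idea with made-up names, not our actual code):

#include <windows.h>
#include <mmsystem.h>
#pragma comment(lib, "winmm.lib")

// Hold the 1ms system clock only while at least one caller has asked for
// high-resolution timers; release it as soon as the last request goes away.
class HighResTimerRequests {
 public:
  void Request() {
    if (request_count_++ == 0)
      timeBeginPeriod(1);  // First requester raises the clock rate.
  }
  void Release() {
    if (--request_count_ == 0)
      timeEndPeriod(1);    // Last requester restores the default clock.
  }
 private:
  int request_count_ = 0;
};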
Mike
Pingback: Better JavaScript animations with requestAnimationFrame | NCZOnline
Pingback: Script yielding with setImmediate | NCZOnline
Pingback: Timer resolution in browsers | NCZOnline
Timers in Chrome INHERENTLY have drift BUILT IN. A recurring timer in Chrome is rescheduled based not upon the scheduled fire time, but upon the time the timer actually fired. So any delay/drift in firing a timer propagates to ALL future fire times. The source code says it all: https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/platform/ThreadTimers.cpp&l=131
Pingback: Why It Took Year For Google To Address Battery Draining Bug In Chrome | Battery News
Pingback: The Default Effect and Usability | Richard E. Latham