When being wrong is right

I like July. There are many good things about this month, like having no snow in Oslo or no night in Oslo (technically there is one, but it only lasts an hour or two). Another one of them is, like Harald said, vacations. And though I enjoy mine very much, the one I'm thinking of is most of Finland being on vacation. That means fewer emails and fewer things I need to handle, which in turn means I have some time for free thinking.

A couple of weeks ago, I was discussing timers on the #meego IRC channel with Arjan van de Ven and some others. He had come to me with a suggestion for a new API in Qt that would match glib's g_timeout_source_new_seconds and related functions. Arjan is the reference when it comes to conserving power on Linux systems: he's the author of PowerTOP and Timechart, and his blog is full of information on this subject (unfortunately he doesn't seem to blog often, which is a pity).

What he explained to me is that many timers in applications don't need exact precision, just a ballpark. More importantly, it's better for battery life if the CPU sleeps as much as possible, so batching up the wakeups within one application and across applications achieves interesting results. We sacrifice precision in favour of battery life. As he explained, glib hashes some session attributes and then schedules wakeups at that fraction of a second, called the "perturbation". All applications use the same session attributes, so all applications calculate the same perturbation. And since those values are constant for the duration of the session, we get wake-up points spaced one second apart from each other.
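
A rough sketch of that idea (my own illustration, not glib's actual code; the session-id string and the hashing are just stand-ins):

```cpp
#include <functional>
#include <string>

// Illustration only: derive a constant sub-second offset ("perturbation")
// from some per-session attribute, then round every wakeup to the next
// whole second plus that offset, so all timers in the session coincide.
long long nextWakeupMs(long long nowMs, const std::string &sessionId)
{
    const long long perturbation =
            std::hash<std::string>()(sessionId) % 1000;  // same for the whole session

    long long next = (nowMs / 1000) * 1000 + perturbation;
    if (next <= nowMs)
        next += 1000;       // that slot has already passed, take the next one
    return next;
}
```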

Why doesn't it simply choose a constant value instead of calculating one? Something like 0? Well, the answer is that while you want to save battery power on your device, you also want to avoid network congestion. Imagine that you're on a network with other devices using the same algorithm, and that you have synchronised your system clock with an outside source via NTP. If everyone wakes up at the same time, there's a chance that they will all try to use the network at the same time, causing collisions and thus retransmissions.

Now, this isn't news to us. I remember discussing the idea of coarse timers with Brad more than once, and possibly with some others. But we've never actually done it, because we didn't have a good algorithm.

So I sat down at my workstation for two evenings and hashed out a basic implementation, including a new QAbstractEventDispatcherV2 class and a new enum in QTimer. Then I improved the algorithm on the flight to Finland, where I was going to attend Akademy 2010 in Tampere -- as usual, it's when you're on battery power that you worry about saving battery :-)

Instead of the two modes that glib offers (high precision and seconds-backed), I implemented three, aided by C++ overloading and one enum. The enum lets you choose the current behaviour (QTimer::HighPrecision) or the seconds-backed implementation (currently called QTimer::VeryCoarse -- API reviews clearly not done yet). My implementation of the seconds-backed algorithm doesn't calculate a perturbation, though: it simply rounds the wakeup time to the nearest full second (and rounds your interval too).
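
In use, that might look roughly like this (a sketch only -- the start() overload taking the enum is hypothetical here, and the names are the pre-review ones from this post, so they may well change):

```cpp
#include <QTimer>

void setupTimers(QObject *parent)
{
    // Sketch only: hypothetical overload taking the precision enum.
    QTimer *networkTimeout = new QTimer(parent);
    networkTimeout->start(30 * 1000, QTimer::VeryCoarse);  // ~30 s, rounded to whole seconds

    QTimer *animation = new QTimer(parent);
    animation->start(16, QTimer::HighPrecision);           // 16 ms, stays precise
}
```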

How can I get away with a perturbation of 0, unlike glib, you may ask? The difference is in how we calculate time in Qt. Glib's main loop uses the g_get_current_time function, which returns the actual wall-clock time, like Qt's QDateTime::currentDateTime or currentMSecsSinceEpoch (new in 4.7); those values are adjusted by NTP. Qt's event loop, however, uses a feature of Unix systems called "monotonic time", which is not adjusted by NTP and returns a value that never goes backwards nor jumps (hence the name "monotonic", related to "monotonous" because it's very boring). The monotonic timer is exposed in Qt as of 4.7 with the new QElapsedTimer class.
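
For instance, a quick way to measure elapsed time with the new class (the busy loop is just a stand-in workload):

```cpp
#include <QElapsedTimer>
#include <QDebug>

int main()
{
    // QElapsedTimer (new in Qt 4.7) reads the monotonic clock where
    // available, so this measurement is immune to NTP adjusting the
    // wall clock underneath us.
    QElapsedTimer stopwatch;
    stopwatch.start();

    volatile long dummy = 0;
    for (long i = 0; i < 100 * 1000 * 1000; ++i)   // stand-in workload
        dummy += i;

    qDebug() << "took" << stopwatch.elapsed() << "ms";
    return 0;
}
```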

The other big new feature I added is the new default mode. I've made QTimer default to an adaptive algorithm which I've called QTimer::Coarse (again, API reviews not done yet). Remember: we've been telling you for years that QTimer is not precise, so I'm taking the opportunity to make it imprecise. But I'm not going overboard, so I chose a maximum absolute error of 5%. This number also gave me the two boundaries for the adaptive algorithm: 20 ms and 20 seconds.

At 20 ms, a 5% error is 1 ms, so the adaptive algorithm would introduce errors smaller than the calculation granularity. So instead, if you specify a timer of 20 ms or less, we simply switch to the high-precision mode. This means all your 16 ms timers (to reach 60 fps) will continue working as they did before. Similarly, at 20 seconds the error I can introduce reaches 1 second, so we simply switch to the seconds-backed algorithm. In that case, all your 30-second network-timeout timers will be rounded to the full second.
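
Put as code, the selection boils down to something like this (my own sketch, not the actual implementation, with a made-up enum mirroring the names above):

```cpp
// Hypothetical names, mirroring the ones mentioned in this post.
enum TimerPrecision { HighPrecision, Coarse, VeryCoarse };

TimerPrecision effectivePrecision(int intervalMs, TimerPrecision requested)
{
    if (requested != Coarse)
        return requested;           // the caller chose explicitly
    if (intervalMs <= 20)
        return HighPrecision;       // 5% would be under 1 ms: not worth it
    if (intervalMs >= 20 * 1000)
        return VeryCoarse;          // the error reaches a full second: round to seconds
    return Coarse;                  // adaptive algorithm, up to 5% error
}
```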

Between 20 and 100 ms (except for 25, 50 and 75 ms), we simply round off the wakeup time. So if you ask for a 23-millisecond interval, your code will wake up first after 22 ms, then 24 ms after that, then 22 ms again, and so on. On average you get 23 ms, but each individual interval is a little off.

For the other cases, the algorithm introduces up to 5% of error to try and align wakeups. The algorithm I implemented simply checks whether the interval [expected−ε, expected+ε] (where expected is the precise wakeup time and ε is the 5% error) includes one of the preferred wakeup times. If it's possible to wake up at a full second, it will do so. Otherwise, it checks whether it's possible to wake up 500 ms after the full second, and then at other instants...

More than that, it chooses which boundaries to wake up on based on what your interval is a multiple of. If your interval is a multiple of 500 ms, it will try to wake up at a multiple of 500 ms (that is, at 0 or 500). If it's a multiple of 250 ms, it will try to wake up at a multiple of 250 ms (0, 250, 500 and 750). And so forth for 200, 100 and 50 ms. If your interval is not a multiple of 50 ms, like 623 ms, you're just weird and the code won't do you much good :-)
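
Roughly, the idea looks like this (my own simplified sketch, not the actual Qt code; the function names are made up):

```cpp
#include <QtGlobal>

// Pick the largest "nice" divisor of the interval to use as the wakeup boundary.
static int preferredBoundary(int intervalMs)
{
    static const int candidates[] = { 500, 250, 200, 100, 50 };
    for (int i = 0; i < 5; ++i)
        if (intervalMs % candidates[i] == 0)
            return candidates[i];
    return 0;                       // a weird interval like 623 ms: give up
}

// Move the expected wakeup time towards the nearest multiple of the preferred
// boundary, but never by more than 5% of the interval.
static qint64 adjustWakeup(qint64 expectedMs, int intervalMs)
{
    const qint64 maxError = intervalMs / 20;              // 5% of the interval
    const int boundary = preferredBoundary(intervalMs);
    if (!boundary)
        return expectedMs;

    const qint64 target = (expectedMs + boundary / 2) / boundary * boundary;
    if (qAbs(target - expectedMs) <= maxError)
        return target;                                    // within reach: lock onto it
    return expectedMs + (target > expectedMs ? maxError : -maxError);  // drift towards it
}
```

Applied repeatedly (adding the interval to each actual wakeup), this sketch reproduces the drift of the 400 ms example two paragraphs below.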

The interesting net effect of this is that timers drift slowly over time. They drift towards a preferred boundary and reach it within 20 cycles in the worst case. Once they've reached the preferred boundary, they lock onto it and become rather precise.

Take the following as an example: the time is now 40120.673 seconds and you ask for a 400 ms timer. We calculate that the maximum absolute error is 20 ms and that our preferred wakeup time is a multiple of 200 ms. The first wakeup would be expected at 40121.073, and the closest multiple is 40121.000. Given the maximum error we can introduce, we schedule a wakeup at 40121.053 instead, that is, an interval of 380 ms. The next timeout will be at 40121.433, then 40121.813. After that, we realise that the instant 40122.200 is within reach, so we choose it. And from then on, we can wake up at precise 400 ms intervals.

While for one timer this isn't very useful, a normal application has a couple of timers running, and a system has several applications running. Now applications will ask the kernel to wake them up at the same instant, so the kernel can in turn schedule a long sleep and wake the CPU from deep sleep only when needed.

The above isn't perfect, though. For one thing, if you start two 400 ms timers, depending on when you started them, the algorithm might decide to drift them in different directions (because 1000 isn't a multiple of 400). So your application would instead be consistently waking up every 200 ms. You still need to find your own way of synchronising events.

Arjan also pointed out that Linux since 2.6.28 has a feature that allows applications to tell the kernel what their timer slack is, via prctl(2) (though the PR_SET_TIMERSLACK constant doesn't seem to be documented). I'm not sure how to best use this feature to save power. What I'd like is to tell the kernel our slack at the moment we go to sleep, so we can avoid one extra syscall. Arjan told me that if we can make use of it, he'll add that syscall to the kernel :-)
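
For the record, setting the slack for the current thread looks roughly like this (the 5 ms value is just an example; as far as I can tell, the slack is given in nanoseconds):

```cpp
#include <sys/prctl.h>

int main()
{
    // Allow the kernel to defer this thread's timers by up to 5 ms
    // (PR_SET_TIMERSLACK takes the slack in nanoseconds; Linux >= 2.6.28).
    prctl(PR_SET_TIMERSLACK, 5UL * 1000 * 1000, 0, 0, 0);
    return 0;
}
```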

And I did try the code in existing applications. In fact, I have been running KDE for two weeks now with different versions of this code, without ill effects. The number of wakeups per second in complex applications like plasma-desktop even seems to have diminished. Such complex applications often load plugins that do independent tasks. I noticed that plasma, for example, starts 62 timers of 2 seconds on my workstation (a little fewer on the laptop). With my code, that is all reduced to one or two wakeups (on the laptop, the startup is slow enough that it always crosses a boundary, so plasma wakes up twice a second there).

Anyway, about the title of this blog... when I wrote it, I was actually reminded of "Einstein's biggest blunder", which today is under debate because Einstein might have been right even when he thought he was wrong.

Happy hacking and happy holidays. See you in August (± 5%).

