Incremental improvements

I don't know if this is visible to the community yet, but we in Qt have been dedicating a lot of effort to performance improvements. For Qt 4.5, we had a project codenamed "Falcon" whose job was to improve the graphics engines and make them perform much faster. From that project, we got the improved graphics engines, including the raster and OpenGL ones.

For Qt 4.6, a lot of work went into Graphics View. For Qt 4.7, we're going to do some more; where exactly, I don't know yet.

Among the many ideas, one that I'm interested in seems very small, but may bring an important benefit: removing volatile from QAtomicInt and QAtomicPointer.

Here's what happens: QAtomicInt derives from the internal class QBasicAtomicInt, which is a struct of one member: a volatile int _q_value. Similarly, QAtomicPointer derives from QBasicAtomicPointer, which is a struct of one member: T * volatile _q_value. The idea here is to remove those two "volatile" keywords.
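
To make that concrete, here is a simplified sketch of the two classes. It is written from memory for illustration, not copied from the Qt headers, and it leaves out everything except the parts discussed here:

// Simplified sketch; not the actual Qt source.
struct QBasicAtomicInt
{
    volatile int _q_value;              // the "volatile" we want to remove

    // non-atomic convenience operators
    inline operator int() const { return _q_value; }
    inline bool operator!=(int value) const { return _q_value != value; }

    // the atomic operations themselves are implemented in assembly
    bool testAndSetOrdered(int expectedValue, int newValue);
    int fetchAndAddOrdered(int valueToAdd);
    int fetchAndStoreOrdered(int newValue);
};

template <typename T>
struct QBasicAtomicPointer
{
    T * volatile _q_value;              // the other "volatile" to remove
    // ... same pattern: non-atomic convenience operators plus
    // assembly-backed test-and-set, fetch-and-store, fetch-and-add
};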

Before you cry foul and tell me that I'm going to break your code, let me quote the Qt documentation for these two classes:

For convenience, QAtomicPointer provides pointer comparison, cast, dereference, and assignment operators. Note that these operators are not atomic.

(emphasis is in the documentation)

With that card up my sleeve, I claim that I'm not breaking any contracts. All of the atomic operations that these two classes support (fetch-and-add, test-and-set, fetch-and-store) are implemented in assembly, which means the compiler cannot optimise them anyway. And the assembly code is not affected by any caching of the variable's contents that the compiler may want to do. What's more, we also tell the compiler that we changed the value, so that it discards its cached copy.
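
To illustrate that last sentence: in the GCC-style inline assembly used on x86, the "memory" clobber is what tells the compiler that memory may have changed behind its back. Below is my reconstruction of what such a test-and-set looks like; it is for illustration only, not a verbatim copy of Qt's qatomic_i386.h:

// Reconstruction for illustration; not the actual Qt source. The "memory"
// clobber in the last line forces the compiler to discard whatever values it
// had cached, whether or not the variable is declared volatile.
inline bool testAndSet(int &value, int expectedValue, int newValue)
{
    unsigned char ret;
    asm volatile("lock\n"
                 "cmpxchgl %3,%2\n"
                 "sete %1\n"
                 : "=a" (expectedValue), "=qm" (ret), "+m" (value)
                 : "r" (newValue), "0" (expectedValue)
                 : "memory");
    return ret != 0;
}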

So, why are we considering this?

Well, for the reason I hinted at above: the compiler caching the value. The whole point of the volatile keyword is that the compiler may not cache the value: it must reload it every time it accesses the variable.
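
A stand-alone example of what that rule costs (my own, not from the Qt sources): when the same variable is read twice through a volatile lvalue, the compiler must emit two loads; with a plain int, it may load once and reuse the value.

// Two separate loads: v is volatile, so neither read may be cached or merged.
int read_twice_volatile(const volatile int &v)
{
    return v + v;
}

// One load is enough here; the compiler may even keep the value in a register
// across much larger stretches of code.
int read_twice_plain(const int &v)
{
    return v + v;
}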

And if we look at any of the Qt tool classes, the non-const functions start by calling detach(), which is generally implemented like this (extracted from qlist.h):

inline void detach() { if (d->ref != 1) detach_helper(); }

That is, "if our reference count is not one (i.e., if we're being shared), do the detaching."

And since QAtomicInt::operator int() simply returns _q_value, which is volatile, the compiler has to reload the value every single time. Then it must actually compare that value to 1 and generate the proper branching.

What the compiler doesn't know is that, once a container detaches, the reference count will remain 1. It can only increase from 1 in a way that is visible to the compiler: either in the same thread (by taking another copy of the container) or, if the container is a globally-visible variable, after a mutex lock/unlock.

So, if we remove the volatile keyword, the compiler is allowed to cache the value of the reference count. Once the first detaching happens, the compiler knows that the reference count is 1. It can therefore optimise out all the next checks, because it also knows that the reference count remains 1.
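
As a hypothetical example of where this would pay off (the function below is mine, not something from Qt), consider a tight loop of non-const calls on the same container:

#include <QList>

// Every append() starts with detach(), i.e. the "d->ref != 1" test. With the
// volatile member, that is one load, one compare and one branch per iteration.
// Without volatile, after the first iteration the compiler may keep the
// reference count (known to be 1) in a register and, if the theory holds,
// drop the later tests entirely.
void fill(QList<int> &list, int count)
{
    for (int i = 0; i < count; ++i)
        list.append(i);
}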

This would mean that our reference counting system would be a lot more efficient (hence the title of this blog). It might turn out to be the best ratio of performance gain vs. effort ever. After all, it's just the removal of two keywords in one header file.

That's the theory anyway. I haven't yet tested to see if the compiler really knows how to optimise this the way I expect it to.
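
One way to test it (a sketch of my own, not code that exists anywhere in Qt) would be to model the detach() check with and without the volatile member, compile with "g++ -O2 -S", and compare the two instantiations in the generated assembly:

// detach_helper() is left undefined and external on purpose, so the compiler
// must assume it can modify memory, just like the real one. This file is
// meant to be compiled to assembly (-S), not linked.
extern void detach_helper();

struct DataVolatile { volatile int ref; };
struct DataPlain    { int ref; };

template <typename Data>
void touch_three_times(Data *d)
{
    for (int i = 0; i < 3; ++i) {
        if (d->ref != 1)            // the detach() test from qlist.h
            detach_helper();
    }
}

// Force both instantiations so their assembly can be compared side by side.
template void touch_three_times<DataVolatile>(DataVolatile *);
template void touch_three_times<DataPlain>(DataPlain *);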

PS: credit where credit is due: this optimisation was not my idea. It was Olivier's. And at first I resisted, saying it would break stuff and wouldn't work. But now I'm in favour of it. :-)

