Sorry guys…

Sorry guys...

I know you have spent a lot of money on buying faster machines to get Qt to run fast enough for the software you are creating, but I'm now sad to inform you that Qt won't make full use of these machines. In fact it will use your CPU a little bit less.

Jokes aside... We've been running a few optimizations internally, under the code name "Qt Falcon" (it's going to make Qt fly!), and I wanted to share some of the current results. These things are not yet integrated into Qt Main, but they should be in place by the time Qt 4.5 goes Tech Preview.

Personally I sit on Windows for most of my daily work, so the benchmarks are from a Windows XP machine, and during the Falcon initiative we took the time to iron out a few things that have bothered us in the Windows code paths along the way. Windows uses a software rendering engine, internally referred to as the "Raster Engine". This is also the engine used for embedded and for QImage drawing on X11 and Mac.

Let's start at the beginning: QPainter::begin(), a function that is called at the start of every single paint event in the history of Qt. When we originally designed the paint engines, we aimed at them being shared amongst different instances of the same subclass. For example, all QWidget painting was done using a single paint engine. For this reason, we put most of the initialization logic into QPaintEngine::begin(), because the actual device changed all the time. With the introduction of the backingstore in 4.2, all widgets are actually being drawn to the same device, so this begin() initialization started to make less sense, but the design stayed. With 4.5, we make raster engines one per image and one per backingstore. The initialization is done outside of begin(), in the constructor if you can believe that.
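To make the shift concrete, here is a minimal sketch in plain C++ (no Qt types) of moving setup work out of begin() and into the constructor. The names Device, SharedEngine, PerDeviceEngine and setupCount are hypothetical stand-ins for illustration, not Qt's actual classes, and the "setup" is reduced to a counter.

```cpp
#include <cassert>

// Hypothetical stand-in for a paint device (an image or a backing store).
struct Device { int width; int height; };

// Old design (sketch): one engine shared by many devices, so begin()
// has to redo the device-specific setup every time it is called.
class SharedEngine {
public:
    int setupCount = 0;                  // how often the costly setup ran

    void begin(const Device &d) {
        device = &d;
        ++setupCount;                    // setup repeated on every begin()
    }
    void end() { device = nullptr; }
    bool isActive() const { return device != nullptr; }

private:
    const Device *device = nullptr;
};

// New design (sketch): one engine per device. The setup runs once in the
// constructor, so begin() only has to flip a flag.
class PerDeviceEngine {
public:
    int setupCount = 0;

    explicit PerDeviceEngine(const Device &d) : device(d) { ++setupCount; }

    void begin() { assert(!active); active = true; }
    void end()   { assert(active);  active = false; }
    int pixelCount() const { return device.width * device.height; }

private:
    const Device &device;                // must outlive the engine
    bool active = false;
};
```

With three begin() / end() pairs on the same device, the shared engine runs its setup three times, while the per-device engine runs it once.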

On Windows, we also checked, in QRasterPaintEngine::begin(), whether the system had switched its cleartype settings since the last time. This check was actually costing ~25% of the call to QPainter::begin() (ouch!). By listening for the system event when the user changes the settings instead of polling the value in the registry, we could kill those 25% and also support the feature that Qt switches cleartype when the user presses "Apply" in the control panel, which we previously didn't do.
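The difference between polling and listening can be sketched in a few lines of standard C++. SettingsStore is a hypothetical stand-in for the registry plus the system change event, and Engine is not the real QRasterPaintEngine; the only point is that the slow read happens once, and later changes arrive as notifications.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a slow system lookup (such as a registry read)
// combined with a change notification.
struct SettingsStore {
    bool clearType = false;
    mutable int reads = 0;               // counts the slow lookups
    std::vector<std::function<void(bool)>> listeners;

    bool readClearType() const { ++reads; return clearType; }

    // Listeners are invoked only when the value actually changes.
    void setClearType(bool on) {
        if (on == clearType)
            return;
        clearType = on;
        for (auto &listener : listeners)
            listener(on);
    }
};

// Engine that reads the setting once and then refreshes its cached copy
// from the change notification instead of polling in every begin().
class Engine {
public:
    explicit Engine(SettingsStore &store) : cached(store.readClearType()) {
        // The callback captures `this`, so the engine must outlive the store's
        // listener list and must not be copied.
        store.listeners.push_back([this](bool on) { cached = on; });
    }
    Engine(const Engine &) = delete;
    Engine &operator=(const Engine &) = delete;

    void begin() { assert(!active); active = true; }   // no settings lookup here
    void end()   { assert(active);  active = false; }
    bool usesClearType() const { return cached; }

private:
    bool cached;
    bool active = false;
};
```

However many times begin() runs, the store is read exactly once, and a settings change still reaches the engine immediately.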

A comparison of a plain QPainter::begin() / end() looks like this:


The graph shows, in microseconds, the time to create a QPainter on a device and call end on it

We had a fair idea that save / restore was very costly when clipping was enabled in 4.4. Part of the reason for this is that the communication between QPainter and QPaintEngine was done via a flat state update. A restore was performed by replaying the previous stack element, so consider the case of:

p.setClipRect(rect1, Qt::ReplaceClip);
p.scale(2, 2);
p.setClipRect(rect2, Qt::IntersectClip);
p.setClipRect(rect3, Qt::IntersectClip);

On a later restore() that returns to this state, QPainter would replay three clip operations to the underlying engine. Horrible, you may think, and it is a piece of code that has been troubling me since I wrote it in the early 4.0 days. With Falcon, the engines can be made aware of QPainter's stack, making it possible to cache the results on each level. That means that restore() becomes just a stack pop(). The results look like this:


The graph shows, in microseconds, the time it takes to run save, followed by a state change, followed by restore
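The idea of caching the result on each stack level can be sketched as follows. This is a simplified illustration in standard C++, not the actual Falcon code: Rect and ClipStack are hypothetical names, clips are plain rectangles, and transformations such as scale() are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Half-open rectangle: [x1, x2) x [y1, y2).
struct Rect {
    int x1, y1, x2, y2;
    bool operator==(const Rect &o) const {
        return x1 == o.x1 && y1 == o.y1 && x2 == o.x2 && y2 == o.y2;
    }
};

inline Rect intersect(const Rect &a, const Rect &b) {
    Rect r{std::max(a.x1, b.x1), std::max(a.y1, b.y1),
           std::min(a.x2, b.x2), std::min(a.y2, b.y2)};
    if (r.x2 < r.x1) r.x2 = r.x1;        // clamp empty results
    if (r.y2 < r.y1) r.y2 = r.y1;
    return r;
}

// Each stack level stores the already-combined clip, so restore() never
// has to replay the individual clip operations that produced it.
class ClipStack {
public:
    explicit ClipStack(const Rect &deviceRect) { levels.push_back(deviceRect); }

    void save() { levels.push_back(levels.back()); }
    void restore() {
        assert(levels.size() > 1);       // unbalanced restore()
        levels.pop_back();               // just a pop, no replay
    }
    void replaceClip(const Rect &r) { levels.back() = r; }
    void intersectClip(const Rect &r) { levels.back() = intersect(levels.back(), r); }
    const Rect &clip() const { return levels.back(); }

private:
    std::vector<Rect> levels;
};
```

The trade-off is a little extra memory per save() level in exchange for a restore() whose cost no longer depends on how many clip operations were issued.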

Another Windows thing that had bugged me for a while is the text drawing. Two separate things came together in QRasterPaintEngine::drawTextItem() as a bit of a mess. Again, I'm much to blame and the code has been troubling me for some time, but I didn't have time to get back and fix these things. Until now, anyway... Point one was that the only way to draw nice fonts on Windows is using GDI. It does (in my opinion, I know people disagree ;) ) the nicest font rendering of all the systems, so using FreeType or another method would not be acceptable visually, as Qt would look worse than other apps on the platform. So we had to mix GDI and our own raster engine together to do clipping and textured / gradient text. The other point is that any pixel touched by GDI will have 0 as its alpha channel. The raster engine relies on premultiplied alpha, so this basically destroys all rendering done afterwards. *sigh*
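To see why a zeroed alpha channel is so destructive, here is a small sketch in standard C++. It assumes 32-bit 0xAARRGGBB pixels; the helper names isValidPremultiplied and copyAndPatchAlpha are hypothetical, and forcing the pixels opaque is just one simple way to patch them, not necessarily what the raster engine does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// In premultiplied ARGB32 every color channel has already been multiplied
// by alpha, so a pixel is only valid if each channel is <= alpha.
inline bool isValidPremultiplied(std::uint32_t p) {
    const std::uint32_t a = p >> 24;
    return ((p >> 16) & 0xffu) <= a &&
           ((p >> 8) & 0xffu) <= a &&
           (p & 0xffu) <= a;
}

// Copy pixels produced by an alpha-unaware renderer into another buffer,
// forcing every pixel opaque on the way.
inline void copyAndPatchAlpha(const std::uint32_t *src, std::uint32_t *dst,
                              std::size_t count) {
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = src[i] | 0xff000000u;
}
```

A pixel such as 0x00336699 has color but no alpha, which is not a valid premultiplied value; after patching it becomes 0xff336699.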

The solution to both problems was to use GDI to render into a buffer and sample the values back into the raster engine's buffer, patching the alpha channel at the same time. For cleartype this was even worse, as the separate buffer first needed to be filled with the background (cleartype pixels depend on the background as well as the foreground). It ran "ok", but we were quite aware that it could be done better. With Falcon, we introduce a mask-texture for each font engine, which generates the glyphs once using GDI. The approach supports both normal and cleartype text drawing, and the cleartype approach even does the full RGB blend with gamma correction (thanks to Samuel, who spent an entire day with me getting the gamma correction of three-component alpha blending right). The speedups are quite noticeable. The results are measured in milliseconds:


The graph shows, in milliseconds, the time it takes to draw the text "abcdefg" on a QPixmap using the default font with either cleartype or non-cleartype
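The "generate each glyph once" idea and the per-channel blend can be sketched like this. Everything here is an assumption made for illustration: GlyphMask, GlyphCache and blendChannel are hypothetical names, the rasterizer callback stands in for the expensive GDI call, and the gamma of 2.2 is an assumed value rather than the one Qt uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Coverage mask for one glyph: one byte per pixel, 0 = empty, 255 = full.
struct GlyphMask {
    int width;
    int height;
    std::vector<std::uint8_t> coverage;
};

// Caches one mask per glyph id. The rasterizer callback stands in for the
// expensive platform call and runs at most once per glyph.
class GlyphCache {
public:
    using Rasterizer = std::function<GlyphMask(std::uint32_t glyphId)>;

    explicit GlyphCache(Rasterizer r) : rasterize(std::move(r)) { assert(rasterize); }

    const GlyphMask &mask(std::uint32_t glyphId) {
        auto it = masks.find(glyphId);
        if (it == masks.end())
            it = masks.emplace(glyphId, rasterize(glyphId)).first;
        return it->second;
    }

private:
    Rasterizer rasterize;
    std::unordered_map<std::uint32_t, GlyphMask> masks;
};

// Blend one 8-bit channel of src over dst using a coverage value, doing the
// interpolation in linear light. For cleartype-style masks this would be
// called once per color channel with that channel's own coverage.
inline std::uint8_t blendChannel(std::uint8_t src, std::uint8_t dst,
                                 std::uint8_t coverage, double gamma = 2.2) {
    const double s = std::pow(src / 255.0, gamma);
    const double d = std::pow(dst / 255.0, gamma);
    const double c = coverage / 255.0;
    const double out = std::pow(s * c + d * (1.0 - c), 1.0 / gamma);
    return static_cast<std::uint8_t>(std::lround(out * 255.0));
}
```

Drawing a string then becomes a lookup per character followed by a blend per pixel, and the costly rasterization is paid only the first time a glyph is seen.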

We also removed some of the overhead of drawPixmap() on the raster engine. The problem was that there was a bit of set-up before we could get into the actual pixel-by-pixel blending. Instead of going through the generic rendering pipeline, we introduced a faster path for unclipped pixmaps (or pixmaps that fit inside a rectangular clip). I won't bore you with the details, as this blog is already twice as long as I intended it to be. The results for 4.4 and Falcon converge and become identical, speed-wise, at around 200x200, but for icon-size pixmap drawing there is quite a visible difference.


The graph shows, in microseconds, how long it takes to draw a QPixmap, solid or semi-transparent, at the specified sizes
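The fast-path selection for pixmaps can be sketched roughly as below. This is an illustration in standard C++ and not the raster engine's code: Rect, Surface, BlitPath and blit are hypothetical names, and the copy is opaque (no blending) to keep the focus on how the path is chosen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rect { int x, y, w, h; };

inline bool contains(const Rect &outer, const Rect &inner) {
    return inner.x >= outer.x && inner.y >= outer.y &&
           inner.x + inner.w <= outer.x + outer.w &&
           inner.y + inner.h <= outer.y + outer.h;
}

struct Surface {
    int width, height;
    std::vector<std::uint32_t> pixels;   // row-major
    Surface(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0u) {}
    std::uint32_t &at(int x, int y) {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
    std::uint32_t at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

enum class BlitPath { Fast, Generic };

// Copies src to dst at (dx, dy), restricted to a rectangular clip.
inline BlitPath blit(Surface &dst, const Rect &clip, const Surface &src,
                     int dx, int dy) {
    assert(contains(Rect{0, 0, dst.width, dst.height}, clip));
    const Rect target{dx, dy, src.width, src.height};

    if (contains(clip, target)) {
        // Fast path: the whole pixmap is visible, so no per-pixel clip test.
        for (int y = 0; y < src.height; ++y)
            for (int x = 0; x < src.width; ++x)
                dst.at(dx + x, dy + y) = src.at(x, y);
        return BlitPath::Fast;
    }

    // Generic path: every pixel is tested against the clip.
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            const int px = dx + x, py = dy + y;
            if (contains(clip, Rect{px, py, 1, 1}))
                dst.at(px, py) = src.at(x, y);
        }
    }
    return BlitPath::Generic;
}
```

For small pixmaps the fixed cost of choosing and setting up a path is a large share of the total, which fits the observation that the gain fades as the pixmap grows.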

What I've mentioned above is a bunch of separate small things, and you may be wondering how this affects you in the real world. You probably don't do for-loops of begin() / end() or save() / restore(). So I'll finish up with some real-world examples. The numbers below are taken from a benchmark of a few widgets where we simply run "repaint()" a bunch of times. The QLabel numbers don't have much relation to QComboBox etc., so don't pay too much attention to that, but rather to how each widget changes from 4.4 to Falcon.


The graph shows, in milliseconds, how long a repaint of a few widgets takes
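For anyone wanting to try the same kind of measurement, a "run it a bunch of times and average" loop can be sketched in standard C++ as below. The helper name averageMilliseconds is hypothetical and this is not the benchmark harness we used; in a Qt program the callback would be something like a call to repaint() on the widget under test.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `work` `iterations` times and returns the average wall-clock time
// per call in milliseconds.
inline double averageMilliseconds(const std::function<void()> &work,
                                  int iterations) {
    assert(work && iterations > 0);
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < iterations; ++i)
        work();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}
```

Averaging over many iterations smooths out timer resolution and one-off costs, which matters when a single call takes only microseconds.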

Now, much of the overall speedup here, probably over 50% of it, is due to some awesome work that Bjørn Erik has done in the backingstore. The rest is due to small things here and there, like the ones mentioned above.

There has been quite a bit of refactoring in the works here, and some work is still ahead of us (like re-enabling GDI-based glyph generation for transformed text, which will be vector-path based in the upcoming TP), but I hope that the changes we've made will benefit most users, and that those who see problems with the new approaches let us know so we can look at those too.

