Insanity is shaping the same text again and expecting a different result

Albert Einstein has been quoted as saying that "insanity is doing the same thing over and over again and expecting a different result." Apparently this is a misquote, and the original quote actually belongs to Rita Mae Brown, but that's not important right now. What's important is that most Qt applications are crazy.

Background
I'll explain. Some readers may remember Gunnar's excellent blog series about graphics performance and how to get the most out of it in Qt. He mentioned a few times that text rendering in Qt is slower than we'd like.

To see why text rendering is so slow, we need to look at what happens when you pass a QString into QPainter::drawText() and ask it to display it on screen. A QString is just an array of integer values which are defined to signify specific symbols in specific writing systems. How these symbols should actually look on the screen is defined by the font you have selected on your painter.

So the first step of drawText() is to take the code points and turn them into index values which reference an internal table in the font. The indices are specific to each font, and have no meaning outside the context of the current font.

The second step of drawText() is to collect data from the font which describes how each glyph should be positioned in relation to the surrounding glyphs. This step, the positioning of each glyph, is potentially very complex. Several different tables in the font file need to be consulted, with programs and instructions that do things like kerning (allowing parts of certain glyphs to "hang over" or "stretch underneath" other glyphs) and placing one or more diacritical marks on the same character. Some writing systems also allow complex reordering of glyphs based on the context of the surrounding characters, as explained by Simon in his blog from 2007. This complex shaping of the text is currently handled by the Harfbuzz library in Qt.

The third step applies only if the text has a layout applied to it. The layout would be the part which breaks text into nicely formatted lines. In Qt, this could be based on HTML code, using QTextDocument or WebKit, or it could be a simpler layout, just making the text wrap and align within a bounding rectangle. The former isn't supported by QPainter::drawText(), so I'll focus on the latter. Using information from the shaping step, the text layout calculates the width of unbreakable portions of the text and tries to format the text in a way which looks nice on screen but which does not expand beyond the bounds set by the user.

In the fourth and final step, the paint engine takes over. Its job is to draw the symbols retrieved in the first step at the positions calculated in the second and third steps. In most of Qt's performance-sensitive paint engines, this is done by caching a pixmap representation of each glyph the first time it is drawn, and then just redrawing this pixmap for every subsequent call. This is potentially very quick.

While these four steps may be slightly intertwined in Qt today, this is in principle what happens every single time you call drawText() and pass in a QString and a bounding QRect. Yet in very many cases, the text, the font and the rectangle all remain completely static for the duration of your application, or at least for the bulk of it. And this is the insane part: a lot of time is wasted redoing the same work. Qt already provides QTextLayout as a way to cache the results of the first three steps and push them directly into the paint engine. However, QTextLayout is somewhat complicated to use, it has overheads related to its other use cases, and it stores a lot more information than is needed just for putting the symbols on the screen, making it unsatisfactory in very memory-sensitive settings.
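To make the caching idea concrete, here is a minimal sketch in plain C++. It is deliberately not Qt code: `PositionedGlyph`, `shape()` and `CachedLine` are hypothetical names invented for this illustration, and the toy "shaper" simply uses each character's value as its glyph index and a fixed advance, whereas real shaping is far more involved. The only point it tries to show is that the expensive steps run once, and later draws reuse the stored result.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the result of steps one to three:
// a font-specific glyph index plus a horizontal position.
struct PositionedGlyph {
    unsigned index;
    int x;
};

// Toy shaper (an assumption for illustration only): one glyph per
// character, placed at a fixed advance. It counts how often it runs.
inline std::vector<PositionedGlyph> shape(const std::string &text,
                                          int advance, int &shapeCalls)
{
    ++shapeCalls;
    std::vector<PositionedGlyph> glyphs;
    glyphs.reserve(text.size());
    int x = 0;
    for (unsigned char c : text) {
        glyphs.push_back({static_cast<unsigned>(c), x});
        x += advance;
    }
    return glyphs;
}

// Keeps the shaped glyphs around and reshapes only when the text changes.
class CachedLine {
public:
    CachedLine(std::string text, int advance)
        : m_text(std::move(text)), m_advance(advance) {}

    void setText(std::string text)
    {
        if (text != m_text) {
            m_text = std::move(text);
            m_valid = false; // cached glyphs are stale now
        }
    }

    // Called once per "paint event"; shapes lazily on first use.
    const std::vector<PositionedGlyph> &glyphs()
    {
        if (!m_valid) {
            m_glyphs = shape(m_text, m_advance, m_shapeCalls);
            m_valid = true;
        }
        return m_glyphs;
    }

    int shapeCalls() const { return m_shapeCalls; }

private:
    std::string m_text;
    int m_advance;
    std::vector<PositionedGlyph> m_glyphs;
    bool m_valid = false;
    int m_shapeCalls = 0;
};
```

Calling `glyphs()` a hundred times on an unchanged `CachedLine` runs the toy shaper once, which is the behaviour you want for text that stays the same from one paint event to the next.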

QStaticText!
We decided there was a need for a specialized class to solve this problem. We named it QStaticText, and it will be available in Qt 4.7. QStaticText has been optimized specifically for the use case of redrawing text which does not change from one paint event to another. We've tried to keep the memory footprint to a minimum, and currently it has an overhead of approximately 14 bytes per glyph (including the 2 bytes per Unicode character in the string, which would presumably already be part of the application), as well as about 200 bytes of constant overhead.
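As a rough back-of-the-envelope check of those numbers, the footprint can be written as a one-line formula. This is only a sketch based on the approximate figures quoted above; the function name is made up for this illustration, and it assumes the glyph count is known (for simple Latin text it is often close to the character count, but that is an assumption, not a rule).

```cpp
#include <cstddef>

// Approximate figures taken from the text above; not exact constants.
constexpr std::size_t kBytesPerGlyph = 14;      // includes 2 bytes per character
constexpr std::size_t kConstantOverhead = 200;  // per static text item

// Hypothetical helper: estimated memory used by one static text item.
constexpr std::size_t estimateStaticTextBytes(std::size_t glyphCount)
{
    return kConstantOverhead + kBytesPerGlyph * glyphCount;
}

// The 50-glyph string used in the benchmarks: 200 + 14 * 50 = 900 bytes.
static_assert(estimateStaticTextBytes(50) == 900, "50 glyphs, about 900 bytes");
```

So a string like the one used in the benchmarks below would cost somewhere around 900 bytes under these assumptions.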

In the rest of this blog, I'll show some graphs to illustrate the benefits of using QStaticText for drawing text. QStaticText is supported by the raster engine (the software renderer used as default on Windows), the OpenGL engine and the OpenVG engine. For now, I'll focus on the raster engine and the OpenGL engine, and on the following platforms: Windows/desktop, Linux/desktop and the N900 (also running Linux, of course). Note that the hardware on the Windows and Linux machines is different, so the results are not comparable from platform to platform.

Benchmarks for fifty-character, single-line text
The benchmark I'm running is this: drawing the same 50-character string over and over again in each paint event and measuring how many "glyphs per second" we can achieve using different techniques to draw the text. I am testing the following text drawing mechanisms:

  • A call to QPainter::drawText() with no bounding rectangle.
  • A call to QPainter::drawStaticText() with no bounding rectangle.
  • Caching the entire string in a pixmap beforehand and drawing this in each paint event using QPainter::drawPixmap().
  • When testing on the OpenGL paint engine, the graph will also contain results for QStaticText with the performance hint QStaticText::AggressiveCaching. This is a hint to the paint engine that it is allowed to cache its own data, trading some memory for speed. It is currently used by the OpenGL engine to cache the vertex and texture coordinate arrays that are passed to the GPU when drawing the glyphs.

    On Windows
    Let's start off with the results for the raster engine on Windows. As I said, the measurement is in "glyphs per second", i.e. the number of symbols we can put on the screen during a second of running the test. The measurement is based on the frame rate of the test, which is taken as the average of nine seconds of execution per test case. Note that ClearType rendering was turned off in the OS during the test. The difference between a drawPixmap() result and a drawStaticText() result would be larger with ClearType turned on, but ClearType is not generally supported when caching the text in a pixmap, since the pixmap will inevitably need to have a transparent background, and you can't do subpixel antialiasing on top of a transparent background. Therefore all the benchmarks are run without subpixel antialiasing to get a better comparison.
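For reference, converting a measured frame count into "glyphs per second" is simple arithmetic. The helper below is a sketch of that conversion, not the benchmark's actual code: the function name and parameters are hypothetical, and the numbers in the usage note are made up purely to show the calculation.

```cpp
#include <cassert>

// Hypothetical helper: glyphs drawn per second, given the number of
// frames rendered during the measurement window and the number of
// glyphs drawn in each frame.
inline double glyphsPerSecond(long frames, double seconds, long glyphsPerFrame)
{
    assert(seconds > 0.0); // a zero-length window has no meaningful rate
    return static_cast<double>(frames) * static_cast<double>(glyphsPerFrame) / seconds;
}
```

For example, 540 frames in a nine-second window with one 50-glyph string per frame would give 540 * 50 / 9 = 3000 glyphs per second. These inputs are illustrative only and are not taken from the charts.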

    windows_raster1.png

    As you can see, the fastest way to draw text is to cache it in a pixmap and draw this, as pixmap drawing is extremely fast on modern hardware. However, in many circumstances you don't have the memory to spare for this kind of extravagance, and drawStaticText() pushes over half as many glyphs per second as the equivalent drawPixmap() call. It is also three times faster than a regular drawText() call.

    Using the OpenGL paint engine instead, performance of drawPixmap() shoots through the roof:

    windows_opengl1.png

    The other bars look small in comparison, but drawStaticText() with the aggressive caching performance hint in fact pushes out 5.6 million glyphs per second in this benchmark, while a regular drawText() call manages a measly fifth of that.

    On Linux
    Similar numbers occur on Linux:

    linux_raster.png

    Using drawStaticText() gives you more than a 2x performance boost over using drawText(), and drawPixmap() is a little bit less than 1.5 times the speed of drawStaticText(). When using the OpenGL engine, the difference is smaller:

    linux_opengl.png

    As you can see, drawing a cached pixmap on Linux desktop is only slightly faster than drawing the static text item when aggressive caching is used. The hardware and the driver both play a part here, but at the very least we can see that both outperform drawText() by seven or eight times.

    On N900
    All the benchmarks so far have been on the desktop, where memory is cheap. Caching a few text items as pixmaps is unlikely to be the proverbial last straw on those platforms, and as we have seen, pixmap caching has the potential to be really fast. On an embedded device, however, we need to be a little bit more careful when we allocate big chunks of memory, so something like QStaticText, which is both lean and fast, can be a great tool on these platforms. So let's look at a few benchmarks for the N900 as well.

    For the raster engine on the N900, the drawText() baseline performance is currently nothing short of horrible, as you can see from the following chart:

    n900_raster.png

    This is of course a puzzle which will be investigated more closely, as there's no reason why it should be this much slower to call drawText(), but for now we recommend using the native engine or a QGLWidget viewport on this device. At least it makes the other bars look really large in comparison. A more interesting result is that drawStaticText() can push as much as two thirds the number of glyphs per second as drawing a single pixmap that covers the same area, so we have a pretty good performance ratio on this device.

    As we see from the following chart, similar numbers can be achieved when using the OpenGL engine:

    n900_opengl.png

    Conclusion
    The benchmark results displayed here so far are for a single-line piece of text, so drawText() can skip the third step from the overview at the beginning of the blog, as it does not need to do any high-level text layout. On text which does require layout, performance will be even worse with drawText(), but approximately the same with drawStaticText() and drawPixmap(), since the layout step has already been done in advance. Another thing to note is that the text is fairly long and fairly dense. For shorter texts, and/or text which has more space (such as a multi-line string might have), the performance of drawStaticText() may very well be greater than that of drawing a pixmap, since the number of pixels touched becomes a greater factor in the equation.

    An interesting measurement which is not included here is the CPU load of the different functions. We don't have any formal benchmarks for that at the moment, but since less time is spent on CPU-intensive work when using drawStaticText() over drawText(), the CPU will have more free time to do other stuff, which is a good thing. Another pleasant discovery we made while benchmarking QStaticText on the N900 is that you have to increase the number of draw calls made per frame to a pretty high number before it visibly factors into the time spent in the paint event. This means that even with, say, fifty strings, the drawStaticText() calls should not have any considerable impact on the performance of the application. Swapping the front and back buffers will still be the main bottleneck, which is a suitable ideal.

    So the bottom line is: If you are using drawText() in your application to draw text that is never or very rarely updated, then you might consider using QStaticText instead when you start building against Qt 4.7, and we'd love to hear what you think about the API and the performance once you get a chance to try it out.

