Braindump, Graphics View and the Joys of off-screen rendering

Hi all, a short braindump from me here. Sometimes the best way to get things out of your head is to write things down. And there's a cabin trip for Qt Software this weekend; I don't want to be thinking about code (yeah right!).

Qt is designed to support both direct (framebuffer, software) rendering (e.g. Raster, our current default engine on Windows), and indirect rendering models such as X11, OpenGL, and perhaps other future engines such as OpenVG. Graphics View encapsulates structured graphics in Qt into a mid-level API that hides some of the painting details (while giving you full control to do detailed painting). It provides an object model / scene graph, and an abstraction over the rendering pipeline that lets us do many neat tricks to make it easier to run the same application efficiently on several different rendering architectures. Because Qt decides when and how your item is drawn, Graphics View gets useful control over rendering. The paint call chain looks like this:

  • Qt calls QGraphicsView::paintEvent
  • QGraphicsView::paintEvent calls QGraphicsView::draw{Background,Items,Foreground}
  • each call goes to QGraphicsScene::draw{Background,Items,Foreground}
  • QGraphicsScene::drawItems calls QGraphicsItem::paint

Speed. Software rendering can be very fast, and pixel perfect, although it doesn't scale very well. For small devices, we've seen that software rendering can even outperform GL and do quite amazing things: fullscreen effects and blending, even blur. Still, the GL chipset can often do things faster. It's just a matter of knowing what it can do, and how to make use of it. But it's very rare that hardware acceleration gives you pixel perfection and 100% pixel-by-pixel control. The closest you get in paint engines other than Raster is either by rendering into an intermediate paint buffer (using Raster!) and passing that to the rendering system, or by using a shader/fragment program as is done in OpenGL 2.0 (which, on desktop systems with modern cards, produces very very good results!).

With indirect rendering models there are several classic "problems" that lead to slow and jerky, sometimes ugly-looking graphics if you assume that they work just like software rendering. Rotating a pixmap on X11 requires a client-server roundtrip (over the network or a unix domain socket). OpenGL has extensions that allow video playback, but the equivalent of a real-time software 2D/3D rendering engine pushing its pixels onto a GL context is absurd; it's just not how GL works.

The basic idea with indirect models is that you should try to store as much state as possible server-side (X11) or on the graphics card (OpenGL). Font glyphs, standard icons and transformable theme elements: push them over! Then per frame all you need to do is say where you want the elements drawn, and how.

Now QPainter has an imperative API that's based on rendering vector and pixmap graphics to a device. If you want to support an indirect rendering model efficiently, you should render your contents into a buffer which can be passed on to the graphics system, and then referenced when you need it. In Graphics View, because the rendering abstraction is slightly higher, you can avoid having to do this by enabling what we call "Cache Modes". Notice that cache modes are implemented on top of QPainter, and QGraphicsView is a regular widget, so all Graphics View does is use the cool stuff in Qt, and put it together in a way that's hopefully useful. ;-)


item->setCacheMode(QGraphicsItem::ItemCoordinateCache);

Cache modes are for configuring two different types of offscreen buffers in order to accelerate item rendering:

  • ItemCoordinateCache / "Item cache"
    • Rendering the item using an untransformed painter, into a pixmap that is "axis-aligned" with the item's local coordinate system
    • The pixmap is truncated to the item's local bounding rect
    • The resolution of the pixmap is configurable
    • The result is never pixel-perfect
    • Repainting the item happens if
      • You call update() on the item
      • The item's geometry changes (prepareGeometryChange(), or updateGeometry()).
    • Examples: The Qt 4.5 "Boxes" Demo (written by Kim Kalland), and Samuel Rødal's WolfenQt, use ItemCoordinateCache to allow transformations without repaints.
  • DeviceCoordinateCache / "Device cache"
    • Rendering the item using a transformed painter, into a pixmap that is "axis-aligned" with the viewport
    • The pixmap is truncated to the item's _mapped_ bounding rect (mapped to the view)
    • The resolution of the pixmap is fixed and not configurable
    • The result is pixel-perfect (no visual difference from direct rendering)
    • Repainting the item only happens if
      • You call update() on the item
      • The item's geometry changes (prepareGeometryChange(), or updateGeometry()).
      • The item is rotated, scaled, sheared, or projected
    • Example: Plasmoids in KDE use DeviceCoordinateCache to avoid repainting when moving applets around
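The repaint rules in the list above can be written down as a small decision function. This is only a Qt-free sketch for reasoning about the modes: the `Change` struct, its field names and `needsRepaint()` are made up for illustration and are not part of Graphics View, and the NoCache branch assumes that any visible change means calling paint() again.

```cpp
#include <cassert> // for the checks shown in the usage note

// Hypothetical model of the three cache modes; mirrors the names used in
// the post but is not the QGraphicsItem enum.
enum class CacheMode { NoCache, ItemCoordinateCache, DeviceCoordinateCache };

// What happened to the item since the last frame (illustrative names).
struct Change {
    bool updateCalled = false;    // update() was called on the item
    bool geometryChanged = false; // prepareGeometryChange() / updateGeometry()
    bool translated = false;      // moved by whole pixels in the viewport
    bool transformed = false;     // rotated, scaled, sheared, or projected
};

// Returns true if the item's paint() would have to run again.
bool needsRepaint(CacheMode mode, const Change &c) {
    const bool contentDirty = c.updateCalled || c.geometryChanged;
    switch (mode) {
    case CacheMode::NoCache:
        // Assumption: without a cache, every visible change repaints.
        return contentDirty || c.translated || c.transformed;
    case CacheMode::ItemCoordinateCache:
        // The pixmap lives in item coordinates, so transforms reuse it.
        return contentDirty;
    case CacheMode::DeviceCoordinateCache:
        // The pixmap matches the viewport, so only translation reuses it.
        return contentDirty || c.transformed;
    }
    return true; // unreachable for valid modes; be conservative
}
```

For example, with `Change rot; rot.transformed = true;`, `needsRepaint(CacheMode::ItemCoordinateCache, rot)` is false while `needsRepaint(CacheMode::DeviceCoordinateCache, rot)` is true, which is the trade-off between the two modes in one line.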

ItemCoordinateCache has a visual impact because it requires you to decide in advance what resolution your off-screen pixmap should have. This is the only way we can avoid any repaints unless the item wants to be repainted. Here's what the different cache modes end up presenting to the user:

[Figure: the colliding-mice mouse rendered three ways. Left to right: NoCache / DeviceCoordinateCache (collidingmice-mouse.png), ItemCoordinateCache (collidingmice-mouse-itemcache.png), low-res ItemCoordinateCache (collidingmice-mouse-itemcache-low.png).]

You can compare DeviceCoordinateCache to rendering exactly what you see in the viewport, at that exact resolution, but into a secondary buffer first. This ensures that the item is rendered in a pixel-perfect way. However, it also means the item must be rerendered when it is scaled, rotated, sheared, or projected. If you transform the item in a way that ends up being a pixel-translation on the screen, the pixmap is just reused and redrawn at a new position.
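The "pixel-translation" condition can be sketched as a check on two transforms. This is a simplified model, not Qt code: `Affine` is a hypothetical stand-in for the item-to-viewport transform (it ignores projection), and treating sub-pixel moves as invalidating the cache is an assumption made here for illustration.

```cpp
#include <cassert> // for the checks shown in the usage note
#include <cmath>

// Hypothetical 2D affine transform (item to viewport); not a Qt type.
struct Affine {
    double m11, m12, m21, m22; // rotation / scale / shear part
    double dx, dy;             // translation in device pixels
};

bool nearlyEqual(double a, double b) { return std::fabs(a - b) < 1e-9; }

bool isWholeNumber(double v) { return nearlyEqual(v, std::round(v)); }

// True if going from oldT to newT only moves the item by whole pixels,
// so a cached device-coordinate pixmap could be drawn at the new position.
bool deviceCacheReusable(const Affine &oldT, const Affine &newT) {
    const bool sameLinearPart =
        nearlyEqual(oldT.m11, newT.m11) && nearlyEqual(oldT.m12, newT.m12) &&
        nearlyEqual(oldT.m21, newT.m21) && nearlyEqual(oldT.m22, newT.m22);
    return sameLinearPart
        && isWholeNumber(newT.dx - oldT.dx)
        && isWholeNumber(newT.dy - oldT.dy);
}
```

Moving an untransformed item from (10, 20) to (15, 17) keeps the cache in this model; moving it to (10.5, 20) or rotating it by 90 degrees does not.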

The purpose of these modes is to avoid having to repaint the item. In any indirect graphics system, repainting the item requires some sort of a roundtrip as image or vector data is transferred from one side of the graphics pipeline to the other. This is often expensive. QPainter translates into raw OpenGL calls or shader programs when you draw onto a QGLWidget, and can (as the Pad Navigator Demo shows) produce OpenGL UIs that run fast (but on fast hardware!). However, with limited hardware, such as poor OpenGL chips or a slow graphics bus (common on embedded GL chipsets), the best approach may be to push textures to the "other side" and "remote control" them.

For the "colliding mice" example, direct rendering is used (i.e., no cache). You get the best performance from this example on Windows, or when running Qt 4.5.0 on Linux with "-graphicssystem raster". If you switch to OpenGL, it will still run fast, but not as fast as you'd expect for an OpenGL-powered application. So maybe a cache mode would help? Well for one, DeviceCoordinateCache is unsuitable for this demo, as the mice are continuously rotated (invalidating the off-screen buffer at every frame). ItemCoordinateCache is a good match. By setting ItemCoordinateCache on the mice, the mice don't look as pixel-perfect as without it (especially using Raster), but graphics performance is very high - in fact rendering of the scene goes down to taking ~0% of my desktop system's resources (the bottleneck of the example becomes the collision-detection). On embedded systems with no FPU, ItemCoordinateCache can also be faster btw, as the example is very floating-point and path-heavy, and painting rotated images might be faster than doing those floatops.

OT: I just want to mention at this point that some people have requested that cache modes become "implicit", i.e., Qt should decide for you, and you shouldn't need to toggle something as low-level as this. For a very high level API I would agree. But at the abstraction level where Graphics View lives, QGV does not know whether you need pixel-perfection or not, and it does not know if you intend to rotate your items a lot. Only you, dear item author and user, know :-). And that's why the API is there.

Now what's happening on the research side with Cache Modes? We are currently researching subtree caching.

Graphics View's cache modes work fine with simple items. But what about more complex items; items with children? Today, you need to explicitly set the right cache mode on each child. This isn't an uncommon situation to be in. You stack items inside each other, like you would typically do when creating forms or layout stacks. If you want to do transformations on the whole thing, the best solution is to have the entire item subtree collapsed into a single texture, which is then transformed. In contrast, painting several smaller items (sometimes there are many of them! 50-100?) with a transformation can be very costly. Either it's sluggish (no cache) or it exhausts your texture memory (individual caches).
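To get a feel for the texture memory argument, here is a back-of-the-envelope sketch. It assumes 32-bit pixels, one buffer per item for individual caches, and a single buffer covering the union of all rects for the collapsed case. `Rect` and both function names are hypothetical and say nothing about how Qt actually allocates its caches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an item's bounding rect, expressed in the
// coordinate system of the subtree's root item. Not a Qt type.
struct Rect { int x, y, w, h; };

// Assumption: caches are stored as 32-bit pixels.
constexpr std::size_t kBytesPerPixel = 4;

// One offscreen buffer per item, each the size of that item's rect.
std::size_t individualCacheBytes(const std::vector<Rect> &items) {
    std::size_t total = 0;
    for (const Rect &r : items) {
        assert(r.w >= 0 && r.h >= 0);
        total += static_cast<std::size_t>(r.w)
               * static_cast<std::size_t>(r.h) * kBytesPerPixel;
    }
    return total;
}

// One offscreen buffer covering the union of all item rects.
std::size_t deepCacheBytes(const std::vector<Rect> &items) {
    if (items.empty())
        return 0;
    int left = items[0].x, top = items[0].y;
    int right = items[0].x + items[0].w, bottom = items[0].y + items[0].h;
    for (const Rect &r : items) {
        left = std::min(left, r.x);
        top = std::min(top, r.y);
        right = std::max(right, r.x + r.w);
        bottom = std::max(bottom, r.y + r.h);
    }
    return static_cast<std::size_t>(right - left)
         * static_cast<std::size_t>(bottom - top) * kBytesPerPixel;
}
```

A 200x100 form at (0, 0) with two 180x40 children stacked inside it comes to 137,600 bytes as three separate caches, versus 80,000 bytes as one collapsed buffer. The saving comes from children overlapping their parent; if the children were scattered far apart, the single union buffer could end up larger instead.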

The "Embedded Dialogs" demo shows how a dialog, which represents a potentially complex subtree, is collapsed into a single surface to ensure that there are no repaints as the dialogs are transformed when hovering in and out. This is a "happy accident" though ;-). The QWidget subtree is proxied into a single QGraphicsItem. But it did get me and several others thinking, why doesn't QGV support this for any kind of item subtree?

[Figure: deepcache.png]
On the left, the item is rendered using no cache (direct rendering). The middle image shows how each element is individually cached, which is the only approach you have today (Qt 4.2, 4.3, 4.5.0). On the right all items are cached into a single surface allowing "no repaints", while conserving texture memory.

Collapsing a subtree into a single offscreen buffer is possible. I've spent two days this week researching it, wrote some code, and ended up with a prototype that's so ugly I don't want to share it _just yet_. ;-) But I've seen that it's perfectly possible without messing up QGV's internals. I dubbed two new cache modes:

  • DeepItemCoordinateCache - caches the item and "all" children, no repaints for "any" child if the parent is transformed
  • DeepDeviceCoordinateCache - same, for DeviceCoordinateCache

During prototyping, I found a few issues that need to be solved, but it's not a huge job to make this work.

  • I currently have no idea how to handle ItemIgnoresTransformations
  • Window children probably shouldn't be collapsed into the same cache
  • As children move, transform or update, the changes must be recorded in the caching parent's offscreen buffer
  • Combining existing cache modes with new deep cache modes should work fine (i.e., the parent sets a deep cache mode and a child already has ItemCoordinateCache)
    • But what if the child uses DeviceCoordinateCache? :-P
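The third point above, recording child changes in the caching parent's buffer, could look roughly like the following. This is not the prototype mentioned earlier, just a toy model: `Node`, `deepCache`, `cacheDirty` and both functions are invented for illustration, and it marks the whole cache dirty where a real implementation would presumably track dirty regions.

```cpp
#include <cassert> // for the checks shown in the usage note

// Hypothetical item node; not QGraphicsItem.
struct Node {
    Node *parent = nullptr;
    bool deepCache = false;  // this node caches itself and all descendants
    bool cacheDirty = false; // its offscreen buffer needs re-rendering
};

// Finds the nearest node (starting at n itself) that owns a deep cache.
Node *cachingAncestor(Node *n) {
    for (Node *p = n; p != nullptr; p = p->parent) {
        if (p->deepCache)
            return p;
    }
    return nullptr;
}

// Called when n moves, transforms, or updates. Returns true if the change
// was absorbed by a deep cache, false if n must be repainted directly.
bool notifyChanged(Node *n) {
    if (Node *owner = cachingAncestor(n)) {
        owner->cacheDirty = true;
        return true;
    }
    return false;
}
```

With a deep-caching root, a child and a grandchild, calling `notifyChanged()` on the grandchild dirties only the root's buffer; an item with no deep-caching ancestor simply falls back to a normal repaint.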

How can deep caching be reused in an unexpected way?

  • In the chip demo, when selecting and moving items around, we can temporarily reparent all selected items into a parent, enable deep device coordinate cache on the parent, and then remove the parent when the items stop moving. This means that even though you might be moving hundreds of items around, you're actually only _really_ moving a little pixmap. Hah!

So many unanswered questions, but that's the fun part with research. Since I have seen that it works and produces the result I want (no repaints! whole dialog lives in graphics memory), I'm certain the answers to the questions above will show up one at a time.

OK that was a long blog, but I felt like writing it all down.

What do you think? Are DeepItemCoordinateCache and DeepDeviceCoordinateCache useful?

