What about Qt 3D in Qt 6?

In the previous article we described the changes to Qt 3D that are already in recent Qt 5 versions or in upcoming ones. This article will explain the work that is going on towards the major changes for Qt 3D in Qt 6.

There are many things we would like to improve in Qt 3D for Qt 6 but here I will focus on a couple of the big ones: caching renderer work and modern graphics APIs like Vulkan, Metal and DirectX 12.

Renderer Caching

The way that Qt 3D works is that we have two data structures:

  1. The scene graph – describes the contents of a scene;
  2. The frame graph – describes how the scene graph should be rendered.

Each time we render a frame we must do a lot of work to translate from the abstract descriptions of the scene graph and frame graph to the low-level draw calls that need to be submitted to the GPU. In brief the steps are:

  1. Traverse the frame graph and identify the various phases of rendering. Each phase consists of the render target (screen or an FBO); which camera to use; which window to use; which parts of the scene graph we should draw; and any special state to set on the GPU (e.g. disable depth testing or writing, or enable stencil testing).
  2. For each phase of rendering from step 1, filter the scene graph down to the entities we care about.
  3. Select which shaders to use for each entity and the current phase of rendering. Entities may use different shaders for different phases e.g. a simple fragment shader for performing an early z-fill pass or shadow map generation vs the full-blown lighting used for the final pixels on screen.
  4. Pull together the values of uniform variables (variables to customise the GPU shaders).
  5. Bundle all of this information together into RenderCommands.
  6. Once this is done for all phases, we submit the RenderCommands to OpenGL on a dedicated thread – due to its long history OpenGL is very picky about threading.
  7. The OpenGL submission thread iterates over the render phases and the commands contained therein and translates them from our internal description into raw OpenGL function calls.
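
The steps above can be sketched in a few lines of C++. This is only an illustration of the data flow, not Qt 3D's actual implementation: the types Entity, RenderPhase and RenderCommand and the function buildCommands are hypothetical stand-ins, and layer bit masks are an assumed way of expressing "which parts of the scene graph to draw".

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Qt 3D classes.
struct Entity {
    std::string name;
    std::uint32_t layerMask;    // layers the entity belongs to (assumed scheme)
};

struct RenderPhase {            // one phase found while traversing the frame graph (step 1)
    std::string name;
    std::uint32_t layerFilter;  // which entities this phase should draw
    std::string shader;         // shader used by entities in this phase (step 3)
    bool depthTest;             // example of special GPU state
};

struct RenderCommand {          // everything needed for one draw (step 5)
    std::string phase;
    std::string entity;
    std::string shader;
    bool depthTest;
};

std::vector<RenderCommand> buildCommands(const std::vector<RenderPhase> &phases,
                                         const std::vector<Entity> &scene)
{
    std::vector<RenderCommand> commands;
    for (const RenderPhase &phase : phases) {
        for (const Entity &entity : scene) {
            // Step 2: filter the scene graph down to the entities we care about.
            if ((entity.layerMask & phase.layerFilter) == 0)
                continue;
            // Steps 3 to 5: pick the shader and state, bundle them into a command.
            commands.push_back({phase.name, entity.name, phase.shader, phase.depthTest});
        }
    }
    return commands;  // steps 6 and 7 would hand these to the submission thread
}
```

Note that this whole function runs again every frame, even when neither the phases nor the scene have changed, which is exactly the cost discussed next.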

This all allows Qt 3D to be extremely flexible but comes at the cost of runtime performance. The usual way to get a nice performance boost is to not do the work in the first place. In theory we can do this by caching some of the intermediate results. In practice it turns out this is really hard to do properly when taken in conjunction with all the other stuff we have to worry about such as the on-demand rendering mode.

There is a huge number of things to track that could change the appearance of the scene, and we need to know the minimum set of tasks that must be re-calculated when any given set of those things changes between frames. We have added some of this tracking to Qt 5 releases, but doing it completely requires a more substantial refactoring.
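
One common way to approach this kind of tracking is with dirty flags. The sketch below is a simplified illustration, not the mechanism Qt 3D uses: the FrameCache class, the flag names, and above all the mapping from "what changed" to "which stages must re-run" are assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical categories of change between two frames.
enum DirtyFlag : std::uint32_t {
    Clean      = 0,
    Transforms = 1u << 0,
    Materials  = 1u << 1,
    Geometry   = 1u << 2,
    FrameGraph = 1u << 3,
};

class FrameCache {
public:
    void markDirty(std::uint32_t flags) { m_dirty |= flags; }

    // Re-runs only the stages invalidated since the last frame and returns
    // how many stages ran. The stage/flag mapping is an assumed example.
    int prepareFrame()
    {
        int stagesRun = 0;
        if (m_dirty & FrameGraph)
            ++stagesRun;  // re-identify the render phases
        if (m_dirty & (FrameGraph | Geometry))
            ++stagesRun;  // re-filter the entities per phase
        if (m_dirty & (FrameGraph | Geometry | Materials))
            ++stagesRun;  // re-select shaders and rebuild commands
        if (m_dirty & (Transforms | Materials))
            ++stagesRun;  // refresh uniform values
        m_dirty = Clean;
        return stagesRun;
    }

private:
    // Everything is dirty before the first frame has been built.
    std::uint32_t m_dirty = FrameGraph | Geometry | Materials | Transforms;
};
```

With this scheme a frame in which nothing changed runs no stages at all, and a frame in which only a transform changed refreshes uniforms and nothing else. The hard part in a real renderer is deciding that mapping correctly for every property a user can touch.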

Before I describe the work we are doing here I will discuss the other issue:

Modern Graphics APIs

Thus far, Qt Quick has (mainly) been on top of OpenGL (or OpenGL ES) and Qt 3D entirely so. OpenGL has long served us graphics engineers well, but it is a very old API and some issues are so deeply ingrained in the way it is structured that they cannot be fixed without introducing a new API. Also, OpenGL has been extended and “reinvented” over the years in an attempt to keep up with how modern GPUs actually work, and deal with the ever-increasing volumes of data that artists wish to throw at them. Though this has allowed OpenGL to do a great deal of impressive work, it is still limited particularly by its threading model and the heuristics implemented in the drivers that try to guess what we, as application developers, are doing.

Behind the curtains of the driver, OpenGL operates in a very similar fashion to Qt 3D, as described in the previous section. When you issue a bunch of OpenGL function calls, these get translated into commands and stored in a command buffer which, at some point (determined by the driver’s best guess), is submitted to the hardware for processing.

Once the commands in the command buffer have been processed by the hardware they are thrown away, and for the next frame we have to issue the OpenGL function calls again. The same dance happens frame after frame which can be wasteful.

Creating the commands in the driver is quite an expensive operation and with OpenGL it is all limited to a single thread. So, throwing them away is a bit of a waste. The GPU vendors who write the drivers add in all kinds of heuristics to try to guess what we as library and application developers actually want and they cache things and tune operations as best they can. This makes the drivers larger, more complex, harder to maintain and leads to massive performance differences between GPU vendors in some cases.

The threading model of OpenGL is essentially single-threaded. Yes, there is some support for multi-threading via shared contexts and the like, but these are still serialised internally by the driver. Not surprising given OpenGL’s 20+ year heritage.

The age of OpenGL is another problem. Apple has announced that they have deprecated OpenGL and will only focus on Metal as their graphics API. At some point in the future we may find that OpenGL disappears from macOS and iOS. Even before then, OpenGL on these platforms will not see any new features (and indeed it hasn’t for many years already).

What can we do about these issues? Well, modern graphics APIs have emerged in the last few years to address these and other issues. Vulkan, Metal and DirectX 12 are all very explicit APIs that expose more direct control over the GPU than OpenGL does.

That’s great, you say, but again there are trade-offs. Much of the work that was being done by the OpenGL driver is now the responsibility of the library or application developer. That initially sounds very scary, and to some extent it is. However, it allows us to use our higher-level knowledge of what our application is doing to extract much greater performance from the GPU. Or, if we wish, to do a similar amount of work in a shorter time and allow the CPU/GPU to do much better in the race-to-sleep and save battery power. A big win for mobile devices as well as desktop.

Whereas the OpenGL driver will throw away the command buffers it has gone to great expense creating for every frame, when we as application developers use Vulkan or similar, we can know when it is safe to keep those command buffers around and just resubmit them next frame.

You may be wondering what good that does. Doesn’t submitting the same command buffers just get us exactly the same thing on screen as the previous frame? And if so, why bother doing anything at all?

These are good questions. However, even if we submit the same command buffers to the GPU again and again, the resources to which they refer may well contain different data. This is not just the vertex buffers and textures but also uniform buffer objects which are often used to hold material properties and camera transformation matrices etc. That’s good: we can potentially save ourselves a lot of work if we can track what has changed and so know whether we can just resubmit the same work to the GPU.
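
The idea that an unchanged command buffer can still produce a different frame can be shown with a small CPU-only model. Nothing here is a real graphics API: UniformBuffer, DrawCommand, CommandBuffer and submit are hypothetical names, and submit merely reports the value each draw would read instead of rendering anything.

```cpp
#include <array>
#include <memory>
#include <vector>

// Hypothetical resource holding per-frame data such as the camera position.
struct UniformBuffer {
    std::array<float, 3> cameraPosition{};
};

// A recorded command refers to its resources; it does not copy their contents.
struct DrawCommand {
    std::shared_ptr<const UniformBuffer> uniforms;
    int vertexCount;
};

using CommandBuffer = std::vector<DrawCommand>;

// Stand-in for the GPU: returns the camera x value each draw sees at submit time.
std::vector<float> submit(const CommandBuffer &commands)
{
    std::vector<float> seenCameraX;
    for (const DrawCommand &cmd : commands)
        seenCameraX.push_back(cmd.uniforms->cameraPosition[0]);
    return seenCameraX;
}
```

Recording a CommandBuffer once, changing cameraPosition in the shared UniformBuffer, and calling submit again yields the new value without any re-recording. On a real GPU the update would additionally have to be synchronised so that it does not race with a frame still in flight.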

It gets even better though! Vulkan exposes the concept of primary and secondary command buffers. Primary command buffers are what we submit to the GPU and these may contain calls out to secondary command buffers. One way of using this facility is to record the commands for drawing some entities into secondary command buffers up front.

When we wish to draw the whole scene, our renderer can create a primary command buffer that calls out to the secondary command buffers of those entities that are deemed to be visible. When the visibility changes (e.g. if the camera moves or some entities move), we only need to re-record the primary command buffer; the secondary ones can be reused as they are. That is also very good.
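
The following is a rough CPU-only model of that primary/secondary split. It deliberately avoids the real Vulkan API (where the call from a primary into secondary command buffers is made with vkCmdExecuteCommands); SecondaryBuffer, PrimaryBuffer, recordSecondary and recordPrimary are hypothetical names, and the recorded "commands" are just strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Per-entity commands, recorded once up front (hypothetical model).
struct SecondaryBuffer {
    std::string entity;
    std::vector<std::string> commands;
};

// The primary buffer only holds calls out to secondary buffers.
struct PrimaryBuffer {
    std::vector<const SecondaryBuffer *> calls;
};

// The expensive part: done once per entity, not once per frame.
SecondaryBuffer recordSecondary(const std::string &entity)
{
    return {entity, {"bind pipeline", "bind vertex buffer", "draw " + entity}};
}

// The cheap part: redone only when visibility changes.
PrimaryBuffer recordPrimary(const std::vector<SecondaryBuffer> &secondaries,
                            const std::vector<bool> &visible)
{
    PrimaryBuffer primary;
    for (std::size_t i = 0; i < secondaries.size() && i < visible.size(); ++i) {
        if (visible[i])
            primary.calls.push_back(&secondaries[i]);
    }
    return primary;
}
```

When the camera moves, only recordPrimary has to run again with the new visibility list; the secondary buffers stay untouched, provided the vector holding them outlives the primary buffer that points into it.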

It gets even better though! With Vulkan, we can record the command buffers on different threads too! We are responsible for submitting the command buffers to the GPU queues and synchronising the work that they do – both between different GPU queues (graphics/compute/transfer etc) and between the GPU and the CPU.
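
As a rough illustration of recording on several threads, here is a sketch using only the C++ standard library. It is a model rather than Vulkan code: CommandBuffer is a hypothetical alias for a list of strings, recordInParallel is an invented helper, and the round-robin split of entities across threads is an arbitrary choice. In real Vulkan each recording thread would also need its own command pool, since command pools must not be used from several threads at once.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical model of a command buffer: just a list of recorded commands.
using CommandBuffer = std::vector<std::string>;

std::vector<CommandBuffer> recordInParallel(const std::vector<std::string> &entities,
                                            std::size_t threadCount)
{
    std::vector<CommandBuffer> buffers(threadCount);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threadCount; ++t) {
        workers.emplace_back([&entities, &buffers, threadCount, t] {
            // Each thread writes only to its own buffer, so no locking is needed.
            for (std::size_t i = t; i < entities.size(); i += threadCount)
                buffers[t].push_back("draw " + entities[i]);
        });
    }
    // CPU-side synchronisation: all recording must finish before submission.
    for (std::thread &worker : workers)
        worker.join();
    return buffers;
}
```

The join at the end is the simplest possible form of the synchronisation responsibility mentioned above; synchronising between GPU queues, or between GPU and CPU, needs the explicit primitives of the graphics API and is not modelled here.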

As you can see, we can get a lot more control over the operations involved and the hardware, but we do have to do more work. Overall this is a big potential win.

Back to the Qt 3D in Qt 6 Story

KDAB is actively investigating both of these large stories for the Qt 6 timescale. As you can see from the descriptions above, both tasks involve a lot of work in tracking what state the user changes on the scene graph and frame graph and how this flows through the work that Qt 3D must do. This includes how we can ultimately cache command buffers and other intermediate states between frames to avoid repeating work unnecessarily.

As you may have read, Qt Quick and Qt Quick 3D are being rebased on top of the QRhi layer that provides support for Vulkan, Metal, DirectX 11 and OpenGL. We are still researching whether this can reasonably be extended for Qt 3D’s needs in terms of features and threading or if we need to integrate the graphics API in some other way such that Qt 3D can still play nicely with Qt Quick and Qt Widgets.

There is still a great deal of work to do here but the initial results are looking very promising. We have test scenes containing around 1000 entities that render at 600 frames per second (with no tearing) on a mid-range desktop when we try to max out the GPU, or at 1% CPU load when we clamp to 60fps! This is all on a single core for now too! We have some ideas we are testing out to further improve the threading architecture beyond what is possible within the Qt 5 series.

A side effect of this work is that we are also creating the next iteration of the frame graph, which is much more linear in nature and therefore much easier to reason about and a lot more amenable to having tooling developed for it.

Summary

As you can see, there is a lot of work going on behind the scenes to improve Qt 3D during the Qt 5.x timeline and beyond. We will also be looking for ways to improve the public API, but we do not expect many large changes in this regard; rather, a few clean-ups of some less-than-ideal function names and property names.

All of these improvements will also benefit Kuesa which sits on top of Qt 3D and any other application that utilises Qt 3D for its 3D content. These changes will help us build a strong foundation that will allow us to add even more exciting additions during the Qt 6 lifetime.

