QtQuick3D instanced rendering

QtQuick3D introduces support for instanced rendering in Qt 6.1. This is a feature of the graphics processor (GPU) that can dramatically improve performance. Instanced rendering makes it possible to render a large number of items with a single draw call. (For those familiar with low-level OpenGL, the function glDrawElementsInstanced is one example.)

A simple instanced rendering example showing 20 000 metallic doughnut shapes.

Using this new instancing feature on my development machine, QtQuick3D can render one million cubes at 60 frames per second (FPS), using only 2% CPU time. The same scene recreated with the API in Qt 6.0, using Repeater3D to generate cubes, starts to struggle at ten thousand cubes: it manages only 42 FPS while using 100% of the CPU.

Here is the source code for the simple example shown above, creating a scene that shows 20 000 metallic doughnut shapes with random position, color, and rotation:

import QtQuick3D
import QtQuick3D.Helpers
import QtQuick

Window {
    width: 800
    height: 450
    visible: true
    View3D {
        anchors.fill: parent
        camera: camera
        environment: SceneEnvironment {
            backgroundMode: SceneEnvironment.SkyBox
            lightProbe: Texture {
                source: "skybox.hdr"
            }
            probeExposure: 3
        }
        PerspectiveCamera {
            id: camera
            position: Qt.vector3d(0, 300, 500)
            eulerRotation.x: -25
        }
        DirectionalLight {
            eulerRotation.x: -30
            eulerRotation.y: -70
        }
        RandomInstancing {
            id: randomTable
            instanceCount: 20000
            position: InstanceRange { from: Qt.vector3d(-5000, -2000, -9000); to: Qt.vector3d(5000, 200, 500) }
            rotation: InstanceRange { from: Qt.vector3d(20, 0, -45); to: Qt.vector3d(60, 0, 45) }
            color: InstanceRange { from: Qt.rgba(0.1, 0.1, 0.1, 1); to: Qt.rgba(1, 1, 1, 1) }
        }
        Model {
            instancing: randomTable
            source: "torus.mesh"
            materials: PrincipledMaterial { metalness: 1.0; roughness: 0.2; baseColor: "#ffffff" }
        }
    }
}

 

The API

The main principle of the instancing API is that it is explicit. It does not try to autodetect opportunities for instancing within the existing API.

The starting point is the Instancing object: a table that defines how each copy is rendered. Each entry can modify the transformation (position, rotation, and scale), the color (blended with the model’s material), and custom data for use with custom materials. Qt 6.1 provides two ready-made QML types:

  • InstanceList lets you enumerate all instances and bind to the properties of each instance. 

  • RandomInstancing, as seen above, provides a way to quickly test and prototype.
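To make the InstanceList case concrete, here is a minimal sketch with three hand-written entries. The ids (sceneCamera, manualTable), the root property angle, and all values are made up for illustration. The middle entry's rotation is bound to the animated property, on the assumption that the table is updated whenever an entry changes:

```qml
import QtQuick
import QtQuick3D

Window {
    id: window
    width: 800
    height: 450
    visible: true

    // Illustrative property that drives the rotation of one instance.
    property real angle: 0
    NumberAnimation on angle { from: 0; to: 360; duration: 4000; loops: Animation.Infinite }

    View3D {
        anchors.fill: parent
        camera: sceneCamera

        PerspectiveCamera {
            id: sceneCamera
            position: Qt.vector3d(0, 0, 600)
        }
        DirectionalLight {
            eulerRotation.x: -30
        }

        // Instance table with three explicitly listed entries.
        InstanceList {
            id: manualTable
            instances: [
                InstanceListEntry {
                    position: Qt.vector3d(-200, 0, 0)
                    color: "red"
                },
                InstanceListEntry {
                    position: Qt.vector3d(0, 0, 0)
                    // Ordinary property binding on a single instance.
                    eulerRotation: Qt.vector3d(0, window.angle, 0)
                    color: "green"
                },
                InstanceListEntry {
                    position: Qt.vector3d(200, 0, 0)
                    scale: Qt.vector3d(1, 1.5, 1)
                    color: "blue"
                }
            ]
        }

        // One Model, drawn once per table entry.
        Model {
            instancing: manualTable
            source: "#Cube"
            materials: PrincipledMaterial { baseColor: "white" }
        }
    }
}
```

The white base color lets the per-instance color show through unchanged, since the instance color is blended with the material.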

Other kinds of instance tables can easily be defined using the C++ API. We will add additional QML instancing tables in future releases. Possibilities include tables defined by data models, and tables read from external sources.

Once the table is defined, it is then applied to a Model by setting its instancing property. A single table can be used with several different models at the same time.
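As a rough sketch of sharing one table between models, the following draws a cylinder and a cone at each of 500 random positions. The ids, colors, and ranges are illustrative only. The cone is raised with the model's own y property, which assumes that a Model's transform is applied in addition to the per-instance transform:

```qml
import QtQuick
import QtQuick3D
import QtQuick3D.Helpers

Window {
    width: 800
    height: 450
    visible: true

    View3D {
        anchors.fill: parent
        camera: sceneCamera

        PerspectiveCamera {
            id: sceneCamera
            position: Qt.vector3d(0, 300, 1500)
            eulerRotation.x: -10
        }
        DirectionalLight {
            eulerRotation.x: -30
        }

        // A single table: 500 random positions on the y = 0 plane.
        RandomInstancing {
            id: sharedTable
            instanceCount: 500
            position: InstanceRange {
                from: Qt.vector3d(-2000, 0, -4000)
                to: Qt.vector3d(2000, 0, 0)
            }
        }

        // First model using the table.
        Model {
            instancing: sharedTable
            source: "#Cylinder"
            materials: PrincipledMaterial { baseColor: "#8b5a2b" }
        }

        // Second model using the same table, lifted above the cylinder
        // (assumes the model's transform is combined with each instance).
        Model {
            instancing: sharedTable
            source: "#Cone"
            y: 100
            materials: PrincipledMaterial { baseColor: "#2e8b57" }
        }
    }
}
```

Both models read the same table, so each position is generated once and produces a matching cylinder and cone.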

By writing custom shader code, it is possible to use instancing to control additional properties, such as variables for physically based rendering, skeletal animation weights, distortion, or anything else that can be expressed with custom materials. Currently the custom data in the instancing table is limited to four floating-point numbers per instance.
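On the QML side, supplying custom data might look like the sketch below. The shader file wobble.vert is hypothetical and not included here, and the meaning given to the four numbers is invented for this example. The shader is assumed to read the per-instance values through the INSTANCE_DATA keyword described in the CustomMaterial documentation:

```qml
import QtQuick
import QtQuick3D

Window {
    width: 800
    height: 450
    visible: true

    View3D {
        anchors.fill: parent
        camera: sceneCamera

        PerspectiveCamera {
            id: sceneCamera
            position: Qt.vector3d(0, 0, 600)
        }
        DirectionalLight {
            eulerRotation.x: -30
        }

        InstanceList {
            id: dataTable
            instances: [
                InstanceListEntry {
                    position: Qt.vector3d(-150, 0, 0)
                    // Invented convention: x = amplitude, y = speed, z and w unused.
                    customData: Qt.vector4d(10, 1.0, 0, 0)
                },
                InstanceListEntry {
                    position: Qt.vector3d(150, 0, 0)
                    customData: Qt.vector4d(30, 2.5, 0, 0)
                }
            ]
        }

        Model {
            instancing: dataTable
            source: "#Sphere"
            materials: CustomMaterial {
                shadingMode: CustomMaterial.Shaded
                // Hypothetical shader file that interprets the custom data.
                vertexShader: "wobble.vert"
            }
        }
    }
}
```

Because the table carries only four floats per instance, anything more elaborate has to be packed into those values or derived from them in the shader.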

The custominstancing example in Qt 6.1 uses custom materials and an instance table implemented in C++ to draw a complex scene consisting solely of a single cube repeated many times:

The custom instancing example shows how to combine instanced rendering with custom materials.

Trade-offs

Who wouldn’t like 100 times better performance? Should we instance all the things? Instancing is not a silver bullet, of course, and it will not always be the right tool for the job. Here are some considerations:

  • First of all: instancing is meant for rendering a large number of copies of the same model with well-defined modifications. It will not help when drawing a small number of very different things.

  • The main improvements are in CPU and memory usage. GPU performance is a mixed bag: We get an improvement by reducing the number of draw calls, and possibly also the amount of data, but the vertex shader is more complex since it needs to multiply in the transformation for each instance.

  • When rendering individual models, QtQuick3D will sort opaque and semi-transparent objects separately and render them in the optimal order. With instancing, the GPU will render in the order specified by the instancing table, and QtQuick3D will not sort the table for you. This saves a lot of CPU time, but it has two consequences:

    • Opaque objects will not be rendered in the optimal order, meaning that the same pixel may be written multiple times, causing more work for the fragment shader, and potentially making the performance worse.

    • For semi-transparent objects, the blending will only be correct if the items are rendered back-to-front. This may not be a big problem if the copies don’t overlap too much with each other, or if the opacity of each copy is low. In the worst case, however, it may look like this:

      [Image: opacity2]

Tech preview

Instancing is in tech preview in Qt 6.1 and will be fully supported in Qt 6.2. We are not planning major changes to the API, but there could be minor modifications based on your feedback. Binary compatibility will probably not be preserved: the part most likely to change is that the binary layout of the GPU instancing table is currently reflected in the public C++ API.

If you are considering using instanced rendering with Qt 6.2, now is the perfect time to start. Your code will need at most very minor modifications, and your feedback helps us shape Qt 6.2 so that it supports your use case even better.

QtQuick3D Benchmark application with instancing

And remember: Always measure performance when optimizing.

