Qt 6.6 and 6.7 Make QML Faster than Ever: A New Benchmark and Analysis

It has been a while since I last posted here, and a lot has happened to the Qt Quick Compiler infrastructure. It's time to show some updated numbers. The benchmark discussed in my previous post was heavily focused on value types and lists of value types. It applied some rather complex tricks to eke out the maximum speedup between the interpreted and compiled versions of the same program.

Today I'm going to work with something more familiar to most people. I've written a new benchmark that's mostly based on object types (and lists of those), and refrains from underhandedly instructing the compiler's type propagation using secret knowledge about the quirks of JavaScript operators. It also does something useful this time around. I've re-implemented the DeltaBlue constraint solver found in the V8 benchmarks in typed QML.

In and of itself this is a somewhat foolish endeavour. Since I want to use object types, I use a separate QObject for each variable and each constraint. QObjects, as we all know, have a rather significant static overhead. Allocating a QObject just to store a few integers is quite a waste. The original implementation uses JavaScript objects. While still not ideal, those are somewhat more lightweight. Furthermore, in order to run the actual algorithm, we have to call a lot of functions, and calling functions in QML contexts is generally more expensive than doing the same in JavaScript contexts. This is because the context and scope hierarchy in QML is much more complex, and we often have to perform extra type conversions.

So, why did I do this? Most of you will want to deal with QObjects in a lot of places since all of Qt Quick is built on QObjects. You cannot avoid allocating a QObject if you need an Item. So, in a way, the implementation using QObject as storage for everything, while slower, is also more realistic.

You may argue that I should have written the benchmark with Qt Quick itself to make it even more realistic. I decided not to do so because as soon as you add actual graphics to the mix, you have to deal with much noisier data. Qt Quick itself often adds unpredictable overhead you don't want to deal with in a benchmark. For example, if you happen to have any text in your application, it has to create the font database at some point. Or, the scenegraph performs complex operations in the background to put the pixels on the screen. Those operations may or may not happen in a separate thread, and if so, there is still a synchronization phase for each frame. Finally, the graphics driver itself kicks in and performs its own calculations. This is all very interesting if you're benchmarking Qt Quick. However, I want to benchmark the QML language here. For me this is all just noise. Therefore, I've written a non-graphical application built with QObjects. You can find the code in this repository.

And here is the good news: The performance numbers for dealing with QObjects and calling typed functions on them have improved massively in Qt 6.6 and Qt 6.7.

[Figure: Time taken to run the DeltaBlue benchmark with different versions of Qt]

On the Y axis you see the milliseconds it took to run one iteration of the benchmark. Lower is better. The benchmark was run with:

  • Qt 5.15, the last version of the Qt5 series. This is our baseline. In Qt 5.15 the Qt Quick Compiler didn't generate any C++ code for functions and bindings. It only produced byte code to be interpreted or JIT-compiled.
  • Qt 6.2, since that is when the new Qt Quick Compiler was introduced as tech preview.
  • Qt 6.5, the last LTS version.
  • Qt 6.6, the most recent release, highlighted where appropriate.
  • A recent snapshot of Qt 6.7, highlighted where appropriate.

The setup

The benchmarked program takes as input a number of variables and constraints between them. The variables are effectively numbers and the constraints hold:

  1. An input variable
  2. An output variable
  3. A scale variable
  4. An offset variable

Any of these can be null. The constraint solver then manipulates the variable values, trying to achieve a state where for each constraint we get:

output == input * scale + offset

There are more details to it, but this is the gist. Suffice to say, it's a somewhat demanding computational problem and as such well suited for our purposes.
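To make the relation above concrete, here is a minimal sketch of a single constraint in typed QML. It is not the code from the benchmark repository: the inline component Variable, the execute() function, and the sample values are assumptions made for illustration, and the sketch assumes all four variables are set (no null handling, no solving direction).

```qml
// ScaleSketch.qml (hypothetical file name)
import QtQml

QtObject {
    // Hypothetical stand-in for the benchmark's variable type:
    // one QObject per variable, holding a single number.
    component Variable: QtObject { property int value: 0 }

    property Variable input: Variable { value: 3 }
    property Variable scale: Variable { value: 2 }
    property Variable offset: Variable { value: 1 }
    property Variable output: Variable {}

    // Enforce output == input * scale + offset in the forward direction.
    function execute() {
        output.value = input.value * scale.value + offset.value
    }

    Component.onCompleted: {
        execute()
        console.log(output.value) // 3 * 2 + 1, so this logs 7
    }
}
```

Even this tiny constraint allocates four QObjects and performs several property reads and one function call, which is the kind of work the benchmark exercises at scale.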

We run this on two sets of inputs:

  1. A chain of alternating variables and constraints, 100 variables long.
  2. A projection where 100 inputs are scaled and offset into 100 outputs.

The split between those two inputs is not very interesting. I'm giving them both together as a single data point in all the discussions below.

Collected data points

As mentioned before, there are two implementations of the actual algorithm:

  1. The JavaScript version, almost as found in the V8 benchmark suite.
  2. The QML version I've written.

Finally, I've split the execution into two phases for both implementations:

  1. A setup phase where all the objects are created that shall hold the variables and constraints.
  2. The actual execution of the DeltaBlue algorithm.

Combining all this, a single run of the benchmark produces 4 data points:

  1. The total time for the QML version
  2. The object creation time for the QML version
  3. The total time for the JavaScript version
  4. The object creation time for the JavaScript version

It has to be said that we cannot run the exact same code for all versions of Qt to be tested:

  • Qt 6.2 and Qt 5.15 cannot declare and initialize list properties in one QML binding/declaration. So, those had to be split in two lines.
  • Qt 6.2 and Qt 5.15 do not know pragma ComponentBehavior so this had to be dropped, causing some IDs to become invisible to the compiler.
  • Qt 6.2 and Qt 5.15 do not know that ':/qt/qml' is a default import path. It's added manually.
  • Qt 6.2 and Qt 5.15 cannot construct a QQmlComponent from URI and name. They have to load by URL instead.
  • Qt 5.15 has no proper build system API for QML modules. We build using qmake and CONFIG+=qtquickcompiler instead.
  • Qt 5.15 does not understand imports without versions. We add some versions to make it happy.

I could have avoided some of those differences, but I intentionally used the new features. They lead to improved performance where they are available, and the improved performance is what we are after.
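As a rough illustration of some of the newer features listed above, the following sketch shows how they look in a single file for Qt 6.5 or later. The file name, property names, and values are made up for this example and are not taken from the benchmark.

```qml
// Variables.qml (hypothetical file name)
pragma ComponentBehavior: Bound

import QtQml // no version number required in Qt 6

QtObject {
    id: root
    property int base: 10

    // List property declared and initialized in one declaration.
    // For Qt 6.2 and 5.15 this had to be split into a declaration
    // and a separate binding.
    property list<QtObject> variables: [
        QtObject { objectName: "first" },
        QtObject { objectName: "second" }
    ]

    // With ComponentBehavior: Bound, the ID "root" stays visible to the
    // compiler inside this inner component.
    property Component factory: Component {
        QtObject { property int value: root.base }
    }
}
```

Without the pragma, the compiler cannot rely on root being resolvable from inside the inner component, which is what "invisible to the compiler" means in the list above.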

No disk cache mode

Now, since we don't have enough dimensions in our data yet, we're adding another one. By default, the benchmark uses the Qt Quick Compiler to compile bindings and functions to C++. Comparing the numbers produced by the compiled code should give us the speedup caused by the compilation for each version of Qt, right? Well, unless the performance of the interpreter has also changed. In order to control for this, we also run the benchmark with QML_DISABLE_DISK_CACHE=1 for each version of Qt. This makes it ignore the compiled artifacts and instead work with the QML source code.

Finally, the Qt Quick Compiler Extensions have an extra feature that comes in very handy here:

Static mode

Consider three files A.qml, B.qml, and C.qml:

// A.qml
import QtQml
QtObject { property int v: 11 }

// B.qml
import QtQml
A { property string v: "foos" }

// C.qml
import QtQml
QtObject { 
    property A a: A {}
    function evil(b: B) { a = b }
    function bark() { console.log(a.v) }
}

If you instantiate C and play with the evil and bark functions a bit, you will discover a feature of the QML language you didn't want to know about. It's called property shadowing. For a great many properties and methods, we cannot know in advance what types they will have at run time. This is a nasty problem for the Qt Quick Compiler. In Qt 6.6 it has learned to deal with it by wrapping the affected values in QVariant and checking their types where necessary. This comes at a performance cost, though. qmlsc has an extra option --static that tells it to ignore any shadowing. You can use it at your own risk. There are some properties that are intentionally shadowed. For example, we're moving the focusReason property to QQuickItem, leaving a property of the same name in QQuickControl for backwards compatibility. Most shadowing, however, is a mistake.
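If you want to see the shadowing in action, a small driver along the following lines should do. The file name Main.qml and the property name shadowing are assumptions for this sketch; it expects A.qml, B.qml, and C.qml from above to sit in the same directory.

```qml
// Main.qml (hypothetical driver for the three files above)
import QtQml

C {
    // An instance of B, whose string property v shadows A's int property v.
    property B shadowing: B {}

    Component.onCompleted: {
        bark()          // a holds an A here, so this should log 11
        evil(shadowing) // legal, since every B is also an A
        bark()          // a now holds a B, so this should log foos
    }
}
```

The same expression a.v yields an int in the first call and a string in the second. That is why the compiler cannot simply assume the type declared in A unless you promise it, via --static, that nothing is shadowed.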

The --static option was not available in Qt 6.2 and only takes effect with Qt 6.5, 6.6, and 6.7.

In our benchmark, we know we haven't shadowed anything, and we don't want to pay the performance price of checking. Therefore, we add a third, static, mode to each benchmark run to see how much we can gain in comparison to the normal, shadowable mode.

The results

I've tried very hard to produce stable, comparable data. The benchmarks are run on a Linux machine booted directly into a shell, without an init system. For each program run, I first try to warm the caches by performing a dry run, and then run 1000 iterations of the benchmark. For each benchmark function, the program is restarted from scratch so that they cannot interfere with each other. So let's go back to the graph above.

The first thing to note here is that I was not fully successful in my attempts to produce clean data. The JavaScript numbers should all be the same, especially within a single version of Qt. The way the QML code is compiled should not have any effect on the JavaScript execution. All the JavaScript at play here lives in a separate deltablue.js file that cannot be compiled to C++. Realizing this, I advise you to take all of the data with a roughly 5%-sized grain of salt.

Another thing you can immediately see is that the QML version of the algorithm is generally much slower than the JavaScript version. As noted above, this is due to it being built on QObjects rather than JavaScript objects.

On top of this, there is a noticeable drop in performance between Qt 5.15 and Qt 6.2 for the QML version. If you look at the code, you notice that there are a lot of "as" casts in there that tell the compiler what type to expect for some potentially shadowed value. In 5.15, "as" is a no-op. It was originally meant as a compile-time-only construct. Later, however, we noticed that this would lead to behavior differences between compiled and interpreted/JIT'ed code. To avoid those, we introduced type checks for both the compiled code and the interpreter and JIT. So, the later versions of Qt do more work here, but for Qt 6.2 and 6.5 it does not pay off yet. Qt 6.2 and 6.5 still have to interpret or JIT most of the code here since their compilers' language coverage is rather limited.
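For readers who have not used such casts, here is a small sketch of what one looks like. It is not taken from the benchmark: the function intervalOf and the use of Timer are chosen only so that the example works with nothing but the QtQml import.

```qml
// CastSketch.qml (hypothetical file name)
import QtQml

QtObject {
    property QtObject candidate: Timer { interval: 250 }

    function intervalOf(o: QtObject): int {
        // In Qt 6 the cast is checked at run time and yields null if o is
        // not a Timer. In Qt 5.15 it is a no-op, as described above.
        const timer = o as Timer
        return timer ? timer.interval : -1
    }

    Component.onCompleted: {
        console.log(intervalOf(candidate)) // candidate is a Timer
        console.log(intervalOf(this))      // a plain QtObject, not a Timer
    }
}
```

After the cast, the compiler knows that timer is either a Timer or null and can generate a direct property access instead of a generic lookup. The price is the run-time check, which is the extra work mentioned above.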

With that out of the way, let's look at the happy side of things. I've highlighted it in orange and red. Qt 6.6 takes about half the time Qt 6.5 takes to run the QML version of the benchmark, and Qt 6.7 improves on this some more. In static mode we get down to about a third of the 6.5 numbers. Here we get into a territory where the object creation overhead starts to dominate the benchmark. With Qt 6.7 in static mode, it took less time to run the whole benchmark than it took for the object creation alone with Qt 6.2.

Object creation also includes initial binding evaluation, which is why the object creation also benefits from compilation of bindings and expressions to C++. A complementary solution to object creation overhead will be qmltc, once it's ready.

