The Qt Company has been running benchmarks like QMLBench for a long time that assist us knowing when a change creates a performance regression, but it’s also important to see how Qt performs at a higher level, allowing components to interact in ways that granular tests like QMLBench can’t show. In this blog post, we’ll walk through new application startup testing results from a more real-world QML benchmark application.
For these tests, a relatively simple QML application was developed that utilizes many areas of QtDeclarative and QtGraphicalEffects. The application code was written as a casual developer might, and has not been designed for optimal startup, memory consumption, or performance. Because we’re benchmarking, the application does not make use of Interactive elements or user input. The application is of low complexity, without divergent logic, so that results are as consistent as possible between test runs. Though no benchmark will ever truly simulate real-world performance with user interaction, the test discussed here aims to more accurately represent a real-world QML workload than QMLBench or the QtQuickControls “Gallery” example.
The benchmark application. It combines textures, animations, QML shapes, repeaters, complex text, particle effects, and GL shaders to simulate a heavier, more real-world application than other QML benchmarks like QMLBench.
Lars has previously written about The Qt Company’s commitment to improving the performance of Qt, and with the recent release of Qt 5.12 LTS, the efforts made are really showing, especially on QML. Among the improvements, a good number have been towards improving startup performance. Out of the platforms tested, the greatest startup performance improvement was seen on the lowest power device we tested, a Toradex Apalis i.MX6. Let’s explore that.
The chart above shows how the features in Qt 5.12 LTS really cut down on the startup performance, dropping time-to-first-frame from 5912ms in Qt 5.6 to only 1258ms in Qt 5.12.0, a 79% reduction! This is thanks to a number of new features that can be stacked to improve startup performance. Let’s walk through each.
The shader cache saves compiled OpenGL shaders to disk where possible to avoid recompiling GL shaders on each execution.
Pros: Lowers startup time and avoids application lag when a new shader is encountered if the shader is already in the cache. Cons: Systems with small storage can occasionally clear shader caches. If your application uses very complex shaders and runs on a low-power device where compiling the shader may produce undesirable startup times, it may be recommended to use pre-compiled shaders to avoid caching issues. There is no performance difference between cached shaders and pre-compiled shaders. Difficulty to adopt: None! This process is automatic and does not need to be manually implemented.
Without use of the Qt Quick Compiler detailed below, QML applications built on Qt versions prior to 5.9 LTS would always be compiled at runtime on each and every run of the application. Depending on the application’s size and host's processing capabilities, this action could lead to undesirably long load times. Two advancements in Qt now make it possible to greatly speed up the startup of complex QML applications. Both of which provide the same startup performance boost. They are:
The Qt Quick Cache saves runtime-compiled QML to disk in a temporary location so that after the first run when qml gets compiled, it can be directly loaded on subsequent executions instead of running costly compiles every time.
Pros: Can greatly speed up complex applications with many qml files Cons: If your device has a very small storage device, the operating system may clear caches automatically, leading to occasional unexpected long startup times. Difficulty to adopt: None! This process is automatic and does not need to be manually implemented.
The Quick Compiler allows a QML application to be packaged and shipped with pre-compiled QML. Initially available under commercial license from Qt 5.3 onwards, it is available for both commercial and open-source users from Qt 5.11 onwards.
Pros: Using Quick Compiler has the advantage of not needing to rely on the runtime generated QML cache, so you never need to worry about a suddenly unexpected long startup time after a given application host clears its temporary files. Cons: None! Difficulty to adopt: Low. See the linked documentation. It’s often as simple as adding “qtquickcompiler” to CONFIG in your project’s .pro file!
Though Qt has been using Distance Fields in font rendering for a long time in order to have cleaner, crisper, animatable fonts, Qt 5.12 introduces a method for pre-computing the distance fields. Learn more about Distance Fields and implementation in this blog post by Eskil.
Pros: Using pre-generated Distance Field fonts can drastically reduce start-up performance when using complex fonts like decorative Latin fonts, Chinese, Japanese, or Sanskrit. If your application uses a lot of text, multiple fonts, or complex fonts, pre-generating your distance fields can knock off a huge chunk of time to startup. Cons: Generated distance field font files will be marginally larger on disk than standard fonts. This can be optimized by selecting only the glyphs that will appear in your application when using the Distance Field Generator tool. Non-selected glyphs will be calculated as-needed at runtime. Difficulty to adopt: Low. See the linked documentation. No additional code is necessary, and generating the distance fields for you font takes seconds.
Providing OpenGL with compressed textures, ready to be uploaded to video memory right out of the gate, saves Qt from need to prepare other file types (jpg, png, etc…) for upload.
Pros: Using compressed textures provides a faster startup time, decrease in memory usage. It may even provide a bit of performance boost depending on how heavy your texture use is, and how strong of compression you choose to utilize. Cons: While the compression algorithms in use for textures inherently require some tradeoff in visual fidelity, all but the most extreme compression schemes will usually not suffer any visible fidelity loss. Choosing the right compression scheme for your application’s use case is an important consideration. Difficult to adopt: Low +. See this blog post by Eirik for implementation details. Almost no coding is required, needing only to change texture file extensions in your qt code. Easy-to-use tools for texture compression are available, like the “texture-compressor” package for Node.
The i.MX6 is a great representation of mid-tier embedded hardware, and the performance improvements included in Qt 5.12 LTS really shine in this realm. Stack all the improvements together and you can really cut down on the startup time required in low power devices.
With these latest test results for low-power hardware, Qt 5.12 could lend a hand to your development by greatly decreasing startup times, particularly when running on low and mid-tier embedded devices. These new performance improvements are easy to adopt, requiring only the most minor of changes to your codebase, so there’s very little reason to not start using Qt 5.12 right away, especially if your project is cramming heavy QML applications into a fingernail sized SoC. The chart below is a reminder of what’s possible with Qt 5.12 LTS, and faster start-up time makes happier customers.