Optimizing the QML compiler pipeline

Qt QML is the engine that executes QML, the language used to create Qt Quick based user interfaces. It has no dependencies on anything graphical and, in addition to supporting the QML language, it includes a JavaScript engine that is compliant with ECMAScript 5.1.

As the foundation of our Qt Quick user interface technology, the module is central for many of our users. Internally, it consists of two main parts: a compiler pipeline that compiles QML and ECMAScript down to binary data structures and bytecode or assembly, and a runtime that assists in executing the generated code. This post focuses on the first part.

The compiler pipeline that we used in Qt QML up to and including version 5.10 had grown increasingly complex over recent Qt versions.

Below, you can see a simplified picture of the old compiler pipeline.


We started off by sending QML and JavaScript code into a Lexer and Parser, which created an Abstract Syntax Tree (AST) for the code being compiled. This AST was then sent into a class called Codegen, which generated an Intermediate Representation (IR) out of the AST. The IR was then sent through various optimization stages that did dead code elimination, some limited type deduction, and other tricks.

As a next step, the optimized IR was sent into one of two instruction selection (Isel) backends, which generated either bytecode or assembly (plus some binary data stored in the Compilation Unit) out of the IR.

Together, this was a rather long and complex pipeline. Its main drawback was the rather high cost of compiling QML/JS, which led to relatively long startup times. In addition, there was no way to mix and match bytecode and assembly: it was either one or the other.

Other JavaScript engines (such as V8 and JavaScriptCore) have lately been moving over to a different pipeline, where bytecode is generated directly out of the AST and that bytecode is then used as the basis for the JIT. There are several advantages to such an approach. First, you gain a platform-independent representation of your QML/JS in bytecode format. Second, startup is much faster, as parsing QML/JS to generate the bytecode is a very fast step. Third, you can rather easily collect tracing information on the generated bytecode for a second-stage JIT compiler, which then generates fully optimized assembly.
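To make the idea of generating bytecode directly from the AST more concrete, here is a minimal sketch. The AST node shapes, the opcode names and their numeric values are all invented for this illustration; they are not the formats used by V8, JavaScriptCore or Qt QML.

```javascript
// Hypothetical opcodes for a tiny accumulator machine (illustrative only).
const Op = { Ret: 0, LoadConst: 1, LoadArg: 2, StoreTemp: 3, AddTemp: 4 };

// Walk an expression AST and append instructions that leave the
// value of the expression in the accumulator.
function emitExpr(node, code, state) {
    switch (node.type) {
    case "Number":
        code.push(Op.LoadConst, node.value);
        break;
    case "Arg":
        code.push(Op.LoadArg, node.index);
        break;
    case "Add": {
        // Evaluate the left side, park it in a fresh temporary,
        // then evaluate the right side and add the temporary to it.
        emitExpr(node.left, code, state);
        const temp = state.nextTemp++;
        code.push(Op.StoreTemp, temp);
        emitExpr(node.right, code, state);
        code.push(Op.AddTemp, temp);
        break;
    }
    default:
        throw new Error("unsupported AST node: " + node.type);
    }
}

// Compile a function whose body is a single returned expression.
function compileFunction(bodyExpr) {
    const code = [];
    emitExpr(bodyExpr, code, { nextTemp: 0 });
    code.push(Op.Ret);
    return code;
}

// AST for "return x + y", with x and y as arguments 0 and 1.
const addAst = {
    type: "Add",
    left: { type: "Arg", index: 0 },
    right: { type: "Arg", index: 1 }
};
const addCode = compileFunction(addAst);
```

A single tree walk produces the instruction stream, with no intermediate representation in between. A production code generator would additionally handle statements, scopes and register allocation, which this sketch leaves out.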

Last summer, we did a small research session and tried a similar approach for Qt QML. We got extremely far with only two days of work and could see that this would lead to huge simplifications in our internal compiler pipeline.

Because of this, we sat down and implemented the pipeline we had prototyped, with the goal of having it ready for Qt 5.11. The picture below gives you an idea of how the new pipeline looks.


With this new pipeline, we now directly generate Bytecode from the Codegen, completely skipping any intermediate representation. This bytecode is platform independent and can be executed directly. Where we see hot code paths (currently functions that are executed often enough), we invoke a JIT that compiles the bytecode down to assembly.
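The hot-path decision can be pictured as a per-function call counter. The following is only a sketch of that idea: the threshold value, the record layout and the names `makeFunctionRecord` and `call` are assumptions made for illustration, and the actual heuristic used by the engine may differ.

```javascript
// Assumed threshold; the engine's real heuristic is not described here.
const HOT_THRESHOLD = 3;

// One record per function: an interpreter entry point, a factory that
// produces a compiled version, and the bookkeeping needed to switch over.
function makeFunctionRecord(name, interpret, jitCompile) {
    return { name, interpret, jitCompile, callCount: 0, compiled: null };
}

function call(record, args) {
    // Once compiled code exists, always use it.
    if (record.compiled)
        return record.compiled(...args);

    record.callCount += 1;
    if (record.callCount >= HOT_THRESHOLD) {
        // The function is now considered hot: compile it once and reuse the result.
        record.compiled = record.jitCompile();
        return record.compiled(...args);
    }
    return record.interpret(...args);
}

// Stand-ins for the interpreter and the JIT, so the sketch is runnable.
const executionLog = [];
const addRecord = makeFunctionRecord(
    "add",
    (x, y) => { executionLog.push("interpreted"); return x + y; },
    () => { executionLog.push("jit-compiled"); return (x, y) => x + y; }
);

const callResults = [1, 2, 3, 4].map(i => call(addRecord, [i, i]));
```

Because both execution modes start from the same bytecode, switching a single function over does not affect the rest of the program, which is what makes mixing interpreted and compiled code possible.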

At the same time, we’ve been completely reworking our bytecode format and the interpreter to boost performance. Our main goals here were a very compact bytecode format, to save RAM and disk space, and an efficient and fast interpreter on top of it. We are pretty happy with the results: the new interpreter is almost twice as fast as the old one and achieves around 80-90% of the performance of the old JIT. The bytecode is also very compact, using significantly less memory than the old format. Here’s a short example showing the bytecode of a simple function:

function add(x, y) {
    return x + y;
}

=== Bytecode for "add" strict mode false
1   0: 0a 06    LoadReg a2
    2: 6a 05    Add a1, acc
    4: 00       Ret

As you can see, the generated bytecode is just 5 bytes, loading argument 2 into the accumulator, adding argument 1 to it, and then returning the result.
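To illustrate how an accumulator-based interpreter would step through these five bytes, here is a small sketch. The opcode values (0x0a, 0x6a, 0x00) and the operand values (06 for a2, 05 for a1) are read off the dump above; the register file layout, the dispatch loop and the helper names are assumptions and do not reflect the actual Qt implementation.

```javascript
// Opcode values taken from the dump; everything else is illustrative.
const LoadReg = 0x0a, Add = 0x6a, Ret = 0x00;

function run(bytecode, registers) {
    let acc;      // the accumulator
    let pc = 0;   // offset of the current instruction
    for (;;) {
        const op = bytecode[pc];
        switch (op) {
        case LoadReg:
            // Copy the register named by the operand into the accumulator.
            acc = registers[bytecode[pc + 1]];
            pc += 2;
            break;
        case Add:
            // Add the accumulator to the register named by the operand.
            acc = registers[bytecode[pc + 1]] + acc;
            pc += 2;
            break;
        case Ret:
            return acc;
        default:
            throw new Error("unknown opcode " + op + " at offset " + pc);
        }
    }
}

// The five bytes from the dump.
const addBytecode = [0x0a, 0x06, 0x6a, 0x05, 0x00];

// Assumed mapping: operand 5 holds argument 1, operand 6 holds argument 2.
function callAdd(x, y) {
    const registers = [];
    registers[5] = x; // a1
    registers[6] = y; // a2
    return run(addBytecode, registers);
}
```

Calling `callAdd(2, 3)` loads 3 into the accumulator, adds 2 to it and returns 5. Keeping one implicit operand in the accumulator is what lets each instruction fit into one or two bytes.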

Once we had the new interpreter implemented, we went ahead and added a relatively simple hotspot JIT on top of it to further optimize frequently executed functions. The JIT doesn’t yet use any tracing information or other optimizations, but it already beats the performance of the JIT we had in 5.10. The graph below gives you an overview of the performance improvements when running our v8-bench performance test suite (higher numbers are better). "New Interpreter" and "New JIT" refer to the interpreter and JIT in Qt 5.11, while "old" refers to Qt 5.9.



These improvements allowed us to rework our QML caching and ahead-of-time compilation infrastructure. Both now cache the generated Compilation Unit and the bytecode on disk, instead of trying to generate assembly. This leads to significantly smaller binaries while keeping the fast startup times. And with the fast new interpreter and the hotspot JIT, this beats the old QML compiler in runtime performance.

As a side effect, we have now unified things and made the ahead-of-time QML compiler available in the open source version of Qt as well. But be aware that by using the compiler you will be tying yourself to the exact patch level version of Qt you’re using, as we do not guarantee that the bytecode format will stay compatible between versions. The transparently generated .qmlc/.jsc files, however, do contain a version check and will get updated whenever the underlying .qml/.js files or the patch level version of Qt changes.
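The invalidation rule for the transparently generated cache files can be sketched as follows. This is only an illustration of the rule stated above: the header fields `qtVersion` and `sourceTimestamp`, the use of a modification timestamp to detect source changes, and the function names are all hypothetical and do not describe the real .qmlc/.jsc file layout.

```javascript
// A cache entry is usable only if it was produced by the same Qt patch
// release and from the same revision of the source file.
// Both header fields are hypothetical names chosen for this sketch.
function isCacheUsable(cacheHeader, source, runningQtVersion) {
    return cacheHeader.qtVersion === runningQtVersion
        && cacheHeader.sourceTimestamp === source.lastModified;
}

// Return the cached unit if it is still valid, otherwise pretend to
// recompile and produce a fresh header for the new cache file.
function loadUnit(cacheHeader, source, runningQtVersion) {
    if (cacheHeader && isCacheUsable(cacheHeader, source, runningQtVersion))
        return { from: "cache", header: cacheHeader };
    return {
        from: "recompiled",
        header: { qtVersion: runningQtVersion, sourceTimestamp: source.lastModified }
    };
}

// Example: a cache written by 5.11.0 is discarded once Qt is updated to 5.11.1.
const exampleSource = { path: "main.qml", lastModified: 1000 };
const firstLoad = loadUnit(null, exampleSource, "5.11.0");
const secondLoad = loadUnit(firstLoad.header, exampleSource, "5.11.0");
const afterQtUpdate = loadUnit(firstLoad.header, exampleSource, "5.11.1");
```

Bytecode produced ahead of time has no such fallback to recompile from source, which is why it ties an application to one exact patch level.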

Simplifying the compiler pipeline also enabled us to simplify and clean up other parts of our code base. We re-architected the calling convention and the way we set up stack frames when calling JS/QML functions, which saves a lot of overhead when invoking functions.

One place where this can be seen is in a small benchmark where we do recursive function calling to compute a Fibonacci series. That benchmark became almost 3 times as fast in Qt 5.11 compared to earlier Qt versions.
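A benchmark of this kind might look like the sketch below. The actual benchmark code is not shown in this post, so the function names, the input value of 25 and the timing approach are assumptions.

```javascript
// Naive recursive Fibonacci: almost all of the time is spent on
// function calls, so it stresses call overhead rather than arithmetic.
function fib(n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Time a single run; Date.now() has millisecond resolution, so pick an
// input large enough for the run to take a measurable amount of time.
function timeFib(n) {
    const start = Date.now();
    const value = fib(n);
    return { value: value, elapsedMs: Date.now() - start };
}

const fibResult = timeFib(25);
console.log("fib(25) = " + fibResult.value + " in " + fibResult.elapsedMs + " ms");
```

Since each call does very little work, a cheaper calling convention and stack frame setup show up almost directly in the measured time.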

Moving forward towards Qt 5.12 and later, we have more good stuff coming. Research is ongoing towards a tracing JIT, and we’re also working on updating our engine to fully support ECMAScript 7. But more on those topics in a future blog post.
