QStringView Diaries: QAnyStringView - A Variant String-View

In Qt, the vast majority of strings are held in QString objects, and most functions take strings by const QString& and return by QString. This works fine in practice, because QString is so readily created from string literals that for the most part, you don't need to pay attention. The compiler will helpfully convert string literals to QString when calling such functions. It doesn't convert std::string, nor even std::u16string, but who cares about those? :)

This pseudo-convenience comes at a cost, though. Contrary to what the documentation may have you believe, constructing a QString and even copying one are far from being cheap. Let's take a look at an example.


  void consumeQString(const QString &) noexcept;
  consumeQString("hello, world");

If you didn't have a QString class, but only C strings, you would expect all this to do is load the address of the string literal and then call the function. Thanks to QString, though, this is what we get when we compile it with GCC 11, -std=c++20 -O2:


callConsumeQStringHelloWorld():
        pushq   %rbp
        movl    $12, %esi
        leaq    .LC76(%rip), %rdx
        subq    $32, %rsp
        movq    %rsp, %rbp
        movq    %rbp, %rdi
        call    QString::fromUtf8(QByteArrayView)@PLT
        movq    %rbp, %rdi
        call    consumeQString(QString const&)@PLT
        movq    (%rsp), %rax
        testq   %rax, %rax
        je      .L834
        lock subl       $1, (%rax)
        je      .L839
.L834:
        addq    $32, %rsp
        popq    %rbp
        ret
.L839:
        movq    (%rsp), %rdi
        movl    $8, %edx
        movl    $2, %esi
        call    QArrayData::deallocate(QArrayData*, long long, long long)@PLT
        addq    $32, %rsp
        popq    %rbp
        ret

To avoid skewing the result, we marked the callee noexcept, even though that's not a common thing to do. I can positively assure you that you don't want to see the assembly with consumeQString() not marked as noexcept.

Even if you, as I, don't speak x86 assembler, we can see that we do load the address of the string literal (lea), in order to construct a QByteArrayView in registers (size in %esi, pointer in %rdx), passing it to QString::fromUtf8(QByteArrayView). That places the resulting QString into the caller's stack frame (return by value), so we can just pass the QString's address to consumeQString() (pass by reference-to-const). So far, this is all expected.

The first unusual thing that should jump out at you (no pun intended) is the pair of branches following the call to consumeQString(). The original C++ code doesn't have branches, so where do they come from? The lock subl gives it away: it's an atomic decrement operation that, when the result is zero, jumps to .L839. Get this: the whole code after the call to consumeQString(), until the end of the function, excluding the duplicated bits between the two labels .L834 and .L839, is the inlined destructor of QString. Don't believe me? Switch off optimisations:


callConsumeQStringHelloWorld():
        pushq   %rbp
        movq    %rsp, %rbp
        subq    $32, %rsp
        leaq    -32(%rbp), %rax
        leaq    .LC5224(%rip), %rdx
        movq    %rdx, %rsi
        movq    %rax, %rdi
        call    QString::QString(char const*)
        leaq    -32(%rbp), %rax
        movq    %rax, %rdi
        call    consumeQString(QString const&)@PLT
        leaq    -32(%rbp), %rax
        movq    %rax, %rdi
        call    QString::~QString()
        nop
        leave
        ret

There: no more jumps, but a call to a QString constructor and the destructor, as expected. This author thinks it's fair to say that the construction and destruction of the QString object dominate the code of the entire function.
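To make the origin of those two branches a bit more tangible, here is a minimal sketch of a reference-counted handle. It is not QString's actual implementation: SharedBuffer, Handle and buffersAliveAfterScope() are names invented for this illustration, and std::atomic stands in for Qt's own reference counter. The destructor has the same shape as the optimized listing, though: a null check, an atomic decrement, and a deallocation only if the count dropped to zero.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a shared, heap-allocated string payload.
struct SharedBuffer {
    std::atomic<int> ref{1};
    static inline int liveBuffers = 0;   // bookkeeping for this demo only
    SharedBuffer() { ++liveBuffers; }
    ~SharedBuffer() { --liveBuffers; }
};

// Hypothetical owning handle; every copy shares the same buffer.
class Handle {
public:
    Handle() : d(new SharedBuffer) {}
    Handle(const Handle &other) : d(other.d) {
        if (d)
            d->ref.fetch_add(1, std::memory_order_relaxed);
    }
    Handle &operator=(const Handle &) = delete;
    ~Handle() {
        // Branch 1: is there a payload at all?
        // Branch 2: did the atomic decrement drop the count to zero?
        if (d && d->ref.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete d;
    }
private:
    SharedBuffer *d;
};

// Creates and destroys two handles, then reports how many buffers are still alive.
inline int buffersAliveAfterScope() {
    {
        Handle a;
        Handle b = a;   // shares a's buffer, count becomes 2
        assert(SharedBuffer::liveBuffers == 1);
    }                   // both destructors run; the second one frees the buffer
    return SharedBuffer::liveBuffers;
}
```

Every function that creates a temporary Handle gets a copy of that destructor logic inlined, which is the effect we see at each QString call site.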

Now multiply that code by the 100s of calls to QString-taking functions being passed string literals, and you know you have an opportunity to improve: This very same constructor/destructor code is literally being littered all over the code base by the trigger-happy optimizer. Even the more compact code of the debug build has that property.

The underlying problem is that QString is not a trivial type. You can see the difference very clearly in the optimized version where we call fromUtf8(): Even though we construct an object of class type there, too, and even though that object is passed by value, we see nothing of it in the final assembly, except two stores of constants, each into its own CPU register. QByteArrayView is a trivial type. In particular, it's small enough and trivially copyable, so most platform ABIs allow the compiler to pass it in registers: it never even hits the stack!

Wouldn't it be great if QString was also trivial? Yes, it would, but it's not possible, because QString owns its data and therefore needs to manage the lifetime of dynamically-allocated memory.
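If you want to check this distinction yourself, the type traits tell the story. The sketch below uses std::string_view and std::string as stand-ins for a Qt view and QString, since the Qt types are not needed to make the point; isSmallAndTrivial() is a helper invented for this example, and the results are what mainstream standard library implementations report.

```cpp
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical helper: "small and trivially copyable" is the rough condition
// under which common ABIs hand an object over in registers.
template <typename T>
constexpr bool isSmallAndTrivial() {
    return std::is_trivially_copyable_v<T> && sizeof(T) <= 2 * sizeof(void *);
}

// A view is just a pointer and a size: no destructor to run, copy is a memcpy.
static_assert(std::is_trivially_destructible_v<std::string_view>);
static_assert(isSmallAndTrivial<std::string_view>());

// An owning string has to release its memory, so it cannot be trivial.
static_assert(!std::is_trivially_destructible_v<std::string>);
static_assert(!isSmallAndTrivial<std::string>());
```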

But we can use views instead:


void consumeQLatin1String(QLatin1String) noexcept;
void consumeQStringView(QStringView) noexcept;

callConsumeQLatin1StringHelloWorld():
        movl    $12, %edi
        leaq    .LC76(%rip), %rsi
        jmp     consumeQLatin1String(QLatin1String)@PLT

Yay, jackpot: tail-call! This does the very bare minimum necessary to call the function, and even tail-calls (jmp instead of call) it, so the ret in consumeQLatin1String() will return to the caller of callConsumeQLatin1StringHelloWorld(), not to callConsumeQLatin1StringHelloWorld() itself. Wow. Just ... wow.

Problem:


consumeQLatin1String("hello, world");

doesn't compile. You need to say


consumeQLatin1String(QLatin1String("hello, world"));

instead. It's a bit of a mouthful, but QtCreator can do it for you: place the cursor on the string literal and hit CTRL-Enter.

If you ever wondered why a lot of high-performance code in QtCore (esp. in the string classes themselves) has overloaded functions for QString and QLatin1String, now you know. The QLatin1String overload just expands to that much less code, and, if the implementation honors the implicit user expectation and has a fast-path for QLatin1String instead of simply converting the QLatin1String to a QString and then calling the QString overload, actually avoids a memory allocation, too.

But what if the QLatin1String overload cheated and did just call the QString overload? Well, if it does so out-of-line, behind the ABI boundary, then the calling code will still look the same. The implementation will have all the extra QString code - but that's one place (O(1)), instead of once per caller (O(N)).

This is important to understand, so let me rephrase that: The sequence of salient assembler instructions would be the same whether you took by QString or by QLatin1String, converting to QString in the implementation, but a very large percentage would be de-duplicated in the implementation instead of copied into each call site anew. This is equivalent to compressing the code, increasing effective instruction cache size. QtCore will get a bit larger, but any user of QtCore will get smaller. You probably know the principle from enterprise file systems: they detect duplicate file content and, by de-duplicating it, can store more data in the same physical disks/ssds than without.
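Here is a minimal sketch of such a "cheating" implementation, under the assumption that std::string_view plays the role of QLatin1String and std::string the role of QString. Registry, setName() and configure() are hypothetical names; imagine the definition of setName() living in the library's .cpp file, behind the ABI boundary.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical library class.
class Registry {
public:
    void setName(std::string_view name);   // callers only pass pointer + size
    const std::string &name() const { return m_name; }
private:
    std::string m_name;
};

// Out-of-line definition: the single place where the owning string is built.
// All allocation and cleanup code lives here, once, instead of at every call site.
void Registry::setName(std::string_view name) {
    m_name.assign(name.data(), name.size());
}

// A typical call site: nothing to construct and nothing to destroy.
inline void configure(Registry &registry) {
    registry.setName("hello, world");
}
```

The caller pays for two register loads and a call, no matter how many call sites there are.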

Next problem:


consumeQLatin1String(QLatin1String("ä"));

doesn't work. The string literal is UTF-8 encoded (Qt enforces this), so ä is stored as two octets, 0xC3 0xA4, whereas ä in Latin-1 is a single octet, 0xE4. So you need to write


consumeQLatin1String(QLatin1String("\xE4")); // ä

We have been ok with that in the past, though, because the generated code is so much more compact and efficient.
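The relationship between the two encodings of ä can be spelled out in a few lines. This is a sketch in plain C++, not Qt's converter; latin1ToUtf8() is a name made up for the example. It relies on the fact that every Latin-1 octet is the Unicode code point of the same value, so octets of 0x80 and above turn into a two-octet UTF-8 sequence.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: converts Latin-1 octets to UTF-8.
inline std::string latin1ToUtf8(std::string_view latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);   // worst case: every octet doubles
    for (char ch : latin1) {
        const auto c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            utf8.push_back(static_cast<char>(c));                  // US-ASCII: unchanged
        } else {
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead octet
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation octet
        }
    }
    return utf8;
}
```

Feeding it "\xE4" yields the two octets 0xC3 0xA4, which is exactly what the compiler stores for the literal "ä" in a UTF-8 source file.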

Of course, we also have QStringView. That, too, doesn't accept a simple string literal, the way QString would. We must, instead, pass a char16_t[] literal:


consumeQStringView(u"hello, world");

And the result?


callConsumeQStringViewHelloWorld():
        leaq    2+.LC77(%rip), %rax
        leaq    24(%rax), %rdx
        jmp     .L842
.L844:
        movq    %rax, %rdi
        addq    $2, %rax
        cmpw    $0, -2(%rax)
        je      .L846
.L842:
        cmpq    %rax, %rdx
        jne     .L844
        movl    $13, %edi
        leaq    .LC77(%rip), %rsi
        jmp     consumeQStringView(QStringView)@PLT
.L846:
        leaq    .LC77(%rip), %rsi
        subq    %rsi, %rdi
        sarq    %rdi
        jmp     consumeQStringView(QStringView)@PLT

Whoops. This shouldn't happen™. This looks so awfully broken that I suspect a compiler bug. Indeed, with Clang++, we get the expected


callConsumeQStringViewHelloWorld():
        leaq    .L.str.3401(%rip), %rsi
        movl    $12, %edi
        jmp     consumeQStringView(QStringView)@PLT # TAILCALL

The problem with using QStringView is the char16_t literals. Here's how the data is stored in the executable for QLatin1String:


        .string "hello, world"

and here for QStringView:


        .string "h"
        .string "e"
        .string "l"
        .string "l"
        .string "o"
        .string ","
        .string " "
        .string "w"
        .string "o"
        .string "r"
        .string "l"
        .string "d"
        .string ""

Don't worry about the multiple strings. That's just GCC's way of telling the assembler to insert '\0' bytes in between each character, blowing the string up to UTF-16, i.e. twice the size of the Latin-1 version.
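You don't need to read assembler output to see the size difference; sizeof on the two literals shows it, too. A small sketch, assuming the usual platforms where char16_t occupies two bytes (the constant names are mine):

```cpp
#include <cstddef>

// Storage for the same text in 8-bit and 16-bit code units,
// each including the terminating null character.
constexpr std::size_t narrowBytes = sizeof("hello, world");    // 13 code units of 1 byte
constexpr std::size_t utf16Bytes  = sizeof(u"hello, world");   // 13 code units of 2 bytes

static_assert(narrowBytes == 13);
static_assert(utf16Bytes == 2 * narrowBytes);
```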

So, in Qt 6, we added QUtf8StringView to complete the string view set for UTF-8, an encoding that, like Latin-1, is a superset of US-ASCII, but, unlike Latin-1, can represent the whole Unicode range.

With that, we can now overload our high-performance string APIs like this:


void consume(const QString &);
void consume(QStringView);
void consume(QChar c) { consume(QStringView{&c, 1}); }
void consume(QLatin1String);
void consume(QUtf8StringView);

This does not include legacy APIs such as


void consume(const char *);
void consume(const char *, qsizetype);

which are still found in Qt APIs, too (and the first one is actually required to disambiguate between QString and QUtf8StringView).

We can't really get rid of any of these, because QStringView isn't constructible from a single QChar, nor from, say, a QStringBuilder expression, so the QString overload needs to stay, and, therefore, the QChar one, because we don't want it to be implicitly converted into QString.

Likewise, we can't really get rid of QLatin1String because Latin-1, unlike UTF-8, is a fixed-width character encoding, allowing comparisons with UTF-16 to filter out negatives with a size check before traversing both strings.
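The size check mentioned above can be sketched as follows. This is not Qt's comparison code: equalsLatin1() is a hypothetical function, std::u16string_view stands in for QStringView, and the narrow std::string_view is assumed to hold Latin-1. Because every Latin-1 character maps to exactly one UTF-16 code unit, differing sizes prove inequality before a single character has been looked at.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: compares UTF-16 text with Latin-1 text.
constexpr bool equalsLatin1(std::u16string_view utf16, std::string_view latin1) {
    if (utf16.size() != latin1.size())
        return false;   // cheap early-out, valid only for fixed-width encodings
    for (std::size_t i = 0; i < utf16.size(); ++i) {
        // A Latin-1 octet is the code point of the same value.
        const auto unit = static_cast<char16_t>(static_cast<unsigned char>(latin1[i]));
        if (utf16[i] != unit)
            return false;
    }
    return true;
}
```

With UTF-8 on the narrow side, the early-out would be wrong: ä is one UTF-16 code unit but two UTF-8 octets, so equal strings can have different sizes.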

I don't know about you, but I always found this dissatisfying. Even without legacy overloads, that's a whopping five overloads for all functions taking a string. Clearly, this doesn't fly with, say, the maintainers of QLabel::setText().

There's another problem: what if the function takes two strings?


void consume2(const QString &, const QString &);
void consume2(QStringView, const QString &);
void consume2(QChar, const QString &);
void consume2(QLatin1String, const QString &);
void consume2(QUtf8StringView, const QString &);

void consume2(const QString &, QStringView);
void consume2(QStringView, QStringView);
void consume2(QChar, QStringView);
void consume2(QLatin1String, QStringView);
void consume2(QUtf8StringView, QStringView);

void consume2(const QString &, QChar);
void consume2(QStringView, QChar);
void consume2(QChar, QChar);
void consume2(QLatin1String, QChar);
void consume2(QUtf8StringView, QChar);

void consume2(const QString &, QLatin1String);
void consume2(QStringView, QLatin1String);
void consume2(QChar, QLatin1String);
void consume2(QLatin1String, QLatin1String);
void consume2(QUtf8StringView, QLatin1String);

void consume2(const QString &, QUtf8StringView);
void consume2(QStringView, QUtf8StringView);
void consume2(QChar, QUtf8StringView);
void consume2(QLatin1String, QUtf8StringView);
void consume2(QUtf8StringView, QUtf8StringView);

Or, heaven forbid, three?

"Nobody in their right mind would write such an overload set," I hear you say. And I answer: have you looked at QString::replace()?

Since it'll be Christmas soon, let's make a wishlist: I'd like a magic string type with which I can write


void consume2(QMagicString, QMagicString);

It should have optimum efficiency for every possible call and be the only function I need to write. Let's see what said QMagicString would need to offer:

First, it would need to accept anything that the overload sets above would accept, too, to wit:

  • QString, or anything that implicitly converts to it
  • QStringView, or anything that implicitly converts to it
  • QChar, or anything that implicitly converts to it (within reason; QChar's ctors are a mess)
  • QLatin1String (nothing implicitly converts to QLatin1String)
  • QUtf8StringView, or anything that implicitly converts to it

Second, it would need to be a view type, because we've seen that only Trivial Types give optimal results in calling code, and owning containers cannot be Trivial Types.

Third, since it cannot allocate, the type must detect and preserve the information about the encoding used to construct it (UTF-8, L1, UTF-16).

Something like std::variant<QStringView, QLatin1String, QUtf8StringView, QChar>. We cannot add QString to the variant, because that would make the variant be non-trivial again.

It just so happens that I added exactly that to Qt 6.0: It's called QAnyStringView and it's at your service:


callConsumeQAnyStringViewHelloWorld():
        leaq    .LC76(%rip), %rdi
        movl    $12, %esi
        jmp     consumeQAnyStringView(QAnyStringView)@PLT

(yes, the .LC76 means that the string literal used here is shared with the one used to construct the QLatin1String in callConsumeQLatin1StringHelloWorld() in the same TU).

By changing from QString to QAnyStringView, you allow each caller to pass its own preferred data structure, in any of the three encodings, and the call will always be optimally efficient for the caller! This is the pinnacle of API design: the simplest use of the API is also the most efficient.

With one exception: when your chosen storage type matches the caller's choice of data structure exactly, then you could have written a function that takes the storage type by rvalue reference, and simply std::move()d the argument to your member variable. Yes, this is quite likely in Qt code for QString as a storage type, but you don't know that for a fact.
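For the record, this is roughly what that exception looks like, sketched with std::string as the storage type in place of QString. Label here is a made-up class, not QLabel, and the overload set is only an illustration of the idea.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical widget that stores its text in an owning string.
class Label {
public:
    // Sink overload: takes over the buffer of a temporary or std::move()d string.
    void setText(std::string &&text) { m_text = std::move(text); }
    // View overload: works for any caller, but always copies the characters.
    void setText(std::string_view text) { m_text.assign(text.data(), text.size()); }
    const std::string &text() const { return m_text; }
private:
    std::string m_text;
};

// Builds a string at the call site and hands ownership to the label.
inline void fill(Label &label) {
    std::string built = "hello, ";
    built += "world";
    label.setText(std::move(built));   // picks the rvalue overload
}
```

Note that a plain string literal would be ambiguous between these two overloads, which is the same kind of problem that forces the const char * overloads mentioned earlier to stick around.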

For example, the function taking the string could simply not store it. That's true for virtually all parsers, e.g. QVersionNumber::fromString(), QUuid::fromString(), QColor::fromString(), etc. In this case, QAnyStringView causes a duplication of code in the implementation, by instantiating one code path each for the three supported encodings, but first, this duplication has often been there already, in the form of overloaded functions, and second, the compiler can often do that work for you - you just write the parser as a template for any of the three view classes.

We'll see how to do that, exactly, in a follow-up blog post. Suffice to say for now that it's done via QAnyStringView::visit().
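Until then, here is the general shape of the idea, sketched with std::variant and std::visit instead of the Qt classes. None of this is QAnyStringView's implementation or API: Latin1View, Utf8View, Utf16View, AnyView, parseDigits() and parseLeadingNumber() are all names invented for the example. The point is merely that you write the parser once, as a template, and let a visitor pick the instantiation that matches the encoding the caller passed in.

```cpp
#include <cassert>
#include <string_view>
#include <type_traits>
#include <variant>

// Hypothetical tagged views; the wrappers only record which 8-bit encoding is meant.
struct Latin1View { std::string_view chars; };
struct Utf8View   { std::string_view chars; };
using Utf16View = std::u16string_view;

// A variant of views stays cheap: there is nothing to destroy.
using AnyView = std::variant<Latin1View, Utf8View, Utf16View>;
static_assert(std::is_trivially_destructible_v<AnyView>);

// One parser template, instantiated once per code-unit type. It reads leading
// ASCII digits, which have the same code unit values in all three encodings.
template <typename View>
constexpr int parseDigits(View text) {
    int value = 0;
    for (auto unit : text) {
        if (unit < '0' || unit > '9')
            break;
        value = value * 10 + static_cast<int>(unit - '0');
    }
    return value;
}

// The single public entry point: dispatches on the stored encoding.
inline int parseLeadingNumber(AnyView text) {
    return std::visit([](auto view) {
        if constexpr (std::is_same_v<decltype(view), Utf16View>)
            return parseDigits(view);
        else
            return parseDigits(view.chars);
    }, text);
}
```

The duplication is still there in the binary, but it is the compiler that produces it, and it sits in the implementation rather than at every call site.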

Or the function could do some preprocessing on the string, causing a detach() from a QString passed by reference-to-const, anyway, in which case it also doesn't matter that you now take by QAnyStringView. Only taking by QString&& would - potentially - be faster, but only if the QString passed isn't shared with another one (lest we detach()) and the preprocessing doesn't make the string larger (lest we need to reallocate). But those are two very big ifs. Esp. the non-shared part. It effectively means that the caller might have to control the construction of the QString, and then we'd be back to square one and the littering of call sites with QString destructor code.

Or the function stores the string, but not in a QString, but in, say, std::u16string (to gain the small-string optimisation QString still lacks) or QVarLengthArray<char16_t>.

As I said in my QStringView talks: using views in the API instead of owning containers allows the caller and the implementation to choose their optimal data structure independently of each other, and therefore increases encapsulation. I call this the NOI (Non-Owning Interface) Idiom and QAnyStringView is the master-class NOI for string data. I suggest you take a look.

In the next post of this series, we'll look at some functions that take QAnyStringView and inspect how they cope with the problem of having to handle three different encodings. I'm sure you'll be able to find a strategy that works for your use case, too.

Yes, the gentleman in the blue jacket at the back has a question?

The question was: "Does it matter at all? Isn't text rendering so much slower, anyway?"

It probably is, but, first, not every processed string is rendered on screen, and, second, execution speed is but one side of the coin here: Who would not like a 7.3% client code executable size shrink without changing the client code? QAnyStringView is proven to give you that, and possibly more: https://codereview.qt-project.org/c/qt/qtbase/+/353688.

