Type-punning and strict-aliasing

A couple of months ago, I posted a lengthy email to the internal discussion list about the problems with type punning and breaking of strict aliasing. At the time, my sole objective was to clean up any warnings in my build of Qt 4.7 when using GCC 4.5. Since then, GCC 4.6 was released and bug QTBUG-19736 was reported against the very same code I had been trying to clean up.

So a colleague nudged me into posting the content of my email explaining the situation on the blog. Here it goes, with a couple of updates:

The warning from GCC reads:

"dereferencing type-punned pointer will break strict aliasing"

It's quite scary because it says this will break stuff. So what is this?

Quoting from http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html:

One pointer aliases another when they both point to the same memory location.

Type-punning is the trick of referring to an object through another type. Strict aliasing is the requirement from C99 that an object be accessed only by its own type or by char (see the exact definition from C99 below). That means the following is not acceptable:

        int i = 42;
        short s = *(short*)&i;

The above will probably work (emphasis on probably), but the results are undefined. That means the compiler is free to do anything, like emailing your boss about this transgression. But even without the strict-aliasing rule, the code above still has no defined behaviour, as it has two possible results: 0 or 42, depending on the endianness.

And that was going to a type with less strict alignment rules. Increasing the requirement like this:

        short s[2] = { 0, 42 };
        int i = *(int *)s;

Has three possible outcomes: i == 42 (big-endian), i == 2752512 (that is, 42 << 16, on little-endian) or a crash (unaligned 4-byte load).
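For comparison, here is a minimal sketch of how both reads could be written without casting a pointer. It assumes the fixed-width types from <stdint.h>, and the function names are made up for this illustration. memcpy copies bytes, so there is neither an aliasing violation nor an alignment requirement, though the result still depends on the endianness:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the 16 bits stored at the lowest address of a 32-bit value. */
int16_t first_half(int32_t value)
{
    int16_t half;
    memcpy(&half, &value, sizeof half);    /* byte copy, no type-punned dereference */
    return half;
}

/* Hypothetical helper: build a 32-bit value from two adjacent 16-bit ones. */
int32_t join_halves(const int16_t halves[2])
{
    int32_t value;
    memcpy(&value, halves, sizeof value);  /* works whatever the alignment of halves */
    return value;
}
```

Compilers commonly turn a small fixed-size memcpy like this into a plain load, so the safe version need not be slower.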

Now, it gets interesting when we put together the strict aliasing rule with optimisations. The compiler is allowed by the standard to assume that dereferencing pointers to objects of different types will never refer to the same memory location (they will not alias each other).

That means this code:

        void *buf[4];
        int *i = (int *) buf;
        short *s = (short *) buf;

        *i = 42;
        s[0] = 0;
        s[1] = 1;

        printf("%d\n", *i);

is also undefined, because you broke the rule. The above can print three different things (or defrost your fridge):

    1 (big-endian architecture)
    65536 (little-endian architecture)
    42 (any architecture)

The reason why it can print 42 is because the compiler is allowed to assume that the short *s variable never aliases the int *i one. That means it knows *i == 42 and can optimise the short code out of existence. In fact, that's exactly what GCC 4.5 does and the disassembly of the above code confirms it.
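As a sketch of a well-defined alternative, the two halves can be stored with memcpy instead of through a short pointer. The function name is invented here, and the <stdint.h> types are used so that the sizes match the 32-bit int and 16-bit short assumed above. The compiler then has to honour the stores, so 42 is no longer a possible result:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical rewrite of the snippet above without type-punned pointers. */
int32_t store_halves(void)
{
    int32_t i = 42;
    const int16_t halves[2] = { 0, 1 };

    /* memcpy may legally overwrite the bytes of i, so the compiler
       cannot assume that i still holds 42 afterwards */
    memcpy(&i, halves, sizeof halves);
    return i;   /* 1 on big-endian, 65536 on little-endian */
}
```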

This also applies to the code:

        union {
            int i;
            short s;
        } u;
        u.i = 42;
        u.s = 1;

        printf("%d\n", u.i);

The behaviour above is undefined according to the C standard. It is, however, accepted by GCC (it prints 1 on x86) -- but not by other compilers. I refer you to the case of c1b067ea8169e1d37e2a120334406f1f115298bb. QMutexLocker had:

    union {
        QMutex *mtx;
        quintptr val;
    };

And we did:

            if ((val & quintptr(1u)) == quintptr(1u)) {
                val &= ~quintptr(1u);
                mtx->unlock();
            }

Which was meant to be read as "if the pointer address has the lowest bit set, unset it and call mtx->unlock()". However, this breaks the strict-aliasing rule and Sun CC generated bad -- but perfectly valid -- code which called QMutex::unlock() with the lowest bit still set. Of course, the code crashed.
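One way to avoid the union, sketched here in plain C and not taken from the actual Qt commit, is to keep only the integer and convert it back to a pointer when it is needed. All names below are hypothetical, and the sketch assumes the pointed-to object is aligned to at least two bytes so that the lowest bit of its address is free:

```c
#include <stdint.h>

/* Hypothetical tagged pointer: the lowest bit of the address is used as a flag. */
typedef struct {
    uintptr_t val;
} tagged_ptr;

tagged_ptr tagged_make(void *p, int flag)
{
    tagged_ptr t = { (uintptr_t)p | (flag ? (uintptr_t)1 : (uintptr_t)0) };
    return t;
}

int tagged_flag(tagged_ptr t)
{
    return (t.val & (uintptr_t)1) != 0;
}

/* The flag is masked off on the integer, then the integer is converted back. */
void *tagged_pointer(tagged_ptr t)
{
    return (void *)(t.val & ~(uintptr_t)1);
}
```

There is only one member, so there is no second view of the same bytes for the compiler to reason about.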

A much harder-to-spot case was 2c1b11f2192fd48da01a1093a7cb4a848de43c8a (task 247708, sorry, not imported into the new bugtracker), affecting QDataStream's byte-swapping code. It did:

        QDataStream &QDataStream::operator>>(qint16 &i)
        {
            // ...
            register uchar *p = (uchar *)(&i);
            char b[2];
            if (dev->read(b, 2) == 2) {
                *p++ = b[1];
                *p = b[0];
            }
            // ...
        }

Looks safe, right? Well, it wasn't, and for a while I suspected a compiler bug. What happened was that, due to Link Time Code Generation, MSVC inlined the operator>> above and removed the code that actually set the qint16 variable. See the C99 definition below to understand why I had my doubts.
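A sketch of an alternative that never takes the address of the destination is to assemble the value with shifts. The function name is made up for this example, it assumes the two bytes arrive in big-endian order, and it is not the change that was made in Qt:

```c
#include <stdint.h>

/* Hypothetical helper: decode a big-endian 16-bit value from two raw bytes. */
uint16_t load_be16(const unsigned char b[2])
{
    return (uint16_t)(((unsigned)b[0] << 8) | (unsigned)b[1]);
}
```

Shifts operate on values and not on the memory layout, so the result is the same whatever the endianness of the host.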

Anyway, the problem is that the optimisations done by the compiler obey the strict aliasing rule, which can produce unexpected results. Take the example from this blog:


The author had code he thought was valid and that had been working for a long time. An upgrade to GCC 4.4 broke the code, but that's because there was a violation of strict aliasing. The line

        tail = (Node *) &list;

created a type-punned variable and its dereferencing in

        tail->next = node;

was the violation.
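Assuming the code in question was appending to a singly linked list (the Node layout and the function names below are guesses for illustration, not the blog author's code), the usual way to get the same effect without a cast is to keep a pointer to the link that has to be updated:

```c
#include <stddef.h>

/* Hypothetical node type. */
typedef struct Node {
    struct Node *next;
    int value;
} Node;

typedef struct {
    Node *head;
    Node **tail;   /* points at head, or at the last node's next field */
} List;

void list_init(List *list)
{
    list->head = NULL;
    list->tail = &list->head;
}

void list_append(List *list, Node *node)
{
    node->next = NULL;
    *list->tail = node;          /* a Node * is written through a Node ** */
    list->tail = &node->next;
}
```

Every store goes through a pointer of the matching type, so the empty-list case needs neither a special branch nor a type-punned head.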

So you may ask why this rule is there at all. Well, the reason is that this rule is very useful for compiler optimisations, as it allows the compiler to take some liberties that it otherwise couldn't. Take this example from the first part of the blog What Every C Programmer Should Know About Undefined Behavior:

        float *P;
        void zero_array() {
            int i;
            for (i = 0; i < 10000; ++i)
                P[i] = 0.0f;
        }

The author of the blog tells you that

this rule allows clang to optimize this function [...] into "memset(P, 0, 40000)"

and that without such liberty

Clang is required to compile this loop into 10000 4-byte stores (which is several times slower)

A bit further down and in part 3, the author explains why that is. So think about it: without strict-aliasing, why must the compiler do 10000 4-byte stores instead of a simple 40000-byte memset? While you think about it, here's the link to part 2.

The reason is that the compiler cannot assume that the value of P (a pointer type) stays unchanged while it's storing values to each of P[i] (a float type). Someone could write the following code:

        int main() {
            P = (float*)&P;
            zero_array();
        }

In that case, the store to P[0] also changes the value of P, so the next operation (the storing at P[1]) must be computed at a different address.
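Even without the rule, the programmer could hand the compiler the same guarantee by copying the pointer into a local variable whose address is never taken. A minimal sketch (zero_array_local is a name invented here):

```c
float *P;

/* Hypothetical variant of zero_array(). */
void zero_array_local(void)
{
    float *p = P;   /* the address of p is never taken, so no store below can modify it */
    int i;
    for (i = 0; i < 10000; ++i)
        p[i] = 0.0f;
}
```

The loop can then be treated as one block of stores starting at p, whatever P itself happens to point to.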

With careful coding, it's possible to use type-punning and not break strict aliasing. However, it's very hard to do so and it's also hard not to throw the compiler into a fit. So, the rule of thumb is: if you type-pun (that is, if you cast to a different pointer type via C-cast or reinterpret_cast) you should think again. If you type-pun and dereference, think again twice.

C99 6.5, paragraph 7:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
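To make the list a little more concrete, here is a small sketch of two accesses that it permits. The function names are made up for this example:

```c
#include <stddef.h>

/* Allowed: unsigned int is the unsigned type corresponding to int. */
unsigned int read_as_unsigned(const int *p)
{
    return *(const unsigned int *)p;
}

/* Allowed: any object may be inspected through a character type. */
int is_all_zero_bytes(const void *obj, size_t size)
{
    const unsigned char *bytes = (const unsigned char *)obj;
    size_t k;
    for (k = 0; k < size; ++k)
        if (bytes[k] != 0)
            return 0;
    return 1;
}
```

Note that the character-type exception only works in one direction: a char buffer may not be read back through an int pointer.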