Moving code around

In the past I have blogged about some highly technical subjects, like Binary compatibility, Calling convention, Memory ordering semantics and the effect of one single volatile in optimisation. By the way, memory ordering semantics have been in the Qt Developer Days yearly "fact or crap" quiz twice in a row now.

Yesterday, a colleague in the office asked me to explain what relocations were and how they worked. I spent some hours explaining it to him, so I thought it would be a good idea to explain here too. And I'm not sure now, but I think another colleague had asked me to write about this before.

So, anyway, what are relocations? The verb to "relocate" is means to locate somewhere or, in the specific case of computer software, like I said in the title of this blog, to move code around. Another name, a bit more generic, is that it's a "fixup table", indicating where code needs to be fixed up.

I'm going to focus on Linux here because it's the easiest for me, but the principles apply everywhere, even if under slightly different semantics.

Compile-time relocations

There are two kinds of relocations that are relevant to us: the compile-time ones and the run-time (load-time) ones. Let's start with the simpler ones to understand.

If you write the following code in C or C++:

void *malloc(size_t n);

void *qMalloc(uint n)
return malloc(n);

It's clear that the malloc function isn't part of this compilation unit. So when the compiler compiles this file (and the assembler assembles the output), it cannot contain an actual definition of the malloc function. This symbol is going to come from the C library, which doesn't enter the picture until the linking stage. In fact, the linker has its name because of this main feature: it links the references to the definitions of the symbol.

The output of the compiler is the assembly code:

pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl 8(%ebp), %eax
movl %eax, (%esp)
call malloc

And after assembling, this is the contents of the .o (objdump -Cdr):

00000000 <qMalloc>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 18 sub $0x18,%esp
6: 8b 45 08 mov 0x8(%ebp),%eax
9: 89 04 24 mov %eax,(%esp)
c: e8 fc ff ff ff call d
d: R_386_PC32 malloc
11: c9 leave
12: c3 ret

What happened here? In the assembly source code as generated by the compiler, malloc appears as just any other symbol name. But when assembling it, the assembler couldn't find the definition of that symbol, so it had to leave behind a reference to the name. Now, the processor instruction encoding requires a value, not a name, so this note is left "out of band". That's the relocation, and you can see it in the disassembly from objdump.

Note the properties of the relocation: it contains the address it affects (addresss 0xd), a type (R_386_PC32) and a target address. Since relocations are supposed to modify processor binary code, they are processor specific -- the type here indicates the target's address should be replaced by the displacement from the Program Counter. If we try the same with ARM code (compiled with -O1 for brevity):

00000000 <qMalloc>:
0: e92d4008 push {r3, lr}
4: ebfffffe bl 0
4: R_ARM_CALL malloc
8: e8bd8008 pop {r3, pc}

We see now an ARM relocation that is specific for making function calls.

Run-time relocations

A bit of history

Run-time relocations are a bit harder to explain. It's simpler if I go back in time to an example of where they didn't exist. So let's go back to the time of DOS 1.0, when it could run only ".com" executables. Due to the 8086 16-bit segmented architecture, each segment can address only 64 kB of memory (you needed to use the segment registers to address the rest of the 1 MB of memory). The format of a .com executable was simple to the extreme: the operating system found a segment of memory that was free, loaded the entire executable starting at offset 0x100 in that segment and transferred control there. So the application started running at the first byte in the executable.

Since each segment could address only 64kB of memory and the entire program was loaded into one, effectively the entire program's code and data was confined to 64kB - 256 bytes. It couldn't be any larger.

That soon became a limitation that needed to be overcome. That's where the ".exe" files came into place, in the DOS MZ executable format. It lifted the limitation of 64kB by introducing relocations, which were also very simple compared to today's standards. The entire relocation table was a list of offsets in the executable's code and data whose values should be replaced. So the operating system would do what it did before and load the executable at the start of available memory. But then, before transferring control to the executable, it would go over this list and add the value of the segment where the program had been loaded. So if the value in the disk was "0x0000", in memory it would contain the value of the segment where the program had been loaded; if the value on disk were "0x0100", the value on memory would be the program's load address plus 256.

Today's relocations

Today's relocations are a bit more complex than that, especially since we several features that DOS didn't have: dynamic linking, virtual memory, and multiple processes. In modern multi-purpose operating systems, in a process's address space you cannot find anything else but that process. This is due to the virtual memory, which allows the operating system to map any virtual address to a different physical address.

With that in mind, static executables are quite dumb today. Since the operating system can load the program anywhere that it wants to, it does just that. The linker chooses a value at link time and it's not supposed to change. So no relocations are necessary.

But if you add the dynamic linking, things start to get complicated. And they'll get even more complicated afterwards. So let's start with our example above of qMalloc. If we compile it to an executable and dump again, we see:

080481a8 <qMalloc>:
80481a8: 55 push %ebp
80481a9: 89 e5 mov %esp,%ebp
80481ab: 83 ec 18 sub $0x18,%esp
80481ae: 8b 45 08 mov 0x8(%ebp),%eax
80481b1: 89 04 24 mov %eax,(%esp)
80481b4: e8 df ff ff ff call 8048198
80481b9: c9 leave
80481ba: c3 ret

The relocation we had "R_386_PC32 malloc" was replaced by an actual value: 0x8048198, which corresponds to the symbol malloc@plt. How does replacing one symbol for another even work here? Wouldn't it simply run into the same problem that we don't know what the address of this symbol is? Well, the trick here is that we do know the address of the malloc@plt symbol. It's not the actual malloc function, as you'll see below. We know its address because the symbol is in the executable, not in the library: the linker knew the exact address to place there.

Anyway, if we go look for that symbol, we see:

8048198: ff 25 68 92 04 08 jmp *0x8049268
804819e: 68 00 00 00 00 push $0x0
80481a3: e9 e0 ff ff ff jmp 8048188

Which reads "jump to the address pointed by 0x8049268" (never mind the two instructions afterwards, they're not relevant for the discussion). And if we look at the dynamic relocation table by running objdump -R, we see:

08049268 R_386_JUMP_SLOT malloc

The dynamic relocation table is just like the normal one, except it's to be treated by the dynamic linker or program loader. The relocation type above simply says that the loader should find the address of the malloc function and write it in full at that address in memory.

Looking at the ARM assembly, we see it did the same to our function:

0000844c <qMalloc>:
844c: e92d4008 push {r3, lr}
8450: ebffffc8 bl 8378 <_init+0x48>
8454: e8bd8008 pop {r3, pc}

Which calls the address 0x8378:

    8378:       e28fc600        add     ip, pc, #0, 12
837c: e28cca08 add ip, ip, #8, 20 ; 0x8000
8380: e5bcf2c8 ldr pc, [ip, #712]! ; 0x2c8

This one is a bit hard to read, but it's trying to load into the Program Counter the value of 0x8378+8+0x82c8 (the instruction at 0x8378 loads the value of PC and the ARM manual says it uses the actual value plus 8). And if we look at the relocation table, we see:

00010648 R_ARM_JUMP_SLOT malloc

Summarising, the linker replaced the direct call relocation by a call to an instruction or set of instructions that do an indirect jump. Why did it do that?

Well, consider that your average program is a bit longer than my source code at the beginning of the blog. It usually has more than a few calls to malloc. So instead of fixing up each and every point in the code where a call to malloc is made, the use of the indirect jump slot, or trampoline, makes it possible to fix only one address in memory. That's a load-time performance. Also remember that some architectures, like ARM or the 64-bit architectures, have a limited range of displacements a function call or jump can take, so you can't call very far.

This is the kind of relocation one finds in executables in Linux today when function calls are placed. When you're accessing variables defined in other libraries, it works slightly differently -- on some platforms it uses a very ugly kind of relocation that you really don't want to know about. The important thing is that this designed so that the binary code of an executable (the so-called "text") doesn't need any fixups. Instead, they are all concentrated at other portions of the process's image, which further improves the load performance (cache locality, fewer pages differ from the disk contents, etc.)

The catch here is that the code is "position dependent": it can only be loaded at one specfic address in memory, or it will fail miserably to run. In the next blog, I'll explore what "position independent code" means and how it's implemented. In the meantime, you may want to read up on these concepts: clean and dirty memory, pure and impure pages.

Blog Topics: