Moving code around

December 04, 2010 by Thiago Macieira | Comments

In the past I have blogged about some highly technical subjects, like Binary compatibility, Calling convention, Memory ordering semantics and the effect of one single volatile in optimisation. By the way, memory ordering semantics have been in the Qt Developer Days yearly "fact or crap" quiz twice in a row now.

Yesterday, a colleague in the office asked me to explain what relocations were and how they worked. I spent some hours explaining it to him, so I thought it would be a good idea to explain here too. And I'm not sure now, but I think another colleague had asked me to write about this before.

So, anyway, what are relocations? The verb to "relocate" is means to locate somewhere or, in the specific case of computer software, like I said in the title of this blog, to move code around. Another name, a bit more generic, is that it's a "fixup table", indicating where code needs to be fixed up.

I'm going to focus on Linux here because it's the easiest for me, but the principles apply everywhere, even if under slightly different semantics.

Compile-time relocations

There are two kinds of relocations that are relevant to us: the compile-time ones and the run-time (load-time) ones. Let's start with the simpler ones to understand.

If you write the following code in C or C++:

void *malloc(size_t n);
void *qMalloc(uint n)

{

    return malloc(n);

}

It's clear that the malloc function isn't part of this compilation unit. So when the compiler compiles this file (and the assembler assembles the output), it cannot contain an actual definition of the malloc function. This symbol is going to come from the C library, which doesn't enter the picture until the linking stage. In fact, the linker has its name because of this main feature: it links the references to the definitions of the symbol.

The output of the compiler is the assembly code:

qMalloc:

        pushl   %ebp

        movl    %esp, %ebp

        subl    $24, %esp

        movl    8(%ebp), %eax

        movl    %eax, (%esp)

        call    malloc

        leave

        ret

And after assembling, this is the contents of the .o (objdump -Cdr):

00000000 <qMalloc>:

   0:   55                      push   %ebp

   1:   89 e5                   mov    %esp,%ebp

   3:   83 ec 18                sub    $0x18,%esp

   6:   8b 45 08                mov    0x8(%ebp),%eax

   9:   89 04 24                mov    %eax,(%esp)

   c:   e8 fc ff ff ff          call   d 
  
 
 
 
 
    
  
  
  
  

                        d: R_386_PC32   malloc
    
  
  
  
  

  11:   c9                      leave
    
  
  
  
  

  12:   c3                      ret

What happened here? In the assembly source code as generated by the compiler, malloc appears as just any other symbol name. But when assembling it, the assembler couldn't find the definition of that symbol, so it had to leave behind a reference to the name. Now, the processor instruction encoding requires a value, not a name, so this note is left "out of band". That's the relocation, and you can see it in the disassembly from objdump.

Note the properties of the relocation: it contains the address it affects (addresss 0xd), a type (R_386_PC32) and a target address. Since relocations are supposed to modify processor binary code, they are processor specific -- the type here indicates the target's address should be replaced by the displacement from the Program Counter. If we try the same with ARM code (compiled with -O1 for brevity):

00000000 <qMalloc>:

   0:   e92d4008        push    {r3, lr}

   4:   ebfffffe        bl      0 
  
 
 
 
 
    
  
  
  
  

                        4: R_ARM_CALL   malloc
    
  
  
  
  

   8:   e8bd8008        pop     {r3, pc}

We see now an ARM relocation that is specific for making function calls.

Run-time relocations

A bit of history

Run-time relocations are a bit harder to explain. It's simpler if I go back in time to an example of where they didn't exist. So let's go back to the time of DOS 1.0, when it could run only ".com" executables. Due to the 8086 16-bit segmented architecture, each segment can address only 64 kB of memory (you needed to use the segment registers to address the rest of the 1 MB of memory). The format of a .com executable was simple to the extreme: the operating system found a segment of memory that was free, loaded the entire executable starting at offset 0x100 in that segment and transferred control there. So the application started running at the first byte in the executable.

Since each segment could address only 64kB of memory and the entire program was loaded into one, effectively the entire program's code and data was confined to 64kB - 256 bytes. It couldn't be any larger.

That soon became a limitation that needed to be overcome. That's where the ".exe" files came into place, in the DOS MZ executable format. It lifted the limitation of 64kB by introducing relocations, which were also very simple compared to today's standards. The entire relocation table was a list of offsets in the executable's code and data whose values should be replaced. So the operating system would do what it did before and load the executable at the start of available memory. But then, before transferring control to the executable, it would go over this list and add the value of the segment where the program had been loaded. So if the value in the disk was "0x0000", in memory it would contain the value of the segment where the program had been loaded; if the value on disk were "0x0100", the value on memory would be the program's load address plus 256.

Today's relocations

Today's relocations are a bit more complex than that, especially since we several features that DOS didn't have: dynamic linking, virtual memory, and multiple processes. In modern multi-purpose operating systems, in a process's address space you cannot find anything else but that process. This is due to the virtual memory, which allows the operating system to map any virtual address to a different physical address.

With that in mind, static executables are quite dumb today. Since the operating system can load the program anywhere that it wants to, it does just that. The linker chooses a value at link time and it's not supposed to change. So no relocations are necessary.

But if you add the dynamic linking, things start to get complicated. And they'll get even more complicated afterwards. So let's start with our example above of qMalloc. If we compile it to an executable and dump again, we see:

080481a8 <qMalloc>:

 80481a8:       55                      push   %ebp

 80481a9:       89 e5                   mov    %esp,%ebp

 80481ab:       83 ec 18                sub    $0x18,%esp

 80481ae:       8b 45 08                mov    0x8(%ebp),%eax

 80481b1:       89 04 24                mov    %eax,(%esp)

 80481b4:       e8 df ff ff ff          call   8048198 
  
 
 
 
 
    
  
  
  
  

 80481b9:       c9                      leave
    
  
  
  
  

 80481ba:       c3                      ret

The relocation we had "R_386_PC32 malloc" was replaced by an actual value: 0x8048198, which corresponds to the symbol malloc@plt. How does replacing one symbol for another even work here? Wouldn't it simply run into the same problem that we don't know what the address of this symbol is? Well, the trick here is that we do know the address of the malloc@plt symbol. It's not the actual malloc function, as you'll see below. We know its address because the symbol is in the executable, not in the library: the linker knew the exact address to place there.

Anyway, if we go look for that symbol, we see:

08048198 
  
 
 
 
 
  
  
  
  
    :
    
  
  
  
  

 8048198:       ff 25 68 92 04 08       jmp    *0x8049268
    
  
  
  
  

 804819e:       68 00 00 00 00          push   $0x0
    
  
  
  
  

 80481a3:       e9 e0 ff ff ff          jmp    8048188

Which reads "jump to the address pointed by 0x8049268" (never mind the two instructions afterwards, they're not relevant for the discussion). And if we look at the dynamic relocation table by running objdump -R, we see:

DYNAMIC RELOCATION RECORDS

OFFSET   TYPE              VALUE

08049268 R_386_JUMP_SLOT   malloc

The dynamic relocation table is just like the normal one, except it's to be treated by the dynamic linker or program loader. The relocation type above simply says that the loader should find the address of the malloc function and write it in full at that address in memory.

Looking at the ARM assembly, we see it did the same to our function:

0000844c <qMalloc>:

    844c:       e92d4008        push    {r3, lr}

    8450:       ebffffc8        bl      8378 <_init+0x48>

    8454:       e8bd8008        pop     {r3, pc}

Which calls the address 0x8378:

    8378:       e28fc600        add     ip, pc, #0, 12

    837c:       e28cca08        add     ip, ip, #8, 20  ; 0x8000

    8380:       e5bcf2c8        ldr     pc, [ip, #712]! ; 0x2c8

This one is a bit hard to read, but it's trying to load into the Program Counter the value of 0x8378+8+0x82c8 (the instruction at 0x8378 loads the value of PC and the ARM manual says it uses the actual value plus 8). And if we look at the relocation table, we see:

DYNAMIC RELOCATION RECORDS

OFFSET   TYPE              VALUE

00010648 R_ARM_JUMP_SLOT   malloc

Summarising, the linker replaced the direct call relocation by a call to an instruction or set of instructions that do an indirect jump. Why did it do that?

Well, consider that your average program is a bit longer than my source code at the beginning of the blog. It usually has more than a few calls to malloc. So instead of fixing up each and every point in the code where a call to malloc is made, the use of the indirect jump slot, or trampoline, makes it possible to fix only one address in memory. That's a load-time performance. Also remember that some architectures, like ARM or the 64-bit architectures, have a limited range of displacements a function call or jump can take, so you can't call very far.

This is the kind of relocation one finds in executables in Linux today when function calls are placed. When you're accessing variables defined in other libraries, it works slightly differently -- on some platforms it uses a very ugly kind of relocation that you really don't want to know about. The important thing is that this designed so that the binary code of an executable (the so-called "text") doesn't need any fixups. Instead, they are all concentrated at other portions of the process's image, which further improves the load performance (cache locality, fewer pages differ from the disk contents, etc.)

The catch here is that the code is "position dependent": it can only be loaded at one specfic address in memory, or it will fail miserably to run. In the next blog, I'll explore what "position independent code" means and how it's implemented. In the meantime, you may want to read up on these concepts: clean and dirty memory, pure and impure pages.

Development Framework & Tools

Qt Framework

Qt Development Tools

Qt Design Studio

Qt Quality Assurance

Qt Digital Ads

Qt Insight

Quality Assurance Tools

Squish

Coco

Test Center

Axivion Static Code Analysis

Axivion Architecture Verification

More

Qt 6

Licensing

Qt Features

Qt for Python

Industry & Platform Solutions

Industry

Automotive

Aviation & Aerospace

Industrial Vehicles

Micro-Mobility Interfaces

Consumer Electronics

Industrial Automation

Medical Devices

Platform

Desktop, Mobile & Web

Embedded Devices

MCU (Microcontrollers)

Cloud Solutions

More

Next-Gen UX

Limitless Scalability

Productivity

Our Ultimate Collection of Resources

Development Framework & Tools

Qt Resource Center

Qt Blog

Qt Success Stories

Qt Demos

Quality Assurance Tools

QA Resources

QA Blog

QA Success Stories

More

Live Events & Webinars

Documentation

Take Learning Qt to the Next Level

Learn with us

Qt Academy

Qt Educational License

Qt Documentation

Qt Forum

We're Here for You—Support and Services

Helpful Links

Contact Us

Qt Partners

Qt Support

Qt Customer Portal

Qt Customer Success

Qt Professional Services

Moving code around

Compile-time relocations

Run-time relocations

A bit of history

Today's relocations

Blog Topics:

Comments

Subscribe to our newsletter

Subscribe Newsletter

Try Qt 6.7 Now!

We're Hiring

Read Next

"Coding in the Dark" or what one can do in Qt in 10 minutes

Qt Creator 12: C++ Code Model Update

Qt/.NET — Hosting .NET code in a Qt application (1/3)