Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I have done some experiments in which I created a local variable of type pointer to function that points to printf. Then I called printf regularly and using that variable as following:

#include<stdio.h>
typedef int (*func)(const char*,...);

int main()
{
        func x=printf;
        printf("%p
", x);
        x("%p
", x);
        return 0;
}

I have compiled it and looked at the disassembly of main using gdb and got that:

   0x000000000000063a <+0>:     push   %rbp
   0x000000000000063b <+1>:     mov    %rsp,%rbp
   0x000000000000063e <+4>:     sub    $0x10,%rsp
   0x0000000000000642 <+8>:     mov    0x20098f(%rip),%rax        # 0x200fd8
   0x0000000000000649 <+15>:    mov    %rax,-0x8(%rbp)
   0x000000000000064d <+19>:    mov    -0x8(%rbp),%rax
   0x0000000000000651 <+23>:    mov    %rax,%rsi
   0x0000000000000654 <+26>:    lea    0xb9(%rip),%rdi        # 0x714
   0x000000000000065b <+33>:    mov    $0x0,%eax
   0x0000000000000660 <+38>:    callq  0x520 <printf@plt>
   0x0000000000000665 <+43>:    mov    -0x8(%rbp),%rax
   0x0000000000000669 <+47>:    mov    -0x8(%rbp),%rdx
   0x000000000000066d <+51>:    mov    %rax,%rsi
   0x0000000000000670 <+54>:    lea    0x9d(%rip),%rdi        # 0x714
   0x0000000000000677 <+61>:    mov    $0x0,%eax
   0x000000000000067c <+66>:    callq  *%rdx
   0x000000000000067e <+68>:    mov    $0x0,%eax
   0x0000000000000683 <+73>:    leaveq
   0x0000000000000684 <+74>:    retq

What is weird to me is that calling to printf directly uses the plt (as expected) but calling it using the local variable uses a whole different address (as you can see in line 4 of the assembly that the value stored in local variable x is not the address of the plt entry).

How can that be? Don't all the calls to functions undefined in the executable go first through the plt for better performance and for pic code?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
647 views
Welcome To Ask or Share your Answers For Others

1 Answer

(as you can see in line 4 of the assembly that the value stored in local variable x is not the address of the plt entry)

Huh? The value isn't visible in the disassembly, only the location it's loaded from. (In practice it's not loading a pointer to the PLT entry, but line 4 of the assembly doesn't tell you that1.) Use objdump -dR to see dynamic relocations.

That's a load from memory using a RIP-relative addressing mode. In this case it's loading a pointer to the real printf address in libc. That pointer is stored in the Global Offset Table (GOT).

To make this work, the printf symbol gets "early binding" instead of lazy dynamic linking, avoiding PLT overhead for later uses of that function pointer.

Footenote 1: Although maybe you were basing that reasoning on the fact that it's a load instead of a RIP-relative LEA. That pretty much does tell you it's not the PLT entry; part of the point of the PLT is to have an address that's a link-time constant for call rel32, which also enables LEA with a RIP+rel32 addressing mode. The compiler would have used that if it wanted the PLT address in a register.


BTW, the PLT stub itself also uses the GOT entry for its memory-indirect jump; for symbols that are only used as function call targets, the GOT entry holds a pointer back to the PLT stub, to the push / jmp instructions that invoke the lazy dynamic linker to resolve that PLT entry. i.e. to update the GOT entry.


Don't all the calls to functions undefined in the executable go first through the plt for better performance

No, the PLT costs runtime performance by adding an extra level of indirection to every call. gcc -fno-plt uses early binding instead waiting for the first call, so it can inline the indirect call through the GOT right into each call site.

The PLT exists to avoid runtime fixups of call rel32 offsets during dynamic linking. And on 64-bit systems, to allow reaching addresses that are more than 2GB away. And also to support symbol interposition. See https://www.macieira.org/blog/2012/01/sorry-state-of-dynamic-libraries-on-linux/ (written before -fno-plt existed; it's basically like one of the ideas he was suggesting).

The PLT's lazy binding can improve startup performance vs. early binding, but on modern systems where cache hits are very important, doing all the symbol-scanning stuff at once during startup is nice.

and for pic code?

Your code is PIC, or actually PIE (position-independent executable), which most distros configure GCC to do by default.

I expected x to point to the address of the PLT entry of printf

If you use -fno-pie, then the address of the PLT entry is a link-time constant, and at compile time the compiler doesn't know whether you're going to link libc statically or dynamically. So it uses mov $printf, %eax to get the address of a function-pointer into a register, and at link time that can only convert to mov $printf@plt, %eax.

See it on Godbolt. (The Godbolt default is -fno-pie, unlike on most current Linux distros.)

# gcc9.2 -O3 -fpie    for your first block
        movq    printf@GOTPCREL(%rip), %rbp
        leaq    .LC0(%rip), %rdi
        xorl    %eax, %eax
        movq    %rbp, %rsi        # saved for later in rbp
        call    printf@PLT

vs.

# gcc9.2 -O3 -fno-pie
        movl    $printf, %esi          # linker converts this symbol reference to printf@plt
        movl    $.LC0, %edi
        xorl    %eax, %eax
        call    printf                 # will convert at link-time to printf@plt
      # next use also just uses mov-immediate to rematerialize, instead of saving a load result in a register.

So a PIE executable actually has better efficiency for repeated-use of function pointers to functions in standard libraries: the pointer is the final address, not just the PLT entry.

-fno-plt -fno-pie works more like PIE mode for taking function pointers. Except it can still use $foo 32-bit immediates for the addresses of symbols in the same file, instead of a RIP-relative LEA.

# gcc9.2 -O3 -fno-plt -fno-pie
        movq    printf@GOTPCREL(%rip), %rbp    # saved for later in RBP
        movl    $.LC0, %edi
        xorl    %eax, %eax
        movq    %rbp, %rsi
        call    *printf@GOTPCREL(%rip)
  # pointers to static functions can use  mov $foo, %esi

It seems you need int foo(const char*,...) __attribute__((visibility("hidden"))); to tell the compiler it definitely doesn't need to go through the GOT for this symbol, with pie or -fno-plt.

Leaving it until link-time for the linker to convert symbol to symbol@plt if necessary allows the compiler to always use efficient 32-bit absolute immediates or RIP-relative addressing and only end up with PLT indirection for functions that turn out to be in a shared library. But then you end up with pointers to PLT entries, instead of pointers to the final address.


If you were using Intel syntax, it would be mov rbp, QWORD PTR printf@GOTPCREL[rip] in GCC's output for this, if you look at asm instead of disassembly.

Looking at compiler output gives you significantly more information that just numeric offsets from RIP in plain objdump output. -r to show relocation symbols helps some, but compiler output is generally better. (Except you don't see that printf gets rewritten to printf@plt)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

548k questions

547k answers

4 comments

86.3k users

...