$ gcc -ggdb program.c -o program
$ gdb ./program
The target architecture is assumed to be i386:x86-64:intel
Reading symbols from ./program...done.
让我们在 gdb 中执行 info files 这个指令。这个指令会打印关于被不同段占据的内存和调试目标的信息。
(gdb) info files
Symbols from "/home/alex/program".
Local exec file:
`/home/alex/program', file type elf64-x86-64.
Entry point: 0x400430
0x0000000000400238 - 0x0000000000400254 is .interp
0x0000000000400254 - 0x0000000000400274 is .note.ABI-tag
0x0000000000400274 - 0x0000000000400298 is .note.gnu.build-id
0x0000000000400298 - 0x00000000004002b4 is .gnu.hash
0x00000000004002b8 - 0x0000000000400318 is .dynsym
0x0000000000400318 - 0x0000000000400357 is .dynstr
0x0000000000400358 - 0x0000000000400360 is .gnu.version
0x0000000000400360 - 0x0000000000400380 is .gnu.version_r
0x0000000000400380 - 0x0000000000400398 is .rela.dyn
0x0000000000400398 - 0x00000000004003c8 is .rela.plt
0x00000000004003c8 - 0x00000000004003e2 is .init
0x00000000004003f0 - 0x0000000000400420 is .plt
0x0000000000400420 - 0x0000000000400428 is .plt.got
0x0000000000400430 - 0x00000000004005e2 is .text
0x00000000004005e4 - 0x00000000004005ed is .fini
0x00000000004005f0 - 0x0000000000400610 is .rodata
0x0000000000400610 - 0x0000000000400644 is .eh_frame_hdr
0x0000000000400648 - 0x000000000040073c is .eh_frame
0x0000000000600e10 - 0x0000000000600e18 is .init_array
0x0000000000600e18 - 0x0000000000600e20 is .fini_array
0x0000000000600e20 - 0x0000000000600e28 is .jcr
0x0000000000600e28 - 0x0000000000600ff8 is .dynamic
0x0000000000600ff8 - 0x0000000000601000 is .got
0x0000000000601000 - 0x0000000000601028 is .got.plt
0x0000000000601028 - 0x0000000000601034 is .data
0x0000000000601034 - 0x0000000000601038 is .bss
(gdb) break *0x400430
Breakpoint 1 at 0x400430
(gdb) run
Starting program: /home/alex/program
Breakpoint 1, 0x0000000000400430 in _start ()
有趣。我们并没有看见 main 函数的执行,但是我们看见另外一个函数被调用。这个函数是 _start 而且根据调试器展现给我们看的,它是我们程序的真正入口。那么,这个函数是从哪里来的,又是谁调用了这个 main 函数,什么时候调用的。我会在后续部分尝试回答这些问题。
内核如何运行新程序
首先,让我们来看一下下面这个简单的 C 程序:
// program.c
#include <stdlib.h>
#include <stdio.h>
static int x = 1;
int y = 2;
int main(int argc, char *argv[]) {
int z = 3;
printf("x + y + z = %d\n", x + y + z);
return EXIT_SUCCESS;
}
$ gcc -nostdlib program.c -o program
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 000000000040017c
/tmp/cc02msGW.o: In function `main':
/home/alex/program.c:11: undefined reference to `printf'
collect2: error: ld returned 1 exit status
After the dynamic linker has built the process image and performed the relocations, each shared object gets the opportunity to execute some initialization code. ... Similarly, shared objects may have termination functions, which are executed with the atexit (BA_OS) mechanism after the base process begins its termination sequence.
It takes address of the main function of a program, argc and argv. init and fini functions are constructor and destructor of the program. The rtld_fini is termination function which will be called after the program will be exited to terminate and free dynamic section. The last parameter of the __libc_start_main is the pointer to the stack of the program. Before we can call the __libc_start_main function, all of these parameters must be prepared and passed to it. Let's return to the sysdeps/x86_64/start.S assembly file and continue to see what happens before the __libc_start_main function will be called from there.
该函数以程序 main 函数的地址,argc 和 argv 作为输入。init 和 fini 函数分别是程序的构造函数和析构函数。rtld_fini 是当程序退出时调用的终止函数,用来终止以及释放动态段。__libc_start_main 函数的最后一个参数是一个指向程序栈的指针。在我们调用 __libc_start_main 函数之前,所有的参数都要被准备好,并且传递给它。让我们返回 sysdeps/x86_64/start.S 这个文件,继续看在 __libc_start_main 被调用之前发生了什么。
$ gcc -nostdlib /lib64/crt1.o -lc -ggdb program.c -o program
/lib64/crt1.o: In function `_start':
(.text+0x12): undefined reference to `__libc_csu_fini'
/lib64/crt1.o: In function `_start':
(.text+0x19): undefined reference to `__libc_csu_init'
collect2: error: ld returned 1 exit status
After the dynamic linker has built the process image and performed the relocations, each shared object gets the opportunity to execute some initialization code. ... Similarly, shared objects may have termination functions, which are executed with the atexit (BA_OS) mechanism after the base process begins its termination sequence.
所以链接器除了一般的段,如 .text, .data 之外创建了两个特殊的段:
.init
.fini
We can find it with readelf util:
我们可以通过 readelf 工具找到它们:
$ readelf -e test | grep init
[11] .init PROGBITS 00000000004003c8 000003c8
$ readelf -e test | grep fini
[15] .fini PROGBITS 0000000000400504 00000504
where the PREINIT_FUNCTION is the __gmon_start__ which does setup for profiling. You may note that we have no return instruction in the sysdeps/x86_64/crti.S. Actually that's why we got segmentation fault. Prolog of _init and _fini is placed in the sysdeps/x86_64/crtn.S assembly file: