I stumbled upon Patrick Horgan’s famous post on how programs start in Linux here.
I tried to read and understand the article, but it was a bit too hard for me.
So I decided to write a prequel on inspecting the binary of an empty main function.
I used the exact same code that Patrick used for his blog post.
Compile the code below with gcc.
int main(){
}
Use the -nostdlib flag, or else you’ll get a whole bunch of startup code that has nothing to do with what you wrote inside the main function, which makes the disassembly much harder to examine.
gcc prog1.c -nostdlib -o prog1
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000001000
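The linker, ld, warns that it couldn’t find the _start symbol.
By default ld uses _start as the program’s entry point, and when it can’t find one it falls back to the start of the .text section, which is 0x1000 here.
Normally, whether we link dynamically or statically, gcc adds glibc’s startup code (traditionally called crt0) at link time, and that code is what defines _start; -nostdlib leaves it out.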
Stack Overflow and a Fedora discussion both have good explanations of this.
Let’s use objdump to have a look at the disassembly and the binary.
You can see that there’s a .text section and that the main function is inside it.
objdump -d -M intel prog1
prog1: file format elf64-x86-64
Disassembly of section .text:
0000000000001000 <main>:
1000: f3 0f 1e fa endbr64
1004: 55 push rbp
1005: 48 89 e5 mov rbp,rsp
1008: b8 00 00 00 00 mov eax,0x0
100d: 5d pop rbp
100e: c3 ret
Next to each address in the main function you can see hexadecimal numbers like f3 0f 1e fa.
These bytes are the machine code, the actual values that the CPU executes. Assembly mnemonics such as endbr64 aren’t what gets executed; they are a human-readable name for those bytes.
Then you might wonder why we need assembly instructions at all.
Well, to humans f3 0f 1e fa doesn’t really mean anything. It’s just a bunch of hexadecimal numbers.
So we needed a mapping that spares us from writing raw hexadecimal numbers but still produces programs that the CPU understands.
That’s why pioneers in the past created assembly, so that we can refer to those byte sequences by readable instruction names.
Without assembly we would still be writing programs solely in raw numbers, which would be excruciating.
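To make the "mapping" idea concrete, here is a minimal sketch of a lookup table that turns the bytes from the objdump listing back into mnemonics. It is only a toy: the names encoding, lookup and count_instructions are made up for this post, and the table knows nothing beyond the six instructions in the listing. A real disassembler decodes prefixes, opcodes and operands instead of matching whole byte strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the toy table: a byte pattern and the mnemonic that
   objdump printed for it. (Hypothetical type, for illustration only.) */
struct encoding {
    unsigned char bytes[5];
    size_t len;
    const char *mnemonic;
};

static const struct encoding table[] = {
    {{0xf3, 0x0f, 0x1e, 0xfa},       4, "endbr64"},
    {{0x55},                         1, "push rbp"},
    {{0x48, 0x89, 0xe5},             3, "mov rbp,rsp"},
    {{0xb8, 0x00, 0x00, 0x00, 0x00}, 5, "mov eax,0x0"},
    {{0x5d},                         1, "pop rbp"},
    {{0xc3},                         1, "ret"},
};

/* The machine code of main, copied from the objdump output. */
const unsigned char main_code[] = {
    0xf3, 0x0f, 0x1e, 0xfa, 0x55, 0x48, 0x89, 0xe5,
    0xb8, 0x00, 0x00, 0x00, 0x00, 0x5d, 0xc3
};
const size_t main_code_len = sizeof main_code;

/* Returns the mnemonic of the instruction starting at code[0] and stores
   its length in *len, or returns NULL if the table has no match. */
const char *lookup(const unsigned char *code, size_t avail, size_t *len)
{
    assert(code != NULL && len != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i].len <= avail &&
            memcmp(code, table[i].bytes, table[i].len) == 0) {
            *len = table[i].len;
            return table[i].mnemonic;
        }
    }
    return NULL;
}

/* Walks the bytes one instruction at a time and counts the instructions;
   returns -1 as soon as it meets bytes the table does not know. */
int count_instructions(const unsigned char *code, size_t size)
{
    int count = 0;
    size_t offset = 0;
    while (offset < size) {
        size_t len = 0;
        if (lookup(code + offset, size - offset, &len) == NULL)
            return -1;
        offset += len;
        count++;
    }
    return count;
}
```

Calling lookup(main_code, main_code_len, &len) gives back "endbr64" with len set to 4, which is exactly the first line of the disassembly read in the other direction.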
Another thing you can notice from the objdump output is that the main function is in the .text section.
Now let’s have a look at the disassembly.
The main function starts off with an endbr64 instruction.
endbr64 is one of the instructions introduced by Intel as part of CET (Control-flow Enforcement Technology).
0000000000001000 <main>:
1000: f3 0f 1e fa endbr64
1004: 55 push rbp
1005: 48 89 e5 mov rbp,rsp
1008: b8 00 00 00 00 mov eax,0x0
100d: 5d pop rbp
100e: c3 ret
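There’s an explanation of what endbr64 is on Stack Overflow, but it was very difficult, for me at least.
Intel created CET mainly to prevent control-flow hijacking attacks such as return-oriented programming, aka ROP.
CET has two parts: a shadow stack, which targets ROP, and indirect branch tracking, which targets the related jump- and call-oriented attacks. endbr64 belongs to the latter and marks a valid target for an indirect call or jump.
According to the gcc documentation, the -fcf-protection flag enables this instrumentation.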

If you run dpkg-buildflags you can check which gcc flags Ubuntu uses to build the executables in its distribution.
dpkg-buildflags --get CFLAGS
-g -O2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -ffile-prefix-map=/home/hwkim301=. -flto=auto -ffat-lto-objects -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection
At the very end you can spot the -fcf-protection flag.
Not only are distributed tools like GNU coreutils (ls, cat…) compiled with -fcf-protection; the ELFs we create for fun will also get an endbr64 instruction at the beginning of a function unless we disable -fcf-protection.
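If you want to check from inside a C file whether the flag was active, the gcc documentation describes a __CET__ preprocessor macro that is defined when -fcf-protection is used, with bit 0 set for branch protection and bit 1 for return protection. The sketch below relies on that; the function names are my own, and on compilers or targets that don’t define the macro it simply reports 0.

```c
/* Returns the value of the __CET__ macro, or 0 when the compiler did not
   define it (for example with -fcf-protection=none). */
int cet_flags(void)
{
#ifdef __CET__
    return __CET__;
#else
    return 0;
#endif
}

/* Bit 0: indirect branch tracking, the part that emits endbr64. */
int built_with_branch_protection(void)
{
    return (cet_flags() & 1) != 0;
}

/* Bit 1: return protection (shadow stack). */
int built_with_return_protection(void)
{
    return (cet_flags() & 2) != 0;
}
```

Compiling the same file once with -fcf-protection=full and once with -fcf-protection=none and comparing what cet_flags() returns is a quick way to see the flag take effect without reading the disassembly.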
One instruction done, only a couple more to go.
Then comes the push rbp instruction.
1004: 55 push rbp
1005: 48 89 e5 mov rbp,rsp
1008: b8 00 00 00 00 mov eax,0x0
100d: 5d pop rbp
100e: c3 ret
From here on, the rest of the assembly is the function prologue and epilogue.
A function prologue sets up a stack frame at the start of the called function.
The base pointer, or frame pointer, points to the bottom (higher address) of the current stack frame, and the stack pointer always points to the top of the stack (lower address).
Counter-intuitively, the stack grows from higher addresses to lower addresses.
To get a better grasp of the stack I recommend reading Eli Bendersky’s posts here and here.
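You can also watch the stack grow downwards from C. This is a rough sketch that assumes gcc or clang, because it uses the __builtin_frame_address builtin and the noinline attribute; the function names are made up. It compares the frame address of a function with the frame address of a function it calls.

```c
#include <stdint.h>

/* noinline keeps the compiler from merging the two functions into a
   single stack frame. */
__attribute__((noinline))
uintptr_t callee_frame(void)
{
    /* Frame address of this function (rbp on x86-64). */
    return (uintptr_t)__builtin_frame_address(0);
}

/* Returns 1 if the called function's frame sits at a lower address
   than the frame of the function that called it. */
__attribute__((noinline))
int callee_frame_is_lower(void)
{
    uintptr_t caller = (uintptr_t)__builtin_frame_address(0);
    uintptr_t callee = callee_frame();
    return callee < caller;
}
```

On x86-64 Linux I would expect callee_frame_is_lower() to return 1, since every nested call pushes its frame below the caller’s.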
Another important point is that whenever the CPU executes a push or pop instruction, the stack pointer changes immediately.
For example, the function prologue starts off with a push rbp, and although that instruction never mentions the stack pointer explicitly, the stack pointer decreases by 4 bytes on 32-bit x86 or by 8 bytes on amd64.
In most cases it will be 8 bytes, because we usually use amd64 instead of x86.
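Here is a toy model of what push and pop do to the stack pointer on amd64. It is a sketch, not how the CPU is implemented: toy_stack, toy_push and toy_pop are hypothetical names, the "memory" is a 64-byte array, and rsp is an offset into that array instead of a real address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

/* mem stands in for stack memory; rsp is an offset into it and starts
   at STACK_BYTES, the highest "address". */
struct toy_stack {
    unsigned char mem[STACK_BYTES];
    size_t rsp;
};

void toy_init(struct toy_stack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->rsp = STACK_BYTES;
}

/* push: move rsp down by 8 bytes first, then store the value there. */
void toy_push(struct toy_stack *s, uint64_t value)
{
    assert(s->rsp >= sizeof value);              /* toy overflow check */
    s->rsp -= sizeof value;
    memcpy(&s->mem[s->rsp], &value, sizeof value);
}

/* pop: load the value rsp points to, then move rsp up by 8 bytes. */
uint64_t toy_pop(struct toy_stack *s)
{
    uint64_t value;
    assert(s->rsp + sizeof value <= STACK_BYTES); /* toy underflow check */
    memcpy(&value, &s->mem[s->rsp], sizeof value);
    s->rsp += sizeof value;
    return value;
}
```

After toy_init, one toy_push moves rsp from 64 down to 56, and the matching toy_pop returns the same value and moves rsp back up to 64.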
Since there is only one base pointer register in amd64 and every function uses it for its own stack frame, a function that gets called needs to save the previous base pointer so that it doesn’t corrupt the caller’s stack frame while it runs.
According to the Intel manual, Volume 1, Section 3.4.1.1, there’s only one base pointer for x86 and amd64.

The x86-64 ABI states that the %rbp register belongs to the calling function.
Registers %rbp, %rbx and %r12 through %r15 “belong” to the calling function and the called function is required to preserve their values.
So since there’s only one base pointer and every function uses it, the called function needs to save the caller’s base pointer on the stack before setting up its own frame.
That’s why the function prologue starts off with a push rbp instruction, so that it doesn’t corrupt the caller’s stack frame.
Now that the caller’s base pointer is on the stack, the function can use the base pointer freely for its own new stack frame.
It now sets the current stack pointer as the anchor, or start, of the new frame via mov rbp, rsp.
Then it would normally reserve some space for the frame’s local variables with a sub rsp, N instruction, although that isn’t in the disassembly for my code.
Then comes the mov eax,0x0 instruction.
The main function in C or C++ returns an int, so even though we don’t always explicitly write return 0, the compiler adds it for us when execution reaches the end of main.
Under the x86-64 calling convention the eax (or rax) register holds the return value, so this instruction sets the return value of the main function to 0.
From here on is the function epilogue.
Normally there’s a mov rsp, rbp so that the function releases the stack space it used.
However, since the empty main function didn’t use any local variables whatsoever, that instruction isn’t included.
Then pop rbp gets executed. It loads the value the stack pointer points to, the saved base pointer, back into the one and only base pointer rbp, and increases the stack pointer by 8 bytes.
Finally, the ret instruction effectively does a pop rip: it loads what the stack pointer now points to, the return address, into the instruction pointer, and the program continues executing from there.
Wait, I never mentioned the return address. Where did it come from?
Well, when one function calls another, the compiler emits a call instruction, and when the CPU executes it the return address is pushed onto the stack right before the called function’s prologue runs.

Aha, so that’s what the ret instruction pops: it behaves like a pop rip, although no such instruction actually exists.
ret takes the top value of the stack, where the return address is stored, and puts it into the instruction pointer.
Finally, execution continues at that return address.
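To tie the whole walk-through together, here is a small sketch that replays a call into the empty main on a toy CPU model and tracks rsp, rbp, rax and rip. Everything in it is an assumption made for illustration: the struct cpu type, the helper names, the 16-slot stack whose slot i lives at "address" i * 8, and whatever return address you pass in. endbr64 is treated as having no effect on registers or the stack.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS 16   /* toy stack of sixteen 8-byte slots */

struct cpu {
    uint64_t rsp, rbp, rax, rip;
    uint64_t stack[SLOTS];   /* stack[i] lives at "address" i * 8 */
};

/* push: decrease rsp by 8, then store the value where rsp points. */
static void push(struct cpu *c, uint64_t value)
{
    assert(c->rsp >= 8 && c->rsp <= SLOTS * 8);
    c->rsp -= 8;
    c->stack[c->rsp / 8] = value;
}

/* pop: load the value rsp points to, then increase rsp by 8. */
static uint64_t pop(struct cpu *c)
{
    assert(c->rsp / 8 < SLOTS);
    uint64_t value = c->stack[c->rsp / 8];
    c->rsp += 8;
    return value;
}

/* Replays "call main" followed by the six instructions of main. */
void call_empty_main(struct cpu *c, uint64_t return_address)
{
    push(c, return_address);   /* call: push the return address ...    */
    c->rip = 0x1000;           /* ... and jump to main                  */

    /* endbr64: nothing to model here */
    push(c, c->rbp);           /* push rbp:    save the caller's rbp    */
    c->rbp = c->rsp;           /* mov rbp,rsp: start the new frame      */
    c->rax = 0;                /* mov eax,0x0: return value 0           */
    c->rbp = pop(c);           /* pop rbp:     restore the caller's rbp */
    c->rip = pop(c);           /* ret:         pop the return address   */
}
```

If you start with rsp at 128 and rbp at 120 and call call_empty_main with a made-up return address, rsp and rbp end up exactly where they started, rax is 0 and rip holds the return address, which is the whole point of a matching prologue and epilogue.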