
Jan 26, 2018

Reverse Engineering

Virtual memory of a program

Each program is laid out in virtual memory, which the kernel maps to physical memory. How much virtual memory a program gets depends on factors such as the architecture, the operating system, and configured resource limits.

0x00        ______________________
            |
            |       ELF HEADER
            |
            |---------------------
            |
            |       TEXT # executable instructions, read-only
            |
            |---------------------
            |
            |       RO DATA # read-only constants, for example string literals
            |       DATA # static and global initialized variables
            |
            |---------------------
            |
            |       BSS # static and global uninitialized variables
            | 
0xD80000    |---------------------
            |
            |       HEAP # allocated by malloc
            | 
            |
            | 
            |       SHARED LIBRARIES # for example libc
            |
            |
            |           ^
            |           |
            |       STACK # frames, with function variables
0x7FFFFFFF  |_____________________
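
To compare the diagram above with a real process, the mappings of a running program can be inspected on Linux through /proc/<pid>/maps. The following is a minimal sketch, assuming a Linux system; the names Region, parse_maps_line and current_regions are made up for this example and are not part of any tool mentioned here.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int   # first address of the mapping
    end: int     # first address past the mapping
    perms: str   # e.g. "r-xp" for executable code
    name: str    # file path, "[heap]", "[stack]", or "" if anonymous


def parse_maps_line(line: str) -> Region:
    """Parse one line of /proc/<pid>/maps into a Region (hypothetical helper)."""
    fields = line.split(maxsplit=5)
    start_hex, end_hex = fields[0].split("-")
    # The sixth field is missing for anonymous mappings.
    name = fields[5].strip() if len(fields) > 5 else ""
    return Region(int(start_hex, 16), int(end_hex, 16), fields[1], name)


def current_regions(path: str = "/proc/self/maps") -> list[Region]:
    """Return the mappings of the running process (Linux only)."""
    with open(path) as maps:
        return [parse_maps_line(line) for line in maps if line.strip()]


if __name__ == "__main__":
    for region in current_regions():
        print(f"{region.start:#018x}-{region.end:#018x} {region.perms} {region.name}")
```

On a typical run the binary's own segments should appear first, followed by [heap], the shared libraries, and [stack] near the top of the address range, although the exact addresses differ per run when address space layout randomization is enabled.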

The stack

Every function call pushes a stack frame onto the stack. The frame holds data for that call, for example the function's local variables. Next to the local variables, a return address is also stored, so the CPU knows where to continue once everything inside the function has executed. To do this efficiently, the following steps happen:

Before we make a stack frame for the new function, we need to store a return address so the program can continue afterwards. So when calling a function, we first store the address of the next instruction on the stack and then jump to the function's address.

push rip+len(instr)
jmp <address>

Next comes the function prologue, which sets up the stack frame and allocates space for the function's local variables:

push rbp 
mov rbp, rsp
sub rsp, 0x20 ; allocating 32 bytes

Once everything inside the function is done, we need to get back to the calling function (often main). The return address into the caller was stored on the stack when we called our function (push rip+len(instr)), which stores the address of the next instruction on the stack.

The last step after the function body has executed is tearing down the stack frame, so that we can return to the address that was stored before the frame was set up. This is called the function epilogue:

mov rsp, rbp
pop rbp
ret ; pop the return address into rip
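
The call, prologue, and epilogue steps can be traced with a small toy model. This is only a sketch: TinyCpu is a made-up class, the starting values of rsp and rbp are arbitrary, and the default instruction length of 5 bytes assumes a near call with a 32-bit relative target.

```python
class TinyCpu:
    """Toy model of rip/rsp/rbp and a downward-growing stack of 8-byte slots."""

    def __init__(self, rip: int, rsp: int = 0x7FFF0000, rbp: int = 0x7FFF0100):
        self.rip, self.rsp, self.rbp = rip, rsp, rbp
        self.memory: dict[int, int] = {}   # address -> stored 64-bit value

    def push(self, value: int) -> None:
        self.rsp -= 8                      # the stack grows toward lower addresses
        self.memory[self.rsp] = value

    def pop(self) -> int:
        value = self.memory[self.rsp]
        self.rsp += 8
        return value

    def call(self, target: int, instr_len: int = 5) -> None:
        self.push(self.rip + instr_len)    # push rip+len(instr)
        self.rip = target                  # jmp <address>

    def prologue(self, frame_size: int) -> None:
        self.push(self.rbp)                # push rbp
        self.rbp = self.rsp                # mov rbp, rsp
        self.rsp -= frame_size             # sub rsp, frame_size

    def epilogue(self) -> None:
        self.rsp = self.rbp                # mov rsp, rbp
        self.rbp = self.pop()              # pop rbp

    def ret(self) -> None:
        self.rip = self.pop()              # pop the return address into rip


if __name__ == "__main__":
    cpu = TinyCpu(rip=0x401000)
    cpu.call(0x401200)
    cpu.prologue(0x20)
    cpu.epilogue()
    cpu.ret()
    print(hex(cpu.rip), hex(cpu.rsp), hex(cpu.rbp))
```

After the final ret, rsp and rbp hold their original values again and rip points at the instruction following the call, which is what the prologue and epilogue pair is meant to guarantee.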

Registers

Register lengths

Each column is one hex digit (4 bits) of the 64-bit register, most significant digit first.

    0 1 2 3 4 5 6 7 8 9 A B C D E F

rax X X X X X X X X X X X X X X X X
eax                 X X X X X X X X
ax                          X X X X
ah                          X X
al                              X X
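
The same relationship can be expressed with masks and shifts. A minimal sketch, where sub_registers is a made-up helper name and the sample value is arbitrary:

```python
def sub_registers(rax: int) -> dict[str, int]:
    """Return the value seen through each narrower name of a 64-bit rax value."""
    rax &= 0xFFFFFFFFFFFFFFFF          # keep only 64 bits
    return {
        "rax": rax,
        "eax": rax & 0xFFFFFFFF,       # low 32 bits
        "ax": rax & 0xFFFF,            # low 16 bits
        "ah": (rax >> 8) & 0xFF,       # bits 8..15
        "al": rax & 0xFF,              # bits 0..7
    }


if __name__ == "__main__":
    for name, value in sub_registers(0x1122334455667788).items():
        print(f"{name:>3} = {value:#x}")
```

This only models reading a register. Writing behaves differently: on x86-64, a write to eax zeroes the upper 32 bits of rax, while writes to ax, ah, or al leave the remaining bits untouched.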

Calls

A call instruction effectively does the following:

push rip+len(instr)
jmp <address>

System call (x86-64 Linux):

rax # syscall number
rdi # arg0
rsi # arg1
rdx # arg2
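
The register convention above can be tried out without writing assembly by going through libc's syscall() wrapper, which on x86-64 Linux loads the number into rax and the arguments into rdi, rsi, and rdx before executing the syscall instruction. This is a sketch under assumptions: it only runs the call on x86-64 Linux, where getpid has number 39, and raw_syscall and getpid_via_syscall are made-up helper names.

```python
import ctypes
import os
import platform
import sys

SYS_GETPID = 39  # getpid on x86-64 Linux; other architectures use other numbers


def raw_syscall(number: int, arg0: int = 0, arg1: int = 0, arg2: int = 0) -> int:
    """Invoke a system call through libc's syscall() wrapper (hypothetical helper)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    return libc.syscall(
        ctypes.c_long(number),   # ends up in rax
        ctypes.c_long(arg0),     # ends up in rdi
        ctypes.c_long(arg1),     # ends up in rsi
        ctypes.c_long(arg2),     # ends up in rdx
    )


def getpid_via_syscall():
    """Return the PID from a raw getpid syscall, or None when not on x86-64 Linux."""
    if not sys.platform.startswith("linux") or platform.machine() != "x86_64":
        return None
    return raw_syscall(SYS_GETPID)


if __name__ == "__main__":
    # Both values should match on x86-64 Linux.
    print(getpid_via_syscall(), os.getpid())
```

getpid takes no arguments, so the three zeros passed along are simply ignored by the kernel; a call such as write would use all three registers.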

Radare2

$ r2 -d a.out

Objdump

$ objdump -d ./a.out