+++
title = "Understanding ELFs, part 3"
date = 2025-01-30
authors = ["InnocentZero"]
+++

## On relocations, loading binaries, and more

We need relocations because of one simple fact: shared libraries exist.

A natural question is why we need shared libraries at all. They avoid duplicating the same pages
in memory, which was critical in the days when memory was scarce. They also separate the library
from the binary: the library can be updated without rebuilding the binary.

Sharing a library this way is made possible by _relocation sections_. These contain the
information needed to relocate a symbol within the binary's context. A relocation section usually
links (through its `sh_info` field) to the section where the relocations are going to be applied.
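
To make the contents of a relocation section more concrete, here is a minimal sketch that decodes
one `Elf64_Rela` entry, the entry format used on x86-64. The sample entry is made up for
illustration (the offset `0x4018` and symbol index `3` are assumptions, not values from a real
binary).

```python
import struct

# Elf64_Rela layout: r_offset (u64), r_info (u64), r_addend (i64), little-endian.
RELA_FORMAT = "<QQq"

R_X86_64_JUMP_SLOT = 7  # relocation type used for function slots in the GOT


def parse_rela(entry: bytes) -> dict:
    """Split one 24-byte Elf64_Rela entry into its fields."""
    r_offset, r_info, r_addend = struct.unpack(RELA_FORMAT, entry)
    return {
        "offset": r_offset,              # where the patch is applied
        "symbol_index": r_info >> 32,    # ELF64_R_SYM: index into the symbol table
        "type": r_info & 0xFFFFFFFF,     # ELF64_R_TYPE: how to compute the patch
        "addend": r_addend,              # constant added to the computed value
    }


# Made-up entry: patch address 0x4018 with the address of symbol #3.
sample = struct.pack(RELA_FORMAT, 0x4018, (3 << 32) | R_X86_64_JUMP_SLOT, 0)
parsed = parse_rela(sample)
print(parsed)
```

On a real binary, `readelf -r example` prints the same fields for every entry.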

There are two ways in which object files may be linked: statically and dynamically.

Static linking is fairly straightforward: the linker takes in all the object files and archive
files (such as `libc.a`) and creates a single self-contained binary containing all the required
functionality. This happens at the end of the build, as the final step after compilation.

Dynamic linking is a more involved process. It defers the linking step from build time to
runtime. The binary records its choice of runtime linker (also referred to as an _interpreter_),
the dynamic symbols it needs, and how to obtain them.


## Loading an ELF into memory

The system first executes the file's "interpreter" before handing over execution to the binary.
The interpreter's path is read from the `PT_INTERP` segment, which holds the contents of the
`.interp` section. It can be printed using `readelf -p .interp example`.

```
$ readelf -p .interp example
String dump of section '.interp':
[ 0] /lib64/ld-linux-x86-64.so.2
```
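
The same lookup can be sketched by hand: walk the program headers and print the contents of the
`PT_INTERP` segment. This is a rough sketch that assumes a 64-bit little-endian ELF and does no
bounds checking. `build_toy_image` is a hypothetical helper that fabricates a tiny header-only
image so the example runs anywhere.

```python
import struct
from typing import Optional

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, 56 bytes
PT_INTERP = 3


def read_interp(image: bytes) -> Optional[str]:
    """Return the interpreter path stored in PT_INTERP, or None if absent."""
    fields = EHDR.unpack_from(image, 0)
    ident, e_phoff, e_phentsize, e_phnum = fields[0], fields[5], fields[9], fields[10]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    for i in range(e_phnum):
        p_type, _, p_offset, _, _, p_filesz, _, _ = PHDR.unpack_from(
            image, e_phoff + i * e_phentsize
        )
        if p_type == PT_INTERP:
            raw = image[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None


def build_toy_image(path: str) -> bytes:
    """Hypothetical helper: an ELF header plus a single PT_INTERP segment."""
    interp = path.encode() + b"\0"
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)  # 64-bit, little-endian, v1
    ehdr = EHDR.pack(ident, 2, 62, 1, 0, EHDR.size, 0, 0,
                     EHDR.size, PHDR.size, 1, 0, 0, 0)
    offset = EHDR.size + PHDR.size
    phdr = PHDR.pack(PT_INTERP, 4, offset, 0, 0, len(interp), len(interp), 1)
    return ehdr + phdr + interp


toy = build_toy_image("/lib64/ld-linux-x86-64.so.2")
print(read_interp(toy))
```

Passing the bytes of a real dynamically linked x86-64 binary to `read_interp` should give the
same string that `readelf` reports.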

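By the time the interpreter runs, the binary itself is already mapped into memory: in the usual
`execve` path the kernel maps both the binary and the interpreter, then jumps to the interpreter.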

The interpreter sets up the environment using the `.dynamic` section of the binary. This can be
seen using `readelf -d example`.
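
As a rough illustration of what the dynamic table holds, the sketch below pulls the library names
out of a table of `Elf64_Dyn` entries. Each `NEEDED` entry stores an offset into the dynamic
string table (`.dynstr`). Both byte strings here are fabricated for the example, not taken from a
real binary.

```python
import struct

DYN = struct.Struct("<qQ")  # Elf64_Dyn: d_tag (i64), d_val / d_ptr (u64)
DT_NULL, DT_NEEDED = 0, 1


def needed_libraries(dynamic: bytes, dynstr: bytes) -> list:
    """Collect the names referenced by NEEDED entries, in table order."""
    names = []
    for tag, value in DYN.iter_unpack(dynamic):
        if tag == DT_NULL:      # DT_NULL terminates the table
            break
        if tag == DT_NEEDED:    # value is an offset into the string table
            end = dynstr.index(b"\0", value)
            names.append(dynstr[value:end].decode())
    return names


# Fabricated string table and dynamic table with two NEEDED entries.
dynstr = b"\0libm.so.6\0libc.so.6\0"
dynamic = b"".join(
    DYN.pack(tag, value)
    for tag, value in [(DT_NEEDED, 1), (DT_NEEDED, 11), (DT_NULL, 0)]
)
print(needed_libraries(dynamic, dynstr))
```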

Using the dynamic table, the interpreter recursively visits every library listed in a **NEEDED**
entry and loads it into memory. For each dependency, the following steps are executed:

- The ELF is mapped into memory.
- Relocations are performed: absolute addresses are patched and references to other object
files are resolved.
- Its dynamic table is parsed and dependencies loaded.
- `_dl_init` is run, which executes all the functions from `INIT` and `INIT_ARRAY` for the
just-loaded libraries.
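
The recursive walk can be modelled with a toy dependency graph. This is only a sketch under
stated assumptions: the graph and the name `libfoo.so` are hypothetical, relocation and
initialisation are collapsed into a single "ready" event, and the real loader's traversal and
ordering rules are more involved than a plain depth-first walk.

```python
# Hypothetical dependency graph: object name -> names in its NEEDED entries.
NEEDED = {
    "example": ["libfoo.so", "libc.so.6"],
    "libfoo.so": ["libc.so.6"],
    "libc.so.6": [],
}


def load(name: str, mapped: set, ready: list) -> None:
    """Map an object, load its dependencies, then mark it relocated and ready."""
    if name in mapped:
        return                      # already loaded, nothing to do
    mapped.add(name)                # step 1: map the ELF into memory
    for dep in NEEDED[name]:        # parse the dynamic table, load dependencies
        load(dep, mapped, ready)
    ready.append(name)              # relocate; initialisers can run from here on


mapped, ready = set(), []
load("example", mapped, ready)

# Only the libraries get their constructors run at this stage; the binary's own
# constructors run later, from libc's startup code.
library_inits = [name for name in ready if name != "example"]
print(library_inits)
```

The point of the model is that a library becomes ready only after everything it depends on.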

Control is now handed over to `_start` in the ELF binary, which receives the pointer to
`_dl_fini` in `rdx`. `_start` sets up the stack and a few arguments and calls
`__libc_start_main`.

`__libc_start_main` receives function pointers to `main`, `init`, `fini`, and `rtld_fini` (the
last one is the same as `_dl_fini`).

This function does a number of things, such as setting up thread-local storage. Here we only
care about the following sequence:

- `__cxa_atexit`, which registers `_dl_fini` as a destructor to run when the program is done.

- A call to `call_init` that runs the constructors in the `INIT` and `INIT_ARRAY` dynamic table
entries. Note that `_dl_init` was for the entries in the shared libraries themselves, but this
is for the binary.

- Control is then handed over to `main`.

- Immediately after `main` returns, `exit` is called. This only transfers control to
`__run_exit_handlers`.

- This runs all the functions registered in `__exit_funcs`, which also contains `_dl_fini`.
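
The ordering described above can be replayed with a small simulation. It is a toy model, not
glibc's code: the Python functions only borrow the names of their C counterparts, and the extra
handler registered inside `main` is an assumption added to show that exit handlers run in reverse
order of registration, so `_dl_fini` (registered first) runs last.

```python
events = []
exit_funcs = []  # stands in for __exit_funcs


def cxa_atexit(func) -> None:
    """Register a function to run at exit (toy version of __cxa_atexit)."""
    exit_funcs.append(func)


def run_exit_handlers() -> None:
    """Run registered handlers last-in, first-out."""
    while exit_funcs:
        exit_funcs.pop()()


def dl_fini() -> None:
    events.append("_dl_fini")


def call_init() -> None:
    events.append("constructors of the binary")


def main() -> int:
    events.append("main")
    # Assumed for illustration: the program registers its own handler.
    cxa_atexit(lambda: events.append("handler registered by main"))
    return 0


def libc_start_main(main_fn, init_fn, rtld_fini) -> int:
    cxa_atexit(rtld_fini)   # _dl_fini is registered before anything else
    init_fn()               # constructors of the binary itself
    status = main_fn()
    run_exit_handlers()     # what exit(status) boils down to in this model
    return status


status = libc_start_main(main, call_init, dl_fini)
print(events)
```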
