Create Linux_Memory_Management_Essentials.md
Signed-off-by: Igor Stoppa <[email protected]>
1 parent 6faef03, commit 455029f
Showing 1 changed file with 167 additions and 0 deletions.
# **Linux Memory Management Essentials (Work In Progress)**

## Index

[Terms and Abbreviations](#Terms-and-Abbreviations)

[References](#References)

[Disclaimer](#Disclaimer)

[Purpose of the document](#Purpose-of-the-document)

[Structure of the document](#Structure-of-the-document)

[Kernel-space memory allocations](#Kernel-space-memory-allocations)

[User-space memory allocations](#User-space-memory-allocations)

[License: CC BY-SA 4.0](#License-CC-BY-SA-40)


## **Terms and Abbreviations**
Please refer to the Linux Kernel documentation.


## **References**

1. [Linux Kernel website](https://www.kernel.org) - <https://www.kernel.org>
2. ***Interference Scenarios for an ARM64 Linux System***
3. [CC BY-SA 4.0 Deed | Attribution-ShareAlike 4.0 International | Creative Commons](https://creativecommons.org/licenses/by-sa/4.0/) - <https://creativecommons.org/licenses/by-sa/4.0/> License


## **Disclaimer**
* This document is not intended to be a replacement for understanding the memory management of the Linux kernel,
  nor does it attempt to be an exhaustive analysis of safety implications.
* Because of the very volatile nature of the code within the Linux Kernel, none of the statements made
  below should be taken at face value; each should rather be verified for any Linux kernel version following
  the one used while writing the document.
* When referring to specific HW features, the document refers to the ARM64 architecture.

## **Purpose of the document**
This document aims to provide a holistic view of what happens in Linux memory management, so that
one is at least aware of certain features and can use this document as a springboard toward more detailed documentation,
or even to the code base.

## **Structure of the document**
The document is divided into two parts, based on the recipient of the memory allocations discussed: kernel-space or user-space.
Individual points are numbered for ease of reference, but the numbering is not meant to represent any sequence.

## **Memory management in Linux**

### **Kernel-space memory allocations**

#### **Facts**
1. unlike process memory, kernel memory pages are not swapped, nor dropped silently by the kernel itself,
   although a hypervisor can do to a VM what the kernel does to a process (but this is beyond the control of the kernel)
2. the kernel context (usually EL1 on ARM64) uses one single memory map (page tables) across all the cores
   executing in kernel mode
3. on 64-bit systems (e.g. ARM64 and x86_64), usually almost all physical memory is mapped in a
   (semi)contiguous range (there can be holes), known as the linear map. Memory within this range is both virtually and physically contiguous.
4. physically contiguous memory is treated as a scarce resource, and typically is not provided to userspace,
   unless it explicitly asks for it (e.g. for implementing some DMA buffer)
5. the kernel can access userspace in two ways:
    1. through the userspace mapping
        1. it is done in exceptional cases, like `copy_to_user()` / `copy_from_user()`
        2. it uses the userspace memory map, which can implement HW protections against kernel misuse
        3. it allows the kernel to see process pages in the same sequence and with the same mappings as the process does
    2. through the EL1 mappings + linear map
        1. not the intended way to access process context
        2. not very useful, even for an attacker, in a security scenario
        3. the sequence of pages mapped in the userspace process is not known
        4. some process pages might even be "missing", because they have been either swapped out or dropped
        5. bypasses any protection that the process might employ through its own mapping
6. attempts to keep pages hot in the system cache make it very difficult to predict which pages might surface in a certain place:
    1. during various operations, it is necessary to reserve one or more pages, which will be released later on.
    2. the kernel adopts various optimisations to keep fragmentation low, thus preserving
       the availability of higher-order allocations, through the slab/buddy allocators and folios.
    3. even the slab allocator contains queues of free chunks, grouped by size
    4. however, it also attempts to cache these free pages locally, in per-CPU queues, with the intent of minimising
       the number of cache flushes that would be caused by releasing a recently freed page and subsequently allocating
       another one
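
Point 6.2 refers to higher-order allocations. In the buddy allocator, a block of order n spans 2^n physically contiguous pages, so the order needed for a request is the smallest n whose block covers it. The sketch below illustrates that arithmetic in Python, loosely following the idea behind the kernel's `get_order()` helper. The 4 KiB page size and the names `order_for_size` and `pages_in_order` are assumptions made for illustration; they are not kernel APIs.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages (ARM64 can also use 16 KiB or 64 KiB)


def pages_in_order(order: int) -> int:
    """Number of physically contiguous pages in a buddy block of this order."""
    return 1 << order


def order_for_size(size: int, page_size: int = PAGE_SIZE) -> int:
    """Smallest order whose block (2**order pages) can hold `size` bytes."""
    if size <= 0:
        raise ValueError("size must be positive")
    pages = -(-size // page_size)      # ceiling division: pages needed
    return (pages - 1).bit_length()    # round the page count up to a power of two
```

With these assumptions, a 64 KiB request needs an order-4 block (16 contiguous pages), while a request of 4097 bytes already needs order 1. The higher the order, the more the request depends on fragmentation having been kept low.
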


### **User-space memory allocations**

#### **Facts**
1. user-space memory must be assumed to not always be backed by physical memory pages,
   meaning that a non-running process might have had some of its pages dropped or swapped out to disk
2. unless some extra effort has been made to pin down memory allocations for user-space, it is
   not really possible to assume much, other than that at some point both the code that has been executed
   and the data that has been read were present in physical memory. The presence of caching and other
   optimisations like the zero-page makes it very difficult to be assertive beyond this point.
3. the management of memory pages associated with a process is handled through the process memory
   map, which consists of several virtual memory areas (VMAs), each representing an address range within the
   process address space that is put to some use.
   They come in two flavors:
    1. anonymous mappings:
        1. process heap, stack
        2. zero-initialised variables
    2. file-backed mappings:
        1. constants
        2. the main executable associated with the process
        3. the runtime linker (optional but typical)
        4. linked libraries
4. the kernel uses various optimisations when dealing with on-demand mapping for processes:
    1. a physical page is allocated and mapped to a process only when the process accesses it:
        1. file-backed pages are allocated/mapped only when read from or written to
        2. anonymous pages are allocated only when written to
    2. zero page: when a page is known to be empty, it is not reserved and mapped; instead,
       the kernel has one specific zero-filled memory page that is mapped read-only
    3. shared libraries:
       when the same library pages are mapped by multiple processes, the library physical pages
       are mapped as read-only into each process address space
    4. copy-on-write:
        1. when a library has its own data, this is initially mapped as read-only and shared;
           only when it is written to is a separate physical page reserved
        2. the same happens for data pages that were treated as zero-page, but then are written to.
    5. folios: data structures that try to better abstract compound pages and *might* also be used
       to represent optimised contiguous pages on ARM64 (instead of mapping 16 entries, it is possible
       to map a 16-page chunk of physically contiguous memory, that is also aligned to a 16-page
       boundary, reducing TLB use).
        1. a folio acts as intermediary between the VMA and lower-level memory management
        2. it might pre-allocate/map more pages than explicitly requested
5. read-ahead: when asked to fetch data from disk, the kernel might attempt to optimise the operation,
   reading more pages than requested, under the assumption that more requests might be coming soon
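
The two flavors of virtual memory areas described in point 3 can be observed in `/proc/<pid>/maps`, where each line describes one area. Below is a minimal sketch that classifies areas as file-backed or not, assuming the usual `address perms offset dev inode pathname` layout and treating a non-zero inode as file-backed. This is a simplification: special mappings such as `[vdso]` also report inode 0 without being ordinary anonymous memory. The sample text and the names `Vma` and `parse_maps` are made up for illustration.

```python
from typing import List, NamedTuple


class Vma(NamedTuple):
    start: int
    end: int
    perms: str
    path: str
    file_backed: bool


def parse_maps(text: str) -> List[Vma]:
    """Parse text in /proc/<pid>/maps format into a list of Vma records."""
    vmas = []
    for line in text.splitlines():
        fields = line.split(None, 5)   # the pathname (6th field) is optional
        if len(fields) < 5:
            continue                   # skip blank or malformed lines
        addr, perms, _offset, _dev, inode = fields[:5]
        path = fields[5].strip() if len(fields) > 5 else ""
        start, end = (int(part, 16) for part in addr.split("-"))
        # Assumption: a non-zero inode means the area is backed by a file.
        vmas.append(Vma(start, end, perms, path, inode != "0"))
    return vmas


# Made-up sample, shaped like real /proc/<pid>/maps output.
SAMPLE = """\
55d0c0a00000-55d0c0a02000 r--p 00000000 08:01 1234 /usr/bin/cat
55d0c1f00000-55d0c1f21000 rw-p 00000000 00:00 0 [heap]
7f2a1c000000-7f2a1c1c0000 r-xp 00000000 08:01 5678 /usr/lib/libc.so.6
7ffd3b500000-7ffd3b521000 rw-p 00000000 00:00 0 [stack]
"""
```

On a Linux system, passing the contents of `/proc/self/maps` to `parse_maps()` would apply the same classification to the running interpreter. Note that the file only describes the address ranges; it does not say which pages are currently backed by physical memory.
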


#### **Safety-oriented considerations**
1. a process that is supposed to support safety requirements should not have pages swapped out / dropped,
   because this would introduce:
    1. uncertainty in the timing required to recover the content, if not immediately available
    2. additional risk, by involving the userspace paging mechanisms in the fulfilment of the safety requirements
    3. additional dependency on runtime linking, in case the process requires it and code pages have been
       discarded: reloading them from disk will not be sufficient
2. the optimisations made by the kernel in providing physical backing to process memory make it very
   questionable whether it can be assessed when (part of) a process's memory content is actually present in the
   system's physical memory.
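
One commonly used way to keep pages from being swapped out or dropped is to lock them with `mlock()` or `mlockall()`. The following is only a sketch, assuming a Linux system where the C library symbols are reachable through `ctypes.CDLL(None)`. It locks a single anonymous buffer rather than the whole process, so that it stays within a typical `RLIMIT_MEMLOCK`. The function name `lock_anonymous_buffer` is hypothetical. A real safety-oriented process would more likely be written in C and might call `mlockall(MCL_CURRENT | MCL_FUTURE)` early at start-up.

```python
import ctypes
import mmap


def lock_anonymous_buffer(size: int = mmap.PAGESIZE) -> bool:
    """Map `size` bytes of anonymous memory and try to pin them with mlock().

    Returns True if the kernel accepted the request, False otherwise
    (for example when RLIMIT_MEMLOCK is too low or mlock() is unavailable).
    """
    try:
        # CDLL(None) exposes the symbols already loaded in this process,
        # which on Linux includes the C library wrappers for mlock/munlock.
        libc = ctypes.CDLL(None, use_errno=True)
        mlock, munlock = libc.mlock, libc.munlock
    except (OSError, AttributeError, TypeError):
        return False
    for fn in (mlock, munlock):
        fn.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
        fn.restype = ctypes.c_int

    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    view = (ctypes.c_char * size).from_buffer(buf)
    try:
        addr = ctypes.addressof(view)
        if mlock(addr, size) != 0:
            return False  # ctypes.get_errno() would report why (e.g. ENOMEM, EPERM)
        # While locked, these pages stay resident in physical memory.
        buf[:1] = b"\x01"
        munlock(addr, size)
        return True
    finally:
        del view      # release the buffer export before closing the mapping
        buf.close()
```

A `False` result is not unusual in containers or for unprivileged users with a small lock limit, which is one more aspect to verify on the target system rather than assume.
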



## **License: CC BY-SA 4.0**

### **DEED**
### **Attribution-ShareAlike 4.0 International**

Full License text: <https://creativecommons.org/licenses/by-sa/4.0/>

**You are free to:**

* **Share** — copy and redistribute the material in any medium or format for any purpose, even commercially.

* **Adapt** — remix, transform, and build upon the material for any purpose, even commercially.

The licensor cannot revoke these freedoms as long as you follow the license terms.

**Under the following terms:**

* **Attribution** — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

* **ShareAlike** — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

* **No additional restrictions** — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

**Notices:**

You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.