29.3. 内存管理

29.3.1. 使用 4 级页表的完整虚拟内存映射

注意

  • 诸如 “-23 TB” 之类的负地址是以字节为单位的绝对地址,从 64 位地址空间的顶部向下计数。当以绝对地址和距顶部的距离两种方式表示时,更容易理解布局。

    例如 0xffffe90000000000 == -23 TB,它比 64 位地址空间的顶部 (ffffffffffffffff) 低 23 TB。

    请注意,当我们接近地址空间的顶部时,符号会从 TB 更改为 GB,然后更改为 MB/KB。

  • “16M TB” 起初看起来可能很奇怪,但它比 “16 EB” 更容易可视化大小表示,很少有人会一眼就认出 16 EB 为 16 艾字节。它还很好地展示了 64 位地址空间是多么的巨大。

========================================================================================================================
    Start addr    |   Offset   |     End addr     |  Size   | VM area description
========================================================================================================================
                  |            |                  |         |
 0000000000000000 |    0       | 00007fffffffefff | ~128 TB | user-space virtual memory, different per mm
 00007ffffffff000 | ~128    TB | 00007fffffffffff |    4 kB | ... guard hole
__________________|____________|__________________|_________|___________________________________________________________
                  |            |                  |         |
 0000800000000000 | +128    TB | 7fffffffffffffff |   ~8 EB | ... huge, almost 63 bits wide hole of non-canonical
                  |            |                  |         |     virtual memory addresses up to the -8 EB
                  |            |                  |         |     starting offset of kernel mappings.
                  |            |                  |         |
                  |            |                  |         | LAM relaxes canonicallity check allowing to create aliases
                  |            |                  |         | for userspace memory here.
__________________|____________|__________________|_________|___________________________________________________________
                                                            |
                                                            | Kernel-space virtual memory, shared between all processes:
__________________|____________|__________________|_________|___________________________________________________________
                  |            |                  |         |
 8000000000000000 |   -8    EB | ffff7fffffffffff |   ~8 EB | ... huge, almost 63 bits wide hole of non-canonical
                  |            |                  |         |     virtual memory addresses up to the -128 TB
                  |            |                  |         |     starting offset of kernel mappings.
                  |            |                  |         |
                  |            |                  |         | LAM_SUP relaxes canonicallity check allowing to create
                  |            |                  |         | aliases for kernel memory here.
____________________________________________________________|___________________________________________________________
                  |            |                  |         |
 ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
 ffff880000000000 | -120    TB | ffff887fffffffff |  0.5 TB | LDT remap for PTI
 ffff888000000000 | -119.5  TB | ffffc87fffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
 ffffc88000000000 |  -55.5  TB | ffffc8ffffffffff |  0.5 TB | ... unused hole
 ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
 ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
 ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
 ffffeb0000000000 |  -21    TB | ffffebffffffffff |    1 TB | ... unused hole
 ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
__________________|____________|__________________|_________|____________________________________________________________
                                                            |
                                                            | Identical layout to the 56-bit one from here on:
____________________________________________________________|____________________________________________________________
                  |            |                  |         |
 fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
                  |            |                  |         | vaddr_end for KASLR
 fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
 fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
 ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
 ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
 ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
 ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
 ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
 ffffffff80000000 |-2048    MB |                  |         |
 ffffffffa0000000 |-1536    MB | fffffffffeffffff | 1520 MB | module mapping space
 ffffffffff000000 |  -16    MB |                  |         |
    FIXADDR_START | ~-11    MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
 ffffffffff600000 |  -10    MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
 ffffffffffe00000 |   -2    MB | ffffffffffffffff |    2 MB | ... unused hole
__________________|____________|__________________|_________|___________________________________________________________

29.3.2. 使用 5 级页表的完整虚拟内存映射

注意

  • 使用 56 位地址,用户空间内存扩展了 512 倍,从 0.125 PB 扩展到 64 PB。所有内核映射都向下移动到 -64 PB 的起始偏移量,并且许多区域都扩展以支持更大的物理内存。

========================================================================================================================
    Start addr    |   Offset   |     End addr     |  Size   | VM area description
========================================================================================================================
                  |            |                  |         |
 0000000000000000 |    0       | 00fffffffffff000 |  ~64 PB | user-space virtual memory, different per mm
 00fffffffffff000 |  ~64    PB | 00ffffffffffffff |    4 kB | ... guard hole
__________________|____________|__________________|_________|___________________________________________________________
                  |            |                  |         |
 0100000000000000 |  +64    PB | 7fffffffffffffff |   ~8 EB | ... huge, almost 63 bits wide hole of non-canonical
                  |            |                  |         |     virtual memory addresses up to the -8EB TB
                  |            |                  |         |     starting offset of kernel mappings.
                  |            |                  |         |
                  |            |                  |         | LAM relaxes canonicallity check allowing to create aliases
                  |            |                  |         | for userspace memory here.
__________________|____________|__________________|_________|___________________________________________________________
                                                            |
                                                            | Kernel-space virtual memory, shared between all processes:
____________________________________________________________|___________________________________________________________
 8000000000000000 |   -8    EB | feffffffffffffff |   ~8 EB | ... huge, almost 63 bits wide hole of non-canonical
                  |            |                  |         |     virtual memory addresses up to the -64 PB
                  |            |                  |         |     starting offset of kernel mappings.
                  |            |                  |         |
                  |            |                  |         | LAM_SUP relaxes canonicallity check allowing to create
                  |            |                  |         | aliases for kernel memory here.
____________________________________________________________|___________________________________________________________
                  |            |                  |         |
 ff00000000000000 |  -64    PB | ff0fffffffffffff |    4 PB | ... guard hole, also reserved for hypervisor
 ff10000000000000 |  -60    PB | ff10ffffffffffff | 0.25 PB | LDT remap for PTI
 ff11000000000000 |  -59.75 PB | ff90ffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
 ff91000000000000 |  -27.75 PB | ff9fffffffffffff | 3.75 PB | ... unused hole
 ffa0000000000000 |  -24    PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
 ffd2000000000000 |  -11.5  PB | ffd3ffffffffffff |  0.5 PB | ... unused hole
 ffd4000000000000 |  -11    PB | ffd5ffffffffffff |  0.5 PB | virtual memory map (vmemmap_base)
 ffd6000000000000 |  -10.5  PB | ffdeffffffffffff | 2.25 PB | ... unused hole
 ffdf000000000000 |   -8.25 PB | fffffbffffffffff |   ~8 PB | KASAN shadow memory
__________________|____________|__________________|_________|____________________________________________________________
                                                            |
                                                            | Identical layout to the 47-bit one from here on:
____________________________________________________________|____________________________________________________________
                  |            |                  |         |
 fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
                  |            |                  |         | vaddr_end for KASLR
 fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
 fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
 ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
 ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
 ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
 ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
 ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
 ffffffff80000000 |-2048    MB |                  |         |
 ffffffffa0000000 |-1536    MB | fffffffffeffffff | 1520 MB | module mapping space
 ffffffffff000000 |  -16    MB |                  |         |
    FIXADDR_START | ~-11    MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
 ffffffffff600000 |  -10    MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
 ffffffffffe00000 |   -2    MB | ffffffffffffffff |    2 MB | ... unused hole
__________________|____________|__________________|_________|___________________________________________________________

架构定义了 64 位虚拟地址。实现可以支持更少的地址。目前支持的是 48 位和 57 位虚拟地址。第 63 位到最高有效位(已实现)进行符号扩展。如果您将它们解释为无符号,则会导致用户空间和内核地址之间存在空洞。

直接映射覆盖系统中所有内存,直到最高的内存地址(这意味着在某些情况下,它也可以包括 PCI 内存空洞)。

我们在 ‘efi_pgd’ PGD 中映射 EFI 运行时服务,其虚拟内存窗口为 64GB(此大小是任意的,如果需要,可以稍后提高)。这些映射不属于任何其他内核 PGD,并且仅在 EFI 运行时调用期间可用。

请注意,如果启用了 CONFIG_RANDOMIZE_MEMORY,则所有物理内存、vmalloc/ioremap 空间和虚拟内存映射的直接映射将被随机化。它们的顺序保持不变,但其基地址将在启动时早期偏移。

在更改此处的任何内容时,请非常小心 KASLR。KASLR 地址范围不得与 KASAN 阴影区域以外的任何内容重叠,这是正确的,因为 KASAN 禁用了 KASLR。

对于 4 级和 5 级布局,最后 2MB 空洞中的 STACKLEAK_POISON 值为:ffffffffffff4111