Kexec Handover Subsystem
Overview
Kexec HandOver (KHO) is a mechanism that allows Linux to preserve memory regions, which could contain serialized system states, across kexec.
KHO uses a flattened device tree (FDT) to pass information about the preserved state from the pre-kexec kernel to the post-kexec kernel, and scratch memory regions to ensure the integrity of the preserved memory.
KHO FDT
Every KHO kexec carries a KHO-specific flattened device tree (FDT) blob that describes the preserved state. The FDT includes properties describing preserved memory regions and nodes that hold subsystem-specific state.
The preserved memory regions contain either serialized subsystem states, or in-memory data that shall not be touched across kexec. After KHO, subsystems can retrieve and restore the preserved state from the KHO FDT.
Subsystems participating in KHO can define their own format for state serialization and preservation.
The KHO FDT and the structures defined by the subsystems form an ABI between the pre-kexec
and post-kexec kernels. This ABI is defined by the header files in the
include/linux/kho/abi directory.
Scratch Regions
To boot into kexec, we need a physically contiguous memory range that contains no handed-over memory. Kexec then places the target kernel and initrd into that region. The new kernel uses this region exclusively for memory allocations during boot, up to the initialization of the page allocator.
We guarantee that such regions always exist through the scratch regions: on
first boot, KHO allocates several physically contiguous memory regions. Since
these regions will be used by early memory allocations after kexec, there is a
scratch region per NUMA node, plus a scratch region to satisfy allocation
requests that do not require a particular NUMA node assignment.
By default, the size of the scratch regions is calculated based on the amount of
memory allocated during boot. The kho_scratch kernel command line option may be
used to explicitly define the size of the scratch regions.
The scratch regions are declared as CMA when the page allocator is initialized so
that their memory can be used during the system lifetime. CMA guarantees that
no handover pages land in these regions, because handover pages
must be at a static physical memory location and CMA enforces that only
movable pages can be located inside.
After a KHO kexec, we ignore the kho_scratch kernel command line option and
instead reuse the exact same regions that were originally allocated. This allows
us to recursively execute any number of KHO kexecs. Because we used these regions
for boot memory allocations and as target memory for kexec blobs, some parts
of that memory may be reserved. These reservations are irrelevant for
the next KHO, because kexec can overwrite even the original kernel.
Kexec Handover Radix Tree
This is a radix tree implementation for tracking physical memory pages across kexec transitions. It was developed for the KHO mechanism but is designed for broader use by any subsystem that needs to preserve pages.
The radix tree is a multi-level tree where leaf nodes are bitmaps representing individual pages. To allow pages of different sizes (orders) to be stored efficiently in a single tree, it uses a unique key encoding scheme. Each key is an unsigned long that combines a page’s physical address and its order.
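One way to fold an address and an order into a single key can be sketched in userspace C. The layout below is an assumption made for illustration, and the names `SKETCH_PAGE_SHIFT`, `sketch_encode_key` and `sketch_decode_key` are hypothetical; the authoritative encoding is whatever include/linux/kho/abi/kexec_handover.h defines. The idea is that a page of order `o` is aligned to `2^(PAGE_SHIFT + o)` bytes, so shifting its address right by that amount loses nothing, and the freed high bits can hold a single marker bit whose position identifies the order.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for this sketch only: 4 KiB pages, 64-bit keys. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_KEY_BITS   64u
/* Bit position of the order marker for an order-0 page. */
#define SKETCH_ORDER0_BIT (SKETCH_KEY_BITS - SKETCH_PAGE_SHIFT)

/* Combine a physical address and an order into one key (hypothetical layout). */
static uint64_t sketch_encode_key(uint64_t phys, unsigned int order)
{
	unsigned int shift = SKETCH_PAGE_SHIFT + order;

	assert(order < SKETCH_ORDER0_BIT);
	/* The address must be aligned to the size of the page. */
	assert((phys & ((UINT64_C(1) << shift) - 1)) == 0);

	/* High part: order marker. Low part: address in units of the page size. */
	return (UINT64_C(1) << (SKETCH_ORDER0_BIT - order)) | (phys >> shift);
}

/* Recover the physical address and order from a key built above. */
static void sketch_decode_key(uint64_t key, uint64_t *phys, unsigned int *order)
{
	unsigned int marker = 0;

	assert(key != 0);
	/* The marker is the most significant set bit of the key. */
	while ((key >> marker) > 1)
		marker++;
	assert(marker >= 1 && marker <= SKETCH_ORDER0_BIT);

	*order = SKETCH_ORDER0_BIT - marker;
	*phys = (key & ~(UINT64_C(1) << marker)) << (SKETCH_PAGE_SHIFT + *order);
}
```

With this layout, keys of different orders never collide, because the marker bit of a higher order sits strictly below the marker bit of a lower order and above every address bit of its own order.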
Client code is responsible for allocating the root node of the tree, initializing the mutex lock, and managing its lifecycle. It must use the tree data structures defined in the KHO ABI, include/linux/kho/abi/kexec_handover.h.
Public API
-
int kho_radix_add_page(struct kho_radix_tree *tree, unsigned long pfn, unsigned int order)
Marks a page as preserved in the radix tree.
Parameters
struct kho_radix_tree *tree
    The KHO radix tree.
unsigned long pfn
    The page frame number of the page to preserve.
unsigned int order
    The order of the page.
Description
This function traverses the radix tree based on the key derived from pfn and order. It sets the corresponding bit in the leaf bitmap to mark the page for preservation. If intermediate nodes do not exist along the path, they are allocated and added to the tree.
Return
0 on success, or a negative error code on failure.
-
void kho_radix_del_page(struct kho_radix_tree *tree, unsigned long pfn, unsigned int order)
Removes a page’s preservation status from the radix tree.
Parameters
struct kho_radix_tree *tree
    The KHO radix tree.
unsigned long pfn
    The page frame number of the page to unpreserve.
unsigned int order
    The order of the page.
Description
This function traverses the radix tree and clears the bit corresponding to the page, effectively removing its “preserved” status. It does not free the tree’s intermediate nodes, even if they become empty.
-
int kho_radix_walk_tree(struct kho_radix_tree *tree, kho_radix_tree_walk_callback_t cb)
Traverses the radix tree and calls a callback for each preserved page.
Parameters
struct kho_radix_tree *tree
    A pointer to the KHO radix tree to walk.
kho_radix_tree_walk_callback_t cb
    A callback function of type kho_radix_tree_walk_callback_t that will be invoked for each preserved page found in the tree. The callback receives the physical address and order of the preserved page.
Description
This function walks the radix tree from its top level down to the lowest level (level 0). For each preserved page found, it invokes the provided callback, passing the page’s physical address and order.
Return
0 if the walk covered the whole tree, or the non-zero return value from the callback that stopped the walk.
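The add, delete and walk semantics described above can be modeled in userspace with a single leaf bitmap. This is a toy sketch, not the kernel implementation: it has one level instead of several, handles order-0 pages only, takes no lock, and every `toy_*` name and constant is hypothetical. It is meant only to show the return convention of the walk, where a non-zero value from the callback stops the traversal and is passed back to the caller.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12u   /* assumed 4 KiB pages */
#define TOY_LEAF_BITS  64u   /* one 64-bit word stands in for a leaf bitmap */

/* Hypothetical callback type, modeled on the description above. */
typedef int (*toy_walk_cb_t)(uint64_t phys, unsigned int order);

struct toy_leaf {
	uint64_t bitmap;     /* bit n set means page frame n is preserved */
};

/* Mark a page frame as preserved. */
static void toy_add_page(struct toy_leaf *leaf, unsigned long pfn)
{
	assert(pfn < TOY_LEAF_BITS);
	leaf->bitmap |= UINT64_C(1) << pfn;
}

/* Clear the preserved status of a page frame. */
static void toy_del_page(struct toy_leaf *leaf, unsigned long pfn)
{
	assert(pfn < TOY_LEAF_BITS);
	leaf->bitmap &= ~(UINT64_C(1) << pfn);
}

/* Call cb for each preserved page; stop at the first non-zero return. */
static int toy_walk(const struct toy_leaf *leaf, toy_walk_cb_t cb)
{
	for (unsigned int bit = 0; bit < TOY_LEAF_BITS; bit++) {
		int ret;

		if (!(leaf->bitmap & (UINT64_C(1) << bit)))
			continue;
		ret = cb((uint64_t)bit << TOY_PAGE_SHIFT, 0);
		if (ret)
			return ret;
	}
	return 0;
}

/* Number of callback invocations, reset by the caller before each walk. */
static unsigned int toy_seen;

/* Callback that counts every preserved page and never stops the walk. */
static int toy_count_cb(uint64_t phys, unsigned int order)
{
	(void)phys;
	(void)order;
	toy_seen++;
	return 0;
}

/* Callback that stops the walk when it reaches page frame 5. */
static int toy_stop_at_cb(uint64_t phys, unsigned int order)
{
	(void)order;
	toy_seen++;
	return phys == (UINT64_C(5) << TOY_PAGE_SHIFT) ? -1 : 0;
}
```

A caller would add a few page frames with `toy_add_page()`, reset `toy_seen`, and pass either callback to `toy_walk()`; with `toy_stop_at_cb` the walk ends early once page frame 5 is visited, unless that frame was removed beforehand with `toy_del_page()`.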
-
struct folio *kho_restore_folio(phys_addr_t phys)
Parameters
phys_addr_t phys
    physical address of the folio.
Return
pointer to the struct folio on success, NULL on failure.
-
struct page *kho_restore_pages(phys_addr_t phys, unsigned long nr_pages)
restore list of contiguous order 0 pages.
Parameters
phys_addr_t phys
    physical address of the first page.
unsigned long nr_pages
    number of pages.
Description
Restore a contiguous list of order 0 pages that was preserved with
kho_preserve_pages().
Return
the first page on success, NULL on failure.
-
int kho_add_subtree(const char *name, void *fdt)
record the physical address of a sub FDT in KHO root tree.
Parameters
const char *name
    name of the sub tree.
void *fdt
    the sub tree blob.
Description
Creates a new child node named name in the KHO root FDT and records the physical address of fdt. The pages of fdt must also be preserved by KHO for the new kernel to retrieve it after kexec.
A debugfs blob entry is also created at
/sys/kernel/debug/kho/out/sub_fdts/name when the kernel is configured with
CONFIG_KEXEC_HANDOVER_DEBUGFS.
Return
0 on success, error code on failure
-
int kho_preserve_folio(struct folio *folio)
preserve a folio across kexec.
Parameters
struct folio *folio
    folio to preserve.
Description
Instructs KHO to preserve the whole folio across kexec. The order will be preserved as well.
Return
0 on success, error code on failure
-
void kho_unpreserve_folio(struct folio *folio)
unpreserve a folio.
Parameters
struct folio *folio
    folio to unpreserve.
Description
Instructs KHO to unpreserve a folio that was preserved by
kho_preserve_folio() before. The provided folio (pfn and order)
must exactly match a previously preserved folio.
-
int kho_preserve_pages(struct page *page, unsigned long nr_pages)
preserve contiguous pages across kexec
Parameters
struct page *page
    first page in the list.
unsigned long nr_pages
    number of pages.
Description
Preserve a contiguous list of order 0 pages. Must be restored using
kho_restore_pages() to ensure the pages are restored properly as order 0.
Return
0 on success, error code on failure
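-
void kho_unpreserve_pages(struct page *page, unsigned long nr_pages)
unpreserve contiguous pages.
Parameters
struct page *page
    first page in the list.
unsigned long nr_pages
    number of pages.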
Description
Instructs KHO to unpreserve nr_pages contiguous pages starting from page.
This must be called with the same page and nr_pages as the corresponding
kho_preserve_pages() call. Unpreserving arbitrary sub-ranges of larger
preserved blocks is not supported.
-
int kho_preserve_vmalloc(void *ptr, struct kho_vmalloc *preservation)
preserve memory allocated with vmalloc() across kexec
Parameters
void *ptr
    pointer to the area in vmalloc address space
struct kho_vmalloc *preservation
    placeholder for preservation metadata
Description
Instructs KHO to preserve the area in vmalloc address space at ptr. The physical pages mapped at ptr will be preserved and on successful return preservation will hold the physical address of a structure that describes the preservation.
Note
Memory allocated with the vmalloc_node() variants cannot be reliably
restored on the same node.
Return
0 on success, error code on failure
-
void kho_unpreserve_vmalloc(struct kho_vmalloc *preservation)
unpreserve memory allocated with vmalloc()
Parameters
struct kho_vmalloc *preservation
    preservation metadata returned by kho_preserve_vmalloc()
Description
Instructs KHO to unpreserve the area in vmalloc address space that was
previously preserved with kho_preserve_vmalloc().
-
void *kho_restore_vmalloc(const struct kho_vmalloc *preservation)
recreates and populates an area in vmalloc address space from the preserved memory.
Parameters
const struct kho_vmalloc *preservation
    preservation metadata.
Description
Recreates an area in vmalloc address space and populates it with memory that
was preserved using kho_preserve_vmalloc().
Return
pointer to the area in the vmalloc address space, NULL on failure.
-
void *kho_alloc_preserve(size_t size)
Allocate, zero, and preserve memory.
Parameters
size_t size
    The number of bytes to allocate.
Description
Allocates a physically contiguous block of zeroed pages that is large enough to hold size bytes. The allocated memory is then registered with KHO for preservation across a kexec.
Note
The actual allocated size will be rounded up to the nearest power-of-two page boundary.
Return
A virtual pointer to the allocated and preserved memory on success,
or an ERR_PTR() encoded error on failure.
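The rounding mentioned in the note can be made concrete with a small userspace calculation. This is a sketch under stated assumptions, not the kernel's code: the page size of 4096 bytes and the helper names `sketch_order_for_size` and `sketch_allocated_bytes` are hypothetical, and the kernel derives the order with its own helpers.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE ((size_t)4096) /* assumed page size */

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size. */
static unsigned int sketch_order_for_size(size_t size)
{
	size_t pages;
	unsigned int order = 0;

	assert(size > 0);
	/* Round the byte count up to whole pages. */
	pages = (size - 1) / SKETCH_PAGE_SIZE + 1;
	/* Round the page count up to the next power of two. */
	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/* Bytes actually occupied by an allocation that must hold size bytes. */
static size_t sketch_allocated_bytes(size_t size)
{
	return SKETCH_PAGE_SIZE << sketch_order_for_size(size);
}
```

Under these assumptions a request for five pages (20480 bytes) occupies eight pages, so callers that preserve many oddly sized buffers may want to account for the slack.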
-
void kho_unpreserve_free(void *mem)
Unpreserve and free memory.
Parameters
void *mem
    Pointer to the memory allocated by kho_alloc_preserve().
Description
Unregisters the memory from KHO preservation and frees the underlying
pages back to the system. This function should be called to clean up
memory allocated with kho_alloc_preserve().
-
void kho_restore_free(void *mem)
Restore and free memory after kexec.
Parameters
void *mem
    Pointer to the memory (in the new kernel’s address space) that was allocated by the old kernel.
Description
This function is intended to be called in the new kernel (post-kexec)
to take ownership of and free a memory region that was preserved by the
old kernel using kho_alloc_preserve().
It first restores the pages from KHO (using their physical address) and then frees the pages back to the new kernel’s page allocator.
-
bool is_kho_boot(void)
check if current kernel was booted via KHO-enabled kexec
Parameters
void
    no arguments
Description
This function checks if the current kernel was loaded through a kexec operation with KHO enabled, by verifying that a valid KHO FDT was passed.
Note
This function returns reliable results only after
kho_populate() has been called during early boot. Before that,
it may return false even if KHO data is present.
Return
true if booted via KHO-enabled kexec, false otherwise
-
int kho_retrieve_subtree(const char *name, phys_addr_t *phys)
retrieve a preserved sub FDT by its name.
Parameters
const char *name
    the name of the sub FDT passed to kho_add_subtree().
phys_addr_t *phys
    if found, the physical address of the sub FDT is stored in phys.
Description
Retrieve a preserved sub FDT named name and store its physical address in phys.
Return
0 on success, error code on failure