Swap Table

The swap table implements the swap cache as a per-cluster array of swap cache values.

Swap Entry

A swap entry contains the information required to serve an anonymous page fault.

A swap entry is encoded in two parts: the swap type and the swap offset.

The swap type indicates which swap device to use. The swap offset is the offset within the swap file from which the page data is read.
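The two-part encoding can be sketched in user-space C as follows. This is only an illustration: the 5-bit type field, the 64-bit word, and every sketch_* name are assumptions made for the example, not the kernel's actual layout or API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only: the top 5 bits of a
 * 64-bit word hold the swap type, the remaining bits hold the offset.
 */
#define SKETCH_TYPE_BITS   5
#define SKETCH_TYPE_SHIFT  (64 - SKETCH_TYPE_BITS)
#define SKETCH_OFFSET_MASK ((UINT64_C(1) << SKETCH_TYPE_SHIFT) - 1)

typedef struct {
	uint64_t val;
} sketch_swp_entry_t;

/* Pack a swap type (must be < 32 here) and a swap offset into one word. */
static inline sketch_swp_entry_t sketch_swp_entry(unsigned int type,
						  uint64_t offset)
{
	sketch_swp_entry_t e;

	e.val = ((uint64_t)type << SKETCH_TYPE_SHIFT) |
		(offset & SKETCH_OFFSET_MASK);
	return e;
}

/* Which swap device the entry refers to. */
static inline unsigned int sketch_swp_type(sketch_swp_entry_t e)
{
	return (unsigned int)(e.val >> SKETCH_TYPE_SHIFT);
}

/* Where in that device's swap file the page data lives. */
static inline uint64_t sketch_swp_offset(sketch_swp_entry_t e)
{
	return e.val & SKETCH_OFFSET_MASK;
}
```

Packing both parts into a single word lets the entry be stored wherever a pointer-sized value fits.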

Swap Cache

The swap cache is a map used to look up folios with the swap entry as the key. The resulting value can have one of three types, depending on which stage the swap entry is in.

  1. NULL: This swap entry is not used.

  2. folio: A folio has been allocated and bound to this swap entry. This is the transient state during swap out or swap in. The data can be in the folio, in the swap file, or in both.

  3. shadow: The shadow contains the working set information of the swapped out folio. This is the normal state for a swapped out page.
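One way to tell the three value types apart is to tag a pointer-sized word. The sketch below assumes that a zero word means NULL and that a set low bit marks a shadow; this tagging scheme and all sketch_* names are hypothetical and chosen only to illustrate the idea.

```c
#include <assert.h>
#include <stdint.h>

/* The three stages a swap cache value can be in. */
enum sketch_state {
	SKETCH_UNUSED,	/* NULL: the swap entry is not used */
	SKETCH_FOLIO,	/* a folio is bound to the swap entry */
	SKETCH_SHADOW,	/* working set information of a swapped out folio */
};

/* Stand-in for a folio; the int member keeps the low address bit clear. */
struct sketch_folio {
	int dummy;
};

/* Assumption: shadows are stored shifted left by one with bit 0 set. */
static inline uintptr_t sketch_mk_shadow(uintptr_t info)
{
	return (info << 1) | 1;
}

/* Assumption: folios are stored as plain, aligned pointers. */
static inline uintptr_t sketch_mk_folio(struct sketch_folio *folio)
{
	return (uintptr_t)folio;
}

/* Decode which of the three types a stored value is. */
static inline enum sketch_state sketch_classify(uintptr_t v)
{
	if (v == 0)
		return SKETCH_UNUSED;
	if (v & 1)
		return SKETCH_SHADOW;
	return SKETCH_FOLIO;
}
```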

Swap Table Internals

The previous swap cache was implemented with an XArray. The XArray is a tree structure, so each lookup has to walk through multiple nodes. Can we do better?

Notice that most of the time when we look up the swap cache, we are in either a swap in or a swap out path. We should already have the swap cluster that contains the swap entry.

If we keep a per-cluster array that stores the swap cache values in the cluster, a swap cache lookup within the cluster becomes a very simple array lookup.

We give such a per-cluster swap cache value array a name: the swap table.
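A minimal sketch of such a lookup is shown below. It assumes that the slot index is the swap offset modulo the cluster size; the structure and function names are hypothetical and do not reflect the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CLUSTER_SIZE 512	/* entries per cluster */

/* Hypothetical cluster that carries its own swap table. */
struct sketch_cluster {
	uintptr_t table[SKETCH_CLUSTER_SIZE];
};

/* Assumption: the slot is the swap offset modulo the cluster size. */
static inline unsigned int sketch_table_index(uint64_t offset)
{
	return (unsigned int)(offset % SKETCH_CLUSTER_SIZE);
}

/* Lookup is one array access, with no tree walk. */
static inline uintptr_t sketch_table_get(const struct sketch_cluster *ci,
					 uint64_t offset)
{
	return ci->table[sketch_table_index(offset)];
}

/* Store a swap cache value for the given swap offset. */
static inline void sketch_table_set(struct sketch_cluster *ci,
				    uint64_t offset, uintptr_t value)
{
	ci->table[sketch_table_index(offset)] = value;
}
```

Because the caller already holds the cluster, the only remaining work is computing the index.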

Each swap cluster contains 512 entries, so a swap table stores one cluster's worth of swap cache values, which is exactly one page (with 8-byte pointers and 4 KiB pages, 512 * 8 bytes = 4096 bytes). This is not a coincidence: the cluster size is determined by the huge page size. The swap table holds an array of pointers, and a pointer has the same size as a PTE. The size of the swap table therefore matches that of a page table page at the second-to-last level, which is exactly one page.
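The size relationship can be checked with a few lines of arithmetic. The constants below assume 4 KiB base pages, 2 MiB huge pages, and 8-byte pointers; other configurations give different numbers, and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed configuration, for illustration only. */
#define SKETCH_PAGE_SIZE	4096u			/* 4 KiB base page */
#define SKETCH_HUGE_PAGE_SIZE	(2u * 1024 * 1024)	/* 2 MiB huge page */
#define SKETCH_ENTRY_SIZE	8u			/* pointer or PTE size */

/* Number of base pages in one huge page, i.e. entries per cluster. */
static inline size_t sketch_cluster_entries(void)
{
	return SKETCH_HUGE_PAGE_SIZE / SKETCH_PAGE_SIZE;
}

/* Bytes needed for one swap table: one entry per cluster slot. */
static inline size_t sketch_table_bytes(void)
{
	return sketch_cluster_entries() * SKETCH_ENTRY_SIZE;
}
```

Under these assumptions a cluster has 512 entries and its swap table occupies 4096 bytes, which is one base page.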

With the swap table, swap cache lookup achieves better locality and is both simpler and faster.

Locking

Swap table modification requires taking the cluster lock. If a folio is being added to or removed from the swap table, the folio must be locked before the cluster lock is taken. Once the addition or removal is complete, the folio shall be unlocked.
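The lock ordering for modification can be sketched as follows. Both locks are modelled as toy C11 spinlocks purely to show the order in which they are taken; the real folio and cluster locks are different primitives, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_CLUSTER_SIZE 512

/* Toy spinlock standing in for both the folio lock and the cluster lock. */
struct sketch_spinlock {
	atomic_flag held;
};

static inline void sketch_spin_init(struct sketch_spinlock *l)
{
	atomic_flag_clear(&l->held);
}

static inline void sketch_spin_lock(struct sketch_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held,
						 memory_order_acquire))
		;	/* spin until the lock is free */
}

static inline void sketch_spin_unlock(struct sketch_spinlock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct sketch_folio {
	struct sketch_spinlock lock;
};

struct sketch_cluster {
	struct sketch_spinlock lock;
	_Atomic uintptr_t table[SKETCH_CLUSTER_SIZE];
};

/* Store a value in one slot: folio lock first, then cluster lock. */
static inline void sketch_table_update(struct sketch_cluster *ci,
				       struct sketch_folio *folio,
				       unsigned int idx, uintptr_t value)
{
	sketch_spin_lock(&folio->lock);		/* 1. lock the folio */
	sketch_spin_lock(&ci->lock);		/* 2. lock the cluster */
	atomic_store_explicit(&ci->table[idx % SKETCH_CLUSTER_SIZE],
			      value, memory_order_release);
	sketch_spin_unlock(&ci->lock);		/* 3. drop the cluster lock */
	sketch_spin_unlock(&folio->lock);	/* 4. unlock the folio */
}

/* Add a folio to the swap table. */
static inline void sketch_add_folio(struct sketch_cluster *ci,
				    struct sketch_folio *folio,
				    unsigned int idx)
{
	sketch_table_update(ci, folio, idx, (uintptr_t)folio);
}

/* Remove a folio from the swap table, leaving the slot empty. */
static inline void sketch_del_folio(struct sketch_cluster *ci,
				    struct sketch_folio *folio,
				    unsigned int idx)
{
	sketch_table_update(ci, folio, idx, 0);
}
```

Taking the two locks in the same order on every path is what keeps concurrent adders and removers from deadlocking against each other.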

Swap table lookup is protected by RCU and atomic read. If the lookup returns a folio, the user must lock the folio before use.
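A lookup under these rules might look like the sketch below. It models only the atomic read and the folio lock; the RCU read-side section cannot be reproduced in plain user-space C and is marked with a comment. The low-bit shadow tag, the recheck after locking, and all sketch_* names are assumptions of this example rather than documented behaviour.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CLUSTER_SIZE 512

struct sketch_folio {
	atomic_flag lock;	/* toy stand-in for the folio lock */
	uint64_t data;		/* forces alignment so bit 0 of the address is 0 */
};

struct sketch_cluster {
	_Atomic uintptr_t table[SKETCH_CLUSTER_SIZE];
};

/*
 * Look up a slot and return the folio locked, or NULL if the slot is
 * empty or holds a shadow (assumed here to be tagged with bit 0 set).
 */
static inline struct sketch_folio *
sketch_lookup_lock(struct sketch_cluster *ci, unsigned int idx)
{
	_Atomic uintptr_t *slot = &ci->table[idx % SKETCH_CLUSTER_SIZE];
	struct sketch_folio *folio;
	uintptr_t v;

	/* In the kernel, this read would sit inside an RCU read-side section. */
	v = atomic_load_explicit(slot, memory_order_acquire);
	if (v == 0 || (v & 1))
		return NULL;

	folio = (struct sketch_folio *)v;

	/* The folio must be locked before it is used. */
	while (atomic_flag_test_and_set_explicit(&folio->lock,
						 memory_order_acquire))
		;

	/* Assumption: recheck that the slot still points at this folio. */
	if (atomic_load_explicit(slot, memory_order_acquire) != v) {
		atomic_flag_clear_explicit(&folio->lock, memory_order_release);
		return NULL;
	}
	return folio;
}
```

The caller is expected to release the folio lock once it has finished using the folio.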