Filesystem-level encryption (fscrypt)
Introduction
fscrypt is a library which filesystems can hook into to support transparent encryption of files and directories.
Note: “fscrypt” in this document refers to the kernel-level portion, implemented in fs/crypto/, as opposed to the userspace tool fscrypt. This document only covers the kernel-level portion. For command-line examples of how to use encryption, see the documentation for the userspace tool fscrypt. Also, it is recommended to use the fscrypt userspace tool, or other existing userspace tools such as fscryptctl or Android’s key management system, over using the kernel’s API directly. Using existing tools reduces the chance of introducing your own security bugs. (Nevertheless, for completeness this documentation covers the kernel’s API anyway.)
Unlike dm-crypt, fscrypt operates at the filesystem level rather than at the block device level. This allows it to encrypt different files with different keys and to have unencrypted files on the same filesystem. This is useful for multi-user systems where each user’s data-at-rest needs to be cryptographically isolated from the others. However, except for filenames, fscrypt does not encrypt filesystem metadata.
Unlike eCryptfs, which is a stacked filesystem, fscrypt is integrated directly into supported filesystems — currently ext4, F2FS, and UBIFS. This allows encrypted files to be read and written without caching both the decrypted and encrypted pages in the pagecache, thereby nearly halving the memory used and bringing it in line with unencrypted files. Similarly, half as many dentries and inodes are needed. eCryptfs also limits encrypted filenames to 143 bytes, causing application compatibility issues; fscrypt allows the full 255 bytes (NAME_MAX). Finally, unlike eCryptfs, the fscrypt API can be used by unprivileged users, with no need to mount anything.
fscrypt does not support encrypting files in-place. Instead, it supports marking an empty directory as encrypted. Then, after userspace provides the key, all regular files, directories, and symbolic links created in that directory tree are transparently encrypted.
Threat model
Offline attacks
Provided that userspace chooses a strong encryption key, fscrypt protects the confidentiality of file contents and filenames in the event of a single point-in-time permanent offline compromise of the block device content. fscrypt does not protect the confidentiality of non-filename metadata, e.g. file sizes, file permissions, file timestamps, and extended attributes. Also, the existence and location of holes (unallocated blocks which logically contain all zeroes) in files is not protected.
fscrypt is not guaranteed to protect confidentiality or authenticity if an attacker is able to manipulate the filesystem offline prior to an authorized user later accessing the filesystem.
Online attacks
fscrypt (and storage encryption in general) can only provide limited protection, if any at all, against online attacks. In detail:
- fscrypt is only resistant to side-channel attacks, such as timing or electromagnetic attacks, to the extent that the underlying Linux Cryptographic API algorithms are. If a vulnerable algorithm is used, such as a table-based implementation of AES, it may be possible for an attacker to mount a side channel attack against the online system. Side channel attacks may also be mounted against applications consuming decrypted data.
- After an encryption key has been provided, fscrypt is not designed to hide the plaintext file contents or filenames from other users on the same system, regardless of the visibility of the keyring key. Instead, existing access control mechanisms such as file mode bits, POSIX ACLs, LSMs, or mount namespaces should be used for this purpose. Also note that as long as the encryption keys are anywhere in memory, an online attacker can necessarily compromise them by mounting a physical attack or by exploiting any kernel security vulnerability which provides an arbitrary memory read primitive.
- While it is ostensibly possible to “evict” keys from the system, recently accessed encrypted files will remain accessible at least until the filesystem is unmounted or the VFS caches are dropped, e.g. using echo 2 > /proc/sys/vm/drop_caches. Even after that, if the RAM is compromised before being powered off, it will likely still be possible to recover portions of the plaintext file contents, if not some of the encryption keys as well. (Since Linux v4.12, all in-kernel keys related to fscrypt are sanitized before being freed. However, userspace would need to do its part as well.)
- Currently, fscrypt does not prevent a user from maliciously providing an incorrect key for another user’s existing encrypted files. A protection against this is planned.
Key hierarchy
Master Keys
Each encrypted directory tree is protected by a master key. Master keys can be up to 64 bytes long, and must be at least as long as the longer of the two key lengths required by the contents and filenames encryption modes in use. For example, if AES-256-XTS is used for contents encryption, the master key must be 64 bytes (512 bits). Note that the XTS mode is defined to require a key twice as long as that required by the underlying block cipher.
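The sizing rule above can be sketched as a small helper. This is an illustration only: the enum and function names are hypothetical and are not the constants from <linux/fs.h>. The per-mode key lengths follow from the cipher definitions (an AES-256 key is 32 bytes, an AES-128 key is 16 bytes, and XTS doubles the key length).

```c
#include <stddef.h>

/* Hypothetical mode identifiers for this sketch only; they are NOT the
 * FS_ENCRYPTION_MODE_* constants from <linux/fs.h>. */
enum demo_mode {
    DEMO_AES_256_XTS,   /* contents */
    DEMO_AES_256_CTS,   /* filenames */
    DEMO_AES_128_CBC,   /* contents */
    DEMO_AES_128_CTS    /* filenames */
};

/* Key length in bytes required by each mode. */
static size_t demo_mode_key_size(enum demo_mode mode)
{
    switch (mode) {
    case DEMO_AES_256_XTS: return 64;   /* XTS: twice the AES-256 key length */
    case DEMO_AES_256_CTS: return 32;
    case DEMO_AES_128_CBC: return 16;
    case DEMO_AES_128_CTS: return 16;
    }
    return 0;
}

/* The master key must be at least as long as the longer of the two. */
static size_t demo_min_master_key_size(enum demo_mode contents,
                                       enum demo_mode filenames)
{
    size_t c = demo_mode_key_size(contents);
    size_t f = demo_mode_key_size(filenames);

    return c > f ? c : f;
}
```

For the recommended AES-256-XTS plus AES-256-CTS-CBC pair this yields 64 bytes, which is also the maximum master key length.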
To “unlock” an encrypted directory tree, userspace must provide the appropriate master key. There can be any number of master keys, each of which protects any number of directory trees on any number of filesystems.
Userspace should generate master keys either using a cryptographically secure random number generator, or by using a KDF (Key Derivation Function). Note that whenever a KDF is used to “stretch” a lower-entropy secret such as a passphrase, it is critical that a KDF designed for this purpose be used, such as scrypt, PBKDF2, or Argon2.
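As a minimal sketch of the first option, a master key can be filled from the kernel's random number generator with getrandom(). The function name and the DEMO_MAX_KEY_SIZE constant are hypothetical, and the sketch assumes a C library that provides <sys/random.h> (glibc 2.25 or later). Deriving a key from a passphrase with scrypt, PBKDF2, or Argon2 is not shown here.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

#define DEMO_MAX_KEY_SIZE 64   /* master keys can be up to 64 bytes */

/* Fill key[0..size-1] with cryptographically secure random bytes.
 * Returns 0 on success, or -1 with errno set on failure. */
static int demo_generate_master_key(unsigned char *key, size_t size)
{
    size_t done = 0;

    if (size == 0 || size > DEMO_MAX_KEY_SIZE) {
        errno = EINVAL;
        return -1;
    }
    while (done < size) {
        ssize_t n = getrandom(key + done, size - done, 0);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}
```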
Per-file keys
Master keys are not used to encrypt file contents or names directly. Instead, a unique key is derived for each encrypted file, including each regular file, directory, and symbolic link. This has several advantages:
- In cryptosystems, the same key material should never be used for different purposes. Using the master key as both an XTS key for contents encryption and as a CTS-CBC key for filenames encryption would violate this rule.
- Per-file keys simplify the choice of IVs (Initialization Vectors) for contents encryption. Without per-file keys, to ensure IV uniqueness both the inode and logical block number would need to be encoded in the IVs. This would make it impossible to renumber inodes, which e.g. resize2fs can do when resizing an ext4 filesystem. With per-file keys, it is sufficient to encode just the logical block number in the IVs.
- Per-file keys strengthen the encryption of filenames, where IVs are reused out of necessity. With a unique key per directory, IV reuse is limited to within a single directory.
- Per-file keys allow individual files to be securely erased simply by securely erasing their keys. (Not yet implemented.)
A KDF (Key Derivation Function) is used to derive per-file keys from the master key. This is done instead of wrapping a randomly-generated key for each file because it reduces the size of the encryption xattr, which for some filesystems makes the xattr more likely to fit in-line in the filesystem’s inode table. With a KDF, only a 16-byte nonce is required — long enough to make key reuse extremely unlikely. A wrapped key, on the other hand, would need to be up to 64 bytes — the length of an AES-256-XTS key. Furthermore, currently there is no requirement to support unlocking a file with multiple alternative master keys or to support rotating master keys. Instead, the master keys may be wrapped in userspace, e.g. as done by the fscrypt tool.
The current KDF encrypts the master key using the 16-byte nonce as an AES-128-ECB key. The output is used as the derived key. If the output is longer than needed, then it is truncated to the needed length. Truncation is the norm for directories and symlinks, since those use the CTS-CBC encryption mode which requires a key half as long as that required by the XTS encryption mode.
Note: this KDF meets the primary security requirement, which is to produce unique derived keys that preserve the entropy of the master key, assuming that the master key is already a good pseudorandom key. However, it is nonstandard and has some problems such as being reversible, so it is generally considered to be a mistake! It may be replaced with HKDF or another more standard KDF in the future.
Encryption modes and usage
fscrypt allows one encryption mode to be specified for file contents and one encryption mode to be specified for filenames. Different directory trees are permitted to use different encryption modes. Currently, the following pairs of encryption modes are supported:
- AES-256-XTS for contents and AES-256-CTS-CBC for filenames
- AES-128-CBC for contents and AES-128-CTS-CBC for filenames
It is strongly recommended to use AES-256-XTS for contents encryption. AES-128-CBC was added only for low-powered embedded devices with crypto accelerators such as CAAM or CESA that do not support XTS.
New encryption modes can be added relatively easily, without changes to individual filesystems. However, authenticated encryption (AE) modes are not currently supported because of the difficulty of dealing with ciphertext expansion.
For file contents, each filesystem block is encrypted independently. Currently, only the case where the filesystem block size is equal to the system’s page size (usually 4096 bytes) is supported. With the XTS mode of operation (recommended), the logical block number within the file is used as the IV. With the CBC mode of operation (not recommended), ESSIV is used; specifically, the IV for CBC is the logical block number encrypted with AES-256, where the AES-256 key is the SHA-256 hash of the inode’s data encryption key.
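To make the XTS case concrete, the sketch below turns a logical block number into a 16-byte IV. The text above only says that the logical block number is used as the IV; the little-endian byte order, the zero padding, and all names here are assumptions of this sketch, so check fs/crypto/ before relying on the exact layout.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_IV_SIZE 16   /* one AES block */

/* Encode a logical block number as a 16-byte IV.
 * Assumed layout: 64-bit little-endian block number, then zero padding. */
static void demo_block_iv(uint64_t lblk, unsigned char iv[DEMO_IV_SIZE])
{
    int i;

    memset(iv, 0, DEMO_IV_SIZE);
    for (i = 0; i < 8; i++)
        iv[i] = (unsigned char)(lblk >> (8 * i));
}
```

Because every file has its own key, two files may use the same IV for the same logical block number without the (key, IV) pair ever repeating.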
For filenames, the full filename is encrypted at once. Because of the requirements to retain support for efficient directory lookups and filenames of up to 255 bytes, a constant initialization vector (IV) is used. However, each encrypted directory uses a unique key, which limits IV reuse to within a single directory. Note that IV reuse in the context of CTS-CBC encryption means that when the original filenames share a common prefix at least as long as the cipher block size (16 bytes for AES), the corresponding encrypted filenames will also share a common prefix. This is undesirable; it may be fixed in the future by switching to an encryption mode that is a strong pseudorandom permutation on arbitrary-length messages, e.g. the HEH (Hash-Encrypt-Hash) mode.
Since filenames are encrypted with the CTS-CBC mode of operation, the plaintext and ciphertext filenames need not be multiples of the AES block size, i.e. 16 bytes. However, the minimum size that can be encrypted is 16 bytes, so shorter filenames are NUL-padded to 16 bytes before being encrypted. In addition, to reduce leakage of filename lengths via their ciphertexts, all filenames are NUL-padded to the next 4, 8, 16, or 32-byte boundary (configurable). 32 is recommended since this provides the best confidentiality, at the cost of making directory entries consume slightly more space. Note that since NUL (\0) is not otherwise a valid character in filenames, the padding will never produce duplicate plaintexts.
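The two padding steps can be sketched as a length calculation. The names are hypothetical, and the final clamp to NAME_MAX is an assumption of this sketch (it keeps a 255-byte name from rounding up to 256 bytes); the kernel's handling of that edge case may differ.

```c
#include <stddef.h>

#define DEMO_MIN_CRYPT_LEN 16    /* AES block size: smallest encryptable length */
#define DEMO_NAME_MAX      255

/* Return how many bytes are encrypted for a filename of length len when the
 * policy pads to the given boundary (4, 8, 16 or 32). Returns 0 if either
 * argument is invalid. */
static size_t demo_padded_name_len(size_t len, size_t padding)
{
    size_t n;

    if (len == 0 || len > DEMO_NAME_MAX)
        return 0;
    if (padding != 4 && padding != 8 && padding != 16 && padding != 32)
        return 0;

    /* Step 1: names shorter than one AES block are padded to 16 bytes. */
    n = len < DEMO_MIN_CRYPT_LEN ? DEMO_MIN_CRYPT_LEN : len;
    /* Step 2: round up to the next padding boundary. */
    n = (n + padding - 1) / padding * padding;
    /* Assumption: never exceed NAME_MAX. */
    if (n > DEMO_NAME_MAX)
        n = DEMO_NAME_MAX;
    return n;
}
```

With 32-byte padding, every name of 1 to 32 bytes produces a 32-byte ciphertext, so the ciphertext length reveals only a coarse range. With 4-byte padding, a 17-byte name produces 20 bytes, which leaks the length far more precisely.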
Symbolic link targets are considered a type of filename and are encrypted in the same way as filenames in directory entries. Each symlink also uses a unique key; hence, the hardcoded IV is not a problem for symlinks.
User API
Setting an encryption policy
The FS_IOC_SET_ENCRYPTION_POLICY ioctl sets an encryption policy on an empty directory or verifies that a directory or regular file already has the specified encryption policy. It takes in a pointer to a struct fscrypt_policy, defined as follows:
    #define FS_KEY_DESCRIPTOR_SIZE 8

    struct fscrypt_policy {
        __u8 version;
        __u8 contents_encryption_mode;
        __u8 filenames_encryption_mode;
        __u8 flags;
        __u8 master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE];
    };
This structure must be initialized as follows:
- version must be 0.
- contents_encryption_mode and filenames_encryption_mode must be set to constants from <linux/fs.h> which identify the encryption modes to use. If unsure, use FS_ENCRYPTION_MODE_AES_256_XTS (1) for contents_encryption_mode and FS_ENCRYPTION_MODE_AES_256_CTS (4) for filenames_encryption_mode.
- flags must be set to a value from <linux/fs.h> which identifies the amount of NUL-padding to use when encrypting filenames. If unsure, use FS_POLICY_FLAGS_PAD_32 (0x3).
- master_key_descriptor specifies how to find the master key in the keyring; see Adding keys. It is up to userspace to choose a unique master_key_descriptor for each master key. The e4crypt and fscrypt tools use the first 8 bytes of SHA-512(SHA-512(master_key)), but this particular scheme is not required. Also, the master key need not be in the keyring yet when FS_IOC_SET_ENCRYPTION_POLICY is executed. However, it must be added before any files can be created in the encrypted directory.
If the file is not yet encrypted, then FS_IOC_SET_ENCRYPTION_POLICY verifies that the file is an empty directory. If so, the specified encryption policy is assigned to the directory, turning it into an encrypted directory. After that, and after providing the corresponding master key as described in Adding keys, all regular files, directories (recursively), and symlinks created in the directory will be encrypted, inheriting the same encryption policy. The filenames in the directory’s entries will be encrypted as well.
Alternatively, if the file is already encrypted, then FS_IOC_SET_ENCRYPTION_POLICY validates that the specified encryption policy exactly matches the actual one. If they match, then the ioctl returns 0. Otherwise, it fails with EEXIST. This works on both regular files and directories, including nonempty directories.
Note that the ext4 filesystem does not allow the root directory to be encrypted, even if it is empty. Users who want to encrypt an entire filesystem with one key should consider using dm-crypt instead.
FS_IOC_SET_ENCRYPTION_POLICY can fail with the following errors:
- EACCES: the file is not owned by the process’s uid, nor does the process have the CAP_FOWNER capability in a namespace with the file owner’s uid mapped
- EEXIST: the file is already encrypted with an encryption policy different from the one specified
- EINVAL: an invalid encryption policy was specified (invalid version, mode(s), or flags)
- ENOTDIR: the file is unencrypted and is a regular file, not a directory
- ENOTEMPTY: the file is unencrypted and is a nonempty directory
- ENOTTY: this type of filesystem does not implement encryption
- EOPNOTSUPP: the kernel was not configured with encryption support for this filesystem, or the filesystem superblock has not had encryption enabled on it. (For example, to use encryption on an ext4 filesystem, CONFIG_EXT4_ENCRYPTION must be enabled in the kernel config, and the superblock must have had the “encrypt” feature flag enabled using tune2fs -O encrypt or mkfs.ext4 -O encrypt.)
- EPERM: this directory may not be encrypted, e.g. because it is the root directory of an ext4 filesystem
- EROFS: the filesystem is readonly
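Putting the pieces of this section together, a caller might fill in the policy with the "if unsure" values and issue the ioctl roughly as follows. This is a sketch, not a reference implementation: the demo_* helper names are hypothetical, and it assumes the installed <linux/fs.h> exposes struct fscrypt_policy and the FS_* constants described above.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* struct fscrypt_policy, FS_IOC_SET_ENCRYPTION_POLICY */

/* Fill in a policy using the "if unsure" values from the text. */
static void demo_fill_policy(struct fscrypt_policy *policy,
                             const unsigned char descriptor[FS_KEY_DESCRIPTOR_SIZE])
{
    memset(policy, 0, sizeof(*policy));
    policy->version = 0;
    policy->contents_encryption_mode = FS_ENCRYPTION_MODE_AES_256_XTS;
    policy->filenames_encryption_mode = FS_ENCRYPTION_MODE_AES_256_CTS;
    policy->flags = FS_POLICY_FLAGS_PAD_32;
    memcpy(policy->master_key_descriptor, descriptor, FS_KEY_DESCRIPTOR_SIZE);
}

/* Set (or verify) the policy on a directory.
 * Returns 0 on success, or -1 with errno set to one of the errors above. */
static int demo_set_policy(const char *dir, const struct fscrypt_policy *policy)
{
    int fd, ret, saved;

    fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;
    ret = ioctl(fd, FS_IOC_SET_ENCRYPTION_POLICY, policy);
    saved = errno;      /* close() may overwrite errno */
    close(fd);
    errno = saved;
    return ret;
}
```

A caller would typically treat EEXIST as "already encrypted with a different policy" and ENOTEMPTY as "create a fresh directory and move the data in after unlocking", since in-place encryption is not supported.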
Getting an encryption policy
The FS_IOC_GET_ENCRYPTION_POLICY ioctl retrieves the struct fscrypt_policy, if any, for a directory or regular file. See above for the struct definition. No additional permissions are required beyond the ability to open the file.
FS_IOC_GET_ENCRYPTION_POLICY can fail with the following errors:
- EINVAL: the file is encrypted, but it uses an unrecognized encryption context format
- ENODATA: the file is not encrypted
- ENOTTY: this type of filesystem does not implement encryption
- EOPNOTSUPP: the kernel was not configured with encryption support for this filesystem
Note: if you only need to know whether a file is encrypted or not, on most filesystems it is also possible to use the FS_IOC_GETFLAGS ioctl and check for FS_ENCRYPT_FL, or to use the statx() system call and check for STATX_ATTR_ENCRYPTED in stx_attributes.
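A minimal sketch of querying the policy, with the errors listed above folded into a three-way result. The function name and the return convention are hypothetical; the ioctl and struct are the ones described in this section, assumed to be available from <linux/fs.h>.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* struct fscrypt_policy, FS_IOC_GET_ENCRYPTION_POLICY */

/* Returns 1 if path is encrypted (policy copied to *policy),
 *         0 if it is not encrypted or the filesystem has no encryption support,
 *        -1 on any other error, with errno set. */
static int demo_get_policy(const char *path, struct fscrypt_policy *policy)
{
    int fd, ret, saved;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ret = ioctl(fd, FS_IOC_GET_ENCRYPTION_POLICY, policy);
    saved = errno;      /* close() may overwrite errno */
    close(fd);

    if (ret == 0)
        return 1;
    if (saved == ENODATA || saved == ENOTTY || saved == EOPNOTSUPP)
        return 0;
    errno = saved;      /* e.g. EINVAL: unrecognized context format */
    return -1;
}
```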
Getting the per-filesystem salt
Some filesystems, such as ext4 and F2FS, also support the deprecated ioctl FS_IOC_GET_ENCRYPTION_PWSALT. This ioctl retrieves a randomly generated 16-byte value stored in the filesystem superblock. This value is intended to be used as a salt when deriving an encryption key from a passphrase or other low-entropy user credential.
FS_IOC_GET_ENCRYPTION_PWSALT is deprecated. Instead, prefer to generate and manage any needed salt(s) in userspace.
Adding keys
To provide a master key, userspace must add it to an appropriate keyring using the add_key() system call (see: Documentation/security/keys/core.rst). The key type must be “logon”; keys of this type are kept in kernel memory and cannot be read back by userspace. The key description must be “fscrypt:” followed by the 16-character lower case hex representation of the master_key_descriptor that was set in the encryption policy. The key payload must conform to the following structure:
    #define FS_MAX_KEY_SIZE 64

    struct fscrypt_key {
        u32 mode;
        u8 raw[FS_MAX_KEY_SIZE];
        u32 size;
    };
mode is ignored; just set it to 0. The actual key is provided in raw with size indicating its size in bytes. That is, the bytes raw[0..size-1] (inclusive) are the actual key.
The key description prefix “fscrypt:” may alternatively be replaced with a filesystem-specific prefix such as “ext4:”. However, the filesystem-specific prefixes are deprecated and should not be used in new programs.
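The description format and payload layout can be sketched as follows. The struct here is a local mirror of the payload shown above under a hypothetical name, so that the example does not depend on which kernel headers are installed. The raw add_key system call is used because glibc provides no wrapper for it. Linking the key into the session keyring is an assumption of this sketch; choose the keyring according to the visibility rules discussed in this section.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/keyctl.h>   /* KEY_SPEC_SESSION_KEYRING */

#define DEMO_KEY_DESCRIPTOR_SIZE 8
#define DEMO_MAX_KEY_SIZE        64
#define DEMO_DESC_PREFIX         "fscrypt:"
#define DEMO_DESC_LEN            25   /* 8 prefix + 16 hex digits + NUL */

/* Local mirror of the key payload layout (hypothetical name). */
struct demo_fscrypt_key {
    uint32_t mode;                      /* ignored; set to 0 */
    uint8_t  raw[DEMO_MAX_KEY_SIZE];    /* raw[0..size-1] is the key */
    uint32_t size;
};

/* Build "fscrypt:" followed by 16 lowercase hex digits. */
static void demo_key_description(const uint8_t desc[DEMO_KEY_DESCRIPTOR_SIZE],
                                 char out[DEMO_DESC_LEN])
{
    size_t off = strlen(DEMO_DESC_PREFIX);
    int i;

    memcpy(out, DEMO_DESC_PREFIX, off);
    for (i = 0; i < DEMO_KEY_DESCRIPTOR_SIZE; i++)
        snprintf(out + off + 2 * i, 3, "%02x", desc[i]);
}

/* Add a master key as a "logon" key in the session keyring.
 * Returns the key serial number, or -1 on error. */
static long demo_add_key(const uint8_t desc[DEMO_KEY_DESCRIPTOR_SIZE],
                         const uint8_t *key, uint32_t size)
{
    struct demo_fscrypt_key payload;
    char description[DEMO_DESC_LEN];
    long ret;

    if (size == 0 || size > DEMO_MAX_KEY_SIZE)
        return -1;
    memset(&payload, 0, sizeof(payload));
    memcpy(payload.raw, key, size);
    payload.size = size;
    demo_key_description(desc, description);

    ret = syscall(SYS_add_key, "logon", description,
                  &payload, sizeof(payload), KEY_SPEC_SESSION_KEYRING);
    explicit_bzero(&payload, sizeof(payload));   /* wipe the local copy */
    return ret;
}
```

For a descriptor of 01 23 45 67 89 ab cd ef, the description passed to add_key is "fscrypt:0123456789abcdef".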
There are several different types of keyrings in which encryption keys may be placed, such as a session keyring, a user session keyring, or a user keyring. Each key must be placed in a keyring that is “attached” to all processes that might need to access files encrypted with it, in the sense that request_key() will find the key. Generally, if only processes belonging to a specific user need to access a given encrypted directory and no session keyring has been installed, then that directory’s key should be placed in that user’s user session keyring or user keyring. Otherwise, a session keyring should be installed if needed, and the key should be linked into that session keyring, or in a keyring linked into that session keyring.
Note: introducing the complex visibility semantics of keyrings here was arguably a mistake — especially given that by design, after any process successfully opens an encrypted file (thereby setting up the per-file key), possessing the keyring key is not actually required for any process to read/write the file until its in-memory inode is evicted. In the future there probably should be a way to provide keys directly to the filesystem instead, which would make the intended semantics clearer.
Access semantics
With the key
With the encryption key, encrypted regular files, directories, and symlinks behave very similarly to their unencrypted counterparts — after all, the encryption is intended to be transparent. However, astute users may notice some differences in behavior:
- Unencrypted files, or files encrypted with a different encryption policy (i.e. different key, modes, or flags), cannot be renamed or linked into an encrypted directory; see Encryption policy enforcement. Attempts to do so will fail with EPERM. However, encrypted files can be renamed within an encrypted directory, or into an unencrypted directory.
- Direct I/O is not supported on encrypted files. Attempts to use direct I/O on such files will fall back to buffered I/O.
- The fallocate operations FALLOC_FL_COLLAPSE_RANGE, FALLOC_FL_INSERT_RANGE, and FALLOC_FL_ZERO_RANGE are not supported on encrypted files and will fail with EOPNOTSUPP.
- Online defragmentation of encrypted files is not supported. The EXT4_IOC_MOVE_EXT and F2FS_IOC_MOVE_RANGE ioctls will fail with EOPNOTSUPP.
- The ext4 filesystem does not support data journaling with encrypted regular files. It will fall back to ordered data mode instead.
- DAX (Direct Access) is not supported on encrypted files.
- The st_size of an encrypted symlink will not necessarily give the length of the symlink target as required by POSIX. It will actually give the length of the ciphertext, which will be slightly longer than the plaintext due to NUL-padding and an extra 2-byte overhead.
- The maximum length of an encrypted symlink is 2 bytes shorter than the maximum length of an unencrypted symlink. For example, on an ext4 filesystem with a 4K block size, unencrypted symlinks can be up to 4095 bytes long, while encrypted symlinks can only be up to 4093 bytes long (both lengths excluding the terminating NUL).
Note that mmap is supported. This is possible because the pagecache for an encrypted file contains the plaintext, not the ciphertext.
Without the key
Some filesystem operations may be performed on encrypted regular files, directories, and symlinks even before their encryption key has been provided:
- File metadata may be read, e.g. using stat().
- Directories may be listed, in which case the filenames will be listed in an encoded form derived from their ciphertext. The current encoding algorithm is described in Filename hashing and encoding. The algorithm is subject to change, but it is guaranteed that the presented filenames will be no longer than NAME_MAX bytes, will not contain the / or \0 characters, and will uniquely identify directory entries.
  The . and .. directory entries are special. They are always present and are not encrypted or encoded.
- Files may be deleted. That is, nondirectory files may be deleted with unlink() as usual, and empty directories may be deleted with rmdir() as usual. Therefore, rm and rm -r will work as expected.
- Symlink targets may be read and followed, but they will be presented in encrypted form, similar to filenames in directories. Hence, they are unlikely to point to anywhere useful.
Without the key, regular files cannot be opened or truncated. Attempts to do so will fail with ENOKEY. This implies that any regular file operations that require a file descriptor, such as read(), write(), mmap(), fallocate(), and ioctl(), are also forbidden.
Also without the key, files of any type (including directories) cannot be created or linked into an encrypted directory, nor can a name in an encrypted directory be the source or target of a rename, nor can an O_TMPFILE temporary file be created in an encrypted directory. All such operations will fail with ENOKEY.
It is not currently possible to back up and restore encrypted files without the encryption key. This would require special APIs which have not yet been implemented.
Encryption policy enforcement
After an encryption policy has been set on a directory, all regular files, directories, and symbolic links created in that directory (recursively) will inherit that encryption policy. Special files — that is, named pipes, device nodes, and UNIX domain sockets — will not be encrypted.
Except for those special files, it is forbidden to have unencrypted files, or files encrypted with a different encryption policy, in an encrypted directory tree. Attempts to link or rename such a file into an encrypted directory will fail with EPERM. This is also enforced during ->lookup() to provide limited protection against offline attacks that try to disable or downgrade encryption in known locations where applications may later write sensitive data. It is recommended that systems implementing a form of “verified boot” take advantage of this by validating all top-level encryption policies prior to access.
Implementation details
Encryption context
An encryption policy is represented on-disk by a struct fscrypt_context. It is up to individual filesystems to decide where to store it, but normally it would be stored in a hidden extended attribute. It should not be exposed by the xattr-related system calls such as getxattr() and setxattr() because of the special semantics of the encryption xattr. (In particular, there would be much confusion if an encryption policy were to be added to or removed from anything other than an empty directory.) The struct is defined as follows:
    #define FS_KEY_DESCRIPTOR_SIZE 8
    #define FS_KEY_DERIVATION_NONCE_SIZE 16

    struct fscrypt_context {
        u8 format;
        u8 contents_encryption_mode;
        u8 filenames_encryption_mode;
        u8 flags;
        u8 master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE];
        u8 nonce[FS_KEY_DERIVATION_NONCE_SIZE];
    };
Note that struct fscrypt_context contains the same information as struct fscrypt_policy (see Setting an encryption policy), except that struct fscrypt_context also contains a nonce. The nonce is randomly generated by the kernel and is used to derive the inode’s encryption key as described in Per-file keys.
Data path changes
For the read path (->readpage()) of regular files, filesystems can read the ciphertext into the page cache and decrypt it in-place. The page lock must be held until decryption has finished, to prevent the page from becoming visible to userspace prematurely.
For the write path (->writepage()) of regular files, filesystems cannot encrypt data in-place in the page cache, since the cached plaintext must be preserved. Instead, filesystems must encrypt into a temporary buffer or “bounce page”, then write out the temporary buffer. Some filesystems, such as UBIFS, already use temporary buffers regardless of encryption. Other filesystems, such as ext4 and F2FS, have to allocate bounce pages specially for encryption.
Filename hashing and encoding
Modern filesystems accelerate directory lookups by using indexed directories. An indexed directory is organized as a tree keyed by filename hashes. When a ->lookup() is requested, the filesystem normally hashes the filename being looked up so that it can quickly find the corresponding directory entry, if any.
With encryption, lookups must be supported and efficient both with and without the encryption key. Clearly, it would not work to hash the plaintext filenames, since the plaintext filenames are unavailable without the key. (Hashing the plaintext filenames would also make it impossible for the filesystem’s fsck tool to optimize encrypted directories.) Instead, filesystems hash the ciphertext filenames, i.e. the bytes actually stored on-disk in the directory entries. When asked to do a ->lookup() with the key, the filesystem just encrypts the user-supplied name to get the ciphertext.
Lookups without the key are more complicated. The raw ciphertext may contain the \0 and / characters, which are illegal in filenames. Therefore, readdir() must base64-encode the ciphertext for presentation. For most filenames, this works fine; on ->lookup(), the filesystem just base64-decodes the user-supplied name to get back to the raw ciphertext.
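To illustrate why the encoding needs care, note that standard base64 uses "/" as one of its 64 output characters, so it cannot be used unchanged for filenames. The sketch below uses an alphabet with "," in place of "/", emits no padding characters, and consumes input bits least-significant first. The alphabet, the bit order, and the function name are all assumptions of this sketch, and the presented encoding is subject to change in any case.

```c
#include <stddef.h>

/* 64-character alphabet containing neither '/' nor NUL (assumed). */
static const char demo_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";

/* Encode len bytes of src as unpadded, filename-safe text.
 * dst must have room for (len * 4 + 2) / 3 characters plus a NUL.
 * Returns the number of characters written, not counting the NUL. */
static size_t demo_encode(const unsigned char *src, size_t len, char *dst)
{
    unsigned int acc = 0;   /* bit accumulator, filled from the low end */
    int bits = 0;           /* number of valid bits in acc */
    size_t out = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        acc |= (unsigned int)src[i] << bits;
        bits += 8;
        while (bits >= 6) {
            dst[out++] = demo_alphabet[acc & 0x3f];
            acc >>= 6;
            bits -= 6;
        }
    }
    if (bits > 0)           /* flush the final partial group */
        dst[out++] = demo_alphabet[acc & 0x3f];
    dst[out] = '\0';
    return out;
}
```

Three 0xff bytes, which standard base64 would render as "////", come out as ",,,," here. Each input byte expands to roughly 4/3 characters, which is why very long ciphertexts cannot be presented this way within NAME_MAX.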
However, for very long filenames, base64 encoding would cause the filename length to exceed NAME_MAX. To prevent this, readdir() actually presents long filenames in an abbreviated form which encodes a strong “hash” of the ciphertext filename, along with the optional filesystem-specific hash(es) needed for directory lookups. This allows the filesystem to still, with a high degree of confidence, map the filename given in ->lookup() back to a particular directory entry that was previously listed by readdir(). See struct fscrypt_digested_name in the source for more details.
Note that the precise way that filenames are presented to userspace without the key is subject to change in the future. It is only meant as a way to temporarily present valid filenames so that commands like rm -r work as expected on encrypted directories.