This is the fscrypt section of my draft patch series [1] to add
encryption support to btrfs.
Last October, Omar Sandoval sent out a design proposal for using fscrypt
with btrfs [2]. To tersely summarize the challenges laid out in the
document, btrfs supports sharing the same physical storage between
multiple files (reflinks); moving the physical location of file data
without access to the file inodes; and creating writable snapshots of
some directories (subvolumes). To allow encryption to coexist with these
features, the proposal was for btrfs to create and preserve an IV per
data block, no matter its physical location or owning inode(s), and for
btrfs to allow partially-encrypted writable snapshots of unencrypted
subvolumes.
To deal with these issues, a few changes to fscrypt are proposed:
- To enable snapshotting and then encrypting new writes to that
subvolume, a flag to allow partially encrypted directories;
- To allow filesystems to supply an IV in cases where the logical block
number and owning inode for data may change, a new policy and
interface to convey IVs from filesystem to fscrypt.
Comments, especially on the new policy and interface, appreciated!
[1] https://lore.kernel.org/linux-btrfs/[email protected]
[2] https://lore.kernel.org/linux-btrfs/[email protected]/
Omar Sandoval (3):
fscrypt: expose fscrypt_nokey_name
fscrypt: add flag allowing partially-encrypted directories
fscrypt: add fscrypt_have_same_policy() to check inode's compatibility
Sweet Tea Dorminy (1):
fscrypt: Add new encryption policy for btrfs.
fs/crypto/crypto.c | 28 +++++++++++++++--
fs/crypto/fname.c | 56 ++++++++++-----------------------
fs/crypto/fscrypt_private.h | 4 +--
fs/crypto/inline_crypt.c | 20 ++++++++----
fs/crypto/keysetup.c | 5 +++
fs/crypto/policy.c | 42 ++++++++++++++++++++++++-
include/linux/fscrypt.h | 61 +++++++++++++++++++++++++++++++++++-
include/uapi/linux/fscrypt.h | 1 +
8 files changed, 166 insertions(+), 51 deletions(-)
--
2.35.1
Certain filesystems may want to use IVs generated and stored outside of
fscrypt's inode-based IV generation policies. In particular, btrfs can
have multiple inodes referencing a single block of data, and moves
logical data blocks to different physical locations on disk; these two
features mean inode or physical-location-based IV generation policies
will not work for btrfs. For these or similar reasons, such filesystems
may want to implement their own IV generation and storage for data
blocks.
Plumbing each such filesystem's internals into fscrypt for IV generation
would be ungainly and fragile. Thus, this change adds a new policy,
IV_FROM_FS, and a new operation function pointer, get_fs_derived_iv. If
this policy is selected, the filesystem is required to provide the
function pointer, which populates the IV for a particular data block.
The IV buffer passed to get_fs_derived_iv() is pre-populated with the
inode contexts' nonce, in case the filesystem would like to use this
information; for btrfs, this is used for filename encryption. Any
filesystem using this policy is expected to appropriately generate and
store a persistent random IV for each block of data.
Filesystems implementing this policy are expected to be incompatible
with existing IV generation policies, so if the function pointer is set,
only dummy or IV_FROM_FS policies are permitted. If there is a
filesystem which allows other policies as well as IV_FROM_FS, it may be
better to expose the policy to filesystems, so they can determine
whether any given policy is compatible with their operation.
Signed-off-by: Sweet Tea Dorminy <[email protected]>
---
fs/crypto/crypto.c | 28 ++++++++++++++++++++++++++--
fs/crypto/fscrypt_private.h | 4 ++--
fs/crypto/inline_crypt.c | 20 ++++++++++++++------
fs/crypto/keysetup.c | 5 +++++
fs/crypto/policy.c | 16 +++++++++++++++-
include/linux/fscrypt.h | 21 ++++++++++++++++++++-
include/uapi/linux/fscrypt.h | 1 +
7 files changed, 83 insertions(+), 12 deletions(-)
diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
index e78be66bbf01..3c29c437ae0b 100644
--- a/fs/crypto/crypto.c
+++ b/fs/crypto/crypto.c
@@ -69,6 +69,20 @@ void fscrypt_free_bounce_page(struct page *bounce_page)
}
EXPORT_SYMBOL(fscrypt_free_bounce_page);
+int fscrypt_mode_ivsize(struct inode *inode)
+{
+ struct fscrypt_info *ci;
+
+ if (!fscrypt_needs_contents_encryption(inode))
+ return 0;
+
+ ci = inode->i_crypt_info;
+ if (WARN_ON_ONCE(!ci))
+ return 0;
+ return ci->ci_mode->ivsize;
+}
+EXPORT_SYMBOL(fscrypt_mode_ivsize);
+
/*
* Generate the IV for the given logical block number within the given file.
* For filenames encryption, lblk_num == 0.
@@ -81,13 +95,23 @@ void fscrypt_generate_iv(union fscrypt_iv *iv, u64 lblk_num,
const struct fscrypt_info *ci)
{
u8 flags = fscrypt_policy_flags(&ci->ci_policy);
+ struct inode *inode = ci->ci_inode;
memset(iv, 0, ci->ci_mode->ivsize);
+ if (flags & FSCRYPT_POLICY_FLAG_IV_FROM_FS) {
+ const struct fscrypt_operations *s_cop = inode->i_sb->s_cop;
+ /* Provide the nonce in case the filesystem wants to use it */
+ memcpy(iv->nonce, ci->ci_nonce, FSCRYPT_FILE_NONCE_SIZE);
+ s_cop->get_fs_defined_iv(iv->raw, ci->ci_mode->ivsize, inode,
+ lblk_num);
+ return;
+ }
+
if (flags & FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64) {
WARN_ON_ONCE(lblk_num > U32_MAX);
- WARN_ON_ONCE(ci->ci_inode->i_ino > U32_MAX);
- lblk_num |= (u64)ci->ci_inode->i_ino << 32;
+ WARN_ON_ONCE(inode->i_ino > U32_MAX);
+ lblk_num |= (u64)inode->i_ino << 32;
} else if (flags & FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32) {
WARN_ON_ONCE(lblk_num > U32_MAX);
lblk_num = (u32)(ci->ci_hashed_ino + lblk_num);
diff --git a/fs/crypto/fscrypt_private.h b/fs/crypto/fscrypt_private.h
index 6b4c8094cc7b..084c6ba1e766 100644
--- a/fs/crypto/fscrypt_private.h
+++ b/fs/crypto/fscrypt_private.h
@@ -279,8 +279,6 @@ fscrypt_msg(const struct inode *inode, const char *level, const char *fmt, ...);
#define fscrypt_err(inode, fmt, ...) \
fscrypt_msg((inode), KERN_ERR, fmt, ##__VA_ARGS__)
-#define FSCRYPT_MAX_IV_SIZE 32
-
union fscrypt_iv {
struct {
/* logical block number within the file */
@@ -326,6 +324,7 @@ int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, const u8 *master_key,
#define HKDF_CONTEXT_DIRHASH_KEY 5 /* info=file_nonce */
#define HKDF_CONTEXT_IV_INO_LBLK_32_KEY 6 /* info=mode_num||fs_uuid */
#define HKDF_CONTEXT_INODE_HASH_KEY 7 /* info=<empty> */
+#define HKDF_CONTEXT_IV_FROM_FS_KEY 8 /* info=mode_num */
int fscrypt_hkdf_expand(const struct fscrypt_hkdf *hkdf, u8 context,
const u8 *info, unsigned int infolen,
@@ -498,6 +497,7 @@ struct fscrypt_master_key {
struct fscrypt_prepared_key mk_direct_keys[FSCRYPT_MODE_MAX + 1];
struct fscrypt_prepared_key mk_iv_ino_lblk_64_keys[FSCRYPT_MODE_MAX + 1];
struct fscrypt_prepared_key mk_iv_ino_lblk_32_keys[FSCRYPT_MODE_MAX + 1];
+ struct fscrypt_prepared_key mk_iv_from_fs_keys[FSCRYPT_MODE_MAX + 1];
/* Hash key for inode numbers. Initialized only when needed. */
siphash_key_t mk_ino_hash_key;
diff --git a/fs/crypto/inline_crypt.c b/fs/crypto/inline_crypt.c
index 90f3e68f166e..8a8330caadfa 100644
--- a/fs/crypto/inline_crypt.c
+++ b/fs/crypto/inline_crypt.c
@@ -476,14 +476,22 @@ u64 fscrypt_limit_io_blocks(const struct inode *inode, u64 lblk, u64 nr_blocks)
return nr_blocks;
ci = inode->i_crypt_info;
- if (!(fscrypt_policy_flags(&ci->ci_policy) &
- FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32))
- return nr_blocks;
- /* With IV_INO_LBLK_32, the DUN can wrap around from U32_MAX to 0. */
+ if (fscrypt_policy_flags(&ci->ci_policy) &
+ FSCRYPT_POLICY_FLAG_IV_FROM_FS) {
+ return 1;
+ }
- dun = ci->ci_hashed_ino + lblk;
+ if ((fscrypt_policy_flags(&ci->ci_policy) &
+ FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32)) {
+ /*
+ * With IV_INO_LBLK_32, the DUN can wrap around from U32_MAX to
+ * 0.
+ */
+ dun = ci->ci_hashed_ino + lblk;
+ return min_t(u64, nr_blocks, (u64)U32_MAX + 1 - dun);
+ }
- return min_t(u64, nr_blocks, (u64)U32_MAX + 1 - dun);
+ return nr_blocks;
}
EXPORT_SYMBOL_GPL(fscrypt_limit_io_blocks);
diff --git a/fs/crypto/keysetup.c b/fs/crypto/keysetup.c
index c35711896bd4..62b30b567c0d 100644
--- a/fs/crypto/keysetup.c
+++ b/fs/crypto/keysetup.c
@@ -323,6 +323,11 @@ static int fscrypt_setup_v2_file_key(struct fscrypt_info *ci,
*/
err = setup_per_mode_enc_key(ci, mk, mk->mk_direct_keys,
HKDF_CONTEXT_DIRECT_KEY, false);
+ } else if (ci->ci_policy.v2.flags & FSCRYPT_POLICY_FLAG_IV_FROM_FS) {
+ err = setup_per_mode_enc_key(ci, mk,
+ mk->mk_iv_from_fs_keys,
+ HKDF_CONTEXT_IV_FROM_FS_KEY,
+ false);
} else if (ci->ci_policy.v2.flags &
FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64) {
/*
diff --git a/fs/crypto/policy.c b/fs/crypto/policy.c
index 5763462af9e8..878650b8e3e7 100644
--- a/fs/crypto/policy.c
+++ b/fs/crypto/policy.c
@@ -199,7 +199,8 @@ static bool fscrypt_supported_v2_policy(const struct fscrypt_policy_v2 *policy,
if (policy->flags & ~(FSCRYPT_POLICY_FLAGS_PAD_MASK |
FSCRYPT_POLICY_FLAG_DIRECT_KEY |
FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64 |
- FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32)) {
+ FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32 |
+ FSCRYPT_POLICY_FLAG_IV_FROM_FS)) {
fscrypt_warn(inode, "Unsupported encryption flags (0x%02x)",
policy->flags);
return false;
@@ -208,6 +209,7 @@ static bool fscrypt_supported_v2_policy(const struct fscrypt_policy_v2 *policy,
count += !!(policy->flags & FSCRYPT_POLICY_FLAG_DIRECT_KEY);
count += !!(policy->flags & FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64);
count += !!(policy->flags & FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32);
+ count += !!(policy->flags & FSCRYPT_POLICY_FLAG_IV_FROM_FS);
if (count > 1) {
fscrypt_warn(inode, "Mutually exclusive encryption flags (0x%02x)",
policy->flags);
@@ -235,6 +237,18 @@ static bool fscrypt_supported_v2_policy(const struct fscrypt_policy_v2 *policy,
32, 32))
return false;
+ if ((policy->flags & FSCRYPT_POLICY_FLAG_IV_FROM_FS) &&
+ !inode->i_sb->s_cop->get_fs_defined_iv)
+ return false;
+
+ /*
+ * If the filesystem defines its own IVs, presumably it does so because
+ * no existing policy works, so disallow other policies.
+ */
+ if (inode->i_sb->s_cop->get_fs_defined_iv &&
+ !(policy->flags & FSCRYPT_POLICY_FLAG_IV_FROM_FS))
+ return false;
+
if (memchr_inv(policy->__reserved, 0, sizeof(policy->__reserved))) {
fscrypt_warn(inode, "Reserved bits set in encryption policy");
return false;
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 1686b25f6d9c..3ebca02c4ba0 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -94,6 +94,12 @@ struct fscrypt_nokey_name {
/* Maximum value for the third parameter of fscrypt_operations.set_context(). */
#define FSCRYPT_SET_CONTEXT_MAX_SIZE 40
+/*
+ * Maximum size needed for an IV. Defines the size of the buffer passed to a
+ * get_fs_defined_iv() function.
+ */
+#define FSCRYPT_MAX_IV_SIZE 32
+
#ifdef CONFIG_FS_ENCRYPTION
/*
@@ -198,7 +204,13 @@ struct fscrypt_operations {
*/
void (*get_ino_and_lblk_bits)(struct super_block *sb,
int *ino_bits_ret, int *lblk_bits_ret);
-
+ /*
+ * Generate an IV for a given inode and lblk number, for filesystems
+ * where additional filesystem-internal information is necessary to
+ * keep the IV stable.
+ */
+ void (*get_fs_defined_iv)(u8 *iv, int ivsize, struct inode *inode,
+ u64 lblk_num);
/*
* Return the number of block devices to which the filesystem may write
* encrypted file contents.
@@ -317,6 +329,8 @@ static inline struct page *fscrypt_pagecache_page(struct page *bounce_page)
void fscrypt_free_bounce_page(struct page *bounce_page);
+int fscrypt_mode_ivsize(struct inode *inode);
+
/* policy.c */
int fscrypt_have_same_policy(struct inode *inode1, struct inode *inode2);
int fscrypt_ioctl_set_policy(struct file *filp, const void __user *arg);
@@ -492,6 +506,11 @@ static inline void fscrypt_free_bounce_page(struct page *bounce_page)
{
}
+static inline int fscrypt_mode_ivsize(struct inode *inode)
+{
+ return 0;
+}
+
/* policy.c */
static inline int fscrypt_ioctl_set_policy(struct file *filp,
const void __user *arg)
diff --git a/include/uapi/linux/fscrypt.h b/include/uapi/linux/fscrypt.h
index 9f4428be3e36..3fbde7668b07 100644
--- a/include/uapi/linux/fscrypt.h
+++ b/include/uapi/linux/fscrypt.h
@@ -20,6 +20,7 @@
#define FSCRYPT_POLICY_FLAG_DIRECT_KEY 0x04
#define FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64 0x08
#define FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32 0x10
+#define FSCRYPT_POLICY_FLAG_IV_FROM_FS 0x20
/* Encryption algorithms */
#define FSCRYPT_MODE_AES_256_XTS 1
--
2.35.1
From: Omar Sandoval <[email protected]>
btrfs will have the possibility of encrypted and unencrypted files in
the same directory, and it's important to not allow these two files to
become linked together. Therefore, add a function which allows checking
the encryption policies of two inodes to ensure they are compatible.
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Sweet Tea Dorminy <[email protected]>
---
fs/crypto/policy.c | 26 ++++++++++++++++++++++++++
include/linux/fscrypt.h | 1 +
2 files changed, 27 insertions(+)
diff --git a/fs/crypto/policy.c b/fs/crypto/policy.c
index 5f858cee1e3b..5763462af9e8 100644
--- a/fs/crypto/policy.c
+++ b/fs/crypto/policy.c
@@ -407,6 +407,32 @@ static int fscrypt_get_policy(struct inode *inode, union fscrypt_policy *policy)
return fscrypt_policy_from_context(policy, &ctx, ret);
}
+/**
+ * fscrypt_have_same_policy() - check whether two inodes have the same policy
+ * @inode1: the first inode
+ * @inode2: the second inode
+ *
+ * Return: %true if equal, else %false
+ */
+int fscrypt_have_same_policy(struct inode *inode1, struct inode *inode2)
+{
+ union fscrypt_policy policy1, policy2;
+ int err;
+
+ if (!IS_ENCRYPTED(inode1) && !IS_ENCRYPTED(inode2))
+ return true;
+ else if (!IS_ENCRYPTED(inode1) || !IS_ENCRYPTED(inode2))
+ return false;
+ err = fscrypt_get_policy(inode1, &policy1);
+ if (err)
+ return err;
+ err = fscrypt_get_policy(inode2, &policy2);
+ if (err)
+ return err;
+ return fscrypt_policies_equal(&policy1, &policy2);
+}
+EXPORT_SYMBOL(fscrypt_have_same_policy);
+
static int set_encryption_policy(struct inode *inode,
const union fscrypt_policy *policy)
{
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index fb48961c46f6..1686b25f6d9c 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -318,6 +318,7 @@ static inline struct page *fscrypt_pagecache_page(struct page *bounce_page)
void fscrypt_free_bounce_page(struct page *bounce_page);
/* policy.c */
+int fscrypt_have_same_policy(struct inode *inode1, struct inode *inode2);
int fscrypt_ioctl_set_policy(struct file *filp, const void __user *arg);
int fscrypt_ioctl_get_policy(struct file *filp, void __user *arg);
int fscrypt_ioctl_get_policy_ex(struct file *filp, void __user *arg);
--
2.35.1
From: Omar Sandoval <[email protected]>
Btrfs stores its data structures, including filenames in directories, in
its own buffer implementation, struct extent_buffer, composed of
several non-contiguous pages. We could copy filenames into a
temporary buffer and use fscrypt_match_name() against that buffer, such
extensive memcpying would be expensive. Instead, exposing
fscrypt_nokey_name as in this change allows btrfs to recapitulate
fscrypt_match_name() using methods on struct extent_buffer instead of
dealing with a raw byte array.
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Sweet Tea Dorminy <[email protected]>
---
fs/crypto/fname.c | 39 +--------------------------------------
include/linux/fscrypt.h | 37 +++++++++++++++++++++++++++++++++++++
2 files changed, 38 insertions(+), 38 deletions(-)
diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 14e0ef5e9a20..5d5c26d827fd 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -14,7 +14,6 @@
#include <linux/namei.h>
#include <linux/scatterlist.h>
#include <crypto/hash.h>
-#include <crypto/sha2.h>
#include <crypto/skcipher.h>
#include "fscrypt_private.h"
@@ -26,43 +25,7 @@
#define FSCRYPT_FNAME_MIN_MSG_LEN 16
/*
- * struct fscrypt_nokey_name - identifier for directory entry when key is absent
- *
- * When userspace lists an encrypted directory without access to the key, the
- * filesystem must present a unique "no-key name" for each filename that allows
- * it to find the directory entry again if requested. Naively, that would just
- * mean using the ciphertext filenames. However, since the ciphertext filenames
- * can contain illegal characters ('\0' and '/'), they must be encoded in some
- * way. We use base64url. But that can cause names to exceed NAME_MAX (255
- * bytes), so we also need to use a strong hash to abbreviate long names.
- *
- * The filesystem may also need another kind of hash, the "dirhash", to quickly
- * find the directory entry. Since filesystems normally compute the dirhash
- * over the on-disk filename (i.e. the ciphertext), it's not computable from
- * no-key names that abbreviate the ciphertext using the strong hash to fit in
- * NAME_MAX. It's also not computable if it's a keyed hash taken over the
- * plaintext (but it may still be available in the on-disk directory entry);
- * casefolded directories use this type of dirhash. At least in these cases,
- * each no-key name must include the name's dirhash too.
- *
- * To meet all these requirements, we base64url-encode the following
- * variable-length structure. It contains the dirhash, or 0's if the filesystem
- * didn't provide one; up to 149 bytes of the ciphertext name; and for
- * ciphertexts longer than 149 bytes, also the SHA-256 of the remaining bytes.
- *
- * This ensures that each no-key name contains everything needed to find the
- * directory entry again, contains only legal characters, doesn't exceed
- * NAME_MAX, is unambiguous unless there's a SHA-256 collision, and that we only
- * take the performance hit of SHA-256 on very long filenames (which are rare).
- */
-struct fscrypt_nokey_name {
- u32 dirhash[2];
- u8 bytes[149];
- u8 sha256[SHA256_DIGEST_SIZE];
-}; /* 189 bytes => 252 bytes base64url-encoded, which is <= NAME_MAX (255) */
-
-/*
- * Decoded size of max-size no-key name, i.e. a name that was abbreviated using
+ * Decoded size of max-size nokey name, i.e. a name that was abbreviated using
* the strong hash and thus includes the 'sha256' field. This isn't simply
* sizeof(struct fscrypt_nokey_name), as the padding at the end isn't included.
*/
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index e60d57c99cb6..6020b738c3b2 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -16,6 +16,7 @@
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/slab.h>
+#include <crypto/sha2.h>
#include <uapi/linux/fscrypt.h>
/*
@@ -54,6 +55,42 @@ struct fscrypt_name {
#define fname_name(p) ((p)->disk_name.name)
#define fname_len(p) ((p)->disk_name.len)
+/*
+ * struct fscrypt_nokey_name - identifier for directory entry when key is absent
+ *
+ * When userspace lists an encrypted directory without access to the key, the
+ * filesystem must present a unique "no-key name" for each filename that allows
+ * it to find the directory entry again if requested. Naively, that would just
+ * mean using the ciphertext filenames. However, since the ciphertext filenames
+ * can contain illegal characters ('\0' and '/'), they must be encoded in some
+ * way. We use base64url. But that can cause names to exceed NAME_MAX (255
+ * bytes), so we also need to use a strong hash to abbreviate long names.
+ *
+ * The filesystem may also need another kind of hash, the "dirhash", to quickly
+ * find the directory entry. Since filesystems normally compute the dirhash
+ * over the on-disk filename (i.e. the ciphertext), it's not computable from
+ * no-key names that abbreviate the ciphertext using the strong hash to fit in
+ * NAME_MAX. It's also not computable if it's a keyed hash taken over the
+ * plaintext (but it may still be available in the on-disk directory entry);
+ * casefolded directories use this type of dirhash. At least in these cases,
+ * each no-key name must include the name's dirhash too.
+ *
+ * To meet all these requirements, we base64url-encode the following
+ * variable-length structure. It contains the dirhash, or 0's if the filesystem
+ * didn't provide one; up to 149 bytes of the ciphertext name; and for
+ * ciphertexts longer than 149 bytes, also the SHA-256 of the remaining bytes.
+ *
+ * This ensures that each no-key name contains everything needed to find the
+ * directory entry again, contains only legal characters, doesn't exceed
+ * NAME_MAX, is unambiguous unless there's a SHA-256 collision, and that we only
+ * take the performance hit of SHA-256 on very long filenames (which are rare).
+ */
+struct fscrypt_nokey_name {
+ u32 dirhash[2];
+ u8 bytes[149];
+ u8 sha256[SHA256_DIGEST_SIZE];
+}; /* 189 bytes => 252 bytes base64url-encoded, which is <= NAME_MAX (255) */
+
/* Maximum value for the third parameter of fscrypt_operations.set_context(). */
#define FSCRYPT_SET_CONTEXT_MAX_SIZE 40
--
2.35.1
From: Omar Sandoval <[email protected]>
Creating several new subvolumes out of snapshots of another subvolume,
each for a different VM's storage, is a important usecase for btrfs. We
would like to give each VM a unique encryption key to use for new writes
to its subvolume, so that secure deletion of the VM's data is as simple
as securely deleting the key; to avoid needing multiple keys in each VM,
we envision the initial subvolume being unencrypted. However, this means
that the snapshot's directories would have a mix of encrypted and
unencrypted files.
To allow this, add another FS_CFLG to allow filesystems to opt into
partially encrypted directories.
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Sweet Tea Dorminy <[email protected]>
---
fs/crypto/fname.c | 17 ++++++++++++++++-
include/linux/fscrypt.h | 2 ++
2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 5d5c26d827fd..c5dd19c1d19e 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -389,6 +389,7 @@ int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname,
fname->usr_fname = iname;
if (!IS_ENCRYPTED(dir) || fscrypt_is_dot_dotdot(iname)) {
+unencrypted:
fname->disk_name.name = (unsigned char *)iname->name;
fname->disk_name.len = iname->len;
return 0;
@@ -424,8 +425,16 @@ int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname,
* user-supplied name
*/
- if (iname->len > FSCRYPT_NOKEY_NAME_MAX_ENCODED)
+ if (iname->len > FSCRYPT_NOKEY_NAME_MAX_ENCODED) {
+ /*
+ * This isn't a valid nokey name, but it could be an unencrypted
+ * name if the filesystem allows partially encrypted
+ * directories.
+ */
+ if (dir->i_sb->s_cop->flags & FS_CFLG_ALLOW_PARTIAL)
+ goto unencrypted;
return -ENOENT;
+ }
fname->crypto_buf.name = kmalloc(FSCRYPT_NOKEY_NAME_MAX, GFP_KERNEL);
if (fname->crypto_buf.name == NULL)
@@ -436,6 +445,12 @@ int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname,
if (ret < (int)offsetof(struct fscrypt_nokey_name, bytes[1]) ||
(ret > offsetof(struct fscrypt_nokey_name, sha256) &&
ret != FSCRYPT_NOKEY_NAME_MAX)) {
+ /* Again, this could be an unencrypted name. */
+ if (dir->i_sb->s_cop->flags & FS_CFLG_ALLOW_PARTIAL) {
+ kfree(fname->crypto_buf.name);
+ fname->crypto_buf.name = NULL;
+ goto unencrypted;
+ }
ret = -ENOENT;
goto errout;
}
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 6020b738c3b2..fb48961c46f6 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -102,6 +102,8 @@ struct fscrypt_nokey_name {
* pages for writes and therefore won't need the fscrypt bounce page pool.
*/
#define FS_CFLG_OWN_PAGES (1U << 1)
+/* The filesystem allows partially encrypted directories/files. */
+#define FS_CFLG_ALLOW_PARTIAL (1U << 2)
/* Crypto operations for filesystems */
struct fscrypt_operations {
--
2.35.1
On Sat, Jul 23, 2022 at 08:52:26PM -0400, Sweet Tea Dorminy wrote:
> From: Omar Sandoval <[email protected]>
>
> Creating several new subvolumes out of snapshots of another subvolume,
> each for a different VM's storage, is a important usecase for btrfs. We
> would like to give each VM a unique encryption key to use for new writes
> to its subvolume, so that secure deletion of the VM's data is as simple
> as securely deleting the key; to avoid needing multiple keys in each VM,
> we envision the initial subvolume being unencrypted. However, this means
> that the snapshot's directories would have a mix of encrypted and
> unencrypted files.
>
> To allow this, add another FS_CFLG to allow filesystems to opt into
> partially encrypted directories.
>
> Signed-off-by: Omar Sandoval <[email protected]>
> Signed-off-by: Sweet Tea Dorminy <[email protected]>
> ---
> fs/crypto/fname.c | 17 ++++++++++++++++-
> include/linux/fscrypt.h | 2 ++
> 2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
> index 5d5c26d827fd..c5dd19c1d19e 100644
> --- a/fs/crypto/fname.c
> +++ b/fs/crypto/fname.c
> @@ -389,6 +389,7 @@ int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname,
> fname->usr_fname = iname;
>
> if (!IS_ENCRYPTED(dir) || fscrypt_is_dot_dotdot(iname)) {
> +unencrypted:
> fname->disk_name.name = (unsigned char *)iname->name;
> fname->disk_name.len = iname->len;
> return 0;
> @@ -424,8 +425,16 @@ int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname,
> * user-supplied name
> */
>
> - if (iname->len > FSCRYPT_NOKEY_NAME_MAX_ENCODED)
> + if (iname->len > FSCRYPT_NOKEY_NAME_MAX_ENCODED) {
> + /*
> + * This isn't a valid nokey name, but it could be an unencrypted
> + * name if the filesystem allows partially encrypted
> + * directories.
> + */
> + if (dir->i_sb->s_cop->flags & FS_CFLG_ALLOW_PARTIAL)
> + goto unencrypted;
> return -ENOENT;
> + }
>
> fname->crypto_buf.name = kmalloc(FSCRYPT_NOKEY_NAME_MAX, GFP_KERNEL);
> if (fname->crypto_buf.name == NULL)
> @@ -436,6 +445,12 @@ int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname,
> if (ret < (int)offsetof(struct fscrypt_nokey_name, bytes[1]) ||
> (ret > offsetof(struct fscrypt_nokey_name, sha256) &&
> ret != FSCRYPT_NOKEY_NAME_MAX)) {
> + /* Again, this could be an unencrypted name. */
> + if (dir->i_sb->s_cop->flags & FS_CFLG_ALLOW_PARTIAL) {
> + kfree(fname->crypto_buf.name);
> + fname->crypto_buf.name = NULL;
> + goto unencrypted;
> + }
> ret = -ENOENT;
> goto errout;
> }
> diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
> index 6020b738c3b2..fb48961c46f6 100644
> --- a/include/linux/fscrypt.h
> +++ b/include/linux/fscrypt.h
> @@ -102,6 +102,8 @@ struct fscrypt_nokey_name {
> * pages for writes and therefore won't need the fscrypt bounce page pool.
> */
> #define FS_CFLG_OWN_PAGES (1U << 1)
> +/* The filesystem allows partially encrypted directories/files. */
> +#define FS_CFLG_ALLOW_PARTIAL (1U << 2)
I'm very confused about what the semantics of this are. So a directory will be
able to contain both encrypted and unencrypted filenames? If so, how will it be
possible to distinguish between them? Or is it just both encrypted and
unencrypted files (which is actually already possible, in the case where
encrypted files are moved into an unencrypted directory)? What sort of metadata
is stored with the parent directory?
Please note that any new semantics and APIs will need to be documented in
Documentation/filesystems/fscrypt.rst.
- Eric
On Sat, Jul 23, 2022 at 08:52:28PM -0400, Sweet Tea Dorminy wrote:
> Certain filesystems may want to use IVs generated and stored outside of
> fscrypt's inode-based IV generation policies. In particular, btrfs can
> have multiple inodes referencing a single block of data, and moves
> logical data blocks to different physical locations on disk; these two
> features mean inode or physical-location-based IV generation policies
> will not work for btrfs. For these or similar reasons, such filesystems
> may want to implement their own IV generation and storage for data
> blocks.
>
> Plumbing each such filesystem's internals into fscrypt for IV generation
> would be ungainly and fragile. Thus, this change adds a new policy,
> IV_FROM_FS, and a new operation function pointer, get_fs_derived_iv. If
> this policy is selected, the filesystem is required to provide the
> function pointer, which populates the IV for a particular data block.
> The IV buffer passed to get_fs_derived_iv() is pre-populated with the
> inode contexts' nonce, in case the filesystem would like to use this
> information; for btrfs, this is used for filename encryption. Any
> filesystem using this policy is expected to appropriately generate and
> store a persistent random IV for each block of data.
This is changed from the original proposal to store just a random "starting IV"
per extent, right? Given that this new proposal uses per-block metadata, has
support for authenticated encryption been considered? Has space been reserved
in the per-block metadata for authentication tags so that authenticated
encryption support could be added later even if it's not in the initial version?
Also, could the new IV generation method just be defined as RANDOM_IV instead of
IV_FROM_FS? Why do individual filesystems have to generate the IVs? Shouldn't
IV generation happen in common code, with filesystems just storing and
retrieving the IVs?
> diff --git a/fs/crypto/inline_crypt.c b/fs/crypto/inline_crypt.c
> index 90f3e68f166e..8a8330caadfa 100644
> --- a/fs/crypto/inline_crypt.c
> +++ b/fs/crypto/inline_crypt.c
> @@ -476,14 +476,22 @@ u64 fscrypt_limit_io_blocks(const struct inode *inode, u64 lblk, u64 nr_blocks)
> return nr_blocks;
>
> ci = inode->i_crypt_info;
> - if (!(fscrypt_policy_flags(&ci->ci_policy) &
> - FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32))
> - return nr_blocks;
>
> - /* With IV_INO_LBLK_32, the DUN can wrap around from U32_MAX to 0. */
> + if (fscrypt_policy_flags(&ci->ci_policy) &
> + FSCRYPT_POLICY_FLAG_IV_FROM_FS) {
> + return 1;
> + }
This effectively means that this IV generation method is incompatible with
inline encryption. I assume this is okay with you?
- Eric
>> diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
>> index 6020b738c3b2..fb48961c46f6 100644
>> --- a/include/linux/fscrypt.h
>> +++ b/include/linux/fscrypt.h
>> @@ -102,6 +102,8 @@ struct fscrypt_nokey_name {
>> * pages for writes and therefore won't need the fscrypt bounce page pool.
>> */
>> #define FS_CFLG_OWN_PAGES (1U << 1)
>> +/* The filesystem allows partially encrypted directories/files. */
>> +#define FS_CFLG_ALLOW_PARTIAL (1U << 2)
>
> I'm very confused about what the semantics of this are. So a directory will be
> able to contain both encrypted and unencrypted filenames? If so, how will it be
> possible to distinguish between them? Or is it just both encrypted and
> unencrypted files (which is actually already possible, in the case where
> encrypted files are moved into an unencrypted directory)? What sort of metadata
> is stored with the parent directory?
Yes, a directory for a filesystem with this flag could have both
encrypted and unencrypted filenames.
When a directory switches to encrypted, the filesystem can get and store
a fscrypt_context for it, as though it were a new directory. All new
filenames for that directory will be encrypted, as will any filename
lookup requests, by fscrypt_prepare_filename() since the directory has a
context.
When a request for a lookup of a name in that directory comes in, it'll
be an encrypted or nokey name; the directory can be searched for both
the encrypted and unencrypted versions of that name. I don't think any
filename collisions can result, as any encrypted filename which happens
to match a plaintext filename will be detected as a collision when it's
first added.
>
> Please note that any new semantics and APIs will need to be documented in
> Documentation/filesystems/fscrypt.rst.
Good point.
Thanks!
Sweet Tea
On 7/25/22 19:32, Eric Biggers wrote:
> On Sat, Jul 23, 2022 at 08:52:28PM -0400, Sweet Tea Dorminy wrote:
>> Certain filesystems may want to use IVs generated and stored outside of
>> fscrypt's inode-based IV generation policies. In particular, btrfs can
>> have multiple inodes referencing a single block of data, and moves
>> logical data blocks to different physical locations on disk; these two
>> features mean inode or physical-location-based IV generation policies
>> will not work for btrfs. For these or similar reasons, such filesystems
>> may want to implement their own IV generation and storage for data
>> blocks.
>>
>> Plumbing each such filesystem's internals into fscrypt for IV generation
>> would be ungainly and fragile. Thus, this change adds a new policy,
>> IV_FROM_FS, and a new operation function pointer, get_fs_derived_iv. If
>> this policy is selected, the filesystem is required to provide the
>> function pointer, which populates the IV for a particular data block.
>> The IV buffer passed to get_fs_derived_iv() is pre-populated with the
>> inode contexts' nonce, in case the filesystem would like to use this
>> information; for btrfs, this is used for filename encryption. Any
>> filesystem using this policy is expected to appropriately generate and
>> store a persistent random IV for each block of data.
>
> This is changed from the original proposal to store just a random "starting IV"
> per extent, right?
This is intended to be a generic interface that doesn't require any
particular IV scheme from the filesystem. In practice, the btrfs side of
the code is using a per-extent starting IV, as originally proposed. I
don't see a way for the interface to require IVs per extent, but maybe
there is a better way than this. Or, is there more detail I can add to
the change description to clarify that the filesystem doesn't
necessarily have to store an IV for each individual data block?
> Given that this new proposal uses per-block metadata, has
> support for authenticated encryption been considered? Has space been reserved
> in the per-block metadata for authentication tags so that authenticated
> encryption support could be added later even if it's not in the initial version?
I don't know sufficiently much about authenticated encryption to have
considered it. As currently drafted, btrfs encrypts before checksumming
if checksums are enabled, and checks against checksums before
decrypting. Although at present we haven't discussed authentication
tags, btrfs could store them in a separate itemtype which could be added
at any time, much as we currently store fsverity data. We do have
sufficient room saved for adding other encryption types, if necessary;
we could use some of that to indicate the existence of authentication
tags for the extents' data.
>
> Also, could the new IV generation method just be defined as RANDOM_IV instead of
> IV_FROM_FS? Why do individual filesystems have to generate the IVs? Shouldn't
> IV generation happen in common code, with filesystems just storing and
> retrieving the IVs?
I think you're imagining an interface similar to get/set_context, where
the first time a block is written the filesystem's set_IV method is
called, and subsequent encryption/decryption calls get_IV, which is
definitely elegant in its symmetry. But I'm not sure how to have a
per-block set_IV and also only store an IV per extent, and it would be a
significant cost to store an IV per block.
I would be happy to add a fscrypt_get_random_iv() method, instead of
having the filesystem call get_random_bytes() itself, if you'd like.
Thank you!
Sweet Tea
On Mon, Jul 25, 2022 at 10:16:07PM -0400, Sweet Tea Dorminy wrote:
> On 7/25/22 19:32, Eric Biggers wrote:
> > On Sat, Jul 23, 2022 at 08:52:28PM -0400, Sweet Tea Dorminy wrote:
> > Given that this new proposal uses per-block metadata, has
> > support for authenticated encryption been considered? Has space been reserved
> > in the per-block metadata for authentication tags so that authenticated
> > encryption support could be added later even if it's not in the initial version?
>
> I don't know sufficiently much about authenticated encryption to have
> considered it. As currently drafted, btrfs encrypts before checksumming
> if checksums are enabled, and checks against checksums before
> decrypting. Although at present we haven't discussed authentication
> tags, btrfs could store them in a separate itemtype which could be added
> at any time, much as we currently store fsverity data. We do have
> sufficient room saved for adding other encryption types, if necessary;
> we could use some of that to indicate the existence of authentication
> tags for the extents' data.
The AEAD tag can be used in place of checksum (also stored in the
checksum item).
On Mon, Jul 25, 2022 at 10:16:07PM -0400, Sweet Tea Dorminy wrote:
>
>
> On 7/25/22 19:32, Eric Biggers wrote:
> > On Sat, Jul 23, 2022 at 08:52:28PM -0400, Sweet Tea Dorminy wrote:
> > > Certain filesystems may want to use IVs generated and stored outside of
> > > fscrypt's inode-based IV generation policies. In particular, btrfs can
> > > have multiple inodes referencing a single block of data, and moves
> > > logical data blocks to different physical locations on disk; these two
> > > features mean inode or physical-location-based IV generation policies
> > > will not work for btrfs. For these or similar reasons, such filesystems
> > > may want to implement their own IV generation and storage for data
> > > blocks.
> > >
> > > Plumbing each such filesystem's internals into fscrypt for IV generation
> > > would be ungainly and fragile. Thus, this change adds a new policy,
> > > IV_FROM_FS, and a new operation function pointer, get_fs_derived_iv. If
> > > this policy is selected, the filesystem is required to provide the
> > > function pointer, which populates the IV for a particular data block.
> > > The IV buffer passed to get_fs_derived_iv() is pre-populated with the
> > > inode contexts' nonce, in case the filesystem would like to use this
> > > information; for btrfs, this is used for filename encryption. Any
> > > filesystem using this policy is expected to appropriately generate and
> > > store a persistent random IV for each block of data.
> >
> > This is changed from the original proposal to store just a random "starting IV"
> > per extent, right?
>
> This is intended to be a generic interface that doesn't require any
> particular IV scheme from the filesystem.
I don't think that's a good way to do it. The fscrypt settings are supposed to
be very concrete, meaning that they specify a particular way of doing the
encryption, which can be reviewed for its security and which can be tested for
correctness of the on-disk format. There shouldn't be cryptographic differences
between how different filesystems implement the same setting.
The fscrypt settings also shouldn't specify internal implementation details of
the code, as "IV_FROM_FS" does. From userspace's perspective, *all* fscrypt
settings have IVs chosen by the filesystem; the division between the
"filesystem" and fs/crypto/ is an internal kernel implementation detail.
So I think you should go with something like RANDOM_IV or IV_PER_EXTENT.
> In practice, the btrfs side of the code is using a per-extent starting IV, as
> originally proposed.
This is inconsistent with your commit message, which says that there is a random
IV for each block of data. It's also inconsistent with your proposed change to
fscrypt_limit_io_blocks(). So I don't know which to believe.
Clearly this can't be properly reviewed on its own, so please send out the whole
patch series and not just the fs/crypto/ parts.
- Eric