2019-06-20 20:54:09

by Eric Biggers

Subject: [PATCH v5 00/16] fs-verity: read-only file-based authenticity protection

Hello,

This is a redesigned version of the fs-verity patchset, implementing
Ted's suggestion to build the Merkle tree in the kernel
(https://lore.kernel.org/linux-fsdevel/[email protected]/).
This greatly simplifies the UAPI, since the verity metadata no longer
needs to be transferred to the kernel. Now to enable fs-verity on a
file, one simply calls FS_IOC_ENABLE_VERITY, passing it this structure:

	struct fsverity_enable_arg {
		__u32 version;
		__u32 hash_algorithm;
		__u32 block_size;
		__u32 salt_size;
		__u64 salt_ptr;
		__u32 sig_size;
		__u32 __reserved1;
		__u64 sig_ptr;
		__u64 __reserved2[11];
	};

The filesystem then builds the file's Merkle tree and stores it in a
filesystem-specific location associated with the file. Afterwards,
FS_IOC_MEASURE_VERITY can be used to retrieve the file measurement
("root hash"). The way the file measurement is computed is also
effectively part of the API (it has to be), but it's logically
independent of where/how the filesystem stores the Merkle tree.

The API is fully documented in Documentation/filesystems/fsverity.rst,
along with other aspects of fs-verity. I also added an FAQ section that
answers frequently asked questions about fs-verity, e.g. why isn't it
all at the VFS level, why isn't it part of IMA, why does the Merkle tree
need to be stored on-disk, etc.

Overview
--------

This patchset implements fs-verity for ext4 and f2fs. fs-verity is
similar to dm-verity, but implemented on a per-file basis: a Merkle tree
is used to measure (hash) a read-only file's data as it is paged in.
ext4 and f2fs hide this Merkle tree beyond the end of the file, but
other filesystems can implement it differently if desired.

In general, fs-verity is intended for use on writable filesystems;
dm-verity is still recommended on read-only ones.

Similar to fscrypt, most of the code is in fs/verity/, and not too many
filesystem-specific changes are needed. The Merkle tree is built by the
filesystem when the FS_IOC_ENABLE_VERITY ioctl is executed.

fs-verity provides a file measurement (hash) in constant time and
verifies data on-demand. Thus, it is useful for efficiently verifying
the authenticity of large files of which only a small portion may be
accessed, such as Android application package (APK) files. It may also
be useful in "audit" use cases where file hashes are logged.

fs-verity can also provide better protection against malicious disks
than an ahead-of-time hash, since fs-verity re-verifies data each time
it's paged in. Note, however, that any authenticity guarantee is still
dependent on verification of the file measurement and other relevant
metadata in a way that makes sense for the overall system; fs-verity is
only a tool to help with this.

This patchset doesn't include IMA support for fs-verity file
measurements. This is planned and we'd like to collaborate with the IMA
maintainers. Although fs-verity can be used on its own without IMA,
fs-verity is primarily a lower level feature (think of it as a way of
hashing a file), so some users may still need IMA's policy mechanism.
However, an optional in-kernel signature verification mechanism within
fs-verity itself is also included.

This patchset is based on v5.2-rc3. It can also be found in git at tag
fsverity_2019-06-20 of:

https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git

fs-verity has a userspace utility:

https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git

xfstests for fs-verity can be found at branch "fsverity" of:

https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/xfstests-dev.git

fs-verity is supported by f2fs-tools v1.11.0+ and e2fsprogs v1.45.2+.

Examples of setting up fs-verity protected files can be found in the
README.md file of fsverity-utils.

Other useful references include:

- Documentation/filesystems/fsverity.rst, added by the first patch.

- LWN coverage of v3 patchset: https://lwn.net/Articles/790185/

- LWN coverage of v2 patchset: https://lwn.net/Articles/775872/

- LWN coverage of v1 patchset: https://lwn.net/Articles/763729/

- Presentation at Linux Security Summit North America 2018:
    - Slides: https://schd.ws/hosted_files/lssna18/af/fs-verity%20slide%20deck.pdf
    - Video: https://www.youtube.com/watch?v=Aw5h6aBhu6M
  (This corresponded to the v1 patchset; changes have been made since then.)

- LWN coverage of LSFMM 2018 discussion: https://lwn.net/Articles/752614/

Changed since v4:

- Made ext4 and f2fs store the verity metadata beginning at a 64K
aligned boundary, to be ready for architectures with 64K pages.

- Made ext4 store the verity descriptor size in the file data stream,
so that no xattr is needed.

- Added support for empty files.

- A few minor cleanups.

Changed since v3:

- The FS_IOC_GETFLAGS ioctl now returns the verity flag.

- Fixed setting i_verity_info too early.

- Restored pagecache invalidation in FS_IOC_ENABLE_VERITY.

- Fixed truncation of fsverity_enable_arg::hash_algorithm.

- Reject empty files for both open and enable, not just enable.

- Added a couple more FAQ entries to the documentation.

- A few minor cleanups.

- Rebased onto v5.2-rc3.

Changed since v2:

- Large redesign: the Merkle tree is now built by
FS_IOC_ENABLE_VERITY, rather than being provided by userspace. The
fsverity_operations provide an interface for filesystems to read and
write the Merkle tree from/to a filesystem-specific location.

- Lots of refactoring, cleanups, and documentation improvements.

- Many simplifications, such as simplifying the fsverity_descriptor
format, dropping CRC-32 support, and limiting the salt size.

- ext4 and f2fs now store an xattr that gives the location of the
fsverity_descriptor, so loading it is more straightforward.

- f2fs no longer counts the verity metadata in the on-disk i_size,
making it consistent with ext4.

- Replaced the filesystem-specific fs-verity kconfig options with
CONFIG_FS_VERITY.

- Replaced the filesystem-specific verity bit checks with IS_VERITY().

Changed since v1:

- Added documentation file.

- Require write permission for FS_IOC_ENABLE_VERITY, rather than
CAP_SYS_ADMIN.

- Eliminated dependency on CONFIG_BLOCK and clarified that filesystems
can verify a page at a time rather than a bio at a time.

- Fixed conditions for verifying holes.

- ext4 now only allows fs-verity on extent-based files.

- Eliminated most of the assumptions that the verity metadata is
stored beyond EOF, in case filesystems want to do things
differently.

- Other cleanups.

Eric Biggers (16):
fs-verity: add a documentation file
fs-verity: add MAINTAINERS file entry
fs-verity: add UAPI header
fs: uapi: define verity bit for FS_IOC_GETFLAGS
fs-verity: add Kconfig and the helper functions for hashing
fs-verity: add inode and superblock fields
fs-verity: add the hook for file ->open()
fs-verity: add the hook for file ->setattr()
fs-verity: add data verification hooks for ->readpages()
fs-verity: implement FS_IOC_ENABLE_VERITY ioctl
fs-verity: implement FS_IOC_MEASURE_VERITY ioctl
fs-verity: add SHA-512 support
fs-verity: support builtin file signatures
ext4: add basic fs-verity support
ext4: add fs-verity read support
f2fs: add fs-verity support

Documentation/filesystems/fsverity.rst | 710 +++++++++++++++++++++++++
Documentation/filesystems/index.rst | 1 +
Documentation/ioctl/ioctl-number.txt | 1 +
MAINTAINERS | 12 +
fs/Kconfig | 2 +
fs/Makefile | 1 +
fs/ext4/Makefile | 1 +
fs/ext4/ext4.h | 23 +-
fs/ext4/file.c | 4 +
fs/ext4/inode.c | 48 +-
fs/ext4/ioctl.c | 12 +
fs/ext4/readpage.c | 207 ++++++-
fs/ext4/super.c | 18 +-
fs/ext4/sysfs.c | 6 +
fs/ext4/verity.c | 354 ++++++++++++
fs/f2fs/Makefile | 1 +
fs/f2fs/data.c | 72 ++-
fs/f2fs/f2fs.h | 23 +-
fs/f2fs/file.c | 40 ++
fs/f2fs/inode.c | 5 +-
fs/f2fs/super.c | 3 +
fs/f2fs/sysfs.c | 11 +
fs/f2fs/verity.c | 233 ++++++++
fs/f2fs/xattr.h | 2 +
fs/verity/Kconfig | 55 ++
fs/verity/Makefile | 10 +
fs/verity/enable.c | 355 +++++++++++++
fs/verity/fsverity_private.h | 185 +++++++
fs/verity/hash_algs.c | 279 ++++++++++
fs/verity/init.c | 61 +++
fs/verity/measure.c | 57 ++
fs/verity/open.c | 357 +++++++++++++
fs/verity/signature.c | 207 +++++++
fs/verity/verify.c | 281 ++++++++++
include/linux/fs.h | 11 +
include/linux/fsverity.h | 209 ++++++++
include/uapi/linux/fs.h | 1 +
include/uapi/linux/fsverity.h | 40 ++
38 files changed, 3839 insertions(+), 59 deletions(-)
create mode 100644 Documentation/filesystems/fsverity.rst
create mode 100644 fs/ext4/verity.c
create mode 100644 fs/f2fs/verity.c
create mode 100644 fs/verity/Kconfig
create mode 100644 fs/verity/Makefile
create mode 100644 fs/verity/enable.c
create mode 100644 fs/verity/fsverity_private.h
create mode 100644 fs/verity/hash_algs.c
create mode 100644 fs/verity/init.c
create mode 100644 fs/verity/measure.c
create mode 100644 fs/verity/open.c
create mode 100644 fs/verity/signature.c
create mode 100644 fs/verity/verify.c
create mode 100644 include/linux/fsverity.h
create mode 100644 include/uapi/linux/fsverity.h

--
2.22.0.410.gd8fdbe21b5-goog


2019-06-20 20:54:14

by Eric Biggers

Subject: [PATCH v5 08/16] fs-verity: add the hook for file ->setattr()

From: Eric Biggers <[email protected]>

Add a function fsverity_prepare_setattr() which filesystems that support
fs-verity must call to deny truncates of verity files.

Reviewed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
fs/verity/open.c | 21 +++++++++++++++++++++
include/linux/fsverity.h | 7 +++++++
2 files changed, 28 insertions(+)

diff --git a/fs/verity/open.c b/fs/verity/open.c
index 3a3bb27e23f5e3..21ae0ef254a695 100644
--- a/fs/verity/open.c
+++ b/fs/verity/open.c
@@ -296,6 +296,27 @@ int fsverity_file_open(struct inode *inode, struct file *filp)
}
EXPORT_SYMBOL_GPL(fsverity_file_open);

+/**
+ * fsverity_prepare_setattr - prepare to change a verity inode's attributes
+ * @dentry: dentry through which the inode is being changed
+ * @attr: attributes to change
+ *
+ * Verity files are immutable, so deny truncates. This isn't covered by the
+ * open-time check because sys_truncate() takes a path, not a file descriptor.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fsverity_prepare_setattr(struct dentry *dentry, struct iattr *attr)
+{
+ if (IS_VERITY(d_inode(dentry)) && (attr->ia_valid & ATTR_SIZE)) {
+ pr_debug("Denying truncate of verity file (ino %lu)\n",
+ d_inode(dentry)->i_ino);
+ return -EPERM;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fsverity_prepare_setattr);
+
/**
* fsverity_cleanup_inode - free the inode's verity info, if present
*
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 1372c236c8770c..cbcc358d073652 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -46,6 +46,7 @@ static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
/* open.c */

extern int fsverity_file_open(struct inode *inode, struct file *filp);
+extern int fsverity_prepare_setattr(struct dentry *dentry, struct iattr *attr);
extern void fsverity_cleanup_inode(struct inode *inode);

#else /* !CONFIG_FS_VERITY */
@@ -62,6 +63,12 @@ static inline int fsverity_file_open(struct inode *inode, struct file *filp)
return IS_VERITY(inode) ? -EOPNOTSUPP : 0;
}

+static inline int fsverity_prepare_setattr(struct dentry *dentry,
+ struct iattr *attr)
+{
+ return IS_VERITY(d_inode(dentry)) ? -EOPNOTSUPP : 0;
+}
+
static inline void fsverity_cleanup_inode(struct inode *inode)
{
}
--
2.22.0.410.gd8fdbe21b5-goog

2019-06-20 20:54:14

by Eric Biggers

Subject: [PATCH v5 11/16] fs-verity: implement FS_IOC_MEASURE_VERITY ioctl

From: Eric Biggers <[email protected]>

Add a function for filesystems to call to implement the
FS_IOC_MEASURE_VERITY ioctl. This ioctl retrieves the file measurement
that fs-verity calculated for the given file and is enforcing for reads;
i.e., reads that don't match this hash will fail. This ioctl can be
used for authentication or logging of file measurements in userspace.

See the "FS_IOC_MEASURE_VERITY" section of
Documentation/filesystems/fsverity.rst for the documentation.

Reviewed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
fs/verity/Makefile | 1 +
fs/verity/measure.c | 57 ++++++++++++++++++++++++++++++++++++++++
include/linux/fsverity.h | 11 ++++++++
3 files changed, 69 insertions(+)
create mode 100644 fs/verity/measure.c

diff --git a/fs/verity/Makefile b/fs/verity/Makefile
index 04b37475fd280a..6f7675ae0a3110 100644
--- a/fs/verity/Makefile
+++ b/fs/verity/Makefile
@@ -3,5 +3,6 @@
obj-$(CONFIG_FS_VERITY) += enable.o \
hash_algs.o \
init.o \
+ measure.o \
open.o \
verify.o
diff --git a/fs/verity/measure.c b/fs/verity/measure.c
new file mode 100644
index 00000000000000..05049b68c74553
--- /dev/null
+++ b/fs/verity/measure.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fs/verity/measure.c: ioctl to get a verity file's measurement
+ *
+ * Copyright 2019 Google LLC
+ */
+
+#include "fsverity_private.h"
+
+#include <linux/uaccess.h>
+
+/**
+ * fsverity_ioctl_measure() - get a verity file's measurement
+ *
+ * Retrieve the file measurement that the kernel is enforcing for reads from a
+ * verity file. See the "FS_IOC_MEASURE_VERITY" section of
+ * Documentation/filesystems/fsverity.rst for the documentation.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fsverity_ioctl_measure(struct file *filp, void __user *_uarg)
+{
+ const struct inode *inode = file_inode(filp);
+ struct fsverity_digest __user *uarg = _uarg;
+ const struct fsverity_info *vi;
+ const struct fsverity_hash_alg *hash_alg;
+ struct fsverity_digest arg;
+
+ vi = fsverity_get_info(inode);
+ if (!vi)
+ return -ENODATA; /* not a verity file */
+ hash_alg = vi->tree_params.hash_alg;
+
+ /*
+ * The user specifies the digest_size their buffer has space for; we can
+ * return the digest if it fits in the available space. We write back
+ * the actual size, which may be shorter than the user-specified size.
+ */
+
+ if (get_user(arg.digest_size, &uarg->digest_size))
+ return -EFAULT;
+ if (arg.digest_size < hash_alg->digest_size)
+ return -EOVERFLOW;
+
+ memset(&arg, 0, sizeof(arg));
+ arg.digest_algorithm = hash_alg - fsverity_hash_algs;
+ arg.digest_size = hash_alg->digest_size;
+
+ if (copy_to_user(uarg, &arg, sizeof(arg)))
+ return -EFAULT;
+
+ if (copy_to_user(uarg->digest, vi->measurement, hash_alg->digest_size))
+ return -EFAULT;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fsverity_ioctl_measure);
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 7ef2ef82653409..247359c86b72e0 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -116,6 +116,10 @@ static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)

extern int fsverity_ioctl_enable(struct file *filp, const void __user *arg);

+/* measure.c */
+
+extern int fsverity_ioctl_measure(struct file *filp, void __user *arg);
+
/* open.c */

extern int fsverity_file_open(struct inode *inode, struct file *filp);
@@ -143,6 +147,13 @@ static inline int fsverity_ioctl_enable(struct file *filp,
return -EOPNOTSUPP;
}

+/* measure.c */
+
+static inline int fsverity_ioctl_measure(struct file *filp, void __user *arg)
+{
+ return -EOPNOTSUPP;
+}
+
/* open.c */

static inline int fsverity_file_open(struct inode *inode, struct file *filp)
--
2.22.0.410.gd8fdbe21b5-goog

2019-06-20 20:54:15

by Eric Biggers

Subject: [PATCH v5 12/16] fs-verity: add SHA-512 support

From: Eric Biggers <[email protected]>

Add SHA-512 support to fs-verity. This is primarily a demonstration of
the trivial changes needed to support a new hash algorithm in fs-verity;
most users will still use SHA-256, due to the smaller space required to
store the hashes. But some users may prefer SHA-512.

Reviewed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
fs/verity/fsverity_private.h | 2 +-
fs/verity/hash_algs.c | 5 +++++
include/uapi/linux/fsverity.h | 1 +
3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index eaa2b3b93bbf6b..02a547f0667c13 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -29,7 +29,7 @@ struct ahash_request;
* Largest digest size among all hash algorithms supported by fs-verity.
* Currently assumed to be <= size of fsverity_descriptor::root_hash.
*/
-#define FS_VERITY_MAX_DIGEST_SIZE SHA256_DIGEST_SIZE
+#define FS_VERITY_MAX_DIGEST_SIZE SHA512_DIGEST_SIZE

/* A hash algorithm supported by fs-verity */
struct fsverity_hash_alg {
diff --git a/fs/verity/hash_algs.c b/fs/verity/hash_algs.c
index 46df17094fc252..e0462a010cabfb 100644
--- a/fs/verity/hash_algs.c
+++ b/fs/verity/hash_algs.c
@@ -17,6 +17,11 @@ struct fsverity_hash_alg fsverity_hash_algs[] = {
.digest_size = SHA256_DIGEST_SIZE,
.block_size = SHA256_BLOCK_SIZE,
},
+ [FS_VERITY_HASH_ALG_SHA512] = {
+ .name = "sha512",
+ .digest_size = SHA512_DIGEST_SIZE,
+ .block_size = SHA512_BLOCK_SIZE,
+ },
};

/**
diff --git a/include/uapi/linux/fsverity.h b/include/uapi/linux/fsverity.h
index 57d1d7fc0c345a..da0daf6c193b4b 100644
--- a/include/uapi/linux/fsverity.h
+++ b/include/uapi/linux/fsverity.h
@@ -14,6 +14,7 @@
#include <linux/types.h>

#define FS_VERITY_HASH_ALG_SHA256 1
+#define FS_VERITY_HASH_ALG_SHA512 2

struct fsverity_enable_arg {
__u32 version;
--
2.22.0.410.gd8fdbe21b5-goog

2019-06-20 20:54:21

by Eric Biggers

Subject: [PATCH v5 15/16] ext4: add fs-verity read support

From: Eric Biggers <[email protected]>

Make ext4_mpage_readpages() verify data as it is read from fs-verity
files, using the helper functions from fs/verity/.

To support both encryption and verity simultaneously, this required
refactoring the decryption workflow into a generic "post-read
processing" workflow which can do decryption, verification, or both.

The case where the ext4 block size is not equal to the PAGE_SIZE is not
supported yet, since in that case ext4_mpage_readpages() sometimes falls
back to block_read_full_page(), which does not support fs-verity yet.

Co-developed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
fs/ext4/ext4.h | 2 +
fs/ext4/inode.c | 2 +
fs/ext4/readpage.c | 207 ++++++++++++++++++++++++++++++++++++++-------
fs/ext4/super.c | 9 +-
4 files changed, 190 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 5a1deea3fb3e37..3c0d491c497025 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3158,6 +3158,8 @@ static inline void ext4_set_de_type(struct super_block *sb,
extern int ext4_mpage_readpages(struct address_space *mapping,
struct list_head *pages, struct page *page,
unsigned nr_pages, bool is_readahead);
+extern int __init ext4_init_post_read_processing(void);
+extern void ext4_exit_post_read_processing(void);

/* symlink.c */
extern const struct inode_operations ext4_encrypted_symlink_inode_operations;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 514e24f88f90f4..37571d080b3c64 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3893,6 +3893,8 @@ static ssize_t ext4_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
if (IS_ENCRYPTED(inode) && S_ISREG(inode->i_mode))
return 0;
#endif
+ if (fsverity_active(inode))
+ return 0;

/*
* If we are doing data journalling we don't support O_DIRECT
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index c916017db3344e..84152b686e498e 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -47,6 +47,11 @@

#include "ext4.h"

+#define NUM_PREALLOC_POST_READ_CTXS 128
+
+static struct kmem_cache *bio_post_read_ctx_cache;
+static mempool_t *bio_post_read_ctx_pool;
+
static inline bool ext4_bio_encrypted(struct bio *bio)
{
#ifdef CONFIG_FS_ENCRYPTION
@@ -56,6 +61,100 @@ static inline bool ext4_bio_encrypted(struct bio *bio)
#endif
}

+/* postprocessing steps for read bios */
+enum bio_post_read_step {
+ STEP_INITIAL = 0,
+ STEP_DECRYPT,
+ STEP_VERITY,
+};
+
+struct bio_post_read_ctx {
+ struct bio *bio;
+ struct work_struct work;
+ unsigned int cur_step;
+ unsigned int enabled_steps;
+};
+
+static void __read_end_io(struct bio *bio)
+{
+ struct page *page;
+ struct bio_vec *bv;
+ struct bvec_iter_all iter_all;
+
+ bio_for_each_segment_all(bv, bio, iter_all) {
+ page = bv->bv_page;
+
+ /* PG_error was set if any post_read step failed */
+ if (bio->bi_status || PageError(page)) {
+ ClearPageUptodate(page);
+ /* will re-read again later */
+ ClearPageError(page);
+ } else {
+ SetPageUptodate(page);
+ }
+ unlock_page(page);
+ }
+ if (bio->bi_private)
+ mempool_free(bio->bi_private, bio_post_read_ctx_pool);
+ bio_put(bio);
+}
+
+static void bio_post_read_processing(struct bio_post_read_ctx *ctx);
+
+static void decrypt_work(struct work_struct *work)
+{
+ struct bio_post_read_ctx *ctx =
+ container_of(work, struct bio_post_read_ctx, work);
+
+ fscrypt_decrypt_bio(ctx->bio);
+
+ bio_post_read_processing(ctx);
+}
+
+static void verity_work(struct work_struct *work)
+{
+ struct bio_post_read_ctx *ctx =
+ container_of(work, struct bio_post_read_ctx, work);
+
+ fsverity_verify_bio(ctx->bio);
+
+ bio_post_read_processing(ctx);
+}
+
+static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
+{
+ /*
+ * We use different work queues for decryption and for verity because
+ * verity may require reading metadata pages that need decryption, and
+ * we shouldn't recurse to the same workqueue.
+ */
+ switch (++ctx->cur_step) {
+ case STEP_DECRYPT:
+ if (ctx->enabled_steps & (1 << STEP_DECRYPT)) {
+ INIT_WORK(&ctx->work, decrypt_work);
+ fscrypt_enqueue_decrypt_work(&ctx->work);
+ return;
+ }
+ ctx->cur_step++;
+ /* fall-through */
+ case STEP_VERITY:
+ if (ctx->enabled_steps & (1 << STEP_VERITY)) {
+ INIT_WORK(&ctx->work, verity_work);
+ fsverity_enqueue_verify_work(&ctx->work);
+ return;
+ }
+ ctx->cur_step++;
+ /* fall-through */
+ default:
+ __read_end_io(ctx->bio);
+ }
+}
+
+static bool bio_post_read_required(struct bio *bio)
+{
+ return bio->bi_private && !bio->bi_status;
+}
+
/*
* I/O completion handler for multipage BIOs.
*
@@ -70,30 +169,53 @@ static inline bool ext4_bio_encrypted(struct bio *bio)
*/
static void mpage_end_io(struct bio *bio)
{
- struct bio_vec *bv;
- struct bvec_iter_all iter_all;
+ if (bio_post_read_required(bio)) {
+ struct bio_post_read_ctx *ctx = bio->bi_private;

- if (ext4_bio_encrypted(bio)) {
- if (bio->bi_status) {
- fscrypt_release_ctx(bio->bi_private);
- } else {
- fscrypt_enqueue_decrypt_bio(bio->bi_private, bio);
- return;
- }
+ ctx->cur_step = STEP_INITIAL;
+ bio_post_read_processing(ctx);
+ return;
}
- bio_for_each_segment_all(bv, bio, iter_all) {
- struct page *page = bv->bv_page;
+ __read_end_io(bio);
+}

- if (!bio->bi_status) {
- SetPageUptodate(page);
- } else {
- ClearPageUptodate(page);
- SetPageError(page);
- }
- unlock_page(page);
+static inline bool ext4_need_verity(const struct inode *inode, pgoff_t idx)
+{
+ return fsverity_active(inode) &&
+ idx < DIV_ROUND_UP(inode->i_size, PAGE_SIZE);
+}
+
+static struct bio_post_read_ctx *get_bio_post_read_ctx(struct inode *inode,
+ struct bio *bio,
+ pgoff_t first_idx)
+{
+ unsigned int post_read_steps = 0;
+ struct bio_post_read_ctx *ctx = NULL;
+
+ if (IS_ENCRYPTED(inode) && S_ISREG(inode->i_mode))
+ post_read_steps |= 1 << STEP_DECRYPT;
+
+ if (ext4_need_verity(inode, first_idx))
+ post_read_steps |= 1 << STEP_VERITY;
+
+ if (post_read_steps) {
+ ctx = mempool_alloc(bio_post_read_ctx_pool, GFP_NOFS);
+ if (!ctx)
+ return ERR_PTR(-ENOMEM);
+ ctx->bio = bio;
+ ctx->enabled_steps = post_read_steps;
+ bio->bi_private = ctx;
}
+ return ctx;
+}

- bio_put(bio);
+static inline loff_t ext4_readpage_limit(struct inode *inode)
+{
+ if (IS_ENABLED(CONFIG_FS_VERITY) &&
+ (IS_VERITY(inode) || ext4_verity_in_progress(inode)))
+ return inode->i_sb->s_maxbytes;
+
+ return i_size_read(inode);
}

int ext4_mpage_readpages(struct address_space *mapping,
@@ -141,7 +263,8 @@ int ext4_mpage_readpages(struct address_space *mapping,

block_in_file = (sector_t)page->index << (PAGE_SHIFT - blkbits);
last_block = block_in_file + nr_pages * blocks_per_page;
- last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits;
+ last_block_in_file = (ext4_readpage_limit(inode) +
+ blocksize - 1) >> blkbits;
if (last_block > last_block_in_file)
last_block = last_block_in_file;
page_block = 0;
@@ -218,6 +341,9 @@ int ext4_mpage_readpages(struct address_space *mapping,
zero_user_segment(page, first_hole << blkbits,
PAGE_SIZE);
if (first_hole == 0) {
+ if (ext4_need_verity(inode, page->index) &&
+ !fsverity_verify_page(page))
+ goto set_error_page;
SetPageUptodate(page);
unlock_page(page);
goto next_page;
@@ -241,18 +367,15 @@ int ext4_mpage_readpages(struct address_space *mapping,
bio = NULL;
}
if (bio == NULL) {
- struct fscrypt_ctx *ctx = NULL;
+ struct bio_post_read_ctx *ctx;

- if (IS_ENCRYPTED(inode) && S_ISREG(inode->i_mode)) {
- ctx = fscrypt_get_ctx(GFP_NOFS);
- if (IS_ERR(ctx))
- goto set_error_page;
- }
bio = bio_alloc(GFP_KERNEL,
min_t(int, nr_pages, BIO_MAX_PAGES));
- if (!bio) {
- if (ctx)
- fscrypt_release_ctx(ctx);
+ if (!bio)
+ goto set_error_page;
+ ctx = get_bio_post_read_ctx(inode, bio, page->index);
+ if (IS_ERR(ctx)) {
+ bio_put(bio);
goto set_error_page;
}
bio_set_dev(bio, bdev);
@@ -293,3 +416,29 @@ int ext4_mpage_readpages(struct address_space *mapping,
submit_bio(bio);
return 0;
}
+
+int __init ext4_init_post_read_processing(void)
+{
+ bio_post_read_ctx_cache =
+ kmem_cache_create("ext4_bio_post_read_ctx",
+ sizeof(struct bio_post_read_ctx), 0, 0, NULL);
+ if (!bio_post_read_ctx_cache)
+ goto fail;
+ bio_post_read_ctx_pool =
+ mempool_create_slab_pool(NUM_PREALLOC_POST_READ_CTXS,
+ bio_post_read_ctx_cache);
+ if (!bio_post_read_ctx_pool)
+ goto fail_free_cache;
+ return 0;
+
+fail_free_cache:
+ kmem_cache_destroy(bio_post_read_ctx_cache);
+fail:
+ return -ENOMEM;
+}
+
+void ext4_exit_post_read_processing(void)
+{
+ mempool_destroy(bio_post_read_ctx_pool);
+ kmem_cache_destroy(bio_post_read_ctx_cache);
+}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 05a9874687c365..23e7acd43e4ee7 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -6103,6 +6103,10 @@ static int __init ext4_init_fs(void)
return err;

err = ext4_init_pending();
+ if (err)
+ goto out7;
+
+ err = ext4_init_post_read_processing();
if (err)
goto out6;

@@ -6144,8 +6148,10 @@ static int __init ext4_init_fs(void)
out4:
ext4_exit_pageio();
out5:
- ext4_exit_pending();
+ ext4_exit_post_read_processing();
out6:
+ ext4_exit_pending();
+out7:
ext4_exit_es();

return err;
@@ -6162,6 +6168,7 @@ static void __exit ext4_exit_fs(void)
ext4_exit_sysfs();
ext4_exit_system_zone();
ext4_exit_pageio();
+ ext4_exit_post_read_processing();
ext4_exit_es();
ext4_exit_pending();
}
--
2.22.0.410.gd8fdbe21b5-goog

2019-06-20 20:54:24

by Eric Biggers

Subject: [PATCH v5 13/16] fs-verity: support builtin file signatures

From: Eric Biggers <[email protected]>

To meet some users' needs, add optional support for having fs-verity
handle a portion of the authentication policy in the kernel. An
".fs-verity" keyring is created to which X.509 certificates can be
added; then a sysctl 'fs.verity.require_signatures' can be set to cause
the kernel to enforce that all fs-verity files contain a signature of
their file measurement by a key in this keyring.

See the "Built-in signature verification" section of
Documentation/filesystems/fsverity.rst for the full documentation.

Reviewed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
fs/verity/Kconfig | 17 +++
fs/verity/Makefile | 2 +
fs/verity/enable.c | 20 +++-
fs/verity/fsverity_private.h | 48 +++++++-
fs/verity/init.c | 6 +
fs/verity/open.c | 27 +++--
fs/verity/signature.c | 207 +++++++++++++++++++++++++++++++++++
fs/verity/verify.c | 6 +
8 files changed, 319 insertions(+), 14 deletions(-)
create mode 100644 fs/verity/signature.c

diff --git a/fs/verity/Kconfig b/fs/verity/Kconfig
index c2bca0b01ecfa9..88fb25119899d3 100644
--- a/fs/verity/Kconfig
+++ b/fs/verity/Kconfig
@@ -36,3 +36,20 @@ config FS_VERITY_DEBUG
Enable debugging messages related to fs-verity by default.

Say N unless you are an fs-verity developer.
+
+config FS_VERITY_BUILTIN_SIGNATURES
+ bool "FS Verity builtin signature support"
+ depends on FS_VERITY
+ select SYSTEM_DATA_VERIFICATION
+ help
+ Support verifying signatures of verity files against the X.509
+ certificates that have been loaded into the ".fs-verity"
+ kernel keyring.
+
+ This is meant as a relatively simple mechanism that can be
+ used to provide an authenticity guarantee for verity files, as
+ an alternative to IMA appraisal. Userspace programs still
+ need to check that the verity bit is set in order to get an
+ authenticity guarantee.
+
+ If unsure, say N.
diff --git a/fs/verity/Makefile b/fs/verity/Makefile
index 6f7675ae0a3110..570e9136334d47 100644
--- a/fs/verity/Makefile
+++ b/fs/verity/Makefile
@@ -6,3 +6,5 @@ obj-$(CONFIG_FS_VERITY) += enable.o \
measure.o \
open.o \
verify.o
+
+obj-$(CONFIG_FS_VERITY_BUILTIN_SIGNATURES) += signature.o
diff --git a/fs/verity/enable.c b/fs/verity/enable.c
index 144721bbe4aab9..d4fb6b3b6e1a1f 100644
--- a/fs/verity/enable.c
+++ b/fs/verity/enable.c
@@ -153,7 +153,7 @@ static int enable_verity(struct file *filp,
const struct fsverity_operations *vops = inode->i_sb->s_vop;
struct merkle_tree_params params = { };
struct fsverity_descriptor *desc;
- size_t desc_size = sizeof(*desc);
+ size_t desc_size = sizeof(*desc) + arg->sig_size;
struct fsverity_info *vi;
int err;

@@ -175,6 +175,16 @@ static int enable_verity(struct file *filp,
}
desc->salt_size = arg->salt_size;

+ /* Get the signature if the user provided one */
+ if (arg->sig_size &&
+ copy_from_user(desc->signature,
+ (const u8 __user *)(uintptr_t)arg->sig_ptr,
+ arg->sig_size)) {
+ err = -EFAULT;
+ goto out;
+ }
+ desc->sig_size = cpu_to_le32(arg->sig_size);
+
desc->data_size = cpu_to_le64(inode->i_size);

pr_debug("Building Merkle tree...\n");
@@ -215,6 +225,10 @@ static int enable_verity(struct file *filp,
goto rollback;
}

+ if (arg->sig_size)
+ pr_debug("Storing a %u-byte PKCS#7 signature alongside the file\n",
+ arg->sig_size);
+
/* Tell the filesystem to finish enabling verity on the file */
err = vops->end_enable_verity(filp, desc, desc_size, params.tree_size);
if (err) {
@@ -274,8 +288,8 @@ int fsverity_ioctl_enable(struct file *filp, const void __user *uarg)
if (arg.salt_size > FIELD_SIZEOF(struct fsverity_descriptor, salt))
return -EMSGSIZE;

- if (arg.sig_size)
- return -EINVAL;
+ if (arg.sig_size > FS_VERITY_MAX_SIGNATURE_SIZE)
+ return -EMSGSIZE;

/*
* Require a regular file with write access. But the actual fd must
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index 02a547f0667c13..e74c79b64d8898 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -75,23 +75,41 @@ struct fsverity_info {
};

/*
- * Merkle tree properties. The file measurement is the hash of this structure.
+ * Merkle tree properties. The file measurement is the hash of this structure
+ * excluding the signature and with the sig_size field set to 0.
*/
struct fsverity_descriptor {
__u8 version; /* must be 1 */
__u8 hash_algorithm; /* Merkle tree hash algorithm */
__u8 log_blocksize; /* log2 of size of data and tree blocks */
__u8 salt_size; /* size of salt in bytes; 0 if none */
- __le32 sig_size; /* reserved, must be 0 */
+ __le32 sig_size; /* size of signature in bytes; 0 if none */
__le64 data_size; /* size of file the Merkle tree is built over */
__u8 root_hash[64]; /* Merkle tree root hash */
__u8 salt[32]; /* salt prepended to each hashed block */
__u8 __reserved[144]; /* must be 0's */
+ __u8 signature[]; /* optional PKCS#7 signature */
};

/* Arbitrary limit to bound the kmalloc() size. Can be changed. */
#define FS_VERITY_MAX_DESCRIPTOR_SIZE 16384

+#define FS_VERITY_MAX_SIGNATURE_SIZE (FS_VERITY_MAX_DESCRIPTOR_SIZE - \
+ sizeof(struct fsverity_descriptor))
+
+/*
+ * Format in which verity file measurements are signed. This is the same as
+ * 'struct fsverity_digest', except here some magic bytes are prepended to
+ * provide some context about what is being signed in case the same key is used
+ * for non-fsverity purposes, and here the fields have fixed endianness.
+ */
+struct fsverity_signed_digest {
+ char magic[8]; /* must be "FSVerity" */
+ __le16 digest_algorithm;
+ __le16 digest_size;
+ __u8 digest[];
+};
+
/* hash_algs.c */

extern struct fsverity_hash_alg fsverity_hash_algs[];
@@ -127,7 +145,7 @@ int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
const u8 *salt, size_t salt_size);

struct fsverity_info *fsverity_create_info(const struct inode *inode,
- const void *desc, size_t desc_size);
+ void *desc, size_t desc_size);

void fsverity_set_info(struct inode *inode, struct fsverity_info *vi);

@@ -136,8 +154,32 @@ void fsverity_free_info(struct fsverity_info *vi);
int __init fsverity_init_info_cache(void);
void __init fsverity_exit_info_cache(void);

+/* signature.c */
+
+#ifdef CONFIG_FS_VERITY_BUILTIN_SIGNATURES
+int fsverity_verify_signature(const struct fsverity_info *vi,
+ const struct fsverity_descriptor *desc,
+ size_t desc_size);
+
+int __init fsverity_init_signature(void);
+#else /* !CONFIG_FS_VERITY_BUILTIN_SIGNATURES */
+static inline int
+fsverity_verify_signature(const struct fsverity_info *vi,
+ const struct fsverity_descriptor *desc,
+ size_t desc_size)
+{
+ return 0;
+}
+
+static inline int fsverity_init_signature(void)
+{
+ return 0;
+}
+#endif /* !CONFIG_FS_VERITY_BUILTIN_SIGNATURES */
+
/* verify.c */

int __init fsverity_init_workqueue(void);
+void __init fsverity_exit_workqueue(void);

#endif /* _FSVERITY_PRIVATE_H */
diff --git a/fs/verity/init.c b/fs/verity/init.c
index b593805aafcc89..94c104e00861d2 100644
--- a/fs/verity/init.c
+++ b/fs/verity/init.c
@@ -45,9 +45,15 @@ static int __init fsverity_init(void)
if (err)
goto err_exit_info_cache;

+ err = fsverity_init_signature();
+ if (err)
+ goto err_exit_workqueue;
+
pr_debug("Initialized fs-verity\n");
return 0;

+err_exit_workqueue:
+ fsverity_exit_workqueue();
err_exit_info_cache:
fsverity_exit_info_cache();
return err;
diff --git a/fs/verity/open.c b/fs/verity/open.c
index 7a2cd000dc4f06..810810ea306338 100644
--- a/fs/verity/open.c
+++ b/fs/verity/open.c
@@ -122,22 +122,32 @@ int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
return err;
}

-/* Compute the file measurement by hashing the fsverity_descriptor. */
+/*
+ * Compute the file measurement by hashing the fsverity_descriptor excluding the
+ * signature and with the sig_size field set to 0.
+ */
static int compute_file_measurement(const struct fsverity_hash_alg *hash_alg,
- const struct fsverity_descriptor *desc,
+ struct fsverity_descriptor *desc,
u8 *measurement)
{
- return fsverity_hash_buffer(hash_alg, desc, sizeof(*desc), measurement);
+ __le32 sig_size = desc->sig_size;
+ int err;
+
+ desc->sig_size = 0;
+ err = fsverity_hash_buffer(hash_alg, desc, sizeof(*desc), measurement);
+ desc->sig_size = sig_size;
+
+ return err;
}

/*
* Validate the given fsverity_descriptor and create a new fsverity_info from
- * it.
+ * it. The signature (if present) is also checked.
*/
struct fsverity_info *fsverity_create_info(const struct inode *inode,
- const void *_desc, size_t desc_size)
+ void *_desc, size_t desc_size)
{
- const struct fsverity_descriptor *desc = _desc;
+ struct fsverity_descriptor *desc = _desc;
struct fsverity_info *vi;
int err;

@@ -153,8 +163,7 @@ struct fsverity_info *fsverity_create_info(const struct inode *inode,
return ERR_PTR(-EINVAL);
}

- if (desc->sig_size ||
- memchr_inv(desc->__reserved, 0, sizeof(desc->__reserved))) {
+ if (memchr_inv(desc->__reserved, 0, sizeof(desc->__reserved))) {
fsverity_err(inode, "Reserved bits set in descriptor");
return ERR_PTR(-EINVAL);
}
@@ -199,6 +208,8 @@ struct fsverity_info *fsverity_create_info(const struct inode *inode,
pr_debug("Computed file measurement: %s:%*phN\n",
vi->tree_params.hash_alg->name,
vi->tree_params.digest_size, vi->measurement);
+
+ err = fsverity_verify_signature(vi, desc, desc_size);
out:
if (err) {
fsverity_free_info(vi);
diff --git a/fs/verity/signature.c b/fs/verity/signature.c
new file mode 100644
index 00000000000000..b8e7b7ad69741a
--- /dev/null
+++ b/fs/verity/signature.c
@@ -0,0 +1,207 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fs/verity/signature.c: verification of builtin signatures
+ *
+ * Copyright 2019 Google LLC
+ */
+
+#include "fsverity_private.h"
+
+#include <linux/cred.h>
+#include <linux/key.h>
+#include <linux/verification.h>
+
+/*
+ * /proc/sys/fs/verity/require_signatures
+ * If 1, all verity files must have a valid builtin signature.
+ */
+static int fsverity_require_signatures;
+
+/*
+ * Keyring that contains the trusted X.509 certificates.
+ *
+ * Only root (kuid=0) can modify this. Also, root may use
+ * keyctl_restrict_keyring() to prevent any more additions.
+ */
+static struct key *fsverity_keyring;
+
+struct verify_arg {
+ const struct fsverity_info *vi;
+ u8 measurement[FS_VERITY_MAX_DIGEST_SIZE];
+ bool have_measurement;
+};
+
+static int extract_measurement(void *ctx, const void *data, size_t len,
+ size_t asn1hdrlen)
+{
+ struct verify_arg *arg = ctx;
+ const struct fsverity_info *vi = arg->vi;
+ const struct inode *inode = vi->inode;
+ const struct fsverity_signed_digest *d = data;
+ const struct fsverity_hash_alg *hash_alg;
+
+ if (len < sizeof(*d) || memcmp(d->magic, "FSVerity", 8) != 0) {
+ fsverity_warn(inode,
+ "Signed file measurement uses unrecognized format");
+ return -EBADMSG;
+ }
+
+ hash_alg = fsverity_get_hash_alg(inode,
+ le16_to_cpu(d->digest_algorithm));
+ if (IS_ERR(hash_alg))
+ return PTR_ERR(hash_alg);
+
+ if (le16_to_cpu(d->digest_size) != hash_alg->digest_size) {
+ fsverity_warn(inode,
+ "Wrong digest_size in signed file measurement: wanted %u for algorithm %s, but got %u",
+ hash_alg->digest_size, hash_alg->name,
+ le16_to_cpu(d->digest_size));
+ return -EBADMSG;
+ }
+
+ if (len < sizeof(*d) + hash_alg->digest_size) {
+ fsverity_warn(inode, "Signed file measurement is truncated");
+ return -EBADMSG;
+ }
+
+ if (hash_alg != vi->tree_params.hash_alg) {
+ fsverity_warn(inode,
+ "Signed file measurement uses %s, but file uses %s",
+ hash_alg->name, vi->tree_params.hash_alg->name);
+ return -EBADMSG;
+ }
+
+ memcpy(arg->measurement, d->digest, hash_alg->digest_size);
+ arg->have_measurement = true;
+ return 0;
+}
+
+/**
+ * fsverity_verify_signature - check a verity file's signature
+ *
+ * Verify a signed fsverity_measurement against the certificates in the
+ * fs-verity keyring. The signature is given as a PKCS#7 formatted message, and
+ * the signed data is included in the message (not detached).
+ *
+ * Return: 0 on success (signature valid or not required); -errno on failure
+ */
+int fsverity_verify_signature(const struct fsverity_info *vi,
+ const struct fsverity_descriptor *desc,
+ size_t desc_size)
+{
+ const struct inode *inode = vi->inode;
+ const struct fsverity_hash_alg *hash_alg = vi->tree_params.hash_alg;
+ const unsigned int digest_size = hash_alg->digest_size;
+ const u32 sig_size = le32_to_cpu(desc->sig_size);
+ struct verify_arg arg = {
+ .vi = vi,
+ .have_measurement = false,
+ };
+ int err;
+
+ if (sig_size == 0) {
+ if (fsverity_require_signatures) {
+ fsverity_err(inode,
+ "require_signatures=1, rejecting unsigned file!");
+ return -EBADMSG;
+ }
+ return 0;
+ }
+
+ if (sig_size > desc_size - sizeof(*desc)) {
+ fsverity_err(inode, "Signature overflows verity descriptor");
+ return -EBADMSG;
+ }
+
+ err = verify_pkcs7_signature(NULL, 0, desc->signature, sig_size,
+ fsverity_keyring,
+ VERIFYING_UNSPECIFIED_SIGNATURE,
+ extract_measurement, &arg);
+ if (err) {
+ fsverity_err(inode, "Error %d verifying PKCS#7 signature", err);
+ return err;
+ }
+
+ if (!arg.have_measurement) {
+ fsverity_err(inode, "PKCS#7 message is missing internal data");
+ return -EBADMSG;
+ }
+
+ if (memcmp(arg.measurement, vi->measurement, digest_size) != 0) {
+ fsverity_err(inode,
+ "FILE CORRUPTED (signed measurement differs from actual measurement): signed %s:%*phN, actual %s:%*phN",
+ hash_alg->name, digest_size, arg.measurement,
+ hash_alg->name, digest_size, vi->measurement);
+ return -EBADMSG;
+ }
+
+ pr_debug("Valid signature for measurement: %s:%*phN\n",
+ hash_alg->name, digest_size, vi->measurement);
+ return 0;
+}
+
+#ifdef CONFIG_SYSCTL
+static int zero;
+static int one = 1;
+static struct ctl_table_header *fsverity_sysctl_header;
+
+static const struct ctl_path fsverity_sysctl_path[] = {
+ { .procname = "fs", },
+ { .procname = "verity", },
+ { }
+};
+
+static struct ctl_table fsverity_sysctl_table[] = {
+ {
+ .procname = "require_signatures",
+ .data = &fsverity_require_signatures,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &one,
+ },
+ { }
+};
+
+static int __init fsverity_sysctl_init(void)
+{
+ fsverity_sysctl_header = register_sysctl_paths(fsverity_sysctl_path,
+ fsverity_sysctl_table);
+ if (!fsverity_sysctl_header) {
+ pr_err("sysctl registration failed!\n");
+ return -ENOMEM;
+ }
+ return 0;
+}
+#else /* !CONFIG_SYSCTL */
+static inline int fsverity_sysctl_init(void)
+{
+ return 0;
+}
+#endif /* !CONFIG_SYSCTL */
+
+int __init fsverity_init_signature(void)
+{
+ struct key *ring;
+ int err;
+
+ ring = keyring_alloc(".fs-verity", KUIDT_INIT(0), KGIDT_INIT(0),
+ current_cred(), KEY_POS_SEARCH |
+ KEY_USR_VIEW | KEY_USR_READ | KEY_USR_WRITE |
+ KEY_USR_SEARCH | KEY_USR_SETATTR,
+ KEY_ALLOC_NOT_IN_QUOTA, NULL, NULL);
+ if (IS_ERR(ring))
+ return PTR_ERR(ring);
+
+ err = fsverity_sysctl_init();
+ if (err)
+ goto err_put_ring;
+
+ fsverity_keyring = ring;
+ return 0;
+
+err_put_ring:
+ key_put(ring);
+ return err;
+}
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 2a0f9e2ebc9f16..783f4042b679da 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -273,3 +273,9 @@ int __init fsverity_init_workqueue(void)
return -ENOMEM;
return 0;
}
+
+void __init fsverity_exit_workqueue(void)
+{
+ destroy_workqueue(fsverity_read_workqueue);
+ fsverity_read_workqueue = NULL;
+}
--
2.22.0.410.gd8fdbe21b5-goog
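
The fsverity_signed_digest format defined in fsverity_private.h above can be
illustrated from the signer's side. The following is a minimal userspace
sketch, not code from this series: the helper name build_signed_digest() is
hypothetical, and the numeric value passed as digest_algorithm is whatever the
UAPI header assigns to the chosen hash (the caller supplies it). It only
serializes the bytes that would then be wrapped in a PKCS#7 message: the 8
magic bytes, two little-endian 16-bit fields, and the digest itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): lay out the bytes of a
 * 'struct fsverity_signed_digest' in a caller-provided buffer.
 *
 * Layout: magic[8] = "FSVerity", __le16 digest_algorithm,
 *         __le16 digest_size, then digest_size bytes of digest.
 *
 * Returns the number of bytes written, or 0 if the buffer is too small.
 */
static size_t build_signed_digest(uint8_t *out, size_t out_size,
				  uint16_t digest_algorithm,
				  const uint8_t *digest, uint16_t digest_size)
{
	const size_t header_size = 8 + 2 + 2;
	const size_t total = header_size + (size_t)digest_size;

	assert(out != NULL && digest != NULL);

	if (out_size < total)
		return 0;

	/* Magic bytes, without a trailing NUL. */
	memcpy(out, "FSVerity", 8);

	/* Fixed little-endian encoding, independent of host endianness. */
	out[8] = (uint8_t)(digest_algorithm & 0xff);
	out[9] = (uint8_t)(digest_algorithm >> 8);
	out[10] = (uint8_t)(digest_size & 0xff);
	out[11] = (uint8_t)(digest_size >> 8);

	memcpy(out + header_size, digest, digest_size);
	return total;
}
```

With a 32-byte digest the serialized structure is 44 bytes long. Writing the
fields byte by byte, rather than casting a struct, keeps the encoding
little-endian on any host, which is what extract_measurement() expects when it
applies le16_to_cpu() to the fields.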

2019-06-20 20:54:25

by Eric Biggers

Subject: [PATCH v5 16/16] f2fs: add fs-verity support

From: Eric Biggers <[email protected]>

Add fs-verity support to f2fs. fs-verity is a filesystem feature that
enables transparent integrity protection and authentication of read-only
files. It uses a dm-verity like mechanism at the file level: a Merkle
tree is used to verify any block in the file in log(filesize) time. It
is implemented mainly by helper functions in fs/verity/. See
Documentation/filesystems/fsverity.rst for the full documentation.
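
To make the log(filesize) claim concrete, here is a minimal userspace sketch
that counts the Merkle tree levels for a given file size. It is not code from
this series: merkle_tree_levels() is a hypothetical helper, and the example
parameters (4096-byte blocks, 32-byte digests, hence 128 hashes per tree
block) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): number of Merkle tree
 * levels needed so that a single block of hashes covers the whole file.
 * Each level stores one digest per block of the level below it.
 */
static unsigned int merkle_tree_levels(uint64_t file_size,
				       unsigned int block_size,
				       unsigned int digest_size)
{
	uint64_t hashes_per_block;
	uint64_t blocks;
	unsigned int levels = 0;

	assert(block_size != 0 && digest_size != 0);
	assert(digest_size <= block_size);

	hashes_per_block = block_size / digest_size;
	/* A tree block must hold at least two hashes or the tree never shrinks. */
	assert(hashes_per_block >= 2);

	/* Number of data blocks, rounding up the final partial block. */
	blocks = (file_size + block_size - 1) / block_size;

	/* Each iteration adds one level of hash blocks above the previous one. */
	while (blocks > 1) {
		blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
		levels++;
	}
	return levels;
}
```

Under these assumed parameters a 1 GiB file needs 3 levels, so verifying one
data block touches at most 3 hash blocks regardless of where the block lies in
the file.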

The f2fs support for fs-verity consists of:

- Adding a filesystem feature flag and an inode flag for fs-verity.

- Implementing the fsverity_operations to support enabling verity on an
inode and reading/writing the verity metadata.

- Updating ->readpages() to verify data as it's read from verity files
and to support reading verity metadata pages.

- Updating ->write_begin(), ->write_end(), and ->writepages() to support
writing verity metadata pages.

- Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().

Like ext4, f2fs stores the verity metadata (Merkle tree and
fsverity_descriptor) past the end of the file, starting at the first 64K
boundary beyond i_size. This approach works because (a) verity files
are read-only, and (b) pages fully beyond i_size aren't visible to
userspace but can be read/written internally by f2fs with only some
relatively small changes to f2fs. Extended attributes cannot be used
because (a) f2fs limits the total size of an inode's xattr entries to
4096 bytes, which wouldn't be enough for even a single Merkle tree
block, and (b) f2fs encryption doesn't encrypt xattrs, yet the verity
metadata *must* be encrypted when the file is because it contains hashes
of the plaintext data.
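
The layout described above can be sketched as plain arithmetic. This is a
userspace illustration only, not code from the patch: the helper names are
hypothetical, though the 64K alignment and the "descriptor follows the Merkle
tree" ordering are taken from the description and from
f2fs_end_enable_verity().

```c
#include <assert.h>
#include <stdint.h>

#define VERITY_METADATA_ALIGN 65536ULL	/* 64K boundary used by the patch */

/* Round x up to the next multiple of align, which must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Hypothetical helper: file offset where the Merkle tree starts. */
static uint64_t verity_metadata_pos(uint64_t i_size)
{
	return round_up_pow2(i_size, VERITY_METADATA_ALIGN);
}

/* Hypothetical helper: file offset of the descriptor, right after the tree. */
static uint64_t verity_descriptor_pos(uint64_t i_size,
				      uint64_t merkle_tree_size)
{
	return verity_metadata_pos(i_size) + merkle_tree_size;
}
```

For example, a 100000-byte file has its Merkle tree starting at offset 131072,
and with an 8192-byte tree the descriptor lands at offset 139264. The range
between i_size and the 64K boundary can remain a hole, so the alignment does
not necessarily cost disk space.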

Signed-off-by: Eric Biggers <[email protected]>
---
fs/f2fs/Makefile | 1 +
fs/f2fs/data.c | 72 +++++++++++++--
fs/f2fs/f2fs.h | 23 ++++-
fs/f2fs/file.c | 40 ++++++++
fs/f2fs/inode.c | 5 +-
fs/f2fs/super.c | 3 +
fs/f2fs/sysfs.c | 11 +++
fs/f2fs/verity.c | 233 +++++++++++++++++++++++++++++++++++++++++++++++
fs/f2fs/xattr.h | 2 +
9 files changed, 376 insertions(+), 14 deletions(-)
create mode 100644 fs/f2fs/verity.c

diff --git a/fs/f2fs/Makefile b/fs/f2fs/Makefile
index 776c4b93650496..2aaecc63834fc8 100644
--- a/fs/f2fs/Makefile
+++ b/fs/f2fs/Makefile
@@ -8,3 +8,4 @@ f2fs-$(CONFIG_F2FS_STAT_FS) += debug.o
f2fs-$(CONFIG_F2FS_FS_XATTR) += xattr.o
f2fs-$(CONFIG_F2FS_FS_POSIX_ACL) += acl.o
f2fs-$(CONFIG_F2FS_IO_TRACE) += trace.o
+f2fs-$(CONFIG_FS_VERITY) += verity.o
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index eda4181d20926b..8f175d47291d0b 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -73,6 +73,7 @@ static enum count_type __read_io_type(struct page *page)
enum bio_post_read_step {
STEP_INITIAL = 0,
STEP_DECRYPT,
+ STEP_VERITY,
};

struct bio_post_read_ctx {
@@ -119,8 +120,23 @@ static void decrypt_work(struct work_struct *work)
bio_post_read_processing(ctx);
}

+static void verity_work(struct work_struct *work)
+{
+ struct bio_post_read_ctx *ctx =
+ container_of(work, struct bio_post_read_ctx, work);
+
+ fsverity_verify_bio(ctx->bio);
+
+ bio_post_read_processing(ctx);
+}
+
static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
{
+ /*
+ * We use different work queues for decryption and for verity because
+ * verity may require reading metadata pages that need decryption, and
+ * we shouldn't recurse to the same workqueue.
+ */
switch (++ctx->cur_step) {
case STEP_DECRYPT:
if (ctx->enabled_steps & (1 << STEP_DECRYPT)) {
@@ -130,6 +146,14 @@ static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
}
ctx->cur_step++;
/* fall-through */
+ case STEP_VERITY:
+ if (ctx->enabled_steps & (1 << STEP_VERITY)) {
+ INIT_WORK(&ctx->work, verity_work);
+ fsverity_enqueue_verify_work(&ctx->work);
+ return;
+ }
+ ctx->cur_step++;
+ /* fall-through */
default:
__read_end_io(ctx->bio);
}
@@ -553,8 +577,15 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
up_write(&io->io_rwsem);
}

+static inline bool f2fs_need_verity(const struct inode *inode, pgoff_t idx)
+{
+ return fsverity_active(inode) &&
+ idx < DIV_ROUND_UP(inode->i_size, PAGE_SIZE);
+}
+
static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
- unsigned nr_pages, unsigned op_flag)
+ unsigned nr_pages, unsigned op_flag,
+ pgoff_t first_idx)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct bio *bio;
@@ -570,6 +601,10 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,

if (f2fs_encrypted_file(inode))
post_read_steps |= 1 << STEP_DECRYPT;
+
+ if (f2fs_need_verity(inode, first_idx))
+ post_read_steps |= 1 << STEP_VERITY;
+
if (post_read_steps) {
ctx = mempool_alloc(bio_post_read_ctx_pool, GFP_NOFS);
if (!ctx) {
@@ -591,7 +626,7 @@ static int f2fs_submit_page_read(struct inode *inode, struct page *page,
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct bio *bio;

- bio = f2fs_grab_read_bio(inode, blkaddr, 1, 0);
+ bio = f2fs_grab_read_bio(inode, blkaddr, 1, 0, page->index);
if (IS_ERR(bio))
return PTR_ERR(bio);

@@ -1514,6 +1549,15 @@ int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
return ret;
}

+static inline loff_t f2fs_readpage_limit(struct inode *inode)
+{
+ if (IS_ENABLED(CONFIG_FS_VERITY) &&
+ (IS_VERITY(inode) || f2fs_verity_in_progress(inode)))
+ return inode->i_sb->s_maxbytes;
+
+ return i_size_read(inode);
+}
+
static int f2fs_read_single_page(struct inode *inode, struct page *page,
unsigned nr_pages,
struct f2fs_map_blocks *map,
@@ -1532,7 +1576,7 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page,

block_in_file = (sector_t)page->index;
last_block = block_in_file + nr_pages;
- last_block_in_file = (i_size_read(inode) + blocksize - 1) >>
+ last_block_in_file = (f2fs_readpage_limit(inode) + blocksize - 1) >>
blkbits;
if (last_block > last_block_in_file)
last_block = last_block_in_file;
@@ -1576,6 +1620,11 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page,
} else {
zero_out:
zero_user_segment(page, 0, PAGE_SIZE);
+ if (f2fs_need_verity(inode, page->index) &&
+ !fsverity_verify_page(page)) {
+ ret = -EIO;
+ goto out;
+ }
if (!PageUptodate(page))
SetPageUptodate(page);
unlock_page(page);
@@ -1594,7 +1643,7 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page,
}
if (bio == NULL) {
bio = f2fs_grab_read_bio(inode, block_nr, nr_pages,
- is_readahead ? REQ_RAHEAD : 0);
+ is_readahead ? REQ_RAHEAD : 0, page->index);
if (IS_ERR(bio)) {
ret = PTR_ERR(bio);
bio = NULL;
@@ -1991,7 +2040,7 @@ static int __write_data_page(struct page *page, bool *submitted,
if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
goto redirty_out;

- if (page->index < end_index)
+ if (page->index < end_index || f2fs_verity_in_progress(inode))
goto write;

/*
@@ -2383,7 +2432,8 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
* the block addresses when there is no need to fill the page.
*/
if (!f2fs_has_inline_data(inode) && len == PAGE_SIZE &&
- !is_inode_flag_set(inode, FI_NO_PREALLOC))
+ !is_inode_flag_set(inode, FI_NO_PREALLOC) &&
+ !f2fs_verity_in_progress(inode))
return 0;

/* f2fs_lock_op avoids race between write CP and convert_inline_page */
@@ -2522,7 +2572,8 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
if (len == PAGE_SIZE || PageUptodate(page))
return 0;

- if (!(pos & (PAGE_SIZE - 1)) && (pos + len) >= i_size_read(inode)) {
+ if (!(pos & (PAGE_SIZE - 1)) && (pos + len) >= i_size_read(inode) &&
+ !f2fs_verity_in_progress(inode)) {
zero_user_segment(page, len, PAGE_SIZE);
return 0;
}
@@ -2585,7 +2636,8 @@ static int f2fs_write_end(struct file *file,

set_page_dirty(page);

- if (pos + copied > i_size_read(inode))
+ if (pos + copied > i_size_read(inode) &&
+ !f2fs_verity_in_progress(inode))
f2fs_i_size_write(inode, pos + copied);
unlock_out:
f2fs_put_page(page, 1);
@@ -2906,7 +2958,9 @@ void f2fs_clear_page_cache_dirty_tag(struct page *page)

int __init f2fs_init_post_read_processing(void)
{
- bio_post_read_ctx_cache = KMEM_CACHE(bio_post_read_ctx, 0);
+ bio_post_read_ctx_cache =
+ kmem_cache_create("f2fs_bio_post_read_ctx",
+ sizeof(struct bio_post_read_ctx), 0, 0, NULL);
if (!bio_post_read_ctx_cache)
goto fail;
bio_post_read_ctx_pool =
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 06b89a9862ab2b..8477191ad1c9b2 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -25,6 +25,7 @@
#include <crypto/hash.h>

#include <linux/fscrypt.h>
+#include <linux/fsverity.h>

#ifdef CONFIG_F2FS_CHECK_FS
#define f2fs_bug_on(sbi, condition) BUG_ON(condition)
@@ -148,7 +149,7 @@ struct f2fs_mount_info {
#define F2FS_FEATURE_QUOTA_INO 0x0080
#define F2FS_FEATURE_INODE_CRTIME 0x0100
#define F2FS_FEATURE_LOST_FOUND 0x0200
-#define F2FS_FEATURE_VERITY 0x0400 /* reserved */
+#define F2FS_FEATURE_VERITY 0x0400
#define F2FS_FEATURE_SB_CHKSUM 0x0800

#define __F2FS_HAS_FEATURE(raw_super, mask) \
@@ -626,7 +627,7 @@ enum {
#define FADVISE_ENC_NAME_BIT 0x08
#define FADVISE_KEEP_SIZE_BIT 0x10
#define FADVISE_HOT_BIT 0x20
-#define FADVISE_VERITY_BIT 0x40 /* reserved */
+#define FADVISE_VERITY_BIT 0x40

#define FADVISE_MODIFIABLE_BITS (FADVISE_COLD_BIT | FADVISE_HOT_BIT)

@@ -646,6 +647,8 @@ enum {
#define file_is_hot(inode) is_file(inode, FADVISE_HOT_BIT)
#define file_set_hot(inode) set_file(inode, FADVISE_HOT_BIT)
#define file_clear_hot(inode) clear_file(inode, FADVISE_HOT_BIT)
+#define file_is_verity(inode) is_file(inode, FADVISE_VERITY_BIT)
+#define file_set_verity(inode) set_file(inode, FADVISE_VERITY_BIT)

#define DEF_DIR_LEVEL 0

@@ -2344,6 +2347,7 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr)
#define F2FS_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/
#define F2FS_HUGE_FILE_FL 0x00040000 /* Set to each huge file */
#define F2FS_EXTENTS_FL 0x00080000 /* Inode uses extents */
+#define F2FS_VERITY_FL 0x00100000 /* Verity protected inode */
#define F2FS_EA_INODE_FL 0x00200000 /* Inode used for large EA */
#define F2FS_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
#define F2FS_NOCOW_FL 0x00800000 /* Do not cow file */
@@ -2351,7 +2355,7 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr)
#define F2FS_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
#define F2FS_RESERVED_FL 0x80000000 /* reserved for ext4 lib */

-#define F2FS_FL_USER_VISIBLE 0x30CBDFFF /* User visible flags */
+#define F2FS_FL_USER_VISIBLE 0x30DBDFFF /* User visible flags */
#define F2FS_FL_USER_MODIFIABLE 0x204BC0FF /* User modifiable flags */

/* Flags we can manipulate with through F2FS_IOC_FSSETXATTR */
@@ -2417,6 +2421,7 @@ enum {
FI_PROJ_INHERIT, /* indicate file inherits projectid */
FI_PIN_FILE, /* indicate file should not be gced */
FI_ATOMIC_REVOKE_REQUEST, /* request to drop atomic data */
+ FI_VERITY_IN_PROGRESS, /* building fs-verity Merkle tree */
};

static inline void __mark_inode_dirty_flag(struct inode *inode,
@@ -2456,6 +2461,12 @@ static inline void clear_inode_flag(struct inode *inode, int flag)
__mark_inode_dirty_flag(inode, flag, false);
}

+static inline bool f2fs_verity_in_progress(struct inode *inode)
+{
+ return IS_ENABLED(CONFIG_FS_VERITY) &&
+ is_inode_flag_set(inode, FI_VERITY_IN_PROGRESS);
+}
+
static inline void set_acl_inode(struct inode *inode, umode_t mode)
{
F2FS_I(inode)->i_acl_mode = mode;
@@ -3524,6 +3535,9 @@ void f2fs_exit_sysfs(void);
int f2fs_register_sysfs(struct f2fs_sb_info *sbi);
void f2fs_unregister_sysfs(struct f2fs_sb_info *sbi);

+/* verity.c */
+extern const struct fsverity_operations f2fs_verityops;
+
/*
* crypto support
*/
@@ -3546,7 +3560,7 @@ static inline void f2fs_set_encrypted_inode(struct inode *inode)
*/
static inline bool f2fs_post_read_required(struct inode *inode)
{
- return f2fs_encrypted_file(inode);
+ return f2fs_encrypted_file(inode) || fsverity_active(inode);
}

#define F2FS_FEATURE_FUNCS(name, flagname) \
@@ -3564,6 +3578,7 @@ F2FS_FEATURE_FUNCS(flexible_inline_xattr, FLEXIBLE_INLINE_XATTR);
F2FS_FEATURE_FUNCS(quota_ino, QUOTA_INO);
F2FS_FEATURE_FUNCS(inode_crtime, INODE_CRTIME);
F2FS_FEATURE_FUNCS(lost_found, LOST_FOUND);
+F2FS_FEATURE_FUNCS(verity, VERITY);
F2FS_FEATURE_FUNCS(sb_chksum, SB_CHKSUM);

#ifdef CONFIG_BLK_DEV_ZONED
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 45b45f37d347e4..6706c2081941a2 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -493,6 +493,10 @@ static int f2fs_file_open(struct inode *inode, struct file *filp)
{
int err = fscrypt_file_open(inode, filp);

+ if (err)
+ return err;
+
+ err = fsverity_file_open(inode, filp);
if (err)
return err;

@@ -781,6 +785,10 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
if (err)
return err;

+ err = fsverity_prepare_setattr(dentry, attr);
+ if (err)
+ return err;
+
if (is_quota_modification(inode, attr)) {
err = dquot_initialize(inode);
if (err)
@@ -1656,6 +1664,8 @@ static int f2fs_ioc_getflags(struct file *filp, unsigned long arg)

if (IS_ENCRYPTED(inode))
flags |= F2FS_ENCRYPT_FL;
+ if (IS_VERITY(inode))
+ flags |= F2FS_VERITY_FL;
if (f2fs_has_inline_data(inode) || f2fs_has_inline_dentry(inode))
flags |= F2FS_INLINE_DATA_FL;
if (is_inode_flag_set(inode, FI_PIN_FILE))
@@ -2980,6 +2990,30 @@ static int f2fs_ioc_precache_extents(struct file *filp, unsigned long arg)
return f2fs_precache_extents(file_inode(filp));
}

+static int f2fs_ioc_enable_verity(struct file *filp, unsigned long arg)
+{
+ struct inode *inode = file_inode(filp);
+
+ f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
+
+ if (!f2fs_sb_has_verity(F2FS_I_SB(inode))) {
+ f2fs_msg(inode->i_sb, KERN_WARNING,
+ "Can't enable fs-verity on inode %lu: the verity feature is not enabled on this filesystem.\n",
+ inode->i_ino);
+ return -EOPNOTSUPP;
+ }
+
+ return fsverity_ioctl_enable(filp, (const void __user *)arg);
+}
+
+static int f2fs_ioc_measure_verity(struct file *filp, unsigned long arg)
+{
+ if (!f2fs_sb_has_verity(F2FS_I_SB(file_inode(filp))))
+ return -EOPNOTSUPP;
+
+ return fsverity_ioctl_measure(filp, (void __user *)arg);
+}
+
long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
if (unlikely(f2fs_cp_error(F2FS_I_SB(file_inode(filp)))))
@@ -3036,6 +3070,10 @@ long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return f2fs_ioc_set_pin_file(filp, arg);
case F2FS_IOC_PRECACHE_EXTENTS:
return f2fs_ioc_precache_extents(filp, arg);
+ case FS_IOC_ENABLE_VERITY:
+ return f2fs_ioc_enable_verity(filp, arg);
+ case FS_IOC_MEASURE_VERITY:
+ return f2fs_ioc_measure_verity(filp, arg);
default:
return -ENOTTY;
}
@@ -3149,6 +3187,8 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case F2FS_IOC_GET_PIN_FILE:
case F2FS_IOC_SET_PIN_FILE:
case F2FS_IOC_PRECACHE_EXTENTS:
+ case FS_IOC_ENABLE_VERITY:
+ case FS_IOC_MEASURE_VERITY:
break;
default:
return -ENOIOCTLCMD;
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index ccb02226dd2c0c..b2f945b1afe501 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -46,9 +46,11 @@ void f2fs_set_inode_flags(struct inode *inode)
new_fl |= S_DIRSYNC;
if (file_is_encrypt(inode))
new_fl |= S_ENCRYPTED;
+ if (file_is_verity(inode))
+ new_fl |= S_VERITY;
inode_set_flags(inode, new_fl,
S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|
- S_ENCRYPTED);
+ S_ENCRYPTED|S_VERITY);
}

static void __get_inode_rdev(struct inode *inode, struct f2fs_inode *ri)
@@ -749,6 +751,7 @@ void f2fs_evict_inode(struct inode *inode)
}
out_clear:
fscrypt_put_encryption_info(inode);
+ fsverity_cleanup_inode(inode);
clear_inode(inode);
}

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 6b959bbb336a30..ea4a247d6ed6f7 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3177,6 +3177,9 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
sb->s_op = &f2fs_sops;
#ifdef CONFIG_FS_ENCRYPTION
sb->s_cop = &f2fs_cryptops;
+#endif
+#ifdef CONFIG_FS_VERITY
+ sb->s_vop = &f2fs_verityops;
#endif
sb->s_xattr = f2fs_xattr_handlers;
sb->s_export_op = &f2fs_export_ops;
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index 729f46a3c9ee0b..b3e28467db7279 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -117,6 +117,9 @@ static ssize_t features_show(struct f2fs_attr *a,
if (f2fs_sb_has_lost_found(sbi))
len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
len ? ", " : "", "lost_found");
+ if (f2fs_sb_has_verity(sbi))
+ len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
+ len ? ", " : "", "verity");
if (f2fs_sb_has_sb_chksum(sbi))
len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
len ? ", " : "", "sb_checksum");
@@ -350,6 +353,7 @@ enum feat_id {
FEAT_QUOTA_INO,
FEAT_INODE_CRTIME,
FEAT_LOST_FOUND,
+ FEAT_VERITY,
FEAT_SB_CHECKSUM,
};

@@ -367,6 +371,7 @@ static ssize_t f2fs_feature_show(struct f2fs_attr *a,
case FEAT_QUOTA_INO:
case FEAT_INODE_CRTIME:
case FEAT_LOST_FOUND:
+ case FEAT_VERITY:
case FEAT_SB_CHECKSUM:
return snprintf(buf, PAGE_SIZE, "supported\n");
}
@@ -455,6 +460,9 @@ F2FS_FEATURE_RO_ATTR(flexible_inline_xattr, FEAT_FLEXIBLE_INLINE_XATTR);
F2FS_FEATURE_RO_ATTR(quota_ino, FEAT_QUOTA_INO);
F2FS_FEATURE_RO_ATTR(inode_crtime, FEAT_INODE_CRTIME);
F2FS_FEATURE_RO_ATTR(lost_found, FEAT_LOST_FOUND);
+#ifdef CONFIG_FS_VERITY
+F2FS_FEATURE_RO_ATTR(verity, FEAT_VERITY);
+#endif
F2FS_FEATURE_RO_ATTR(sb_checksum, FEAT_SB_CHECKSUM);

#define ATTR_LIST(name) (&f2fs_attr_##name.attr)
@@ -517,6 +525,9 @@ static struct attribute *f2fs_feat_attrs[] = {
ATTR_LIST(quota_ino),
ATTR_LIST(inode_crtime),
ATTR_LIST(lost_found),
+#ifdef CONFIG_FS_VERITY
+ ATTR_LIST(verity),
+#endif
ATTR_LIST(sb_checksum),
NULL,
};
diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c
new file mode 100644
index 00000000000000..dd9bb47ced0093
--- /dev/null
+++ b/fs/f2fs/verity.c
@@ -0,0 +1,233 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fs/f2fs/verity.c: fs-verity support for f2fs
+ *
+ * Copyright 2019 Google LLC
+ */
+
+/*
+ * Implementation of fsverity_operations for f2fs.
+ *
+ * Like ext4, f2fs stores the verity metadata (Merkle tree and
+ * fsverity_descriptor) past the end of the file, starting at the first 64K
+ * boundary beyond i_size. This approach works because (a) verity files are
+ * readonly, and (b) pages fully beyond i_size aren't visible to userspace but
+ * can be read/written internally by f2fs with only some relatively small
+ * changes to f2fs. Extended attributes cannot be used because (a) f2fs limits
+ * the total size of an inode's xattr entries to 4096 bytes, which wouldn't be
+ * enough for even a single Merkle tree block, and (b) f2fs encryption doesn't
+ * encrypt xattrs, yet the verity metadata *must* be encrypted when the file is
+ * because it contains hashes of the plaintext data.
+ *
+ * Using a 64K boundary rather than a 4K one keeps things ready for
+ * architectures with 64K pages, and it doesn't necessarily waste space on-disk
+ * since there can be a hole between i_size and the start of the Merkle tree.
+ */
+
+#include <linux/f2fs_fs.h>
+
+#include "f2fs.h"
+#include "xattr.h"
+
+static inline loff_t f2fs_verity_metadata_pos(const struct inode *inode)
+{
+ return round_up(inode->i_size, 65536);
+}
+
+/*
+ * Read some verity metadata from the inode. __vfs_read() can't be used because
+ * we need to read beyond i_size.
+ */
+static int pagecache_read(struct inode *inode, void *buf, size_t count,
+ loff_t pos)
+{
+ while (count) {
+ size_t n = min_t(size_t, count,
+ PAGE_SIZE - offset_in_page(pos));
+ struct page *page;
+ void *addr;
+
+ page = read_mapping_page(inode->i_mapping, pos >> PAGE_SHIFT,
+ NULL);
+ if (IS_ERR(page))
+ return PTR_ERR(page);
+
+ addr = kmap_atomic(page);
+ memcpy(buf, addr + offset_in_page(pos), n);
+ kunmap_atomic(addr);
+
+ put_page(page);
+
+ buf += n;
+ pos += n;
+ count -= n;
+ }
+ return 0;
+}
+
+/*
+ * Write some verity metadata to the inode for FS_IOC_ENABLE_VERITY.
+ * kernel_write() can't be used because the file descriptor is readonly.
+ */
+static int pagecache_write(struct inode *inode, const void *buf, size_t count,
+ loff_t pos)
+{
+ while (count) {
+ size_t n = min_t(size_t, count,
+ PAGE_SIZE - offset_in_page(pos));
+ struct page *page;
+ void *fsdata;
+ void *addr;
+ int res;
+
+ res = pagecache_write_begin(NULL, inode->i_mapping, pos, n, 0,
+ &page, &fsdata);
+ if (res)
+ return res;
+
+ addr = kmap_atomic(page);
+ memcpy(addr + offset_in_page(pos), buf, n);
+ kunmap_atomic(addr);
+
+ res = pagecache_write_end(NULL, inode->i_mapping, pos, n, n,
+ page, fsdata);
+ if (res < 0)
+ return res;
+ if (res != n)
+ return -EIO;
+
+ buf += n;
+ pos += n;
+ count -= n;
+ }
+ return 0;
+}
+
+/*
+ * Format of f2fs verity xattr. This points to the location of the verity
+ * descriptor within the file data rather than containing it directly because
+ * the verity descriptor *must* be encrypted when f2fs encryption is used. But,
+ * f2fs encryption does not encrypt xattrs.
+ */
+struct fsverity_descriptor_location {
+ __le32 version;
+ __le32 size;
+ __le64 pos;
+};
+
+static int f2fs_begin_enable_verity(struct file *filp)
+{
+ struct inode *inode = file_inode(filp);
+ int err;
+
+ err = f2fs_convert_inline_inode(inode);
+ if (err)
+ return err;
+
+ err = dquot_initialize(inode);
+ if (err)
+ return err;
+
+ set_inode_flag(inode, FI_VERITY_IN_PROGRESS);
+ return 0;
+}
+
+static int f2fs_end_enable_verity(struct file *filp, const void *desc,
+ size_t desc_size, u64 merkle_tree_size)
+{
+ struct inode *inode = file_inode(filp);
+ u64 desc_pos = f2fs_verity_metadata_pos(inode) + merkle_tree_size;
+ struct fsverity_descriptor_location dloc = {
+ .version = cpu_to_le32(1),
+ .size = cpu_to_le32(desc_size),
+ .pos = cpu_to_le64(desc_pos),
+ };
+ int err = 0;
+
+ if (desc != NULL) {
+ /* Succeeded; write the verity descriptor. */
+ err = pagecache_write(inode, desc, desc_size, desc_pos);
+
+ /* Write all pages before clearing FI_VERITY_IN_PROGRESS. */
+ if (!err)
+ err = filemap_write_and_wait(inode->i_mapping);
+ } else {
+ /* Failed; truncate anything we wrote past i_size. */
+ f2fs_truncate(inode);
+ }
+
+ clear_inode_flag(inode, FI_VERITY_IN_PROGRESS);
+
+ if (desc != NULL && !err) {
+ err = f2fs_setxattr(inode, F2FS_XATTR_INDEX_VERITY,
+ F2FS_XATTR_NAME_VERITY, &dloc, sizeof(dloc),
+ NULL, XATTR_CREATE);
+ if (!err) {
+ file_set_verity(inode);
+ f2fs_set_inode_flags(inode);
+ f2fs_mark_inode_dirty_sync(inode, true);
+ }
+ }
+ return err;
+}
+
+static int f2fs_get_verity_descriptor(struct inode *inode, void *buf,
+ size_t buf_size)
+{
+ struct fsverity_descriptor_location dloc;
+ int res;
+ u32 size;
+ u64 pos;
+
+ /* Get the descriptor location */
+ res = f2fs_getxattr(inode, F2FS_XATTR_INDEX_VERITY,
+ F2FS_XATTR_NAME_VERITY, &dloc, sizeof(dloc), NULL);
+ if (res < 0 && res != -ERANGE)
+ return res;
+ if (res != sizeof(dloc) || dloc.version != cpu_to_le32(1)) {
+ f2fs_msg(inode->i_sb, KERN_WARNING,
+ "unknown verity xattr format");
+ return -EINVAL;
+ }
+ size = le32_to_cpu(dloc.size);
+ pos = le64_to_cpu(dloc.pos);
+
+ /* Get the descriptor */
+ if (pos + size < pos || pos + size > inode->i_sb->s_maxbytes ||
+ pos < f2fs_verity_metadata_pos(inode) || size > INT_MAX) {
+ f2fs_msg(inode->i_sb, KERN_WARNING, "invalid verity xattr");
+ return -EUCLEAN; /* EFSCORRUPTED */
+ }
+ if (buf_size) {
+ if (size > buf_size)
+ return -ERANGE;
+ res = pagecache_read(inode, buf, size, pos);
+ if (res)
+ return res;
+ }
+ return size;
+}
+
+static struct page *f2fs_read_merkle_tree_page(struct inode *inode,
+ pgoff_t index)
+{
+ index += f2fs_verity_metadata_pos(inode) >> PAGE_SHIFT;
+
+ return read_mapping_page(inode->i_mapping, index, NULL);
+}
+
+static int f2fs_write_merkle_tree_block(struct inode *inode, const void *buf,
+ u64 index, int log_blocksize)
+{
+ loff_t pos = f2fs_verity_metadata_pos(inode) + (index << log_blocksize);
+
+ return pagecache_write(inode, buf, 1 << log_blocksize, pos);
+}
+
+const struct fsverity_operations f2fs_verityops = {
+ .begin_enable_verity = f2fs_begin_enable_verity,
+ .end_enable_verity = f2fs_end_enable_verity,
+ .get_verity_descriptor = f2fs_get_verity_descriptor,
+ .read_merkle_tree_page = f2fs_read_merkle_tree_page,
+ .write_merkle_tree_block = f2fs_write_merkle_tree_block,
+};
diff --git a/fs/f2fs/xattr.h b/fs/f2fs/xattr.h
index a90920e2f94980..de0c600b9cab09 100644
--- a/fs/f2fs/xattr.h
+++ b/fs/f2fs/xattr.h
@@ -34,8 +34,10 @@
#define F2FS_XATTR_INDEX_ADVISE 7
/* Should be same as EXT4_XATTR_INDEX_ENCRYPTION */
#define F2FS_XATTR_INDEX_ENCRYPTION 9
+#define F2FS_XATTR_INDEX_VERITY 11

#define F2FS_XATTR_NAME_ENCRYPTION_CONTEXT "c"
+#define F2FS_XATTR_NAME_VERITY "v"

struct f2fs_xattr_header {
__le32 h_magic; /* magic number for identification */
--
2.22.0.410.gd8fdbe21b5-goog

2019-06-20 20:54:41

by Eric Biggers

Subject: [PATCH v5 14/16] ext4: add basic fs-verity support

From: Eric Biggers <[email protected]>

Add most of fs-verity support to ext4. fs-verity is a filesystem
feature that enables transparent integrity protection and authentication
of read-only files. It uses a dm-verity-like mechanism at the file
level: a Merkle tree is used to verify any block in the file in
log(filesize) time. It is implemented mainly by helper functions in
fs/verity/. See Documentation/filesystems/fsverity.rst for the full
documentation.
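
To make the log(filesize) claim concrete, here is a rough userspace
sketch (not kernel code) that counts how many Merkle tree levels cover a
file of a given size. The helper name merkle_tree_levels() is made up
for this example, and the 4096-byte block size and 32-byte digest size
used in the note below are assumptions, not values fixed by this patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: count the Merkle tree levels
 * needed to cover data_size bytes. Each tree block holds
 * block_size / digest_size hashes, so every level shrinks the block count
 * by that factor until a single block (hashed to the root hash) remains.
 */
unsigned int merkle_tree_levels(uint64_t data_size, uint32_t block_size,
				uint32_t digest_size)
{
	uint64_t hashes_per_block = block_size / digest_size;
	uint64_t blocks = (data_size + block_size - 1) / block_size;
	unsigned int levels = 0;

	while (blocks > 1) {
		/* Number of tree blocks needed to hold one hash per block. */
		blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
		levels++;
	}
	return levels;
}
```

Under those assumptions a 1 GiB file needs 3 levels, so verifying one
data block touches only a handful of hash blocks.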

This commit adds all of ext4 fs-verity support except for the actual
data verification, including:

- Adding a filesystem feature flag and an inode flag for fs-verity.

- Implementing the fsverity_operations to support enabling verity on an
inode and reading/writing the verity metadata.

- Updating ->write_begin(), ->write_end(), and ->writepages() to support
writing verity metadata pages.

- Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().

ext4 stores the verity metadata (Merkle tree and fsverity_descriptor)
past the end of the file, starting at the first 64K boundary beyond
i_size. This approach works because (a) verity files are readonly, and
(b) pages fully beyond i_size aren't visible to userspace but can be
read/written internally by ext4 with only some relatively small changes
to ext4. This approach avoids depending on the EA_INODE feature and on
rearchitecting ext4's xattr support to page multi-gigabyte xattrs into
memory and to encrypt xattrs.
Note that the verity metadata *must* be encrypted when the file is,
since it contains hashes of the plaintext data.
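
The layout arithmetic can be sketched in userspace as follows. This
mirrors the round_up() calculations in ext4_verity_metadata_pos() and
ext4_write_verity_descriptor() from this patch, but the function names
below are hypothetical and the code is only an illustration, not what
the kernel runs.

```c
#include <stdint.h>

/* Round x up to a multiple of align; align must be a power of two. */
uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Start of the verity metadata: i_size rounded up to a 64K boundary. */
uint64_t verity_metadata_pos(uint64_t i_size)
{
	return round_up_pow2(i_size, 65536);
}

/*
 * Start of the descriptor: the next filesystem block boundary after the
 * Merkle tree.
 */
uint64_t verity_desc_pos(uint64_t i_size, uint64_t merkle_tree_size,
			 uint64_t fs_block_size)
{
	return round_up_pow2(verity_metadata_pos(i_size) + merkle_tree_size,
			     fs_block_size);
}

/*
 * Position of the 4-byte descriptor size field: the last 4 bytes of the
 * block in which the descriptor ends, or of the following block if fewer
 * than 4 bytes remain there.
 */
uint64_t verity_desc_size_pos(uint64_t desc_pos, uint64_t desc_size,
			      uint64_t fs_block_size)
{
	uint64_t desc_end = desc_pos + desc_size;

	return round_up_pow2(desc_end + 4, fs_block_size) - 4;
}
```

Note that an i_size that is already 64K-aligned is left unchanged by the
rounding, so in that case the metadata starts exactly at i_size.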

This patch incorporates work by Theodore Ts'o and Chandan Rajendra.

Signed-off-by: Eric Biggers <[email protected]>
---
fs/ext4/Makefile | 1 +
fs/ext4/ext4.h | 21 ++-
fs/ext4/file.c | 4 +
fs/ext4/inode.c | 46 ++++--
fs/ext4/ioctl.c | 12 ++
fs/ext4/super.c | 9 ++
fs/ext4/sysfs.c | 6 +
fs/ext4/verity.c | 354 +++++++++++++++++++++++++++++++++++++++++++++++
8 files changed, 438 insertions(+), 15 deletions(-)
create mode 100644 fs/ext4/verity.c

diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
index 8fdfcd3c3e0437..b17ddc229ac5f5 100644
--- a/fs/ext4/Makefile
+++ b/fs/ext4/Makefile
@@ -13,3 +13,4 @@ ext4-y := balloc.o bitmap.o block_validity.o dir.o ext4_jbd2.o extents.o \

ext4-$(CONFIG_EXT4_FS_POSIX_ACL) += acl.o
ext4-$(CONFIG_EXT4_FS_SECURITY) += xattr_security.o
+ext4-$(CONFIG_FS_VERITY) += verity.o
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 1cb67859e0518b..5a1deea3fb3e37 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -41,6 +41,7 @@
#endif

#include <linux/fscrypt.h>
+#include <linux/fsverity.h>

#include <linux/compiler.h>

@@ -395,6 +396,7 @@ struct flex_groups {
#define EXT4_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/
#define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */
#define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */
+#define EXT4_VERITY_FL 0x00100000 /* Verity protected inode */
#define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
#define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
#define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
@@ -402,7 +404,7 @@ struct flex_groups {
#define EXT4_CASEFOLD_FL 0x40000000 /* Casefolded file */
#define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */

-#define EXT4_FL_USER_VISIBLE 0x704BDFFF /* User visible flags */
+#define EXT4_FL_USER_VISIBLE 0x705BDFFF /* User visible flags */
#define EXT4_FL_USER_MODIFIABLE 0x604BC0FF /* User modifiable flags */

/* Flags we can manipulate with through EXT4_IOC_FSSETXATTR */
@@ -466,6 +468,7 @@ enum {
EXT4_INODE_TOPDIR = 17, /* Top of directory hierarchies*/
EXT4_INODE_HUGE_FILE = 18, /* Set to each huge file */
EXT4_INODE_EXTENTS = 19, /* Inode uses extents */
+ EXT4_INODE_VERITY = 20, /* Verity protected inode */
EXT4_INODE_EA_INODE = 21, /* Inode used for large EA */
EXT4_INODE_EOFBLOCKS = 22, /* Blocks allocated beyond EOF */
EXT4_INODE_INLINE_DATA = 28, /* Data in inode. */
@@ -511,6 +514,7 @@ static inline void ext4_check_flag_values(void)
CHECK_FLAG_VALUE(TOPDIR);
CHECK_FLAG_VALUE(HUGE_FILE);
CHECK_FLAG_VALUE(EXTENTS);
+ CHECK_FLAG_VALUE(VERITY);
CHECK_FLAG_VALUE(EA_INODE);
CHECK_FLAG_VALUE(EOFBLOCKS);
CHECK_FLAG_VALUE(INLINE_DATA);
@@ -1559,6 +1563,7 @@ enum {
EXT4_STATE_MAY_INLINE_DATA, /* may have in-inode data */
EXT4_STATE_EXT_PRECACHED, /* extents have been precached */
EXT4_STATE_LUSTRE_EA_INODE, /* Lustre-style ea_inode */
+ EXT4_STATE_VERITY_IN_PROGRESS, /* building fs-verity Merkle tree */
};

#define EXT4_INODE_BIT_FNS(name, field, offset) \
@@ -1609,6 +1614,12 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
#define EXT4_SB(sb) (sb)
#endif

+static inline bool ext4_verity_in_progress(struct inode *inode)
+{
+ return IS_ENABLED(CONFIG_FS_VERITY) &&
+ ext4_test_inode_state(inode, EXT4_STATE_VERITY_IN_PROGRESS);
+}
+
#define NEXT_ORPHAN(inode) EXT4_I(inode)->i_dtime

/*
@@ -1661,6 +1672,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
#define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM 0x0400
#define EXT4_FEATURE_RO_COMPAT_READONLY 0x1000
#define EXT4_FEATURE_RO_COMPAT_PROJECT 0x2000
+#define EXT4_FEATURE_RO_COMPAT_VERITY 0x8000

#define EXT4_FEATURE_INCOMPAT_COMPRESSION 0x0001
#define EXT4_FEATURE_INCOMPAT_FILETYPE 0x0002
@@ -1755,6 +1767,7 @@ EXT4_FEATURE_RO_COMPAT_FUNCS(bigalloc, BIGALLOC)
EXT4_FEATURE_RO_COMPAT_FUNCS(metadata_csum, METADATA_CSUM)
EXT4_FEATURE_RO_COMPAT_FUNCS(readonly, READONLY)
EXT4_FEATURE_RO_COMPAT_FUNCS(project, PROJECT)
+EXT4_FEATURE_RO_COMPAT_FUNCS(verity, VERITY)

EXT4_FEATURE_INCOMPAT_FUNCS(compression, COMPRESSION)
EXT4_FEATURE_INCOMPAT_FUNCS(filetype, FILETYPE)
@@ -1812,7 +1825,8 @@ EXT4_FEATURE_INCOMPAT_FUNCS(casefold, CASEFOLD)
EXT4_FEATURE_RO_COMPAT_BIGALLOC |\
EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|\
EXT4_FEATURE_RO_COMPAT_QUOTA |\
- EXT4_FEATURE_RO_COMPAT_PROJECT)
+ EXT4_FEATURE_RO_COMPAT_PROJECT |\
+ EXT4_FEATURE_RO_COMPAT_VERITY)

#define EXTN_FEATURE_FUNCS(ver) \
static inline bool ext4_has_unknown_ext##ver##_compat_features(struct super_block *sb) \
@@ -3250,6 +3264,9 @@ extern int ext4_bio_write_page(struct ext4_io_submit *io,
/* mmp.c */
extern int ext4_multi_mount_protect(struct super_block *, ext4_fsblk_t);

+/* verity.c */
+extern const struct fsverity_operations ext4_verityops;
+
/*
* Add new method to test whether block and inode bitmaps are properly
* initialized. With uninit_bg reading the block from disk is not enough
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 2c5baa5e829116..ed59fb8f268e00 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -451,6 +451,10 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
if (ret)
return ret;

+ ret = fsverity_file_open(inode, filp);
+ if (ret)
+ return ret;
+
/*
* Set up the jbd2_inode if we are opening the inode for
* writing and the journal is present
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c7f77c64300855..514e24f88f90f4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1390,6 +1390,7 @@ static int ext4_write_end(struct file *file,
int ret = 0, ret2;
int i_size_changed = 0;
int inline_data = ext4_has_inline_data(inode);
+ bool verity = ext4_verity_in_progress(inode);

trace_ext4_write_end(inode, pos, len, copied);
if (inline_data) {
@@ -1407,12 +1408,16 @@ static int ext4_write_end(struct file *file,
/*
* it's important to update i_size while still holding page lock:
* page writeout could otherwise come in and zero beyond i_size.
+ *
+ * If FS_IOC_ENABLE_VERITY is running on this inode, then Merkle tree
+ * blocks are being written past EOF, so skip the i_size update.
*/
- i_size_changed = ext4_update_inode_size(inode, pos + copied);
+ if (!verity)
+ i_size_changed = ext4_update_inode_size(inode, pos + copied);
unlock_page(page);
put_page(page);

- if (old_size < pos)
+ if (old_size < pos && !verity)
pagecache_isize_extended(inode, old_size, pos);
/*
* Don't mark the inode dirty under page lock. First, it unnecessarily
@@ -1423,7 +1428,7 @@ static int ext4_write_end(struct file *file,
if (i_size_changed || inline_data)
ext4_mark_inode_dirty(handle, inode);

- if (pos + len > inode->i_size && ext4_can_truncate(inode))
+ if (pos + len > inode->i_size && !verity && ext4_can_truncate(inode))
/* if we have allocated more blocks and copied
* less. We will have blocks allocated outside
* inode->i_size. So truncate them
@@ -1434,7 +1439,7 @@ static int ext4_write_end(struct file *file,
if (!ret)
ret = ret2;

- if (pos + len > inode->i_size) {
+ if (pos + len > inode->i_size && !verity) {
ext4_truncate_failed_write(inode);
/*
* If truncate failed early the inode might still be
@@ -1495,6 +1500,7 @@ static int ext4_journalled_write_end(struct file *file,
unsigned from, to;
int size_changed = 0;
int inline_data = ext4_has_inline_data(inode);
+ bool verity = ext4_verity_in_progress(inode);

trace_ext4_journalled_write_end(inode, pos, len, copied);
from = pos & (PAGE_SIZE - 1);
@@ -1524,13 +1530,14 @@ static int ext4_journalled_write_end(struct file *file,
if (!partial)
SetPageUptodate(page);
}
- size_changed = ext4_update_inode_size(inode, pos + copied);
+ if (!verity)
+ size_changed = ext4_update_inode_size(inode, pos + copied);
ext4_set_inode_state(inode, EXT4_STATE_JDATA);
EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
unlock_page(page);
put_page(page);

- if (old_size < pos)
+ if (old_size < pos && !verity)
pagecache_isize_extended(inode, old_size, pos);

if (size_changed || inline_data) {
@@ -1539,7 +1546,7 @@ static int ext4_journalled_write_end(struct file *file,
ret = ret2;
}

- if (pos + len > inode->i_size && ext4_can_truncate(inode))
+ if (pos + len > inode->i_size && !verity && ext4_can_truncate(inode))
/* if we have allocated more blocks and copied
* less. We will have blocks allocated outside
* inode->i_size. So truncate them
@@ -1550,7 +1557,7 @@ static int ext4_journalled_write_end(struct file *file,
ret2 = ext4_journal_stop(handle);
if (!ret)
ret = ret2;
- if (pos + len > inode->i_size) {
+ if (pos + len > inode->i_size && !verity) {
ext4_truncate_failed_write(inode);
/*
* If truncate failed early the inode might still be
@@ -2146,7 +2153,8 @@ static int ext4_writepage(struct page *page,

trace_ext4_writepage(page);
size = i_size_read(inode);
- if (page->index == size >> PAGE_SHIFT)
+ if (page->index == size >> PAGE_SHIFT &&
+ !ext4_verity_in_progress(inode))
len = size & ~PAGE_MASK;
else
len = PAGE_SIZE;
@@ -2230,7 +2238,8 @@ static int mpage_submit_page(struct mpage_da_data *mpd, struct page *page)
* after page tables are updated.
*/
size = i_size_read(mpd->inode);
- if (page->index == size >> PAGE_SHIFT)
+ if (page->index == size >> PAGE_SHIFT &&
+ !ext4_verity_in_progress(mpd->inode))
len = size & ~PAGE_MASK;
else
len = PAGE_SIZE;
@@ -2329,6 +2338,9 @@ static int mpage_process_page_bufs(struct mpage_da_data *mpd,
ext4_lblk_t blocks = (i_size_read(inode) + i_blocksize(inode) - 1)
>> inode->i_blkbits;

+ if (ext4_verity_in_progress(inode))
+ blocks = EXT_MAX_BLOCKS;
+
do {
BUG_ON(buffer_locked(bh));

@@ -3045,8 +3057,8 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,

index = pos >> PAGE_SHIFT;

- if (ext4_nonda_switch(inode->i_sb) ||
- S_ISLNK(inode->i_mode)) {
+ if (ext4_nonda_switch(inode->i_sb) || S_ISLNK(inode->i_mode) ||
+ ext4_verity_in_progress(inode)) {
*fsdata = (void *)FALL_BACK_TO_NONDELALLOC;
return ext4_write_begin(file, mapping, pos,
len, flags, pagep, fsdata);
@@ -4720,6 +4732,8 @@ static bool ext4_should_use_dax(struct inode *inode)
return false;
if (ext4_test_inode_flag(inode, EXT4_INODE_ENCRYPT))
return false;
+ if (ext4_test_inode_flag(inode, EXT4_INODE_VERITY))
+ return false;
return true;
}

@@ -4744,9 +4758,11 @@ void ext4_set_inode_flags(struct inode *inode)
new_fl |= S_ENCRYPTED;
if (flags & EXT4_CASEFOLD_FL)
new_fl |= S_CASEFOLD;
+ if (flags & EXT4_VERITY_FL)
+ new_fl |= S_VERITY;
inode_set_flags(inode, new_fl,
S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX|
- S_ENCRYPTED|S_CASEFOLD);
+ S_ENCRYPTED|S_CASEFOLD|S_VERITY);
}

static blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode,
@@ -5528,6 +5544,10 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
if (error)
return error;

+ error = fsverity_prepare_setattr(dentry, attr);
+ if (error)
+ return error;
+
if (is_quota_modification(inode, attr)) {
error = dquot_initialize(inode);
if (error)
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index e486e49b31ed7a..93b63697f5dce6 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -1092,6 +1092,16 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
case EXT4_IOC_GET_ENCRYPTION_POLICY:
return fscrypt_ioctl_get_policy(filp, (void __user *)arg);

+ case FS_IOC_ENABLE_VERITY:
+ if (!ext4_has_feature_verity(sb))
+ return -EOPNOTSUPP;
+ return fsverity_ioctl_enable(filp, (const void __user *)arg);
+
+ case FS_IOC_MEASURE_VERITY:
+ if (!ext4_has_feature_verity(sb))
+ return -EOPNOTSUPP;
+ return fsverity_ioctl_measure(filp, (void __user *)arg);
+
case EXT4_IOC_FSGETXATTR:
{
struct fsxattr fa;
@@ -1210,6 +1220,8 @@ long ext4_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case EXT4_IOC_SET_ENCRYPTION_POLICY:
case EXT4_IOC_GET_ENCRYPTION_PWSALT:
case EXT4_IOC_GET_ENCRYPTION_POLICY:
+ case FS_IOC_ENABLE_VERITY:
+ case FS_IOC_MEASURE_VERITY:
case EXT4_IOC_SHUTDOWN:
case FS_IOC_GETFSMAP:
break;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4079605d437ae7..05a9874687c365 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1179,6 +1179,7 @@ void ext4_clear_inode(struct inode *inode)
EXT4_I(inode)->jinode = NULL;
}
fscrypt_put_encryption_info(inode);
+ fsverity_cleanup_inode(inode);
}

static struct inode *ext4_nfs_get_inode(struct super_block *sb,
@@ -4272,6 +4273,9 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
#ifdef CONFIG_FS_ENCRYPTION
sb->s_cop = &ext4_cryptops;
#endif
+#ifdef CONFIG_FS_VERITY
+ sb->s_vop = &ext4_verityops;
+#endif
#ifdef CONFIG_QUOTA
sb->dq_op = &ext4_quota_operations;
if (ext4_has_feature_quota(sb))
@@ -4419,6 +4423,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
goto failed_mount_wq;
}

+ if (ext4_has_feature_verity(sb) && blocksize != PAGE_SIZE) {
+ ext4_msg(sb, KERN_ERR, "Unsupported blocksize for fs-verity");
+ goto failed_mount_wq;
+ }
+
if (DUMMY_ENCRYPTION_ENABLED(sbi) && !sb_rdonly(sb) &&
!ext4_has_feature_encrypt(sb)) {
ext4_set_feature_encrypt(sb);
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 04b4f53f0659e5..534531747bf1af 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -241,6 +241,9 @@ EXT4_ATTR_FEATURE(encryption);
#ifdef CONFIG_UNICODE
EXT4_ATTR_FEATURE(casefold);
#endif
+#ifdef CONFIG_FS_VERITY
+EXT4_ATTR_FEATURE(verity);
+#endif
EXT4_ATTR_FEATURE(metadata_csum_seed);

static struct attribute *ext4_feat_attrs[] = {
@@ -252,6 +255,9 @@ static struct attribute *ext4_feat_attrs[] = {
#endif
#ifdef CONFIG_UNICODE
ATTR_LIST(casefold),
+#endif
+#ifdef CONFIG_FS_VERITY
+ ATTR_LIST(verity),
#endif
ATTR_LIST(metadata_csum_seed),
NULL,
diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
new file mode 100644
index 00000000000000..0ff98eb4ecdbb7
--- /dev/null
+++ b/fs/ext4/verity.c
@@ -0,0 +1,354 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fs/ext4/verity.c: fs-verity support for ext4
+ *
+ * Copyright 2019 Google LLC
+ */
+
+/*
+ * Implementation of fsverity_operations for ext4.
+ *
+ * ext4 stores the verity metadata (Merkle tree and fsverity_descriptor) past
+ * the end of the file, starting at the first 64K boundary beyond i_size. This
+ * approach works because (a) verity files are readonly, and (b) pages fully
+ * beyond i_size aren't visible to userspace but can be read/written internally
+ * by ext4 with only some relatively small changes to ext4. This approach
+ * avoids having to depend on the EA_INODE feature and on rearchitecturing
+ * ext4's xattr support to support paging multi-gigabyte xattrs into memory, and
+ * to support encrypting xattrs. Note that the verity metadata *must* be
+ * encrypted when the file is, since it contains hashes of the plaintext data.
+ *
+ * Using a 64K boundary rather than a 4K one keeps things ready for
+ * architectures with 64K pages, and it doesn't necessarily waste space on-disk
+ * since there can be a hole between i_size and the start of the Merkle tree.
+ */
+
+#include <linux/quotaops.h>
+
+#include "ext4.h"
+#include "ext4_extents.h"
+#include "ext4_jbd2.h"
+
+static inline loff_t ext4_verity_metadata_pos(const struct inode *inode)
+{
+ return round_up(inode->i_size, 65536);
+}
+
+/*
+ * Read some verity metadata from the inode. __vfs_read() can't be used because
+ * we need to read beyond i_size.
+ */
+static int pagecache_read(struct inode *inode, void *buf, size_t count,
+ loff_t pos)
+{
+ while (count) {
+ size_t n = min_t(size_t, count,
+ PAGE_SIZE - offset_in_page(pos));
+ struct page *page;
+ void *addr;
+
+ page = read_mapping_page(inode->i_mapping, pos >> PAGE_SHIFT,
+ NULL);
+ if (IS_ERR(page))
+ return PTR_ERR(page);
+
+ addr = kmap_atomic(page);
+ memcpy(buf, addr + offset_in_page(pos), n);
+ kunmap_atomic(addr);
+
+ put_page(page);
+
+ buf += n;
+ pos += n;
+ count -= n;
+ }
+ return 0;
+}
+
+/*
+ * Write some verity metadata to the inode for FS_IOC_ENABLE_VERITY.
+ * kernel_write() can't be used because the file descriptor is readonly.
+ */
+static int pagecache_write(struct inode *inode, const void *buf, size_t count,
+ loff_t pos)
+{
+ while (count) {
+ size_t n = min_t(size_t, count,
+ PAGE_SIZE - offset_in_page(pos));
+ struct page *page;
+ void *fsdata;
+ void *addr;
+ int res;
+
+ res = pagecache_write_begin(NULL, inode->i_mapping, pos, n, 0,
+ &page, &fsdata);
+ if (res)
+ return res;
+
+ addr = kmap_atomic(page);
+ memcpy(addr + offset_in_page(pos), buf, n);
+ kunmap_atomic(addr);
+
+ res = pagecache_write_end(NULL, inode->i_mapping, pos, n, n,
+ page, fsdata);
+ if (res < 0)
+ return res;
+ if (res != n)
+ return -EIO;
+
+ buf += n;
+ pos += n;
+ count -= n;
+ }
+ return 0;
+}
+
+static int ext4_begin_enable_verity(struct file *filp)
+{
+ struct inode *inode = file_inode(filp);
+ const int credits = 2; /* superblock and inode for ext4_orphan_add() */
+ handle_t *handle;
+ int err;
+
+ err = ext4_convert_inline_data(inode);
+ if (err)
+ return err;
+
+ if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
+ ext4_warning_inode(inode,
+ "verity is only allowed on extent-based files");
+ return -EOPNOTSUPP;
+ }
+
+ err = ext4_inode_attach_jinode(inode);
+ if (err)
+ return err;
+
+ /*
+ * ext4 uses the last allocated block to find the verity descriptor, so
+ * we must remove any other blocks which might confuse things.
+ */
+ err = ext4_truncate(inode);
+ if (err)
+ return err;
+
+ err = dquot_initialize(inode);
+ if (err)
+ return err;
+
+ handle = ext4_journal_start(inode, EXT4_HT_INODE, credits);
+ if (IS_ERR(handle))
+ return PTR_ERR(handle);
+
+ err = ext4_orphan_add(handle, inode);
+ if (err == 0)
+ ext4_set_inode_state(inode, EXT4_STATE_VERITY_IN_PROGRESS);
+
+ ext4_journal_stop(handle);
+ return err;
+}
+
+/*
+ * ext4 stores the verity descriptor beginning on the next filesystem block
+ * boundary after the Merkle tree. Then, the descriptor size is stored in the
+ * last 4 bytes of the last allocated filesystem block --- which is either the
+ * block in which the descriptor ends, or the next block after that if there
+ * weren't at least 4 bytes remaining.
+ *
+ * We can't simply store the descriptor in an xattr because it *must* be
+ * encrypted when ext4 encryption is used, but ext4 encryption doesn't encrypt
+ * xattrs. Also, if the descriptor includes a large signature blob it may be
+ * too large to store in an xattr without the EA_INODE feature.
+ */
+static int ext4_write_verity_descriptor(struct inode *inode, const void *desc,
+ size_t desc_size, u64 merkle_tree_size)
+{
+ const u64 desc_pos = round_up(ext4_verity_metadata_pos(inode) +
+ merkle_tree_size, i_blocksize(inode));
+ const u64 desc_end = desc_pos + desc_size;
+ const __le32 desc_size_disk = cpu_to_le32(desc_size);
+ const u64 desc_size_pos = round_up(desc_end + sizeof(desc_size_disk),
+ i_blocksize(inode)) -
+ sizeof(desc_size_disk);
+ int err;
+
+ err = pagecache_write(inode, desc, desc_size, desc_pos);
+ if (err)
+ return err;
+
+ return pagecache_write(inode, &desc_size_disk, sizeof(desc_size_disk),
+ desc_size_pos);
+}
+
+static int ext4_end_enable_verity(struct file *filp, const void *desc,
+ size_t desc_size, u64 merkle_tree_size)
+{
+ struct inode *inode = file_inode(filp);
+ const int credits = 2; /* superblock and inode for ext4_orphan_add() */
+ handle_t *handle;
+ int err1 = 0;
+ int err;
+
+ if (desc != NULL) {
+ /* Succeeded; write the verity descriptor. */
+ err1 = ext4_write_verity_descriptor(inode, desc, desc_size,
+ merkle_tree_size);
+
+ /* Write all pages before clearing VERITY_IN_PROGRESS. */
+ if (!err1)
+ err1 = filemap_write_and_wait(inode->i_mapping);
+ } else {
+ /* Failed; truncate anything we wrote past i_size. */
+ ext4_truncate(inode);
+ }
+
+ /*
+ * We must always clean up by clearing EXT4_STATE_VERITY_IN_PROGRESS and
+ * deleting the inode from the orphan list, even if something failed.
+ * If everything succeeded, we'll also set the verity bit in the same
+ * transaction.
+ */
+
+ ext4_clear_inode_state(inode, EXT4_STATE_VERITY_IN_PROGRESS);
+
+ handle = ext4_journal_start(inode, EXT4_HT_INODE, credits);
+ if (IS_ERR(handle)) {
+ ext4_orphan_del(NULL, inode);
+ return PTR_ERR(handle);
+ }
+
+ err = ext4_orphan_del(handle, inode);
+ if (err)
+ goto out_stop;
+
+ if (desc != NULL && !err1) {
+ struct ext4_iloc iloc;
+
+ err = ext4_reserve_inode_write(handle, inode, &iloc);
+ if (err)
+ goto out_stop;
+ ext4_set_inode_flag(inode, EXT4_INODE_VERITY);
+ ext4_set_inode_flags(inode);
+ err = ext4_mark_iloc_dirty(handle, inode, &iloc);
+ }
+out_stop:
+ ext4_journal_stop(handle);
+ return err ?: err1;
+}
+
+static int ext4_get_verity_descriptor_location(struct inode *inode,
+ size_t *desc_size_ret,
+ u64 *desc_pos_ret)
+{
+ struct ext4_ext_path *path;
+ struct ext4_extent *last_extent;
+ u32 end_lblk;
+ u64 desc_size_pos;
+ __le32 desc_size_disk;
+ u32 desc_size;
+ u64 desc_pos;
+ int err;
+
+ /*
+ * Descriptor size is in last 4 bytes of last allocated block.
+ * See ext4_write_verity_descriptor().
+ */
+
+ if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
+ EXT4_ERROR_INODE(inode, "verity file doesn't use extents");
+ return -EFSCORRUPTED;
+ }
+
+ path = ext4_find_extent(inode, EXT_MAX_BLOCKS - 1, NULL, 0);
+ if (IS_ERR(path))
+ return PTR_ERR(path);
+
+ last_extent = path[path->p_depth].p_ext;
+ if (!last_extent) {
+ EXT4_ERROR_INODE(inode, "verity file has no extents");
+ ext4_ext_drop_refs(path);
+ kfree(path);
+ return -EFSCORRUPTED;
+ }
+
+ end_lblk = le32_to_cpu(last_extent->ee_block) +
+ ext4_ext_get_actual_len(last_extent);
+ desc_size_pos = (u64)end_lblk << inode->i_blkbits;
+ ext4_ext_drop_refs(path);
+ kfree(path);
+
+ if (desc_size_pos < sizeof(desc_size_disk))
+ goto bad;
+ desc_size_pos -= sizeof(desc_size_disk);
+
+ err = pagecache_read(inode, &desc_size_disk, sizeof(desc_size_disk),
+ desc_size_pos);
+ if (err)
+ return err;
+ desc_size = le32_to_cpu(desc_size_disk);
+
+ /*
+ * The descriptor is stored just before the desc_size_disk, but starting
+ * on a filesystem block boundary.
+ */
+
+ if (desc_size > INT_MAX || desc_size > desc_size_pos)
+ goto bad;
+
+ desc_pos = round_down(desc_size_pos - desc_size, i_blocksize(inode));
+ if (desc_pos < ext4_verity_metadata_pos(inode))
+ goto bad;
+
+ *desc_size_ret = desc_size;
+ *desc_pos_ret = desc_pos;
+ return 0;
+
+bad:
+ EXT4_ERROR_INODE(inode, "verity file corrupted; can't find descriptor");
+ return -EFSCORRUPTED;
+}
+
+static int ext4_get_verity_descriptor(struct inode *inode, void *buf,
+ size_t buf_size)
+{
+ size_t desc_size = 0;
+ u64 desc_pos = 0;
+ int err;
+
+ err = ext4_get_verity_descriptor_location(inode, &desc_size, &desc_pos);
+ if (err)
+ return err;
+
+ if (buf_size) {
+ if (desc_size > buf_size)
+ return -ERANGE;
+ err = pagecache_read(inode, buf, desc_size, desc_pos);
+ if (err)
+ return err;
+ }
+ return desc_size;
+}
+
+static struct page *ext4_read_merkle_tree_page(struct inode *inode,
+ pgoff_t index)
+{
+ index += ext4_verity_metadata_pos(inode) >> PAGE_SHIFT;
+
+ return read_mapping_page(inode->i_mapping, index, NULL);
+}
+
+static int ext4_write_merkle_tree_block(struct inode *inode, const void *buf,
+ u64 index, int log_blocksize)
+{
+ loff_t pos = ext4_verity_metadata_pos(inode) + (index << log_blocksize);
+
+ return pagecache_write(inode, buf, 1 << log_blocksize, pos);
+}
+
+const struct fsverity_operations ext4_verityops = {
+ .begin_enable_verity = ext4_begin_enable_verity,
+ .end_enable_verity = ext4_end_enable_verity,
+ .get_verity_descriptor = ext4_get_verity_descriptor,
+ .read_merkle_tree_page = ext4_read_merkle_tree_page,
+ .write_merkle_tree_block = ext4_write_merkle_tree_block,
+};
--
2.22.0.410.gd8fdbe21b5-goog

2019-06-20 20:55:43

by Eric Biggers

Subject: [PATCH v5 07/16] fs-verity: add the hook for file ->open()

From: Eric Biggers <[email protected]>

Add the fsverity_file_open() function, which prepares an fs-verity file
to be read from. If not already done, it loads the fs-verity descriptor
from the filesystem and sets up an fsverity_info structure for the inode
which describes the Merkle tree and contains the file measurement. It
also denies all attempts to open verity files for writing.
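
As a simplified userspace model of the write-denial rule only (not the
actual fsverity_file_open() implementation), the check can be pictured
as below. The function name is hypothetical, and the use of -EPERM as
the error code is an assumption of this sketch; see fs/verity/open.c in
this patch for the real behavior.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical model: decide whether an open() with the given flags would
 * be allowed on a file, given whether verity is enabled on it.
 * Returns 0 if allowed, or a negative errno value if denied.
 */
int model_verity_open_check(bool is_verity, int open_flags)
{
	if (!is_verity)
		return 0;	/* non-verity files are unaffected */

	/* Any access mode other than read-only implies writing. */
	if ((open_flags & O_ACCMODE) != O_RDONLY)
		return -EPERM;	/* assumed error code for this sketch */

	return 0;
}
```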

This commit also begins the include/linux/fsverity.h header, which
declares the interface between fs/verity/ and filesystems.

Reviewed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
fs/verity/Makefile | 3 +-
fs/verity/fsverity_private.h | 54 +++++-
fs/verity/init.c | 6 +
fs/verity/open.c | 319 +++++++++++++++++++++++++++++++++++
include/linux/fsverity.h | 71 ++++++++
5 files changed, 450 insertions(+), 3 deletions(-)
create mode 100644 fs/verity/open.c
create mode 100644 include/linux/fsverity.h

diff --git a/fs/verity/Makefile b/fs/verity/Makefile
index 398f3f85fa184b..e6a8951c493a5e 100644
--- a/fs/verity/Makefile
+++ b/fs/verity/Makefile
@@ -1,4 +1,5 @@
# SPDX-License-Identifier: GPL-2.0

obj-$(CONFIG_FS_VERITY) += hash_algs.o \
- init.o
+ init.o \
+ open.o
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index 9697aaebb5dc1f..c79746ff335e14 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -15,8 +15,7 @@
#define pr_fmt(fmt) "fs-verity: " fmt

#include <crypto/sha.h>
-#include <linux/fs.h>
-#include <uapi/linux/fsverity.h>
+#include <linux/fsverity.h>

struct ahash_request;

@@ -59,6 +58,40 @@ struct merkle_tree_params {
u64 level_start[FS_VERITY_MAX_LEVELS];
};

+/**
+ * fsverity_info - cached verity metadata for an inode
+ *
+ * When a verity file is first opened, an instance of this struct is allocated
+ * and stored in ->i_verity_info; it remains until the inode is evicted. It
+ * caches information about the Merkle tree that's needed to efficiently verify
+ * data read from the file. It also caches the file measurement. The Merkle
+ * tree pages themselves are not cached here, but the filesystem may cache them.
+ */
+struct fsverity_info {
+ struct merkle_tree_params tree_params;
+ u8 root_hash[FS_VERITY_MAX_DIGEST_SIZE];
+ u8 measurement[FS_VERITY_MAX_DIGEST_SIZE];
+ const struct inode *inode;
+};
+
+/*
+ * Merkle tree properties. The file measurement is the hash of this structure.
+ */
+struct fsverity_descriptor {
+ __u8 version; /* must be 1 */
+ __u8 hash_algorithm; /* Merkle tree hash algorithm */
+ __u8 log_blocksize; /* log2 of size of data and tree blocks */
+ __u8 salt_size; /* size of salt in bytes; 0 if none */
+ __le32 sig_size; /* reserved, must be 0 */
+ __le64 data_size; /* size of file the Merkle tree is built over */
+ __u8 root_hash[64]; /* Merkle tree root hash */
+ __u8 salt[32]; /* salt prepended to each hashed block */
+ __u8 __reserved[144]; /* must be 0's */
+};
+
+/* Arbitrary limit to bound the kmalloc() size. Can be changed. */
+#define FS_VERITY_MAX_DESCRIPTOR_SIZE 16384
+
/* hash_algs.c */

extern struct fsverity_hash_alg fsverity_hash_algs[];
@@ -85,4 +118,21 @@ fsverity_msg(const struct inode *inode, const char *level,
#define fsverity_err(inode, fmt, ...) \
fsverity_msg((inode), KERN_ERR, fmt, ##__VA_ARGS__)

+/* open.c */
+
+int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
+ const struct inode *inode,
+ unsigned int hash_algorithm,
+ unsigned int log_blocksize,
+ const u8 *salt, size_t salt_size);
+
+struct fsverity_info *fsverity_create_info(const struct inode *inode,
+ const void *desc, size_t desc_size);
+
+void fsverity_set_info(struct inode *inode, struct fsverity_info *vi);
+
+void fsverity_free_info(struct fsverity_info *vi);
+
+int __init fsverity_init_info_cache(void);
+
#endif /* _FSVERITY_PRIVATE_H */
diff --git a/fs/verity/init.c b/fs/verity/init.c
index 40076bbe452a48..fff1fd6343357d 100644
--- a/fs/verity/init.c
+++ b/fs/verity/init.c
@@ -33,8 +33,14 @@ void fsverity_msg(const struct inode *inode, const char *level,

static int __init fsverity_init(void)
{
+ int err;
+
fsverity_check_hash_algs();

+ err = fsverity_init_info_cache();
+ if (err)
+ return err;
+
pr_debug("Initialized fs-verity\n");
return 0;
}
diff --git a/fs/verity/open.c b/fs/verity/open.c
new file mode 100644
index 00000000000000..3a3bb27e23f5e3
--- /dev/null
+++ b/fs/verity/open.c
@@ -0,0 +1,319 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fs/verity/open.c: opening fs-verity files
+ *
+ * Copyright 2019 Google LLC
+ */
+
+#include "fsverity_private.h"
+
+#include <linux/slab.h>
+
+static struct kmem_cache *fsverity_info_cachep;
+
+/**
+ * fsverity_init_merkle_tree_params() - initialize Merkle tree parameters
+ * @params: the parameters struct to initialize
+ * @inode: the inode for which the Merkle tree is being built
+ * @hash_algorithm: number of hash algorithm to use
+ * @log_blocksize: log base 2 of block size to use
+ * @salt: pointer to salt (optional)
+ * @salt_size: size of salt, possibly 0
+ *
+ * Validate the hash algorithm and block size, then compute the tree topology
+ * (num levels, num blocks in each level, etc.) and initialize @params.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
+ const struct inode *inode,
+ unsigned int hash_algorithm,
+ unsigned int log_blocksize,
+ const u8 *salt, size_t salt_size)
+{
+ const struct fsverity_hash_alg *hash_alg;
+ int err;
+ u64 blocks;
+ u64 offset;
+ int level;
+
+ memset(params, 0, sizeof(*params));
+
+ hash_alg = fsverity_get_hash_alg(inode, hash_algorithm);
+ if (IS_ERR(hash_alg))
+ return PTR_ERR(hash_alg);
+ params->hash_alg = hash_alg;
+ params->digest_size = hash_alg->digest_size;
+
+ params->hashstate = fsverity_prepare_hash_state(hash_alg, salt,
+ salt_size);
+ if (IS_ERR(params->hashstate)) {
+ err = PTR_ERR(params->hashstate);
+ params->hashstate = NULL;
+ fsverity_err(inode, "Error %d preparing hash state", err);
+ goto out_err;
+ }
+
+ if (log_blocksize != PAGE_SHIFT) {
+ fsverity_warn(inode, "Unsupported log_blocksize: %u",
+ log_blocksize);
+ err = -EINVAL;
+ goto out_err;
+ }
+ params->log_blocksize = log_blocksize;
+ params->block_size = 1 << log_blocksize;
+
+ if (WARN_ON(!is_power_of_2(params->digest_size))) {
+ err = -EINVAL;
+ goto out_err;
+ }
+ if (params->block_size < 2 * params->digest_size) {
+ fsverity_warn(inode,
+ "Merkle tree block size (%u) too small for hash algorithm \"%s\"",
+ params->block_size, hash_alg->name);
+ err = -EINVAL;
+ goto out_err;
+ }
+ params->log_arity = params->log_blocksize - ilog2(params->digest_size);
+ params->hashes_per_block = 1 << params->log_arity;
+
+ pr_debug("Merkle tree uses %s with %u-byte blocks (%u hashes/block), salt=%*phN\n",
+ hash_alg->name, params->block_size, params->hashes_per_block,
+ (int)salt_size, salt);
+
+ /*
+ * Compute the number of levels in the Merkle tree and create a map from
+ * level to the starting block of that level. Level 'num_levels - 1' is
+ * the root and is stored first. Level 0 is the level directly "above"
+ * the data blocks and is stored last.
+ */
+
+ /* Compute number of levels and the number of blocks in each level */
+ blocks = (inode->i_size + params->block_size - 1) >> log_blocksize;
+ pr_debug("Data is %lld bytes (%llu blocks)\n", inode->i_size, blocks);
+ while (blocks > 1) {
+ if (params->num_levels >= FS_VERITY_MAX_LEVELS) {
+ fsverity_err(inode, "Too many levels in Merkle tree");
+ err = -EINVAL;
+ goto out_err;
+ }
+ blocks = (blocks + params->hashes_per_block - 1) >>
+ params->log_arity;
+ /* temporarily using level_start[] to store blocks in level */
+ params->level_start[params->num_levels++] = blocks;
+ }
+
+ /* Compute the starting block of each level */
+ offset = 0;
+ for (level = (int)params->num_levels - 1; level >= 0; level--) {
+ blocks = params->level_start[level];
+ params->level_start[level] = offset;
+ pr_debug("Level %d is %llu blocks starting at index %llu\n",
+ level, blocks, offset);
+ offset += blocks;
+ }
+
+ params->tree_size = offset << log_blocksize;
+ return 0;
+
+out_err:
+ kfree(params->hashstate);
+ memset(params, 0, sizeof(*params));
+ return err;
+}
+
+/* Compute the file measurement by hashing the fsverity_descriptor. */
+static int compute_file_measurement(const struct fsverity_hash_alg *hash_alg,
+ const struct fsverity_descriptor *desc,
+ u8 *measurement)
+{
+ return fsverity_hash_buffer(hash_alg, desc, sizeof(*desc), measurement);
+}
+
+/*
+ * Validate the given fsverity_descriptor and create a new fsverity_info from
+ * it.
+ */
+struct fsverity_info *fsverity_create_info(const struct inode *inode,
+ const void *_desc, size_t desc_size)
+{
+ const struct fsverity_descriptor *desc = _desc;
+ struct fsverity_info *vi;
+ int err;
+
+ if (desc_size < sizeof(*desc)) {
+ fsverity_err(inode, "Unrecognized descriptor size (%zu)",
+ desc_size);
+ return ERR_PTR(-EINVAL);
+ }
+
+ if (desc->version != 1) {
+ fsverity_err(inode, "Unrecognized descriptor version: %u",
+ desc->version);
+ return ERR_PTR(-EINVAL);
+ }
+
+ if (desc->sig_size ||
+ memchr_inv(desc->__reserved, 0, sizeof(desc->__reserved))) {
+ fsverity_err(inode, "Reserved bits set in descriptor");
+ return ERR_PTR(-EINVAL);
+ }
+
+ if (desc->salt_size > sizeof(desc->salt)) {
+ fsverity_err(inode, "Invalid salt_size: %u", desc->salt_size);
+ return ERR_PTR(-EINVAL);
+ }
+
+ if (le64_to_cpu(desc->data_size) != inode->i_size) {
+ fsverity_err(inode,
+ "Wrong data_size: %llu (desc) != %lld (inode)",
+ le64_to_cpu(desc->data_size), inode->i_size);
+ return ERR_PTR(-EINVAL);
+ }
+
+ vi = kmem_cache_zalloc(fsverity_info_cachep, GFP_KERNEL);
+ if (!vi)
+ return ERR_PTR(-ENOMEM);
+ vi->inode = inode;
+
+ err = fsverity_init_merkle_tree_params(&vi->tree_params, inode,
+ desc->hash_algorithm,
+ desc->log_blocksize,
+ desc->salt, desc->salt_size);
+ if (err) {
+ fsverity_err(inode,
+ "Error %d initializing Merkle tree parameters",
+ err);
+ goto out;
+ }
+
+ memcpy(vi->root_hash, desc->root_hash, vi->tree_params.digest_size);
+
+ err = compute_file_measurement(vi->tree_params.hash_alg, desc,
+ vi->measurement);
+ if (err) {
+ fsverity_err(vi->inode, "Error %d computing file measurement",
+ err);
+ goto out;
+ }
+ pr_debug("Computed file measurement: %s:%*phN\n",
+ vi->tree_params.hash_alg->name,
+ vi->tree_params.digest_size, vi->measurement);
+out:
+ if (err) {
+ fsverity_free_info(vi);
+ vi = ERR_PTR(err);
+ }
+ return vi;
+}
+
+void fsverity_set_info(struct inode *inode, struct fsverity_info *vi)
+{
+ /*
+ * Multiple processes may race to set ->i_verity_info, so use cmpxchg.
+ * This pairs with the READ_ONCE() in fsverity_get_info().
+ */
+ if (cmpxchg_release(&inode->i_verity_info, NULL, vi) != NULL)
+ fsverity_free_info(vi);
+}
+
+void fsverity_free_info(struct fsverity_info *vi)
+{
+ if (!vi)
+ return;
+ kfree(vi->tree_params.hashstate);
+ kmem_cache_free(fsverity_info_cachep, vi);
+}
+
+/* Ensure the inode has an ->i_verity_info */
+static int ensure_verity_info(struct inode *inode)
+{
+ struct fsverity_info *vi = fsverity_get_info(inode);
+ struct fsverity_descriptor *desc;
+ int res;
+
+ if (vi)
+ return 0;
+
+ res = inode->i_sb->s_vop->get_verity_descriptor(inode, NULL, 0);
+ if (res < 0) {
+ fsverity_err(inode,
+ "Error %d getting verity descriptor size", res);
+ return res;
+ }
+ if (res > FS_VERITY_MAX_DESCRIPTOR_SIZE) {
+ fsverity_err(inode, "Verity descriptor is too large (%d bytes)",
+ res);
+ return -EMSGSIZE;
+ }
+ desc = kmalloc(res, GFP_KERNEL);
+ if (!desc)
+ return -ENOMEM;
+ res = inode->i_sb->s_vop->get_verity_descriptor(inode, desc, res);
+ if (res < 0) {
+ fsverity_err(inode, "Error %d reading verity descriptor", res);
+ goto out_free_desc;
+ }
+
+ vi = fsverity_create_info(inode, desc, res);
+ if (IS_ERR(vi)) {
+ res = PTR_ERR(vi);
+ goto out_free_desc;
+ }
+
+ fsverity_set_info(inode, vi);
+ res = 0;
+out_free_desc:
+ kfree(desc);
+ return res;
+}
+
+/**
+ * fsverity_file_open - prepare to open a verity file
+ * @inode: the inode being opened
+ * @filp: the struct file being set up
+ *
+ * When opening a verity file, deny the open if it is for writing. Otherwise,
+ * set up the inode's ->i_verity_info if not already done.
+ *
+ * When combined with fscrypt, this must be called after fscrypt_file_open().
+ * Otherwise, we won't have the key set up to decrypt the verity metadata.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fsverity_file_open(struct inode *inode, struct file *filp)
+{
+ if (!IS_VERITY(inode))
+ return 0;
+
+ if (filp->f_mode & FMODE_WRITE) {
+ pr_debug("Denying opening verity file (ino %lu) for write\n",
+ inode->i_ino);
+ return -EPERM;
+ }
+
+ return ensure_verity_info(inode);
+}
+EXPORT_SYMBOL_GPL(fsverity_file_open);
+
+/**
+ * fsverity_cleanup_inode - free the inode's verity info, if present
+ *
+ * Filesystems must call this on inode eviction to free ->i_verity_info.
+ */
+void fsverity_cleanup_inode(struct inode *inode)
+{
+ fsverity_free_info(inode->i_verity_info);
+ inode->i_verity_info = NULL;
+}
+EXPORT_SYMBOL_GPL(fsverity_cleanup_inode);
+
+int __init fsverity_init_info_cache(void)
+{
+ fsverity_info_cachep = KMEM_CACHE_USERCOPY(fsverity_info,
+ SLAB_RECLAIM_ACCOUNT,
+ measurement);
+ if (!fsverity_info_cachep)
+ return -ENOMEM;
+ return 0;
+}
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
new file mode 100644
index 00000000000000..1372c236c8770c
--- /dev/null
+++ b/include/linux/fsverity.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * fs-verity: read-only file-based authenticity protection
+ *
+ * This header declares the interface between the fs/verity/ support layer and
+ * filesystems that support fs-verity.
+ *
+ * Copyright 2019 Google LLC
+ */
+
+#ifndef _LINUX_FSVERITY_H
+#define _LINUX_FSVERITY_H
+
+#include <linux/fs.h>
+#include <uapi/linux/fsverity.h>
+
+/* Verity operations for filesystems */
+struct fsverity_operations {
+
+ /**
+ * Get the verity descriptor of the given inode.
+ *
+ * @inode: an inode with the S_VERITY flag set
+ * @buf: buffer in which to place the verity descriptor
+ * @bufsize: size of @buf, or 0 to retrieve the size only
+ *
+ * If bufsize == 0, then the size of the verity descriptor is returned.
+ * Otherwise the verity descriptor is written to 'buf' and its actual
+ * size is returned; -ERANGE is returned if it's too large. This may be
+ * called by multiple processes concurrently on the same inode.
+ *
+ * Return: the size on success, -errno on failure
+ */
+ int (*get_verity_descriptor)(struct inode *inode, void *buf,
+ size_t bufsize);
+};
+
+#ifdef CONFIG_FS_VERITY
+
+static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
+{
+ /* pairs with the cmpxchg_release() in fsverity_set_info() */
+ return READ_ONCE(inode->i_verity_info);
+}
+
+/* open.c */
+
+extern int fsverity_file_open(struct inode *inode, struct file *filp);
+extern void fsverity_cleanup_inode(struct inode *inode);
+
+#else /* !CONFIG_FS_VERITY */
+
+static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
+{
+ return NULL;
+}
+
+/* open.c */
+
+static inline int fsverity_file_open(struct inode *inode, struct file *filp)
+{
+ return IS_VERITY(inode) ? -EOPNOTSUPP : 0;
+}
+
+static inline void fsverity_cleanup_inode(struct inode *inode)
+{
+}
+
+#endif /* !CONFIG_FS_VERITY */
+
+#endif /* _LINUX_FSVERITY_H */
--
2.22.0.410.gd8fdbe21b5-goog
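
For readers following the level/offset arithmetic in
fsverity_init_merkle_tree_params() above, here is a rough userspace
sketch of the same layout computation. It is only an illustration, not
the kernel code: the names (tree_layout, compute_tree_layout) are
hypothetical, SKETCH_MAX_LEVELS is an assumed bound (the value of
FS_VERITY_MAX_LEVELS is not shown in this patch), and the digest size is
passed as a log2 value instead of being looked up from a hash algorithm.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bound for this sketch only; not the kernel's constant. */
#define SKETCH_MAX_LEVELS 8

/* Hypothetical stand-in for the topology part of merkle_tree_params. */
struct tree_layout {
	unsigned int num_levels;                  /* hash levels stored on disk */
	uint64_t level_start[SKETCH_MAX_LEVELS];  /* first block of each level */
	uint64_t tree_blocks;                     /* total blocks in the tree */
};

/*
 * Compute where each Merkle tree level starts, root level first.
 * Returns 0 on success, -1 if the parameters are unusable.
 */
static int compute_tree_layout(uint64_t file_size, unsigned int log_blocksize,
			       unsigned int log_digest_size,
			       struct tree_layout *out)
{
	uint64_t blocks_in_level[SKETCH_MAX_LEVELS];
	uint64_t hashes_per_block, blocks, offset = 0;
	unsigned int log_arity;
	int level;

	assert(out != 0);

	/* A block must hold at least two hashes, as in the patch. */
	if (log_blocksize <= log_digest_size || log_blocksize >= 32)
		return -1;
	log_arity = log_blocksize - log_digest_size;
	hashes_per_block = 1ULL << log_arity;

	/* Count blocks per level, from the level above the data upward. */
	out->num_levels = 0;
	blocks = (file_size + (1ULL << log_blocksize) - 1) >> log_blocksize;
	while (blocks > 1) {
		if (out->num_levels >= SKETCH_MAX_LEVELS)
			return -1;
		blocks = (blocks + hashes_per_block - 1) >> log_arity;
		blocks_in_level[out->num_levels++] = blocks;
	}

	/* The root level is stored first, level 0 last. */
	for (level = (int)out->num_levels - 1; level >= 0; level--) {
		out->level_start[level] = offset;
		offset += blocks_in_level[level];
	}
	out->tree_blocks = offset;
	return 0;
}
```

With 4096-byte blocks and 32-byte digests (log_blocksize 12,
log_digest_size 5), a file of 1 MiB + 1 byte has 257 data blocks. That
gives 3 blocks at level 0 and 1 root block at level 1, so
level_start[1] is 0, level_start[0] is 1, and the tree occupies 4
blocks. A file that fits in a single block has num_levels == 0, which
matches build_merkle_tree() later in the series treating level
'num_levels' as the root hash computation.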

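The cmpxchg_release()/READ_ONCE() pairing used by fsverity_set_info()
and fsverity_get_info() in the patch above is a "publish once" pattern:
whoever installs the pointer first wins, and losers free their copy.
The following is a minimal userspace analogue using C11 atomics, given
only as a sketch. The names (cached_info, publish_once, get_info) are
hypothetical, and the acquire load is a conservative substitute for the
kernel's READ_ONCE(), which relies on address-dependency ordering
instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct fsverity_info. */
struct cached_info {
	int dummy;
};

/*
 * Install @candidate in @slot if the slot is still empty. If another
 * thread got there first, free @candidate and return the winner.
 */
static struct cached_info *publish_once(_Atomic(struct cached_info *) *slot,
					struct cached_info *candidate)
{
	struct cached_info *expected = NULL;

	assert(candidate != NULL);
	if (atomic_compare_exchange_strong_explicit(slot, &expected, candidate,
						    memory_order_acq_rel,
						    memory_order_acquire))
		return candidate;

	/* Lost the race: 'expected' now holds the installed pointer. */
	free(candidate);
	return expected;
}

/* Reader side: returns NULL until something has been published. */
static struct cached_info *get_info(_Atomic(struct cached_info *) *slot)
{
	return atomic_load_explicit(slot, memory_order_acquire);
}
```

The point of this design is that the reader never takes a lock: it
either sees NULL and builds its own candidate, or sees a fully
initialized object.
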
2019-06-20 20:55:51

by Eric Biggers

Subject: [PATCH v5 02/16] fs-verity: add MAINTAINERS file entry

From: Eric Biggers <[email protected]>

fs-verity will be jointly maintained by Eric Biggers and Theodore Ts'o.

Reviewed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
MAINTAINERS | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index a6954776a37e70..655065116f9228 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6505,6 +6505,18 @@ S: Maintained
F: fs/notify/
F: include/linux/fsnotify*.h

+FSVERITY: READ-ONLY FILE-BASED AUTHENTICITY PROTECTION
+M: Eric Biggers <[email protected]>
+M: Theodore Y. Ts'o <[email protected]>
+L: [email protected]
+Q: https://patchwork.kernel.org/project/linux-fscrypt/list/
+T: git git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt.git fsverity
+S: Supported
+F: fs/verity/
+F: include/linux/fsverity.h
+F: include/uapi/linux/fsverity.h
+F: Documentation/filesystems/fsverity.rst
+
FUJITSU LAPTOP EXTRAS
M: Jonathan Woithe <[email protected]>
L: [email protected]
--
2.22.0.410.gd8fdbe21b5-goog

2019-06-20 20:55:51

by Eric Biggers

Subject: [PATCH v5 10/16] fs-verity: implement FS_IOC_ENABLE_VERITY ioctl

From: Eric Biggers <[email protected]>

Add a function for filesystems to call to implement the
FS_IOC_ENABLE_VERITY ioctl. This ioctl enables fs-verity on a file.

See the "FS_IOC_ENABLE_VERITY" section of
Documentation/filesystems/fsverity.rst for the documentation.

Signed-off-by: Eric Biggers <[email protected]>
---
fs/verity/Makefile | 3 +-
fs/verity/enable.c | 341 +++++++++++++++++++++++++++++++++++++++
include/linux/fsverity.h | 64 ++++++++
3 files changed, 407 insertions(+), 1 deletion(-)
create mode 100644 fs/verity/enable.c

diff --git a/fs/verity/Makefile b/fs/verity/Makefile
index 7fa628cd5eba24..04b37475fd280a 100644
--- a/fs/verity/Makefile
+++ b/fs/verity/Makefile
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0

-obj-$(CONFIG_FS_VERITY) += hash_algs.o \
+obj-$(CONFIG_FS_VERITY) += enable.o \
+ hash_algs.o \
init.o \
open.o \
verify.o
diff --git a/fs/verity/enable.c b/fs/verity/enable.c
new file mode 100644
index 00000000000000..144721bbe4aab9
--- /dev/null
+++ b/fs/verity/enable.c
@@ -0,0 +1,341 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fs/verity/enable.c: ioctl to enable verity on a file
+ *
+ * Copyright 2019 Google LLC
+ */
+
+#include "fsverity_private.h"
+
+#include <crypto/hash.h>
+#include <linux/mount.h>
+#include <linux/pagemap.h>
+#include <linux/sched/signal.h>
+#include <linux/uaccess.h>
+
+static int build_merkle_tree_level(struct inode *inode, unsigned int level,
+ u64 num_blocks_to_hash,
+ const struct merkle_tree_params *params,
+ u8 *pending_hashes,
+ struct ahash_request *req)
+{
+ const struct fsverity_operations *vops = inode->i_sb->s_vop;
+ unsigned int pending_size = 0;
+ u64 dst_block_num;
+ u64 i;
+ int err;
+
+ if (WARN_ON(params->block_size != PAGE_SIZE)) /* checked earlier too */
+ return -EINVAL;
+
+ if (level < params->num_levels) {
+ dst_block_num = params->level_start[level];
+ } else {
+ if (WARN_ON(num_blocks_to_hash != 1))
+ return -EINVAL;
+ dst_block_num = 0; /* unused */
+ }
+
+ for (i = 0; i < num_blocks_to_hash; i++) {
+ struct page *src_page;
+
+ if ((pgoff_t)i % 10000 == 0 || i + 1 == num_blocks_to_hash)
+ pr_debug("Hashing block %llu of %llu for level %u\n",
+ i + 1, num_blocks_to_hash, level);
+
+ if (level == 0)
+ /* Leaf: hashing a data block */
+ src_page = read_mapping_page(inode->i_mapping, i, NULL);
+ else
+ /* Non-leaf: hashing hash block from level below */
+ src_page = vops->read_merkle_tree_page(inode,
+ params->level_start[level - 1] + i);
+ if (IS_ERR(src_page)) {
+ err = PTR_ERR(src_page);
+ fsverity_err(inode,
+ "Error %d reading source page %llu for level %u",
+ err, i, level);
+ return err;
+ }
+
+ err = fsverity_hash_page(params, inode, req, src_page,
+ &pending_hashes[pending_size]);
+ put_page(src_page);
+ if (err)
+ return err;
+ pending_size += params->digest_size;
+
+ if (level == params->num_levels) /* Root hash? */
+ return 0;
+
+ if (pending_size + params->digest_size > params->block_size ||
+ i + 1 == num_blocks_to_hash) {
+ /* Flush the pending hash block */
+ memset(&pending_hashes[pending_size], 0,
+ params->block_size - pending_size);
+ err = vops->write_merkle_tree_block(inode,
+ pending_hashes,
+ dst_block_num,
+ params->log_blocksize);
+ if (err) {
+ fsverity_err(inode,
+ "Error %d writing Merkle tree block %llu",
+ err, dst_block_num);
+ return err;
+ }
+ dst_block_num++;
+ pending_size = 0;
+ }
+
+ if (fatal_signal_pending(current))
+ return -EINTR;
+ cond_resched();
+ }
+ return 0;
+}
+
+/*
+ * Build the Merkle tree for the given inode using the given parameters, and
+ * return the root hash in @root_hash.
+ *
+ * The tree is written to a filesystem-specific location as determined by the
+ * ->write_merkle_tree_block() method. However, the blocks that comprise the
+ * tree are the same for all filesystems.
+ */
+static int build_merkle_tree(struct inode *inode,
+ const struct merkle_tree_params *params,
+ u8 *root_hash)
+{
+ u8 *pending_hashes;
+ struct ahash_request *req;
+ u64 blocks;
+ unsigned int level;
+ int err = -ENOMEM;
+
+ if (inode->i_size == 0) {
+ /* Empty file is a special case; root hash is all 0's */
+ memset(root_hash, 0, params->digest_size);
+ return 0;
+ }
+
+ pending_hashes = kmalloc(params->block_size, GFP_KERNEL);
+ req = ahash_request_alloc(params->hash_alg->tfm, GFP_KERNEL);
+ if (!pending_hashes || !req)
+ goto out;
+
+ /*
+ * Build each level of the Merkle tree, starting at the leaf level
+ * (level 0) and ascending to the root node (level 'num_levels - 1').
+ * Then at the end (level 'num_levels'), calculate the root hash.
+ */
+ blocks = (inode->i_size + params->block_size - 1) >>
+ params->log_blocksize;
+ for (level = 0; level <= params->num_levels; level++) {
+ err = build_merkle_tree_level(inode, level, blocks, params,
+ pending_hashes, req);
+ if (err)
+ goto out;
+ blocks = (blocks + params->hashes_per_block - 1) >>
+ params->log_arity;
+ }
+ memcpy(root_hash, pending_hashes, params->digest_size);
+ err = 0;
+out:
+ kfree(pending_hashes);
+ ahash_request_free(req);
+ return err;
+}
+
+static int enable_verity(struct file *filp,
+ const struct fsverity_enable_arg *arg)
+{
+ struct inode *inode = file_inode(filp);
+ const struct fsverity_operations *vops = inode->i_sb->s_vop;
+ struct merkle_tree_params params = { };
+ struct fsverity_descriptor *desc;
+ size_t desc_size = sizeof(*desc);
+ struct fsverity_info *vi;
+ int err;
+
+ /* Start initializing the fsverity_descriptor */
+ desc = kzalloc(desc_size, GFP_KERNEL);
+ if (!desc)
+ return -ENOMEM;
+ desc->version = 1;
+ desc->hash_algorithm = arg->hash_algorithm;
+ desc->log_blocksize = ilog2(arg->block_size);
+
+ /* Get the salt if the user provided one */
+ if (arg->salt_size &&
+ copy_from_user(desc->salt,
+ (const u8 __user *)(uintptr_t)arg->salt_ptr,
+ arg->salt_size)) {
+ err = -EFAULT;
+ goto out;
+ }
+ desc->salt_size = arg->salt_size;
+
+ desc->data_size = cpu_to_le64(inode->i_size);
+
+ pr_debug("Building Merkle tree...\n");
+
+ /* Prepare the Merkle tree parameters */
+ err = fsverity_init_merkle_tree_params(&params, inode,
+ arg->hash_algorithm,
+ desc->log_blocksize,
+ desc->salt, desc->salt_size);
+ if (err)
+ goto out;
+
+ /* Tell the filesystem that verity is being enabled on the file */
+ err = vops->begin_enable_verity(filp);
+ if (err)
+ goto out;
+
+ /* Build the Merkle tree */
+ BUILD_BUG_ON(sizeof(desc->root_hash) < FS_VERITY_MAX_DIGEST_SIZE);
+ err = build_merkle_tree(inode, &params, desc->root_hash);
+ if (err) {
+ fsverity_err(inode, "Error %d building Merkle tree", err);
+ goto rollback;
+ }
+ pr_debug("Done building Merkle tree. Root hash is %s:%*phN\n",
+ params.hash_alg->name, params.digest_size, desc->root_hash);
+
+ /*
+ * Create the fsverity_info. Don't bother trying to save work by
+ * reusing the merkle_tree_params from above. Instead, just create the
+ * fsverity_info from the fsverity_descriptor as if it were just loaded
+ * from disk. This is simpler, and it serves as an extra check that the
+ * metadata we're writing is valid before actually enabling verity.
+ */
+ vi = fsverity_create_info(inode, desc, desc_size);
+ if (IS_ERR(vi)) {
+ err = PTR_ERR(vi);
+ goto rollback;
+ }
+
+ /* Tell the filesystem to finish enabling verity on the file */
+ err = vops->end_enable_verity(filp, desc, desc_size, params.tree_size);
+ if (err) {
+ fsverity_err(inode, "%ps() failed with err %d",
+ vops->end_enable_verity, err);
+ fsverity_free_info(vi);
+ } else if (WARN_ON(!IS_VERITY(inode))) {
+ err = -EINVAL;
+ fsverity_free_info(vi);
+ } else {
+ /* Successfully enabled verity */
+
+ /*
+ * Readers can start using ->i_verity_info immediately, so it
+ * can't be rolled back once set. So don't set it until just
+ * after the filesystem has successfully enabled verity.
+ */
+ fsverity_set_info(inode, vi);
+ }
+out:
+ kfree(params.hashstate);
+ kfree(desc);
+ return err;
+
+rollback:
+ (void)vops->end_enable_verity(filp, NULL, 0, params.tree_size);
+ goto out;
+}
+
+/**
+ * fsverity_ioctl_enable() - enable verity on a file
+ *
+ * Enable fs-verity on a file. See the "FS_IOC_ENABLE_VERITY" section of
+ * Documentation/filesystems/fsverity.rst for the documentation.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fsverity_ioctl_enable(struct file *filp, const void __user *uarg)
+{
+ struct inode *inode = file_inode(filp);
+ struct fsverity_enable_arg arg;
+ int err;
+
+ if (copy_from_user(&arg, uarg, sizeof(arg)))
+ return -EFAULT;
+
+ if (arg.version != 1)
+ return -EINVAL;
+
+ if (arg.__reserved1 ||
+ memchr_inv(arg.__reserved2, 0, sizeof(arg.__reserved2)))
+ return -EINVAL;
+
+ if (arg.block_size != PAGE_SIZE)
+ return -EINVAL;
+
+ if (arg.salt_size > FIELD_SIZEOF(struct fsverity_descriptor, salt))
+ return -EMSGSIZE;
+
+ if (arg.sig_size)
+ return -EINVAL;
+
+ /*
+ * Require a regular file with write access. But the actual fd must
+ * still be readonly so that we can lock out all writers. This is
+ * needed to guarantee that no writable fds exist to the file once it
+ * has verity enabled, and to stabilize the data being hashed.
+ */
+
+ err = inode_permission(inode, MAY_WRITE);
+ if (err)
+ return err;
+
+ if (IS_APPEND(inode))
+ return -EPERM;
+
+ if (S_ISDIR(inode->i_mode))
+ return -EISDIR;
+
+ if (!S_ISREG(inode->i_mode))
+ return -EINVAL;
+
+ err = mnt_want_write_file(filp);
+ if (err) /* -EROFS */
+ return err;
+
+ err = deny_write_access(filp);
+ if (err) /* -ETXTBSY */
+ goto out_drop_write;
+
+ inode_lock(inode);
+
+ if (IS_VERITY(inode)) {
+ err = -EEXIST;
+ goto out_unlock;
+ }
+
+ err = enable_verity(filp, &arg);
+ if (err)
+ goto out_unlock;
+
+ /*
+ * Some pages of the file may have been evicted from pagecache after
+ * being used in the Merkle tree construction, then read into pagecache
+ * again by another process reading from the file concurrently. Since
+ * these pages didn't undergo verification against the file measurement
+ * which fs-verity now claims to be enforcing, we have to wipe the
+ * pagecache to ensure that all future reads are verified.
+ */
+ filemap_write_and_wait(inode->i_mapping);
+ truncate_inode_pages(inode->i_mapping, 0);
+
+ /*
+ * allow_write_access() is needed to pair with deny_write_access().
+ * Regardless, the filesystem won't allow writing to verity files.
+ */
+out_unlock:
+ inode_unlock(inode);
+ allow_write_access(filp);
+out_drop_write:
+ mnt_drop_write_file(filp);
+ return err;
+}
+EXPORT_SYMBOL_GPL(fsverity_ioctl_enable);
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index ecd47e748c7f64..7ef2ef82653409 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -17,6 +17,42 @@
/* Verity operations for filesystems */
struct fsverity_operations {

+ /**
+ * Begin enabling verity on the given file.
+ *
+ * @filp: a readonly file descriptor for the file
+ *
+ * The filesystem must do any needed filesystem-specific preparations
+ * for enabling verity, e.g. evicting inline data.
+ *
+ * i_rwsem is held for write.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+ int (*begin_enable_verity)(struct file *filp);
+
+ /**
+ * End enabling verity on the given file.
+ *
+ * @filp: a readonly file descriptor for the file
+ * @desc: the verity descriptor to write, or NULL on failure
+ * @desc_size: size of verity descriptor, or 0 on failure
+ * @merkle_tree_size: total bytes the Merkle tree took up
+ *
+ * If desc == NULL, then enabling verity failed and the filesystem must
+ * only do any necessary cleanups. Else, it must also store the given
+ * verity descriptor to a fs-specific location associated with the inode
+ * and do any fs-specific actions needed to mark the inode as a verity
+ * inode, e.g. setting a bit in the on-disk inode. The filesystem is
+ * also responsible for setting the S_VERITY flag in the VFS inode.
+ *
+ * i_rwsem is held for write.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+ int (*end_enable_verity)(struct file *filp, const void *desc,
+ size_t desc_size, u64 merkle_tree_size);
+
/**
* Get the verity descriptor of the given inode.
*
@@ -50,6 +86,22 @@ struct fsverity_operations {
*/
struct page *(*read_merkle_tree_page)(struct inode *inode,
pgoff_t index);
+
+ /**
+ * Write a Merkle tree block to the given inode.
+ *
+ * @inode: the inode for which the Merkle tree is being built
+ * @buf: block to write
+ * @index: 0-based index of the block within the Merkle tree
+ * @log_blocksize: log base 2 of the Merkle tree block size
+ *
+ * This is only called between ->begin_enable_verity() and
+ * ->end_enable_verity(). i_rwsem is held for write.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+ int (*write_merkle_tree_block)(struct inode *inode, const void *buf,
+ u64 index, int log_blocksize);
};

#ifdef CONFIG_FS_VERITY
@@ -60,6 +112,10 @@ static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
return READ_ONCE(inode->i_verity_info);
}

+/* enable.c */
+
+extern int fsverity_ioctl_enable(struct file *filp, const void __user *arg);
+
/* open.c */

extern int fsverity_file_open(struct inode *inode, struct file *filp);
@@ -79,6 +135,14 @@ static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
return NULL;
}

+/* enable.c */
+
+static inline int fsverity_ioctl_enable(struct file *filp,
+ const void __user *arg)
+{
+ return -EOPNOTSUPP;
+}
+
/* open.c */

static inline int fsverity_file_open(struct inode *inode, struct file *filp)
--
2.22.0.410.gd8fdbe21b5-goog
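
As a caller-side illustration of the argument checks that
fsverity_ioctl_enable() performs above, the sketch below fills in the
enable argument the way the ioctl expects. It is a hedged userspace
example, not part of the series: the struct is a local copy of the
layout quoted in the cover letter (real programs would use the UAPI
header added by this series), fill_enable_arg and SKETCH_MAX_SALT_SIZE
are hypothetical names, and the hash algorithm identifier is left to
the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Local copy of the layout from the cover letter, for illustration. */
struct fsverity_enable_arg_sketch {
	uint32_t version;
	uint32_t hash_algorithm;
	uint32_t block_size;
	uint32_t salt_size;
	uint64_t salt_ptr;
	uint32_t sig_size;
	uint32_t reserved1;
	uint64_t sig_ptr;
	uint64_t reserved2[11];
};

/* Matches the 32-byte salt field of the descriptor in this series. */
#define SKETCH_MAX_SALT_SIZE 32

/*
 * Fill @arg so that it would pass the checks in fsverity_ioctl_enable():
 * version 1, block size equal to the page size, salt no larger than the
 * descriptor's salt field, no signature, reserved fields zeroed.
 * Returns 0 on success or a negative errno-style value.
 */
static int fill_enable_arg(struct fsverity_enable_arg_sketch *arg,
			   uint32_t hash_algorithm,
			   const void *salt, uint32_t salt_size)
{
	long page_size = sysconf(_SC_PAGESIZE);

	assert(arg != NULL);
	if (page_size <= 0)
		return -EINVAL;
	if (salt_size > SKETCH_MAX_SALT_SIZE)
		return -EMSGSIZE;	/* same error the kernel returns */
	if (salt_size != 0 && salt == NULL)
		return -EINVAL;

	memset(arg, 0, sizeof(*arg));	/* zeroes sig_* and reserved fields */
	arg->version = 1;
	arg->hash_algorithm = hash_algorithm;
	arg->block_size = (uint32_t)page_size;
	arg->salt_size = salt_size;
	arg->salt_ptr = salt_size ? (uint64_t)(uintptr_t)salt : 0;
	return 0;
}
```

The filled structure would then be passed to FS_IOC_ENABLE_VERITY on a
file descriptor opened O_RDONLY, by a process that has write permission
to the file, as described in the comment in fsverity_ioctl_enable().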

2019-06-20 20:56:02

by Eric Biggers

Subject: [PATCH v5 01/16] fs-verity: add a documentation file

From: Eric Biggers <[email protected]>

Add a documentation file for fs-verity, covering:

- Introduction
- Use cases
- User API
- FS_IOC_ENABLE_VERITY
- FS_IOC_MEASURE_VERITY
- FS_IOC_GETFLAGS
- Accessing verity files
- File measurement computation
- Merkle tree
- fs-verity descriptor
- Built-in signature verification
- Filesystem support
- ext4
- f2fs
- Implementation details
- Verifying data
- Pagecache
- Block device based filesystems
- Userspace utility
- Tests
- FAQ

Reviewed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/filesystems/fsverity.rst | 710 +++++++++++++++++++++++++
Documentation/filesystems/index.rst | 1 +
2 files changed, 711 insertions(+)
create mode 100644 Documentation/filesystems/fsverity.rst

diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
new file mode 100644
index 00000000000000..49524d7ea190e5
--- /dev/null
+++ b/Documentation/filesystems/fsverity.rst
@@ -0,0 +1,710 @@
+=======================================================
+fs-verity: read-only file-based authenticity protection
+=======================================================
+
+Introduction
+============
+
+fs-verity (``fs/verity/``) is a support layer that filesystems can
+hook into to support transparent integrity and authenticity protection
+of read-only files. Currently, it is supported by the ext4 and f2fs
+filesystems. Like fscrypt, not too much filesystem-specific code is
+needed to support fs-verity.
+
+fs-verity is similar to `dm-verity
+<https://www.kernel.org/doc/Documentation/device-mapper/verity.txt>`_
+but works on files rather than block devices. On regular files on
+filesystems supporting fs-verity, userspace can execute an ioctl that
+causes the filesystem to build a Merkle tree for the file and persist
+it to a filesystem-specific location associated with the file.
+
+After this, the file is made readonly, and all reads from the file are
+automatically verified against the file's Merkle tree. Reads of any
+corrupted data, including mmap reads, will fail.
+
+Userspace can use another ioctl to retrieve the root hash (actually
+the "file measurement", which is a hash that includes the root hash)
+that fs-verity is enforcing for the file. This ioctl executes in
+constant time, regardless of the file size.
+
+fs-verity is essentially a way to hash a file in constant time,
+subject to the caveat that reads which would violate the hash will
+fail at runtime.
+
+Use cases
+=========
+
+By itself, the base fs-verity feature only provides integrity
+protection, i.e. detection of accidental (non-malicious) corruption.
+
+However, because fs-verity makes retrieving the file hash extremely
+efficient, it's primarily meant to be used as a tool to support
+authentication (detection of malicious modifications) or auditing
+(logging file hashes before use).
+
+Trusted userspace code (e.g. operating system code running on a
+read-only partition that is itself authenticated by dm-verity) can
+authenticate the contents of an fs-verity file by using the
+`FS_IOC_MEASURE_VERITY`_ ioctl to retrieve its hash, then verifying a
+digital signature of it.
+
+A standard file hash could be used instead of fs-verity. However,
+this is inefficient if the file is large and only a small portion may
+be accessed. This is often the case for Android application package
+(APK) files, for example. These typically contain many translations,
+classes, and other resources that are infrequently or even never
+accessed on a particular device. It would be slow and wasteful to
+read and hash the entire file before starting the application.
+
+Unlike an ahead-of-time hash, fs-verity also re-verifies data each
+time it's paged in. This ensures that malicious disk firmware can't
+undetectably change the contents of the file at runtime.
+
+fs-verity does not replace or obsolete dm-verity. dm-verity should
+still be used on read-only filesystems. fs-verity is for files that
+must live on a read-write filesystem because they are independently
+updated and potentially user-installed, so dm-verity cannot be used.
+
+The base fs-verity feature is a hashing mechanism only; actually
+authenticating the files is up to userspace. However, to meet some
+users' needs, fs-verity optionally supports a simple signature
+verification mechanism where users can configure the kernel to require
+that all fs-verity files be signed by a key loaded into a keyring; see
+`Built-in signature verification`_. Support for fs-verity file hashes
+in IMA (Integrity Measurement Architecture) policies is also planned.
+
+User API
+========
+
+FS_IOC_ENABLE_VERITY
+--------------------
+
+The FS_IOC_ENABLE_VERITY ioctl enables fs-verity on a file. It takes
+in a pointer to a :c:type:`struct fsverity_enable_arg`, defined as
+follows::
+
+ struct fsverity_enable_arg {
+ __u32 version;
+ __u32 hash_algorithm;
+ __u32 block_size;
+ __u32 salt_size;
+ __u64 salt_ptr;
+ __u32 sig_size;
+ __u32 __reserved1;
+ __u64 sig_ptr;
+ __u64 __reserved2[11];
+ };
+
+This structure contains the parameters of the Merkle tree to build for
+the file, and optionally contains a signature. It must be initialized
+as follows:
+
+- ``version`` must be 1.
+- ``hash_algorithm`` must be the identifier for the hash algorithm to
+ use for the Merkle tree, such as FS_VERITY_HASH_ALG_SHA256. See
+ ``include/uapi/linux/fsverity.h`` for the list of possible values.
+- ``block_size`` must be the Merkle tree block size. Currently, this
+ must be equal to the system page size, which is usually 4096 bytes.
+ Other sizes may be supported in the future. This value is not
+ necessarily the same as the filesystem block size.
+- ``salt_size`` is the size of the salt in bytes, or 0 if no salt is
+ provided. The salt is a value that is prepended to every hashed
+ block; it can be used to personalize the hashing for a particular
+ file or device. Currently the maximum salt size is 32 bytes.
+- ``salt_ptr`` is the pointer to the salt, or NULL if no salt is
+ provided.
+- ``sig_size`` is the size of the signature in bytes, or 0 if no
+ signature is provided. Currently the signature is (somewhat
+ arbitrarily) limited to 16128 bytes. See `Built-in signature
+ verification`_ for more information.
+- ``sig_ptr`` is the pointer to the signature, or NULL if no
+ signature is provided.
+- All reserved fields must be zeroed.
+
+FS_IOC_ENABLE_VERITY causes the filesystem to build a Merkle tree for
+the file and persist it to a filesystem-specific location associated
+with the file, then mark the file as a verity file. This ioctl may
+take a long time to execute on large files, and it is interruptible by
+fatal signals.
+
+FS_IOC_ENABLE_VERITY checks for write access to the inode. However,
+it must be executed on an O_RDONLY file descriptor and no processes
+can have the file open for writing. Attempts to open the file for
+writing while this ioctl is executing will fail with ETXTBSY. (This
+is necessary to guarantee that no writable file descriptors will exist
+after verity is enabled, and to guarantee that the file's contents are
+stable while the Merkle tree is being built over it.)
+
+On success, FS_IOC_ENABLE_VERITY returns 0, and the file becomes a
+verity file. On failure (including the case of interruption by a
+fatal signal), no changes are made to the file.
+
+FS_IOC_ENABLE_VERITY can fail with the following errors:
+
+- ``EACCES``: the process does not have write access to the file
+- ``EEXIST``: the file already has verity enabled
+- ``EFAULT``: the caller provided inaccessible memory
+- ``EINTR``: the operation was interrupted by a fatal signal
+- ``EINVAL``: unsupported version, hash algorithm, or block size; or
+ reserved bits are set; or the file descriptor refers to neither a
+ regular file nor a directory.
+- ``EISDIR``: the file descriptor refers to a directory
+- ``EMSGSIZE``: the salt or signature is too long
+- ``ENOENT``: fs-verity recognizes the hash algorithm, but it's not
+ available in the kernel's crypto API as currently configured (e.g.
+ for SHA-512, missing CONFIG_CRYPTO_SHA512).
+- ``ENOTTY``: this type of filesystem does not implement fs-verity
+- ``EOPNOTSUPP``: the kernel was not configured with fs-verity
+ support; or the filesystem superblock has not had the 'verity'
+ feature enabled on it; or the filesystem does not support fs-verity
+ on this file. (See `Filesystem support`_.)
+- ``EPERM``: the file is append-only
+- ``EROFS``: the filesystem is read-only
+- ``ETXTBSY``: someone has the file open for writing. This can be the
+ caller's file descriptor, another open file descriptor, or the file
+ reference held by a writable memory map.
+
+FS_IOC_MEASURE_VERITY
+---------------------
+
+The FS_IOC_MEASURE_VERITY ioctl retrieves the measurement of a verity
+file. The file measurement is a digest that cryptographically
+identifies the file contents that are being enforced on reads.
+
+This ioctl takes in a pointer to a variable-length structure::
+
+ struct fsverity_digest {
+ __u16 digest_algorithm;
+ __u16 digest_size; /* input/output */
+ __u8 digest[];
+ };
+
+``digest_size`` is an input/output field. On input, it must be
+initialized to the number of bytes allocated for the variable-length
+``digest`` field.
+
+On success, 0 is returned and the kernel fills in the structure as
+follows:
+
+- ``digest_algorithm`` will be the hash algorithm used for the file
+ measurement. It will match ``fsverity_enable_arg::hash_algorithm``.
+- ``digest_size`` will be the size of the digest in bytes, e.g. 32
+ for SHA-256. (This can be redundant with ``digest_algorithm``.)
+- ``digest`` will be the actual bytes of the digest.
+
+FS_IOC_MEASURE_VERITY is guaranteed to execute in constant time,
+regardless of the size of the file.
+
+FS_IOC_MEASURE_VERITY can fail with the following errors:
+
+- ``EFAULT``: the caller provided inaccessible memory
+- ``ENODATA``: the file is not a verity file
+- ``ENOTTY``: this type of filesystem does not implement fs-verity
+- ``EOPNOTSUPP``: the kernel was not configured with fs-verity
+ support, or the filesystem superblock has not had the 'verity'
+ feature enabled on it. (See `Filesystem support`_.)
+- ``EOVERFLOW``: the digest is longer than the specified
+ ``digest_size`` bytes. Try providing a larger buffer.
+
+FS_IOC_GETFLAGS
+---------------
+
+The existing ioctl FS_IOC_GETFLAGS (which isn't specific to fs-verity)
+can also be used to check whether a file has fs-verity enabled or not.
+To do so, check for FS_VERITY_FL (0x00100000) in the returned flags.
+
+The verity flag is not settable via FS_IOC_SETFLAGS. You must use
+FS_IOC_ENABLE_VERITY instead, since parameters must be provided.
+
+Accessing verity files
+======================
+
+Applications can transparently access a verity file just like a
+non-verity one, with the following exceptions:
+
+- Verity files are readonly. They cannot be opened for writing or
+ truncate()d, even if the file mode bits allow it. Attempts to do
+ one of these things will fail with EPERM. However, changes to
+ metadata such as owner, mode, timestamps, and xattrs are still
+ allowed, since these are not measured by fs-verity. Verity files
+ can also still be renamed, deleted, and linked to.
+
+- Direct I/O is not supported on verity files. Attempts to use direct
+ I/O on such files will fall back to buffered I/O.
+
+- DAX (Direct Access) is not supported on verity files, because this
+ would circumvent the data verification.
+
+- Reads of data that doesn't match the verity Merkle tree will fail
+ with EIO (for read()) or SIGBUS (for mmap() reads).
+
+- If the sysctl "fs.verity.require_signatures" is set to 1 and the
+ file's verity measurement is not signed by a key in the fs-verity
+ keyring, then opening the file will fail. See `Built-in signature
+ verification`_.
+
+Direct access to the Merkle tree is not supported. Therefore, if a
+verity file is copied, or is backed up and restored, then it will lose
+its "verity"-ness. fs-verity is primarily meant for files like
+executables that are managed by a package manager.
+
+File measurement computation
+============================
+
+This section describes how fs-verity hashes the file contents using a
+Merkle tree to produce the "file measurement" which cryptographically
+identifies the file contents. This algorithm is the same for all
+filesystems that support fs-verity.
+
+Userspace only needs to be aware of this algorithm if it needs to
+compute the file measurement itself, e.g. in order to sign the file.
+
+Merkle tree
+-----------
+
+The file contents are divided into blocks, where the block size is
+configurable but is usually 4096 bytes. The end of the last block is
+zero-padded if needed. Each block is then hashed, producing the first
+level of hashes. Then, the hashes in this first level are grouped
+into 'blocksize'-byte blocks (zero-padding the ends as needed) and
+these blocks are hashed, producing the second level of hashes. This
+proceeds up the tree until only a single block remains. The hash of
+this block is the "Merkle tree root hash".
+
+If the file is nonempty and fits in one block, then the "Merkle tree
+root hash" is simply the hash of the single data block. If the file
+is empty, then the "Merkle tree root hash" is all zeroes.
+
+The "blocks" here are not necessarily the same as "filesystem blocks".
+
+If a salt was specified, then it's zero-padded to the closest multiple
+of the input size of the hash algorithm's compression function, e.g.
+64 bytes for SHA-256 or 128 bytes for SHA-512. The padded salt is
+prepended to every data or Merkle tree block that is hashed.
+
+The purpose of the block padding is to cause every hash to be taken
+over the same amount of data, which simplifies the implementation and
+keeps open more possibilities for hardware acceleration. The purpose
+of the salt padding is to make the salting "free" when the salted hash
+state is precomputed, then imported for each hash.
+
+Example: in the recommended configuration of SHA-256 and 4K blocks,
+128 hash values fit in each block. Thus, each level of the Merkle
+tree is approximately 128 times smaller than the previous, and for
+large files the Merkle tree's size converges to approximately 1/127 of
+the original file size. However, for small files, the padding is
+significant, making the space overhead proportionally more.
+
+fs-verity descriptor
+--------------------
+
+By itself, the Merkle tree root hash is ambiguous. For example, it
+can't distinguish a large file from a small second file whose data
+is exactly the top-level hash block of the first file. Ambiguities
+also arise from the convention of padding to the next block boundary.
+
+To solve this problem, the verity file measurement is actually
+computed as a hash of the following structure, which contains the
+Merkle tree root hash as well as other fields such as the file size::
+
+ struct fsverity_descriptor {
+ __u8 version; /* must be 1 */
+ __u8 hash_algorithm; /* Merkle tree hash algorithm */
+ __u8 log_blocksize; /* log2 of size of data and tree blocks */
+ __u8 salt_size; /* size of salt in bytes; 0 if none */
+ __le32 sig_size; /* must be 0 */
+ __le64 data_size; /* size of file the Merkle tree is built over */
+ __u8 root_hash[64]; /* Merkle tree root hash */
+ __u8 salt[32]; /* salt prepended to each hashed block */
+ __u8 __reserved[144]; /* must be 0's */
+ };
+
+Note that the ``sig_size`` field must be set to 0 for the purpose of
+computing the file measurement, even if a signature was provided (or
+will be provided) to `FS_IOC_ENABLE_VERITY`_.
+
+Built-in signature verification
+===============================
+
+With CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y, fs-verity supports putting
+a portion of an authentication policy (see `Use cases`_) in the
+kernel. Specifically, it adds support for:
+
+1. At fs-verity module initialization time, a keyring ".fs-verity" is
+ created. The root user can add trusted X.509 certificates to this
+ keyring using the add_key() system call, then (when done)
+ optionally use keyctl_restrict_keyring() to prevent additional
+ certificates from being added.
+
+2. `FS_IOC_ENABLE_VERITY`_ accepts a pointer to a PKCS#7 formatted
+ signature in DER format of the file measurement. On success, this
+ signature is persisted alongside the Merkle tree. Then, any time
+ the file is opened, the kernel will verify this signature against
+ the certificates in the ".fs-verity" keyring, and verify that it
+ matches the actual file measurement.
+
+3. A new sysctl "fs.verity.require_signatures" is made available.
+ When set to 1, the kernel requires that all verity files have a
+ correctly signed file measurement as described in (2).
+
+File measurements must be signed in the following format, which is
+similar to the structure used by `FS_IOC_MEASURE_VERITY`_::
+
+ struct fsverity_signed_digest {
+ char magic[8]; /* must be "FSVerity" */
+ __le16 digest_algorithm;
+ __le16 digest_size;
+ __u8 digest[];
+ };
+
+fs-verity's built-in signature verification support is meant as a
+relatively simple mechanism that can be used to provide some level of
+authenticity protection for verity files, as an alternative to doing
+the signature verification in userspace or using IMA-appraisal.
+However, with this mechanism, userspace programs still need to check
+that the verity bit is set, and there is no protection against verity
+files being swapped around.
+
+Filesystem support
+==================
+
+fs-verity is currently supported by the ext4 and f2fs filesystems.
+The CONFIG_FS_VERITY kconfig option must be enabled to use fs-verity
+on either filesystem.
+
+``include/linux/fsverity.h`` declares the interface between the
+``fs/verity/`` support layer and filesystems. Briefly, filesystems
+must provide an ``fsverity_operations`` structure that provides
+methods to read and write the verity metadata to a filesystem-specific
+location, including the Merkle tree blocks and
+``fsverity_descriptor``. Filesystems must also call functions in
+``fs/verity/`` at certain times, such as when a file is opened or when
+pages have been read into the pagecache. (See `Verifying data`_.)
+
+ext4
+----
+
+ext4 supports fs-verity since Linux TODO and e2fsprogs v1.45.2.
+
+To create verity files on an ext4 filesystem, the filesystem must have
+been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on
+it. "verity" is an RO_COMPAT filesystem feature, so once set, old
+kernels will only be able to mount the filesystem readonly, and old
+versions of e2fsck will be unable to check the filesystem. Moreover,
+currently ext4 only supports mounting a filesystem with the "verity"
+feature when its block size is equal to PAGE_SIZE (often 4096 bytes).
+
+ext4 sets the EXT4_VERITY_FL on-disk inode flag on verity files. It
+can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be cleared.
+
+ext4 also supports encryption, which can be used simultaneously with
+fs-verity. In this case, the plaintext data is verified rather than
+the ciphertext. This is necessary in order to make the file
+measurement meaningful, since every file is encrypted differently.
+
+ext4 stores the verity metadata (Merkle tree and fsverity_descriptor)
+past the end of the file, starting at the first 64K boundary beyond
+i_size. This approach works because (a) verity files are readonly,
+and (b) pages fully beyond i_size aren't visible to userspace but can
+be read/written internally by ext4 with only some relatively small
+changes to ext4. This approach avoids having to depend on the
+EA_INODE feature and on rearchitecting ext4's xattr support to
+support paging multi-gigabyte xattrs into memory, and to support
+encrypting xattrs. Note that the verity metadata *must* be encrypted
+when the file is, since it contains hashes of the plaintext data.
+
+Currently, ext4 verity only supports the case where the Merkle tree
+block size, filesystem block size, and page size are all the same. It
+also only supports extent-based files.
+
+f2fs
+----
+
+f2fs supports fs-verity since Linux TODO and f2fs-tools v1.11.0.
+
+To create verity files on an f2fs filesystem, the filesystem must have
+been formatted with ``-O verity``.
+
+f2fs sets the FADVISE_VERITY_BIT on-disk inode flag on verity files.
+It can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be
+cleared.
+
+Like ext4, f2fs stores the verity metadata (Merkle tree and
+fsverity_descriptor) past the end of the file, starting at the first
+64K boundary beyond i_size. See explanation for ext4 above.
+Moreover, f2fs supports at most 4096 bytes of xattr entries per inode
+which wouldn't be enough for even a single Merkle tree block.
+
+Currently, f2fs verity only supports a Merkle tree block size of 4096.
+
+Implementation details
+======================
+
+Verifying data
+--------------
+
+fs-verity ensures that all reads of a verity file's data are verified,
+regardless of which syscall is used to do the read (e.g. mmap(),
+read(), pread()) and regardless of whether it's the first read or a
+later read (unless the later read can return cached data that was
+already verified). Below, we describe how filesystems implement this.
+
+Pagecache
+~~~~~~~~~
+
+For filesystems using Linux's pagecache, the ``->readpage()`` and
+``->readpages()`` methods must be modified to verify pages before they
+are marked Uptodate. Merely hooking ``->read_iter()`` would be
+insufficient, since ``->read_iter()`` is not used for memory maps.
+
+Therefore, fs/verity/ provides a function fsverity_verify_page() which
+verifies a page that has been read into the pagecache of a verity
+inode, but is still locked and not Uptodate, so it's not yet readable
+by userspace. As needed to do the verification,
+fsverity_verify_page() will call back into the filesystem to read
+Merkle tree pages via fsverity_operations::read_merkle_tree_page().
+
+fsverity_verify_page() returns false if verification failed; in this
+case, the filesystem must not set the page Uptodate. Following this,
+as per the usual Linux pagecache behavior, attempts by userspace to
+read() from the part of the file containing the page will fail with
+EIO, and accesses to the page within a memory map will raise SIGBUS.
+
+fsverity_verify_page() currently only supports the case where the
+Merkle tree block size is equal to PAGE_SIZE (often 4096 bytes).
+
+In principle, fsverity_verify_page() verifies the entire path in the
+Merkle tree from the data page to the root hash. However, for
+efficiency the filesystem may cache the hash pages. Therefore,
+fsverity_verify_page() only ascends the tree reading hash pages until
+an already-verified hash page is seen, as indicated by the PageChecked
+bit being set. It then verifies the path to that page.
+
+This optimization, which is also used by dm-verity, results in
+excellent sequential read performance. This is because usually (e.g.
+127 in 128 times for 4K blocks and SHA-256) the hash page from the
+bottom level of the tree will already be cached and checked from
+reading a previous data page. However, random reads perform worse.
+
+Block device based filesystems
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Block device based filesystems (e.g. ext4 and f2fs) in Linux also use
+the pagecache, so the above subsection applies too. However, they
+also usually read many pages from a file at once, grouped into a
+structure called a "bio". To make it easier for these types of
+filesystems to support fs-verity, fs/verity/ also provides a function
+fsverity_verify_bio() which verifies all pages in a bio.
+
+ext4 and f2fs also support encryption. If a verity file is also
+encrypted, the pages must be decrypted before being verified. To
+support this, these filesystems allocate a "post-read context" for
+each bio and store it in ``->bi_private``::
+
+ struct bio_post_read_ctx {
+ struct bio *bio;
+ struct work_struct work;
+ unsigned int cur_step;
+ unsigned int enabled_steps;
+ };
+
+``enabled_steps`` is a bitmask that specifies whether decryption,
+verity, or both is enabled. After the bio completes, for each needed
+postprocessing step the filesystem enqueues the bio_post_read_ctx on a
+workqueue, and then the workqueue work does the decryption or
+verification. Finally, pages where no decryption or verity error
+occurred are marked Uptodate, and the pages are unlocked.
+
+Files on ext4 and f2fs may contain holes. Normally, ``->readpages()``
+simply zeroes holes and sets the corresponding pages Uptodate; no bios
+are issued. To prevent this case from bypassing fs-verity, these
+filesystems use fsverity_verify_page() to verify hole pages.
+
+ext4 and f2fs disable direct I/O on verity files, since otherwise
+direct I/O would bypass fs-verity. (They also do the same for
+encrypted files.)
+
+Userspace utility
+=================
+
+This document focuses on the kernel, but a userspace utility for
+fs-verity can be found at:
+
+ https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git
+
+See the README.md file in the fsverity-utils source tree for details,
+including examples of setting up fs-verity protected files.
+
+Tests
+=====
+
+To test fs-verity, use xfstests. For example, using `kvm-xfstests
+<https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md>`_::
+
+ kvm-xfstests -c ext4,f2fs -g verity
+
+FAQ
+===
+
+This section answers frequently asked questions about fs-verity that
+weren't already directly answered in other parts of this document.
+
+:Q: Why isn't fs-verity part of IMA?
+:A: fs-verity and IMA (Integrity Measurement Architecture) have
+ different focuses. fs-verity is a filesystem-level mechanism for
+ hashing individual files using a Merkle tree. In contrast, IMA
+ defines a system-wide policy that specifies which files are
+ hashed and what to do with those hashes, such as log them,
+ authenticate them, or add them to a measurement list.
+
+ IMA is planned to support the fs-verity hashing mechanism as an
+ alternative to doing full file hashes, for people who want the
+ performance and security benefits of the Merkle tree based hash.
+ But it doesn't make sense to force all uses of fs-verity to be
+ through IMA. As a standalone filesystem feature, fs-verity
+ already meets many users' needs, and it's testable like other
+ filesystem features e.g. with xfstests.
+
+:Q: Isn't fs-verity useless because the attacker can just modify the
+ hashes in the Merkle tree, which is stored on-disk?
+:A: To verify the authenticity of an fs-verity file you must verify
+ the authenticity of the "file measurement", which is basically the
+ root hash of the Merkle tree. See `Use cases`_.
+
+:Q: Isn't fs-verity useless because the attacker can just replace a
+ verity file with a non-verity one?
+:A: See `Use cases`_. In the initial use case, it's really trusted
+ userspace code that authenticates the files; fs-verity is just a
+ tool to do this job efficiently and securely. The trusted
+ userspace code will consider non-verity files to be inauthentic.
+
+:Q: Why does the Merkle tree need to be stored on-disk? Couldn't you
+ store just the root hash?
+:A: If the Merkle tree wasn't stored on-disk, then you'd have to
+ compute the entire tree when the file is first accessed, even if
+ just one byte is being read. This is a fundamental consequence of
+ how Merkle tree hashing works. To verify a leaf node, you need to
+ verify the whole path to the root hash, including the root node
+ (the thing which the root hash is a hash of). But if the root
+ node isn't stored on-disk, you have to compute it by hashing its
+ children, and so on until you've actually hashed the entire file.
+
+ That defeats most of the point of doing a Merkle tree-based hash,
+ since if you have to hash the whole file ahead of time anyway,
+ then you could simply do sha256(file) instead. That would be much
+ simpler, and a bit faster too.
+
+ It's true that an in-memory Merkle tree could still provide the
+ advantage of verification on every read rather than just on the
+ first read. However, it would be inefficient because every time a
+ hash page gets evicted (you can't pin the entire Merkle tree into
+ memory, since it may be very large), in order to restore it you
+ again need to hash everything below it in the tree. This again
+ defeats most of the point of doing a Merkle tree-based hash, since
+ a single block read could trigger re-hashing gigabytes of data.
+
+:Q: But couldn't you store just the leaf nodes and compute the rest?
+:A: See previous answer; this really just moves up one level, since
+ one could alternatively interpret the data blocks as being the
+ leaf nodes of the Merkle tree. It's true that the tree can be
+ computed much faster if the leaf level is stored rather than just
+ the data, but that's only because each level is less than 1% the
+ size of the level below (assuming the recommended settings of
+ SHA-256 and 4K blocks). For the exact same reason, by storing
+ "just the leaf nodes" you'd already be storing over 99% of the
+ tree, so you might as well simply store the whole tree.
+
+:Q: Can the Merkle tree be built ahead of time, e.g. distributed as
+ part of a package that is installed to many computers?
+:A: This isn't currently supported. It was part of the original
+ design, but was removed to simplify the kernel UAPI and because it
+ wasn't a critical use case. Files are usually installed once and
+ used many times, and cryptographic hashing is somewhat fast on
+ most modern processors.
+
+:Q: Why doesn't fs-verity support writes?
+:A: Write support would be very difficult and would require a
+ completely different design, so it's well outside the scope of
+ fs-verity. Write support would require:
+
+ - A way to maintain consistency between the data and hashes,
+ including all levels of hashes, since corruption after a crash
+ (especially of potentially the entire file!) is unacceptable.
+ The main options for solving this are data journalling,
+ copy-on-write, and log-structured volume. But it's very hard to
+ retrofit existing filesystems with new consistency mechanisms.
+ Data journalling is available on ext4, but is very slow.
+
+ - Rebuilding the Merkle tree after every write, which would be
+ extremely inefficient. Alternatively, a different authenticated
+ dictionary structure such as an "authenticated skiplist" could
+ be used. However, this would be far more complex.
+
+ Compare it to dm-verity vs. dm-integrity. dm-verity is very
+ simple: the kernel just verifies read-only data against a
+ read-only Merkle tree. In contrast, dm-integrity supports writes
+ but is slow, is much more complex, and doesn't actually support
+ full-device authentication since it authenticates each sector
+ independently, i.e. there is no "root hash". It doesn't really
+ make sense for the same device-mapper target to support these two
+ very different cases; the same applies to fs-verity.
+
+:Q: Since verity files are immutable, why isn't the immutable bit set?
+:A: The existing "immutable" bit (FS_IMMUTABLE_FL) already has a
+ specific set of semantics which not only make the file contents
+ read-only, but also prevent the file from being deleted, renamed,
+ linked to, or having its owner or mode changed. These extra
+ properties are unwanted for fs-verity, so reusing the immutable
+ bit isn't appropriate.
+
+:Q: Why does the API use ioctls instead of setxattr() and getxattr()?
+:A: Abusing the xattr interface for basically arbitrary syscalls is
+ heavily frowned upon by most of the Linux filesystem developers.
+ An xattr should really just be an xattr on-disk, not an API to
+ e.g. magically trigger construction of a Merkle tree.
+
+:Q: Does fs-verity support remote filesystems?
+:A: Only ext4 and f2fs support is implemented currently, but in
+ principle any filesystem that can store per-file verity metadata
+ can support fs-verity, regardless of whether it's local or remote.
+ Some filesystems may have fewer options of where to store the
+ verity metadata; one possibility is to store it past the end of
+ the file and "hide" it from userspace by manipulating i_size. The
+ data verification functions provided by ``fs/verity/`` also assume
+ that the filesystem uses the Linux pagecache, but both local and
+ remote filesystems normally do so.
+
+:Q: Why is anything filesystem-specific at all? Shouldn't fs-verity
+ be implemented entirely at the VFS level?
+:A: There are many reasons why this is not possible or would be very
+ difficult, including the following:
+
+ - To prevent bypassing verification, pages must not be marked
+ Uptodate until they've been verified. Currently, each
+ filesystem is responsible for marking pages Uptodate via
+ ``->readpages()``. Therefore, currently it's not possible for
+ the VFS to do the verification on its own. Changing this would
+ require significant changes to the VFS and all filesystems.
+
+ - It would require defining a filesystem-independent way to store
+ the verity metadata. Extended attributes don't work for this
+ because (a) the Merkle tree may be gigabytes, but many
+ filesystems assume that all xattrs fit into a single 4K
+ filesystem block, and (b) ext4 and f2fs encryption doesn't
+ encrypt xattrs, yet the Merkle tree *must* be encrypted when the
+ file contents are, because it stores hashes of the plaintext
+ file contents.
+
+ So the verity metadata would have to be stored in an actual
+ file. Using a separate file would be very ugly, since the
+ metadata is fundamentally part of the file to be protected, and
+ it could cause problems where users could delete the real file
+ but not the metadata file or vice versa. On the other hand,
+ having it be in the same file would break applications unless
+ filesystems' notion of i_size were divorced from the VFS's,
+ which would be complex and require changes to all filesystems.
+
+ - It's desirable that FS_IOC_ENABLE_VERITY uses the filesystem's
+ transaction mechanism so that either the file ends up with
+ verity enabled, or no changes were made. Allowing intermediate
+ states to occur after a crash may cause problems.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 1131c34d77f6f1..416c7f0e123af7 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -31,6 +31,7 @@ filesystem implementations.

journalling
fscrypt
+ fsverity

Filesystem-specific documentation
=================================
--
2.22.0.410.gd8fdbe21b5-goog

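The "Merkle tree" section of the documentation patch above describes the tree level by level and gives the 1/127 space overhead for SHA-256 with 4K blocks. The following is a minimal userspace sketch of that arithmetic, not code from the patchset. The helper names `merkle_tree_blocks()` and `padded_salt_size()` are hypothetical, and the salt helper assumes that "closest multiple" means rounding up, since padding can only lengthen the salt.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of Merkle tree blocks needed for a file,
 * following the level-by-level description in the documentation.
 * A file that is empty or fits in one block needs no tree blocks.
 */
static uint64_t merkle_tree_blocks(uint64_t file_size, uint32_t block_size,
				   uint32_t digest_size)
{
	uint64_t hashes_per_block, blocks, total = 0;

	/* At least two hashes must fit in a block, or the tree never shrinks. */
	assert(digest_size > 0 && block_size >= 2 * digest_size);
	hashes_per_block = block_size / digest_size;

	/* Number of data blocks; a partial last block is zero-padded. */
	blocks = file_size / block_size + (file_size % block_size != 0);

	/* Each pass hashes the level below into a new, smaller level. */
	while (blocks > 1) {
		blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
		total += blocks;
	}
	return total;
}

/*
 * Hypothetical helper: salt size after zero-padding to a multiple of the
 * hash compression function's input size (64 for SHA-256, 128 for
 * SHA-512). Assumes the padding rounds up.
 */
static uint32_t padded_salt_size(uint32_t salt_size, uint32_t hash_input_size)
{
	assert(hash_input_size > 0);
	return (salt_size + hash_input_size - 1) / hash_input_size *
	       hash_input_size;
}
```

For a 1 GiB file with 4096-byte blocks and 32-byte digests, the levels hold 2048, 16, and 1 blocks, so `merkle_tree_blocks(1 << 30, 4096, 32)` gives 2065 blocks against 262144 data blocks, which is close to the 1/127 ratio quoted in the documentation.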
2019-06-20 20:56:04

by Eric Biggers

Subject: [PATCH v5 03/16] fs-verity: add UAPI header

From: Eric Biggers <[email protected]>

Add the UAPI header for fs-verity, including two ioctls:

- FS_IOC_ENABLE_VERITY
- FS_IOC_MEASURE_VERITY

These ioctls are documented in the "User API" section of
Documentation/filesystems/fsverity.rst.

Examples of using these ioctls can be found in fsverity-utils
(https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git).

I've also written xfstests that test these ioctls
(https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/xfstests-dev.git/log/?h=fsverity).

Reviewed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/ioctl/ioctl-number.txt | 1 +
include/uapi/linux/fsverity.h | 39 ++++++++++++++++++++++++++++
2 files changed, 40 insertions(+)
create mode 100644 include/uapi/linux/fsverity.h

diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index c9558146ac5896..21767c81e86d58 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -225,6 +225,7 @@ Code Seq#(hex) Include File Comments
'f' 00-0F fs/ext4/ext4.h conflict!
'f' 00-0F linux/fs.h conflict!
'f' 00-0F fs/ocfs2/ocfs2_fs.h conflict!
+'f' 81-8F linux/fsverity.h
'g' 00-0F linux/usb/gadgetfs.h
'g' 20-2F linux/usb/g_printer.h
'h' 00-7F conflict! Charon filesystem
diff --git a/include/uapi/linux/fsverity.h b/include/uapi/linux/fsverity.h
new file mode 100644
index 00000000000000..57d1d7fc0c345a
--- /dev/null
+++ b/include/uapi/linux/fsverity.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * fs-verity user API
+ *
+ * These ioctls can be used on filesystems that support fs-verity. See the
+ * "User API" section of Documentation/filesystems/fsverity.rst.
+ *
+ * Copyright 2019 Google LLC
+ */
+#ifndef _UAPI_LINUX_FSVERITY_H
+#define _UAPI_LINUX_FSVERITY_H
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+#define FS_VERITY_HASH_ALG_SHA256 1
+
+struct fsverity_enable_arg {
+ __u32 version;
+ __u32 hash_algorithm;
+ __u32 block_size;
+ __u32 salt_size;
+ __u64 salt_ptr;
+ __u32 sig_size;
+ __u32 __reserved1;
+ __u64 sig_ptr;
+ __u64 __reserved2[11];
+};
+
+struct fsverity_digest {
+ __u16 digest_algorithm;
+ __u16 digest_size; /* input/output */
+ __u8 digest[];
+};
+
+#define FS_IOC_ENABLE_VERITY _IOW('f', 133, struct fsverity_enable_arg)
+#define FS_IOC_MEASURE_VERITY _IOWR('f', 134, struct fsverity_digest)
+
+#endif /* _UAPI_LINUX_FSVERITY_H */
--
2.22.0.410.gd8fdbe21b5-goog

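As a rough illustration of how userspace might call the two ioctls defined in the header above, here is a minimal sketch. The structure and ioctl definitions are copied from this patch so that the example does not depend on installed kernel headers; the helper names `enable_verity_sha256()` and `measure_verity()` are hypothetical and are not part of the patchset or of fsverity-utils.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Definitions copied from include/uapi/linux/fsverity.h in this patch. */
#define FS_VERITY_HASH_ALG_SHA256 1

struct fsverity_enable_arg {
	__u32 version;
	__u32 hash_algorithm;
	__u32 block_size;
	__u32 salt_size;
	__u64 salt_ptr;
	__u32 sig_size;
	__u32 __reserved1;
	__u64 sig_ptr;
	__u64 __reserved2[11];
};

struct fsverity_digest {
	__u16 digest_algorithm;
	__u16 digest_size; /* input/output */
	__u8 digest[];
};

#define FS_IOC_ENABLE_VERITY _IOW('f', 133, struct fsverity_enable_arg)
#define FS_IOC_MEASURE_VERITY _IOWR('f', 134, struct fsverity_digest)

_Static_assert(sizeof(struct fsverity_enable_arg) == 128,
	       "unexpected fsverity_enable_arg layout");

/*
 * Hypothetical helper: enable verity with SHA-256, no salt, no signature.
 * fd must be an O_RDONLY descriptor. Returns 0, or -1 with errno set.
 */
static int enable_verity_sha256(int fd, __u32 block_size)
{
	struct fsverity_enable_arg arg;

	memset(&arg, 0, sizeof(arg)); /* reserved fields must be zeroed */
	arg.version = 1;
	arg.hash_algorithm = FS_VERITY_HASH_ALG_SHA256;
	arg.block_size = block_size;
	return ioctl(fd, FS_IOC_ENABLE_VERITY, &arg);
}

/*
 * Hypothetical helper: copy the file measurement into out[0..out_len).
 * Returns the digest size in bytes, or -1 with errno set.
 */
static int measure_verity(int fd, unsigned char *out, size_t out_len,
			  __u16 *alg)
{
	struct fsverity_digest *d;
	int ret = -1, saved_errno;

	assert(out != NULL && alg != NULL);
	if (out_len > UINT16_MAX)
		out_len = UINT16_MAX; /* digest_size is only 16 bits wide */
	d = calloc(1, sizeof(*d) + out_len);
	if (!d)
		return -1;
	d->digest_size = (__u16)out_len; /* input: bytes available */
	if (ioctl(fd, FS_IOC_MEASURE_VERITY, d) == 0) {
		memcpy(out, d->digest, d->digest_size);
		*alg = d->digest_algorithm;
		ret = d->digest_size; /* output: bytes written */
	}
	saved_errno = errno;
	free(d);
	errno = saved_errno; /* keep the ioctl's error for the caller */
	return ret;
}
```

A caller would open the file with O_RDONLY, pass the system page size (for example from `sysconf(_SC_PAGESIZE)`) as `block_size`, and supply a buffer of at least 32 bytes to `measure_verity()` for a SHA-256 measurement. The error codes each ioctl can return are listed in the "User API" section of Documentation/filesystems/fsverity.rst.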
2019-06-20 20:56:12

by Eric Biggers

Subject: [PATCH v5 06/16] fs-verity: add inode and superblock fields

From: Eric Biggers <[email protected]>

Analogous to fs/crypto/, add fields to the VFS inode and superblock for
use by the fs/verity/ support layer:

- ->s_vop: points to the fsverity_operations if the filesystem supports
fs-verity, otherwise is NULL.

- ->i_verity_info: points to cached fs-verity information for the inode
after someone opens it, otherwise is NULL.

- S_VERITY: bit in ->i_flags that identifies verity inodes, even when
they haven't been opened yet and thus still have NULL ->i_verity_info.

Reviewed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
include/linux/fs.h | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index f7fdfe93e25d3e..a80a192cdcf285 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -64,6 +64,8 @@ struct workqueue_struct;
struct iov_iter;
struct fscrypt_info;
struct fscrypt_operations;
+struct fsverity_info;
+struct fsverity_operations;
struct fs_context;
struct fs_parameter_description;

@@ -723,6 +725,10 @@ struct inode {
struct fscrypt_info *i_crypt_info;
#endif

+#ifdef CONFIG_FS_VERITY
+ struct fsverity_info *i_verity_info;
+#endif
+
void *i_private; /* fs or device private pointer */
} __randomize_layout;

@@ -1429,6 +1435,9 @@ struct super_block {
const struct xattr_handler **s_xattr;
#ifdef CONFIG_FS_ENCRYPTION
const struct fscrypt_operations *s_cop;
+#endif
+#ifdef CONFIG_FS_VERITY
+ const struct fsverity_operations *s_vop;
#endif
struct hlist_bl_head s_roots; /* alternate root dentries for NFS */
struct list_head s_mounts; /* list of mounts; _not_ for fs use */
@@ -1964,6 +1973,7 @@ struct super_operations {
#endif
#define S_ENCRYPTED 16384 /* Encrypted file (using fs/crypto/) */
#define S_CASEFOLD 32768 /* Casefolded file */
+#define S_VERITY 65536 /* Verity file (using fs/verity/) */

/*
* Note that nosuid etc flags are inode-specific: setting some file-system
@@ -2005,6 +2015,7 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags
#define IS_DAX(inode) ((inode)->i_flags & S_DAX)
#define IS_ENCRYPTED(inode) ((inode)->i_flags & S_ENCRYPTED)
#define IS_CASEFOLDED(inode) ((inode)->i_flags & S_CASEFOLD)
+#define IS_VERITY(inode) ((inode)->i_flags & S_VERITY)

#define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \
(inode)->i_rdev == WHITEOUT_DEV)
--
2.22.0.410.gd8fdbe21b5-goog
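
As an illustration of how the S_VERITY flag and ->i_verity_info relate,
here is a small userspace toy model. It is not kernel code: toy_inode,
toy_verity_info, and the two helpers are hypothetical names, and only
the S_VERITY value is taken from the patch. It shows the state the
commit message describes, where an inode is already identified as a
verity inode but has no cached info because nobody has opened it yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define S_VERITY 65536	/* same bit value as in the patch (1 << 16) */

/* Hypothetical stand-ins for struct fsverity_info and struct inode. */
struct toy_verity_info {
	int placeholder;
};

struct toy_inode {
	unsigned int i_flags;
	struct toy_verity_info *i_verity_info;	/* NULL until first open */
};

/* Mirrors IS_VERITY(): is this inode a verity inode at all? */
bool toy_is_verity(const struct toy_inode *inode)
{
	assert(inode != NULL);
	return (inode->i_flags & S_VERITY) != 0;
}

/* Has the cached verity info been set up (i.e. after an open)? */
bool toy_verity_info_cached(const struct toy_inode *inode)
{
	assert(inode != NULL);
	return inode->i_verity_info != NULL;
}
```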

2019-06-20 20:56:12

by Eric Biggers

Subject: [PATCH v5 09/16] fs-verity: add data verification hooks for ->readpages()

From: Eric Biggers <[email protected]>

Add functions that verify data pages that have been read from a
fs-verity file, against that file's Merkle tree. These will be called
from filesystems' ->readpage() and ->readpages() methods.

Since data verification can block, a workqueue is provided for these
methods to enqueue verification work from their bio completion callback.

See the "Verifying data" section of
Documentation/filesystems/fsverity.rst for more information.

Reviewed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
fs/verity/Makefile | 3 +-
fs/verity/fsverity_private.h | 5 +
fs/verity/init.c | 8 +
fs/verity/open.c | 6 +
fs/verity/verify.c | 275 +++++++++++++++++++++++++++++++++++
include/linux/fsverity.h | 56 +++++++
6 files changed, 352 insertions(+), 1 deletion(-)
create mode 100644 fs/verity/verify.c

diff --git a/fs/verity/Makefile b/fs/verity/Makefile
index e6a8951c493a5e..7fa628cd5eba24 100644
--- a/fs/verity/Makefile
+++ b/fs/verity/Makefile
@@ -2,4 +2,5 @@

obj-$(CONFIG_FS_VERITY) += hash_algs.o \
init.o \
- open.o
+ open.o \
+ verify.o
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index c79746ff335e14..eaa2b3b93bbf6b 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -134,5 +134,10 @@ void fsverity_set_info(struct inode *inode, struct fsverity_info *vi);
void fsverity_free_info(struct fsverity_info *vi);

int __init fsverity_init_info_cache(void);
+void __init fsverity_exit_info_cache(void);
+
+/* verify.c */
+
+int __init fsverity_init_workqueue(void);

#endif /* _FSVERITY_PRIVATE_H */
diff --git a/fs/verity/init.c b/fs/verity/init.c
index fff1fd6343357d..b593805aafcc89 100644
--- a/fs/verity/init.c
+++ b/fs/verity/init.c
@@ -41,7 +41,15 @@ static int __init fsverity_init(void)
if (err)
return err;

+ err = fsverity_init_workqueue();
+ if (err)
+ goto err_exit_info_cache;
+
pr_debug("Initialized fs-verity\n");
return 0;
+
+err_exit_info_cache:
+ fsverity_exit_info_cache();
+ return err;
}
late_initcall(fsverity_init)
diff --git a/fs/verity/open.c b/fs/verity/open.c
index 21ae0ef254a695..7a2cd000dc4f06 100644
--- a/fs/verity/open.c
+++ b/fs/verity/open.c
@@ -338,3 +338,9 @@ int __init fsverity_init_info_cache(void)
return -ENOMEM;
return 0;
}
+
+void __init fsverity_exit_info_cache(void)
+{
+ kmem_cache_destroy(fsverity_info_cachep);
+ fsverity_info_cachep = NULL;
+}
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
new file mode 100644
index 00000000000000..2a0f9e2ebc9f16
--- /dev/null
+++ b/fs/verity/verify.c
@@ -0,0 +1,275 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fs/verity/verify.c: data verification functions, i.e. hooks for ->readpages()
+ *
+ * Copyright 2019 Google LLC
+ */
+
+#include "fsverity_private.h"
+
+#include <crypto/hash.h>
+#include <linux/bio.h>
+#include <linux/ratelimit.h>
+
+static struct workqueue_struct *fsverity_read_workqueue;
+
+/**
+ * hash_at_level() - compute the location of the block's hash at the given level
+ *
+ * @params: (in) the Merkle tree parameters
+ * @dindex: (in) the index of the data block being verified
+ * @level: (in) the level of hash we want (0 is leaf level)
+ * @hindex: (out) the index of the hash block containing the wanted hash
+ * @hoffset: (out) the byte offset to the wanted hash within the hash block
+ */
+static void hash_at_level(const struct merkle_tree_params *params,
+ pgoff_t dindex, unsigned int level, pgoff_t *hindex,
+ unsigned int *hoffset)
+{
+ pgoff_t position;
+
+ /* Offset of the hash within the level's region, in hashes */
+ position = dindex >> (level * params->log_arity);
+
+ /* Index of the hash block in the tree overall */
+ *hindex = params->level_start[level] + (position >> params->log_arity);
+
+ /* Offset of the wanted hash (in bytes) within the hash block */
+ *hoffset = (position & ((1 << params->log_arity) - 1)) <<
+ (params->log_blocksize - params->log_arity);
+}
+
+/* Extract a hash from a hash page */
+static void extract_hash(struct page *hpage, unsigned int hoffset,
+ unsigned int hsize, u8 *out)
+{
+ void *virt = kmap_atomic(hpage);
+
+ memcpy(out, virt + hoffset, hsize);
+ kunmap_atomic(virt);
+}
+
+static inline int cmp_hashes(const struct fsverity_info *vi,
+ const u8 *want_hash, const u8 *real_hash,
+ pgoff_t index, int level)
+{
+ const unsigned int hsize = vi->tree_params.digest_size;
+
+ if (memcmp(want_hash, real_hash, hsize) == 0)
+ return 0;
+
+ fsverity_err(vi->inode,
+ "FILE CORRUPTED! index=%lu, level=%d, want_hash=%s:%*phN, real_hash=%s:%*phN",
+ index, level,
+ vi->tree_params.hash_alg->name, hsize, want_hash,
+ vi->tree_params.hash_alg->name, hsize, real_hash);
+ return -EBADMSG;
+}
+
+/*
+ * Verify a single data page against the file's Merkle tree.
+ *
+ * In principle, we need to verify the entire path to the root node. However,
+ * for efficiency the filesystem may cache the hash pages. Therefore we need
+ * only ascend the tree until an already-verified page is seen, as indicated by
+ * the PageChecked bit being set; then verify the path to that page.
+ *
+ * This code currently only supports the case where the verity block size is
+ * equal to PAGE_SIZE. Doing otherwise would be possible but tricky, since we
+ * wouldn't be able to use the PageChecked bit.
+ *
+ * Note that multiple processes may race to verify a hash page and mark it
+ * Checked, but it doesn't matter; the result will be the same either way.
+ *
+ * Return: true if the page is valid, else false.
+ */
+static bool verify_page(struct inode *inode, const struct fsverity_info *vi,
+ struct ahash_request *req, struct page *data_page)
+{
+ const struct merkle_tree_params *params = &vi->tree_params;
+ const unsigned int hsize = params->digest_size;
+ const pgoff_t index = data_page->index;
+ int level;
+ u8 _want_hash[FS_VERITY_MAX_DIGEST_SIZE];
+ const u8 *want_hash;
+ u8 real_hash[FS_VERITY_MAX_DIGEST_SIZE];
+ struct page *hpages[FS_VERITY_MAX_LEVELS];
+ unsigned int hoffsets[FS_VERITY_MAX_LEVELS];
+ int err;
+
+ if (WARN_ON_ONCE(!PageLocked(data_page) || PageUptodate(data_page)))
+ return false;
+
+ pr_debug_ratelimited("Verifying data page %lu...\n", index);
+
+ /*
+ * Starting at the leaf level, ascend the tree saving hash pages along
+ * the way until we find a verified hash page, indicated by PageChecked;
+ * or until we reach the root.
+ */
+ for (level = 0; level < params->num_levels; level++) {
+ pgoff_t hindex;
+ unsigned int hoffset;
+ struct page *hpage;
+
+ hash_at_level(params, index, level, &hindex, &hoffset);
+
+ pr_debug_ratelimited("Level %d: hindex=%lu, hoffset=%u\n",
+ level, hindex, hoffset);
+
+ hpage = inode->i_sb->s_vop->read_merkle_tree_page(inode,
+ hindex);
+ if (IS_ERR(hpage)) {
+ err = PTR_ERR(hpage);
+ fsverity_err(inode,
+ "Error %d reading Merkle tree page %lu",
+ err, hindex);
+ goto out;
+ }
+
+ if (PageChecked(hpage)) {
+ extract_hash(hpage, hoffset, hsize, _want_hash);
+ want_hash = _want_hash;
+ put_page(hpage);
+ pr_debug_ratelimited("Hash page already checked, want %s:%*phN\n",
+ params->hash_alg->name,
+ hsize, want_hash);
+ goto descend;
+ }
+ pr_debug_ratelimited("Hash page not yet checked\n");
+ hpages[level] = hpage;
+ hoffsets[level] = hoffset;
+ }
+
+ want_hash = vi->root_hash;
+ pr_debug("Want root hash: %s:%*phN\n",
+ params->hash_alg->name, hsize, want_hash);
+descend:
+ /* Descend the tree verifying hash pages */
+ for (; level > 0; level--) {
+ struct page *hpage = hpages[level - 1];
+ unsigned int hoffset = hoffsets[level - 1];
+
+ err = fsverity_hash_page(params, inode, req, hpage, real_hash);
+ if (err)
+ goto out;
+ err = cmp_hashes(vi, want_hash, real_hash, index, level - 1);
+ if (err)
+ goto out;
+ SetPageChecked(hpage);
+ extract_hash(hpage, hoffset, hsize, _want_hash);
+ want_hash = _want_hash;
+ put_page(hpage);
+ pr_debug("Verified hash page at level %d, now want %s:%*phN\n",
+ level - 1, params->hash_alg->name, hsize, want_hash);
+ }
+
+ /* Finally, verify the data page */
+ err = fsverity_hash_page(params, inode, req, data_page, real_hash);
+ if (err)
+ goto out;
+ err = cmp_hashes(vi, want_hash, real_hash, index, -1);
+out:
+ for (; level > 0; level--)
+ put_page(hpages[level - 1]);
+
+ return err == 0;
+}
+
+/**
+ * fsverity_verify_page - verify a data page
+ *
+ * Verify a page that has just been read from a verity file. The page must be a
+ * pagecache page that is still locked and not yet uptodate.
+ *
+ * Return: true if the page is valid, else false.
+ */
+bool fsverity_verify_page(struct page *page)
+{
+ struct inode *inode = page->mapping->host;
+ const struct fsverity_info *vi = inode->i_verity_info;
+ struct ahash_request *req;
+ bool valid;
+
+ req = ahash_request_alloc(vi->tree_params.hash_alg->tfm, GFP_NOFS);
+ if (unlikely(!req))
+ return false;
+
+ valid = verify_page(inode, vi, req, page);
+
+ ahash_request_free(req);
+
+ return valid;
+}
+EXPORT_SYMBOL_GPL(fsverity_verify_page);
+
+#ifdef CONFIG_BLOCK
+/**
+ * fsverity_verify_bio - verify a 'read' bio that has just completed
+ *
+ * Verify a set of pages that have just been read from a verity file. The pages
+ * must be pagecache pages that are still locked and not yet uptodate. Pages
+ * that fail verification are set to the Error state. Verification is skipped
+ * for pages already in the Error state, e.g. due to fscrypt decryption failure.
+ *
+ * This is a helper function for use by the ->readpages() method of filesystems
+ * that issue bios to read data directly into the page cache. Filesystems that
+ * populate the page cache without issuing bios (e.g. non block-based
+ * filesystems) must instead call fsverity_verify_page() directly on each page.
+ * All filesystems must also call fsverity_verify_page() on holes.
+ */
+void fsverity_verify_bio(struct bio *bio)
+{
+ struct inode *inode = bio_first_page_all(bio)->mapping->host;
+ const struct fsverity_info *vi = inode->i_verity_info;
+ struct ahash_request *req;
+ struct bio_vec *bv;
+ struct bvec_iter_all iter_all;
+
+ req = ahash_request_alloc(vi->tree_params.hash_alg->tfm, GFP_NOFS);
+ if (unlikely(!req)) {
+ bio_for_each_segment_all(bv, bio, iter_all)
+ SetPageError(bv->bv_page);
+ return;
+ }
+
+ bio_for_each_segment_all(bv, bio, iter_all) {
+ struct page *page = bv->bv_page;
+
+ if (!PageError(page) && !verify_page(inode, vi, req, page))
+ SetPageError(page);
+ }
+
+ ahash_request_free(req);
+}
+EXPORT_SYMBOL_GPL(fsverity_verify_bio);
+#endif /* CONFIG_BLOCK */
+
+/**
+ * fsverity_enqueue_verify_work - enqueue work on the fs-verity workqueue
+ *
+ * Enqueue verification work for asynchronous processing.
+ */
+void fsverity_enqueue_verify_work(struct work_struct *work)
+{
+ queue_work(fsverity_read_workqueue, work);
+}
+EXPORT_SYMBOL_GPL(fsverity_enqueue_verify_work);
+
+int __init fsverity_init_workqueue(void)
+{
+ /*
+ * Use an unbound workqueue to allow bios to be verified in parallel
+ * even when they happen to complete on the same CPU. This sacrifices
+ * locality, but it's worthwhile since hashing is CPU-intensive.
+ *
+ * Also use a high-priority workqueue to prioritize verification work,
+ * which blocks reads from completing, over regular application tasks.
+ */
+ fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
+ WQ_UNBOUND | WQ_HIGHPRI,
+ num_online_cpus());
+ if (!fsverity_read_workqueue)
+ return -ENOMEM;
+ return 0;
+}
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index cbcc358d073652..ecd47e748c7f64 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -33,6 +33,23 @@ struct fsverity_operations {
*/
int (*get_verity_descriptor)(struct inode *inode, void *buf,
size_t bufsize);
+
+ /**
+ * Read a Merkle tree page of the given inode.
+ *
+ * @inode: the inode
+ * @index: 0-based index of the page within the Merkle tree
+ *
+ * This can be called at any time on an open verity file, as well as
+ * between ->begin_enable_verity() and ->end_enable_verity(). It may be
+ * called by multiple processes concurrently, even with the same page.
+ *
+ * Note that this must retrieve a *page*, not necessarily a *block*.
+ *
+ * Return: the page on success, ERR_PTR() on failure
+ */
+ struct page *(*read_merkle_tree_page)(struct inode *inode,
+ pgoff_t index);
};

#ifdef CONFIG_FS_VERITY
@@ -49,6 +66,12 @@ extern int fsverity_file_open(struct inode *inode, struct file *filp);
extern int fsverity_prepare_setattr(struct dentry *dentry, struct iattr *attr);
extern void fsverity_cleanup_inode(struct inode *inode);

+/* verify.c */
+
+extern bool fsverity_verify_page(struct page *page);
+extern void fsverity_verify_bio(struct bio *bio);
+extern void fsverity_enqueue_verify_work(struct work_struct *work);
+
#else /* !CONFIG_FS_VERITY */

static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
@@ -73,6 +96,39 @@ static inline void fsverity_cleanup_inode(struct inode *inode)
{
}

+/* verify.c */
+
+static inline bool fsverity_verify_page(struct page *page)
+{
+ WARN_ON(1);
+ return false;
+}
+
+static inline void fsverity_verify_bio(struct bio *bio)
+{
+ WARN_ON(1);
+}
+
+static inline void fsverity_enqueue_verify_work(struct work_struct *work)
+{
+ WARN_ON(1);
+}
+
#endif /* !CONFIG_FS_VERITY */

+/**
+ * fsverity_active() - do reads from the inode need to go through fs-verity?
+ *
+ * This checks whether ->i_verity_info has been set.
+ *
+ * Filesystems call this from ->readpages() to check whether the pages need to
+ * be verified or not. Don't use IS_VERITY() for this purpose; it's subject to
+ * a race condition where the file is being read concurrently with
+ * FS_IOC_ENABLE_VERITY completing. (S_VERITY is set before ->i_verity_info.)
+ */
+static inline bool fsverity_active(const struct inode *inode)
+{
+ return fsverity_get_info(inode) != NULL;
+}
+
#endif /* _LINUX_FSVERITY_H */
--
2.22.0.410.gd8fdbe21b5-goog
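
To make the index arithmetic in hash_at_level() easier to follow, here
is a userspace sketch of the same computation. It is not part of the
patch: toy_tree_params and toy_hash_at_level() are hypothetical names,
and the level_start[] values used in the worked example below are
assumed for illustration (a two-level tree whose single root-level
block is stored first, followed by the leaf-level blocks).

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_LEVELS 8

/* Hypothetical subset of struct merkle_tree_params. */
struct toy_tree_params {
	unsigned int log_blocksize;	/* log2(block size in bytes) */
	unsigned int log_arity;		/* log2(hashes per tree block) */
	unsigned int num_levels;
	uint64_t level_start[TOY_MAX_LEVELS];
};

/* Same arithmetic as hash_at_level() in the patch. */
void toy_hash_at_level(const struct toy_tree_params *params, uint64_t dindex,
		       unsigned int level, uint64_t *hindex,
		       unsigned int *hoffset)
{
	uint64_t position;

	assert(level < params->num_levels);

	/* Offset of the hash within the level's region, in hashes */
	position = dindex >> (level * params->log_arity);

	/* Index of the hash block in the tree overall */
	*hindex = params->level_start[level] +
		  (position >> params->log_arity);

	/* Byte offset of the wanted hash within that hash block */
	*hoffset = (unsigned int)(position & ((1u << params->log_arity) - 1))
		   << (params->log_blocksize - params->log_arity);
}
```

With 4K blocks and SHA-256 (log_blocksize = 12, log_arity = 7) and the
assumed level_start = { 1, 0 }, data block 300 is hash number 44 in the
third leaf-level block (300 = 2 * 128 + 44), so level 0 gives hindex 3
and hoffset 44 * 32 = 1408. At level 1 the position is 300 >> 7 = 2,
giving hindex 0 and hoffset 64.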

2019-06-20 20:56:16

by Eric Biggers

Subject: [PATCH v5 05/16] fs-verity: add Kconfig and the helper functions for hashing

From: Eric Biggers <[email protected]>

Add the beginnings of the fs/verity/ support layer, including the
Kconfig option and various helper functions for hashing. To start, only
SHA-256 is supported, but other hash algorithms can easily be added.

Reviewed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
fs/Kconfig | 2 +
fs/Makefile | 1 +
fs/verity/Kconfig | 38 +++++
fs/verity/Makefile | 4 +
fs/verity/fsverity_private.h | 88 +++++++++++
fs/verity/hash_algs.c | 274 +++++++++++++++++++++++++++++++++++
fs/verity/init.c | 41 ++++++
7 files changed, 448 insertions(+)
create mode 100644 fs/verity/Kconfig
create mode 100644 fs/verity/Makefile
create mode 100644 fs/verity/fsverity_private.h
create mode 100644 fs/verity/hash_algs.c
create mode 100644 fs/verity/init.c

diff --git a/fs/Kconfig b/fs/Kconfig
index f1046cf6ad85e0..4b66dafbdc7b1c 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -113,6 +113,8 @@ config MANDATORY_FILE_LOCKING

source "fs/crypto/Kconfig"

+source "fs/verity/Kconfig"
+
source "fs/notify/Kconfig"

source "fs/quota/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index c9aea23aba560c..fe7f2c07f482e1 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -34,6 +34,7 @@ obj-$(CONFIG_AIO) += aio.o
obj-$(CONFIG_IO_URING) += io_uring.o
obj-$(CONFIG_FS_DAX) += dax.o
obj-$(CONFIG_FS_ENCRYPTION) += crypto/
+obj-$(CONFIG_FS_VERITY) += verity/
obj-$(CONFIG_FILE_LOCKING) += locks.o
obj-$(CONFIG_COMPAT) += compat.o compat_ioctl.o
obj-$(CONFIG_BINFMT_AOUT) += binfmt_aout.o
diff --git a/fs/verity/Kconfig b/fs/verity/Kconfig
new file mode 100644
index 00000000000000..c2bca0b01ecfa9
--- /dev/null
+++ b/fs/verity/Kconfig
@@ -0,0 +1,38 @@
+# SPDX-License-Identifier: GPL-2.0
+
+config FS_VERITY
+ bool "FS Verity (read-only file-based authenticity protection)"
+ select CRYPTO
+ # SHA-256 is selected as it's intended to be the default hash algorithm.
+ # To avoid bloat, other wanted algorithms must be selected explicitly.
+ select CRYPTO_SHA256
+ help
+ This option enables fs-verity. fs-verity is the dm-verity
+ mechanism implemented at the file level. On supported
+ filesystems (currently EXT4 and F2FS), userspace can use an
+ ioctl to enable verity for a file, which causes the filesystem
+ to build a Merkle tree for the file. The filesystem will then
+ transparently verify any data read from the file against the
+ Merkle tree. The file is also made read-only.
+
+ This serves as an integrity check, but the availability of the
+ Merkle tree root hash also allows efficiently supporting
+ various use cases where normally the whole file would need to
+ be hashed at once, such as: (a) auditing (logging the file's
+ hash), or (b) authenticity verification (comparing the hash
+ against a known good value, e.g. from a digital signature).
+
+ fs-verity is especially useful on large files where not all
+ the contents may actually be needed. Also, fs-verity verifies
+ data each time it is paged back in, which provides better
+ protection against malicious disks vs. an ahead-of-time hash.
+
+ If unsure, say N.
+
+config FS_VERITY_DEBUG
+ bool "FS Verity debugging"
+ depends on FS_VERITY
+ help
+ Enable debugging messages related to fs-verity by default.
+
+ Say N unless you are an fs-verity developer.
diff --git a/fs/verity/Makefile b/fs/verity/Makefile
new file mode 100644
index 00000000000000..398f3f85fa184b
--- /dev/null
+++ b/fs/verity/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_FS_VERITY) += hash_algs.o \
+ init.o
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
new file mode 100644
index 00000000000000..9697aaebb5dc1f
--- /dev/null
+++ b/fs/verity/fsverity_private.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * fs-verity: read-only file-based authenticity protection
+ *
+ * Copyright 2019 Google LLC
+ */
+
+#ifndef _FSVERITY_PRIVATE_H
+#define _FSVERITY_PRIVATE_H
+
+#ifdef CONFIG_FS_VERITY_DEBUG
+#define DEBUG
+#endif
+
+#define pr_fmt(fmt) "fs-verity: " fmt
+
+#include <crypto/sha.h>
+#include <linux/fs.h>
+#include <uapi/linux/fsverity.h>
+
+struct ahash_request;
+
+/*
+ * Implementation limit: maximum depth of the Merkle tree. For now 8 is plenty;
+ * it's enough for over U64_MAX bytes of data using SHA-256 and 4K blocks.
+ */
+#define FS_VERITY_MAX_LEVELS 8
+
+/*
+ * Largest digest size among all hash algorithms supported by fs-verity.
+ * Currently assumed to be <= size of fsverity_descriptor::root_hash.
+ */
+#define FS_VERITY_MAX_DIGEST_SIZE SHA256_DIGEST_SIZE
+
+/* A hash algorithm supported by fs-verity */
+struct fsverity_hash_alg {
+ struct crypto_ahash *tfm; /* hash tfm, allocated on demand */
+ const char *name; /* crypto API name, e.g. sha256 */
+ unsigned int digest_size; /* digest size in bytes, e.g. 32 for SHA-256 */
+ unsigned int block_size; /* block size in bytes, e.g. 64 for SHA-256 */
+};
+
+/* Merkle tree parameters: hash algorithm, initial hash state, and topology */
+struct merkle_tree_params {
+ const struct fsverity_hash_alg *hash_alg; /* the hash algorithm */
+ const u8 *hashstate; /* initial hash state or NULL */
+ unsigned int digest_size; /* same as hash_alg->digest_size */
+ unsigned int block_size; /* size of data and tree blocks */
+ unsigned int hashes_per_block; /* number of hashes per tree block */
+ unsigned int log_blocksize; /* log2(block_size) */
+ unsigned int log_arity; /* log2(hashes_per_block) */
+ unsigned int num_levels; /* number of levels in Merkle tree */
+ u64 tree_size; /* Merkle tree size in bytes */
+
+ /*
+ * Starting block index for each tree level, ordered from leaf level (0)
+ * to root level ('num_levels - 1')
+ */
+ u64 level_start[FS_VERITY_MAX_LEVELS];
+};
+
+/* hash_algs.c */
+
+extern struct fsverity_hash_alg fsverity_hash_algs[];
+
+const struct fsverity_hash_alg *fsverity_get_hash_alg(const struct inode *inode,
+ unsigned int num);
+const u8 *fsverity_prepare_hash_state(const struct fsverity_hash_alg *alg,
+ const u8 *salt, size_t salt_size);
+int fsverity_hash_page(const struct merkle_tree_params *params,
+ const struct inode *inode,
+ struct ahash_request *req, struct page *page, u8 *out);
+int fsverity_hash_buffer(const struct fsverity_hash_alg *alg,
+ const void *data, size_t size, u8 *out);
+void __init fsverity_check_hash_algs(void);
+
+/* init.c */
+
+extern void __printf(3, 4) __cold
+fsverity_msg(const struct inode *inode, const char *level,
+ const char *fmt, ...);
+
+#define fsverity_warn(inode, fmt, ...) \
+ fsverity_msg((inode), KERN_WARNING, fmt, ##__VA_ARGS__)
+#define fsverity_err(inode, fmt, ...) \
+ fsverity_msg((inode), KERN_ERR, fmt, ##__VA_ARGS__)
+
+#endif /* _FSVERITY_PRIVATE_H */
diff --git a/fs/verity/hash_algs.c b/fs/verity/hash_algs.c
new file mode 100644
index 00000000000000..46df17094fc252
--- /dev/null
+++ b/fs/verity/hash_algs.c
@@ -0,0 +1,274 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fs/verity/hash_algs.c: fs-verity hash algorithms
+ *
+ * Copyright 2019 Google LLC
+ */
+
+#include "fsverity_private.h"
+
+#include <crypto/hash.h>
+#include <linux/scatterlist.h>
+
+/* The hash algorithms supported by fs-verity */
+struct fsverity_hash_alg fsverity_hash_algs[] = {
+ [FS_VERITY_HASH_ALG_SHA256] = {
+ .name = "sha256",
+ .digest_size = SHA256_DIGEST_SIZE,
+ .block_size = SHA256_BLOCK_SIZE,
+ },
+};
+
+/**
+ * fsverity_get_hash_alg() - validate and prepare a hash algorithm
+ * @inode: optional inode for logging purposes
+ * @num: the hash algorithm number
+ *
+ * Get the struct fsverity_hash_alg for the given hash algorithm number, and
+ * ensure it has a hash transform ready to go. The hash transforms are
+ * allocated on-demand so that we don't waste resources unnecessarily, and
+ * because the crypto modules may be initialized later than fs/verity/.
+ *
+ * Return: pointer to the hash alg on success, else an ERR_PTR()
+ */
+const struct fsverity_hash_alg *fsverity_get_hash_alg(const struct inode *inode,
+ unsigned int num)
+{
+ struct fsverity_hash_alg *alg;
+ struct crypto_ahash *tfm;
+ int err;
+
+ if (num >= ARRAY_SIZE(fsverity_hash_algs) ||
+ !fsverity_hash_algs[num].name) {
+ fsverity_warn(inode, "Unknown hash algorithm number: %u", num);
+ return ERR_PTR(-EINVAL);
+ }
+ alg = &fsverity_hash_algs[num];
+
+ /* pairs with cmpxchg() below */
+ tfm = READ_ONCE(alg->tfm);
+ if (likely(tfm != NULL))
+ return alg;
+ /*
+ * Using the shash API would make things a bit simpler, but the ahash
+ * API is preferable as it allows the use of crypto accelerators.
+ */
+ tfm = crypto_alloc_ahash(alg->name, 0, 0);
+ if (IS_ERR(tfm)) {
+ if (PTR_ERR(tfm) == -ENOENT)
+ fsverity_warn(inode,
+ "Missing crypto API support for hash algorithm \"%s\"",
+ alg->name);
+ else
+ fsverity_err(inode,
+ "Error allocating hash algorithm \"%s\": %ld",
+ alg->name, PTR_ERR(tfm));
+ return ERR_CAST(tfm);
+ }
+
+ err = -EINVAL;
+ if (WARN_ON(alg->digest_size != crypto_ahash_digestsize(tfm)))
+ goto err_free_tfm;
+ if (WARN_ON(alg->block_size != crypto_ahash_blocksize(tfm)))
+ goto err_free_tfm;
+
+ pr_info("%s using implementation \"%s\"\n",
+ alg->name, crypto_ahash_driver_name(tfm));
+
+ /* pairs with READ_ONCE() above */
+ if (cmpxchg(&alg->tfm, NULL, tfm) != NULL)
+ crypto_free_ahash(tfm);
+
+ return alg;
+
+err_free_tfm:
+ crypto_free_ahash(tfm);
+ return ERR_PTR(err);
+}
+
+/**
+ * fsverity_prepare_hash_state() - precompute the initial hash state
+ * @alg: hash algorithm
+ * @salt: a salt which is to be prepended to all data to be hashed
+ * @salt_size: salt size in bytes, possibly 0
+ *
+ * Return: NULL if the salt is empty, otherwise the kmalloc()'ed precomputed
+ * initial hash state on success or an ERR_PTR() on failure.
+ */
+const u8 *fsverity_prepare_hash_state(const struct fsverity_hash_alg *alg,
+ const u8 *salt, size_t salt_size)
+{
+ u8 *hashstate = NULL;
+ struct ahash_request *req = NULL;
+ u8 *padded_salt = NULL;
+ size_t padded_salt_size;
+ struct scatterlist sg;
+ DECLARE_CRYPTO_WAIT(wait);
+ int err;
+
+ if (salt_size == 0)
+ return NULL;
+
+ hashstate = kmalloc(crypto_ahash_statesize(alg->tfm), GFP_KERNEL);
+ if (!hashstate)
+ return ERR_PTR(-ENOMEM);
+
+ req = ahash_request_alloc(alg->tfm, GFP_KERNEL);
+ if (!req) {
+ err = -ENOMEM;
+ goto err_free;
+ }
+
+ /*
+ * Zero-pad the salt to the next multiple of the input size of the hash
+ * algorithm's compression function, e.g. 64 bytes for SHA-256 or 128
+ * bytes for SHA-512. This ensures that the hash algorithm won't have
+ * any bytes buffered internally after processing the salt, thus making
+ * salted hashing just as fast as unsalted hashing.
+ */
+ padded_salt_size = round_up(salt_size, alg->block_size);
+ padded_salt = kzalloc(padded_salt_size, GFP_KERNEL);
+ if (!padded_salt) {
+ err = -ENOMEM;
+ goto err_free;
+ }
+ memcpy(padded_salt, salt, salt_size);
+
+ sg_init_one(&sg, padded_salt, padded_salt_size);
+ ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
+ CRYPTO_TFM_REQ_MAY_BACKLOG,
+ crypto_req_done, &wait);
+ ahash_request_set_crypt(req, &sg, NULL, padded_salt_size);
+
+ err = crypto_wait_req(crypto_ahash_init(req), &wait);
+ if (err)
+ goto err_free;
+
+ err = crypto_wait_req(crypto_ahash_update(req), &wait);
+ if (err)
+ goto err_free;
+
+ err = crypto_ahash_export(req, hashstate);
+ if (err)
+ goto err_free;
+out:
+ kfree(padded_salt);
+ ahash_request_free(req);
+ return hashstate;
+
+err_free:
+ kfree(hashstate);
+ hashstate = ERR_PTR(err);
+ goto out;
+}
+
+/**
+ * fsverity_hash_page() - hash a single data or hash page
+ * @params: the Merkle tree's parameters
+ * @inode: inode for which the hashing is being done
+ * @req: preallocated hash request
+ * @page: the page to hash
+ * @out: output digest, size 'params->digest_size' bytes
+ *
+ * Hash a single data or hash block, assuming block_size == PAGE_SIZE.
+ * The hash is salted if a salt is specified in the Merkle tree parameters.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fsverity_hash_page(const struct merkle_tree_params *params,
+ const struct inode *inode,
+ struct ahash_request *req, struct page *page, u8 *out)
+{
+ struct scatterlist sg;
+ DECLARE_CRYPTO_WAIT(wait);
+ int err;
+
+ if (WARN_ON(params->block_size != PAGE_SIZE))
+ return -EINVAL;
+
+ sg_init_table(&sg, 1);
+ sg_set_page(&sg, page, PAGE_SIZE, 0);
+ ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
+ CRYPTO_TFM_REQ_MAY_BACKLOG,
+ crypto_req_done, &wait);
+ ahash_request_set_crypt(req, &sg, out, PAGE_SIZE);
+
+ if (params->hashstate) {
+ err = crypto_ahash_import(req, params->hashstate);
+ if (err) {
+ fsverity_err(inode,
+ "Error %d importing hash state", err);
+ return err;
+ }
+ err = crypto_ahash_finup(req);
+ } else {
+ err = crypto_ahash_digest(req);
+ }
+
+ err = crypto_wait_req(err, &wait);
+ if (err)
+ fsverity_err(inode, "Error %d computing page hash", err);
+ return err;
+}
+
+/**
+ * fsverity_hash_buffer() - hash some data
+ * @alg: the hash algorithm to use
+ * @data: the data to hash
+ * @size: size of data to hash
+ * @out: output digest, size 'alg->digest_size' bytes
+ *
+ * Hash some data which is located in physically contiguous memory (i.e. memory
+ * allocated by kmalloc(), not by vmalloc()). No salt is used.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fsverity_hash_buffer(const struct fsverity_hash_alg *alg,
+ const void *data, size_t size, u8 *out)
+{
+ struct ahash_request *req;
+ struct scatterlist sg;
+ DECLARE_CRYPTO_WAIT(wait);
+ int err;
+
+ req = ahash_request_alloc(alg->tfm, GFP_KERNEL);
+ if (!req)
+ return -ENOMEM;
+
+ sg_init_one(&sg, data, size);
+ ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
+ CRYPTO_TFM_REQ_MAY_BACKLOG,
+ crypto_req_done, &wait);
+ ahash_request_set_crypt(req, &sg, out, size);
+
+ err = crypto_wait_req(crypto_ahash_digest(req), &wait);
+
+ ahash_request_free(req);
+ return err;
+}
+
+void __init fsverity_check_hash_algs(void)
+{
+ size_t i;
+
+ /*
+ * Sanity check the hash algorithms (could be a build-time check, but
+ * they're in an array)
+ */
+ for (i = 0; i < ARRAY_SIZE(fsverity_hash_algs); i++) {
+ const struct fsverity_hash_alg *alg = &fsverity_hash_algs[i];
+
+ if (!alg->name)
+ continue;
+
+ BUG_ON(alg->digest_size > FS_VERITY_MAX_DIGEST_SIZE);
+
+ /*
+ * For efficiency, the implementation currently assumes the
+ * digest and block sizes are powers of 2. This limitation can
+ * be lifted if the code is updated to handle other values.
+ */
+ BUG_ON(!is_power_of_2(alg->digest_size));
+ BUG_ON(!is_power_of_2(alg->block_size));
+ }
+}
diff --git a/fs/verity/init.c b/fs/verity/init.c
new file mode 100644
index 00000000000000..40076bbe452a48
--- /dev/null
+++ b/fs/verity/init.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fs/verity/init.c: fs-verity module initialization and logging
+ *
+ * Copyright 2019 Google LLC
+ */
+
+#include "fsverity_private.h"
+
+#include <linux/ratelimit.h>
+
+void fsverity_msg(const struct inode *inode, const char *level,
+ const char *fmt, ...)
+{
+ static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+ struct va_format vaf;
+ va_list args;
+
+ if (!__ratelimit(&rs))
+ return;
+
+ va_start(args, fmt);
+ vaf.fmt = fmt;
+ vaf.va = &args;
+ if (inode)
+ printk("%sfs-verity (%s, inode %lu): %pV\n",
+ level, inode->i_sb->s_id, inode->i_ino, &vaf);
+ else
+ printk("%sfs-verity: %pV\n", level, &vaf);
+ va_end(args);
+}
+
+static int __init fsverity_init(void)
+{
+ fsverity_check_hash_algs();
+
+ pr_debug("Initialized fs-verity\n");
+ return 0;
+}
+late_initcall(fsverity_init)
--
2.22.0.410.gd8fdbe21b5-goog
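
Two pieces of arithmetic in this patch can be checked with a short
userspace sketch: the salt padding done in fsverity_prepare_hash_state()
and the claim in fsverity_private.h that 8 tree levels cover more than
U64_MAX bytes with SHA-256 and 4K blocks. The helper names below are
hypothetical and the code is not part of the patch. The second helper
assumes a full tree, where the top level is a single block and each
level fans out by 2^log_arity.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round the salt size up to a multiple of the hash algorithm's block
 * size, as round_up() does in the patch. block_size must be a power
 * of 2.
 */
size_t toy_padded_salt_size(size_t salt_size, size_t block_size)
{
	assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
	return (salt_size + block_size - 1) & ~(block_size - 1);
}

/*
 * log2 of the largest data size (in bytes) that a tree with the given
 * number of levels can cover: each level multiplies the number of
 * blocks covered by 2^log_arity, and each data block holds
 * 2^log_blocksize bytes.
 */
unsigned int toy_log2_max_data_size(unsigned int log_blocksize,
				    unsigned int log_arity,
				    unsigned int num_levels)
{
	return log_blocksize + num_levels * log_arity;
}
```

For SHA-256 (64-byte block size), a 5-byte salt is padded to 64 bytes
and a 65-byte salt to 128 bytes. With 4K blocks and 128 hashes per
block, 8 levels give 12 + 8 * 7 = 68 bits of addressable data, which
exceeds the 64 bits needed for U64_MAX.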

2019-06-21 00:01:28

by Darrick J. Wong

Subject: Re: [PATCH v5 14/16] ext4: add basic fs-verity support

On Thu, Jun 20, 2019 at 01:50:41PM -0700, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Add most of fs-verity support to ext4. fs-verity is a filesystem
> feature that enables transparent integrity protection and authentication
> of read-only files. It uses a dm-verity like mechanism at the file
> level: a Merkle tree is used to verify any block in the file in
> log(filesize) time. It is implemented mainly by helper functions in
> fs/verity/. See Documentation/filesystems/fsverity.rst for the full
> documentation.
>
> This commit adds all of ext4 fs-verity support except for the actual
> data verification, including:
>
> - Adding a filesystem feature flag and an inode flag for fs-verity.
>
> - Implementing the fsverity_operations to support enabling verity on an
> inode and reading/writing the verity metadata.
>
> - Updating ->write_begin(), ->write_end(), and ->writepages() to support
> writing verity metadata pages.
>
> - Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().
>
> ext4 stores the verity metadata (Merkle tree and fsverity_descriptor)
> past the end of the file, starting at the first 64K boundary beyond
> i_size. This approach works because (a) verity files are readonly, and
> (b) pages fully beyond i_size aren't visible to userspace but can be
> read/written internally by ext4 with only some relatively small changes
> to ext4. This approach avoids having to depend on the EA_INODE feature
> and on rearchitecting ext4's xattr support to support paging
> multi-gigabyte xattrs into memory, and to support encrypting xattrs.
> Note that the verity metadata *must* be encrypted when the file is,
> since it contains hashes of the plaintext data.
>
> This patch incorporates work by Theodore Ts'o and Chandan Rajendra.
>
> Signed-off-by: Eric Biggers <[email protected]>
> ---
> fs/ext4/Makefile | 1 +
> fs/ext4/ext4.h | 21 ++-
> fs/ext4/file.c | 4 +
> fs/ext4/inode.c | 46 ++++--
> fs/ext4/ioctl.c | 12 ++
> fs/ext4/super.c | 9 ++
> fs/ext4/sysfs.c | 6 +
> fs/ext4/verity.c | 354 +++++++++++++++++++++++++++++++++++++++++++++++
> 8 files changed, 438 insertions(+), 15 deletions(-)
> create mode 100644 fs/ext4/verity.c
>
> diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
> index 8fdfcd3c3e0437..b17ddc229ac5f5 100644
> --- a/fs/ext4/Makefile
> +++ b/fs/ext4/Makefile
> @@ -13,3 +13,4 @@ ext4-y := balloc.o bitmap.o block_validity.o dir.o ext4_jbd2.o extents.o \
>
> ext4-$(CONFIG_EXT4_FS_POSIX_ACL) += acl.o
> ext4-$(CONFIG_EXT4_FS_SECURITY) += xattr_security.o
> +ext4-$(CONFIG_FS_VERITY) += verity.o
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 1cb67859e0518b..5a1deea3fb3e37 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -41,6 +41,7 @@
> #endif
>
> #include <linux/fscrypt.h>
> +#include <linux/fsverity.h>
>
> #include <linux/compiler.h>
>
> @@ -395,6 +396,7 @@ struct flex_groups {
> #define EXT4_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/
> #define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */
> #define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */
> +#define EXT4_VERITY_FL 0x00100000 /* Verity protected inode */

Hmm, a new inode flag, superblock rocompat feature flag, and
(presumably) the Merkle tree has some sort of well defined format which
starts at the next 64k boundary past EOF.

Would you mind updating the relevant parts of the ondisk format
documentation in Documentation/filesystems/ext4/, please?

I saw that the Merkle tree and verity descriptor formats themselves are
documented in the first patch, so you could simply link the ext4
documentation to it.
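
For anyone writing up the on-disk format, here is a minimal userspace sketch of the layout that ext4_verity_metadata_pos() and ext4_write_verity_descriptor() in this patch appear to imply. The struct and helper names are hypothetical and exist only for illustration; the 64K alignment, the filesystem-block alignment of the descriptor, and the trailing 4-byte size field are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of align; align must be a power of 2. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Hypothetical summary of where each piece of verity metadata lands. */
struct verity_layout {
	uint64_t tree_pos;      /* Merkle tree: first 64K boundary at or past i_size */
	uint64_t desc_pos;      /* descriptor: next fs block boundary after the tree */
	uint64_t desc_size_pos; /* __le32 descriptor size: last 4 bytes of last block */
};

static struct verity_layout compute_verity_layout(uint64_t i_size,
						  uint64_t tree_size,
						  uint32_t desc_size,
						  uint32_t fs_blocksize)
{
	struct verity_layout l;
	uint64_t desc_end;

	l.tree_pos = round_up_pow2(i_size, 65536);
	l.desc_pos = round_up_pow2(l.tree_pos + tree_size, fs_blocksize);
	desc_end = l.desc_pos + desc_size;
	/* Spills into one more block if fewer than 4 bytes remain. */
	l.desc_size_pos = round_up_pow2(desc_end + 4, fs_blocksize) - 4;
	return l;
}
```

With i_size = 100000, a 4096-byte tree, a 256-byte descriptor and 4K filesystem blocks, this gives tree_pos = 131072, desc_pos = 135168 and desc_size_pos = 139260.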

> #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
> #define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
> #define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */

<snip>

> diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
> new file mode 100644
> index 00000000000000..0ff98eb4ecdbb7
> --- /dev/null
> +++ b/fs/ext4/verity.c
> @@ -0,0 +1,354 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * fs/ext4/verity.c: fs-verity support for ext4
> + *
> + * Copyright 2019 Google LLC
> + */
> +
> +/*
> + * Implementation of fsverity_operations for ext4.
> + *
> + * ext4 stores the verity metadata (Merkle tree and fsverity_descriptor) past
> + * the end of the file, starting at the first 64K boundary beyond i_size. This
> + * approach works because (a) verity files are readonly, and (b) pages fully
> + * beyond i_size aren't visible to userspace but can be read/written internally
> + * by ext4 with only some relatively small changes to ext4. This approach
> + * avoids having to depend on the EA_INODE feature and on rearchitecting
> + * ext4's xattr support to support paging multi-gigabyte xattrs into memory, and
> + * to support encrypting xattrs. Note that the verity metadata *must* be
> + * encrypted when the file is, since it contains hashes of the plaintext data.

Ahh, I had wondered about "why not just shove it in an ea_inode?"...

> + *
> + * Using a 64K boundary rather than a 4K one keeps things ready for
> + * architectures with 64K pages, and it doesn't necessarily waste space on-disk
> + * since there can be a hole between i_size and the start of the Merkle tree.
> + */
> +
> +#include <linux/quotaops.h>
> +
> +#include "ext4.h"
> +#include "ext4_extents.h"
> +#include "ext4_jbd2.h"
> +
> +static inline loff_t ext4_verity_metadata_pos(const struct inode *inode)
> +{
> + return round_up(inode->i_size, 65536);
> +}
> +
> +/*
> + * Read some verity metadata from the inode. __vfs_read() can't be used because
> + * we need to read beyond i_size.
> + */
> +static int pagecache_read(struct inode *inode, void *buf, size_t count,
> + loff_t pos)
> +{
> + while (count) {
> + size_t n = min_t(size_t, count,
> + PAGE_SIZE - offset_in_page(pos));
> + struct page *page;
> + void *addr;
> +
> + page = read_mapping_page(inode->i_mapping, pos >> PAGE_SHIFT,
> + NULL);
> + if (IS_ERR(page))
> + return PTR_ERR(page);
> +
> + addr = kmap_atomic(page);
> + memcpy(buf, addr + offset_in_page(pos), n);
> + kunmap_atomic(addr);
> +
> + put_page(page);
> +
> + buf += n;
> + pos += n;
> + count -= n;
> + }
> + return 0;
> +}
> +
> +/*
> + * Write some verity metadata to the inode for FS_IOC_ENABLE_VERITY.
> + * kernel_write() can't be used because the file descriptor is readonly.
> + */
> +static int pagecache_write(struct inode *inode, const void *buf, size_t count,
> + loff_t pos)
> +{
> + while (count) {
> + size_t n = min_t(size_t, count,
> + PAGE_SIZE - offset_in_page(pos));
> + struct page *page;
> + void *fsdata;
> + void *addr;
> + int res;
> +
> + res = pagecache_write_begin(NULL, inode->i_mapping, pos, n, 0,
> + &page, &fsdata);
> + if (res)
> + return res;
> +
> + addr = kmap_atomic(page);
> + memcpy(addr + offset_in_page(pos), buf, n);
> + kunmap_atomic(addr);
> +
> + res = pagecache_write_end(NULL, inode->i_mapping, pos, n, n,
> + page, fsdata);
> + if (res < 0)
> + return res;
> + if (res != n)
> + return -EIO;
> +
> + buf += n;
> + pos += n;
> + count -= n;
> + }
> + return 0;
> +}

This same code is duplicated in the f2fs patch. Is there a reason why
they don't share this common code? Even if you have to hide it under
fs/verity/ ?

--D

> +
> +static int ext4_begin_enable_verity(struct file *filp)
> +{
> + struct inode *inode = file_inode(filp);
> + const int credits = 2; /* superblock and inode for ext4_orphan_add() */
> + handle_t *handle;
> + int err;
> +
> + err = ext4_convert_inline_data(inode);
> + if (err)
> + return err;
> +
> + if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
> + ext4_warning_inode(inode,
> + "verity is only allowed on extent-based files");
> + return -EOPNOTSUPP;
> + }
> +
> + err = ext4_inode_attach_jinode(inode);
> + if (err)
> + return err;
> +
> + /*
> + * ext4 uses the last allocated block to find the verity descriptor, so
> + * we must remove any other blocks which might confuse things.
> + */
> + err = ext4_truncate(inode);
> + if (err)
> + return err;
> +
> + err = dquot_initialize(inode);
> + if (err)
> + return err;
> +
> + handle = ext4_journal_start(inode, EXT4_HT_INODE, credits);
> + if (IS_ERR(handle))
> + return PTR_ERR(handle);
> +
> + err = ext4_orphan_add(handle, inode);
> + if (err == 0)
> + ext4_set_inode_state(inode, EXT4_STATE_VERITY_IN_PROGRESS);
> +
> + ext4_journal_stop(handle);
> + return err;
> +}
> +
> +/*
> + * ext4 stores the verity descriptor beginning on the next filesystem block
> + * boundary after the Merkle tree. Then, the descriptor size is stored in the
> + * last 4 bytes of the last allocated filesystem block --- which is either the
> + * block in which the descriptor ends, or the next block after that if there
> + * weren't at least 4 bytes remaining.
> + *
> + * We can't simply store the descriptor in an xattr because it *must* be
> + * encrypted when ext4 encryption is used, but ext4 encryption doesn't encrypt
> + * xattrs. Also, if the descriptor includes a large signature blob it may be
> + * too large to store in an xattr without the EA_INODE feature.
> + */
> +static int ext4_write_verity_descriptor(struct inode *inode, const void *desc,
> + size_t desc_size, u64 merkle_tree_size)
> +{
> + const u64 desc_pos = round_up(ext4_verity_metadata_pos(inode) +
> + merkle_tree_size, i_blocksize(inode));
> + const u64 desc_end = desc_pos + desc_size;
> + const __le32 desc_size_disk = cpu_to_le32(desc_size);
> + const u64 desc_size_pos = round_up(desc_end + sizeof(desc_size_disk),
> + i_blocksize(inode)) -
> + sizeof(desc_size_disk);
> + int err;
> +
> + err = pagecache_write(inode, desc, desc_size, desc_pos);
> + if (err)
> + return err;
> +
> + return pagecache_write(inode, &desc_size_disk, sizeof(desc_size_disk),
> + desc_size_pos);
> +}
> +
> +static int ext4_end_enable_verity(struct file *filp, const void *desc,
> + size_t desc_size, u64 merkle_tree_size)
> +{
> + struct inode *inode = file_inode(filp);
> + const int credits = 2; /* superblock and inode for ext4_orphan_add() */
> + handle_t *handle;
> + int err1 = 0;
> + int err;
> +
> + if (desc != NULL) {
> + /* Succeeded; write the verity descriptor. */
> + err1 = ext4_write_verity_descriptor(inode, desc, desc_size,
> + merkle_tree_size);
> +
> + /* Write all pages before clearing VERITY_IN_PROGRESS. */
> + if (!err1)
> + err1 = filemap_write_and_wait(inode->i_mapping);
> + } else {
> + /* Failed; truncate anything we wrote past i_size. */
> + ext4_truncate(inode);
> + }
> +
> + /*
> + * We must always clean up by clearing EXT4_STATE_VERITY_IN_PROGRESS and
> + * deleting the inode from the orphan list, even if something failed.
> + * If everything succeeded, we'll also set the verity bit in the same
> + * transaction.
> + */
> +
> + ext4_clear_inode_state(inode, EXT4_STATE_VERITY_IN_PROGRESS);
> +
> + handle = ext4_journal_start(inode, EXT4_HT_INODE, credits);
> + if (IS_ERR(handle)) {
> + ext4_orphan_del(NULL, inode);
> + return PTR_ERR(handle);
> + }
> +
> + err = ext4_orphan_del(handle, inode);
> + if (err)
> + goto out_stop;
> +
> + if (desc != NULL && !err1) {
> + struct ext4_iloc iloc;
> +
> + err = ext4_reserve_inode_write(handle, inode, &iloc);
> + if (err)
> + goto out_stop;
> + ext4_set_inode_flag(inode, EXT4_INODE_VERITY);
> + ext4_set_inode_flags(inode);
> + err = ext4_mark_iloc_dirty(handle, inode, &iloc);
> + }
> +out_stop:
> + ext4_journal_stop(handle);
> + return err ?: err1;
> +}
> +
> +static int ext4_get_verity_descriptor_location(struct inode *inode,
> + size_t *desc_size_ret,
> + u64 *desc_pos_ret)
> +{
> + struct ext4_ext_path *path;
> + struct ext4_extent *last_extent;
> + u32 end_lblk;
> + u64 desc_size_pos;
> + __le32 desc_size_disk;
> + u32 desc_size;
> + u64 desc_pos;
> + int err;
> +
> + /*
> + * Descriptor size is in last 4 bytes of last allocated block.
> + * See ext4_write_verity_descriptor().
> + */
> +
> + if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
> + EXT4_ERROR_INODE(inode, "verity file doesn't use extents");
> + return -EFSCORRUPTED;
> + }
> +
> + path = ext4_find_extent(inode, EXT_MAX_BLOCKS - 1, NULL, 0);
> + if (IS_ERR(path))
> + return PTR_ERR(path);
> +
> + last_extent = path[path->p_depth].p_ext;
> + if (!last_extent) {
> + EXT4_ERROR_INODE(inode, "verity file has no extents");
> + ext4_ext_drop_refs(path);
> + kfree(path);
> + return -EFSCORRUPTED;
> + }
> +
> + end_lblk = le32_to_cpu(last_extent->ee_block) +
> + ext4_ext_get_actual_len(last_extent);
> + desc_size_pos = (u64)end_lblk << inode->i_blkbits;
> + ext4_ext_drop_refs(path);
> + kfree(path);
> +
> + if (desc_size_pos < sizeof(desc_size_disk))
> + goto bad;
> + desc_size_pos -= sizeof(desc_size_disk);
> +
> + err = pagecache_read(inode, &desc_size_disk, sizeof(desc_size_disk),
> + desc_size_pos);
> + if (err)
> + return err;
> + desc_size = le32_to_cpu(desc_size_disk);
> +
> + /*
> + * The descriptor is stored just before the desc_size_disk, but starting
> + * on a filesystem block boundary.
> + */
> +
> + if (desc_size > INT_MAX || desc_size > desc_size_pos)
> + goto bad;
> +
> + desc_pos = round_down(desc_size_pos - desc_size, i_blocksize(inode));
> + if (desc_pos < ext4_verity_metadata_pos(inode))
> + goto bad;
> +
> + *desc_size_ret = desc_size;
> + *desc_pos_ret = desc_pos;
> + return 0;
> +
> +bad:
> + EXT4_ERROR_INODE(inode, "verity file corrupted; can't find descriptor");
> + return -EFSCORRUPTED;
> +}
> +
> +static int ext4_get_verity_descriptor(struct inode *inode, void *buf,
> + size_t buf_size)
> +{
> + size_t desc_size = 0;
> + u64 desc_pos = 0;
> + int err;
> +
> + err = ext4_get_verity_descriptor_location(inode, &desc_size, &desc_pos);
> + if (err)
> + return err;
> +
> + if (buf_size) {
> + if (desc_size > buf_size)
> + return -ERANGE;
> + err = pagecache_read(inode, buf, desc_size, desc_pos);
> + if (err)
> + return err;
> + }
> + return desc_size;
> +}
> +
> +static struct page *ext4_read_merkle_tree_page(struct inode *inode,
> + pgoff_t index)
> +{
> + index += ext4_verity_metadata_pos(inode) >> PAGE_SHIFT;
> +
> + return read_mapping_page(inode->i_mapping, index, NULL);
> +}
> +
> +static int ext4_write_merkle_tree_block(struct inode *inode, const void *buf,
> + u64 index, int log_blocksize)
> +{
> + loff_t pos = ext4_verity_metadata_pos(inode) + (index << log_blocksize);
> +
> + return pagecache_write(inode, buf, 1 << log_blocksize, pos);
> +}
> +
> +const struct fsverity_operations ext4_verityops = {
> + .begin_enable_verity = ext4_begin_enable_verity,
> + .end_enable_verity = ext4_end_enable_verity,
> + .get_verity_descriptor = ext4_get_verity_descriptor,
> + .read_merkle_tree_page = ext4_read_merkle_tree_page,
> + .write_merkle_tree_block = ext4_write_merkle_tree_block,
> +};
> --
> 2.22.0.410.gd8fdbe21b5-goog
>

2019-06-21 03:18:13

by Eric Biggers

Subject: Re: [PATCH v5 14/16] ext4: add basic fs-verity support

Hi Darrick,

On Thu, Jun 20, 2019 at 04:59:38PM -0700, Darrick J. Wong wrote:
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 1cb67859e0518b..5a1deea3fb3e37 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -41,6 +41,7 @@
> > #endif
> >
> > #include <linux/fscrypt.h>
> > +#include <linux/fsverity.h>
> >
> > #include <linux/compiler.h>
> >
> > @@ -395,6 +396,7 @@ struct flex_groups {
> > #define EXT4_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/
> > #define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */
> > #define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */
> > +#define EXT4_VERITY_FL 0x00100000 /* Verity protected inode */
>
> Hmm, a new inode flag, superblock rocompat feature flag, and
> (presumably) the Merkle tree has some sort of well defined format which
> starts at the next 64k boundary past EOF.
>
> Would you mind updating the relevant parts of the ondisk format
> documentation in Documentation/filesystems/ext4/, please?
>
> I saw that the Merkle tree and verity descriptor formats themselves are
> documented in the first patch, so you could simply link the ext4
> documentation to it.
>

Sure, I'll update the ext4 documentation.

> > +/*
> > + * Read some verity metadata from the inode. __vfs_read() can't be used because
> > + * we need to read beyond i_size.
> > + */
> > +static int pagecache_read(struct inode *inode, void *buf, size_t count,
> > + loff_t pos)
> > +{
> > + while (count) {
> > + size_t n = min_t(size_t, count,
> > + PAGE_SIZE - offset_in_page(pos));
> > + struct page *page;
> > + void *addr;
> > +
> > + page = read_mapping_page(inode->i_mapping, pos >> PAGE_SHIFT,
> > + NULL);
> > + if (IS_ERR(page))
> > + return PTR_ERR(page);
> > +
> > + addr = kmap_atomic(page);
> > + memcpy(buf, addr + offset_in_page(pos), n);
> > + kunmap_atomic(addr);
> > +
> > + put_page(page);
> > +
> > + buf += n;
> > + pos += n;
> > + count -= n;
> > + }
> > + return 0;
> > +}
> > +
> > +/*
> > + * Write some verity metadata to the inode for FS_IOC_ENABLE_VERITY.
> > + * kernel_write() can't be used because the file descriptor is readonly.
> > + */
> > +static int pagecache_write(struct inode *inode, const void *buf, size_t count,
> > + loff_t pos)
> > +{
> > + while (count) {
> > + size_t n = min_t(size_t, count,
> > + PAGE_SIZE - offset_in_page(pos));
> > + struct page *page;
> > + void *fsdata;
> > + void *addr;
> > + int res;
> > +
> > + res = pagecache_write_begin(NULL, inode->i_mapping, pos, n, 0,
> > + &page, &fsdata);
> > + if (res)
> > + return res;
> > +
> > + addr = kmap_atomic(page);
> > + memcpy(addr + offset_in_page(pos), buf, n);
> > + kunmap_atomic(addr);
> > +
> > + res = pagecache_write_end(NULL, inode->i_mapping, pos, n, n,
> > + page, fsdata);
> > + if (res < 0)
> > + return res;
> > + if (res != n)
> > + return -EIO;
> > +
> > + buf += n;
> > + pos += n;
> > + count -= n;
> > + }
> > + return 0;
> > +}
>
> This same code is duplicated in the f2fs patch. Is there a reason why
> they don't share this common code? Even if you have to hide it under
> fs/verity/ ?
>

Yes, pagecache_read() and pagecache_write() are identical between ext4 and f2fs.
I didn't put them in fs/verity/ because the "metadata past EOF" approach is a
choice made by ext4 and f2fs, not something intrinsic to fs-verity itself. To
avoid confusion, I kept the fs/verity/ support layer completely free of any
assumption that filesystems implement fs-verity that way.

Also, making the fsverity_operations call back into fs/verity/ adds a little
extra conceptual complexity about what belongs where, since then we'd have a
call stack of filesystem => fs/verity/ => filesystem => fs/verity/.

But if people would rather that ext4 and f2fs share these two functions anyway,
then sure, we could move them into fs/verity/, and other filesystems (if they
take a different approach to fs-verity) simply won't use them.
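
For reference, the part the two helpers have in common is just the page-bounded chunking loop. Below is a rough userspace analogue of it, not kernel code: a flat byte array stands in for the pagecache, and SKETCH_PAGE_SIZE and chunked_write() are made-up names. The real helpers go through read_mapping_page() or pagecache_write_begin()/pagecache_write_end() for each chunk instead of a plain memcpy() into one buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/*
 * Copy 'count' bytes from 'buf' into 'store' at offset 'pos', never letting
 * a single copy cross a page boundary.  Returns the number of chunks used.
 */
static size_t chunked_write(uint8_t *store, size_t store_size,
			    const void *buf, size_t count, uint64_t pos)
{
	const uint8_t *src = buf;
	size_t chunks = 0;

	assert(pos <= store_size && count <= store_size - pos);

	while (count) {
		size_t off = pos % SKETCH_PAGE_SIZE;   /* offset_in_page(pos) */
		size_t n = SKETCH_PAGE_SIZE - off;     /* room left in this page */

		if (n > count)
			n = count;
		memcpy(store + pos, src, n);

		src += n;
		pos += n;
		count -= n;
		chunks++;
	}
	return chunks;
}
```

Writing 5000 bytes at offset 4000 takes three chunks here (96, 4096 and 808 bytes), which is the same split the loops in the patch would produce with 4K pages.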

- Eric

2019-06-22 22:11:49

by Jaegeuk Kim

Subject: Re: [PATCH v5 01/16] fs-verity: add a documentation file

On 06/20, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Add a documentation file for fs-verity, covering:
>
> - Introduction
> - Use cases
> - User API
> - FS_IOC_ENABLE_VERITY
> - FS_IOC_MEASURE_VERITY
> - FS_IOC_GETFLAGS
> - Accessing verity files
> - File measurement computation
> - Merkle tree
> - fs-verity descriptor
> - Built-in signature verification
> - Filesystem support
> - ext4
> - f2fs
> - Implementation details
> - Verifying data
> - Pagecache
> - Block device based filesystems
> - Userspace utility
> - Tests
> - FAQ
>
> Reviewed-by: Theodore Ts'o <[email protected]>

Reviewed-by: Jaegeuk Kim <[email protected]>

> Signed-off-by: Eric Biggers <[email protected]>
> ---
> Documentation/filesystems/fsverity.rst | 710 +++++++++++++++++++++++++
> Documentation/filesystems/index.rst | 1 +
> 2 files changed, 711 insertions(+)
> create mode 100644 Documentation/filesystems/fsverity.rst
>
> diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
> new file mode 100644
> index 00000000000000..49524d7ea190e5
> --- /dev/null
> +++ b/Documentation/filesystems/fsverity.rst
> @@ -0,0 +1,710 @@
> +=======================================================
> +fs-verity: read-only file-based authenticity protection
> +=======================================================
> +
> +Introduction
> +============
> +
> +fs-verity (``fs/verity/``) is a support layer that filesystems can
> +hook into to support transparent integrity and authenticity protection
> +of read-only files. Currently, it is supported by the ext4 and f2fs
> +filesystems. Like fscrypt, not too much filesystem-specific code is
> +needed to support fs-verity.
> +
> +fs-verity is similar to `dm-verity
> +<https://www.kernel.org/doc/Documentation/device-mapper/verity.txt>`_
> +but works on files rather than block devices. On regular files on
> +filesystems supporting fs-verity, userspace can execute an ioctl that
> +causes the filesystem to build a Merkle tree for the file and persist
> +it to a filesystem-specific location associated with the file.
> +
> +After this, the file is made readonly, and all reads from the file are
> +automatically verified against the file's Merkle tree. Reads of any
> +corrupted data, including mmap reads, will fail.
> +
> +Userspace can use another ioctl to retrieve the root hash (actually
> +the "file measurement", which is a hash that includes the root hash)
> +that fs-verity is enforcing for the file. This ioctl executes in
> +constant time, regardless of the file size.
> +
> +fs-verity is essentially a way to hash a file in constant time,
> +subject to the caveat that reads which would violate the hash will
> +fail at runtime.
> +
> +Use cases
> +=========
> +
> +By itself, the base fs-verity feature only provides integrity
> +protection, i.e. detection of accidental (non-malicious) corruption.
> +
> +However, because fs-verity makes retrieving the file hash extremely
> +efficient, it's primarily meant to be used as a tool to support
> +authentication (detection of malicious modifications) or auditing
> +(logging file hashes before use).
> +
> +Trusted userspace code (e.g. operating system code running on a
> +read-only partition that is itself authenticated by dm-verity) can
> +authenticate the contents of an fs-verity file by using the
> +`FS_IOC_MEASURE_VERITY`_ ioctl to retrieve its hash, then verifying a
> +digital signature of it.
> +
> +A standard file hash could be used instead of fs-verity. However,
> +this is inefficient if the file is large and only a small portion may
> +be accessed. This is often the case for Android application package
> +(APK) files, for example. These typically contain many translations,
> +classes, and other resources that are infrequently or even never
> +accessed on a particular device. It would be slow and wasteful to
> +read and hash the entire file before starting the application.
> +
> +Unlike an ahead-of-time hash, fs-verity also re-verifies data each
> +time it's paged in. This ensures that malicious disk firmware can't
> +undetectably change the contents of the file at runtime.
> +
> +fs-verity does not replace or obsolete dm-verity. dm-verity should
> +still be used on read-only filesystems. fs-verity is for files that
> +must live on a read-write filesystem because they are independently
> +updated and potentially user-installed, so dm-verity cannot be used.
> +
> +The base fs-verity feature is a hashing mechanism only; actually
> +authenticating the files is up to userspace. However, to meet some
> +users' needs, fs-verity optionally supports a simple signature
> +verification mechanism where users can configure the kernel to require
> +that all fs-verity files be signed by a key loaded into a keyring; see
> +`Built-in signature verification`_. Support for fs-verity file hashes
> +in IMA (Integrity Measurement Architecture) policies is also planned.
> +
> +User API
> +========
> +
> +FS_IOC_ENABLE_VERITY
> +--------------------
> +
> +The FS_IOC_ENABLE_VERITY ioctl enables fs-verity on a file. It takes
> +in a pointer to a :c:type:`struct fsverity_enable_arg`, defined as
> +follows::
> +
> + struct fsverity_enable_arg {
> + __u32 version;
> + __u32 hash_algorithm;
> + __u32 block_size;
> + __u32 salt_size;
> + __u64 salt_ptr;
> + __u32 sig_size;
> + __u32 __reserved1;
> + __u64 sig_ptr;
> + __u64 __reserved2[11];
> + };
> +
> +This structure contains the parameters of the Merkle tree to build for
> +the file, and optionally contains a signature. It must be initialized
> +as follows:
> +
> +- ``version`` must be 1.
> +- ``hash_algorithm`` must be the identifier for the hash algorithm to
> + use for the Merkle tree, such as FS_VERITY_HASH_ALG_SHA256. See
> + ``include/uapi/linux/fsverity.h`` for the list of possible values.
> +- ``block_size`` must be the Merkle tree block size. Currently, this
> + must be equal to the system page size, which is usually 4096 bytes.
> + Other sizes may be supported in the future. This value is not
> + necessarily the same as the filesystem block size.
> +- ``salt_size`` is the size of the salt in bytes, or 0 if no salt is
> + provided. The salt is a value that is prepended to every hashed
> + block; it can be used to personalize the hashing for a particular
> + file or device. Currently the maximum salt size is 32 bytes.
> +- ``salt_ptr`` is the pointer to the salt, or NULL if no salt is
> + provided.
> +- ``sig_size`` is the size of the signature in bytes, or 0 if no
> + signature is provided. Currently the signature is (somewhat
> + arbitrarily) limited to 16128 bytes. See `Built-in signature
> + verification`_ for more information.
> +- ``sig_ptr`` is the pointer to the signature, or NULL if no
> + signature is provided.
> +- All reserved fields must be zeroed.
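
The initialization rules above can be sketched in userspace C as follows. This is only an illustration under stated assumptions: the struct is a local mirror of the one quoted above (real code would include the UAPI header instead), and init_enable_arg() and the SKETCH_* limits are hypothetical names, with the limits copied from the documented 32-byte salt and 16128-byte signature maximums.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct fsverity_enable_arg, for illustration only. */
struct fsverity_enable_arg_sketch {
	uint32_t version;
	uint32_t hash_algorithm;
	uint32_t block_size;
	uint32_t salt_size;
	uint64_t salt_ptr;
	uint32_t sig_size;
	uint32_t reserved1;
	uint64_t sig_ptr;
	uint64_t reserved2[11];
};

static_assert(sizeof(struct fsverity_enable_arg_sketch) == 128,
	      "unexpected padding in the argument struct");

#define SKETCH_MAX_SALT_SIZE 32u
#define SKETCH_MAX_SIG_SIZE  16128u

/* Returns 0 on success, -1 if the salt or signature is too long. */
static int init_enable_arg(struct fsverity_enable_arg_sketch *arg,
			   uint32_t hash_algorithm, uint32_t block_size,
			   const void *salt, uint32_t salt_size,
			   const void *sig, uint32_t sig_size)
{
	if (salt_size > SKETCH_MAX_SALT_SIZE || sig_size > SKETCH_MAX_SIG_SIZE)
		return -1;

	memset(arg, 0, sizeof(*arg));   /* zeroes all reserved fields */
	arg->version = 1;
	arg->hash_algorithm = hash_algorithm;
	arg->block_size = block_size;
	arg->salt_size = salt_size;
	arg->salt_ptr = salt_size ? (uint64_t)(uintptr_t)salt : 0;
	arg->sig_size = sig_size;
	arg->sig_ptr = sig_size ? (uint64_t)(uintptr_t)sig : 0;
	return 0;
}
```

The caller would then pass a pointer to the filled-in structure to FS_IOC_ENABLE_VERITY on an O_RDONLY file descriptor, as described above.
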
> +
> +FS_IOC_ENABLE_VERITY causes the filesystem to build a Merkle tree for
> +the file and persist it to a filesystem-specific location associated
> +with the file, then mark the file as a verity file. This ioctl may
> +take a long time to execute on large files, and it is interruptible by
> +fatal signals.
> +
> +FS_IOC_ENABLE_VERITY checks for write access to the inode. However,
> +it must be executed on an O_RDONLY file descriptor and no processes
> +can have the file open for writing. Attempts to open the file for
> +writing while this ioctl is executing will fail with ETXTBSY. (This
> +is necessary to guarantee that no writable file descriptors will exist
> +after verity is enabled, and to guarantee that the file's contents are
> +stable while the Merkle tree is being built over it.)
> +
> +On success, FS_IOC_ENABLE_VERITY returns 0, and the file becomes a
> +verity file. On failure (including the case of interruption by a
> +fatal signal), no changes are made to the file.
> +
> +FS_IOC_ENABLE_VERITY can fail with the following errors:
> +
> +- ``EACCES``: the process does not have write access to the file
> +- ``EEXIST``: the file already has verity enabled
> +- ``EFAULT``: the caller provided inaccessible memory
> +- ``EINTR``: the operation was interrupted by a fatal signal
> +- ``EINVAL``: unsupported version, hash algorithm, or block size; or
> + reserved bits are set; or the file descriptor refers to neither a
> + regular file nor a directory.
> +- ``EISDIR``: the file descriptor refers to a directory
> +- ``EMSGSIZE``: the salt or signature is too long
> +- ``ENOENT``: fs-verity recognizes the hash algorithm, but it's not
> + available in the kernel's crypto API as currently configured (e.g.
> + for SHA-512, missing CONFIG_CRYPTO_SHA512).
> +- ``ENOTTY``: this type of filesystem does not implement fs-verity
> +- ``EOPNOTSUPP``: the kernel was not configured with fs-verity
> + support; or the filesystem superblock has not had the 'verity'
> + feature enabled on it; or the filesystem does not support fs-verity
> + on this file. (See `Filesystem support`_.)
> +- ``EPERM``: the file is append-only
> +- ``EROFS``: the filesystem is read-only
> +- ``ETXTBSY``: someone has the file open for writing. This can be the
> + caller's file descriptor, another open file descriptor, or the file
> + reference held by a writable memory map.
> +
> +FS_IOC_MEASURE_VERITY
> +---------------------
> +
> +The FS_IOC_MEASURE_VERITY ioctl retrieves the measurement of a verity
> +file. The file measurement is a digest that cryptographically
> +identifies the file contents that are being enforced on reads.
> +
> +This ioctl takes in a pointer to a variable-length structure::
> +
> + struct fsverity_digest {
> + __u16 digest_algorithm;
> + __u16 digest_size; /* input/output */
> + __u8 digest[];
> + };
> +
> +``digest_size`` is an input/output field. On input, it must be
> +initialized to the number of bytes allocated for the variable-length
> +``digest`` field.
> +
> +On success, 0 is returned and the kernel fills in the structure as
> +follows:
> +
> +- ``digest_algorithm`` will be the hash algorithm used for the file
> + measurement. It will match ``fsverity_enable_arg::hash_algorithm``.
> +- ``digest_size`` will be the size of the digest in bytes, e.g. 32
> + for SHA-256. (This can be redundant with ``digest_algorithm``.)
> +- ``digest`` will be the actual bytes of the digest.
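
As a sketch of the input side of this contract, the caller might allocate the variable-length structure like this. The struct is a local mirror of the one quoted above and alloc_digest_buf() is a hypothetical helper, not part of the proposed API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of struct fsverity_digest, for illustration only. */
struct fsverity_digest_sketch {
	uint16_t digest_algorithm;
	uint16_t digest_size;   /* input/output */
	uint8_t digest[];
};

static_assert(sizeof(struct fsverity_digest_sketch) == 4,
	      "header should be two 16-bit fields");

/*
 * Allocate a zeroed structure with room for 'capacity' digest bytes and set
 * digest_size to that capacity, as required on input.  Returns NULL on
 * allocation failure; the caller frees the result with free().
 */
static struct fsverity_digest_sketch *alloc_digest_buf(uint16_t capacity)
{
	struct fsverity_digest_sketch *d;

	d = calloc(1, sizeof(*d) + capacity);
	if (d)
		d->digest_size = capacity;
	return d;
}
```

A capacity of 64 bytes is enough for a SHA-512 digest. After a successful FS_IOC_MEASURE_VERITY call, digest_size would hold the actual digest length rather than the capacity.
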
> +
> +FS_IOC_MEASURE_VERITY is guaranteed to execute in constant time,
> +regardless of the size of the file.
> +
> +FS_IOC_MEASURE_VERITY can fail with the following errors:
> +
> +- ``EFAULT``: the caller provided inaccessible memory
> +- ``ENODATA``: the file is not a verity file
> +- ``ENOTTY``: this type of filesystem does not implement fs-verity
> +- ``EOPNOTSUPP``: the kernel was not configured with fs-verity
> + support, or the filesystem superblock has not had the 'verity'
> + feature enabled on it. (See `Filesystem support`_.)
> +- ``EOVERFLOW``: the digest is longer than the specified
> + ``digest_size`` bytes. Try providing a larger buffer.
> +
> +FS_IOC_GETFLAGS
> +---------------
> +
> +The existing ioctl FS_IOC_GETFLAGS (which isn't specific to fs-verity)
> +can also be used to check whether a file has fs-verity enabled or not.
> +To do so, check for FS_VERITY_FL (0x00100000) in the returned flags.
> +
> +The verity flag is not settable via FS_IOC_SETFLAGS. You must use
> +FS_IOC_ENABLE_VERITY instead, since parameters must be provided.
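
A minimal sketch of that check from userspace, assuming a Linux system with <linux/fs.h> available. file_has_verity() is a hypothetical helper, and the flag value is defined locally from the 0x00100000 given above rather than taken from a header.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Value quoted in the documentation above; defined locally for the sketch. */
#define SKETCH_FS_VERITY_FL 0x00100000

/* Returns 1 if the verity flag is set, 0 if not, -1 on any error. */
static int file_has_verity(const char *path)
{
	int flags = 0;
	int fd;
	int ret;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, FS_IOC_GETFLAGS, &flags);
	close(fd);
	if (ret != 0)
		return -1;   /* e.g. the filesystem doesn't support the ioctl */

	return (flags & SKETCH_FS_VERITY_FL) ? 1 : 0;
}
```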
> +
> +Accessing verity files
> +======================
> +
> +Applications can transparently access a verity file just like a
> +non-verity one, with the following exceptions:
> +
> +- Verity files are readonly. They cannot be opened for writing or
> + truncate()d, even if the file mode bits allow it. Attempts to do
> + one of these things will fail with EPERM. However, changes to
> + metadata such as owner, mode, timestamps, and xattrs are still
> + allowed, since these are not measured by fs-verity. Verity files
> + can also still be renamed, deleted, and linked to.
> +
> +- Direct I/O is not supported on verity files. Attempts to use direct
> + I/O on such files will fall back to buffered I/O.
> +
> +- DAX (Direct Access) is not supported on verity files, because this
> + would circumvent the data verification.
> +
> +- Reads of data that doesn't match the verity Merkle tree will fail
> + with EIO (for read()) or SIGBUS (for mmap() reads).
> +
> +- If the sysctl "fs.verity.require_signatures" is set to 1 and the
> + file's verity measurement is not signed by a key in the fs-verity
> + keyring, then opening the file will fail. See `Built-in signature
> + verification`_.
> +
> +Direct access to the Merkle tree is not supported. Therefore, if a
> +verity file is copied, or is backed up and restored, then it will lose
> +its "verity"-ness. fs-verity is primarily meant for files like
> +executables that are managed by a package manager.
> +
> +File measurement computation
> +============================
> +
> +This section describes how fs-verity hashes the file contents using a
> +Merkle tree to produce the "file measurement" which cryptographically
> +identifies the file contents. This algorithm is the same for all
> +filesystems that support fs-verity.
> +
> +Userspace only needs to be aware of this algorithm if it needs to
> +compute the file measurement itself, e.g. in order to sign the file.
> +
> +Merkle tree
> +-----------
> +
> +The file contents are divided into blocks, where the block size is
> +configurable but is usually 4096 bytes. The end of the last block is
> +zero-padded if needed. Each block is then hashed, producing the first
> +level of hashes. Then, the hashes in this first level are grouped
> +into 'blocksize'-byte blocks (zero-padding the ends as needed) and
> +these blocks are hashed, producing the second level of hashes. This
> +proceeds up the tree until only a single block remains. The hash of
> +this block is the "Merkle tree root hash".
> +
> +If the file is nonempty and fits in one block, then the "Merkle tree
> +root hash" is simply the hash of the single data block. If the file
> +is empty, then the "Merkle tree root hash" is all zeroes.
> +
> +The "blocks" here are not necessarily the same as "filesystem blocks".
> +
> +If a salt was specified, then it's zero-padded to the closest multiple
> +of the input size of the hash algorithm's compression function, e.g.
> +64 bytes for SHA-256 or 128 bytes for SHA-512. The padded salt is
> +prepended to every data or Merkle tree block that is hashed.
> +
> +The purpose of the block padding is to cause every hash to be taken
> +over the same amount of data, which simplifies the implementation and
> +keeps open more possibilities for hardware acceleration. The purpose
> +of the salt padding is to make the salting "free" when the salted hash
> +state is precomputed, then imported for each hash.
> +
> +Example: in the recommended configuration of SHA-256 and 4K blocks,
> +128 hash values fit in each block. Thus, each level of the Merkle
> +tree is approximately 128 times smaller than the previous, and for
> +large files the Merkle tree's size converges to approximately 1/127 of
> +the original file size. However, for small files, the padding is
> +significant, making the space overhead proportionally more.
> +
> +fs-verity descriptor
> +--------------------
> +
> +By itself, the Merkle tree root hash is ambiguous. For example, it
> +can't distinguish a large file from a small second file whose data
> +is exactly the top-level hash block of the first file. Ambiguities
> +also arise from the convention of padding to the next block boundary.
> +
> +To solve this problem, the verity file measurement is actually
> +computed as a hash of the following structure, which contains the
> +Merkle tree root hash as well as other fields such as the file size::
> +
> + struct fsverity_descriptor {
> + __u8 version; /* must be 1 */
> + __u8 hash_algorithm; /* Merkle tree hash algorithm */
> + __u8 log_blocksize; /* log2 of size of data and tree blocks */
> + __u8 salt_size; /* size of salt in bytes; 0 if none */
> + __le32 sig_size; /* must be 0 */
> + __le64 data_size; /* size of file the Merkle tree is built over */
> + __u8 root_hash[64]; /* Merkle tree root hash */
> + __u8 salt[32]; /* salt prepended to each hashed block */
> + __u8 __reserved[144]; /* must be 0's */
> + };
> +
> +Note that the ``sig_size`` field must be set to 0 for the purpose of
> +computing the file measurement, even if a signature was provided (or
> +will be provided) to `FS_IOC_ENABLE_VERITY`_.
> +
> +Built-in signature verification
> +===============================
> +
> +With CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y, fs-verity supports putting
> +a portion of an authentication policy (see `Use cases`_) in the
> +kernel. Specifically, it adds support for:
> +
> +1. At fs-verity module initialization time, a keyring ".fs-verity" is
> + created. The root user can add trusted X.509 certificates to this
> + keyring using the add_key() system call, then (when done)
> + optionally use keyctl_restrict_keyring() to prevent additional
> + certificates from being added.
> +
> +2. `FS_IOC_ENABLE_VERITY`_ accepts a pointer to a PKCS#7 formatted
> + signature in DER format of the file measurement. On success, this
> + signature is persisted alongside the Merkle tree. Then, any time
> + the file is opened, the kernel will verify this signature against
> + the certificates in the ".fs-verity" keyring, and verify that it
> + matches the actual file measurement.
> +
> +3. A new sysctl "fs.verity.require_signatures" is made available.
> + When set to 1, the kernel requires that all verity files have a
> + correctly signed file measurement as described in (2).
> +
> +File measurements must be signed in the following format, which is
> +similar to the structure used by `FS_IOC_MEASURE_VERITY`_::
> +
> + struct fsverity_signed_digest {
> + char magic[8]; /* must be "FSVerity" */
> + __le16 digest_algorithm;
> + __le16 digest_size;
> + __u8 digest[];
> + };
> +
> +fs-verity's built-in signature verification support is meant as a
> +relatively simple mechanism that can be used to provide some level of
> +authenticity protection for verity files, as an alternative to doing
> +the signature verification in userspace or using IMA-appraisal.
> +However, with this mechanism, userspace programs still need to check
> +that the verity bit is set, and there is no protection against verity
> +files being swapped around.
> +
> +Filesystem support
> +==================
> +
> +fs-verity is currently supported by the ext4 and f2fs filesystems.
> +The CONFIG_FS_VERITY kconfig option must be enabled to use fs-verity
> +on either filesystem.
> +
> +``include/linux/fsverity.h`` declares the interface between the
> +``fs/verity/`` support layer and filesystems. Briefly, filesystems
> +must provide an ``fsverity_operations`` structure that provides
> +methods to read and write the verity metadata to a filesystem-specific
> +location, including the Merkle tree blocks and
> +``fsverity_descriptor``. Filesystems must also call functions in
> +``fs/verity/`` at certain times, such as when a file is opened or when
> +pages have been read into the pagecache. (See `Verifying data`_.)
> +
> +ext4
> +----
> +
> +ext4 supports fs-verity since Linux TODO and e2fsprogs v1.45.2.
> +
> +To create verity files on an ext4 filesystem, the filesystem must have
> +been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on
> +it. "verity" is an RO_COMPAT filesystem feature, so once set, old
> +kernels will only be able to mount the filesystem readonly, and old
> +versions of e2fsck will be unable to check the filesystem. Moreover,
> +currently ext4 only supports mounting a filesystem with the "verity"
> +feature when its block size is equal to PAGE_SIZE (often 4096 bytes).
> +
> +ext4 sets the EXT4_VERITY_FL on-disk inode flag on verity files. It
> +can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be cleared.
> +
> +ext4 also supports encryption, which can be used simultaneously with
> +fs-verity. In this case, the plaintext data is verified rather than
> +the ciphertext. This is necessary in order to make the file
> +measurement meaningful, since every file is encrypted differently.
> +
> +ext4 stores the verity metadata (Merkle tree and fsverity_descriptor)
> +past the end of the file, starting at the first 64K boundary beyond
> +i_size. This approach works because (a) verity files are readonly,
> +and (b) pages fully beyond i_size aren't visible to userspace but can
> +be read/written internally by ext4 with only some relatively small
> +changes to ext4. This approach avoids having to depend on the
> +EA_INODE feature and on rearchitecting ext4's xattr support to
> +support paging multi-gigabyte xattrs into memory, and to support
> +encrypting xattrs. Note that the verity metadata *must* be encrypted
> +when the file is, since it contains hashes of the plaintext data.
> +
> +Currently, ext4 verity only supports the case where the Merkle tree
> +block size, filesystem block size, and page size are all the same. It
> +also only supports extent-based files.
> +
> +f2fs
> +----
> +
> +f2fs supports fs-verity since Linux TODO and f2fs-tools v1.11.0.
> +
> +To create verity files on an f2fs filesystem, the filesystem must have
> +been formatted with ``-O verity``.
> +
> +f2fs sets the FADVISE_VERITY_BIT on-disk inode flag on verity files.
> +It can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be
> +cleared.
> +
> +Like ext4, f2fs stores the verity metadata (Merkle tree and
> +fsverity_descriptor) past the end of the file, starting at the first
> +64K boundary beyond i_size. See explanation for ext4 above.
> +Moreover, f2fs supports at most 4096 bytes of xattr entries per inode
> +which wouldn't be enough for even a single Merkle tree block.
> +
> +Currently, f2fs verity only supports a Merkle tree block size of 4096.
> +
> +Implementation details
> +======================
> +
> +Verifying data
> +--------------
> +
> +fs-verity ensures that all reads of a verity file's data are verified,
> +regardless of which syscall is used to do the read (e.g. mmap(),
> +read(), pread()) and regardless of whether it's the first read or a
> +later read (unless the later read can return cached data that was
> +already verified). Below, we describe how filesystems implement this.
> +
> +Pagecache
> +~~~~~~~~~
> +
> +For filesystems using Linux's pagecache, the ``->readpage()`` and
> +``->readpages()`` methods must be modified to verify pages before they
> +are marked Uptodate. Merely hooking ``->read_iter()`` would be
> +insufficient, since ``->read_iter()`` is not used for memory maps.
> +
> +Therefore, fs/verity/ provides a function fsverity_verify_page() which
> +verifies a page that has been read into the pagecache of a verity
> +inode, but is still locked and not Uptodate, so it's not yet readable
> +by userspace. As needed to do the verification,
> +fsverity_verify_page() will call back into the filesystem to read
> +Merkle tree pages via fsverity_operations::read_merkle_tree_page().
> +
> +fsverity_verify_page() returns false if verification failed; in this
> +case, the filesystem must not set the page Uptodate. Following this,
> +as per the usual Linux pagecache behavior, attempts by userspace to
> +read() from the part of the file containing the page will fail with
> +EIO, and accesses to the page within a memory map will raise SIGBUS.
> +
> +fsverity_verify_page() currently only supports the case where the
> +Merkle tree block size is equal to PAGE_SIZE (often 4096 bytes).
> +
> +In principle, fsverity_verify_page() verifies the entire path in the
> +Merkle tree from the data page to the root hash. However, for
> +efficiency the filesystem may cache the hash pages. Therefore,
> +fsverity_verify_page() only ascends the tree reading hash pages until
> +an already-verified hash page is seen, as indicated by the PageChecked
> +bit being set. It then verifies the path to that page.
> +
> +This optimization, which is also used by dm-verity, results in
> +excellent sequential read performance. This is because usually (e.g.
> +127 in 128 times for 4K blocks and SHA-256) the hash page from the
> +bottom level of the tree will already be cached and checked from
> +reading a previous data page. However, random reads perform worse.
> +
> +Block device based filesystems
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Block device based filesystems (e.g. ext4 and f2fs) in Linux also use
> +the pagecache, so the above subsection applies too. However, they
> +also usually read many pages from a file at once, grouped into a
> +structure called a "bio". To make it easier for these types of
> +filesystems to support fs-verity, fs/verity/ also provides a function
> +fsverity_verify_bio() which verifies all pages in a bio.
> +
> +ext4 and f2fs also support encryption. If a verity file is also
> +encrypted, the pages must be decrypted before being verified. To
> +support this, these filesystems allocate a "post-read context" for
> +each bio and store it in ``->bi_private``::
> +
> + struct bio_post_read_ctx {
> + struct bio *bio;
> + struct work_struct work;
> + unsigned int cur_step;
> + unsigned int enabled_steps;
> + };
> +
> +``enabled_steps`` is a bitmask that specifies whether decryption,
> +verity, or both is enabled. After the bio completes, for each needed
> +postprocessing step the filesystem enqueues the bio_post_read_ctx on a
> +workqueue, and then the workqueue work does the decryption or
> +verification. Finally, pages where no decryption or verity error
> +occurred are marked Uptodate, and the pages are unlocked.
> +
> +Files on ext4 and f2fs may contain holes. Normally, ``->readpages()``
> +simply zeroes holes and sets the corresponding pages Uptodate; no bios
> +are issued. To prevent this case from bypassing fs-verity, these
> +filesystems use fsverity_verify_page() to verify hole pages.
> +
> +ext4 and f2fs disable direct I/O on verity files, since otherwise
> +direct I/O would bypass fs-verity. (They also do the same for
> +encrypted files.)
> +
> +Userspace utility
> +=================
> +
> +This document focuses on the kernel, but a userspace utility for
> +fs-verity can be found at:
> +
> + https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git
> +
> +See the README.md file in the fsverity-utils source tree for details,
> +including examples of setting up fs-verity protected files.
> +
> +Tests
> +=====
> +
> +To test fs-verity, use xfstests. For example, using `kvm-xfstests
> +<https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md>`_::
> +
> + kvm-xfstests -c ext4,f2fs -g verity
> +
> +FAQ
> +===
> +
> +This section answers frequently asked questions about fs-verity that
> +weren't already directly answered in other parts of this document.
> +
> +:Q: Why isn't fs-verity part of IMA?
> +:A: fs-verity and IMA (Integrity Measurement Architecture) have
> + different focuses. fs-verity is a filesystem-level mechanism for
> + hashing individual files using a Merkle tree. In contrast, IMA
> + specifies a system-wide policy that specifies which files are
> + hashed and what to do with those hashes, such as log them,
> + authenticate them, or add them to a measurement list.
> +
> + IMA is planned to support the fs-verity hashing mechanism as an
> + alternative to doing full file hashes, for people who want the
> + performance and security benefits of the Merkle tree based hash.
> + But it doesn't make sense to force all uses of fs-verity to be
> + through IMA. As a standalone filesystem feature, fs-verity
> + already meets many users' needs, and it's testable like other
> + filesystem features e.g. with xfstests.
> +
> +:Q: Isn't fs-verity useless because the attacker can just modify the
> + hashes in the Merkle tree, which is stored on-disk?
> +:A: To verify the authenticity of an fs-verity file you must verify
> + the authenticity of the "file measurement", which is basically the
> + root hash of the Merkle tree. See `Use cases`_.
> +
> +:Q: Isn't fs-verity useless because the attacker can just replace a
> + verity file with a non-verity one?
> +:A: See `Use cases`_. In the initial use case, it's really trusted
> + userspace code that authenticates the files; fs-verity is just a
> + tool to do this job efficiently and securely. The trusted
> + userspace code will consider non-verity files to be inauthentic.
> +
> +:Q: Why does the Merkle tree need to be stored on-disk? Couldn't you
> + store just the root hash?
> +:A: If the Merkle tree wasn't stored on-disk, then you'd have to
> + compute the entire tree when the file is first accessed, even if
> + just one byte is being read. This is a fundamental consequence of
> + how Merkle tree hashing works. To verify a leaf node, you need to
> + verify the whole path to the root hash, including the root node
> + (the thing which the root hash is a hash of). But if the root
> + node isn't stored on-disk, you have to compute it by hashing its
> + children, and so on until you've actually hashed the entire file.
> +
> + That defeats most of the point of doing a Merkle tree-based hash,
> + since if you have to hash the whole file ahead of time anyway,
> + then you could simply do sha256(file) instead. That would be much
> + simpler, and a bit faster too.
> +
> + It's true that an in-memory Merkle tree could still provide the
> + advantage of verification on every read rather than just on the
> + first read. However, it would be inefficient because every time a
> + hash page gets evicted (you can't pin the entire Merkle tree into
> + memory, since it may be very large), in order to restore it you
> + again need to hash everything below it in the tree. This again
> + defeats most of the point of doing a Merkle tree-based hash, since
> + a single block read could trigger re-hashing gigabytes of data.
> +
> +:Q: But couldn't you store just the leaf nodes and compute the rest?
> +:A: See previous answer; this really just moves up one level, since
> + one could alternatively interpret the data blocks as being the
> + leaf nodes of the Merkle tree. It's true that the tree can be
> + computed much faster if the leaf level is stored rather than just
> + the data, but that's only because each level is less than 1% the
> + size of the level below (assuming the recommended settings of
> + SHA-256 and 4K blocks). For the exact same reason, by storing
> + "just the leaf nodes" you'd already be storing over 99% of the
> + tree, so you might as well simply store the whole tree.
> +
> +:Q: Can the Merkle tree be built ahead of time, e.g. distributed as
> + part of a package that is installed to many computers?
> +:A: This isn't currently supported. It was part of the original
> + design, but was removed to simplify the kernel UAPI and because it
> + wasn't a critical use case. Files are usually installed once and
> + used many times, and cryptographic hashing is somewhat fast on
> + most modern processors.
> +
> +:Q: Why doesn't fs-verity support writes?
> +:A: Write support would be very difficult and would require a
> + completely different design, so it's well outside the scope of
> + fs-verity. Write support would require:
> +
> + - A way to maintain consistency between the data and hashes,
> + including all levels of hashes, since corruption after a crash
> + (especially of potentially the entire file!) is unacceptable.
> + The main options for solving this are data journalling,
> + copy-on-write, and log-structured volume. But it's very hard to
> + retrofit existing filesystems with new consistency mechanisms.
> + Data journalling is available on ext4, but is very slow.
> +
> + - Rebuilding the Merkle tree after every write, which would be
> + extremely inefficient. Alternatively, a different authenticated
> + dictionary structure such as an "authenticated skiplist" could
> + be used. However, this would be far more complex.
> +
> + Compare it to dm-verity vs. dm-integrity. dm-verity is very
> + simple: the kernel just verifies read-only data against a
> + read-only Merkle tree. In contrast, dm-integrity supports writes
> + but is slow, is much more complex, and doesn't actually support
> + full-device authentication since it authenticates each sector
> + independently, i.e. there is no "root hash". It doesn't really
> + make sense for the same device-mapper target to support these two
> + very different cases; the same applies to fs-verity.
> +
> +:Q: Since verity files are immutable, why isn't the immutable bit set?
> +:A: The existing "immutable" bit (FS_IMMUTABLE_FL) already has a
> + specific set of semantics which not only make the file contents
> + read-only, but also prevent the file from being deleted, renamed,
> + linked to, or having its owner or mode changed. These extra
> + properties are unwanted for fs-verity, so reusing the immutable
> + bit isn't appropriate.
> +
> +:Q: Why does the API use ioctls instead of setxattr() and getxattr()?
> +:A: Abusing the xattr interface for basically arbitrary syscalls is
> + heavily frowned upon by most of the Linux filesystem developers.
> + An xattr should really just be an xattr on-disk, not an API to
> + e.g. magically trigger construction of a Merkle tree.
> +
> +:Q: Does fs-verity support remote filesystems?
> +:A: Only ext4 and f2fs support is implemented currently, but in
> + principle any filesystem that can store per-file verity metadata
> + can support fs-verity, regardless of whether it's local or remote.
> + Some filesystems may have fewer options of where to store the
> + verity metadata; one possibility is to store it past the end of
> + the file and "hide" it from userspace by manipulating i_size. The
> + data verification functions provided by ``fs/verity/`` also assume
> + that the filesystem uses the Linux pagecache, but both local and
> + remote filesystems normally do so.
> +
> +:Q: Why is anything filesystem-specific at all? Shouldn't fs-verity
> + be implemented entirely at the VFS level?
> +:A: There are many reasons why this is not possible or would be very
> + difficult, including the following:
> +
> + - To prevent bypassing verification, pages must not be marked
> + Uptodate until they've been verified. Currently, each
> + filesystem is responsible for marking pages Uptodate via
> + ``->readpages()``. Therefore, currently it's not possible for
> + the VFS to do the verification on its own. Changing this would
> + require significant changes to the VFS and all filesystems.
> +
> + - It would require defining a filesystem-independent way to store
> + the verity metadata. Extended attributes don't work for this
> + because (a) the Merkle tree may be gigabytes, but many
> + filesystems assume that all xattrs fit into a single 4K
> + filesystem block, and (b) ext4 and f2fs encryption doesn't
> + encrypt xattrs, yet the Merkle tree *must* be encrypted when the
> + file contents are, because it stores hashes of the plaintext
> + file contents.
> +
> + So the verity metadata would have to be stored in an actual
> + file. Using a separate file would be very ugly, since the
> + metadata is fundamentally part of the file to be protected, and
> + it could cause problems where users could delete the real file
> + but not the metadata file or vice versa. On the other hand,
> + having it be in the same file would break applications unless
> + filesystems' notion of i_size were divorced from the VFS's,
> + which would be complex and require changes to all filesystems.
> +
> + - It's desirable that FS_IOC_ENABLE_VERITY uses the filesystem's
> + transaction mechanism so that either the file ends up with
> + verity enabled, or no changes were made. Allowing intermediate
> + states to occur after a crash may cause problems.
> diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
> index 1131c34d77f6f1..416c7f0e123af7 100644
> --- a/Documentation/filesystems/index.rst
> +++ b/Documentation/filesystems/index.rst
> @@ -31,6 +31,7 @@ filesystem implementations.
>
> journalling
> fscrypt
> + fsverity
>
> Filesystem-specific documentation
> =================================
> --
> 2.22.0.410.gd8fdbe21b5-goog
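
To illustrate the "Merkle tree" section of the documentation above, here
is a small C sketch that computes how many tree blocks and levels a file
of a given size needs. It assumes the recommended configuration of
SHA-256 and 4K blocks. The helper names (div_round_up(),
merkle_tree_blocks()) and the EX_* constants are made up for this
example and are not part of the patchset.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed configuration: SHA-256 digests (32 bytes) and 4K blocks. */
#define EX_BLOCK_SIZE        4096u
#define EX_DIGEST_SIZE       32u
#define EX_HASHES_PER_BLOCK  (EX_BLOCK_SIZE / EX_DIGEST_SIZE)

static_assert(EX_HASHES_PER_BLOCK == 128, "128 hashes fit in each 4K block");

/* Round-up division that cannot overflow for any 64-bit n. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return n / d + (n % d != 0);
}

/*
 * Hypothetical helper: return the number of Merkle tree blocks needed for
 * 'data_size' bytes of file data, and store the number of tree levels in
 * '*num_levels' if it is non-NULL.
 *
 * A file that is empty or fits in a single block needs no tree blocks,
 * matching the special cases described in the documentation.
 */
static uint64_t merkle_tree_blocks(uint64_t data_size, unsigned int *num_levels)
{
	uint64_t blocks = div_round_up(data_size, EX_BLOCK_SIZE);
	uint64_t total = 0;
	unsigned int levels = 0;

	/* Each level holds one hash per block of the level below it. */
	while (blocks > 1) {
		blocks = div_round_up(blocks, EX_HASHES_PER_BLOCK);
		total += blocks;
		levels++;
	}
	if (num_levels)
		*num_levels = levels;
	return total;
}
```

For a 1 GiB file this gives 2065 tree blocks in 3 levels (2048 + 16 + 1),
which is about 1/127 of the 262144 data blocks, in line with the estimate
in the documentation.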

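The "fs-verity descriptor" section above can also be sketched from the
userspace side, e.g. for a tool that wants to compute the file
measurement itself. The struct below is a local mirror of the documented
layout, with byte arrays standing in for the __le32/__le64 fields so that
the result does not depend on host endianness. The function name
fsverity_descriptor_init() is hypothetical, and the values 1 for
hash_algorithm (FS_VERITY_HASH_ALG_SHA256) and 12 for log_blocksize
assume SHA-256 with 4K blocks and no salt. Hashing the resulting 256
bytes is left to whatever SHA-256 implementation the tool uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the documented descriptor (illustration only). */
struct fsverity_descriptor {
	uint8_t version;	/* must be 1 */
	uint8_t hash_algorithm;	/* Merkle tree hash algorithm */
	uint8_t log_blocksize;	/* log2 of size of data and tree blocks */
	uint8_t salt_size;	/* size of salt in bytes; 0 if none */
	uint8_t sig_size[4];	/* __le32; must be 0 for the measurement */
	uint8_t data_size[8];	/* __le64; size of the file in bytes */
	uint8_t root_hash[64];	/* Merkle tree root hash, zero-padded */
	uint8_t salt[32];	/* salt prepended to each hashed block */
	uint8_t reserved[144];	/* must be 0's */
};

/* The documented field sizes add up to exactly 256 bytes. */
static_assert(sizeof(struct fsverity_descriptor) == 256, "unexpected size");
static_assert(offsetof(struct fsverity_descriptor, root_hash) == 16,
	      "unexpected root_hash offset");

/*
 * Hypothetical helper: fill in a descriptor for an unsalted SHA-256 / 4K
 * Merkle tree. Returns 0 on success, or -1 if the hash is too long.
 */
static int fsverity_descriptor_init(struct fsverity_descriptor *desc,
				    uint64_t data_size,
				    const uint8_t *root_hash, size_t hash_len)
{
	if (hash_len > sizeof(desc->root_hash))
		return -1;

	memset(desc, 0, sizeof(*desc));	/* sig_size, salt, reserved stay 0 */
	desc->version = 1;
	desc->hash_algorithm = 1;	/* FS_VERITY_HASH_ALG_SHA256 */
	desc->log_blocksize = 12;	/* 4096-byte blocks */

	/* Store data_size in little-endian byte order. */
	for (int i = 0; i < 8; i++)
		desc->data_size[i] = (uint8_t)(data_size >> (8 * i));

	memcpy(desc->root_hash, root_hash, hash_len);
	return 0;
}
```

Zeroing the whole structure first keeps sig_size and the reserved bytes
at 0, which the documentation requires when computing the measurement.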
2019-06-22 22:12:11

by Jaegeuk Kim

Subject: Re: [PATCH v5 02/16] fs-verity: add MAINTAINERS file entry

On 06/20, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> fs-verity will be jointly maintained by Eric Biggers and Theodore Ts'o.
>
> Reviewed-by: Theodore Ts'o <[email protected]>

Reviewed-by: Jaegeuk Kim <[email protected]>

> Signed-off-by: Eric Biggers <[email protected]>
> ---
> MAINTAINERS | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a6954776a37e70..655065116f9228 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6505,6 +6505,18 @@ S: Maintained
> F: fs/notify/
> F: include/linux/fsnotify*.h
>
> +FSVERITY: READ-ONLY FILE-BASED AUTHENTICITY PROTECTION
> +M: Eric Biggers <[email protected]>
> +M: Theodore Y. Ts'o <[email protected]>
> +L: [email protected]
> +Q: https://patchwork.kernel.org/project/linux-fscrypt/list/
> +T: git git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt.git fsverity
> +S: Supported
> +F: fs/verity/
> +F: include/linux/fsverity.h
> +F: include/uapi/linux/fsverity.h
> +F: Documentation/filesystems/fsverity.rst
> +
> FUJITSU LAPTOP EXTRAS
> M: Jonathan Woithe <[email protected]>
> L: [email protected]
> --
> 2.22.0.410.gd8fdbe21b5-goog

2019-06-22 22:12:25

by Jaegeuk Kim

Subject: Re: [PATCH v5 03/16] fs-verity: add UAPI header

On 06/20, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Add the UAPI header for fs-verity, including two ioctls:
>
> - FS_IOC_ENABLE_VERITY
> - FS_IOC_MEASURE_VERITY
>
> These ioctls are documented in the "User API" section of
> Documentation/filesystems/fsverity.rst.
>
> Examples of using these ioctls can be found in fsverity-utils
> (https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git).
>
> I've also written xfstests that test these ioctls
> (https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/xfstests-dev.git/log/?h=fsverity).
>
> Reviewed-by: Theodore Ts'o <[email protected]>

Reviewed-by: Jaegeuk Kim <[email protected]>

> Signed-off-by: Eric Biggers <[email protected]>
> ---
> Documentation/ioctl/ioctl-number.txt | 1 +
> include/uapi/linux/fsverity.h | 39 ++++++++++++++++++++++++++++
> 2 files changed, 40 insertions(+)
> create mode 100644 include/uapi/linux/fsverity.h
>
> diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
> index c9558146ac5896..21767c81e86d58 100644
> --- a/Documentation/ioctl/ioctl-number.txt
> +++ b/Documentation/ioctl/ioctl-number.txt
> @@ -225,6 +225,7 @@ Code Seq#(hex) Include File Comments
> 'f' 00-0F fs/ext4/ext4.h conflict!
> 'f' 00-0F linux/fs.h conflict!
> 'f' 00-0F fs/ocfs2/ocfs2_fs.h conflict!
> +'f' 81-8F linux/fsverity.h
> 'g' 00-0F linux/usb/gadgetfs.h
> 'g' 20-2F linux/usb/g_printer.h
> 'h' 00-7F conflict! Charon filesystem
> diff --git a/include/uapi/linux/fsverity.h b/include/uapi/linux/fsverity.h
> new file mode 100644
> index 00000000000000..57d1d7fc0c345a
> --- /dev/null
> +++ b/include/uapi/linux/fsverity.h
> @@ -0,0 +1,39 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * fs-verity user API
> + *
> + * These ioctls can be used on filesystems that support fs-verity. See the
> + * "User API" section of Documentation/filesystems/fsverity.rst.
> + *
> + * Copyright 2019 Google LLC
> + */
> +#ifndef _UAPI_LINUX_FSVERITY_H
> +#define _UAPI_LINUX_FSVERITY_H
> +
> +#include <linux/ioctl.h>
> +#include <linux/types.h>
> +
> +#define FS_VERITY_HASH_ALG_SHA256 1
> +
> +struct fsverity_enable_arg {
> + __u32 version;
> + __u32 hash_algorithm;
> + __u32 block_size;
> + __u32 salt_size;
> + __u64 salt_ptr;
> + __u32 sig_size;
> + __u32 __reserved1;
> + __u64 sig_ptr;
> + __u64 __reserved2[11];
> +};
> +
> +struct fsverity_digest {
> + __u16 digest_algorithm;
> + __u16 digest_size; /* input/output */
> + __u8 digest[];
> +};
> +
> +#define FS_IOC_ENABLE_VERITY _IOW('f', 133, struct fsverity_enable_arg)
> +#define FS_IOC_MEASURE_VERITY _IOWR('f', 134, struct fsverity_digest)
> +
> +#endif /* _UAPI_LINUX_FSVERITY_H */
> --
> 2.22.0.410.gd8fdbe21b5-goog
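
For readers who want to see how the UAPI in this patch might be used, here
is a minimal userspace sketch that fills in struct fsverity_enable_arg
and calls FS_IOC_ENABLE_VERITY. The struct and ioctl number are copied
locally from the header above (with <stdint.h> types) so that the
example builds without <linux/fsverity.h>. The function name
enable_verity_sha256_4k() is made up, and version = 1 is an assumption
based on the documentation patch. See fsverity-utils for real usage.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local copy of the UAPI definitions from this patch (illustration only). */
#define FS_VERITY_HASH_ALG_SHA256	1

struct fsverity_enable_arg {
	uint32_t version;
	uint32_t hash_algorithm;
	uint32_t block_size;
	uint32_t salt_size;
	uint64_t salt_ptr;
	uint32_t sig_size;
	uint32_t reserved1;
	uint64_t sig_ptr;
	uint64_t reserved2[11];
};

/* 4*4 + 8 + 4 + 4 + 8 + 11*8 = 128 bytes, with no implicit padding. */
static_assert(sizeof(struct fsverity_enable_arg) == 128, "unexpected size");

#define FS_IOC_ENABLE_VERITY	_IOW('f', 133, struct fsverity_enable_arg)

/*
 * Hypothetical helper: enable verity on 'fd' with SHA-256, 4K blocks,
 * no salt and no signature. Returns the ioctl() result: 0 on success,
 * or -1 with errno set on failure.
 */
static int enable_verity_sha256_4k(int fd)
{
	struct fsverity_enable_arg arg;

	/* Reserved fields and the unused salt/sig pointers must be zero. */
	memset(&arg, 0, sizeof(arg));
	arg.version = 1;	/* assumed, per the documentation patch */
	arg.hash_algorithm = FS_VERITY_HASH_ALG_SHA256;
	arg.block_size = 4096;

	return ioctl(fd, FS_IOC_ENABLE_VERITY, &arg);
}
```

The ioctl only succeeds on a filesystem with verity support, so a caller
would normally check errno on failure rather than assume success.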

2019-06-22 22:17:30

by Jaegeuk Kim

Subject: Re: [PATCH v5 05/16] fs-verity: add Kconfig and the helper functions for hashing

On 06/20, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Add the beginnings of the fs/verity/ support layer, including the
> Kconfig option and various helper functions for hashing. To start, only
> SHA-256 is supported, but other hash algorithms can easily be added.
>
> Reviewed-by: Theodore Ts'o <[email protected]>

Reviewed-by: Jaegeuk Kim <[email protected]>

> Signed-off-by: Eric Biggers <[email protected]>
> ---
> fs/Kconfig | 2 +
> fs/Makefile | 1 +
> fs/verity/Kconfig | 38 +++++
> fs/verity/Makefile | 4 +
> fs/verity/fsverity_private.h | 88 +++++++++++
> fs/verity/hash_algs.c | 274 +++++++++++++++++++++++++++++++++++
> fs/verity/init.c | 41 ++++++
> 7 files changed, 448 insertions(+)
> create mode 100644 fs/verity/Kconfig
> create mode 100644 fs/verity/Makefile
> create mode 100644 fs/verity/fsverity_private.h
> create mode 100644 fs/verity/hash_algs.c
> create mode 100644 fs/verity/init.c
>
> diff --git a/fs/Kconfig b/fs/Kconfig
> index f1046cf6ad85e0..4b66dafbdc7b1c 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -113,6 +113,8 @@ config MANDATORY_FILE_LOCKING
>
> source "fs/crypto/Kconfig"
>
> +source "fs/verity/Kconfig"
> +
> source "fs/notify/Kconfig"
>
> source "fs/quota/Kconfig"
> diff --git a/fs/Makefile b/fs/Makefile
> index c9aea23aba560c..fe7f2c07f482e1 100644
> --- a/fs/Makefile
> +++ b/fs/Makefile
> @@ -34,6 +34,7 @@ obj-$(CONFIG_AIO) += aio.o
> obj-$(CONFIG_IO_URING) += io_uring.o
> obj-$(CONFIG_FS_DAX) += dax.o
> obj-$(CONFIG_FS_ENCRYPTION) += crypto/
> +obj-$(CONFIG_FS_VERITY) += verity/
> obj-$(CONFIG_FILE_LOCKING) += locks.o
> obj-$(CONFIG_COMPAT) += compat.o compat_ioctl.o
> obj-$(CONFIG_BINFMT_AOUT) += binfmt_aout.o
> diff --git a/fs/verity/Kconfig b/fs/verity/Kconfig
> new file mode 100644
> index 00000000000000..c2bca0b01ecfa9
> --- /dev/null
> +++ b/fs/verity/Kconfig
> @@ -0,0 +1,38 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +config FS_VERITY
> + bool "FS Verity (read-only file-based authenticity protection)"
> + select CRYPTO
> + # SHA-256 is selected as it's intended to be the default hash algorithm.
> + # To avoid bloat, other wanted algorithms must be selected explicitly.
> + select CRYPTO_SHA256
> + help
> + This option enables fs-verity. fs-verity is the dm-verity
> + mechanism implemented at the file level. On supported
> + filesystems (currently EXT4 and F2FS), userspace can use an
> + ioctl to enable verity for a file, which causes the filesystem
> + to build a Merkle tree for the file. The filesystem will then
> + transparently verify any data read from the file against the
> + Merkle tree. The file is also made read-only.
> +
> + This serves as an integrity check, but the availability of the
> + Merkle tree root hash also allows efficiently supporting
> + various use cases where normally the whole file would need to
> + be hashed at once, such as: (a) auditing (logging the file's
> + hash), or (b) authenticity verification (comparing the hash
> + against a known good value, e.g. from a digital signature).
> +
> + fs-verity is especially useful on large files where not all
> + the contents may actually be needed. Also, fs-verity verifies
> + data each time it is paged back in, which provides better
> + protection against malicious disks vs. an ahead-of-time hash.
> +
> + If unsure, say N.
> +
> +config FS_VERITY_DEBUG
> + bool "FS Verity debugging"
> + depends on FS_VERITY
> + help
> + Enable debugging messages related to fs-verity by default.
> +
> + Say N unless you are an fs-verity developer.
> diff --git a/fs/verity/Makefile b/fs/verity/Makefile
> new file mode 100644
> index 00000000000000..398f3f85fa184b
> --- /dev/null
> +++ b/fs/verity/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-$(CONFIG_FS_VERITY) += hash_algs.o \
> + init.o
> diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
> new file mode 100644
> index 00000000000000..9697aaebb5dc1f
> --- /dev/null
> +++ b/fs/verity/fsverity_private.h
> @@ -0,0 +1,88 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * fs-verity: read-only file-based authenticity protection
> + *
> + * Copyright 2019 Google LLC
> + */
> +
> +#ifndef _FSVERITY_PRIVATE_H
> +#define _FSVERITY_PRIVATE_H
> +
> +#ifdef CONFIG_FS_VERITY_DEBUG
> +#define DEBUG
> +#endif
> +
> +#define pr_fmt(fmt) "fs-verity: " fmt
> +
> +#include <crypto/sha.h>
> +#include <linux/fs.h>
> +#include <uapi/linux/fsverity.h>
> +
> +struct ahash_request;
> +
> +/*
> + * Implementation limit: maximum depth of the Merkle tree. For now 8 is plenty;
> + * it's enough for over U64_MAX bytes of data using SHA-256 and 4K blocks.
> + */
> +#define FS_VERITY_MAX_LEVELS 8
> +
> +/*
> + * Largest digest size among all hash algorithms supported by fs-verity.
> + * Currently assumed to be <= size of fsverity_descriptor::root_hash.
> + */
> +#define FS_VERITY_MAX_DIGEST_SIZE SHA256_DIGEST_SIZE
> +
> +/* A hash algorithm supported by fs-verity */
> +struct fsverity_hash_alg {
> + struct crypto_ahash *tfm; /* hash tfm, allocated on demand */
> + const char *name; /* crypto API name, e.g. sha256 */
> + unsigned int digest_size; /* digest size in bytes, e.g. 32 for SHA-256 */
> + unsigned int block_size; /* block size in bytes, e.g. 64 for SHA-256 */
> +};
> +
> +/* Merkle tree parameters: hash algorithm, initial hash state, and topology */
> +struct merkle_tree_params {
> + const struct fsverity_hash_alg *hash_alg; /* the hash algorithm */
> + const u8 *hashstate; /* initial hash state or NULL */
> + unsigned int digest_size; /* same as hash_alg->digest_size */
> + unsigned int block_size; /* size of data and tree blocks */
> + unsigned int hashes_per_block; /* number of hashes per tree block */
> + unsigned int log_blocksize; /* log2(block_size) */
> + unsigned int log_arity; /* log2(hashes_per_block) */
> + unsigned int num_levels; /* number of levels in Merkle tree */
> + u64 tree_size; /* Merkle tree size in bytes */
> +
> + /*
> + * Starting block index for each tree level, ordered from leaf level (0)
> + * to root level ('num_levels - 1')
> + */
> + u64 level_start[FS_VERITY_MAX_LEVELS];
> +};
> +
> +/* hash_algs.c */
> +
> +extern struct fsverity_hash_alg fsverity_hash_algs[];
> +
> +const struct fsverity_hash_alg *fsverity_get_hash_alg(const struct inode *inode,
> + unsigned int num);
> +const u8 *fsverity_prepare_hash_state(const struct fsverity_hash_alg *alg,
> + const u8 *salt, size_t salt_size);
> +int fsverity_hash_page(const struct merkle_tree_params *params,
> + const struct inode *inode,
> + struct ahash_request *req, struct page *page, u8 *out);
> +int fsverity_hash_buffer(const struct fsverity_hash_alg *alg,
> + const void *data, size_t size, u8 *out);
> +void __init fsverity_check_hash_algs(void);
> +
> +/* init.c */
> +
> +extern void __printf(3, 4) __cold
> +fsverity_msg(const struct inode *inode, const char *level,
> + const char *fmt, ...);
> +
> +#define fsverity_warn(inode, fmt, ...) \
> + fsverity_msg((inode), KERN_WARNING, fmt, ##__VA_ARGS__)
> +#define fsverity_err(inode, fmt, ...) \
> + fsverity_msg((inode), KERN_ERR, fmt, ##__VA_ARGS__)
> +
> +#endif /* _FSVERITY_PRIVATE_H */
> diff --git a/fs/verity/hash_algs.c b/fs/verity/hash_algs.c
> new file mode 100644
> index 00000000000000..46df17094fc252
> --- /dev/null
> +++ b/fs/verity/hash_algs.c
> @@ -0,0 +1,274 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * fs/verity/hash_algs.c: fs-verity hash algorithms
> + *
> + * Copyright 2019 Google LLC
> + */
> +
> +#include "fsverity_private.h"
> +
> +#include <crypto/hash.h>
> +#include <linux/scatterlist.h>
> +
> +/* The hash algorithms supported by fs-verity */
> +struct fsverity_hash_alg fsverity_hash_algs[] = {
> + [FS_VERITY_HASH_ALG_SHA256] = {
> + .name = "sha256",
> + .digest_size = SHA256_DIGEST_SIZE,
> + .block_size = SHA256_BLOCK_SIZE,
> + },
> +};
> +
> +/**
> + * fsverity_get_hash_alg() - validate and prepare a hash algorithm
> + * @inode: optional inode for logging purposes
> + * @num: the hash algorithm number
> + *
> + * Get the struct fsverity_hash_alg for the given hash algorithm number, and
> + * ensure it has a hash transform ready to go. The hash transforms are
> + * allocated on-demand so that we don't waste resources unnecessarily, and
> + * because the crypto modules may be initialized later than fs/verity/.
> + *
> + * Return: pointer to the hash alg on success, else an ERR_PTR()
> + */
> +const struct fsverity_hash_alg *fsverity_get_hash_alg(const struct inode *inode,
> + unsigned int num)
> +{
> + struct fsverity_hash_alg *alg;
> + struct crypto_ahash *tfm;
> + int err;
> +
> + if (num >= ARRAY_SIZE(fsverity_hash_algs) ||
> + !fsverity_hash_algs[num].name) {
> + fsverity_warn(inode, "Unknown hash algorithm number: %u", num);
> + return ERR_PTR(-EINVAL);
> + }
> + alg = &fsverity_hash_algs[num];
> +
> + /* pairs with cmpxchg() below */
> + tfm = READ_ONCE(alg->tfm);
> + if (likely(tfm != NULL))
> + return alg;
> + /*
> + * Using the shash API would make things a bit simpler, but the ahash
> + * API is preferable as it allows the use of crypto accelerators.
> + */
> + tfm = crypto_alloc_ahash(alg->name, 0, 0);
> + if (IS_ERR(tfm)) {
> + if (PTR_ERR(tfm) == -ENOENT)
> + fsverity_warn(inode,
> + "Missing crypto API support for hash algorithm \"%s\"",
> + alg->name);
> + else
> + fsverity_err(inode,
> + "Error allocating hash algorithm \"%s\": %ld",
> + alg->name, PTR_ERR(tfm));
> + return ERR_CAST(tfm);
> + }
> +
> + err = -EINVAL;
> + if (WARN_ON(alg->digest_size != crypto_ahash_digestsize(tfm)))
> + goto err_free_tfm;
> + if (WARN_ON(alg->block_size != crypto_ahash_blocksize(tfm)))
> + goto err_free_tfm;
> +
> + pr_info("%s using implementation \"%s\"\n",
> + alg->name, crypto_ahash_driver_name(tfm));
> +
> + /* pairs with READ_ONCE() above */
> + if (cmpxchg(&alg->tfm, NULL, tfm) != NULL)
> + crypto_free_ahash(tfm);
> +
> + return alg;
> +
> +err_free_tfm:
> + crypto_free_ahash(tfm);
> + return ERR_PTR(err);
> +}
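
For illustration, the allocate-on-demand pattern in fsverity_get_hash_alg() above (read the slot, allocate if it is empty, then publish with a compare-and-exchange and free the copy that loses the race) can be modelled in userspace with C11 atomics. This is only a sketch under stated assumptions: `struct tfm`, `cached_tfm` and `get_tfm()` are hypothetical stand-ins, and the acquire load here is a userspace substitute for the kernel's READ_ONCE()/cmpxchg() pairing, not the patch's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct crypto_ahash. */
struct tfm {
	int id;
};

/* Shared slot; NULL until some caller installs a transform. */
static _Atomic(struct tfm *) cached_tfm;

/* Return the shared transform, allocating it on first use. */
static struct tfm *get_tfm(void)
{
	struct tfm *cur = atomic_load_explicit(&cached_tfm,
					       memory_order_acquire);
	struct tfm *expected = NULL;
	struct tfm *new;

	if (cur)
		return cur;		/* fast path: already installed */

	new = malloc(sizeof(*new));
	if (!new)
		return NULL;
	new->id = 1;

	/* Only one racing caller wins; a loser frees its own copy. */
	if (!atomic_compare_exchange_strong(&cached_tfm, &expected, new)) {
		free(new);
		return expected;	/* now holds the winner's pointer */
	}
	return new;
}
```

Every caller ends up with the same pointer, and no lock is needed on the fast path.
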
> +
> +/**
> + * fsverity_prepare_hash_state() - precompute the initial hash state
> + * @alg: hash algorithm
> + * @salt: a salt which is to be prepended to all data to be hashed
> + * @salt_size: salt size in bytes, possibly 0
> + *
> + * Return: NULL if the salt is empty, otherwise the kmalloc()'ed precomputed
> + * initial hash state on success or an ERR_PTR() on failure.
> + */
> +const u8 *fsverity_prepare_hash_state(const struct fsverity_hash_alg *alg,
> + const u8 *salt, size_t salt_size)
> +{
> + u8 *hashstate = NULL;
> + struct ahash_request *req = NULL;
> + u8 *padded_salt = NULL;
> + size_t padded_salt_size;
> + struct scatterlist sg;
> + DECLARE_CRYPTO_WAIT(wait);
> + int err;
> +
> + if (salt_size == 0)
> + return NULL;
> +
> + hashstate = kmalloc(crypto_ahash_statesize(alg->tfm), GFP_KERNEL);
> + if (!hashstate)
> + return ERR_PTR(-ENOMEM);
> +
> + req = ahash_request_alloc(alg->tfm, GFP_KERNEL);
> + if (!req) {
> + err = -ENOMEM;
> + goto err_free;
> + }
> +
> + /*
> + * Zero-pad the salt to the next multiple of the input size of the hash
> + * algorithm's compression function, e.g. 64 bytes for SHA-256 or 128
> + * bytes for SHA-512. This ensures that the hash algorithm won't have
> + * any bytes buffered internally after processing the salt, thus making
> + * salted hashing just as fast as unsalted hashing.
> + */
> + padded_salt_size = round_up(salt_size, alg->block_size);
> + padded_salt = kzalloc(padded_salt_size, GFP_KERNEL);
> + if (!padded_salt) {
> + err = -ENOMEM;
> + goto err_free;
> + }
> + memcpy(padded_salt, salt, salt_size);
> +
> + sg_init_one(&sg, padded_salt, padded_salt_size);
> + ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
> + CRYPTO_TFM_REQ_MAY_BACKLOG,
> + crypto_req_done, &wait);
> + ahash_request_set_crypt(req, &sg, NULL, padded_salt_size);
> +
> + err = crypto_wait_req(crypto_ahash_init(req), &wait);
> + if (err)
> + goto err_free;
> +
> + err = crypto_wait_req(crypto_ahash_update(req), &wait);
> + if (err)
> + goto err_free;
> +
> + err = crypto_ahash_export(req, hashstate);
> + if (err)
> + goto err_free;
> +out:
> + kfree(padded_salt);
> + ahash_request_free(req);
> + return hashstate;
> +
> +err_free:
> + kfree(hashstate);
> + hashstate = ERR_PTR(err);
> + goto out;
> +}
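
As a rough userspace sketch of the salt padding step in fsverity_prepare_hash_state() above: the salt is copied into a zeroed buffer whose length is rounded up to the hash's compression block size (64 bytes for SHA-256). The helper names `round_up_pow2()` and `pad_salt()` are hypothetical, and the hashing itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Round n up to a multiple of align; align must be a power of 2. */
static size_t round_up_pow2(size_t n, size_t align)
{
	return (n + align - 1) & ~(align - 1);
}

/*
 * Return a zero-padded copy of the salt and store its length in *out_len.
 * Returns NULL (with *out_len == 0) for an empty salt or on allocation
 * failure.  The caller frees the result.
 */
static unsigned char *pad_salt(const unsigned char *salt, size_t salt_size,
			       size_t block_size, size_t *out_len)
{
	unsigned char *padded;

	*out_len = 0;
	if (salt_size == 0)
		return NULL;

	padded = calloc(1, round_up_pow2(salt_size, block_size));
	if (!padded)
		return NULL;

	memcpy(padded, salt, salt_size);
	*out_len = round_up_pow2(salt_size, block_size);
	return padded;
}
```

With a 20-byte salt and a 64-byte block size, the padded buffer is 64 bytes: the salt followed by 44 zero bytes, so the hash has nothing buffered once the salt has been absorbed.
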
> +
> +/**
> + * fsverity_hash_page() - hash a single data or hash page
> + * @params: the Merkle tree's parameters
> + * @inode: inode for which the hashing is being done
> + * @req: preallocated hash request
> + * @page: the page to hash
> + * @out: output digest, size 'params->digest_size' bytes
> + *
> + * Hash a single data or hash block, assuming block_size == PAGE_SIZE.
> + * The hash is salted if a salt is specified in the Merkle tree parameters.
> + *
> + * Return: 0 on success, -errno on failure
> + */
> +int fsverity_hash_page(const struct merkle_tree_params *params,
> + const struct inode *inode,
> + struct ahash_request *req, struct page *page, u8 *out)
> +{
> + struct scatterlist sg;
> + DECLARE_CRYPTO_WAIT(wait);
> + int err;
> +
> + if (WARN_ON(params->block_size != PAGE_SIZE))
> + return -EINVAL;
> +
> + sg_init_table(&sg, 1);
> + sg_set_page(&sg, page, PAGE_SIZE, 0);
> + ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
> + CRYPTO_TFM_REQ_MAY_BACKLOG,
> + crypto_req_done, &wait);
> + ahash_request_set_crypt(req, &sg, out, PAGE_SIZE);
> +
> + if (params->hashstate) {
> + err = crypto_ahash_import(req, params->hashstate);
> + if (err) {
> + fsverity_err(inode,
> + "Error %d importing hash state", err);
> + return err;
> + }
> + err = crypto_ahash_finup(req);
> + } else {
> + err = crypto_ahash_digest(req);
> + }
> +
> + err = crypto_wait_req(err, &wait);
> + if (err)
> + fsverity_err(inode, "Error %d computing page hash", err);
> + return err;
> +}
> +
> +/**
> + * fsverity_hash_buffer() - hash some data
> + * @alg: the hash algorithm to use
> + * @data: the data to hash
> + * @size: size of data to hash
> + * @out: output digest, size 'alg->digest_size' bytes
> + *
> + * Hash some data which is located in physically contiguous memory (i.e. memory
> + * allocated by kmalloc(), not by vmalloc()). No salt is used.
> + *
> + * Return: 0 on success, -errno on failure
> + */
> +int fsverity_hash_buffer(const struct fsverity_hash_alg *alg,
> + const void *data, size_t size, u8 *out)
> +{
> + struct ahash_request *req;
> + struct scatterlist sg;
> + DECLARE_CRYPTO_WAIT(wait);
> + int err;
> +
> + req = ahash_request_alloc(alg->tfm, GFP_KERNEL);
> + if (!req)
> + return -ENOMEM;
> +
> + sg_init_one(&sg, data, size);
> + ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
> + CRYPTO_TFM_REQ_MAY_BACKLOG,
> + crypto_req_done, &wait);
> + ahash_request_set_crypt(req, &sg, out, size);
> +
> + err = crypto_wait_req(crypto_ahash_digest(req), &wait);
> +
> + ahash_request_free(req);
> + return err;
> +}
> +
> +void __init fsverity_check_hash_algs(void)
> +{
> + size_t i;
> +
> + /*
> + * Sanity check the hash algorithms (could be a build-time check, but
> + * they're in an array)
> + */
> + for (i = 0; i < ARRAY_SIZE(fsverity_hash_algs); i++) {
> + const struct fsverity_hash_alg *alg = &fsverity_hash_algs[i];
> +
> + if (!alg->name)
> + continue;
> +
> + BUG_ON(alg->digest_size > FS_VERITY_MAX_DIGEST_SIZE);
> +
> + /*
> + * For efficiency, the implementation currently assumes the
> + * digest and block sizes are powers of 2. This limitation can
> + * be lifted if the code is updated to handle other values.
> + */
> + BUG_ON(!is_power_of_2(alg->digest_size));
> + BUG_ON(!is_power_of_2(alg->block_size));
> + }
> +}
> diff --git a/fs/verity/init.c b/fs/verity/init.c
> new file mode 100644
> index 00000000000000..40076bbe452a48
> --- /dev/null
> +++ b/fs/verity/init.c
> @@ -0,0 +1,41 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * fs/verity/init.c: fs-verity module initialization and logging
> + *
> + * Copyright 2019 Google LLC
> + */
> +
> +#include "fsverity_private.h"
> +
> +#include <linux/ratelimit.h>
> +
> +void fsverity_msg(const struct inode *inode, const char *level,
> + const char *fmt, ...)
> +{
> + static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
> + DEFAULT_RATELIMIT_BURST);
> + struct va_format vaf;
> + va_list args;
> +
> + if (!__ratelimit(&rs))
> + return;
> +
> + va_start(args, fmt);
> + vaf.fmt = fmt;
> + vaf.va = &args;
> + if (inode)
> + printk("%sfs-verity (%s, inode %lu): %pV\n",
> + level, inode->i_sb->s_id, inode->i_ino, &vaf);
> + else
> + printk("%sfs-verity: %pV\n", level, &vaf);
> + va_end(args);
> +}
> +
> +static int __init fsverity_init(void)
> +{
> + fsverity_check_hash_algs();
> +
> + pr_debug("Initialized fs-verity\n");
> + return 0;
> +}
> +late_initcall(fsverity_init)
> --
> 2.22.0.410.gd8fdbe21b5-goog

2019-06-22 22:18:42

by Jaegeuk Kim

Subject: Re: [PATCH v5 06/16] fs-verity: add inode and superblock fields

On 06/20, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Analogous to fs/crypto/, add fields to the VFS inode and superblock for
> use by the fs/verity/ support layer:
>
> - ->s_vop: points to the fsverity_operations if the filesystem supports
> fs-verity, otherwise is NULL.
>
> - ->i_verity_info: points to cached fs-verity information for the inode
> after someone opens it, otherwise is NULL.
>
> - S_VERITY: bit in ->i_flags that identifies verity inodes, even when
> they haven't been opened yet and thus still have NULL ->i_verity_info.
>
> Reviewed-by: Theodore Ts'o <[email protected]>

Reviewed-by: Jaegeuk Kim <[email protected]>

> Signed-off-by: Eric Biggers <[email protected]>
> ---
> include/linux/fs.h | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index f7fdfe93e25d3e..a80a192cdcf285 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -64,6 +64,8 @@ struct workqueue_struct;
> struct iov_iter;
> struct fscrypt_info;
> struct fscrypt_operations;
> +struct fsverity_info;
> +struct fsverity_operations;
> struct fs_context;
> struct fs_parameter_description;
>
> @@ -723,6 +725,10 @@ struct inode {
> struct fscrypt_info *i_crypt_info;
> #endif
>
> +#ifdef CONFIG_FS_VERITY
> + struct fsverity_info *i_verity_info;
> +#endif
> +
> void *i_private; /* fs or device private pointer */
> } __randomize_layout;
>
> @@ -1429,6 +1435,9 @@ struct super_block {
> const struct xattr_handler **s_xattr;
> #ifdef CONFIG_FS_ENCRYPTION
> const struct fscrypt_operations *s_cop;
> +#endif
> +#ifdef CONFIG_FS_VERITY
> + const struct fsverity_operations *s_vop;
> #endif
> struct hlist_bl_head s_roots; /* alternate root dentries for NFS */
> struct list_head s_mounts; /* list of mounts; _not_ for fs use */
> @@ -1964,6 +1973,7 @@ struct super_operations {
> #endif
> #define S_ENCRYPTED 16384 /* Encrypted file (using fs/crypto/) */
> #define S_CASEFOLD 32768 /* Casefolded file */
> +#define S_VERITY 65536 /* Verity file (using fs/verity/) */
>
> /*
> * Note that nosuid etc flags are inode-specific: setting some file-system
> @@ -2005,6 +2015,7 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags
> #define IS_DAX(inode) ((inode)->i_flags & S_DAX)
> #define IS_ENCRYPTED(inode) ((inode)->i_flags & S_ENCRYPTED)
> #define IS_CASEFOLDED(inode) ((inode)->i_flags & S_CASEFOLD)
> +#define IS_VERITY(inode) ((inode)->i_flags & S_VERITY)
>
> #define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \
> (inode)->i_rdev == WHITEOUT_DEV)
> --
> 2.22.0.410.gd8fdbe21b5-goog

2019-06-22 22:29:41

by Jaegeuk Kim

Subject: Re: [PATCH v5 08/16] fs-verity: add the hook for file ->setattr()

On 06/20, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Add a function fsverity_prepare_setattr() which filesystems that support
> fs-verity must call to deny truncates of verity files.
>
> Reviewed-by: Theodore Ts'o <[email protected]>

Reviewed-by: Jaegeuk Kim <[email protected]>

> Signed-off-by: Eric Biggers <[email protected]>
> ---
> fs/verity/open.c | 21 +++++++++++++++++++++
> include/linux/fsverity.h | 7 +++++++
> 2 files changed, 28 insertions(+)
>
> diff --git a/fs/verity/open.c b/fs/verity/open.c
> index 3a3bb27e23f5e3..21ae0ef254a695 100644
> --- a/fs/verity/open.c
> +++ b/fs/verity/open.c
> @@ -296,6 +296,27 @@ int fsverity_file_open(struct inode *inode, struct file *filp)
> }
> EXPORT_SYMBOL_GPL(fsverity_file_open);
>
> +/**
> + * fsverity_prepare_setattr - prepare to change a verity inode's attributes
> + * @dentry: dentry through which the inode is being changed
> + * @attr: attributes to change
> + *
> + * Verity files are immutable, so deny truncates. This isn't covered by the
> + * open-time check because sys_truncate() takes a path, not a file descriptor.
> + *
> + * Return: 0 on success, -errno on failure
> + */
> +int fsverity_prepare_setattr(struct dentry *dentry, struct iattr *attr)
> +{
> + if (IS_VERITY(d_inode(dentry)) && (attr->ia_valid & ATTR_SIZE)) {
> + pr_debug("Denying truncate of verity file (ino %lu)\n",
> + d_inode(dentry)->i_ino);
> + return -EPERM;
> + }
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(fsverity_prepare_setattr);
> +
> /**
> * fsverity_cleanup_inode - free the inode's verity info, if present
> *
> diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
> index 1372c236c8770c..cbcc358d073652 100644
> --- a/include/linux/fsverity.h
> +++ b/include/linux/fsverity.h
> @@ -46,6 +46,7 @@ static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
> /* open.c */
>
> extern int fsverity_file_open(struct inode *inode, struct file *filp);
> +extern int fsverity_prepare_setattr(struct dentry *dentry, struct iattr *attr);
> extern void fsverity_cleanup_inode(struct inode *inode);
>
> #else /* !CONFIG_FS_VERITY */
> @@ -62,6 +63,12 @@ static inline int fsverity_file_open(struct inode *inode, struct file *filp)
> return IS_VERITY(inode) ? -EOPNOTSUPP : 0;
> }
>
> +static inline int fsverity_prepare_setattr(struct dentry *dentry,
> + struct iattr *attr)
> +{
> + return IS_VERITY(d_inode(dentry)) ? -EOPNOTSUPP : 0;
> +}
> +
> static inline void fsverity_cleanup_inode(struct inode *inode)
> {
> }
> --
> 2.22.0.410.gd8fdbe21b5-goog

2019-06-22 22:30:57

by Jaegeuk Kim

Subject: Re: [PATCH v5 07/16] fs-verity: add the hook for file ->open()

On 06/20, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Add the fsverity_file_open() function, which prepares an fs-verity file
> to be read from. If not already done, it loads the fs-verity descriptor
> from the filesystem and sets up an fsverity_info structure for the inode
> which describes the Merkle tree and contains the file measurement. It
> also denies all attempts to open verity files for writing.
>
> This commit also begins the include/linux/fsverity.h header, which
> declares the interface between fs/verity/ and filesystems.
>
> Reviewed-by: Theodore Ts'o <[email protected]>

Reviewed-by: Jaegeuk Kim <[email protected]>

> Signed-off-by: Eric Biggers <[email protected]>
> ---
> fs/verity/Makefile | 3 +-
> fs/verity/fsverity_private.h | 54 +++++-
> fs/verity/init.c | 6 +
> fs/verity/open.c | 319 +++++++++++++++++++++++++++++++++++
> include/linux/fsverity.h | 71 ++++++++
> 5 files changed, 450 insertions(+), 3 deletions(-)
> create mode 100644 fs/verity/open.c
> create mode 100644 include/linux/fsverity.h
>
> diff --git a/fs/verity/Makefile b/fs/verity/Makefile
> index 398f3f85fa184b..e6a8951c493a5e 100644
> --- a/fs/verity/Makefile
> +++ b/fs/verity/Makefile
> @@ -1,4 +1,5 @@
> # SPDX-License-Identifier: GPL-2.0
>
> obj-$(CONFIG_FS_VERITY) += hash_algs.o \
> - init.o
> + init.o \
> + open.o
> diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
> index 9697aaebb5dc1f..c79746ff335e14 100644
> --- a/fs/verity/fsverity_private.h
> +++ b/fs/verity/fsverity_private.h
> @@ -15,8 +15,7 @@
> #define pr_fmt(fmt) "fs-verity: " fmt
>
> #include <crypto/sha.h>
> -#include <linux/fs.h>
> -#include <uapi/linux/fsverity.h>
> +#include <linux/fsverity.h>
>
> struct ahash_request;
>
> @@ -59,6 +58,40 @@ struct merkle_tree_params {
> u64 level_start[FS_VERITY_MAX_LEVELS];
> };
>
> +/**
> + * fsverity_info - cached verity metadata for an inode
> + *
> + * When a verity file is first opened, an instance of this struct is allocated
> + * and stored in ->i_verity_info; it remains until the inode is evicted. It
> + * caches information about the Merkle tree that's needed to efficiently verify
> + * data read from the file. It also caches the file measurement. The Merkle
> + * tree pages themselves are not cached here, but the filesystem may cache them.
> + */
> +struct fsverity_info {
> + struct merkle_tree_params tree_params;
> + u8 root_hash[FS_VERITY_MAX_DIGEST_SIZE];
> + u8 measurement[FS_VERITY_MAX_DIGEST_SIZE];
> + const struct inode *inode;
> +};
> +
> +/*
> + * Merkle tree properties. The file measurement is the hash of this structure.
> + */
> +struct fsverity_descriptor {
> + __u8 version; /* must be 1 */
> + __u8 hash_algorithm; /* Merkle tree hash algorithm */
> + __u8 log_blocksize; /* log2 of size of data and tree blocks */
> + __u8 salt_size; /* size of salt in bytes; 0 if none */
> + __le32 sig_size; /* reserved, must be 0 */
> + __le64 data_size; /* size of file the Merkle tree is built over */
> + __u8 root_hash[64]; /* Merkle tree root hash */
> + __u8 salt[32]; /* salt prepended to each hashed block */
> + __u8 __reserved[144]; /* must be 0's */
> +};
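
Since the file measurement is the hash of this structure, its byte layout matters. A small userspace mirror can check that the fields as quoted add up to 256 bytes with no implicit padding on common ABIs. This is only a sketch: `struct fsverity_descriptor_mirror` is a hypothetical name, and the `__le32`/`__le64` fields are represented as plain fixed-width integers, so endianness is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the quoted struct fsverity_descriptor. */
struct fsverity_descriptor_mirror {
	uint8_t  version;		/* must be 1 */
	uint8_t  hash_algorithm;
	uint8_t  log_blocksize;
	uint8_t  salt_size;
	uint32_t sig_size;		/* __le32 in the patch */
	uint64_t data_size;		/* __le64 in the patch */
	uint8_t  root_hash[64];
	uint8_t  salt[32];
	uint8_t  reserved[144];
};

/* 4 + 4 + 8 + 64 + 32 + 144 = 256 */
static_assert(sizeof(struct fsverity_descriptor_mirror) == 256,
	      "descriptor mirror should be 256 bytes");
static_assert(offsetof(struct fsverity_descriptor_mirror, data_size) == 8,
	      "data_size should be at offset 8");
static_assert(offsetof(struct fsverity_descriptor_mirror, root_hash) == 16,
	      "root_hash should be at offset 16");
static_assert(offsetof(struct fsverity_descriptor_mirror, salt) == 80,
	      "salt should be at offset 80");
static_assert(offsetof(struct fsverity_descriptor_mirror, reserved) == 112,
	      "reserved should be at offset 112");
```
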
> +
> +/* Arbitrary limit to bound the kmalloc() size. Can be changed. */
> +#define FS_VERITY_MAX_DESCRIPTOR_SIZE 16384
> +
> /* hash_algs.c */
>
> extern struct fsverity_hash_alg fsverity_hash_algs[];
> @@ -85,4 +118,21 @@ fsverity_msg(const struct inode *inode, const char *level,
> #define fsverity_err(inode, fmt, ...) \
> fsverity_msg((inode), KERN_ERR, fmt, ##__VA_ARGS__)
>
> +/* open.c */
> +
> +int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
> + const struct inode *inode,
> + unsigned int hash_algorithm,
> + unsigned int log_blocksize,
> + const u8 *salt, size_t salt_size);
> +
> +struct fsverity_info *fsverity_create_info(const struct inode *inode,
> + const void *desc, size_t desc_size);
> +
> +void fsverity_set_info(struct inode *inode, struct fsverity_info *vi);
> +
> +void fsverity_free_info(struct fsverity_info *vi);
> +
> +int __init fsverity_init_info_cache(void);
> +
> #endif /* _FSVERITY_PRIVATE_H */
> diff --git a/fs/verity/init.c b/fs/verity/init.c
> index 40076bbe452a48..fff1fd6343357d 100644
> --- a/fs/verity/init.c
> +++ b/fs/verity/init.c
> @@ -33,8 +33,14 @@ void fsverity_msg(const struct inode *inode, const char *level,
>
> static int __init fsverity_init(void)
> {
> + int err;
> +
> fsverity_check_hash_algs();
>
> + err = fsverity_init_info_cache();
> + if (err)
> + return err;
> +
> pr_debug("Initialized fs-verity\n");
> return 0;
> }
> diff --git a/fs/verity/open.c b/fs/verity/open.c
> new file mode 100644
> index 00000000000000..3a3bb27e23f5e3
> --- /dev/null
> +++ b/fs/verity/open.c
> @@ -0,0 +1,319 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * fs/verity/open.c: opening fs-verity files
> + *
> + * Copyright 2019 Google LLC
> + */
> +
> +#include "fsverity_private.h"
> +
> +#include <linux/slab.h>
> +
> +static struct kmem_cache *fsverity_info_cachep;
> +
> +/**
> + * fsverity_init_merkle_tree_params() - initialize Merkle tree parameters
> + * @params: the parameters struct to initialize
> + * @inode: the inode for which the Merkle tree is being built
> + * @hash_algorithm: number of hash algorithm to use
> + * @log_blocksize: log base 2 of block size to use
> + * @salt: pointer to salt (optional)
> + * @salt_size: size of salt, possibly 0
> + *
> + * Validate the hash algorithm and block size, then compute the tree topology
> + * (num levels, num blocks in each level, etc.) and initialize @params.
> + *
> + * Return: 0 on success, -errno on failure
> + */
> +int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
> + const struct inode *inode,
> + unsigned int hash_algorithm,
> + unsigned int log_blocksize,
> + const u8 *salt, size_t salt_size)
> +{
> + const struct fsverity_hash_alg *hash_alg;
> + int err;
> + u64 blocks;
> + u64 offset;
> + int level;
> +
> + memset(params, 0, sizeof(*params));
> +
> + hash_alg = fsverity_get_hash_alg(inode, hash_algorithm);
> + if (IS_ERR(hash_alg))
> + return PTR_ERR(hash_alg);
> + params->hash_alg = hash_alg;
> + params->digest_size = hash_alg->digest_size;
> +
> + params->hashstate = fsverity_prepare_hash_state(hash_alg, salt,
> + salt_size);
> + if (IS_ERR(params->hashstate)) {
> + err = PTR_ERR(params->hashstate);
> + params->hashstate = NULL;
> + fsverity_err(inode, "Error %d preparing hash state", err);
> + goto out_err;
> + }
> +
> + if (log_blocksize != PAGE_SHIFT) {
> + fsverity_warn(inode, "Unsupported log_blocksize: %u",
> + log_blocksize);
> + err = -EINVAL;
> + goto out_err;
> + }
> + params->log_blocksize = log_blocksize;
> + params->block_size = 1 << log_blocksize;
> +
> + if (WARN_ON(!is_power_of_2(params->digest_size))) {
> + err = -EINVAL;
> + goto out_err;
> + }
> + if (params->block_size < 2 * params->digest_size) {
> + fsverity_warn(inode,
> + "Merkle tree block size (%u) too small for hash algorithm \"%s\"",
> + params->block_size, hash_alg->name);
> + err = -EINVAL;
> + goto out_err;
> + }
> + params->log_arity = params->log_blocksize - ilog2(params->digest_size);
> + params->hashes_per_block = 1 << params->log_arity;
> +
> + pr_debug("Merkle tree uses %s with %u-byte blocks (%u hashes/block), salt=%*phN\n",
> + hash_alg->name, params->block_size, params->hashes_per_block,
> + (int)salt_size, salt);
> +
> + /*
> + * Compute the number of levels in the Merkle tree and create a map from
> + * level to the starting block of that level. Level 'num_levels - 1' is
> + * the root and is stored first. Level 0 is the level directly "above"
> + * the data blocks and is stored last.
> + */
> +
> + /* Compute number of levels and the number of blocks in each level */
> + blocks = (inode->i_size + params->block_size - 1) >> log_blocksize;
> + pr_debug("Data is %lld bytes (%llu blocks)\n", inode->i_size, blocks);
> + while (blocks > 1) {
> + if (params->num_levels >= FS_VERITY_MAX_LEVELS) {
> + fsverity_err(inode, "Too many levels in Merkle tree");
> + err = -EINVAL;
> + goto out_err;
> + }
> + blocks = (blocks + params->hashes_per_block - 1) >>
> + params->log_arity;
> + /* temporarily using level_start[] to store blocks in level */
> + params->level_start[params->num_levels++] = blocks;
> + }
> +
> + /* Compute the starting block of each level */
> + offset = 0;
> + for (level = (int)params->num_levels - 1; level >= 0; level--) {
> + blocks = params->level_start[level];
> + params->level_start[level] = offset;
> + pr_debug("Level %d is %llu blocks starting at index %llu\n",
> + level, blocks, offset);
> + offset += blocks;
> + }
> +
> + params->tree_size = offset << log_blocksize;
> + return 0;
> +
> +out_err:
> + kfree(params->hashstate);
> + memset(params, 0, sizeof(*params));
> + return err;
> +}
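
The topology computation in fsverity_init_merkle_tree_params() above can be tried out in userspace. The following is a minimal sketch of the same two loops, assuming hypothetical names (`struct tree_layout`, `compute_layout()`, `MAX_LEVELS`); it takes the log2 of the digest size as a parameter instead of looking up a hash algorithm, and it does not guard against overflow for file sizes near UINT64_MAX.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LEVELS 8	/* mirrors FS_VERITY_MAX_LEVELS */

struct tree_layout {
	unsigned int num_levels;
	uint64_t level_start[MAX_LEVELS];	/* in blocks; level 0 = leaves */
	uint64_t tree_size;			/* in bytes */
};

/* Returns 0 on success, -1 if the tree would need too many levels. */
static int compute_layout(struct tree_layout *t, uint64_t file_size,
			  unsigned int log_blocksize,
			  unsigned int log_digestsize)
{
	unsigned int log_arity = log_blocksize - log_digestsize;
	uint64_t hashes_per_block = 1ULL << log_arity;
	uint64_t block_size = 1ULL << log_blocksize;
	uint64_t blocks, offset = 0;
	int level;

	t->num_levels = 0;

	/* Number of blocks in each level, from the leaves up to the root */
	blocks = (file_size + block_size - 1) >> log_blocksize;
	while (blocks > 1) {
		if (t->num_levels >= MAX_LEVELS)
			return -1;
		blocks = (blocks + hashes_per_block - 1) >> log_arity;
		/* temporarily holds the block count of this level */
		t->level_start[t->num_levels++] = blocks;
	}

	/* The root level is stored first, so assign offsets top-down */
	for (level = (int)t->num_levels - 1; level >= 0; level--) {
		blocks = t->level_start[level];
		t->level_start[level] = offset;
		offset += blocks;
	}

	t->tree_size = offset << log_blocksize;
	return 0;
}
```

For a 1 MiB file with 4K blocks and 32-byte digests (128 hashes per block), there are 256 data blocks, so level 0 needs 2 blocks and level 1 (the root) needs 1. The root is stored at block 0, level 0 starts at block 1, and the tree occupies 3 blocks, i.e. 12288 bytes. A file of at most one block yields zero levels and an empty tree.
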
> +
> +/* Compute the file measurement by hashing the fsverity_descriptor. */
> +static int compute_file_measurement(const struct fsverity_hash_alg *hash_alg,
> + const struct fsverity_descriptor *desc,
> + u8 *measurement)
> +{
> + return fsverity_hash_buffer(hash_alg, desc, sizeof(*desc), measurement);
> +}
> +
> +/*
> + * Validate the given fsverity_descriptor and create a new fsverity_info from
> + * it.
> + */
> +struct fsverity_info *fsverity_create_info(const struct inode *inode,
> + const void *_desc, size_t desc_size)
> +{
> + const struct fsverity_descriptor *desc = _desc;
> + struct fsverity_info *vi;
> + int err;
> +
> + if (desc_size < sizeof(*desc)) {
> + fsverity_err(inode, "Unrecognized descriptor size (%zu)",
> + desc_size);
> + return ERR_PTR(-EINVAL);
> + }
> +
> + if (desc->version != 1) {
> + fsverity_err(inode, "Unrecognized descriptor version: %u",
> + desc->version);
> + return ERR_PTR(-EINVAL);
> + }
> +
> + if (desc->sig_size ||
> + memchr_inv(desc->__reserved, 0, sizeof(desc->__reserved))) {
> + fsverity_err(inode, "Reserved bits set in descriptor");
> + return ERR_PTR(-EINVAL);
> + }
> +
> + if (desc->salt_size > sizeof(desc->salt)) {
> + fsverity_err(inode, "Invalid salt_size: %u", desc->salt_size);
> + return ERR_PTR(-EINVAL);
> + }
> +
> + if (le64_to_cpu(desc->data_size) != inode->i_size) {
> + fsverity_err(inode,
> + "Wrong data_size: %llu (desc) != %lld (inode)",
> + le64_to_cpu(desc->data_size), inode->i_size);
> + return ERR_PTR(-EINVAL);
> + }
> +
> + vi = kmem_cache_zalloc(fsverity_info_cachep, GFP_KERNEL);
> + if (!vi)
> + return ERR_PTR(-ENOMEM);
> + vi->inode = inode;
> +
> + err = fsverity_init_merkle_tree_params(&vi->tree_params, inode,
> + desc->hash_algorithm,
> + desc->log_blocksize,
> + desc->salt, desc->salt_size);
> + if (err) {
> + fsverity_err(inode,
> + "Error %d initializing Merkle tree parameters",
> + err);
> + goto out;
> + }
> +
> + memcpy(vi->root_hash, desc->root_hash, vi->tree_params.digest_size);
> +
> + err = compute_file_measurement(vi->tree_params.hash_alg, desc,
> + vi->measurement);
> + if (err) {
> + fsverity_err(vi->inode, "Error %d computing file measurement",
> + err);
> + goto out;
> + }
> + pr_debug("Computed file measurement: %s:%*phN\n",
> + vi->tree_params.hash_alg->name,
> + vi->tree_params.digest_size, vi->measurement);
> +out:
> + if (err) {
> + fsverity_free_info(vi);
> + vi = ERR_PTR(err);
> + }
> + return vi;
> +}
> +
> +void fsverity_set_info(struct inode *inode, struct fsverity_info *vi)
> +{
> + /*
> + * Multiple processes may race to set ->i_verity_info, so use cmpxchg.
> + * This pairs with the READ_ONCE() in fsverity_get_info().
> + */
> + if (cmpxchg_release(&inode->i_verity_info, NULL, vi) != NULL)
> + fsverity_free_info(vi);
> +}
> +
> +void fsverity_free_info(struct fsverity_info *vi)
> +{
> + if (!vi)
> + return;
> + kfree(vi->tree_params.hashstate);
> + kmem_cache_free(fsverity_info_cachep, vi);
> +}
> +
> +/* Ensure the inode has an ->i_verity_info */
> +static int ensure_verity_info(struct inode *inode)
> +{
> + struct fsverity_info *vi = fsverity_get_info(inode);
> + struct fsverity_descriptor *desc;
> + int res;
> +
> + if (vi)
> + return 0;
> +
> + res = inode->i_sb->s_vop->get_verity_descriptor(inode, NULL, 0);
> + if (res < 0) {
> + fsverity_err(inode,
> + "Error %d getting verity descriptor size", res);
> + return res;
> + }
> + if (res > FS_VERITY_MAX_DESCRIPTOR_SIZE) {
> + fsverity_err(inode, "Verity descriptor is too large (%d bytes)",
> + res);
> + return -EMSGSIZE;
> + }
> + desc = kmalloc(res, GFP_KERNEL);
> + if (!desc)
> + return -ENOMEM;
> + res = inode->i_sb->s_vop->get_verity_descriptor(inode, desc, res);
> + if (res < 0) {
> + fsverity_err(inode, "Error %d reading verity descriptor", res);
> + goto out_free_desc;
> + }
> +
> + vi = fsverity_create_info(inode, desc, res);
> + if (IS_ERR(vi)) {
> + res = PTR_ERR(vi);
> + goto out_free_desc;
> + }
> +
> + fsverity_set_info(inode, vi);
> + res = 0;
> +out_free_desc:
> + kfree(desc);
> + return res;
> +}
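
The size-query-then-read protocol that ensure_verity_info() follows with ->get_verity_descriptor() can be modelled in userspace as below. This is only a sketch: `get_desc()` is a hypothetical stand-in for the filesystem callback backed by a fixed string, and `read_desc()` and `MAX_DESC_SIZE` are illustrative names, not the patch's.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DESC_SIZE 16384	/* mirrors FS_VERITY_MAX_DESCRIPTOR_SIZE */

/* Hypothetical backing store standing in for the filesystem. */
static const char stored_desc[] = "example-descriptor";

/* Model of ->get_verity_descriptor(): bufsize == 0 only queries the size. */
static int get_desc(void *buf, size_t bufsize)
{
	if (bufsize == 0)
		return (int)sizeof(stored_desc);
	if (bufsize < sizeof(stored_desc))
		return -ERANGE;
	memcpy(buf, stored_desc, sizeof(stored_desc));
	return (int)sizeof(stored_desc);
}

/*
 * Query the size, bound it, allocate, then read.  On success, returns the
 * descriptor size and stores the buffer in *out; the caller frees it.
 */
static int read_desc(void **out)
{
	int res = get_desc(NULL, 0);
	void *buf;

	if (res < 0)
		return res;
	if (res > MAX_DESC_SIZE)
		return -EMSGSIZE;

	buf = malloc((size_t)res);
	if (!buf)
		return -ENOMEM;

	res = get_desc(buf, (size_t)res);
	if (res < 0) {
		free(buf);
		return res;
	}
	*out = buf;
	return res;
}
```

Bounding the size before allocating is what keeps a corrupted or malicious on-disk size from driving an arbitrarily large allocation.
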
> +
> +/**
> + * fsverity_file_open - prepare to open a verity file
> + * @inode: the inode being opened
> + * @filp: the struct file being set up
> + *
> + * When opening a verity file, deny the open if it is for writing. Otherwise,
> + * set up the inode's ->i_verity_info if not already done.
> + *
> + * When combined with fscrypt, this must be called after fscrypt_file_open().
> + * Otherwise, we won't have the key set up to decrypt the verity metadata.
> + *
> + * Return: 0 on success, -errno on failure
> + */
> +int fsverity_file_open(struct inode *inode, struct file *filp)
> +{
> + if (!IS_VERITY(inode))
> + return 0;
> +
> + if (filp->f_mode & FMODE_WRITE) {
> + pr_debug("Denying opening verity file (ino %lu) for write\n",
> + inode->i_ino);
> + return -EPERM;
> + }
> +
> + return ensure_verity_info(inode);
> +}
> +EXPORT_SYMBOL_GPL(fsverity_file_open);
> +
> +/**
> + * fsverity_cleanup_inode - free the inode's verity info, if present
> + *
> + * Filesystems must call this on inode eviction to free ->i_verity_info.
> + */
> +void fsverity_cleanup_inode(struct inode *inode)
> +{
> + fsverity_free_info(inode->i_verity_info);
> + inode->i_verity_info = NULL;
> +}
> +EXPORT_SYMBOL_GPL(fsverity_cleanup_inode);
> +
> +int __init fsverity_init_info_cache(void)
> +{
> + fsverity_info_cachep = KMEM_CACHE_USERCOPY(fsverity_info,
> + SLAB_RECLAIM_ACCOUNT,
> + measurement);
> + if (!fsverity_info_cachep)
> + return -ENOMEM;
> + return 0;
> +}
> diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
> new file mode 100644
> index 00000000000000..1372c236c8770c
> --- /dev/null
> +++ b/include/linux/fsverity.h
> @@ -0,0 +1,71 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * fs-verity: read-only file-based authenticity protection
> + *
> + * This header declares the interface between the fs/verity/ support layer and
> + * filesystems that support fs-verity.
> + *
> + * Copyright 2019 Google LLC
> + */
> +
> +#ifndef _LINUX_FSVERITY_H
> +#define _LINUX_FSVERITY_H
> +
> +#include <linux/fs.h>
> +#include <uapi/linux/fsverity.h>
> +
> +/* Verity operations for filesystems */
> +struct fsverity_operations {
> +
> + /**
> + * Get the verity descriptor of the given inode.
> + *
> + * @inode: an inode with the S_VERITY flag set
> + * @buf: buffer in which to place the verity descriptor
> + * @bufsize: size of @buf, or 0 to retrieve the size only
> + *
> + * If bufsize == 0, then the size of the verity descriptor is returned.
> + * Otherwise the verity descriptor is written to 'buf' and its actual
> + * size is returned; -ERANGE is returned if it's too large. This may be
> + * called by multiple processes concurrently on the same inode.
> + *
> + * Return: the size on success, -errno on failure
> + */
> + int (*get_verity_descriptor)(struct inode *inode, void *buf,
> + size_t bufsize);
> +};
> +
> +#ifdef CONFIG_FS_VERITY
> +
> +static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
> +{
> + /* pairs with the cmpxchg_release() in fsverity_set_info() */
> + return READ_ONCE(inode->i_verity_info);
> +}
> +
> +/* open.c */
> +
> +extern int fsverity_file_open(struct inode *inode, struct file *filp);
> +extern void fsverity_cleanup_inode(struct inode *inode);
> +
> +#else /* !CONFIG_FS_VERITY */
> +
> +static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
> +{
> + return NULL;
> +}
> +
> +/* open.c */
> +
> +static inline int fsverity_file_open(struct inode *inode, struct file *filp)
> +{
> + return IS_VERITY(inode) ? -EOPNOTSUPP : 0;
> +}
> +
> +static inline void fsverity_cleanup_inode(struct inode *inode)
> +{
> +}
> +
> +#endif /* !CONFIG_FS_VERITY */
> +
> +#endif /* _LINUX_FSVERITY_H */
> --
> 2.22.0.410.gd8fdbe21b5-goog
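
The ->get_verity_descriptor() contract quoted above (bufsize == 0 returns only the size, a too-small buffer yields -ERANGE) implies a two-call pattern on the caller side. The following is a minimal userspace sketch of that pattern, not the kernel code: mock_get_verity_descriptor(), load_descriptor() and stored_desc are hypothetical names, and malloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the descriptor a filesystem has stored. */
static const char stored_desc[] = "example-verity-descriptor";

/*
 * Mock of ->get_verity_descriptor(): with bufsize == 0 only the size is
 * returned; otherwise the descriptor is copied out, or -ERANGE is returned
 * if it does not fit.
 */
static int mock_get_verity_descriptor(void *buf, size_t bufsize)
{
	size_t size = sizeof(stored_desc);

	if (bufsize == 0)
		return (int)size;
	if (size > bufsize)
		return -ERANGE;
	memcpy(buf, stored_desc, size);
	return (int)size;
}

/* Caller side: query the size, allocate, then fetch the descriptor. */
static int load_descriptor(void **desc_ret)
{
	void *desc;
	int res;

	assert(desc_ret != NULL);

	res = mock_get_verity_descriptor(NULL, 0);	/* size-only query */
	if (res < 0)
		return res;

	desc = malloc((size_t)res);
	if (!desc)
		return -ENOMEM;

	res = mock_get_verity_descriptor(desc, (size_t)res);
	if (res < 0) {
		free(desc);
		return res;
	}
	*desc_ret = desc;	/* caller frees */
	return res;		/* actual descriptor size */
}
```

On success the caller owns the returned buffer and must free it. This matches the shape of the out_free_desc path in the quoted open.c code, where desc is freed once the fsverity_info has been created from it.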

2019-06-22 22:32:55

by Jaegeuk Kim

Subject: Re: [PATCH v5 09/16] fs-verity: add data verification hooks for ->readpages()

On 06/20, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Add functions that verify data pages that have been read from a
> fs-verity file, against that file's Merkle tree. These will be called
> from filesystems' ->readpage() and ->readpages() methods.
>
> Since data verification can block, a workqueue is provided for these
> methods to enqueue verification work from their bio completion callback.
>
> See the "Verifying data" section of
> Documentation/filesystems/fsverity.rst for more information.
>
> Reviewed-by: Theodore Ts'o <[email protected]>

Reviewed-by: Jaegeuk Kim <[email protected]>

> Signed-off-by: Eric Biggers <[email protected]>
> ---
> fs/verity/Makefile | 3 +-
> fs/verity/fsverity_private.h | 5 +
> fs/verity/init.c | 8 +
> fs/verity/open.c | 6 +
> fs/verity/verify.c | 275 +++++++++++++++++++++++++++++++++++
> include/linux/fsverity.h | 56 +++++++
> 6 files changed, 352 insertions(+), 1 deletion(-)
> create mode 100644 fs/verity/verify.c
>
> diff --git a/fs/verity/Makefile b/fs/verity/Makefile
> index e6a8951c493a5e..7fa628cd5eba24 100644
> --- a/fs/verity/Makefile
> +++ b/fs/verity/Makefile
> @@ -2,4 +2,5 @@
>
> obj-$(CONFIG_FS_VERITY) += hash_algs.o \
> init.o \
> - open.o
> + open.o \
> + verify.o
> diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
> index c79746ff335e14..eaa2b3b93bbf6b 100644
> --- a/fs/verity/fsverity_private.h
> +++ b/fs/verity/fsverity_private.h
> @@ -134,5 +134,10 @@ void fsverity_set_info(struct inode *inode, struct fsverity_info *vi);
> void fsverity_free_info(struct fsverity_info *vi);
>
> int __init fsverity_init_info_cache(void);
> +void __init fsverity_exit_info_cache(void);
> +
> +/* verify.c */
> +
> +int __init fsverity_init_workqueue(void);
>
> #endif /* _FSVERITY_PRIVATE_H */
> diff --git a/fs/verity/init.c b/fs/verity/init.c
> index fff1fd6343357d..b593805aafcc89 100644
> --- a/fs/verity/init.c
> +++ b/fs/verity/init.c
> @@ -41,7 +41,15 @@ static int __init fsverity_init(void)
> if (err)
> return err;
>
> + err = fsverity_init_workqueue();
> + if (err)
> + goto err_exit_info_cache;
> +
> pr_debug("Initialized fs-verity\n");
> return 0;
> +
> +err_exit_info_cache:
> + fsverity_exit_info_cache();
> + return err;
> }
> late_initcall(fsverity_init)
> diff --git a/fs/verity/open.c b/fs/verity/open.c
> index 21ae0ef254a695..7a2cd000dc4f06 100644
> --- a/fs/verity/open.c
> +++ b/fs/verity/open.c
> @@ -338,3 +338,9 @@ int __init fsverity_init_info_cache(void)
> return -ENOMEM;
> return 0;
> }
> +
> +void __init fsverity_exit_info_cache(void)
> +{
> + kmem_cache_destroy(fsverity_info_cachep);
> + fsverity_info_cachep = NULL;
> +}
> diff --git a/fs/verity/verify.c b/fs/verity/verify.c
> new file mode 100644
> index 00000000000000..2a0f9e2ebc9f16
> --- /dev/null
> +++ b/fs/verity/verify.c
> @@ -0,0 +1,275 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * fs/verity/verify.c: data verification functions, i.e. hooks for ->readpages()
> + *
> + * Copyright 2019 Google LLC
> + */
> +
> +#include "fsverity_private.h"
> +
> +#include <crypto/hash.h>
> +#include <linux/bio.h>
> +#include <linux/ratelimit.h>
> +
> +static struct workqueue_struct *fsverity_read_workqueue;
> +
> +/**
> + * hash_at_level() - compute the location of the block's hash at the given level
> + *
> + * @params: (in) the Merkle tree parameters
> + * @dindex: (in) the index of the data block being verified
> + * @level: (in) the level of hash we want (0 is leaf level)
> + * @hindex: (out) the index of the hash block containing the wanted hash
> + * @hoffset: (out) the byte offset to the wanted hash within the hash block
> + */
> +static void hash_at_level(const struct merkle_tree_params *params,
> + pgoff_t dindex, unsigned int level, pgoff_t *hindex,
> + unsigned int *hoffset)
> +{
> + pgoff_t position;
> +
> + /* Offset of the hash within the level's region, in hashes */
> + position = dindex >> (level * params->log_arity);
> +
> + /* Index of the hash block in the tree overall */
> + *hindex = params->level_start[level] + (position >> params->log_arity);
> +
> + /* Offset of the wanted hash (in bytes) within the hash block */
> + *hoffset = (position & ((1 << params->log_arity) - 1)) <<
> + (params->log_blocksize - params->log_arity);
> +}
> +
> +/* Extract a hash from a hash page */
> +static void extract_hash(struct page *hpage, unsigned int hoffset,
> + unsigned int hsize, u8 *out)
> +{
> + void *virt = kmap_atomic(hpage);
> +
> + memcpy(out, virt + hoffset, hsize);
> + kunmap_atomic(virt);
> +}
> +
> +static inline int cmp_hashes(const struct fsverity_info *vi,
> + const u8 *want_hash, const u8 *real_hash,
> + pgoff_t index, int level)
> +{
> + const unsigned int hsize = vi->tree_params.digest_size;
> +
> + if (memcmp(want_hash, real_hash, hsize) == 0)
> + return 0;
> +
> + fsverity_err(vi->inode,
> + "FILE CORRUPTED! index=%lu, level=%d, want_hash=%s:%*phN, real_hash=%s:%*phN",
> + index, level,
> + vi->tree_params.hash_alg->name, hsize, want_hash,
> + vi->tree_params.hash_alg->name, hsize, real_hash);
> + return -EBADMSG;
> +}
> +
> +/*
> + * Verify a single data page against the file's Merkle tree.
> + *
> + * In principle, we need to verify the entire path to the root node. However,
> + * for efficiency the filesystem may cache the hash pages. Therefore we need
> + * only ascend the tree until an already-verified page is seen, as indicated by
> + * the PageChecked bit being set; then verify the path to that page.
> + *
> + * This code currently only supports the case where the verity block size is
> + * equal to PAGE_SIZE. Doing otherwise would be possible but tricky, since we
> + * wouldn't be able to use the PageChecked bit.
> + *
> + * Note that multiple processes may race to verify a hash page and mark it
> + * Checked, but it doesn't matter; the result will be the same either way.
> + *
> + * Return: true if the page is valid, else false.
> + */
> +static bool verify_page(struct inode *inode, const struct fsverity_info *vi,
> + struct ahash_request *req, struct page *data_page)
> +{
> + const struct merkle_tree_params *params = &vi->tree_params;
> + const unsigned int hsize = params->digest_size;
> + const pgoff_t index = data_page->index;
> + int level;
> + u8 _want_hash[FS_VERITY_MAX_DIGEST_SIZE];
> + const u8 *want_hash;
> + u8 real_hash[FS_VERITY_MAX_DIGEST_SIZE];
> + struct page *hpages[FS_VERITY_MAX_LEVELS];
> + unsigned int hoffsets[FS_VERITY_MAX_LEVELS];
> + int err;
> +
> + if (WARN_ON_ONCE(!PageLocked(data_page) || PageUptodate(data_page)))
> + return false;
> +
> + pr_debug_ratelimited("Verifying data page %lu...\n", index);
> +
> + /*
> + * Starting at the leaf level, ascend the tree saving hash pages along
> + * the way until we find a verified hash page, indicated by PageChecked;
> + * or until we reach the root.
> + */
> + for (level = 0; level < params->num_levels; level++) {
> + pgoff_t hindex;
> + unsigned int hoffset;
> + struct page *hpage;
> +
> + hash_at_level(params, index, level, &hindex, &hoffset);
> +
> + pr_debug_ratelimited("Level %d: hindex=%lu, hoffset=%u\n",
> + level, hindex, hoffset);
> +
> + hpage = inode->i_sb->s_vop->read_merkle_tree_page(inode,
> + hindex);
> + if (IS_ERR(hpage)) {
> + err = PTR_ERR(hpage);
> + fsverity_err(inode,
> + "Error %d reading Merkle tree page %lu",
> + err, hindex);
> + goto out;
> + }
> +
> + if (PageChecked(hpage)) {
> + extract_hash(hpage, hoffset, hsize, _want_hash);
> + want_hash = _want_hash;
> + put_page(hpage);
> + pr_debug_ratelimited("Hash page already checked, want %s:%*phN\n",
> + params->hash_alg->name,
> + hsize, want_hash);
> + goto descend;
> + }
> + pr_debug_ratelimited("Hash page not yet checked\n");
> + hpages[level] = hpage;
> + hoffsets[level] = hoffset;
> + }
> +
> + want_hash = vi->root_hash;
> + pr_debug("Want root hash: %s:%*phN\n",
> + params->hash_alg->name, hsize, want_hash);
> +descend:
> + /* Descend the tree verifying hash pages */
> + for (; level > 0; level--) {
> + struct page *hpage = hpages[level - 1];
> + unsigned int hoffset = hoffsets[level - 1];
> +
> + err = fsverity_hash_page(params, inode, req, hpage, real_hash);
> + if (err)
> + goto out;
> + err = cmp_hashes(vi, want_hash, real_hash, index, level - 1);
> + if (err)
> + goto out;
> + SetPageChecked(hpage);
> + extract_hash(hpage, hoffset, hsize, _want_hash);
> + want_hash = _want_hash;
> + put_page(hpage);
> + pr_debug("Verified hash page at level %d, now want %s:%*phN\n",
> + level - 1, params->hash_alg->name, hsize, want_hash);
> + }
> +
> + /* Finally, verify the data page */
> + err = fsverity_hash_page(params, inode, req, data_page, real_hash);
> + if (err)
> + goto out;
> + err = cmp_hashes(vi, want_hash, real_hash, index, -1);
> +out:
> + for (; level > 0; level--)
> + put_page(hpages[level - 1]);
> +
> + return err == 0;
> +}
> +
> +/**
> + * fsverity_verify_page - verify a data page
> + *
> + * Verify a page that has just been read from a verity file. The page must be a
> + * pagecache page that is still locked and not yet uptodate.
> + *
> + * Return: true if the page is valid, else false.
> + */
> +bool fsverity_verify_page(struct page *page)
> +{
> + struct inode *inode = page->mapping->host;
> + const struct fsverity_info *vi = inode->i_verity_info;
> + struct ahash_request *req;
> + bool valid;
> +
> + req = ahash_request_alloc(vi->tree_params.hash_alg->tfm, GFP_NOFS);
> + if (unlikely(!req))
> + return false;
> +
> + valid = verify_page(inode, vi, req, page);
> +
> + ahash_request_free(req);
> +
> + return valid;
> +}
> +EXPORT_SYMBOL_GPL(fsverity_verify_page);
> +
> +#ifdef CONFIG_BLOCK
> +/**
> + * fsverity_verify_bio - verify a 'read' bio that has just completed
> + *
> + * Verify a set of pages that have just been read from a verity file. The pages
> + * must be pagecache pages that are still locked and not yet uptodate. Pages
> + * that fail verification are set to the Error state. Verification is skipped
> + * for pages already in the Error state, e.g. due to fscrypt decryption failure.
> + *
> + * This is a helper function for use by the ->readpages() method of filesystems
> + * that issue bios to read data directly into the page cache. Filesystems that
> + * populate the page cache without issuing bios (e.g. non block-based
> + * filesystems) must instead call fsverity_verify_page() directly on each page.
> + * All filesystems must also call fsverity_verify_page() on holes.
> + */
> +void fsverity_verify_bio(struct bio *bio)
> +{
> + struct inode *inode = bio_first_page_all(bio)->mapping->host;
> + const struct fsverity_info *vi = inode->i_verity_info;
> + struct ahash_request *req;
> + struct bio_vec *bv;
> + struct bvec_iter_all iter_all;
> +
> + req = ahash_request_alloc(vi->tree_params.hash_alg->tfm, GFP_NOFS);
> + if (unlikely(!req)) {
> + bio_for_each_segment_all(bv, bio, iter_all)
> + SetPageError(bv->bv_page);
> + return;
> + }
> +
> + bio_for_each_segment_all(bv, bio, iter_all) {
> + struct page *page = bv->bv_page;
> +
> + if (!PageError(page) && !verify_page(inode, vi, req, page))
> + SetPageError(page);
> + }
> +
> + ahash_request_free(req);
> +}
> +EXPORT_SYMBOL_GPL(fsverity_verify_bio);
> +#endif /* CONFIG_BLOCK */
> +
> +/**
> + * fsverity_enqueue_verify_work - enqueue work on the fs-verity workqueue
> + *
> + * Enqueue verification work for asynchronous processing.
> + */
> +void fsverity_enqueue_verify_work(struct work_struct *work)
> +{
> + queue_work(fsverity_read_workqueue, work);
> +}
> +EXPORT_SYMBOL_GPL(fsverity_enqueue_verify_work);
> +
> +int __init fsverity_init_workqueue(void)
> +{
> + /*
> + * Use an unbound workqueue to allow bios to be verified in parallel
> + * even when they happen to complete on the same CPU. This sacrifices
> + * locality, but it's worthwhile since hashing is CPU-intensive.
> + *
> + * Also use a high-priority workqueue to prioritize verification work,
> + * which blocks reads from completing, over regular application tasks.
> + */
> + fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
> + WQ_UNBOUND | WQ_HIGHPRI,
> + num_online_cpus());
> + if (!fsverity_read_workqueue)
> + return -ENOMEM;
> + return 0;
> +}
> diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
> index cbcc358d073652..ecd47e748c7f64 100644
> --- a/include/linux/fsverity.h
> +++ b/include/linux/fsverity.h
> @@ -33,6 +33,23 @@ struct fsverity_operations {
> */
> int (*get_verity_descriptor)(struct inode *inode, void *buf,
> size_t bufsize);
> +
> + /**
> + * Read a Merkle tree page of the given inode.
> + *
> + * @inode: the inode
> + * @index: 0-based index of the page within the Merkle tree
> + *
> + * This can be called at any time on an open verity file, as well as
> + * between ->begin_enable_verity() and ->end_enable_verity(). It may be
> + * called by multiple processes concurrently, even with the same page.
> + *
> + * Note that this must retrieve a *page*, not necessarily a *block*.
> + *
> + * Return: the page on success, ERR_PTR() on failure
> + */
> + struct page *(*read_merkle_tree_page)(struct inode *inode,
> + pgoff_t index);
> };
>
> #ifdef CONFIG_FS_VERITY
> @@ -49,6 +66,12 @@ extern int fsverity_file_open(struct inode *inode, struct file *filp);
> extern int fsverity_prepare_setattr(struct dentry *dentry, struct iattr *attr);
> extern void fsverity_cleanup_inode(struct inode *inode);
>
> +/* verify.c */
> +
> +extern bool fsverity_verify_page(struct page *page);
> +extern void fsverity_verify_bio(struct bio *bio);
> +extern void fsverity_enqueue_verify_work(struct work_struct *work);
> +
> #else /* !CONFIG_FS_VERITY */
>
> static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
> @@ -73,6 +96,39 @@ static inline void fsverity_cleanup_inode(struct inode *inode)
> {
> }
>
> +/* verify.c */
> +
> +static inline bool fsverity_verify_page(struct page *page)
> +{
> + WARN_ON(1);
> + return false;
> +}
> +
> +static inline void fsverity_verify_bio(struct bio *bio)
> +{
> + WARN_ON(1);
> +}
> +
> +static inline void fsverity_enqueue_verify_work(struct work_struct *work)
> +{
> + WARN_ON(1);
> +}
> +
> #endif /* !CONFIG_FS_VERITY */
>
> +/**
> + * fsverity_active() - do reads from the inode need to go through fs-verity?
> + *
> + * This checks whether ->i_verity_info has been set.
> + *
> + * Filesystems call this from ->readpages() to check whether the pages need to
> + * be verified or not. Don't use IS_VERITY() for this purpose; it's subject to
> + * a race condition where the file is being read concurrently with
> + * FS_IOC_ENABLE_VERITY completing. (S_VERITY is set before ->i_verity_info.)
> + */
> +static inline bool fsverity_active(const struct inode *inode)
> +{
> + return fsverity_get_info(inode) != NULL;
> +}
> +
> #endif /* _LINUX_FSVERITY_H */
> --
> 2.22.0.410.gd8fdbe21b5-goog
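
To make the index arithmetic in the quoted hash_at_level() concrete, here is a userspace sketch that reproduces the same three computations with plain integer types. The struct tree_params type, the MAX_LEVELS bound and the example layout (4096-byte blocks, 32-byte digests, so log_arity = 7, with the root level stored first) are assumptions made for illustration; they are not taken from the patch.

```c
#include <assert.h>

#define MAX_LEVELS 8	/* hypothetical bound for this sketch */

/* Simplified stand-in for the fields hash_at_level() reads. */
struct tree_params {
	unsigned int log_blocksize;	/* log2(block size in bytes) */
	unsigned int log_arity;		/* log2(hashes per hash block) */
	unsigned long level_start[MAX_LEVELS];	/* first block of each level */
};

static void hash_at_level(const struct tree_params *p, unsigned long dindex,
			  unsigned int level, unsigned long *hindex,
			  unsigned int *hoffset)
{
	/* Offset of the hash within the level's region, in hashes */
	unsigned long position = dindex >> (level * p->log_arity);

	assert(level < MAX_LEVELS);

	/* Index of the hash block in the tree overall */
	*hindex = p->level_start[level] + (position >> p->log_arity);

	/* Byte offset of the wanted hash within that hash block */
	*hoffset = (unsigned int)(position & ((1UL << p->log_arity) - 1)) <<
		   (p->log_blocksize - p->log_arity);
}

/* Assumed example: 20000 data blocks give 157 + 2 + 1 hash blocks. */
static const struct tree_params example_params = {
	.log_blocksize = 12,
	.log_arity = 7,
	.level_start = { 3, 1, 0 },	/* level 0 at block 3, root at block 0 */
};
```

With these assumed parameters, data block 12345 maps to hash block 99 at byte offset 1824 on level 0, to hash block 1 at offset 3072 on level 1, and to hash block 0 at offset 0 on level 2. Each step up divides the position by the arity (128 here), which is why the loop in verify_page() can stop as soon as it reaches a hash page that is already marked Checked.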

2019-06-22 22:44:07

by Jaegeuk Kim

Subject: Re: [PATCH v5 11/16] fs-verity: implement FS_IOC_MEASURE_VERITY ioctl

On 06/20, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Add a function for filesystems to call to implement the
> FS_IOC_MEASURE_VERITY ioctl. This ioctl retrieves the file measurement
> that fs-verity calculated for the given file and is enforcing for reads;
> i.e., reads that don't match this hash will fail. This ioctl can be
> used for authentication or logging of file measurements in userspace.
>
> See the "FS_IOC_MEASURE_VERITY" section of
> Documentation/filesystems/fsverity.rst for the documentation.
>
> Reviewed-by: Theodore Ts'o <[email protected]>

Reviewed-by: Jaegeuk Kim <[email protected]>

> Signed-off-by: Eric Biggers <[email protected]>
> ---
> fs/verity/Makefile | 1 +
> fs/verity/measure.c | 57 ++++++++++++++++++++++++++++++++++++++++
> include/linux/fsverity.h | 11 ++++++++
> 3 files changed, 69 insertions(+)
> create mode 100644 fs/verity/measure.c
>
> diff --git a/fs/verity/Makefile b/fs/verity/Makefile
> index 04b37475fd280a..6f7675ae0a3110 100644
> --- a/fs/verity/Makefile
> +++ b/fs/verity/Makefile
> @@ -3,5 +3,6 @@
> obj-$(CONFIG_FS_VERITY) += enable.o \
> hash_algs.o \
> init.o \
> + measure.o \
> open.o \
> verify.o
> diff --git a/fs/verity/measure.c b/fs/verity/measure.c
> new file mode 100644
> index 00000000000000..05049b68c74553
> --- /dev/null
> +++ b/fs/verity/measure.c
> @@ -0,0 +1,57 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * fs/verity/measure.c: ioctl to get a verity file's measurement
> + *
> + * Copyright 2019 Google LLC
> + */
> +
> +#include "fsverity_private.h"
> +
> +#include <linux/uaccess.h>
> +
> +/**
> + * fsverity_ioctl_measure() - get a verity file's measurement
> + *
> + * Retrieve the file measurement that the kernel is enforcing for reads from a
> + * verity file. See the "FS_IOC_MEASURE_VERITY" section of
> + * Documentation/filesystems/fsverity.rst for the documentation.
> + *
> + * Return: 0 on success, -errno on failure
> + */
> +int fsverity_ioctl_measure(struct file *filp, void __user *_uarg)
> +{
> + const struct inode *inode = file_inode(filp);
> + struct fsverity_digest __user *uarg = _uarg;
> + const struct fsverity_info *vi;
> + const struct fsverity_hash_alg *hash_alg;
> + struct fsverity_digest arg;
> +
> + vi = fsverity_get_info(inode);
> + if (!vi)
> + return -ENODATA; /* not a verity file */
> + hash_alg = vi->tree_params.hash_alg;
> +
> + /*
> + * The user specifies the digest_size their buffer has space for; we can
> + * return the digest if it fits in the available space. We write back
> + * the actual size, which may be shorter than the user-specified size.
> + */
> +
> + if (get_user(arg.digest_size, &uarg->digest_size))
> + return -EFAULT;
> + if (arg.digest_size < hash_alg->digest_size)
> + return -EOVERFLOW;
> +
> + memset(&arg, 0, sizeof(arg));
> + arg.digest_algorithm = hash_alg - fsverity_hash_algs;
> + arg.digest_size = hash_alg->digest_size;
> +
> + if (copy_to_user(uarg, &arg, sizeof(arg)))
> + return -EFAULT;
> +
> + if (copy_to_user(uarg->digest, vi->measurement, hash_alg->digest_size))
> + return -EFAULT;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(fsverity_ioctl_measure);
> diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
> index 7ef2ef82653409..247359c86b72e0 100644
> --- a/include/linux/fsverity.h
> +++ b/include/linux/fsverity.h
> @@ -116,6 +116,10 @@ static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
>
> extern int fsverity_ioctl_enable(struct file *filp, const void __user *arg);
>
> +/* measure.c */
> +
> +extern int fsverity_ioctl_measure(struct file *filp, void __user *arg);
> +
> /* open.c */
>
> extern int fsverity_file_open(struct inode *inode, struct file *filp);
> @@ -143,6 +147,13 @@ static inline int fsverity_ioctl_enable(struct file *filp,
> return -EOPNOTSUPP;
> }
>
> +/* measure.c */
> +
> +static inline int fsverity_ioctl_measure(struct file *filp, void __user *arg)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> /* open.c */
>
> static inline int fsverity_file_open(struct inode *inode, struct file *filp)
> --
> 2.22.0.410.gd8fdbe21b5-goog
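
The size negotiation in the quoted fsverity_ioctl_measure() can be illustrated without the kernel: the caller passes in the capacity of its digest buffer, and on success gets back the actual digest size. The sketch below is a userspace model under stated assumptions, not the ioctl itself: struct demo_digest, DEMO_MAX_DIGEST and demo_measure() are hypothetical names, and memcpy() replaces the copy_to_user() calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_DIGEST 64	/* hypothetical capacity for this sketch */

/* Hypothetical stand-in for the argument the caller passes in. */
struct demo_digest {
	uint16_t digest_algorithm;	/* out: algorithm number */
	uint16_t digest_size;		/* in: buffer capacity; out: actual size */
	uint8_t digest[DEMO_MAX_DIGEST];
};

/*
 * Model of the measure logic: reject a buffer that is too small with
 * -EOVERFLOW, otherwise write back the algorithm, the real digest size,
 * and the digest bytes.
 */
static int demo_measure(struct demo_digest *uarg, uint16_t alg_num,
			const uint8_t *measurement, uint16_t alg_digest_size)
{
	assert(alg_digest_size <= DEMO_MAX_DIGEST);

	if (uarg->digest_size < alg_digest_size)
		return -EOVERFLOW;	/* caller's buffer is too small */

	uarg->digest_algorithm = alg_num;
	uarg->digest_size = alg_digest_size;	/* may shrink the caller's value */
	memcpy(uarg->digest, measurement, alg_digest_size);
	return 0;
}
```

A caller that sets digest_size to 64 and queries a file hashed with a 32-byte digest would get 0 back and see digest_size rewritten to 32. A caller that advertises only 16 bytes would get -EOVERFLOW and its argument would be left untouched.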

2019-06-22 22:44:07

by Jaegeuk Kim

Subject: Re: [PATCH v5 10/16] fs-verity: implement FS_IOC_ENABLE_VERITY ioctl

On 06/20, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Add a function for filesystems to call to implement the
> FS_IOC_ENABLE_VERITY ioctl. This ioctl enables fs-verity on a file.
>
> See the "FS_IOC_ENABLE_VERITY" section of
> Documentation/filesystems/fsverity.rst for the documentation.
>

Reviewed-by: Jaegeuk Kim <[email protected]>

> Signed-off-by: Eric Biggers <[email protected]>
> ---
> fs/verity/Makefile | 3 +-
> fs/verity/enable.c | 341 +++++++++++++++++++++++++++++++++++++++
> include/linux/fsverity.h | 64 ++++++++
> 3 files changed, 407 insertions(+), 1 deletion(-)
> create mode 100644 fs/verity/enable.c
>
> diff --git a/fs/verity/Makefile b/fs/verity/Makefile
> index 7fa628cd5eba24..04b37475fd280a 100644
> --- a/fs/verity/Makefile
> +++ b/fs/verity/Makefile
> @@ -1,6 +1,7 @@
> # SPDX-License-Identifier: GPL-2.0
>
> -obj-$(CONFIG_FS_VERITY) += hash_algs.o \
> +obj-$(CONFIG_FS_VERITY) += enable.o \
> + hash_algs.o \
> init.o \
> open.o \
> verify.o
> diff --git a/fs/verity/enable.c b/fs/verity/enable.c
> new file mode 100644
> index 00000000000000..144721bbe4aab9
> --- /dev/null
> +++ b/fs/verity/enable.c
> @@ -0,0 +1,341 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * fs/verity/enable.c: ioctl to enable verity on a file
> + *
> + * Copyright 2019 Google LLC
> + */
> +
> +#include "fsverity_private.h"
> +
> +#include <crypto/hash.h>
> +#include <linux/mount.h>
> +#include <linux/pagemap.h>
> +#include <linux/sched/signal.h>
> +#include <linux/uaccess.h>
> +
> +static int build_merkle_tree_level(struct inode *inode, unsigned int level,
> + u64 num_blocks_to_hash,
> + const struct merkle_tree_params *params,
> + u8 *pending_hashes,
> + struct ahash_request *req)
> +{
> + const struct fsverity_operations *vops = inode->i_sb->s_vop;
> + unsigned int pending_size = 0;
> + u64 dst_block_num;
> + u64 i;
> + int err;
> +
> + if (WARN_ON(params->block_size != PAGE_SIZE)) /* checked earlier too */
> + return -EINVAL;
> +
> + if (level < params->num_levels) {
> + dst_block_num = params->level_start[level];
> + } else {
> + if (WARN_ON(num_blocks_to_hash != 1))
> + return -EINVAL;
> + dst_block_num = 0; /* unused */
> + }
> +
> + for (i = 0; i < num_blocks_to_hash; i++) {
> + struct page *src_page;
> +
> + if ((pgoff_t)i % 10000 == 0 || i + 1 == num_blocks_to_hash)
> + pr_debug("Hashing block %llu of %llu for level %u\n",
> + i + 1, num_blocks_to_hash, level);
> +
> + if (level == 0)
> + /* Leaf: hashing a data block */
> + src_page = read_mapping_page(inode->i_mapping, i, NULL);
> + else
> + /* Non-leaf: hashing hash block from level below */
> + src_page = vops->read_merkle_tree_page(inode,
> + params->level_start[level - 1] + i);
> + if (IS_ERR(src_page)) {
> + err = PTR_ERR(src_page);
> + fsverity_err(inode,
> + "Error %d reading Merkle tree page %llu",
> + err, params->level_start[level - 1] + i);
> + return err;
> + }
> +
> + err = fsverity_hash_page(params, inode, req, src_page,
> + &pending_hashes[pending_size]);
> + put_page(src_page);
> + if (err)
> + return err;
> + pending_size += params->digest_size;
> +
> + if (level == params->num_levels) /* Root hash? */
> + return 0;
> +
> + if (pending_size + params->digest_size > params->block_size ||
> + i + 1 == num_blocks_to_hash) {
> + /* Flush the pending hash block */
> + memset(&pending_hashes[pending_size], 0,
> + params->block_size - pending_size);
> + err = vops->write_merkle_tree_block(inode,
> + pending_hashes,
> + dst_block_num,
> + params->log_blocksize);
> + if (err) {
> + fsverity_err(inode,
> + "Error %d writing Merkle tree block %llu",
> + err, dst_block_num);
> + return err;
> + }
> + dst_block_num++;
> + pending_size = 0;
> + }
> +
> + if (fatal_signal_pending(current))
> + return -EINTR;
> + cond_resched();
> + }
> + return 0;
> +}
> +
> +/*
> + * Build the Merkle tree for the given inode using the given parameters, and
> + * return the root hash in @root_hash.
> + *
> + * The tree is written to a filesystem-specific location as determined by the
> + * ->write_merkle_tree_block() method. However, the blocks that comprise the
> + * tree are the same for all filesystems.
> + */
> +static int build_merkle_tree(struct inode *inode,
> + const struct merkle_tree_params *params,
> + u8 *root_hash)
> +{
> + u8 *pending_hashes;
> + struct ahash_request *req;
> + u64 blocks;
> + unsigned int level;
> + int err = -ENOMEM;
> +
> + if (inode->i_size == 0) {
> + /* Empty file is a special case; root hash is all 0's */
> + memset(root_hash, 0, params->digest_size);
> + return 0;
> + }
> +
> + pending_hashes = kmalloc(params->block_size, GFP_KERNEL);
> + req = ahash_request_alloc(params->hash_alg->tfm, GFP_KERNEL);
> + if (!pending_hashes || !req)
> + goto out;
> +
> + /*
> + * Build each level of the Merkle tree, starting at the leaf level
> + * (level 0) and ascending to the root node (level 'num_levels - 1').
> + * Then at the end (level 'num_levels'), calculate the root hash.
> + */
> + blocks = (inode->i_size + params->block_size - 1) >>
> + params->log_blocksize;
> + for (level = 0; level <= params->num_levels; level++) {
> + err = build_merkle_tree_level(inode, level, blocks, params,
> + pending_hashes, req);
> + if (err)
> + goto out;
> + blocks = (blocks + params->hashes_per_block - 1) >>
> + params->log_arity;
> + }
> + memcpy(root_hash, pending_hashes, params->digest_size);
> + err = 0;
> +out:
> + kfree(pending_hashes);
> + ahash_request_free(req);
> + return err;
> +}
> +
> +static int enable_verity(struct file *filp,
> + const struct fsverity_enable_arg *arg)
> +{
> + struct inode *inode = file_inode(filp);
> + const struct fsverity_operations *vops = inode->i_sb->s_vop;
> + struct merkle_tree_params params = { };
> + struct fsverity_descriptor *desc;
> + size_t desc_size = sizeof(*desc);
> + struct fsverity_info *vi;
> + int err;
> +
> + /* Start initializing the fsverity_descriptor */
> + desc = kzalloc(desc_size, GFP_KERNEL);
> + if (!desc)
> + return -ENOMEM;
> + desc->version = 1;
> + desc->hash_algorithm = arg->hash_algorithm;
> + desc->log_blocksize = ilog2(arg->block_size);
> +
> + /* Get the salt if the user provided one */
> + if (arg->salt_size &&
> + copy_from_user(desc->salt,
> + (const u8 __user *)(uintptr_t)arg->salt_ptr,
> + arg->salt_size)) {
> + err = -EFAULT;
> + goto out;
> + }
> + desc->salt_size = arg->salt_size;
> +
> + desc->data_size = cpu_to_le64(inode->i_size);
> +
> + pr_debug("Building Merkle tree...\n");
> +
> + /* Prepare the Merkle tree parameters */
> + err = fsverity_init_merkle_tree_params(&params, inode,
> + arg->hash_algorithm,
> + desc->log_blocksize,
> + desc->salt, desc->salt_size);
> + if (err)
> + goto out;
> +
> + /* Tell the filesystem that verity is being enabled on the file */
> + err = vops->begin_enable_verity(filp);
> + if (err)
> + goto out;
> +
> + /* Build the Merkle tree */
> + BUILD_BUG_ON(sizeof(desc->root_hash) < FS_VERITY_MAX_DIGEST_SIZE);
> + err = build_merkle_tree(inode, &params, desc->root_hash);
> + if (err) {
> + fsverity_err(inode, "Error %d building Merkle tree", err);
> + goto rollback;
> + }
> + pr_debug("Done building Merkle tree. Root hash is %s:%*phN\n",
> + params.hash_alg->name, params.digest_size, desc->root_hash);
> +
> + /*
> + * Create the fsverity_info. Don't bother trying to save work by
> + * reusing the merkle_tree_params from above. Instead, just create the
> + * fsverity_info from the fsverity_descriptor as if it were just loaded
> + * from disk. This is simpler, and it serves as an extra check that the
> + * metadata we're writing is valid before actually enabling verity.
> + */
> + vi = fsverity_create_info(inode, desc, desc_size);
> + if (IS_ERR(vi)) {
> + err = PTR_ERR(vi);
> + goto rollback;
> + }
> +
> + /* Tell the filesystem to finish enabling verity on the file */
> + err = vops->end_enable_verity(filp, desc, desc_size, params.tree_size);
> + if (err) {
> + fsverity_err(inode, "%ps() failed with err %d",
> + vops->end_enable_verity, err);
> + fsverity_free_info(vi);
> + } else if (WARN_ON(!IS_VERITY(inode))) {
> + err = -EINVAL;
> + fsverity_free_info(vi);
> + } else {
> + /* Successfully enabled verity */
> +
> + /*
> + * Readers can start using ->i_verity_info immediately, so it
> + * can't be rolled back once set. So don't set it until just
> + * after the filesystem has successfully enabled verity.
> + */
> + fsverity_set_info(inode, vi);
> + }
> +out:
> + kfree(params.hashstate);
> + kfree(desc);
> + return err;
> +
> +rollback:
> + (void)vops->end_enable_verity(filp, NULL, 0, params.tree_size);
> + goto out;
> +}
> +
> +/**
> + * fsverity_ioctl_enable() - enable verity on a file
> + *
> + * Enable fs-verity on a file. See the "FS_IOC_ENABLE_VERITY" section of
> + * Documentation/filesystems/fsverity.rst for the documentation.
> + *
> + * Return: 0 on success, -errno on failure
> + */
> +int fsverity_ioctl_enable(struct file *filp, const void __user *uarg)
> +{
> + struct inode *inode = file_inode(filp);
> + struct fsverity_enable_arg arg;
> + int err;
> +
> + if (copy_from_user(&arg, uarg, sizeof(arg)))
> + return -EFAULT;
> +
> + if (arg.version != 1)
> + return -EINVAL;
> +
> + if (arg.__reserved1 ||
> + memchr_inv(arg.__reserved2, 0, sizeof(arg.__reserved2)))
> + return -EINVAL;
> +
> + if (arg.block_size != PAGE_SIZE)
> + return -EINVAL;
> +
> + if (arg.salt_size > FIELD_SIZEOF(struct fsverity_descriptor, salt))
> + return -EMSGSIZE;
> +
> + if (arg.sig_size)
> + return -EINVAL;
> +
> + /*
> + * Require a regular file with write access. But the actual fd must
> + * still be readonly so that we can lock out all writers. This is
> + * needed to guarantee that no writable fds exist to the file once it
> + * has verity enabled, and to stabilize the data being hashed.
> + */
> +
> + err = inode_permission(inode, MAY_WRITE);
> + if (err)
> + return err;
> +
> + if (IS_APPEND(inode))
> + return -EPERM;
> +
> + if (S_ISDIR(inode->i_mode))
> + return -EISDIR;
> +
> + if (!S_ISREG(inode->i_mode))
> + return -EINVAL;
> +
> + err = mnt_want_write_file(filp);
> + if (err) /* -EROFS */
> + return err;
> +
> + err = deny_write_access(filp);
> + if (err) /* -ETXTBSY */
> + goto out_drop_write;
> +
> + inode_lock(inode);
> +
> + if (IS_VERITY(inode)) {
> + err = -EEXIST;
> + goto out_unlock;
> + }
> +
> + err = enable_verity(filp, &arg);
> + if (err)
> + goto out_unlock;
> +
> + /*
> + * Some pages of the file may have been evicted from pagecache after
> + * being used in the Merkle tree construction, then read into pagecache
> + * again by another process reading from the file concurrently. Since
> + * these pages didn't undergo verification against the file measurement
> + * which fs-verity now claims to be enforcing, we have to wipe the
> + * pagecache to ensure that all future reads are verified.
> + */
> + filemap_write_and_wait(inode->i_mapping);
> + truncate_inode_pages(inode->i_mapping, 0);
> +
> + /*
> + * allow_write_access() is needed to pair with deny_write_access().
> + * Regardless, the filesystem won't allow writing to verity files.
> + */
> +out_unlock:
> + inode_unlock(inode);
> + allow_write_access(filp);
> +out_drop_write:
> + mnt_drop_write_file(filp);
> + return err;
> +}
> +EXPORT_SYMBOL_GPL(fsverity_ioctl_enable);
> diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
> index ecd47e748c7f64..7ef2ef82653409 100644
> --- a/include/linux/fsverity.h
> +++ b/include/linux/fsverity.h
> @@ -17,6 +17,42 @@
> /* Verity operations for filesystems */
> struct fsverity_operations {
>
> + /**
> + * Begin enabling verity on the given file.
> + *
> + * @filp: a readonly file descriptor for the file
> + *
> + * The filesystem must do any needed filesystem-specific preparations
> + * for enabling verity, e.g. evicting inline data.
> + *
> + * i_rwsem is held for write.
> + *
> + * Return: 0 on success, -errno on failure
> + */
> + int (*begin_enable_verity)(struct file *filp);
> +
> + /**
> + * End enabling verity on the given file.
> + *
> + * @filp: a readonly file descriptor for the file
> + * @desc: the verity descriptor to write, or NULL on failure
> + * @desc_size: size of verity descriptor, or 0 on failure
> + * @merkle_tree_size: total bytes the Merkle tree took up
> + *
> + * If desc == NULL, then enabling verity failed and the filesystem only
> + * must do any necessary cleanups. Else, it must also store the given
> + * verity descriptor to a fs-specific location associated with the inode
> + * and do any fs-specific actions needed to mark the inode as a verity
> + * inode, e.g. setting a bit in the on-disk inode. The filesystem is
> + * also responsible for setting the S_VERITY flag in the VFS inode.
> + *
> + * i_rwsem is held for write.
> + *
> + * Return: 0 on success, -errno on failure
> + */
> + int (*end_enable_verity)(struct file *filp, const void *desc,
> + size_t desc_size, u64 merkle_tree_size);
> +
> /**
> * Get the verity descriptor of the given inode.
> *
> @@ -50,6 +86,22 @@ struct fsverity_operations {
> */
> struct page *(*read_merkle_tree_page)(struct inode *inode,
> pgoff_t index);
> +
> + /**
> + * Write a Merkle tree block to the given inode.
> + *
> + * @inode: the inode for which the Merkle tree is being built
> + * @buf: block to write
> + * @index: 0-based index of the block within the Merkle tree
> + * @log_blocksize: log base 2 of the Merkle tree block size
> + *
> + * This is only called between ->begin_enable_verity() and
> + * ->end_enable_verity(). i_rwsem is held for write.
> + *
> + * Return: 0 on success, -errno on failure
> + */
> + int (*write_merkle_tree_block)(struct inode *inode, const void *buf,
> + u64 index, int log_blocksize);
> };
>
> #ifdef CONFIG_FS_VERITY
> @@ -60,6 +112,10 @@ static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
> return READ_ONCE(inode->i_verity_info);
> }
>
> +/* enable.c */
> +
> +extern int fsverity_ioctl_enable(struct file *filp, const void __user *arg);
> +
> /* open.c */
>
> extern int fsverity_file_open(struct inode *inode, struct file *filp);
> @@ -79,6 +135,14 @@ static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
> return NULL;
> }
>
> +/* enable.c */
> +
> +static inline int fsverity_ioctl_enable(struct file *filp,
> + const void __user *arg)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> /* open.c */
>
> static inline int fsverity_file_open(struct inode *inode, struct file *filp)
> --
> 2.22.0.410.gd8fdbe21b5-goog
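
For readers following the argument checks in fsverity_ioctl_enable() above, here is a minimal userspace sketch of the order in which a malformed struct fsverity_enable_arg would be rejected. It is a model only, not kernel code: enable_arg_model, check_enable_arg(), MODEL_PAGE_SIZE (assumed 4096) and MODEL_MAX_SALT (assumed 32, standing in for the size of fsverity_descriptor::salt) are hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4K pages */
#define MODEL_MAX_SALT  32u   /* assumption: size of the descriptor's salt field */

/* Userspace mirror of the layout quoted in the cover letter (hypothetical name). */
struct enable_arg_model {
	uint32_t version;
	uint32_t hash_algorithm;
	uint32_t block_size;
	uint32_t salt_size;
	uint64_t salt_ptr;
	uint32_t sig_size;
	uint32_t reserved1;
	uint64_t sig_ptr;
	uint64_t reserved2[11];
};

/* Nonzero if any reserved field is set; stands in for the memchr_inv() test. */
static int reserved_nonzero(const struct enable_arg_model *a)
{
	size_t i;

	if (a->reserved1)
		return 1;
	for (i = 0; i < sizeof(a->reserved2) / sizeof(a->reserved2[0]); i++)
		if (a->reserved2[i])
			return 1;
	return 0;
}

/* Apply the checks in the same order as the quoted patch; 0 means accepted. */
int check_enable_arg(const struct enable_arg_model *a)
{
	if (a->version != 1)
		return -EINVAL;
	if (reserved_nonzero(a))
		return -EINVAL;
	if (a->block_size != MODEL_PAGE_SIZE)
		return -EINVAL;
	if (a->salt_size > MODEL_MAX_SALT)
		return -EMSGSIZE;
	if (a->sig_size) /* signatures are not accepted at this point in the series */
		return -EINVAL;
	return 0;
}
```

The model stops where the inode checks begin (write permission, regular file, deny_write_access()), since those depend on kernel state that cannot be reproduced in userspace.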

2019-06-22 22:44:35

by Jaegeuk Kim

Subject: Re: [PATCH v5 12/16] fs-verity: add SHA-512 support

On 06/20, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Add SHA-512 support to fs-verity. This is primarily a demonstration of
> the trivial changes needed to support a new hash algorithm in fs-verity;
> most users will still use SHA-256, due to the smaller space required to
> store the hashes. But some users may prefer SHA-512.
>
> Reviewed-by: Theodore Ts'o <[email protected]>

Reviewed-by: Jaegeuk Kim <[email protected]>

> Signed-off-by: Eric Biggers <[email protected]>
> ---
> fs/verity/fsverity_private.h | 2 +-
> fs/verity/hash_algs.c | 5 +++++
> include/uapi/linux/fsverity.h | 1 +
> 3 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
> index eaa2b3b93bbf6b..02a547f0667c13 100644
> --- a/fs/verity/fsverity_private.h
> +++ b/fs/verity/fsverity_private.h
> @@ -29,7 +29,7 @@ struct ahash_request;
> * Largest digest size among all hash algorithms supported by fs-verity.
> * Currently assumed to be <= size of fsverity_descriptor::root_hash.
> */
> -#define FS_VERITY_MAX_DIGEST_SIZE SHA256_DIGEST_SIZE
> +#define FS_VERITY_MAX_DIGEST_SIZE SHA512_DIGEST_SIZE
>
> /* A hash algorithm supported by fs-verity */
> struct fsverity_hash_alg {
> diff --git a/fs/verity/hash_algs.c b/fs/verity/hash_algs.c
> index 46df17094fc252..e0462a010cabfb 100644
> --- a/fs/verity/hash_algs.c
> +++ b/fs/verity/hash_algs.c
> @@ -17,6 +17,11 @@ struct fsverity_hash_alg fsverity_hash_algs[] = {
> .digest_size = SHA256_DIGEST_SIZE,
> .block_size = SHA256_BLOCK_SIZE,
> },
> + [FS_VERITY_HASH_ALG_SHA512] = {
> + .name = "sha512",
> + .digest_size = SHA512_DIGEST_SIZE,
> + .block_size = SHA512_BLOCK_SIZE,
> + },
> };
>
> /**
> diff --git a/include/uapi/linux/fsverity.h b/include/uapi/linux/fsverity.h
> index 57d1d7fc0c345a..da0daf6c193b4b 100644
> --- a/include/uapi/linux/fsverity.h
> +++ b/include/uapi/linux/fsverity.h
> @@ -14,6 +14,7 @@
> #include <linux/types.h>
>
> #define FS_VERITY_HASH_ALG_SHA256 1
> +#define FS_VERITY_HASH_ALG_SHA512 2
>
> struct fsverity_enable_arg {
> __u32 version;
> --
> 2.22.0.410.gd8fdbe21b5-goog
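
To put a number on the "smaller space required to store the hashes" remark in the commit message, here is a rough userspace sketch that counts Merkle tree blocks level by level. It assumes the tree is built bottom-up with block_size / digest_size hashes per block and that a file of at most one block needs no tree blocks; merkle_tree_blocks() is a hypothetical helper, not a function from the patchset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count the Merkle tree blocks needed for a file (hypothetical helper).
 * Each level holds one digest per block of the level below it, and levels
 * are added until a single block remains.
 */
uint64_t merkle_tree_blocks(uint64_t file_size, uint32_t block_size,
			    uint32_t digest_size)
{
	uint64_t hashes_per_block = block_size / digest_size;
	uint64_t blocks = (file_size + block_size - 1) / block_size;
	uint64_t total = 0;

	while (blocks > 1) {
		/* Number of blocks needed to hold this level's digests. */
		blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
		total += blocks;
	}
	return total;
}
```

Under these assumptions a 1 GiB file with 4096-byte blocks needs 2065 tree blocks with SHA-256 (32-byte digests) and 4161 with SHA-512 (64-byte digests), i.e. roughly twice the metadata.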

2019-06-22 23:13:12

by Jaegeuk Kim

Subject: Re: [PATCH v5 16/16] f2fs: add fs-verity support

On 06/20, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Add fs-verity support to f2fs. fs-verity is a filesystem feature that
> enables transparent integrity protection and authentication of read-only
> files. It uses a dm-verity like mechanism at the file level: a Merkle
> tree is used to verify any block in the file in log(filesize) time. It
> is implemented mainly by helper functions in fs/verity/. See
> Documentation/filesystems/fsverity.rst for the full documentation.
>
> The f2fs support for fs-verity consists of:
>
> - Adding a filesystem feature flag and an inode flag for fs-verity.
>
> - Implementing the fsverity_operations to support enabling verity on an
> inode and reading/writing the verity metadata.
>
> - Updating ->readpages() to verify data as it's read from verity files
> and to support reading verity metadata pages.
>
> - Updating ->write_begin(), ->write_end(), and ->writepages() to support
> writing verity metadata pages.
>
> - Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().
>
> Like ext4, f2fs stores the verity metadata (Merkle tree and
> fsverity_descriptor) past the end of the file, starting at the first 64K
> boundary beyond i_size. This approach works because (a) verity files
> are readonly, and (b) pages fully beyond i_size aren't visible to
> userspace but can be read/written internally by f2fs with only some
> relatively small changes to f2fs. Extended attributes cannot be used
> because (a) f2fs limits the total size of an inode's xattr entries to
> 4096 bytes, which wouldn't be enough for even a single Merkle tree
> block, and (b) f2fs encryption doesn't encrypt xattrs, yet the verity
> metadata *must* be encrypted when the file is because it contains hashes
> of the plaintext data.
>

Acked-by: Jaegeuk Kim <[email protected]>

> Signed-off-by: Eric Biggers <[email protected]>
> ---
> fs/f2fs/Makefile | 1 +
> fs/f2fs/data.c | 72 +++++++++++++--
> fs/f2fs/f2fs.h | 23 ++++-
> fs/f2fs/file.c | 40 ++++++++
> fs/f2fs/inode.c | 5 +-
> fs/f2fs/super.c | 3 +
> fs/f2fs/sysfs.c | 11 +++
> fs/f2fs/verity.c | 233 +++++++++++++++++++++++++++++++++++++++++++++++
> fs/f2fs/xattr.h | 2 +
> 9 files changed, 376 insertions(+), 14 deletions(-)
> create mode 100644 fs/f2fs/verity.c
>
> diff --git a/fs/f2fs/Makefile b/fs/f2fs/Makefile
> index 776c4b93650496..2aaecc63834fc8 100644
> --- a/fs/f2fs/Makefile
> +++ b/fs/f2fs/Makefile
> @@ -8,3 +8,4 @@ f2fs-$(CONFIG_F2FS_STAT_FS) += debug.o
> f2fs-$(CONFIG_F2FS_FS_XATTR) += xattr.o
> f2fs-$(CONFIG_F2FS_FS_POSIX_ACL) += acl.o
> f2fs-$(CONFIG_F2FS_IO_TRACE) += trace.o
> +f2fs-$(CONFIG_FS_VERITY) += verity.o
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index eda4181d20926b..8f175d47291d0b 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -73,6 +73,7 @@ static enum count_type __read_io_type(struct page *page)
> enum bio_post_read_step {
> STEP_INITIAL = 0,
> STEP_DECRYPT,
> + STEP_VERITY,
> };
>
> struct bio_post_read_ctx {
> @@ -119,8 +120,23 @@ static void decrypt_work(struct work_struct *work)
> bio_post_read_processing(ctx);
> }
>
> +static void verity_work(struct work_struct *work)
> +{
> + struct bio_post_read_ctx *ctx =
> + container_of(work, struct bio_post_read_ctx, work);
> +
> + fsverity_verify_bio(ctx->bio);
> +
> + bio_post_read_processing(ctx);
> +}
> +
> static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
> {
> + /*
> + * We use different work queues for decryption and for verity because
> + * verity may require reading metadata pages that need decryption, and
> + * we shouldn't recurse to the same workqueue.
> + */
> switch (++ctx->cur_step) {
> case STEP_DECRYPT:
> if (ctx->enabled_steps & (1 << STEP_DECRYPT)) {
> @@ -130,6 +146,14 @@ static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
> }
> ctx->cur_step++;
> /* fall-through */
> + case STEP_VERITY:
> + if (ctx->enabled_steps & (1 << STEP_VERITY)) {
> + INIT_WORK(&ctx->work, verity_work);
> + fsverity_enqueue_verify_work(&ctx->work);
> + return;
> + }
> + ctx->cur_step++;
> + /* fall-through */
> default:
> __read_end_io(ctx->bio);
> }
> @@ -553,8 +577,15 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
> up_write(&io->io_rwsem);
> }
>
> +static inline bool f2fs_need_verity(const struct inode *inode, pgoff_t idx)
> +{
> + return fsverity_active(inode) &&
> + idx < DIV_ROUND_UP(inode->i_size, PAGE_SIZE);
> +}
> +
> static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
> - unsigned nr_pages, unsigned op_flag)
> + unsigned nr_pages, unsigned op_flag,
> + pgoff_t first_idx)
> {
> struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> struct bio *bio;
> @@ -570,6 +601,10 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
>
> if (f2fs_encrypted_file(inode))
> post_read_steps |= 1 << STEP_DECRYPT;
> +
> + if (f2fs_need_verity(inode, first_idx))
> + post_read_steps |= 1 << STEP_VERITY;
> +
> if (post_read_steps) {
> ctx = mempool_alloc(bio_post_read_ctx_pool, GFP_NOFS);
> if (!ctx) {
> @@ -591,7 +626,7 @@ static int f2fs_submit_page_read(struct inode *inode, struct page *page,
> struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> struct bio *bio;
>
> - bio = f2fs_grab_read_bio(inode, blkaddr, 1, 0);
> + bio = f2fs_grab_read_bio(inode, blkaddr, 1, 0, page->index);
> if (IS_ERR(bio))
> return PTR_ERR(bio);
>
> @@ -1514,6 +1549,15 @@ int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> return ret;
> }
>
> +static inline loff_t f2fs_readpage_limit(struct inode *inode)
> +{
> + if (IS_ENABLED(CONFIG_FS_VERITY) &&
> + (IS_VERITY(inode) || f2fs_verity_in_progress(inode)))
> + return inode->i_sb->s_maxbytes;
> +
> + return i_size_read(inode);
> +}
> +
> static int f2fs_read_single_page(struct inode *inode, struct page *page,
> unsigned nr_pages,
> struct f2fs_map_blocks *map,
> @@ -1532,7 +1576,7 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page,
>
> block_in_file = (sector_t)page->index;
> last_block = block_in_file + nr_pages;
> - last_block_in_file = (i_size_read(inode) + blocksize - 1) >>
> + last_block_in_file = (f2fs_readpage_limit(inode) + blocksize - 1) >>
> blkbits;
> if (last_block > last_block_in_file)
> last_block = last_block_in_file;
> @@ -1576,6 +1620,11 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page,
> } else {
> zero_out:
> zero_user_segment(page, 0, PAGE_SIZE);
> + if (f2fs_need_verity(inode, page->index) &&
> + !fsverity_verify_page(page)) {
> + ret = -EIO;
> + goto out;
> + }
> if (!PageUptodate(page))
> SetPageUptodate(page);
> unlock_page(page);
> @@ -1594,7 +1643,7 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page,
> }
> if (bio == NULL) {
> bio = f2fs_grab_read_bio(inode, block_nr, nr_pages,
> - is_readahead ? REQ_RAHEAD : 0);
> + is_readahead ? REQ_RAHEAD : 0, page->index);
> if (IS_ERR(bio)) {
> ret = PTR_ERR(bio);
> bio = NULL;
> @@ -1991,7 +2040,7 @@ static int __write_data_page(struct page *page, bool *submitted,
> if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
> goto redirty_out;
>
> - if (page->index < end_index)
> + if (page->index < end_index || f2fs_verity_in_progress(inode))
> goto write;
>
> /*
> @@ -2383,7 +2432,8 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
> * the block addresses when there is no need to fill the page.
> */
> if (!f2fs_has_inline_data(inode) && len == PAGE_SIZE &&
> - !is_inode_flag_set(inode, FI_NO_PREALLOC))
> + !is_inode_flag_set(inode, FI_NO_PREALLOC) &&
> + !f2fs_verity_in_progress(inode))
> return 0;
>
> /* f2fs_lock_op avoids race between write CP and convert_inline_page */
> @@ -2522,7 +2572,8 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
> if (len == PAGE_SIZE || PageUptodate(page))
> return 0;
>
> - if (!(pos & (PAGE_SIZE - 1)) && (pos + len) >= i_size_read(inode)) {
> + if (!(pos & (PAGE_SIZE - 1)) && (pos + len) >= i_size_read(inode) &&
> + !f2fs_verity_in_progress(inode)) {
> zero_user_segment(page, len, PAGE_SIZE);
> return 0;
> }
> @@ -2585,7 +2636,8 @@ static int f2fs_write_end(struct file *file,
>
> set_page_dirty(page);
>
> - if (pos + copied > i_size_read(inode))
> + if (pos + copied > i_size_read(inode) &&
> + !f2fs_verity_in_progress(inode))
> f2fs_i_size_write(inode, pos + copied);
> unlock_out:
> f2fs_put_page(page, 1);
> @@ -2906,7 +2958,9 @@ void f2fs_clear_page_cache_dirty_tag(struct page *page)
>
> int __init f2fs_init_post_read_processing(void)
> {
> - bio_post_read_ctx_cache = KMEM_CACHE(bio_post_read_ctx, 0);
> + bio_post_read_ctx_cache =
> + kmem_cache_create("f2fs_bio_post_read_ctx",
> + sizeof(struct bio_post_read_ctx), 0, 0, NULL);
> if (!bio_post_read_ctx_cache)
> goto fail;
> bio_post_read_ctx_pool =
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 06b89a9862ab2b..8477191ad1c9b2 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -25,6 +25,7 @@
> #include <crypto/hash.h>
>
> #include <linux/fscrypt.h>
> +#include <linux/fsverity.h>
>
> #ifdef CONFIG_F2FS_CHECK_FS
> #define f2fs_bug_on(sbi, condition) BUG_ON(condition)
> @@ -148,7 +149,7 @@ struct f2fs_mount_info {
> #define F2FS_FEATURE_QUOTA_INO 0x0080
> #define F2FS_FEATURE_INODE_CRTIME 0x0100
> #define F2FS_FEATURE_LOST_FOUND 0x0200
> -#define F2FS_FEATURE_VERITY 0x0400 /* reserved */
> +#define F2FS_FEATURE_VERITY 0x0400
> #define F2FS_FEATURE_SB_CHKSUM 0x0800
>
> #define __F2FS_HAS_FEATURE(raw_super, mask) \
> @@ -626,7 +627,7 @@ enum {
> #define FADVISE_ENC_NAME_BIT 0x08
> #define FADVISE_KEEP_SIZE_BIT 0x10
> #define FADVISE_HOT_BIT 0x20
> -#define FADVISE_VERITY_BIT 0x40 /* reserved */
> +#define FADVISE_VERITY_BIT 0x40
>
> #define FADVISE_MODIFIABLE_BITS (FADVISE_COLD_BIT | FADVISE_HOT_BIT)
>
> @@ -646,6 +647,8 @@ enum {
> #define file_is_hot(inode) is_file(inode, FADVISE_HOT_BIT)
> #define file_set_hot(inode) set_file(inode, FADVISE_HOT_BIT)
> #define file_clear_hot(inode) clear_file(inode, FADVISE_HOT_BIT)
> +#define file_is_verity(inode) is_file(inode, FADVISE_VERITY_BIT)
> +#define file_set_verity(inode) set_file(inode, FADVISE_VERITY_BIT)
>
> #define DEF_DIR_LEVEL 0
>
> @@ -2344,6 +2347,7 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr)
> #define F2FS_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/
> #define F2FS_HUGE_FILE_FL 0x00040000 /* Set to each huge file */
> #define F2FS_EXTENTS_FL 0x00080000 /* Inode uses extents */
> +#define F2FS_VERITY_FL 0x00100000 /* Verity protected inode */
> #define F2FS_EA_INODE_FL 0x00200000 /* Inode used for large EA */
> #define F2FS_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
> #define F2FS_NOCOW_FL 0x00800000 /* Do not cow file */
> @@ -2351,7 +2355,7 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr)
> #define F2FS_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
> #define F2FS_RESERVED_FL 0x80000000 /* reserved for ext4 lib */
>
> -#define F2FS_FL_USER_VISIBLE 0x30CBDFFF /* User visible flags */
> +#define F2FS_FL_USER_VISIBLE 0x30DBDFFF /* User visible flags */
> #define F2FS_FL_USER_MODIFIABLE 0x204BC0FF /* User modifiable flags */
>
> /* Flags we can manipulate with through F2FS_IOC_FSSETXATTR */
> @@ -2417,6 +2421,7 @@ enum {
> FI_PROJ_INHERIT, /* indicate file inherits projectid */
> FI_PIN_FILE, /* indicate file should not be gced */
> FI_ATOMIC_REVOKE_REQUEST, /* request to drop atomic data */
> + FI_VERITY_IN_PROGRESS, /* building fs-verity Merkle tree */
> };
>
> static inline void __mark_inode_dirty_flag(struct inode *inode,
> @@ -2456,6 +2461,12 @@ static inline void clear_inode_flag(struct inode *inode, int flag)
> __mark_inode_dirty_flag(inode, flag, false);
> }
>
> +static inline bool f2fs_verity_in_progress(struct inode *inode)
> +{
> + return IS_ENABLED(CONFIG_FS_VERITY) &&
> + is_inode_flag_set(inode, FI_VERITY_IN_PROGRESS);
> +}
> +
> static inline void set_acl_inode(struct inode *inode, umode_t mode)
> {
> F2FS_I(inode)->i_acl_mode = mode;
> @@ -3524,6 +3535,9 @@ void f2fs_exit_sysfs(void);
> int f2fs_register_sysfs(struct f2fs_sb_info *sbi);
> void f2fs_unregister_sysfs(struct f2fs_sb_info *sbi);
>
> +/* verity.c */
> +extern const struct fsverity_operations f2fs_verityops;
> +
> /*
> * crypto support
> */
> @@ -3546,7 +3560,7 @@ static inline void f2fs_set_encrypted_inode(struct inode *inode)
> */
> static inline bool f2fs_post_read_required(struct inode *inode)
> {
> - return f2fs_encrypted_file(inode);
> + return f2fs_encrypted_file(inode) || fsverity_active(inode);
> }
>
> #define F2FS_FEATURE_FUNCS(name, flagname) \
> @@ -3564,6 +3578,7 @@ F2FS_FEATURE_FUNCS(flexible_inline_xattr, FLEXIBLE_INLINE_XATTR);
> F2FS_FEATURE_FUNCS(quota_ino, QUOTA_INO);
> F2FS_FEATURE_FUNCS(inode_crtime, INODE_CRTIME);
> F2FS_FEATURE_FUNCS(lost_found, LOST_FOUND);
> +F2FS_FEATURE_FUNCS(verity, VERITY);
> F2FS_FEATURE_FUNCS(sb_chksum, SB_CHKSUM);
>
> #ifdef CONFIG_BLK_DEV_ZONED
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 45b45f37d347e4..6706c2081941a2 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -493,6 +493,10 @@ static int f2fs_file_open(struct inode *inode, struct file *filp)
> {
> int err = fscrypt_file_open(inode, filp);
>
> + if (err)
> + return err;
> +
> + err = fsverity_file_open(inode, filp);
> if (err)
> return err;
>
> @@ -781,6 +785,10 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
> if (err)
> return err;
>
> + err = fsverity_prepare_setattr(dentry, attr);
> + if (err)
> + return err;
> +
> if (is_quota_modification(inode, attr)) {
> err = dquot_initialize(inode);
> if (err)
> @@ -1656,6 +1664,8 @@ static int f2fs_ioc_getflags(struct file *filp, unsigned long arg)
>
> if (IS_ENCRYPTED(inode))
> flags |= F2FS_ENCRYPT_FL;
> + if (IS_VERITY(inode))
> + flags |= F2FS_VERITY_FL;
> if (f2fs_has_inline_data(inode) || f2fs_has_inline_dentry(inode))
> flags |= F2FS_INLINE_DATA_FL;
> if (is_inode_flag_set(inode, FI_PIN_FILE))
> @@ -2980,6 +2990,30 @@ static int f2fs_ioc_precache_extents(struct file *filp, unsigned long arg)
> return f2fs_precache_extents(file_inode(filp));
> }
>
> +static int f2fs_ioc_enable_verity(struct file *filp, unsigned long arg)
> +{
> + struct inode *inode = file_inode(filp);
> +
> + f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
> +
> + if (!f2fs_sb_has_verity(F2FS_I_SB(inode))) {
> + f2fs_msg(inode->i_sb, KERN_WARNING,
> + "Can't enable fs-verity on inode %lu: the verity feature is not enabled on this filesystem.\n",
> + inode->i_ino);
> + return -EOPNOTSUPP;
> + }
> +
> + return fsverity_ioctl_enable(filp, (const void __user *)arg);
> +}
> +
> +static int f2fs_ioc_measure_verity(struct file *filp, unsigned long arg)
> +{
> + if (!f2fs_sb_has_verity(F2FS_I_SB(file_inode(filp))))
> + return -EOPNOTSUPP;
> +
> + return fsverity_ioctl_measure(filp, (void __user *)arg);
> +}
> +
> long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> {
> if (unlikely(f2fs_cp_error(F2FS_I_SB(file_inode(filp)))))
> @@ -3036,6 +3070,10 @@ long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> return f2fs_ioc_set_pin_file(filp, arg);
> case F2FS_IOC_PRECACHE_EXTENTS:
> return f2fs_ioc_precache_extents(filp, arg);
> + case FS_IOC_ENABLE_VERITY:
> + return f2fs_ioc_enable_verity(filp, arg);
> + case FS_IOC_MEASURE_VERITY:
> + return f2fs_ioc_measure_verity(filp, arg);
> default:
> return -ENOTTY;
> }
> @@ -3149,6 +3187,8 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> case F2FS_IOC_GET_PIN_FILE:
> case F2FS_IOC_SET_PIN_FILE:
> case F2FS_IOC_PRECACHE_EXTENTS:
> + case FS_IOC_ENABLE_VERITY:
> + case FS_IOC_MEASURE_VERITY:
> break;
> default:
> return -ENOIOCTLCMD;
> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index ccb02226dd2c0c..b2f945b1afe501 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -46,9 +46,11 @@ void f2fs_set_inode_flags(struct inode *inode)
> new_fl |= S_DIRSYNC;
> if (file_is_encrypt(inode))
> new_fl |= S_ENCRYPTED;
> + if (file_is_verity(inode))
> + new_fl |= S_VERITY;
> inode_set_flags(inode, new_fl,
> S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|
> - S_ENCRYPTED);
> + S_ENCRYPTED|S_VERITY);
> }
>
> static void __get_inode_rdev(struct inode *inode, struct f2fs_inode *ri)
> @@ -749,6 +751,7 @@ void f2fs_evict_inode(struct inode *inode)
> }
> out_clear:
> fscrypt_put_encryption_info(inode);
> + fsverity_cleanup_inode(inode);
> clear_inode(inode);
> }
>
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 6b959bbb336a30..ea4a247d6ed6f7 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -3177,6 +3177,9 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> sb->s_op = &f2fs_sops;
> #ifdef CONFIG_FS_ENCRYPTION
> sb->s_cop = &f2fs_cryptops;
> +#endif
> +#ifdef CONFIG_FS_VERITY
> + sb->s_vop = &f2fs_verityops;
> #endif
> sb->s_xattr = f2fs_xattr_handlers;
> sb->s_export_op = &f2fs_export_ops;
> diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
> index 729f46a3c9ee0b..b3e28467db7279 100644
> --- a/fs/f2fs/sysfs.c
> +++ b/fs/f2fs/sysfs.c
> @@ -117,6 +117,9 @@ static ssize_t features_show(struct f2fs_attr *a,
> if (f2fs_sb_has_lost_found(sbi))
> len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
> len ? ", " : "", "lost_found");
> + if (f2fs_sb_has_verity(sbi))
> + len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
> + len ? ", " : "", "verity");
> if (f2fs_sb_has_sb_chksum(sbi))
> len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
> len ? ", " : "", "sb_checksum");
> @@ -350,6 +353,7 @@ enum feat_id {
> FEAT_QUOTA_INO,
> FEAT_INODE_CRTIME,
> FEAT_LOST_FOUND,
> + FEAT_VERITY,
> FEAT_SB_CHECKSUM,
> };
>
> @@ -367,6 +371,7 @@ static ssize_t f2fs_feature_show(struct f2fs_attr *a,
> case FEAT_QUOTA_INO:
> case FEAT_INODE_CRTIME:
> case FEAT_LOST_FOUND:
> + case FEAT_VERITY:
> case FEAT_SB_CHECKSUM:
> return snprintf(buf, PAGE_SIZE, "supported\n");
> }
> @@ -455,6 +460,9 @@ F2FS_FEATURE_RO_ATTR(flexible_inline_xattr, FEAT_FLEXIBLE_INLINE_XATTR);
> F2FS_FEATURE_RO_ATTR(quota_ino, FEAT_QUOTA_INO);
> F2FS_FEATURE_RO_ATTR(inode_crtime, FEAT_INODE_CRTIME);
> F2FS_FEATURE_RO_ATTR(lost_found, FEAT_LOST_FOUND);
> +#ifdef CONFIG_FS_VERITY
> +F2FS_FEATURE_RO_ATTR(verity, FEAT_VERITY);
> +#endif
> F2FS_FEATURE_RO_ATTR(sb_checksum, FEAT_SB_CHECKSUM);
>
> #define ATTR_LIST(name) (&f2fs_attr_##name.attr)
> @@ -517,6 +525,9 @@ static struct attribute *f2fs_feat_attrs[] = {
> ATTR_LIST(quota_ino),
> ATTR_LIST(inode_crtime),
> ATTR_LIST(lost_found),
> +#ifdef CONFIG_FS_VERITY
> + ATTR_LIST(verity),
> +#endif
> ATTR_LIST(sb_checksum),
> NULL,
> };
> diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c
> new file mode 100644
> index 00000000000000..dd9bb47ced0093
> --- /dev/null
> +++ b/fs/f2fs/verity.c
> @@ -0,0 +1,233 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * fs/f2fs/verity.c: fs-verity support for f2fs
> + *
> + * Copyright 2019 Google LLC
> + */
> +
> +/*
> + * Implementation of fsverity_operations for f2fs.
> + *
> + * Like ext4, f2fs stores the verity metadata (Merkle tree and
> + * fsverity_descriptor) past the end of the file, starting at the first 64K
> + * boundary beyond i_size. This approach works because (a) verity files are
> + * readonly, and (b) pages fully beyond i_size aren't visible to userspace but
> + * can be read/written internally by f2fs with only some relatively small
> + * changes to f2fs. Extended attributes cannot be used because (a) f2fs limits
> + * the total size of an inode's xattr entries to 4096 bytes, which wouldn't be
> + * enough for even a single Merkle tree block, and (b) f2fs encryption doesn't
> + * encrypt xattrs, yet the verity metadata *must* be encrypted when the file is
> + * because it contains hashes of the plaintext data.
> + *
> + * Using a 64K boundary rather than a 4K one keeps things ready for
> + * architectures with 64K pages, and it doesn't necessarily waste space on-disk
> + * since there can be a hole between i_size and the start of the Merkle tree.
> + */
> +
> +#include <linux/f2fs_fs.h>
> +
> +#include "f2fs.h"
> +#include "xattr.h"
> +
> +static inline loff_t f2fs_verity_metadata_pos(const struct inode *inode)
> +{
> + return round_up(inode->i_size, 65536);
> +}
> +
> +/*
> + * Read some verity metadata from the inode. __vfs_read() can't be used because
> + * we need to read beyond i_size.
> + */
> +static int pagecache_read(struct inode *inode, void *buf, size_t count,
> + loff_t pos)
> +{
> + while (count) {
> + size_t n = min_t(size_t, count,
> + PAGE_SIZE - offset_in_page(pos));
> + struct page *page;
> + void *addr;
> +
> + page = read_mapping_page(inode->i_mapping, pos >> PAGE_SHIFT,
> + NULL);
> + if (IS_ERR(page))
> + return PTR_ERR(page);
> +
> + addr = kmap_atomic(page);
> + memcpy(buf, addr + offset_in_page(pos), n);
> + kunmap_atomic(addr);
> +
> + put_page(page);
> +
> + buf += n;
> + pos += n;
> + count -= n;
> + }
> + return 0;
> +}
> +
> +/*
> + * Write some verity metadata to the inode for FS_IOC_ENABLE_VERITY.
> + * kernel_write() can't be used because the file descriptor is readonly.
> + */
> +static int pagecache_write(struct inode *inode, const void *buf, size_t count,
> + loff_t pos)
> +{
> + while (count) {
> + size_t n = min_t(size_t, count,
> + PAGE_SIZE - offset_in_page(pos));
> + struct page *page;
> + void *fsdata;
> + void *addr;
> + int res;
> +
> + res = pagecache_write_begin(NULL, inode->i_mapping, pos, n, 0,
> + &page, &fsdata);
> + if (res)
> + return res;
> +
> + addr = kmap_atomic(page);
> + memcpy(addr + offset_in_page(pos), buf, n);
> + kunmap_atomic(addr);
> +
> + res = pagecache_write_end(NULL, inode->i_mapping, pos, n, n,
> + page, fsdata);
> + if (res < 0)
> + return res;
> + if (res != n)
> + return -EIO;
> +
> + buf += n;
> + pos += n;
> + count -= n;
> + }
> + return 0;
> +}
> +
> +/*
> + * Format of f2fs verity xattr. This points to the location of the verity
> + * descriptor within the file data rather than containing it directly because
> + * the verity descriptor *must* be encrypted when f2fs encryption is used. But,
> + * f2fs encryption does not encrypt xattrs.
> + */
> +struct fsverity_descriptor_location {
> + __le32 version;
> + __le32 size;
> + __le64 pos;
> +};
> +
> +static int f2fs_begin_enable_verity(struct file *filp)
> +{
> + struct inode *inode = file_inode(filp);
> + int err;
> +
> + err = f2fs_convert_inline_inode(inode);
> + if (err)
> + return err;
> +
> + err = dquot_initialize(inode);
> + if (err)
> + return err;
> +
> + set_inode_flag(inode, FI_VERITY_IN_PROGRESS);
> + return 0;
> +}
> +
> +static int f2fs_end_enable_verity(struct file *filp, const void *desc,
> + size_t desc_size, u64 merkle_tree_size)
> +{
> + struct inode *inode = file_inode(filp);
> + u64 desc_pos = f2fs_verity_metadata_pos(inode) + merkle_tree_size;
> + struct fsverity_descriptor_location dloc = {
> + .version = cpu_to_le32(1),
> + .size = cpu_to_le32(desc_size),
> + .pos = cpu_to_le64(desc_pos),
> + };
> + int err = 0;
> +
> + if (desc != NULL) {
> + /* Succeeded; write the verity descriptor. */
> + err = pagecache_write(inode, desc, desc_size, desc_pos);
> +
> + /* Write all pages before clearing FI_VERITY_IN_PROGRESS. */
> + if (!err)
> + err = filemap_write_and_wait(inode->i_mapping);
> + } else {
> + /* Failed; truncate anything we wrote past i_size. */
> + f2fs_truncate(inode);
> + }
> +
> + clear_inode_flag(inode, FI_VERITY_IN_PROGRESS);
> +
> + if (desc != NULL && !err) {
> + err = f2fs_setxattr(inode, F2FS_XATTR_INDEX_VERITY,
> + F2FS_XATTR_NAME_VERITY, &dloc, sizeof(dloc),
> + NULL, XATTR_CREATE);
> + if (!err) {
> + file_set_verity(inode);
> + f2fs_set_inode_flags(inode);
> + f2fs_mark_inode_dirty_sync(inode, true);
> + }
> + }
> + return err;
> +}
> +
> +static int f2fs_get_verity_descriptor(struct inode *inode, void *buf,
> + size_t buf_size)
> +{
> + struct fsverity_descriptor_location dloc;
> + int res;
> + u32 size;
> + u64 pos;
> +
> + /* Get the descriptor location */
> + res = f2fs_getxattr(inode, F2FS_XATTR_INDEX_VERITY,
> + F2FS_XATTR_NAME_VERITY, &dloc, sizeof(dloc), NULL);
> + if (res < 0 && res != -ERANGE)
> + return res;
> + if (res != sizeof(dloc) || dloc.version != cpu_to_le32(1)) {
> + f2fs_msg(inode->i_sb, KERN_WARNING,
> + "unknown verity xattr format");
> + return -EINVAL;
> + }
> + size = le32_to_cpu(dloc.size);
> + pos = le64_to_cpu(dloc.pos);
> +
> + /* Get the descriptor */
> + if (pos + size < pos || pos + size > inode->i_sb->s_maxbytes ||
> + pos < f2fs_verity_metadata_pos(inode) || size > INT_MAX) {
> + f2fs_msg(inode->i_sb, KERN_WARNING, "invalid verity xattr");
> + return -EUCLEAN; /* EFSCORRUPTED */
> + }
> + if (buf_size) {
> + if (size > buf_size)
> + return -ERANGE;
> + res = pagecache_read(inode, buf, size, pos);
> + if (res)
> + return res;
> + }
> + return size;
> +}
> +
> +static struct page *f2fs_read_merkle_tree_page(struct inode *inode,
> + pgoff_t index)
> +{
> + index += f2fs_verity_metadata_pos(inode) >> PAGE_SHIFT;
> +
> + return read_mapping_page(inode->i_mapping, index, NULL);
> +}
> +
> +static int f2fs_write_merkle_tree_block(struct inode *inode, const void *buf,
> + u64 index, int log_blocksize)
> +{
> + loff_t pos = f2fs_verity_metadata_pos(inode) + (index << log_blocksize);
> +
> + return pagecache_write(inode, buf, 1 << log_blocksize, pos);
> +}
> +
> +const struct fsverity_operations f2fs_verityops = {
> + .begin_enable_verity = f2fs_begin_enable_verity,
> + .end_enable_verity = f2fs_end_enable_verity,
> + .get_verity_descriptor = f2fs_get_verity_descriptor,
> + .read_merkle_tree_page = f2fs_read_merkle_tree_page,
> + .write_merkle_tree_block = f2fs_write_merkle_tree_block,
> +};
> diff --git a/fs/f2fs/xattr.h b/fs/f2fs/xattr.h
> index a90920e2f94980..de0c600b9cab09 100644
> --- a/fs/f2fs/xattr.h
> +++ b/fs/f2fs/xattr.h
> @@ -34,8 +34,10 @@
> #define F2FS_XATTR_INDEX_ADVISE 7
> /* Should be same as EXT4_XATTR_INDEX_ENCRYPTION */
> #define F2FS_XATTR_INDEX_ENCRYPTION 9
> +#define F2FS_XATTR_INDEX_VERITY 11
>
> #define F2FS_XATTR_NAME_ENCRYPTION_CONTEXT "c"
> +#define F2FS_XATTR_NAME_VERITY "v"
>
> struct f2fs_xattr_header {
> __le32 h_magic; /* magic number for identification */
> --
> 2.22.0.410.gd8fdbe21b5-goog
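
For readers following the bounds check in f2fs_get_verity_descriptor() above,
the same four conditions can be modeled in userspace roughly as below. This is
only a sketch under stated assumptions: desc_location_valid() and its
parameters are hypothetical names, with metadata_pos standing in for
f2fs_verity_metadata_pos(inode) and max_bytes for inode->i_sb->s_maxbytes. The
comments give a plausible reading of each condition, not the author's stated
rationale.

```c
#include <assert.h>	/* only needed by the usage example below */
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the descriptor location check. All names here are
 * hypothetical; only the four conditions mirror the patch.
 */
static bool desc_location_valid(uint64_t pos, uint32_t size,
				uint64_t metadata_pos, uint64_t max_bytes)
{
	uint64_t end = pos + size;	/* unsigned, so wraparound is well defined */

	if (end < pos)			/* pos + size wrapped past 64 bits */
		return false;
	if (end > max_bytes)		/* extends past the maximum file size */
		return false;
	if (pos < metadata_pos)		/* starts before the verity metadata area */
		return false;
	if (size > INT_MAX)		/* size must fit in the int return value */
		return false;
	return true;
}
```

With this model, assert(desc_location_valid(4096, 256, 4096, 1u << 20)) holds,
while a location such as pos = UINT64_MAX - 1 with size = 16 is rejected by the
wraparound test before any of the other comparisons are reached.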

2019-06-25 07:58:09

by Chao Yu

Subject: Re: [PATCH v5 16/16] f2fs: add fs-verity support

Hi Eric,

On 2019/6/21 4:50, Eric Biggers wrote:
> +static int f2fs_begin_enable_verity(struct file *filp)
> +{
> + struct inode *inode = file_inode(filp);
> + int err;
> +

I think we'd better add a condition here (under the inode lock) to disallow
enabling verity on an atomic/volatile inode, as we may fail to write the Merkle
tree data due to the atomic/volatile inode's special writeback method.

> + err = f2fs_convert_inline_inode(inode);
> + if (err)
> + return err;
> +
> + err = dquot_initialize(inode);
> + if (err)
> + return err;

We can get rid of dquot_initialize() here, since f2fs_file_open() ->
dquot_file_open() should have already initialized the quota entry, right?

Thanks,

> +
> + set_inode_flag(inode, FI_VERITY_IN_PROGRESS);
> + return 0;
> +}
> +

2019-06-25 19:49:11

by Eric Biggers

Subject: Re: [PATCH v5 16/16] f2fs: add fs-verity support

Hi Chao, thanks for the review.

On Tue, Jun 25, 2019 at 03:55:57PM +0800, Chao Yu wrote:
> Hi Eric,
>
> On 2019/6/21 4:50, Eric Biggers wrote:
> > +static int f2fs_begin_enable_verity(struct file *filp)
> > +{
> > + struct inode *inode = file_inode(filp);
> > + int err;
> > +
>
> I think we'd better add a condition here (under the inode lock) to disallow
> enabling verity on an atomic/volatile inode, as we may fail to write the Merkle
> tree data due to the atomic/volatile inode's special writeback method.
>

Yes, I'll add the following:

if (f2fs_is_atomic_file(inode) || f2fs_is_volatile_file(inode))
return -EOPNOTSUPP;

> > + err = f2fs_convert_inline_inode(inode);
> > + if (err)
> > + return err;
> > +
> > + err = dquot_initialize(inode);
> > + if (err)
> > + return err;
>
> We can get rid of dquot_initialize() here, since f2fs_file_open() ->
> dquot_file_open() should have already initialized the quota entry, right?

We still need it because dquot_file_open() only calls dquot_initialize() if the
file is being opened for writing. But here the file descriptor is readonly.
I'll add a comment explaining this here and in the ext4 equivalent.
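
To illustrate the point, here is a rough userspace model of that behavior. It
is a sketch, not the kernel code: model_inode, MODEL_FMODE_WRITE and the
model_*() helpers are hypothetical stand-ins, and the only thing carried over
is the idea that quota setup is skipped when the file is not opened for
writing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's inode and write-mode flag. */
struct model_inode {
	bool quota_initialized;
};

#define MODEL_FMODE_WRITE 0x2u

static int model_dquot_initialize(struct model_inode *inode)
{
	assert(inode != NULL);
	inode->quota_initialized = true;
	return 0;
}

/* Models the idea only: quota setup is skipped for read-only opens. */
static int model_dquot_file_open(struct model_inode *inode, unsigned int f_mode)
{
	if (f_mode & MODEL_FMODE_WRITE)
		return model_dquot_initialize(inode);
	return 0;
}
```

In this model, opening with f_mode == 0 leaves quota_initialized false, which
is the situation the enable-verity path is in when it is handed a read-only
file descriptor, so it has to call the initializer itself.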

- Eric

2019-06-26 07:35:46

by Chao Yu

Subject: Re: [PATCH v5 16/16] f2fs: add fs-verity support

Hi Eric,

On 2019/6/26 1:52, Eric Biggers wrote:
> Hi Chao, thanks for the review.
>
> On Tue, Jun 25, 2019 at 03:55:57PM +0800, Chao Yu wrote:
>> Hi Eric,
>>
>> On 2019/6/21 4:50, Eric Biggers wrote:
>>> +static int f2fs_begin_enable_verity(struct file *filp)
>>> +{
>>> + struct inode *inode = file_inode(filp);
>>> + int err;
>>> +
>>
>> I think we'd better add a condition here (under the inode lock) to disallow
>> enabling verity on an atomic/volatile inode, as we may fail to write the Merkle
>> tree data due to the atomic/volatile inode's special writeback method.
>>
>
> Yes, I'll add the following:
>
> if (f2fs_is_atomic_file(inode) || f2fs_is_volatile_file(inode))
> return -EOPNOTSUPP;
>
>>> + err = f2fs_convert_inline_inode(inode);
>>> + if (err)
>>> + return err;
>>> +
>>> + err = dquot_initialize(inode);
>>> + if (err)
>>> + return err;
>>
>> We can get rid of dquot_initialize() here, since f2fs_file_open() ->
>> dquot_file_open() should have already initialized the quota entry, right?
>
> We still need it because dquot_file_open() only calls dquot_initialize() if the
> file is being opened for writing. But here the file descriptor is readonly.
> I'll add a comment explaining this here and in the ext4 equivalent.

Ah, you're right.

f2fs_convert_inline_inode() may allocate one more block during conversion, so
shouldn't we call dquot_initialize() before the inline conversion?

Thanks,

>
> - Eric
> .
>

2019-06-26 18:22:28

by Eric Biggers

Subject: Re: [PATCH v5 16/16] f2fs: add fs-verity support

On Wed, Jun 26, 2019 at 03:34:35PM +0800, Chao Yu wrote:
> >>> + err = f2fs_convert_inline_inode(inode);
> >>> + if (err)
> >>> + return err;
> >>> +
> >>> + err = dquot_initialize(inode);
> >>> + if (err)
> >>> + return err;
> >>
> >> We can get rid of dquot_initialize() here, since f2fs_file_open() ->
> >> dquot_file_open() should have already initialized the quota entry, right?
> >
> > We still need it because dquot_file_open() only calls dquot_initialize() if the
> > file is being opened for writing. But here the file descriptor is readonly.
> > I'll add a comment explaining this here and in the ext4 equivalent.
>
> Ah, you're right.
>
> f2fs_convert_inline_inode() may allocate one more block during conversion, so
> shouldn't we call dquot_initialize() before the inline conversion?
>
> Thanks,
>

Good point. I'll fix that here and in ext4.
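
The ordering agreed on in this thread could look roughly like the userspace
sketch below: reject atomic/volatile inodes, initialize quota, then convert
inline data, then mark verity as in progress. Everything here is a
hypothetical model (sketch_inode and the sketch_*() helpers are not kernel
names), the next revision of the patch may differ, and the "uncharged block"
counter is just this model's way of showing why the quota call goes first.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the inode state discussed in the thread. */
struct sketch_inode {
	bool is_atomic;
	bool is_volatile;
	bool has_inline_data;
	bool quota_initialized;
	bool verity_in_progress;
	unsigned int charged_blocks;	/* blocks accounted to quota */
	unsigned int uncharged_blocks;	/* blocks allocated without quota set up */
};

static int sketch_dquot_initialize(struct sketch_inode *inode)
{
	inode->quota_initialized = true;
	return 0;
}

/* Converting inline data allocates one block in this model. */
static int sketch_convert_inline_inode(struct sketch_inode *inode)
{
	if (!inode->has_inline_data)
		return 0;
	inode->has_inline_data = false;
	if (inode->quota_initialized)
		inode->charged_blocks++;
	else
		inode->uncharged_blocks++;
	return 0;
}

static int sketch_begin_enable_verity(struct sketch_inode *inode)
{
	int err;

	assert(inode != NULL);

	/* Atomic/volatile files use a special writeback path; refuse them. */
	if (inode->is_atomic || inode->is_volatile)
		return -EOPNOTSUPP;

	/* Quota first: the fd is read-only, so nothing has set it up yet. */
	err = sketch_dquot_initialize(inode);
	if (err)
		return err;

	/* Only now may the conversion allocate a block. */
	err = sketch_convert_inline_inode(inode);
	if (err)
		return err;

	inode->verity_in_progress = true;
	return 0;
}
```

Swapping the two middle calls in this model would leave an inline inode with
uncharged_blocks == 1, which is the accounting gap the reordering avoids.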

- Eric