Hello,
This patchset implements fs-verity for ext4 and f2fs. fs-verity is
similar to dm-verity, but implemented on a per-file basis: a Merkle tree
is used to measure (hash) the file's data as it is paged in. ext4 and
f2fs hide this Merkle tree beyond the end of the file, though other
filesystems might implement it differently in the future. In general,
fs-verity is intended for use on writable filesystems; dm-verity is
still recommended on read-only ones.
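To make the Merkle tree idea concrete, here is a minimal userspace sketch of how a root hash over a file's data could be computed. Data blocks are hashed, the digests are packed into new blocks, and the process repeats until one block remains; the hash of that block is the root. The sketch assumes 4096-byte blocks. To stay self-contained, it uses 64-bit FNV-1a as a hypothetical, non-cryptographic stand-in for SHA-256. The names are illustrative only. fs-verity's actual file measurement also covers other metadata (see fsverity.rst), so this sketch stops at the root.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE  4096u /* assumed Merkle tree block size */
#define SKETCH_DIGEST_SIZE 8u    /* size of the toy digest below */

/* Hypothetical stand-in for SHA-256: 64-bit FNV-1a. NOT cryptographic. */
static uint64_t toy_hash(const uint8_t *p, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/*
 * Compute a Merkle root over @len bytes of @data. Each level hashes the
 * zero-padded blocks of the level below and packs the digests into new
 * blocks until a level fits in one block; the root is that block's hash.
 * Returns 0 on success, -1 on allocation failure.
 */
int merkle_root(const uint8_t *data, size_t len, uint64_t *root)
{
	size_t nblocks = len ? (len + SKETCH_BLOCK_SIZE - 1) / SKETCH_BLOCK_SIZE : 1;
	uint8_t *cur = calloc(nblocks, SKETCH_BLOCK_SIZE);

	if (!cur)
		return -1;
	if (len)
		memcpy(cur, data, len); /* tail of the last block stays zero */

	do {
		size_t next_blocks = (nblocks * SKETCH_DIGEST_SIZE +
				      SKETCH_BLOCK_SIZE - 1) / SKETCH_BLOCK_SIZE;
		uint8_t *next = calloc(next_blocks, SKETCH_BLOCK_SIZE);

		if (!next) {
			free(cur);
			return -1;
		}
		for (size_t i = 0; i < nblocks; i++) {
			uint64_t d = toy_hash(cur + i * SKETCH_BLOCK_SIZE,
					      SKETCH_BLOCK_SIZE);
			/* host byte order; acceptable for a sketch */
			memcpy(next + i * SKETCH_DIGEST_SIZE, &d, SKETCH_DIGEST_SIZE);
		}
		free(cur);
		cur = next;
		nblocks = next_blocks;
	} while (nblocks > 1);

	*root = toy_hash(cur, SKETCH_BLOCK_SIZE);
	free(cur);
	return 0;
}
```

With a collision-resistant hash in place of the toy one, changing any data byte changes the root. That is why checking one block's path against a trusted root is enough to detect tampering with that block.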
Similar to fscrypt, most of the code is in fs/verity/, and not too many
filesystem-specific changes are needed. The Merkle tree is written by
userspace before calling an ioctl to mark the file as a verity file; the
file then becomes read-only and the verity metadata is hidden or moved.
fs-verity provides a file measurement (hash) in constant time and
verifies data on-demand. Thus, it is useful for efficiently verifying
the authenticity of large files of which only a small portion may be
accessed, such as Android application package (APK) files. It may also
be useful in "audit" use cases where file hashes are logged.
fs-verity also provides better protection against malicious disks than
an ahead-of-time hash, since fs-verity re-verifies data each time it's
paged in. Note, however, that any authenticity guarantee is still
dependent on verification of the file measurement and other relevant
metadata in a way that makes sense for the overall system; fs-verity is
only a tool to help with this.
This patchset doesn't yet include IMA support for fs-verity file
measurements. This is planned and we'd like to collaborate with the IMA
maintainers. Although fs-verity can be used on its own without IMA,
fs-verity is primarily a lower-level feature (think of it as a way of
hashing a file), so some users may still need IMA's policy mechanism.
However, an optional in-kernel signature verification mechanism within
fs-verity itself is also included.
This patchset is based on Linus' tree as of today (commit 7c6c54b505b8a).
It can also be found in git at tag "fsverity_2018-11-01" of:
https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git
fs-verity has a userspace utility:
https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git
xfstests for fs-verity can be found at branch "fsverity" of:
https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/xfstests-dev.git
fs-verity is supported by e2fsprogs v1.44.4-2+ and f2fs-tools v1.11.0+.
Please see the documentation file Documentation/filesystems/fsverity.rst
(added by patch 1) for details; this cover letter only gives an overview.
Examples of setting up fs-verity protected files can also be found in
the README file of fsverity-utils.
Other useful references include:
- LWN coverage of v1 patchset: https://lwn.net/Articles/763729/
- Presentation at Linux Security Summit North America 2018:
    - Slides: https://schd.ws/hosted_files/lssna18/af/fs-verity%20slide%20deck.pdf
    - Video: https://www.youtube.com/watch?v=Aw5h6aBhu6M
- Notes from discussion at LSFMM 2018: https://lwn.net/Articles/752614/
Changes since v1:
- Added documentation file.
- Require write permission for FS_IOC_ENABLE_VERITY, rather than
  CAP_SYS_ADMIN.
- Eliminated dependency on CONFIG_BLOCK and clarified that filesystems
  can verify a page at a time rather than a bio at a time.
- Fixed conditions for verifying holes.
- ext4 now only allows fs-verity on extent-based files.
- Eliminated most of the assumptions that the verity metadata is stored
  beyond EOF, in case filesystems want to do things differently.
- Other cleanups.
Eric Biggers (12):
fs-verity: add a documentation file
fs-verity: add setup code, UAPI, and Kconfig
fs-verity: add MAINTAINERS file entry
fs-verity: add data verification hooks for ->readpages()
fs-verity: implement FS_IOC_ENABLE_VERITY ioctl
fs-verity: implement FS_IOC_MEASURE_VERITY ioctl
fs-verity: add SHA-512 support
fs-verity: add CRC-32C support
fs-verity: support builtin file signatures
ext4: add basic fs-verity support
ext4: add fs-verity read support
f2fs: fs-verity support
Documentation/filesystems/fsverity.rst | 583 ++++++++++++++++
Documentation/filesystems/index.rst | 11 +
Documentation/ioctl/ioctl-number.txt | 1 +
MAINTAINERS | 11 +
fs/Kconfig | 2 +
fs/Makefile | 1 +
fs/ext4/Kconfig | 20 +
fs/ext4/ext4.h | 22 +-
fs/ext4/file.c | 6 +
fs/ext4/inode.c | 11 +
fs/ext4/ioctl.c | 12 +
fs/ext4/readpage.c | 209 +++++-
fs/ext4/super.c | 100 ++-
fs/ext4/sysfs.c | 6 +
fs/f2fs/Kconfig | 20 +
fs/f2fs/data.c | 43 +-
fs/f2fs/f2fs.h | 17 +-
fs/f2fs/file.c | 58 ++
fs/f2fs/inode.c | 3 +-
fs/f2fs/super.c | 30 +
fs/f2fs/sysfs.c | 11 +
fs/verity/Kconfig | 52 ++
fs/verity/Makefile | 5 +
fs/verity/fsverity_private.h | 135 ++++
fs/verity/hash_algs.c | 115 ++++
fs/verity/ioctl.c | 164 +++++
fs/verity/setup.c | 908 +++++++++++++++++++++++++
fs/verity/signature.c | 187 +++++
fs/verity/verify.c | 298 ++++++++
include/linux/fs.h | 9 +
include/linux/fsverity.h | 112 +++
include/uapi/linux/fsverity.h | 98 +++
32 files changed, 3218 insertions(+), 42 deletions(-)
create mode 100644 Documentation/filesystems/fsverity.rst
create mode 100644 fs/verity/Kconfig
create mode 100644 fs/verity/Makefile
create mode 100644 fs/verity/fsverity_private.h
create mode 100644 fs/verity/hash_algs.c
create mode 100644 fs/verity/ioctl.c
create mode 100644 fs/verity/setup.c
create mode 100644 fs/verity/signature.c
create mode 100644 fs/verity/verify.c
create mode 100644 include/linux/fsverity.h
create mode 100644 include/uapi/linux/fsverity.h
--
2.19.1.568.g152ad8e336-goog
From: Eric Biggers <[email protected]>
Add fs-verity support to f2fs. fs-verity is a filesystem feature that
enables transparent integrity protection and authentication of read-only
files. It uses a dm-verity-like mechanism at the file level: a Merkle
tree is used to verify any block in the file in log(filesize) time. It
is implemented mainly by helper functions in fs/verity/.
See Documentation/filesystems/fsverity.rst for details.
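The log(filesize) bound follows from the tree's fan-out. Each hash block holds block_size / digest_size hashes (128 for SHA-256 with 4096-byte blocks), so verifying one data block needs one hash per level. The following sketch uses hypothetical names. For a given data block, it computes which hash block and byte offset hold the needed hash at each level. It does not model fs-verity's on-disk placement of the levels.

```c
#include <stdint.h>

/* Hypothetical description of one hop on the verification path. */
struct verity_path_step {
	unsigned level;         /* 0 = hashes of the data blocks */
	uint64_t hblock_index;  /* hash block index within this level */
	unsigned hash_offset;   /* byte offset of the needed hash in it */
};

/*
 * Fill @steps with the path from @data_block up to the single top-level
 * hash block. Requires num_data_blocks >= 1 and data_block < that.
 * Returns the number of levels, or 0 if @max_steps is too small.
 */
unsigned verity_path(uint64_t data_block, uint64_t num_data_blocks,
		     unsigned block_size, unsigned digest_size,
		     struct verity_path_step *steps, unsigned max_steps)
{
	const uint64_t per_block = block_size / digest_size;
	uint64_t idx = data_block;        /* block being covered */
	uint64_t count = num_data_blocks; /* blocks in the covered level */
	unsigned level = 0;

	do {
		if (level == max_steps)
			return 0;
		steps[level].level = level;
		steps[level].hblock_index = idx / per_block;
		steps[level].hash_offset =
			(unsigned)(idx % per_block) * digest_size;
		idx /= per_block;
		count = (count + per_block - 1) / per_block;
		level++;
	} while (count > 1);
	return level;
}
```

Consider a 1 GiB file (262144 blocks of 4096 bytes) hashed with SHA-256: verity_path(200000, 262144, 4096, 32, steps, 8) returns 3. The needed hash is at byte 2048 of level-0 hash block 1562. The next one is at byte 832 of level-1 block 12. The last one is at byte 384 of the single top-level block, and that block's own hash is the root hash.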
In f2fs, the main change is to the I/O path: ->readpage() and
->readpages() now verify data as it is read from verity files. Pages
that fail verification are set to PG_error && !PG_uptodate, causing
applications to see an I/O error.
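The data.c hunk in this patch chains the post-read steps through a small state machine in bio_post_read_processing(). Decryption is always ordered before verification, since the Merkle tree covers the plaintext. The sketch below is a hypothetical, synchronous userspace model of that control flow. In the kernel, each step queues decrypt_work or verity_work, and that work item calls back into the state machine when it finishes. The model instead records the step and recurses.

```c
#include <stddef.h>

/* Same step numbering as the data.c hunk. */
enum bio_post_read_step {
	STEP_INITIAL = 0,
	STEP_DECRYPT,
	STEP_VERITY,
};

/* Hypothetical stand-in for struct bio_post_read_ctx. */
struct read_ctx {
	unsigned int cur_step;
	unsigned int enabled_steps;
	char trace[4];  /* which steps ran, e.g. "DVE" */
	size_t ntrace;
};

static void record(struct read_ctx *ctx, char c)
{
	if (ctx->ntrace < sizeof(ctx->trace) - 1)
		ctx->trace[ctx->ntrace++] = c;
	ctx->trace[ctx->ntrace] = '\0';
}

void post_read_processing(struct read_ctx *ctx)
{
	switch (++ctx->cur_step) {
	case STEP_DECRYPT:
		if (ctx->enabled_steps & (1u << STEP_DECRYPT)) {
			record(ctx, 'D');          /* kernel: queue decrypt_work */
			post_read_processing(ctx); /* work item calls back in */
			return;
		}
		ctx->cur_step++;
		/* fall through */
	case STEP_VERITY:
		if (ctx->enabled_steps & (1u << STEP_VERITY)) {
			record(ctx, 'V');          /* kernel: queue verity_work */
			post_read_processing(ctx);
			return;
		}
		ctx->cur_step++;
		/* fall through */
	default:
		record(ctx, 'E');                  /* kernel: __read_end_io() */
	}
}
```

The trace shows the order in which steps run:

- Both steps enabled: "DVE".
- Only verity enabled: "VE".
- Neither enabled: "E", meaning the bio completes immediately.

The kernel uses separate workqueues for the two steps because verification may need to read Merkle tree pages that themselves require decryption.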
Hooks are also added to several other f2fs filesystem operations:
* ->open(), to deny opening verity files for writing and to set up
  the fsverity_info to prepare for I/O
* ->getattr(), to set up the fsverity_info so that stat() shows the
  original data size of verity files
* ->setattr(), to deny truncating verity files
* update_inode(), to write out the full file size rather than the
  original data size, since for verity files the in-memory ->i_size is
  overridden with the original data size.
Finally, the FS_IOC_ENABLE_VERITY and FS_IOC_MEASURE_VERITY ioctls are
wired up. On f2fs, these ioctls require that the filesystem has the
'verity' feature, i.e. it was created with 'mkfs.f2fs -O verity'.
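From userspace, reading a verity file's measurement is a single ioctl on an open file descriptor. This is a sketch only. The struct layout and the ioctl number below are assumptions about the v2 UAPI header (include/uapi/linux/fsverity.h), which this thread shows only in part. measure_verity() is a hypothetical helper; fsverity-utils is the reference userspace tool.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/ioctl.h>

/*
 * ASSUMED to match the v2 UAPI header; use <linux/fsverity.h> instead of
 * these definitions if your system provides it.
 */
struct fsverity_digest {
	uint16_t digest_algorithm; /* out: FS_VERITY_ALG_* */
	uint16_t digest_size;      /* in: buffer size; out: digest size */
	uint8_t digest[];
};
#define FS_IOC_MEASURE_VERITY _IOWR('f', 134, struct fsverity_digest)

/*
 * Fetch the measurement of @path into @out (capacity @out_size).
 * Returns the digest size on success or a negative errno.
 */
int measure_verity(const char *path, uint16_t *alg_out,
		   uint8_t *out, uint16_t out_size)
{
	struct fsverity_digest *d = malloc(sizeof(*d) + out_size);
	int fd, ret;

	if (!d)
		return -ENOMEM;
	d->digest_size = out_size;

	fd = open(path, O_RDONLY); /* a read-only fd is assumed to suffice */
	if (fd < 0) {
		ret = -errno;
		goto out_free;
	}
	if (ioctl(fd, FS_IOC_MEASURE_VERITY, d) != 0) {
		ret = -errno;
	} else if (d->digest_size > out_size) {
		ret = -EOVERFLOW; /* defensive check only */
	} else {
		*alg_out = d->digest_algorithm;
		memcpy(out, d->digest, d->digest_size);
		ret = d->digest_size;
	}
	close(fd);
out_free:
	free(d);
	return ret;
}
```

On an f2fs filesystem without the 'verity' feature, f2fs_ioc_measure_verity() in this patch makes the call fail with -EOPNOTSUPP.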
Like we did in ext4, in f2fs we choose to retain the fs-verity metadata
past the end of the file rather than move it into an xattr, since in
practice this results in the simplest and most efficient implementation.
For example, it avoids needing to add support for external inode xattrs
and for xattr encryption.
Signed-off-by: Eric Biggers <[email protected]>
---
fs/f2fs/Kconfig | 20 +++++++++++++++++
fs/f2fs/data.c | 43 +++++++++++++++++++++++++++++++-----
fs/f2fs/f2fs.h | 17 ++++++++++++---
fs/f2fs/file.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/f2fs/inode.c | 3 ++-
fs/f2fs/super.c | 30 +++++++++++++++++++++++++
fs/f2fs/sysfs.c | 11 ++++++++++
7 files changed, 173 insertions(+), 9 deletions(-)
diff --git a/fs/f2fs/Kconfig b/fs/f2fs/Kconfig
index 9a20ef42fadde..c8396c7220f2a 100644
--- a/fs/f2fs/Kconfig
+++ b/fs/f2fs/Kconfig
@@ -81,6 +81,26 @@ config F2FS_FS_ENCRYPTION
efficient since it avoids caching the encrypted and
decrypted pages in the page cache.
+config F2FS_FS_VERITY
+ bool "F2FS Verity"
+ depends on F2FS_FS
+ select FS_VERITY
+ help
+ This option enables fs-verity for f2fs. fs-verity is the
+ dm-verity mechanism implemented at the file level. Userspace
+ can append a Merkle tree (hash tree) to a file, then enable
+ fs-verity on the file. f2fs will then transparently verify
+ any data read from the file against the Merkle tree. The file
+ is also made read-only.
+
+ This serves as an integrity check, but the availability of the
+ Merkle tree root hash also allows efficiently supporting
+ various use cases where normally the whole file would need to
+ be hashed at once, such as auditing and authenticity
+ verification (appraisal).
+
+ If unsure, say N.
+
config F2FS_IO_TRACE
bool "F2FS IO tracer"
depends on F2FS_FS
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index b293cb3e27a22..09d9fc1676a7e 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -73,6 +73,7 @@ static enum count_type __read_io_type(struct page *page)
enum bio_post_read_step {
STEP_INITIAL = 0,
STEP_DECRYPT,
+ STEP_VERITY,
};
struct bio_post_read_ctx {
@@ -119,8 +120,23 @@ static void decrypt_work(struct work_struct *work)
bio_post_read_processing(ctx);
}
+static void verity_work(struct work_struct *work)
+{
+ struct bio_post_read_ctx *ctx =
+ container_of(work, struct bio_post_read_ctx, work);
+
+ fsverity_verify_bio(ctx->bio);
+
+ bio_post_read_processing(ctx);
+}
+
static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
{
+ /*
+ * We use different work queues for decryption and for verity because
+ * verity may require reading metadata pages that need decryption, and
+ * we shouldn't recurse to the same workqueue.
+ */
switch (++ctx->cur_step) {
case STEP_DECRYPT:
if (ctx->enabled_steps & (1 << STEP_DECRYPT)) {
@@ -130,6 +146,14 @@ static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
}
ctx->cur_step++;
/* fall-through */
+ case STEP_VERITY:
+ if (ctx->enabled_steps & (1 << STEP_VERITY)) {
+ INIT_WORK(&ctx->work, verity_work);
+ fsverity_enqueue_verify_work(&ctx->work);
+ return;
+ }
+ ctx->cur_step++;
+ /* fall-through */
default:
__read_end_io(ctx->bio);
}
@@ -566,7 +590,8 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
}
static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
- unsigned nr_pages, unsigned op_flag)
+ unsigned nr_pages, unsigned op_flag,
+ pgoff_t first_idx)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct bio *bio;
@@ -585,6 +610,11 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
if (f2fs_encrypted_file(inode))
post_read_steps |= 1 << STEP_DECRYPT;
+#ifdef CONFIG_F2FS_FS_VERITY
+ if (inode->i_verity_info != NULL &&
+ (first_idx < ((i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT)))
+ post_read_steps |= 1 << STEP_VERITY;
+#endif
if (post_read_steps) {
ctx = mempool_alloc(bio_post_read_ctx_pool, GFP_NOFS);
if (!ctx) {
@@ -603,7 +633,7 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
static int f2fs_submit_page_read(struct inode *inode, struct page *page,
block_t blkaddr)
{
- struct bio *bio = f2fs_grab_read_bio(inode, blkaddr, 1, 0);
+ struct bio *bio = f2fs_grab_read_bio(inode, blkaddr, 1, 0, page->index);
if (IS_ERR(bio))
return PTR_ERR(bio);
@@ -1540,8 +1570,8 @@ static int f2fs_mpage_readpages(struct address_space *mapping,
block_in_file = (sector_t)page->index;
last_block = block_in_file + nr_pages;
- last_block_in_file = (i_size_read(inode) + blocksize - 1) >>
- blkbits;
+ last_block_in_file = (fsverity_full_i_size(inode) +
+ blocksize - 1) >> blkbits;
if (last_block > last_block_in_file)
last_block = last_block_in_file;
@@ -1582,6 +1612,8 @@ static int f2fs_mpage_readpages(struct address_space *mapping,
goto set_error_page;
} else {
zero_user_segment(page, 0, PAGE_SIZE);
+ if (!fsverity_check_hole(inode, page))
+ goto set_error_page;
if (!PageUptodate(page))
SetPageUptodate(page);
unlock_page(page);
@@ -1600,7 +1632,8 @@ static int f2fs_mpage_readpages(struct address_space *mapping,
}
if (bio == NULL) {
bio = f2fs_grab_read_bio(inode, block_nr, nr_pages,
- is_readahead ? REQ_RAHEAD : 0);
+ is_readahead ? REQ_RAHEAD : 0,
+ page->index);
if (IS_ERR(bio)) {
bio = NULL;
goto set_error_page;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 1e031971a466c..dadb5f468f20e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -27,6 +27,9 @@
#define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_F2FS_FS_ENCRYPTION)
#include <linux/fscrypt.h>
+#define __FS_HAS_VERITY IS_ENABLED(CONFIG_F2FS_FS_VERITY)
+#include <linux/fsverity.h>
+
#ifdef CONFIG_F2FS_CHECK_FS
#define f2fs_bug_on(sbi, condition) BUG_ON(condition)
#else
@@ -149,7 +152,7 @@ struct f2fs_mount_info {
#define F2FS_FEATURE_QUOTA_INO 0x0080
#define F2FS_FEATURE_INODE_CRTIME 0x0100
#define F2FS_FEATURE_LOST_FOUND 0x0200
-#define F2FS_FEATURE_VERITY 0x0400 /* reserved */
+#define F2FS_FEATURE_VERITY 0x0400
#define F2FS_FEATURE_SB_CHKSUM 0x0800
#define F2FS_HAS_FEATURE(sb, mask) \
@@ -623,7 +626,7 @@ enum {
#define FADVISE_ENC_NAME_BIT 0x08
#define FADVISE_KEEP_SIZE_BIT 0x10
#define FADVISE_HOT_BIT 0x20
-#define FADVISE_VERITY_BIT 0x40 /* reserved */
+#define FADVISE_VERITY_BIT 0x40
#define FADVISE_MODIFIABLE_BITS (FADVISE_COLD_BIT | FADVISE_HOT_BIT)
@@ -643,6 +646,8 @@ enum {
#define file_is_hot(inode) is_file(inode, FADVISE_HOT_BIT)
#define file_set_hot(inode) set_file(inode, FADVISE_HOT_BIT)
#define file_clear_hot(inode) clear_file(inode, FADVISE_HOT_BIT)
+#define file_is_verity(inode) is_file(inode, FADVISE_VERITY_BIT)
+#define file_set_verity(inode) set_file(inode, FADVISE_VERITY_BIT)
#define DEF_DIR_LEVEL 0
@@ -3449,13 +3454,18 @@ static inline void f2fs_set_encrypted_inode(struct inode *inode)
#endif
}
+static inline bool f2fs_verity_file(struct inode *inode)
+{
+ return file_is_verity(inode);
+}
+
/*
* Returns true if the reads of the inode's data need to undergo some
* postprocessing step, like decryption or authenticity verification.
*/
static inline bool f2fs_post_read_required(struct inode *inode)
{
- return f2fs_encrypted_file(inode);
+ return f2fs_encrypted_file(inode) || f2fs_verity_file(inode);
}
#define F2FS_FEATURE_FUNCS(name, flagname) \
@@ -3473,6 +3483,7 @@ F2FS_FEATURE_FUNCS(flexible_inline_xattr, FLEXIBLE_INLINE_XATTR);
F2FS_FEATURE_FUNCS(quota_ino, QUOTA_INO);
F2FS_FEATURE_FUNCS(inode_crtime, INODE_CRTIME);
F2FS_FEATURE_FUNCS(lost_found, LOST_FOUND);
+F2FS_FEATURE_FUNCS(verity, VERITY);
F2FS_FEATURE_FUNCS(sb_chksum, SB_CHKSUM);
#ifdef CONFIG_BLK_DEV_ZONED
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 88b124677189b..87794b2a45ff8 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -491,6 +491,12 @@ static int f2fs_file_open(struct inode *inode, struct file *filp)
if (err)
return err;
+ if (f2fs_verity_file(inode)) {
+ err = fsverity_file_open(inode, filp);
+ if (err)
+ return err;
+ }
+
filp->f_mode |= FMODE_NOWAIT;
return dquot_file_open(inode, filp);
@@ -695,6 +701,22 @@ int f2fs_getattr(const struct path *path, struct kstat *stat,
struct f2fs_inode *ri;
unsigned int flags;
+ if (f2fs_verity_file(inode)) {
+ /*
+ * For fs-verity we need to override i_size with the original
+ * data i_size. This requires I/O to the file which with
+ * fscrypt requires that the key be set up. But, if the key is
+ * unavailable just continue on without the i_size override.
+ */
+ int err = fscrypt_require_key(inode);
+
+ if (!err) {
+ err = fsverity_prepare_getattr(inode);
+ if (err)
+ return err;
+ }
+ }
+
if (f2fs_has_extra_attr(inode) &&
f2fs_sb_has_inode_crtime(inode->i_sb) &&
F2FS_FITS_IN_INODE(ri, fi->i_extra_isize, i_crtime)) {
@@ -778,6 +800,12 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
if (err)
return err;
+ if (f2fs_verity_file(inode)) {
+ err = fsverity_prepare_setattr(dentry, attr);
+ if (err)
+ return err;
+ }
+
if (is_quota_modification(inode, attr)) {
err = dquot_initialize(inode);
if (err)
@@ -2954,6 +2982,30 @@ static int f2fs_ioc_precache_extents(struct file *filp, unsigned long arg)
return f2fs_precache_extents(file_inode(filp));
}
+static int f2fs_ioc_enable_verity(struct file *filp, unsigned long arg)
+{
+ struct inode *inode = file_inode(filp);
+
+ f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
+
+ if (!f2fs_sb_has_verity(inode->i_sb)) {
+ f2fs_msg(inode->i_sb, KERN_WARNING,
+ "Can't enable fs-verity on inode %lu: the fs-verity feature is disabled on this filesystem.\n",
+ inode->i_ino);
+ return -EOPNOTSUPP;
+ }
+
+ return fsverity_ioctl_enable(filp, (const void __user *)arg);
+}
+
+static int f2fs_ioc_measure_verity(struct file *filp, unsigned long arg)
+{
+ if (!f2fs_sb_has_verity(file_inode(filp)->i_sb))
+ return -EOPNOTSUPP;
+
+ return fsverity_ioctl_measure(filp, (void __user *)arg);
+}
+
long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
if (unlikely(f2fs_cp_error(F2FS_I_SB(file_inode(filp)))))
@@ -3010,6 +3062,10 @@ long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return f2fs_ioc_set_pin_file(filp, arg);
case F2FS_IOC_PRECACHE_EXTENTS:
return f2fs_ioc_precache_extents(filp, arg);
+ case FS_IOC_ENABLE_VERITY:
+ return f2fs_ioc_enable_verity(filp, arg);
+ case FS_IOC_MEASURE_VERITY:
+ return f2fs_ioc_measure_verity(filp, arg);
default:
return -ENOTTY;
}
@@ -3117,6 +3173,8 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case F2FS_IOC_GET_PIN_FILE:
case F2FS_IOC_SET_PIN_FILE:
case F2FS_IOC_PRECACHE_EXTENTS:
+ case FS_IOC_ENABLE_VERITY:
+ case FS_IOC_MEASURE_VERITY:
break;
default:
return -ENOIOCTLCMD;
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 91ceee0ed4c40..ddef483ad689d 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -509,7 +509,7 @@ void f2fs_update_inode(struct inode *inode, struct page *node_page)
ri->i_uid = cpu_to_le32(i_uid_read(inode));
ri->i_gid = cpu_to_le32(i_gid_read(inode));
ri->i_links = cpu_to_le32(inode->i_nlink);
- ri->i_size = cpu_to_le64(i_size_read(inode));
+ ri->i_size = cpu_to_le64(fsverity_full_i_size(inode));
ri->i_blocks = cpu_to_le64(SECTOR_TO_BLOCK(inode->i_blocks) + 1);
if (et) {
@@ -732,6 +732,7 @@ void f2fs_evict_inode(struct inode *inode)
}
out_clear:
fscrypt_put_encryption_info(inode);
+ fsverity_cleanup_inode(inode);
clear_inode(inode);
}
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index af58b2cc21b81..adf38c1b64141 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2197,6 +2197,33 @@ static const struct fscrypt_operations f2fs_cryptops = {
};
#endif
+#ifdef CONFIG_F2FS_FS_VERITY
+static int f2fs_set_verity(struct inode *inode, loff_t data_i_size)
+{
+ int err;
+
+ err = f2fs_convert_inline_inode(inode);
+ if (err)
+ return err;
+
+ file_set_verity(inode);
+ f2fs_mark_inode_dirty_sync(inode, true);
+ return 0;
+}
+
+static int f2fs_get_metadata_end(struct inode *inode, loff_t *metadata_end_ret)
+{
+ /* f2fs on-disk size is the full file size including verity metadata */
+ *metadata_end_ret = i_size_read(inode);
+ return 0;
+}
+
+static const struct fsverity_operations f2fs_verityops = {
+ .set_verity = f2fs_set_verity,
+ .get_metadata_end = f2fs_get_metadata_end,
+};
+#endif /* CONFIG_F2FS_FS_VERITY */
+
static struct inode *f2fs_nfs_get_inode(struct super_block *sb,
u64 ino, u32 generation)
{
@@ -3118,6 +3145,9 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
sb->s_op = &f2fs_sops;
#ifdef CONFIG_F2FS_FS_ENCRYPTION
sb->s_cop = &f2fs_cryptops;
+#endif
+#ifdef CONFIG_F2FS_FS_VERITY
+ sb->s_vop = &f2fs_verityops;
#endif
sb->s_xattr = f2fs_xattr_handlers;
sb->s_export_op = &f2fs_export_ops;
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index b777cbdd796ba..5599c9ac4426d 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -117,6 +117,9 @@ static ssize_t features_show(struct f2fs_attr *a,
if (f2fs_sb_has_lost_found(sb))
len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
len ? ", " : "", "lost_found");
+ if (f2fs_sb_has_verity(sb))
+ len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
+ len ? ", " : "", "verity");
if (f2fs_sb_has_sb_chksum(sb))
len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
len ? ", " : "", "sb_checksum");
@@ -337,6 +340,7 @@ enum feat_id {
FEAT_QUOTA_INO,
FEAT_INODE_CRTIME,
FEAT_LOST_FOUND,
+ FEAT_VERITY,
FEAT_SB_CHECKSUM,
};
@@ -354,6 +358,7 @@ static ssize_t f2fs_feature_show(struct f2fs_attr *a,
case FEAT_QUOTA_INO:
case FEAT_INODE_CRTIME:
case FEAT_LOST_FOUND:
+ case FEAT_VERITY:
case FEAT_SB_CHECKSUM:
return snprintf(buf, PAGE_SIZE, "supported\n");
}
@@ -439,6 +444,9 @@ F2FS_FEATURE_RO_ATTR(flexible_inline_xattr, FEAT_FLEXIBLE_INLINE_XATTR);
F2FS_FEATURE_RO_ATTR(quota_ino, FEAT_QUOTA_INO);
F2FS_FEATURE_RO_ATTR(inode_crtime, FEAT_INODE_CRTIME);
F2FS_FEATURE_RO_ATTR(lost_found, FEAT_LOST_FOUND);
+#ifdef CONFIG_F2FS_FS_VERITY
+F2FS_FEATURE_RO_ATTR(verity, FEAT_VERITY);
+#endif
F2FS_FEATURE_RO_ATTR(sb_checksum, FEAT_SB_CHECKSUM);
#define ATTR_LIST(name) (&f2fs_attr_##name.attr)
@@ -499,6 +507,9 @@ static struct attribute *f2fs_feat_attrs[] = {
ATTR_LIST(quota_ino),
ATTR_LIST(inode_crtime),
ATTR_LIST(lost_found),
+#ifdef CONFIG_F2FS_FS_VERITY
+ ATTR_LIST(verity),
+#endif
ATTR_LIST(sb_checksum),
NULL,
};
From: Eric Biggers <[email protected]>
Add SHA-512 support to fs-verity. This is primarily a demonstration of
the (small) changes needed to support a new hash algorithm; it's
anticipated that most users will still prefer SHA-256 due to the smaller
space required to store the hashes, though some may prefer SHA-512.
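The space trade-off can be quantified. Assume 4096-byte blocks with digests packed back to back. Each hash block then holds 128 SHA-256 digests but only 64 SHA-512 digests, so the tree for the same file needs about twice as many blocks. The helper below, with a hypothetical name, counts hash blocks level by level. It ignores the verity descriptor/footer.

```c
#include <stdint.h>

/*
 * Count the Merkle tree blocks needed for @data_blocks data blocks,
 * assuming each hash block packs block_size / digest_size digests and
 * levels are added until one fits in a single block.
 * Returns 0 for an empty file or invalid parameters.
 */
uint64_t merkle_tree_blocks(uint64_t data_blocks, unsigned block_size,
			    unsigned digest_size)
{
	uint64_t per_block, n = data_blocks, total = 0;

	if (digest_size == 0 || block_size / digest_size < 2 || data_blocks == 0)
		return 0;
	per_block = block_size / digest_size;
	do {
		n = (n + per_block - 1) / per_block; /* blocks in this level */
		total += n;
	} while (n > 1);
	return total;
}
```

Take a 1 GiB file (262144 data blocks) as an example:

- SHA-256: merkle_tree_blocks() gives 2048 + 16 + 1 = 2065 blocks, about 8.1 MiB.
- SHA-512: it gives 4096 + 64 + 1 = 4161 blocks, about 16.3 MiB.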
Signed-off-by: Eric Biggers <[email protected]>
---
fs/verity/fsverity_private.h | 2 +-
fs/verity/hash_algs.c | 5 +++++
include/uapi/linux/fsverity.h | 1 +
3 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index dfdbac3874d74..c3a261a598557 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -30,7 +30,7 @@
* Largest digest size among all hash algorithms supported by fs-verity. This
* can be increased if needed.
*/
-#define FS_VERITY_MAX_DIGEST_SIZE SHA256_DIGEST_SIZE
+#define FS_VERITY_MAX_DIGEST_SIZE SHA512_DIGEST_SIZE
/* A hash algorithm supported by fs-verity */
struct fsverity_hash_alg {
diff --git a/fs/verity/hash_algs.c b/fs/verity/hash_algs.c
index 9c19c9553f120..3174a0c08785d 100644
--- a/fs/verity/hash_algs.c
+++ b/fs/verity/hash_algs.c
@@ -18,6 +18,11 @@ struct fsverity_hash_alg fsverity_hash_algs[] = {
.digest_size = 32,
.cryptographic = true,
},
+ [FS_VERITY_ALG_SHA512] = {
+ .name = "sha512",
+ .digest_size = 64,
+ .cryptographic = true,
+ },
};
/*
diff --git a/include/uapi/linux/fsverity.h b/include/uapi/linux/fsverity.h
index 55b9f32676220..67ed830ae2ece 100644
--- a/include/uapi/linux/fsverity.h
+++ b/include/uapi/linux/fsverity.h
@@ -28,6 +28,7 @@ struct fsverity_digest {
/* Supported hash algorithms */
#define FS_VERITY_ALG_SHA256 1
+#define FS_VERITY_ALG_SHA512 2
/* Metadata stored near the end of verity files, after the Merkle tree */
/* This structure is 64 bytes long */
Hi Chandan,
On Fri, Nov 02, 2018 at 03:13:14PM +0530, Chandan Rajendra wrote:
> On Friday, November 2, 2018 4:22:28 AM IST Eric Biggers wrote:
> > From: Eric Biggers <[email protected]>
> >
> > Add basic fs-verity support to ext4. fs-verity is a filesystem feature
> > that enables transparent integrity protection and authentication of
> > read-only files. It uses a dm-verity like mechanism at the file level:
> > a Merkle tree is used to verify any block in the file in log(filesize)
> > time. It is implemented mainly by helper functions in fs/verity/.
> > See Documentation/filesystems/fsverity.rst for details.
> >
> > This patch adds everything except the data verification hooks that will
> > be needed in ->readpages().
> >
> > On ext4, enabling fs-verity on a file requires that the filesystem has
> > the 'verity' feature, e.g. that it was formatted with
> > 'mkfs.ext4 -O verity' or had 'tune2fs -O verity' run on it.
> > This requires e2fsprogs 1.44.4-2 or later.
> >
> > In ext4, we choose to retain the fs-verity metadata past the end of the
> > file rather than trying to move it into an external inode xattr, since
> > in practice keeping the metadata in-line actually results in the
> > simplest and most efficient implementation. One non-obvious advantage
> > of keeping the verity metadata in-line is that when fs-verity is
> > combined with fscrypt, the verity metadata naturally gets encrypted too;
> > this is actually necessary because it contains hashes of the plaintext.
> >
> > We also choose to keep the on-disk i_size equal to the original file
> > size, in order to make the 'verity' feature a RO_COMPAT feature. Thus,
> > ext4 has to find the fsverity_footer by looking in the last extent.
> >
> > Co-developed-by: Theodore Ts'o <[email protected]>
> > Signed-off-by: Theodore Ts'o <[email protected]>
> > Signed-off-by: Eric Biggers <[email protected]>
> > ---
> > fs/ext4/Kconfig | 20 +++++++++++
> > fs/ext4/ext4.h | 20 ++++++++++-
> > fs/ext4/file.c | 6 ++++
> > fs/ext4/inode.c | 8 +++++
> > fs/ext4/ioctl.c | 12 +++++++
> > fs/ext4/super.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++++
> > fs/ext4/sysfs.c | 6 ++++
> > 7 files changed, 162 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
> > index a453cc87082b5..5a76125ac0f8a 100644
> > --- a/fs/ext4/Kconfig
> > +++ b/fs/ext4/Kconfig
> > @@ -111,6 +111,26 @@ config EXT4_FS_ENCRYPTION
> > default y
> > depends on EXT4_ENCRYPTION
> >
> > +config EXT4_FS_VERITY
> > + bool "Ext4 Verity"
> > + depends on EXT4_FS
> > + select FS_VERITY
> > + help
> > + This option enables fs-verity for ext4. fs-verity is the
> > + dm-verity mechanism implemented at the file level. Userspace
> > + can append a Merkle tree (hash tree) to a file, then enable
> > + fs-verity on the file. ext4 will then transparently verify
> > + any data read from the file against the Merkle tree. The file
> > + is also made read-only.
> > +
> > + This serves as an integrity check, but the availability of the
> > + Merkle tree root hash also allows efficiently supporting
> > + various use cases where normally the whole file would need to
> > + be hashed at once, such as auditing and authenticity
> > + verification (appraisal).
> > +
> > + If unsure, say N.
> > +
> > config EXT4_DEBUG
> > bool "EXT4 debugging support"
> > depends on EXT4_FS
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 12f90d48ba613..e5475a629ed80 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -43,6 +43,9 @@
> > #define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_EXT4_FS_ENCRYPTION)
> > #include <linux/fscrypt.h>
> >
> > +#define __FS_HAS_VERITY IS_ENABLED(CONFIG_EXT4_FS_VERITY)
> > +#include <linux/fsverity.h>
> > +
> > #include <linux/compiler.h>
> >
> > /* Until this gets included into linux/compiler-gcc.h */
> > @@ -405,6 +408,7 @@ struct flex_groups {
> > #define EXT4_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/
> > #define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */
> > #define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */
> > +#define EXT4_VERITY_FL 0x00100000 /* Verity protected inode */
> > #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
> > #define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
> > #define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
> > @@ -472,6 +476,7 @@ enum {
> > EXT4_INODE_TOPDIR = 17, /* Top of directory hierarchies*/
> > EXT4_INODE_HUGE_FILE = 18, /* Set to each huge file */
> > EXT4_INODE_EXTENTS = 19, /* Inode uses extents */
> > + EXT4_INODE_VERITY = 20, /* Verity protected inode */
> > EXT4_INODE_EA_INODE = 21, /* Inode used for large EA */
> > EXT4_INODE_EOFBLOCKS = 22, /* Blocks allocated beyond EOF */
> > EXT4_INODE_INLINE_DATA = 28, /* Data in inode. */
> > @@ -517,6 +522,7 @@ static inline void ext4_check_flag_values(void)
> > CHECK_FLAG_VALUE(TOPDIR);
> > CHECK_FLAG_VALUE(HUGE_FILE);
> > CHECK_FLAG_VALUE(EXTENTS);
> > + CHECK_FLAG_VALUE(VERITY);
> > CHECK_FLAG_VALUE(EA_INODE);
> > CHECK_FLAG_VALUE(EOFBLOCKS);
> > CHECK_FLAG_VALUE(INLINE_DATA);
> > @@ -1654,6 +1660,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
> > #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM 0x0400
> > #define EXT4_FEATURE_RO_COMPAT_READONLY 0x1000
> > #define EXT4_FEATURE_RO_COMPAT_PROJECT 0x2000
> > +#define EXT4_FEATURE_RO_COMPAT_VERITY 0x8000
> >
> > #define EXT4_FEATURE_INCOMPAT_COMPRESSION 0x0001
> > #define EXT4_FEATURE_INCOMPAT_FILETYPE 0x0002
> > @@ -1742,6 +1749,7 @@ EXT4_FEATURE_RO_COMPAT_FUNCS(bigalloc, BIGALLOC)
> > EXT4_FEATURE_RO_COMPAT_FUNCS(metadata_csum, METADATA_CSUM)
> > EXT4_FEATURE_RO_COMPAT_FUNCS(readonly, READONLY)
> > EXT4_FEATURE_RO_COMPAT_FUNCS(project, PROJECT)
> > +EXT4_FEATURE_RO_COMPAT_FUNCS(verity, VERITY)
> >
> > EXT4_FEATURE_INCOMPAT_FUNCS(compression, COMPRESSION)
> > EXT4_FEATURE_INCOMPAT_FUNCS(filetype, FILETYPE)
> > @@ -1797,7 +1805,8 @@ EXT4_FEATURE_INCOMPAT_FUNCS(encrypt, ENCRYPT)
> > EXT4_FEATURE_RO_COMPAT_BIGALLOC |\
> > EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|\
> > EXT4_FEATURE_RO_COMPAT_QUOTA |\
> > - EXT4_FEATURE_RO_COMPAT_PROJECT)
> > + EXT4_FEATURE_RO_COMPAT_PROJECT |\
> > + EXT4_FEATURE_RO_COMPAT_VERITY)
> >
> > #define EXTN_FEATURE_FUNCS(ver) \
> > static inline bool ext4_has_unknown_ext##ver##_compat_features(struct super_block *sb) \
> > @@ -2293,6 +2302,15 @@ static inline bool ext4_encrypted_inode(struct inode *inode)
> > return ext4_test_inode_flag(inode, EXT4_INODE_ENCRYPT);
> > }
> >
> > +static inline bool ext4_verity_inode(struct inode *inode)
> > +{
> > +#ifdef CONFIG_EXT4_FS_VERITY
> > + return ext4_test_inode_flag(inode, EXT4_INODE_VERITY);
> > +#else
> > + return false;
> > +#endif
> > +}
> > +
>
> Hi Eric,
>
> Can you please explain as to why we check for the presence of
> EXT4_INODE_VERITY flag only when fsverity is enabled during kernel build?
>
Good question; this might not be the best approach, actually. I think this was
originally copied from the f2fs version. It does reduce the overhead introduced
by the fs-verity changes in the !CONFIG_EXT4_FS_VERITY case, but it will also
allow opening verity files, even for writing, which will corrupt them.
Probably we should make ext4_verity_inode() work regardless of
CONFIG_EXT4_FS_VERITY, so open(), truncate(), etc. will fail with EOPNOTSUPP on
verity files when !CONFIG_EXT4_FS_VERITY, like how ext4 encryption works.
Thanks,
- Eric
From: Eric Biggers <[email protected]>
Add a function for filesystems to call to implement the
FS_IOC_ENABLE_VERITY ioctl. This ioctl enables fs-verity on a file,
after userspace has appended verity metadata to it.
This ioctl is documented in Documentation/filesystems/fsverity.rst;
see there for more information.
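A userspace caller of this ioctl might look like the sketch below. The ioctl value _IO('f', 133) with a NULL (reserved) argument is an assumption, inferred from the "argument is reserved" check in fsverity_ioctl_enable(). enable_verity() is a hypothetical helper; fsverity-utils is the reference tool.

Note that the sketch opens the file read-only. fsverity_ioctl_enable() checks write permission on the inode rather than on the file descriptor. Its deny_write_access() call then fails with -ETXTBSY while any writable descriptor is open, including the caller's own.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/ioctl.h>

/* ASSUMED v2 value; the ioctl argument is reserved and must be NULL. */
#define FS_IOC_ENABLE_VERITY _IO('f', 133)

/*
 * Enable fs-verity on @path, whose verity metadata has already been
 * appended by userspace. Returns 0 or a negative errno.
 */
int enable_verity(const char *path)
{
	/*
	 * Open read-only: write permission is checked on the inode, and a
	 * writable fd would make deny_write_access() fail with -ETXTBSY.
	 */
	int fd = open(path, O_RDONLY);
	int ret = 0;

	if (fd < 0)
		return -errno;
	if (ioctl(fd, FS_IOC_ENABLE_VERITY, NULL) != 0)
		ret = -errno;
	close(fd);
	return ret;
}
```

The ioctl fsyncs the file and then locks out writers. The metadata must therefore be fully appended, and every writable descriptor closed, before this is called.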
Signed-off-by: Eric Biggers <[email protected]>
---
fs/verity/Makefile | 2 +-
fs/verity/ioctl.c | 117 +++++++++++++++++++++++++++++++++++++++
include/linux/fsverity.h | 11 ++++
3 files changed, 129 insertions(+), 1 deletion(-)
create mode 100644 fs/verity/ioctl.c
diff --git a/fs/verity/Makefile b/fs/verity/Makefile
index a6c7cefb61ab7..6450925e3a8b7 100644
--- a/fs/verity/Makefile
+++ b/fs/verity/Makefile
@@ -1,3 +1,3 @@
obj-$(CONFIG_FS_VERITY) += fsverity.o
-fsverity-y := hash_algs.o setup.o verify.o
+fsverity-y := hash_algs.o ioctl.o setup.o verify.o
diff --git a/fs/verity/ioctl.c b/fs/verity/ioctl.c
new file mode 100644
index 0000000000000..c5f0022cb3bef
--- /dev/null
+++ b/fs/verity/ioctl.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fs/verity/ioctl.c: fs-verity ioctls
+ *
+ * Copyright 2018 Google LLC
+ *
+ * Originally written by Jaegeuk Kim and Michael Halcrow;
+ * heavily rewritten by Eric Biggers.
+ */
+
+#include "fsverity_private.h"
+
+#include <linux/mm.h>
+#include <linux/mount.h>
+#include <linux/uaccess.h>
+
+/**
+ * fsverity_ioctl_enable - enable fs-verity on a file
+ *
+ * Enable fs-verity on a file. Verity metadata must have already been appended
+ * to the file. See Documentation/filesystems/fsverity.rst, section
+ * 'FS_IOC_ENABLE_VERITY' for details.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fsverity_ioctl_enable(struct file *filp, const void __user *arg)
+{
+ struct inode *inode = file_inode(filp);
+ struct fsverity_info *vi;
+ int err;
+
+ err = inode_permission(inode, MAY_WRITE);
+ if (err)
+ return err;
+
+ if (IS_APPEND(inode))
+ return -EPERM;
+
+ if (arg) /* argument is reserved */
+ return -EINVAL;
+
+ if (S_ISDIR(inode->i_mode))
+ return -EISDIR;
+
+ if (!S_ISREG(inode->i_mode))
+ return -EINVAL;
+
+ err = mnt_want_write_file(filp);
+ if (err)
+ goto out;
+
+ /*
+ * Temporarily lock out writers via writable file descriptors or
+ * truncate(). This should stabilize the contents of the file as well
+ * as its size. Note that at the end of this ioctl we will unlock
+ * writers, but at that point the verity bit will be set (if the ioctl
+ * succeeded), preventing future writers.
+ */
+ err = deny_write_access(filp);
+ if (err) /* -ETXTBSY */
+ goto out_drop_write;
+
+ /*
+ * fsync so that the verity bit can't be persisted to disk prior to the
+ * data, causing verification errors after a crash.
+ */
+ err = vfs_fsync(filp, 1);
+ if (err)
+ goto out_allow_write;
+
+ /* Serialize concurrent use of this ioctl on the same inode */
+ inode_lock(inode);
+
+ if (get_fsverity_info(inode)) { /* fs-verity already enabled? */
+ err = -EEXIST;
+ goto out_unlock;
+ }
+
+ /* Validate the verity metadata */
+ vi = create_fsverity_info(inode, true);
+ if (IS_ERR(vi)) {
+ err = PTR_ERR(vi);
+ if (err == -EINVAL) /* distinguish "invalid metadata" case */
+ err = -EBADMSG;
+ goto out_unlock;
+ }
+
+ /*
+ * Ask the filesystem to mark the file as a verity file, e.g. by setting
+ * the verity bit in the inode.
+ */
+ err = inode->i_sb->s_vop->set_verity(inode, vi->data_i_size);
+ if (err)
+ goto out_free_vi;
+
+ /* Invalidate all cached pages, forcing re-verification */
+ truncate_inode_pages(inode->i_mapping, 0);
+
+ /*
+ * Set ->i_verity_info, unless another task managed to do it already
+ * between ->set_verity() and here.
+ */
+ if (set_fsverity_info(inode, vi))
+ vi = NULL;
+ err = 0;
+out_free_vi:
+ free_fsverity_info(vi);
+out_unlock:
+ inode_unlock(inode);
+out_allow_write:
+ allow_write_access(filp);
+out_drop_write:
+ mnt_drop_write_file(filp);
+out:
+ return err;
+}
+EXPORT_SYMBOL_GPL(fsverity_ioctl_enable);
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 15478fe7d55aa..5de50b52ccc70 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -21,6 +21,9 @@ struct fsverity_operations {
#if __FS_HAS_VERITY
+/* ioctl.c */
+extern int fsverity_ioctl_enable(struct file *filp, const void __user *arg);
+
/* setup.c */
extern int fsverity_file_open(struct inode *inode, struct file *filp);
extern int fsverity_prepare_setattr(struct dentry *dentry, struct iattr *attr);
@@ -40,6 +43,14 @@ static inline bool fsverity_check_hole(struct inode *inode, struct page *page)
#else /* !__FS_HAS_VERITY */
+/* ioctl.c */
+
+static inline int fsverity_ioctl_enable(struct file *filp,
+ const void __user *arg)
+{
+ return -EOPNOTSUPP;
+}
+
/* setup.c */
static inline int fsverity_file_open(struct inode *inode, struct file *filp)
--
2.19.1.568.g152ad8e336-goog
On Tuesday, November 6, 2018 6:55:03 AM IST Eric Biggers wrote:
> Hi Chandan,
>
> On Fri, Nov 02, 2018 at 03:13:14PM +0530, Chandan Rajendra wrote:
> > On Friday, November 2, 2018 4:22:28 AM IST Eric Biggers wrote:
> > > From: Eric Biggers <[email protected]>
> > >
> > > Add basic fs-verity support to ext4. fs-verity is a filesystem feature
> > > that enables transparent integrity protection and authentication of
> > > read-only files. It uses a dm-verity like mechanism at the file level:
> > > a Merkle tree is used to verify any block in the file in log(filesize)
> > > time. It is implemented mainly by helper functions in fs/verity/.
> > > See Documentation/filesystems/fsverity.rst for details.
> > >
> > > This patch adds everything except the data verification hooks that will
> > > be needed in ->readpages().
> > >
> > > On ext4, enabling fs-verity on a file requires that the filesystem has
> > > the 'verity' feature, e.g. that it was formatted with
> > > 'mkfs.ext4 -O verity' or had 'tune2fs -O verity' run on it.
> > > This requires e2fsprogs 1.44.4-2 or later.
> > >
> > > In ext4, we choose to retain the fs-verity metadata past the end of the
> > > file rather than trying to move it into an external inode xattr, since
> > > in practice keeping the metadata in-line actually results in the
> > > simplest and most efficient implementation. One non-obvious advantage
> > > of keeping the verity metadata in-line is that when fs-verity is
> > > combined with fscrypt, the verity metadata naturally gets encrypted too;
> > > this is actually necessary because it contains hashes of the plaintext.
> > >
> > > We also choose to keep the on-disk i_size equal to the original file
> > > size, in order to make the 'verity' feature a RO_COMPAT feature. Thus,
> > > ext4 has to find the fsverity_footer by looking in the last extent.
> > >
> > > Co-developed-by: Theodore Ts'o <[email protected]>
> > > Signed-off-by: Theodore Ts'o <[email protected]>
> > > Signed-off-by: Eric Biggers <[email protected]>
> > > ---
> > > fs/ext4/Kconfig | 20 +++++++++++
> > > fs/ext4/ext4.h | 20 ++++++++++-
> > > fs/ext4/file.c | 6 ++++
> > > fs/ext4/inode.c | 8 +++++
> > > fs/ext4/ioctl.c | 12 +++++++
> > > fs/ext4/super.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > fs/ext4/sysfs.c | 6 ++++
> > > 7 files changed, 162 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
> > > index a453cc87082b5..5a76125ac0f8a 100644
> > > --- a/fs/ext4/Kconfig
> > > +++ b/fs/ext4/Kconfig
> > > @@ -111,6 +111,26 @@ config EXT4_FS_ENCRYPTION
> > > default y
> > > depends on EXT4_ENCRYPTION
> > >
> > > +config EXT4_FS_VERITY
> > > + bool "Ext4 Verity"
> > > + depends on EXT4_FS
> > > + select FS_VERITY
> > > + help
> > > + This option enables fs-verity for ext4. fs-verity is the
> > > + dm-verity mechanism implemented at the file level. Userspace
> > > + can append a Merkle tree (hash tree) to a file, then enable
> > > + fs-verity on the file. ext4 will then transparently verify
> > > + any data read from the file against the Merkle tree. The file
> > > + is also made read-only.
> > > +
> > > + This serves as an integrity check, but the availability of the
> > > + Merkle tree root hash also allows efficiently supporting
> > > + various use cases where normally the whole file would need to
> > > + be hashed at once, such as auditing and authenticity
> > > + verification (appraisal).
> > > +
> > > + If unsure, say N.
> > > +
> > > config EXT4_DEBUG
> > > bool "EXT4 debugging support"
> > > depends on EXT4_FS
> > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > > index 12f90d48ba613..e5475a629ed80 100644
> > > --- a/fs/ext4/ext4.h
> > > +++ b/fs/ext4/ext4.h
> > > @@ -43,6 +43,9 @@
> > > #define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_EXT4_FS_ENCRYPTION)
> > > #include <linux/fscrypt.h>
> > >
> > > +#define __FS_HAS_VERITY IS_ENABLED(CONFIG_EXT4_FS_VERITY)
> > > +#include <linux/fsverity.h>
> > > +
> > > #include <linux/compiler.h>
> > >
> > > /* Until this gets included into linux/compiler-gcc.h */
> > > @@ -405,6 +408,7 @@ struct flex_groups {
> > > #define EXT4_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/
> > > #define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */
> > > #define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */
> > > +#define EXT4_VERITY_FL 0x00100000 /* Verity protected inode */
> > > #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
> > > #define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
> > > #define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
> > > @@ -472,6 +476,7 @@ enum {
> > > EXT4_INODE_TOPDIR = 17, /* Top of directory hierarchies*/
> > > EXT4_INODE_HUGE_FILE = 18, /* Set to each huge file */
> > > EXT4_INODE_EXTENTS = 19, /* Inode uses extents */
> > > + EXT4_INODE_VERITY = 20, /* Verity protected inode */
> > > EXT4_INODE_EA_INODE = 21, /* Inode used for large EA */
> > > EXT4_INODE_EOFBLOCKS = 22, /* Blocks allocated beyond EOF */
> > > EXT4_INODE_INLINE_DATA = 28, /* Data in inode. */
> > > @@ -517,6 +522,7 @@ static inline void ext4_check_flag_values(void)
> > > CHECK_FLAG_VALUE(TOPDIR);
> > > CHECK_FLAG_VALUE(HUGE_FILE);
> > > CHECK_FLAG_VALUE(EXTENTS);
> > > + CHECK_FLAG_VALUE(VERITY);
> > > CHECK_FLAG_VALUE(EA_INODE);
> > > CHECK_FLAG_VALUE(EOFBLOCKS);
> > > CHECK_FLAG_VALUE(INLINE_DATA);
> > > @@ -1654,6 +1660,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
> > > #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM 0x0400
> > > #define EXT4_FEATURE_RO_COMPAT_READONLY 0x1000
> > > #define EXT4_FEATURE_RO_COMPAT_PROJECT 0x2000
> > > +#define EXT4_FEATURE_RO_COMPAT_VERITY 0x8000
> > >
> > > #define EXT4_FEATURE_INCOMPAT_COMPRESSION 0x0001
> > > #define EXT4_FEATURE_INCOMPAT_FILETYPE 0x0002
> > > @@ -1742,6 +1749,7 @@ EXT4_FEATURE_RO_COMPAT_FUNCS(bigalloc, BIGALLOC)
> > > EXT4_FEATURE_RO_COMPAT_FUNCS(metadata_csum, METADATA_CSUM)
> > > EXT4_FEATURE_RO_COMPAT_FUNCS(readonly, READONLY)
> > > EXT4_FEATURE_RO_COMPAT_FUNCS(project, PROJECT)
> > > +EXT4_FEATURE_RO_COMPAT_FUNCS(verity, VERITY)
> > >
> > > EXT4_FEATURE_INCOMPAT_FUNCS(compression, COMPRESSION)
> > > EXT4_FEATURE_INCOMPAT_FUNCS(filetype, FILETYPE)
> > > @@ -1797,7 +1805,8 @@ EXT4_FEATURE_INCOMPAT_FUNCS(encrypt, ENCRYPT)
> > > EXT4_FEATURE_RO_COMPAT_BIGALLOC |\
> > > EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|\
> > > EXT4_FEATURE_RO_COMPAT_QUOTA |\
> > > - EXT4_FEATURE_RO_COMPAT_PROJECT)
> > > + EXT4_FEATURE_RO_COMPAT_PROJECT |\
> > > + EXT4_FEATURE_RO_COMPAT_VERITY)
> > >
> > > #define EXTN_FEATURE_FUNCS(ver) \
> > > static inline bool ext4_has_unknown_ext##ver##_compat_features(struct super_block *sb) \
> > > @@ -2293,6 +2302,15 @@ static inline bool ext4_encrypted_inode(struct inode *inode)
> > > return ext4_test_inode_flag(inode, EXT4_INODE_ENCRYPT);
> > > }
> > >
> > > +static inline bool ext4_verity_inode(struct inode *inode)
> > > +{
> > > +#ifdef CONFIG_EXT4_FS_VERITY
> > > + return ext4_test_inode_flag(inode, EXT4_INODE_VERITY);
> > > +#else
> > > + return false;
> > > +#endif
> > > +}
> > > +
> >
> > Hi Eric,
> >
> > Can you please explain as to why we check for the presence of
> > EXT4_INODE_VERITY flag only when fsverity is enabled during kernel build?
> >
>
> Good question, this might not be the best approach actually; I think this was
> originally copied from the f2fs version. It does reduce the overhead introduced
> by the fs-verity changes in the !CONFIG_EXT4_FS_VERITY case. But it will allow
> opening verity files, even for writing which will corrupt them.
>
> Probably we should make ext4_verity_inode() work regardless of
> CONFIG_EXT4_FS_VERITY, so open(), truncate(), etc. will fail with EOPNOTSUPP on
> verity files when !CONFIG_EXT4_FS_VERITY, like how ext4 encryption works.
>
Yes, I agree with what you say. I have followed the logic explained above when
implementing S_VERITY and IS_VERITY() for Ext4 and will extend that to F2FS as
well.
--
chandan
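The change agreed on in this exchange can be sketched as a small userspace model. This is hypothetical code, not the eventual ext4 patch. model_verity_inode() tests EXT4_VERITY_FL (value taken from the ext4.h hunk quoted above) regardless of configuration, and the kernel_has_verity parameter stands in for IS_ENABLED(CONFIG_EXT4_FS_VERITY). With the #ifdef version in the patch, a kernel built without verity support sees the flag as clear and lets the file be opened for writing. In the model, such a file is refused with -EOPNOTSUPP, in the same way Eric describes ext4 handling encrypted files.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Value of EXT4_VERITY_FL from the ext4.h hunk quoted above. */
#define EXT4_VERITY_FL 0x00100000u

/* Hypothetical: test the on-disk flag whether or not verity is built in. */
static bool model_verity_inode(uint32_t i_flags)
{
	return (i_flags & EXT4_VERITY_FL) != 0;
}

/*
 * Hypothetical open()/truncate() gate. kernel_has_verity stands in for
 * IS_ENABLED(CONFIG_EXT4_FS_VERITY).
 */
static int model_verity_gate(uint32_t i_flags, bool kernel_has_verity)
{
	if (!model_verity_inode(i_flags))
		return 0;           /* ordinary file: no restriction */
	if (!kernel_has_verity)
		return -EOPNOTSUPP; /* cannot enforce verity, so refuse access */
	return 0;                   /* defer to fsverity_file_open() and friends */
}
```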
From: Eric Biggers <[email protected]>
Add functions that verify data pages that have been read from an
fs-verity file against that file's Merkle tree. These will be called
from filesystems' ->readpage() and ->readpages() methods.
Since data verification can block, a workqueue is provided for these
methods to enqueue verification work from their bio completion callback.
Signed-off-by: Eric Biggers <[email protected]>
---
fs/verity/Makefile | 2 +-
fs/verity/fsverity_private.h | 3 +
fs/verity/setup.c | 26 ++-
fs/verity/verify.c | 298 +++++++++++++++++++++++++++++++++++
include/linux/fsverity.h | 33 ++++
5 files changed, 360 insertions(+), 2 deletions(-)
create mode 100644 fs/verity/verify.c
diff --git a/fs/verity/Makefile b/fs/verity/Makefile
index 39e123805c827..a6c7cefb61ab7 100644
--- a/fs/verity/Makefile
+++ b/fs/verity/Makefile
@@ -1,3 +1,3 @@
obj-$(CONFIG_FS_VERITY) += fsverity.o
-fsverity-y := hash_algs.o setup.o
+fsverity-y := hash_algs.o setup.o verify.o
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index acc29825a0ed7..dfdbac3874d74 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -95,4 +95,7 @@ static inline bool set_fsverity_info(struct inode *inode,
return cmpxchg_release(&inode->i_verity_info, NULL, vi) == NULL;
}
+/* verify.c */
+extern struct workqueue_struct *fsverity_read_workqueue;
+
#endif /* _FSVERITY_PRIVATE_H */
diff --git a/fs/verity/setup.c b/fs/verity/setup.c
index 925970fbe084d..184bdc96abe51 100644
--- a/fs/verity/setup.c
+++ b/fs/verity/setup.c
@@ -801,18 +801,42 @@ EXPORT_SYMBOL_GPL(fsverity_full_i_size);
static int __init fsverity_module_init(void)
{
+ int err;
+
+ /*
+ * Use an unbound workqueue to allow bios to be verified in parallel
+ * even when they happen to complete on the same CPU. This sacrifices
+ * locality, but it's worthwhile since hashing is CPU-intensive.
+ *
+ * Also use a high-priority workqueue to prioritize verification work,
+ * which blocks reads from completing, over regular application tasks.
+ */
+ err = -ENOMEM;
+ fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
+ WQ_UNBOUND | WQ_HIGHPRI,
+ num_online_cpus());
+ if (!fsverity_read_workqueue)
+ goto error;
+
+ err = -ENOMEM;
fsverity_info_cachep = KMEM_CACHE(fsverity_info, SLAB_RECLAIM_ACCOUNT);
if (!fsverity_info_cachep)
- return -ENOMEM;
+ goto error_free_workqueue;
fsverity_check_hash_algs();
pr_debug("Initialized fs-verity\n");
return 0;
+
+error_free_workqueue:
+ destroy_workqueue(fsverity_read_workqueue);
+error:
+ return err;
}
static void __exit fsverity_module_exit(void)
{
+ destroy_workqueue(fsverity_read_workqueue);
kmem_cache_destroy(fsverity_info_cachep);
fsverity_exit_hash_algs();
}
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
new file mode 100644
index 0000000000000..e308f22475e8d
--- /dev/null
+++ b/fs/verity/verify.c
@@ -0,0 +1,298 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fs/verity/verify.c: fs-verity data verification functions,
+ * i.e. hooks for ->readpages()
+ *
+ * Copyright 2018 Google LLC
+ *
+ * Originally written by Jaegeuk Kim and Michael Halcrow;
+ * heavily rewritten by Eric Biggers.
+ */
+
+#include "fsverity_private.h"
+
+#include <crypto/hash.h>
+#include <linux/bio.h>
+#include <linux/pagemap.h>
+#include <linux/ratelimit.h>
+#include <linux/scatterlist.h>
+
+struct workqueue_struct *fsverity_read_workqueue;
+
+/**
+ * hash_at_level() - compute the location of the block's hash at the given level
+ *
+ * @vi: (in) the file's verity info
+ * @dindex: (in) the index of the data block being verified
+ * @level: (in) the level of hash we want (0 is leaf level)
+ * @hindex: (out) the index of the hash block containing the wanted hash
+ * @hoffset: (out) the byte offset to the wanted hash within the hash block
+ */
+static void hash_at_level(const struct fsverity_info *vi, pgoff_t dindex,
+ unsigned int level, pgoff_t *hindex,
+ unsigned int *hoffset)
+{
+ pgoff_t position;
+
+ /* Offset of the hash within the level's region, in hashes */
+ position = dindex >> (level * vi->log_arity);
+
+ /* Index of the hash block in the tree overall */
+ *hindex = vi->hash_lvl_region_idx[level] + (position >> vi->log_arity);
+
+ /* Offset of the wanted hash (in bytes) within the hash block */
+ *hoffset = (position & ((1 << vi->log_arity) - 1)) <<
+ (vi->block_bits - vi->log_arity);
+}
+
+/* Extract a hash from a hash page */
+static void extract_hash(struct page *hpage, unsigned int hoffset,
+ unsigned int hsize, u8 *out)
+{
+ void *virt = kmap_atomic(hpage);
+
+ memcpy(out, virt + hoffset, hsize);
+ kunmap_atomic(virt);
+}
+
+static int fsverity_hash_page(const struct fsverity_info *vi,
+ struct ahash_request *req,
+ struct page *page, u8 *out)
+{
+ struct scatterlist sg;
+ DECLARE_CRYPTO_WAIT(wait);
+ int err;
+
+ sg_init_table(&sg, 1);
+ sg_set_page(&sg, page, PAGE_SIZE, 0);
+
+ ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
+ CRYPTO_TFM_REQ_MAY_BACKLOG,
+ crypto_req_done, &wait);
+ ahash_request_set_crypt(req, &sg, out, PAGE_SIZE);
+
+ err = crypto_ahash_import(req, vi->hashstate);
+ if (err)
+ return err;
+
+ return crypto_wait_req(crypto_ahash_finup(req), &wait);
+}
+
+static inline int compare_hashes(const u8 *want_hash, const u8 *real_hash,
+ int digest_size, struct inode *inode,
+ pgoff_t index, int level, const char *algname)
+{
+ if (memcmp(want_hash, real_hash, digest_size) == 0)
+ return 0;
+
+ pr_warn_ratelimited("VERIFICATION FAILURE! ino=%lu, index=%lu, level=%d, want_hash=%s:%*phN, real_hash=%s:%*phN\n",
+ inode->i_ino, index, level,
+ algname, digest_size, want_hash,
+ algname, digest_size, real_hash);
+ return -EBADMSG;
+}
+
+/*
+ * Verify a single data page against the file's Merkle tree.
+ *
+ * In principle, we need to verify the entire path to the root node. But as an
+ * optimization, we cache the hash pages in the file's page cache, similar to
+ * data pages. Therefore, we can stop verifying as soon as a verified hash page
+ * is seen while ascending the tree.
+ *
+ * Note that unlike data pages, hash pages are marked Uptodate *before* they are
+ * verified; instead, the Checked bit is set on hash pages that have been
+ * verified. Multiple tasks may race to verify a hash page and mark it Checked,
+ * but it doesn't matter. The use of the Checked bit also implies that the hash
+ * block size must equal PAGE_SIZE (for now).
+ */
+static bool verify_page(struct inode *inode, const struct fsverity_info *vi,
+ struct ahash_request *req, struct page *data_page)
+{
+ pgoff_t index = data_page->index;
+ int level = 0;
+ u8 _want_hash[FS_VERITY_MAX_DIGEST_SIZE];
+ const u8 *want_hash = NULL;
+ u8 real_hash[FS_VERITY_MAX_DIGEST_SIZE];
+ struct page *hpages[FS_VERITY_MAX_LEVELS];
+ unsigned int hoffsets[FS_VERITY_MAX_LEVELS];
+ int err;
+
+ /* The page must not be unlocked until verification has completed. */
+ if (WARN_ON_ONCE(!PageLocked(data_page)))
+ return false;
+
+ /*
+ * Filesystems shouldn't ask to verify pages beyond the end of the
+ * original data (e.g. pages of the Merkle tree itself, if it's stored
+ * beyond EOF), but to be safe check for it here too.
+ */
+ if (index >= (vi->data_i_size + PAGE_SIZE - 1) >> PAGE_SHIFT) {
+ pr_debug("Page %lu is beyond data region\n", index);
+ return true;
+ }
+
+ pr_debug_ratelimited("Verifying data page %lu...\n", index);
+
+ /*
+ * Starting at the leaves, ascend the tree saving hash pages along the
+ * way until we find a verified hash page, indicated by PageChecked; or
+ * until we reach the root.
+ */
+ for (level = 0; level < vi->depth; level++) {
+ pgoff_t hindex;
+ unsigned int hoffset;
+ struct page *hpage;
+
+ hash_at_level(vi, index, level, &hindex, &hoffset);
+
+ pr_debug_ratelimited("Level %d: hindex=%lu, hoffset=%u\n",
+ level, hindex, hoffset);
+
+ hpage = fsverity_read_metadata_page(inode, hindex);
+ if (IS_ERR(hpage)) {
+ err = PTR_ERR(hpage);
+ goto out;
+ }
+
+ if (PageChecked(hpage)) {
+ extract_hash(hpage, hoffset, vi->hash_alg->digest_size,
+ _want_hash);
+ want_hash = _want_hash;
+ put_page(hpage);
+ pr_debug_ratelimited("Hash page already checked, want %s:%*phN\n",
+ vi->hash_alg->name,
+ vi->hash_alg->digest_size,
+ want_hash);
+ break;
+ }
+ pr_debug_ratelimited("Hash page not yet checked\n");
+ hpages[level] = hpage;
+ hoffsets[level] = hoffset;
+ }
+
+ if (!want_hash) {
+ want_hash = vi->root_hash;
+ pr_debug("Want root hash: %s:%*phN\n", vi->hash_alg->name,
+ vi->hash_alg->digest_size, want_hash);
+ }
+
+ /* Descend the tree verifying hash pages */
+ for (; level > 0; level--) {
+ struct page *hpage = hpages[level - 1];
+ unsigned int hoffset = hoffsets[level - 1];
+
+ err = fsverity_hash_page(vi, req, hpage, real_hash);
+ if (err)
+ goto out;
+ err = compare_hashes(want_hash, real_hash,
+ vi->hash_alg->digest_size,
+ inode, index, level - 1,
+ vi->hash_alg->name);
+ if (err)
+ goto out;
+ SetPageChecked(hpage);
+ extract_hash(hpage, hoffset, vi->hash_alg->digest_size,
+ _want_hash);
+ want_hash = _want_hash;
+ put_page(hpage);
+ pr_debug("Verified hash page at level %d, now want %s:%*phN\n",
+ level - 1, vi->hash_alg->name,
+ vi->hash_alg->digest_size, want_hash);
+ }
+
+ /* Finally, verify the data page */
+ err = fsverity_hash_page(vi, req, data_page, real_hash);
+ if (err)
+ goto out;
+ err = compare_hashes(want_hash, real_hash, vi->hash_alg->digest_size,
+ inode, index, -1, vi->hash_alg->name);
+out:
+ for (; level > 0; level--)
+ put_page(hpages[level - 1]);
+ if (err) {
+ pr_warn_ratelimited("Error verifying page; ino=%lu, index=%lu (err=%d)\n",
+ inode->i_ino, data_page->index, err);
+ return false;
+ }
+ return true;
+}
+
+/**
+ * fsverity_verify_page - verify a data page
+ *
+ * Verify a page that has just been read from a file against that file's Merkle
+ * tree. The page is assumed to be a pagecache page.
+ *
+ * Return: true if the page is valid, else false.
+ */
+bool fsverity_verify_page(struct page *data_page)
+{
+ struct inode *inode = data_page->mapping->host;
+ const struct fsverity_info *vi = get_fsverity_info(inode);
+ struct ahash_request *req;
+ bool valid;
+
+ req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
+ if (unlikely(!req))
+ return false;
+
+ valid = verify_page(inode, vi, req, data_page);
+
+ ahash_request_free(req);
+
+ return valid;
+}
+EXPORT_SYMBOL_GPL(fsverity_verify_page);
+
+#ifdef CONFIG_BLOCK
+/**
+ * fsverity_verify_bio - verify a 'read' bio that has just completed
+ *
+ * Verify a set of pages that have just been read from a file against that
+ * file's Merkle tree. The pages are assumed to be pagecache pages. Pages that
+ * fail verification are set to the Error state. Verification is skipped for
+ * pages already in the Error state, e.g. due to fscrypt decryption failure.
+ *
+ * This is a helper function for filesystems that issue bios to read data
+ * directly into the page cache. Filesystems that work differently should call
+ * fsverity_verify_page() on each page instead. fsverity_verify_page() is also
+ * needed on holes!
+ */
+void fsverity_verify_bio(struct bio *bio)
+{
+ struct inode *inode = bio_first_page_all(bio)->mapping->host;
+ const struct fsverity_info *vi = get_fsverity_info(inode);
+ struct ahash_request *req;
+ struct bio_vec *bv;
+ int i;
+
+ req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
+ if (unlikely(!req)) {
+ bio_for_each_segment_all(bv, bio, i)
+ SetPageError(bv->bv_page);
+ return;
+ }
+
+ bio_for_each_segment_all(bv, bio, i) {
+ struct page *page = bv->bv_page;
+
+ if (!PageError(page) && !verify_page(inode, vi, req, page))
+ SetPageError(page);
+ }
+
+ ahash_request_free(req);
+}
+EXPORT_SYMBOL_GPL(fsverity_verify_bio);
+#endif /* CONFIG_BLOCK */
+
+/**
+ * fsverity_enqueue_verify_work - enqueue work on the fs-verity workqueue
+ *
+ * Enqueue verification work for asynchronous processing.
+ */
+void fsverity_enqueue_verify_work(struct work_struct *work)
+{
+ queue_work(fsverity_read_workqueue, work);
+}
+EXPORT_SYMBOL_GPL(fsverity_enqueue_verify_work);
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index c9422a579c160..15478fe7d55aa 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -28,6 +28,16 @@ extern int fsverity_prepare_getattr(struct inode *inode);
extern void fsverity_cleanup_inode(struct inode *inode);
extern loff_t fsverity_full_i_size(const struct inode *inode);
+/* verify.c */
+extern bool fsverity_verify_page(struct page *page);
+extern void fsverity_verify_bio(struct bio *bio);
+extern void fsverity_enqueue_verify_work(struct work_struct *work);
+
+static inline bool fsverity_check_hole(struct inode *inode, struct page *page)
+{
+ return inode->i_verity_info == NULL || fsverity_verify_page(page);
+}
+
#else /* !__FS_HAS_VERITY */
/* setup.c */
@@ -57,6 +67,29 @@ static inline loff_t fsverity_full_i_size(const struct inode *inode)
return i_size_read(inode);
}
+/* verify.c */
+
+static inline bool fsverity_verify_page(struct page *page)
+{
+ WARN_ON(1);
+ return false;
+}
+
+static inline void fsverity_verify_bio(struct bio *bio)
+{
+ WARN_ON(1);
+}
+
+static inline void fsverity_enqueue_verify_work(struct work_struct *work)
+{
+ WARN_ON(1);
+}
+
+static inline bool fsverity_check_hole(struct inode *inode, struct page *page)
+{
+ return true;
+}
+
#endif /* !__FS_HAS_VERITY */
#endif /* _LINUX_FSVERITY_H */
--
2.19.1.568.g152ad8e336-goog
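Here is a hypothetical userspace re-implementation of the index arithmetic in hash_at_level(), with a worked example. It assumes SHA-256 with 4096-byte blocks (block_bits = 12; 32-byte digests give 128 hashes per block, so log_arity = 7) and lays the levels out root-first, as described in fsverity.rst. The names model_tree, model_build_tree() and model_hash_at_level() are illustrative only. How setup.c actually fills in hash_lvl_region_idx is not part of this patch, so hash blocks are numbered here from the first block of the Merkle tree. The kernel value may also include the tree's position within the file.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_LEVELS 8

/* Hypothetical stand-in for the struct fsverity_info fields used here. */
struct model_tree {
	unsigned int block_bits;               /* log2(block size): 12 for 4 KiB */
	unsigned int log_arity;                /* log2(hashes per block): 7 */
	unsigned int depth;                    /* number of hash-block levels */
	uint64_t region_idx[MODEL_MAX_LEVELS]; /* first block of each level */
};

/* Build the tree shape root-first; returns the total number of hash blocks. */
static uint64_t model_build_tree(struct model_tree *t, uint64_t data_blocks,
				 unsigned int block_bits, unsigned int log_arity)
{
	uint64_t per_level[MODEL_MAX_LEVELS];
	uint64_t blocks = data_blocks;
	uint64_t total = 0;
	unsigned int level;

	t->block_bits = block_bits;
	t->log_arity = log_arity;
	t->depth = 0;

	/* Each level hashes the one below it until a single block remains. */
	while (blocks > 1) {
		assert(t->depth < MODEL_MAX_LEVELS);
		blocks = (blocks + (1u << log_arity) - 1) >> log_arity;
		per_level[t->depth++] = blocks;
	}

	/* The root level (depth - 1) is stored first, level 0 last. */
	for (level = t->depth; level-- > 0; ) {
		t->region_idx[level] = total;
		total += per_level[level];
	}
	return total;
}

/* Same arithmetic as hash_at_level() in fs/verity/verify.c. */
static void model_hash_at_level(const struct model_tree *t, uint64_t dindex,
				unsigned int level, uint64_t *hindex,
				unsigned int *hoffset)
{
	uint64_t position;

	assert(level < t->depth);
	/* Offset of the hash within this level, counted in hashes */
	position = dindex >> (level * t->log_arity);
	/* Hash block holding that hash, counted from the tree's first block */
	*hindex = t->region_idx[level] + (position >> t->log_arity);
	/* Byte offset of the hash within its block */
	*hoffset = (unsigned int)(position & ((1u << t->log_arity) - 1))
		   << (t->block_bits - t->log_arity);
}
```

For a file of 1,000,000 data blocks (about 3.8 GiB), model_build_tree() gives three levels of 7813, 62 and 1 hash blocks, 7876 in total, or roughly 1/127 of the data. Data block 999,999 then has its leaf hash in block 7875 at byte offset 2016, its level-1 hash in block 62 at offset 128, and its level-2 hash in block 0 (the top hash block) at offset 1952. Up to any offset of the tree within the file, these are the positions verify_page() visits while ascending, stopping early at a page already marked Checked.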
From: Eric Biggers <[email protected]>
Add a documentation file for fs-verity, covering:
- Introduction
- Use cases
- Metadata format
  - Merkle tree
  - fs-verity descriptor
  - fsveritysetup format
- Filesystem support
  - ext4
  - f2fs
- User API
  - FS_IOC_ENABLE_VERITY
  - FS_IOC_MEASURE_VERITY
- Access semantics
- In-kernel policies
- Built-in signature verification
- Implementation details
  - I/O path design
- Userspace utility
- Tests
Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/filesystems/fsverity.rst | 583 +++++++++++++++++++++++++
Documentation/filesystems/index.rst | 11 +
2 files changed, 594 insertions(+)
create mode 100644 Documentation/filesystems/fsverity.rst
diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
new file mode 100644
index 0000000000000..d633fc0567bd5
--- /dev/null
+++ b/Documentation/filesystems/fsverity.rst
@@ -0,0 +1,583 @@
+========================================================
+Read-only file-based authenticity protection (fs-verity)
+========================================================
+
+Introduction
+============
+
+fs-verity (``fs/verity/``) is a library that filesystems can hook into
+to support transparent integrity and authenticity protection of
+read-only files. Currently, it is supported by the ext4 and f2fs
+filesystems. Similar to fscrypt, not too much filesystem-specific
+code is needed to support fs-verity.
+
+fs-verity is similar to `dm-verity
+<https://www.kernel.org/doc/Documentation/device-mapper/verity.txt>`_
+but works on files rather than block devices. On supported
+filesystems, userspace can append a Merkle tree (hash tree) to a file,
+then use an ioctl to enable fs-verity on it. Then, the filesystem
+transparently verifies all data read from the file against the Merkle
+tree; reads that fail verification will fail. The filesystem also
+hides or moves the Merkle tree, and forbids changes to the file's
+contents via the syscall interface.
+
+Essentially, fs-verity is a way of efficiently hashing a file, subject
+to the caveat that the enforcement of that hash happens on-demand as
+reads occur. The file hash that fs-verity computes is called the
+"file measurement"; this is the hash of the Merkle tree's root hash
+and certain other fs-verity metadata, and it takes constant time to
+compute regardless of the size of the file. Note: the value of the
+fs-verity file measurement will differ from a regular hash of the
+file, even when they use the same hash algorithm, e.g. SHA-256;
+however, they achieve the same purpose.
+
+Use cases
+=========
+
+In general, fs-verity does not replace or obsolete dm-verity.
+dm-verity should still be used when it is possible to authenticate the
+full block device, i.e. when the device is read-only. fs-verity is
+intended for use on read-write filesystems where dm-verity cannot be
+used.
+
+fs-verity is most useful for hashing large files where only a small
+portion may be accessed. For example, it's useful on Android
+application package (APK) files, which typically contain many
+translations, classes, and other resources that are infrequently or
+even never accessed on a particular device. It would be wasteful to
+hash the entire file before starting the application.
+
+Unlike an ahead-of-time hash, fs-verity also re-verifies data each
+time it's paged in, which ensures the file measurement remains
+correctly enforced even if the file contents are modified from
+underneath the filesystem, e.g. by malicious disk firmware.
+
+fs-verity can support various use cases, such as:
+
+- Integrity protection (detecting accidental corruption)
+- Auditing (logging file hashes before use)
+- Authenticity protection (detecting malicious modification)
+
+Note that the latter two are not features of fs-verity per se, but
+rather fs-verity is a tool for supporting these use cases. For
+example, for the overall system to actually provide authenticity
+protection, the file measurement itself must still be authenticated,
+e.g. by comparing it with a known good value or by verifying a digital
+signature of it.
+
+This can be userspace driven, in which case fs-verity will only be
+used (essentially) as a fast way of hashing the file contents, via the
+`FS_IOC_MEASURE_VERITY`_ ioctl. For authenticity protection, trusted
+userspace code [#]_ still must verify the relevant portions of the
+untrusted filesystem state before it is used in a security-critical
+way, such as executing code from it.
+
+For example, the trusted userspace code might verify that the file
+located at ``/foo/bar/baz`` has an fs-verity file measurement of
+``sha256:a83d5cd722ef0d070b23353c2d9f316c38425114da8bd007cb9e8499371a97b3``,
+or that all security-critical files (e.g. executable code) have stored
+alongside them a valid digital signature (signed by a known, trusted
+public key) of their fs-verity file measurement, potentially combined
+with other important file metadata such as path and SELinux label.
+
+However, for ease of use, a subset of this policy logic (but not all
+of it!) is also supported in the kernel by the `Built-in signature
+verification`_ mechanism. Support for fs-verity file hashes in IMA
+(Integrity Measurement Architecture) policies is also planned.
+
+.. [#] For example, on Android, "trusted userspace code" would be code
+ running from the system or vendor partitions, which are
+ read-only partitions authenticated by dm-verity tied into
+ Verified Boot, as opposed to the userdata partition which is
+ read-write.
+
+Metadata format
+===============
+
+Merkle tree
+-----------
+
+fs-verity uses the same Merkle tree (hash tree) format as dm-verity;
+the only difference is that fs-verity's Merkle tree is built over the
+contents of a regular file rather than a block device.
+
+Briefly, the file contents are divided into blocks, where the blocksize
+is configurable but usually 4096 bytes. The last block is zero-padded
+if needed. Each block is then hashed, producing the first level of
+hashes. Then, the hashes in this first level are grouped into
+'blocksize'-byte blocks (zero-padding the ends as needed) and these
+blocks are hashed, producing the second level of hashes. This
+proceeds up the tree until only a single block remains. The hash of
+this block is called the "Merkle tree root hash". Note: if the entire
+file contents fit in one block, then there are no hash blocks and the
+"Merkle tree root hash" is simply the hash of the data block.
+
+The blocks of the Merkle tree are stored on-disk starting from the
+root level and then proceeding to store each level down to the "first"
+(the level that gives the hashes of the data blocks).
+
+The hash algorithm is configurable. The default is SHA-256, but
+SHA-512 is also supported. The non-cryptographic checksum CRC-32C is
+also supported for integrity-only use cases such as detecting bit
+errors in read-only backup files. A non-cryptographic checksum must
+not be used if authenticity protection is desired.
+
+In the recommended configuration of SHA-256 and 4K blocks, 128 hash
+values fit in each block. Thus, each level of the hash tree is 128
+times smaller than the previous, and for large files the Merkle tree's
+size converges to approximately 1/127 of the original file size.
+However, for small files, the padding to a block boundary is
+significant, making the space overhead proportionally more.
+
+fs-verity descriptor
+--------------------
+
+For each file, fs-verity also uses an additional on-disk metadata
+structure called the *fs-verity descriptor*. This contains the
+properties of the Merkle tree and some other information. It begins
+with a header in the following format::
+
+ struct fsverity_descriptor {
+     __u8 magic[8];
+     __u8 major_version;
+     __u8 minor_version;
+     __u8 log_data_blocksize;
+     __u8 log_tree_blocksize;
+     __le16 data_algorithm;
+     __le16 tree_algorithm;
+     __le32 flags;
+     __le32 reserved1;
+     __le64 orig_file_size;
+     __le16 auth_ext_count;
+     __u8 reserved2[30];
+ };
+
+This structure contains:
+
+- ``magic`` is the ASCII bytes "FSVerity".
+- ``major_version`` is 1.
+- ``minor_version`` is 0.
+- ``log_data_blocksize`` and ``log_tree_blocksize`` are the log base 2
+ of the block size (in bytes) of data blocks and Merkle tree blocks,
+ respectively. Currently, in both cases the kernel only supports
+ page-sized blocks, i.e. on most architectures, 4096-byte blocks.
+ Thus, usually both of these fields must be 12.
+- ``data_algorithm`` and ``tree_algorithm`` are the hash algorithms
+ used to hash data blocks and Merkle tree blocks, respectively.
+ Currently the kernel requires these to have the same value. The
+ recommended value is FS_VERITY_ALG_SHA256. See
+ ``include/uapi/linux/fsverity.h`` for the list of allowed values.
+- ``orig_file_size`` is the original size of the file in bytes. This
+ means the size excluding the verity metadata and padding.
+- ``auth_ext_count`` is the number of authenticated extensions that
+ follow.
+- All other fields must be zero.
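A reader of this header has to decode every multi-byte field as little-endian, independent of the host byte order. The following C sketch parses the 64-byte header from a raw byte buffer at the offsets implied by the struct above (which has no implicit padding) and applies the constraints just listed. `parse_fsverity_desc()` and `struct parsed_desc` are hypothetical names, and checking that the algorithm number is one of those listed in `include/uapi/linux/fsverity.h` is left out.

```c
#include <stdint.h>
#include <string.h>

#define FSV_DESC_SIZE 64 /* sizeof(struct fsverity_descriptor) */

static uint16_t get_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static uint64_t get_le64(const uint8_t *p) { return get_le32(p) | ((uint64_t)get_le32(p + 4) << 32); }

static int is_zero(const uint8_t *p, size_t n)
{
	while (n--)
		if (*p++)
			return 0;
	return 1;
}

struct parsed_desc { /* hypothetical */
	unsigned int log_blocksize;
	uint16_t alg;
	uint64_t orig_file_size;
	uint16_t auth_ext_count;
};

/* Hypothetical validator: returns 0 if buf holds an acceptable header,
 * -1 otherwise. page_shift is 12 for 4096-byte pages. */
static int parse_fsverity_desc(const uint8_t buf[FSV_DESC_SIZE], unsigned int page_shift,
			       struct parsed_desc *out)
{
	if (memcmp(buf, "FSVerity", 8) != 0)                /* magic */
		return -1;
	if (buf[8] != 1 || buf[9] != 0)                      /* major/minor version */
		return -1;
	if (buf[10] != page_shift || buf[11] != page_shift)  /* log_{data,tree}_blocksize */
		return -1;
	if (get_le16(buf + 12) != get_le16(buf + 14))        /* data_algorithm == tree_algorithm */
		return -1;
	/* flags (offset 16), reserved1 (20) and reserved2 (34..63) must be zero */
	if (get_le32(buf + 16) != 0 || get_le32(buf + 20) != 0 || !is_zero(buf + 34, 30))
		return -1;
	out->log_blocksize = buf[10];
	out->alg = get_le16(buf + 12);
	out->orig_file_size = get_le64(buf + 24);
	out->auth_ext_count = get_le16(buf + 32);
	return 0;
}
```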
+
+Following the ``struct fsverity_descriptor``, there is a list of
+"authenticated extensions". Each extension is a variable-length
+structure that begins with the following header::
+
+ struct fsverity_extension {
+     __le32 length;
+     __le16 type;
+     __le16 reserved;
+ };
+
+This structure contains:
+
+- ``length`` is the length of this extension in bytes, including the
+ header.
+- ``type`` is the extension number. See
+ ``include/uapi/linux/fsverity.h`` for the allowed values.
+- ``reserved`` must be 0.
+
+Each extension begins on an 8-byte aligned boundary. When an
+extension's length is not a multiple of 8, it must be zero-padded to
+the next 8-byte boundary, even if it is the last extension. This zero
+padding is not counted in the ``length`` field.
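Because of this alignment rule, a reader must advance by the length rounded up to a multiple of 8, not by ``length`` itself. A C sketch of walking such a list in a byte buffer follows; `walk_extensions()` is a hypothetical name, and rejecting non-zero padding bytes is our reading of "must be zero-padded".

```c
#include <stddef.h>
#include <stdint.h>

#define FSV_EXT_HDR_SIZE 8 /* sizeof(struct fsverity_extension) */

static uint16_t ext_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t ext_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical walker: validates `count` consecutive extensions at the start
 * of buf and returns the total bytes they occupy, including each extension's
 * zero padding, or -1 if malformed. If types is non-NULL, the type of
 * extension i is stored in types[i]. */
static long walk_extensions(const uint8_t *buf, size_t buf_len, unsigned int count,
			    uint16_t *types)
{
	size_t pos = 0;

	for (unsigned int i = 0; i < count; i++) {
		if (buf_len - pos < FSV_EXT_HDR_SIZE)
			return -1;
		uint32_t len = ext_le32(buf + pos); /* includes the 8-byte header */
		if (len < FSV_EXT_HDR_SIZE || ext_le16(buf + pos + 6) != 0) /* reserved must be 0 */
			return -1;
		size_t padded = ((size_t)len + 7) & ~(size_t)7; /* next 8-byte boundary */
		if (padded > buf_len - pos)
			return -1;
		for (size_t j = len; j < padded; j++) /* padding is not counted in len */
			if (buf[pos + j] != 0)
				return -1;
		if (types)
			types[i] = ext_le16(buf + pos + 4);
		pos += padded;
	}
	return (long)pos;
}
```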
+
+This first list of extensions is "authenticated", meaning that they
+are included in the file measurement. Currently, the following
+authenticated extensions are supported. Except where otherwise
+indicated, extensions are optional and cannot be given multiple times:
+
+- FS_VERITY_EXT_ROOT_HASH: This is mandatory. It gives the root hash
+ of the Merkle tree, as a byte array.
+- FS_VERITY_EXT_SALT: A salt for the hashes, given as a byte array.
+ The salt is prepended to every block that is hashed. A salt of any
+ length is supported. Using a unique salt for every file should make
+ it more difficult to attack fs-verity across many files at once.
+ However, in principle this is unnecessary, since simply choosing a
+ strong cryptographic hash algorithm such as SHA-256 or SHA-512 should
+ be sufficient.
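Concretely, prepending the salt means each hash invocation sees the salt followed by one full block. The C sketch below only builds that message and leaves the hashing itself to whichever algorithm the descriptor names. `build_hash_input()` is a hypothetical helper, and applying the same zero-padding as for the last data block is an assumption based on the tree description above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fills `out` (capacity salt_len + block_size) with the
 * message fed to the hash for one block: the salt, then the block's bytes,
 * then zeroes up to block_size. Returns the message length, or 0 if
 * data_len exceeds block_size. */
static size_t build_hash_input(uint8_t *out, const uint8_t *salt, size_t salt_len,
			       const uint8_t *data, size_t data_len, size_t block_size)
{
	if (data_len > block_size)
		return 0;
	if (salt_len)
		memcpy(out, salt, salt_len);          /* salt first */
	memcpy(out + salt_len, data, data_len);       /* then the block contents */
	memset(out + salt_len + data_len, 0, block_size - data_len); /* zero-pad */
	return salt_len + block_size;
}
```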
+
+Following the authenticated extensions, there is a list of
+unauthenticated extensions. These are *not* included in the file
+measurement. This list begins with a header::
+
+ __le16 unauth_ext_count;
+ __le16 padding[3];
+
+``unauth_ext_count`` is the number of unauthenticated extensions.
+This may be 0.
+
+Like authenticated extensions, each unauthenticated extension begins
+with the header ``struct fsverity_extension`` from above.
+
+The following types of unauthenticated extensions are supported:
+
+- FS_VERITY_EXT_PKCS7_SIGNATURE: This is a DER-encoded PKCS#7 message
+ containing the signed file measurement. See `Built-in signature
+ verification`_ for details.
+
+fsveritysetup format
+--------------------
+
+When enabling fs-verity on a file via the `FS_IOC_ENABLE_VERITY`_
+ioctl, the kernel requires that the verity metadata has been appended
+to the file contents. Specifically, the file must be arranged as:
+
+#. Original file contents
+#. Zero-padding to next block boundary
+#. `Merkle tree`_
+#. `fs-verity descriptor`_
+#. fs-verity footer
+
+We call this file format the "fsveritysetup format". It is not
+necessarily the on-disk format actually used by the filesystem, since
+the filesystem is free to move things around during the ioctl.
+However, the easiest way to implement fs-verity is to just keep this
+arrangement in-place, as ext4 and f2fs do; see `Filesystem support`_.
+
+Note that "block" here means the fs-verity block size, which is not
+necessarily the same as the filesystem's block size. For example, on
+ext4, fs-verity can use 4K blocks on top of a filesystem formatted to
+use a 1K block size.
+
+The fs-verity footer is a structure of the following format::
+
+ struct fsverity_footer {
+     __le32 desc_reverse_offset;
+     __u8 magic[8];
+ };
+
+``desc_reverse_offset`` is the distance in bytes from the end of the
+fs-verity footer to the beginning of the fs-verity descriptor; this
+allows software to find the fs-verity descriptor. ``magic`` is the
+ASCII bytes "FSVerity"; this allows software to quickly identify a
+file as being in the "fsveritysetup" format as well as find the
+fs-verity footer if zeroes have been appended.
+
+The kernel cannot handle fs-verity footers that cross a page boundary.
+Padding must be prepended as needed to meet this constraint.
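Putting the footer rules together, a reader locates the descriptor by working backwards from the end of the file. The following C sketch does this on an in-memory copy of the file. `find_fsverity_desc()` is a hypothetical name, and requiring the whole 64-byte descriptor header to fit before the footer is our own sanity check rather than a rule stated above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FSV_FOOTER_SIZE 12   /* sizeof(struct fsverity_footer) */
#define FSV_DESC_HDR_SIZE 64 /* sizeof(struct fsverity_descriptor) */

/* Hypothetical helper: on success stores the descriptor's byte offset in
 * *desc_off and returns 0; returns -1 if no usable footer is found. */
static int find_fsverity_desc(const uint8_t *file, size_t size, size_t page_size,
			      size_t *desc_off)
{
	size_t end = size;
	uint32_t rev;

	/* The magic ends in 'y' (non-zero), so appended zeroes can be skipped. */
	while (end > 0 && file[end - 1] == 0)
		end--;
	if (end < FSV_FOOTER_SIZE || memcmp(file + end - 8, "FSVerity", 8) != 0)
		return -1;
	/* The kernel cannot handle a footer that crosses a page boundary. */
	if ((end - FSV_FOOTER_SIZE) / page_size != (end - 1) / page_size)
		return -1;
	/* desc_reverse_offset: little-endian, measured from the footer's end */
	rev = (uint32_t)file[end - 12] | ((uint32_t)file[end - 11] << 8) |
	      ((uint32_t)file[end - 10] << 16) | ((uint32_t)file[end - 9] << 24);
	if (rev < FSV_FOOTER_SIZE + FSV_DESC_HDR_SIZE || rev > end)
		return -1;
	*desc_off = end - rev;
	return 0;
}
```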
+
+Filesystem support
+==================
+
+ext4
+----
+
+ext4 supports fs-verity since kernel version TODO.
+
+CONFIG_EXT4_FS_VERITY must be enabled in the kernel config. Also, the
+filesystem must have been formatted with ``-O verity``, or had
+``tune2fs -O verity`` run on it. These require e2fsprogs v1.44.4-2 or
+later. This e2fsprogs version is also required for e2fsck to
+understand the verity feature. Since "verity" is an RO_COMPAT
+feature, once it is enabled, earlier kernels will be unable to mount the
+filesystem for writing, and earlier versions of e2fsck will be unable
+to check the filesystem.
+
+ext4 only allows fs-verity on extent-based files.
+
+The EXT4_VERITY_FL flag in the inode is used to indicate that the
+inode uses fs-verity. This bit cannot be set directly; it can only be
+set indirectly via `FS_IOC_ENABLE_VERITY`_.
+
+When enabling verity on an inode, ext4 leaves the verity metadata
+in-place in the `fsveritysetup format`_. However, it changes the
+on-disk i_size to the original file size, which allows the verity
+feature to be RO_COMPAT rather than INCOMPAT. Later, the fs-verity
+footer is found by scanning backwards from the end of the last extent
+rather than from i_size.
+
+f2fs
+----
+
+f2fs supports fs-verity since kernel version TODO.
+
+CONFIG_F2FS_FS_VERITY must be enabled in the kernel config. Also, the
+filesystem must have been formatted with ``-O verity``. This requires
+f2fs-tools v1.11.0 or later.
+
+The FADVISE_VERITY_BIT flag in the inode is used to indicate that the
+inode uses fs-verity. This bit cannot be set directly; it can only be
+set indirectly via `FS_IOC_ENABLE_VERITY`_.
+
+When enabling verity on an inode, f2fs leaves the verity metadata
+in-place in the `fsveritysetup format`_. It leaves the on-disk i_size
+as the full file size; however, the in-memory i_size is overridden
+with the original size.
+
+User API
+========
+
+FS_IOC_ENABLE_VERITY
+--------------------
+
+The FS_IOC_ENABLE_VERITY ioctl enables fs-verity on a regular file.
+Userspace must have already appended verity metadata to the file,
+using the file format described in `fsveritysetup format`_.
+Additionally, the filesystem must support fs-verity.
+
+The argument parameter for this ioctl is reserved and must be NULL.
+
+This ioctl checks for write access to the inode; no capability is
+required. However, it must be executed on an O_RDONLY file
+descriptor, and no processes may have the file open for writing.
+(This is necessary to prevent various race conditions.)
+
+On success, this ioctl returns 0, and the file becomes a verity file.
+This means that:
+
+- The filesystem marks the file as a verity file both in-memory and
+ on-disk, e.g. by setting a bit in the inode.
+- All later reads from the file are verified against the Merkle tree.
+- The verity metadata at the end of the file is hidden or moved.
+- Opening the file for writing or truncating it is no longer allowed.
+- There is no way to disable verity on the file, other than by
+ deleting it and replacing it with a copy.
+
+If this ioctl fails, then no changes are made to the file. The
+reasons it might fail include:
+
+- ``EACCES``: the process does not have write access to the file
+- ``EBADMSG``: the file's fs-verity metadata is invalid
+- ``EEXIST``: the file already has fs-verity enabled
+- ``EINVAL``: a value was specified for the reserved argument
+ parameter, or the file descriptor refers to neither a regular file
+ nor a directory
+- ``EIO``: an I/O error occurred
+- ``EISDIR``: the file descriptor refers to a directory, not a regular
+ file
+- ``ENOTTY``: this type of filesystem does not implement fs-verity
+- ``EOPNOTSUPP``: the kernel was not configured with fs-verity support
+ for this filesystem, or the filesystem superblock has not had the
+ 'verity' feature enabled on it. (See `Filesystem support`_.)
+- ``EPERM``: the file is append-only
+- ``EROFS``: the filesystem is read-only
+- ``ETXTBSY``: the file is open for writing. Note that this can be
+ the caller's file descriptor, or another open file descriptor, or
+ the file reference held by a writable memory map.
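From userspace, the requirements above (a read-only descriptor, no remaining writers, a NULL argument) translate into a short C sketch like the following. The request number `FSV_IOC_ENABLE` is an assumption based on this patchset's argument-less ioctl; real code should take `FS_IOC_ENABLE_VERITY` from the `<linux/fsverity.h>` that matches the running kernel, since the definition may change.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* ASSUMPTION: hypothetical stand-in for FS_IOC_ENABLE_VERITY as defined by
 * this patchset (no argument). Prefer the header for your kernel. */
#define FSV_IOC_ENABLE _IO('f', 133)

/* Enables fs-verity on `path`, whose verity metadata must already be
 * appended. Returns 0 or -errno. All writable descriptors for the file must
 * already be closed, otherwise the ioctl fails with ETXTBSY. */
static int enable_verity(const char *path)
{
	int fd = open(path, O_RDONLY); /* the ioctl requires an O_RDONLY fd */
	int ret = 0;

	if (fd < 0)
		return -errno;
	if (ioctl(fd, FSV_IOC_ENABLE, NULL) != 0)
		ret = -errno; /* e.g. -EBADMSG, -EEXIST, -ENOTTY, -EOPNOTSUPP */
	close(fd);
	return ret;
}
```

A typical flow is to write the file and its appended metadata through a writable descriptor, `fsync()` and `close()` it, and only then call `enable_verity()`; keeping the writable descriptor open would make the ioctl fail with `ETXTBSY`.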
+
+FS_IOC_MEASURE_VERITY
+---------------------
+
+The FS_IOC_MEASURE_VERITY ioctl retrieves the fs-verity measurement of
+a regular file. This is a digest that cryptographically summarizes
+the file contents that are being enforced on reads. The file must
+have fs-verity enabled.
+
+This ioctl takes in a pointer to a variable-length structure::
+
+ struct fsverity_digest {
+     __u16 digest_algorithm;
+     __u16 digest_size; /* input/output */
+     __u8 digest[];
+ };
+
+``digest_size`` is an input/output field. On input, it must be
+initialized to the number of bytes allocated for the variable-length
+``digest`` field.
+
+On success, 0 is returned and the kernel fills in the structure as
+follows:
+
+- ``digest_algorithm`` will be the hash algorithm used for the file
+ measurement. It will match the algorithm used in the Merkle tree,
+ e.g. FS_VERITY_ALG_SHA256. See ``include/uapi/linux/fsverity.h``
+ for the list of possible values.
+- ``digest_size`` will be the size of the digest in bytes, e.g. 32
+ for SHA-256. (This can be redundant with ``digest_algorithm``.)
+- ``digest`` will be the actual bytes of the digest.
+
+This ioctl is guaranteed to be very fast. Due to fs-verity's use of a
+Merkle tree, its running time is independent of the file size.
+
+This ioctl can fail with the following errors:
+
+- ``EFAULT``: invalid buffer was specified
+- ``ENODATA``: the file is not a verity file
+- ``ENOTTY``: this type of filesystem does not implement fs-verity
+- ``EOPNOTSUPP``: the kernel was not configured with fs-verity support
+ for this filesystem, or the filesystem superblock has not had the
+ 'verity' feature enabled on it. (See `Filesystem support`_.)
+- ``EOVERFLOW``: the file measurement is longer than the specified
+ ``digest_size`` bytes. Try providing a larger buffer.
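The input/output use of ``digest_size`` is easiest to see in code. The C sketch below allocates room for the largest supported digest, calls the ioctl, and formats the result as hex. `struct fsverity_digest_buf` mirrors the structure documented above; the request number `FSV_IOC_MEASURE` and the helper names are assumptions for illustration, and real code should use `FS_IOC_MEASURE_VERITY` from `<linux/fsverity.h>`.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Local mirror of struct fsverity_digest as documented above. */
struct fsverity_digest_buf {
	uint16_t digest_algorithm;
	uint16_t digest_size; /* input: space in digest[]; output: digest length */
	uint8_t digest[];
};

/* ASSUMPTION: hypothetical stand-in for FS_IOC_MEASURE_VERITY. */
#define FSV_IOC_MEASURE _IOWR('f', 134, struct fsverity_digest_buf)
#define MAX_DIGEST_SIZE 64 /* large enough for SHA-512 */

/* Returns the digest length and fills *alg and digest[], or returns -errno. */
static int measure_verity(const char *path, uint16_t *alg, uint8_t digest[MAX_DIGEST_SIZE])
{
	struct fsverity_digest_buf *d = malloc(sizeof(*d) + MAX_DIGEST_SIZE);
	int fd, ret;

	if (!d)
		return -ENOMEM;
	d->digest_size = MAX_DIGEST_SIZE;
	fd = open(path, O_RDONLY);
	if (fd < 0) {
		ret = -errno;
	} else {
		if (ioctl(fd, FSV_IOC_MEASURE, d) != 0)
			ret = -errno; /* e.g. -ENODATA (not a verity file), -EOVERFLOW */
		else if (d->digest_size > MAX_DIGEST_SIZE)
			ret = -EOVERFLOW;
		else {
			*alg = d->digest_algorithm;
			memcpy(digest, d->digest, d->digest_size);
			ret = d->digest_size;
		}
		close(fd);
	}
	free(d);
	return ret;
}

/* Formats n digest bytes as lowercase hex into out (2 * n + 1 bytes). */
static void digest_to_hex(const uint8_t *dg, size_t n, char *out)
{
	static const char hex[] = "0123456789abcdef";

	for (size_t i = 0; i < n; i++) {
		out[2 * i] = hex[dg[i] >> 4];
		out[2 * i + 1] = hex[dg[i] & 0xf];
	}
	out[2 * n] = '\0';
}
```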
+
+Access semantics
+================
+
+fs-verity only implements reads, not writes. Therefore, after it is
+enabled on a given file, regardless of the mode bits, filesystems will
+forbid opening the file for writing as well as changing the size of
+the file via truncate(). The error code received for this is EPERM.
+
+However, fs-verity does not measure metadata such as owner, mode,
+timestamps, and xattrs. Therefore, changes to these are still
+allowed.
+
+For read-only access, fs-verity is intended to be transparent; no
+changes to userspace applications should be needed. However, astute
+users may notice some slight differences in behavior:
+
+- Direct I/O is not supported on verity files. Attempts to use direct
+ I/O on such files will fall back to buffered I/O.
+
+- DAX (Direct Access) is not supported on verity files.
+
+Note: read-only mmaps are supported, as is combining fs-verity and
+fscrypt.
+
+Verity files can be sparse; holes are still verified.
+
+In-kernel policies
+==================
+
+Built-in signature verification
+-------------------------------
+
+With CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y, fs-verity supports putting
+a portion of an authentication policy (see `Use cases`_) in the
+kernel. Specifically, it adds support for:
+
+1. At fs-verity module initialization time, a keyring ".fs-verity" is
+ created. The root user can add trusted X.509 certificates to this
+ keyring using the add_key() system call, then (when done)
+ optionally use keyctl_restrict_keyring() to prevent additional
+ certificates from being added.
+
+2. When a PKCS7_SIGNATURE extension containing a signed file
+ measurement is found in a file's verity metadata, the kernel will
+ verify this signature against the certificates in the ".fs-verity"
+ keyring, and verify that it matches the actual file measurement.
+ The extension must contain the PKCS#7 formatted signature in DER
+ format, where the signed data is the file measurement as a ``struct
+ fsverity_digest`` as described for `FS_IOC_MEASURE_VERITY`_ except
+ that all fields must be little-endian rather than native endian.
+
+3. A new sysctl "fs.verity.require_signatures" is made available.
+ When set to 1, the kernel requires that all fs-verity files have a
+ correctly signed file measurement as described in (2).
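Because the signed data must be little-endian regardless of the host, a signing tool cannot simply sign the in-memory bytes of a native ``struct fsverity_digest`` on a big-endian machine. The following C sketch produces the byte string that would then be signed and embedded in the extension; `fsverity_signed_data()` is a hypothetical name, and the PKCS#7 signing step itself is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: serializes struct fsverity_digest with every field
 * little-endian. `out` must hold 4 + digest_size bytes. Returns the length. */
static size_t fsverity_signed_data(uint16_t alg, const uint8_t *digest,
				   uint16_t digest_size, uint8_t *out)
{
	out[0] = (uint8_t)(alg & 0xff);          /* digest_algorithm, low byte first */
	out[1] = (uint8_t)(alg >> 8);
	out[2] = (uint8_t)(digest_size & 0xff);  /* digest_size, low byte first */
	out[3] = (uint8_t)(digest_size >> 8);
	memcpy(out + 4, digest, digest_size);    /* digest bytes are copied as-is */
	return 4 + (size_t)digest_size;
}
```

With FS_VERITY_ALG_SHA256 (1) and a 32-byte digest, the result is 36 bytes beginning `01 00 20 00`.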
+
+This is meant as a relatively simple mechanism that can be used to
+provide some level of authenticity protection for fs-verity files, as
+an alternative to doing the signature verification in userspace or
+using IMA-appraisal. However, with this mechanism, userspace programs
+still need to check that the fs-verity bit is set, and there is no
+protection against fs-verity files being swapped around.
+
+Implementation details
+======================
+
+I/O path design
+---------------
+
+To support fs-verity, the filesystem's ``->readpage()`` and
+``->readpages()`` methods are modified to verify the data pages before
+they are marked Uptodate. Merely hooking ``->read_iter()`` would be
+insufficient, since ``->read_iter()`` is not used for memory maps.
+fs-verity exposes functions to verify data:
+
+- ``fsverity_verify_page()`` verifies an individual page
+- ``fsverity_verify_bio()`` verifies all pages in a bio
+
+Currently, fs-verity only supports the case where data blocks, hash
+blocks, and pages all have the same size (usually 4096 bytes).
+
+Filesystems that use bios call ``fsverity_verify_bio()`` after each
+read bio completes. To do this while also continuing to support
+encryption (fscrypt), filesystems allocate a "post-read context" for
+each bio and store it in ``->bi_private``::
+
+ struct bio_post_read_ctx {
+     struct bio *bio;
+     struct work_struct work;
+     unsigned int cur_step;
+     unsigned int enabled_steps;
+ };
+
+``enabled_steps`` is a bitmask of the post-read steps that are
+enabled. The available steps are STEP_DECRYPT and STEP_VERITY. These
+steps can be enabled together, independently, or not at all. If both
+are enabled, then decryption is done first. Since bio completion
+callbacks cannot sleep, each post-read step is done by enqueueing the
+struct on a workqueue, and then actual work happens in the work item.
+Different workqueues are needed for encryption and verity because
+verity work may require decrypting metadata pages from the file.
+
+The bio completion callback sets PG_error for each page if either
+decryption or verification failed. Finally, after the work item(s)
+complete, pages without PG_error are set Uptodate, and all pages are
+unlocked.
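The step ordering and the final page-state rules can be modeled in plain C. The sketch below is a single-threaded userspace model, not kernel code: `struct page_model`, `struct post_read_model`, `run_post_read()` and the step bit values are hypothetical, each page's per-step outcome is supplied as data, and the workqueue hand-off is only indicated by a comment.

```c
#include <stdbool.h>

enum { STEP_DECRYPT = 1, STEP_VERITY = 2 }; /* hypothetical bit values */

struct page_model {
	bool decrypt_ok; /* simulated result of decrypting this page */
	bool verity_ok;  /* simulated result of verifying this page */
	bool error;      /* models PG_error */
	bool uptodate;   /* models PG_uptodate */
	bool locked;     /* models PG_locked */
};

struct post_read_model {
	unsigned int enabled_steps; /* bitmask of STEP_DECRYPT | STEP_VERITY */
	unsigned int cur_step;      /* step currently running */
	unsigned int steps_run[2];  /* records the order the steps ran in */
	unsigned int nr_steps_run;
};

static void run_post_read(struct post_read_model *ctx, struct page_model *pages, int n)
{
	static const unsigned int order[] = { STEP_DECRYPT, STEP_VERITY }; /* decrypt first */

	ctx->nr_steps_run = 0;
	for (int s = 0; s < 2; s++) {
		if (!(ctx->enabled_steps & order[s]))
			continue;
		ctx->cur_step = order[s];
		ctx->steps_run[ctx->nr_steps_run++] = order[s];
		/* In the kernel this loop body would run in a work item, since
		 * bio completion callbacks cannot sleep. */
		for (int i = 0; i < n; i++) {
			bool ok = order[s] == STEP_DECRYPT ? pages[i].decrypt_ok : pages[i].verity_ok;
			if (!ok)
				pages[i].error = true;
		}
	}
	for (int i = 0; i < n; i++) {
		pages[i].uptodate = !pages[i].error; /* only error-free pages become Uptodate */
		pages[i].locked = false;             /* every page is unlocked */
	}
}
```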
+
+A data page being set Uptodate and unlocked implies that it has been
+verified, and such pages become visible to userspace via read(),
+mmap(), etc. Otherwise, the page is left in the PG_error && !Uptodate
+state which results in the read() family of syscalls failing with EIO,
+and accesses to the data via a memory map raising SIGBUS. Note that
+even if some pages in a file fail verification, pages that pass
+verification can still be read.
+
+To verify a data page, fs-verity reads the required hash page(s)
+starting at the leaves and ascending to the root; then, the pages are
+verified descending from the root. Filesystems that store the verity
+metadata past EOF implement reading hash pages using their usual
+``->readpage{,s}()`` methods, with modifications:
+
+- Verification is skipped for pages beyond ``i_size``.
+- When checking whether a page is in the implicit hole beyond EOF,
+ the full file size (including the verity metadata) is used rather
+ than the original data i_size. Note that this does not allow
+ userspace to read or mmap the verity metadata.
+
+The hash pages are also cached in the inode's address_space, similar
+to data pages. However, to simplify the verification logic, a hash
+page being Uptodate doesn't imply that it has been verified; instead,
+the PG_checked bit is used for this purpose. Hash pages aren't locked
+while being verified, so multiple threads may race to set PG_checked,
+but this doesn't matter.
+
+Thus, when ascending the tree reading hash pages, fs-verity can stop
+as soon as it finds an already-checked hash page. This optimization,
+which is also used by dm-verity, results in excellent sequential read
+performance since usually the deepest needed hash page will already be
+cached and checked. However, random reads perform worse.
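The index arithmetic behind this ascent is simple when every level has the same fan-out. The following C sketch assumes 4096-byte blocks and 32-byte digests (128 hashes per block), computes which hash block and byte offset are needed at each level for a given data block, and models the early stop with a per-level `checked` array standing in for PG_checked. Both function names are hypothetical; adding each level's starting block (levels are stored root-first) would give the position within the tree.

```c
#include <stdbool.h>
#include <stdint.h>

#define HASHES_PER_BLOCK 128 /* assumes 4096-byte blocks and 32-byte digests */
#define DIGEST_SIZE 32

struct hash_pos {
	uint64_t block;  /* index of the hash block within its level */
	uint32_t offset; /* byte offset of the needed hash inside that block */
};

/* For data block `index`, fills pos[level] from level 0 (hashes of the data
 * blocks) up to num_levels - 1 (the single top block, hashed to the root). */
static void hash_positions(uint64_t index, unsigned int num_levels, struct hash_pos *pos)
{
	for (unsigned int level = 0; level < num_levels; level++) {
		pos[level].block = index / HASHES_PER_BLOCK;
		pos[level].offset = (uint32_t)(index % HASHES_PER_BLOCK) * DIGEST_SIZE;
		index /= HASHES_PER_BLOCK; /* this hash block is itself hashed one level up */
	}
}

/* checked[level] models PG_checked on the hash block needed at that level.
 * Returns how many levels must be read and verified: the ascent stops at the
 * first already-checked block, or runs past the top level, which is then
 * verified against the root hash. */
static unsigned int levels_to_verify(const bool *checked, unsigned int num_levels)
{
	unsigned int level = 0;

	while (level < num_levels && !checked[level])
		level++;
	return level;
}
```

For sequential reads, 128 consecutive data blocks share one level-0 hash block, so after the first of them `levels_to_verify()` typically returns 0.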
+
+Files may contain holes. Normally, the filesystem's
+``->readpage{,s}()`` methods will zero pages in holes and set them
+Uptodate without issuing any bios. To prevent this from being abused
+to bypass fs-verity, filesystems call ``fsverity_verify_page()`` on
+hole pages.
+
+Like fscrypt, filesystems also disable direct I/O on verity files,
+since direct I/O bypasses the normal read paths.
+
+Userspace utility
+=================
+
+This document focuses on the kernel, but a userspace utility for
+fs-verity can be found at:
+
+ https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git
+
+See the README.md file in the fsverity-utils source tree for details,
+including examples of setting up fs-verity protected files.
+
+Tests
+=====
+
+To test fs-verity, use xfstests. For example, using `kvm-xfstests
+<https://git.kernel.org/pub/scm/fs/ext2/xfstests-bld.git/tree/Documentation/kvm-quickstart.md>`_::
+
+ kvm-xfstests -c ext4,f2fs -g verity
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 46d1b1be3a510..818390c32be63 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -359,3 +359,14 @@ encryption of files and directories.
:maxdepth: 2
fscrypt
+
+Verity API
+==========
+
+A library which filesystems can hook into to support transparent
+authentication of read-only files.
+
+.. toctree::
+ :maxdepth: 2
+
+ fsverity
--
2.19.1.568.g152ad8e336-goog
On Nov 1, 2018, at 4:52 PM, Eric Biggers <[email protected]> wrote:
>
> From: Eric Biggers <[email protected]>
>
> Add basic fs-verity support to ext4. fs-verity is a filesystem feature
> that enables transparent integrity protection and authentication of
> read-only files. It uses a dm-verity like mechanism at the file level:
> a Merkle tree is used to verify any block in the file in log(filesize)
> time. It is implemented mainly by helper functions in fs/verity/.
> See Documentation/filesystems/fsverity.rst for details.
>
> This patch adds everything except the data verification hooks that will
> be needed in ->readpages().
>
> On ext4, enabling fs-verity on a file requires that the filesystem has
> the 'verity' feature, e.g. that it was formatted with
> 'mkfs.ext4 -O verity' or had 'tune2fs -O verity' run on it.
> This requires e2fsprogs 1.44.4-2 or later.
>
> In ext4, we choose to retain the fs-verity metadata past the end of the
> file rather than trying to move it into an external inode xattr, since
> in practice keeping the metadata in-line actually results in the
> simplest and most efficient implementation. One non-obvious advantage
> of keeping the verity metadata in-line is that when fs-verity is
> combined with fscrypt, the verity metadata naturally gets encrypted too;
> this is actually necessary because it contains hashes of the plaintext.
On the plus side, this means that the verity data will automatically be
invalidated if the file is truncated or extended, but on the negative side
it means that the verity Merkle tree needs to be recalculated for the
entire file if e.g. the file is appended to.

I guess the current implementation will generate the Merkle tree in
userspace, but at some point it might be useful to generate it on-the-fly
to have proper data integrity from the time of write (e.g. like ZFS)
rather than only allowing it to be stored after the entire file is written?

Storing the Merkle tree in a large xattr inode would allow this to change
in the future rather than being stuck with the current implementation. We
could encrypt the xattr data just as easily as the file data (which should
be done anyway even for non-verity files to avoid leaking data), and having
the verity attr keyed to the inode version/size/mime(?) would ensure the
kernel knows it is stale if the inode is modified.

I'm not going to stand on my head and block this implementation, I just
thought it is worthwhile to raise these issues now rather than after it
is a fait accompli.
> We also choose to keep the on-disk i_size equal to the original file
> size, in order to make the 'verity' feature a RO_COMPAT feature. Thus,
> ext4 has to find the fsverity_footer by looking in the last extent.
Cheers, Andreas
From: Eric Biggers <[email protected]>
I'm volunteering to maintain fs-verity.
It's been suggested to take fs-verity changes through the fscrypt git
tree, but as these are logically independent features I suggest having a
separate git tree for fs-verity.
But I left the mailing list as linux-fscrypt for now.
Signed-off-by: Eric Biggers <[email protected]>
---
MAINTAINERS | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 690c2f68a401f..72fef7c44bfba 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6052,6 +6052,17 @@ S: Maintained
F: fs/notify/
F: include/linux/fsnotify*.h
+FSVERITY: READ-ONLY FILE-BASED AUTHENTICITY PROTECTION
+M: Eric Biggers <[email protected]>
+L: [email protected]
+Q: https://patchwork.kernel.org/project/linux-fscrypt/list/
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity.git
+S: Supported
+F: fs/verity/
+F: include/linux/fsverity.h
+F: include/uapi/linux/fsverity.h
+F: Documentation/filesystems/fsverity.rst
+
FUJITSU LAPTOP EXTRAS
M: Jonathan Woithe <[email protected]>
L: [email protected]
From: Eric Biggers <[email protected]>
Add CRC-32C support to fs-verity, to provide a faster alternative to
SHA-256 for users who want integrity-only (not authenticity), i.e. who
want to detect only accidental corruption, not malicious changes.
CRC-32C is chosen over CRC-32 because the CRC-32C polynomial is believed
to provide slightly better error-detection properties; and CRC-32C is
just as fast (or can be just as fast) as CRC-32, or even faster e.g. on
some x86 processors that have a CRC-32C instruction but not CRC-32.
We use "crc32c" from the crypto API, so the polynomial convention is
bitwise little-endian, the digest is bytewise little-endian, and the CRC
bits are inverted at the beginning and end (which is desirable).
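The conventions described here correspond to the common reflected CRC-32C. The following bitwise C sketch is for illustration only (the kernel uses the crypto API's "crc32c", which is often hardware-accelerated); it shows the initial and final inversion and the little-endian byte order of the 4-byte digest. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli). 0x82F63B78 is the bit-reversed form of the
 * polynomial 0x1EDC6F41, i.e. the "bitwise little-endian" convention. */
static uint32_t crc32c(const uint8_t *p, size_t n)
{
	uint32_t crc = 0xFFFFFFFF; /* bits inverted at the beginning */

	while (n--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFF; /* ...and at the end */
}

/* Stores the 4-byte digest bytewise little-endian. */
static void crc32c_digest(const uint8_t *p, size_t n, uint8_t out[4])
{
	uint32_t crc = crc32c(p, n);

	for (int i = 0; i < 4; i++)
		out[i] = (uint8_t)(crc >> (8 * i));
}
```

The standard check value for the ASCII string "123456789" is 0xE3069283, stored as the digest bytes `83 92 06 e3`.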
Signed-off-by: Eric Biggers <[email protected]>
---
fs/verity/hash_algs.c | 4 ++++
include/uapi/linux/fsverity.h | 1 +
2 files changed, 5 insertions(+)
diff --git a/fs/verity/hash_algs.c b/fs/verity/hash_algs.c
index 3174a0c08785d..109afeec60fc9 100644
--- a/fs/verity/hash_algs.c
+++ b/fs/verity/hash_algs.c
@@ -23,6 +23,10 @@ struct fsverity_hash_alg fsverity_hash_algs[] = {
.digest_size = 64,
.cryptographic = true,
},
+ [FS_VERITY_ALG_CRC32C] = {
+ .name = "crc32c",
+ .digest_size = 4,
+ },
};
/*
diff --git a/include/uapi/linux/fsverity.h b/include/uapi/linux/fsverity.h
index 67ed830ae2ece..a96bbf87077de 100644
--- a/include/uapi/linux/fsverity.h
+++ b/include/uapi/linux/fsverity.h
@@ -29,6 +29,7 @@ struct fsverity_digest {
/* Supported hash algorithms */
#define FS_VERITY_ALG_SHA256 1
#define FS_VERITY_ALG_SHA512 2
+#define FS_VERITY_ALG_CRC32C 3 /* for integrity only */
/* Metadata stored near the end of verity files, after the Merkle tree */
/* This structure is 64 bytes long */
Hi Andreas,
On Mon, Nov 05, 2018 at 02:05:24PM -0700, Andreas Dilger wrote:
> On Nov 1, 2018, at 4:52 PM, Eric Biggers <[email protected]> wrote:
> >
> > From: Eric Biggers <[email protected]>
> >
> > Add basic fs-verity support to ext4. fs-verity is a filesystem feature
> > that enables transparent integrity protection and authentication of
> > read-only files. It uses a dm-verity like mechanism at the file level:
> > a Merkle tree is used to verify any block in the file in log(filesize)
> > time. It is implemented mainly by helper functions in fs/verity/.
> > See Documentation/filesystems/fsverity.rst for details.
> >
> > This patch adds everything except the data verification hooks that will
> > be needed in ->readpages().
> >
> > On ext4, enabling fs-verity on a file requires that the filesystem has
> > the 'verity' feature, e.g. that it was formatted with
> > 'mkfs.ext4 -O verity' or had 'tune2fs -O verity' run on it.
> > This requires e2fsprogs 1.44.4-2 or later.
> >
> > In ext4, we choose to retain the fs-verity metadata past the end of the
> > file rather than trying to move it into an external inode xattr, since
> > in practice keeping the metadata in-line actually results in the
> > simplest and most efficient implementation. One non-obvious advantage
> > of keeping the verity metadata in-line is that when fs-verity is
> > combined with fscrypt, the verity metadata naturally gets encrypted too;
> > this is actually necessary because it contains hashes of the plaintext.
>
> On the plus side, this means that the verity data will automatically be
> invalidated if the file is truncated or extended, but on the negative side
> it means that the verity Merkle tree needs to be recalculated for the
> entire file if e.g. the file is appended to.
>
> I guess the current implementation will generate the Merkle tree in
> userspace, but at some point it might be useful to generate it on-the-fly
> to have proper data integrity from the time of write (e.g. like ZFS)
> rather than only allowing it to be stored after the entire file is written?
>
> Storing the Merkle tree in a large xattr inode would allow this to change
> in the future rather than being stuck with the current implementation. We
> could encrypt the xattr data just as easily as the file data (which should
> be done anyway even for non-verity files to avoid leaking data), and having
> the verity attr keyed to the inode version/size/mime(?) would ensure the
> kernel knows it is stale if the inode is modified.
>
> I'm not going to stand on my head and block this implementation, I just
> thought it is worthwhile to raise these issues now rather than after it
> is a fait accompli.
>
That would actually be the least of the problems for adding write support.
Adding write support would require at least:

- A way to maintain consistency between the data and hashes, including all
  levels of hashes, since corruption after a crash (especially of potentially
  the entire file!) is unacceptable. The main options for solving this are data
  journalling, copy-on-write, and log-structured volume. But it's very hard to
  retrofit existing filesystems with new consistency mechanisms. Data
  journalling can always be used, but is very slow.
- An on-disk format that allows dynamically growing/shrinking each level of the
  Merkle tree; or, using a different authenticated dictionary structure, such as
  an authenticated skiplist rather than a Merkle tree. This would drastically
  increase the complexity over a regular Merkle tree.

Compare it to dm-verity vs. dm-integrity. dm-verity is read-only and very
simple; the kernel just uses a Merkle tree that is generated by userspace.
On the other hand, dm-integrity supports writes but is slow, much more complex,
and doesn't even actually do full-device authentication since it authenticates
each sector independently, i.e. there is no Merkle tree.

I don't think it would make sense for the same device-mapper target to support
these quite different use cases. And the same general concepts apply at the
filesystem level; for these reasons and others (note that per-block checksums
like btrfs and ZFS wouldn't need a Merkle tree), write support is very
intentionally outside the scope of fs-verity.

So I think any arguments for doing things differently in fs-verity need to be
made in the context of read-only files.
Thanks,
Eric
On Friday, November 2, 2018 4:22:28 AM IST Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Add basic fs-verity support to ext4. fs-verity is a filesystem feature
> that enables transparent integrity protection and authentication of
> read-only files. It uses a dm-verity like mechanism at the file level:
> a Merkle tree is used to verify any block in the file in log(filesize)
> time. It is implemented mainly by helper functions in fs/verity/.
> See Documentation/filesystems/fsverity.rst for details.
>
> This patch adds everything except the data verification hooks that will
> be needed in ->readpages().
>
> On ext4, enabling fs-verity on a file requires that the filesystem has
> the 'verity' feature, e.g. that it was formatted with
> 'mkfs.ext4 -O verity' or had 'tune2fs -O verity' run on it.
> This requires e2fsprogs 1.44.4-2 or later.
>
> In ext4, we choose to retain the fs-verity metadata past the end of the
> file rather than trying to move it into an external inode xattr, since
> in practice keeping the metadata in-line actually results in the
> simplest and most efficient implementation. One non-obvious advantage
> of keeping the verity metadata in-line is that when fs-verity is
> combined with fscrypt, the verity metadata naturally gets encrypted too;
> this is actually necessary because it contains hashes of the plaintext.
>
> We also choose to keep the on-disk i_size equal to the original file
> size, in order to make the 'verity' feature a RO_COMPAT feature. Thus,
> ext4 has to find the fsverity_footer by looking in the last extent.
>
> Co-developed-by: Theodore Ts'o <[email protected]>
> Signed-off-by: Theodore Ts'o <[email protected]>
> Signed-off-by: Eric Biggers <[email protected]>
> ---
> fs/ext4/Kconfig | 20 +++++++++++
> fs/ext4/ext4.h | 20 ++++++++++-
> fs/ext4/file.c | 6 ++++
> fs/ext4/inode.c | 8 +++++
> fs/ext4/ioctl.c | 12 +++++++
> fs/ext4/super.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++++
> fs/ext4/sysfs.c | 6 ++++
> 7 files changed, 162 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
> index a453cc87082b5..5a76125ac0f8a 100644
> --- a/fs/ext4/Kconfig
> +++ b/fs/ext4/Kconfig
> @@ -111,6 +111,26 @@ config EXT4_FS_ENCRYPTION
> default y
> depends on EXT4_ENCRYPTION
>
> +config EXT4_FS_VERITY
> + bool "Ext4 Verity"
> + depends on EXT4_FS
> + select FS_VERITY
> + help
> + This option enables fs-verity for ext4. fs-verity is the
> + dm-verity mechanism implemented at the file level. Userspace
> + can append a Merkle tree (hash tree) to a file, then enable
> + fs-verity on the file. ext4 will then transparently verify
> + any data read from the file against the Merkle tree. The file
> + is also made read-only.
> +
> + This serves as an integrity check, but the availability of the
> + Merkle tree root hash also allows efficiently supporting
> + various use cases where normally the whole file would need to
> + be hashed at once, such as auditing and authenticity
> + verification (appraisal).
> +
> + If unsure, say N.
> +
> config EXT4_DEBUG
> bool "EXT4 debugging support"
> depends on EXT4_FS
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 12f90d48ba613..e5475a629ed80 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -43,6 +43,9 @@
> #define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_EXT4_FS_ENCRYPTION)
> #include <linux/fscrypt.h>
>
> +#define __FS_HAS_VERITY IS_ENABLED(CONFIG_EXT4_FS_VERITY)
> +#include <linux/fsverity.h>
> +
> #include <linux/compiler.h>
>
> /* Until this gets included into linux/compiler-gcc.h */
> @@ -405,6 +408,7 @@ struct flex_groups {
> #define EXT4_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/
> #define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */
> #define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */
> +#define EXT4_VERITY_FL 0x00100000 /* Verity protected inode */
> #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
> #define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
> #define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
> @@ -472,6 +476,7 @@ enum {
> EXT4_INODE_TOPDIR = 17, /* Top of directory hierarchies*/
> EXT4_INODE_HUGE_FILE = 18, /* Set to each huge file */
> EXT4_INODE_EXTENTS = 19, /* Inode uses extents */
> + EXT4_INODE_VERITY = 20, /* Verity protected inode */
> EXT4_INODE_EA_INODE = 21, /* Inode used for large EA */
> EXT4_INODE_EOFBLOCKS = 22, /* Blocks allocated beyond EOF */
> EXT4_INODE_INLINE_DATA = 28, /* Data in inode. */
> @@ -517,6 +522,7 @@ static inline void ext4_check_flag_values(void)
> CHECK_FLAG_VALUE(TOPDIR);
> CHECK_FLAG_VALUE(HUGE_FILE);
> CHECK_FLAG_VALUE(EXTENTS);
> + CHECK_FLAG_VALUE(VERITY);
> CHECK_FLAG_VALUE(EA_INODE);
> CHECK_FLAG_VALUE(EOFBLOCKS);
> CHECK_FLAG_VALUE(INLINE_DATA);
> @@ -1654,6 +1660,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
> #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM 0x0400
> #define EXT4_FEATURE_RO_COMPAT_READONLY 0x1000
> #define EXT4_FEATURE_RO_COMPAT_PROJECT 0x2000
> +#define EXT4_FEATURE_RO_COMPAT_VERITY 0x8000
>
> #define EXT4_FEATURE_INCOMPAT_COMPRESSION 0x0001
> #define EXT4_FEATURE_INCOMPAT_FILETYPE 0x0002
> @@ -1742,6 +1749,7 @@ EXT4_FEATURE_RO_COMPAT_FUNCS(bigalloc, BIGALLOC)
> EXT4_FEATURE_RO_COMPAT_FUNCS(metadata_csum, METADATA_CSUM)
> EXT4_FEATURE_RO_COMPAT_FUNCS(readonly, READONLY)
> EXT4_FEATURE_RO_COMPAT_FUNCS(project, PROJECT)
> +EXT4_FEATURE_RO_COMPAT_FUNCS(verity, VERITY)
>
> EXT4_FEATURE_INCOMPAT_FUNCS(compression, COMPRESSION)
> EXT4_FEATURE_INCOMPAT_FUNCS(filetype, FILETYPE)
> @@ -1797,7 +1805,8 @@ EXT4_FEATURE_INCOMPAT_FUNCS(encrypt, ENCRYPT)
> EXT4_FEATURE_RO_COMPAT_BIGALLOC |\
> EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|\
> EXT4_FEATURE_RO_COMPAT_QUOTA |\
> - EXT4_FEATURE_RO_COMPAT_PROJECT)
> + EXT4_FEATURE_RO_COMPAT_PROJECT |\
> + EXT4_FEATURE_RO_COMPAT_VERITY)
>
> #define EXTN_FEATURE_FUNCS(ver) \
> static inline bool ext4_has_unknown_ext##ver##_compat_features(struct super_block *sb) \
> @@ -2293,6 +2302,15 @@ static inline bool ext4_encrypted_inode(struct inode *inode)
> return ext4_test_inode_flag(inode, EXT4_INODE_ENCRYPT);
> }
>
> +static inline bool ext4_verity_inode(struct inode *inode)
> +{
> +#ifdef CONFIG_EXT4_FS_VERITY
> + return ext4_test_inode_flag(inode, EXT4_INODE_VERITY);
> +#else
> + return false;
> +#endif
> +}
> +
Hi Eric,
Can you please explain why we check for the presence of the
EXT4_INODE_VERITY flag only when fs-verity is enabled in the kernel build?
--
chandan
From: Eric Biggers <[email protected]>
For ease of use, add optional support for having fs-verity handle a
portion of the authentication policy in the kernel. A ".fs-verity"
keyring is created to which trusted X.509 certificates can be added;
then a sysctl 'fs.verity.require_signatures' can be set to cause the
kernel to enforce that all fs-verity files contain a signature of their
file measurement, signed by a key in this keyring.
See Documentation/filesystems/fsverity.rst for more information,
namely the "Built-in file signatures" section.
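The enforcement policy described above can be modeled in a few lines of userspace C. This is a hypothetical sketch of the decision that verify_file_measurement() makes in this patch, not kernel code. check_measurement() and its parameters are illustrative names, and passing NULL for signed_meas stands for "no PKCS#7 signature extension was found".

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of the policy in verify_file_measurement().
 * Returns 0 if the file should be accepted, -EBADMSG otherwise.
 */
static int check_measurement(const uint8_t *computed,
			     const uint8_t *signed_meas, /* NULL if unsigned */
			     size_t digest_size,
			     bool require_signatures)
{
	if (signed_meas == NULL) {
		/* Unsigned file: acceptable only if signatures aren't required. */
		return require_signatures ? -EBADMSG : 0;
	}
	/* Signed file: the signed digest must match the computed one exactly. */
	if (memcmp(computed, signed_meas, digest_size) != 0)
		return -EBADMSG;
	return 0;
}
```

An unsigned file is therefore rejected only when require_signatures is 1, while a signed file whose actual measurement differs from the signed one is rejected regardless of the sysctl.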
Signed-off-by: Eric Biggers <[email protected]>
---
fs/verity/Kconfig | 17 ++++
fs/verity/Makefile | 2 +
fs/verity/fsverity_private.h | 34 +++++++
fs/verity/setup.c | 63 +++++++++++-
fs/verity/signature.c | 187 ++++++++++++++++++++++++++++++++++
include/uapi/linux/fsverity.h | 10 ++
6 files changed, 311 insertions(+), 2 deletions(-)
create mode 100644 fs/verity/signature.c
diff --git a/fs/verity/Kconfig b/fs/verity/Kconfig
index 102c46ebe275f..a7470a2e4892f 100644
--- a/fs/verity/Kconfig
+++ b/fs/verity/Kconfig
@@ -33,3 +33,20 @@ config FS_VERITY_DEBUG
Enable debugging messages related to fs-verity by default.
Say N unless you are an fs-verity developer.
+
+config FS_VERITY_BUILTIN_SIGNATURES
+ bool "FS Verity builtin signature support"
+ depends on FS_VERITY
+ select SYSTEM_DATA_VERIFICATION
+ help
+ Support verifying signatures of verity files against the X.509
+ certificates that have been loaded into the ".fs-verity"
+ kernel keyring.
+
+ This is meant as a relatively simple mechanism that can be
+ used to provide an authenticity guarantee for verity files, as
+ an alternative to IMA appraisal. Userspace programs still
+ need to check that the verity bit is set in order to get an
+ authenticity guarantee.
+
+ If unsure, say N.
diff --git a/fs/verity/Makefile b/fs/verity/Makefile
index 6450925e3a8b7..d293ea2a1b393 100644
--- a/fs/verity/Makefile
+++ b/fs/verity/Makefile
@@ -1,3 +1,5 @@
obj-$(CONFIG_FS_VERITY) += fsverity.o
fsverity-y := hash_algs.o ioctl.o setup.o verify.o
+
+fsverity-$(CONFIG_FS_VERITY_BUILTIN_SIGNATURES) += signature.o
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index c3a261a598557..4b39d0a5544ba 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -63,6 +63,7 @@ struct fsverity_info {
u8 root_hash[FS_VERITY_MAX_DIGEST_SIZE]; /* Merkle tree root hash */
u8 measurement[FS_VERITY_MAX_DIGEST_SIZE]; /* file measurement */
bool have_root_hash; /* have root hash from disk? */
+ bool have_signed_measurement; /* have measurement from signature? */
/* Starting blocks for each tree level. 'depth-1' is the root level. */
u64 hash_lvl_region_idx[FS_VERITY_MAX_LEVELS];
@@ -95,6 +96,39 @@ static inline bool set_fsverity_info(struct inode *inode,
return cmpxchg_release(&inode->i_verity_info, NULL, vi) == NULL;
}
+/* signature.c */
+#ifdef CONFIG_FS_VERITY_BUILTIN_SIGNATURES
+extern int fsverity_require_signatures;
+
+int fsverity_parse_pkcs7_signature_extension(struct fsverity_info *vi,
+ const void *raw_pkcs7,
+ size_t size);
+
+int __init fsverity_signature_init(void);
+
+void __exit fsverity_signature_exit(void);
+#else /* CONFIG_FS_VERITY_BUILTIN_SIGNATURES */
+
+#define fsverity_require_signatures 0
+
+static inline int
+fsverity_parse_pkcs7_signature_extension(struct fsverity_info *vi,
+ const void *raw_pkcs7, size_t size)
+{
+ pr_warn("PKCS#7 signatures not supported in this kernel build!\n");
+ return -EINVAL;
+}
+
+static inline int fsverity_signature_init(void)
+{
+ return 0;
+}
+
+static inline void fsverity_signature_exit(void)
+{
+}
+#endif /* !CONFIG_FS_VERITY_BUILTIN_SIGNATURES */
+
/* verify.c */
extern struct workqueue_struct *fsverity_read_workqueue;
diff --git a/fs/verity/setup.c b/fs/verity/setup.c
index e0b39c518b890..08b609127531b 100644
--- a/fs/verity/setup.c
+++ b/fs/verity/setup.c
@@ -132,6 +132,10 @@ static const struct extension_type {
[FS_VERITY_EXT_SALT] = {
.parse = parse_salt_extension,
},
+ [FS_VERITY_EXT_PKCS7_SIGNATURE] = {
+ .parse = fsverity_parse_pkcs7_signature_extension,
+ .unauthenticated = true,
+ },
};
static int do_parse_extensions(struct fsverity_info *vi,
@@ -429,6 +433,54 @@ static int compute_measurement(const struct fsverity_info *vi,
return err;
}
+/*
+ * Compute the file's measurement; then, if a signature was present, verify that
+ * the signed measurement matches the actual one.
+ */
+static int
+verify_file_measurement(struct fsverity_info *vi,
+ const struct fsverity_descriptor *desc,
+ int desc_auth_len,
+ struct page *desc_pages[MAX_DESCRIPTOR_PAGES],
+ int nr_desc_pages)
+{
+ u8 measurement[FS_VERITY_MAX_DIGEST_SIZE];
+ int err;
+
+ err = compute_measurement(vi, desc, desc_auth_len, desc_pages,
+ nr_desc_pages, measurement);
+ if (err) {
+ pr_warn("Error computing fs-verity measurement: %d\n", err);
+ return err;
+ }
+
+ if (!vi->have_signed_measurement) {
+ pr_debug("Computed measurement: %s:%*phN (used desc_auth_len %d)\n",
+ vi->hash_alg->name, vi->hash_alg->digest_size,
+ measurement, desc_auth_len);
+ if (fsverity_require_signatures) {
+ pr_warn("require_signatures=1, rejecting unsigned file!\n");
+ return -EBADMSG;
+ }
+ memcpy(vi->measurement, measurement, vi->hash_alg->digest_size);
+ return 0;
+ }
+
+ if (!memcmp(measurement, vi->measurement, vi->hash_alg->digest_size)) {
+ pr_debug("Verified measurement: %s:%*phN (used desc_auth_len %d)\n",
+ vi->hash_alg->name, vi->hash_alg->digest_size,
+ measurement, desc_auth_len);
+ return 0;
+ }
+
+ pr_warn("FILE CORRUPTED (actual measurement mismatches signed measurement): "
+ "want %s:%*phN, real %s:%*phN (used desc_auth_len %d)\n",
+ vi->hash_alg->name, vi->hash_alg->digest_size, vi->measurement,
+ vi->hash_alg->name, vi->hash_alg->digest_size, measurement,
+ desc_auth_len);
+ return -EBADMSG;
+}
+
static struct fsverity_info *alloc_fsverity_info(void)
{
return kmem_cache_zalloc(fsverity_info_cachep, GFP_NOFS);
@@ -674,8 +726,8 @@ struct fsverity_info *create_fsverity_info(struct inode *inode, bool enabling)
err = compute_tree_depth_and_offsets(vi);
if (err)
goto out;
- err = compute_measurement(vi, desc, desc_auth_len, desc_pages,
- nr_desc_pages, vi->measurement);
+ err = verify_file_measurement(vi, desc, desc_auth_len,
+ desc_pages, nr_desc_pages);
out:
if (desc)
unmap_fsverity_descriptor(desc, desc_pages, nr_desc_pages);
@@ -825,11 +877,17 @@ static int __init fsverity_module_init(void)
if (!fsverity_info_cachep)
goto error_free_workqueue;
+ err = fsverity_signature_init();
+ if (err)
+ goto error_free_info_cache;
+
fsverity_check_hash_algs();
pr_debug("Initialized fs-verity\n");
return 0;
+error_free_info_cache:
+ kmem_cache_destroy(fsverity_info_cachep);
error_free_workqueue:
destroy_workqueue(fsverity_read_workqueue);
error:
@@ -840,6 +898,7 @@ static void __exit fsverity_module_exit(void)
{
destroy_workqueue(fsverity_read_workqueue);
kmem_cache_destroy(fsverity_info_cachep);
+ fsverity_signature_exit();
fsverity_exit_hash_algs();
}
diff --git a/fs/verity/signature.c b/fs/verity/signature.c
new file mode 100644
index 0000000000000..e13b25becbc6f
--- /dev/null
+++ b/fs/verity/signature.c
@@ -0,0 +1,187 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fs/verity/signature.c: verification of builtin signatures
+ *
+ * Copyright 2018 Google LLC
+ *
+ * Written by Eric Biggers.
+ */
+
+#include "fsverity_private.h"
+
+#include <linux/cred.h>
+#include <linux/key.h>
+#include <linux/verification.h>
+
+/*
+ * /proc/sys/fs/verity/require_signatures
+ * If 1, all verity files must have a valid builtin signature.
+ */
+int fsverity_require_signatures;
+
+/*
+ * Keyring that contains the trusted X.509 certificates.
+ *
+ * Only root (kuid=0) can modify this. Also, root may use
+ * keyctl_restrict_keyring() to prevent any more additions.
+ */
+static struct key *fsverity_keyring;
+
+static int extract_measurement(void *ctx, const void *data, size_t len,
+ size_t asn1hdrlen)
+{
+ struct fsverity_info *vi = ctx;
+ const struct fsverity_digest_disk *d;
+ const struct fsverity_hash_alg *hash_alg;
+
+ if (len < sizeof(*d)) {
+ pr_warn("Signed file measurement has unrecognized format\n");
+ return -EBADMSG;
+ }
+ d = (const void *)data;
+
+ hash_alg = fsverity_get_hash_alg(le16_to_cpu(d->digest_algorithm));
+ if (IS_ERR(hash_alg))
+ return PTR_ERR(hash_alg);
+
+ if (le16_to_cpu(d->digest_size) != hash_alg->digest_size) {
+ pr_warn("Wrong digest_size in signed measurement: wanted %u for algorithm %s, but got %u\n",
+ hash_alg->digest_size, hash_alg->name,
+ le16_to_cpu(d->digest_size));
+ return -EBADMSG;
+ }
+
+ if (len < sizeof(*d) + hash_alg->digest_size) {
+ pr_warn("Signed file measurement is truncated\n");
+ return -EBADMSG;
+ }
+
+ if (hash_alg != vi->hash_alg) {
+ pr_warn("Signed file measurement uses %s, but file uses %s\n",
+ hash_alg->name, vi->hash_alg->name);
+ return -EBADMSG;
+ }
+
+ memcpy(vi->measurement, d->digest, hash_alg->digest_size);
+ vi->have_signed_measurement = true;
+ return 0;
+}
+
+/**
+ * fsverity_parse_pkcs7_signature_extension - verify the signed file measurement
+ *
+ * Verify a signed fsverity_measurement against the certificates in the
+ * fs-verity keyring. The signature is given as a PKCS#7 formatted message, and
+ * the signed data is included in the message (not detached).
+ *
+ * Return: 0 if the signature checks out and the signed measurement is
+ * well-formed and uses the expected hash algorithm; -EBADMSG on signature
+ * verification failure or malformed data; else another -errno code.
+ */
+int fsverity_parse_pkcs7_signature_extension(struct fsverity_info *vi,
+ const void *raw_pkcs7, size_t size)
+{
+ int err;
+
+ if (vi->have_signed_measurement) {
+ pr_warn("Found multiple PKCS#7 signatures\n");
+ return -EBADMSG;
+ }
+
+ if (!vi->hash_alg->cryptographic) {
+ /* Might as well check this... */
+ pr_warn("Found signed %s file measurement, but %s isn't a cryptographic hash algorithm.\n",
+ vi->hash_alg->name, vi->hash_alg->name);
+ return -EBADMSG;
+ }
+
+ err = verify_pkcs7_signature(NULL, 0, raw_pkcs7, size, fsverity_keyring,
+ VERIFYING_UNSPECIFIED_SIGNATURE,
+ extract_measurement, vi);
+ if (err)
+ pr_warn("PKCS#7 signature verification error: %d\n", err);
+
+ return err;
+}
+
+#ifdef CONFIG_SYSCTL
+static int zero;
+static int one = 1;
+static struct ctl_table_header *fsverity_sysctl_header;
+
+static const struct ctl_path fsverity_sysctl_path[] = {
+ { .procname = "fs", },
+ { .procname = "verity", },
+ { }
+};
+
+static struct ctl_table fsverity_sysctl_table[] = {
+ {
+ .procname = "require_signatures",
+ .data = &fsverity_require_signatures,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &one,
+ },
+ { }
+};
+
+static int __init fsverity_sysctl_init(void)
+{
+ fsverity_sysctl_header = register_sysctl_paths(fsverity_sysctl_path,
+ fsverity_sysctl_table);
+ if (!fsverity_sysctl_header) {
+ pr_warn("sysctl registration failed!\n");
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+static void __exit fsverity_sysctl_exit(void)
+{
+ unregister_sysctl_table(fsverity_sysctl_header);
+}
+#else /* CONFIG_SYSCTL */
+static inline int fsverity_sysctl_init(void)
+{
+ return 0;
+}
+
+static inline void fsverity_sysctl_exit(void)
+{
+}
+#endif /* !CONFIG_SYSCTL */
+
+int __init fsverity_signature_init(void)
+{
+ struct key *ring;
+ int err;
+
+ ring = keyring_alloc(".fs-verity", KUIDT_INIT(0), KGIDT_INIT(0),
+ current_cred(),
+ ((KEY_POS_ALL & ~KEY_POS_SETATTR) |
+ KEY_USR_VIEW | KEY_USR_READ |
+ KEY_USR_WRITE | KEY_USR_SEARCH | KEY_USR_SETATTR),
+ KEY_ALLOC_NOT_IN_QUOTA, NULL, NULL);
+ if (IS_ERR(ring))
+ return PTR_ERR(ring);
+
+ err = fsverity_sysctl_init();
+ if (err)
+ goto error_put_ring;
+
+ fsverity_keyring = ring;
+ return 0;
+
+error_put_ring:
+ key_put(ring);
+ return err;
+}
+
+void __exit fsverity_signature_exit(void)
+{
+ key_put(fsverity_keyring);
+ fsverity_sysctl_exit();
+}
diff --git a/include/uapi/linux/fsverity.h b/include/uapi/linux/fsverity.h
index a96bbf87077de..b030589b8fd93 100644
--- a/include/uapi/linux/fsverity.h
+++ b/include/uapi/linux/fsverity.h
@@ -56,6 +56,7 @@ struct fsverity_descriptor {
/* Extension types */
#define FS_VERITY_EXT_ROOT_HASH 1
#define FS_VERITY_EXT_SALT 2
+#define FS_VERITY_EXT_PKCS7_SIGNATURE 3
/* Header of each extension (variable-length metadata item) */
struct fsverity_extension {
@@ -78,6 +79,15 @@ struct fsverity_extension {
/* FS_VERITY_EXT_SALT payload is just a byte array, any size */
+/*
+ * FS_VERITY_EXT_PKCS7_SIGNATURE payload is a DER-encoded PKCS#7 message
+ * containing the signed file measurement in the following format:
+ */
+struct fsverity_digest_disk {
+ __le16 digest_algorithm;
+ __le16 digest_size;
+ __u8 digest[];
+};
/* Fields stored at the very end of the file */
struct fsverity_footer {
--
2.19.1.568.g152ad8e336-goog
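The signed payload defined at the end of the patch above, struct fsverity_digest_disk, is validated by extract_measurement() in three steps: the header must be present, digest_algorithm and digest_size must be consistent with the expected algorithm, and the digest itself must not be truncated. The userspace C sketch below performs the same checks on a raw buffer. parse_digest_disk() and its expected_alg/expected_size parameters are hypothetical stand-ins for the kernel's fsverity_info lookup, and unlike the kernel code the sketch reports every failure as -EBADMSG.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a little-endian 16-bit value from a possibly unaligned buffer. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Hypothetical parser for the fsverity_digest_disk layout:
 *   __le16 digest_algorithm; __le16 digest_size; __u8 digest[];
 * On success, copies expected_size bytes of digest into out.
 */
static int parse_digest_disk(const uint8_t *data, size_t len,
			     uint16_t expected_alg, uint16_t expected_size,
			     uint8_t *out)
{
	uint16_t alg, size;

	if (len < 4)
		return -EBADMSG;	/* fixed header is truncated */
	alg = get_le16(data);
	size = get_le16(data + 2);
	if (alg != expected_alg || size != expected_size)
		return -EBADMSG;	/* wrong algorithm or digest_size */
	if (len < 4 + (size_t)size)
		return -EBADMSG;	/* digest bytes are truncated */
	memcpy(out, data + 4, size);
	return 0;
}
```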
From: Eric Biggers <[email protected]>
Add a function for filesystems to call to implement the
FS_IOC_MEASURE_VERITY ioctl. This ioctl retrieves the file measurement
hash that fs-verity calculated for the given file and is enforcing for
reads; i.e., reads that don't match this hash will fail. This ioctl can
be used for logging or authentication of file hashes in userspace.
This ioctl is documented in Documentation/filesystems/fsverity.rst;
see there for more information.
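From userspace, the ioctl described above could be called roughly as follows. This is a sketch, not part of the patch: the struct layout copies the fsverity_digest fields used by fsverity_ioctl_measure(), and the ioctl number _IOWR('f', 134, ...) is an assumption based on mainline <linux/fsverity.h>, so check it against the headers you actually build with. MAX_DIGEST_SIZE, bytes_to_hex() and measure_verity() are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Local mirror of the UAPI struct; ASSUMPTION: matches <linux/fsverity.h>. */
struct fsverity_digest {
	uint16_t digest_algorithm;
	uint16_t digest_size;	/* in: buffer capacity; out: actual size */
	uint8_t digest[];
};
#ifndef FS_IOC_MEASURE_VERITY
#define FS_IOC_MEASURE_VERITY _IOWR('f', 134, struct fsverity_digest)
#endif

#define MAX_DIGEST_SIZE 64	/* assumption: room for digests up to SHA-512 */

/* Encodes n bytes as lowercase hex into out (needs 2*n+1 bytes). */
static void bytes_to_hex(const uint8_t *in, size_t n, char *out)
{
	static const char hexdig[] = "0123456789abcdef";

	for (size_t i = 0; i < n; i++) {
		out[2 * i] = hexdig[in[i] >> 4];
		out[2 * i + 1] = hexdig[in[i] & 0xf];
	}
	out[2 * n] = '\0';
}

/*
 * Writes the measurement of the file open on fd as hex into hex_out
 * (at least 2*MAX_DIGEST_SIZE+1 bytes). Returns the digest algorithm
 * number, or -1 with errno set (e.g. ENODATA for a non-verity file).
 */
static int measure_verity(int fd, char *hex_out)
{
	struct fsverity_digest *d;
	int alg;

	d = malloc(sizeof(*d) + MAX_DIGEST_SIZE);
	if (!d)
		return -1;
	d->digest_size = MAX_DIGEST_SIZE;	/* tell the kernel our capacity */
	if (ioctl(fd, FS_IOC_MEASURE_VERITY, d) != 0) {
		int saved = errno;

		free(d);
		errno = saved;
		return -1;
	}
	if (d->digest_size > MAX_DIGEST_SIZE) {	/* defensive; shouldn't happen */
		free(d);
		errno = EOVERFLOW;
		return -1;
	}
	bytes_to_hex(d->digest, d->digest_size, hex_out);
	alg = d->digest_algorithm;
	free(d);
	return alg;
}
```

Setting digest_size to the buffer capacity before the call matters: fsverity_ioctl_measure() fails with EOVERFLOW if it is smaller than the file's digest size, and on success it overwrites the field with the actual size.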
Signed-off-by: Eric Biggers <[email protected]>
---
fs/verity/ioctl.c | 47 ++++++++++++++++++++++++++++++++++++++++
fs/verity/setup.c | 4 +++-
include/linux/fsverity.h | 6 +++++
3 files changed, 56 insertions(+), 1 deletion(-)
diff --git a/fs/verity/ioctl.c b/fs/verity/ioctl.c
index c5f0022cb3bef..640aa78e1c00c 100644
--- a/fs/verity/ioctl.c
+++ b/fs/verity/ioctl.c
@@ -115,3 +115,50 @@ int fsverity_ioctl_enable(struct file *filp, const void __user *arg)
return err;
}
EXPORT_SYMBOL_GPL(fsverity_ioctl_enable);
+
+/**
+ * fsverity_ioctl_measure - get a verity file's measurement
+ *
+ * Retrieve the file measurement that the kernel is enforcing for reads from a
+ * verity file. See Documentation/filesystems/fsverity.rst, section
+ * 'FS_IOC_MEASURE_VERITY' for details.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fsverity_ioctl_measure(struct file *filp, void __user *_uarg)
+{
+ const struct inode *inode = file_inode(filp);
+ struct fsverity_digest __user *uarg = _uarg;
+ const struct fsverity_info *vi;
+ const struct fsverity_hash_alg *hash_alg;
+ struct fsverity_digest arg;
+
+ vi = get_fsverity_info(inode);
+ if (!vi)
+ return -ENODATA; /* not a verity file */
+ hash_alg = vi->hash_alg;
+
+ /*
+ * The user specifies the digest_size their buffer has space for; we can
+ * return the digest if it fits in the available space. We write back
+ * the actual size, which may be shorter than the user-specified size.
+ */
+
+ if (get_user(arg.digest_size, &uarg->digest_size))
+ return -EFAULT;
+ if (arg.digest_size < hash_alg->digest_size)
+ return -EOVERFLOW;
+
+ memset(&arg, 0, sizeof(arg));
+ arg.digest_algorithm = hash_alg - fsverity_hash_algs;
+ arg.digest_size = hash_alg->digest_size;
+
+ if (copy_to_user(uarg, &arg, sizeof(arg)))
+ return -EFAULT;
+
+ if (copy_to_user(uarg->digest, vi->measurement, hash_alg->digest_size))
+ return -EFAULT;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fsverity_ioctl_measure);
diff --git a/fs/verity/setup.c b/fs/verity/setup.c
index 184bdc96abe51..e0b39c518b890 100644
--- a/fs/verity/setup.c
+++ b/fs/verity/setup.c
@@ -819,7 +819,9 @@ static int __init fsverity_module_init(void)
goto error;
err = -ENOMEM;
- fsverity_info_cachep = KMEM_CACHE(fsverity_info, SLAB_RECLAIM_ACCOUNT);
+ fsverity_info_cachep = KMEM_CACHE_USERCOPY(fsverity_info,
+ SLAB_RECLAIM_ACCOUNT,
+ measurement);
if (!fsverity_info_cachep)
goto error_free_workqueue;
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 5de50b52ccc70..c30c4f6ed411c 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -23,6 +23,7 @@ struct fsverity_operations {
/* ioctl.c */
extern int fsverity_ioctl_enable(struct file *filp, const void __user *arg);
+extern int fsverity_ioctl_measure(struct file *filp, void __user *arg);
/* setup.c */
extern int fsverity_file_open(struct inode *inode, struct file *filp);
@@ -51,6 +52,11 @@ static inline int fsverity_ioctl_enable(struct file *filp,
return -EOPNOTSUPP;
}
+static inline int fsverity_ioctl_measure(struct file *filp, void __user *arg)
+{
+ return -EOPNOTSUPP;
+}
+
/* setup.c */
static inline int fsverity_file_open(struct inode *inode, struct file *filp)
--
2.19.1.568.g152ad8e336-goog
From: Eric Biggers <[email protected]>
Add basic fs-verity support to ext4. fs-verity is a filesystem feature
that enables transparent integrity protection and authentication of
read-only files. It uses a dm-verity like mechanism at the file level:
a Merkle tree is used to verify any block in the file in log(filesize)
time. It is implemented mainly by helper functions in fs/verity/.
See Documentation/filesystems/fsverity.rst for details.
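The log(filesize) bound follows from the fan-out of the tree: each tree block stores block_size / digest_size hashes (128 for 4096-byte blocks and SHA-256), so each level is about 128 times smaller than the one below it. The sketch below counts levels under those assumptions. merkle_tree_depth() is a hypothetical helper, and the kernel's compute_tree_depth_and_offsets() may treat edge cases such as empty or single-block files differently.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: number of Merkle tree levels needed to cover
 * data_size bytes, stopping once a level fits in a single block (the root).
 */
static unsigned int merkle_tree_depth(uint64_t data_size,
				      uint32_t block_size,
				      uint32_t digest_size)
{
	uint64_t hashes_per_block = block_size / digest_size;
	uint64_t nblocks = (data_size + block_size - 1) / block_size;
	unsigned int depth = 0;

	do {
		/* Tree blocks needed one level up to hash the current level. */
		nblocks = (nblocks + hashes_per_block - 1) / hashes_per_block;
		depth++;
	} while (nblocks > 1);
	return depth;
}
```

For example, in this model a 1 GiB file (262144 data blocks) needs 2048, then 16, then 1 tree block, i.e. three levels, so verifying any one data block touches at most three hash blocks.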
This patch adds everything except the data verification hooks that will
be needed in ->readpages().
On ext4, enabling fs-verity on a file requires that the filesystem has
the 'verity' feature, e.g. that it was formatted with
'mkfs.ext4 -O verity' or had 'tune2fs -O verity' run on it.
This requires e2fsprogs 1.44.4-2 or later.
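The 'verity' feature is recorded in the superblock's s_feature_ro_compat field as EXT4_FEATURE_RO_COMPAT_VERITY (0x8000, defined later in this patch). As a sketch, a userspace tool could test the bit directly on a raw superblock. The byte offsets used (s_magic at 0x38, s_feature_ro_compat at 0x64) are those of the on-disk struct ext4_super_block; sb_has_verity() and le32_at() are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXT4_SUPER_MAGIC		0xEF53
#define EXT4_FEATURE_RO_COMPAT_VERITY	0x8000	/* value added by this patch */

/* Byte offsets within the on-disk struct ext4_super_block. */
#define SB_MAGIC_OFF		0x38
#define SB_RO_COMPAT_OFF	0x64

/* Reads a little-endian 32-bit value from a byte buffer. */
static uint32_t le32_at(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns true if sb is an ext4 superblock with the 'verity' bit set. */
static bool sb_has_verity(const uint8_t *sb, size_t len)
{
	if (len < SB_RO_COMPAT_OFF + 4)
		return false;
	if ((sb[SB_MAGIC_OFF] | (sb[SB_MAGIC_OFF + 1] << 8)) != EXT4_SUPER_MAGIC)
		return false;
	return (le32_at(sb + SB_RO_COMPAT_OFF) &
		EXT4_FEATURE_RO_COMPAT_VERITY) != 0;
}
```

On a real device you would first pread() 1024 bytes at offset 1024, where the primary superblock lives. Because the bit is RO_COMPAT, a kernel that does not recognize it can still mount the filesystem read-only.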
In ext4, we choose to retain the fs-verity metadata past the end of the
file rather than trying to move it into an external inode xattr, since
in practice keeping the metadata in-line actually results in the
simplest and most efficient implementation. One non-obvious advantage
of keeping the verity metadata in-line is that when fs-verity is
combined with fscrypt, the verity metadata naturally gets encrypted too;
this is actually necessary because it contains hashes of the plaintext.
We also choose to keep the on-disk i_size equal to the original file
size, in order to make the 'verity' feature a RO_COMPAT feature. Thus,
ext4 has to find the fsverity_footer by looking in the last extent.
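Finding the footer thus reduces to a small calculation on the last extent, as done by ext4_get_metadata_end() later in this patch. The standalone sketch below reproduces that arithmetic. extent_actual_len() mirrors ext4's ext4_ext_get_actual_len(), where ee_len values above EXT_INIT_MAX_LEN (32768) encode unwritten extents; the function names are illustrative rather than ext4's own.

```c
#include <stdint.h>

/* As in ext4: ee_len values above this denote an unwritten extent. */
#define EXT_INIT_MAX_LEN (1U << 15)

/* Mirrors ext4_ext_get_actual_len(): strips the "unwritten" encoding. */
static uint32_t extent_actual_len(uint16_t ee_len)
{
	return ee_len <= EXT_INIT_MAX_LEN ? ee_len : ee_len - EXT_INIT_MAX_LEN;
}

/*
 * Byte offset just past the last extent, computed the same way as
 * ext4_get_metadata_end(): (first logical block + length) << blkbits.
 */
static int64_t metadata_end(uint32_t ee_block, uint16_t ee_len,
			    unsigned int blkbits)
{
	uint32_t end_lblk = ee_block + extent_actual_len(ee_len);

	return (int64_t)end_lblk << blkbits;
}
```

With 4 KiB blocks (blkbits = 12), a last extent starting at logical block 10 with length 5 ends at byte 61440, whether or not that extent is marked unwritten.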
Co-developed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
fs/ext4/Kconfig | 20 +++++++++++
fs/ext4/ext4.h | 20 ++++++++++-
fs/ext4/file.c | 6 ++++
fs/ext4/inode.c | 8 +++++
fs/ext4/ioctl.c | 12 +++++++
fs/ext4/super.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/ext4/sysfs.c | 6 ++++
7 files changed, 162 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
index a453cc87082b5..5a76125ac0f8a 100644
--- a/fs/ext4/Kconfig
+++ b/fs/ext4/Kconfig
@@ -111,6 +111,26 @@ config EXT4_FS_ENCRYPTION
default y
depends on EXT4_ENCRYPTION
+config EXT4_FS_VERITY
+ bool "Ext4 Verity"
+ depends on EXT4_FS
+ select FS_VERITY
+ help
+ This option enables fs-verity for ext4. fs-verity is the
+ dm-verity mechanism implemented at the file level. Userspace
+ can append a Merkle tree (hash tree) to a file, then enable
+ fs-verity on the file. ext4 will then transparently verify
+ any data read from the file against the Merkle tree. The file
+ is also made read-only.
+
+ This serves as an integrity check, but the availability of the
+ Merkle tree root hash also allows efficiently supporting
+ various use cases where normally the whole file would need to
+ be hashed at once, such as auditing and authenticity
+ verification (appraisal).
+
+ If unsure, say N.
+
config EXT4_DEBUG
bool "EXT4 debugging support"
depends on EXT4_FS
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 12f90d48ba613..e5475a629ed80 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -43,6 +43,9 @@
#define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_EXT4_FS_ENCRYPTION)
#include <linux/fscrypt.h>
+#define __FS_HAS_VERITY IS_ENABLED(CONFIG_EXT4_FS_VERITY)
+#include <linux/fsverity.h>
+
#include <linux/compiler.h>
/* Until this gets included into linux/compiler-gcc.h */
@@ -405,6 +408,7 @@ struct flex_groups {
#define EXT4_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/
#define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */
#define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */
+#define EXT4_VERITY_FL 0x00100000 /* Verity protected inode */
#define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
#define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
#define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
@@ -472,6 +476,7 @@ enum {
EXT4_INODE_TOPDIR = 17, /* Top of directory hierarchies*/
EXT4_INODE_HUGE_FILE = 18, /* Set to each huge file */
EXT4_INODE_EXTENTS = 19, /* Inode uses extents */
+ EXT4_INODE_VERITY = 20, /* Verity protected inode */
EXT4_INODE_EA_INODE = 21, /* Inode used for large EA */
EXT4_INODE_EOFBLOCKS = 22, /* Blocks allocated beyond EOF */
EXT4_INODE_INLINE_DATA = 28, /* Data in inode. */
@@ -517,6 +522,7 @@ static inline void ext4_check_flag_values(void)
CHECK_FLAG_VALUE(TOPDIR);
CHECK_FLAG_VALUE(HUGE_FILE);
CHECK_FLAG_VALUE(EXTENTS);
+ CHECK_FLAG_VALUE(VERITY);
CHECK_FLAG_VALUE(EA_INODE);
CHECK_FLAG_VALUE(EOFBLOCKS);
CHECK_FLAG_VALUE(INLINE_DATA);
@@ -1654,6 +1660,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
#define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM 0x0400
#define EXT4_FEATURE_RO_COMPAT_READONLY 0x1000
#define EXT4_FEATURE_RO_COMPAT_PROJECT 0x2000
+#define EXT4_FEATURE_RO_COMPAT_VERITY 0x8000
#define EXT4_FEATURE_INCOMPAT_COMPRESSION 0x0001
#define EXT4_FEATURE_INCOMPAT_FILETYPE 0x0002
@@ -1742,6 +1749,7 @@ EXT4_FEATURE_RO_COMPAT_FUNCS(bigalloc, BIGALLOC)
EXT4_FEATURE_RO_COMPAT_FUNCS(metadata_csum, METADATA_CSUM)
EXT4_FEATURE_RO_COMPAT_FUNCS(readonly, READONLY)
EXT4_FEATURE_RO_COMPAT_FUNCS(project, PROJECT)
+EXT4_FEATURE_RO_COMPAT_FUNCS(verity, VERITY)
EXT4_FEATURE_INCOMPAT_FUNCS(compression, COMPRESSION)
EXT4_FEATURE_INCOMPAT_FUNCS(filetype, FILETYPE)
@@ -1797,7 +1805,8 @@ EXT4_FEATURE_INCOMPAT_FUNCS(encrypt, ENCRYPT)
EXT4_FEATURE_RO_COMPAT_BIGALLOC |\
EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|\
EXT4_FEATURE_RO_COMPAT_QUOTA |\
- EXT4_FEATURE_RO_COMPAT_PROJECT)
+ EXT4_FEATURE_RO_COMPAT_PROJECT |\
+ EXT4_FEATURE_RO_COMPAT_VERITY)
#define EXTN_FEATURE_FUNCS(ver) \
static inline bool ext4_has_unknown_ext##ver##_compat_features(struct super_block *sb) \
@@ -2293,6 +2302,15 @@ static inline bool ext4_encrypted_inode(struct inode *inode)
return ext4_test_inode_flag(inode, EXT4_INODE_ENCRYPT);
}
+static inline bool ext4_verity_inode(struct inode *inode)
+{
+#ifdef CONFIG_EXT4_FS_VERITY
+ return ext4_test_inode_flag(inode, EXT4_INODE_VERITY);
+#else
+ return false;
+#endif
+}
+
#ifdef CONFIG_EXT4_FS_ENCRYPTION
static inline int ext4_fname_setup_filename(struct inode *dir,
const struct qstr *iname,
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 69d65d49837bb..cb4b69ef01a22 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -444,6 +444,12 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
if (ret)
return ret;
+ if (ext4_verity_inode(inode)) {
+ ret = fsverity_file_open(inode, filp);
+ if (ret)
+ return ret;
+ }
+
/*
* Set up the jbd2_inode if we are opening the inode for
* writing and the journal is present
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 05f01fbd9c7fb..c624c83bbad26 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4723,6 +4723,8 @@ static bool ext4_should_use_dax(struct inode *inode)
return false;
if (ext4_encrypted_inode(inode))
return false;
+ if (ext4_verity_inode(inode))
+ return false;
return true;
}
@@ -5505,6 +5507,12 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
if (error)
return error;
+ if (ext4_verity_inode(inode)) {
+ error = fsverity_prepare_setattr(dentry, attr);
+ if (error)
+ return error;
+ }
+
if (is_quota_modification(inode, attr)) {
error = dquot_initialize(inode);
if (error)
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 0edee31913d1f..9bb6cc1ae8ceb 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -1020,6 +1020,16 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
case EXT4_IOC_GET_ENCRYPTION_POLICY:
return fscrypt_ioctl_get_policy(filp, (void __user *)arg);
+ case FS_IOC_ENABLE_VERITY:
+ if (!ext4_has_feature_verity(sb))
+ return -EOPNOTSUPP;
+ return fsverity_ioctl_enable(filp, (const void __user *)arg);
+
+ case FS_IOC_MEASURE_VERITY:
+ if (!ext4_has_feature_verity(sb))
+ return -EOPNOTSUPP;
+ return fsverity_ioctl_measure(filp, (void __user *)arg);
+
case EXT4_IOC_FSGETXATTR:
{
struct fsxattr fa;
@@ -1138,6 +1148,8 @@ long ext4_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case EXT4_IOC_SET_ENCRYPTION_POLICY:
case EXT4_IOC_GET_ENCRYPTION_PWSALT:
case EXT4_IOC_GET_ENCRYPTION_POLICY:
+ case FS_IOC_ENABLE_VERITY:
+ case FS_IOC_MEASURE_VERITY:
case EXT4_IOC_SHUTDOWN:
case FS_IOC_GETFSMAP:
break;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a221f1cdf7046..c4a66b64ea604 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1144,6 +1144,7 @@ void ext4_clear_inode(struct inode *inode)
EXT4_I(inode)->jinode = NULL;
}
fscrypt_put_encryption_info(inode);
+ fsverity_cleanup_inode(inode);
}
static struct inode *ext4_nfs_get_inode(struct super_block *sb,
@@ -1315,6 +1316,93 @@ static const struct fscrypt_operations ext4_cryptops = {
};
#endif
+#ifdef CONFIG_EXT4_FS_VERITY
+static int ext4_set_verity(struct inode *inode, loff_t data_i_size)
+{
+ int err;
+ handle_t *handle;
+ struct ext4_iloc iloc;
+
+ err = ext4_convert_inline_data(inode);
+ if (err)
+ return err;
+
+ if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
+ ext4_warning_inode(inode,
+ "fs-verity is only allowed on extent-based files");
+ return -EINVAL;
+ }
+
+ /* Remove extents past EOF; see ext4_get_verity_full_size() */
+ err = ext4_truncate(inode);
+ if (err)
+ return err;
+
+ handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
+ if (IS_ERR(handle))
+ return PTR_ERR(handle);
+ err = ext4_reserve_inode_write(handle, inode, &iloc);
+ if (err == 0) {
+ ext4_set_inode_flag(inode, EXT4_INODE_VERITY);
+ EXT4_I(inode)->i_disksize = data_i_size;
+ err = ext4_mark_iloc_dirty(handle, inode, &iloc);
+ }
+ ext4_journal_stop(handle);
+
+ return err;
+}
+
+/*
+ * Retrieve the offset, in bytes, to the end of the verity metadata. Ext4
+ * stores the verity metadata beyond EOF, but sets the on-disk i_size to the
+ * original data size in order to make verity an RO_COMPAT filesystem feature.
+ * Therefore, it has to compute the end offset implicitly via the end of the
+ * last extent. Trailing zeroes after the footer are tolerated.
+ */
+static int ext4_get_metadata_end(struct inode *inode, loff_t *metadata_end_ret)
+{
+ struct ext4_ext_path *path;
+ struct ext4_extent *last_extent;
+ u32 end_lblk;
+ int err;
+
+ if (ext4_has_inline_data(inode)) {
+ EXT4_ERROR_INODE(inode, "verity file has inline data");
+ return -EFSCORRUPTED;
+ }
+
+ if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
+ EXT4_ERROR_INODE(inode, "verity file doesn't use extents");
+ return -EFSCORRUPTED;
+ }
+
+ path = ext4_find_extent(inode, EXT_MAX_BLOCKS - 1, NULL, 0);
+ if (IS_ERR(path))
+ return PTR_ERR(path);
+
+ last_extent = path[path->p_depth].p_ext;
+ if (!last_extent) {
+ EXT4_ERROR_INODE(inode, "verity file has no extents");
+ err = -EFSCORRUPTED;
+ goto out_drop_path;
+ }
+
+ end_lblk = le32_to_cpu(last_extent->ee_block) +
+ ext4_ext_get_actual_len(last_extent);
+ *metadata_end_ret = (loff_t)end_lblk << inode->i_blkbits;
+ err = 0;
+out_drop_path:
+ ext4_ext_drop_refs(path);
+ kfree(path);
+ return err;
+}
+
+static const struct fsverity_operations ext4_verityops = {
+ .set_verity = ext4_set_verity,
+ .get_metadata_end = ext4_get_metadata_end,
+};
+#endif /* CONFIG_EXT4_FS_VERITY */
+
#ifdef CONFIG_QUOTA
static const char * const quotatypes[] = INITQFNAMES;
#define QTYPE2NAME(t) (quotatypes[t])
@@ -4146,6 +4234,9 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
#ifdef CONFIG_EXT4_FS_ENCRYPTION
sb->s_cop = &ext4_cryptops;
#endif
+#ifdef CONFIG_EXT4_FS_VERITY
+ sb->s_vop = &ext4_verityops;
+#endif
#ifdef CONFIG_QUOTA
sb->dq_op = &ext4_quota_operations;
if (ext4_has_feature_quota(sb))
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 9212a026a1f12..8e86087c2f039 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -227,6 +227,9 @@ EXT4_ATTR_FEATURE(meta_bg_resize);
#ifdef CONFIG_EXT4_FS_ENCRYPTION
EXT4_ATTR_FEATURE(encryption);
#endif
+#ifdef CONFIG_EXT4_FS_VERITY
+EXT4_ATTR_FEATURE(verity);
+#endif
EXT4_ATTR_FEATURE(metadata_csum_seed);
static struct attribute *ext4_feat_attrs[] = {
@@ -235,6 +238,9 @@ static struct attribute *ext4_feat_attrs[] = {
ATTR_LIST(meta_bg_resize),
#ifdef CONFIG_EXT4_FS_ENCRYPTION
ATTR_LIST(encryption),
+#endif
+#ifdef CONFIG_EXT4_FS_VERITY
+ ATTR_LIST(verity),
#endif
ATTR_LIST(metadata_csum_seed),
NULL,
--
2.19.1.568.g152ad8e336-goog
From: Eric Biggers <[email protected]>
Add the beginnings of fs-verity support, including:
- The fs-verity Kconfig option (CONFIG_FS_VERITY)
- The fs-verity UAPI declarations (uapi/linux/fsverity.h)
- The internal API header for filesystems to use (linux/fsverity.h)
- The "setup" code which parses the fs-verity descriptor to create an
fsverity_info structure that is attached to the in-memory inode; this
structure describes the Merkle tree properties and contains the file
measurement. This is called from the ->open() and ->getattr() hooks.
- Hash algorithm management; initially supporting SHA-256 only.
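The extension list that the setup code walks is a sequence of
variable-length items. Each item begins with a fixed header and is padded
to an 8-byte boundary. The standalone sketch below mirrors the bounds
checks done by do_parse_extensions(). The header layout (32-bit length,
16-bit type, 16-bit reserved) and the helper name walk_extensions() are
assumptions for illustration only. Because the header is read natively
rather than through le32_to_cpu(), the sketch also assumes a
little-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte extension header; the field layout is assumed. */
struct ext_hdr {
	uint32_t length;   /* header + payload, excluding padding */
	uint16_t type;
	uint16_t reserved; /* must be zero */
};

/*
 * Walk 'count' extension items at the start of buf[0..buflen).
 * Returns the number of bytes consumed (a multiple of 8), or -1 if an
 * item is malformed or runs past the end of the buffer.
 */
static long walk_extensions(const uint8_t *buf, size_t buflen, int count)
{
	size_t pos = 0;

	for (int i = 0; i < count; i++) {
		struct ext_hdr hdr;
		uint64_t rounded;

		if (buflen - pos < sizeof(hdr))
			return -1;      /* header would overflow the buffer */
		memcpy(&hdr, buf + pos, sizeof(hdr)); /* no unaligned access */
		if (hdr.reserved != 0 || hdr.length < sizeof(hdr))
			return -1;      /* reserved bits set or bogus length */
		/* 64-bit math so rounding a huge length cannot wrap to 0 */
		rounded = ((uint64_t)hdr.length + 7) & ~(uint64_t)7;
		if (rounded > buflen - pos)
			return -1;      /* item overflows the buffer */
		pos += (size_t)rounded; /* next item starts 8-byte aligned */
	}
	return (long)pos;
}
```

For example, a 12-byte item followed by an 8-byte item occupies 16 + 8 =
24 bytes, because the first item is padded up to the next multiple of 8.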
The actual ->readpages() data verification, the ioctl implementations,
ext4 and f2fs support, and other functionality come in later patches.
For more information about fs-verity, see the documentation file
Documentation/filesystems/fsverity.rst.
Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/ioctl/ioctl-number.txt | 1 +
fs/Kconfig | 2 +
fs/Makefile | 1 +
fs/verity/Kconfig | 35 ++
fs/verity/Makefile | 3 +
fs/verity/fsverity_private.h | 98 ++++
fs/verity/hash_algs.c | 106 ++++
fs/verity/setup.c | 823 +++++++++++++++++++++++++++
include/linux/fs.h | 9 +
include/linux/fsverity.h | 62 ++
include/uapi/linux/fsverity.h | 86 +++
11 files changed, 1226 insertions(+)
create mode 100644 fs/verity/Kconfig
create mode 100644 fs/verity/Makefile
create mode 100644 fs/verity/fsverity_private.h
create mode 100644 fs/verity/hash_algs.c
create mode 100644 fs/verity/setup.c
create mode 100644 include/linux/fsverity.h
create mode 100644 include/uapi/linux/fsverity.h
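find_fsverity_footer() in setup.c locates the footer by skipping trailing
zero padding and then requiring the last nonzero byte to be the final byte
of the 8-byte magic. The userspace sketch below reproduces that backward
scan. It assumes the footer is a 4-byte desc_reverse_offset followed by
the 8-byte magic. It uses "FSVerity" as an assumed stand-in for
FS_VERITY_MAGIC, whose value is not shown in this excerpt. The name
find_footer() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FOOTER_MAGIC     "FSVerity" /* assumed value of FS_VERITY_MAGIC */
#define FOOTER_MAGIC_LEN 8
#define FOOTER_SIZE      (4 + FOOTER_MAGIC_LEN) /* __le32 offset + magic */

/*
 * Return the offset of the footer within buf[0..len), or -1 if none is
 * found. Trailing zero bytes are treated as padding (as ext4 needs).
 */
static long find_footer(const uint8_t *buf, size_t len)
{
	size_t end = len;

	/* Skip zero padding; the last nonzero byte must end the magic. */
	while (end > 0 && buf[end - 1] == 0)
		end--;
	if (end < FOOTER_SIZE)
		return -1;
	if (memcmp(buf + end - FOOTER_MAGIC_LEN, FOOTER_MAGIC,
		   FOOTER_MAGIC_LEN) != 0)
		return -1;
	return (long)(end - FOOTER_SIZE);
}
```

The scan is only unambiguous because the magic's last byte is nonzero.
Zero padding can therefore never be mistaken for part of the footer.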
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index af6f6ba1fe804..e9ab862adbf90 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -224,6 +224,7 @@ Code Seq#(hex) Include File Comments
'f' 00-0F fs/ext4/ext4.h conflict!
'f' 00-0F linux/fs.h conflict!
'f' 00-0F fs/ocfs2/ocfs2_fs.h conflict!
+'f' 81-8F linux/fsverity.h
'g' 00-0F linux/usb/gadgetfs.h
'g' 20-2F linux/usb/g_printer.h
'h' 00-7F conflict! Charon filesystem
diff --git a/fs/Kconfig b/fs/Kconfig
index ac474a61be379..ddadc4e999429 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -105,6 +105,8 @@ config MANDATORY_FILE_LOCKING
source "fs/crypto/Kconfig"
+source "fs/verity/Kconfig"
+
source "fs/notify/Kconfig"
source "fs/quota/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index 293733f61594b..10b37f651ffde 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -32,6 +32,7 @@ obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
obj-$(CONFIG_AIO) += aio.o
obj-$(CONFIG_FS_DAX) += dax.o
obj-$(CONFIG_FS_ENCRYPTION) += crypto/
+obj-$(CONFIG_FS_VERITY) += verity/
obj-$(CONFIG_FILE_LOCKING) += locks.o
obj-$(CONFIG_COMPAT) += compat.o compat_ioctl.o
obj-$(CONFIG_BINFMT_AOUT) += binfmt_aout.o
diff --git a/fs/verity/Kconfig b/fs/verity/Kconfig
new file mode 100644
index 0000000000000..102c46ebe275f
--- /dev/null
+++ b/fs/verity/Kconfig
@@ -0,0 +1,35 @@
+config FS_VERITY
+ tristate "FS Verity (read-only file-based authenticity protection)"
+ select CRYPTO
+ # SHA-256 is selected as it's intended to be the default hash algorithm.
+ # To avoid bloat, other wanted algorithms must be selected explicitly.
+ select CRYPTO_SHA256
+ help
+ This option enables fs-verity. fs-verity is the dm-verity
+ mechanism implemented at the file level. On supported
+ filesystems, userspace can append a Merkle tree (hash tree) to
+ a file, then enable fs-verity on the file. The filesystem
+ will then transparently verify any data read from the file
+ against the Merkle tree. The file is also made read-only.
+
+ This serves as an integrity check, but the availability of the
+ Merkle tree root hash also allows efficiently supporting
+ various use cases where normally the whole file would need to
+ be hashed at once, such as: (a) auditing (logging the file's
+ hash), or (b) authenticity verification (comparing the hash
+ against a known good value, e.g. from a digital signature).
+
+ fs-verity is especially useful on large files where not all
+ the contents may actually be needed. Also, fs-verity verifies
+ data each time it is paged back in, which provides better
+ protection against malicious disks vs. an ahead-of-time hash.
+
+ If unsure, say N.
+
+config FS_VERITY_DEBUG
+ bool "FS Verity debugging"
+ depends on FS_VERITY
+ help
+ Enable debugging messages related to fs-verity by default.
+
+ Say N unless you are an fs-verity developer.
diff --git a/fs/verity/Makefile b/fs/verity/Makefile
new file mode 100644
index 0000000000000..39e123805c827
--- /dev/null
+++ b/fs/verity/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_FS_VERITY) += fsverity.o
+
+fsverity-y := hash_algs.o setup.o
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
new file mode 100644
index 0000000000000..acc29825a0ed7
--- /dev/null
+++ b/fs/verity/fsverity_private.h
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * fs-verity: read-only file-based authenticity protection
+ *
+ * Copyright 2018 Google LLC
+ */
+
+#ifndef _FSVERITY_PRIVATE_H
+#define _FSVERITY_PRIVATE_H
+
+#ifdef CONFIG_FS_VERITY_DEBUG
+#define DEBUG
+#endif
+
+#define pr_fmt(fmt) "fs-verity: " fmt
+
+#include <crypto/sha.h>
+#define __FS_HAS_VERITY 1
+#include <linux/fsverity.h>
+
+/*
+ * Maximum depth of the Merkle tree. Up to 64 levels are theoretically possible
+ * with a very small block size, but we'd like to limit stack usage during
+ * verification, and in practice this is plenty. E.g., with SHA-256 and 4K
+ * blocks, a file with size UINT64_MAX bytes needs just 8 levels.
+ */
+#define FS_VERITY_MAX_LEVELS 16
+
+/*
+ * Largest digest size among all hash algorithms supported by fs-verity. This
+ * can be increased if needed.
+ */
+#define FS_VERITY_MAX_DIGEST_SIZE SHA256_DIGEST_SIZE
+
+/* A hash algorithm supported by fs-verity */
+struct fsverity_hash_alg {
+ struct crypto_ahash *tfm; /* allocated on demand */
+ const char *name;
+ unsigned int digest_size;
+ bool cryptographic;
+};
+
+/**
+ * fsverity_info - cached verity metadata for an inode
+ *
+ * When a verity file is first opened, an instance of this struct is allocated
+ * and stored in ->i_verity_info. It caches various values from the verity
+ * metadata, such as the tree topology and the root hash, which are needed to
+ * efficiently verify data read from the file. Once created, it remains until
+ * the inode is evicted.
+ *
+ * (The tree pages themselves are not cached here, though they may be cached in
+ * the inode's page cache.)
+ */
+struct fsverity_info {
+ const struct fsverity_hash_alg *hash_alg; /* hash algorithm */
+ u8 block_bits; /* log2(block size) */
+ u8 log_arity; /* log2(hashes per hash block) */
+ u8 depth; /* num levels in the Merkle tree */
+ u8 *hashstate; /* salted initial hash state */
+ loff_t data_i_size; /* original file size */
+ loff_t metadata_end; /* offset to end of verity metadata */
+ u8 root_hash[FS_VERITY_MAX_DIGEST_SIZE]; /* Merkle tree root hash */
+ u8 measurement[FS_VERITY_MAX_DIGEST_SIZE]; /* file measurement */
+ bool have_root_hash; /* have root hash from disk? */
+
+ /* Starting blocks for each tree level. 'depth-1' is the root level. */
+ u64 hash_lvl_region_idx[FS_VERITY_MAX_LEVELS];
+};
+
+/* hash_algs.c */
+extern struct fsverity_hash_alg fsverity_hash_algs[];
+const struct fsverity_hash_alg *fsverity_get_hash_alg(unsigned int num);
+void __init fsverity_check_hash_algs(void);
+void __exit fsverity_exit_hash_algs(void);
+
+/* setup.c */
+struct page *fsverity_read_metadata_page(struct inode *inode, pgoff_t index);
+struct fsverity_info *create_fsverity_info(struct inode *inode, bool enabling);
+void free_fsverity_info(struct fsverity_info *vi);
+
+static inline struct fsverity_info *get_fsverity_info(const struct inode *inode)
+{
+ /* pairs with cmpxchg_release() in set_fsverity_info() */
+ return smp_load_acquire(&inode->i_verity_info);
+}
+
+static inline bool set_fsverity_info(struct inode *inode,
+ struct fsverity_info *vi)
+{
+ /* Make sure the in-memory i_size is set to the data i_size */
+ i_size_write(inode, vi->data_i_size);
+
+ /* pairs with smp_load_acquire() in get_fsverity_info() */
+ return cmpxchg_release(&inode->i_verity_info, NULL, vi) == NULL;
+}
+
+#endif /* _FSVERITY_PRIVATE_H */
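To make the tree layout concrete, the sketch below mirrors
compute_tree_depth_and_offsets() from setup.c as a userspace function.
The name tree_geometry() is hypothetical. The function rounds up using a
division and a remainder instead of adding block_size - 1, so it also
works for the UINT64_MAX example in the FS_VERITY_MAX_LEVELS comment.
For that example it yields 8 levels with SHA-256 (log_arity = 7) and 4K
blocks (block_bits = 12).

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LEVELS 16 /* mirrors FS_VERITY_MAX_LEVELS */

/*
 * Compute the Merkle tree depth for a file of 'size' bytes and, for each
 * level, the index of its first hash block, assuming the tree starts right
 * after the last data block. level_start[depth - 1] is the root level,
 * which is stored first. Returns the depth, or -1 if too many levels.
 */
static int tree_geometry(uint64_t size, unsigned int block_bits,
			 unsigned int log_arity, uint64_t level_start[MAX_LEVELS])
{
	uint64_t block_size = (uint64_t)1 << block_bits;
	uint64_t blocks = size / block_size + (size % block_size != 0);
	uint64_t arity_mask = ((uint64_t)1 << log_arity) - 1;
	uint64_t offset = blocks; /* tree begins after the data blocks */
	uint64_t counts[MAX_LEVELS];
	int depth = 0;

	while (blocks > 1) {
		if (depth >= MAX_LEVELS)
			return -1;
		/* each hash block covers 2^log_arity blocks of the level below */
		blocks = (blocks >> log_arity) + ((blocks & arity_mask) != 0);
		counts[depth++] = blocks;
	}
	/* Lay out levels root-first: the root is at 'offset', level 0 last. */
	for (int i = depth - 1; i >= 0; i--) {
		level_start[i] = offset;
		offset += counts[i];
	}
	return depth;
}
```

For example, a 1 MiB file has 256 data blocks, 2 level-0 hash blocks and
1 root block. The root therefore lands at block 256, and level 0 occupies
blocks 257-258.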
diff --git a/fs/verity/hash_algs.c b/fs/verity/hash_algs.c
new file mode 100644
index 0000000000000..9c19c9553f120
--- /dev/null
+++ b/fs/verity/hash_algs.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fs/verity/hash_algs.c: fs-verity hash algorithm management
+ *
+ * Copyright 2018 Google LLC
+ *
+ * Written by Eric Biggers.
+ */
+
+#include "fsverity_private.h"
+
+#include <crypto/hash.h>
+
+/* The list of hash algorithms supported by fs-verity */
+struct fsverity_hash_alg fsverity_hash_algs[] = {
+ [FS_VERITY_ALG_SHA256] = {
+ .name = "sha256",
+ .digest_size = 32,
+ .cryptographic = true,
+ },
+};
+
+/*
+ * Translate the given fs-verity hash algorithm number into a struct describing
+ * the algorithm, and ensure it has a hash transform ready to go. The hash
+ * transforms are allocated on demand, first to avoid wasting resources when
+ * they aren't needed, and second because the fs-verity module may be loaded
+ * earlier than the crypto modules it needs.
+ */
+const struct fsverity_hash_alg *fsverity_get_hash_alg(unsigned int num)
+{
+ struct fsverity_hash_alg *alg;
+ struct crypto_ahash *tfm;
+ int err;
+
+ if (num >= ARRAY_SIZE(fsverity_hash_algs) ||
+ !fsverity_hash_algs[num].digest_size) {
+ pr_warn("Unknown hash algorithm: %u\n", num);
+ return ERR_PTR(-EINVAL);
+ }
+ alg = &fsverity_hash_algs[num];
+retry:
+ /* pairs with cmpxchg_release() below */
+ tfm = smp_load_acquire(&alg->tfm);
+ if (tfm)
+ return alg;
+ /*
+ * Using the shash API would make things a bit simpler, but the ahash
+ * API is preferable as it allows the use of crypto accelerators.
+ */
+ tfm = crypto_alloc_ahash(alg->name, 0, 0);
+ if (IS_ERR(tfm)) {
+ if (PTR_ERR(tfm) == -ENOENT)
+ pr_warn("Algorithm %u (%s) is unavailable\n",
+ num, alg->name);
+ else
+ pr_warn("Error allocating algorithm %u (%s): %ld\n",
+ num, alg->name, PTR_ERR(tfm));
+ return ERR_CAST(tfm);
+ }
+
+ err = -EINVAL;
+ if (WARN_ON(alg->digest_size != crypto_ahash_digestsize(tfm)))
+ goto err_free_tfm;
+
+ pr_info("%s using implementation \"%s\"\n", alg->name,
+ crypto_hash_alg_common(tfm)->base.cra_driver_name);
+
+ /* pairs with smp_load_acquire() above */
+ if (cmpxchg_release(&alg->tfm, NULL, tfm) != NULL) {
+ crypto_free_ahash(tfm);
+ goto retry;
+ }
+
+ return alg;
+
+err_free_tfm:
+ crypto_free_ahash(tfm);
+ return ERR_PTR(err);
+}
+
+void __init fsverity_check_hash_algs(void)
+{
+ int i;
+
+ /*
+ * Sanity check the digest sizes (could be a build-time check, but
+ * they're in an array)
+ */
+ for (i = 0; i < ARRAY_SIZE(fsverity_hash_algs); i++) {
+ struct fsverity_hash_alg *alg = &fsverity_hash_algs[i];
+
+ if (!alg->digest_size)
+ continue;
+ BUG_ON(alg->digest_size > FS_VERITY_MAX_DIGEST_SIZE);
+ BUG_ON(!is_power_of_2(alg->digest_size));
+ }
+}
+
+void __exit fsverity_exit_hash_algs(void)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(fsverity_hash_algs); i++)
+ crypto_free_ahash(fsverity_hash_algs[i].tfm);
+}
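The on-demand allocation in fsverity_get_hash_alg() follows a lock-free
publish-once pattern:

- Read the pointer with acquire semantics.
- If it is still NULL, allocate a new object.
- Try to install it with a release compare-and-exchange.
- Free it if another caller won the race.

The sketch below expresses the same idea with C11 <stdatomic.h>, as a
userspace analogue of smp_load_acquire()/cmpxchg_release(). struct tfm,
alloc_tfm() and get_tfm() are hypothetical stand-ins for the crypto
transform and its allocation. One difference from the kernel code: the
kernel jumps back to 'retry', whereas here the loser returns the winner's
pointer, which the failed exchange loaded into 'expected'.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an expensive-to-allocate transform. */
struct tfm { int id; };

static _Atomic(struct tfm *) cached_tfm; /* static, so starts as NULL */

static struct tfm *alloc_tfm(void)
{
	struct tfm *t = malloc(sizeof(*t));

	if (t)
		t->id = 1;
	return t;
}

/* Return the shared transform, allocating it on first use. */
static struct tfm *get_tfm(void)
{
	struct tfm *expected = NULL;
	/* acquire: pairs with the release half of the exchange below */
	struct tfm *t = atomic_load_explicit(&cached_tfm, memory_order_acquire);

	if (t)
		return t;
	t = alloc_tfm();
	if (!t)
		return NULL;
	/* release: t's initialization is visible before the pointer is */
	if (!atomic_compare_exchange_strong_explicit(&cached_tfm, &expected, t,
						     memory_order_acq_rel,
						     memory_order_acquire)) {
		free(t);          /* lost the race; use the winner's object */
		return expected;  /* loaded (with acquire) by the failed CAS */
	}
	return t;
}
```

Racing callers may each allocate an object, but only one is ever
published. Readers never take a lock, which keeps the common path (the
transform already exists) down to a single acquire load.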
diff --git a/fs/verity/setup.c b/fs/verity/setup.c
new file mode 100644
index 0000000000000..925970fbe084d
--- /dev/null
+++ b/fs/verity/setup.c
@@ -0,0 +1,823 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fs/verity/setup.c: fs-verity module initialization and descriptor parsing
+ *
+ * Copyright 2018 Google LLC
+ *
+ * Originally written by Jaegeuk Kim and Michael Halcrow;
+ * heavily rewritten by Eric Biggers.
+ */
+
+#include "fsverity_private.h"
+
+#include <crypto/hash.h>
+#include <linux/highmem.h>
+#include <linux/list_sort.h>
+#include <linux/module.h>
+#include <linux/pagemap.h>
+#include <linux/scatterlist.h>
+#include <linux/vmalloc.h>
+
+static struct kmem_cache *fsverity_info_cachep;
+
+static void dump_fsverity_descriptor(const struct fsverity_descriptor *desc)
+{
+ pr_debug("magic = %.*s\n", (int)sizeof(desc->magic), desc->magic);
+ pr_debug("major_version = %u\n", desc->major_version);
+ pr_debug("minor_version = %u\n", desc->minor_version);
+ pr_debug("log_data_blocksize = %u\n", desc->log_data_blocksize);
+ pr_debug("log_tree_blocksize = %u\n", desc->log_tree_blocksize);
+ pr_debug("data_algorithm = %u\n", le16_to_cpu(desc->data_algorithm));
+ pr_debug("tree_algorithm = %u\n", le16_to_cpu(desc->tree_algorithm));
+ pr_debug("flags = %#x\n", le32_to_cpu(desc->flags));
+ pr_debug("orig_file_size = %llu\n", le64_to_cpu(desc->orig_file_size));
+ pr_debug("auth_ext_count = %u\n", le16_to_cpu(desc->auth_ext_count));
+}
+
+/* Precompute the salted initial hash state */
+static int set_salt(struct fsverity_info *vi, const u8 *salt, size_t saltlen)
+{
+ struct crypto_ahash *tfm = vi->hash_alg->tfm;
+ struct ahash_request *req;
+ unsigned int reqsize = sizeof(*req) + crypto_ahash_reqsize(tfm);
+ struct scatterlist sg;
+ DECLARE_CRYPTO_WAIT(wait);
+ u8 *saltbuf;
+ int err;
+
+ vi->hashstate = kmalloc(crypto_ahash_statesize(tfm), GFP_KERNEL);
+ if (!vi->hashstate)
+ return -ENOMEM;
+ /* On error, vi->hashstate is freed by free_fsverity_info() */
+
+ /*
+ * Allocate a hash request buffer. Also reserve space for a copy of
+ * the salt, since the given 'salt' may point into vmap'ed memory, so
+ * sg_init_one() may not work on it.
+ */
+ req = kmalloc(reqsize + saltlen, GFP_KERNEL);
+ if (!req)
+ return -ENOMEM;
+ saltbuf = (u8 *)req + reqsize;
+ memcpy(saltbuf, salt, saltlen);
+ sg_init_one(&sg, saltbuf, saltlen);
+
+ ahash_request_set_tfm(req, tfm);
+ ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
+ CRYPTO_TFM_REQ_MAY_BACKLOG,
+ crypto_req_done, &wait);
+ ahash_request_set_crypt(req, &sg, NULL, saltlen);
+
+ err = crypto_wait_req(crypto_ahash_init(req), &wait);
+ if (err)
+ goto out;
+ err = crypto_wait_req(crypto_ahash_update(req), &wait);
+ if (err)
+ goto out;
+ err = crypto_ahash_export(req, vi->hashstate);
+out:
+ kfree(req);
+ return err;
+}
+
+/*
+ * Copy in the root hash stored on disk.
+ *
+ * Note that the root hash could be computed by hashing the root block of the
+ * Merkle tree. But it works out a bit simpler to store the hash separately;
+ * then it gets included in the file measurement without special-casing it, and
+ * the root block gets verified on the ->readpages() path like the other blocks.
+ */
+static int parse_root_hash_extension(struct fsverity_info *vi,
+ const void *hash, size_t size)
+{
+ const struct fsverity_hash_alg *alg = vi->hash_alg;
+
+ if (vi->have_root_hash) {
+ pr_warn("Multiple root hashes were found!\n");
+ return -EINVAL;
+ }
+ if (size != alg->digest_size) {
+ pr_warn("Wrong root hash size; got %zu bytes, but expected %u for hash algorithm %s\n",
+ size, alg->digest_size, alg->name);
+ return -EINVAL;
+ }
+ memcpy(vi->root_hash, hash, size);
+ vi->have_root_hash = true;
+ pr_debug("Root hash: %s:%*phN\n", alg->name,
+ alg->digest_size, vi->root_hash);
+ return 0;
+}
+
+static int parse_salt_extension(struct fsverity_info *vi,
+ const void *salt, size_t saltlen)
+{
+ if (vi->hashstate) {
+ pr_warn("Multiple salts were found!\n");
+ return -EINVAL;
+ }
+ return set_salt(vi, salt, saltlen);
+}
+
+/* The available types of extensions (variable-length metadata items) */
+static const struct extension_type {
+ int (*parse)(struct fsverity_info *vi, const void *_ext,
+ size_t extra_len);
+ size_t base_len; /* length of fixed-size part of payload, if any */
+ bool unauthenticated; /* true if not included in file measurement */
+} extension_types[] = {
+ [FS_VERITY_EXT_ROOT_HASH] = {
+ .parse = parse_root_hash_extension,
+ },
+ [FS_VERITY_EXT_SALT] = {
+ .parse = parse_salt_extension,
+ },
+};
+
+static int do_parse_extensions(struct fsverity_info *vi,
+ const struct fsverity_extension **ext_hdr_p,
+ const void *end, int count, bool authenticated)
+{
+ const struct fsverity_extension *ext_hdr = *ext_hdr_p;
+ int i;
+ int err;
+
+ for (i = 0; i < count; i++) {
+ const struct extension_type *type;
+ u32 len, rounded_len;
+ u16 type_code;
+
+ if (end - (const void *)ext_hdr < sizeof(*ext_hdr)) {
+ pr_warn("Extension list overflows buffer\n");
+ return -EINVAL;
+ }
+ type_code = le16_to_cpu(ext_hdr->type);
+ if (type_code >= ARRAY_SIZE(extension_types) ||
+ !extension_types[type_code].parse) {
+ pr_warn("Unknown extension type: %u\n", type_code);
+ return -EINVAL;
+ }
+ type = &extension_types[type_code];
+ if (authenticated != !type->unauthenticated) {
+ pr_warn("Extension type %u must be %sauthenticated\n",
+ type_code, type->unauthenticated ? "un" : "");
+ return -EINVAL;
+ }
+ if (ext_hdr->reserved) {
+ pr_warn("Reserved bits set in extension header\n");
+ return -EINVAL;
+ }
+ len = le32_to_cpu(ext_hdr->length);
+ if (len < sizeof(*ext_hdr)) {
+ pr_warn("Invalid length in extension header\n");
+ return -EINVAL;
+ }
+ rounded_len = round_up(len, 8);
+ if (rounded_len == 0 ||
+ rounded_len > end - (const void *)ext_hdr) {
+ pr_warn("Extension item overflows buffer\n");
+ return -EINVAL;
+ }
+ if (len < sizeof(*ext_hdr) + type->base_len) {
+ pr_warn("Extension length too small for type\n");
+ return -EINVAL;
+ }
+ err = type->parse(vi, ext_hdr + 1,
+ len - sizeof(*ext_hdr) - type->base_len);
+ if (err)
+ return err;
+ ext_hdr = (const void *)ext_hdr + rounded_len;
+ }
+ *ext_hdr_p = ext_hdr;
+ return 0;
+}
+
+/*
+ * Parse the extension items following the fixed-size portion of the fs-verity
+ * descriptor. The fsverity_info is updated accordingly.
+ *
+ * Return: On success, the size of the authenticated portion of the descriptor
+ * (the fixed-size portion plus the authenticated extensions).
+ * Otherwise, a -errno value.
+ */
+static int parse_extensions(struct fsverity_info *vi,
+ const struct fsverity_descriptor *desc,
+ int desc_len)
+{
+ const struct fsverity_extension *ext_hdr = (const void *)(desc + 1);
+ const void *end = (const void *)desc + desc_len;
+ u16 auth_ext_count = le16_to_cpu(desc->auth_ext_count);
+ int auth_desc_len;
+ int err;
+
+ /* Authenticated extensions */
+ err = do_parse_extensions(vi, &ext_hdr, end, auth_ext_count, true);
+ if (err)
+ return err;
+ auth_desc_len = (void *)ext_hdr - (void *)desc;
+
+ /*
+ * Unauthenticated extensions (optional). Careful: an attacker able to
+ * corrupt the file can change these arbitrarily without being detected.
+ * Thus, only specific types of extensions are whitelisted here --
+ * namely, the ones containing a signature of the file measurement,
+ * which by definition can't be included in the file measurement itself.
+ */
+ if (end - (void *)ext_hdr >= 8) {
+ u16 unauth_ext_count = le16_to_cpup((__le16 *)ext_hdr);
+
+ ext_hdr = (void *)ext_hdr + 8;
+ err = do_parse_extensions(vi, &ext_hdr, end,
+ unauth_ext_count, false);
+ if (err)
+ return err;
+ }
+
+ return auth_desc_len;
+}
+
+/*
+ * Parse an fs-verity descriptor, loading information into the fsverity_info.
+ *
+ * Return: On success, the size of the authenticated portion of the descriptor
+ * (the fixed-size portion plus the authenticated extensions).
+ * Otherwise, a -errno value.
+ */
+static int parse_fsverity_descriptor(struct fsverity_info *vi,
+ const struct fsverity_descriptor *desc,
+ int desc_len)
+{
+ unsigned int alg_num;
+ unsigned int hashes_per_block;
+ int desc_auth_len;
+ int err;
+
+ BUILD_BUG_ON(sizeof(*desc) != 64);
+
+ /* magic */
+ if (memcmp(desc->magic, FS_VERITY_MAGIC, sizeof(desc->magic))) {
+ pr_warn("Wrong magic bytes\n");
+ return -EINVAL;
+ }
+
+ /* major_version */
+ if (desc->major_version != 1) {
+ pr_warn("Unsupported major version (%u)\n",
+ desc->major_version);
+ return -EINVAL;
+ }
+
+ /* minor_version */
+ if (desc->minor_version != 0) {
+ pr_warn("Unsupported minor version (%u)\n",
+ desc->minor_version);
+ return -EINVAL;
+ }
+
+ /* data_algorithm and tree_algorithm */
+ alg_num = le16_to_cpu(desc->data_algorithm);
+ if (alg_num != le16_to_cpu(desc->tree_algorithm)) {
+ pr_warn("Unimplemented case: data (%u) and tree (%u) hash algorithms differ\n",
+ alg_num, le16_to_cpu(desc->tree_algorithm));
+ return -EINVAL;
+ }
+ vi->hash_alg = fsverity_get_hash_alg(alg_num);
+ if (IS_ERR(vi->hash_alg))
+ return PTR_ERR(vi->hash_alg);
+
+ /* log_data_blocksize and log_tree_blocksize */
+ if (desc->log_data_blocksize != PAGE_SHIFT) {
+ pr_warn("Unsupported log_blocksize (%u). Need block_size == PAGE_SIZE.\n",
+ desc->log_data_blocksize);
+ return -EINVAL;
+ }
+ if (desc->log_tree_blocksize != desc->log_data_blocksize) {
+ pr_warn("Unimplemented case: data (%u) and tree (%u) block sizes differ\n",
+ desc->log_data_blocksize, desc->log_tree_blocksize);
+ return -EINVAL;
+ }
+ vi->block_bits = desc->log_data_blocksize;
+ hashes_per_block = (1 << vi->block_bits) / vi->hash_alg->digest_size;
+ if (!is_power_of_2(hashes_per_block)) {
+ pr_warn("Unimplemented case: hashes per block (%u) isn't a power of 2\n",
+ hashes_per_block);
+ return -EINVAL;
+ }
+ vi->log_arity = ilog2(hashes_per_block);
+
+ /* flags */
+ if (desc->flags) {
+ pr_warn("Unsupported flags (%#x)\n", le32_to_cpu(desc->flags));
+ return -EINVAL;
+ }
+
+ /* reserved fields */
+ if (desc->reserved1 ||
+ memchr_inv(desc->reserved2, 0, sizeof(desc->reserved2))) {
+ pr_warn("Reserved bits set in fsverity_descriptor\n");
+ return -EINVAL;
+ }
+
+ /* orig_file_size */
+ vi->data_i_size = le64_to_cpu(desc->orig_file_size);
+ if (vi->data_i_size <= 0) {
+ pr_warn("Original file size is 0 or negative; this is unsupported\n");
+ return -EINVAL;
+ }
+
+ /* extensions */
+ desc_auth_len = parse_extensions(vi, desc, desc_len);
+ if (desc_auth_len < 0)
+ return desc_auth_len;
+
+ if (!vi->have_root_hash) {
+ pr_warn("Root hash wasn't found!\n");
+ return -EINVAL;
+ }
+
+ /* Use an empty salt if no salt was found in the extensions list */
+ if (!vi->hashstate) {
+ err = set_salt(vi, "", 0);
+ if (err)
+ return err;
+ }
+
+ return desc_auth_len;
+}
+
+/*
+ * Calculate the depth of the Merkle tree, then create a map from level to the
+ * block offset at which that level's hash blocks start. Level 'depth - 1' is
+ * the root and is stored first. Level 0 is the level directly "above" the data
+ * blocks and is stored last, just before the fsverity_descriptor.
+ */
+static int compute_tree_depth_and_offsets(struct fsverity_info *vi)
+{
+ unsigned int hashes_per_block = 1 << vi->log_arity;
+ u64 blocks = ((u64)vi->data_i_size + (1 << vi->block_bits) - 1) >>
+ vi->block_bits;
+ u64 offset = blocks; /* assuming Merkle tree past EOF */
+ int depth = 0;
+ int i;
+
+ while (blocks > 1) {
+ if (depth >= FS_VERITY_MAX_LEVELS) {
+ pr_warn("Too many tree levels (max is %d)\n",
+ FS_VERITY_MAX_LEVELS);
+ return -EINVAL;
+ }
+ blocks = (blocks + hashes_per_block - 1) >> vi->log_arity;
+ vi->hash_lvl_region_idx[depth++] = blocks;
+ }
+ vi->depth = depth;
+
+ for (i = depth - 1; i >= 0; i--) {
+ u64 next_count = vi->hash_lvl_region_idx[i];
+
+ vi->hash_lvl_region_idx[i] = offset;
+ pr_debug("Level %d is [%llu..%llu] (%llu blocks)\n",
+ i, offset, offset + next_count - 1, next_count);
+ offset += next_count;
+ }
+ return 0;
+}
+
+/* Arbitrary limit, can be increased if needed */
+#define MAX_DESCRIPTOR_PAGES 16
+
+/*
+ * Compute the file's measurement by hashing the first 'desc_auth_len' bytes of
+ * the fs-verity descriptor (which includes the Merkle tree root hash as an
+ * authenticated extension item).
+ *
+ * Note: 'desc' may point into vmap'ed memory, so it can't be passed directly to
+ * sg_set_buf() for the ahash API. Instead, we pass the pages directly.
+ */
+static int compute_measurement(const struct fsverity_info *vi,
+ const struct fsverity_descriptor *desc,
+ int desc_auth_len,
+ struct page *desc_pages[MAX_DESCRIPTOR_PAGES],
+ int nr_desc_pages, u8 *measurement)
+{
+ struct ahash_request *req;
+ DECLARE_CRYPTO_WAIT(wait);
+ struct scatterlist sg[MAX_DESCRIPTOR_PAGES];
+ int offset, len, remaining;
+ int i;
+ int err;
+
+ req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
+ if (!req)
+ return -ENOMEM;
+
+ sg_init_table(sg, nr_desc_pages);
+ offset = offset_in_page(desc);
+ remaining = desc_auth_len;
+ for (i = 0; i < nr_desc_pages && remaining; i++) {
+ len = min_t(int, PAGE_SIZE - offset, remaining);
+ sg_set_page(&sg[i], desc_pages[i], len, offset);
+ remaining -= len;
+ offset = 0;
+ }
+
+ ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
+ CRYPTO_TFM_REQ_MAY_BACKLOG,
+ crypto_req_done, &wait);
+ ahash_request_set_crypt(req, sg, measurement, desc_auth_len);
+ err = crypto_wait_req(crypto_ahash_digest(req), &wait);
+ ahash_request_free(req);
+ return err;
+}
+
+static struct fsverity_info *alloc_fsverity_info(void)
+{
+ return kmem_cache_zalloc(fsverity_info_cachep, GFP_NOFS);
+}
+
+void free_fsverity_info(struct fsverity_info *vi)
+{
+ if (!vi)
+ return;
+ kfree(vi->hashstate);
+ kmem_cache_free(fsverity_info_cachep, vi);
+}
+
+/**
+ * find_fsverity_footer - find the fsverity_footer in the last page of metadata
+ *
+ * Allow the fs-verity footer to be padded with zeroes. This is needed by ext4,
+ * which stores the fs-verity metadata beyond EOF but sets i_size = data_i_size.
+ * Then, the fs-verity footer must be found implicitly via the last extent.
+ *
+ * Return: pointer to the footer if found, else NULL
+ */
+static const struct fsverity_footer *
+find_fsverity_footer(const u8 *last_virt, size_t last_validsize)
+{
+ const u8 *p = last_virt + last_validsize;
+ const struct fsverity_footer *ftr;
+
+ /* Find the last nonzero byte, which should be ftr->magic[7] */
+ do {
+ if (p <= last_virt)
+ return NULL;
+ } while (*--p == 0);
+
+ BUILD_BUG_ON(sizeof(ftr->magic) != 8);
+ BUILD_BUG_ON(offsetof(struct fsverity_footer, magic[8]) !=
+ sizeof(*ftr));
+ if (p - last_virt < offsetof(struct fsverity_footer, magic[7]))
+ return NULL;
+ ftr = container_of(p, struct fsverity_footer, magic[7]);
+ if (memcmp(ftr->magic, FS_VERITY_MAGIC, sizeof(ftr->magic)))
+ return NULL;
+ return ftr;
+}
+
+struct page *fsverity_read_metadata_page(struct inode *inode, pgoff_t index)
+{
+ /*
+ * For now we assume that the verity metadata is stored in the same data
+ * stream as the actual file contents (as ext4 and f2fs do), so we read
+ * the metadata directly from the inode's page cache. If any
+ * filesystems need to do things differently, this should be replaced
+ * with a method fsverity_operations.read_metadata_page().
+ */
+ return read_mapping_page(inode->i_mapping, index, NULL);
+}
+
+/**
+ * map_fsverity_descriptor - map an inode's fs-verity descriptor into memory
+ *
+ * If the descriptor fits in one page, we use kmap; otherwise we use vmap.
+ * unmap_fsverity_descriptor() must be called later to unmap it.
+ *
+ * It's assumed that the file contents cannot be modified concurrently.
+ * (This is guaranteed by either deny_write_access() or by the verity bit.)
+ *
+ * Return: the virtual address of the start of the descriptor, in virtually
+ * contiguous memory. Also fills in desc_pages and returns in *desc_len the
+ * length of the descriptor including all extensions, and in *desc_start the
+ * offset of the descriptor from the start of the file, in bytes.
+ */
+static const struct fsverity_descriptor *
+map_fsverity_descriptor(struct inode *inode, loff_t metadata_end,
+ struct page *desc_pages[MAX_DESCRIPTOR_PAGES],
+ int *nr_desc_pages, int *desc_len, loff_t *desc_start)
+{
+ const int last_validsize = ((metadata_end - 1) & ~PAGE_MASK) + 1;
+ const pgoff_t last_pgoff = (metadata_end - 1) >> PAGE_SHIFT;
+ struct page *last_page;
+ const void *last_virt;
+ const struct fsverity_footer *ftr;
+ pgoff_t first_pgoff;
+ u32 desc_reverse_offset;
+ pgoff_t pgoff;
+ const void *desc_virt;
+ int i;
+ int err;
+
+ *nr_desc_pages = 0;
+ *desc_len = 0;
+ *desc_start = 0;
+
+ last_page = fsverity_read_metadata_page(inode, last_pgoff);
+ if (IS_ERR(last_page)) {
+ pr_warn("Error reading last page: %ld\n", PTR_ERR(last_page));
+ return ERR_CAST(last_page);
+ }
+ last_virt = kmap(last_page);
+
+ ftr = find_fsverity_footer(last_virt, last_validsize);
+ if (!ftr) {
+ pr_warn("No verity metadata found\n");
+ err = -EINVAL;
+ goto err_out;
+ }
+ metadata_end -= (last_virt + last_validsize - sizeof(*ftr)) -
+ (void *)ftr;
+
+ desc_reverse_offset = le32_to_cpu(ftr->desc_reverse_offset);
+ if (desc_reverse_offset <
+ sizeof(struct fsverity_descriptor) + sizeof(*ftr) ||
+ desc_reverse_offset > metadata_end) {
+ pr_warn("Unexpected desc_reverse_offset: %u\n",
+ desc_reverse_offset);
+ err = -EINVAL;
+ goto err_out;
+ }
+ *desc_start = metadata_end - desc_reverse_offset;
+ if (*desc_start & 7) {
+ pr_warn("fs-verity descriptor is misaligned (desc_start=%lld)\n",
+ *desc_start);
+ err = -EINVAL;
+ goto err_out;
+ }
+
+ first_pgoff = *desc_start >> PAGE_SHIFT;
+ if (last_pgoff - first_pgoff >= MAX_DESCRIPTOR_PAGES) {
+ pr_warn("fs-verity descriptor is too long (%lu pages)\n",
+ last_pgoff - first_pgoff + 1);
+ err = -EINVAL;
+ goto err_out;
+ }
+
+ *desc_len = desc_reverse_offset - sizeof(__le32);
+
+ if (first_pgoff == last_pgoff) {
+ /* Single-page descriptor; use the already-kmapped last page */
+ desc_pages[0] = last_page;
+ *nr_desc_pages = 1;
+ return last_virt + (*desc_start & ~PAGE_MASK);
+ }
+
+ /* Multi-page descriptor; map the additional pages into memory */
+
+ for (pgoff = first_pgoff; pgoff < last_pgoff; pgoff++) {
+ struct page *page;
+
+ page = fsverity_read_metadata_page(inode, pgoff);
+ if (IS_ERR(page)) {
+ err = PTR_ERR(page);
+ pr_warn("Error reading descriptor page: %d\n", err);
+ goto err_out;
+ }
+ desc_pages[(*nr_desc_pages)++] = page;
+ }
+
+ desc_pages[(*nr_desc_pages)++] = last_page;
+ kunmap(last_page);
+ last_page = NULL;
+
+ desc_virt = vmap(desc_pages, *nr_desc_pages, VM_MAP, PAGE_KERNEL_RO);
+ if (!desc_virt) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+
+ return desc_virt + (*desc_start & ~PAGE_MASK);
+
+err_out:
+ for (i = 0; i < *nr_desc_pages; i++)
+ put_page(desc_pages[i]);
+ if (last_page) {
+ kunmap(last_page);
+ put_page(last_page);
+ }
+ return ERR_PTR(err);
+}
+
+static void
+unmap_fsverity_descriptor(const struct fsverity_descriptor *desc,
+ struct page *desc_pages[MAX_DESCRIPTOR_PAGES],
+ int nr_desc_pages)
+{
+ int i;
+
+ if (is_vmalloc_addr(desc)) {
+ vunmap((void *)((unsigned long)desc & PAGE_MASK));
+ } else {
+ WARN_ON(nr_desc_pages != 1);
+ kunmap(desc_pages[0]);
+ }
+ for (i = 0; i < nr_desc_pages; i++)
+ put_page(desc_pages[i]);
+}
+
+/* Read the file's fs-verity descriptor and create an fsverity_info for it */
+struct fsverity_info *create_fsverity_info(struct inode *inode, bool enabling)
+{
+ struct fsverity_info *vi;
+ const struct fsverity_descriptor *desc = NULL;
+ struct page *desc_pages[MAX_DESCRIPTOR_PAGES];
+ int nr_desc_pages;
+ int desc_len;
+ loff_t desc_start;
+ int desc_auth_len;
+ int err;
+
+ vi = alloc_fsverity_info();
+ if (!vi)
+ return ERR_PTR(-ENOMEM);
+
+ if (enabling) {
+ /* file is in fsveritysetup format */
+ vi->metadata_end = i_size_read(inode);
+ } else {
+ /* verity metadata may be in a filesystem-specific location */
+ err = inode->i_sb->s_vop->get_metadata_end(inode,
+ &vi->metadata_end);
+ if (err)
+ goto out;
+ }
+
+ desc = map_fsverity_descriptor(inode, vi->metadata_end, desc_pages,
+ &nr_desc_pages, &desc_len, &desc_start);
+ if (IS_ERR(desc)) {
+ err = PTR_ERR(desc);
+ desc = NULL;
+ goto out;
+ }
+
+ dump_fsverity_descriptor(desc);
+ desc_auth_len = parse_fsverity_descriptor(vi, desc, desc_len);
+ if (desc_auth_len < 0) {
+ err = desc_auth_len;
+ goto out;
+ }
+ if (vi->data_i_size > i_size_read(inode)) {
+ pr_warn("Bad data_i_size: %lld\n", vi->data_i_size);
+ err = -EINVAL;
+ goto out;
+ }
+
+ err = compute_tree_depth_and_offsets(vi);
+ if (err)
+ goto out;
+ err = compute_measurement(vi, desc, desc_auth_len, desc_pages,
+ nr_desc_pages, vi->measurement);
+out:
+ if (desc)
+ unmap_fsverity_descriptor(desc, desc_pages, nr_desc_pages);
+ if (err) {
+ free_fsverity_info(vi);
+ vi = ERR_PTR(err);
+ }
+ return vi;
+}
+
+/* Ensure the inode has an ->i_verity_info */
+static int setup_fsverity_info(struct inode *inode)
+{
+ struct fsverity_info *vi = get_fsverity_info(inode);
+
+ if (vi)
+ return 0;
+
+ vi = create_fsverity_info(inode, false);
+ if (IS_ERR(vi))
+ return PTR_ERR(vi);
+
+ if (!set_fsverity_info(inode, vi))
+ free_fsverity_info(vi);
+ return 0;
+}
+
+/**
+ * fsverity_file_open - prepare to open a verity file
+ * @inode: the inode being opened
+ * @filp: the struct file being set up
+ *
+ * When opening a verity file, deny the open if it is for writing. Otherwise,
+ * set up the inode's ->i_verity_info (if not already done) by parsing the
+ * verity metadata at the end of the file.
+ *
+ * When combined with fscrypt, this must be called after fscrypt_file_open().
+ * Otherwise, we won't have the key set up to decrypt the verity metadata.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fsverity_file_open(struct inode *inode, struct file *filp)
+{
+ if (filp->f_mode & FMODE_WRITE) {
+ pr_debug("Denying opening verity file (ino %lu) for write\n",
+ inode->i_ino);
+ return -EPERM;
+ }
+
+ return setup_fsverity_info(inode);
+}
+EXPORT_SYMBOL_GPL(fsverity_file_open);
+
+/**
+ * fsverity_prepare_setattr - prepare to change a verity inode's attributes
+ * @dentry: dentry through which the inode is being changed
+ * @attr: attributes to change
+ *
+ * Verity files are immutable, so deny truncates. This isn't covered by the
+ * open-time check because sys_truncate() takes a path, not a file descriptor.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fsverity_prepare_setattr(struct dentry *dentry, struct iattr *attr)
+{
+ if (attr->ia_valid & ATTR_SIZE) {
+ pr_debug("Denying truncate of verity file (ino %lu)\n",
+ d_inode(dentry)->i_ino);
+ return -EPERM;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fsverity_prepare_setattr);
+
+/**
+ * fsverity_prepare_getattr - prepare to get a verity inode's attributes
+ * @inode: the inode for which the attributes are being retrieved
+ *
+ * This only needs to be called by filesystems that set the on-disk i_size of
+ * verity files to something other than the data size, as then this is needed to
+ * override i_size so that stat() shows the correct size.
+ *
+ * When the filesystem supports fscrypt too, it must make sure to set up the
+ * inode's encryption key (if needed) before calling this.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fsverity_prepare_getattr(struct inode *inode)
+{
+ return setup_fsverity_info(inode);
+}
+EXPORT_SYMBOL_GPL(fsverity_prepare_getattr);
+
+/**
+ * fsverity_cleanup_inode - free the inode's verity info, if present
+ *
+ * Filesystems must call this on inode eviction to free ->i_verity_info.
+ */
+void fsverity_cleanup_inode(struct inode *inode)
+{
+ free_fsverity_info(inode->i_verity_info);
+ inode->i_verity_info = NULL;
+}
+EXPORT_SYMBOL_GPL(fsverity_cleanup_inode);
+
+/**
+ * fsverity_full_i_size - get the full file size
+ *
+ * If the file has fs-verity set up, return the full file size including the
+ * verity metadata. Otherwise just return i_size. This is only meaningful when
+ * the filesystem stores the verity metadata past EOF.
+ */
+loff_t fsverity_full_i_size(const struct inode *inode)
+{
+ struct fsverity_info *vi = get_fsverity_info(inode);
+
+ if (vi)
+ return vi->metadata_end;
+
+ return i_size_read(inode);
+}
+EXPORT_SYMBOL_GPL(fsverity_full_i_size);
+
+static int __init fsverity_module_init(void)
+{
+ fsverity_info_cachep = KMEM_CACHE(fsverity_info, SLAB_RECLAIM_ACCOUNT);
+ if (!fsverity_info_cachep)
+ return -ENOMEM;
+
+ fsverity_check_hash_algs();
+
+ pr_debug("Initialized fs-verity\n");
+ return 0;
+}
+
+static void __exit fsverity_module_exit(void)
+{
+ kmem_cache_destroy(fsverity_info_cachep);
+ fsverity_exit_hash_algs();
+}
+
+module_init(fsverity_module_init)
+module_exit(fsverity_module_exit);
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("fs-verity: read-only file-based authenticity protection");
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8252df30b9a16..bcfc400627574 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -61,6 +61,8 @@ struct workqueue_struct;
struct iov_iter;
struct fscrypt_info;
struct fscrypt_operations;
+struct fsverity_info;
+struct fsverity_operations;
extern void __init inode_init(void);
extern void __init inode_init_early(void);
@@ -702,6 +704,10 @@ struct inode {
struct fscrypt_info *i_crypt_info;
#endif
+#if IS_ENABLED(CONFIG_FS_VERITY)
+ struct fsverity_info *i_verity_info;
+#endif
+
void *i_private; /* fs or device private pointer */
} __randomize_layout;
@@ -1400,6 +1406,9 @@ struct super_block {
const struct xattr_handler **s_xattr;
#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
const struct fscrypt_operations *s_cop;
+#endif
+#if IS_ENABLED(CONFIG_FS_VERITY)
+ const struct fsverity_operations *s_vop;
#endif
struct hlist_bl_head s_roots; /* alternate root dentries for NFS */
struct list_head s_mounts; /* list of mounts; _not_ for fs use */
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
new file mode 100644
index 0000000000000..c9422a579c160
--- /dev/null
+++ b/include/linux/fsverity.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * fs-verity: read-only file-based authenticity protection
+ *
+ * Copyright 2018 Google LLC
+ */
+
+#ifndef _LINUX_FSVERITY_H
+#define _LINUX_FSVERITY_H
+
+#include <linux/fs.h>
+#include <uapi/linux/fsverity.h>
+
+/*
+ * fs-verity operations for filesystems
+ */
+struct fsverity_operations {
+ int (*set_verity)(struct inode *inode, loff_t data_i_size);
+ int (*get_metadata_end)(struct inode *inode, loff_t *metadata_end_ret);
+};
+
+#if __FS_HAS_VERITY
+
+/* setup.c */
+extern int fsverity_file_open(struct inode *inode, struct file *filp);
+extern int fsverity_prepare_setattr(struct dentry *dentry, struct iattr *attr);
+extern int fsverity_prepare_getattr(struct inode *inode);
+extern void fsverity_cleanup_inode(struct inode *inode);
+extern loff_t fsverity_full_i_size(const struct inode *inode);
+
+#else /* !__FS_HAS_VERITY */
+
+/* setup.c */
+
+static inline int fsverity_file_open(struct inode *inode, struct file *filp)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int fsverity_prepare_setattr(struct dentry *dentry,
+ struct iattr *attr)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int fsverity_prepare_getattr(struct inode *inode)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void fsverity_cleanup_inode(struct inode *inode)
+{
+}
+
+static inline loff_t fsverity_full_i_size(const struct inode *inode)
+{
+ return i_size_read(inode);
+}
+
+#endif /* !__FS_HAS_VERITY */
+
+#endif /* _LINUX_FSVERITY_H */
diff --git a/include/uapi/linux/fsverity.h b/include/uapi/linux/fsverity.h
new file mode 100644
index 0000000000000..55b9f32676220
--- /dev/null
+++ b/include/uapi/linux/fsverity.h
@@ -0,0 +1,86 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * fs-verity (file-based verity) support
+ *
+ * Copyright (C) 2018 Google LLC
+ */
+#ifndef _UAPI_LINUX_FSVERITY_H
+#define _UAPI_LINUX_FSVERITY_H
+
+#include <linux/limits.h>
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+/* ========== Ioctls ========== */
+
+struct fsverity_digest {
+ __u16 digest_algorithm;
+ __u16 digest_size; /* input/output */
+ __u8 digest[];
+};
+
+#define FS_IOC_ENABLE_VERITY _IO('f', 133)
+#define FS_IOC_MEASURE_VERITY _IOWR('f', 134, struct fsverity_digest)
+
+/* ========== On-disk format ========== */
+
+#define FS_VERITY_MAGIC "FSVerity"
+
+/* Supported hash algorithms */
+#define FS_VERITY_ALG_SHA256 1
+
+/* Metadata stored near the end of verity files, after the Merkle tree */
+/* This structure is 64 bytes long */
+struct fsverity_descriptor {
+ __u8 magic[8]; /* must be FS_VERITY_MAGIC */
+ __u8 major_version; /* must be 1 */
+ __u8 minor_version; /* must be 0 */
+ __u8 log_data_blocksize;/* log2(data-bytes-per-hash), e.g. 12 for 4KB */
+ __u8 log_tree_blocksize;/* log2(tree-bytes-per-hash), e.g. 12 for 4KB */
+ __le16 data_algorithm; /* hash algorithm for data blocks */
+ __le16 tree_algorithm; /* hash algorithm for tree blocks */
+ __le32 flags; /* flags */
+ __le32 reserved1; /* must be 0 */
+ __le64 orig_file_size; /* size of the original file data */
+ __le16 auth_ext_count; /* number of authenticated extensions */
+ __u8 reserved2[30]; /* must be 0 */
+};
+/* followed by list of 'auth_ext_count' authenticated extensions */
+/*
+ * then followed by '__le16 unauth_ext_count' padded to next 8-byte boundary,
+ * then a list of 'unauth_ext_count' (may be 0) unauthenticated extensions
+ */
+
+/* Extension types */
+#define FS_VERITY_EXT_ROOT_HASH 1
+#define FS_VERITY_EXT_SALT 2
+
+/* Header of each extension (variable-length metadata item) */
+struct fsverity_extension {
+ /*
+ * Length in bytes, including this header but excluding padding to next
+ * 8-byte boundary that is applied when advancing to the next extension.
+ */
+ __le32 length;
+ __le16 type; /* Type of this extension (see codes above) */
+ __le16 reserved; /* Reserved, must be 0 */
+};
+/* followed by the payload of 'length - 8' bytes */
+
+/* Extension payload formats */
+
+/*
+ * FS_VERITY_EXT_ROOT_HASH payload is just a byte array, with size equal to the
+ * digest size of the hash algorithm given in the fsverity_descriptor
+ */
+
+/* FS_VERITY_EXT_SALT payload is just a byte array, any size */
+
+
+/* Fields stored at the very end of the file */
+struct fsverity_footer {
+ __le32 desc_reverse_offset; /* distance to fsverity_descriptor */
+ __u8 magic[8]; /* FS_VERITY_MAGIC */
+} __packed;
+
+#endif /* _UAPI_LINUX_FSVERITY_H */
--
2.19.1.568.g152ad8e336-goog
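The on-disk format above ends each verity file with a 12-byte fsverity_footer. Its desc_reverse_offset locates the 64-byte fsverity_descriptor, and each extension is padded to an 8-byte boundary. The userspace sketch below shows how a tool might find and sanity-check the descriptor in an in-memory copy of such a file.

It rests on two assumptions:
- desc_reverse_offset is counted back from the end of the file. The header only says "distance to fsverity_descriptor".
- The helper names (find_descriptor(), descriptor_ok(), ext_padded_len(), descriptor_orig_file_size()) are hypothetical and are not part of the patchset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of struct fsverity_descriptor. It is used only for field
 * offsets and the size check, because on-disk fields are little-endian. */
struct fsverity_descriptor_image {
	uint8_t magic[8];          /* "FSVerity" */
	uint8_t major_version;     /* must be 1 */
	uint8_t minor_version;     /* must be 0 */
	uint8_t log_data_blocksize;
	uint8_t log_tree_blocksize;
	uint16_t data_algorithm;
	uint16_t tree_algorithm;
	uint32_t flags;
	uint32_t reserved1;
	uint64_t orig_file_size;
	uint16_t auth_ext_count;
	uint8_t reserved2[30];
};
static_assert(sizeof(struct fsverity_descriptor_image) == 64,
	      "the uapi comment says the descriptor is 64 bytes");

#define FOOTER_SIZE 12 /* __le32 desc_reverse_offset + 8-byte magic, packed */

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t get_le64(const uint8_t *p)
{
	return (uint64_t)get_le32(p) | (uint64_t)get_le32(p + 4) << 32;
}

/* Extension lengths exclude padding; the next extension starts at the
 * next 8-byte boundary. */
static size_t ext_padded_len(uint32_t length)
{
	return ((size_t)length + 7) & ~(size_t)7;
}

/* Returns the descriptor's offset in the file, or -1 if the footer is bad.
 * ASSUMPTION: desc_reverse_offset is measured back from the end of file. */
static long find_descriptor(const uint8_t *file, size_t file_size)
{
	const uint8_t *ftr;
	uint32_t rev;

	if (file_size < FOOTER_SIZE)
		return -1;
	ftr = file + file_size - FOOTER_SIZE;
	if (memcmp(ftr + 4, "FSVerity", 8) != 0)
		return -1;
	rev = get_le32(ftr);
	/* Must leave room for the descriptor and the footer itself. */
	if (rev < sizeof(struct fsverity_descriptor_image) + FOOTER_SIZE ||
	    rev > file_size)
		return -1;
	return (long)(file_size - rev);
}

/* Checks the descriptor magic and the 1.0 version numbers. */
static int descriptor_ok(const uint8_t *desc)
{
	return memcmp(desc, "FSVerity", 8) == 0 &&
	       desc[offsetof(struct fsverity_descriptor_image, major_version)] == 1 &&
	       desc[offsetof(struct fsverity_descriptor_image, minor_version)] == 0;
}

/* Decodes the little-endian orig_file_size field (offset 24). */
static uint64_t descriptor_orig_file_size(const uint8_t *desc)
{
	return get_le64(desc + offsetof(struct fsverity_descriptor_image,
					orig_file_size));
}
```

A real tool would also walk the auth_ext_count extensions that follow the descriptor, advancing by ext_padded_len(length) each time.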
From: Eric Biggers <[email protected]>
Make ext4_mpage_readpages() verify data as it is read from fs-verity
files, using the helper functions from fs/verity/.
As in the corresponding f2fs patch, compatibility with fscrypt
required refactoring the decryption workflow into a generic "post-read
processing" workflow, which can do decryption, verification, or both.
Co-developed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
fs/ext4/ext4.h | 2 +
fs/ext4/inode.c | 3 +
fs/ext4/readpage.c | 209 ++++++++++++++++++++++++++++++++++++++-------
fs/ext4/super.c | 9 +-
4 files changed, 191 insertions(+), 32 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index e5475a629ed80..80957f9d3cbef 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3101,6 +3101,8 @@ static inline void ext4_set_de_type(struct super_block *sb,
extern int ext4_mpage_readpages(struct address_space *mapping,
struct list_head *pages, struct page *page,
unsigned nr_pages, bool is_readahead);
+extern int __init ext4_init_post_read_processing(void);
+extern void ext4_exit_post_read_processing(void);
/* symlink.c */
extern const struct inode_operations ext4_encrypted_symlink_inode_operations;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c624c83bbad26..d7019f5dca6f1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3884,6 +3884,9 @@ static ssize_t ext4_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
return 0;
#endif
+ if (ext4_verity_inode(inode))
+ return 0;
+
/*
* If we are doing data journalling we don't support O_DIRECT
*/
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index f461d75ac049f..d3dd1ff745db8 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -47,6 +47,11 @@
#include "ext4.h"
+#define NUM_PREALLOC_POST_READ_CTXS 128
+
+static struct kmem_cache *bio_post_read_ctx_cache;
+static mempool_t *bio_post_read_ctx_pool;
+
static inline bool ext4_bio_encrypted(struct bio *bio)
{
#ifdef CONFIG_EXT4_FS_ENCRYPTION
@@ -56,6 +61,124 @@ static inline bool ext4_bio_encrypted(struct bio *bio)
#endif
}
+/* postprocessing steps for read bios */
+enum bio_post_read_step {
+ STEP_INITIAL = 0,
+ STEP_DECRYPT,
+ STEP_VERITY,
+};
+
+struct bio_post_read_ctx {
+ struct bio *bio;
+ struct work_struct work;
+ unsigned int cur_step;
+ unsigned int enabled_steps;
+};
+
+static void __read_end_io(struct bio *bio)
+{
+ struct page *page;
+ struct bio_vec *bv;
+ int i;
+
+ bio_for_each_segment_all(bv, bio, i) {
+ page = bv->bv_page;
+
+ /* PG_error was set if any post_read step failed */
+ if (bio->bi_status || PageError(page)) {
+ ClearPageUptodate(page);
+ SetPageError(page);
+ } else {
+ SetPageUptodate(page);
+ }
+ unlock_page(page);
+ }
+ if (bio->bi_private)
+ mempool_free(bio->bi_private, bio_post_read_ctx_pool);
+ bio_put(bio);
+}
+
+static void bio_post_read_processing(struct bio_post_read_ctx *ctx);
+
+static void decrypt_work(struct work_struct *work)
+{
+ struct bio_post_read_ctx *ctx =
+ container_of(work, struct bio_post_read_ctx, work);
+
+ fscrypt_decrypt_bio(ctx->bio);
+
+ bio_post_read_processing(ctx);
+}
+
+static void verity_work(struct work_struct *work)
+{
+ struct bio_post_read_ctx *ctx =
+ container_of(work, struct bio_post_read_ctx, work);
+
+ fsverity_verify_bio(ctx->bio);
+
+ bio_post_read_processing(ctx);
+}
+
+static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
+{
+ /*
+ * We use different work queues for decryption and for verity because
+ * verity may require reading metadata pages that need decryption, and
+ * we shouldn't recurse to the same workqueue.
+ */
+ switch (++ctx->cur_step) {
+ case STEP_DECRYPT:
+ if (ctx->enabled_steps & (1 << STEP_DECRYPT)) {
+ INIT_WORK(&ctx->work, decrypt_work);
+ fscrypt_enqueue_decrypt_work(&ctx->work);
+ return;
+ }
+ ctx->cur_step++;
+ /* fall-through */
+ case STEP_VERITY:
+ if (ctx->enabled_steps & (1 << STEP_VERITY)) {
+ INIT_WORK(&ctx->work, verity_work);
+ fsverity_enqueue_verify_work(&ctx->work);
+ return;
+ }
+ ctx->cur_step++;
+ /* fall-through */
+ default:
+ __read_end_io(ctx->bio);
+ }
+}
+
+static struct bio_post_read_ctx *get_bio_post_read_ctx(struct inode *inode,
+ struct bio *bio,
+ pgoff_t index)
+{
+ unsigned int post_read_steps = 0;
+ struct bio_post_read_ctx *ctx = NULL;
+
+ if (ext4_encrypted_inode(inode) && S_ISREG(inode->i_mode))
+ post_read_steps |= 1 << STEP_DECRYPT;
+#ifdef CONFIG_EXT4_FS_VERITY
+ if (inode->i_verity_info != NULL &&
+ (index < ((i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT)))
+ post_read_steps |= 1 << STEP_VERITY;
+#endif
+ if (post_read_steps) {
+ ctx = mempool_alloc(bio_post_read_ctx_pool, GFP_NOFS);
+ if (!ctx)
+ return ERR_PTR(-ENOMEM);
+ ctx->bio = bio;
+ ctx->enabled_steps = post_read_steps;
+ bio->bi_private = ctx;
+ }
+ return ctx;
+}
+
+static bool bio_post_read_required(struct bio *bio)
+{
+ return bio->bi_private && !bio->bi_status;
+}
+
/*
* I/O completion handler for multipage BIOs.
*
@@ -70,30 +193,31 @@ static inline bool ext4_bio_encrypted(struct bio *bio)
*/
static void mpage_end_io(struct bio *bio)
{
- struct bio_vec *bv;
- int i;
+ if (bio_post_read_required(bio)) {
+ struct bio_post_read_ctx *ctx = bio->bi_private;
- if (ext4_bio_encrypted(bio)) {
- if (bio->bi_status) {
- fscrypt_release_ctx(bio->bi_private);
- } else {
- fscrypt_enqueue_decrypt_bio(bio->bi_private, bio);
- return;
- }
+ ctx->cur_step = STEP_INITIAL;
+ bio_post_read_processing(ctx);
+ return;
}
- bio_for_each_segment_all(bv, bio, i) {
- struct page *page = bv->bv_page;
+ __read_end_io(bio);
+}
- if (!bio->bi_status) {
- SetPageUptodate(page);
- } else {
- ClearPageUptodate(page);
- SetPageError(page);
- }
- unlock_page(page);
+static inline loff_t ext4_readpage_limit(struct inode *inode)
+{
+#ifdef CONFIG_EXT4_FS_VERITY
+ if (ext4_verity_inode(inode)) {
+ if (inode->i_verity_info)
+ /* limit to end of metadata region */
+ return fsverity_full_i_size(inode);
+ /*
+ * fsverity_info is currently being set up and no user reads are
+ * allowed yet. It's easiest to just not enforce a limit yet.
+ */
+ return inode->i_sb->s_maxbytes;
}
-
- bio_put(bio);
+#endif
+ return i_size_read(inode);
}
int ext4_mpage_readpages(struct address_space *mapping,
@@ -140,7 +264,8 @@ int ext4_mpage_readpages(struct address_space *mapping,
block_in_file = (sector_t)page->index << (PAGE_SHIFT - blkbits);
last_block = block_in_file + nr_pages * blocks_per_page;
- last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits;
+ last_block_in_file = (ext4_readpage_limit(inode) +
+ blocksize - 1) >> blkbits;
if (last_block > last_block_in_file)
last_block = last_block_in_file;
page_block = 0;
@@ -217,6 +342,8 @@ int ext4_mpage_readpages(struct address_space *mapping,
zero_user_segment(page, first_hole << blkbits,
PAGE_SIZE);
if (first_hole == 0) {
+ if (!fsverity_check_hole(inode, page))
+ goto set_error_page;
SetPageUptodate(page);
unlock_page(page);
goto next_page;
@@ -240,19 +367,15 @@ int ext4_mpage_readpages(struct address_space *mapping,
bio = NULL;
}
if (bio == NULL) {
- struct fscrypt_ctx *ctx = NULL;
+ struct bio_post_read_ctx *ctx;
- if (ext4_encrypted_inode(inode) &&
- S_ISREG(inode->i_mode)) {
- ctx = fscrypt_get_ctx(inode, GFP_NOFS);
- if (IS_ERR(ctx))
- goto set_error_page;
- }
bio = bio_alloc(GFP_KERNEL,
min_t(int, nr_pages, BIO_MAX_PAGES));
- if (!bio) {
- if (ctx)
- fscrypt_release_ctx(ctx);
+ if (!bio)
+ goto set_error_page;
+ ctx = get_bio_post_read_ctx(inode, bio, page->index);
+ if (IS_ERR(ctx)) {
+ bio_put(bio);
goto set_error_page;
}
bio_set_dev(bio, bdev);
@@ -293,3 +416,27 @@ int ext4_mpage_readpages(struct address_space *mapping,
submit_bio(bio);
return 0;
}
+
+int __init ext4_init_post_read_processing(void)
+{
+ bio_post_read_ctx_cache = KMEM_CACHE(bio_post_read_ctx, 0);
+ if (!bio_post_read_ctx_cache)
+ goto fail;
+ bio_post_read_ctx_pool =
+ mempool_create_slab_pool(NUM_PREALLOC_POST_READ_CTXS,
+ bio_post_read_ctx_cache);
+ if (!bio_post_read_ctx_pool)
+ goto fail_free_cache;
+ return 0;
+
+fail_free_cache:
+ kmem_cache_destroy(bio_post_read_ctx_cache);
+fail:
+ return -ENOMEM;
+}
+
+void ext4_exit_post_read_processing(void)
+{
+ mempool_destroy(bio_post_read_ctx_pool);
+ kmem_cache_destroy(bio_post_read_ctx_cache);
+}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c4a66b64ea604..fb4e060f28ecb 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -6070,6 +6070,10 @@ static int __init ext4_init_fs(void)
return err;
err = ext4_init_pending();
+ if (err)
+ goto out7;
+
+ err = ext4_init_post_read_processing();
if (err)
goto out6;
@@ -6111,8 +6115,10 @@ static int __init ext4_init_fs(void)
out4:
ext4_exit_pageio();
out5:
- ext4_exit_pending();
+ ext4_exit_post_read_processing();
out6:
+ ext4_exit_pending();
+out7:
ext4_exit_es();
return err;
@@ -6129,6 +6135,7 @@ static void __exit ext4_exit_fs(void)
ext4_exit_sysfs();
ext4_exit_system_zone();
ext4_exit_pageio();
+ ext4_exit_post_read_processing();
ext4_exit_es();
ext4_exit_pending();
}
--
2.19.1.568.g152ad8e336-goog
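bio_post_read_processing() above is a small state machine. Each call advances cur_step, and any step not set in enabled_steps is skipped by falling through to the next case. The userspace model below replays that switch, calling the work functions directly instead of queueing them on the fscrypt and fs-verity workqueues, so the order of steps can be checked.

Only three things are taken from the patch: the step numbering, the switch, and the page-index test from get_bio_post_read_ctx(). struct model_ctx, run() and page_needs_verity() are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same numbering as enum bio_post_read_step in the ext4 patch. */
enum step { STEP_INITIAL = 0, STEP_DECRYPT, STEP_VERITY };

/* Hypothetical stand-in for struct bio_post_read_ctx (no bio, no work). */
struct model_ctx {
	unsigned int cur_step;
	unsigned int enabled_steps;
	char log[16];   /* 'D' = decrypt ran, 'V' = verity ran, 'E' = end_io */
	size_t nlog;
};

static void model_processing(struct model_ctx *ctx);

/* The kernel runs these from workqueues; here the recursion stands in for
 * the chain of queued work items. */
static void model_decrypt_work(struct model_ctx *ctx)
{
	ctx->log[ctx->nlog++] = 'D';
	model_processing(ctx);
}

static void model_verity_work(struct model_ctx *ctx)
{
	ctx->log[ctx->nlog++] = 'V';
	model_processing(ctx);
}

static void model_processing(struct model_ctx *ctx)
{
	switch (++ctx->cur_step) {
	case STEP_DECRYPT:
		if (ctx->enabled_steps & (1 << STEP_DECRYPT)) {
			model_decrypt_work(ctx);
			return;
		}
		ctx->cur_step++;
		/* fall through */
	case STEP_VERITY:
		if (ctx->enabled_steps & (1 << STEP_VERITY)) {
			model_verity_work(ctx);
			return;
		}
		ctx->cur_step++;
		/* fall through */
	default:
		ctx->log[ctx->nlog++] = 'E';   /* models __read_end_io() */
	}
}

/* Mirrors mpage_end_io(): reset the step counter, then start processing.
 * (With no steps enabled the kernel skips the ctx entirely.) */
static const char *run(unsigned int enabled_steps, struct model_ctx *ctx)
{
	memset(ctx, 0, sizeof(*ctx));
	ctx->enabled_steps = enabled_steps;
	ctx->cur_step = STEP_INITIAL;
	model_processing(ctx);
	return ctx->log;
}

/* Mirrors the index check in get_bio_post_read_ctx(): only pages below
 * i_size (rounded up to a page) hold data and get the verity step. */
static int page_needs_verity(unsigned long index, long long i_size,
			     unsigned int page_shift)
{
	long long page_size = 1LL << page_shift;
	unsigned long long data_pages =
		(unsigned long long)((i_size + page_size - 1) >> page_shift);

	return index < data_pages;
}
```

Because ext4 keeps i_size equal to the data size, the page-index test excludes the Merkle tree pages past EOF from the verity step.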
On Sat, Dec 22, 2018 at 02:47:22PM -0800, Linus Torvalds wrote:
> So I want to understand why this was made a filesystem operation in
> the first place. What's fs-specific about this implementation?
These are the things which are fs-specific.
*) We have to splice into the file system's readpage processing so we
can verify the merkle tree hash before we mark the page up-to-date.
This is most of the complexity involved in adding fs-verity
support, and that's because both ext4 and f2fs have their own
fs-specific readpage[s]() implementations, and both ext4 and f2fs
also support fscrypt, which *also* has to splice into readpage[s]().
*) The file system needs to define a file system feature bit in the
superblock which means, "this file system uses fs-verity" --- so
that old kernels will know that they need to refuse to mount the
file system (f2fs) or mount the file system read-only (ext4).
*) The file system needs to define an inode flag which is used to
indicate "this is a fs-verity protected file". This flag is not
user-visible, so the file system just has to provide a single bit
in the inode structure and a function which tests that bit.
*) Ext4 chose to make the on-disk i_size the size of the data. We did
this so that the fs-verity feature for ext4 could be a
read-only compat feature (i.e., an old kernel can safely mount a
file system with fs-verity protected files, but only read-only).
This adds a bit more complexity for ext4 in that we need to look in
our extent tree to find the last block in the file (which is where
the fs-verity superblock is located).
For f2fs, it can just use the on-disk i_size to find the fs-verity
superblock, and from that, f2fs can find the original data
i_size (which is then presented to userspace when it calls
stat(2)).
As far as the last point is concerned, ext4 could have done things the
f2fs way, which is simpler, and which would have allowed us to make things
much more generic. However, being able to support read-only mounts of
file systems with fs-verity protected files was important to me.
Everything else is generic and we tried to factor out as much common
code as possible into fs/verity. But the model has always been that
at least *some* changes would be needed in the file system to call out
to the fs-verity code, primarily because we didn't want to make
changes to the readpage()/readpages() VFS<->low-level fs interface. That
would have required making changes in dozens of file systems. While
that would have allowed us to factor out some duplicated code in
{ext4,f2fs}_readpage[s](), right now only those two file systems,
out of 70 or so, support fscrypt and fs-verity. It's just not
worth it.
- Ted
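Ted's last point is visible in the ext4 patch. Because ext4 keeps the on-disk i_size equal to the data size, ext4_mpage_readpages() computes last_block_in_file from ext4_readpage_limit() rather than from i_size_read(). Otherwise the Merkle tree blocks past EOF could never be read.

The sketch below models that calculation in userspace. struct model_inode and readpage_limit() are hypothetical stand-ins for the inode state and for ext4_readpage_limit()/fsverity_full_i_size().

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the last_block_in_file computation in ext4_mpage_readpages(). */
static uint64_t last_block_in_file(uint64_t limit, unsigned int blkbits)
{
	uint64_t blocksize = (uint64_t)1 << blkbits;

	return (limit + blocksize - 1) >> blkbits;
}

/* Hypothetical model of the inode fields ext4_readpage_limit() looks at. */
struct model_inode {
	int is_verity;          /* ext4_verity_inode() */
	int has_verity_info;    /* inode->i_verity_info != NULL */
	uint64_t i_size;        /* ext4: on-disk i_size == data size */
	uint64_t metadata_end;  /* what fsverity_full_i_size() returns */
	uint64_t s_maxbytes;
};

static uint64_t readpage_limit(const struct model_inode *inode)
{
	if (inode->is_verity) {
		if (inode->has_verity_info)
			return inode->metadata_end;  /* include Merkle tree */
		return inode->s_maxbytes;        /* info still being set up */
	}
	return inode->i_size;
}
```

For a 10000-byte file whose metadata ends at 20480 bytes, with 4096-byte blocks, using i_size alone would stop reads after 3 blocks. The verity-aware limit allows 5 blocks.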
Hi Christoph,
On Thu, Dec 13, 2018 at 12:22:49PM -0800, Christoph Hellwig wrote:
> On Wed, Dec 12, 2018 at 12:26:10PM -0800, Eric Biggers wrote:
> > > As this apparently got merged despite no proper reviews from VFS
> > > level persons:
> >
> > fs-verity has been out for review since August, and Cc'ed to all relevant
> > mailing lists including linux-fsdevel, linux-ext4, linux-f2fs-devel,
> > linux-fscrypt, linux-integrity, and linux-kernel. There are tests,
> > documentation (since v2), and a userspace tool. It's also been presented at
> > multiple conferences, and has been covered by LWN multiple times. If more
> > people want to review it, then they should do so; there's nothing stopping them.
>
> But you did not get a review from someone like Al, Linus, Andrew or me,
> did you?
Sure, those specific people (modulo you just now) haven't responded to the
fs-verity patches yet. But again, the patches have been out for review for
months. Of course, we always prefer more reviews over fewer, and we strongly
encourage anyone interested to review fs-verity! (The Documentation/ file may
be a good place to start.) But ultimately we cannot force reviews, and as you
know kernel reviews can be very hard to come by. Yet, people still need
fs-verity anyway; it isn't just some toy. And we're committed to maintaining
it, similar to fscrypt. The ext4 and f2fs maintainers are also satisfied with
the current approach to storing the verity metadata past EOF; in fact it was
even originally Ted's idea, I think.
>
> > Can you elaborate on the actual problems you think the current solution has, and
> > exactly what solution you'd prefer instead? Keep in mind that (1) for large
> > files the Merkle tree can be gigabytes long, (2) Linux doesn't have an API for
> > file streams, and (3) when fs-verity is combined with fscrypt, it's important
> > that the hashes be encrypted, so as to not leak information about the plaintext.
>
> Given that you already use an ioctl as the interface, what is the problem
> of passing this data through the ioctl?
Do you mean pass the verity metadata in a buffer? That cannot work in general,
because it may be too large to fit into memory.
Or do you mean pass it via a second file descriptor? That could work, but it
doesn't seem better than the current approach. It would force every filesystem
to move the metadata around, whereas currently ext4 and f2fs can simply leave it
in place. If you meant this, are there advantages you have in mind that would
outweigh this?
We also considered generating the Merkle tree in the kernel, in which case
FS_IOC_ENABLE_VERITY would just take a small structure similar to the current
fsverity_descriptor. But that would add extra complexity to the kernel, and
generating a Merkle tree over a large file is the type of parallelizable,
CPU-intensive work that really should be done in userspace. Also, having userspace
provide the Merkle tree allows for it to be pre-generated and distributed with
the file, e.g. provided in a package to be installed on many systems.
But please do let us know if you have any better ideas.
Thanks!
- Eric
On Thu, Dec 20, 2018 at 11:04:47PM -0800, Christoph Hellwig wrote:
> Ted, I think you know yourself this isn't true. Whenever we added a
> useful interface to one of the major file systems, we had others pick
> it up, and that is a good thing because the last thing we need is
> fragmentation of interfaces. And even if that wasn't the case I don't
> think we should take shortcuts, because even if an interface was just
> for a file system or two it still needs to be properly designed.
This is why I think the interface argument is totally bogus.
If you're OK with Darrick's suggested interface, where you pass in a
file descriptor, offset and length, that's just a superset of the
current interface, in which the Merkle tree is always in the file
which is going to be protected using fs-verity. So if you're
OK with that interface, we can add it later, and it's
really no big deal; it certainly doesn't add any extra complexity for
XFS, assuming that XFS even gets around to adding support for
fs-verity.
Adding that extra complexity is not necessary for the current users of
the interface, and as I've said multiple times before, there's no
*value* in allowing the Merkle tree to be passed in via some arbitrary
file descriptor, which might even be on a separate file system, as
opposed to the file which is about to be protected using fs-verity.
Linus --- we're going round and round, and I don't think this is
really a technical dispute at this point, but rather an aesthetics
one. Would you be willing to accept my pull request for a feature
which is being shipped on millions of Android phones, has been out
for review for months, and for which, if we *really* need to add a
uselessly complicated interface later, we can do that? It's always
been the case for internal kernel interfaces not to add code "just in
case" it's useful, but rather when a user turns up. I argue we should
be doing the same thing for user-space visible interfaces.
Regards,
- Ted
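Ted's "superset" argument can be made concrete. The sketch below is entirely hypothetical. struct merkle_source is an illustrative argument for an enable call in the style Darrick suggested (file descriptor, offset, length); it is not an interface from this patchset or from any kernel.

The sketch shows that the current in-place layout is just one particular value of that argument. In that layout, the metadata follows the data in the file being protected.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL argument describing where the Merkle tree + descriptor are.
 * Not part of this patchset. */
struct merkle_source {
	int32_t fd;        /* file holding the verity metadata */
	uint32_t reserved; /* must be 0; keeps the 64-bit fields aligned */
	uint64_t offset;   /* where the metadata starts in that file */
	uint64_t length;   /* number of metadata bytes */
};

/* Today's FS_IOC_ENABLE_VERITY layout expressed with the hypothetical
 * struct: metadata lives in the target file, from end of data to EOF. */
static struct merkle_source current_layout(int target_fd, uint64_t data_size,
					   uint64_t file_size)
{
	struct merkle_source src = {
		.fd = target_fd,
		.reserved = 0,
		.offset = data_size,
		.length = file_size - data_size,
	};
	return src;
}

/* True if a request could be served by the existing in-place scheme. */
static int is_in_place(const struct merkle_source *src, int target_fd,
		       uint64_t data_size, uint64_t file_size)
{
	return src->fd == target_fd && src->offset == data_size &&
	       src->offset + src->length == file_size;
}
```

Any request with a different fd or offset would need the filesystem to copy or move the metadata. That is the extra work the thread disagrees about.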
On Thu, Dec 20, 2018 at 08:35:52AM +1100, Dave Chinner wrote:
>
> The file has to be written before it has been protected, which means
> it may very well have user space allocated beyond EOF before the
> merkle tree needs to be written.
Sure, and every file system knows how to truncate a file. This isn't
hard.
> But whether or not fsverity is enabled on the filesystem, the fact
> is that the kernel code now has to support storing and reading data
> from beyond EOF. Every user, whether they are using fsverity or not,
> is now exposed to that code and a filesystem that no longer
> considers the user data region beyond EOF as write only.
That's simply not true. Number one, fsverity is not mandatory for all
file systems to implement. If XFS doesn't want to implement fscrypt
or fsverity, it doesn't have to. Number two, we're not *making* any
changes to the kernel code; nothing in mm/filemap.c et al. So
saying that we are making changes that impact /everyone/ just
doesn't make any sense.
> How filesystems store and retrieve merkle tree data should be a
> filesystem internal detail. If how metadata is stored in the
> filesystem is defined by the userspace API or the kernel library
> code that implements the verification feature, then it lacks the
> necessary abstraction to be a generic Linux filesystem feature.
> IOWs, it needs to be redesigned and reworked before we should
> consider it for merging.
I disagree with your aesthetics that the interface has to be
completely isolated from the implementation. If you don't want to
call it a generic file system feature, fine. It can just be something
that f2fs and ext4 use.
- Ted
On Fri, Dec 21, 2018 at 10:47:14AM -0500, Theodore Y. Ts'o wrote:
> Linus --- we're going round and round, and I don't think this is
> really a technical dispute at this point, but rather an aesthetics
> one. Would you be willing to accept my pull request for a feature
> which is being shipped on millions of Android phones, has been out
> for review for months, and for which, if we *really* need to add a
> uselessly complicated interface later, we can do that? It's always
> been the case for internal kernel interfaces not to add code "just in
> case" it's useful, but rather when a user turns up. I argue we should
> be doing the same thing for user-space visible interfaces.
To look at it another way, this is an aesthetic dispute in which all those
who have offered opinions from outside Google (myself, Dave Chinner and
Christoph) really dislike this interface. I'd be happy to discuss
alternative interfaces, particularly ones which allow for the current
internal implementation, but I think this interface is really bad.
In contrast to "we'll just fix it up later" (which usually applies
to in-kernel interfaces), we have a policy of not breaking userspace,
so accepting this interface means setting it in stone. We should get
it right.
On Thu, Dec 13, 2018 at 08:48:03PM -0800, Eric Biggers wrote:
> Sure, those specific people (modulo you just now) haven't responded to the
> fs-verity patches yet. But again, the patches have been out for review for
> months. Of course, we always prefer more reviews over fewer, and we strongly
> encourage anyone interested to review fs-verity! (The Documentation/ file may
> be a good place to start.) But ultimately we cannot force reviews, and as you
> know kernel reviews can be very hard to come by. Yet, people still need
> fs-verity anyway; it isn't just some toy. And we're committed to maintaining
> it, similar to fscrypt. The ext4 and f2fs maintainers are also satisfied with
> the current approach to storing the verity metadata past EOF; in fact it was
> even originally Ted's idea, I think.
But you also can't force inclusion. And Linus just recently complained
about merging common code patches through trees for a specific fs
without proper VFS ACKs. And that was for a case without userspace
ABI implications, so we really need a much better review here,
including an ACK from a VFS person, a CC to linux-abi, and man pages
for the interface.
> > Given that you already use an ioctl as the interface, what is the problem
> > of passing this data through the ioctl?
>
> Do you mean pass the verity metadata in a buffer? That cannot work in general,
> because it may be too large to fit into memory.
Have a pointer in the ioctl and do get_user_pages on it.
On Fri, Dec 21, 2018 at 8:20 PM Theodore Y. Ts'o <[email protected]> wrote:
>
> On Fri, Dec 21, 2018 at 11:13:07AM -0800, Linus Torvalds wrote:
> >
> > In other words: either the model is that the file *itself* contains
> > its own merkle tree that validates the file, or it isn't. You can't
> > have it two ways. No silly "layout changes when you apply the hash"
> > garbage. That's just crazy talk and invalidates the whole model.
>
> Userspace applications which are reading the file aren't going to be
> expecting Merkle tree. For example, one of the use cases is Android
> APK files, which are essentially ZIP files. ZIP files can be parsed
> either from the front (streaming), or by looking for the complete
> directory of all of the files in the ZIP file by starting at the end
> of the file and moving backwards. If the Merkle tree was visible to
> userspace programs that are opening and reading the file, it would
> confuse them mightily.
>
> So what we do for ext4 and f2fs is make the Merkle tree invisible
Again, this has nothing that is per-filesystem in it.
If we were to decide to support the notion of "append merkle hashes to
the file for validation" at the vfs layer, the same logic would apply:
obviously the merkle data shouldn't be visible to user space.
But that's not a reason to do it at a filesystem layer, quite the
reverse: exactly like you say, as far as the *filesystem* is
concerned, the data is there in the file. It's literally about the
*view* of the file, ie the system call interface:
> From the *file system's* perspective,
> though, the metadata blocks are part of the file.
To me that only argues that this all should be at the vfs layer, and
that it shouldn't be the filesystem that hides it. Exactly because as
far as the filesystem is concerned, the merkle data is there, it's
just that we hide it at read (and stat) time.
Preferably some way where it's namespace-dependent or whatever, so
that you could still access the original file data from user space if
you want to (eg some backup purpose or other).
What I'm missing is any kind of sane explanation for why it was done
so badly, and why it should be upstreamed despite the apparent bad
implementation.
It sounds like a complete hack.
Again, to me either the point is that it's a generic extension of the
file data, _or_ it's some filesystem-specific hidden data. The way
you've done it and written the documentation, it's clearly a generic
extension of normal file data, and I don't see what's fs-specific to
it.
> The problem is that xattrs are designed to be accessed via a set/get
> interface, and are currently limited, IIRC, to 32k. The max size of an APK
> is 300 megabytes; and the Merkle tree for a file that size will be
> about 2.3 megabytes. That's way too big to store as an xattr;
> certainly using the existing xattr interfaces. And it's also bigger
> than most file systems can handle as xattrs today --- because they've
> been optimized for relatively small sizes, for things like SELinux
> labels and ACL structures.
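For scale, a rough sketch of the tree-size arithmetic behind these figures. It assumes SHA-256 with 4K blocks (128 hashes per block, the recommended configuration quoted later in the thread) and assumes the tree stops once a level fits in a single block, whose hash is the root. It counts hash blocks only, not the descriptor or footer; the function name is hypothetical:

```c
#include <stdint.h>

/* Count the hash blocks in a Merkle tree over data_size bytes, assuming
 * block_size-byte blocks and digest_size-byte hashes. Each level holds one
 * hash per block of the level below; stop when a level is a single block. */
static uint64_t merkle_tree_blocks(uint64_t data_size, uint32_t block_size,
                                   uint32_t digest_size)
{
    const uint64_t per_block = block_size / digest_size; /* 4096 / 32 = 128 */
    uint64_t n = (data_size + block_size - 1) / block_size; /* data blocks */
    uint64_t total = 0;

    while (n > 1) {
        n = (n + per_block - 1) / per_block; /* blocks needed one level up */
        total += n;
    }
    return total;
}
```

Under these assumptions a 300,000,000-byte file needs 579 hash blocks (about 2.37 MB) and a 300 MiB file needs 606 (about 2.48 MB), the same order as the estimate above and far beyond a 32k xattr.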
So *this* kind of argument is what I'm looking for.
That at least explains why it's not an xattr. Ugly, but understandable.
> > So why is this sold as some unholy mess of "filesystem-specific" and
> > "generic"? That part just annoys the hell out of me. Why isn't this
> > sold as an *actual* generic model, where you just say "append the
> > merkle tree to the file, then enable verity testing of the end result
> > and validate the top-level hash".
>
> That was the original way it was sold, but Christoph and Dave have
> NACK'ed it in that form.
That seems entirely irrelevant. What do Christoph and Dave have to do
with it once it's generic? It would have _zero_ filesystem component
if it's actually done in a generic manner. It would be a total no-op
to XFS.
Which makes me think "it wasn't actually sold as being
filesystem-independent" at all.
So I want to understand why this was made a filesystem operation in
the first place. What's fs-specific about this implementation?
Linus
On Mon, Dec 17, 2018 at 12:00:39PM -0800, Darrick J. Wong wrote:
> FWIW, if I were (hypothetically) working on an xfs implementation, I
> likely would have settled on passing a reference to a merkle tree
> through a (fd, length) pair, because that allows us plenty of options
> on the back end:
>
> b) we could remap the tree into a new inode fork for merkle trees, or
> a) remap it as posteof blocks like ext4/f2fs does, or
> c) remap the blocks into the attribute fork as an (unusually large)
> extended attribute value.
Sure, but what would be the benefit of doing different things on the
back end? I think this is really more of a philosophical objection
than anything else. With both fsverity and fscrypt, well over 95% of
the implementation is shared between ext4 and f2fs. And from a
cryptographic design perspective, that's something I consider a feature, not a
bug. Cryptographic code is subtle in very different ways compared to
file system code. So it's a good thing to have it done once and
audited by crypto specialists, as opposed to having each file system
doing it differently / independently.
> If the merkle_fd isn't on the same filesystem as the fd we could at
> least use generic_copy_file_range (i.e. page cache copying) to land the
> merkle tree wherever we want.
>
> Granted, it's not like we can't do any of those three things given the
> current interface. I gather most of the grumbling has to do with
> feeling like we're associating the on-disk format to the ioctl interface
> too closely?
Right, the current interface makes it somewhat more awkward to do
these other things --- but the question is *why* would you want to in
the first place? Why add the extra complexity? I'm a big believer of
the KISS principle, and if there was a reason why a file system would
want to store the Merkle tree somewhere else, we could talk about it,
but I see only downside, and no upside.
- Ted
Hi Christoph,
On Wed, Dec 12, 2018 at 01:14:06AM -0800, Christoph Hellwig wrote:
> As this apparently got merged despite no proper reviews from VFS
> level persons:
fs-verity has been out for review since August, and Cc'ed to all relevant
mailing lists including linux-fsdevel, linux-ext4, linux-f2fs-devel,
linux-fscrypt, linux-integrity, and linux-kernel. There are tests,
documentation (since v2), and a userspace tool. It's also been presented at
multiple conferences, and has been covered by LWN multiple times. If more
people want to review it, then they should do so; there's nothing stopping them.
>
> NAK - the ioctl format that expects the verification hash in the file
> data with padding after the real data is simply not acceptable,
> we can't just transform the data in the file itself based on magic
> calls like this.
>
Can you elaborate on the actual problems you think the current solution has, and
exactly what solution you'd prefer instead? Keep in mind that (1) for large
files the Merkle tree can be gigabytes long, (2) Linux doesn't have an API for
file streams, and (3) when fs-verity is combined with fscrypt, it's important
that the hashes be encrypted, so as to not leak information about the plaintext.
> Also the core code should not depend on this as a storage format,
> which is a rather bad idea. In any modern file system you can
> store data like this out of line in something like the attr fork
> in XFS, or the attr items in btrfs.
As explained in the documentation, the core code uses the "metadata after EOF"
format for the API, but not necessarily the on-disk format. I.e.,
FS_IOC_ENABLE_VERITY requires it, but during the ioctl the filesystem can choose
to move the metadata into a different location, such as a file stream.
We'd just need to update fsverity_read_metadata_page() and
compute_tree_depth_and_offsets() to call out to the filesystem's
fsverity_operations to read a metadata page and get the offset of the first
metadata page, respectively. The rest of fs/verity/ will still work. I'd be
glad to add those two fsverity_operations now, though they're not needed for
ext4 and f2fs, if it would help clarify things further.
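The point here is that only two places need to know where the metadata lives. A minimal userspace model of that split, assuming two hypothetical hooks (`first_metadata_page` and `read_metadata_page`; neither name nor signature comes from the patchset) and a toy 16-byte "page", shows the same generic lookup working against both an in-place past-EOF layout and a separate store:

```c
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 16 /* tiny "page" so the example stays small */

/* Hypothetical per-filesystem hooks modeled on the two callouts above. */
struct verity_ops {
    /* Index of the first metadata page, in the space read_metadata_page() uses. */
    uint64_t (*first_metadata_page)(const void *file);
    /* Copy metadata page `index` into out; 0 on success, -1 if out of range. */
    int (*read_metadata_page)(const void *file, uint64_t index, uint8_t *out);
};

/* Backend 1: ext4/f2fs-style, metadata left in place after the page-aligned data. */
struct past_eof_file {
    const uint8_t *bytes; /* data, zero padding, then metadata */
    uint64_t data_pages;  /* i_size rounded up to whole pages */
    uint64_t total_pages;
};

static uint64_t past_eof_first(const void *file)
{
    return ((const struct past_eof_file *)file)->data_pages;
}

static int past_eof_read(const void *file, uint64_t index, uint8_t *out)
{
    const struct past_eof_file *f = file;
    if (index < f->data_pages || index >= f->total_pages)
        return -1;
    memcpy(out, f->bytes + index * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
    return 0;
}

/* Backend 2: metadata moved elsewhere (e.g. an attr fork) during the ioctl. */
struct separate_store_file {
    const uint8_t *metadata;
    uint64_t metadata_pages;
};

static uint64_t separate_first(const void *file)
{
    (void)file;
    return 0; /* metadata starts at page 0 of its own store */
}

static int separate_read(const void *file, uint64_t index, uint8_t *out)
{
    const struct separate_store_file *f = file;
    if (index >= f->metadata_pages)
        return -1;
    memcpy(out, f->metadata + index * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
    return 0;
}

static const struct verity_ops past_eof_ops = { past_eof_first, past_eof_read };
static const struct verity_ops separate_ops = { separate_first, separate_read };

/* Generic code: fetch page `tree_page` of the Merkle tree without knowing
 * where the filesystem keeps it. */
static int read_tree_page(const struct verity_ops *ops, const void *file,
                          uint64_t tree_page, uint8_t *out)
{
    return ops->read_metadata_page(file, ops->first_metadata_page(file) + tree_page, out);
}
```

Only the two backend functions differ; the caller of `read_tree_page()` is unchanged whichever layout a filesystem picks.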
Thanks,
- Eric
On Wed, Dec 19, 2018 at 01:19:53PM +1100, Dave Chinner wrote:
> Putting metadata in user files beyond EOF doesn't work with XFS's
> post-EOF speculative allocation algorithms.
>
> i.e. Filesystem design/algorithms often assume that the region
> beyond EOF in user files is a write-only region. e.g. We can allow
> extents beyond EOF to be uninitialised because they are in a write
> only region of the file and so there's no possibility of stale data
> exposure. Unfortunately, putting filesystem/security metadata beyond
> EOF breaks these assumptions - it's no longer a write-only region.
On Tue, Dec 18, 2018 at 11:14:20PM -0800, Christoph Hellwig wrote:
> Filesystems already use blocks beyond EOF for preallocation, either
> speculative by the file system itself, or explicitly by the user with
> fallocate. I bet you will run into bugs with your creative abuse
> sooner or later. Independent of that, the interface simply is gross, which
> is enough of a reason not to merge it.
Both of these concerns aren't applicable for fs-verity because the
entire file will be read-only. So there will be no preallocation or
fallocation going on --- or allowed --- for a file which is protected
by fs-verity. Since no writes are allowed at all, it won't break any
file systems' assumptions about "write-only regions".
As far as whether it's "gross" --- that's a taste question, and I
happen to think it's more "clever" than "gross". It allows for a very
simple implementation, *leveraging* the fact that the file will never
change --- and especially, grow in length. So why not use the space
after EOF?
The alternative requires adding Solaris-style alternate data streams
support. Whether or not ADS is a good idea or just an invitation to
malware authors[1] is something which can be debated, but my position
is it's unnecessary given the requirements of fs-verity. And avoiding
such complexity is a *good* thing, not a bad thing.
[1] https://www.deepinstinct.com/2018/06/12/the-abuse-of-alternate-data-stream-hasnt-disappeared/
- Ted
On Fri, Dec 21, 2018 at 7:47 AM Theodore Y. Ts'o wrote:
>
> Linus --- we're going round and round, and I don't think this is
> really a technical dispute at this point, but rather an aesthetics
> one.
Grr.
So honestly, I personally *like* the model of "the file contains its
own validation data" model. I think that's the right model, so that
you can then basically just do "enable verification on this file, and
verify that the root hash is this".
So that part I like. I think the people who argue for "let's have a
separate interface that writes the merkle tree data" are completely
wrong.
HOWEVER.
I do agree that your particular model is pretty damn broken in lots of ways.
Why is it filesystem specific? If the whole point is that the file
itself has its own verification data (which I like), then I don't see
why this is then documented as some filesystem-specific layout model.
That's complete and utter garbage.
In other words: either the model is that the file *itself* contains
its own merkle tree that validates the file, or it isn't. You can't
have it two ways. No silly "layout changes when you apply the hash"
garbage. That's just crazy talk and invalidates the whole model.
And honestly, I still think that it's very odd to add the merkle data
to the end, when the filesystem already supports xattrs. It would have
made much more sense to just make one xattr contain the merkle tree
validation data.
So why is this sold as some unholy mess of "filesystem-specific" and
"generic"? That part just annoys the hell out of me. Why isn't this
sold as an *actual* generic model, where you just say "append the
merkle tree to the file, then enable verity testing of the end result
and validate the top-level hash".
That kind of thing could be done with absolutely _zero_ per-filesystem
code, and made 100% generic, and we'd just verify the merkle data in
readpages().
So what's the excuse for doing the crazy odd "let's just support one
single filesystem" model?
Linus
On Wed, Dec 12, 2018 at 12:26:10PM -0800, Eric Biggers wrote:
> > As this apparently got merged despite no proper reviews from VFS
> > level persons:
>
> fs-verity has been out for review since August, and Cc'ed to all relevant
> mailing lists including linux-fsdevel, linux-ext4, linux-f2fs-devel,
> linux-fscrypt, linux-integrity, and linux-kernel. There are tests,
> documentation (since v2), and a userspace tool. It's also been presented at
> multiple conferences, and has been covered by LWN multiple times. If more
> people want to review it, then they should do so; there's nothing stopping them.
But you did not get a review from someone like Al, Linus, Andrew or me,
did you?
> Can you elaborate on the actual problems you think the current solution has, and
> exactly what solution you'd prefer instead? Keep in mind that (1) for large
> files the Merkle tree can be gigabytes long, (2) Linux doesn't have an API for
> file streams, and (3) when fs-verity is combined with fscrypt, it's important
> that the hashes be encrypted, so as to not leak information about the plaintext.
Given that you already use an ioctl as the interface, what is the problem
of passing this data through the ioctl?
On Thu, Dec 20, 2018 at 05:01:58PM -0500, Theodore Y. Ts'o wrote:
> That's simply not true. Number one, fsverity is not mandatory for all
> file systems to implement. If XFS doesn't want to implement fscrypt
> or fsverity, it doesn't have to. Number two, we're not *making* any
> changes to the kernel code; nothing in mm/filemap.c, et al. So
> saying that we are making changes that are impacted by /everyone/ just
> doesn't make any sense.
Ted, I think you know yourself this isn't true. Whenever we added
useful interface to one of the major file systems we had others pick
it up, and that is a good thing because the last thing we need is
fragmentation of interfaces. And even if that wasn't the case I don't
think we should take shortcuts, because even if an interface was just
for a file system or two it still needs to be properly designed.
There is no reason to rush interfaces in, because every time we have done
that it has turned out to be a very bad idea in retrospect.
On Fri, Dec 21, 2018 at 11:17:12PM -0500, Theodore Y. Ts'o wrote:
> Userspace applications which are reading the file aren't going to be
> expecting Merkle tree. For example, one of the use cases is Android
> APK files, which are essentially ZIP files. ZIP files can be parsed
> both from the front-end (streaming), or by looking for the complete
> directory of all of the files in the ZIP file by starting at the end
> of the file and moving backwards. If the Merkle tree was visible to
> userspace programs that are opening and reading the file, it would
> confuse them mightily.
Pretty much every file format has the ability to put arbitrary blocks
of information into a file somewhere the tools which don't know about
it will skip it. For example, ZIP "includes an extra field facility
within file headers, which can be used to store extra data not defined
by existing ZIP specifications, and which allow compliant archivers that
do not recognize the fields to safely skip them. Header IDs 0–31 are
reserved for use by PKWARE. The remaining IDs can be used by third-party
vendors for proprietary usage. " (Wikipedia)
ELF, PNG, PDF and many other formats have the ability to put data
_somewhere_. It might not be at the tail of the file, but there's
somewhere to do it.
(I appreciate this isn't what Linus is asking for, but I'm pointing out
that this is by no means as intractable as you make it sound.)
As this apparently got merged despite no proper reviews from VFS
level persons:
NAK - the ioctl format that expects the verification hash in the file
data with padding after the real data is simply not acceptable,
we can't just transform the data in the file itself based on magic
calls like this.
Also the core code should not depend on this as a storage format,
which is a rather bad idea. In any modern file system you can
store data like this out of line in something like the attr fork
in XFS, or the attr items in btrfs.
[FYI, your mail never made it to my inbox, although I found the copy
in linux-fsdevel now]
On Fri, Dec 14, 2018 at 12:17:22AM -0500, Theodore Y. Ts'o wrote:
> I don't consider fs-verity to be part of core VFS, but rather a
> library that happens to be used by ext4 and f2fs. This is much like
> fscrypt, which was originally an ext4-only thing, but the code was
> always set up so it could be used by other file systems, and when f2fs
> was interested in using it, we moved it to fs/crypto. As such the
> fscrypto code never got a review from Al, Andrew, or you, and when I
> pushed it to Linus, he accepted the pull request.
And as a result we are stuck with a pretty bad interface, so this is
a very good example of how not to do things! Just because a user
interface is only implemented by one or two file systems doesn't mean
it should skip the userspace ABI review, because we tend to generalize
them unless they are deeply specific to fs internals.
> P.S. And if you've purchased a Pixel 3 device, it's already using the
> fsverity code, so it's quite well tested (and yes, we have xfstests).
And all kinds of other code that would never pass review, so that isn't
really a good argument unfortunately :( Note that I would not want to buy
a piece of hardware coming with google spyware preinstalled.
On Tue, Dec 18, 2018 at 07:16:03PM -0500, Theodore Y. Ts'o wrote:
> Sure, but what would be the benefit of doing different things on the
> back end? I think this is really more of a philosophical objection
> than anything else. With both fsverity and fscrypt, well over 95% of
> the implementation is shared between ext4 and f2fs. And from a
> cryptographic design perspective, that's something I consider a feature, not a
> bug. Cryptographic code is subtle in very different ways compared to
> file system code. So it's a good thing to have it done once and
> audited by crypto specialists, as opposed to having each file system
> doing it differently / independently.
Where the data is located on disk should not matter for the crypto
details. If it does you have severe implementation issues.
> Right, the current interface makes it somewhat more awkward to do
> these other things --- but the question is *why* would you want to in
> the first place? Why add the extra complexity? I'm a big believer of
> the KISS principle, and if there was a reason why a file system would
> want to store the Merkle tree somewhere else, we could talk about it,
> but I see only downside, and no upside.
Filesystems already use blocks beyond EOF for preallocation, either
speculative by the file system itself, or explicitly by the user with
fallocate. I bet you will run into bugs with your creative abuse
sooner or later. Independent of that, the interface simply is gross, which
is enough of a reason not to merge it.
On Thu, Dec 13, 2018 at 08:48:03PM -0800, Eric Biggers wrote:
> Hi Christoph,
>
> On Thu, Dec 13, 2018 at 12:22:49PM -0800, Christoph Hellwig wrote:
> > On Wed, Dec 12, 2018 at 12:26:10PM -0800, Eric Biggers wrote:
> > > > As this apparently got merged despite no proper reviews from VFS
> > > > level persons:
> > >
> > > fs-verity has been out for review since August, and Cc'ed to all relevant
> > > mailing lists including linux-fsdevel, linux-ext4, linux-f2fs-devel,
> > > linux-fscrypt, linux-integrity, and linux-kernel. There are tests,
> > > documentation (since v2), and a userspace tool. It's also been presented at
> > > multiple conferences, and has been covered by LWN multiple times. If more
> > > people want to review it, then they should do so; there's nothing stopping them.
> >
> > But you did not get a review from someone like Al, Linus, Andrew or me,
> > did you?
>
> Sure, those specific people (modulo you just now) haven't responded to the
> fs-verity patches yet. But again, the patches have been out for review for
> months. Of course, we always prefer more reviews over fewer, and we strongly
> encourage anyone interested to review fs-verity! (The Documentation/ file may
> be a good place to start.) But ultimately we cannot force reviews, and as you
> know kernel reviews can be very hard to come by. Yet, people still need
> fs-verity anyway; it isn't just some toy. And we're committed to maintaining
> it, similar to fscrypt. The ext4 and f2fs maintainers are also satisfied with
> the current approach to storing the verity metadata past EOF; in fact it was
> even originally Ted's idea, I think.
>
> >
> > > Can you elaborate on the actual problems you think the current solution has, and
> > > exactly what solution you'd prefer instead? Keep in mind that (1) for large
> > > files the Merkle tree can be gigabytes long, (2) Linux doesn't have an API for
> > > file streams, and (3) when fs-verity is combined with fscrypt, it's important
> > > that the hashes be encrypted, so as to not leak information about the plaintext.
> >
> > Given that you already use an ioctl as the interface, what is the problem
> > of passing this data through the ioctl?
>
> Do you mean pass the verity metadata in a buffer? That cannot work in general,
> because it may be too large to fit into memory.
>
> Or do you mean pass it via a second file descriptor? That could work, but it
> doesn't seem better than the current approach. It would force every filesystem
> to move the metadata around, whereas currently ext4 and f2fs can simply leave it
> in place. If you meant this, are there advantages you have in mind that would
> outweigh this?
FWIW, if I were (hypothetically) working on an xfs implementation, I
likely would have settled on passing a reference to a merkle tree
through a (fd, length) pair, because that allows us plenty of options
on the back end:
b) we could remap the tree into a new inode fork for merkle trees, or
a) remap it as posteof blocks like ext4/f2fs does, or
c) remap the blocks into the attribute fork as an (unusually large)
extended attribute value.
If the merkle_fd isn't on the same filesystem as the fd we could at
least use generic_copy_file_range (i.e. page cache copying) to land the
merkle tree wherever we want.
Granted, it's not like we can't do any of those three things given the
current interface. I gather most of the grumbling has to do with
feeling like we're associating the on-disk format to the ioctl interface
too closely?
I certainly can see why you'd want to avoid having to run a whole bunch
of SWAPEXT operations to set up a verity file, though.
Anyhow, that's just my 2 cents. :)
--D
> We also considered generating the Merkle tree in the kernel, in which case
> FS_IOC_ENABLE_VERITY would just take a small structure similar to the current
> fsverity_descriptor. But that would add extra complexity to the kernel, and
> generating a Merkle tree over a large file is the type of parallelizable, CPU
> intensive work that really should be done in userspace. Also, having userspace
> provide the Merkle tree allows for it to be pre-generated and distributed with
> the file, e.g. provided in a package to be installed on many systems.
>
> But please do let us know if you have any better ideas.
>
> Thanks!
>
> - Eric
On Thu, Nov 01, 2018 at 03:52:19PM -0700, Eric Biggers wrote:
> +In the recommended configuration of SHA-256 and 4K blocks, 128 hash
> +values fit in each block. Thus, each level of the hash tree is 128
> +times smaller than the previous, and for large files the Merkle tree's
> +size converges to approximately 1/129 of the original file size.
I think you mean 1/127, not 1/129.
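The 1/127 figure follows from treating the level sizes as a geometric series: with 128 hashes per block, the level directly above the data is about 1/128 of the file, the next about 1/128^2, and so on (ignoring the rounding of each level up to whole blocks):

```latex
\[
\frac{\text{tree size}}{\text{file size}}
  \approx \sum_{k=1}^{\infty} 128^{-k}
  = \frac{1/128}{1 - 1/128}
  = \frac{1}{127}
\]
```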
> +fsveritysetup format
> +--------------------
> +
> +When enabling fs-verity on a file via the `FS_IOC_ENABLE_VERITY`_
> +ioctl, the kernel requires that the verity metadata has been appended
> +to the file contents. Specifically, the file must be arranged as:
> +
> +#. Original file contents
> +#. Zero-padding to next block boundary
> +#. `Merkle tree`_
> +#. `fs-verity descriptor`_
> +#. fs-verity footer
> +
> +We call this file format the "fsveritysetup format". It is not
> +necessarily the on-disk format actually used by the filesystem, since
> +the filesystem is free to move things around during the ioctl.
> +However, the easiest way to implement fs-verity is to just keep this
> +arrangement in-place, as ext4 and f2fs do; see `Filesystem support`_.
> +
> +Note that "block" here means the fs-verity block size, which is not
> +necessarily the same as the filesystem's block size. For example, on
> +ext4, fs-verity can use 4K blocks on top of a filesystem formatted to
> +use a 1K block size.
> +
> +The fs-verity footer is a structure of the following format::
> +
> + struct fsverity_footer {
> + __le32 desc_reverse_offset;
> + __u8 magic[8];
> + };
> +
> +``desc_reverse_offset`` is the distance in bytes from the end of the
> +fs-verity footer to the beginning of the fs-verity descriptor; this
> +allows software to find the fs-verity descriptor. ``magic`` is the
> +ASCII bytes "FSVerity"; this allows software to quickly identify a
> +file as being in the "fsveritysetup" format as well as find the
> +fs-verity footer if zeroes have been appended.
> +
> +The kernel cannot handle fs-verity footers that cross a page boundary.
> +Padding must be prepended as needed to meet this constraint.
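A minimal sketch of how userspace might locate the descriptor from the footer described above. It reads the footer byte-wise, so it does not depend on host endianness or struct padding, and it assumes that "zeroes have been appended" means trailing zero bytes after the footer, which can be skipped because the footer's last magic byte ('y') is nonzero. The function name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* struct fsverity_footer: __le32 desc_reverse_offset + __u8 magic[8] */
#define FSVERITY_FOOTER_SIZE 12

/* Return the byte offset of the fs-verity descriptor within buf[0..len),
 * or -1 if no valid footer is found. */
static long find_fsverity_descriptor(const uint8_t *buf, size_t len)
{
    size_t end = len;

    /* Skip any zeroes appended after the footer. */
    while (end > 0 && buf[end - 1] == 0)
        end--;
    if (end < FSVERITY_FOOTER_SIZE)
        return -1;

    const uint8_t *footer = buf + end - FSVERITY_FOOTER_SIZE;
    if (memcmp(footer + 4, "FSVerity", 8) != 0)
        return -1;

    /* desc_reverse_offset is little-endian regardless of host byte order. */
    uint32_t rev = (uint32_t)footer[0] | ((uint32_t)footer[1] << 8) |
                   ((uint32_t)footer[2] << 16) | ((uint32_t)footer[3] << 24);

    /* Measured from the end of the footer back to the descriptor start, so
     * it must at least cover the footer and cannot exceed the data length. */
    if (rev < FSVERITY_FOOTER_SIZE || rev > end)
        return -1;
    return (long)(end - rev);
}
```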
I think this ioctl is the start of the disagreement. How about this
strawman:
verity_fd = ioctl(fd, FS_IOC_VERITY_FD);
write(verity_fd, &merkle_tree);
close(verity_fd);
At final close of that verity_fd, the filesystem behaves in the same way
that it does on receipt of this FS_IOC_ENABLE_VERITY ioctl today.
> +FS_IOC_MEASURE_VERITY
> +---------------------
> +
> +The FS_IOC_MEASURE_VERITY ioctl retrieves the fs-verity measurement of
> +a regular file. This is a digest that cryptographically summarizes
> +the file contents that are being enforced on reads. The file must
> +have fs-verity enabled.
> +
> +This ioctl takes in a pointer to a variable-length structure::
> +
> + struct fsverity_digest {
> + __u16 digest_algorithm;
> + __u16 digest_size; /* input/output */
> + __u8 digest[];
> + };
> +
> +``digest_size`` is an input/output field. On input, it must be
> +initialized to the number of bytes allocated for the variable-length
> +``digest`` field.
> +
> +On success, 0 is returned and the kernel fills in the structure as
> +follows:
> +
> +- ``digest_algorithm`` will be the hash algorithm used for the file
> + measurement. It will match the algorithm used in the Merkle tree,
> + e.g. FS_VERITY_ALG_SHA256. See ``include/uapi/linux/fsverity.h``
> + for the list of possible values.
> +- ``digest_size`` will be the size of the digest in bytes, e.g. 32
> + for SHA-256. (This can be redundant with ``digest_algorithm``.)
> +- ``digest`` will be the actual bytes of the digest.
> +
> +This ioctl is guaranteed to be very fast. Due to fs-verity's use of a
> +Merkle tree, its running time is independent of the file size.
> +
> +This ioctl can fail with the following errors:
> +
> +- ``EFAULT``: invalid buffer was specified
> +- ``ENODATA``: the file is not a verity file
> +- ``ENOTTY``: this type of filesystem does not implement fs-verity
> +- ``EOPNOTSUPP``: the kernel was not configured with fs-verity support
> + for this filesystem, or the filesystem superblock has not had the
> + 'verity' feature enabled on it. (See `Filesystem support`_.)
> +- ``EOVERFLOW``: the file measurement is longer than the specified
> + ``digest_size`` bytes. Try providing a larger buffer.
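A hedged userspace sketch of calling this ioctl as documented. The struct mirrors the layout above; the request number is an assumption taken from the mainline uapi header (`_IOWR('f', 134, struct fsverity_digest)`) and should be checked against the headers actually in use, and the 64-byte buffer is an assumed upper bound that a caller would grow on EOVERFLOW. The helper names are hypothetical:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Layout as documented above. */
struct fsverity_digest {
    uint16_t digest_algorithm;
    uint16_t digest_size; /* in: bytes allocated for digest[]; out: bytes used */
    uint8_t digest[];
};

/* Assumption: request number as in mainline's uapi header; verify locally. */
#ifndef FS_IOC_MEASURE_VERITY
#define FS_IOC_MEASURE_VERITY _IOWR('f', 134, struct fsverity_digest)
#endif

#define MAX_DIGEST_SIZE 64 /* assumed upper bound (e.g. SHA-512) */

/* Format len bytes as lowercase hex into out (needs 2 * len + 1 bytes). */
static void digest_to_hex(const uint8_t *d, size_t len, char *out)
{
    static const char hexdig[] = "0123456789abcdef";
    for (size_t i = 0; i < len; i++) {
        out[2 * i] = hexdig[d[i] >> 4];
        out[2 * i + 1] = hexdig[d[i] & 0xf];
    }
    out[2 * len] = '\0';
}

/* Query fd's measurement. Returns 0 and fills *alg and hex on success,
 * or -1 with errno set (ENODATA, ENOTTY, EOPNOTSUPP, EOVERFLOW, ...). */
static int measure_verity(int fd, uint16_t *alg, char hex[2 * MAX_DIGEST_SIZE + 1])
{
    struct fsverity_digest *d = malloc(sizeof(*d) + MAX_DIGEST_SIZE);
    if (d == NULL)
        return -1;
    d->digest_size = MAX_DIGEST_SIZE; /* must be set before the call */
    if (ioctl(fd, FS_IOC_MEASURE_VERITY, d) != 0) {
        int saved = errno;
        free(d);
        errno = saved;
        return -1;
    }
    *alg = d->digest_algorithm;
    digest_to_hex(d->digest, d->digest_size, hex);
    free(d);
    return 0;
}
```

A caller that only wants to know whether a file is measured would probably treat ENODATA, ENOTTY and EOPNOTSUPP as "no measurement" rather than as hard failures.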
Should this ioctl be better implemented as an xattr?
> +- Direct I/O is not supported on verity files. Attempts to use direct
> + I/O on such files will fall back to buffered I/O.
That makes sense; the filesystem can't verify the data before presenting
it to userspace if it's being copied directly into userspace.
> +- DAX (Direct Access) is not supported on verity files.
That makes less sense. The kernel can check the checksum before
copying the data to the user. Is this simply a current limitation of
the implementation?
> +Thus, when ascending the tree reading hash pages, fs-verity can stop
> +as soon as it finds an already-checked hash page. This optimization,
> +which is also used by dm-verity, results in excellent sequential read
> +performance since usually the deepest needed hash page will already be
> +cached and checked. However, random reads perform worse.
I think you mean "all but the deepest"?
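To make the optimization concrete, a toy model (not fs-verity's code; all names are hypothetical) that tracks which hash blocks have already been verified and counts how many must be checked for each data block read, ascending from the deepest level and stopping at the first already-checked block:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_LEVELS 8

/* Level 0 holds the hashes of data blocks (the deepest level); the last
 * level is the single block whose hash is the root hash. */
struct hash_tree_cache {
    unsigned int num_levels;
    uint64_t hashes_per_block;
    bool *checked[MAX_LEVELS]; /* checked[l][i]: block i of level l verified */
};

static void cache_free(struct hash_tree_cache *c)
{
    for (unsigned int l = 0; l < c->num_levels; l++)
        free(c->checked[l]);
    c->num_levels = 0;
}

static int cache_init(struct hash_tree_cache *c, uint64_t data_blocks,
                      uint64_t hashes_per_block)
{
    uint64_t n = data_blocks;

    c->num_levels = 0;
    c->hashes_per_block = hashes_per_block;
    while (n > 1) {
        if (c->num_levels == MAX_LEVELS) {
            cache_free(c);
            return -1;
        }
        n = (n + hashes_per_block - 1) / hashes_per_block; /* blocks in this level */
        c->checked[c->num_levels] = calloc(n, sizeof(bool));
        if (c->checked[c->num_levels] == NULL) {
            cache_free(c);
            return -1;
        }
        c->num_levels++;
    }
    return 0;
}

/* Return how many hash blocks must be verified to read data block `index`
 * (which must be < data_blocks). The first already-checked block ends the
 * walk, since it and everything above it were verified earlier. A real
 * implementation would hash each newly visited block and compare it with
 * its parent's entry; this toy only records that the block is now trusted. */
static unsigned int verify_data_block(struct hash_tree_cache *c, uint64_t index)
{
    unsigned int verified = 0;
    uint64_t idx = index;

    for (unsigned int l = 0; l < c->num_levels; l++) {
        idx /= c->hashes_per_block; /* hash block at level l covering this data */
        if (c->checked[l][idx])
            break;
        c->checked[l][idx] = true;
        verified++;
    }
    return verified;
}
```

With 128 hashes per block and 32768 data blocks (three levels), reading block 0 verifies three hash blocks, block 1 verifies none, block 128 verifies one and block 16384 verifies two, which is why sequential reads are cheap while scattered reads cost more.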
On Wed, Dec 19, 2018 at 02:30:05PM -0500, Theodore Y. Ts'o wrote:
> On Wed, Dec 19, 2018 at 01:19:53PM +1100, Dave Chinner wrote:
> > Putting metadata in user files beyond EOF doesn't work with XFS's
> > post-EOF speculative allocation algorithms.
> >
> > i.e. Filesystem design/algorithms often assume that the region
> > beyond EOF in user files is a write-only region. e.g. We can allow
> > extents beyond EOF to be uninitialised because they are in a write
> > only region of the file and so there's no possibility of stale data
> > exposure. Unfortunately, putting filesystem/security metadata beyond
> > EOF breaks these assumptions - it's no longer a write-only region.
>
> On Tue, Dec 18, 2018 at 11:14:20PM -0800, Christoph Hellwig wrote:
> > Filesystems already use blocks beyond EOF for preallocation, either
> > speculative by the file system itself, or explicitly by the user with
> > fallocate. I bet you will run into bugs with your creative abuse
> > sooner or later. Independent of that, the interface simply is gross, which
> > is enough of a reason not to merge it.
>
> Both of these concerns aren't applicable for fs-verity because the
> entire file will be read-only. So there will be no preallocation or
> fallocation going on --- or allowed --- for a file which is protected
> by fs-verity. Since no writes are allowed at all, it won't break any
> file systems' assumptions about "write-only regions".
The file has to be written before it has been protected, which means
it may very well have user space allocated beyond EOF before the
merkle tree needs to be written. And, well, the fact that it is all
read only after creation is a feature implementation detail that
allows fsverity to "get away with" storing its metadata in file
data space.
But whether or not fsverity is enabled on the filesystem, the fact
is that the kernel code now has to support storing and reading data
from beyond EOF. Every user, whether they are using fsverity or not,
is now exposed to that code and a filesystem that no longer
considers the user data region beyond EOF as write only. i.e. it
doesn't matter whether fsverity is in use; the ext4/f2fs code now
allows reading of information beyond EOF from user data files.
i.e. you've completely changed the way files appear to /everyone/,
not just the users of fsverity. Not only that, you now have file
data that has a specific metadata on-disk format encoded into file
data space. That greatly complicates filesystem checking and
scrubbing which typically /doesn't even look at the contents of
user data/. So yeah, this hack might make the merkle tree
verification "simple" but it greatly complicates everything else the
filesystem has to do.
That's the problem here - fsverity completely redefines the layout
of user data files for everyone, not just fsverity, and not just the
filesystems that implement fsverity. You've taken an ext4 fsverity
implementation feature and promoted it to being a linux-wide
file data layout standard that is encoded into the kernel/user
ABI/API forever more.
And you're trying to force this into the tree on everyone without
adequate review because "a product is already shipping with this
code in it". Didn't we learn the lessons of failing to "upstream
first" new features more than 15 years ago?
> As far as whether it's "gross" --- that's a taste question, and I
> happen to think it's more "clever" than "gross".
You think it's clever because it's a neat hack that makes what you
need simple to implement, and so you can ship it in phones quickly
and without needing to involve upstream in more complex design
discussions.
I think it's gross because it bleeds implementation details all over
the API and globally redefines the user data file layout for
everyone, kernel wide in a manner that is incompatible with existing
filesystem implementations.
> It allows for a very
> simple implementation, *leveraging* the fact that the file will never
> change --- and especially, grow in length. So why not use the space
> after EOF?
There have been many technical reasons given for why it's a bad interface,
yet you only address entirely subjective arguments and claim that
you have "good taste" in APIs because it is "clever".
> The alternative requires adding Solaris-style alternate data streams
> support.
No, it does not. It simply requires a different userspace API to
move the merkle tree data into the kernel, and a different
implementation abstraction that allows filesystems to provide the
merkle tree data pages on request. Darrick and Christoph have
already suggested alternative user APIs that would work just fine,
and they don't have a requirement that the merkle tree is held in
the user data space beyond EOF.
How filesystems store and retrieve merkle tree data should be a
filesystem-internal detail. If how metadata is stored in the
filesystem is defined by the userspace API or the kernel library
code that implements the verification feature, then it lacks the
necessary abstraction to be a generic Linux filesystem feature.
IOWs, it needs to be redesigned and reworked before we should
consider it for merging.
Cheers,
Dave.
--
Dave Chinner
On Mon, Dec 17, 2018 at 08:49:49AM -0800, Christoph Hellwig wrote:
>
> > > Given that you already use an ioctl as the interface what is the problem
> > > of passing this data through the ioctl?
> >
> > Do you mean pass the verity metadata in a buffer? That cannot work in general,
> > because it may be too large to fit into memory.
>
> Have a pointer in the ioctl and do get_user_pages on it.
I don't see how that helps. The Merkle tree can still be too large to fit in
memory. In the worst case, it might not even fit in the address space. And I
don't see how get_user_pages() helps either over just copy_from_user(); what are
you proposing to do with the pages after getting them, exactly?
- Eric
On Tue, Dec 18, 2018 at 07:16:03PM -0500, Theodore Y. Ts'o wrote:
> On Mon, Dec 17, 2018 at 12:00:39PM -0800, Darrick J. Wong wrote:
> > FWIW, if I were (hypothetically) working on an xfs implementation, I
> > likely would have settled on passing a reference to a merkle tree
> > through a (fd, length) pair, because that allows us plenty of options
> > on the back end:
> >
> > b) we could remap the tree into a new inode fork for merkle trees, or
> > a) remap it as posteof blocks like ext4/f2fs does, or
> > c) remap the blocks into the attribute fork as an (unusually large)
> > extended attribute value.
>
> Sure, but what would be the benefit of doing different things on the
> back end? I think this is really more of a philosophical objection
> than anything else.
Putting metadata in user files beyond EOF doesn't work with XFS's
post-EOF speculative allocation algorithms.
i.e. Filesystem design/algorithms often assume that the region
beyond EOF in user files is a write-only region. e.g. We can allow
extents beyond EOF to be uninitialised because they are in a write
only region of the file and so there's no possibility of stale data
exposure. Unfortunately, putting filesystem/security metadata beyond
EOF breaks these assumptions - it's no longer a write-only region.
IOWs, all these existing assumptions and implementation details are
exposed to a new attack surface involving tricking the filesystem
into thinking it has readable data beyond EOF. And because it can
now read from the "write only" region beyond EOF (because that's the
mechanism by which fsverity does its verification), we no longer
have a clear line of protection against exposing such data to
userspace.
Putting the Merkle tree somewhere else in the filesystem metadata
and providing a separate API to manipulate it avoids this problem.
It allows filesystems to keep their internal metadata and
security-related verification information in a separate channel (and
trust path) that is completely out of user data/access scope.
Cheers,
Dave.
--
Dave Chinner
On Mon, Dec 17, 2018 at 10:32:06AM -0800, Eric Biggers wrote:
> I don't see how that helps. The Merkle tree can still be too large to fit in
> memory. In the worst case, it might not even fit in the address space. And I
> don't see how get_user_pages() helps either over just copy_from_user(); what are
> you proposing to do with the pages after getting them, exactly?
Write them out to a file system specific area on the media. Note
that get_user_pages is indeed not going to work if you run out of
address space, but that seems like an odd use case. Running out of memory
is not an issue, as we generally iterate over a small number of pages
for each individual get_user_pages call.
On Tue, Dec 18, 2018 at 11:16:08PM -0800, Linus Torvalds wrote:
> On Tue, Dec 18, 2018, 23:11 Christoph Hellwig <[email protected]> wrote:
>
> >
> > I think the fd would have to be on the same fs for this interface to
> > make sense. But it could be an O_TMPFILE one. And given that ext4
> > already supports a variant of swapext this interface should also work
> > with the existing ext4 on disk format.
> >
>
> Why is the merkle tree not just an xattr of the file?
Because it is too large for our awkward xattr interface.
On Thu, Dec 13, 2018 at 12:22:49PM -0800, Christoph Hellwig wrote:
> On Wed, Dec 12, 2018 at 12:26:10PM -0800, Eric Biggers wrote:
> > > As this apparently got merged despite no proper reviews from VFS
> > > level persons:
> >
> > fs-verity has been out for review since August, and Cc'ed to all relevant
> > mailing lists including linux-fsdevel, linux-ext4, linux-f2fs-devel,
> > linux-fscrypt, linux-integrity, and linux-kernel. There are tests,
> > documentation (since v2), and a userspace tool. It's also been presented at
> > multiple conferences, and has been covered by LWN multiple times. If more
> > people want to review it, then they should do so; there's nothing stopping them.
>
> But you did not get a review from someone like Al, Linus, Andrew or me,
> did you?
I don't consider fs-verity to be part of core VFS, but rather a
library that happens to be used by ext4 and f2fs. This is much like
fscrypt, which was originally an ext4-only thing, but the code was
always set up so it could be used by other file systems, and when f2fs
was interested in using it, we moved it to fs/crypto. As such the
fscrypto code never got a review from Al, Andrew, or you, and when I
pushed it to Linus, he accepted the pull request.
The difference this time is that ext4 and f2fs are interested in using
common code from the beginning.
> > Can you elaborate on the actual problems you think the current solution has, and
> > exactly what solution you'd prefer instead? Keep in mind that (1) for large
> > files the Merkle tree can be gigabytes long, (2) Linux doesn't have an API for
> > file streams, and (3) when fs-verity is combined with fscrypt, it's important
> > that the hashes be encrypted, so as to not leak information about the plaintext.
>
> Given that you already use an ioctl as the interface what is the problem
> of passing this data through the ioctl?
The size of the Merkle tree is roughly size/129. So for a 100MB file
(and there can be Android APK files that big), the Merkle tree could
be almost 800k. That's not really a size that we would want to push
through an ioctl.
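As a rough check of these numbers, the sketch below sums the sizes of the tree levels bottom-up. It assumes 4096-byte blocks and 32-byte SHA-256 digests (so 128 hashes fit in one tree block) and ignores any header or footer; the function name is illustrative, not fs-verity's code. Since each level is about 1/128 of the one below it, the total tends to roughly 1/127 of the data size, close to the rough figure above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Approximate Merkle tree size for data_size bytes of file data.
 * Assumes fixed-size blocks and digests; headers/footers are ignored.
 */
static uint64_t merkle_tree_size(uint64_t data_size, uint64_t block_size,
                                 uint64_t hash_size)
{
    assert(block_size > 0 && hash_size > 0 && hash_size <= block_size);
    uint64_t hashes_per_block = block_size / hash_size;
    /* Number of data blocks, rounded up. */
    uint64_t nblocks = (data_size + block_size - 1) / block_size;
    uint64_t total = 0;

    /* Each level needs one hash per block of the level below it,
     * until a single root block remains. */
    while (nblocks > 1) {
        uint64_t level_blocks =
            (nblocks + hashes_per_block - 1) / hashes_per_block;
        total += level_blocks * block_size;
        nblocks = level_blocks;
    }
    return total;
}
```

For a 100,000,000-byte file, merkle_tree_size(100000000, 4096, 32) gives 794,624 bytes (just under 800k); for a 300,000,000-byte file it gives 2,371,584 bytes, in line with the "about 2.3 megabytes" mentioned later in the thread.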
We could treat the ioctl as a write-like interface, but using write(2)
seemed to make a lot more sense. Also, the fscrypt common code
leveraged by f2fs and ext4 assumes that the verity tree will be stored
after the data blocks.
Given that the semantics of a verity-protected file is that it is
immutable, you *could* store the Merkle tree in a separate file
stream, but it really doesn't buy you anything --- by definition, you
can't append to a fs-verity protected file. Furthermore, it would
require extra complexity in the common fsverity code --- which looks
for the Merkle tree at the end of file data --- for no real benefit.
Cheers,
- Ted
P.S. And if you've purchased a Pixel 3 device, it's already using the
fsverity code, so it's quite well tested (and yes, we have xfstests).
On Mon, Dec 17, 2018 at 12:00:39PM -0800, Darrick J. Wong wrote:
> FWIW, if I were (hypothetically) working on an xfs implementation, I
> likely would have settled on passing a reference to a merkle tree
> through a (fd, length) pair, because that allows us plenty of options
> on the back end:
>
> b) we could remap the tree into a new inode fork for merkle trees, or
> a) remap it as posteof blocks like ext4/f2fs does, or
> c) remap the blocks into the attribute fork as an (unusually large)
> extended attribute value.
>
> If the merkle_fd isn't on the same filesystem as the fd we could at
> least use generic_copy_file_range (i.e. page cache copying) to land the
> merkle tree wherever we want.
I think the fd would have to be on the same fs for this interface to
make sense. But it could be an O_TMPFILE one. And given that ext4
already supports a variant of swapext this interface should also work
with the existing ext4 on disk format.
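A minimal userspace sketch of the (fd, length) idea combined with an O_TMPFILE. The enable_verity_args structure is hypothetical (no such kernel ABI exists in this patchset), and comparing st_dev is only a rough same-filesystem check; it is not reliable across, for example, btrfs subvolumes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical argument block: the Merkle tree passed by reference. */
struct enable_verity_args {
    int32_t  merkle_fd;   /* file holding the tree, e.g. an O_TMPFILE */
    uint64_t merkle_len;  /* bytes of tree data in merkle_fd */
};

/* Open an unnamed temporary file in dir, i.e. on dir's filesystem.
 * Returns -1 with errno set if the filesystem lacks O_TMPFILE support. */
static int open_merkle_tmpfile(const char *dir)
{
    assert(dir != NULL);
    return open(dir, O_TMPFILE | O_RDWR | O_CLOEXEC, 0600);
}

/* 1 if both fds report the same st_dev, 0 if not, -1 on error. */
static int same_filesystem(int fd_a, int fd_b)
{
    struct stat a, b;

    if (fstat(fd_a, &a) != 0 || fstat(fd_b, &b) != 0)
        return -1;
    return a.st_dev == b.st_dev;
}
```

Userspace would write the tree into the temporary file, fill in merkle_fd and merkle_len, and leave it to the filesystem to decide whether to remap, copy, or reject the blocks.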
On Fri, Dec 21, 2018 at 11:28:13AM -0500, Theodore Y. Ts'o wrote:
> On Fri, Dec 21, 2018 at 07:53:54AM -0800, Matthew Wilcox wrote:
> > In contrast to "we'll just fix it up later" (which usually applies
> > to in-kernel interfaces), we have a policy of not breaking userspace,
> > so accepting this interface means setting it in stone. We should get
> > it right.
>
> I'm not convinced it's a "fix", but my point is that if later on you
> want to add extra complexity transforming
>
> ioctl(fd, FS_IOC_ENABLE_VERITY);
>
> so it does the equivalent of
>
> ioctl(fd, FS_IOC_ENABLE_VERITY_NOW_WITH_EXTRA_USELESS_COMPLEXITY,
> fd, sizeof_data, sizeof_verity_data);
I disagree with your EXTRA_USELESS_COMPLEXITY appendage. The interface
you designed reflects the implementation you did in ext4, so I understand
why it seems simple from your point of view. From the user point of view,
it looks completely weird. You write a file, being a series of bytes,
then all of a sudden have to know that it's composed of blocks, seek
to the next block, write a header, then this Merkle data structure,
then write a footer which isn't allowed to cross a block boundary
for some unknowable reason. It seems much more logical to have the
header+Merkle+footer as a separate data stream which the filesystem can
then layout according to its own rules.
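To make the procedure above concrete, here is a sketch that computes where each piece would land. The header, tree, and footer sizes are parameters; placing the tree immediately after the header follows the description above rather than the actual fs-verity on-disk format, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct verity_layout {
    uint64_t header_off;  /* first block boundary at or after end of data */
    uint64_t tree_off;    /* Merkle tree follows the header */
    uint64_t footer_off;  /* footer must not straddle a block boundary */
    uint64_t end;         /* final file size including metadata */
};

static uint64_t round_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) / align * align;
}

static struct verity_layout plan_layout(uint64_t data_size, uint64_t block_size,
                                        uint64_t header_size, uint64_t tree_size,
                                        uint64_t footer_size)
{
    struct verity_layout l;

    assert(block_size > 0 && footer_size > 0 && footer_size <= block_size);
    l.header_off = round_up(data_size, block_size);  /* "seek to the next block" */
    l.tree_off = l.header_off + header_size;
    l.footer_off = l.tree_off + tree_size;
    /* If the footer would cross a block boundary, push it to the next block. */
    if (l.footer_off / block_size !=
        (l.footer_off + footer_size - 1) / block_size)
        l.footer_off = round_up(l.footer_off, block_size);
    l.end = l.footer_off + footer_size;
    return l;
}
```

With 10000 bytes of data, 4096-byte blocks, a 64-byte header, and a 100-byte footer, an 8192-byte tree leaves the footer at offset 20544, while an 8100-byte tree forces it forward to 20480. This block-level bookkeeping is exactly the knowledge the message argues userspace should not need.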
On Fri, Dec 21, 2018 at 11:13:07AM -0800, Linus Torvalds wrote:
>
> I do agree that your particular model is pretty damn broken in lots of ways.
>
> Why is it filesystem specific? If the whole point is that the file
> itself has its own verification data (which I like), then I don't see
> why this is then documented as some filesystem-specific layout model.
> That's complete and utter garbage.
>
> In other words: either the model is that the file *itself* contains
> its own merkle tree that validates the file, or it isn't. You can't
> have it two ways. No silly "layout changes when you apply the hash"
> garbage. That's just crazy talk and invalidates the whole model.
Userspace applications which are reading the file aren't going to be
expecting a Merkle tree. For example, one of the use cases is Android
APK files, which are essentially ZIP files. ZIP files can be parsed
both from the front-end (streaming), or by looking for the complete
directory of all of the files in the ZIP file by starting at the end
of the file and moving backwards. If the Merkle tree was visible to
userspace programs that are opening and reading the file, it would
confuse them mightily.
So what we do for ext4 and f2fs is make the Merkle tree invisible; if
userspace stats the file, st_size will return the size of the original
"data" file, and reading beyond the st_size from userspace will behave
like reading beyond EOF. From the *file system's* perspective,
though, the metadata blocks are part of the file. There's just a
difference between the userspace visible EOF and the file system's
conception of EOF. I don't consider this a "layout change", and I
personally believe this should be just *fine* for all file systems.
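A minimal model of the rule just described, using a hypothetical helper rather than the actual ext4/f2fs read path. The filesystem's internal size covers the hidden metadata, but reads from userspace are clamped to the visible i_size (what stat() reports as st_size).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes a userspace read of len bytes at pos may return. fs_size (data plus
 * hidden Merkle tree) is only checked, to show it never widens the readable
 * range.
 */
static uint64_t visible_read_len(uint64_t i_size, uint64_t fs_size,
                                 uint64_t pos, uint64_t len)
{
    assert(i_size <= fs_size);
    if (pos >= i_size)
        return 0;               /* behaves like reading at or past EOF */
    if (len > i_size - pos)
        len = i_size - pos;     /* short read at the visible EOF */
    return len;
}
```

For a 10000-byte file with metadata out to 20000 bytes, a 4096-byte read at offset 8192 returns 1808 bytes, and any read at offset 10000 or beyond returns 0.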
The XFS developers are convinced that this is horrific, and no one
sane should do this. OK, call me insane. But it works, and I think
it's elegant and clean.
So if *they* want to use some other layout, where the Merkle blocks
are stored in some Alternate Data Stream, ala NTFS --- they are *free*
to do that. It will require more work, and at that point, it will
require a layout change. But it's Dave and Christoph who are
insisting on doing that; not me!
> And honestly, I still think that it's very odd to add the merge data
> to the end, when the filesystem already supports xattrs. It would have
> made much more sense to just make one xattr contain the merkle tree
> validation data.
The problem is that xattrs are designed to be accessed via a set/get
interface and are currently limited, IIRC, to 32k. The max size of an APK
is 300 megabytes; and the Merkle tree for a file that size will be
about 2.3 megabytes. That's way too big to store as an xattr;
certainly using the existing xattr interfaces. And it's also bigger
than most file systems can handle as xattrs today --- because they've
been optimized for relatively small sizes, for things like SELinux
labels and ACL structures.
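The generic limit can be observed directly. Before any filesystem is involved, setxattr(2) rejects values larger than XATTR_SIZE_MAX (65536 bytes) with E2BIG, and individual filesystems often allow less. The sketch below tries to store a tree-sized value; the xattr name "user.merkle_tree" is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/xattr.h>

/*
 * Try to store len bytes of dummy Merkle tree data as a single xattr.
 * Returns 0 on success or the errno from setxattr(2); on Linux, values
 * over XATTR_SIZE_MAX fail with E2BIG.
 */
static int try_store_tree_as_xattr(const char *path, size_t len)
{
    char *buf = calloc(len ? len : 1, 1);
    assert(buf != NULL);
    int ret = setxattr(path, "user.merkle_tree", buf, len, 0) == 0 ? 0 : errno;
    free(buf);  /* errno was captured before free() */
    return ret;
}
```

Calling it with the roughly 2.3 MB tree of a 300 MB APK fails regardless of the filesystem underneath.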
> So why is this sold as some unholy mess of "filesystem-specific" and
> "generic"? That part just annoys the hell out of me. Why isn't this
> sold as an *actual* generic model, where you just say "append the
> merkle tree to the file, then enable verity testing of the end result
> and validate the top-level hash".
That was the original way it was sold, but Christoph and Dave have
NACK'ed it in that form. The common fsverity code which is generic to
ext4 and f2fs does treat it that way, with the caveat that what we "lie"
to userspace about is the size of the file and where the EOF is. Dave
and Christoph have declared strongly that this layout choice
is horrible, and filesystem specific, and XFS could never do it that
way. I don't understand why, but they are the XFS experts. So if
they want to do something else, what I've been trying to point out is
that they can do that, using the existing interface.
> So what's the excuse for doing the crazy odd "let's just support one
> single filesystem" model?
Android devices use both ext4 and f2fs; it's the manufacturer's
choice. So we wanted fs-verity to support both. And we didn't want
to duplicate code across ext4 and f2fs; hence trying to put common
code in fs/verity. So we aren't supporting one file system out of the
gate; we're supporting two.
Whether XFS wants to implement fs-verity is purely XFS's choice. XFS
has chosen not to support fscrypt, which is currently used by ext4,
f2fs, and ubifs, and both fscrypt's and fs-verity's initial use case
has been for Android, which is not an area where XFS has proven to be
a common choice.
So I was not really expecting that they would have any interest in
fs-verity. But they seem to have very strong opinions about how they
would want to implement it, and it's different from what we have in
the current "generic code shared by ext4 and f2fs". I was trying to
show that even if they wanted to do things in this different way ---
and I don't understand why it's so important to them --- it would be
possible to do so.
Cheers,
- Ted
On Sat, Dec 22, 2018 at 08:10:07PM -0800, Matthew Wilcox wrote:
> Pretty much every file format has the ability to put arbitrary blocks
> of information into a file somewhere the tools which don't know about
> it will skip it. For example, ZIP "includes an extra field facility
> within file headers, which can be used to store extra data not defined
> by existing ZIP specifications, and which allow compliant archivers that
> do not recognize the fields to safely skip them. Header IDs 0–31 are
> reserved for use by PKWARE. The remaining IDs can be used by third-party
> vendors for proprietary usage. " (Wikipedia)
>
> ELF, PNG, PDF and many other formats have the ability to put data
> _somewhere_. It might not be at the tail of the file, but there's
> somewhere to do it.
>
> (I appreciate this isn't what Linus is asking for, but I'm pointing out
> that this is by no means as intractable as you make it sound.)
That design would require the fs-verity code to know the type of each
file, and where to find the in-band Merkle tree for each file type
that we wanted to support. And if you wanted to use fs-verity to
protect a sudoers text configuration file (for example), we'd have to
teach sudo how to ignore the userspace visible Merkle tree.
So I agree with you that it's *possible*. But it's ***ugly***. *Way*
uglier than putting the Merkle tree at the end of the file data and
then making it invisible to userspace.
Cheers,
- Ted
On Fri, Dec 21, 2018 at 9:58 AM Christoph Hellwig <[email protected]> wrote:
>
> On Thu, Dec 20, 2018 at 05:01:58PM -0500, Theodore Y. Ts'o wrote:
> > That's simply not true. Number one, fsverity is not mandatory for all
> > file systems to implement. If XFS doesn't want to implement fscrypt
> > or fsverity, it doesn't have to. Number two, we're not *making* any
> > changes to the kernel code; nothing in mm/filemap.c, et al. So
> > saying that we are making changes that are impacted by /everyone/ just
> > doesn't make any sense.
>
> Ted, I think you know yourself this isn't true. Whenever we added
> a useful interface to one of the major file systems we had others pick
> it up, and that is a good thing because the last thing we need is
> fragmentation of interfaces. And even if that wasn't the case I don't
> think we should take shortcuts, because even if an interface was just
> for a file system or two it still needs to be properly designed.
>
> There is no reason to rush interfaces in, because every time we have done
> that it has turned out to be a very bad idea in retrospect.
Speaking of interfaces, one thing that needs IMHO more thought is the
user-facing interface. Not only in the fsverity case, but in all cases.
Linux currently has many different ways to implement and check
(cryptographic) integrity.
Just to name a few: fsverity, UBIFS' auth feature, BTRFS csum,
EVM/IMA, dm-integrity, dm-verity, ...
At least for filesystems it would be good to have a common interface
to query the integrity status of a file.
So far we have a mixture of different return codes (EPERM, EUCLEAN, EINVAL),
audit events plus cryptic kernel logs.
In UBIFS we return EPERM in case of an auth failure; we followed EVM/IMA.
fsverity goes the EUCLEAN route, just like BTRFS for checksum failures.
Before everything is set in stone, let's try to consolidate at least this.
Along with that, what about the statx() interface?
We already have STATX_ATTR_ENCRYPTED, so why not STATX_ATTR_AUTHENTICATED?
--
Thanks,
//richard
On Fri, Dec 21, 2018 at 07:53:54AM -0800, Matthew Wilcox wrote:
> In contrast to "we'll just fix it up later" (which usually applies
> to in-kernel interfaces), we have a policy of not breaking userspace,
> so accepting this interface means setting it in stone. We should get
> it right.
I'm not convinced it's a "fix", but my point is that if later on you
want to add extra complexity transforming
ioctl(fd, FS_IOC_ENABLE_VERITY);
so it does the equivalent of
ioctl(fd, FS_IOC_ENABLE_VERITY_NOW_WITH_EXTRA_USELESS_COMPLEXITY,
fd, sizeof_data, sizeof_verity_data);
it adds essentially no complexity to provide this backwards
compatibility. But if we need to implement
FS_IOC_ENABLE_VERITY_NOW_WITH_EXTRA_USELESS_COMPLEXITY *now*, we gain
nothing, other than pushing back when fsverity lands upstream. We'd
have to provide that backwards compatibility interface anyway, since
there are a lot of users for that existing interface.
So why?
- Ted
Hi Christoph,
On Mon, Dec 17, 2018 at 08:52:31AM -0800, Christoph Hellwig wrote:
> [FYI, your mail never made it to my inbox, although I found the copy
> in linux-fsdevel now]
>
> On Fri, Dec 14, 2018 at 12:17:22AM -0500, Theodore Y. Ts'o wrote:
> > I don't consider fs-verity to be part of core VFS, but rather a
> > library that happens to be used by ext4 and f2fs. This is much like
> > fscrypt, which was originally an ext4-only thing, but the code was
> > always set up so it could be used by other file systems, and when f2fs
> > was interested in using it, we moved it to fs/crypto. As such the
> > fscrypto code never got a review from Al, Andrew, or you, and when I
> > pushed it to Linus, he accepted the pull request.
>
> And as a result we are stuck with a pretty bad interface, so this is
> a very good example of how not to do things! Just because a user
> interface is only implemented by one or two file systems doesn't mean
> it should skip the userspace ABI review, because we tend to generalize
> them unless they are deeply specific to fs internals.
>
While I do have some improvements planned for the fscrypt interface,
specifically how encryption keys are managed [1], the issues are subtle enough
that I don't think there's any chance they could have been gotten "right" the
first time around, even if lots more people had reviewed it. It took me over a
year working with fscrypt to put together my proposal for how to improve things,
and it was only really possible because I was able to consider all the people
actually using fscrypt and what problems they are having, if any.
Even so, the current fscrypt interface is actually good enough that there still
hasn't been much real interest in getting my proposed improvements merged yet.
(Not surprisingly, they've also been completely ignored by all the "VFS people"
you say should be reviewing this stuff...)
So for fscrypt I personally don't think that waiting would have changed much in
practice, besides ensuring that users wouldn't have any solution at all.
[1] https://lwn.net/Articles/737274/
- Eric
On Fri, Dec 14, 2018 at 12:17:22AM -0500, Theodore Y. Ts'o wrote:
> Furthermore, it would require extra complexity in the common fsverity code
> --- which looks for the Merkle tree at the end of file data --- for no real
> benefit.
To clarify, while this is technically true currently, as I mentioned it's been
kept flexible enough such that a filesystem *could* store the metadata elsewhere
with only some slight changes to the common fs/verity/ code which won't break
other filesystems. Though of course, keeping all filesystems using the
"metadata after EOF" approach does allow a couple simplifications.
- Eric
On Sat, Dec 22, 2018 at 8:46 PM Theodore Y. Ts'o <[email protected]> wrote:
>
> On Sat, Dec 22, 2018 at 08:10:07PM -0800, Matthew Wilcox wrote:
> > Pretty much every file format has the ability to put arbitrary blocks
> > of information into a file somewhere the tools which don't know about
> > it will skip it. For example, ZIP "includes an extra field facility
> > within file headers, which can be used to store extra data not defined
> > by existing ZIP specifications, and which allow compliant archivers that
> > do not recognize the fields to safely skip them. Header IDs 0–31 are
> > reserved for use by PKWARE. The remaining IDs can be used by third-party
> > vendors for proprietary usage. " (Wikipedia)
> >
> > ELF, PNG, PDF and many other formats have the ability to put data
> > _somewhere_. It might not be at the tail of the file, but there's
> > somewhere to do it.
> >
> > (I appreciate this isn't what Linus is asking for, but I'm pointing out
> > that this is by no means as intractable as you make it sound.)
>
> That design would require the fs-verity code to know the type of each
> file, and where to find the in-band Merkle tree for each file type
> that we wanted to support. And if you wanted to use fs-verity to
> protect a sudoers text configuration file (for example), we'd have to
> teach sudo how to ignore the userspace visible Merkle tree.
I'm pretty late to the game, but I just want to bring up one approach
that I'm not sure people have previously considered. You can't put the
verification blob in an xattr due to xattr size limits, but you *can*
put a filename in an xattr. What if, at open time, fs-verity looked
for a specially-named xattr attached to a file, resolved that name
like a symlink target, opened the pointed-to file, and just used
*that* as the authentication blob? It'd also be possible to teach
unlink to delete the pointed-to file when the pointer file is deleted
--- sort of like a simple and stupid kind of data fork.
For example, if you wanted to secure /usr/bin/emacs, you could set a
security.fsverify.verification_file xattr (in the system namespace
because the xattr has special semantics) to
"/.verification-blobs/@usr@[email protected]" or something like that.
Then, open(2) on /usr/bin/emacs would, internally to VFS, also open
/.verification-blobs/@usr@[email protected] and read verification
data from it, transparently to both users and the underlying
filesystem. If someone deleted /usr/bin/emacs, VFS would automatically
delete /.verification-blobs/@usr@[email protected]. If
/.verification-blobs/@usr@[email protected] didn't exist at time of
open(2) of /usr/bin/emacs, or couldn't be opened for whatever reason,
the open(2) of /usr/bin/emacs would fail.
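A userspace approximation of that open-time lookup. It uses a hypothetical user.fsverity_blob xattr name instead of the security.* name in the proposal, and it leaves out the automatic unlink; in the proposal this logic would live inside the VFS, not in a library.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/* Hypothetical pointer xattr name, for illustration only. */
#define VERIFICATION_XATTR "user.fsverity_blob"

/*
 * Read the blob path stored in an xattr on path and open it read-only.
 * Returns the blob fd, or -1 with errno set (e.g. ENODATA when the pointer
 * xattr is absent). Under the proposal, such a failure would make the open
 * of path itself fail.
 */
static int open_verification_blob(const char *path)
{
    char target[4096];
    ssize_t n;

    assert(path != NULL);
    n = getxattr(path, VERIFICATION_XATTR, target, sizeof(target) - 1);
    if (n < 0)
        return -1;
    target[n] = '\0';  /* xattr values are not NUL-terminated */
    return open(target, O_RDONLY | O_CLOEXEC);
}
```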
ISTM that a scheme like this would give you some of the advantages of
jumbo xattrs, but with much less implementation complexity. If
someone's proposed something like this before, sorry for the noise.