Hi all,
This is V2 of fs-verity support in XFS. This series contains numerous
changes from V1, which are described below.
This patchset introduces fs-verity [5] support for XFS. This
implementation utilizes extended attributes to store fs-verity
metadata. The Merkle tree blocks are stored in the remote extended
attributes.
A few key points:
- fs-verity metadata is stored in extended attributes
- Direct read path and DAX are disabled for inodes with fs-verity
- Pages are verified in iomap's read IO path (offloaded to
workqueue)
- New workqueue for verification processing
- New ro-compat flag
- Inodes with fs-verity have new on-disk diflag
- xfs_attr_get() can return the buffer with the attribute
The patchset was tested with xfstests -g auto on xfs_1k, xfs_4k,
xfs_1k_quota, and xfs_4k_quota configurations. No major failures were
found.
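For a quick manual test, a file on XFS can be sealed with the standard
fs-verity ioctl. A minimal userspace sketch (the path and block size are
only examples; FS_IOC_ENABLE_VERITY and struct fsverity_enable_arg come
from the uapi header):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/fsverity.h>

    int main(void)
    {
            struct fsverity_enable_arg arg = {
                    .version = 1,
                    .hash_algorithm = FS_VERITY_HASH_ALG_SHA256,
                    .block_size = 4096,     /* Merkle tree block size */
            };
            int fd = open("/mnt/scratch/file", O_RDONLY);

            if (fd < 0 || ioctl(fd, FS_IOC_ENABLE_VERITY, &arg) != 0)
                    perror("FS_IOC_ENABLE_VERITY");
            return 0;
    }

After sealing, reads of the file go through the verification path added
by this series.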
Patches [6/23] and [7/23] touch ext4, f2fs, btrfs, and patch [8/23]
touches erofs, gfs2, and zonefs.
The patchset consists of four parts:
- [1..4]: Patches from the parent pointer patchset which add binary
  xattr names, plus a few dependencies
- [5..7]: Improvements to core fs-verity
- [8..9]: Add read path verification to iomap
- [10..23]: Integration of fs-verity to xfs
Changes from V1:
- Added parent pointer patches for easier testing
- Many issues and refactoring points fixed from the V1 review
- Adjusted for recent changes in fs-verity core (folios, non-4k)
- Dropped disabling of large folios
- Completely new fsverity patches (fix, callout, log_blocksize)
- Changed the approach to verification in iomap to match the one in
  the write path: callouts to the filesystem instead of direct
  fs-verity use
- New XFS workqueue for post read folio verification
- xfs_attr_get() can return underlying xfs_buf
- xfs_bufs are marked with XBF_VERITY_CHECKED to track verified
blocks
kernel:
[1]: https://github.com/alberand/linux/tree/xfs-verity-v2
xfsprogs:
[2]: https://github.com/alberand/xfsprogs/tree/fsverity-v2
xfstests:
[3]: https://github.com/alberand/xfstests/tree/fsverity-v2
v1:
[4]: https://lore.kernel.org/linux-xfs/[email protected]/
fs-verity:
[5]: https://www.kernel.org/doc/html/latest/filesystems/fsverity.html
Thanks,
Andrey
Allison Henderson (4):
xfs: Add new name to attri/d
xfs: add parent pointer support to attribute code
xfs: define parent pointer xattr format
xfs: Add xfs_verify_pptr
Andrey Albershteyn (19):
fsverity: make fsverity_verify_folio() accept folio's offset and size
fsverity: add drop_page() callout
fsverity: pass Merkle tree block size to ->read_merkle_tree_page()
iomap: hoist iomap_readpage_ctx from the iomap_readahead/_folio
iomap: allow filesystem to implement read path verification
xfs: add XBF_VERITY_CHECKED xfs_buf flag
xfs: add XFS_DA_OP_BUFFER to make xfs_attr_get() return buffer
xfs: introduce workqueue for post read IO work
xfs: add iomap's readpage operations
xfs: add attribute type for fs-verity
xfs: add fs-verity ro-compat flag
xfs: add inode on-disk VERITY flag
xfs: initialize fs-verity on file open and cleanup on inode
destruction
xfs: don't allow to enable DAX on fs-verity sealed inode
xfs: disable direct read path for fs-verity sealed files
xfs: add fs-verity support
xfs: handle merkle tree block size != fs blocksize != PAGE_SIZE
xfs: add fs-verity ioctls
xfs: enable ro-compat fs-verity flag
fs/btrfs/verity.c | 15 +-
fs/erofs/data.c | 12 +-
fs/ext4/verity.c | 9 +-
fs/f2fs/verity.c | 9 +-
fs/gfs2/aops.c | 10 +-
fs/ioctl.c | 4 +
fs/iomap/buffered-io.c | 89 ++++++-----
fs/verity/read_metadata.c | 7 +-
fs/verity/verify.c | 9 +-
fs/xfs/Makefile | 1 +
fs/xfs/libxfs/xfs_attr.c | 81 +++++++++-
fs/xfs/libxfs/xfs_attr.h | 7 +-
fs/xfs/libxfs/xfs_attr_leaf.c | 7 +
fs/xfs/libxfs/xfs_attr_remote.c | 13 +-
fs/xfs/libxfs/xfs_da_btree.h | 7 +-
fs/xfs/libxfs/xfs_da_format.h | 46 +++++-
fs/xfs/libxfs/xfs_format.h | 14 +-
fs/xfs/libxfs/xfs_log_format.h | 8 +-
fs/xfs/libxfs/xfs_sb.c | 2 +
fs/xfs/scrub/attr.c | 4 +-
fs/xfs/xfs_aops.c | 61 +++++++-
fs/xfs/xfs_attr_item.c | 142 +++++++++++++++---
fs/xfs/xfs_attr_item.h | 1 +
fs/xfs/xfs_attr_list.c | 17 ++-
fs/xfs/xfs_buf.h | 17 ++-
fs/xfs/xfs_file.c | 22 ++-
fs/xfs/xfs_inode.c | 2 +
fs/xfs/xfs_inode.h | 3 +-
fs/xfs/xfs_ioctl.c | 22 +++
fs/xfs/xfs_iomap.c | 14 ++
fs/xfs/xfs_iops.c | 4 +
fs/xfs/xfs_linux.h | 1 +
fs/xfs/xfs_mount.h | 3 +
fs/xfs/xfs_ondisk.h | 4 +
fs/xfs/xfs_super.c | 19 +++
fs/xfs/xfs_trace.h | 1 +
fs/xfs/xfs_verity.c | 256 ++++++++++++++++++++++++++++++++
fs/xfs/xfs_verity.h | 27 ++++
fs/xfs/xfs_xattr.c | 9 ++
fs/zonefs/file.c | 12 +-
include/linux/fsverity.h | 18 ++-
include/linux/iomap.h | 39 ++++-
include/uapi/linux/fs.h | 1 +
43 files changed, 923 insertions(+), 126 deletions(-)
create mode 100644 fs/xfs/xfs_verity.c
create mode 100644 fs/xfs/xfs_verity.h
--
2.38.4
From: Allison Henderson <[email protected]>
This patch adds two new fields to the attri/d: nname and nnamelen.
They will be used for parent pointer updates, since a rename operation
may cause the parent pointer to update both the name and the value. So
we need to carry both the new name and the target name in the attri/d.
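For illustration (not part of the patch text below), after this change
the shared buffer allocated by xfs_attri_log_nameval_alloc() lays the
regions out back to back when all of them are present:

    [struct xfs_attri_log_nameval][name][nname][value]

with nv->name, nv->nname and nv->value pointing at the respective
regions.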
Signed-off-by: Allison Henderson <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
---
fs/xfs/libxfs/xfs_attr.c | 12 ++-
fs/xfs/libxfs/xfs_attr.h | 4 +-
fs/xfs/libxfs/xfs_da_btree.h | 2 +
fs/xfs/libxfs/xfs_log_format.h | 6 +-
fs/xfs/xfs_attr_item.c | 135 +++++++++++++++++++++++++++------
fs/xfs/xfs_attr_item.h | 1 +
6 files changed, 133 insertions(+), 27 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index e28d93d232de..b1dbed7655e8 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -423,6 +423,12 @@ xfs_attr_complete_op(
args->op_flags &= ~XFS_DA_OP_REPLACE;
if (do_replace) {
args->attr_filter &= ~XFS_ATTR_INCOMPLETE;
+ if (args->new_namelen > 0) {
+ args->name = args->new_name;
+ args->namelen = args->new_namelen;
+ args->hashval = xfs_da_hashname(args->name,
+ args->namelen);
+ }
return replace_state;
}
return XFS_DAS_DONE;
@@ -922,9 +928,13 @@ xfs_attr_defer_replace(
struct xfs_da_args *args)
{
struct xfs_attr_intent *new;
+ int op_flag;
int error = 0;
- error = xfs_attr_intent_init(args, XFS_ATTRI_OP_FLAGS_REPLACE, &new);
+ op_flag = args->new_namelen == 0 ? XFS_ATTRI_OP_FLAGS_REPLACE :
+ XFS_ATTRI_OP_FLAGS_NVREPLACE;
+
+ error = xfs_attr_intent_init(args, op_flag, &new);
if (error)
return error;
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 81be9b3e4004..3e81f3f48560 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -510,8 +510,8 @@ struct xfs_attr_intent {
struct xfs_da_args *xattri_da_args;
/*
- * Shared buffer containing the attr name and value so that the logging
- * code can share large memory buffers between log items.
+ * Shared buffer containing the attr name, new name, and value so that
+ * the logging code can share large memory buffers between log items.
*/
struct xfs_attri_log_nameval *xattri_nameval;
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index ffa3df5b2893..a4b29827603f 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -55,7 +55,9 @@ enum xfs_dacmp {
typedef struct xfs_da_args {
struct xfs_da_geometry *geo; /* da block geometry */
const uint8_t *name; /* string (maybe not NULL terminated) */
+ const uint8_t *new_name; /* new attr name */
int namelen; /* length of string (maybe no NULL) */
+ int new_namelen; /* new attr name len */
uint8_t filetype; /* filetype of inode for directories */
void *value; /* set of bytes (maybe contain NULLs) */
int valuelen; /* length of value */
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index f13e0809dc63..ae9c99762a24 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -117,7 +117,8 @@ struct xfs_unmount_log_format {
#define XLOG_REG_TYPE_ATTRD_FORMAT 28
#define XLOG_REG_TYPE_ATTR_NAME 29
#define XLOG_REG_TYPE_ATTR_VALUE 30
-#define XLOG_REG_TYPE_MAX 30
+#define XLOG_REG_TYPE_ATTR_NNAME 31
+#define XLOG_REG_TYPE_MAX 31
/*
@@ -957,6 +958,7 @@ struct xfs_icreate_log {
#define XFS_ATTRI_OP_FLAGS_SET 1 /* Set the attribute */
#define XFS_ATTRI_OP_FLAGS_REMOVE 2 /* Remove the attribute */
#define XFS_ATTRI_OP_FLAGS_REPLACE 3 /* Replace the attribute */
+#define XFS_ATTRI_OP_FLAGS_NVREPLACE 4 /* Replace attr name and val */
#define XFS_ATTRI_OP_FLAGS_TYPE_MASK 0xFF /* Flags type mask */
/*
@@ -974,7 +976,7 @@ struct xfs_icreate_log {
struct xfs_attri_log_format {
uint16_t alfi_type; /* attri log item type */
uint16_t alfi_size; /* size of this item */
- uint32_t __pad; /* pad to 64 bit aligned */
+ uint32_t alfi_nname_len; /* attr new name length */
uint64_t alfi_id; /* attri identifier */
uint64_t alfi_ino; /* the inode for this attr operation */
uint32_t alfi_op_flags; /* marks the op as a set or remove */
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 2788a6f2edcd..95e9ecbb4a67 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -75,6 +75,8 @@ static inline struct xfs_attri_log_nameval *
xfs_attri_log_nameval_alloc(
const void *name,
unsigned int name_len,
+ const void *nname,
+ unsigned int nname_len,
const void *value,
unsigned int value_len)
{
@@ -85,15 +87,25 @@ xfs_attri_log_nameval_alloc(
* this. But kvmalloc() utterly sucks, so we use our own version.
*/
nv = xlog_kvmalloc(sizeof(struct xfs_attri_log_nameval) +
- name_len + value_len);
+ name_len + nname_len + value_len);
nv->name.i_addr = nv + 1;
nv->name.i_len = name_len;
nv->name.i_type = XLOG_REG_TYPE_ATTR_NAME;
memcpy(nv->name.i_addr, name, name_len);
+ if (nname_len) {
+ nv->nname.i_addr = nv->name.i_addr + name_len;
+ nv->nname.i_len = nname_len;
+ memcpy(nv->nname.i_addr, nname, nname_len);
+ } else {
+ nv->nname.i_addr = NULL;
+ nv->nname.i_len = 0;
+ }
+ nv->nname.i_type = XLOG_REG_TYPE_ATTR_NNAME;
+
if (value_len) {
- nv->value.i_addr = nv->name.i_addr + name_len;
+ nv->value.i_addr = nv->name.i_addr + nname_len + name_len;
nv->value.i_len = value_len;
memcpy(nv->value.i_addr, value, value_len);
} else {
@@ -147,11 +159,15 @@ xfs_attri_item_size(
*nbytes += sizeof(struct xfs_attri_log_format) +
xlog_calc_iovec_len(nv->name.i_len);
- if (!nv->value.i_len)
- return;
+ if (nv->nname.i_len) {
+ *nvecs += 1;
+ *nbytes += xlog_calc_iovec_len(nv->nname.i_len);
+ }
- *nvecs += 1;
- *nbytes += xlog_calc_iovec_len(nv->value.i_len);
+ if (nv->value.i_len) {
+ *nvecs += 1;
+ *nbytes += xlog_calc_iovec_len(nv->value.i_len);
+ }
}
/*
@@ -181,6 +197,9 @@ xfs_attri_item_format(
ASSERT(nv->name.i_len > 0);
attrip->attri_format.alfi_size++;
+ if (nv->nname.i_len > 0)
+ attrip->attri_format.alfi_size++;
+
if (nv->value.i_len > 0)
attrip->attri_format.alfi_size++;
@@ -188,6 +207,10 @@ xfs_attri_item_format(
&attrip->attri_format,
sizeof(struct xfs_attri_log_format));
xlog_copy_from_iovec(lv, &vecp, &nv->name);
+
+ if (nv->nname.i_len > 0)
+ xlog_copy_from_iovec(lv, &vecp, &nv->nname);
+
if (nv->value.i_len > 0)
xlog_copy_from_iovec(lv, &vecp, &nv->value);
}
@@ -374,6 +397,7 @@ xfs_attr_log_item(
attrp->alfi_op_flags = attr->xattri_op_flags;
attrp->alfi_value_len = attr->xattri_nameval->value.i_len;
attrp->alfi_name_len = attr->xattri_nameval->name.i_len;
+ attrp->alfi_nname_len = attr->xattri_nameval->nname.i_len;
ASSERT(!(attr->xattri_da_args->attr_filter & ~XFS_ATTRI_FILTER_MASK));
attrp->alfi_attr_filter = attr->xattri_da_args->attr_filter;
}
@@ -415,7 +439,8 @@ xfs_attr_create_intent(
* deferred work state structure.
*/
attr->xattri_nameval = xfs_attri_log_nameval_alloc(args->name,
- args->namelen, args->value, args->valuelen);
+ args->namelen, args->new_name,
+ args->new_namelen, args->value, args->valuelen);
}
attrip = xfs_attri_init(mp, attr->xattri_nameval);
@@ -503,7 +528,8 @@ xfs_attri_validate(
unsigned int op = attrp->alfi_op_flags &
XFS_ATTRI_OP_FLAGS_TYPE_MASK;
- if (attrp->__pad != 0)
+ if (attrp->alfi_op_flags != XFS_ATTRI_OP_FLAGS_NVREPLACE &&
+ attrp->alfi_nname_len != 0)
return false;
if (attrp->alfi_op_flags & ~XFS_ATTRI_OP_FLAGS_TYPE_MASK)
@@ -517,6 +543,7 @@ xfs_attri_validate(
case XFS_ATTRI_OP_FLAGS_SET:
case XFS_ATTRI_OP_FLAGS_REPLACE:
case XFS_ATTRI_OP_FLAGS_REMOVE:
+ case XFS_ATTRI_OP_FLAGS_NVREPLACE:
break;
default:
return false;
@@ -526,9 +553,14 @@ xfs_attri_validate(
return false;
if ((attrp->alfi_name_len > XATTR_NAME_MAX) ||
+ (attrp->alfi_nname_len > XATTR_NAME_MAX) ||
(attrp->alfi_name_len == 0))
return false;
+ if (op == XFS_ATTRI_OP_FLAGS_REMOVE &&
+ attrp->alfi_value_len != 0)
+ return false;
+
return xfs_verify_ino(mp, attrp->alfi_ino);
}
@@ -589,6 +621,8 @@ xfs_attri_item_recover(
args->whichfork = XFS_ATTR_FORK;
args->name = nv->name.i_addr;
args->namelen = nv->name.i_len;
+ args->new_name = nv->nname.i_addr;
+ args->new_namelen = nv->nname.i_len;
args->hashval = xfs_da_hashname(args->name, args->namelen);
args->attr_filter = attrp->alfi_attr_filter & XFS_ATTRI_FILTER_MASK;
args->op_flags = XFS_DA_OP_RECOVERY | XFS_DA_OP_OKNOENT |
@@ -599,6 +633,7 @@ xfs_attri_item_recover(
switch (attr->xattri_op_flags) {
case XFS_ATTRI_OP_FLAGS_SET:
case XFS_ATTRI_OP_FLAGS_REPLACE:
+ case XFS_ATTRI_OP_FLAGS_NVREPLACE:
args->value = nv->value.i_addr;
args->valuelen = nv->value.i_len;
args->total = xfs_attr_calc_size(args, &local);
@@ -688,6 +723,7 @@ xfs_attri_item_relog(
new_attrp->alfi_op_flags = old_attrp->alfi_op_flags;
new_attrp->alfi_value_len = old_attrp->alfi_value_len;
new_attrp->alfi_name_len = old_attrp->alfi_name_len;
+ new_attrp->alfi_nname_len = old_attrp->alfi_nname_len;
new_attrp->alfi_attr_filter = old_attrp->alfi_attr_filter;
xfs_trans_add_item(tp, &new_attrip->attri_item);
@@ -710,48 +746,102 @@ xlog_recover_attri_commit_pass2(
const void *attr_value = NULL;
const void *attr_name;
size_t len;
-
- attri_formatp = item->ri_buf[0].i_addr;
- attr_name = item->ri_buf[1].i_addr;
+ const void *attr_nname = NULL;
+ int op, i = 0;
/* Validate xfs_attri_log_format before the large memory allocation */
len = sizeof(struct xfs_attri_log_format);
- if (item->ri_buf[0].i_len != len) {
+ if (item->ri_buf[i].i_len != len) {
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
- item->ri_buf[0].i_addr, item->ri_buf[0].i_len);
+ item->ri_buf[i].i_addr, item->ri_buf[i].i_len);
return -EFSCORRUPTED;
}
+ attri_formatp = item->ri_buf[i].i_addr;
if (!xfs_attri_validate(mp, attri_formatp)) {
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
- item->ri_buf[0].i_addr, item->ri_buf[0].i_len);
+ item->ri_buf[i].i_addr, item->ri_buf[i].i_len);
return -EFSCORRUPTED;
}
+ op = attri_formatp->alfi_op_flags & XFS_ATTRI_OP_FLAGS_TYPE_MASK;
+ switch (op) {
+ case XFS_ATTRI_OP_FLAGS_SET:
+ case XFS_ATTRI_OP_FLAGS_REPLACE:
+ if (item->ri_total != 3) {
+ XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+ attri_formatp, len);
+ return -EFSCORRUPTED;
+ }
+ break;
+ case XFS_ATTRI_OP_FLAGS_REMOVE:
+ if (item->ri_total != 2) {
+ XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+ attri_formatp, len);
+ return -EFSCORRUPTED;
+ }
+ break;
+ case XFS_ATTRI_OP_FLAGS_NVREPLACE:
+ if (item->ri_total != 4) {
+ XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+ attri_formatp, len);
+ return -EFSCORRUPTED;
+ }
+ break;
+ default:
+ XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+ attri_formatp, len);
+ return -EFSCORRUPTED;
+ }
+
+ i++;
/* Validate the attr name */
- if (item->ri_buf[1].i_len !=
+ if (item->ri_buf[i].i_len !=
xlog_calc_iovec_len(attri_formatp->alfi_name_len)) {
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
- item->ri_buf[0].i_addr, item->ri_buf[0].i_len);
+ attri_formatp, len);
return -EFSCORRUPTED;
}
+ attr_name = item->ri_buf[i].i_addr;
if (!xfs_attr_namecheck(attr_name, attri_formatp->alfi_name_len)) {
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
- item->ri_buf[1].i_addr, item->ri_buf[1].i_len);
+ item->ri_buf[i].i_addr, item->ri_buf[i].i_len);
return -EFSCORRUPTED;
}
+ i++;
+ if (attri_formatp->alfi_nname_len) {
+ /* Validate the attr nname */
+ if (item->ri_buf[i].i_len !=
+ xlog_calc_iovec_len(attri_formatp->alfi_nname_len)) {
+ XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+ item->ri_buf[i].i_addr,
+ item->ri_buf[i].i_len);
+ return -EFSCORRUPTED;
+ }
+
+ attr_nname = item->ri_buf[i].i_addr;
+ if (!xfs_attr_namecheck(attr_nname,
+ attri_formatp->alfi_nname_len)) {
+ XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+ item->ri_buf[i].i_addr,
+ item->ri_buf[i].i_len);
+ return -EFSCORRUPTED;
+ }
+ i++;
+ }
+
+
/* Validate the attr value, if present */
if (attri_formatp->alfi_value_len != 0) {
- if (item->ri_buf[2].i_len != xlog_calc_iovec_len(attri_formatp->alfi_value_len)) {
+ if (item->ri_buf[i].i_len != xlog_calc_iovec_len(attri_formatp->alfi_value_len)) {
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
- item->ri_buf[0].i_addr,
- item->ri_buf[0].i_len);
+ attri_formatp, len);
return -EFSCORRUPTED;
}
- attr_value = item->ri_buf[2].i_addr;
+ attr_value = item->ri_buf[i].i_addr;
}
/*
@@ -760,7 +850,8 @@ xlog_recover_attri_commit_pass2(
* reference.
*/
nv = xfs_attri_log_nameval_alloc(attr_name,
- attri_formatp->alfi_name_len, attr_value,
+ attri_formatp->alfi_name_len, attr_nname,
+ attri_formatp->alfi_nname_len, attr_value,
attri_formatp->alfi_value_len);
attrip = xfs_attri_init(mp, nv);
diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
index 3280a7930287..24d4968dd6cc 100644
--- a/fs/xfs/xfs_attr_item.h
+++ b/fs/xfs/xfs_attr_item.h
@@ -13,6 +13,7 @@ struct kmem_zone;
struct xfs_attri_log_nameval {
struct xfs_log_iovec name;
+ struct xfs_log_iovec nname;
struct xfs_log_iovec value;
refcount_t refcount;
--
2.38.4
From: Allison Henderson <[email protected]>
Add the new parent attribute type. XFS_ATTR_PARENT is used only for parent pointer
entries; it uses reserved blocks like XFS_ATTR_ROOT.
Signed-off-by: Mark Tinguely <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Allison Henderson <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
---
fs/xfs/libxfs/xfs_attr.c | 4 +++-
fs/xfs/libxfs/xfs_da_format.h | 5 ++++-
fs/xfs/libxfs/xfs_log_format.h | 1 +
fs/xfs/scrub/attr.c | 2 +-
4 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index b1dbed7655e8..101823772bf9 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -976,11 +976,13 @@ xfs_attr_set(
struct xfs_inode *dp = args->dp;
struct xfs_mount *mp = dp->i_mount;
struct xfs_trans_res tres;
- bool rsvd = (args->attr_filter & XFS_ATTR_ROOT);
+ bool rsvd;
int error, local;
int rmt_blks = 0;
unsigned int total;
+ rsvd = (args->attr_filter & (XFS_ATTR_ROOT | XFS_ATTR_PARENT)) != 0;
+
if (xfs_is_shutdown(dp->i_mount))
return -EIO;
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 25e2841084e1..3dc03968bba6 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -688,12 +688,15 @@ struct xfs_attr3_leafblock {
#define XFS_ATTR_LOCAL_BIT 0 /* attr is stored locally */
#define XFS_ATTR_ROOT_BIT 1 /* limit access to trusted attrs */
#define XFS_ATTR_SECURE_BIT 2 /* limit access to secure attrs */
+#define XFS_ATTR_PARENT_BIT 3 /* parent pointer attrs */
#define XFS_ATTR_INCOMPLETE_BIT 7 /* attr in middle of create/delete */
#define XFS_ATTR_LOCAL (1u << XFS_ATTR_LOCAL_BIT)
#define XFS_ATTR_ROOT (1u << XFS_ATTR_ROOT_BIT)
#define XFS_ATTR_SECURE (1u << XFS_ATTR_SECURE_BIT)
+#define XFS_ATTR_PARENT (1u << XFS_ATTR_PARENT_BIT)
#define XFS_ATTR_INCOMPLETE (1u << XFS_ATTR_INCOMPLETE_BIT)
-#define XFS_ATTR_NSP_ONDISK_MASK (XFS_ATTR_ROOT | XFS_ATTR_SECURE)
+#define XFS_ATTR_NSP_ONDISK_MASK \
+ (XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT)
/*
* Alignment for namelist and valuelist entries (since they are mixed
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index ae9c99762a24..727b5a858028 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -967,6 +967,7 @@ struct xfs_icreate_log {
*/
#define XFS_ATTRI_FILTER_MASK (XFS_ATTR_ROOT | \
XFS_ATTR_SECURE | \
+ XFS_ATTR_PARENT | \
XFS_ATTR_INCOMPLETE)
/*
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 31529b9bf389..9d2e33743ecd 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -441,7 +441,7 @@ xchk_xattr_rec(
/* Retrieve the entry and check it. */
hash = be32_to_cpu(ent->hashval);
badflags = ~(XFS_ATTR_LOCAL | XFS_ATTR_ROOT | XFS_ATTR_SECURE |
- XFS_ATTR_INCOMPLETE);
+ XFS_ATTR_INCOMPLETE | XFS_ATTR_PARENT);
if ((ent->flags & badflags) != 0)
xchk_da_set_corrupt(ds, level);
if (ent->flags & XFS_ATTR_LOCAL) {
--
2.38.4
From: Allison Henderson <[email protected]>
We need to define the parent pointer attribute format before we start
adding support for it into all the code that needs to use it. The EA
format we will use encodes the following information:
name={parent inode #, parent inode generation, dirent offset}
value={dirent filename}
The inode/gen gives all the information we need to reliably identify the
parent without requiring child->parent lock ordering, and allows
userspace to do pathname component level reconstruction without the
kernel ever needing to verify the parent itself as part of ioctl calls.
By using the dirent offset in the EA name, we have a method of knowing
the exact parent pointer EA we need to modify/remove in rename/unlink
without an unbound EA name search.
By keeping the dirent name in the value, we have enough information to
be able to validate and reconstruct damaged directory trees. While the
diroffset of a filename alone is not unique enough to identify the
child, the {diroffset,filename,child_inode} tuple is sufficient. That
is, if the diroffset gets reused and points to a different filename, we
can detect that from the contents of the EA. If a link of the same
name is created, then we can check whether it points at the same inode
as the parent EA we currently have.
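As an illustration, converting the incore form into the on-disk record
defined below would look roughly like this (a sketch; the helper name
is hypothetical and not part of this patch):

    static void
    xfs_parent_irec_to_rec(
            const struct xfs_parent_name_irec *irec,
            struct xfs_parent_name_rec        *rec)
    {
            rec->p_ino = cpu_to_be64(irec->p_ino);
            rec->p_gen = cpu_to_be32(irec->p_gen);
            rec->p_diroffset = cpu_to_be32(irec->p_diroffset);
    }

The dirent filename (irec->p_name, irec->p_namelen) goes into the EA
value, not into the record above.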
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Allison Henderson <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
---
fs/xfs/libxfs/xfs_da_format.h | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 3dc03968bba6..b02b67f1999e 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -805,4 +805,29 @@ static inline unsigned int xfs_dir2_dirblock_bytes(struct xfs_sb *sbp)
xfs_failaddr_t xfs_da3_blkinfo_verify(struct xfs_buf *bp,
struct xfs_da3_blkinfo *hdr3);
+/*
+ * Parent pointer attribute format definition
+ *
+ * EA name encodes the parent inode number, generation and the offset of
+ * the dirent that points to the child inode. The EA value contains the
+ * same name as the dirent in the parent directory.
+ */
+struct xfs_parent_name_rec {
+ __be64 p_ino;
+ __be32 p_gen;
+ __be32 p_diroffset;
+};
+
+/*
+ * incore version of the above, also contains name pointers so callers
+ * can pass/obtain all the parent pointer information in a single structure
+ */
+struct xfs_parent_name_irec {
+ xfs_ino_t p_ino;
+ uint32_t p_gen;
+ xfs_dir2_dataptr_t p_diroffset;
+ const char *p_name;
+ uint8_t p_namelen;
+};
+
#endif /* __XFS_DA_FORMAT_H__ */
--
2.38.4
From: Allison Henderson <[email protected]>
Attribute names of parent pointers are not strings, so we need to
modify xfs_attr_namecheck() to verify parent pointer records when the
XFS_ATTR_PARENT flag is set.
Signed-off-by: Allison Henderson <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
---
fs/xfs/libxfs/xfs_attr.c | 47 ++++++++++++++++++++++++++++++++---
fs/xfs/libxfs/xfs_attr.h | 3 ++-
fs/xfs/libxfs/xfs_da_format.h | 8 ++++++
fs/xfs/scrub/attr.c | 2 +-
fs/xfs/xfs_attr_item.c | 11 +++++---
fs/xfs/xfs_attr_list.c | 17 +++++++++----
6 files changed, 74 insertions(+), 14 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 101823772bf9..711022742e34 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -1577,9 +1577,33 @@ xfs_attr_node_get(
return error;
}
-/* Returns true if the attribute entry name is valid. */
-bool
-xfs_attr_namecheck(
+/*
+ * Verify parent pointer attribute is valid.
+ * Return true on success or false on failure
+ */
+STATIC bool
+xfs_verify_pptr(
+ struct xfs_mount *mp,
+ const struct xfs_parent_name_rec *rec)
+{
+ xfs_ino_t p_ino;
+ xfs_dir2_dataptr_t p_diroffset;
+
+ p_ino = be64_to_cpu(rec->p_ino);
+ p_diroffset = be32_to_cpu(rec->p_diroffset);
+
+ if (!xfs_verify_ino(mp, p_ino))
+ return false;
+
+ if (p_diroffset > XFS_DIR2_MAX_DATAPTR)
+ return false;
+
+ return true;
+}
+
+/* Returns true if the string attribute entry name is valid. */
+static bool
+xfs_str_attr_namecheck(
const void *name,
size_t length)
{
@@ -1594,6 +1618,23 @@ xfs_attr_namecheck(
return !memchr(name, 0, length);
}
+/* Returns true if the attribute entry name is valid. */
+bool
+xfs_attr_namecheck(
+ struct xfs_mount *mp,
+ const void *name,
+ size_t length,
+ int flags)
+{
+ if (flags & XFS_ATTR_PARENT) {
+ if (length != sizeof(struct xfs_parent_name_rec))
+ return false;
+ return xfs_verify_pptr(mp, (struct xfs_parent_name_rec *)name);
+ }
+
+ return xfs_str_attr_namecheck(name, length);
+}
+
int __init
xfs_attr_intent_init_cache(void)
{
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 3e81f3f48560..b79dae788cfb 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -547,7 +547,8 @@ int xfs_attr_get(struct xfs_da_args *args);
int xfs_attr_set(struct xfs_da_args *args);
int xfs_attr_set_iter(struct xfs_attr_intent *attr);
int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
-bool xfs_attr_namecheck(const void *name, size_t length);
+bool xfs_attr_namecheck(struct xfs_mount *mp, const void *name, size_t length,
+ int flags);
int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
unsigned int *total);
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index b02b67f1999e..75b13807145d 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -731,6 +731,14 @@ xfs_attr3_leaf_name(xfs_attr_leafblock_t *leafp, int idx)
return &((char *)leafp)[be16_to_cpu(entries[idx].nameidx)];
}
+static inline int
+xfs_attr3_leaf_flags(xfs_attr_leafblock_t *leafp, int idx)
+{
+ struct xfs_attr_leaf_entry *entries = xfs_attr3_leaf_entryp(leafp);
+
+ return entries[idx].flags;
+}
+
static inline xfs_attr_leaf_name_remote_t *
xfs_attr3_leaf_name_remote(xfs_attr_leafblock_t *leafp, int idx)
{
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 9d2e33743ecd..2a79a13cb600 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -129,7 +129,7 @@ xchk_xattr_listent(
}
/* Does this name make sense? */
- if (!xfs_attr_namecheck(name, namelen)) {
+ if (!xfs_attr_namecheck(sx->sc->mp, name, namelen, flags)) {
xchk_fblock_set_corrupt(sx->sc, XFS_ATTR_FORK, args.blkno);
return;
}
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 95e9ecbb4a67..da807f286a09 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -593,7 +593,8 @@ xfs_attri_item_recover(
*/
attrp = &attrip->attri_format;
if (!xfs_attri_validate(mp, attrp) ||
- !xfs_attr_namecheck(nv->name.i_addr, nv->name.i_len))
+ !xfs_attr_namecheck(mp, nv->name.i_addr, nv->name.i_len,
+ attrp->alfi_attr_filter))
return -EFSCORRUPTED;
error = xlog_recover_iget(mp, attrp->alfi_ino, &ip);
@@ -804,7 +805,8 @@ xlog_recover_attri_commit_pass2(
}
attr_name = item->ri_buf[i].i_addr;
- if (!xfs_attr_namecheck(attr_name, attri_formatp->alfi_name_len)) {
+ if (!xfs_attr_namecheck(mp, attr_name, attri_formatp->alfi_name_len,
+ attri_formatp->alfi_attr_filter)) {
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
item->ri_buf[i].i_addr, item->ri_buf[i].i_len);
return -EFSCORRUPTED;
@@ -822,8 +824,9 @@ xlog_recover_attri_commit_pass2(
}
attr_nname = item->ri_buf[i].i_addr;
- if (!xfs_attr_namecheck(attr_nname,
- attri_formatp->alfi_nname_len)) {
+ if (!xfs_attr_namecheck(mp, attr_nname,
+ attri_formatp->alfi_nname_len,
+ attri_formatp->alfi_attr_filter)) {
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
item->ri_buf[i].i_addr,
item->ri_buf[i].i_len);
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index 99bbbe1a0e44..a51f7f13a352 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -58,9 +58,13 @@ xfs_attr_shortform_list(
struct xfs_attr_sf_sort *sbuf, *sbp;
struct xfs_attr_shortform *sf;
struct xfs_attr_sf_entry *sfe;
+ struct xfs_mount *mp;
int sbsize, nsbuf, count, i;
int error = 0;
+ ASSERT(context != NULL);
+ ASSERT(dp != NULL);
+ mp = dp->i_mount;
sf = (struct xfs_attr_shortform *)dp->i_af.if_u1.if_data;
ASSERT(sf != NULL);
if (!sf->hdr.count)
@@ -82,8 +86,9 @@ xfs_attr_shortform_list(
(dp->i_af.if_bytes + sf->hdr.count * 16) < context->bufsize)) {
for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
if (XFS_IS_CORRUPT(context->dp->i_mount,
- !xfs_attr_namecheck(sfe->nameval,
- sfe->namelen)))
+ !xfs_attr_namecheck(mp, sfe->nameval,
+ sfe->namelen,
+ sfe->flags)))
return -EFSCORRUPTED;
context->put_listent(context,
sfe->flags,
@@ -174,8 +179,9 @@ xfs_attr_shortform_list(
cursor->offset = 0;
}
if (XFS_IS_CORRUPT(context->dp->i_mount,
- !xfs_attr_namecheck(sbp->name,
- sbp->namelen))) {
+ !xfs_attr_namecheck(mp, sbp->name,
+ sbp->namelen,
+ sbp->flags))) {
error = -EFSCORRUPTED;
goto out;
}
@@ -465,7 +471,8 @@ xfs_attr3_leaf_list_int(
}
if (XFS_IS_CORRUPT(context->dp->i_mount,
- !xfs_attr_namecheck(name, namelen)))
+ !xfs_attr_namecheck(mp, name, namelen,
+ entry->flags)))
return -EFSCORRUPTED;
context->put_listent(context, entry->flags,
name, namelen, valuelen);
--
2.38.4
Allow the filesystem to do additional processing on verified pages
instead of just dropping a reference. This will be used by XFS for
internal buffer cache manipulation in further patches. btrfs, ext4,
and f2fs just drop the reference.
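For example, a filesystem that returns privately allocated pages
(rather than page cache pages) from ->read_merkle_tree_page() could use
the callout to free them instead of just dropping a reference. A
hypothetical sketch:

    /* hypothetical: pages are not in the page cache, so free them */
    static void myfs_drop_merkle_tree_page(struct page *page)
    {
            __free_page(page);
    }

This is what motivates making the release step a filesystem callout
instead of a hardcoded put_page().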
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/btrfs/verity.c | 12 ++++++++++++
fs/ext4/verity.c | 6 ++++++
fs/f2fs/verity.c | 6 ++++++
fs/verity/read_metadata.c | 4 ++--
fs/verity/verify.c | 6 +++---
include/linux/fsverity.h | 10 ++++++++++
6 files changed, 39 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/verity.c b/fs/btrfs/verity.c
index c5ff16f9e9fa..4c2c09204bb4 100644
--- a/fs/btrfs/verity.c
+++ b/fs/btrfs/verity.c
@@ -804,10 +804,22 @@ static int btrfs_write_merkle_tree_block(struct inode *inode, const void *buf,
pos, buf, size);
}
+/*
+ * fsverity op that releases the reference obtained by ->read_merkle_tree_page()
+ *
+ * @page: reference to the page which can be released
+ *
+ */
+static void btrfs_drop_page(struct page *page)
+{
+ put_page(page);
+}
+
const struct fsverity_operations btrfs_verityops = {
.begin_enable_verity = btrfs_begin_enable_verity,
.end_enable_verity = btrfs_end_enable_verity,
.get_verity_descriptor = btrfs_get_verity_descriptor,
.read_merkle_tree_page = btrfs_read_merkle_tree_page,
.write_merkle_tree_block = btrfs_write_merkle_tree_block,
+ .drop_page = &btrfs_drop_page,
};
diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
index e4da1704438e..35a2feb6fd68 100644
--- a/fs/ext4/verity.c
+++ b/fs/ext4/verity.c
@@ -388,10 +388,16 @@ static int ext4_write_merkle_tree_block(struct inode *inode, const void *buf,
return pagecache_write(inode, buf, size, pos);
}
+static void ext4_drop_page(struct page *page)
+{
+ put_page(page);
+}
+
const struct fsverity_operations ext4_verityops = {
.begin_enable_verity = ext4_begin_enable_verity,
.end_enable_verity = ext4_end_enable_verity,
.get_verity_descriptor = ext4_get_verity_descriptor,
.read_merkle_tree_page = ext4_read_merkle_tree_page,
.write_merkle_tree_block = ext4_write_merkle_tree_block,
+ .drop_page = &ext4_drop_page,
};
diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c
index 4fc95f353a7a..019c7a6c6bcf 100644
--- a/fs/f2fs/verity.c
+++ b/fs/f2fs/verity.c
@@ -283,10 +283,16 @@ static int f2fs_write_merkle_tree_block(struct inode *inode, const void *buf,
return pagecache_write(inode, buf, size, pos);
}
+static void f2fs_drop_page(struct page *page)
+{
+ put_page(page);
+}
+
const struct fsverity_operations f2fs_verityops = {
.begin_enable_verity = f2fs_begin_enable_verity,
.end_enable_verity = f2fs_end_enable_verity,
.get_verity_descriptor = f2fs_get_verity_descriptor,
.read_merkle_tree_page = f2fs_read_merkle_tree_page,
.write_merkle_tree_block = f2fs_write_merkle_tree_block,
+ .drop_page = &f2fs_drop_page,
};
diff --git a/fs/verity/read_metadata.c b/fs/verity/read_metadata.c
index 2aefc5565152..cab1612bf4a3 100644
--- a/fs/verity/read_metadata.c
+++ b/fs/verity/read_metadata.c
@@ -56,12 +56,12 @@ static int fsverity_read_merkle_tree(struct inode *inode,
virt = kmap_local_page(page);
if (copy_to_user(buf, virt + offs_in_page, bytes_to_copy)) {
kunmap_local(virt);
- put_page(page);
+ inode->i_sb->s_vop->drop_page(page);
err = -EFAULT;
break;
}
kunmap_local(virt);
- put_page(page);
+ inode->i_sb->s_vop->drop_page(page);
retval += bytes_to_copy;
buf += bytes_to_copy;
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index f50e3b5b52c9..c2fc4c86af34 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -210,7 +210,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
if (is_hash_block_verified(vi, hpage, hblock_idx)) {
memcpy_from_page(_want_hash, hpage, hoffset, hsize);
want_hash = _want_hash;
- put_page(hpage);
+ inode->i_sb->s_vop->drop_page(hpage);
goto descend;
}
hblocks[level].page = hpage;
@@ -248,7 +248,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
SetPageChecked(hpage);
memcpy_from_page(_want_hash, hpage, hoffset, hsize);
want_hash = _want_hash;
- put_page(hpage);
+ inode->i_sb->s_vop->drop_page(hpage);
}
/* Finally, verify the data block. */
@@ -259,7 +259,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
err = cmp_hashes(vi, want_hash, real_hash, data_pos, -1);
out:
for (; level > 0; level--)
- put_page(hblocks[level - 1].page);
+ inode->i_sb->s_vop->drop_page(hblocks[level - 1].page);
return err == 0;
}
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 6d7a4b3ea626..3e923a8e0d6f 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -120,6 +120,16 @@ struct fsverity_operations {
*/
int (*write_merkle_tree_block)(struct inode *inode, const void *buf,
u64 pos, unsigned int size);
+
+ /**
+ * Release the reference to a Merkle tree page
+ *
+ * @page: the page to release
+ *
+ * This is called when fs-verity is done with a page obtained with
+ * ->read_merkle_tree_page().
+ */
+ void (*drop_page)(struct page *page);
};
#ifdef CONFIG_FS_VERITY
--
2.38.4
Make filesystems create the readpage context, similar to
iomap_writepage_ctx in the write path. This allows a filesystem to
pass _ops to iomap for ioend configuration (->prepare_ioend), which in
turn is used to set the BIO end callout (bio->bi_end_io). This will be
utilized in further patches so that XFS can verify pages with
fs-verity on BIO completion.
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/erofs/data.c | 12 +++++++--
fs/gfs2/aops.c | 10 ++++++--
fs/iomap/buffered-io.c | 57 ++++++++++++++++--------------------------
fs/xfs/xfs_aops.c | 16 +++++++++---
fs/zonefs/file.c | 12 +++++++--
include/linux/iomap.h | 13 ++++++++--
6 files changed, 73 insertions(+), 47 deletions(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index e16545849ea7..189591249f61 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -344,12 +344,20 @@ int erofs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
*/
static int erofs_read_folio(struct file *file, struct folio *folio)
{
- return iomap_read_folio(folio, &erofs_iomap_ops);
+ struct iomap_readpage_ctx ctx = {
+ .cur_folio = folio,
+ };
+
+ return iomap_read_folio(&ctx, &erofs_iomap_ops);
}
static void erofs_readahead(struct readahead_control *rac)
{
- return iomap_readahead(rac, &erofs_iomap_ops);
+ struct iomap_readpage_ctx ctx = {
+ .rac = rac,
+ };
+
+ return iomap_readahead(&ctx, &erofs_iomap_ops);
}
static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index a5f4be6b9213..2764e1e99e8b 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -453,10 +453,13 @@ static int gfs2_read_folio(struct file *file, struct folio *folio)
struct gfs2_inode *ip = GFS2_I(inode);
struct gfs2_sbd *sdp = GFS2_SB(inode);
int error;
+ struct iomap_readpage_ctx ctx = {
+ .cur_folio = folio,
+ };
if (!gfs2_is_jdata(ip) ||
(i_blocksize(inode) == PAGE_SIZE && !folio_buffers(folio))) {
- error = iomap_read_folio(folio, &gfs2_iomap_ops);
+ error = iomap_read_folio(&ctx, &gfs2_iomap_ops);
} else if (gfs2_is_stuffed(ip)) {
error = stuffed_readpage(ip, &folio->page);
folio_unlock(folio);
@@ -528,13 +531,16 @@ static void gfs2_readahead(struct readahead_control *rac)
{
struct inode *inode = rac->mapping->host;
struct gfs2_inode *ip = GFS2_I(inode);
+ struct iomap_readpage_ctx ctx = {
+ .rac = rac,
+ };
if (gfs2_is_stuffed(ip))
;
else if (gfs2_is_jdata(ip))
mpage_readahead(rac, gfs2_block_map);
else
- iomap_readahead(rac, &gfs2_iomap_ops);
+ iomap_readahead(&ctx, &gfs2_iomap_ops);
}
/**
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 6f4c97a6d7e9..d39be64b1da9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -194,13 +194,6 @@ static void iomap_read_end_io(struct bio *bio)
bio_put(bio);
}
-struct iomap_readpage_ctx {
- struct folio *cur_folio;
- bool cur_folio_in_bio;
- struct bio *bio;
- struct readahead_control *rac;
-};
-
/**
* iomap_read_inline_data - copy inline data into the page cache
* @iter: iteration structure
@@ -325,32 +318,29 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
return pos - orig_pos + plen;
}
-int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
+int iomap_read_folio(struct iomap_readpage_ctx *ctx, const struct iomap_ops *ops)
{
struct iomap_iter iter = {
- .inode = folio->mapping->host,
- .pos = folio_pos(folio),
- .len = folio_size(folio),
- };
- struct iomap_readpage_ctx ctx = {
- .cur_folio = folio,
+ .inode = ctx->cur_folio->mapping->host,
+ .pos = folio_pos(ctx->cur_folio),
+ .len = folio_size(ctx->cur_folio),
};
int ret;
trace_iomap_readpage(iter.inode, 1);
while ((ret = iomap_iter(&iter, ops)) > 0)
- iter.processed = iomap_readpage_iter(&iter, &ctx, 0);
+ iter.processed = iomap_readpage_iter(&iter, ctx, 0);
if (ret < 0)
- folio_set_error(folio);
+ folio_set_error(ctx->cur_folio);
- if (ctx.bio) {
- submit_bio(ctx.bio);
- WARN_ON_ONCE(!ctx.cur_folio_in_bio);
+ if (ctx->bio) {
+ submit_bio(ctx->bio);
+ WARN_ON_ONCE(!ctx->cur_folio_in_bio);
} else {
- WARN_ON_ONCE(ctx.cur_folio_in_bio);
- folio_unlock(folio);
+ WARN_ON_ONCE(ctx->cur_folio_in_bio);
+ folio_unlock(ctx->cur_folio);
}
/*
@@ -402,27 +392,24 @@ static loff_t iomap_readahead_iter(const struct iomap_iter *iter,
* function is called with memalloc_nofs set, so allocations will not cause
* the filesystem to be reentered.
*/
-void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
+void iomap_readahead(struct iomap_readpage_ctx *ctx, const struct iomap_ops *ops)
{
struct iomap_iter iter = {
- .inode = rac->mapping->host,
- .pos = readahead_pos(rac),
- .len = readahead_length(rac),
- };
- struct iomap_readpage_ctx ctx = {
- .rac = rac,
+ .inode = ctx->rac->mapping->host,
+ .pos = readahead_pos(ctx->rac),
+ .len = readahead_length(ctx->rac),
};
- trace_iomap_readahead(rac->mapping->host, readahead_count(rac));
+ trace_iomap_readahead(ctx->rac->mapping->host, readahead_count(ctx->rac));
while (iomap_iter(&iter, ops) > 0)
- iter.processed = iomap_readahead_iter(&iter, &ctx);
+ iter.processed = iomap_readahead_iter(&iter, ctx);
- if (ctx.bio)
- submit_bio(ctx.bio);
- if (ctx.cur_folio) {
- if (!ctx.cur_folio_in_bio)
- folio_unlock(ctx.cur_folio);
+ if (ctx->bio)
+ submit_bio(ctx->bio);
+ if (ctx->cur_folio) {
+ if (!ctx->cur_folio_in_bio)
+ folio_unlock(ctx->cur_folio);
}
}
EXPORT_SYMBOL_GPL(iomap_readahead);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 2ef78aa1d3f6..daa0dd4768fb 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -550,17 +550,25 @@ xfs_vm_bmap(
STATIC int
xfs_vm_read_folio(
- struct file *unused,
- struct folio *folio)
+ struct file *unused,
+ struct folio *folio)
{
- return iomap_read_folio(folio, &xfs_read_iomap_ops);
+ struct iomap_readpage_ctx ctx = {
+ .cur_folio = folio,
+ };
+
+ return iomap_read_folio(&ctx, &xfs_read_iomap_ops);
}
STATIC void
xfs_vm_readahead(
struct readahead_control *rac)
{
- iomap_readahead(rac, &xfs_read_iomap_ops);
+ struct iomap_readpage_ctx ctx = {
+ .rac = rac,
+ };
+
+ iomap_readahead(&ctx, &xfs_read_iomap_ops);
}
static int
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 738b0e28d74b..5d01496a5ada 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -112,12 +112,20 @@ static const struct iomap_ops zonefs_write_iomap_ops = {
static int zonefs_read_folio(struct file *unused, struct folio *folio)
{
- return iomap_read_folio(folio, &zonefs_read_iomap_ops);
+ struct iomap_readpage_ctx ctx = {
+ .cur_folio = folio,
+ };
+
+ return iomap_read_folio(&ctx, &zonefs_read_iomap_ops);
}
static void zonefs_readahead(struct readahead_control *rac)
{
- iomap_readahead(rac, &zonefs_read_iomap_ops);
+ struct iomap_readpage_ctx ctx = {
+ .rac = rac,
+ };
+
+ iomap_readahead(&ctx, &zonefs_read_iomap_ops);
}
/*
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0f8123504e5e..0fbce375265d 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -258,8 +258,17 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
struct iomap *iomap, loff_t pos, loff_t length, ssize_t written,
int (*punch)(struct inode *inode, loff_t pos, loff_t length));
-int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
-void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
+struct iomap_readpage_ctx {
+ struct folio *cur_folio;
+ bool cur_folio_in_bio;
+ struct bio *bio;
+ struct readahead_control *rac;
+};
+
+int iomap_read_folio(struct iomap_readpage_ctx *ctx,
+ const struct iomap_ops *ops);
+void iomap_readahead(struct iomap_readpage_ctx *ctx,
+ const struct iomap_ops *ops);
bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos);
bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
--
2.38.4
XFS will need to know the size of a Merkle tree block, as these
blocks will not be stored consecutively in fs blocks but under indexes
in extended attributes.
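For example, with 4k pages and 1k Merkle tree blocks a single page
index covers several tree blocks. A sketch of the mapping (assuming
log_blocksize <= PAGE_SHIFT; not part of this patch):

    /* Merkle tree blocks backed by one page */
    blocks_per_page = PAGE_SIZE >> log_blocksize;
    /* first tree block index stored under this page index */
    first_block = (u64)index << (PAGE_SHIFT - log_blocksize);

This lets ->read_merkle_tree_page() look up each tree block under its
own extended attribute index.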
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/btrfs/verity.c | 3 ++-
fs/ext4/verity.c | 3 ++-
fs/f2fs/verity.c | 3 ++-
fs/verity/read_metadata.c | 3 ++-
fs/verity/verify.c | 3 ++-
include/linux/fsverity.h | 3 ++-
6 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/verity.c b/fs/btrfs/verity.c
index 4c2c09204bb4..737ad277b15a 100644
--- a/fs/btrfs/verity.c
+++ b/fs/btrfs/verity.c
@@ -713,7 +713,8 @@ int btrfs_get_verity_descriptor(struct inode *inode, void *buf, size_t buf_size)
*/
static struct page *btrfs_read_merkle_tree_page(struct inode *inode,
pgoff_t index,
- unsigned long num_ra_pages)
+ unsigned long num_ra_pages,
+ u8 log_blocksize)
{
struct page *page;
u64 off = (u64)index << PAGE_SHIFT;
diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
index 35a2feb6fd68..cbf1253dd14a 100644
--- a/fs/ext4/verity.c
+++ b/fs/ext4/verity.c
@@ -361,7 +361,8 @@ static int ext4_get_verity_descriptor(struct inode *inode, void *buf,
static struct page *ext4_read_merkle_tree_page(struct inode *inode,
pgoff_t index,
- unsigned long num_ra_pages)
+ unsigned long num_ra_pages,
+ u8 log_blocksize)
{
struct page *page;
diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c
index 019c7a6c6bcf..63c6a1b1bdef 100644
--- a/fs/f2fs/verity.c
+++ b/fs/f2fs/verity.c
@@ -256,7 +256,8 @@ static int f2fs_get_verity_descriptor(struct inode *inode, void *buf,
static struct page *f2fs_read_merkle_tree_page(struct inode *inode,
pgoff_t index,
- unsigned long num_ra_pages)
+ unsigned long num_ra_pages,
+ u8 log_blocksize)
{
struct page *page;
diff --git a/fs/verity/read_metadata.c b/fs/verity/read_metadata.c
index cab1612bf4a3..d6cc58c24a2e 100644
--- a/fs/verity/read_metadata.c
+++ b/fs/verity/read_metadata.c
@@ -44,7 +44,8 @@ static int fsverity_read_merkle_tree(struct inode *inode,
struct page *page;
const void *virt;
- page = vops->read_merkle_tree_page(inode, index, num_ra_pages);
+ page = vops->read_merkle_tree_page(inode, index, num_ra_pages,
+ vi->tree_params.log_blocksize);
if (IS_ERR(page)) {
err = PTR_ERR(page);
fsverity_err(inode,
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index c2fc4c86af34..9213b1e5ed2c 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -199,7 +199,8 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
hpage = inode->i_sb->s_vop->read_merkle_tree_page(inode,
hpage_idx, level == 0 ? min(max_ra_pages,
- params->tree_pages - hpage_idx) : 0);
+ params->tree_pages - hpage_idx) : 0,
+ params->log_blocksize);
if (IS_ERR(hpage)) {
err = PTR_ERR(hpage);
fsverity_err(inode,
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 3e923a8e0d6f..ad07a1d10fdf 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -103,7 +103,8 @@ struct fsverity_operations {
*/
struct page *(*read_merkle_tree_page)(struct inode *inode,
pgoff_t index,
- unsigned long num_ra_pages);
+ unsigned long num_ra_pages,
+ u8 log_blocksize);
/**
* Write a Merkle tree block to the given inode.
--
2.38.4
The flag means that the value of the extended attribute in the buffer
has been verified. The underlying pages have PageChecked() == false
(the way fs-verity identifies verified pages), as the page content
will be copied out to newly allocated pages in further patches. The
flag is used later to SetPageChecked() on pages handed to fs-verity.
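The intended use in the later fs-verity patches is roughly the
following (a sketch, not verbatim from this series):

    /* when handing a cached Merkle tree block page to fs-verity */
    if (bp->b_flags & XBF_VERITY_CHECKED)
            SetPageChecked(page);

so a block that was verified while its xfs_buf stayed in cache is not
re-verified.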
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/xfs/xfs_buf.h | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 549c60942208..8cc86fed962b 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -24,14 +24,15 @@ struct xfs_buf;
#define XFS_BUF_DADDR_NULL ((xfs_daddr_t) (-1LL))
-#define XBF_READ (1u << 0) /* buffer intended for reading from device */
-#define XBF_WRITE (1u << 1) /* buffer intended for writing to device */
-#define XBF_READ_AHEAD (1u << 2) /* asynchronous read-ahead */
-#define XBF_NO_IOACCT (1u << 3) /* bypass I/O accounting (non-LRU bufs) */
-#define XBF_ASYNC (1u << 4) /* initiator will not wait for completion */
-#define XBF_DONE (1u << 5) /* all pages in the buffer uptodate */
-#define XBF_STALE (1u << 6) /* buffer has been staled, do not find it */
-#define XBF_WRITE_FAIL (1u << 7) /* async writes have failed on this buffer */
+#define XBF_READ (1u << 0) /* buffer intended for reading from device */
+#define XBF_WRITE (1u << 1) /* buffer intended for writing to device */
+#define XBF_READ_AHEAD (1u << 2) /* asynchronous read-ahead */
+#define XBF_NO_IOACCT (1u << 3) /* bypass I/O accounting (non-LRU bufs) */
+#define XBF_ASYNC (1u << 4) /* initiator will not wait for completion */
+#define XBF_DONE (1u << 5) /* all pages in the buffer uptodate */
+#define XBF_STALE (1u << 6) /* buffer has been staled, do not find it */
+#define XBF_WRITE_FAIL (1u << 7) /* async writes have failed on this buffer */
+#define XBF_VERITY_CHECKED (1u << 8) /* buffer was verified by fs-verity*/
/* buffer type flags for write callbacks */
#define _XBF_INODES (1u << 16)/* inode buffer */
--
2.38.4
The whole folio does not always need to be verified by fs-verity
(e.g. with 1k blocks). Use the folio offset and size passed by the
caller.
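A caller that has read only part of a large folio can now verify just
that range, e.g. (a sketch; poff/plen are a byte offset and length
within the folio):

    if (!fsverity_verify_folio(folio, plen, poff))
            folio_set_error(folio);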
Signed-off-by: Andrey Albershteyn <[email protected]>
---
include/linux/fsverity.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 119a3266791f..6d7a4b3ea626 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -249,9 +249,10 @@ static inline void fsverity_enqueue_verify_work(struct work_struct *work)
#endif /* !CONFIG_FS_VERITY */
-static inline bool fsverity_verify_folio(struct folio *folio)
+static inline bool fsverity_verify_folio(struct folio *folio, size_t len,
+ size_t offset)
{
- return fsverity_verify_blocks(folio, folio_size(folio), 0);
+ return fsverity_verify_blocks(folio, len, offset);
}
static inline bool fsverity_verify_page(struct page *page)
--
2.38.4
One of the essential ideas of fs-verity is that pages which are
already verified don't need to be re-verified if they are still in the
page cache.
XFS stores Merkle tree blocks in extended attributes; each attribute
holds one Merkle tree block. We can not directly mark the underlying
xfs_buf's pages as checked: they are not aligned with the xattr value,
and we don't keep a reference to that buffer, which is released
immediately after the value is copied out.
One way to track that a block was verified is to mark the xattr's
buffer as verified. If the buffer is evicted, the incore
XBF_VERITY_CHECKED flag is lost. When the xattr is read again,
xfs_attr_get() returns a new buffer without the flag. The flag is then
used to tell fs-verity whether a page is new or a cached one.
This patch adds XFS_DA_OP_BUFFER to tell xfs_attr_get() to
xfs_buf_hold() the underlying buffer and return it as xfs_da_args->bp.
The caller must then xfs_buf_rele() the buffer.
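A minimal sketch of the resulting calling convention (illustrative
only, not taken verbatim from a later patch):

    args.op_flags |= XFS_DA_OP_BUFFER;
    error = xfs_attr_get(&args);
    if (!error && args.bp) {
            /* use the attribute value; args.bp keeps it pinned */
            xfs_buf_rele(args.bp);
    }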
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/xfs/libxfs/xfs_attr.c | 5 ++++-
fs/xfs/libxfs/xfs_attr_leaf.c | 7 +++++++
fs/xfs/libxfs/xfs_attr_remote.c | 13 +++++++++++--
fs/xfs/libxfs/xfs_da_btree.h | 5 ++++-
4 files changed, 26 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 711022742e34..298b74245267 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -251,6 +251,8 @@ xfs_attr_get_ilocked(
* If the attribute is found, but exceeds the size limit set by the caller in
* args->valuelen, return -ERANGE with the size of the attribute that was found
* in args->valuelen.
+ *
+ * Using XFS_DA_OP_BUFFER the caller has to release the buffer args->bp.
*/
int
xfs_attr_get(
@@ -269,7 +271,8 @@ xfs_attr_get(
args->hashval = xfs_da_hashname(args->name, args->namelen);
/* Entirely possible to look up a name which doesn't exist */
- args->op_flags = XFS_DA_OP_OKNOENT;
+ args->op_flags = XFS_DA_OP_OKNOENT |
+ (args->op_flags & XFS_DA_OP_BUFFER);
lock_mode = xfs_ilock_attr_map_shared(args->dp);
error = xfs_attr_get_ilocked(args);
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index beee51ad75ce..112bb2604c89 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -2533,6 +2533,13 @@ xfs_attr3_leaf_getvalue(
name_loc = xfs_attr3_leaf_name_local(leaf, args->index);
ASSERT(name_loc->namelen == args->namelen);
ASSERT(memcmp(args->name, name_loc->nameval, args->namelen) == 0);
+
+ /* must be released by the caller */
+ if (args->op_flags & XFS_DA_OP_BUFFER) {
+ xfs_buf_hold(bp);
+ args->bp = bp;
+ }
+
return xfs_attr_copy_value(args,
&name_loc->nameval[args->namelen],
be16_to_cpu(name_loc->valuelen));
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index d440393b40eb..72908e0e1c86 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -424,9 +424,18 @@ xfs_attr_rmtval_get(
error = xfs_attr_rmtval_copyout(mp, bp, args->dp->i_ino,
&offset, &valuelen,
&dst);
- xfs_buf_relse(bp);
- if (error)
+ xfs_buf_unlock(bp);
+ /* must be released by the caller */
+ if (args->op_flags & XFS_DA_OP_BUFFER)
+ args->bp = bp;
+ else
+ xfs_buf_rele(bp);
+
+ if (error) {
+ if (args->op_flags & XFS_DA_OP_BUFFER)
+ xfs_buf_rele(args->bp);
return error;
+ }
/* roll attribute extent map forwards */
lblkno += map[i].br_blockcount;
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index a4b29827603f..269d26730bca 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -61,6 +61,7 @@ typedef struct xfs_da_args {
uint8_t filetype; /* filetype of inode for directories */
void *value; /* set of bytes (maybe contain NULLs) */
int valuelen; /* length of value */
+ struct xfs_buf *bp; /* OUT: xfs_buf which contains the attr */
unsigned int attr_filter; /* XFS_ATTR_{ROOT,SECURE,INCOMPLETE} */
unsigned int attr_flags; /* XATTR_{CREATE,REPLACE} */
xfs_dahash_t hashval; /* hash value of name */
@@ -95,6 +96,7 @@ typedef struct xfs_da_args {
#define XFS_DA_OP_REMOVE (1u << 6) /* this is a remove operation */
#define XFS_DA_OP_RECOVERY (1u << 7) /* Log recovery operation */
#define XFS_DA_OP_LOGGED (1u << 8) /* Use intent items to track op */
+#define XFS_DA_OP_BUFFER (1u << 9) /* Return underlying buffer */
#define XFS_DA_OP_FLAGS \
{ XFS_DA_OP_JUSTCHECK, "JUSTCHECK" }, \
@@ -105,7 +107,8 @@ typedef struct xfs_da_args {
{ XFS_DA_OP_NOTIME, "NOTIME" }, \
{ XFS_DA_OP_REMOVE, "REMOVE" }, \
{ XFS_DA_OP_RECOVERY, "RECOVERY" }, \
- { XFS_DA_OP_LOGGED, "LOGGED" }
+ { XFS_DA_OP_LOGGED, "LOGGED" }, \
+ { XFS_DA_OP_BUFFER, "BUFFER" }
/*
* Storage for holding state during Btree searches and split/join ops.
--
2.38.4
The read IO path provides a callout for configuring the ioend. This
allows the filesystem to add verification of completed BIOs.
xfs_prepare_read_ioend() configures bio->bi_end_io, which places the
verification task on the workqueue. The task does fs-verity
verification and then calls back into iomap to finish the IO.
This patch adds the callout implementations to verify pages with
fs-verity. It also implements the folio operation ->verify_folio for
direct folio verification by fs-verity.
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/xfs/xfs_aops.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_iomap.c | 11 +++++++++++
fs/xfs/xfs_linux.h | 1 +
3 files changed, 57 insertions(+)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index daa0dd4768fb..2a3ab5afd665 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -548,6 +548,49 @@ xfs_vm_bmap(
return iomap_bmap(mapping, block, &xfs_read_iomap_ops);
}
+static void
+xfs_read_work_end_io(
+ struct work_struct *work)
+{
+ struct iomap_read_ioend *ioend =
+ container_of(work, struct iomap_read_ioend, work);
+ struct bio *bio = &ioend->read_inline_bio;
+
+ fsverity_verify_bio(bio);
+ iomap_read_end_io(bio);
+ /*
+ * The iomap_read_ioend has been freed by bio_put() in
+ * iomap_read_end_io()
+ */
+}
+
+static void
+xfs_read_end_io(
+ struct bio *bio)
+{
+ struct iomap_read_ioend *ioend =
+ container_of(bio, struct iomap_read_ioend, read_inline_bio);
+ struct xfs_inode *ip = XFS_I(ioend->io_inode);
+
+ WARN_ON_ONCE(!queue_work(ip->i_mount->m_postread_workqueue,
+ &ioend->work));
+}
+
+static void
+xfs_prepare_read_ioend(
+ struct iomap_read_ioend *ioend)
+{
+ if (!fsverity_active(ioend->io_inode))
+ return;
+
+ INIT_WORK(&ioend->work, &xfs_read_work_end_io);
+ ioend->read_inline_bio.bi_end_io = &xfs_read_end_io;
+}
+
+static const struct iomap_readpage_ops xfs_readpage_ops = {
+ .prepare_ioend = &xfs_prepare_read_ioend,
+};
+
STATIC int
xfs_vm_read_folio(
struct file *unused,
@@ -555,6 +598,7 @@ xfs_vm_read_folio(
{
struct iomap_readpage_ctx ctx = {
.cur_folio = folio,
+ .ops = &xfs_readpage_ops,
};
return iomap_read_folio(&ctx, &xfs_read_iomap_ops);
@@ -566,6 +610,7 @@ xfs_vm_readahead(
{
struct iomap_readpage_ctx ctx = {
.rac = rac,
+ .ops = &xfs_readpage_ops,
};
iomap_readahead(&ctx, &xfs_read_iomap_ops);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 285885c308bd..e0f3c5d709f6 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -27,6 +27,7 @@
#include "xfs_dquot_item.h"
#include "xfs_dquot.h"
#include "xfs_reflink.h"
+#include "xfs_verity.h"
#define XFS_ALLOC_ALIGN(mp, off) \
(((off) >> mp->m_allocsize_log) << mp->m_allocsize_log)
@@ -83,8 +84,18 @@ xfs_iomap_valid(
return true;
}
+static bool
+xfs_verify_folio(
+ struct folio *folio,
+ loff_t pos,
+ unsigned int len)
+{
+ return fsverity_verify_folio(folio, len, pos);
+}
+
static const struct iomap_folio_ops xfs_iomap_folio_ops = {
.iomap_valid = xfs_iomap_valid,
+ .verify_folio = xfs_verify_folio,
};
int
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index e88f18f85e4b..c574fbf4b67d 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -63,6 +63,7 @@ typedef __u32 xfs_nlink_t;
#include <linux/rhashtable.h>
#include <linux/xattr.h>
#include <linux/mnt_idmapping.h>
+#include <linux/fsverity.h>
#include <asm/page.h>
#include <asm/div64.h>
--
2.38.4
Add IOMAP_F_READ_VERITY, which indicates that iomap needs to verify
the BIO (e.g. with fs-verity) after I/O is completed.
Add iomap_readpage_ops with a single optional ->prepare_ioend() to
allow the filesystem to add a callout for configuring the read path
ioend, mainly for setting the ->bi_end_io() callout. Add
iomap_folio_ops->verify_folio() for direct folio verification. The
verification itself is supposed to happen on the filesystem side, when
the BIO is processed via ->bi_end_io().
Make iomap_read_end_io() exportable so it can be called back from the
filesystem callout after verification is done.
The read path ioends are stored side by side with the BIOs allocated
from iomap_read_ioend_bioset.
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/iomap/buffered-io.c | 32 +++++++++++++++++++++++++++++---
include/linux/iomap.h | 26 ++++++++++++++++++++++++++
2 files changed, 55 insertions(+), 3 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index d39be64b1da9..7e59c299c496 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -42,6 +42,7 @@ static inline struct iomap_page *to_iomap_page(struct folio *folio)
}
static struct bio_set iomap_ioend_bioset;
+static struct bio_set iomap_read_ioend_bioset;
static struct iomap_page *
iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
@@ -184,7 +185,7 @@ static void iomap_finish_folio_read(struct folio *folio, size_t offset,
folio_unlock(folio);
}
-static void iomap_read_end_io(struct bio *bio)
+void iomap_read_end_io(struct bio *bio)
{
int error = blk_status_to_errno(bio->bi_status);
struct folio_iter fi;
@@ -193,6 +194,7 @@ static void iomap_read_end_io(struct bio *bio)
iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
bio_put(bio);
}
+EXPORT_SYMBOL_GPL(iomap_read_end_io);
/**
* iomap_read_inline_data - copy inline data into the page cache
@@ -257,6 +259,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
loff_t orig_pos = pos;
size_t poff, plen;
sector_t sector;
+ struct iomap_read_ioend *ioend;
if (iomap->type == IOMAP_INLINE)
return iomap_read_inline_data(iter, folio);
@@ -269,6 +272,13 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
if (iomap_block_needs_zeroing(iter, pos)) {
folio_zero_range(folio, poff, plen);
+ if (iomap->flags & IOMAP_F_READ_VERITY) {
+ if (!iomap->folio_ops->verify_folio(folio, poff, plen)) {
+ folio_set_error(folio);
+ goto done;
+ }
+ }
+
iomap_set_range_uptodate(folio, iop, poff, plen);
goto done;
}
@@ -290,8 +300,8 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
if (ctx->rac) /* same as readahead_gfp_mask */
gfp |= __GFP_NORETRY | __GFP_NOWARN;
- ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
- REQ_OP_READ, gfp);
+ ctx->bio = bio_alloc_bioset(iomap->bdev, bio_max_segs(nr_vecs),
+ REQ_OP_READ, GFP_NOFS, &iomap_read_ioend_bioset);
/*
* If the bio_alloc fails, try it again for a single page to
* avoid having to deal with partial page reads. This emulates
@@ -305,6 +315,13 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
ctx->bio->bi_opf |= REQ_RAHEAD;
ctx->bio->bi_iter.bi_sector = sector;
ctx->bio->bi_end_io = iomap_read_end_io;
+
+ ioend = container_of(ctx->bio, struct iomap_read_ioend,
+ read_inline_bio);
+ ioend->io_inode = iter->inode;
+ if (ctx->ops && ctx->ops->prepare_ioend)
+ ctx->ops->prepare_ioend(ioend);
+
bio_add_folio(ctx->bio, folio, plen, poff);
}
@@ -1813,6 +1830,15 @@ EXPORT_SYMBOL_GPL(iomap_writepages);
static int __init iomap_init(void)
{
+ int error = 0;
+
+ error = bioset_init(&iomap_read_ioend_bioset,
+ 4 * (PAGE_SIZE / SECTOR_SIZE),
+ offsetof(struct iomap_read_ioend, read_inline_bio),
+ BIOSET_NEED_BVECS);
+ if (error)
+ return error;
+
return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
offsetof(struct iomap_ioend, io_inline_bio),
BIOSET_NEED_BVECS);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0fbce375265d..9a17b53309c9 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -53,6 +53,9 @@ struct vm_fault;
*
* IOMAP_F_XATTR indicates that the iomap is for an extended attribute extent
* rather than a file data extent.
+ *
+ * IOMAP_F_READ_VERITY indicates that the iomap needs verification of read
+ * folios
*/
#define IOMAP_F_NEW (1U << 0)
#define IOMAP_F_DIRTY (1U << 1)
@@ -60,6 +63,7 @@ struct vm_fault;
#define IOMAP_F_MERGED (1U << 3)
#define IOMAP_F_BUFFER_HEAD (1U << 4)
#define IOMAP_F_XATTR (1U << 5)
+#define IOMAP_F_READ_VERITY (1U << 6)
/*
* Flags set by the core iomap code during operations:
@@ -156,6 +160,11 @@ struct iomap_folio_ops {
* locked by the iomap code.
*/
bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
+
+ /*
+ * Verify folio when successfully read
+ */
+ bool (*verify_folio)(struct folio *folio, loff_t pos, unsigned int len);
};
/*
@@ -258,13 +267,30 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
struct iomap *iomap, loff_t pos, loff_t length, ssize_t written,
int (*punch)(struct inode *inode, loff_t pos, loff_t length));
+struct iomap_read_ioend {
+ struct inode *io_inode; /* file being read from */
+ struct work_struct work; /* post read work (e.g. fs-verity) */
+ struct bio read_inline_bio;/* MUST BE LAST! */
+};
+
+struct iomap_readpage_ops {
+ /*
+ * Optional, allows the file systems to perform actions just before
+ * submitting the bio and/or override the bio bi_end_io handler for
+ * additional verification after bio is processed
+ */
+ void (*prepare_ioend)(struct iomap_read_ioend *ioend);
+};
+
struct iomap_readpage_ctx {
struct folio *cur_folio;
bool cur_folio_in_bio;
struct bio *bio;
struct readahead_control *rac;
+ const struct iomap_readpage_ops *ops;
};
+void iomap_read_end_io(struct bio *bio);
int iomap_read_folio(struct iomap_readpage_ctx *ctx,
const struct iomap_ops *ops);
void iomap_readahead(struct iomap_readpage_ctx *ctx,
--
2.38.4
The Merkle tree blocks and the descriptor are stored in the inode's
extended attributes. Add a new attribute type for fs-verity metadata.
Add XFS_ATTR_INTERNAL_MASK to skip parent pointer and fs-verity
attributes, as those are for internal use only. While we're at it, add
a few comments in relevant places noting that internal attributes are
not supposed to be handled via the interface defined in xfs_xattr.c.
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/xfs/libxfs/xfs_da_format.h | 10 +++++++++-
fs/xfs/libxfs/xfs_log_format.h | 1 +
fs/xfs/xfs_ioctl.c | 5 +++++
fs/xfs/xfs_trace.h | 1 +
fs/xfs/xfs_xattr.c | 9 +++++++++
5 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 75b13807145d..2b5967befc2e 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -689,14 +689,22 @@ struct xfs_attr3_leafblock {
#define XFS_ATTR_ROOT_BIT 1 /* limit access to trusted attrs */
#define XFS_ATTR_SECURE_BIT 2 /* limit access to secure attrs */
#define XFS_ATTR_PARENT_BIT 3 /* parent pointer attrs */
+#define XFS_ATTR_VERITY_BIT 4 /* verity merkle tree and descriptor */
#define XFS_ATTR_INCOMPLETE_BIT 7 /* attr in middle of create/delete */
#define XFS_ATTR_LOCAL (1u << XFS_ATTR_LOCAL_BIT)
#define XFS_ATTR_ROOT (1u << XFS_ATTR_ROOT_BIT)
#define XFS_ATTR_SECURE (1u << XFS_ATTR_SECURE_BIT)
#define XFS_ATTR_PARENT (1u << XFS_ATTR_PARENT_BIT)
+#define XFS_ATTR_VERITY (1u << XFS_ATTR_VERITY_BIT)
#define XFS_ATTR_INCOMPLETE (1u << XFS_ATTR_INCOMPLETE_BIT)
#define XFS_ATTR_NSP_ONDISK_MASK \
- (XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT)
+ (XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT | \
+ XFS_ATTR_VERITY)
+
+/*
+ * Internal attributes not exposed to the user
+ */
+#define XFS_ATTR_INTERNAL_MASK (XFS_ATTR_PARENT | XFS_ATTR_VERITY)
/*
* Alignment for namelist and valuelist entries (since they are mixed
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 727b5a858028..678eacb7925c 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -968,6 +968,7 @@ struct xfs_icreate_log {
#define XFS_ATTRI_FILTER_MASK (XFS_ATTR_ROOT | \
XFS_ATTR_SECURE | \
XFS_ATTR_PARENT | \
+ XFS_ATTR_VERITY | \
XFS_ATTR_INCOMPLETE)
/*
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 55bb01173cde..3d6d680b6cf3 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -351,6 +351,11 @@ static unsigned int
xfs_attr_filter(
u32 ioc_flags)
{
+ /*
+ * Only externally visible attributes should be specified here.
+ * Internally used attributes (such as parent pointers or fs-verity)
+ * should not be exposed to userspace.
+ */
if (ioc_flags & XFS_IOC_ATTR_ROOT)
return XFS_ATTR_ROOT;
if (ioc_flags & XFS_IOC_ATTR_SECURE)
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 9c0006c55fec..e842b9d145cb 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -79,6 +79,7 @@ struct xfs_perag;
#define XFS_ATTR_FILTER_FLAGS \
{ XFS_ATTR_ROOT, "ROOT" }, \
{ XFS_ATTR_SECURE, "SECURE" }, \
+ { XFS_ATTR_VERITY, "VERITY" }, \
{ XFS_ATTR_INCOMPLETE, "INCOMPLETE" }
DECLARE_EVENT_CLASS(xfs_attr_list_class,
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index 7b9a0ed1b11f..5a71797fbd44 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -20,6 +20,12 @@
#include <linux/posix_acl_xattr.h>
+/*
+ * This file defines the interface to work with externally visible extended
+ * attributes, such as those in system or security namespaces. This interface
+ * should not be used for internally used attributes (consider xfs_attr.c).
+ */
+
/*
* Get permission to use log-assisted atomic exchange of file extents.
*
@@ -234,6 +240,9 @@ xfs_xattr_put_listent(
ASSERT(context->count >= 0);
+ if (flags & XFS_ATTR_INTERNAL_MASK)
+ return;
+
if (flags & XFS_ATTR_ROOT) {
#ifdef CONFIG_XFS_POSIX_ACL
if (namelen == SGI_ACL_FILE_SIZE &&
--
2.38.4
Add a flag to mark inodes which have fs-verity enabled (i.e. the
descriptor exists and the Merkle tree is built).
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/ioctl.c | 4 ++++
fs/xfs/libxfs/xfs_format.h | 4 +++-
fs/xfs/xfs_inode.c | 2 ++
fs/xfs/xfs_iops.c | 2 ++
include/uapi/linux/fs.h | 1 +
5 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 5b2481cd4750..a274b33b2fd0 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -480,6 +480,8 @@ void fileattr_fill_xflags(struct fileattr *fa, u32 xflags)
fa->flags |= FS_DAX_FL;
if (fa->fsx_xflags & FS_XFLAG_PROJINHERIT)
fa->flags |= FS_PROJINHERIT_FL;
+ if (fa->fsx_xflags & FS_XFLAG_VERITY)
+ fa->flags |= FS_VERITY_FL;
}
EXPORT_SYMBOL(fileattr_fill_xflags);
@@ -510,6 +512,8 @@ void fileattr_fill_flags(struct fileattr *fa, u32 flags)
fa->fsx_xflags |= FS_XFLAG_DAX;
if (fa->flags & FS_PROJINHERIT_FL)
fa->fsx_xflags |= FS_XFLAG_PROJINHERIT;
+ if (fa->flags & FS_VERITY_FL)
+ fa->fsx_xflags |= FS_XFLAG_VERITY;
}
EXPORT_SYMBOL(fileattr_fill_flags);
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index ef617be2839c..ccb2ae5c2c93 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1070,16 +1070,18 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
#define XFS_DIFLAG2_COWEXTSIZE_BIT 2 /* copy on write extent size hint */
#define XFS_DIFLAG2_BIGTIME_BIT 3 /* big timestamps */
#define XFS_DIFLAG2_NREXT64_BIT 4 /* large extent counters */
+#define XFS_DIFLAG2_VERITY_BIT 5 /* inode sealed by fsverity */
#define XFS_DIFLAG2_DAX (1 << XFS_DIFLAG2_DAX_BIT)
#define XFS_DIFLAG2_REFLINK (1 << XFS_DIFLAG2_REFLINK_BIT)
#define XFS_DIFLAG2_COWEXTSIZE (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
#define XFS_DIFLAG2_BIGTIME (1 << XFS_DIFLAG2_BIGTIME_BIT)
#define XFS_DIFLAG2_NREXT64 (1 << XFS_DIFLAG2_NREXT64_BIT)
+#define XFS_DIFLAG2_VERITY (1 << XFS_DIFLAG2_VERITY_BIT)
#define XFS_DIFLAG2_ANY \
(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
- XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64)
+ XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_VERITY)
static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
{
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 5808abab786c..3b2bf9e7580b 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -634,6 +634,8 @@ xfs_ip2xflags(
flags |= FS_XFLAG_DAX;
if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
flags |= FS_XFLAG_COWEXTSIZE;
+ if (ip->i_diflags2 & XFS_DIFLAG2_VERITY)
+ flags |= FS_XFLAG_VERITY;
}
if (xfs_inode_has_attr_fork(ip))
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 24718adb3c16..5398be75a76a 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1232,6 +1232,8 @@ xfs_diflags_to_iflags(
flags |= S_NOATIME;
if (init && xfs_inode_should_enable_dax(ip))
flags |= S_DAX;
+ if (xflags & FS_XFLAG_VERITY)
+ flags |= S_VERITY;
/*
* S_DAX can only be set during inode initialization and is never set by
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index b7b56871029c..5172a2eb902c 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -140,6 +140,7 @@ struct fsxattr {
#define FS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
#define FS_XFLAG_DAX 0x00008000 /* use DAX for IO */
#define FS_XFLAG_COWEXTSIZE 0x00010000 /* CoW extent size allocator hint */
+#define FS_XFLAG_VERITY 0x00020000 /* fs-verity sealed inode */
#define FS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
/* the read-only stuff doesn't really belong here, but any other place is
--
2.38.4
On open, fs-verity will read and attach its metadata (not the tree
itself) from disk for those inodes which already have fs-verity
enabled.
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/xfs/xfs_file.c | 8 ++++++++
fs/xfs/xfs_super.c | 2 ++
2 files changed, 10 insertions(+)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 705250f9f90a..947b5c436172 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -31,6 +31,7 @@
#include <linux/mman.h>
#include <linux/fadvise.h>
#include <linux/mount.h>
+#include <linux/fsverity.h>
static const struct vm_operations_struct xfs_file_vm_ops;
@@ -1169,9 +1170,16 @@ xfs_file_open(
struct inode *inode,
struct file *file)
{
+ int error = 0;
+
if (xfs_is_shutdown(XFS_M(inode->i_sb)))
return -EIO;
file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC;
+
+ error = fsverity_file_open(inode, file);
+ if (error)
+ return error;
+
return generic_file_open(inode, file);
}
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index d6f22cb94ee2..d40de32362b1 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -46,6 +46,7 @@
#include <linux/magic.h>
#include <linux/fs_context.h>
#include <linux/fs_parser.h>
+#include <linux/fsverity.h>
static const struct super_operations xfs_super_operations;
@@ -667,6 +668,7 @@ xfs_fs_destroy_inode(
ASSERT(!rwsem_is_locked(&inode->i_rwsem));
XFS_STATS_INC(ip->i_mount, vn_rele);
XFS_STATS_INC(ip->i_mount, vn_remove);
+ fsverity_cleanup_inode(inode);
xfs_inode_mark_reclaimable(ip);
}
--
2.38.4
To mark inodes sealed with fs-verity, the new XFS_DIFLAG2_VERITY flag
will be added in a later patch. This requires a ro-compat flag so that
older kernels know that a filesystem with fs-verity cannot be modified
by them.
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/xfs/libxfs/xfs_format.h | 1 +
fs/xfs/libxfs/xfs_sb.c | 2 ++
fs/xfs/xfs_mount.h | 2 ++
3 files changed, 5 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 371dc07233e0..ef617be2839c 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -353,6 +353,7 @@ xfs_sb_has_compat_feature(
#define XFS_SB_FEAT_RO_COMPAT_RMAPBT (1 << 1) /* reverse map btree */
#define XFS_SB_FEAT_RO_COMPAT_REFLINK (1 << 2) /* reflinked files */
#define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3) /* inobt block counts */
+#define XFS_SB_FEAT_RO_COMPAT_VERITY (1 << 4) /* fs-verity */
#define XFS_SB_FEAT_RO_COMPAT_ALL \
(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 99cc03a298e2..b1f1b21e8953 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -161,6 +161,8 @@ xfs_sb_version_to_features(
features |= XFS_FEAT_REFLINK;
if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
features |= XFS_FEAT_INOBTCNT;
+ if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_VERITY)
+ features |= XFS_FEAT_VERITY;
if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_FTYPE)
features |= XFS_FEAT_FTYPE;
if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_SPINODES)
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 53a4a9304937..9254c3cd9077 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -279,6 +279,7 @@ typedef struct xfs_mount {
#define XFS_FEAT_BIGTIME (1ULL << 24) /* large timestamps */
#define XFS_FEAT_NEEDSREPAIR (1ULL << 25) /* needs xfs_repair */
#define XFS_FEAT_NREXT64 (1ULL << 26) /* large extent counters */
+#define XFS_FEAT_VERITY (1ULL << 27) /* fs-verity */
/* Mount features */
#define XFS_FEAT_NOATTR2 (1ULL << 48) /* disable attr2 creation */
@@ -342,6 +343,7 @@ __XFS_HAS_FEAT(inobtcounts, INOBTCNT)
__XFS_HAS_FEAT(bigtime, BIGTIME)
__XFS_HAS_FEAT(needsrepair, NEEDSREPAIR)
__XFS_HAS_FEAT(large_extent_counts, NREXT64)
+__XFS_HAS_FEAT(verity, VERITY)
/*
* Mount features
--
2.38.4
As noted by Dave, there are two problems with using fs-verity's
workqueue in XFS:
1. High priority workqueues are used within XFS to ensure that data
IO completion cannot stall processing of journal IO completions.
Hence using a WQ_HIGHPRI workqueue directly in the user data IO
path is a potential filesystem livelock/deadlock vector.
2. The fsverity workqueue is global - it creates a cross-filesystem
contention point.
This patch adds a per-filesystem, per-CPU workqueue for fs-verity
work.
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/xfs/xfs_mount.h | 1 +
fs/xfs/xfs_super.c | 9 +++++++++
2 files changed, 10 insertions(+)
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index f3269c0626f0..53a4a9304937 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -107,6 +107,7 @@ typedef struct xfs_mount {
struct xfs_mru_cache *m_filestream; /* per-mount filestream data */
struct workqueue_struct *m_buf_workqueue;
struct workqueue_struct *m_unwritten_workqueue;
+ struct workqueue_struct *m_postread_workqueue;
struct workqueue_struct *m_reclaim_workqueue;
struct workqueue_struct *m_sync_workqueue;
struct workqueue_struct *m_blockgc_wq;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 4f814f9e12ab..d6f22cb94ee2 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -548,6 +548,12 @@ xfs_init_mount_workqueues(
if (!mp->m_unwritten_workqueue)
goto out_destroy_buf;
+ mp->m_postread_workqueue = alloc_workqueue("xfs-pread/%s",
+ XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
+ 0, mp->m_super->s_id);
+ if (!mp->m_postread_workqueue)
+		goto out_destroy_unwritten;
+
mp->m_reclaim_workqueue = alloc_workqueue("xfs-reclaim/%s",
XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
0, mp->m_super->s_id);
@@ -581,6 +587,8 @@ xfs_init_mount_workqueues(
destroy_workqueue(mp->m_reclaim_workqueue);
+out_destroy_postread:
+	destroy_workqueue(mp->m_postread_workqueue);
 out_destroy_unwritten:
 	destroy_workqueue(mp->m_unwritten_workqueue);
out_destroy_buf:
destroy_workqueue(mp->m_buf_workqueue);
out:
@@ -596,6 +604,7 @@ xfs_destroy_mount_workqueues(
destroy_workqueue(mp->m_inodegc_wq);
destroy_workqueue(mp->m_reclaim_workqueue);
destroy_workqueue(mp->m_unwritten_workqueue);
+ destroy_workqueue(mp->m_postread_workqueue);
destroy_workqueue(mp->m_buf_workqueue);
}
--
2.38.4
fs-verity doesn't support DAX. Forbid the filesystem from enabling DAX
on inodes which already have fs-verity enabled. The opposite case is
checked when fs-verity is being enabled: it won't be enabled if DAX is.
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/xfs/xfs_iops.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 5398be75a76a..e0d7107a9ba1 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1204,6 +1204,8 @@ xfs_inode_should_enable_dax(
return false;
if (!xfs_inode_supports_dax(ip))
return false;
+ if (ip->i_diflags2 & XFS_DIFLAG2_VERITY)
+ return false;
if (xfs_has_dax_always(ip->i_mount))
return true;
if (ip->i_diflags2 & XFS_DIFLAG2_DAX)
--
2.38.4
When the Merkle tree block size differs from the page size, fs-verity
expects ->read_merkle_tree_page() to return a page filled with Merkle
tree blocks. XFS stores each Merkle tree block in a separate extended
attribute; those attributes are addressed by the block offset into the
Merkle tree.
This patch makes ->read_merkle_tree_page() fetch multiple Merkle tree
blocks based on the page size to block size ratio. The reference to
each xfs_buf is also passed via page->private to ->drop_page().
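For example, with 4k pages and the minimal 1k Merkle tree block size,
blocks_per_page is 4, so a single ->read_merkle_tree_page() call fills
the page from the xattrs named (index << PAGE_SHIFT) + 0, + 1024,
+ 2048 and + 3072.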
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/xfs/xfs_verity.c | 74 +++++++++++++++++++++++++++++++++++----------
fs/xfs/xfs_verity.h | 8 +++++
2 files changed, 66 insertions(+), 16 deletions(-)
diff --git a/fs/xfs/xfs_verity.c b/fs/xfs/xfs_verity.c
index a9874ff4efcd..ef0aff216f06 100644
--- a/fs/xfs/xfs_verity.c
+++ b/fs/xfs/xfs_verity.c
@@ -134,6 +134,10 @@ xfs_read_merkle_tree_page(
struct page *page = NULL;
__be64 name = cpu_to_be64(index << PAGE_SHIFT);
uint32_t bs = 1 << log_blocksize;
+ int blocks_per_page =
+ (1 << (PAGE_SHIFT - log_blocksize));
+ int n = 0;
+ int offset = 0;
struct xfs_da_args args = {
.dp = ip,
.attr_filter = XFS_ATTR_VERITY,
@@ -143,26 +147,59 @@ xfs_read_merkle_tree_page(
.valuelen = bs,
};
int error = 0;
+ bool is_checked = true;
+ struct xfs_verity_buf_list *buf_list;
page = alloc_page(GFP_KERNEL);
if (!page)
return ERR_PTR(-ENOMEM);
- error = xfs_attr_get(&args);
- if (error) {
- kmem_free(args.value);
- xfs_buf_rele(args.bp);
+ buf_list = kzalloc(sizeof(struct xfs_verity_buf_list), GFP_KERNEL);
+ if (!buf_list) {
put_page(page);
- return ERR_PTR(-EFAULT);
+ return ERR_PTR(-ENOMEM);
}
- if (args.bp->b_flags & XBF_VERITY_CHECKED)
+ /*
+ * Fill the page with Merkle tree blocks. The blcoks_per_page is higher
+ * than 1 when fs block size != PAGE_SIZE or Merkle tree block size !=
+ * PAGE SIZE
+ */
+ for (n = 0; n < blocks_per_page; n++) {
+ offset = bs * n;
+ name = cpu_to_be64(((index << PAGE_SHIFT) + offset));
+ args.name = (const uint8_t *)&name;
+
+ error = xfs_attr_get(&args);
+ if (error) {
+ kmem_free(args.value);
+ /*
+ * No more Merkle tree blocks (e.g. this was the last
+ * block of the tree)
+ */
+ if (error == -ENOATTR)
+ break;
+ xfs_buf_rele(args.bp);
+ put_page(page);
+ kmem_free(buf_list);
+ return ERR_PTR(-EFAULT);
+ }
+
+ buf_list->bufs[buf_list->buf_count++] = args.bp;
+
+ /* One of the buffers was dropped */
+ if (!(args.bp->b_flags & XBF_VERITY_CHECKED))
+ is_checked = false;
+
+ memcpy(page_address(page) + offset, args.value, args.valuelen);
+ kmem_free(args.value);
+ args.value = NULL;
+ }
+
+ if (is_checked)
SetPageChecked(page);
+ page->private = (unsigned long)buf_list;
- page->private = (unsigned long)args.bp;
- memcpy(page_address(page), args.value, args.valuelen);
-
- kmem_free(args.value);
return page;
}
@@ -191,16 +228,21 @@ xfs_write_merkle_tree_block(
static void
xfs_drop_page(
- struct page *page)
+ struct page *page)
{
- struct xfs_buf *buf = (struct xfs_buf *)page->private;
+ int i = 0;
+ struct xfs_verity_buf_list *buf_list =
+ (struct xfs_verity_buf_list *)page->private;
- ASSERT(buf != NULL);
+ ASSERT(buf_list != NULL);
- if (PageChecked(page))
- buf->b_flags |= XBF_VERITY_CHECKED;
+ for (i = 0; i < buf_list->buf_count; i++) {
+ if (PageChecked(page))
+ buf_list->bufs[i]->b_flags |= XBF_VERITY_CHECKED;
+ xfs_buf_rele(buf_list->bufs[i]);
+ }
- xfs_buf_rele(buf);
+ kmem_free(buf_list);
put_page(page);
}
diff --git a/fs/xfs/xfs_verity.h b/fs/xfs/xfs_verity.h
index ae5d87ca32a8..433b2f4ae3bc 100644
--- a/fs/xfs/xfs_verity.h
+++ b/fs/xfs/xfs_verity.h
@@ -16,4 +16,12 @@ extern const struct fsverity_operations xfs_verity_ops;
#define xfs_verity_ops NULL
#endif /* CONFIG_FS_VERITY */
+/* Minimal Merkle tree block size is 1024 */
+#define XFS_VERITY_MAX_MBLOCKS_PER_PAGE (1 << (PAGE_SHIFT - 10))
+
+struct xfs_verity_buf_list {
+ unsigned int buf_count;
+ struct xfs_buf *bufs[XFS_VERITY_MAX_MBLOCKS_PER_PAGE];
+};
+
#endif /* __XFS_VERITY_H__ */
--
2.38.4
The direct I/O path is not supported on verity files. Attempts to use
the direct I/O path on such files should fall back to the buffered I/O
path.
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/xfs/xfs_file.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 947b5c436172..9e072e82f6c1 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -244,7 +244,8 @@ xfs_file_dax_read(
struct kiocb *iocb,
struct iov_iter *to)
{
- struct xfs_inode *ip = XFS_I(iocb->ki_filp->f_mapping->host);
+ struct inode *inode = iocb->ki_filp->f_mapping->host;
+ struct xfs_inode *ip = XFS_I(inode);
ssize_t ret = 0;
trace_xfs_file_dax_read(iocb, to);
@@ -297,10 +298,17 @@ xfs_file_read_iter(
if (IS_DAX(inode))
ret = xfs_file_dax_read(iocb, to);
- else if (iocb->ki_flags & IOCB_DIRECT)
+ else if (iocb->ki_flags & IOCB_DIRECT && !fsverity_active(inode))
ret = xfs_file_dio_read(iocb, to);
- else
+ else {
+ /*
+ * In case fs-verity is enabled, we also fallback to the
+ * buffered read from the direct read path. Therefore,
+ * IOCB_DIRECT is set and need to be cleared
+ */
+ iocb->ki_flags &= ~IOCB_DIRECT;
ret = xfs_file_buffered_read(iocb, to);
+ }
if (ret > 0)
XFS_STATS_ADD(mp, xs_read_bytes, ret);
--
2.38.4
Add integration with fs-verity. XFS stores the fs-verity metadata in
extended attributes. The metadata consists of the verity descriptor
and the Merkle tree blocks.
The descriptor is stored under the "verity_descriptor" extended
attribute. The Merkle tree blocks are stored under binary indexes.
While fs-verity is being enabled on an inode, the
XFS_IVERITY_CONSTRUCTION flag is set, meaning that the Merkle tree is
being built. The initialization ends with storing the verity
descriptor and setting the inode's on-disk flag (XFS_DIFLAG2_VERITY).
The verification on read is done in iomap. Based on the inode's verity
flag, IOMAP_F_READ_VERITY is set in xfs_read_iomap_begin() to let
iomap know that verification is needed.
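As a worked example of the xattr naming: with a 4k Merkle tree block
size, the second block of the tree (pos == 4096) is stored under an
8-byte name holding the big-endian offset 4096, built with
cpu_to_be64(pos) in xfs_write_merkle_tree_block().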
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/xfs/Makefile | 1 +
fs/xfs/libxfs/xfs_attr.c | 13 +++
fs/xfs/xfs_inode.h | 3 +-
fs/xfs/xfs_iomap.c | 3 +
fs/xfs/xfs_ondisk.h | 4 +
fs/xfs/xfs_super.c | 8 ++
fs/xfs/xfs_verity.c | 214 +++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_verity.h | 19 ++++
8 files changed, 264 insertions(+), 1 deletion(-)
create mode 100644 fs/xfs/xfs_verity.c
create mode 100644 fs/xfs/xfs_verity.h
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 92d88dc3c9f7..76174770d91a 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -130,6 +130,7 @@ xfs-$(CONFIG_XFS_POSIX_ACL) += xfs_acl.o
xfs-$(CONFIG_SYSCTL) += xfs_sysctl.o
xfs-$(CONFIG_COMPAT) += xfs_ioctl32.o
xfs-$(CONFIG_EXPORTFS_BLOCK_OPS) += xfs_pnfs.o
+xfs-$(CONFIG_FS_VERITY) += xfs_verity.o
# notify failure
ifeq ($(CONFIG_MEMORY_FAILURE),y)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 298b74245267..39d9038fbeee 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -26,6 +26,7 @@
#include "xfs_trace.h"
#include "xfs_attr_item.h"
#include "xfs_xattr.h"
+#include "xfs_verity.h"
struct kmem_cache *xfs_attr_intent_cache;
@@ -1635,6 +1636,18 @@ xfs_attr_namecheck(
return xfs_verify_pptr(mp, (struct xfs_parent_name_rec *)name);
}
+ if (flags & XFS_ATTR_VERITY) {
+ /* Merkle tree pages are stored under u64 indexes */
+ if (length == sizeof(__be64))
+ return true;
+
+ /* Verity descriptor blocks are held in a named attribute. */
+ if (length == XFS_VERITY_DESCRIPTOR_NAME_LEN)
+ return true;
+
+ return false;
+ }
+
return xfs_str_attr_namecheck(name, length);
}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 69d21e42c10a..a95f28cb049f 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -324,7 +324,8 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
* inactivation completes, both flags will be cleared and the inode is a
* plain old IRECLAIMABLE inode.
*/
-#define XFS_INACTIVATING (1 << 13)
+#define XFS_INACTIVATING (1 << 13)
+#define XFS_IVERITY_CONSTRUCTION (1 << 14) /* merkle tree construction */
/* All inode state flags related to inode reclaim. */
#define XFS_ALL_IRECLAIM_FLAGS (XFS_IRECLAIMABLE | \
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index e0f3c5d709f6..0adde39f02a5 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -143,6 +143,9 @@ xfs_bmbt_to_iomap(
(ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
iomap->flags |= IOMAP_F_DIRTY;
+ if (fsverity_active(VFS_I(ip)))
+ iomap->flags |= IOMAP_F_READ_VERITY;
+
iomap->validity_cookie = sequence_cookie;
iomap->folio_ops = &xfs_iomap_folio_ops;
return 0;
diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
index 9737b5a9f405..7fe88ccda519 100644
--- a/fs/xfs/xfs_ondisk.h
+++ b/fs/xfs/xfs_ondisk.h
@@ -189,6 +189,10 @@ xfs_check_ondisk_structs(void)
XFS_CHECK_VALUE(XFS_DQ_BIGTIME_EXPIRY_MIN << XFS_DQ_BIGTIME_SHIFT, 4);
XFS_CHECK_VALUE(XFS_DQ_BIGTIME_EXPIRY_MAX << XFS_DQ_BIGTIME_SHIFT,
16299260424LL);
+
+ /* fs-verity descriptor xattr name */
+ XFS_CHECK_VALUE(strlen(XFS_VERITY_DESCRIPTOR_NAME),
+ XFS_VERITY_DESCRIPTOR_NAME_LEN);
}
#endif /* __XFS_ONDISK_H */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index d40de32362b1..b6e99ed3b187 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -30,6 +30,7 @@
#include "xfs_filestream.h"
#include "xfs_quota.h"
#include "xfs_sysfs.h"
+#include "xfs_verity.h"
#include "xfs_ondisk.h"
#include "xfs_rmap_item.h"
#include "xfs_refcount_item.h"
@@ -1489,6 +1490,9 @@ xfs_fs_fill_super(
sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
#endif
sb->s_op = &xfs_super_operations;
+#ifdef CONFIG_FS_VERITY
+ sb->s_vop = &xfs_verity_ops;
+#endif
/*
* Delay mount work if the debug hook is set. This is debug
@@ -1685,6 +1689,10 @@ xfs_fs_fill_super(
xfs_warn(mp,
"EXPERIMENTAL Large extent counts feature in use. Use at your own risk!");
+ if (xfs_has_verity(mp))
+ xfs_alert(mp,
+ "EXPERIMENTAL fs-verity feature in use. Use at your own risk!");
+
error = xfs_mountfs(mp);
if (error)
goto out_filestream_unmount;
diff --git a/fs/xfs/xfs_verity.c b/fs/xfs/xfs_verity.c
new file mode 100644
index 000000000000..a9874ff4efcd
--- /dev/null
+++ b/fs/xfs/xfs_verity.c
@@ -0,0 +1,214 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022 Red Hat, Inc.
+ */
+#include "xfs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_attr.h"
+#include "xfs_verity.h"
+#include "xfs_bmap_util.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+
+static int
+xfs_get_verity_descriptor(
+ struct inode *inode,
+ void *buf,
+ size_t buf_size)
+{
+ struct xfs_inode *ip = XFS_I(inode);
+ int error = 0;
+ struct xfs_da_args args = {
+ .dp = ip,
+ .attr_filter = XFS_ATTR_VERITY,
+ .name = (const uint8_t *)XFS_VERITY_DESCRIPTOR_NAME,
+ .namelen = XFS_VERITY_DESCRIPTOR_NAME_LEN,
+ .value = buf,
+ .valuelen = buf_size,
+ };
+
+ /*
+ * The fact that (returned attribute size) == (provided buf_size) is
+ * checked by xfs_attr_copy_value() (returns -ERANGE)
+ */
+ error = xfs_attr_get(&args);
+ if (error)
+ return error;
+
+ return args.valuelen;
+}
+
+static int
+xfs_begin_enable_verity(
+ struct file *filp)
+{
+ struct inode *inode = file_inode(filp);
+ struct xfs_inode *ip = XFS_I(inode);
+ int error = 0;
+
+ ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
+
+ if (IS_DAX(inode))
+ return -EINVAL;
+
+ if (xfs_iflags_test(ip, XFS_IVERITY_CONSTRUCTION))
+ return -EBUSY;
+ xfs_iflags_set(ip, XFS_IVERITY_CONSTRUCTION);
+
+ return error;
+}
+
+static int
+xfs_end_enable_verity(
+ struct file *filp,
+ const void *desc,
+ size_t desc_size,
+ u64 merkle_tree_size)
+{
+ struct inode *inode = file_inode(filp);
+ struct xfs_inode *ip = XFS_I(inode);
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_trans *tp;
+ struct xfs_da_args args = {
+ .dp = ip,
+ .whichfork = XFS_ATTR_FORK,
+ .attr_filter = XFS_ATTR_VERITY,
+ .attr_flags = XATTR_CREATE,
+ .name = (const uint8_t *)XFS_VERITY_DESCRIPTOR_NAME,
+ .namelen = XFS_VERITY_DESCRIPTOR_NAME_LEN,
+ .value = (void *)desc,
+ .valuelen = desc_size,
+ };
+ int error = 0;
+
+ ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
+
+ /* fs-verity failed, just cleanup */
+ if (desc == NULL)
+ goto out;
+
+ error = xfs_attr_set(&args);
+ if (error)
+ goto out;
+
+ /* Set fsverity inode flag */
+ error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange,
+ 0, 0, false, &tp);
+ if (error)
+ goto out;
+
+ /*
+ * Ensure that we've persisted the verity information before we enable
+ * it on the inode and tell the caller we have sealed the inode.
+ */
+ ip->i_diflags2 |= XFS_DIFLAG2_VERITY;
+
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+ xfs_trans_set_sync(tp);
+
+ error = xfs_trans_commit(tp);
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+ if (!error)
+ inode->i_flags |= S_VERITY;
+
+out:
+ xfs_iflags_clear(ip, XFS_IVERITY_CONSTRUCTION);
+ return error;
+}
+
+static struct page *
+xfs_read_merkle_tree_page(
+ struct inode *inode,
+ pgoff_t index,
+ unsigned long num_ra_pages,
+ u8 log_blocksize)
+{
+ struct xfs_inode *ip = XFS_I(inode);
+ struct page *page = NULL;
+ __be64 name = cpu_to_be64(index << PAGE_SHIFT);
+ uint32_t bs = 1 << log_blocksize;
+ struct xfs_da_args args = {
+ .dp = ip,
+ .attr_filter = XFS_ATTR_VERITY,
+ .op_flags = XFS_DA_OP_BUFFER,
+ .name = (const uint8_t *)&name,
+ .namelen = sizeof(__be64),
+ .valuelen = bs,
+ };
+ int error = 0;
+
+ page = alloc_page(GFP_KERNEL);
+ if (!page)
+ return ERR_PTR(-ENOMEM);
+
+ error = xfs_attr_get(&args);
+ if (error) {
+ kmem_free(args.value);
+ xfs_buf_rele(args.bp);
+ put_page(page);
+ return ERR_PTR(-EFAULT);
+ }
+
+ if (args.bp->b_flags & XBF_VERITY_CHECKED)
+ SetPageChecked(page);
+
+ page->private = (unsigned long)args.bp;
+ memcpy(page_address(page), args.value, args.valuelen);
+
+ kmem_free(args.value);
+ return page;
+}
+
+static int
+xfs_write_merkle_tree_block(
+ struct inode *inode,
+ const void *buf,
+ u64 pos,
+ unsigned int size)
+{
+ struct xfs_inode *ip = XFS_I(inode);
+ __be64 name = cpu_to_be64(pos);
+ struct xfs_da_args args = {
+ .dp = ip,
+ .whichfork = XFS_ATTR_FORK,
+ .attr_filter = XFS_ATTR_VERITY,
+ .attr_flags = XATTR_CREATE,
+ .name = (const uint8_t *)&name,
+ .namelen = sizeof(__be64),
+ .value = (void *)buf,
+ .valuelen = size,
+ };
+
+ return xfs_attr_set(&args);
+}
+
+static void
+xfs_drop_page(
+ struct page *page)
+{
+ struct xfs_buf *buf = (struct xfs_buf *)page->private;
+
+ ASSERT(buf != NULL);
+
+ if (PageChecked(page))
+ buf->b_flags |= XBF_VERITY_CHECKED;
+
+ xfs_buf_rele(buf);
+ put_page(page);
+}
+
+const struct fsverity_operations xfs_verity_ops = {
+ .begin_enable_verity = &xfs_begin_enable_verity,
+ .end_enable_verity = &xfs_end_enable_verity,
+ .get_verity_descriptor = &xfs_get_verity_descriptor,
+ .read_merkle_tree_page = &xfs_read_merkle_tree_page,
+ .write_merkle_tree_block = &xfs_write_merkle_tree_block,
+ .drop_page = &xfs_drop_page,
+};
diff --git a/fs/xfs/xfs_verity.h b/fs/xfs/xfs_verity.h
new file mode 100644
index 000000000000..ae5d87ca32a8
--- /dev/null
+++ b/fs/xfs/xfs_verity.h
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022 Red Hat, Inc.
+ */
+#ifndef __XFS_VERITY_H__
+#define __XFS_VERITY_H__
+
+#include <linux/fsverity.h>
+
+#define XFS_VERITY_DESCRIPTOR_NAME "verity_descriptor"
+#define XFS_VERITY_DESCRIPTOR_NAME_LEN 17
+
+#ifdef CONFIG_FS_VERITY
+extern const struct fsverity_operations xfs_verity_ops;
+#else
+#define xfs_verity_ops NULL
+#endif /* CONFIG_FS_VERITY */
+
+#endif /* __XFS_VERITY_H__ */
--
2.38.4
Add fs-verity ioctls to enable verity, dump the metadata (descriptor
and Merkle tree pages), and obtain the file's digest.
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/xfs/xfs_ioctl.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 3d6d680b6cf3..ffa04f0aed4a 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -42,6 +42,7 @@
#include <linux/mount.h>
#include <linux/namei.h>
#include <linux/fileattr.h>
+#include <linux/fsverity.h>
/*
* xfs_find_handle maps from userspace xfs_fsop_handlereq structure to
@@ -2154,6 +2155,22 @@ xfs_file_ioctl(
return error;
}
+ case FS_IOC_ENABLE_VERITY:
+ if (!xfs_has_verity(mp))
+ return -EOPNOTSUPP;
+ return fsverity_ioctl_enable(filp, (const void __user *)arg);
+
+ case FS_IOC_MEASURE_VERITY:
+ if (!xfs_has_verity(mp))
+ return -EOPNOTSUPP;
+ return fsverity_ioctl_measure(filp, (void __user *)arg);
+
+ case FS_IOC_READ_VERITY_METADATA:
+ if (!xfs_has_verity(mp))
+ return -EOPNOTSUPP;
+ return fsverity_ioctl_read_metadata(filp,
+ (const void __user *)arg);
+
default:
return -ENOTTY;
}
--
2.38.4
Finalize the fs-verity integration in XFS by adding the ro-compat
verity flag to the supported feature set, so the kernel recognizes
filesystems with fs-verity enabled.
Signed-off-by: Andrey Albershteyn <[email protected]>
---
fs/xfs/libxfs/xfs_format.h | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index ccb2ae5c2c93..a21612319765 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -355,10 +355,11 @@ xfs_sb_has_compat_feature(
#define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3) /* inobt block counts */
#define XFS_SB_FEAT_RO_COMPAT_VERITY (1 << 4) /* fs-verity */
#define XFS_SB_FEAT_RO_COMPAT_ALL \
- (XFS_SB_FEAT_RO_COMPAT_FINOBT | \
- XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
- XFS_SB_FEAT_RO_COMPAT_REFLINK| \
- XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
+ (XFS_SB_FEAT_RO_COMPAT_FINOBT | \
+ XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
+ XFS_SB_FEAT_RO_COMPAT_REFLINK | \
+ XFS_SB_FEAT_RO_COMPAT_INOBTCNT| \
+ XFS_SB_FEAT_RO_COMPAT_VERITY)
#define XFS_SB_FEAT_RO_COMPAT_UNKNOWN ~XFS_SB_FEAT_RO_COMPAT_ALL
static inline bool
xfs_sb_has_ro_compat_feature(
--
2.38.4
On Tue, Apr 04, 2023 at 04:53:01PM +0200, Andrey Albershteyn wrote:
> Not the whole folio always need to be verified by fs-verity (e.g.
> with 1k blocks). Use passed folio's offset and size.
Why can't those callers just call fsverity_verify_blocks directly?
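E.g. the XFS callout later in the series could be just this (sketch
only, assuming fsverity_verify_blocks() keeps its current
(folio, len, offset) signature, where offset is the offset into the
folio):
static bool
xfs_verify_folio(struct folio *folio, loff_t pos, unsigned int len)
{
	return fsverity_verify_blocks(folio, len, pos);
}
and then fsverity_verify_folio() wouldn't need to change at all.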
On Tue, Apr 04, 2023 at 04:53:04PM +0200, Andrey Albershteyn wrote:
> Make filesystems create readpage context, similar as
> iomap_writepage_ctx in write path. This will allow filesystem to
> pass _ops to iomap for ioend configuration (->prepare_ioend) which
> in turn would be used to set BIO end callout (bio->bi_end_io).
>
> This will be utilized in further patches by fs-verity to verify
> pages on BIO completion by XFS.
Hmm. Can't we just have a version of the readpage helper that just
gets the ops passed instead of creating all this boilerplate code?
For writepage XFS embeds the context in its own structure, so it's
kind of needed, but I don't think we'll need that here.
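Rough sketch of what I mean (hypothetical signatures, not an existing
API):
int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops,
		const struct iomap_readpage_ops *readpage_ops);
void iomap_readahead(struct readahead_control *rac,
		const struct iomap_ops *ops,
		const struct iomap_readpage_ops *readpage_ops);
i.e. keep the existing entry points and just thread the optional
readpage_ops down into the internal context.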
> if (iomap_block_needs_zeroing(iter, pos)) {
> folio_zero_range(folio, poff, plen);
> + if (iomap->flags & IOMAP_F_READ_VERITY) {
Why do we need the new flag vs just testing that folio_ops and
folio_ops->verify_folio are non-NULL?
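I.e. something like this in iomap_readpage_iter(), sketch only,
reusing the callout already added in this patch:
	if (iomap->folio_ops && iomap->folio_ops->verify_folio) {
		if (!iomap->folio_ops->verify_folio(folio, poff, plen)) {
			folio_set_error(folio);
			goto done;
		}
	}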
> - ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> - REQ_OP_READ, gfp);
> + ctx->bio = bio_alloc_bioset(iomap->bdev, bio_max_segs(nr_vecs),
> + REQ_OP_READ, GFP_NOFS, &iomap_read_ioend_bioset);
All other callers don't really need the larger bioset, so I'd avoid
the unconditional allocation here, but more on that later.
> + ioend = container_of(ctx->bio, struct iomap_read_ioend,
> + read_inline_bio);
> + ioend->io_inode = iter->inode;
> + if (ctx->ops && ctx->ops->prepare_ioend)
> + ctx->ops->prepare_ioend(ioend);
> +
So what we're doing in writeback and direct I/O is to:
a) have a submit_bio hook
b) allow the file system to then hook the bi_end_io caller
c) (only in direct I/O for now) allow the file system to provide
a bio_set to allocate from
I wonder if that also makes sense here, keeping all the deferral in the
file system. We'll need that for the btrfs iomap conversion anyway,
and it seems more flexible. The ioend processing would then move into
XFS.
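A rough sketch of that shape for the read side (fields are made up
here, not an existing API):
struct iomap_readpage_ops {
	/* optional: allocate the read bio from a fs-owned bio_set */
	struct bio *(*bio_alloc)(struct inode *inode,
			struct block_device *bdev, unsigned int nr_vecs,
			gfp_t gfp);
	/* optional: submit the bio, so the fs can set bi_end_io and defer */
	void (*submit_io)(struct inode *inode, struct bio *bio);
};
The fs would then run verification from its own end_io handler and
call back into iomap when done, much like writeback completions work
today.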
> @@ -156,6 +160,11 @@ struct iomap_folio_ops {
> * locked by the iomap code.
> */
> bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
> +
> + /*
> + * Verify folio when successfully read
> + */
> + bool (*verify_folio)(struct folio *folio, loff_t pos, unsigned int len);
Why isn't this in iomap_readpage_ops?
On Tue, Apr 04, 2023 at 04:53:15PM +0200, Andrey Albershteyn wrote:
> The direct path is not supported on verity files. Attempts to use direct
> I/O path on such files should fall back to buffered I/O path.
>
> Signed-off-by: Andrey Albershteyn <[email protected]>
> ---
> fs/xfs/xfs_file.c | 14 +++++++++++---
> 1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 947b5c436172..9e072e82f6c1 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -244,7 +244,8 @@ xfs_file_dax_read(
> struct kiocb *iocb,
> struct iov_iter *to)
> {
> - struct xfs_inode *ip = XFS_I(iocb->ki_filp->f_mapping->host);
> + struct inode *inode = iocb->ki_filp->f_mapping->host;
> + struct xfs_inode *ip = XFS_I(inode);
> ssize_t ret = 0;
>
> trace_xfs_file_dax_read(iocb, to);
> @@ -297,10 +298,17 @@ xfs_file_read_iter(
>
> if (IS_DAX(inode))
> ret = xfs_file_dax_read(iocb, to);
> - else if (iocb->ki_flags & IOCB_DIRECT)
> + else if (iocb->ki_flags & IOCB_DIRECT && !fsverity_active(inode))
> ret = xfs_file_dio_read(iocb, to);
> - else
> + else {
> + /*
> + * In case fs-verity is enabled, we also fallback to the
> + * buffered read from the direct read path. Therefore,
> + * IOCB_DIRECT is set and need to be cleared
> + */
> + iocb->ki_flags &= ~IOCB_DIRECT;
> ret = xfs_file_buffered_read(iocb, to);
XFS doesn't usually allow directio fallback to the pagecache. Why would
fsverity be any different?
--D
> + }
>
> if (ret > 0)
> XFS_STATS_ADD(mp, xs_read_bytes, ret);
> --
> 2.38.4
>
On Tue, Apr 04, 2023 at 04:53:16PM +0200, Andrey Albershteyn wrote:
> Add integration with fs-verity. The XFS store fs-verity metadata in
> the extended attributes. The metadata consist of verity descriptor
> and Merkle tree blocks.
>
> The descriptor is stored under "verity_descriptor" extended
> attribute. The Merkle tree blocks are stored under binary indexes.
>
> When fs-verity is enabled on an inode, the XFS_IVERITY_CONSTRUCTION
> flag is set meaning that the Merkle tree is being build. The
> initialization ends with storing of verity descriptor and setting
> inode on-disk flag (XFS_DIFLAG2_VERITY).
>
> The verification on read is done in iomap. Based on the inode verity
> flag the IOMAP_F_READ_VERITY is set in xfs_read_iomap_begin() to let
> iomap know that verification is needed.
>
> Signed-off-by: Andrey Albershteyn <[email protected]>
> ---
> fs/xfs/Makefile | 1 +
> fs/xfs/libxfs/xfs_attr.c | 13 +++
> fs/xfs/xfs_inode.h | 3 +-
> fs/xfs/xfs_iomap.c | 3 +
> fs/xfs/xfs_ondisk.h | 4 +
> fs/xfs/xfs_super.c | 8 ++
> fs/xfs/xfs_verity.c | 214 +++++++++++++++++++++++++++++++++++++++
> fs/xfs/xfs_verity.h | 19 ++++
> 8 files changed, 264 insertions(+), 1 deletion(-)
> create mode 100644 fs/xfs/xfs_verity.c
> create mode 100644 fs/xfs/xfs_verity.h
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 92d88dc3c9f7..76174770d91a 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -130,6 +130,7 @@ xfs-$(CONFIG_XFS_POSIX_ACL) += xfs_acl.o
> xfs-$(CONFIG_SYSCTL) += xfs_sysctl.o
> xfs-$(CONFIG_COMPAT) += xfs_ioctl32.o
> xfs-$(CONFIG_EXPORTFS_BLOCK_OPS) += xfs_pnfs.o
> +xfs-$(CONFIG_FS_VERITY) += xfs_verity.o
>
> # notify failure
> ifeq ($(CONFIG_MEMORY_FAILURE),y)
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 298b74245267..39d9038fbeee 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -26,6 +26,7 @@
> #include "xfs_trace.h"
> #include "xfs_attr_item.h"
> #include "xfs_xattr.h"
> +#include "xfs_verity.h"
>
> struct kmem_cache *xfs_attr_intent_cache;
>
> @@ -1635,6 +1636,18 @@ xfs_attr_namecheck(
> return xfs_verify_pptr(mp, (struct xfs_parent_name_rec *)name);
> }
>
> + if (flags & XFS_ATTR_VERITY) {
> + /* Merkle tree pages are stored under u64 indexes */
> + if (length == sizeof(__be64))
This ondisk structure should be actual structs that we can grep and
ctags on, not open-coded __be64 scattered around the xattr code.
> + return true;
> +
> + /* Verity descriptor blocks are held in a named attribute. */
> + if (length == XFS_VERITY_DESCRIPTOR_NAME_LEN)
> + return true;
> +
> + return false;
> + }
> +
> return xfs_str_attr_namecheck(name, length);
> }
>
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 69d21e42c10a..a95f28cb049f 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -324,7 +324,8 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
> * inactivation completes, both flags will be cleared and the inode is a
> * plain old IRECLAIMABLE inode.
> */
> -#define XFS_INACTIVATING (1 << 13)
> +#define XFS_INACTIVATING (1 << 13)
> +#define XFS_IVERITY_CONSTRUCTION (1 << 14) /* merkle tree construction */
>
> /* All inode state flags related to inode reclaim. */
> #define XFS_ALL_IRECLAIM_FLAGS (XFS_IRECLAIMABLE | \
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index e0f3c5d709f6..0adde39f02a5 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -143,6 +143,9 @@ xfs_bmbt_to_iomap(
> (ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
> iomap->flags |= IOMAP_F_DIRTY;
>
> + if (fsverity_active(VFS_I(ip)))
> + iomap->flags |= IOMAP_F_READ_VERITY;
> +
> iomap->validity_cookie = sequence_cookie;
> iomap->folio_ops = &xfs_iomap_folio_ops;
> return 0;
> diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
> index 9737b5a9f405..7fe88ccda519 100644
> --- a/fs/xfs/xfs_ondisk.h
> +++ b/fs/xfs/xfs_ondisk.h
> @@ -189,6 +189,10 @@ xfs_check_ondisk_structs(void)
> XFS_CHECK_VALUE(XFS_DQ_BIGTIME_EXPIRY_MIN << XFS_DQ_BIGTIME_SHIFT, 4);
> XFS_CHECK_VALUE(XFS_DQ_BIGTIME_EXPIRY_MAX << XFS_DQ_BIGTIME_SHIFT,
> 16299260424LL);
> +
> + /* fs-verity descriptor xattr name */
> + XFS_CHECK_VALUE(strlen(XFS_VERITY_DESCRIPTOR_NAME),
Are you encoding the trailing null in the xattr name too? The attr name
length is stored explicitly, so the null isn't strictly necessary.
> + XFS_VERITY_DESCRIPTOR_NAME_LEN);
> }
>
> #endif /* __XFS_ONDISK_H */
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index d40de32362b1..b6e99ed3b187 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -30,6 +30,7 @@
> #include "xfs_filestream.h"
> #include "xfs_quota.h"
> #include "xfs_sysfs.h"
> +#include "xfs_verity.h"
> #include "xfs_ondisk.h"
> #include "xfs_rmap_item.h"
> #include "xfs_refcount_item.h"
> @@ -1489,6 +1490,9 @@ xfs_fs_fill_super(
> sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
> #endif
> sb->s_op = &xfs_super_operations;
> +#ifdef CONFIG_FS_VERITY
> + sb->s_vop = &xfs_verity_ops;
> +#endif
>
> /*
> * Delay mount work if the debug hook is set. This is debug
> @@ -1685,6 +1689,10 @@ xfs_fs_fill_super(
> xfs_warn(mp,
> "EXPERIMENTAL Large extent counts feature in use. Use at your own risk!");
>
> + if (xfs_has_verity(mp))
> + xfs_alert(mp,
> + "EXPERIMENTAL fs-verity feature in use. Use at your own risk!");
> +
> error = xfs_mountfs(mp);
> if (error)
> goto out_filestream_unmount;
> diff --git a/fs/xfs/xfs_verity.c b/fs/xfs/xfs_verity.c
> new file mode 100644
> index 000000000000..a9874ff4efcd
> --- /dev/null
> +++ b/fs/xfs/xfs_verity.c
> @@ -0,0 +1,214 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2022 Red Hat, Inc.
> + */
> +#include "xfs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_inode.h"
> +#include "xfs_attr.h"
> +#include "xfs_verity.h"
> +#include "xfs_bmap_util.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +
> +static int
> +xfs_get_verity_descriptor(
> + struct inode *inode,
> + void *buf,
This is secretly a pointer to a struct fsverity_descriptor, right?
> + size_t buf_size)
> +{
> + struct xfs_inode *ip = XFS_I(inode);
> + int error = 0;
> + struct xfs_da_args args = {
> + .dp = ip,
> + .attr_filter = XFS_ATTR_VERITY,
> + .name = (const uint8_t *)XFS_VERITY_DESCRIPTOR_NAME,
> + .namelen = XFS_VERITY_DESCRIPTOR_NAME_LEN,
> + .value = buf,
> + .valuelen = buf_size,
> + };
> +
> + /*
> + * The fact that (returned attribute size) == (provided buf_size) is
> + * checked by xfs_attr_copy_value() (returns -ERANGE)
> + */
> + error = xfs_attr_get(&args);
> + if (error)
> + return error;
> +
> + return args.valuelen;
> +}
> +
> +static int
> +xfs_begin_enable_verity(
> + struct file *filp)
> +{
> + struct inode *inode = file_inode(filp);
> + struct xfs_inode *ip = XFS_I(inode);
> + int error = 0;
> +
> + ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
Do we need to take MMAPLOCK_EXCL to lock out page faults too?
> +
> + if (IS_DAX(inode))
> + return -EINVAL;
> +
> + if (xfs_iflags_test(ip, XFS_IVERITY_CONSTRUCTION))
> + return -EBUSY;
> + xfs_iflags_set(ip, XFS_IVERITY_CONSTRUCTION);
xfs_iflags_test_and_set?
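I.e. (assuming the existing helper in xfs_inode.h):
	if (xfs_iflags_test_and_set(ip, XFS_IVERITY_CONSTRUCTION))
		return -EBUSY;
which also closes the gap between the test and the set.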
> +
> + return error;
> +}
> +
> +static int
> +xfs_end_enable_verity(
> + struct file *filp,
> + const void *desc,
> + size_t desc_size,
> + u64 merkle_tree_size)
> +{
> + struct inode *inode = file_inode(filp);
> + struct xfs_inode *ip = XFS_I(inode);
> + struct xfs_mount *mp = ip->i_mount;
> + struct xfs_trans *tp;
> + struct xfs_da_args args = {
> + .dp = ip,
> + .whichfork = XFS_ATTR_FORK,
> + .attr_filter = XFS_ATTR_VERITY,
> + .attr_flags = XATTR_CREATE,
> + .name = (const uint8_t *)XFS_VERITY_DESCRIPTOR_NAME,
> + .namelen = XFS_VERITY_DESCRIPTOR_NAME_LEN,
> + .value = (void *)desc,
> + .valuelen = desc_size,
> + };
/me wonders if the common da args initialization could be a separate
helper, but for three callsites that might not be worth the effort.
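If it ever becomes worth it, something like this (hypothetical helper,
names are illustrative only):
static void
xfs_verity_init_set_args(
	struct xfs_inode	*ip,
	struct xfs_da_args	*args,
	const uint8_t		*name,
	unsigned int		namelen,
	void			*value,
	unsigned int		valuelen)
{
	memset(args, 0, sizeof(*args));
	args->dp = ip;
	args->whichfork = XFS_ATTR_FORK;
	args->attr_filter = XFS_ATTR_VERITY;
	args->attr_flags = XATTR_CREATE;
	args->name = name;
	args->namelen = namelen;
	args->value = value;
	args->valuelen = valuelen;
}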
> + int error = 0;
> +
> + ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
> +
> + /* fs-verity failed, just cleanup */
> + if (desc == NULL)
> + goto out;
> +
> + error = xfs_attr_set(&args);
> + if (error)
> + goto out;
> +
> + /* Set fsverity inode flag */
> + error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange,
> + 0, 0, false, &tp);
> + if (error)
> + goto out;
> +
> + /*
> + * Ensure that we've persisted the verity information before we enable
> + * it on the inode and tell the caller we have sealed the inode.
> + */
> + ip->i_diflags2 |= XFS_DIFLAG2_VERITY;
> +
> + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> + xfs_trans_set_sync(tp);
> +
> + error = xfs_trans_commit(tp);
> + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +
> + if (!error)
> + inode->i_flags |= S_VERITY;
> +
> +out:
> + xfs_iflags_clear(ip, XFS_IVERITY_CONSTRUCTION);
If the construction fails, should we erase all the XFS_ATTR_VERITY
attributes?
> + return error;
> +}
> +
> +static struct page *
> +xfs_read_merkle_tree_page(
> + struct inode *inode,
> + pgoff_t index,
> + unsigned long num_ra_pages,
> + u8 log_blocksize)
> +{
> + struct xfs_inode *ip = XFS_I(inode);
> + struct page *page = NULL;
> + __be64 name = cpu_to_be64(index << PAGE_SHIFT);
> + uint32_t bs = 1 << log_blocksize;
> + struct xfs_da_args args = {
> + .dp = ip,
> + .attr_filter = XFS_ATTR_VERITY,
> + .op_flags = XFS_DA_OP_BUFFER,
If we're going to pass the xfs_buf out of the attr_get code, why not
increment the refcount on the buffer and pass its page to the caller?
That would save allocating the value buffer, the xfs_buf, and a page.
Do we not trust fsverity not to scribble on the page?
(Can we mark the buffer page readonly and pass it out?)
Or, is the problem here that we can only pass fsverity a full page, even
if the merkle tree blocksize is (say) 1K? Or 64K?
(I noticed we're back to passing around pages and not folios.)
> + .name = (const uint8_t *)&name,
> + .namelen = sizeof(__be64),
> + .valuelen = bs,
> + };
> + int error = 0;
> +
> + page = alloc_page(GFP_KERNEL);
> + if (!page)
> + return ERR_PTR(-ENOMEM);
> +
> + error = xfs_attr_get(&args);
> + if (error) {
> + kmem_free(args.value);
> + xfs_buf_rele(args.bp);
> + put_page(page);
> + return ERR_PTR(-EFAULT);
> + }
> +
> + if (args.bp->b_flags & XBF_VERITY_CHECKED)
> + SetPageChecked(page);
> +
> + page->private = (unsigned long)args.bp;
> + memcpy(page_address(page), args.value, args.valuelen);
> +
> + kmem_free(args.value);
> + return page;
> +}
> +
> +static int
> +xfs_write_merkle_tree_block(
> + struct inode *inode,
> + const void *buf,
> + u64 pos,
> + unsigned int size)
> +{
> + struct xfs_inode *ip = XFS_I(inode);
> + __be64 name = cpu_to_be64(pos);
> + struct xfs_da_args args = {
> + .dp = ip,
> + .whichfork = XFS_ATTR_FORK,
> + .attr_filter = XFS_ATTR_VERITY,
> + .attr_flags = XATTR_CREATE,
> + .name = (const uint8_t *)&name,
> + .namelen = sizeof(__be64),
> + .value = (void *)buf,
> + .valuelen = size,
> + };
> +
> + return xfs_attr_set(&args);
> +}
> +
> +static void
> +xfs_drop_page(
> + struct page *page)
> +{
> + struct xfs_buf *buf = (struct xfs_buf *)page->private;
Indenting nit ^ here
> +
> + ASSERT(buf != NULL);
> +
> + if (PageChecked(page))
> + buf->b_flags |= XBF_VERITY_CHECKED;
> +
> + xfs_buf_rele(buf);
> + put_page(page);
> +}
> +
> +const struct fsverity_operations xfs_verity_ops = {
> + .begin_enable_verity = &xfs_begin_enable_verity,
> + .end_enable_verity = &xfs_end_enable_verity,
> + .get_verity_descriptor = &xfs_get_verity_descriptor,
> + .read_merkle_tree_page = &xfs_read_merkle_tree_page,
> + .write_merkle_tree_block = &xfs_write_merkle_tree_block,
> + .drop_page = &xfs_drop_page,
Here too ^ (line up the equals operators, please).
--D
> +};
> diff --git a/fs/xfs/xfs_verity.h b/fs/xfs/xfs_verity.h
> new file mode 100644
> index 000000000000..ae5d87ca32a8
> --- /dev/null
> +++ b/fs/xfs/xfs_verity.h
> @@ -0,0 +1,19 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2022 Red Hat, Inc.
> + */
> +#ifndef __XFS_VERITY_H__
> +#define __XFS_VERITY_H__
> +
> +#include <linux/fsverity.h>
> +
> +#define XFS_VERITY_DESCRIPTOR_NAME "verity_descriptor"
> +#define XFS_VERITY_DESCRIPTOR_NAME_LEN 17
> +
> +#ifdef CONFIG_FS_VERITY
> +extern const struct fsverity_operations xfs_verity_ops;
> +#else
> +#define xfs_verity_ops NULL
> +#endif /* CONFIG_FS_VERITY */
> +
> +#endif /* __XFS_VERITY_H__ */
> --
> 2.38.4
>
On Tue, Apr 04, 2023 at 04:53:17PM +0200, Andrey Albershteyn wrote:
> In case of different Merkle tree block size fs-verity expects
> ->read_merkle_tree_page() to return Merkle tree page filled with
> Merkle tree blocks. The XFS stores each merkle tree block under
> extended attribute. Those attributes are addressed by block offset
> into Merkle tree.
>
> This patch make ->read_merkle_tree_page() to fetch multiple merkle
> tree blocks based on size ratio. Also the reference to each xfs_buf
> is passed with page->private to ->drop_page().
>
> Signed-off-by: Andrey Albershteyn <[email protected]>
> ---
> fs/xfs/xfs_verity.c | 74 +++++++++++++++++++++++++++++++++++----------
> fs/xfs/xfs_verity.h | 8 +++++
> 2 files changed, 66 insertions(+), 16 deletions(-)
>
> diff --git a/fs/xfs/xfs_verity.c b/fs/xfs/xfs_verity.c
> index a9874ff4efcd..ef0aff216f06 100644
> --- a/fs/xfs/xfs_verity.c
> +++ b/fs/xfs/xfs_verity.c
> @@ -134,6 +134,10 @@ xfs_read_merkle_tree_page(
> struct page *page = NULL;
> __be64 name = cpu_to_be64(index << PAGE_SHIFT);
> uint32_t bs = 1 << log_blocksize;
> + int blocks_per_page =
> + (1 << (PAGE_SHIFT - log_blocksize));
> + int n = 0;
> + int offset = 0;
> struct xfs_da_args args = {
> .dp = ip,
> .attr_filter = XFS_ATTR_VERITY,
> @@ -143,26 +147,59 @@ xfs_read_merkle_tree_page(
> .valuelen = bs,
> };
> int error = 0;
> + bool is_checked = true;
> + struct xfs_verity_buf_list *buf_list;
>
> page = alloc_page(GFP_KERNEL);
> if (!page)
> return ERR_PTR(-ENOMEM);
>
> - error = xfs_attr_get(&args);
> - if (error) {
> - kmem_free(args.value);
> - xfs_buf_rele(args.bp);
> + buf_list = kzalloc(sizeof(struct xfs_verity_buf_list), GFP_KERNEL);
> + if (!buf_list) {
> put_page(page);
> - return ERR_PTR(-EFAULT);
> + return ERR_PTR(-ENOMEM);
> }
>
> - if (args.bp->b_flags & XBF_VERITY_CHECKED)
> + /*
> + * Fill the page with Merkle tree blocks. The blcoks_per_page is higher
> + * than 1 when fs block size != PAGE_SIZE or Merkle tree block size !=
> + * PAGE SIZE
> + */
> + for (n = 0; n < blocks_per_page; n++) {
Ahah, ok, that's why we can't pass the xfs_buf pages up to fsverity.
> + offset = bs * n;
> + name = cpu_to_be64(((index << PAGE_SHIFT) + offset));
Really this ought to be a typechecked helper...
struct xfs_fsverity_merkle_key {
__be64 merkleoff;
};
static inline void
xfs_fsverity_merkle_key_to_disk(struct xfs_fsverity_merkle_key *k, loff_t pos)
{
	k->merkleoff = cpu_to_be64(pos);
}
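and then the lookup side becomes (sketch):
	struct xfs_fsverity_merkle_key name;
	xfs_fsverity_merkle_key_to_disk(&name, (index << PAGE_SHIFT) + offset);
	args.name = (const uint8_t *)&name;
	args.namelen = sizeof(name);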
> + args.name = (const uint8_t *)&name;
> +
> + error = xfs_attr_get(&args);
> + if (error) {
> + kmem_free(args.value);
> + /*
> + * No more Merkle tree blocks (e.g. this was the last
> + * block of the tree)
> + */
> + if (error == -ENOATTR)
> + break;
> + xfs_buf_rele(args.bp);
> + put_page(page);
> + kmem_free(buf_list);
> + return ERR_PTR(-EFAULT);
> + }
> +
> + buf_list->bufs[buf_list->buf_count++] = args.bp;
> +
> + /* One of the buffers was dropped */
> + if (!(args.bp->b_flags & XBF_VERITY_CHECKED))
> + is_checked = false;
If there's enough memory pressure to cause the merkle tree pages to get
evicted, what are the chances that the xfs_bufs survive the eviction?
> + memcpy(page_address(page) + offset, args.value, args.valuelen);
> + kmem_free(args.value);
> + args.value = NULL;
> + }
> +
> + if (is_checked)
> SetPageChecked(page);
> + page->private = (unsigned long)buf_list;
>
> - page->private = (unsigned long)args.bp;
> - memcpy(page_address(page), args.value, args.valuelen);
> -
> - kmem_free(args.value);
> return page;
> }
>
> @@ -191,16 +228,21 @@ xfs_write_merkle_tree_block(
>
> static void
> xfs_drop_page(
> - struct page *page)
> + struct page *page)
> {
> - struct xfs_buf *buf = (struct xfs_buf *)page->private;
> + int i = 0;
> + struct xfs_verity_buf_list *buf_list =
> + (struct xfs_verity_buf_list *)page->private;
>
> - ASSERT(buf != NULL);
> + ASSERT(buf_list != NULL);
>
> - if (PageChecked(page))
> - buf->b_flags |= XBF_VERITY_CHECKED;
> + for (i = 0; i < buf_list->buf_count; i++) {
> + if (PageChecked(page))
> + buf_list->bufs[i]->b_flags |= XBF_VERITY_CHECKED;
> + xfs_buf_rele(buf_list->bufs[i]);
> + }
>
> - xfs_buf_rele(buf);
> + kmem_free(buf_list);
> put_page(page);
> }
>
> diff --git a/fs/xfs/xfs_verity.h b/fs/xfs/xfs_verity.h
> index ae5d87ca32a8..433b2f4ae3bc 100644
> --- a/fs/xfs/xfs_verity.h
> +++ b/fs/xfs/xfs_verity.h
> @@ -16,4 +16,12 @@ extern const struct fsverity_operations xfs_verity_ops;
> #define xfs_verity_ops NULL
> #endif /* CONFIG_FS_VERITY */
>
> +/* Minimal Merkle tree block size is 1024 */
> +#define XFS_VERITY_MAX_MBLOCKS_PER_PAGE (1 << (PAGE_SHIFT - 10))
> +
> +struct xfs_verity_buf_list {
> + unsigned int buf_count;
> + struct xfs_buf *bufs[XFS_VERITY_MAX_MBLOCKS_PER_PAGE];
So... this is going to be a 520-byte allocation on arm64 with 64k pages?
Even if the merkle tree block size is the same as the page size? Ouch.
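(1 << (PAGE_SHIFT - 10) = 1 << 6 = 64 slots, i.e. 64 * 8 bytes of buffer
pointers plus the padded buf_count = 512 + 8 = 520 bytes, assuming 8-byte
pointers.)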
--D
> +};
> +
> #endif /* __XFS_VERITY_H__ */
> --
> 2.38.4
>
On Tue, Apr 04, 2023 at 04:52:56PM +0200, Andrey Albershteyn wrote:
> Hi all,
>
> This is V2 of fs-verity support in XFS. In this series I did
> numerous changes from V1 which are described below.
>
> This patchset introduces fs-verity [5] support for XFS. This
> implementation utilizes extended attributes to store fs-verity
> metadata. The Merkle tree blocks are stored in the remote extended
> attributes.
>
> A few key points:
> - fs-verity metadata is stored in extended attributes
> - Direct path and DAX are disabled for inodes with fs-verity
> - Pages are verified in iomap's read IO path (offloaded to
> workqueue)
> - New workqueue for verification processing
> - New ro-compat flag
> - Inodes with fs-verity have new on-disk diflag
> - xfs_attr_get() can return buffer with the attribute
>
> The patchset is tested with xfstests -g auto on xfs_1k, xfs_4k,
> xfs_1k_quota, and xfs_4k_quota. Haven't found any major failures.
>
> Patches [6/23] and [7/23] touch ext4, f2fs, btrfs, and patch [8/23]
> touches erofs, gfs2, and zonefs.
>
> The patchset consist of four parts:
> - [1..4]: Patches from Parent Pointer patchset which add binary
> xattr names with a few deps
> - [5..7]: Improvements to core fs-verity
> - [8..9]: Add read path verification to iomap
> - [10..23]: Integration of fs-verity to xfs
>
> Changes from V1:
> - Added parent pointer patches for easier testing
> - Many issues and refactoring points fixed from the V1 review
> - Adjusted for recent changes in fs-verity core (folios, non-4k)
> - Dropped disabling of large folios
> - Completely new fsverity patches (fix, callout, log_blocksize)
> - Change approach to verification in iomap to the same one as in
> write path. Callouts to fs instead of direct fs-verity use.
> - New XFS workqueue for post read folio verification
> - xfs_attr_get() can return underlying xfs_buf
> - xfs_bufs are marked with XBF_VERITY_CHECKED to track verified
> blocks
>
> kernel:
> [1]: https://github.com/alberand/linux/tree/xfs-verity-v2
>
> xfsprogs:
> [2]: https://github.com/alberand/xfsprogs/tree/fsverity-v2
Will there be any means for xfs_repair to check the merkle tree contents?
Should it clear the ondisk inode flag if it decides to trash the xattr
structure, or is it ok to let the kernel deal with the flag being set but
no verity data present?
--D
> xfstests:
> [3]: https://github.com/alberand/xfstests/tree/fsverity-v2
>
> v1:
> [4]: https://lore.kernel.org/linux-xfs/[email protected]/
>
> fs-verity:
> [5]: https://www.kernel.org/doc/html/latest/filesystems/fsverity.html
>
> Thanks,
> Andrey
>
> Allison Henderson (4):
> xfs: Add new name to attri/d
> xfs: add parent pointer support to attribute code
> xfs: define parent pointer xattr format
> xfs: Add xfs_verify_pptr
>
> Andrey Albershteyn (19):
> fsverity: make fsverity_verify_folio() accept folio's offset and size
> fsverity: add drop_page() callout
> fsverity: pass Merkle tree block size to ->read_merkle_tree_page()
> iomap: hoist iomap_readpage_ctx from the iomap_readahead/_folio
> iomap: allow filesystem to implement read path verification
> xfs: add XBF_VERITY_CHECKED xfs_buf flag
> xfs: add XFS_DA_OP_BUFFER to make xfs_attr_get() return buffer
> xfs: introduce workqueue for post read IO work
> xfs: add iomap's readpage operations
> xfs: add attribute type for fs-verity
> xfs: add fs-verity ro-compat flag
> xfs: add inode on-disk VERITY flag
> xfs: initialize fs-verity on file open and cleanup on inode
> destruction
> xfs: don't allow to enable DAX on fs-verity sealsed inode
> xfs: disable direct read path for fs-verity sealed files
> xfs: add fs-verity support
> xfs: handle merkle tree block size != fs blocksize != PAGE_SIZE
> xfs: add fs-verity ioctls
> xfs: enable ro-compat fs-verity flag
>
> fs/btrfs/verity.c | 15 +-
> fs/erofs/data.c | 12 +-
> fs/ext4/verity.c | 9 +-
> fs/f2fs/verity.c | 9 +-
> fs/gfs2/aops.c | 10 +-
> fs/ioctl.c | 4 +
> fs/iomap/buffered-io.c | 89 ++++++-----
> fs/verity/read_metadata.c | 7 +-
> fs/verity/verify.c | 9 +-
> fs/xfs/Makefile | 1 +
> fs/xfs/libxfs/xfs_attr.c | 81 +++++++++-
> fs/xfs/libxfs/xfs_attr.h | 7 +-
> fs/xfs/libxfs/xfs_attr_leaf.c | 7 +
> fs/xfs/libxfs/xfs_attr_remote.c | 13 +-
> fs/xfs/libxfs/xfs_da_btree.h | 7 +-
> fs/xfs/libxfs/xfs_da_format.h | 46 +++++-
> fs/xfs/libxfs/xfs_format.h | 14 +-
> fs/xfs/libxfs/xfs_log_format.h | 8 +-
> fs/xfs/libxfs/xfs_sb.c | 2 +
> fs/xfs/scrub/attr.c | 4 +-
> fs/xfs/xfs_aops.c | 61 +++++++-
> fs/xfs/xfs_attr_item.c | 142 +++++++++++++++---
> fs/xfs/xfs_attr_item.h | 1 +
> fs/xfs/xfs_attr_list.c | 17 ++-
> fs/xfs/xfs_buf.h | 17 ++-
> fs/xfs/xfs_file.c | 22 ++-
> fs/xfs/xfs_inode.c | 2 +
> fs/xfs/xfs_inode.h | 3 +-
> fs/xfs/xfs_ioctl.c | 22 +++
> fs/xfs/xfs_iomap.c | 14 ++
> fs/xfs/xfs_iops.c | 4 +
> fs/xfs/xfs_linux.h | 1 +
> fs/xfs/xfs_mount.h | 3 +
> fs/xfs/xfs_ondisk.h | 4 +
> fs/xfs/xfs_super.c | 19 +++
> fs/xfs/xfs_trace.h | 1 +
> fs/xfs/xfs_verity.c | 256 ++++++++++++++++++++++++++++++++
> fs/xfs/xfs_verity.h | 27 ++++
> fs/xfs/xfs_xattr.c | 9 ++
> fs/zonefs/file.c | 12 +-
> include/linux/fsverity.h | 18 ++-
> include/linux/iomap.h | 39 ++++-
> include/uapi/linux/fs.h | 1 +
> 43 files changed, 923 insertions(+), 126 deletions(-)
> create mode 100644 fs/xfs/xfs_verity.c
> create mode 100644 fs/xfs/xfs_verity.h
>
> --
> 2.38.4
>
Hi Andrey,
kernel test robot noticed the following build errors:
[auto build test ERROR on xfs-linux/for-next]
[also build test ERROR on kdave/for-next tytso-ext4/dev jaegeuk-f2fs/dev-test jaegeuk-f2fs/dev linus/master v6.3-rc5]
[cannot apply to next-20230404]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Andrey-Albershteyn/xfs-Add-new-name-to-attri-d/20230404-230224
base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
patch link: https://lore.kernel.org/r/20230404145319.2057051-21-aalbersh%40redhat.com
patch subject: [PATCH v2 20/23] xfs: add fs-verity support
config: sh-allmodconfig (https://download.01.org/0day-ci/archive/20230405/[email protected]/config)
compiler: sh4-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/1324353702eaba7da1643d589631adcaedf9a046
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Andrey-Albershteyn/xfs-Add-new-name-to-attri-d/20230404-230224
git checkout 1324353702eaba7da1643d589631adcaedf9a046
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sh olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sh SHELL=/bin/bash fs/xfs/
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
All errors (new ones prefixed by >>):
In file included from <command-line>:
In function 'xfs_check_ondisk_structs',
inlined from 'init_xfs_fs' at fs/xfs/xfs_super.c:2307:2:
>> include/linux/compiler_types.h:397:45: error: call to '__compiletime_assert_1674' declared with attribute error: XFS: value of strlen(XFS_VERITY_DESCRIPTOR_NAME) is wrong, expected XFS_VERITY_DESCRIPTOR_NAME_LEN
397 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:378:25: note: in definition of macro '__compiletime_assert'
378 | prefix ## suffix(); \
| ^~~~~~
include/linux/compiler_types.h:397:9: note: in expansion of macro '_compiletime_assert'
397 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
fs/xfs/xfs_ondisk.h:19:9: note: in expansion of macro 'BUILD_BUG_ON_MSG'
19 | BUILD_BUG_ON_MSG((value) != (expected), \
| ^~~~~~~~~~~~~~~~
fs/xfs/xfs_ondisk.h:194:9: note: in expansion of macro 'XFS_CHECK_VALUE'
194 | XFS_CHECK_VALUE(strlen(XFS_VERITY_DESCRIPTOR_NAME),
| ^~~~~~~~~~~~~~~
vim +/__compiletime_assert_1674 +397 include/linux/compiler_types.h
eb5c2d4b45e3d2 Will Deacon 2020-07-21 383
eb5c2d4b45e3d2 Will Deacon 2020-07-21 384 #define _compiletime_assert(condition, msg, prefix, suffix) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21 385 __compiletime_assert(condition, msg, prefix, suffix)
eb5c2d4b45e3d2 Will Deacon 2020-07-21 386
eb5c2d4b45e3d2 Will Deacon 2020-07-21 387 /**
eb5c2d4b45e3d2 Will Deacon 2020-07-21 388 * compiletime_assert - break build and emit msg if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21 389 * @condition: a compile-time constant condition to check
eb5c2d4b45e3d2 Will Deacon 2020-07-21 390 * @msg: a message to emit if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21 391 *
eb5c2d4b45e3d2 Will Deacon 2020-07-21 392 * In tradition of POSIX assert, this macro will break the build if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21 393 * supplied condition is *false*, emitting the supplied error message if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21 394 * compiler has support to do so.
eb5c2d4b45e3d2 Will Deacon 2020-07-21 395 */
eb5c2d4b45e3d2 Will Deacon 2020-07-21 396 #define compiletime_assert(condition, msg) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21 @397 _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
eb5c2d4b45e3d2 Will Deacon 2020-07-21 398
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
Hi Andrey,
kernel test robot noticed the following build errors:
[auto build test ERROR on xfs-linux/for-next]
[also build test ERROR on kdave/for-next jaegeuk-f2fs/dev-test jaegeuk-f2fs/dev linus/master v6.3-rc5]
[cannot apply to next-20230404]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Andrey-Albershteyn/xfs-Add-new-name-to-attri-d/20230404-230224
base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
patch link: https://lore.kernel.org/r/20230404145319.2057051-21-aalbersh%40redhat.com
patch subject: [PATCH v2 20/23] xfs: add fs-verity support
config: i386-randconfig-r036-20230403 (https://download.01.org/0day-ci/archive/20230405/[email protected]/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/1324353702eaba7da1643d589631adcaedf9a046
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Andrey-Albershteyn/xfs-Add-new-name-to-attri-d/20230404-230224
git checkout 1324353702eaba7da1643d589631adcaedf9a046
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash fs/
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
All errors (new ones prefixed by >>):
In file included from fs/xfs/xfs_super.c:34:
>> fs/xfs/xfs_ondisk.h:194:2: error: call to __compiletime_assert_3998 declared with 'error' attribute: XFS: value of strlen(XFS_VERITY_DESCRIPTOR_NAME) is wrong, expected XFS_VERITY_DESCRIPTOR_NAME_LEN
XFS_CHECK_VALUE(strlen(XFS_VERITY_DESCRIPTOR_NAME),
^
fs/xfs/xfs_ondisk.h:19:2: note: expanded from macro 'XFS_CHECK_VALUE'
BUILD_BUG_ON_MSG((value) != (expected), \
^
include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^
include/linux/compiler_types.h:397:2: note: expanded from macro 'compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
include/linux/compiler_types.h:385:2: note: expanded from macro '_compiletime_assert'
__compiletime_assert(condition, msg, prefix, suffix)
^
include/linux/compiler_types.h:378:4: note: expanded from macro '__compiletime_assert'
prefix ## suffix(); \
^
<scratch space>:77:1: note: expanded from here
__compiletime_assert_3998
^
1 error generated.
vim +/error +194 fs/xfs/xfs_ondisk.h
8
9 #define XFS_CHECK_STRUCT_SIZE(structname, size) \
10 BUILD_BUG_ON_MSG(sizeof(structname) != (size), "XFS: sizeof(" \
11 #structname ") is wrong, expected " #size)
12
13 #define XFS_CHECK_OFFSET(structname, member, off) \
14 BUILD_BUG_ON_MSG(offsetof(structname, member) != (off), \
15 "XFS: offsetof(" #structname ", " #member ") is wrong, " \
16 "expected " #off)
17
18 #define XFS_CHECK_VALUE(value, expected) \
19 BUILD_BUG_ON_MSG((value) != (expected), \
20 "XFS: value of " #value " is wrong, expected " #expected)
21
22 static inline void __init
23 xfs_check_ondisk_structs(void)
24 {
25 /* ag/file structures */
26 XFS_CHECK_STRUCT_SIZE(struct xfs_acl, 4);
27 XFS_CHECK_STRUCT_SIZE(struct xfs_acl_entry, 12);
28 XFS_CHECK_STRUCT_SIZE(struct xfs_agf, 224);
29 XFS_CHECK_STRUCT_SIZE(struct xfs_agfl, 36);
30 XFS_CHECK_STRUCT_SIZE(struct xfs_agi, 344);
31 XFS_CHECK_STRUCT_SIZE(struct xfs_bmbt_key, 8);
32 XFS_CHECK_STRUCT_SIZE(struct xfs_bmbt_rec, 16);
33 XFS_CHECK_STRUCT_SIZE(struct xfs_bmdr_block, 4);
34 XFS_CHECK_STRUCT_SIZE(struct xfs_btree_block_shdr, 48);
35 XFS_CHECK_STRUCT_SIZE(struct xfs_btree_block_lhdr, 64);
36 XFS_CHECK_STRUCT_SIZE(struct xfs_btree_block, 72);
37 XFS_CHECK_STRUCT_SIZE(struct xfs_dinode, 176);
38 XFS_CHECK_STRUCT_SIZE(struct xfs_disk_dquot, 104);
39 XFS_CHECK_STRUCT_SIZE(struct xfs_dqblk, 136);
40 XFS_CHECK_STRUCT_SIZE(struct xfs_dsb, 264);
41 XFS_CHECK_STRUCT_SIZE(struct xfs_dsymlink_hdr, 56);
42 XFS_CHECK_STRUCT_SIZE(struct xfs_inobt_key, 4);
43 XFS_CHECK_STRUCT_SIZE(struct xfs_inobt_rec, 16);
44 XFS_CHECK_STRUCT_SIZE(struct xfs_refcount_key, 4);
45 XFS_CHECK_STRUCT_SIZE(struct xfs_refcount_rec, 12);
46 XFS_CHECK_STRUCT_SIZE(struct xfs_rmap_key, 20);
47 XFS_CHECK_STRUCT_SIZE(struct xfs_rmap_rec, 24);
48 XFS_CHECK_STRUCT_SIZE(xfs_timestamp_t, 8);
49 XFS_CHECK_STRUCT_SIZE(struct xfs_legacy_timestamp, 8);
50 XFS_CHECK_STRUCT_SIZE(xfs_alloc_key_t, 8);
51 XFS_CHECK_STRUCT_SIZE(xfs_alloc_ptr_t, 4);
52 XFS_CHECK_STRUCT_SIZE(xfs_alloc_rec_t, 8);
53 XFS_CHECK_STRUCT_SIZE(xfs_inobt_ptr_t, 4);
54 XFS_CHECK_STRUCT_SIZE(xfs_refcount_ptr_t, 4);
55 XFS_CHECK_STRUCT_SIZE(xfs_rmap_ptr_t, 4);
56
57 /* dir/attr trees */
58 XFS_CHECK_STRUCT_SIZE(struct xfs_attr3_leaf_hdr, 80);
59 XFS_CHECK_STRUCT_SIZE(struct xfs_attr3_leafblock, 88);
60 XFS_CHECK_STRUCT_SIZE(struct xfs_attr3_rmt_hdr, 56);
61 XFS_CHECK_STRUCT_SIZE(struct xfs_da3_blkinfo, 56);
62 XFS_CHECK_STRUCT_SIZE(struct xfs_da3_intnode, 64);
63 XFS_CHECK_STRUCT_SIZE(struct xfs_da3_node_hdr, 64);
64 XFS_CHECK_STRUCT_SIZE(struct xfs_dir3_blk_hdr, 48);
65 XFS_CHECK_STRUCT_SIZE(struct xfs_dir3_data_hdr, 64);
66 XFS_CHECK_STRUCT_SIZE(struct xfs_dir3_free, 64);
67 XFS_CHECK_STRUCT_SIZE(struct xfs_dir3_free_hdr, 64);
68 XFS_CHECK_STRUCT_SIZE(struct xfs_dir3_leaf, 64);
69 XFS_CHECK_STRUCT_SIZE(struct xfs_dir3_leaf_hdr, 64);
70 XFS_CHECK_STRUCT_SIZE(xfs_attr_leaf_entry_t, 8);
71 XFS_CHECK_STRUCT_SIZE(xfs_attr_leaf_hdr_t, 32);
72 XFS_CHECK_STRUCT_SIZE(xfs_attr_leaf_map_t, 4);
73 XFS_CHECK_STRUCT_SIZE(xfs_attr_leaf_name_local_t, 4);
74
75 /*
76 * m68k has problems with xfs_attr_leaf_name_remote_t, but we pad it to
77 * 4 bytes anyway so it's not obviously a problem. Hence for the moment
78 * we don't check this structure. This can be re-instated when the attr
79 * definitions are updated to use c99 VLA definitions.
80 *
81 XFS_CHECK_STRUCT_SIZE(xfs_attr_leaf_name_remote_t, 12);
82 */
83
84 XFS_CHECK_OFFSET(xfs_attr_leaf_name_local_t, valuelen, 0);
85 XFS_CHECK_OFFSET(xfs_attr_leaf_name_local_t, namelen, 2);
86 XFS_CHECK_OFFSET(xfs_attr_leaf_name_local_t, nameval, 3);
87 XFS_CHECK_OFFSET(xfs_attr_leaf_name_remote_t, valueblk, 0);
88 XFS_CHECK_OFFSET(xfs_attr_leaf_name_remote_t, valuelen, 4);
89 XFS_CHECK_OFFSET(xfs_attr_leaf_name_remote_t, namelen, 8);
90 XFS_CHECK_OFFSET(xfs_attr_leaf_name_remote_t, name, 9);
91 XFS_CHECK_STRUCT_SIZE(xfs_attr_leafblock_t, 40);
92 XFS_CHECK_OFFSET(struct xfs_attr_shortform, hdr.totsize, 0);
93 XFS_CHECK_OFFSET(struct xfs_attr_shortform, hdr.count, 2);
94 XFS_CHECK_OFFSET(struct xfs_attr_shortform, list[0].namelen, 4);
95 XFS_CHECK_OFFSET(struct xfs_attr_shortform, list[0].valuelen, 5);
96 XFS_CHECK_OFFSET(struct xfs_attr_shortform, list[0].flags, 6);
97 XFS_CHECK_OFFSET(struct xfs_attr_shortform, list[0].nameval, 7);
98 XFS_CHECK_STRUCT_SIZE(xfs_da_blkinfo_t, 12);
99 XFS_CHECK_STRUCT_SIZE(xfs_da_intnode_t, 16);
100 XFS_CHECK_STRUCT_SIZE(xfs_da_node_entry_t, 8);
101 XFS_CHECK_STRUCT_SIZE(xfs_da_node_hdr_t, 16);
102 XFS_CHECK_STRUCT_SIZE(xfs_dir2_data_free_t, 4);
103 XFS_CHECK_STRUCT_SIZE(xfs_dir2_data_hdr_t, 16);
104 XFS_CHECK_OFFSET(xfs_dir2_data_unused_t, freetag, 0);
105 XFS_CHECK_OFFSET(xfs_dir2_data_unused_t, length, 2);
106 XFS_CHECK_STRUCT_SIZE(xfs_dir2_free_hdr_t, 16);
107 XFS_CHECK_STRUCT_SIZE(xfs_dir2_free_t, 16);
108 XFS_CHECK_STRUCT_SIZE(xfs_dir2_leaf_entry_t, 8);
109 XFS_CHECK_STRUCT_SIZE(xfs_dir2_leaf_hdr_t, 16);
110 XFS_CHECK_STRUCT_SIZE(xfs_dir2_leaf_t, 16);
111 XFS_CHECK_STRUCT_SIZE(xfs_dir2_leaf_tail_t, 4);
112 XFS_CHECK_STRUCT_SIZE(xfs_dir2_sf_entry_t, 3);
113 XFS_CHECK_OFFSET(xfs_dir2_sf_entry_t, namelen, 0);
114 XFS_CHECK_OFFSET(xfs_dir2_sf_entry_t, offset, 1);
115 XFS_CHECK_OFFSET(xfs_dir2_sf_entry_t, name, 3);
116 XFS_CHECK_STRUCT_SIZE(xfs_dir2_sf_hdr_t, 10);
117
118 /* log structures */
119 XFS_CHECK_STRUCT_SIZE(struct xfs_buf_log_format, 88);
120 XFS_CHECK_STRUCT_SIZE(struct xfs_dq_logformat, 24);
121 XFS_CHECK_STRUCT_SIZE(struct xfs_efd_log_format_32, 16);
122 XFS_CHECK_STRUCT_SIZE(struct xfs_efd_log_format_64, 16);
123 XFS_CHECK_STRUCT_SIZE(struct xfs_efi_log_format_32, 16);
124 XFS_CHECK_STRUCT_SIZE(struct xfs_efi_log_format_64, 16);
125 XFS_CHECK_STRUCT_SIZE(struct xfs_extent_32, 12);
126 XFS_CHECK_STRUCT_SIZE(struct xfs_extent_64, 16);
127 XFS_CHECK_STRUCT_SIZE(struct xfs_log_dinode, 176);
128 XFS_CHECK_STRUCT_SIZE(struct xfs_icreate_log, 28);
129 XFS_CHECK_STRUCT_SIZE(xfs_log_timestamp_t, 8);
130 XFS_CHECK_STRUCT_SIZE(struct xfs_log_legacy_timestamp, 8);
131 XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format_32, 52);
132 XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format, 56);
133 XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat, 20);
134 XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header, 16);
135 XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format, 40);
136 XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format, 16);
137 XFS_CHECK_STRUCT_SIZE(struct xfs_bui_log_format, 16);
138 XFS_CHECK_STRUCT_SIZE(struct xfs_bud_log_format, 16);
139 XFS_CHECK_STRUCT_SIZE(struct xfs_cui_log_format, 16);
140 XFS_CHECK_STRUCT_SIZE(struct xfs_cud_log_format, 16);
141 XFS_CHECK_STRUCT_SIZE(struct xfs_rui_log_format, 16);
142 XFS_CHECK_STRUCT_SIZE(struct xfs_rud_log_format, 16);
143 XFS_CHECK_STRUCT_SIZE(struct xfs_map_extent, 32);
144 XFS_CHECK_STRUCT_SIZE(struct xfs_phys_extent, 16);
145
146 XFS_CHECK_OFFSET(struct xfs_bui_log_format, bui_extents, 16);
147 XFS_CHECK_OFFSET(struct xfs_cui_log_format, cui_extents, 16);
148 XFS_CHECK_OFFSET(struct xfs_rui_log_format, rui_extents, 16);
149 XFS_CHECK_OFFSET(struct xfs_efi_log_format, efi_extents, 16);
150 XFS_CHECK_OFFSET(struct xfs_efi_log_format_32, efi_extents, 16);
151 XFS_CHECK_OFFSET(struct xfs_efi_log_format_64, efi_extents, 16);
152
153 /*
154 * The v5 superblock format extended several v4 header structures with
155 * additional data. While new fields are only accessible on v5
156 * superblocks, it's important that the v5 structures place original v4
157 * fields/headers in the correct location on-disk. For example, we must
158 * be able to find magic values at the same location in certain blocks
159 * regardless of superblock version.
160 *
161 * The following checks ensure that various v5 data structures place the
162 * subset of v4 metadata associated with the same type of block at the
163 * start of the on-disk block. If there is no data structure definition
164 * for certain types of v4 blocks, traverse down to the first field of
165 * common metadata (e.g., magic value) and make sure it is at offset
166 * zero.
167 */
168 XFS_CHECK_OFFSET(struct xfs_dir3_leaf, hdr.info.hdr, 0);
169 XFS_CHECK_OFFSET(struct xfs_da3_intnode, hdr.info.hdr, 0);
170 XFS_CHECK_OFFSET(struct xfs_dir3_data_hdr, hdr.magic, 0);
171 XFS_CHECK_OFFSET(struct xfs_dir3_free, hdr.hdr.magic, 0);
172 XFS_CHECK_OFFSET(struct xfs_attr3_leafblock, hdr.info.hdr, 0);
173
174 XFS_CHECK_STRUCT_SIZE(struct xfs_bulkstat, 192);
175 XFS_CHECK_STRUCT_SIZE(struct xfs_inumbers, 24);
176 XFS_CHECK_STRUCT_SIZE(struct xfs_bulkstat_req, 64);
177 XFS_CHECK_STRUCT_SIZE(struct xfs_inumbers_req, 64);
178
179 /*
180 * Make sure the incore inode timestamp range corresponds to hand
181 * converted values based on the ondisk format specification.
182 */
183 XFS_CHECK_VALUE(XFS_BIGTIME_TIME_MIN - XFS_BIGTIME_EPOCH_OFFSET,
184 XFS_LEGACY_TIME_MIN);
185 XFS_CHECK_VALUE(XFS_BIGTIME_TIME_MAX - XFS_BIGTIME_EPOCH_OFFSET,
186 16299260424LL);
187
188 /* Do the same with the incore quota expiration range. */
189 XFS_CHECK_VALUE(XFS_DQ_BIGTIME_EXPIRY_MIN << XFS_DQ_BIGTIME_SHIFT, 4);
190 XFS_CHECK_VALUE(XFS_DQ_BIGTIME_EXPIRY_MAX << XFS_DQ_BIGTIME_SHIFT,
191 16299260424LL);
192
193 /* fs-verity descriptor xattr name */
> 194 XFS_CHECK_VALUE(strlen(XFS_VERITY_DESCRIPTOR_NAME),
195 XFS_VERITY_DESCRIPTOR_NAME_LEN);
196 }
197
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
Hi Andrey,
On Tue, Apr 04, 2023 at 04:53:12PM +0200, Andrey Albershteyn wrote:
> Add flag to mark inodes which have fs-verity enabled on them (i.e.
> descriptor exist and tree is built).
>
> Signed-off-by: Andrey Albershteyn <[email protected]>
> ---
> fs/ioctl.c | 4 ++++
> fs/xfs/libxfs/xfs_format.h | 4 +++-
> fs/xfs/xfs_inode.c | 2 ++
> fs/xfs/xfs_iops.c | 2 ++
> include/uapi/linux/fs.h | 1 +
> 5 files changed, 12 insertions(+), 1 deletion(-)
[...]
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index b7b56871029c..5172a2eb902c 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -140,6 +140,7 @@ struct fsxattr {
> #define FS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
> #define FS_XFLAG_DAX 0x00008000 /* use DAX for IO */
> #define FS_XFLAG_COWEXTSIZE 0x00010000 /* CoW extent size allocator hint */
> +#define FS_XFLAG_VERITY 0x00020000 /* fs-verity sealed inode */
> #define FS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
>
I don't think "xfs: add inode on-disk VERITY flag" is an accurate description of
a patch that involves adding something to the UAPI.
Should the other filesystems support this new flag too?
I'd also like all ways of getting the verity flag to continue to be mentioned in
Documentation/filesystems/fsverity.rst. The existing methods (FS_IOC_GETFLAGS
and statx) are already mentioned there.
- Eric
Hi Andrey,
On Tue, Apr 04, 2023 at 04:53:17PM +0200, Andrey Albershteyn wrote:
> When the Merkle tree block size differs from the page size, fs-verity
> expects ->read_merkle_tree_page() to return a page filled with Merkle
> tree blocks. XFS stores each Merkle tree block in a separate extended
> attribute. Those attributes are addressed by their block offset into
> the Merkle tree.
>
> This patch makes ->read_merkle_tree_page() fetch multiple Merkle tree
> blocks based on the size ratio. The reference to each xfs_buf is also
> passed via page->private to ->drop_page().
>
> Signed-off-by: Andrey Albershteyn <[email protected]>
> ---
> fs/xfs/xfs_verity.c | 74 +++++++++++++++++++++++++++++++++++----------
> fs/xfs/xfs_verity.h | 8 +++++
> 2 files changed, 66 insertions(+), 16 deletions(-)
>
> diff --git a/fs/xfs/xfs_verity.c b/fs/xfs/xfs_verity.c
> index a9874ff4efcd..ef0aff216f06 100644
> --- a/fs/xfs/xfs_verity.c
> +++ b/fs/xfs/xfs_verity.c
> @@ -134,6 +134,10 @@ xfs_read_merkle_tree_page(
> struct page *page = NULL;
> __be64 name = cpu_to_be64(index << PAGE_SHIFT);
> uint32_t bs = 1 << log_blocksize;
> + int blocks_per_page =
> + (1 << (PAGE_SHIFT - log_blocksize));
> + int n = 0;
> + int offset = 0;
> struct xfs_da_args args = {
> .dp = ip,
> .attr_filter = XFS_ATTR_VERITY,
> @@ -143,26 +147,59 @@ xfs_read_merkle_tree_page(
> .valuelen = bs,
> };
> int error = 0;
> + bool is_checked = true;
> + struct xfs_verity_buf_list *buf_list;
>
> page = alloc_page(GFP_KERNEL);
> if (!page)
> return ERR_PTR(-ENOMEM);
>
> - error = xfs_attr_get(&args);
> - if (error) {
> - kmem_free(args.value);
> - xfs_buf_rele(args.bp);
> + buf_list = kzalloc(sizeof(struct xfs_verity_buf_list), GFP_KERNEL);
> + if (!buf_list) {
> put_page(page);
> - return ERR_PTR(-EFAULT);
> + return ERR_PTR(-ENOMEM);
> }
>
> - if (args.bp->b_flags & XBF_VERITY_CHECKED)
> + /*
> + * Fill the page with Merkle tree blocks. The blocks_per_page is higher
> + * than 1 when fs block size != PAGE_SIZE or Merkle tree block size !=
> + * PAGE_SIZE.
> + */
> + for (n = 0; n < blocks_per_page; n++) {
> + offset = bs * n;
> + name = cpu_to_be64(((index << PAGE_SHIFT) + offset));
> + args.name = (const uint8_t *)&name;
> +
> + error = xfs_attr_get(&args);
> + if (error) {
> + kmem_free(args.value);
> + /*
> + * No more Merkle tree blocks (e.g. this was the last
> + * block of the tree)
> + */
> + if (error == -ENOATTR)
> + break;
> + xfs_buf_rele(args.bp);
> + put_page(page);
> + kmem_free(buf_list);
> + return ERR_PTR(-EFAULT);
> + }
> +
> + buf_list->bufs[buf_list->buf_count++] = args.bp;
> +
> + /* One of the buffers was dropped */
> + if (!(args.bp->b_flags & XBF_VERITY_CHECKED))
> + is_checked = false;
> +
> + memcpy(page_address(page) + offset, args.value, args.valuelen);
> + kmem_free(args.value);
> + args.value = NULL;
> + }
I was really hoping for a solution where the cached data can be used directly,
instead of allocating a temporary page and copying the cached data into it every
time the cache is accessed. The problem with what you have now is that every
time a single 32-byte hash is accessed, a full page (potentially 64KB!) will be
allocated and filled. That's not very efficient. The need to allocate the
temporary page can also cause ENOMEM (which will get reported as EIO).
Did you consider alternatives that would work more efficiently? I think it
would be worth designing something that works properly with how XFS is planned
to cache the Merkle tree, instead of designing a workaround.
->read_merkle_tree_page was not really designed for what you are doing here.
How about replacing ->read_merkle_tree_page with a function that takes in a
Merkle tree block index (not a page index!) and hands back a (page, offset) pair
that identifies where the Merkle tree block's data is located? Or (folio,
offset), I suppose.
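For example, something roughly like this -- a hypothetical signature just
to illustrate the idea, not a finalized API:

	/*
	 * Hand back the (folio, offset) where Merkle tree block @index
	 * lives; @index is a Merkle tree block index, not a page index.
	 */
	int (*read_merkle_tree_block)(struct inode *inode, u64 index,
				      struct folio **folio,
				      unsigned int *offset);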
With that, would it be possible to directly return the XFS cache?
- Eric
On Tue, Apr 04, 2023 at 04:52:56PM +0200, Andrey Albershteyn wrote:
> The patchset is tested with xfstests -g auto on xfs_1k, xfs_4k,
> xfs_1k_quota, and xfs_4k_quota. Haven't found any major failures.
Just to double check, did you verify that the tests in the "verity" group are
running, and were not skipped?
- Eric
On Tue, Apr 04, 2023 at 04:53:02PM +0200, Andrey Albershteyn wrote:
> Allow filesystem to make additional processing on verified pages
> instead of just dropping a reference. This will be used by XFS for
> internal buffer cache manipulation in further patches. The btrfs,
> ext4, and f2fs just drop the reference.
>
> Signed-off-by: Andrey Albershteyn <[email protected]>
> ---
> fs/btrfs/verity.c | 12 ++++++++++++
> fs/ext4/verity.c | 6 ++++++
> fs/f2fs/verity.c | 6 ++++++
> fs/verity/read_metadata.c | 4 ++--
> fs/verity/verify.c | 6 +++---
> include/linux/fsverity.h | 10 ++++++++++
> 6 files changed, 39 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/verity.c b/fs/btrfs/verity.c
> index c5ff16f9e9fa..4c2c09204bb4 100644
> --- a/fs/btrfs/verity.c
> +++ b/fs/btrfs/verity.c
> @@ -804,10 +804,22 @@ static int btrfs_write_merkle_tree_block(struct inode *inode, const void *buf,
> pos, buf, size);
> }
>
> +/*
> + * fsverity op that releases the reference obtained by ->read_merkle_tree_page()
> + *
> + * @page: reference to the page which can be released
> + *
> + */
> +static void btrfs_drop_page(struct page *page)
> +{
> + put_page(page);
> +}
> +
> const struct fsverity_operations btrfs_verityops = {
> .begin_enable_verity = btrfs_begin_enable_verity,
> .end_enable_verity = btrfs_end_enable_verity,
> .get_verity_descriptor = btrfs_get_verity_descriptor,
> .read_merkle_tree_page = btrfs_read_merkle_tree_page,
> .write_merkle_tree_block = btrfs_write_merkle_tree_block,
> + .drop_page = &btrfs_drop_page,
> };
Ok, that's a generic put_page() call.
....
> diff --git a/fs/verity/verify.c b/fs/verity/verify.c
> index f50e3b5b52c9..c2fc4c86af34 100644
> --- a/fs/verity/verify.c
> +++ b/fs/verity/verify.c
> @@ -210,7 +210,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
> if (is_hash_block_verified(vi, hpage, hblock_idx)) {
> memcpy_from_page(_want_hash, hpage, hoffset, hsize);
> want_hash = _want_hash;
> - put_page(hpage);
> + inode->i_sb->s_vop->drop_page(hpage);
> goto descend;
fsverity_drop_page(inode, hpage);
static inline void
fsverity_drop_page(struct inode *inode, struct page *page)
{
if (inode->i_sb->s_vop->drop_page)
inode->i_sb->s_vop->drop_page(page);
else
put_page(page);
}
And then you don't need to add the functions to each of the
filesystems nor make an indirect call just to run put_page().
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Tue, Apr 04, 2023 at 03:41:23PM -0700, Eric Biggers wrote:
> Hi Andrey,
>
> On Tue, Apr 04, 2023 at 04:53:12PM +0200, Andrey Albershteyn wrote:
> > Add flag to mark inodes which have fs-verity enabled on them (i.e.
> > descriptor exist and tree is built).
> >
> > Signed-off-by: Andrey Albershteyn <[email protected]>
> > ---
> > fs/ioctl.c | 4 ++++
> > fs/xfs/libxfs/xfs_format.h | 4 +++-
> > fs/xfs/xfs_inode.c | 2 ++
> > fs/xfs/xfs_iops.c | 2 ++
> > include/uapi/linux/fs.h | 1 +
> > 5 files changed, 12 insertions(+), 1 deletion(-)
> [...]
> > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> > index b7b56871029c..5172a2eb902c 100644
> > --- a/include/uapi/linux/fs.h
> > +++ b/include/uapi/linux/fs.h
> > @@ -140,6 +140,7 @@ struct fsxattr {
> > #define FS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
> > #define FS_XFLAG_DAX 0x00008000 /* use DAX for IO */
> > #define FS_XFLAG_COWEXTSIZE 0x00010000 /* CoW extent size allocator hint */
> > +#define FS_XFLAG_VERITY 0x00020000 /* fs-verity sealed inode */
> > #define FS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
> >
>
> I don't think "xfs: add inode on-disk VERITY flag" is an accurate description of
> a patch that involves adding something to the UAPI.
Well it does that, but it also adds the UAPI for querying the
on-disk flag via the FS_IOC_FSGETXATTR interface as well. It
probably should be split up into two patches.
> Should the other filesystems support this new flag too?
I think they should get it automatically now that it has been
defined for FS_IOC_FSGETXATTR and added to the generic fileattr flag
fill functions in fs/ioctl.c.
> I'd also like all ways of getting the verity flag to continue to be mentioned in
> Documentation/filesystems/fsverity.rst. The existing methods (FS_IOC_GETFLAGS
> and statx) are already mentioned there.
*nod*
-Dave.
--
Dave Chinner
[email protected]
Hi Christoph,
On Tue, Apr 04, 2023 at 08:30:36AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 04, 2023 at 04:53:01PM +0200, Andrey Albershteyn wrote:
> > Not the whole folio always need to be verified by fs-verity (e.g.
> > with 1k blocks). Use passed folio's offset and size.
>
> Why can't those callers just call fsverity_verify_blocks directly?
>
They can. Calling _verify_folio with an explicit offset and size appeared
clearer to me. But I'm ok with dropping this patch and keeping the
full-folio verify function.
--
- Andrey
Hi Dave,
On Wed, Apr 05, 2023 at 09:40:19AM +1000, Dave Chinner wrote:
> On Tue, Apr 04, 2023 at 04:53:02PM +0200, Andrey Albershteyn wrote:
> > Allow filesystem to make additional processing on verified pages
> > instead of just dropping a reference. This will be used by XFS for
> > internal buffer cache manipulation in further patches. The btrfs,
> > ext4, and f2fs just drop the reference.
> >
> > Signed-off-by: Andrey Albershteyn <[email protected]>
> > ---
> > fs/btrfs/verity.c | 12 ++++++++++++
> > fs/ext4/verity.c | 6 ++++++
> > fs/f2fs/verity.c | 6 ++++++
> > fs/verity/read_metadata.c | 4 ++--
> > fs/verity/verify.c | 6 +++---
> > include/linux/fsverity.h | 10 ++++++++++
> > 6 files changed, 39 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/btrfs/verity.c b/fs/btrfs/verity.c
> > index c5ff16f9e9fa..4c2c09204bb4 100644
> > --- a/fs/btrfs/verity.c
> > +++ b/fs/btrfs/verity.c
> > @@ -804,10 +804,22 @@ static int btrfs_write_merkle_tree_block(struct inode *inode, const void *buf,
> > pos, buf, size);
> > }
> >
> > +/*
> > + * fsverity op that releases the reference obtained by ->read_merkle_tree_page()
> > + *
> > + * @page: reference to the page which can be released
> > + *
> > + */
> > +static void btrfs_drop_page(struct page *page)
> > +{
> > + put_page(page);
> > +}
> > +
> > const struct fsverity_operations btrfs_verityops = {
> > .begin_enable_verity = btrfs_begin_enable_verity,
> > .end_enable_verity = btrfs_end_enable_verity,
> > .get_verity_descriptor = btrfs_get_verity_descriptor,
> > .read_merkle_tree_page = btrfs_read_merkle_tree_page,
> > .write_merkle_tree_block = btrfs_write_merkle_tree_block,
> > + .drop_page = &btrfs_drop_page,
> > };
>
> Ok, that's a generic put_page() call.
>
> ....
> > diff --git a/fs/verity/verify.c b/fs/verity/verify.c
> > index f50e3b5b52c9..c2fc4c86af34 100644
> > --- a/fs/verity/verify.c
> > +++ b/fs/verity/verify.c
> > @@ -210,7 +210,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
> > if (is_hash_block_verified(vi, hpage, hblock_idx)) {
> > memcpy_from_page(_want_hash, hpage, hoffset, hsize);
> > want_hash = _want_hash;
> > - put_page(hpage);
> > + inode->i_sb->s_vop->drop_page(hpage);
> > goto descend;
>
> fsverity_drop_page(inode, hpage);
>
> static inline void
> fsverity_drop_page(struct inode *inode, struct page *page)
> {
> if (inode->i_sb->s_vop->drop_page)
> inode->i_sb->s_vop->drop_page(page);
> else
> put_page(page);
> }
>
> And then you don't need to add the functions to each of the
> filesystems nor make an indirect call just to run put_page().
Sure, this makes more sense, thank you!
--
- Andrey
Hi Christoph,
On Tue, Apr 04, 2023 at 08:37:02AM -0700, Christoph Hellwig wrote:
> > if (iomap_block_needs_zeroing(iter, pos)) {
> > folio_zero_range(folio, poff, plen);
> > + if (iomap->flags & IOMAP_F_READ_VERITY) {
>
> Wju do we need the new flag vs just testing that folio_ops and
> folio_ops->verify_folio is non-NULL?
Yes, it can be just a test; I hadn't noticed that it's used only here.
Initially I used it in several places.
>
> > - ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> > - REQ_OP_READ, gfp);
> > + ctx->bio = bio_alloc_bioset(iomap->bdev, bio_max_segs(nr_vecs),
> > + REQ_OP_READ, GFP_NOFS, &iomap_read_ioend_bioset);
>
> All other callers don't really need the larger bioset, so I'd avoid
> the unconditional allocation here, but more on that later.
Ok, makes sense.
>
> > + ioend = container_of(ctx->bio, struct iomap_read_ioend,
> > + read_inline_bio);
> > + ioend->io_inode = iter->inode;
> > + if (ctx->ops && ctx->ops->prepare_ioend)
> > + ctx->ops->prepare_ioend(ioend);
> > +
>
> So what we're doing in writeback and direct I/O, is to:
>
> a) have a submit_bio hook
> b) allow the file system to then hook the bi_end_io caller
> c) (only in direct O/O for now) allow the file system to provide
> a bio_set to allocate from
I see.
>
> I wonder if that also makes sense and keep all the deferral in the
> file system. We'll need that for the btrfs iomap conversion anyway,
> and it seems more flexible. The ioend processing would then move into
> XFS.
>
Not sure what you mean here.
> > @@ -156,6 +160,11 @@ struct iomap_folio_ops {
> > * locked by the iomap code.
> > */
> > bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
> > +
> > + /*
> > + * Verify folio when successfully read
> > + */
> > + bool (*verify_folio)(struct folio *folio, loff_t pos, unsigned int len);
>
> Why isn't this in iomap_readpage_ops?
>
Yes, it can be. But it appears to me to be more relevant to
_folio_ops; is there any particular reason to move it there? I don't
mind moving it to iomap_readpage_ops.
--
- Andrey
Hi Eric and Dave,
On Wed, Apr 05, 2023 at 09:56:33AM +1000, Dave Chinner wrote:
> On Tue, Apr 04, 2023 at 03:41:23PM -0700, Eric Biggers wrote:
> > Hi Andrey,
> >
> > On Tue, Apr 04, 2023 at 04:53:12PM +0200, Andrey Albershteyn wrote:
> > > Add flag to mark inodes which have fs-verity enabled on them (i.e.
> > > descriptor exist and tree is built).
> > >
> > > Signed-off-by: Andrey Albershteyn <[email protected]>
> > > ---
> > > fs/ioctl.c | 4 ++++
> > > fs/xfs/libxfs/xfs_format.h | 4 +++-
> > > fs/xfs/xfs_inode.c | 2 ++
> > > fs/xfs/xfs_iops.c | 2 ++
> > > include/uapi/linux/fs.h | 1 +
> > > 5 files changed, 12 insertions(+), 1 deletion(-)
> > [...]
> > > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> > > index b7b56871029c..5172a2eb902c 100644
> > > --- a/include/uapi/linux/fs.h
> > > +++ b/include/uapi/linux/fs.h
> > > @@ -140,6 +140,7 @@ struct fsxattr {
> > > #define FS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
> > > #define FS_XFLAG_DAX 0x00008000 /* use DAX for IO */
> > > #define FS_XFLAG_COWEXTSIZE 0x00010000 /* CoW extent size allocator hint */
> > > +#define FS_XFLAG_VERITY 0x00020000 /* fs-verity sealed inode */
> > > #define FS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
> > >
> >
> > I don't think "xfs: add inode on-disk VERITY flag" is an accurate description of
> > a patch that involves adding something to the UAPI.
>
> Well it does that, but it also adds the UAPI for querying the
> on-disk flag via the FS_IOC_FSGETXATTR interface as well. It
> probably should be split up into two patches.
Sure.
>
> > Should the other filesystems support this new flag too?
>
> I think they should get it automatically now that it has been
> defined for FS_IOC_FSGETXATTR and added to the generic fileattr flag
> fill functions in fs/ioctl.c.
>
> > I'd also like all ways of getting the verity flag to continue to be mentioned in
> > Documentation/filesystems/fsverity.rst. The existing methods (FS_IOC_GETFLAGS
> > and statx) are already mentioned there.
>
> *nod*
>
Ok, sure, missed that. Will split this patch and add description.
--
- Andrey
On Tue, Apr 04, 2023 at 09:10:47AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 04, 2023 at 04:53:15PM +0200, Andrey Albershteyn wrote:
> > The direct path is not supported on verity files. Attempts to use direct
> > I/O path on such files should fall back to buffered I/O path.
> >
> > Signed-off-by: Andrey Albershteyn <[email protected]>
> > ---
> > fs/xfs/xfs_file.c | 14 +++++++++++---
> > 1 file changed, 11 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index 947b5c436172..9e072e82f6c1 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -244,7 +244,8 @@ xfs_file_dax_read(
> > struct kiocb *iocb,
> > struct iov_iter *to)
> > {
> > - struct xfs_inode *ip = XFS_I(iocb->ki_filp->f_mapping->host);
> > + struct inode *inode = iocb->ki_filp->f_mapping->host;
> > + struct xfs_inode *ip = XFS_I(inode);
> > ssize_t ret = 0;
> >
> > trace_xfs_file_dax_read(iocb, to);
> > @@ -297,10 +298,17 @@ xfs_file_read_iter(
> >
> > if (IS_DAX(inode))
> > ret = xfs_file_dax_read(iocb, to);
> > - else if (iocb->ki_flags & IOCB_DIRECT)
> > + else if (iocb->ki_flags & IOCB_DIRECT && !fsverity_active(inode))
> > ret = xfs_file_dio_read(iocb, to);
> > - else
> > + else {
> > + /*
> > > + * In case fs-verity is enabled, we also fall back to the
> > > + * buffered read from the direct read path. Therefore,
> > > + * IOCB_DIRECT is set and needs to be cleared
> > + */
> > + iocb->ki_flags &= ~IOCB_DIRECT;
> > ret = xfs_file_buffered_read(iocb, to);
>
> XFS doesn't usually allow directio fallback to the pagecache. Why
> would fsverity be any different?
I didn't know that; this is what happens on ext4, so I did the same.
Then it probably makes sense to just return an error for DIRECT reads
on a verity-sealed file.
>
> --D
>
> > + }
> >
> > if (ret > 0)
> > XFS_STATS_ADD(mp, xs_read_bytes, ret);
> > --
> > 2.38.4
> >
>
--
- Andrey
On Wed, Apr 05, 2023 at 01:01:16PM +0200, Andrey Albershteyn wrote:
> Hi Christoph,
>
> On Tue, Apr 04, 2023 at 08:37:02AM -0700, Christoph Hellwig wrote:
> > > if (iomap_block_needs_zeroing(iter, pos)) {
> > > folio_zero_range(folio, poff, plen);
> > > + if (iomap->flags & IOMAP_F_READ_VERITY) {
> >
> > Wju do we need the new flag vs just testing that folio_ops and
> > folio_ops->verify_folio is non-NULL?
>
> Yes, it can be just a test; I hadn't noticed that it's used only here.
> Initially I used it in several places.
>
> >
> > > - ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> > > - REQ_OP_READ, gfp);
> > > + ctx->bio = bio_alloc_bioset(iomap->bdev, bio_max_segs(nr_vecs),
> > > + REQ_OP_READ, GFP_NOFS, &iomap_read_ioend_bioset);
> >
> > All other callers don't really need the larger bioset, so I'd avoid
> > the unconditional allocation here, but more on that later.
>
> Ok, makes sense.
>
> >
> > > + ioend = container_of(ctx->bio, struct iomap_read_ioend,
> > > + read_inline_bio);
> > > + ioend->io_inode = iter->inode;
> > > + if (ctx->ops && ctx->ops->prepare_ioend)
> > > + ctx->ops->prepare_ioend(ioend);
> > > +
> >
> > So what we're doing in writeback and direct I/O, is to:
> >
> > a) have a submit_bio hook
> > b) allow the file system to then hook the bi_end_io caller
> > c) (only in direct O/O for now) allow the file system to provide
> > a bio_set to allocate from
>
> I see.
>
> >
> > I wonder if that also makes sense and keep all the deferral in the
> > file system. We'll need that for the btrfs iomap conversion anyway,
> > and it seems more flexible. The ioend processing would then move into
> > XFS.
> >
>
> Not sure what you mean here.
I /think/ Christoph is talking about allowing callers of iomap pagecache
operations to supply a custom submit_bio function and a bio_set so that
filesystems can add in their own post-IO processing and appropriately
sized (read: minimum you can get away with) bios. I imagine btrfs has
quite a lot of (read) ioend processing they need to do, as will xfs now
that you're adding fsverity.
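Roughly speaking, the read path would grow hooks similar to what
iomap_dio_ops already provides -- for example (purely illustrative, not
actual iomap code):

	struct iomap_readpage_ops {
		/*
		 * Filesystem-provided bio_set so a per-bio read ioend can
		 * be embedded in each read bio.
		 */
		struct bio_set	*bio_set;

		/*
		 * Optional hook letting the filesystem override bi_end_io
		 * (e.g. to defer verification to a workqueue) before the
		 * read bio is submitted.
		 */
		void		(*submit_io)(const struct iomap_iter *iter,
					     struct bio *bio);
	};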
> > > @@ -156,6 +160,11 @@ struct iomap_folio_ops {
> > > * locked by the iomap code.
> > > */
> > > bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
> > > +
> > > + /*
> > > + * Verify folio when successfully read
> > > + */
> > > + bool (*verify_folio)(struct folio *folio, loff_t pos, unsigned int len);
Any reason why we shouldn't return the usual negative errno?
> > Why isn't this in iomap_readpage_ops?
> >
>
> Yes, it can be. But it appears to me to be more relevant to
> _folio_ops; is there any particular reason to move it there? I don't
> mind moving it to iomap_readpage_ops.
I think the point is that this is a general "check what we just read"
hook, so it could be in readpage_ops since we're never going to need to
re-validate verity contents, right? Hence it could be in readpage_ops
instead of the general iomap_folio_ops.
<shrug> Is there a use case for ->verify_folio that isn't a read post-
processing step?
--D
> --
> - Andrey
>
On Wed, Apr 05, 2023 at 05:01:42PM +0200, Andrey Albershteyn wrote:
> On Tue, Apr 04, 2023 at 09:10:47AM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 04, 2023 at 04:53:15PM +0200, Andrey Albershteyn wrote:
> > > The direct path is not supported on verity files. Attempts to use direct
> > > I/O path on such files should fall back to buffered I/O path.
> > >
> > > Signed-off-by: Andrey Albershteyn <[email protected]>
> > > ---
> > > fs/xfs/xfs_file.c | 14 +++++++++++---
> > > 1 file changed, 11 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > > index 947b5c436172..9e072e82f6c1 100644
> > > --- a/fs/xfs/xfs_file.c
> > > +++ b/fs/xfs/xfs_file.c
> > > @@ -244,7 +244,8 @@ xfs_file_dax_read(
> > > struct kiocb *iocb,
> > > struct iov_iter *to)
> > > {
> > > - struct xfs_inode *ip = XFS_I(iocb->ki_filp->f_mapping->host);
> > > + struct inode *inode = iocb->ki_filp->f_mapping->host;
> > > + struct xfs_inode *ip = XFS_I(inode);
> > > ssize_t ret = 0;
> > >
> > > trace_xfs_file_dax_read(iocb, to);
> > > @@ -297,10 +298,17 @@ xfs_file_read_iter(
> > >
> > > if (IS_DAX(inode))
> > > ret = xfs_file_dax_read(iocb, to);
> > > - else if (iocb->ki_flags & IOCB_DIRECT)
> > > + else if (iocb->ki_flags & IOCB_DIRECT && !fsverity_active(inode))
> > > ret = xfs_file_dio_read(iocb, to);
> > > - else
> > > + else {
> > > + /*
> > > + * In case fs-verity is enabled, we also fallback to the
> > > + * buffered read from the direct read path. Therefore,
> > > + * IOCB_DIRECT is set and need to be cleared
> > > + */
> > > + iocb->ki_flags &= ~IOCB_DIRECT;
> > > ret = xfs_file_buffered_read(iocb, to);
> >
> > XFS doesn't usually allow directio fallback to the pagecache. Why
> > would fsverity be any different?
>
> I didn't know that; this is what happens on ext4, so I did the same.
> Then it probably makes sense to just return an error for DIRECT reads
> on a verity-sealed file.
Thinking about this a little more -- I suppose we shouldn't just go
breaking directio reads from a verity file if we can help it. Is there
a way to ask fsverity to perform its validation against some arbitrary
memory buffer that happens to be fs-block aligned? In which case we
could support fsblock-aligned directio reads without falling back to the
page cache?
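Something along these lines is what I mean -- entirely hypothetical,
fsverity has no such helper today:

	/*
	 * Verify @len bytes of data at @buf against the Merkle tree, where
	 * @buf is fs-block aligned and corresponds to file offset @pos.
	 */
	bool fsverity_verify_buffer(struct inode *inode, const void *buf,
				    size_t len, loff_t pos);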
--D
> >
> > --D
> >
> > > + }
> > >
> > > if (ret > 0)
> > > XFS_STATS_ADD(mp, xs_read_bytes, ret);
> > > --
> > > 2.38.4
> > >
> >
>
> --
> - Andrey
>
Hi Eric,
On Tue, Apr 04, 2023 at 04:32:24PM -0700, Eric Biggers wrote:
> Hi Andrey,
>
> On Tue, Apr 04, 2023 at 04:53:17PM +0200, Andrey Albershteyn wrote:
> > When the Merkle tree block size differs from the page size, fs-verity
> > expects ->read_merkle_tree_page() to return a page filled with Merkle
> > tree blocks. XFS stores each Merkle tree block in a separate extended
> > attribute. Those attributes are addressed by their block offset into
> > the Merkle tree.
> >
> > This patch makes ->read_merkle_tree_page() fetch multiple Merkle tree
> > blocks based on the size ratio. The reference to each xfs_buf is also
> > passed via page->private to ->drop_page().
> >
> > Signed-off-by: Andrey Albershteyn <[email protected]>
> > ---
> > fs/xfs/xfs_verity.c | 74 +++++++++++++++++++++++++++++++++++----------
> > fs/xfs/xfs_verity.h | 8 +++++
> > 2 files changed, 66 insertions(+), 16 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_verity.c b/fs/xfs/xfs_verity.c
> > index a9874ff4efcd..ef0aff216f06 100644
> > --- a/fs/xfs/xfs_verity.c
> > +++ b/fs/xfs/xfs_verity.c
> > @@ -134,6 +134,10 @@ xfs_read_merkle_tree_page(
> > struct page *page = NULL;
> > __be64 name = cpu_to_be64(index << PAGE_SHIFT);
> > uint32_t bs = 1 << log_blocksize;
> > + int blocks_per_page =
> > + (1 << (PAGE_SHIFT - log_blocksize));
> > + int n = 0;
> > + int offset = 0;
> > struct xfs_da_args args = {
> > .dp = ip,
> > .attr_filter = XFS_ATTR_VERITY,
> > @@ -143,26 +147,59 @@ xfs_read_merkle_tree_page(
> > .valuelen = bs,
> > };
> > int error = 0;
> > + bool is_checked = true;
> > + struct xfs_verity_buf_list *buf_list;
> >
> > page = alloc_page(GFP_KERNEL);
> > if (!page)
> > return ERR_PTR(-ENOMEM);
> >
> > - error = xfs_attr_get(&args);
> > - if (error) {
> > - kmem_free(args.value);
> > - xfs_buf_rele(args.bp);
> > + buf_list = kzalloc(sizeof(struct xfs_verity_buf_list), GFP_KERNEL);
> > + if (!buf_list) {
> > put_page(page);
> > - return ERR_PTR(-EFAULT);
> > + return ERR_PTR(-ENOMEM);
> > }
> >
> > - if (args.bp->b_flags & XBF_VERITY_CHECKED)
> > + /*
> > + * Fill the page with Merkle tree blocks. The blocks_per_page is higher
> > + * than 1 when fs block size != PAGE_SIZE or Merkle tree block size !=
> > + * PAGE_SIZE.
> > + */
> > + for (n = 0; n < blocks_per_page; n++) {
> > + offset = bs * n;
> > + name = cpu_to_be64(((index << PAGE_SHIFT) + offset));
> > + args.name = (const uint8_t *)&name;
> > +
> > + error = xfs_attr_get(&args);
> > + if (error) {
> > + kmem_free(args.value);
> > + /*
> > + * No more Merkle tree blocks (e.g. this was the last
> > + * block of the tree)
> > + */
> > + if (error == -ENOATTR)
> > + break;
> > + xfs_buf_rele(args.bp);
> > + put_page(page);
> > + kmem_free(buf_list);
> > + return ERR_PTR(-EFAULT);
> > + }
> > +
> > + buf_list->bufs[buf_list->buf_count++] = args.bp;
> > +
> > + /* One of the buffers was dropped */
> > + if (!(args.bp->b_flags & XBF_VERITY_CHECKED))
> > + is_checked = false;
> > +
> > + memcpy(page_address(page) + offset, args.value, args.valuelen);
> > + kmem_free(args.value);
> > + args.value = NULL;
> > + }
>
> I was really hoping for a solution where the cached data can be used directly,
> instead of allocating a temporary page and copying the cached data into it every
> time the cache is accessed. The problem with what you have now is that every
> time a single 32-byte hash is accessed, a full page (potentially 64KB!) will be
> allocated and filled. That's not very efficient. The need to allocate the
> temporary page can also cause ENOMEM (which will get reported as EIO).
>
> Did you consider alternatives that would work more efficiently? I think it
> would be worth designing something that works properly with how XFS is planned
> to cache the Merkle tree, instead of designing a workaround.
> ->read_merkle_tree_page was not really designed for what you are doing here.
>
> How about replacing ->read_merkle_tree_page with a function that takes in a
> Merkle tree block index (not a page index!) and hands back a (page, offset) pair
> that identifies where the Merkle tree block's data is located? Or (folio,
> offset), I suppose.
>
> With that, would it be possible to directly return the XFS cache?
>
> - Eric
>
Yeah, I don't like it either; I didn't want to change fs-verity much,
so I went with this workaround. But as that's ok, I will look into
passing xfs buffers to fs-verity without direct use of
->read_merkle_tree_page(). I think it's possible with (folio, offset);
the xfs buffers aren't xattr-value aligned, so a 4k merkle tree block
is stored across two pages.
Thanks for the suggestion!
--
- Andrey
On 4/4/23 11:27 AM, Darrick J. Wong wrote:
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index d40de32362b1..b6e99ed3b187 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -30,6 +30,7 @@
> #include "xfs_filestream.h"
> #include "xfs_quota.h"
> #include "xfs_sysfs.h"
> +#include "xfs_verity.h"
> #include "xfs_ondisk.h"
> #include "xfs_rmap_item.h"
> #include "xfs_refcount_item.h"
> @@ -1489,6 +1490,9 @@ xfs_fs_fill_super(
> sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
> #endif
> sb->s_op = &xfs_super_operations;
> +#ifdef CONFIG_FS_VERITY
> + sb->s_vop = &xfs_verity_ops;
> +#endif
>
Hi Andrey - it might be nicer to just do:
fsverity_set_ops(sb, &xfs_verity_ops);
here and use the (existing) helper to avoid the #ifdef. (the #ifdef is handled by the helper)
(ext4 & btrfs could use this too ...)
Thanks!
-Eric
On Wed, Apr 05, 2023 at 12:36:42PM +0200, Andrey Albershteyn wrote:
> Hi Christoph,
>
> On Tue, Apr 04, 2023 at 08:30:36AM -0700, Christoph Hellwig wrote:
> > On Tue, Apr 04, 2023 at 04:53:01PM +0200, Andrey Albershteyn wrote:
> > > Not the whole folio always need to be verified by fs-verity (e.g.
> > > with 1k blocks). Use passed folio's offset and size.
> >
> > Why can't those callers just call fsverity_verify_blocks directly?
> >
>
> They can. Calling _verify_folio with an explicit offset and size
> appeared clearer to me. But I'm ok with dropping this patch and
> keeping the full-folio verify function.
Well, there is no point in a wrapper if it has the exact same signature
and functionality as the functionality being wrapped.
That being said, right now fsverity_verify_folio is effectively the same
as fsverity_verify_blocks, so it might make sense to either rename it, or
rename fsverity_verify_blocks to fsverity_verify_folio. But that's really
a question for Eric.
On Wed, Apr 05, 2023 at 08:06:27AM -0700, Darrick J. Wong wrote:
> > > I wonder if that also makes sense and keep all the deferral in the
> > > file system. We'll need that for the btrfs iomap conversion anyway,
> > > and it seems more flexible. The ioend processing would then move into
> > > XFS.
> > >
> >
> > Not sure what you mean here.
>
> I /think/ Christoph is talking about allowing callers of iomap pagecache
> operations to supply a custom submit_bio function and a bio_set so that
> filesystems can add in their own post-IO processing and appropriately
> sized (read: minimum you can get away with) bios. I imagine btrfs has
> quite a lot of (read) ioend processing they need to do, as will xfs now
> that you're adding fsverity.
Exactly.
> I think the point is that this is a general "check what we just read"
> hook, so it could be in readpage_ops since we're never going to need to
> re-validate verity contents, right? Hence it could be in readpage_ops
> instead of the general iomap_folio_ops.
>
> <shrug> Is there a use case for ->verify_folio that isn't a read post-
> processing step?
Yes. In fact I wonder if the verification might actually be done
in the per-bio end_io handler in the file system. But I'll need
to find some more time to read through the XFS parts of series to
come up with a more intelligent suggestion on that.
On Wed, Apr 05, 2023 at 08:09:27AM -0700, Darrick J. Wong wrote:
> Thinking about this a little more -- I suppose we shouldn't just go
> breaking directio reads from a verity file if we can help it. Is there
> a way to ask fsverity to perform its validation against some arbitrary
> memory buffer that happens to be fs-block aligned?
That would be my preference as well. But maybe Eric knows a good reason
why this hasn't been done yet.
Hi Darrick,
On Tue, Apr 04, 2023 at 09:36:02AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 04, 2023 at 04:53:17PM +0200, Andrey Albershteyn wrote:
> > In case of a different Merkle tree block size, fs-verity expects
> > ->read_merkle_tree_page() to return a Merkle tree page filled with
> > Merkle tree blocks. XFS stores each Merkle tree block under an
> > extended attribute. Those attributes are addressed by the block offset
> > into the Merkle tree.
> >
> > This patch makes ->read_merkle_tree_page() fetch multiple Merkle
> > tree blocks based on the size ratio. Also, the reference to each xfs_buf
> > is passed with page->private to ->drop_page().
> >
> > Signed-off-by: Andrey Albershteyn <[email protected]>
> > ---
> > fs/xfs/xfs_verity.c | 74 +++++++++++++++++++++++++++++++++++----------
> > fs/xfs/xfs_verity.h | 8 +++++
> > 2 files changed, 66 insertions(+), 16 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_verity.c b/fs/xfs/xfs_verity.c
> > index a9874ff4efcd..ef0aff216f06 100644
> > --- a/fs/xfs/xfs_verity.c
> > +++ b/fs/xfs/xfs_verity.c
> > @@ -134,6 +134,10 @@ xfs_read_merkle_tree_page(
> > struct page *page = NULL;
> > __be64 name = cpu_to_be64(index << PAGE_SHIFT);
> > uint32_t bs = 1 << log_blocksize;
> > + int blocks_per_page =
> > + (1 << (PAGE_SHIFT - log_blocksize));
> > + int n = 0;
> > + int offset = 0;
> > struct xfs_da_args args = {
> > .dp = ip,
> > .attr_filter = XFS_ATTR_VERITY,
> > @@ -143,26 +147,59 @@ xfs_read_merkle_tree_page(
> > .valuelen = bs,
> > };
> > int error = 0;
> > + bool is_checked = true;
> > + struct xfs_verity_buf_list *buf_list;
> >
> > page = alloc_page(GFP_KERNEL);
> > if (!page)
> > return ERR_PTR(-ENOMEM);
> >
> > - error = xfs_attr_get(&args);
> > - if (error) {
> > - kmem_free(args.value);
> > - xfs_buf_rele(args.bp);
> > + buf_list = kzalloc(sizeof(struct xfs_verity_buf_list), GFP_KERNEL);
> > + if (!buf_list) {
> > put_page(page);
> > - return ERR_PTR(-EFAULT);
> > + return ERR_PTR(-ENOMEM);
> > }
> >
> > - if (args.bp->b_flags & XBF_VERITY_CHECKED)
> > + /*
> > + * Fill the page with Merkle tree blocks. The blcoks_per_page is higher
> > + * than 1 when fs block size != PAGE_SIZE or Merkle tree block size !=
> > + * PAGE SIZE
> > + */
> > + for (n = 0; n < blocks_per_page; n++) {
>
> Ahah, ok, that's why we can't pass the xfs_buf pages up to fsverity.
>
> > + offset = bs * n;
> > + name = cpu_to_be64(((index << PAGE_SHIFT) + offset));
>
> Really this ought to be a typechecked helper...
>
> struct xfs_fsverity_merkle_key {
> __be64 merkleoff;
Sure, thanks, will change this
> };
>
> static inline void
> xfs_fsverity_merkle_key_to_disk(struct xfs_fsverity_merkle_key *k, loff_t pos)
> {
> k->merkleoff = cpu_to_be64(pos);
> }
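A usage sketch of that helper in the loop above (illustrative only; the
namelen setup is not shown):

		struct xfs_fsverity_merkle_key	name;

		/* big-endian key = byte offset of this block in the Merkle tree */
		xfs_fsverity_merkle_key_to_disk(&name, (index << PAGE_SHIFT) + offset);
		args.name = (const uint8_t *)&name;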
>
>
>
> > + args.name = (const uint8_t *)&name;
> > +
> > + error = xfs_attr_get(&args);
> > + if (error) {
> > + kmem_free(args.value);
> > + /*
> > + * No more Merkle tree blocks (e.g. this was the last
> > + * block of the tree)
> > + */
> > + if (error == -ENOATTR)
> > + break;
> > + xfs_buf_rele(args.bp);
> > + put_page(page);
> > + kmem_free(buf_list);
> > + return ERR_PTR(-EFAULT);
> > + }
> > +
> > + buf_list->bufs[buf_list->buf_count++] = args.bp;
> > +
> > + /* One of the buffers was dropped */
> > + if (!(args.bp->b_flags & XBF_VERITY_CHECKED))
> > + is_checked = false;
>
> If there's enough memory pressure to cause the merkle tree pages to get
> evicted, what are the chances that the xfs_bufs survive the eviction?
The merkle tree pages are dropped after verification. When a page is
dropped, its xfs_buf is marked as verified. If fs-verity wants to
verify it again, it will get the same, already verified buffer. If the
buffer is evicted, it loses the verified state.
So, with enough memory pressure, buffers will be dropped and will need
to be reverified.
>
> > + memcpy(page_address(page) + offset, args.value, args.valuelen);
> > + kmem_free(args.value);
> > + args.value = NULL;
> > + }
> > +
> > + if (is_checked)
> > SetPageChecked(page);
> > + page->private = (unsigned long)buf_list;
> >
> > - page->private = (unsigned long)args.bp;
> > - memcpy(page_address(page), args.value, args.valuelen);
> > -
> > - kmem_free(args.value);
> > return page;
> > }
> >
> > @@ -191,16 +228,21 @@ xfs_write_merkle_tree_block(
> >
> > static void
> > xfs_drop_page(
> > - struct page *page)
> > + struct page *page)
> > {
> > - struct xfs_buf *buf = (struct xfs_buf *)page->private;
> > + int i = 0;
> > + struct xfs_verity_buf_list *buf_list =
> > + (struct xfs_verity_buf_list *)page->private;
> >
> > - ASSERT(buf != NULL);
> > + ASSERT(buf_list != NULL);
> >
> > - if (PageChecked(page))
> > - buf->b_flags |= XBF_VERITY_CHECKED;
> > + for (i = 0; i < buf_list->buf_count; i++) {
> > + if (PageChecked(page))
> > + buf_list->bufs[i]->b_flags |= XBF_VERITY_CHECKED;
> > + xfs_buf_rele(buf_list->bufs[i]);
> > + }
> >
> > - xfs_buf_rele(buf);
> > + kmem_free(buf_list);
> > put_page(page);
> > }
> >
> > diff --git a/fs/xfs/xfs_verity.h b/fs/xfs/xfs_verity.h
> > index ae5d87ca32a8..433b2f4ae3bc 100644
> > --- a/fs/xfs/xfs_verity.h
> > +++ b/fs/xfs/xfs_verity.h
> > @@ -16,4 +16,12 @@ extern const struct fsverity_operations xfs_verity_ops;
> > #define xfs_verity_ops NULL
> > #endif /* CONFIG_FS_VERITY */
> >
> > +/* Minimal Merkle tree block size is 1024 */
> > +#define XFS_VERITY_MAX_MBLOCKS_PER_PAGE (1 << (PAGE_SHIFT - 10))
> > +
> > +struct xfs_verity_buf_list {
> > + unsigned int buf_count;
> > + struct xfs_buf *bufs[XFS_VERITY_MAX_MBLOCKS_PER_PAGE];
>
> So... this is going to be a 520-byte allocation on arm64 with 64k pages?
> Even if the merkle tree block size is the same as the page size? Ouch.
Yeah, it also allocates a page and is dropped with the page, so
I took it as an addition to an already big chunk of memory. But I
will probably change it, see the comment from Eric on this patch.
--
- Andrey
On Tue, Apr 04, 2023 at 04:37:13PM -0700, Eric Biggers wrote:
> On Tue, Apr 04, 2023 at 04:52:56PM +0200, Andrey Albershteyn wrote:
> > The patchset is tested with xfstests -g auto on xfs_1k, xfs_4k,
> > xfs_1k_quota, and xfs_4k_quota. Haven't found any major failures.
>
> Just to double check, did you verify that the tests in the "verity" group are
> running, and were not skipped?
Yes, the xfstests tree linked in the cover letter has a patch enabling
them on xfs (the linked xfsprogs is also needed).
>
> - Eric
>
--
- Andrey
Hi Darrick,
On Tue, Apr 04, 2023 at 09:39:42AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 04, 2023 at 04:52:56PM +0200, Andrey Albershteyn wrote:
> > Hi all,
> >
> > This is V2 of fs-verity support in XFS. In this series I did
> > numerous changes from V1 which are described below.
> >
> > This patchset introduces fs-verity [5] support for XFS. This
> > implementation utilizes extended attributes to store fs-verity
> > metadata. The Merkle tree blocks are stored in the remote extended
> > attributes.
> >
> > A few key points:
> > - fs-verity metadata is stored in extended attributes
> > - Direct path and DAX are disabled for inodes with fs-verity
> > - Pages are verified in iomap's read IO path (offloaded to
> > workqueue)
> > - New workqueue for verification processing
> > - New ro-compat flag
> > - Inodes with fs-verity have new on-disk diflag
> > - xfs_attr_get() can return buffer with the attribute
> >
> > The patchset is tested with xfstests -g auto on xfs_1k, xfs_4k,
> > xfs_1k_quota, and xfs_4k_quota. Haven't found any major failures.
> >
> > Patches [6/23] and [7/23] touch ext4, f2fs, btrfs, and patch [8/23]
> > touches erofs, gfs2, and zonefs.
> >
> > The patchset consist of four parts:
> > - [1..4]: Patches from Parent Pointer patchset which add binary
> > xattr names with a few deps
> > - [5..7]: Improvements to core fs-verity
> > - [8..9]: Add read path verification to iomap
> > - [10..23]: Integration of fs-verity to xfs
> >
> > Changes from V1:
> > - Added parent pointer patches for easier testing
> > - Many issues and refactoring points fixed from the V1 review
> > - Adjusted for recent changes in fs-verity core (folios, non-4k)
> > - Dropped disabling of large folios
> > - Completely new fsverity patches (fix, callout, log_blocksize)
> > - Change approach to verification in iomap to the same one as in
> > write path. Callouts to fs instead of direct fs-verity use.
> > - New XFS workqueue for post read folio verification
> > - xfs_attr_get() can return underlying xfs_buf
> > - xfs_bufs are marked with XBF_VERITY_CHECKED to track verified
> > blocks
> >
> > kernel:
> > [1]: https://github.com/alberand/linux/tree/xfs-verity-v2
> >
> > xfsprogs:
> > [2]: https://github.com/alberand/xfsprogs/tree/fsverity-v2
>
> Will there be any means for xfs_repair to check the merkle tree contents?
> Should it clear the ondisk inode flag if it decides to trash the xattr
> structure, or is it ok to let the kernel deal with flag set and no
> verity data?
The fsverity-utils tool can calculate the merkle tree offline, so it's
possible for xfs_repair to do the same and compare; it can also
check that all merkle tree blocks are there. The flag without the tree
is probably bad, as all read operations will fail and it won't be
possible to regenerate the tree (enabling verity also checks for the flag).
--
- Andrey
On Wed, Apr 05, 2023 at 06:02:21PM +0200, Andrey Albershteyn wrote:
> Hi Darrick,
>
> On Tue, Apr 04, 2023 at 09:36:02AM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 04, 2023 at 04:53:17PM +0200, Andrey Albershteyn wrote:
> > > In case of different Merkle tree block size fs-verity expects
> > > ->read_merkle_tree_page() to return Merkle tree page filled with
> > > Merkle tree blocks. The XFS stores each merkle tree block under
> > > extended attribute. Those attributes are addressed by block offset
> > > into Merkle tree.
> > >
> > > This patch make ->read_merkle_tree_page() to fetch multiple merkle
> > > tree blocks based on size ratio. Also the reference to each xfs_buf
> > > is passed with page->private to ->drop_page().
> > >
> > > Signed-off-by: Andrey Albershteyn <[email protected]>
> > > ---
> > > fs/xfs/xfs_verity.c | 74 +++++++++++++++++++++++++++++++++++----------
> > > fs/xfs/xfs_verity.h | 8 +++++
> > > 2 files changed, 66 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_verity.c b/fs/xfs/xfs_verity.c
> > > index a9874ff4efcd..ef0aff216f06 100644
> > > --- a/fs/xfs/xfs_verity.c
> > > +++ b/fs/xfs/xfs_verity.c
> > > @@ -134,6 +134,10 @@ xfs_read_merkle_tree_page(
> > > struct page *page = NULL;
> > > __be64 name = cpu_to_be64(index << PAGE_SHIFT);
> > > uint32_t bs = 1 << log_blocksize;
> > > + int blocks_per_page =
> > > + (1 << (PAGE_SHIFT - log_blocksize));
> > > + int n = 0;
> > > + int offset = 0;
> > > struct xfs_da_args args = {
> > > .dp = ip,
> > > .attr_filter = XFS_ATTR_VERITY,
> > > @@ -143,26 +147,59 @@ xfs_read_merkle_tree_page(
> > > .valuelen = bs,
> > > };
> > > int error = 0;
> > > + bool is_checked = true;
> > > + struct xfs_verity_buf_list *buf_list;
> > >
> > > page = alloc_page(GFP_KERNEL);
> > > if (!page)
> > > return ERR_PTR(-ENOMEM);
> > >
> > > - error = xfs_attr_get(&args);
> > > - if (error) {
> > > - kmem_free(args.value);
> > > - xfs_buf_rele(args.bp);
> > > + buf_list = kzalloc(sizeof(struct xfs_verity_buf_list), GFP_KERNEL);
> > > + if (!buf_list) {
> > > put_page(page);
> > > - return ERR_PTR(-EFAULT);
> > > + return ERR_PTR(-ENOMEM);
> > > }
> > >
> > > - if (args.bp->b_flags & XBF_VERITY_CHECKED)
> > > + /*
> > > + * Fill the page with Merkle tree blocks. The blcoks_per_page is higher
> > > + * than 1 when fs block size != PAGE_SIZE or Merkle tree block size !=
> > > + * PAGE SIZE
> > > + */
> > > + for (n = 0; n < blocks_per_page; n++) {
> >
> > Ahah, ok, that's why we can't pass the xfs_buf pages up to fsverity.
> >
> > > + offset = bs * n;
> > > + name = cpu_to_be64(((index << PAGE_SHIFT) + offset));
> >
> > Really this ought to be a typechecked helper...
> >
> > struct xfs_fsverity_merkle_key {
> > __be64 merkleoff;
>
> Sure, thanks, will change this
>
> > };
> >
> > static inline void
> > xfs_fsverity_merkle_key_to_disk(struct xfs_fsverity_merkle_key *k, loff_t pos)
> > {
> > k->merkeloff = cpu_to_be64(pos);
> > }
> >
> >
> >
> > > + args.name = (const uint8_t *)&name;
> > > +
> > > + error = xfs_attr_get(&args);
> > > + if (error) {
> > > + kmem_free(args.value);
> > > + /*
> > > + * No more Merkle tree blocks (e.g. this was the last
> > > + * block of the tree)
> > > + */
> > > + if (error == -ENOATTR)
> > > + break;
> > > + xfs_buf_rele(args.bp);
> > > + put_page(page);
> > > + kmem_free(buf_list);
> > > + return ERR_PTR(-EFAULT);
> > > + }
> > > +
> > > + buf_list->bufs[buf_list->buf_count++] = args.bp;
> > > +
> > > + /* One of the buffers was dropped */
> > > + if (!(args.bp->b_flags & XBF_VERITY_CHECKED))
> > > + is_checked = false;
> >
> > If there's enough memory pressure to cause the merkle tree pages to get
> > evicted, what are the chances that the xfs_bufs survive the eviction?
>
> The merkle tree pages are dropped after verification. When page is
> dropped xfs_buf is marked as verified. If fs-verity wants to
> verify again it will get the same verified buffer. If buffer is
> evicted it won't have verified state.
>
> So, with enough memory pressure buffers will be dropped and need to
> be reverified.
Please excuse me if this was discussed and rejected long ago, but
perhaps fsverity should try to hang on to the merkle tree pages that
this function returns for as long as possible until reclaim comes for
them?
With the merkle tree page lifetimes extended, you then don't need to
attach the xfs_buf to page->private, nor does xfs have to extend the
buffer cache to stash XBF_VERITY_CHECKED.
Also kinda wondering why you don't allocate the page, kmap it, and then
pass that address into args->value to avoid the third memory allocation
that gets done inside xfs_attr_get?
--D
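Something like this, perhaps (a sketch; it assumes xfs_attr_get() will fill
a caller-supplied args.value instead of allocating its own buffer, which is
what the suggestion above relies on):

		/* fill the page in place, no separate args.value to memcpy/free */
		args.value = page_address(page) + offset;
		args.valuelen = bs;

		error = xfs_attr_get(&args);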
> >
> > > + memcpy(page_address(page) + offset, args.value, args.valuelen);
> > > + kmem_free(args.value);
> > > + args.value = NULL;
> > > + }
> > > +
> > > + if (is_checked)
> > > SetPageChecked(page);
> > > + page->private = (unsigned long)buf_list;
> > >
> > > - page->private = (unsigned long)args.bp;
> > > - memcpy(page_address(page), args.value, args.valuelen);
> > > -
> > > - kmem_free(args.value);
> > > return page;
> > > }
> > >
> > > @@ -191,16 +228,21 @@ xfs_write_merkle_tree_block(
> > >
> > > static void
> > > xfs_drop_page(
> > > - struct page *page)
> > > + struct page *page)
> > > {
> > > - struct xfs_buf *buf = (struct xfs_buf *)page->private;
> > > + int i = 0;
> > > + struct xfs_verity_buf_list *buf_list =
> > > + (struct xfs_verity_buf_list *)page->private;
> > >
> > > - ASSERT(buf != NULL);
> > > + ASSERT(buf_list != NULL);
> > >
> > > - if (PageChecked(page))
> > > - buf->b_flags |= XBF_VERITY_CHECKED;
> > > + for (i = 0; i < buf_list->buf_count; i++) {
> > > + if (PageChecked(page))
> > > + buf_list->bufs[i]->b_flags |= XBF_VERITY_CHECKED;
> > > + xfs_buf_rele(buf_list->bufs[i]);
> > > + }
> > >
> > > - xfs_buf_rele(buf);
> > > + kmem_free(buf_list);
> > > put_page(page);
> > > }
> > >
> > > diff --git a/fs/xfs/xfs_verity.h b/fs/xfs/xfs_verity.h
> > > index ae5d87ca32a8..433b2f4ae3bc 100644
> > > --- a/fs/xfs/xfs_verity.h
> > > +++ b/fs/xfs/xfs_verity.h
> > > @@ -16,4 +16,12 @@ extern const struct fsverity_operations xfs_verity_ops;
> > > #define xfs_verity_ops NULL
> > > #endif /* CONFIG_FS_VERITY */
> > >
> > > +/* Minimal Merkle tree block size is 1024 */
> > > +#define XFS_VERITY_MAX_MBLOCKS_PER_PAGE (1 << (PAGE_SHIFT - 10))
> > > +
> > > +struct xfs_verity_buf_list {
> > > + unsigned int buf_count;
> > > + struct xfs_buf *bufs[XFS_VERITY_MAX_MBLOCKS_PER_PAGE];
> >
> > So... this is going to be a 520-byte allocation on arm64 with 64k pages?
> > Even if the merkle tree block size is the same as the page size? Ouch.
>
> yeah, it also allocates a page and is dropped with the page, so,
> I took it as an addition to already big chunk of memory. But I
> probably will change it, viz. comment from Eric on this patch.
>
> --
> - Andrey
>
On Wed, Apr 05, 2023 at 08:46:45AM -0700, Christoph Hellwig wrote:
> On Wed, Apr 05, 2023 at 12:36:42PM +0200, Andrey Albershteyn wrote:
> > Hi Christoph,
> >
> > On Tue, Apr 04, 2023 at 08:30:36AM -0700, Christoph Hellwig wrote:
> > > On Tue, Apr 04, 2023 at 04:53:01PM +0200, Andrey Albershteyn wrote:
> > > > Not the whole folio always need to be verified by fs-verity (e.g.
> > > > with 1k blocks). Use passed folio's offset and size.
> > >
> > > Why can't those callers just call fsverity_verify_blocks directly?
> > >
> >
> > They can. Calling _verify_folio with explicit offset; size appeared
> > more clear to me. But I'm ok with dropping this patch to have full
> > folio verify function.
>
> Well, there is no point in a wrapper if it has the exact same signature
> and functionality as the functionality being wrapped.
>
> That being said, right now fsverity_verify_folio, so it might make sense
> to either rename it, or rename fsverity_verify_blocks to
> fsverity_verify_folio. But that's really a question for Eric.
I thought it would be confusing for fsverity_verify_folio() to not actually
verify a whole folio. So, for now we have:
fsverity_verify_page: verify a whole page
fsverity_verify_folio: verify a whole folio
fsverity_verify_blocks: verify a range of blocks in a folio
IMO that makes sense. Note: fsverity_verify_folio() is currently unused, but
ext4 might use it.
So, just use fsverity_verify_blocks().
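For reference, the three prototypes currently look roughly like this
(paraphrased from include/linux/fsverity.h at the time of this thread;
details may differ):

	bool fsverity_verify_blocks(struct folio *folio, size_t len, size_t offset);

	static inline bool fsverity_verify_folio(struct folio *folio)
	{
		return fsverity_verify_blocks(folio, folio_size(folio), 0);
	}

	static inline bool fsverity_verify_page(struct page *page)
	{
		return fsverity_verify_folio(page_folio(page));
	}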
- Eric
On Wed, Apr 05, 2023 at 08:50:10AM -0700, Christoph Hellwig wrote:
> On Wed, Apr 05, 2023 at 08:09:27AM -0700, Darrick J. Wong wrote:
> > Thinking about this a little more -- I suppose we shouldn't just go
> > breaking directio reads from a verity file if we can help it. Is there
> > a way to ask fsverity to perform its validation against some arbitrary
> > memory buffer that happens to be fs-block aligned?
You could certainly add such a function that wraps around verify_data_block().
The minimal function prototype needed (without supporting readahead or reusing
the ahash_request) would be something like the following, I think:
bool fsverity_verify_blocks_dio(struct inode *inode, u64 pos,
struct folio *folio,
size_t len, size_t offset);
And I really hope that you don't want to do DIO to the *Merkle tree*, as that
would make the problem significantly harder. I think DIO for the data, but
handling the Merkle tree in the usual way, would be okay?
>
> That would be my preference as well. But maybe Eric know a good reason
> why this hasn't been done yet.
>
I believe it would be possible, especially if DIO to the Merkle tree is not in
scope. There just hasn't been a reason to do the work yet. And ext4 and f2fs
already fall back to buffered I/O for other filesystem features, so there was
precedent for not bothering with DIO, at least in the initial version.
- Eric
On Wed, Apr 05, 2023 at 09:38:47AM -0700, Darrick J. Wong wrote:
> > The merkle tree pages are dropped after verification. When page is
> > dropped xfs_buf is marked as verified. If fs-verity wants to
> > verify again it will get the same verified buffer. If buffer is
> > evicted it won't have verified state.
> >
> > So, with enough memory pressure buffers will be dropped and need to
> > be reverified.
>
> Please excuse me if this was discussed and rejected long ago, but
> perhaps fsverity should try to hang on to the merkle tree pages that
> this function returns for as long as possible until reclaim comes for
> them?
>
> With the merkle tree page lifetimes extended, you then don't need to
> attach the xfs_buf to page->private, nor does xfs have to extend the
> buffer cache to stash XBF_VERITY_CHECKED.
Well, all the other filesystems that support fsverity (ext4, f2fs, and btrfs)
just cache the Merkle tree pages in the inode's page cache. It's an approach
that I know some people aren't a fan of, but it's efficient and it works.
We could certainly think about moving to a design where fs/verity/ asks the
filesystem to just *read* a Merkle tree block, without adding it to a cache, and
then fs/verity/ implements the caching itself. That would require some large
changes to each filesystem, though, unless we were to double-cache the Merkle
tree blocks which would be inefficient.
So it feels like continuing to have the filesystem (not fs/verity/) be
responsible for the cache is the best way to allow XFS to do things a bit
differently, without regressing the other filesystems.
I'm interested in hearing any other proposals, though.
- Eric
On Wed, Apr 05, 2023 at 08:09:27AM -0700, Darrick J. Wong wrote:
> On Wed, Apr 05, 2023 at 05:01:42PM +0200, Andrey Albershteyn wrote:
> > On Tue, Apr 04, 2023 at 09:10:47AM -0700, Darrick J. Wong wrote:
> > > On Tue, Apr 04, 2023 at 04:53:15PM +0200, Andrey Albershteyn wrote:
> > > > The direct path is not supported on verity files. Attempts to use direct
> > > > I/O path on such files should fall back to buffered I/O path.
> > > >
> > > > Signed-off-by: Andrey Albershteyn <[email protected]>
> > > > ---
> > > > fs/xfs/xfs_file.c | 14 +++++++++++---
> > > > 1 file changed, 11 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > > > index 947b5c436172..9e072e82f6c1 100644
> > > > --- a/fs/xfs/xfs_file.c
> > > > +++ b/fs/xfs/xfs_file.c
> > > > @@ -244,7 +244,8 @@ xfs_file_dax_read(
> > > > struct kiocb *iocb,
> > > > struct iov_iter *to)
> > > > {
> > > > - struct xfs_inode *ip = XFS_I(iocb->ki_filp->f_mapping->host);
> > > > + struct inode *inode = iocb->ki_filp->f_mapping->host;
> > > > + struct xfs_inode *ip = XFS_I(inode);
> > > > ssize_t ret = 0;
> > > >
> > > > trace_xfs_file_dax_read(iocb, to);
> > > > @@ -297,10 +298,17 @@ xfs_file_read_iter(
> > > >
> > > > if (IS_DAX(inode))
> > > > ret = xfs_file_dax_read(iocb, to);
> > > > - else if (iocb->ki_flags & IOCB_DIRECT)
> > > > + else if (iocb->ki_flags & IOCB_DIRECT && !fsverity_active(inode))
> > > > ret = xfs_file_dio_read(iocb, to);
> > > > - else
> > > > + else {
> > > > + /*
> > > > + * In case fs-verity is enabled, we also fallback to the
> > > > + * buffered read from the direct read path. Therefore,
> > > > + * IOCB_DIRECT is set and need to be cleared
> > > > + */
> > > > + iocb->ki_flags &= ~IOCB_DIRECT;
> > > > ret = xfs_file_buffered_read(iocb, to);
> > >
> > > XFS doesn't usually allow directio fallback to the pagecache. Why
> > > would fsverity be any different?
> >
> > Didn't know that, this is what happens on ext4 so I did the same.
> > Then it probably make sense to just error on DIRECT on verity
> > sealed file.
>
> Thinking about this a little more -- I suppose we shouldn't just go
> breaking directio reads from a verity file if we can help it. Is there
> a way to ask fsverity to perform its validation against some arbitrary
> memory buffer that happens to be fs-block aligned?
The memory buffer doesn't even need to be fs-block aligned - it just
needs to be a pointer to memory the kernel can read...
We also need fsverity to be able to handle being passed mapped
kernel memory rather than pages/folios for the merkle tree
interfaces. That way we can just pass it the mapped buffer memory
straight from the xfs-buf and we don't have to do the whacky "copy
from xattr xfs_bufs into pages so fsverity can take temporary
reference counts on what it thinks are page cache pages" as it walks
the merkle tree.
-Dave.
--
Dave Chinner
[email protected]
On Wed, Apr 05, 2023 at 06:02:47PM +0000, Eric Biggers wrote:
> And I really hope that you don't want to do DIO to the *Merkle tree*, as that
Not for XFS - the merkle tree is not held as file data.
That said, the merkle tree in XFS is not page cache or block aligned
metadata either, so we really want the same memory buffer based
interface for the merkle tree reading so that the merkle tree code
can read directly from the xfs-buf rather than requiring us to copy
it out of the xfsbuf into temporary pages...
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Wed, Apr 05, 2023 at 06:16:00PM +0000, Eric Biggers wrote:
> On Wed, Apr 05, 2023 at 09:38:47AM -0700, Darrick J. Wong wrote:
> > > The merkle tree pages are dropped after verification. When page is
> > > dropped xfs_buf is marked as verified. If fs-verity wants to
> > > verify again it will get the same verified buffer. If buffer is
> > > evicted it won't have verified state.
> > >
> > > So, with enough memory pressure buffers will be dropped and need to
> > > be reverified.
> >
> > Please excuse me if this was discussed and rejected long ago, but
> > perhaps fsverity should try to hang on to the merkle tree pages that
> > this function returns for as long as possible until reclaim comes for
> > them?
> >
> > With the merkle tree page lifetimes extended, you then don't need to
> > attach the xfs_buf to page->private, nor does xfs have to extend the
> > buffer cache to stash XBF_VERITY_CHECKED.
>
> Well, all the other filesystems that support fsverity (ext4, f2fs, and btrfs)
> just cache the Merkle tree pages in the inode's page cache. It's an approach
> that I know some people aren't a fan of, but it's efficient and it works.
Which puts pages beyond EOF in the page cache. Given that XFS also
allows persistent block allocation beyond EOF, having both data in the page
cache and blocks beyond EOF that contain unrelated information is a
Real Bad Idea.
Just because putting metadata in the file data address space works
for one filesystem, it doesn't mean it's a good idea or that it works
for every filesystem.
> We could certainly think about moving to a design where fs/verity/ asks the
> filesystem to just *read* a Merkle tree block, without adding it to a cache, and
> then fs/verity/ implements the caching itself. That would require some large
> changes to each filesystem, though, unless we were to double-cache the Merkle
> tree blocks which would be inefficient.
No, that's unnecessary.
All we need is for fsverity to require filesystems to pass it byte
addressable data buffers that are externally reference counted. The
filesystem can take a page reference before mapping the page and
passing the kaddr to fsverity, then unmap and drop the reference
when the merkle tree walk is done as per Andrey's new drop callout.
fsverity doesn't need to care what the buffer is made from, how it
is cached, what its life cycle is, etc. The caching mechanism and
reference counting is entirely controlled by the filesystem callout
implementations, and fsverity only needs to deal with memory buffers
that are guaranteed to live for the entire walk of the merkle
tree....
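In interface terms, that model might look something like this (the names
here are made up purely to illustrate the shape; this is not an existing
fsverity API):

	struct fsverity_block {
		void		*kaddr;		/* mapped Merkle tree block */
		unsigned int	len;		/* block size */
		void		*fs_private;	/* e.g. the backing xfs_buf */
	};

	int  (*read_merkle_tree_block)(struct inode *inode, u64 pos,
				       struct fsverity_block *block);
	void (*drop_merkle_tree_block)(struct inode *inode,
				       struct fsverity_block *block);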
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Wed, Apr 05, 2023 at 05:12:34PM +0200, Andrey Albershteyn wrote:
> Hi Eric,
>
> On Tue, Apr 04, 2023 at 04:32:24PM -0700, Eric Biggers wrote:
> > Hi Andrey,
> >
> > On Tue, Apr 04, 2023 at 04:53:17PM +0200, Andrey Albershteyn wrote:
> > > In case of different Merkle tree block size fs-verity expects
> > > ->read_merkle_tree_page() to return Merkle tree page filled with
> > > Merkle tree blocks. The XFS stores each merkle tree block under
> > > extended attribute. Those attributes are addressed by block offset
> > > into Merkle tree.
> > >
> > > This patch make ->read_merkle_tree_page() to fetch multiple merkle
> > > tree blocks based on size ratio. Also the reference to each xfs_buf
> > > is passed with page->private to ->drop_page().
> > >
> > > Signed-off-by: Andrey Albershteyn <[email protected]>
> > > ---
> > > fs/xfs/xfs_verity.c | 74 +++++++++++++++++++++++++++++++++++----------
> > > fs/xfs/xfs_verity.h | 8 +++++
> > > 2 files changed, 66 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_verity.c b/fs/xfs/xfs_verity.c
> > > index a9874ff4efcd..ef0aff216f06 100644
> > > --- a/fs/xfs/xfs_verity.c
> > > +++ b/fs/xfs/xfs_verity.c
> > > @@ -134,6 +134,10 @@ xfs_read_merkle_tree_page(
> > > struct page *page = NULL;
> > > __be64 name = cpu_to_be64(index << PAGE_SHIFT);
> > > uint32_t bs = 1 << log_blocksize;
> > > + int blocks_per_page =
> > > + (1 << (PAGE_SHIFT - log_blocksize));
> > > + int n = 0;
> > > + int offset = 0;
> > > struct xfs_da_args args = {
> > > .dp = ip,
> > > .attr_filter = XFS_ATTR_VERITY,
> > > @@ -143,26 +147,59 @@ xfs_read_merkle_tree_page(
> > > .valuelen = bs,
> > > };
> > > int error = 0;
> > > + bool is_checked = true;
> > > + struct xfs_verity_buf_list *buf_list;
> > >
> > > page = alloc_page(GFP_KERNEL);
> > > if (!page)
> > > return ERR_PTR(-ENOMEM);
> > >
> > > - error = xfs_attr_get(&args);
> > > - if (error) {
> > > - kmem_free(args.value);
> > > - xfs_buf_rele(args.bp);
> > > + buf_list = kzalloc(sizeof(struct xfs_verity_buf_list), GFP_KERNEL);
> > > + if (!buf_list) {
> > > put_page(page);
> > > - return ERR_PTR(-EFAULT);
> > > + return ERR_PTR(-ENOMEM);
> > > }
> > >
> > > - if (args.bp->b_flags & XBF_VERITY_CHECKED)
> > > + /*
> > > + * Fill the page with Merkle tree blocks. The blcoks_per_page is higher
> > > + * than 1 when fs block size != PAGE_SIZE or Merkle tree block size !=
> > > + * PAGE SIZE
> > > + */
> > > + for (n = 0; n < blocks_per_page; n++) {
> > > + offset = bs * n;
> > > + name = cpu_to_be64(((index << PAGE_SHIFT) + offset));
> > > + args.name = (const uint8_t *)&name;
> > > +
> > > + error = xfs_attr_get(&args);
> > > + if (error) {
> > > + kmem_free(args.value);
> > > + /*
> > > + * No more Merkle tree blocks (e.g. this was the last
> > > + * block of the tree)
> > > + */
> > > + if (error == -ENOATTR)
> > > + break;
> > > + xfs_buf_rele(args.bp);
> > > + put_page(page);
> > > + kmem_free(buf_list);
> > > + return ERR_PTR(-EFAULT);
> > > + }
> > > +
> > > + buf_list->bufs[buf_list->buf_count++] = args.bp;
> > > +
> > > + /* One of the buffers was dropped */
> > > + if (!(args.bp->b_flags & XBF_VERITY_CHECKED))
> > > + is_checked = false;
> > > +
> > > + memcpy(page_address(page) + offset, args.value, args.valuelen);
> > > + kmem_free(args.value);
> > > + args.value = NULL;
> > > + }
> >
> > I was really hoping for a solution where the cached data can be used directly,
> > instead allocating a temporary page and copying the cached data into it every
> > time the cache is accessed. The problem with what you have now is that every
> > time a single 32-byte hash is accessed, a full page (potentially 64KB!) will be
> > allocated and filled. That's not very efficient. The need to allocate the
> > temporary page can also cause ENOMEM (which will get reported as EIO).
> >
> > Did you consider alternatives that would work more efficiently? I think it
> > would be worth designing something that works properly with how XFS is planned
> > to cache the Merkle tree, instead of designing a workaround.
> > ->read_merkle_tree_page was not really designed for what you are doing here.
> >
> > How about replacing ->read_merkle_tree_page with a function that takes in a
> > Merkle tree block index (not a page index!) and hands back a (page, offset) pair
> > that identifies where the Merkle tree block's data is located? Or (folio,
> > offset), I suppose.
{kaddr, len}, please.
> >
> > With that, would it be possible to directly return the XFS cache?
> >
> > - Eric
> >
>
> Yeah, I also don't like it, I didn't want to change fs-verity much
> so went with this workaround. But as it's ok, I will look into trying
> to pass xfs buffers to fs-verity without direct use of
> ->read_merkle_tree_page(). I think it's possible with (folio,
> offset), the xfs buffers aren't xattr value align so the 4k merkle
> tree block is stored in two pages.
I don't think this is necessary to actually merge the code. We want
it to work correctly as the primary concern; performance is a
secondary concern.
Regardless, as you mention, the xfs_buf is not made up of contiguous
pages so the merkle tree block data will be split across two
(or more) pages. AFAICT, the fsverity code doesn't work with data
structures that span multiple disjoint pages...
Another problem is that the xfs-buf might be backed by heap memory
(e.g. 4kB fs block size on 64kB PAGE_SIZE) and so it cannot be
treated like a page cache page by the fsverity merkle tree code. We
most definitely do not want to be passing pages containing heap
memory to functions expecting to be passed lru-resident page cache
pages....
That said, xfs-bufs do have a stable method of addressing the data
in the buffers, and all the XFS code uses this to access and
manipulate data directly in the buffers.
That is, xfs_buf_offset() returns a mapped kaddr that points to the
contiguous memory region containing the metadata in the buffer. If
the xfs_buf spans multiple pages, it will return a kaddr pointing
into the contiguous vmapped memory address that maps the entire
buffer data range. If it is heap memory, it simply returns a pointer
into that heap memory. If it's a single page, then it returns the
kaddr for the data within the page.
If you move all the assumptions about how the merkle tree data is
managed out of fsverity and require the filesystems to do the
mapping to kaddrs and the reference counting to guarantee lifetimes,
then the need for multiple different methods for reading merkle tree
data goes away...
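In code terms the point is roughly this (illustrative only; bp is the
xfs_buf holding the Merkle tree block and stays referenced until the drop
callout runs):

	/* contiguous mapped address for the block, however the buffer is backed */
	void	*kaddr = xfs_buf_offset(bp, 0);

	/*
	 * fsverity can hash the block at kaddr directly; no copy into a
	 * temporary page and no assumptions about page cache residency.
	 */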
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Thu, Apr 06, 2023 at 08:26:46AM +1000, Dave Chinner wrote:
> > We could certainly think about moving to a design where fs/verity/ asks the
> > filesystem to just *read* a Merkle tree block, without adding it to a cache, and
> > then fs/verity/ implements the caching itself. That would require some large
> > changes to each filesystem, though, unless we were to double-cache the Merkle
> > tree blocks which would be inefficient.
>
> No, that's unnecessary.
>
> All we need if for fsverity to require filesystems to pass it byte
> addressable data buffers that are externally reference counted. The
> filesystem can take a page reference before mapping the page and
> passing the kaddr to fsverity, then unmap and drop the reference
> when the merkle tree walk is done as per Andrey's new drop callout.
>
> fsverity doesn't need to care what the buffer is made from, how it
> is cached, what it's life cycle is, etc. The caching mechanism and
> reference counting is entirely controlled by the filesystem callout
> implementations, and fsverity only needs to deal with memory buffers
> that are guaranteed to live for the entire walk of the merkle
> tree....
Sure. Just a couple notes:
First, fs/verity/ does still need to be able to tell whether the buffer is newly
instantiated or not.
Second, fs/verity/ uses the ahash API to do the hashing. ahash is a
scatterlist-based API. Virtual addresses can still be used (see sg_set_buf()),
but the memory cannot be vmalloc'ed memory, since virt_to_page() needs to work.
Does XFS use vmalloc'ed memory for these buffers?
BTW, converting fs/verity/ from ahash to shash is an option; I've really never
been a fan of the scatterlist-based crypto APIs! The disadvantage of doing
this, though, would be that it would remove support for all the hardware crypto
drivers. That *might* actually be okay, as that approach to crypto acceleration
has mostly fallen out of favor, in favor of CPU-based acceleration. But I do
worry about e.g. someone coming out of the woodwork and saying they need to use
fsverity on a low-powered ARM board that has a crypto accelerator like CAAM, and
they MUST use their crypto accelerator to get acceptable performance.
- Eric
On Wed, Apr 05, 2023 at 10:54:06PM +0000, Eric Biggers wrote:
> On Thu, Apr 06, 2023 at 08:26:46AM +1000, Dave Chinner wrote:
> > > We could certainly think about moving to a design where fs/verity/ asks the
> > > filesystem to just *read* a Merkle tree block, without adding it to a cache, and
> > > then fs/verity/ implements the caching itself. That would require some large
> > > changes to each filesystem, though, unless we were to double-cache the Merkle
> > > tree blocks which would be inefficient.
> >
> > No, that's unnecessary.
> >
> > All we need if for fsverity to require filesystems to pass it byte
> > addressable data buffers that are externally reference counted. The
> > filesystem can take a page reference before mapping the page and
> > passing the kaddr to fsverity, then unmap and drop the reference
> > when the merkle tree walk is done as per Andrey's new drop callout.
> >
> > fsverity doesn't need to care what the buffer is made from, how it
> > is cached, what it's life cycle is, etc. The caching mechanism and
> > reference counting is entirely controlled by the filesystem callout
> > implementations, and fsverity only needs to deal with memory buffers
> > that are guaranteed to live for the entire walk of the merkle
> > tree....
>
> Sure. Just a couple notes:
>
> First, fs/verity/ does still need to be able to tell whether the buffer is newly
> instantiated or not.
Boolean flag from the caller.
> Second, fs/verity/ uses the ahash API to do the hashing. ahash is a
> scatterlist-based API. Virtual addresses can still be used (see sg_set_buf()),
> but the memory cannot be vmalloc'ed memory, since virt_to_page() needs to work.
> Does XFS use vmalloc'ed memory for these buffers?
Not vmalloc'ed, but vmapped. We allocate the pages individually, but
then call vm_map_ram() to present the higher level code with a
single contiguous memory range if it is a multi-page buffer.
We do have the backing info held in the buffer, and that's what we
use for IO. If fsverity needs a page based scatter/gather list
for hardware offload, it could ask the filesystem to provide it
for that given buffer...
> BTW, converting fs/verity/ from ahash to shash is an option; I've really never
> been a fan of the scatterlist-based crypto APIs! The disadvantage of doing
> this, though, would be that it would remove support for all the hardware crypto
> drivers.
>
> That *might* actually be okay, as that approach to crypto acceleration
> has mostly fallen out of favor, in favor of CPU-based acceleration. But I do
> worry about e.g. someone coming out of the woodwork and saying they need to use
> fsverity on a low-powered ARM board that has a crypto accelerator like CAAM, and
> they MUST use their crypto accelerator to get acceptable performance.
True, but we are very unlikely to be using XFS on such small
systems and I don't think we really care about XFS performance on
android sized systems, either.
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Thu, Apr 06, 2023 at 09:37:53AM +1000, Dave Chinner wrote:
> On Wed, Apr 05, 2023 at 10:54:06PM +0000, Eric Biggers wrote:
> > On Thu, Apr 06, 2023 at 08:26:46AM +1000, Dave Chinner wrote:
> > > > We could certainly think about moving to a design where fs/verity/ asks the
> > > > filesystem to just *read* a Merkle tree block, without adding it to a cache, and
> > > > then fs/verity/ implements the caching itself. That would require some large
> > > > changes to each filesystem, though, unless we were to double-cache the Merkle
> > > > tree blocks which would be inefficient.
> > >
> > > No, that's unnecessary.
> > >
> > > All we need if for fsverity to require filesystems to pass it byte
> > > addressable data buffers that are externally reference counted. The
> > > filesystem can take a page reference before mapping the page and
> > > passing the kaddr to fsverity, then unmap and drop the reference
> > > when the merkle tree walk is done as per Andrey's new drop callout.
> > >
> > > fsverity doesn't need to care what the buffer is made from, how it
> > > is cached, what it's life cycle is, etc. The caching mechanism and
> > > reference counting is entirely controlled by the filesystem callout
> > > implementations, and fsverity only needs to deal with memory buffers
> > > that are guaranteed to live for the entire walk of the merkle
> > > tree....
> >
> > Sure. Just a couple notes:
> >
> > First, fs/verity/ does still need to be able to tell whether the buffer is newly
> > instantiated or not.
>
> Boolean flag from the caller.
>
> > Second, fs/verity/ uses the ahash API to do the hashing. ahash is a
> > scatterlist-based API. Virtual addresses can still be used (see sg_set_buf()),
> > but the memory cannot be vmalloc'ed memory, since virt_to_page() needs to work.
> > Does XFS use vmalloc'ed memory for these buffers?
>
> Not vmalloc'ed, but vmapped. we allocate the pages individually, but
> then call vm_map_page() to present the higher level code with a
> single contiguous memory range if it is a multi-page buffer.
>
> We do have the backing info held in the buffer, and that's what we
> use for IO. If fsverity needs a page based scatter/gather list
> for hardware offload, it could ask the filesystem to provide it
> for that given buffer...
>
> > BTW, converting fs/verity/ from ahash to shash is an option; I've really never
> > been a fan of the scatterlist-based crypto APIs! The disadvantage of doing
> > this, though, would be that it would remove support for all the hardware crypto
> > drivers.
> >
> > That *might* actually be okay, as that approach to crypto acceleration
> > has mostly fallen out of favor, in favor of CPU-based acceleration. But I do
> > worry about e.g. someone coming out of the woodwork and saying they need to use
> > fsverity on a low-powered ARM board that has a crypto accelerator like CAAM, and
> > they MUST use their crypto accelerator to get acceptable performance.
>
> True, but we are very unlikely to be using XFS on such small
> systems and I don't think we really care about XFS performance on
> android sized systems, either.
>
FYI, I've sent an RFC patch that converts fs/verity/ from ahash to shash:
https://lore.kernel.org/r/[email protected]
It would be great if we could do that. But I need to get a better sense for
whether anyone will complain...
- Eric
On Wed, Apr 05, 2023 at 05:44:36PM -0700, Eric Biggers wrote:
> > Not vmalloc'ed, but vmapped. we allocate the pages individually, but
> > then call vm_map_page() to present the higher level code with a
> > single contiguous memory range if it is a multi-page buffer.
> >
> > We do have the backing info held in the buffer, and that's what we
> > use for IO. If fsverity needs a page based scatter/gather list
> > for hardware offload, it could ask the filesystem to provide it
> > for that given buffer...
> >
> > > BTW, converting fs/verity/ from ahash to shash is an option; I've really never
> > > been a fan of the scatterlist-based crypto APIs! The disadvantage of doing
> > > this, though, would be that it would remove support for all the hardware crypto
> > > drivers.
> > >
> > > That *might* actually be okay, as that approach to crypto acceleration
> > > has mostly fallen out of favor, in favor of CPU-based acceleration. But I do
> > > worry about e.g. someone coming out of the woodwork and saying they need to use
> > > fsverity on a low-powered ARM board that has a crypto accelerator like CAAM, and
> > > they MUST use their crypto accelerator to get acceptable performance.
> >
> > True, but we are very unlikely to be using XFS on such small
> > systems and I don't think we really care about XFS performance on
> > android sized systems, either.
> >
>
> FYI, I've sent an RFC patch that converts fs/verity/ from ahash to shash:
> https://lore.kernel.org/r/[email protected]
>
> It would be great if we could do that. But I need to get a better sense for
> whether anyone will complain...
FWIW, dm-verity went in the other direction. It started with shash, and then in
2017 it was switched to ahash by https://git.kernel.org/linus/d1ac3ff008fb9a48
("dm verity: switch to using asynchronous hash crypto API").
I think that was part of my motivation for using ahash in fsverity from the
beginning.
Still, it does seem that ahash is more trouble than it's worth these days...
- Eric
Dave is going to hate me for this, but..
I've been looking over some of the interfaces here, and I'm starting
to very seriously question the design decisions of storing the
fsverity hashes in xattrs.
Yes, storing them beyond i_size in the file is a bit of a hack, but
it allows reusing a lot of the existing infrastructure, and much
of fsverity is based around it. So storing them in xattrs causes
a lot of churn in the interface. And the XFS side, with its special
casing of xattr indices, also seems not exactly nice.
On Mon, Apr 10, 2023 at 10:19:46PM -0700, Christoph Hellwig wrote:
> Dave is going to hate me for this, but..
>
> I've been looking over some of the interfaces here, and I'm starting
> to very seriously questioning the design decisions of storing the
> fsverity hashes in xattrs.
>
> Yes, storing them beyond i_size in the file is a bit of a hack, but
> it allows to reuse a lot of the existing infrastructure, and much
> of fsverity is based around it. So storing them in an xattrs causes
> a lot of churn in the interface. And the XFS side with special
> casing xattr indices also seems not exactly nice.
It seems it's really just the Merkle tree caching interface that is causing
problems, as it's currently too closely tied to the page cache? That is just an
implementation detail that could be reworked along the lines of what is being
discussed.
But anyway, it is up to the XFS folks. Keep in mind there is also the option of
doing what btrfs is doing, where it stores the Merkle tree separately from the
file data stream, but caches it past i_size in the page cache at runtime.
I guess there is also the issue of encryption, which hasn't come up yet since
we're talking about fsverity support only. The Merkle tree (including the
fsverity_descriptor) is supposed to be encrypted, just like the file contents
are. Having it be stored after the file contents accomplishes that easily...
Of course, it doesn't have to be that way; a separate key could be derived, or
the Merkle tree blocks could be encrypted with the file contents key using
indices past i_size, without them physically being stored in the data stream.
- Eric
On Tue, Apr 11, 2023 at 07:33:19PM -0700, Eric Biggers wrote:
> On Mon, Apr 10, 2023 at 10:19:46PM -0700, Christoph Hellwig wrote:
> > Dave is going to hate me for this, but..
> >
> > I've been looking over some of the interfaces here, and I'm starting
> > to very seriously questioning the design decisions of storing the
> > fsverity hashes in xattrs.
> >
> > Yes, storing them beyond i_size in the file is a bit of a hack, but
> > it allows to reuse a lot of the existing infrastructure, and much
> > of fsverity is based around it. So storing them in an xattrs causes
> > a lot of churn in the interface. And the XFS side with special
> > casing xattr indices also seems not exactly nice.
>
> It seems it's really just the Merkle tree caching interface that is causing
> problems, as it's currently too closely tied to the page cache? That is just an
> implementation detail that could be reworked along the lines of what is being
> discussed.
>
> But anyway, it is up to the XFS folks. Keep in mind there is also the option of
> doing what btrfs is doing, where it stores the Merkle tree separately from the
> file data stream, but caches it past i_size in the page cache at runtime.
Right. It's not entirely simple to store metadata on disk beyond EOF
in XFS because of all the assumptions throughout the IO path and
allocator interfaces that it can allocate space beyond EOF at will
and something else will clean it up later if it is not needed. This
impacts on truncate, delayed allocation, writeback, IO completion,
EOF block removal on file close, background garbage collection,
ENOSPC/EDQUOT driven space freeing, etc. Some of these things cross
over into iomap infrastructure, too.
AFAIC, it's far more intricate, complex and risky to try to store
merkle tree data beyond EOF than it is to put it in an xattr
namespace because IO path EOF handling bugs result in user data
corruption. This happens over and over again, no matter how careful
we are about these aspects of user data handling.
OTOH, putting the merkle tree data in a different namespace avoids
these issues completely. Yes, we now have to solve an API mismatch,
but we aren't risking the addition of IO path data corruption bugs
to every non-fsverity filesystem in production...
Hence I think copying the btrfs approach (i.e. only caching the
merkle tree data in the page cache beyond EOF) would be as far as I
think we'd want to go. Realistically, there would be little
practical difference between btrfs storing the merkle tree blocks in
a separate internal btree and XFS storing them in an internal
private xattr btree namespace.
I would, however, prefer not to have to do this at all if we could
simply map the blocks directly out of the xattr buffers as we
already do internally for all the XFS code...
> I guess there is also the issue of encryption, which hasn't come up yet since
> we're talking about fsverity support only. The Merkle tree (including the
> fsverity_descriptor) is supposed to be encrypted, just like the file contents
> are. Having it be stored after the file contents accomplishes that easily...
> Of course, it doesn't have to be that way; a separate key could be derived, or
> the Merkle tree blocks could be encrypted with the file contents key using
> indices past i_size, without them physically being stored in the data stream.
I'm expecting that fscrypt for XFS will include encryption of the
xattr names and values (just like we will need to do for directory
names) except for the xattrs that hold the encryption keys
themselves. That means the merkle tree blocks should get encrypted
without any extra work needing to be done anywhere. This will
simply require the fscrypt keys to be held in a private internal
xattr namespace that isn't encrypted, but that's relatively trivial
to do...
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Wed, Apr 12, 2023 at 01:18:26PM +1000, Dave Chinner wrote:
> Right. It's not entirely simple to store metadata on disk beyond EOF
> in XFS because of all the assumptions throughout the IO path and
> allocator interfaces that it can allocate space beyond EOF at will
> and something else will clean it up later if it is not needed. This
> impacts on truncate, delayed allocation, writeback, IO completion,
> EOF block removal on file close, background garbage collection,
> ENOSPC/EDQUOT driven space freeing, etc. Some of these things cross
> over into iomap infrastructure, too.
To me that actually makes it easier to support the metadata beyond
i_size. Remember that the file is immutable after the fsverity
hash is added. So basically we just need to skip freeing the
eofblocks if that flag is set.
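Something like the following, presumably (a hypothetical hunk; the real
check would go wherever XFS decides that post-EOF blocks may be trimmed,
e.g. xfs_can_free_eofblocks()):

	/*
	 * Verity files are immutable once sealed; never trim their post-EOF
	 * blocks, since that is where the Merkle tree would live.
	 */
	if (IS_VERITY(VFS_I(ip)))
		return false;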
On Tue, Apr 11, 2023 at 07:33:19PM -0700, Eric Biggers wrote:
> It seems it's really just the Merkle tree caching interface that is causing
> problems, as it's currently too closely tied to the page cache? That is just an
> implementation detail that could be reworked along the lines of what is being
> discussed.
Well, that and some of the XFS internal changes that seem a bit ugly.
But it's not only very much tied to the page cache, but also to
page aligned data, which is really part of the problem.
> But anyway, it is up to the XFS folks. Keep in mind there is also the option of
> doing what btrfs is doing, where it stores the Merkle tree separately from the
> file data stream, but caches it past i_size in the page cache at runtime.
That seems to be the worst of both worlds.
> I guess there is also the issue of encryption, which hasn't come up yet since
> we're talking about fsverity support only. The Merkle tree (including the
> fsverity_descriptor) is supposed to be encrypted, just like the file contents
> are. Having it be stored after the file contents accomplishes that easily...
> Of course, it doesn't have to be that way; a separate key could be derived, or
> the Merkle tree blocks could be encrypted with the file contents key using
> indices past i_size, without them physically being stored in the data stream.
xattr contents had better be encrypted as well, fsverity or not.