2022-12-23 20:37:53

by Eric Biggers

Subject: [PATCH v2 00/11] fsverity: support for non-4K pages

[This patchset applies to mainline + some fsverity cleanups I sent out
recently. You can get everything from tag "fsverity-non4k-v2" of
https://git.kernel.org/pub/scm/fs/fscrypt/fscrypt.git ]

Currently, filesystems (ext4, f2fs, and btrfs) only support fsverity
when the Merkle tree block size, filesystem block size, and page size
are all the same. In practice that means 4K, since increasing the page
size, e.g. to 16K, forces the Merkle tree block size and filesystem
block size to be increased accordingly. That can be impractical; for
one, users want the same file signatures to work on all systems.

Therefore, this patchset reduces the coupling between these sizes.

First, patches 1-4 are cleanups.

Second, patches 5-9 allow the Merkle tree block size to be less than the
page size or filesystem block size, provided that it's not larger than
either one. This involves, among other things, changing the way that
fs/verity/verify.c tracks which hash blocks have been verified.
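
For illustration only, the relaxed size rule can be sketched as a small userspace predicate. The function name and parameters below are hypothetical, not kernel symbols. The 1024-byte lower bound is taken from the check that patch 06 of this series adds to fsverity_init_merkle_tree_params().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not a kernel function.  Returns true if a Merkle tree
 * block size of (1 << log_blocksize) bytes is acceptable: at least 1024 bytes
 * (the minimum chosen in patch 06), and no larger than either the page size
 * or the filesystem block size.
 */
bool tree_block_size_ok(unsigned int log_blocksize, unsigned int page_shift,
                        unsigned int fs_blkbits)
{
    return log_blocksize >= 10 &&
           log_blocksize <= page_shift &&
           log_blocksize <= fs_blkbits;
}

/* A few example cases, written as assertions. */
void tree_block_size_examples(void)
{
    /* 4K tree blocks on a 16K-page system with 4K filesystem blocks */
    assert(tree_block_size_ok(12, 14, 12));
    /* 16K tree blocks exceed the 4K filesystem block size */
    assert(!tree_block_size_ok(14, 14, 12));
    /* 512-byte tree blocks are below the 1024-byte minimum */
    assert(!tree_block_size_ok(9, 12, 12));
}
```

The first case is the one the series is motivated by: the same 4K Merkle tree block size, and hence the same file digest, can be used on a system with 16K pages.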

Finally, patches 10-11 make ext4 support fsverity when the filesystem
block size is less than the page size. Note, f2fs doesn't need similar
changes because f2fs always assumes that the filesystem block size and
page size are the same anyway. I haven't looked into btrfs yet.

I've tested this patchset using the "verity" group of tests in xfstests
with the following xfstests patchset applied:
"[PATCH v2 00/10] xfstests: update verity tests for non-4K block and page size"
(https://lore.kernel.org/fstests/[email protected]/T/#u)

Note: on the thread "[RFC PATCH 00/11] fs-verity support for XFS"
(https://lore.kernel.org/linux-xfs/[email protected]/T/#u)
there have been many requests for other things to support, including:

* folios in the pagecache
* alternative Merkle tree caching methods
* direct I/O
* merkle_tree_block_size > page_size
* extremely large files, using a reclaimable bitmap

We shouldn't try to boil the ocean, though, so to keep the scope of this
patchset manageable I haven't changed it significantly from v1. This
patchset does bring us closer to many of the above, just not all the way
there. I'd like to follow up this patchset with a change to support
folios, which should be straightforward. Next, we can do a change to
generalize the Merkle tree interface to allow XFS to use an alternative
caching method, as that sounds like the highest priority item for XFS.

Anyway, the changelog is:

Changed in v2:
- Rebased onto the recent fsverity cleanups.
- Split some parts of the big "support verification" patch into
  separate patches.
- Passed the data_pos to verify_data_block() instead of computing it
  using page->index, to make it ready for folio and DIO support.
- Eliminated some unnecessary arithmetic in verify_data_block().
- Changed the log_* fields in merkle_tree_params to u8.
- Restored PageLocked and !PageUptodate checks for pagecache pages.
- Eliminated the change to fsverity_hash_buffer().
- Other small cleanups.

Eric Biggers (11):
  fsverity: use unsigned long for level_start
  fsverity: simplify Merkle tree readahead size calculation
  fsverity: store log2(digest_size) precomputed
  fsverity: use EFBIG for file too large to enable verity
  fsverity: replace fsverity_hash_page() with fsverity_hash_block()
  fsverity: support verification with tree block size < PAGE_SIZE
  fsverity: support enabling with tree block size < PAGE_SIZE
  ext4: simplify ext4_readpage_limit()
  f2fs: simplify f2fs_readpage_limit()
  fs/buffer.c: support fsverity in block_read_full_folio()
  ext4: allow verity with fs block size < PAGE_SIZE

Documentation/filesystems/fsverity.rst | 76 +++---
fs/buffer.c | 67 ++++-
fs/ext4/readpage.c | 3 +-
fs/ext4/super.c | 5 -
fs/f2fs/data.c | 3 +-
fs/verity/enable.c | 260 ++++++++++----------
fs/verity/fsverity_private.h | 20 +-
fs/verity/hash_algs.c | 24 +-
fs/verity/open.c | 98 ++++++--
fs/verity/verify.c | 325 +++++++++++++++++--------
include/linux/fsverity.h | 14 +-
11 files changed, 565 insertions(+), 330 deletions(-)

--
2.39.0


2022-12-23 20:37:55

by Eric Biggers

Subject: [PATCH v2 02/11] fsverity: simplify Merkle tree readahead size calculation

From: Eric Biggers <[email protected]>

First, calculate max_ra_pages more efficiently by using the bio size.

Second, calculate the number of readahead pages from the hash page
index, instead of calculating it ahead of time using the data page
index. This ends up being a bit simpler, especially since level 0 is
last in the tree, so we can just limit the readahead to the tree size.
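
As a rough userspace sketch of the two calculations described above (the function names and the fixed 4K page size are assumptions made for illustration; the real code is in the diff below):

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12    /* assumption: 4K pages */

/*
 * Readahead budget for a readahead bio: one quarter of the number of pages
 * the bio covers, derived directly from the bio size in bytes instead of
 * counting its segments one by one.
 */
unsigned long max_ra_pages_for_bio(unsigned int bio_size_bytes)
{
    return bio_size_bytes >> (SKETCH_PAGE_SHIFT + 2);
}

/*
 * Readahead to request when reading the hash page at index hindex.  Level 0
 * is stored last in the tree, so clamping to the pages remaining in the tree
 * is enough to keep readahead within level 0.
 */
unsigned long ra_pages_at(unsigned long max_ra_pages,
                          unsigned long tree_pages, unsigned long hindex)
{
    unsigned long remaining;

    assert(hindex < tree_pages);
    remaining = tree_pages - hindex;
    return max_ra_pages < remaining ? max_ra_pages : remaining;
}
```

For example, a 64-page bio (262144 bytes with 4K pages) gives a budget of 16 pages, and a read of hash page 90 in a 100-page tree is clamped to 10 pages.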

Signed-off-by: Eric Biggers <[email protected]>
---
fs/verity/fsverity_private.h | 2 +-
fs/verity/open.c | 3 ++-
fs/verity/verify.c | 21 +++++++--------------
3 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index e8b40c8000be7..48b97f5d05569 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -46,7 +46,7 @@ struct merkle_tree_params {
unsigned int log_arity; /* log2(hashes_per_block) */
unsigned int num_levels; /* number of levels in Merkle tree */
u64 tree_size; /* Merkle tree size in bytes */
- unsigned long level0_blocks; /* number of blocks in tree level 0 */
+ unsigned long tree_pages; /* Merkle tree size in pages */

/*
* Starting block index for each tree level, ordered from leaf level (0)
diff --git a/fs/verity/open.c b/fs/verity/open.c
index 83ccc3c137363..e356eefb54d7b 100644
--- a/fs/verity/open.c
+++ b/fs/verity/open.c
@@ -7,6 +7,7 @@

#include "fsverity_private.h"

+#include <linux/mm.h>
#include <linux/slab.h>

static struct kmem_cache *fsverity_info_cachep;
@@ -97,7 +98,6 @@ int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
params->log_arity;
blocks_in_level[params->num_levels++] = blocks;
}
- params->level0_blocks = blocks_in_level[0];

/* Compute the starting block of each level */
offset = 0;
@@ -118,6 +118,7 @@ int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
}

params->tree_size = offset << log_blocksize;
+ params->tree_pages = PAGE_ALIGN(params->tree_size) >> PAGE_SHIFT;
return 0;

out_err:
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index de0d7aef785bf..4c57a1bd01afc 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -74,7 +74,7 @@ static inline int cmp_hashes(const struct fsverity_info *vi,
*/
static bool verify_page(struct inode *inode, const struct fsverity_info *vi,
struct ahash_request *req, struct page *data_page,
- unsigned long level0_ra_pages)
+ unsigned long max_ra_pages)
{
const struct merkle_tree_params *params = &vi->tree_params;
const unsigned int hsize = params->digest_size;
@@ -103,7 +103,8 @@ static bool verify_page(struct inode *inode, const struct fsverity_info *vi,
hash_at_level(params, index, level, &hindex, &hoffset);

hpage = inode->i_sb->s_vop->read_merkle_tree_page(inode, hindex,
- level == 0 ? level0_ra_pages : 0);
+ level == 0 ? min(max_ra_pages,
+ params->tree_pages - hindex) : 0);
if (IS_ERR(hpage)) {
err = PTR_ERR(hpage);
fsverity_err(inode,
@@ -199,14 +200,13 @@ void fsverity_verify_bio(struct bio *bio)
{
struct inode *inode = bio_first_page_all(bio)->mapping->host;
const struct fsverity_info *vi = inode->i_verity_info;
- const struct merkle_tree_params *params = &vi->tree_params;
struct ahash_request *req;
struct bio_vec *bv;
struct bvec_iter_all iter_all;
unsigned long max_ra_pages = 0;

/* This allocation never fails, since it's mempool-backed. */
- req = fsverity_alloc_hash_request(params->hash_alg, GFP_NOFS);
+ req = fsverity_alloc_hash_request(vi->tree_params.hash_alg, GFP_NOFS);

if (bio->bi_opf & REQ_RAHEAD) {
/*
@@ -218,24 +218,17 @@ void fsverity_verify_bio(struct bio *bio)
* This improves sequential read performance, as it greatly
* reduces the number of I/O requests made to the Merkle tree.
*/
- bio_for_each_segment_all(bv, bio, iter_all)
- max_ra_pages++;
- max_ra_pages /= 4;
+ max_ra_pages = bio->bi_iter.bi_size >> (PAGE_SHIFT + 2);
}

bio_for_each_segment_all(bv, bio, iter_all) {
- struct page *page = bv->bv_page;
- unsigned long level0_index = page->index >> params->log_arity;
- unsigned long level0_ra_pages =
- min(max_ra_pages, params->level0_blocks - level0_index);
-
- if (!verify_page(inode, vi, req, page, level0_ra_pages)) {
+ if (!verify_page(inode, vi, req, bv->bv_page, max_ra_pages)) {
bio->bi_status = BLK_STS_IOERR;
break;
}
}

- fsverity_free_hash_request(params->hash_alg, req);
+ fsverity_free_hash_request(vi->tree_params.hash_alg, req);
}
EXPORT_SYMBOL_GPL(fsverity_verify_bio);
#endif /* CONFIG_BLOCK */
--
2.39.0

2022-12-23 20:38:05

by Eric Biggers

Subject: [PATCH v2 04/11] fsverity: use EFBIG for file too large to enable verity

From: Eric Biggers <[email protected]>

Currently, there is an implementation limit where files can't have more
than 8 Merkle tree levels. With SHA-256 and 4K blocks, this limit is
never reached, since a file would need to be larger than 2**64 bytes to
need 9 levels. However, with SHA-512, 9 levels are needed for files
larger than about 1.15 EB, which is possible on btrfs. Therefore, this
limit technically became reachable when btrfs added fsverity support.
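
The arithmetic behind these figures can be checked with a small userspace sketch. It follows the same round-up-and-shift loop that appears in the fs/verity/open.c hunk below. The function name is hypothetical, and starting the loop from the number of data blocks is an assumption about the surrounding code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of Merkle tree levels needed to cover data_blocks data blocks when
 * each hash block holds (1 << log_arity) hashes.  Each iteration computes how
 * many hash blocks are needed to hold the hashes of the level below.
 */
unsigned int merkle_levels(uint64_t data_blocks, unsigned int log_arity)
{
    uint64_t arity = (uint64_t)1 << log_arity;
    uint64_t blocks = data_blocks;
    unsigned int levels = 0;

    assert(log_arity > 0 && log_arity < 32);

    while (blocks > 1) {
        blocks = (blocks + arity - 1) >> log_arity;
        levels++;
    }
    return levels;
}
```

With 4K blocks, SHA-512 gives 64 hashes per block (log_arity = 6). A file of exactly 2**60 bytes (about 1.15 EB) has 2**48 data blocks, and merkle_levels((uint64_t)1 << 48, 6) is 8. One more data block pushes it to 9. With SHA-256 (log_arity = 7), even a file with 2**52 data blocks, i.e. 2**64 bytes, still needs only 8 levels.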

Meanwhile, support for merkle_tree_block_size < PAGE_SIZE will introduce
another implementation limit on file size, resulting from the use of an
in-memory bitmap to track which Merkle tree blocks have been verified.

In any case, currently FS_IOC_ENABLE_VERITY fails with EINVAL when the
file is too large. This is undocumented, and also ambiguous since
EINVAL can mean other things too. Let's change the error code to EFBIG,
which is much clearer, and document it.
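
From the caller's side, the benefit is that "file too large" can be told apart from a bad argument. A minimal sketch of how a userspace tool might report the error codes, assuming it has already saved errno after a failed FS_IOC_ENABLE_VERITY call (the helper name is hypothetical, and the messages paraphrase the documentation list touched by this patch):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: turn the errno value from a failed
 * FS_IOC_ENABLE_VERITY ioctl into a message.  Only a few of the documented
 * codes are handled explicitly; anything else falls back to strerror().
 */
const char *describe_enable_verity_errno(int err)
{
    assert(err > 0);

    switch (err) {
    case EFBIG:
        return "the file is too large to enable verity on";
    case EEXIST:
        return "the file already has verity enabled";
    case EINVAL:
        return "unsupported or invalid argument";
    default:
        return strerror(err);
    }
}
```

Before this change, an oversized file would have landed in the EINVAL case and been indistinguishable from, say, an unsupported hash algorithm.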

Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/filesystems/fsverity.rst | 1 +
fs/verity/open.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
index cb8e7573882a1..66cdca30ff58b 100644
--- a/Documentation/filesystems/fsverity.rst
+++ b/Documentation/filesystems/fsverity.rst
@@ -161,6 +161,7 @@ FS_IOC_ENABLE_VERITY can fail with the following errors:
- ``EBUSY``: this ioctl is already running on the file
- ``EEXIST``: the file already has verity enabled
- ``EFAULT``: the caller provided inaccessible memory
+- ``EFBIG``: the file is too large to enable verity on
- ``EINTR``: the operation was interrupted by a fatal signal
- ``EINVAL``: unsupported version, hash algorithm, or block size; or
reserved bits are set; or the file descriptor refers to neither a
diff --git a/fs/verity/open.c b/fs/verity/open.c
index ca8de73e5a0b8..09512daa22db5 100644
--- a/fs/verity/open.c
+++ b/fs/verity/open.c
@@ -92,7 +92,7 @@ int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
 	while (blocks > 1) {
 		if (params->num_levels >= FS_VERITY_MAX_LEVELS) {
 			fsverity_err(inode, "Too many levels in Merkle tree");
-			err = -EINVAL;
+			err = -EFBIG;
 			goto out_err;
 		}
 		blocks = (blocks + params->hashes_per_block - 1) >>
--
2.39.0

2022-12-23 20:38:14

by Eric Biggers

Subject: [PATCH v2 09/11] f2fs: simplify f2fs_readpage_limit()

From: Eric Biggers <[email protected]>

Now that the implementation of FS_IOC_ENABLE_VERITY has changed to not
involve reading back Merkle tree blocks that were previously written,
there is no need for f2fs_readpage_limit() to allow for this case.

Signed-off-by: Eric Biggers <[email protected]>
---
fs/f2fs/data.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 6e43e19c7d1ca..6c403e22002de 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2053,8 +2053,7 @@ int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,

 static inline loff_t f2fs_readpage_limit(struct inode *inode)
 {
-	if (IS_ENABLED(CONFIG_FS_VERITY) &&
-	    (IS_VERITY(inode) || f2fs_verity_in_progress(inode)))
+	if (IS_ENABLED(CONFIG_FS_VERITY) && IS_VERITY(inode))
 		return inode->i_sb->s_maxbytes;
 
 	return i_size_read(inode);
--
2.39.0

2022-12-23 20:38:23

by Eric Biggers

Subject: [PATCH v2 06/11] fsverity: support verification with tree block size < PAGE_SIZE

From: Eric Biggers <[email protected]>

Add support for verifying data from verity files whose Merkle tree block
size is less than the page size. The main use case for this is to allow
a single Merkle tree block size to be used across all systems, so that
only one set of fsverity file digests and signatures is needed.

To do this, eliminate various assumptions that the Merkle tree block
size and the page size are the same:

- Make fsverity_verify_page() a wrapper around a new function
fsverity_verify_blocks() which verifies one or more blocks in a page.

- When a Merkle tree block is needed, get the corresponding page and
only verify and use the needed portion. (The Merkle tree continues to
be read and cached in page-sized chunks; that doesn't need to change.)

- When the Merkle tree block size and page size differ, use a bitmap
fsverity_info::hash_block_verified to keep track of which Merkle tree
blocks have been verified, as PageChecked cannot be used directly.
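
The bitmap scheme in the last item can be sketched in userspace as follows. This is a simplified, single-threaded model: the struct and function names are hypothetical, a plain bool stands in for PG_checked, and the spinlock and memory barriers used by the real is_hash_block_verified() in the diff below are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Stand-in for a cached Merkle tree page; 'checked' models PG_checked. */
struct sketch_page {
    bool checked;
};

/* Stand-in for the per-file state: one "verified" bit per hash block. */
struct sketch_info {
    unsigned long *verified;
    unsigned int blocks_per_page;
};

bool sketch_info_init(struct sketch_info *si, unsigned long tree_pages,
                      unsigned int blocks_per_page)
{
    unsigned long num_bits = tree_pages * blocks_per_page;

    assert(blocks_per_page > 0);
    si->blocks_per_page = blocks_per_page;
    si->verified = calloc((num_bits + BITS_PER_WORD - 1) / BITS_PER_WORD,
                          sizeof(unsigned long));
    return si->verified != NULL;
}

void mark_block_verified(struct sketch_info *si, unsigned long hblock_idx)
{
    si->verified[hblock_idx / BITS_PER_WORD] |=
        1UL << (hblock_idx % BITS_PER_WORD);
}

/*
 * If the page is newly instantiated (checked == false), any bits left over
 * from a previous, evicted copy of the page are stale: clear all bits for the
 * page's blocks, mark the page as seen, and report "not verified".
 * Otherwise the bitmap can be trusted.
 */
bool block_is_verified(struct sketch_info *si, struct sketch_page *page,
                       unsigned long hblock_idx)
{
    if (!page->checked) {
        unsigned long first = hblock_idx - hblock_idx % si->blocks_per_page;
        unsigned int i;

        for (i = 0; i < si->blocks_per_page; i++)
            si->verified[(first + i) / BITS_PER_WORD] &=
                ~(1UL << ((first + i) % BITS_PER_WORD));
        page->checked = true;
        return false;
    }
    return (si->verified[hblock_idx / BITS_PER_WORD] >>
            (hblock_idx % BITS_PER_WORD)) & 1UL;
}
```

The point of the page flag in this model is that a bitmap alone cannot notice eviction: resetting page->checked to false (a fresh page) makes the next lookup discard the old bits, so the block gets verified again.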

Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/filesystems/fsverity.rst | 49 ++--
fs/verity/fsverity_private.h | 5 +-
fs/verity/open.c | 80 ++++++-
fs/verity/verify.c | 309 ++++++++++++++++++-------
include/linux/fsverity.h | 11 +-
5 files changed, 328 insertions(+), 126 deletions(-)

diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
index 66cdca30ff58b..0b26134ebff73 100644
--- a/Documentation/filesystems/fsverity.rst
+++ b/Documentation/filesystems/fsverity.rst
@@ -572,47 +572,44 @@ For filesystems using Linux's pagecache, the ``->read_folio()`` and
are marked Uptodate. Merely hooking ``->read_iter()`` would be
insufficient, since ``->read_iter()`` is not used for memory maps.

-Therefore, fs/verity/ provides a function fsverity_verify_page() which
-verifies a page that has been read into the pagecache of a verity
-inode, but is still locked and not Uptodate, so it's not yet readable
-by userspace. As needed to do the verification,
-fsverity_verify_page() will call back into the filesystem to read
-Merkle tree pages via fsverity_operations::read_merkle_tree_page().
-
-fsverity_verify_page() returns false if verification failed; in this
+Therefore, fs/verity/ provides the function fsverity_verify_blocks()
+which verifies data that has been read into the pagecache of a verity
+inode. The containing page must still be locked and not Uptodate, so
+it's not yet readable by userspace. As needed to do the verification,
+fsverity_verify_blocks() will call back into the filesystem to read
+hash blocks via fsverity_operations::read_merkle_tree_page().
+
+fsverity_verify_blocks() returns false if verification failed; in this
case, the filesystem must not set the page Uptodate. Following this,
as per the usual Linux pagecache behavior, attempts by userspace to
read() from the part of the file containing the page will fail with
EIO, and accesses to the page within a memory map will raise SIGBUS.

-fsverity_verify_page() currently only supports the case where the
-Merkle tree block size is equal to PAGE_SIZE (often 4096 bytes).
-
-In principle, fsverity_verify_page() verifies the entire path in the
-Merkle tree from the data page to the root hash. However, for
-efficiency the filesystem may cache the hash pages. Therefore,
-fsverity_verify_page() only ascends the tree reading hash pages until
-an already-verified hash page is seen, as indicated by the PageChecked
-bit being set. It then verifies the path to that page.
+In principle, verifying a data block requires verifying the entire
+path in the Merkle tree from the data block to the root hash.
+However, for efficiency the filesystem may cache the hash blocks.
+Therefore, fsverity_verify_blocks() only ascends the tree reading hash
+blocks until an already-verified hash block is seen. It then verifies
+the path to that block.

This optimization, which is also used by dm-verity, results in
excellent sequential read performance. This is because usually (e.g.
-127 in 128 times for 4K blocks and SHA-256) the hash page from the
+127 in 128 times for 4K blocks and SHA-256) the hash block from the
bottom level of the tree will already be cached and checked from
-reading a previous data page. However, random reads perform worse.
+reading a previous data block. However, random reads perform worse.

Block device based filesystems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Block device based filesystems (e.g. ext4 and f2fs) in Linux also use
the pagecache, so the above subsection applies too. However, they
-also usually read many pages from a file at once, grouped into a
+also usually read many data blocks from a file at once, grouped into a
structure called a "bio". To make it easier for these types of
filesystems to support fs-verity, fs/verity/ also provides a function
-fsverity_verify_bio() which verifies all pages in a bio.
+fsverity_verify_bio() which verifies all data blocks in a bio.

ext4 and f2fs also support encryption. If a verity file is also
-encrypted, the pages must be decrypted before being verified. To
+encrypted, the data must be decrypted before being verified. To
support this, these filesystems allocate a "post-read context" for
each bio and store it in ``->bi_private``::

@@ -631,10 +628,10 @@ verification. Finally, pages where no decryption or verity error
occurred are marked Uptodate, and the pages are unlocked.

On many filesystems, files can contain holes. Normally,
-``->readahead()`` simply zeroes holes and sets the corresponding pages
-Uptodate; no bios are issued. To prevent this case from bypassing
-fs-verity, these filesystems use fsverity_verify_page() to verify hole
-pages.
+``->readahead()`` simply zeroes hole blocks and considers the
+corresponding data to be up-to-date; no bios are issued. To prevent
+this case from bypassing fs-verity, filesystems use
+fsverity_verify_blocks() to verify hole blocks.

Filesystems also disable direct I/O on verity files, since otherwise
direct I/O would bypass fs-verity.
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index 23ded939d649f..d34dcc033d723 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -42,9 +42,11 @@ struct merkle_tree_params {
unsigned int digest_size; /* same as hash_alg->digest_size */
unsigned int block_size; /* size of data and tree blocks */
unsigned int hashes_per_block; /* number of hashes per tree block */
+ unsigned int blocks_per_page; /* PAGE_SIZE / block_size */
u8 log_digestsize; /* log2(digest_size) */
u8 log_blocksize; /* log2(block_size) */
u8 log_arity; /* log2(hashes_per_block) */
+ u8 log_blocks_per_page; /* log2(blocks_per_page) */
unsigned int num_levels; /* number of levels in Merkle tree */
u64 tree_size; /* Merkle tree size in bytes */
unsigned long tree_pages; /* Merkle tree size in pages */
@@ -70,9 +72,10 @@ struct fsverity_info {
u8 root_hash[FS_VERITY_MAX_DIGEST_SIZE];
u8 file_digest[FS_VERITY_MAX_DIGEST_SIZE];
const struct inode *inode;
+ unsigned long *hash_block_verified;
+ spinlock_t hash_page_init_lock;
};

-
#define FS_VERITY_MAX_SIGNATURE_SIZE (FS_VERITY_MAX_DESCRIPTOR_SIZE - \
sizeof(struct fsverity_descriptor))

diff --git a/fs/verity/open.c b/fs/verity/open.c
index 09512daa22db5..9366b441d01ca 100644
--- a/fs/verity/open.c
+++ b/fs/verity/open.c
@@ -56,7 +56,23 @@ int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
goto out_err;
}

- if (log_blocksize != PAGE_SHIFT) {
+ /*
+ * fs/verity/ directly assumes that the Merkle tree block size is a
+ * power of 2 less than or equal to PAGE_SIZE. Another restriction
+ * arises from the interaction between fs/verity/ and the filesystems
+ * themselves: filesystems expect to be able to verify a single
+ * filesystem block of data at a time. Therefore, the Merkle tree block
+ * size must also be less than or equal to the filesystem block size.
+ *
+ * The above are the only hard limitations, so in theory the Merkle tree
+ * block size could be as small as twice the digest size. However,
+ * that's not useful, and it would result in some unusually deep and
+ * large Merkle trees. So we currently require that the Merkle tree
+ * block size be at least 1024 bytes. That's small enough to test the
+ * sub-page block case on systems with 4K pages, but not too small.
+ */
+ if (log_blocksize < 10 || log_blocksize > PAGE_SHIFT ||
+ log_blocksize > inode->i_blkbits) {
fsverity_warn(inode, "Unsupported log_blocksize: %u",
log_blocksize);
err = -EINVAL;
@@ -64,6 +80,8 @@ int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
}
params->log_blocksize = log_blocksize;
params->block_size = 1 << log_blocksize;
+ params->log_blocks_per_page = PAGE_SHIFT - log_blocksize;
+ params->blocks_per_page = 1 << params->log_blocks_per_page;

if (WARN_ON(!is_power_of_2(params->digest_size))) {
err = -EINVAL;
@@ -108,11 +126,19 @@ int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
}

/*
- * Since the data, and thus also the Merkle tree, cannot have more than
- * ULONG_MAX pages, hash block indices can always fit in an
- * 'unsigned long'. To be safe, explicitly check for it too.
+ * With block_size != PAGE_SIZE, an in-memory bitmap will need to be
+ * allocated to track the "verified" status of hash blocks. Don't allow
+ * this bitmap to get too large. For now, limit it to 1 MiB, which
+ * limits the file size to about 4.4 TB with SHA-256 and 4K blocks.
+ *
+ * Together with the fact that the data, and thus also the Merkle tree,
+ * cannot have more than ULONG_MAX pages, this implies that hash block
+ * indices can always fit in an 'unsigned long'. But to be safe, we
+ * explicitly check for that too. Note, this is only for hash block
+ * indices; data block indices might not fit in an 'unsigned long'.
*/
- if (offset > ULONG_MAX) {
+ if ((params->block_size != PAGE_SIZE && offset > 1 << 23) ||
+ offset > ULONG_MAX) {
fsverity_err(inode, "Too many blocks in Merkle tree");
err = -EFBIG;
goto out_err;
@@ -170,7 +196,7 @@ struct fsverity_info *fsverity_create_info(const struct inode *inode,
fsverity_err(inode,
"Error %d initializing Merkle tree parameters",
err);
- goto out;
+ goto fail;
}

memcpy(vi->root_hash, desc->root_hash, vi->tree_params.digest_size);
@@ -179,17 +205,48 @@ struct fsverity_info *fsverity_create_info(const struct inode *inode,
vi->file_digest);
if (err) {
fsverity_err(inode, "Error %d computing file digest", err);
- goto out;
+ goto fail;
}

err = fsverity_verify_signature(vi, desc->signature,
le32_to_cpu(desc->sig_size));
-out:
- if (err) {
- fsverity_free_info(vi);
- vi = ERR_PTR(err);
+ if (err)
+ goto fail;
+
+ if (vi->tree_params.block_size != PAGE_SIZE) {
+ /*
+ * When the Merkle tree block size and page size differ, we use
+ * a bitmap to keep track of which hash blocks have been
+ * verified. This bitmap must contain one bit per hash block,
+ * including alignment to a page boundary at the end.
+ *
+ * Eventually, to support extremely large files in an efficient
+ * way, it might be necessary to make pages of this bitmap
+ * reclaimable. But for now, simply allocating the whole bitmap
+ * is a simple solution that works well on the files on which
+ * fsverity is realistically used. E.g., with SHA-256 and 4K
+ * blocks, a 100MB file only needs a 24-byte bitmap, and the
+ * bitmap for any file under 17GB fits in a 4K page.
+ */
+ unsigned long num_bits =
+ vi->tree_params.tree_pages <<
+ vi->tree_params.log_blocks_per_page;
+
+ vi->hash_block_verified = kvcalloc(BITS_TO_LONGS(num_bits),
+ sizeof(unsigned long),
+ GFP_KERNEL);
+ if (!vi->hash_block_verified) {
+ err = -ENOMEM;
+ goto fail;
+ }
+ spin_lock_init(&vi->hash_page_init_lock);
}
+
return vi;
+
+fail:
+ fsverity_free_info(vi);
+ return ERR_PTR(err);
}

void fsverity_set_info(struct inode *inode, struct fsverity_info *vi)
@@ -216,6 +273,7 @@ void fsverity_free_info(struct fsverity_info *vi)
if (!vi)
return;
kfree(vi->tree_params.hashstate);
+ kvfree(vi->hash_block_verified);
kmem_cache_free(fsverity_info_cachep, vi);
}

diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 44df06ddcc603..e59ef9d0e21cf 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -12,35 +12,9 @@

static struct workqueue_struct *fsverity_read_workqueue;

-/**
- * hash_at_level() - compute the location of the block's hash at the given level
- *
- * @params: (in) the Merkle tree parameters
- * @dindex: (in) the index of the data block being verified
- * @level: (in) the level of hash we want (0 is leaf level)
- * @hindex: (out) the index of the hash block containing the wanted hash
- * @hoffset: (out) the byte offset to the wanted hash within the hash block
- */
-static void hash_at_level(const struct merkle_tree_params *params,
- pgoff_t dindex, unsigned int level, pgoff_t *hindex,
- unsigned int *hoffset)
-{
- pgoff_t position;
-
- /* Offset of the hash within the level's region, in hashes */
- position = dindex >> (level * params->log_arity);
-
- /* Index of the hash block in the tree overall */
- *hindex = params->level_start[level] + (position >> params->log_arity);
-
- /* Offset of the wanted hash (in bytes) within the hash block */
- *hoffset = (position & ((1 << params->log_arity) - 1)) <<
- params->log_digestsize;
-}
-
static inline int cmp_hashes(const struct fsverity_info *vi,
const u8 *want_hash, const u8 *real_hash,
- pgoff_t index, int level)
+ u64 data_pos, int level)
{
const unsigned int hsize = vi->tree_params.digest_size;

@@ -48,148 +22,310 @@ static inline int cmp_hashes(const struct fsverity_info *vi,
return 0;

fsverity_err(vi->inode,
- "FILE CORRUPTED! index=%lu, level=%d, want_hash=%s:%*phN, real_hash=%s:%*phN",
- index, level,
+ "FILE CORRUPTED! pos=%llu, level=%d, want_hash=%s:%*phN, real_hash=%s:%*phN",
+ data_pos, level,
vi->tree_params.hash_alg->name, hsize, want_hash,
vi->tree_params.hash_alg->name, hsize, real_hash);
return -EBADMSG;
}

+static bool data_is_zeroed(struct inode *inode, struct page *page,
+ unsigned int len, unsigned int offset)
+{
+ void *virt = kmap_local_page(page);
+
+ if (memchr_inv(virt + offset, 0, len)) {
+ kunmap_local(virt);
+ fsverity_err(inode,
+ "FILE CORRUPTED! Data past EOF is not zeroed");
+ return false;
+ }
+ kunmap_local(virt);
+ return true;
+}
+
+/*
+ * Returns true if the hash block with index @hblock_idx in the tree, located in
+ * @hpage, has already been verified.
+ */
+static bool is_hash_block_verified(struct fsverity_info *vi, struct page *hpage,
+ unsigned long hblock_idx)
+{
+ bool verified;
+ unsigned int blocks_per_page;
+ unsigned int i;
+
+ /*
+ * When the Merkle tree block size and page size are the same, then the
+ * ->hash_block_verified bitmap isn't allocated, and we use PG_checked
+ * to directly indicate whether the page's block has been verified.
+ *
+ * Using PG_checked also guarantees that we re-verify hash pages that
+ * get evicted and re-instantiated from the backing storage, as new
+ * pages always start out with PG_checked cleared.
+ */
+ if (!vi->hash_block_verified)
+ return PageChecked(hpage);
+
+ /*
+ * When the Merkle tree block size and page size differ, we use a bitmap
+ * to indicate whether each hash block has been verified.
+ *
+ * However, we still need to ensure that hash pages that get evicted and
+ * re-instantiated from the backing storage are re-verified. To do
+ * this, we use PG_checked again, but now it doesn't really mean
+ * "checked". Instead, now it just serves as an indicator for whether
+ * the hash page is newly instantiated or not.
+ *
+ * The first thread that sees PG_checked=0 must clear the corresponding
+ * bitmap bits, then set PG_checked=1. This requires a spinlock. To
+ * avoid having to take this spinlock in the common case of
+ * PG_checked=1, we start with an opportunistic lockless read.
+ */
+ if (PageChecked(hpage)) {
+ /*
+ * A read memory barrier is needed here to give ACQUIRE
+ * semantics to the above PageChecked() test.
+ */
+ smp_rmb();
+ return test_bit(hblock_idx, vi->hash_block_verified);
+ }
+ spin_lock(&vi->hash_page_init_lock);
+ if (PageChecked(hpage)) {
+ verified = test_bit(hblock_idx, vi->hash_block_verified);
+ } else {
+ blocks_per_page = vi->tree_params.blocks_per_page;
+ hblock_idx = round_down(hblock_idx, blocks_per_page);
+ for (i = 0; i < blocks_per_page; i++)
+ clear_bit(hblock_idx + i, vi->hash_block_verified);
+ /*
+ * A write memory barrier is needed here to give RELEASE
+ * semantics to the below SetPageChecked() operation.
+ */
+ smp_wmb();
+ SetPageChecked(hpage);
+ verified = false;
+ }
+ spin_unlock(&vi->hash_page_init_lock);
+ return verified;
+}
+
/*
- * Verify a single data page against the file's Merkle tree.
+ * Verify a single data block against the file's Merkle tree.
*
* In principle, we need to verify the entire path to the root node. However,
- * for efficiency the filesystem may cache the hash pages. Therefore we need
- * only ascend the tree until an already-verified page is seen, as indicated by
- * the PageChecked bit being set; then verify the path to that page.
+ * for efficiency the filesystem may cache the hash blocks. Therefore we need
+ * only ascend the tree until an already-verified hash block is seen, and then
+ * verify the path to that block.
*
- * This code currently only supports the case where the verity block size is
- * equal to PAGE_SIZE. Doing otherwise would be possible but tricky, since we
- * wouldn't be able to use the PageChecked bit.
- *
- * Note that multiple processes may race to verify a hash page and mark it
- * Checked, but it doesn't matter; the result will be the same either way.
- *
- * Return: true if the page is valid, else false.
+ * Return: %true if the data block is valid, else %false.
*/
-static bool verify_page(struct inode *inode, const struct fsverity_info *vi,
- struct ahash_request *req, struct page *data_page,
- unsigned long max_ra_pages)
+static bool
+verify_data_block(struct inode *inode, struct fsverity_info *vi,
+ struct ahash_request *req, struct page *data_page,
+ u64 data_pos, unsigned int dblock_offset_in_page,
+ unsigned long max_ra_pages)
{
const struct merkle_tree_params *params = &vi->tree_params;
const unsigned int hsize = params->digest_size;
- const pgoff_t index = data_page->index;
int level;
u8 _want_hash[FS_VERITY_MAX_DIGEST_SIZE];
const u8 *want_hash;
u8 real_hash[FS_VERITY_MAX_DIGEST_SIZE];
- struct page *hpages[FS_VERITY_MAX_LEVELS];
- unsigned int hoffsets[FS_VERITY_MAX_LEVELS];
+ /* The hash blocks that are traversed, indexed by level */
+ struct {
+ /* Page containing the hash block */
+ struct page *page;
+ /* Index of the hash block in the tree overall */
+ unsigned long index;
+ /* Byte offset of the hash block within @page */
+ unsigned int offset_in_page;
+ /* Byte offset of the wanted hash within @page */
+ unsigned int hoffset;
+ } hblocks[FS_VERITY_MAX_LEVELS];
+ /*
+ * The index of the previous level's block within that level; also the
+ * index of that block's hash within the current level.
+ */
+ u64 hidx = data_pos >> params->log_blocksize;
int err;

- if (WARN_ON_ONCE(!PageLocked(data_page) || PageUptodate(data_page)))
- return false;
+ if (unlikely(data_pos >= inode->i_size)) {
+ /*
+ * This can happen in the data page spanning EOF when the Merkle
+ * tree block size is less than the page size. The Merkle tree
+ * doesn't cover data blocks fully past EOF. But the entire
+ * page spanning EOF can be visible to userspace via a mmap, and
+ * any part past EOF should be all zeroes. Therefore, we need
+ * to verify that any data blocks fully past EOF are all zeroes.
+ */
+ return data_is_zeroed(inode, data_page, params->block_size,
+ dblock_offset_in_page);
+ }

/*
- * Starting at the leaf level, ascend the tree saving hash pages along
- * the way until we find a verified hash page, indicated by PageChecked;
- * or until we reach the root.
+ * Starting at the leaf level, ascend the tree saving hash blocks along
+ * the way until we find a hash block that has already been verified, or
+ * until we reach the root.
*/
for (level = 0; level < params->num_levels; level++) {
- pgoff_t hindex;
+ unsigned long next_hidx;
+ unsigned long hblock_idx;
+ pgoff_t hpage_idx;
+ unsigned int hblock_offset_in_page;
unsigned int hoffset;
struct page *hpage;

- hash_at_level(params, index, level, &hindex, &hoffset);
+ /*
+ * The index of the block in the current level; also the index
+ * of that block's hash within the next level.
+ */
+ next_hidx = hidx >> params->log_arity;
+
+ /* Index of the hash block in the tree overall */
+ hblock_idx = params->level_start[level] + next_hidx;
+
+ /* Index of the hash page in the tree overall */
+ hpage_idx = hblock_idx >> params->log_blocks_per_page;
+
+ /* Byte offset of the hash block within the page */
+ hblock_offset_in_page =
+ (hblock_idx << params->log_blocksize) & ~PAGE_MASK;
+
+ /* Byte offset of the hash within the page */
+ hoffset = hblock_offset_in_page +
+ ((hidx << params->log_digestsize) &
+ (params->block_size - 1));

- hpage = inode->i_sb->s_vop->read_merkle_tree_page(inode, hindex,
- level == 0 ? min(max_ra_pages,
- params->tree_pages - hindex) : 0);
+ hpage = inode->i_sb->s_vop->read_merkle_tree_page(inode,
+ hpage_idx, level == 0 ? min(max_ra_pages,
+ params->tree_pages - hpage_idx) : 0);
if (IS_ERR(hpage)) {
err = PTR_ERR(hpage);
fsverity_err(inode,
"Error %d reading Merkle tree page %lu",
- err, hindex);
+ err, hpage_idx);
goto out;
}
-
- if (PageChecked(hpage)) {
+ if (is_hash_block_verified(vi, hpage, hblock_idx)) {
memcpy_from_page(_want_hash, hpage, hoffset, hsize);
want_hash = _want_hash;
put_page(hpage);
goto descend;
}
- hpages[level] = hpage;
- hoffsets[level] = hoffset;
+ hblocks[level].page = hpage;
+ hblocks[level].index = hblock_idx;
+ hblocks[level].offset_in_page = hblock_offset_in_page;
+ hblocks[level].hoffset = hoffset;
+ hidx = next_hidx;
}

want_hash = vi->root_hash;
descend:
/* Descend the tree verifying hash blocks. */
for (; level > 0; level--) {
- struct page *hpage = hpages[level - 1];
- unsigned int hoffset = hoffsets[level - 1];
-
- err = fsverity_hash_block(params, inode, req, hpage, 0,
- real_hash);
+ struct page *hpage = hblocks[level - 1].page;
+ unsigned long hblock_idx = hblocks[level - 1].index;
+ unsigned int hblock_offset_in_page =
+ hblocks[level - 1].offset_in_page;
+ unsigned int hoffset = hblocks[level - 1].hoffset;
+
+ err = fsverity_hash_block(params, inode, req, hpage,
+ hblock_offset_in_page, real_hash);
if (err)
goto out;
- err = cmp_hashes(vi, want_hash, real_hash, index, level - 1);
+ err = cmp_hashes(vi, want_hash, real_hash, data_pos, level - 1);
if (err)
goto out;
- SetPageChecked(hpage);
+ /*
+ * Mark the hash block as verified. This must be atomic and
+ * idempotent, as the same hash block might be verified by
+ * multiple threads concurrently.
+ */
+ if (vi->hash_block_verified)
+ set_bit(hblock_idx, vi->hash_block_verified);
+ else
+ SetPageChecked(hpage);
memcpy_from_page(_want_hash, hpage, hoffset, hsize);
want_hash = _want_hash;
put_page(hpage);
}

/* Finally, verify the data block. */
- err = fsverity_hash_block(params, inode, req, data_page, 0, real_hash);
+ err = fsverity_hash_block(params, inode, req, data_page,
+ dblock_offset_in_page, real_hash);
if (err)
goto out;
- err = cmp_hashes(vi, want_hash, real_hash, index, -1);
+ err = cmp_hashes(vi, want_hash, real_hash, data_pos, -1);
out:
for (; level > 0; level--)
- put_page(hpages[level - 1]);
+ put_page(hblocks[level - 1].page);

return err == 0;
}

+static bool
+verify_data_blocks(struct inode *inode, struct fsverity_info *vi,
+ struct ahash_request *req, struct page *data_page,
+ unsigned int len, unsigned int offset,
+ unsigned long max_ra_pages)
+{
+ const unsigned int block_size = vi->tree_params.block_size;
+ u64 pos = (u64)data_page->index << PAGE_SHIFT;
+
+ if (WARN_ON_ONCE(len <= 0 || !IS_ALIGNED(len | offset, block_size)))
+ return false;
+ if (WARN_ON_ONCE(!PageLocked(data_page) || PageUptodate(data_page)))
+ return false;
+ do {
+ if (!verify_data_block(inode, vi, req, data_page,
+ pos + offset, offset, max_ra_pages))
+ return false;
+ offset += block_size;
+ len -= block_size;
+ } while (len);
+ return true;
+}
+
/**
- * fsverity_verify_page() - verify a data page
- * @page: the page to verity
+ * fsverity_verify_blocks() - verify data in a page
+ * @page: the page containing the data to verify
+ * @len: the length of the data to verify in the page
+ * @offset: the offset of the data to verify in the page
*
- * Verify a page that has just been read from a verity file. The page must be a
- * pagecache page that is still locked and not yet uptodate.
+ * Verify data that has just been read from a verity file. The data must be
+ * located in a pagecache page that is still locked and not yet uptodate. The
+ * length and offset of the data must be Merkle tree block size aligned.
*
- * Return: true if the page is valid, else false.
+ * Return: %true if the data is valid, else %false.
*/
-bool fsverity_verify_page(struct page *page)
+bool fsverity_verify_blocks(struct page *page, unsigned int len,
+ unsigned int offset)
{
struct inode *inode = page->mapping->host;
- const struct fsverity_info *vi = inode->i_verity_info;
+ struct fsverity_info *vi = inode->i_verity_info;
struct ahash_request *req;
bool valid;

/* This allocation never fails, since it's mempool-backed. */
req = fsverity_alloc_hash_request(vi->tree_params.hash_alg, GFP_NOFS);

- valid = verify_page(inode, vi, req, page, 0);
+ valid = verify_data_blocks(inode, vi, req, page, len, offset, 0);

fsverity_free_hash_request(vi->tree_params.hash_alg, req);

return valid;
}
-EXPORT_SYMBOL_GPL(fsverity_verify_page);
+EXPORT_SYMBOL_GPL(fsverity_verify_blocks);

#ifdef CONFIG_BLOCK
/**
* fsverity_verify_bio() - verify a 'read' bio that has just completed
* @bio: the bio to verify
*
- * Verify a set of pages that have just been read from a verity file. The pages
- * must be pagecache pages that are still locked and not yet uptodate. If a
- * page fails verification, then bio->bi_status is set to an error status.
+ * Verify the bio's data against the file's Merkle tree. All bio data segments
+ * must be aligned to the file's Merkle tree block size. If any data fails
+ * verification, then bio->bi_status is set to an error status.
*
* This is a helper function for use by the ->readahead() method of filesystems
* that issue bios to read data directly into the page cache. Filesystems that
@@ -200,7 +336,7 @@ EXPORT_SYMBOL_GPL(fsverity_verify_page);
void fsverity_verify_bio(struct bio *bio)
{
struct inode *inode = bio_first_page_all(bio)->mapping->host;
- const struct fsverity_info *vi = inode->i_verity_info;
+ struct fsverity_info *vi = inode->i_verity_info;
struct ahash_request *req;
struct bio_vec *bv;
struct bvec_iter_all iter_all;
@@ -223,7 +359,8 @@ void fsverity_verify_bio(struct bio *bio)
}

bio_for_each_segment_all(bv, bio, iter_all) {
- if (!verify_page(inode, vi, req, bv->bv_page, max_ra_pages)) {
+ if (!verify_data_blocks(inode, vi, req, bv->bv_page, bv->bv_len,
+ bv->bv_offset, max_ra_pages)) {
bio->bi_status = BLK_STS_IOERR;
break;
}
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index f5ed7ecfd9ab2..6ecc51f80221a 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -170,7 +170,8 @@ int fsverity_ioctl_read_metadata(struct file *filp, const void __user *uarg);

/* verify.c */

-bool fsverity_verify_page(struct page *page);
+bool fsverity_verify_blocks(struct page *page, unsigned int len,
+ unsigned int offset);
void fsverity_verify_bio(struct bio *bio);
void fsverity_enqueue_verify_work(struct work_struct *work);

@@ -230,7 +231,8 @@ static inline int fsverity_ioctl_read_metadata(struct file *filp,

/* verify.c */

-static inline bool fsverity_verify_page(struct page *page)
+static inline bool fsverity_verify_blocks(struct page *page, unsigned int len,
+ unsigned int offset)
{
WARN_ON(1);
return false;
@@ -248,6 +250,11 @@ static inline void fsverity_enqueue_verify_work(struct work_struct *work)

#endif /* !CONFIG_FS_VERITY */

+static inline bool fsverity_verify_page(struct page *page)
+{
+ return fsverity_verify_blocks(page, PAGE_SIZE, 0);
+}
+
/**
* fsverity_active() - do reads from the inode need to go through fs-verity?
* @inode: inode to check
--
2.39.0
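
The per-level index arithmetic in verify_data_block() above can be tried out in userspace with the sketch below. It is an illustration under stated assumptions, not the kernel code: struct toy_params, struct toy_hash_loc, locate_hash(), and the fixed 4K TOY_PAGE_SHIFT are hypothetical names introduced here, and the fields only mirror the merkle_tree_params members that the diff uses.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12			/* assumption: 4K pages */
#define TOY_PAGE_SIZE  (1u << TOY_PAGE_SHIFT)
#define TOY_MAX_LEVELS 8			/* hypothetical limit for this sketch */

/* Hypothetical stand-in for the fields of struct merkle_tree_params used here */
struct toy_params {
	unsigned int log_blocksize;		/* log2(Merkle tree block size) */
	unsigned int log_digestsize;		/* log2(digest size) */
	unsigned int log_arity;			/* log_blocksize - log_digestsize */
	unsigned int log_blocks_per_page;	/* TOY_PAGE_SHIFT - log_blocksize */
	uint64_t level_start[TOY_MAX_LEVELS];	/* first block index of each level */
};

/* Where the wanted hash lives, for one level of the ascent */
struct toy_hash_loc {
	uint64_t next_hidx;			/* index of the hash block within its level */
	uint64_t hblock_idx;			/* index of the hash block in the tree overall */
	uint64_t hpage_idx;			/* index of the page holding that block */
	unsigned int hblock_offset_in_page;	/* byte offset of the block in the page */
	unsigned int hoffset;			/* byte offset of the wanted hash in the page */
};

/*
 * @hidx is the index of the previous level's block within that level
 * (for level 0, the index of the data block).
 */
static struct toy_hash_loc locate_hash(const struct toy_params *p,
				       unsigned int level, uint64_t hidx)
{
	const unsigned int block_size = 1u << p->log_blocksize;
	struct toy_hash_loc loc;

	assert(level < TOY_MAX_LEVELS);
	loc.next_hidx = hidx >> p->log_arity;
	loc.hblock_idx = p->level_start[level] + loc.next_hidx;
	loc.hpage_idx = loc.hblock_idx >> p->log_blocks_per_page;
	loc.hblock_offset_in_page = (unsigned int)
		((loc.hblock_idx << p->log_blocksize) & (TOY_PAGE_SIZE - 1));
	loc.hoffset = loc.hblock_offset_in_page + (unsigned int)
		((hidx << p->log_digestsize) & (block_size - 1));
	return loc;
}
```

For example, with 1K tree blocks, 32-byte digests, and a two-level tree whose root block is stored first (level_start = { 1, 0 }), data block 70 has its hash in tree block 3, which lies in page 0 at byte offset 3072; the hash itself is at byte offset 3264 within that page.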

2022-12-23 20:38:49

by Eric Biggers

Subject: [PATCH v2 05/11] fsverity: replace fsverity_hash_page() with fsverity_hash_block()

From: Eric Biggers <[email protected]>

In preparation for allowing the Merkle tree block size to differ from
PAGE_SIZE, replace fsverity_hash_page() with fsverity_hash_block(). The
new function is similar to the old one, but it operates on the block at
the given offset in the page instead of on the full page.

(For now, all callers still pass a full page.)

Signed-off-by: Eric Biggers <[email protected]>
---
fs/verity/enable.c | 4 ++--
fs/verity/fsverity_private.h | 6 +++---
fs/verity/hash_algs.c | 24 +++++++++++-------------
fs/verity/verify.c | 9 +++++----
4 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/fs/verity/enable.c b/fs/verity/enable.c
index 8a9189d479837..144483319f1a3 100644
--- a/fs/verity/enable.c
+++ b/fs/verity/enable.c
@@ -99,8 +99,8 @@ static int build_merkle_tree_level(struct file *filp, unsigned int level,
}
}

- err = fsverity_hash_page(params, inode, req, src_page,
- &pending_hashes[pending_size]);
+ err = fsverity_hash_block(params, inode, req, src_page, 0,
+ &pending_hashes[pending_size]);
put_page(src_page);
if (err)
return err;
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index fc1c2797fab19..23ded939d649f 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -88,9 +88,9 @@ void fsverity_free_hash_request(struct fsverity_hash_alg *alg,
struct ahash_request *req);
const u8 *fsverity_prepare_hash_state(struct fsverity_hash_alg *alg,
const u8 *salt, size_t salt_size);
-int fsverity_hash_page(const struct merkle_tree_params *params,
- const struct inode *inode,
- struct ahash_request *req, struct page *page, u8 *out);
+int fsverity_hash_block(const struct merkle_tree_params *params,
+ const struct inode *inode, struct ahash_request *req,
+ struct page *page, unsigned int offset, u8 *out);
int fsverity_hash_buffer(struct fsverity_hash_alg *alg,
const void *data, size_t size, u8 *out);
void __init fsverity_check_hash_algs(void);
diff --git a/fs/verity/hash_algs.c b/fs/verity/hash_algs.c
index 6f8170cf4ae71..13fcf31be8441 100644
--- a/fs/verity/hash_algs.c
+++ b/fs/verity/hash_algs.c
@@ -220,35 +220,33 @@ const u8 *fsverity_prepare_hash_state(struct fsverity_hash_alg *alg,
}

/**
- * fsverity_hash_page() - hash a single data or hash page
+ * fsverity_hash_block() - hash a single data or hash block
* @params: the Merkle tree's parameters
* @inode: inode for which the hashing is being done
* @req: preallocated hash request
- * @page: the page to hash
+ * @page: the page containing the block to hash
+ * @offset: the offset of the block within @page
* @out: output digest, size 'params->digest_size' bytes
*
- * Hash a single data or hash block, assuming block_size == PAGE_SIZE.
- * The hash is salted if a salt is specified in the Merkle tree parameters.
+ * Hash a single data or hash block. The hash is salted if a salt is specified
+ * in the Merkle tree parameters.
*
* Return: 0 on success, -errno on failure
*/
-int fsverity_hash_page(const struct merkle_tree_params *params,
- const struct inode *inode,
- struct ahash_request *req, struct page *page, u8 *out)
+int fsverity_hash_block(const struct merkle_tree_params *params,
+ const struct inode *inode, struct ahash_request *req,
+ struct page *page, unsigned int offset, u8 *out)
{
struct scatterlist sg;
DECLARE_CRYPTO_WAIT(wait);
int err;

- if (WARN_ON(params->block_size != PAGE_SIZE))
- return -EINVAL;
-
sg_init_table(&sg, 1);
- sg_set_page(&sg, page, PAGE_SIZE, 0);
+ sg_set_page(&sg, page, params->block_size, offset);
ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
CRYPTO_TFM_REQ_MAY_BACKLOG,
crypto_req_done, &wait);
- ahash_request_set_crypt(req, &sg, out, PAGE_SIZE);
+ ahash_request_set_crypt(req, &sg, out, params->block_size);

if (params->hashstate) {
err = crypto_ahash_import(req, params->hashstate);
@@ -264,7 +262,7 @@ int fsverity_hash_page(const struct merkle_tree_params *params,

err = crypto_wait_req(err, &wait);
if (err)
- fsverity_err(inode, "Error %d computing page hash", err);
+ fsverity_err(inode, "Error %d computing block hash", err);
return err;
}

diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index d2fcb6a21ea8e..44df06ddcc603 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -125,12 +125,13 @@ static bool verify_page(struct inode *inode, const struct fsverity_info *vi,

want_hash = vi->root_hash;
descend:
- /* Descend the tree verifying hash pages */
+ /* Descend the tree verifying hash blocks. */
for (; level > 0; level--) {
struct page *hpage = hpages[level - 1];
unsigned int hoffset = hoffsets[level - 1];

- err = fsverity_hash_page(params, inode, req, hpage, real_hash);
+ err = fsverity_hash_block(params, inode, req, hpage, 0,
+ real_hash);
if (err)
goto out;
err = cmp_hashes(vi, want_hash, real_hash, index, level - 1);
@@ -142,8 +143,8 @@ static bool verify_page(struct inode *inode, const struct fsverity_info *vi,
put_page(hpage);
}

- /* Finally, verify the data page */
- err = fsverity_hash_page(params, inode, req, data_page, real_hash);
+ /* Finally, verify the data block. */
+ err = fsverity_hash_block(params, inode, req, data_page, 0, real_hash);
if (err)
goto out;
err = cmp_hashes(vi, want_hash, real_hash, index, -1);
--
2.39.0

2022-12-23 20:38:50

by Eric Biggers

Subject: [PATCH v2 10/11] fs/buffer.c: support fsverity in block_read_full_folio()

From: Eric Biggers <[email protected]>

After each filesystem block (as represented by a buffer_head) has been
read from disk by block_read_full_folio(), verify it if needed. The
verification is done on the fsverity_read_workqueue. Also allow reads
of verity metadata past i_size, as required by ext4.

This is needed to support fsverity on ext4 filesystems where the
filesystem block size is less than the page size.

The new code is compiled away when CONFIG_FS_VERITY=n.

Signed-off-by: Eric Biggers <[email protected]>
---
fs/buffer.c | 67 +++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 57 insertions(+), 10 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index d9c6d1fbb6dde..2e65ba2b3919b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -48,6 +48,7 @@
#include <linux/sched/mm.h>
#include <trace/events/block.h>
#include <linux/fscrypt.h>
+#include <linux/fsverity.h>

#include "internal.h"

@@ -295,20 +296,52 @@ static void end_buffer_async_read(struct buffer_head *bh, int uptodate)
return;
}

-struct decrypt_bh_ctx {
+struct postprocess_bh_ctx {
struct work_struct work;
struct buffer_head *bh;
};

+static void verify_bh(struct work_struct *work)
+{
+ struct postprocess_bh_ctx *ctx =
+ container_of(work, struct postprocess_bh_ctx, work);
+ struct buffer_head *bh = ctx->bh;
+ bool valid;
+
+ valid = fsverity_verify_blocks(bh->b_page, bh->b_size, bh_offset(bh));
+ end_buffer_async_read(bh, valid);
+ kfree(ctx);
+}
+
+static bool need_fsverity(struct buffer_head *bh)
+{
+ struct page *page = bh->b_page;
+ struct inode *inode = page->mapping->host;
+
+ return fsverity_active(inode) &&
+ /* needed by ext4 */
+ page->index < DIV_ROUND_UP(inode->i_size, PAGE_SIZE);
+}
+
static void decrypt_bh(struct work_struct *work)
{
- struct decrypt_bh_ctx *ctx =
- container_of(work, struct decrypt_bh_ctx, work);
+ struct postprocess_bh_ctx *ctx =
+ container_of(work, struct postprocess_bh_ctx, work);
struct buffer_head *bh = ctx->bh;
int err;

err = fscrypt_decrypt_pagecache_blocks(bh->b_page, bh->b_size,
bh_offset(bh));
+ if (err == 0 && need_fsverity(bh)) {
+ /*
+ * We use different work queues for decryption and for verity
+ * because verity may require reading metadata pages that need
+ * decryption, and we shouldn't recurse to the same workqueue.
+ */
+ INIT_WORK(&ctx->work, verify_bh);
+ fsverity_enqueue_verify_work(&ctx->work);
+ return;
+ }
end_buffer_async_read(bh, err == 0);
kfree(ctx);
}
@@ -319,15 +352,24 @@ static void decrypt_bh(struct work_struct *work)
*/
static void end_buffer_async_read_io(struct buffer_head *bh, int uptodate)
{
- /* Decrypt if needed */
- if (uptodate &&
- fscrypt_inode_uses_fs_layer_crypto(bh->b_page->mapping->host)) {
- struct decrypt_bh_ctx *ctx = kmalloc(sizeof(*ctx), GFP_ATOMIC);
+ struct inode *inode = bh->b_page->mapping->host;
+ bool decrypt = fscrypt_inode_uses_fs_layer_crypto(inode);
+ bool verify = need_fsverity(bh);
+
+ /* Decrypt (with fscrypt) and/or verify (with fsverity) if needed. */
+ if (uptodate && (decrypt || verify)) {
+ struct postprocess_bh_ctx *ctx =
+ kmalloc(sizeof(*ctx), GFP_ATOMIC);

if (ctx) {
- INIT_WORK(&ctx->work, decrypt_bh);
ctx->bh = bh;
- fscrypt_enqueue_decrypt_work(&ctx->work);
+ if (decrypt) {
+ INIT_WORK(&ctx->work, decrypt_bh);
+ fscrypt_enqueue_decrypt_work(&ctx->work);
+ } else {
+ INIT_WORK(&ctx->work, verify_bh);
+ fsverity_enqueue_verify_work(&ctx->work);
+ }
return;
}
uptodate = 0;
@@ -2245,6 +2287,11 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
int nr, i;
int fully_mapped = 1;
bool page_error = false;
+ loff_t limit = i_size_read(inode);
+
+ /* This is needed for ext4. */
+ if (IS_ENABLED(CONFIG_FS_VERITY) && IS_VERITY(inode))
+ limit = inode->i_sb->s_maxbytes;

VM_BUG_ON_FOLIO(folio_test_large(folio), folio);

@@ -2253,7 +2300,7 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
bbits = block_size_bits(blocksize);

iblock = (sector_t)folio->index << (PAGE_SHIFT - bbits);
- lblock = (i_size_read(inode)+blocksize-1) >> bbits;
+ lblock = (limit+blocksize-1) >> bbits;
bh = head;
nr = 0;
i = 0;
--
2.39.0
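
The page-index test in need_fsverity() above separates pages that hold file data from pages past i_size, which on ext4 hold the verity metadata and must not themselves be verified against the Merkle tree. A minimal userspace sketch of that test follows; page_holds_file_data() and the 4K TOY_PAGE_SIZE are hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL	/* assumption: 4K pages */

/*
 * Return true if the page at @page_index contains at least one byte of file
 * data, i.e. page_index < DIV_ROUND_UP(i_size, PAGE_SIZE).  Pages at or
 * beyond that index lie fully past EOF.
 */
static bool page_holds_file_data(uint64_t page_index, uint64_t i_size)
{
	uint64_t data_pages = (i_size + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;

	return page_index < data_pages;
}
```

With i_size = 10000, pages 0 through 2 hold file data (page 2 spans EOF), while page 3 and later do not.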

2022-12-23 20:40:18

by Eric Biggers

Subject: [PATCH v2 08/11] ext4: simplify ext4_readpage_limit()

From: Eric Biggers <[email protected]>

Now that the implementation of FS_IOC_ENABLE_VERITY has changed to not
involve reading back Merkle tree blocks that were previously written,
there is no need for ext4_readpage_limit() to allow for this case.

Signed-off-by: Eric Biggers <[email protected]>
---
fs/ext4/readpage.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index d5266932ce6cd..c61dc8a7c0147 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -211,8 +211,7 @@ static void ext4_set_bio_post_read_ctx(struct bio *bio,

static inline loff_t ext4_readpage_limit(struct inode *inode)
{
- if (IS_ENABLED(CONFIG_FS_VERITY) &&
- (IS_VERITY(inode) || ext4_verity_in_progress(inode)))
+ if (IS_ENABLED(CONFIG_FS_VERITY) && IS_VERITY(inode))
return inode->i_sb->s_maxbytes;

return i_size_read(inode);
--
2.39.0

2022-12-23 20:40:21

by Eric Biggers

Subject: [PATCH v2 11/11] ext4: allow verity with fs block size < PAGE_SIZE

From: Eric Biggers <[email protected]>

Now that the needed changes have been made to fs/buffer.c, ext4 is ready
to support the verity feature when the filesystem block size is less
than the page size. So remove the mount-time check that prevented this.

Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/filesystems/fsverity.rst | 8 +++++---
fs/ext4/super.c | 5 -----
2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
index 948d202545240..c0c8a25b41bb8 100644
--- a/Documentation/filesystems/fsverity.rst
+++ b/Documentation/filesystems/fsverity.rst
@@ -497,9 +497,11 @@ To create verity files on an ext4 filesystem, the filesystem must have
been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on
it. "verity" is an RO_COMPAT filesystem feature, so once set, old
kernels will only be able to mount the filesystem readonly, and old
-versions of e2fsck will be unable to check the filesystem. Moreover,
-currently ext4 only supports mounting a filesystem with the "verity"
-feature when its block size is equal to PAGE_SIZE (often 4096 bytes).
+versions of e2fsck will be unable to check the filesystem.
+
+Originally, an ext4 filesystem with the "verity" feature could only be
+mounted when its block size was equal to the system page size
+(typically 4096 bytes). In Linux v6.3, this limitation was removed.

ext4 sets the EXT4_VERITY_FL on-disk inode flag on verity files. It
can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be cleared.
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 16a343e8047d4..798cb19e2258b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5336,11 +5336,6 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
}
}

- if (ext4_has_feature_verity(sb) && sb->s_blocksize != PAGE_SIZE) {
- ext4_msg(sb, KERN_ERR, "Unsupported blocksize for fs-verity");
- goto failed_mount_wq;
- }
-
/*
* Get the # of file system overhead blocks from the
* superblock if present.
--
2.39.0

2022-12-23 20:40:38

by Eric Biggers

Subject: [PATCH v2 03/11] fsverity: store log2(digest_size) precomputed

From: Eric Biggers <[email protected]>

Add log_digestsize to struct merkle_tree_params so that it can be used
in verify.c. Also save memory by using u8 for all the log_* fields.

Signed-off-by: Eric Biggers <[email protected]>
---
fs/verity/fsverity_private.h | 5 +++--
fs/verity/open.c | 3 ++-
fs/verity/verify.c | 2 +-
3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index 48b97f5d05569..fc1c2797fab19 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -42,8 +42,9 @@ struct merkle_tree_params {
unsigned int digest_size; /* same as hash_alg->digest_size */
unsigned int block_size; /* size of data and tree blocks */
unsigned int hashes_per_block; /* number of hashes per tree block */
- unsigned int log_blocksize; /* log2(block_size) */
- unsigned int log_arity; /* log2(hashes_per_block) */
+ u8 log_digestsize; /* log2(digest_size) */
+ u8 log_blocksize; /* log2(block_size) */
+ u8 log_arity; /* log2(hashes_per_block) */
unsigned int num_levels; /* number of levels in Merkle tree */
u64 tree_size; /* Merkle tree size in bytes */
unsigned long tree_pages; /* Merkle tree size in pages */
diff --git a/fs/verity/open.c b/fs/verity/open.c
index e356eefb54d7b..ca8de73e5a0b8 100644
--- a/fs/verity/open.c
+++ b/fs/verity/open.c
@@ -76,7 +76,8 @@ int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
err = -EINVAL;
goto out_err;
}
- params->log_arity = params->log_blocksize - ilog2(params->digest_size);
+ params->log_digestsize = ilog2(params->digest_size);
+ params->log_arity = log_blocksize - params->log_digestsize;
params->hashes_per_block = 1 << params->log_arity;

/*
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 4c57a1bd01afc..d2fcb6a21ea8e 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -35,7 +35,7 @@ static void hash_at_level(const struct merkle_tree_params *params,

/* Offset of the wanted hash (in bytes) within the hash block */
*hoffset = (position & ((1 << params->log_arity) - 1)) <<
- (params->log_blocksize - params->log_arity);
+ params->log_digestsize;
}

static inline int cmp_hashes(const struct fsverity_info *vi,
--
2.39.0
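
Since log_arity = log_blocksize - log_digestsize, the old and new shift amounts in hash_at_level() are equal, so the change above does not alter the computed offset. The sketch below checks this in userspace; hoffset_old() and hoffset_new() are hypothetical helper names, not functions from the patch.

```c
#include <assert.h>

/* Old form: recompute log2(digest_size) from two other fields. */
static unsigned long hoffset_old(unsigned long position,
				 unsigned int log_blocksize,
				 unsigned int log_arity)
{
	return (position & ((1UL << log_arity) - 1)) <<
	       (log_blocksize - log_arity);
}

/* New form: use the precomputed log2(digest_size) directly. */
static unsigned long hoffset_new(unsigned long position,
				 unsigned int log_arity,
				 unsigned int log_digestsize)
{
	return (position & ((1UL << log_arity) - 1)) << log_digestsize;
}
```

With 4096-byte blocks and 32-byte digests (log_blocksize = 12, log_digestsize = 5, log_arity = 7), position 300 selects hash number 44 in its block, and both forms give byte offset 1408.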

2022-12-23 20:40:43

by Eric Biggers

Subject: [PATCH v2 07/11] fsverity: support enabling with tree block size < PAGE_SIZE

From: Eric Biggers <[email protected]>

Make FS_IOC_ENABLE_VERITY support values of
fsverity_enable_arg::block_size other than PAGE_SIZE.

To make this possible, rework build_merkle_tree(), which was reading
data and hash pages from the file and assuming that they were the same
thing as "blocks".

For reading the data blocks, just replace the direct pagecache access
with __kernel_read(), to naturally read one block at a time.

(A disadvantage of the above is that we lose the two optimizations of
hashing the pagecache pages in-place and forcing the maximum readahead.
That shouldn't be very important, though.)

The hash block reads are a bit more difficult to handle, as the only way
to do them is through fsverity_operations::read_merkle_tree_page().

Instead, let's switch to the single-pass tree construction algorithm
that fsverity-utils uses. This eliminates the need to read back any
hash blocks while the tree is being built, at the small cost of an extra
block-sized memory buffer per Merkle tree level. This is probably what
I should have done originally.

Taken together, the above two changes result in page-size independent
code that is also a bit simpler than what we had before.
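
The single-pass construction described above can be sketched in userspace as follows. This is a minimal sketch under stated assumptions, not the kernel implementation: toy_hash() (FNV-1a, not cryptographic) stands in for the real salted hash, BLOCK_SIZE, DIGEST_SIZE, and MAX_LEVELS are toy values, build_root_hash() is a hypothetical name, and completed tree blocks are simply discarded where the kernel would write them out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  64u	/* assumption: toy Merkle tree block size */
#define DIGEST_SIZE 8u	/* assumption: toy digest size */
#define MAX_LEVELS  8	/* assumption: toy limit on tree depth */

struct block_buffer {
	uint32_t filled;		/* bytes of data[] currently in use */
	uint8_t data[BLOCK_SIZE];
};

/* Stand-in hash (64-bit FNV-1a) over one full block; illustration only. */
static void toy_hash(const uint8_t *block, uint8_t *out)
{
	uint64_t h = 0xcbf29ce484222325ULL;

	for (size_t i = 0; i < BLOCK_SIZE; i++) {
		h ^= block[i];
		h *= 0x100000001b3ULL;
	}
	memcpy(out, &h, DIGEST_SIZE);
}

/* Zero-pad and hash @cur, appending the digest to the next level's buffer. */
static void hash_one_block(struct block_buffer *cur)
{
	struct block_buffer *next = cur + 1;

	memset(&cur->data[cur->filled], 0, BLOCK_SIZE - cur->filled);
	toy_hash(cur->data, &next->data[next->filled]);
	next->filled += DIGEST_SIZE;
	cur->filled = 0;
}

/* Compute the root hash of @data in one pass. Returns 0, or -1 if too deep. */
static int build_root_hash(const uint8_t *data, size_t size,
			   uint8_t root[DIGEST_SIZE])
{
	/* Buffer -1: data block; 0..num_levels-1: tree levels; num_levels: root */
	struct block_buffer bufs[1 + MAX_LEVELS + 1];
	struct block_buffer *buffers = &bufs[1];
	const size_t arity = BLOCK_SIZE / DIGEST_SIZE;
	int num_levels = 0;

	memset(bufs, 0, sizeof(bufs));
	if (size == 0) {
		/* Empty file is a special case; root hash is all 0's */
		memset(root, 0, DIGEST_SIZE);
		return 0;
	}
	for (size_t blocks = (size + BLOCK_SIZE - 1) / BLOCK_SIZE; blocks > 1;
	     blocks = (blocks + arity - 1) / arity)
		num_levels++;
	if (num_levels > MAX_LEVELS)
		return -1;

	for (size_t off = 0; off < size; off += BLOCK_SIZE) {
		size_t n = size - off < BLOCK_SIZE ? size - off : BLOCK_SIZE;

		memcpy(buffers[-1].data, data + off, n);
		buffers[-1].filled = (uint32_t)n;
		hash_one_block(&buffers[-1]);
		for (int level = 0; level < num_levels; level++) {
			/* Stop at the first level whose pending block isn't full. */
			if (buffers[level].filled + DIGEST_SIZE <= BLOCK_SIZE)
				break;
			hash_one_block(&buffers[level]);
		}
	}
	/* Finish all nonempty pending tree blocks, bottom up. */
	for (int level = 0; level < num_levels; level++) {
		if (buffers[level].filled != 0)
			hash_one_block(&buffers[level]);
	}
	assert(buffers[num_levels].filled == DIGEST_SIZE);
	memcpy(root, buffers[num_levels].data, DIGEST_SIZE);
	return 0;
}
```

Each level needs only one pending block in memory, and no tree block is ever read back: a block is hashed into its parent's buffer at the moment it fills up, or at the end if it is still partially filled.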

Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/filesystems/fsverity.rst | 20 +-
fs/verity/enable.c | 260 ++++++++++++-------------
include/linux/fsverity.h | 3 +-
3 files changed, 134 insertions(+), 149 deletions(-)

diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
index 0b26134ebff73..948d202545240 100644
--- a/Documentation/filesystems/fsverity.rst
+++ b/Documentation/filesystems/fsverity.rst
@@ -118,10 +118,11 @@ as follows:
- ``hash_algorithm`` must be the identifier for the hash algorithm to
use for the Merkle tree, such as FS_VERITY_HASH_ALG_SHA256. See
``include/uapi/linux/fsverity.h`` for the list of possible values.
-- ``block_size`` must be the Merkle tree block size. Currently, this
- must be equal to the system page size, which is usually 4096 bytes.
- Other sizes may be supported in the future. This value is not
- necessarily the same as the filesystem block size.
+- ``block_size`` is the Merkle tree block size, in bytes. In Linux
+ v6.3 and later, this can be any power of 2 between (inclusively)
+ 1024 and the minimum of the system page size and the filesystem
+ block size. In earlier versions, the page size was the only allowed
+ value.
- ``salt_size`` is the size of the salt in bytes, or 0 if no salt is
provided. The salt is a value that is prepended to every hashed
block; it can be used to personalize the hashing for a particular
@@ -519,9 +520,7 @@ support paging multi-gigabyte xattrs into memory, and to support
encrypting xattrs. Note that the verity metadata *must* be encrypted
when the file is, since it contains hashes of the plaintext data.

-Currently, ext4 verity only supports the case where the Merkle tree
-block size, filesystem block size, and page size are all the same. It
-also only supports extent-based files.
+ext4 only allows verity on extent-based files.

f2fs
----
@@ -539,11 +538,10 @@ Like ext4, f2fs stores the verity metadata (Merkle tree and
fsverity_descriptor) past the end of the file, starting at the first
64K boundary beyond i_size. See explanation for ext4 above.
Moreover, f2fs supports at most 4096 bytes of xattr entries per inode
-which wouldn't be enough for even a single Merkle tree block.
+which usually wouldn't be enough for even a single Merkle tree block.

-Currently, f2fs verity only supports a Merkle tree block size of 4096.
-Also, f2fs doesn't support enabling verity on files that currently
-have atomic or volatile writes pending.
+f2fs doesn't support enabling verity on files that currently have
+atomic or volatile writes pending.

btrfs
-----
diff --git a/fs/verity/enable.c b/fs/verity/enable.c
index 144483319f1a3..e13db6507b38b 100644
--- a/fs/verity/enable.c
+++ b/fs/verity/enable.c
@@ -7,132 +7,50 @@

#include "fsverity_private.h"

-#include <crypto/hash.h>
-#include <linux/backing-dev.h>
#include <linux/mount.h>
#include <linux/pagemap.h>
#include <linux/sched/signal.h>
#include <linux/uaccess.h>

-/*
- * Read a file data page for Merkle tree construction. Do aggressive readahead,
- * since we're sequentially reading the entire file.
- */
-static struct page *read_file_data_page(struct file *file, pgoff_t index,
- struct file_ra_state *ra,
- unsigned long remaining_pages)
-{
- DEFINE_READAHEAD(ractl, file, ra, file->f_mapping, index);
- struct folio *folio;
-
- folio = __filemap_get_folio(ractl.mapping, index, FGP_ACCESSED, 0);
- if (!folio || !folio_test_uptodate(folio)) {
- if (folio)
- folio_put(folio);
- else
- page_cache_sync_ra(&ractl, remaining_pages);
- folio = read_cache_folio(ractl.mapping, index, NULL, file);
- if (IS_ERR(folio))
- return &folio->page;
- }
- if (folio_test_readahead(folio))
- page_cache_async_ra(&ractl, folio, remaining_pages);
- return folio_file_page(folio, index);
-}
+struct block_buffer {
+ u32 filled;
+ u8 *data;
+};

-static int build_merkle_tree_level(struct file *filp, unsigned int level,
- u64 num_blocks_to_hash,
- const struct merkle_tree_params *params,
- u8 *pending_hashes,
- struct ahash_request *req)
+/* Hash a block, writing the result to the next level's pending block buffer. */
+static int hash_one_block(struct inode *inode,
+ const struct merkle_tree_params *params,
+ struct ahash_request *req, struct block_buffer *cur)
{
- struct inode *inode = file_inode(filp);
- const struct fsverity_operations *vops = inode->i_sb->s_vop;
- struct file_ra_state ra = { 0 };
- unsigned int pending_size = 0;
- u64 dst_block_num;
- u64 i;
+ struct block_buffer *next = cur + 1;
int err;

- if (WARN_ON(params->block_size != PAGE_SIZE)) /* checked earlier too */
- return -EINVAL;
-
- if (level < params->num_levels) {
- dst_block_num = params->level_start[level];
- } else {
- if (WARN_ON(num_blocks_to_hash != 1))
- return -EINVAL;
- dst_block_num = 0; /* unused */
- }
+ /* Zero-pad the block if it's shorter than the block size. */
+ memset(&cur->data[cur->filled], 0, params->block_size - cur->filled);

- file_ra_state_init(&ra, filp->f_mapping);
-
- for (i = 0; i < num_blocks_to_hash; i++) {
- struct page *src_page;
-
- if (level == 0) {
- /* Leaf: hashing a data block */
- src_page = read_file_data_page(filp, i, &ra,
- num_blocks_to_hash - i);
- if (IS_ERR(src_page)) {
- err = PTR_ERR(src_page);
- fsverity_err(inode,
- "Error %d reading data page %llu",
- err, i);
- return err;
- }
- } else {
- unsigned long num_ra_pages =
- min_t(unsigned long, num_blocks_to_hash - i,
- inode->i_sb->s_bdi->io_pages);
-
- /* Non-leaf: hashing hash block from level below */
- src_page = vops->read_merkle_tree_page(inode,
- params->level_start[level - 1] + i,
- num_ra_pages);
- if (IS_ERR(src_page)) {
- err = PTR_ERR(src_page);
- fsverity_err(inode,
- "Error %d reading Merkle tree page %llu",
- err, params->level_start[level - 1] + i);
- return err;
- }
- }
+ err = fsverity_hash_block(params, inode, req, virt_to_page(cur->data),
+ offset_in_page(cur->data),
+ &next->data[next->filled]);
+ if (err)
+ return err;
+ next->filled += params->digest_size;
+ cur->filled = 0;
+ return 0;
+}

- err = fsverity_hash_block(params, inode, req, src_page, 0,
- &pending_hashes[pending_size]);
- put_page(src_page);
- if (err)
- return err;
- pending_size += params->digest_size;
-
- if (level == params->num_levels) /* Root hash? */
- return 0;
-
- if (pending_size + params->digest_size > params->block_size ||
- i + 1 == num_blocks_to_hash) {
- /* Flush the pending hash block */
- memset(&pending_hashes[pending_size], 0,
- params->block_size - pending_size);
- err = vops->write_merkle_tree_block(inode,
- pending_hashes,
- dst_block_num << params->log_blocksize,
- params->block_size);
- if (err) {
- fsverity_err(inode,
- "Error %d writing Merkle tree block %llu",
- err, dst_block_num);
- return err;
- }
- dst_block_num++;
- pending_size = 0;
- }
+static int write_merkle_tree_block(struct inode *inode, const u8 *buf,
+ unsigned long index,
+ const struct merkle_tree_params *params)
+{
+ u64 pos = (u64)index << params->log_blocksize;
+ int err;

- if (fatal_signal_pending(current))
- return -EINTR;
- cond_resched();
- }
- return 0;
+ err = inode->i_sb->s_vop->write_merkle_tree_block(inode, buf, pos,
+ params->block_size);
+ if (err)
+ fsverity_err(inode, "Error %d writing Merkle tree block %lu",
+ err, index);
+ return err;
}

/*
@@ -148,13 +66,17 @@ static int build_merkle_tree(struct file *filp,
u8 *root_hash)
{
struct inode *inode = file_inode(filp);
- u8 *pending_hashes;
+ const u64 data_size = inode->i_size;
+ const int num_levels = params->num_levels;
struct ahash_request *req;
- u64 blocks;
- unsigned int level;
- int err = -ENOMEM;
+ struct block_buffer _buffers[1 + FS_VERITY_MAX_LEVELS + 1] = {};
+ struct block_buffer *buffers = &_buffers[1];
+ unsigned long level_offset[FS_VERITY_MAX_LEVELS];
+ int level;
+ u64 offset;
+ int err;

- if (inode->i_size == 0) {
+ if (data_size == 0) {
/* Empty file is a special case; root hash is all 0's */
memset(root_hash, 0, params->digest_size);
return 0;
@@ -163,29 +85,95 @@ static int build_merkle_tree(struct file *filp,
/* This allocation never fails, since it's mempool-backed. */
req = fsverity_alloc_hash_request(params->hash_alg, GFP_KERNEL);

- pending_hashes = kmalloc(params->block_size, GFP_KERNEL);
- if (!pending_hashes)
- goto out;
-
/*
- * Build each level of the Merkle tree, starting at the leaf level
- * (level 0) and ascending to the root node (level 'num_levels - 1').
- * Then at the end (level 'num_levels'), calculate the root hash.
+ * Allocate the block buffers. Buffer "-1" is for data blocks.
+ * Buffers 0 <= level < num_levels are for the actual tree levels.
+ * Buffer 'num_levels' is for the root hash.
*/
- blocks = ((u64)inode->i_size + params->block_size - 1) >>
- params->log_blocksize;
- for (level = 0; level <= params->num_levels; level++) {
- err = build_merkle_tree_level(filp, level, blocks, params,
- pending_hashes, req);
+ for (level = -1; level < num_levels; level++) {
+ buffers[level].data = kzalloc(params->block_size, GFP_KERNEL);
+ if (!buffers[level].data) {
+ err = -ENOMEM;
+ goto out;
+ }
+ }
+ buffers[num_levels].data = root_hash;
+
+ BUILD_BUG_ON(sizeof(level_offset) != sizeof(params->level_start));
+ memcpy(level_offset, params->level_start, sizeof(level_offset));
+
+ /* Hash each data block, also hashing the tree blocks as they fill up */
+ for (offset = 0; offset < data_size; offset += params->block_size) {
+ ssize_t bytes_read;
+ loff_t pos = offset;
+
+ buffers[-1].filled = min_t(u64, params->block_size,
+ data_size - offset);
+ bytes_read = __kernel_read(filp, buffers[-1].data,
+ buffers[-1].filled, &pos);
+ if (bytes_read < 0) {
+ err = bytes_read;
+ fsverity_err(inode, "Error %d reading file data", err);
+ goto out;
+ }
+ if (bytes_read != buffers[-1].filled) {
+ err = -EINVAL;
+ fsverity_err(inode, "Short read of file data");
+ goto out;
+ }
+ err = hash_one_block(inode, params, req, &buffers[-1]);
if (err)
goto out;
- blocks = (blocks + params->hashes_per_block - 1) >>
- params->log_arity;
+ for (level = 0; level < num_levels; level++) {
+ if (buffers[level].filled + params->digest_size <=
+ params->block_size) {
+ /* Next block at @level isn't full yet */
+ break;
+ }
+ /* Next block at @level is full */
+
+ err = hash_one_block(inode, params, req,
+ &buffers[level]);
+ if (err)
+ goto out;
+ err = write_merkle_tree_block(inode,
+ buffers[level].data,
+ level_offset[level],
+ params);
+ if (err)
+ goto out;
+ level_offset[level]++;
+ }
+ if (fatal_signal_pending(current)) {
+ err = -EINTR;
+ goto out;
+ }
+ cond_resched();
+ }
+ /* Finish all nonempty pending tree blocks. */
+ for (level = 0; level < num_levels; level++) {
+ if (buffers[level].filled != 0) {
+ err = hash_one_block(inode, params, req,
+ &buffers[level]);
+ if (err)
+ goto out;
+ err = write_merkle_tree_block(inode,
+ buffers[level].data,
+ level_offset[level],
+ params);
+ if (err)
+ goto out;
+ }
+ }
+ /* The root hash was filled by the last call to hash_one_block(). */
+ if (WARN_ON(buffers[num_levels].filled != params->digest_size)) {
+ err = -EINVAL;
+ goto out;
}
- memcpy(root_hash, pending_hashes, params->digest_size);
err = 0;
out:
- kfree(pending_hashes);
+ for (level = -1; level < num_levels; level++)
+ kfree(buffers[level].data);
fsverity_free_hash_request(params->hash_alg, req);
return err;
}
@@ -341,7 +329,7 @@ int fsverity_ioctl_enable(struct file *filp, const void __user *uarg)
memchr_inv(arg.__reserved2, 0, sizeof(arg.__reserved2)))
return -EINVAL;

- if (arg.block_size != PAGE_SIZE)
+ if (!is_power_of_2(arg.block_size))
return -EINVAL;

if (arg.salt_size > sizeof_field(struct fsverity_descriptor, salt))
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 6ecc51f80221a..991a444589966 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -93,8 +93,7 @@ struct fsverity_operations {
* isn't already cached. Implementations may ignore this
* argument; it's only a performance optimization.
*
- * This can be called at any time on an open verity file, as well as
- * between ->begin_enable_verity() and ->end_enable_verity(). It may be
+ * This can be called at any time on an open verity file. It may be
* called by multiple processes concurrently, even with the same page.
*
* Note that this must retrieve a *page*, not necessarily a *block*.
--
2.39.0
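
A minimal userspace sketch of the buffer-per-level bookkeeping used by the new build_merkle_tree() loop above may help when reading the patch. It performs no hashing or I/O; it only tracks how many digest bytes each level's buffer holds and how many tree blocks each level would write. The names level_state, flush_level() and simulate_build() are hypothetical and do not appear in the patch, and the caller is assumed to pass a zeroed array of num_levels + 1 entries, where the last entry stands in for the root hash buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's per-level block buffer. */
struct level_state {
	unsigned int filled;     /* bytes of digests buffered at this level */
	uint64_t blocks_written; /* tree blocks flushed from this level */
};

/* Model "hash this level's block into its parent, then write it out". */
static void flush_level(struct level_state *lv, int level,
			unsigned int digest_size)
{
	lv[level + 1].filled += digest_size;
	lv[level].filled = 0;
	lv[level].blocks_written++;
}

/*
 * Walk the data blocks once, flushing tree blocks as they fill up.
 * Returns 0 if exactly one digest ended up in the root slot, else -1.
 */
static int simulate_build(uint64_t data_blocks, unsigned int block_size,
			  unsigned int digest_size, int num_levels,
			  struct level_state *lv)
{
	assert(num_levels >= 0 && digest_size > 0 && digest_size <= block_size);

	for (uint64_t i = 0; i < data_blocks; i++) {
		/* Hashing one data block appends one digest to level 0. */
		lv[0].filled += digest_size;
		for (int level = 0; level < num_levels; level++) {
			/* Stop at the first level that still has room. */
			if (lv[level].filled + digest_size <= block_size)
				break;
			flush_level(lv, level, digest_size);
		}
	}
	/* Finish all nonempty pending tree blocks, bottom-up. */
	for (int level = 0; level < num_levels; level++) {
		if (lv[level].filled != 0)
			flush_level(lv, level, digest_size);
	}
	return lv[num_levels].filled == digest_size ? 0 : -1;
}
```

For example, with 129 data blocks, 4096-byte tree blocks and 32-byte digests (two levels), this model writes two blocks at level 0 and one block at level 1, and leaves a single digest in the root slot.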

2023-01-04 06:44:41

by Ojaswin Mujoo

Subject: Re: [PATCH v2 00/11] fsverity: support for non-4K pages

Hi Eric,

I have roughly gone through the series and run the (patched) xfstests on
this patchset on a powerpc machine with 64k page size and 64k, 4k and 1k
Merkle tree block sizes on ext4, and everything seems to work correctly.

Just for the record, test generic/692 takes a long time to complete with
a 64k Merkle tree block size because its calculations assume 4k;
however, I was able to test that particular scenario manually. (I'll try
to send a patch to fix the fstest later.)

Anyways, feel free to add:

Tested-by: Ojaswin Mujoo <[email protected]>

Since I am not very familiar with the fsverity codebase, I'll try to
take some more time to review the code and get back with any
comments/RVBs.

Regards,
ojaswin

On Fri, Dec 23, 2022 at 12:36:27PM -0800, Eric Biggers wrote:
> [This patchset applies to mainline + some fsverity cleanups I sent out
> recently. You can get everything from tag "fsverity-non4k-v2" of
> https://git.kernel.org/pub/scm/fs/fscrypt/fscrypt.git ]
>
> Currently, filesystems (ext4, f2fs, and btrfs) only support fsverity
> when the Merkle tree block size, filesystem block size, and page size
> are all the same. In practice that means 4K, since increasing the page
> size, e.g. to 16K, forces the Merkle tree block size and filesystem
> block size to be increased accordingly. That can be impractical; for
> one, users want the same file signatures to work on all systems.
>
> Therefore, this patchset reduces the coupling between these sizes.
>
> First, patches 1-4 are cleanups.
>
> Second, patches 5-9 allow the Merkle tree block size to be less than the
> page size or filesystem block size, provided that it's not larger than
> either one. This involves, among other things, changing the way that
> fs/verity/verify.c tracks which hash blocks have been verified.
>
> Finally, patches 10-11 make ext4 support fsverity when the filesystem
> block size is less than the page size. Note, f2fs doesn't need similar
> changes because f2fs always assumes that the filesystem block size and
> page size are the same anyway. I haven't looked into btrfs yet.
>
> I've tested this patchset using the "verity" group of tests in xfstests
> with the following xfstests patchset applied:
> "[PATCH v2 00/10] xfstests: update verity tests for non-4K block and page size"
> (https://lore.kernel.org/fstests/[email protected]/T/#u)
>
> Note: on the thread "[RFC PATCH 00/11] fs-verity support for XFS"
> (https://lore.kernel.org/linux-xfs/[email protected]/T/#u)
> there have been many requests for other things to support, including:
>
> * folios in the pagecache
> * alternative Merkle tree caching methods
> * direct I/O
> * merkle_tree_block_size > page_size
> * extremely large files, using a reclaimable bitmap
>
> We shouldn't try to boil the ocean, though, so to keep the scope of this
> patchset manageable I haven't changed it significantly from v1. This
> patchset does bring us closer to many of the above, just not all the way
> there. I'd like to follow up this patchset with a change to support
> folios, which should be straightforward. Next, we can do a change to
> generalize the Merkle tree interface to allow XFS to use an alternative
> caching method, as that sounds like the highest priority item for XFS.
>
> Anyway, the changelog is:
>
> Changed in v2:
> - Rebased onto the recent fsverity cleanups.
> - Split some parts of the big "support verification" patch into
> separate patches.
> - Passed the data_pos to verify_data_block() instead of computing it
> using page->index, to make it ready for folio and DIO support.
> - Eliminated some unnecessary arithmetic in verify_data_block().
> - Changed the log_* fields in merkle_tree_params to u8.
> - Restored PageLocked and !PageUptodate checks for pagecache pages.
> - Eliminated the change to fsverity_hash_buffer().
> - Other small cleanups
>
> Eric Biggers (11):
> fsverity: use unsigned long for level_start
> fsverity: simplify Merkle tree readahead size calculation
> fsverity: store log2(digest_size) precomputed
> fsverity: use EFBIG for file too large to enable verity
> fsverity: replace fsverity_hash_page() with fsverity_hash_block()
> fsverity: support verification with tree block size < PAGE_SIZE
> fsverity: support enabling with tree block size < PAGE_SIZE
> ext4: simplify ext4_readpage_limit()
> f2fs: simplify f2fs_readpage_limit()
> fs/buffer.c: support fsverity in block_read_full_folio()
> ext4: allow verity with fs block size < PAGE_SIZE
>
> Documentation/filesystems/fsverity.rst | 76 +++---
> fs/buffer.c | 67 ++++-
> fs/ext4/readpage.c | 3 +-
> fs/ext4/super.c | 5 -
> fs/f2fs/data.c | 3 +-
> fs/verity/enable.c | 260 ++++++++++----------
> fs/verity/fsverity_private.h | 20 +-
> fs/verity/hash_algs.c | 24 +-
> fs/verity/open.c | 98 ++++++--
> fs/verity/verify.c | 325 +++++++++++++++++--------
> include/linux/fsverity.h | 14 +-
> 11 files changed, 565 insertions(+), 330 deletions(-)
>
> --
> 2.39.0
>

2023-01-04 07:39:54

by Eric Biggers

Subject: Re: [PATCH v2 00/11] fsverity: support for non-4K pages

On Wed, Jan 04, 2023 at 12:08:09PM +0530, Ojaswin Mujoo wrote:
> Hi Eric,
>
> I have roughly gone through the series and run the (patched) xfstests on
> this patchset on a powerpc machine with 64k page size and 64k, 4k and 1k
> Merkle tree block sizes on ext4, and everything seems to work correctly.
>
> Just for the record, test generic/692 takes a long time to complete with
> a 64k Merkle tree block size because its calculations assume 4k;
> however, I was able to test that particular scenario manually. (I'll try
> to send a patch to fix the fstest later.)
>
> Anyways, feel free to add:
>
> Tested-by: Ojaswin Mujoo <[email protected]>
>
> Since I am not very familiar with the fsverity codebase, I'll try to
> take some more time to review the code and get back with any
> comments/RVBs.
>
> Regards,
> ojaswin

Thanks Ojaswin! That's a good point about generic/692. The right fix for it is
to make it use $FSV_BLOCK_SIZE instead of 4K in its calculations.
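
To illustrate why a hard-coded 4K skews any size calculation once the tree block size changes, here is a small sketch of how the Merkle tree's size depends on the block size. It is not the arithmetic that generic/692 actually uses; merkle_tree_bytes() is a hypothetical helper, and it assumes the usual layout where each tree block holds block_size / digest_size hashes and levels are added until a single block remains.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Total bytes of Merkle tree blocks needed for a file of file_size bytes.
 * Optionally reports the number of tree levels through levels_out.
 */
static uint64_t merkle_tree_bytes(uint64_t file_size, uint32_t block_size,
				  uint32_t digest_size, int *levels_out)
{
	uint64_t hashes_per_block = block_size / digest_size;
	uint64_t blocks = (file_size + block_size - 1) / block_size;
	uint64_t tree_blocks = 0;
	int levels = 0;

	/* Each tree block must cover at least two children to converge. */
	assert(hashes_per_block >= 2);

	/* Each pass computes the block count of the next level up. */
	while (blocks > 1) {
		blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
		tree_blocks += blocks;
		levels++;
	}
	if (levels_out)
		*levels_out = levels;
	return tree_blocks * block_size;
}
```

For a 1 GiB file with 32-byte digests, this gives a three-level tree of 8458240 bytes with 4K blocks, but a two-level tree of 589824 bytes with 64K blocks, so a calculation that assumes 4K targets a very different file size than intended.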

I suppose you saw that issue by running the test on ext4 with fs_block_size ==
page_size == 64K, causing xfstests to use merkle_tree_block_size == 64K by
default. Thanks for doing that; that's something I haven't been able to test
yet. My focus has been on merkle_tree_block_size < page_size.
merkle_tree_block_size > 4K should just work, though, assuming
merkle_tree_block_size <= min(fs_block_size, page_size). (Or
merkle_tree_block_size == fs_block_size == page_size before this patch series.)

- Eric
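
As a rough illustration of the constraint described above, the check can be sketched in userspace C as follows. This models only the two conditions stated in this thread (power of two, and no larger than either the filesystem block size or the page size); the kernel may enforce further limits that are not modeled here, and is_pow2() and tree_block_size_ok() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if n is a nonzero power of two. */
static bool is_pow2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * True if tree_block_size is a power of two and satisfies
 * tree_block_size <= min(fs_block_size, page_size).
 */
static bool tree_block_size_ok(uint32_t tree_block_size,
			       uint32_t fs_block_size, uint32_t page_size)
{
	uint32_t limit = fs_block_size < page_size ? fs_block_size : page_size;

	return is_pow2(tree_block_size) && tree_block_size <= limit;
}
```

Under this model, a 4K tree block on a 4K filesystem with 64K pages is accepted, while a 64K tree block on the same filesystem is rejected because it exceeds the filesystem block size.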

2023-01-05 11:25:54

by Ojaswin Mujoo

Subject: Re: [PATCH v2 00/11] fsverity: support for non-4K pages

On Tue, Jan 03, 2023 at 11:25:18PM -0800, Eric Biggers wrote:
>

Hi Eric,

> Thanks Ojaswin! That's a good point about generic/692. The right fix for it is
> to make it use $FSV_BLOCK_SIZE instead of 4K in its calculations.
Yes, that should fix the issue. I'll try to send in a patch for this
when I find some time.

>
> I suppose you saw that issue by running the test on ext4 with fs_block_size ==
> page_size == 64K, causing xfstests to use merkle_tree_block_size == 64K by
> default. Thanks for doing that; that's something I haven't been able to test
> yet. My focus has been on merkle_tree_block_size < page_size.
Correct, I was testing the "everything = 64k" scenario when I
noticed the slowdown.

> merkle_tree_block_size > 4K should just work, though, assuming
> merkle_tree_block_size <= min(fs_block_size, page_size). (Or
> merkle_tree_block_size == fs_block_size == page_size before this patch series.)

Yes true, I still tested them just in case :)

Regards,
Ojaswin

2023-01-09 17:39:20

by Eric Biggers

Subject: Re: [PATCH v2 00/11] fsverity: support for non-4K pages

On Fri, Dec 23, 2022 at 12:36:27PM -0800, Eric Biggers wrote:
> [This patchset applies to mainline + some fsverity cleanups I sent out
> recently. You can get everything from tag "fsverity-non4k-v2" of
> https://git.kernel.org/pub/scm/fs/fscrypt/fscrypt.git ]

I've applied this patchset for 6.3, but I'd still greatly appreciate reviews and
acks, especially on the last 4 patches, which touch files outside fs/verity/.

(I applied it to
https://git.kernel.org/pub/scm/fs/fscrypt/fscrypt.git/log/?h=fsverity for now,
but there might be a new git repo soon, as is being discussed elsewhere.)

- Eric

2023-01-09 19:36:02

by Andrey Albershteyn

Subject: Re: [PATCH v2 00/11] fsverity: support for non-4K pages

On Mon, Jan 09, 2023 at 09:38:41AM -0800, Eric Biggers wrote:
> On Fri, Dec 23, 2022 at 12:36:27PM -0800, Eric Biggers wrote:
> > [This patchset applies to mainline + some fsverity cleanups I sent out
> > recently. You can get everything from tag "fsverity-non4k-v2" of
> > https://git.kernel.org/pub/scm/fs/fscrypt/fscrypt.git ]
>
> I've applied this patchset for 6.3, but I'd still greatly appreciate reviews and
> acks, especially on the last 4 patches, which touch files outside fs/verity/.
>
> (I applied it to
> https://git.kernel.org/pub/scm/fs/fscrypt/fscrypt.git/log/?h=fsverity for now,
> but there might be a new git repo soon, as is being discussed elsewhere.)
>
> - Eric
>

The fs/verity patches look good to me, I've checked them but forgot
to send RVB :( Haven't tested them yet though

Reviewed-by: Andrey Albershteyn <[email protected]>

--
- Andrey

2023-01-10 02:57:31

by Andrew Morton

Subject: Re: [PATCH v2 10/11] fs/buffer.c: support fsverity in block_read_full_folio()

On Fri, 23 Dec 2022 12:36:37 -0800 Eric Biggers <[email protected]> wrote:

> After each filesystem block (as represented by a buffer_head) has been
> read from disk by block_read_full_folio(), verify it if needed. The
> verification is done on the fsverity_read_workqueue. Also allow reads
> of verity metadata past i_size, as required by ext4.

Sigh. Do we reeeeealy need to mess with buffer.c in this fashion? Did
any other subsystems feel a need to do this?

> This is needed to support fsverity on ext4 filesystems where the
> filesystem block size is less than the page size.

Does any real person actually do this?

2023-01-10 03:09:09

by Eric Biggers

Subject: Re: [PATCH v2 10/11] fs/buffer.c: support fsverity in block_read_full_folio()

On Mon, Jan 09, 2023 at 06:37:59PM -0800, Andrew Morton wrote:
> On Fri, 23 Dec 2022 12:36:37 -0800 Eric Biggers <[email protected]> wrote:
>
> > After each filesystem block (as represented by a buffer_head) has been
> > read from disk by block_read_full_folio(), verify it if needed. The
> > verification is done on the fsverity_read_workqueue. Also allow reads
> > of verity metadata past i_size, as required by ext4.
>
> Sigh. Do we reeeeealy need to mess with buffer.c in this fashion? Did
> any other subsystems feel a need to do this?

ext4 is currently the only filesystem that uses block_read_full_folio() and that
supports fsverity. However, since fsverity has a common infrastructure across
filesystems, in fs/verity/, it makes sense to support it in the other filesystem
infrastructure so that things aren't mutually exclusive for no reason.

Note that this applies to fscrypt too, which block_read_full_folio() (previously
block_read_full_page()) has supported since v5.5.

If you'd prefer that block_read_full_folio() be copied into ext4, then modified
to support fscrypt and fsverity, and then the fscrypt support removed from the
original copy, we could do that. That seems more like a workaround to avoid
modifying certain files than an actually better solution, but it could be done.

>
> > This is needed to support fsverity on ext4 filesystems where the
> > filesystem block size is less than the page size.
>
> Does any real person actually do this?

Yes, on systems with the page size larger than 4K, the ext4 filesystem block
size is often smaller than the page size. ext4 encryption (fscrypt) originally
had the same limitation, and Chandan Rajendra from IBM did significant work to
solve it a few years ago, with the changes landing in v5.5.

- Eric

2023-01-10 03:13:59

by Eric Biggers

Subject: Re: [PATCH v2 00/11] fsverity: support for non-4K pages

On Mon, Jan 09, 2023 at 08:34:46PM +0100, Andrey Albershteyn wrote:
> On Mon, Jan 09, 2023 at 09:38:41AM -0800, Eric Biggers wrote:
> > On Fri, Dec 23, 2022 at 12:36:27PM -0800, Eric Biggers wrote:
> > > [This patchset applies to mainline + some fsverity cleanups I sent out
> > > recently. You can get everything from tag "fsverity-non4k-v2" of
> > > https://git.kernel.org/pub/scm/fs/fscrypt/fscrypt.git ]
> >
> > I've applied this patchset for 6.3, but I'd still greatly appreciate reviews and
> > acks, especially on the last 4 patches, which touch files outside fs/verity/.
> >
> > (I applied it to
> > https://git.kernel.org/pub/scm/fs/fscrypt/fscrypt.git/log/?h=fsverity for now,
> > but there might be a new git repo soon, as is being discussed elsewhere.)
> >
> > - Eric
> >
>
> The fs/verity patches look good to me, I've checked them but forgot
> to send RVB :( Haven't tested them yet though
>
> Reviewed-by: Andrey Albershteyn <[email protected]>
>

Thanks Andrey! I added your Reviewed-by to patches 1-7 only, since you said
"the fs/verity patches". Let me know if I can add it to patches 8-11 too.

- Eric

2023-01-20 20:08:30

by Eric Biggers

Subject: Re: [PATCH v2 10/11] fs/buffer.c: support fsverity in block_read_full_folio()

On Mon, Jan 09, 2023 at 07:05:07PM -0800, Eric Biggers wrote:
> On Mon, Jan 09, 2023 at 06:37:59PM -0800, Andrew Morton wrote:
> > On Fri, 23 Dec 2022 12:36:37 -0800 Eric Biggers <[email protected]> wrote:
> >
> > > After each filesystem block (as represented by a buffer_head) has been
> > > read from disk by block_read_full_folio(), verify it if needed. The
> > > verification is done on the fsverity_read_workqueue. Also allow reads
> > > of verity metadata past i_size, as required by ext4.
> >
> > Sigh. Do we reeeeealy need to mess with buffer.c in this fashion? Did
> > any other subsystems feel a need to do this?
>
> ext4 is currently the only filesystem that uses block_read_full_folio() and that
> supports fsverity. However, since fsverity has a common infrastructure across
> filesystems, in fs/verity/, it makes sense to support it in the other filesystem
> infrastructure so that things aren't mutually exclusive for no reason.
>
> Note that this applies to fscrypt too, which block_read_full_folio() (previously
> block_read_full_page()) has supported since v5.5.
>
> If you'd prefer that block_read_full_folio() be copied into ext4, then modified
> to support fscrypt and fsverity, and then the fscrypt support removed from the
> original copy, we could do that. That seems more like a workaround to avoid
> modifying certain files than an actually better solution, but it could be done.
>
> >
> > > This is needed to support fsverity on ext4 filesystems where the
> > > filesystem block size is less than the page size.
> >
> > Does any real person actually do this?
>
> Yes, on systems with the page size larger than 4K, the ext4 filesystem block
> size is often smaller than the page size. ext4 encryption (fscrypt) originally
> had the same limitation, and Chandan Rajendra from IBM did significant work to
> solve it a few years ago, with the changes landing in v5.5.
>
> - Eric

Any more thoughts on this from Andrew, the ext4 maintainers, or anyone else?

- Eric

2023-01-21 06:52:15

by Christoph Hellwig

Subject: Re: [PATCH v2 10/11] fs/buffer.c: support fsverity in block_read_full_folio()

On Fri, Jan 20, 2023 at 11:56:45AM -0800, Eric Biggers wrote:
> Any more thoughts on this from Andrew, the ext4 maintainers, or anyone else?

As someone else: I very much prefer to support common functionality
(fsverity) in common helpers rather than copying and pasting them into
various file systems. The "copy a common helper and slightly modify it"
approach is a cancer infecting various file systems that makes it really
hard to maintain the kernel.