From: Zhang Yi <[email protected]>
Commit '943bc0882ceb ("iomap: don't increase i_size if it's not a write
operation")' breaks xfs with realtime device on generic/561, the problem
is when unaligned truncate down a xfs realtime inode with rtextsize > 1
fs block, xfs only zero out the EOF block but doesn't zero out the tail
blocks that aligned to rtextsize, so if we don't increase i_size in
iomap_write_end(), it could expose stale data after we do an append
write beyond the aligned EOF block.
xfs should zero out the tail blocks when truncate down, but before we
finish that, let's fix the issue by just revert the changes in
iomap_write_end().
Fixes: 943bc0882ceb ("iomap: don't increase i_size if it's not a write operation")
Reported-by: Chandan Babu R <[email protected]>
Link: https://lore.kernel.org/linux-xfs/[email protected]
Signed-off-by: Zhang Yi <[email protected]>
---
fs/iomap/buffered-io.c | 53 +++++++++++++++++++-----------------------
1 file changed, 24 insertions(+), 29 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index c5802a459334..bd70fcbc168e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -877,22 +877,37 @@ static bool iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
size_t copied, struct folio *folio)
{
const struct iomap *srcmap = iomap_iter_srcmap(iter);
+ loff_t old_size = iter->inode->i_size;
+ size_t written;
if (srcmap->type == IOMAP_INLINE) {
iomap_write_end_inline(iter, folio, pos, copied);
- return true;
+ written = copied;
+ } else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
+ written = block_write_end(NULL, iter->inode->i_mapping, pos,
+ len, copied, &folio->page, NULL);
+ WARN_ON_ONCE(written != copied && written != 0);
+ } else {
+ written = __iomap_write_end(iter->inode, pos, len, copied,
+ folio) ? copied : 0;
}
- if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
- size_t bh_written;
-
- bh_written = block_write_end(NULL, iter->inode->i_mapping, pos,
- len, copied, &folio->page, NULL);
- WARN_ON_ONCE(bh_written != copied && bh_written != 0);
- return bh_written == copied;
+ /*
+ * Update the in-memory inode size after copying the data into the page
+ * cache. It's up to the file system to write the updated size to disk,
+ * preferably after I/O completion so that no stale data is exposed.
+ * Only once that's done can we unlock and release the folio.
+ */
+ if (pos + written > old_size) {
+ i_size_write(iter->inode, pos + written);
+ iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
}
+ __iomap_put_folio(iter, pos, written, folio);
- return __iomap_write_end(iter->inode, pos, len, copied, folio);
+ if (old_size < pos)
+ pagecache_isize_extended(iter->inode, old_size, pos);
+
+ return written == copied;
}
static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
@@ -907,7 +922,6 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
do {
struct folio *folio;
- loff_t old_size;
size_t offset; /* Offset into folio */
size_t bytes; /* Bytes to write to folio */
size_t copied; /* Bytes copied from user */
@@ -959,23 +973,6 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
written = iomap_write_end(iter, pos, bytes, copied, folio) ?
copied : 0;
- /*
- * Update the in-memory inode size after copying the data into
- * the page cache. It's up to the file system to write the
- * updated size to disk, preferably after I/O completion so that
- * no stale data is exposed. Only once that's done can we
- * unlock and release the folio.
- */
- old_size = iter->inode->i_size;
- if (pos + written > old_size) {
- i_size_write(iter->inode, pos + written);
- iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
- }
- __iomap_put_folio(iter, pos, written, folio);
-
- if (old_size < pos)
- pagecache_isize_extended(iter->inode, old_size, pos);
-
cond_resched();
if (unlikely(written == 0)) {
/*
@@ -1346,7 +1343,6 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
bytes = folio_size(folio) - offset;
ret = iomap_write_end(iter, pos, bytes, bytes, folio);
- __iomap_put_folio(iter, pos, bytes, folio);
if (WARN_ON_ONCE(!ret))
return -EIO;
@@ -1412,7 +1408,6 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
folio_mark_accessed(folio);
ret = iomap_write_end(iter, pos, bytes, bytes, folio);
- __iomap_put_folio(iter, pos, bytes, folio);
if (WARN_ON_ONCE(!ret))
return -EIO;
--
2.39.2
Looks good:
Reviewed-by: Christoph Hellwig <[email protected]>
hopefully we can bring it back soon.
On 2024/6/4 12:08, Christoph Hellwig wrote:
> Looks good:
>
> Reviewed-by: Christoph Hellwig <[email protected]>
>
> hopefully we can bring it back soon.
>
Yeah, it will :)
Thanks,
Yi.
On Mon, 03 Jun 2024 19:22:22 +0800, Zhang Yi wrote:
> Commit '943bc0882ceb ("iomap: don't increase i_size if it's not a write
> operation")' breaks xfs with realtime device on generic/561, the problem
> is when unaligned truncate down a xfs realtime inode with rtextsize > 1
> fs block, xfs only zero out the EOF block but doesn't zero out the tail
> blocks that aligned to rtextsize, so if we don't increase i_size in
> iomap_write_end(), it could expose stale data after we do an append
> write beyond the aligned EOF block.
>
> [...]
Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes
[1/1] iomap: keep on increasing i_size in iomap_write_end()
https://git.kernel.org/vfs/vfs/c/86e71b5f0366
On Mon, Jun 03, 2024 at 07:22:22 PM +0800, Zhang Yi wrote:
> From: Zhang Yi <[email protected]>
>
> Commit '943bc0882ceb ("iomap: don't increase i_size if it's not a write
> operation")' breaks xfs with realtime device on generic/561, the problem
> is when unaligned truncate down a xfs realtime inode with rtextsize > 1
> fs block, xfs only zero out the EOF block but doesn't zero out the tail
> blocks that aligned to rtextsize, so if we don't increase i_size in
> iomap_write_end(), it could expose stale data after we do an append
> write beyond the aligned EOF block.
>
> xfs should zero out the tail blocks when truncate down, but before we
> finish that, let's fix the issue by just revert the changes in
> iomap_write_end().
I didn't notice any regressions with this patch applied. Hence,
Tested-by: Chandan Babu R <[email protected]>
--
Chandan