2024-04-25 13:24:51

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v5 0/9] xfs/iomap: fix non-atomic clone operation and don't update size when zeroing range post eof

Changes since v4:
- For zeroing range in xfs, move the delalloc check to before searching
the COW fork when zeroing range. Only modify patch 04, please see it
for details, not modify other patches.

Changes since v3:
- Improve some git message comments and do some minor code cleanup, no
logic changes.

Changes since v2:
- Merge the patch for dropping of xfs_convert_blocks() and the patch
for modifying xfs_bmapi_convert_delalloc().
- Reword the commit message of the second patch.

Changes since v1:
- Make xfs_bmapi_convert_delalloc() to allocate the target offset and
drop the writeback helper xfs_convert_blocks().
- Don't use xfs_iomap_write_direct() to convert delalloc blocks for
zeroing posteof case, use xfs_bmapi_convert_delalloc() instead.
- Fix two off-by-one issues when converting delalloc blocks.
- Add a separate patch to drop the buffered write failure handle in
zeroing and unsharing.
- Add a comments do emphasize updating i_size should under folio lock.
- Make iomap_write_end() to return a boolean, and do some cleanups in
buffered write begin path.

This patch series fix a problem of exposing zeroed data on xfs since the
non-atomic clone operation. This problem was found while I was
developing ext4 buffered IO iomap conversion (ext4 is relying on this
fix [1]), the root cause of this problem and the discussion about the
solution please see [2]. After fix the problem, iomap_zero_range()
doesn't need to update i_size so that ext4 can use it to zero partial
block, e.g. truncate eof block [3].

[1] https://lore.kernel.org/linux-ext4/[email protected]/
[2] https://lore.kernel.org/linux-ext4/[email protected]/
[3] https://lore.kernel.org/linux-ext4/[email protected]/

Thanks,
Yi.

Zhang Yi (9):
xfs: match lock mode in xfs_buffered_write_iomap_begin()
xfs: make the seq argument to xfs_bmapi_convert_delalloc() optional
xfs: make xfs_bmapi_convert_delalloc() to allocate the target offset
xfs: convert delayed extents to unwritten when zeroing post eof blocks
iomap: drop the write failure handles when unsharing and zeroing
iomap: don't increase i_size if it's not a write operation
iomap: use a new variable to handle the written bytes in
iomap_write_iter()
iomap: make iomap_write_end() return a boolean
iomap: do some small logical cleanup in buffered write

fs/iomap/buffered-io.c | 105 ++++++++++++++++++++++-----------------
fs/xfs/libxfs/xfs_bmap.c | 40 +++++++++++++--
fs/xfs/xfs_aops.c | 54 ++++++--------------
fs/xfs/xfs_iomap.c | 39 +++++++++++++--
4 files changed, 144 insertions(+), 94 deletions(-)

--
2.39.2



2024-04-25 13:25:21

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v5 6/9] iomap: don't increase i_size if it's not a write operation

From: Zhang Yi <[email protected]>

Increase i_size in iomap_zero_range() and iomap_unshare_iter() is not
needed, the caller should handle it. Especially, when truncate partial
block, we should not increase i_size beyond the new EOF here. It doesn't
affect xfs and gfs2 now because they set the new file size after zero
out, it doesn't matter that a transient increase in i_size, but it will
affect ext4 because it set file size before truncate. So move the i_size
updating logic to iomap_write_iter().

Signed-off-by: Zhang Yi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
---
fs/iomap/buffered-io.c | 50 +++++++++++++++++++++---------------------
1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 433eaae39966..63d94189e568 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -875,32 +875,13 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
size_t copied, struct folio *folio)
{
const struct iomap *srcmap = iomap_iter_srcmap(iter);
- loff_t old_size = iter->inode->i_size;
- size_t ret;
-
- if (srcmap->type == IOMAP_INLINE) {
- ret = iomap_write_end_inline(iter, folio, pos, copied);
- } else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
- ret = block_write_end(NULL, iter->inode->i_mapping, pos, len,
- copied, &folio->page, NULL);
- } else {
- ret = __iomap_write_end(iter->inode, pos, len, copied, folio);
- }
-
- /*
- * Update the in-memory inode size after copying the data into the page
- * cache. It's up to the file system to write the updated size to disk,
- * preferably after I/O completion so that no stale data is exposed.
- */
- if (pos + ret > old_size) {
- i_size_write(iter->inode, pos + ret);
- iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
- }
- __iomap_put_folio(iter, pos, ret, folio);

- if (old_size < pos)
- pagecache_isize_extended(iter->inode, old_size, pos);
- return ret;
+ if (srcmap->type == IOMAP_INLINE)
+ return iomap_write_end_inline(iter, folio, pos, copied);
+ if (srcmap->flags & IOMAP_F_BUFFER_HEAD)
+ return block_write_end(NULL, iter->inode->i_mapping, pos, len,
+ copied, &folio->page, NULL);
+ return __iomap_write_end(iter->inode, pos, len, copied, folio);
}

static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
@@ -915,6 +896,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)

do {
struct folio *folio;
+ loff_t old_size;
size_t offset; /* Offset into folio */
size_t bytes; /* Bytes to write to folio */
size_t copied; /* Bytes copied from user */
@@ -964,6 +946,22 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
status = iomap_write_end(iter, pos, bytes, copied, folio);

+ /*
+ * Update the in-memory inode size after copying the data into
+ * the page cache. It's up to the file system to write the
+ * updated size to disk, preferably after I/O completion so that
+ * no stale data is exposed. Only once that's done can we
+ * unlock and release the folio.
+ */
+ old_size = iter->inode->i_size;
+ if (pos + status > old_size) {
+ i_size_write(iter->inode, pos + status);
+ iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
+ }
+ __iomap_put_folio(iter, pos, status, folio);
+
+ if (old_size < pos)
+ pagecache_isize_extended(iter->inode, old_size, pos);
if (status < bytes)
iomap_write_failed(iter->inode, pos + status,
bytes - status);
@@ -1336,6 +1334,7 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
bytes = folio_size(folio) - offset;

bytes = iomap_write_end(iter, pos, bytes, bytes, folio);
+ __iomap_put_folio(iter, pos, bytes, folio);
if (WARN_ON_ONCE(bytes == 0))
return -EIO;

@@ -1400,6 +1399,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
folio_mark_accessed(folio);

bytes = iomap_write_end(iter, pos, bytes, bytes, folio);
+ __iomap_put_folio(iter, pos, bytes, folio);
if (WARN_ON_ONCE(bytes == 0))
return -EIO;

--
2.39.2


2024-04-25 13:25:36

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v5 7/9] iomap: use a new variable to handle the written bytes in iomap_write_iter()

From: Zhang Yi <[email protected]>

In iomap_write_iter(), the status variable used to receive the return
value from iomap_write_end() is confusing, replace it with a new written
variable to represent the written bytes in each cycle, no logic changes.

Signed-off-by: Zhang Yi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
---
fs/iomap/buffered-io.c | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 63d94189e568..854fa3d4b1c8 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -889,7 +889,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
loff_t length = iomap_length(iter);
size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
loff_t pos = iter->pos;
- ssize_t written = 0;
+ ssize_t total_written = 0;
long status = 0;
struct address_space *mapping = iter->inode->i_mapping;
unsigned int bdp_flags = (iter->flags & IOMAP_NOWAIT) ? BDP_ASYNC : 0;
@@ -900,6 +900,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
size_t offset; /* Offset into folio */
size_t bytes; /* Bytes to write to folio */
size_t copied; /* Bytes copied from user */
+ size_t written; /* Bytes have been written */

bytes = iov_iter_count(i);
retry:
@@ -944,7 +945,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
flush_dcache_folio(folio);

copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
- status = iomap_write_end(iter, pos, bytes, copied, folio);
+ written = iomap_write_end(iter, pos, bytes, copied, folio);

/*
* Update the in-memory inode size after copying the data into
@@ -954,22 +955,22 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
* unlock and release the folio.
*/
old_size = iter->inode->i_size;
- if (pos + status > old_size) {
- i_size_write(iter->inode, pos + status);
+ if (pos + written > old_size) {
+ i_size_write(iter->inode, pos + written);
iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
}
- __iomap_put_folio(iter, pos, status, folio);
+ __iomap_put_folio(iter, pos, written, folio);

if (old_size < pos)
pagecache_isize_extended(iter->inode, old_size, pos);
- if (status < bytes)
- iomap_write_failed(iter->inode, pos + status,
- bytes - status);
- if (unlikely(copied != status))
- iov_iter_revert(i, copied - status);
+ if (written < bytes)
+ iomap_write_failed(iter->inode, pos + written,
+ bytes - written);
+ if (unlikely(copied != written))
+ iov_iter_revert(i, copied - written);

cond_resched();
- if (unlikely(status == 0)) {
+ if (unlikely(written == 0)) {
/*
* A short copy made iomap_write_end() reject the
* thing entirely. Might be memory poisoning
@@ -983,17 +984,17 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
goto retry;
}
} else {
- pos += status;
- written += status;
- length -= status;
+ pos += written;
+ total_written += written;
+ length -= written;
}
} while (iov_iter_count(i) && length);

if (status == -EAGAIN) {
- iov_iter_revert(i, written);
+ iov_iter_revert(i, total_written);
return -EAGAIN;
}
- return written ? written : status;
+ return total_written ? total_written : status;
}

ssize_t
--
2.39.2


2024-04-25 13:25:54

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v5 8/9] iomap: make iomap_write_end() return a boolean

From: Zhang Yi <[email protected]>

For now, we can make sure iomap_write_end() always return 0 or copied
bytes, so instead of return written bytes, convert to return a boolean
to indicate the copied bytes have been written to the pagecache.

Signed-off-by: Zhang Yi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
---
fs/iomap/buffered-io.c | 48 +++++++++++++++++++++++++++---------------
1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 854fa3d4b1c8..176a9ea502ba 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -828,7 +828,7 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
return status;
}

-static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
+static bool __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
size_t copied, struct folio *folio)
{
flush_dcache_folio(folio);
@@ -845,14 +845,14 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
* redo the whole thing.
*/
if (unlikely(copied < len && !folio_test_uptodate(folio)))
- return 0;
+ return false;
iomap_set_range_uptodate(folio, offset_in_folio(folio, pos), len);
iomap_set_range_dirty(folio, offset_in_folio(folio, pos), copied);
filemap_dirty_folio(inode->i_mapping, folio);
- return copied;
+ return true;
}

-static size_t iomap_write_end_inline(const struct iomap_iter *iter,
+static void iomap_write_end_inline(const struct iomap_iter *iter,
struct folio *folio, loff_t pos, size_t copied)
{
const struct iomap *iomap = &iter->iomap;
@@ -867,20 +867,31 @@ static size_t iomap_write_end_inline(const struct iomap_iter *iter,
kunmap_local(addr);

mark_inode_dirty(iter->inode);
- return copied;
}

-/* Returns the number of bytes copied. May be 0. Cannot be an errno. */
-static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
+/*
+ * Returns true if all copied bytes have been written to the pagecache,
+ * otherwise return false.
+ */
+static bool iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
size_t copied, struct folio *folio)
{
const struct iomap *srcmap = iomap_iter_srcmap(iter);

- if (srcmap->type == IOMAP_INLINE)
- return iomap_write_end_inline(iter, folio, pos, copied);
- if (srcmap->flags & IOMAP_F_BUFFER_HEAD)
- return block_write_end(NULL, iter->inode->i_mapping, pos, len,
- copied, &folio->page, NULL);
+ if (srcmap->type == IOMAP_INLINE) {
+ iomap_write_end_inline(iter, folio, pos, copied);
+ return true;
+ }
+
+ if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
+ size_t bh_written;
+
+ bh_written = block_write_end(NULL, iter->inode->i_mapping, pos,
+ len, copied, &folio->page, NULL);
+ WARN_ON_ONCE(bh_written != copied && bh_written != 0);
+ return bh_written == copied;
+ }
+
return __iomap_write_end(iter->inode, pos, len, copied, folio);
}

@@ -945,7 +956,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
flush_dcache_folio(folio);

copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
- written = iomap_write_end(iter, pos, bytes, copied, folio);
+ written = iomap_write_end(iter, pos, bytes, copied, folio) ?
+ copied : 0;

/*
* Update the in-memory inode size after copying the data into
@@ -1323,6 +1335,7 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
int status;
size_t offset;
size_t bytes = min_t(u64, SIZE_MAX, length);
+ bool ret;

status = iomap_write_begin(iter, pos, bytes, &folio);
if (unlikely(status))
@@ -1334,9 +1347,9 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
if (bytes > folio_size(folio) - offset)
bytes = folio_size(folio) - offset;

- bytes = iomap_write_end(iter, pos, bytes, bytes, folio);
+ ret = iomap_write_end(iter, pos, bytes, bytes, folio);
__iomap_put_folio(iter, pos, bytes, folio);
- if (WARN_ON_ONCE(bytes == 0))
+ if (WARN_ON_ONCE(!ret))
return -EIO;

cond_resched();
@@ -1385,6 +1398,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
int status;
size_t offset;
size_t bytes = min_t(u64, SIZE_MAX, length);
+ bool ret;

status = iomap_write_begin(iter, pos, bytes, &folio);
if (status)
@@ -1399,9 +1413,9 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
folio_zero_range(folio, offset, bytes);
folio_mark_accessed(folio);

- bytes = iomap_write_end(iter, pos, bytes, bytes, folio);
+ ret = iomap_write_end(iter, pos, bytes, bytes, folio);
__iomap_put_folio(iter, pos, bytes, folio);
- if (WARN_ON_ONCE(bytes == 0))
+ if (WARN_ON_ONCE(!ret))
return -EIO;

pos += bytes;
--
2.39.2


2024-04-25 13:26:29

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v5 9/9] iomap: do some small logical cleanup in buffered write

From: Zhang Yi <[email protected]>

Since iomap_write_end() can never return a partial write length, the
comparison between written, copied and bytes becomes useless, just
merge them with the unwritten branch.

Signed-off-by: Zhang Yi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
---
fs/iomap/buffered-io.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 176a9ea502ba..0926d216a5af 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -975,11 +975,6 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)

if (old_size < pos)
pagecache_isize_extended(iter->inode, old_size, pos);
- if (written < bytes)
- iomap_write_failed(iter->inode, pos + written,
- bytes - written);
- if (unlikely(copied != written))
- iov_iter_revert(i, copied - written);

cond_resched();
if (unlikely(written == 0)) {
@@ -989,6 +984,9 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
* halfway through, might be a race with munmap,
* might be severe memory pressure.
*/
+ iomap_write_failed(iter->inode, pos, bytes);
+ iov_iter_revert(i, copied);
+
if (chunk > PAGE_SIZE)
chunk /= 2;
if (copied) {
--
2.39.2


2024-04-25 13:37:46

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v5 1/9] xfs: match lock mode in xfs_buffered_write_iomap_begin()

From: Zhang Yi <[email protected]>

Commit 1aa91d9c9933 ("xfs: Add async buffered write support") replace
xfs_ilock(XFS_ILOCK_EXCL) with xfs_ilock_for_iomap() when locking the
writing inode, and a new variable lockmode is used to indicate the lock
mode. Although the lockmode should always be XFS_ILOCK_EXCL, it's still
better to use this variable instead of useing XFS_ILOCK_EXCL directly
when unlocking the inode.

Fixes: 1aa91d9c9933 ("xfs: Add async buffered write support")
Signed-off-by: Zhang Yi <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
fs/xfs/xfs_iomap.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 4087af7f3c9f..236ee78aa75b 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1158,13 +1158,13 @@ xfs_buffered_write_iomap_begin(
* them out if the write happens to fail.
*/
seq = xfs_iomap_inode_sequence(ip, IOMAP_F_NEW);
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ xfs_iunlock(ip, lockmode);
trace_xfs_iomap_alloc(ip, offset, count, allocfork, &imap);
return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, IOMAP_F_NEW, seq);

found_imap:
seq = xfs_iomap_inode_sequence(ip, 0);
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ xfs_iunlock(ip, lockmode);
return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0, seq);

found_cow:
@@ -1174,17 +1174,17 @@ xfs_buffered_write_iomap_begin(
if (error)
goto out_unlock;
seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED);
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ xfs_iunlock(ip, lockmode);
return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags,
IOMAP_F_SHARED, seq);
}

xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ xfs_iunlock(ip, lockmode);
return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, 0, seq);

out_unlock:
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ xfs_iunlock(ip, lockmode);
return error;
}

--
2.39.2


2024-04-25 13:38:55

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v5 5/9] iomap: drop the write failure handles when unsharing and zeroing

From: Zhang Yi <[email protected]>

Unsharing and zeroing can only happen within EOF, so there is never a
need to perform posteof pagecache truncation if write begin fails, also
partial write could never theoretically happened from iomap_write_end(),
so remove both of them.

Signed-off-by: Zhang Yi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
---
fs/iomap/buffered-io.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4e8e41c8b3c0..433eaae39966 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -824,7 +824,6 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,

out_unlock:
__iomap_put_folio(iter, pos, 0, folio);
- iomap_write_failed(iter->inode, pos, len);

return status;
}
@@ -901,8 +900,6 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,

if (old_size < pos)
pagecache_isize_extended(iter->inode, old_size, pos);
- if (ret < len)
- iomap_write_failed(iter->inode, pos + ret, len - ret);
return ret;
}

@@ -950,8 +947,10 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
}

status = iomap_write_begin(iter, pos, bytes, &folio);
- if (unlikely(status))
+ if (unlikely(status)) {
+ iomap_write_failed(iter->inode, pos, bytes);
break;
+ }
if (iter->iomap.flags & IOMAP_F_STALE)
break;

@@ -965,6 +964,9 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
status = iomap_write_end(iter, pos, bytes, copied, folio);

+ if (status < bytes)
+ iomap_write_failed(iter->inode, pos + status,
+ bytes - status);
if (unlikely(copied != status))
iov_iter_revert(i, copied - status);

--
2.39.2