Recently we found a bug related with ext4 buffer head is fixed by
commit 0b73284c564d("ext4: ext4_read_bh_lock() should submit IO if the
buffer isn't uptodate")[1].
This bug is fixed on some kernel long term versions, such as 5.10 and 5.15.
However, on 5.4 stable version, we can still easily reproduce this bug by
adding some delay after buffer_migrate_lock_buffers() in __buffer_migrate_page()
and do fsstress on the ext4 filesystem. We can get some errors in dmesg like:
EXT4-fs error (device pmem1): __ext4_find_entry:1658: inode #73193:
comm fsstress: reading directory lblock 0
EXT4-fs error (device pmem1): __ext4_find_entry:1658: inode #75334:
comm fsstress: reading directory lblock 0
About how to fix this bug in 5.4 version, currently I have three ideas.
But I don't know which one is better or is there any other feasible way to
fix this bug elegantly based on the 5.4 stable branch?
The first idea comes from this thread[2]. In __buffer_migrate_page(),
we can let it fallback to migrate_page that are not uptodate like
fallback_migrate_page(), those pages that has buffers may probably do
read operation soon. From [3], we can see this solution is not good enough
because there are other places that lock the buffer without doing IO.
I think this solution can be a candidate option to fix if we do not want to
change a lot. Also based on my test results, the ext4 filesystem remains
stable after one week stress test with this patch applied.
The second idea is backport a series of commits from upstream, such as
2d069c0889ef ("ext4: use common helpers in all places reading metadata buffers")
0b73284c564d ("ext4: ext4_read_bh_lock() should submit IO if the buffer isn't uptodate")
79f597842069 ("fs/buffer: remove ll_rw_block() helper")
This will lead to many lines of code change and should be carefully conducted,
but it looks like the most reasonable solution so far.
The third idea is replace trylock_buffer in ll_rw_block() with lock_buffer and
change ll_rw_block() in __breadahead_gfp() to trylock_buffer. However,
this will change the semantic of ll_rw_block(), and will not be suitable for
some readahead circumstances. Besides, the ll_rw_block() has many occurences
among many filesystems other than ext4, I think it is better to limit the
fix in the ext4 filesystem without affecting other filesystems.
Here I send the patch based on the first idea, hope someone can give more ideas
about how to fix this bug in kernel 5.4 version, thanks.
[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/T/
[3] https://lore.kernel.org/linux-mm/20220825105704.e46hz6dp6opawsjk@quack3/
Yue Zhao (1):
mm: migrate: buffer_migrate_page_norefs() fallback migrate not
uptodate pages
mm/migrate.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
--
2.17.1
Recently we notice that ext4 filesystem occasionally fail to read
metadata from disk and report error message, but the disk and block
layer looks fine. After analyse, we lockon commit 88dbcbb3a484
("blkdev: avoid migration stalls for blkdev pages"). It provide a
migration method for the bdev, we could move page that has buffers
without extra users now, but it will lock the buffers on the page, which
breaks a lot of current filesystem's fragile metadata read operations,
like ll_rw_block() for common usage and ext4_read_bh_lock() for ext4,
these helpers just trylock the buffer and skip submit IO if it lock
failed, many callers just wait_on_buffer() and conclude IO error if the
buffer is not uptodate after buffer unlocked.
This issue could be easily reproduced by add some delay just after
buffer_migrate_lock_buffers() in __buffer_migrate_page() and do
fsstress on ext4 filesystem.
EXT4-fs error (device pmem1): __ext4_find_entry:1658: inode #73193:
comm fsstress: reading directory lblock 0
EXT4-fs error (device pmem1): __ext4_find_entry:1658: inode #75334:
comm fsstress: reading directory lblock 0
Something like ll_rw_block() should be used carefully and seems could
only be safely used for the readahead case. So the best way is to fix
the read operations in filesystem in the long run, but now let us avoid
this issue first. This patch avoid this issue by fallback to migrate
pages that are not uptodate like fallback_migrate_page(), those pages
that has buffers may probably do read operation soon.
Fixes: 88dbcbb3a484 ("blkdev: avoid migration stalls for blkdev pages")
Signed-off-by: Yue Zhao <[email protected]>
---
mm/migrate.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/mm/migrate.c b/mm/migrate.c
index 034b0662fd3b..1e908e997147 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -751,6 +751,39 @@ static int __buffer_migrate_page(struct address_space *mapping,
return -EAGAIN;
head = page_buffers(page);
+
+ /*
+ * If the mapped buffers on the page are not uptodate and has refcount,
+ * some others may propably try to lock the buffer and submit read IO
+ * through ll_rw_block(), but it will not submit IO once it failed to
+ * lock the buffer, so try to fallback to migrate_page() to prevent
+ * false positive EIO.
+ */
+ if (check_refs) {
+ bool uptodate = true;
+ bool invalidate = false;
+
+ bh = head;
+ do {
+ if (buffer_mapped(bh) && !buffer_uptodate(bh)) {
+ uptodate = false;
+ if (atomic_read(&bh->b_count)) {
+ invalidate = true;
+ break;
+ }
+ }
+ bh = bh->b_this_page;
+ } while (bh != head);
+
+ if (!uptodate) {
+ if (invalidate)
+ invalidate_bh_lrus();
+ if (try_to_release_page(page, GFP_KERNEL))
+ return migrate_page(mapping, newpage, page, mode);
+ return -EAGAIN;
+ }
+ }
+
if (!buffer_migrate_lock_buffers(head, mode))
return -EAGAIN;
--
2.17.1
On Thu, May 04, 2023 at 12:34:25AM +0800, Yue Zhao wrote:
> Recently we found a bug related with ext4 buffer head is fixed by
> commit 0b73284c564d("ext4: ext4_read_bh_lock() should submit IO if the
> buffer isn't uptodate")[1].
>
> This bug is fixed on some kernel long term versions, such as 5.10 and 5.15.
> However, on 5.4 stable version, we can still easily reproduce this bug by
> adding some delay after buffer_migrate_lock_buffers() in __buffer_migrate_page()
> and do fsstress on the ext4 filesystem. We can get some errors in dmesg like:
>
> EXT4-fs error (device pmem1): __ext4_find_entry:1658: inode #73193:
> comm fsstress: reading directory lblock 0
> EXT4-fs error (device pmem1): __ext4_find_entry:1658: inode #75334:
> comm fsstress: reading directory lblock 0
>
> About how to fix this bug in 5.4 version, currently I have three ideas.
> But I don't know which one is better or is there any other feasible way to
> fix this bug elegantly based on the 5.4 stable branch?
>
> The first idea comes from this thread[2]. In __buffer_migrate_page(),
> we can let it fallback to migrate_page that are not uptodate like
> fallback_migrate_page(), those pages that has buffers may probably do
> read operation soon. From [3], we can see this solution is not good enough
> because there are other places that lock the buffer without doing IO.
> I think this solution can be a candidate option to fix if we do not want to
> change a lot. Also based on my test results, the ext4 filesystem remains
> stable after one week stress test with this patch applied.
>
> The second idea is backport a series of commits from upstream, such as
>
> 2d069c0889ef ("ext4: use common helpers in all places reading metadata buffers")
> 0b73284c564d ("ext4: ext4_read_bh_lock() should submit IO if the buffer isn't uptodate")
> 79f597842069 ("fs/buffer: remove ll_rw_block() helper")
Backporting the original upstream commits is almost always the correct
solution. Please try doing that instead of a one-off patch like this.
thanks,
greg k-h