2022-04-16 02:31:16

by Zhihao Cheng

[permalink] [raw]
Subject: [PATCH] fs-writeback: Flush plug before next iteration in wb_writeback()

Commit 505a666ee3fc ("writeback: plug writeback in wb_writeback() and
writeback_inodes_wb()") has us holding a plug during wb_writeback, which
may cause a potential ABBA dead lock:

wb_writeback fat_file_fsync
blk_start_plug(&plug)
for (;;) {
iter i-1: some reqs have been added into plug->mq_list // LOCK A
iter i:
progress = __writeback_inodes_wb(wb, work)
. writeback_sb_inodes // fat's bdev
. __writeback_single_inode
. . generic_writepages
. . __block_write_full_page
. . . . __generic_file_fsync
. . . . sync_inode_metadata
. . . . writeback_single_inode
. . . . __writeback_single_inode
. . . . fat_write_inode
. . . . __fat_write_inode
. . . . sync_dirty_buffer // fat's bdev
. . . . lock_buffer(bh) // LOCK B
. . . . submit_bh
. . . . blk_mq_get_tag // LOCK A
. . . trylock_buffer(bh) // LOCK B
. . . redirty_page_for_writepage
. . . wbc->pages_skipped++
. . --wbc->nr_to_write
. wrote += write_chunk - wbc.nr_to_write // wrote > 0
. requeue_inode
. redirty_tail_locked
if (progress) // progress > 0
continue;
iter i+1:
queue_io
// similar process with iter i, infinite for-loop !
}
blk_finish_plug(&plug) // flush plug won't be called

Above process triggers a hungtask like:
[ 399.044861] INFO: task bb:2607 blocked for more than 30 seconds.
[ 399.046824] Not tainted 5.18.0-rc1-00005-gefae4d9eb6a2-dirty
[ 399.051539] task:bb state:D stack: 0 pid: 2607 ppid:
2426 flags:0x00004000
[ 399.051556] Call Trace:
[ 399.051570] __schedule+0x480/0x1050
[ 399.051592] schedule+0x92/0x1a0
[ 399.051602] io_schedule+0x22/0x50
[ 399.051613] blk_mq_get_tag+0x1d3/0x3c0
[ 399.051640] __blk_mq_alloc_requests+0x21d/0x3f0
[ 399.051657] blk_mq_submit_bio+0x68d/0xca0
[ 399.051674] __submit_bio+0x1b5/0x2d0
[ 399.051708] submit_bio_noacct+0x34e/0x720
[ 399.051718] submit_bio+0x3b/0x150
[ 399.051725] submit_bh_wbc+0x161/0x230
[ 399.051734] __sync_dirty_buffer+0xd1/0x420
[ 399.051744] sync_dirty_buffer+0x17/0x20
[ 399.051750] __fat_write_inode+0x289/0x310
[ 399.051766] fat_write_inode+0x2a/0xa0
[ 399.051783] __writeback_single_inode+0x53c/0x6f0
[ 399.051795] writeback_single_inode+0x145/0x200
[ 399.051803] sync_inode_metadata+0x45/0x70
[ 399.051856] __generic_file_fsync+0xa3/0x150
[ 399.051880] fat_file_fsync+0x1d/0x80
[ 399.051895] vfs_fsync_range+0x40/0xb0
[ 399.051929] __x64_sys_fsync+0x18/0x30

In my test, 'need_resched()' (which is imported by 590dca3a71 "fs-writeback:
unplug before cond_resched in writeback_sb_inodes") in function
'writeback_sb_inodes()' seldom comes true, unless cond_resched() is deleted
from write_cache_pages().

Fix it by flush plug before next iteration in wb_writeback().

Goto Link to find a reproducer.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215837
Cc: [email protected] # v4.3
Reported-by: Zhihao Cheng <[email protected]>
---
fs/fs-writeback.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 591fe9cf1659..e524c0a1749c 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2036,8 +2036,21 @@ static long wb_writeback(struct bdi_writeback *wb,
* mean the overall work is done. So we keep looping as long
* as made some progress on cleaning pages or inodes.
*/
- if (progress)
+ if (progress) {
+ /*
+ * The progress may be false postive in page redirty
+ * case (which is caused by failing to get buffer head
+ * lock), which will requeue dirty inodes and start
+ * next writeback iteration, and other tasks maybe
+ * stuck for getting tags for new requests. So, flush
+ * plug to schedule requests holding tags.
+ *
+ * The code can be removed after buffer head
+ * disappering from linux.
+ */
+ blk_flush_plug(current->plug, false);
continue;
+ }
/*
* No more inodes for IO, bail
*/
--
2.31.1


2022-04-16 05:49:46

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] fs-writeback: Flush plug before next iteration in wb_writeback()

> I think the root cause is fsync gets buffer head's lock without locking
> corresponding page, fixing 'progess' and flushing plug are both
> workarounds.

So let's fix that.

2022-04-16 12:54:25

by Zhihao Cheng

[permalink] [raw]
Subject: Re: [PATCH] fs-writeback: Flush plug before next iteration in wb_writeback()

?? 2022/4/16 13:42, Christoph Hellwig ะด??:
>> I think the root cause is fsync gets buffer head's lock without locking
>> corresponding page, fixing 'progess' and flushing plug are both
>> workarounds.
>
> So let's fix that.
>

I think adding page lock before locking buffer head is a little
difficult and risky:
1. There are too many places getting buffer head before submitting bio,
and not all filesystems behave same in readpage/writepage/write_inode.
For example, ntfs_read_block() has locked page before locking buffer
head and then submitting bh, ext4(no journal) and fat may lock buffer
head without locking page while writing inode. It's a huge work to check
all places.
2. Import page lock before locking buffer head may bring new unknown
problem(other deadlocks about page ?). Taking page lock before locking
buffer head(in all processes which can be concurrent with wb_writeback)
is a dangerous thing.

So, how about applying the safe and simple method(flush plug) for the
time being?
PS: Maybe someday buffer head is removed from all filesystems, then we
can remove this superfluous blk_flush_plug.

2022-04-18 09:30:31

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] fs-writeback: Flush plug before next iteration in wb_writeback()

On Sat, Apr 16, 2022 at 04:41:35PM +0800, Zhihao Cheng wrote:
> So, how about applying the safe and simple method(flush plug) for the time
> being?

As said before - if you flush everytime here the plug is basically
useless and we could remove it. But it was added for a reason. So let's
at least improve the accounting for the skipped writeback as suggested in
your last mail.