When writing back an inode and performing an fsync on it concurrently, a
deadlock issue may arise as shown below. In each writeback iteration, a
clean inode is requeued to the wb->b_dirty queue due to non-zero
pages_skipped, without anything actually being written. This causes an
infinite loop and prevents the plug from being flushed, resulting in a
deadlock. We now avoid requeuing the clean inode to prevent this issue.
wb_writeback fsync (inode-Y)
blk_start_plug(&plug)
for (;;) {
iter i-1: some reqs with page-X added into plug->mq_list // f2fs node page-X with PG_writeback
filemap_fdatawrite
__filemap_fdatawrite_range // write inode-Y with sync_mode WB_SYNC_ALL
do_writepages
f2fs_write_data_pages
__f2fs_write_data_pages // wb_sync_req[DATA]++ for WB_SYNC_ALL
f2fs_write_cache_pages
f2fs_write_single_data_page
f2fs_do_write_data_page
f2fs_outplace_write_data
f2fs_update_data_blkaddr
f2fs_wait_on_page_writeback
wait_on_page_writeback // wait for f2fs node page-X
iter i:
progress = __writeback_inodes_wb(wb, work)
. writeback_sb_inodes
. __writeback_single_inode // write inode-Y with sync_mode WB_SYNC_NONE
. . do_writepages
. . f2fs_write_data_pages
. . . __f2fs_write_data_pages // skip writepages due to (wb_sync_req[DATA]>0)
. . . wbc->pages_skipped += get_dirty_pages(inode) // wbc->pages_skipped = 1
. if (!(inode->i_state & I_DIRTY_ALL)) // i_state = I_SYNC | I_SYNC_QUEUED
. total_wrote++; // total_wrote = 1
. requeue_inode // requeue inode-Y to wb->b_dirty queue due to non-zero pages_skipped
if (progress) // progress = 1
continue;
iter i+1:
queue_io
// similar process with iter i, infinite for-loop !
}
blk_finish_plug(&plug) // flush plug won't be called
Signed-off-by: Chunhai Guo <[email protected]>
---
fs/fs-writeback.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 969ce991b0b0..c1af01b2c42d 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1535,10 +1535,15 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
if (wbc->pages_skipped) {
/*
- * writeback is not making progress due to locked
- * buffers. Skip this inode for now.
+ * Writeback is not making progress due to locked buffers.
+ * Skip this inode for now. Although having skipped pages
+ * is odd for clean inodes, it can happen for some
+ * filesystems so handle that gracefully.
*/
- redirty_tail_locked(inode, wb);
+ if (inode->i_state & I_DIRTY_ALL)
+ redirty_tail_locked(inode, wb);
+ else
+ inode_cgwb_move_to_attached(inode, wb);
return;
}
--
2.25.1
On Fri, 15 Sep 2023 22:51:31 -0600, Chunhai Guo wrote:
> When writing back an inode and performing an fsync on it concurrently, a
> deadlock issue may arise as shown below. In each writeback iteration, a
> clean inode is requeued to the wb->b_dirty queue due to non-zero
> pages_skipped, without anything actually being written. This causes an
> infinite loop and prevents the plug from being flushed, resulting in a
> deadlock. We now avoid requeuing the clean inode to prevent this issue.
>
> [...]
Applied to the vfs.misc branch of the vfs/vfs.git tree.
Patches in the vfs.misc branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.misc
[1/1] fs-writeback: do not requeue a clean inode having skipped pages
https://git.kernel.org/vfs/vfs/c/3d744ef7ea7a
On Fri 15-09-23 22:51:31, Chunhai Guo wrote:
> When writing back an inode and performing an fsync on it concurrently, a
> deadlock issue may arise as shown below. In each writeback iteration, a
> clean inode is requeued to the wb->b_dirty queue due to non-zero
> pages_skipped, without anything actually being written. This causes an
> infinite loop and prevents the plug from being flushed, resulting in a
> deadlock. We now avoid requeuing the clean inode to prevent this issue.
>
> wb_writeback fsync (inode-Y)
> blk_start_plug(&plug)
> for (;;) {
> iter i-1: some reqs with page-X added into plug->mq_list // f2fs node page-X with PG_writeback
> filemap_fdatawrite
> __filemap_fdatawrite_range // write inode-Y with sync_mode WB_SYNC_ALL
> do_writepages
> f2fs_write_data_pages
> __f2fs_write_data_pages // wb_sync_req[DATA]++ for WB_SYNC_ALL
> f2fs_write_cache_pages
> f2fs_write_single_data_page
> f2fs_do_write_data_page
> f2fs_outplace_write_data
> f2fs_update_data_blkaddr
> f2fs_wait_on_page_writeback
> wait_on_page_writeback // wait for f2fs node page-X
> iter i:
> progress = __writeback_inodes_wb(wb, work)
> . writeback_sb_inodes
> . __writeback_single_inode // write inode-Y with sync_mode WB_SYNC_NONE
> . . do_writepages
> . . f2fs_write_data_pages
> . . . __f2fs_write_data_pages // skip writepages due to (wb_sync_req[DATA]>0)
> . . . wbc->pages_skipped += get_dirty_pages(inode) // wbc->pages_skipped = 1
> . if (!(inode->i_state & I_DIRTY_ALL)) // i_state = I_SYNC | I_SYNC_QUEUED
> . total_wrote++; // total_wrote = 1
> . requeue_inode // requeue inode-Y to wb->b_dirty queue due to non-zero pages_skipped
> if (progress) // progress = 1
> continue;
> iter i+1:
> queue_io
> // similar process with iter i, infinite for-loop !
> }
> blk_finish_plug(&plug) // flush plug won't be called
>
> Signed-off-by: Chunhai Guo <[email protected]>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 969ce991b0b0..c1af01b2c42d 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1535,10 +1535,15 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
>
> if (wbc->pages_skipped) {
> /*
> - * writeback is not making progress due to locked
> - * buffers. Skip this inode for now.
> + * Writeback is not making progress due to locked buffers.
> + * Skip this inode for now. Although having skipped pages
> + * is odd for clean inodes, it can happen for some
> + * filesystems so handle that gracefully.
> */
> - redirty_tail_locked(inode, wb);
> + if (inode->i_state & I_DIRTY_ALL)
> + redirty_tail_locked(inode, wb);
> + else
> + inode_cgwb_move_to_attached(inode, wb);
> return;
> }
>
> --
> 2.25.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR