2022-09-05 07:30:27

by Kassey Li

Subject: [PATCH] fuse: fix the deadlock in race of reclaim path with kswapd

Task A (kswapd) waits for inode writeback to finish, while the writeback
worker (Task B) has sent a request to fuse and waits for the reply.
Task C, which is expected to serve this request, is itself stuck in the
direct reclaim path, so the system deadlocks when memory is low.

Without __GFP_FS, Task C breaks out of throttle_direct_reclaim() after
an HZ timeout.

kswapd (Task_A):
__switch_to+0x14c
__schedule+0xb5c
schedule+0x70
bit_wait+0x18
__wait_on_bit+0x74
inode_wait_for_writeback+0xa4
evict+0xa8
iput+0x248
dentry_unlink_inode+0xdc
__dentry_kill[jt]+0x110
shrink_dentry_list+0x17c
prune_dcache_sb+0x5c
super_cache_scan+0x11c
do_shrink_slab+0x248
shrink_slab+0x260
shrink_node+0x678
kswapd+0x8ec
kthread+0x140
ret_from_fork+0x10

writeback (Task_B):
schedule+0x70
__fuse_request_send+0x154
fuse_simple_request+0x184
fuse_flush_times+0x114
fuse_write_inode+0x60
__writeback_single_inode+0x3d8
writeback_sb_inodes+0x4c0
__writeback_inodes_wb+0xb0
wb_writeback+0x270
wb_workfn+0x37c
process_one_work+0x284

Task_C:
__switch_to+0x14c
__schedule+0xb5c
schedule+0x70
throttle_direct_reclaim
try_to_free_pages
__perform_reclaim
__alloc_pages_direct_reclaim
__alloc_pages_slowpath
__alloc_pages_nodemask
alloc_pages
fuse_copy_fill+0x168
fuse_dev_do_read+0x37c
fuse_dev_splice_read+0x94

Suggested-by: Wang Mao <[email protected]>
Signed-off-by: Kassey Li <[email protected]>
---
fs/fuse/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 51897427a534..0df7234840c3 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -713,7 +713,7 @@ static int fuse_copy_fill(struct fuse_copy_state *cs)
 			if (cs->nr_segs >= cs->pipe->max_usage)
 				return -EIO;
 
-			page = alloc_page(GFP_HIGHUSER);
+			page = alloc_page(GFP_HIGHUSER & ~__GFP_FS);
 			if (!page)
 				return -ENOMEM;
 
--
2.17.1
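
The one-line change above is a plain bit-mask operation, and a small userspace sketch may make its effect easier to see. Everything named `DEMO_*` or `demo_*` below is hypothetical and chosen only for illustration: the bit values are not the kernel's real GFP values, and `demo_throttle_has_timeout()` merely models the behaviour the patch description relies on (an allocation without the FS bit leaves the direct reclaim throttle after a timeout).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical flag bits for illustration only. The kernel's real
 * definitions live in its GFP headers and use different values.
 */
typedef unsigned int demo_gfp_t;

#define DEMO_GFP_IO       0x01u
#define DEMO_GFP_FS       0x02u
#define DEMO_GFP_HIGHMEM  0x04u
#define DEMO_GFP_HIGHUSER (DEMO_GFP_IO | DEMO_GFP_FS | DEMO_GFP_HIGHMEM)

/* Clear only the FS bit; every other flag is left untouched. */
static inline demo_gfp_t demo_without_fs(demo_gfp_t gfp)
{
	return gfp & ~DEMO_GFP_FS;
}

/*
 * Model (an assumption taken from the patch description, not kernel
 * code): a throttled allocation that lacks the FS bit waits with a
 * timeout, while one that carries the FS bit may wait without a bound.
 */
static inline bool demo_throttle_has_timeout(demo_gfp_t gfp)
{
	return (gfp & DEMO_GFP_FS) == 0;
}
```

With these definitions, `demo_without_fs(DEMO_GFP_HIGHUSER)` keeps the IO and HIGHMEM bits and drops only FS, which is the same shape as `GFP_HIGHUSER & ~__GFP_FS` in the patch.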


2022-09-05 13:56:53

by Miklos Szeredi

Subject: Re: [PATCH] fuse: fix the deadlock in race of reclaim path with kswapd

On Mon, 5 Sept 2022 at 09:17, Kassey Li <[email protected]> wrote:
>
> Task A wait for writeback, while writeback Task B send request to fuse.
> Task C is expected to serve this request, here it is in direct reclaim
> path cause deadlock when system is in low memory.
>
> without __GFP_FS in Task_C break throttle_direct_reclaim with an
> HZ timeout.

Should already be fixed in v5.16 by commit 5c791fe1e2a4 ("fuse: make
sure reclaim doesn't write the inode").

Thanks,
Miklos

2022-09-06 00:32:31

by Kassey Li

Subject: Re: [PATCH] fuse: fix the deadlock in race of reclaim path with kswapd



On 9/5/2022 9:28 PM, Miklos Szeredi wrote:
> On Mon, 5 Sept 2022 at 09:17, Kassey Li <[email protected]> wrote:
>>
>> Task A wait for writeback, while writeback Task B send request to fuse.
>> Task C is expected to serve this request, here it is in direct reclaim
>> path cause deadlock when system is in low memory.
>>
>> without __GFP_FS in Task_C break throttle_direct_reclaim with an
>> HZ timeout.
>
> Should already be fixed in v5.16 by commit 5c791fe1e2a4 ("fuse: make
> sure reclaim doesn't write the inode").
Thanks for this info.
>
> Thanks,
> Miklos