by Christian Schoenebeck

[permalink] [raw]

Subject: Re: [V9fs-developer] [PATCH 0/5] Take 3 at async RPCs and no longer looping forever on signals

On Thursday, March 23, 2023 4:58:51 PM CET Christian Schoenebeck wrote:
> On Sunday, March 19, 2023 12:53:46 PM CET Dominique Martinet wrote:
> > It's been a while but I didn't forget...
> >
> > Dominique Martinet wrote on Tue, Feb 14, 2023 at 08:16:38PM +0900:
> > > > Yes, apparently it tries to write dirty pages of the mapped file and keeps
> > > > hanging there [fs/9p/vfs_inode_dotl.c:586]:
> > >
> > > Yeah, it'd help to get the trace of the thread actually trying to do the
> > > IO, if it still exists.
> > > I had some hangs in the check that there are no flush in flight at some
> > > point, and I thought I fixed that, but I can't really see anywhere else
> > > that'd start hanging with this... it'll be clearer if I can reproduce.
> >
> > I couldn't reproduce this one, but manually inspecting
> > p9_client_wait_flush again I noticed the wait_event_interruptible was
> > waiting on req->flushed_req->wq but looking at req->status in the
> > condition; that was an error.
> > Also, we have a ref on req->flushed_req but not on req, so
> > req->flushed_req wasn't safe.
> >
> > I've changed the code to add a variable directly on req->flushed_req and
> > use it consistently, I'm not sure that's the problem you ran into but it
> > might help.
> > It's been a while but do you remember if that hang was consistently
> > happening on shutdown, or was it a one time thing?
> >
> > Either way, I'd appreciate if you could try my 9p-test branch again:
> > https://github.com/martinetd/linux/commits/9p-test
> >
> >
> > With that said, I expect that p9_client_wait_req will cause hangs on
> > broken servers.
> > If connection drops hopefully the reqs will just be marked as error and
> > free the thread, but I can see syzbot complaining about yet another
> > thread stuck.. Well it's interruptible at least, and bails out on
> > ERESTARTSYS.
>
> I gave your current test branch some spins today.
>
> And yes, the problem did not happen reliably each time, but consistently
> enough for me to reproduce it.
>
> With your latest test branch it appears to have mitigated the problem. Once in
> a while on shutdown I see it blocking for few minutes, but recovering:
>
> [** ] (2 of 3) A stop job is running for …ave Random Seed (2min 36s / 10min)
> [ 484.986388] INFO: task systemd-user-ru:8782 blocked for more than 120 seconds.
> [ 484.990150] Tainted: G E 6.3.0-rc2+ #63
> [ 484.992553] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 484.993657] task:systemd-user-ru state:D stack:0 pid:8782 ppid:1 flags:0x00000004
> [ 484.994863] Call Trace:
> [ 484.995398] <TASK>
> [ 484.995866] __schedule (kernel/sched/core.c:5304 kernel/sched/core.c:6622)
> [ 484.996614] schedule (kernel/sched/core.c:6699 (discriminator 1))
> [ 484.997283] d_alloc_parallel (./include/linux/spinlock.h:350 fs/dcache.c:2626 fs/dcache.c:2707)
> [ 484.998150] ? __pfx_default_wake_function (kernel/sched/core.c:6944)
> [ 484.999213] __lookup_slow (fs/namei.c:1671)
> [ 485.000006] walk_component (./include/linux/fs.h:773 fs/namei.c:1704 fs/namei.c:1994)
> [ 485.000805] path_lookupat (fs/namei.c:2451 fs/namei.c:2475)
> [ 485.001594] filename_lookup (fs/namei.c:2504)
> [ 485.002452] ? __check_object_size (mm/usercopy.c:196 mm/usercopy.c:251 mm/usercopy.c:213)
> [ 485.003523] ? strncpy_from_user (./include/asm-generic/access_ok.h:40 ./arch/x86/include/asm/uaccess.h:551 lib/strncpy_from_user.c:138)
> [ 485.004537] user_path_at_empty (fs/namei.c:2879)
> [ 485.005508] do_faccessat (fs/open.c:484)
> [ 485.006410] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
> [ 485.007281] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
> [ 485.008495] RIP: 0033:0x7f7b4527d8f7
> [ 485.009380] RSP: 002b:00007ffc9ee73c08 EFLAGS: 00000246 ORIG_RAX: 0000000000000015
> [ 485.011118] RAX: ffffffffffffffda RBX: 00007f7b44ee2608 RCX: 00007f7b4527d8f7
> [ 485.012831] RDX: 000000000000001c RSI: 0000000000000000 RDI: 00007f7b44eda86b
> [ 485.014560] RBP: 0000000000000003 R08: 0000000000000000 R09: 00007f7b45360be0
> [ 485.016246] R10: 00005571ade756c0 R11: 0000000000000246 R12: 00007ffc9ee73cf8
> [ 485.017937] R13: 00007ffc9ee73d18 R14: 00007f7b44ee2610 R15: 0000000000000000
> [ 485.019669] </TASK>
>
> However that happens now before unmount rule is reached on shutdown?
> Previously it was hanging after the shutdown rule was reached. Does this make
> sense?

Correction: I still get the previous issue on unmount with your latest version
as well. :/

Call trace looks slightly different, but it seems to be the same issue:

[ OK ] Reached target Unmount All Filesystems.
[* ] A stop job is running for Load/Save Random Seed (3min 27s / 10min)
[ 243.031668] INFO: task systemd-journal:199 blocked for more than 120 seconds.
[ 243.036460] Tainted: G E 6.3.0-rc2+ #63
[ 243.037666] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.039004] task:systemd-journal state:D stack:0 pid:199 ppid:1 flags:0x00004004
[ 243.040434] Call Trace:
[ 243.040878] <TASK>
[ 243.041265] __schedule (kernel/sched/core.c:5304 kernel/sched/core.c:6622)
[ 243.041887] ? __pagevec_release (./include/linux/pagevec.h:38 mm/swap.c:1063)
[ 243.042594] schedule (kernel/sched/core.c:6699 (discriminator 1))
[ 243.043150] io_schedule (kernel/sched/core.c:8859 kernel/sched/core.c:8885)
[ 243.043751] folio_wait_bit_common (mm/filemap.c:1293)
[ 243.044519] ? __pfx_wake_page_function (mm/filemap.c:1083)
[ 243.045321] folio_wait_writeback (./arch/x86/include/asm/bitops.h:207 ./arch/x86/include/asm/bitops.h:239 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 ./include/linux/page-flags.h:518 mm/page-writeback.c:3125)
[ 243.046043] __filemap_fdatawait_range (./arch/x86/include/asm/bitops.h:75 (discriminator 3) ./include/asm-generic/bitops/instrumented-atomic.h:42 (discriminator 3) ./include/linux/page-flags.h:473 (discriminator 3) mm/filemap.c:526 (discriminator 3))
[ 243.046828] ? filemap_fdatawrite_wbc (mm/filemap.c:393)
[ 243.047622] ? __filemap_fdatawrite_range (mm/filemap.c:424)
[ 243.048531] filemap_write_and_wait_range (mm/filemap.c:689 mm/filemap.c:669)
[ 243.049358] v9fs_vfs_setattr_dotl (./include/linux/pagemap.h:60 fs/9p/vfs_inode_dotl.c:587) 9p
[ 243.050229] notify_change (fs/attr.c:486)
[ 243.050924] ? __vfs_getxattr (fs/xattr.c:426)
[ 243.051571] ? do_truncate (./include/linux/fs.h:763 fs/open.c:67)
[ 243.052224] do_truncate (./include/linux/fs.h:763 fs/open.c:67)
[ 243.052816] do_sys_ftruncate (fs/open.c:194)
[ 243.053548] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 243.054187] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 243.055057] RIP: 0033:0x7f6db123ab77
[ 243.055706] RSP: 002b:00007ffc4dda0ca8 EFLAGS: 00000206 ORIG_RAX: 000000000000004d
[ 243.056978] RAX: ffffffffffffffda RBX: 00007ffc4dda0cf0 RCX: 00007f6db123ab77
[ 243.058198] RDX: 000055715f9b2a00 RSI: 0000000001000000 RDI: 0000000000000018
[ 243.059478] RBP: 000055715f9b2bf0 R08: 0000000000000001 R09: 000055715f9b2c74
[ 243.060716] R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffc4dda0ce8

I just realized I don't have to do anything fancy to reproduce this. I just
have to boot and shutdown guest few times to trigger it.