2023-03-15 12:20:10

by Pengfei Xu

Subject: [Syzkaller & bisect] There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel

Hi Li Huafei and kernel experts,

Greetings!

Platform: x86 platforms
There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel in guest:

All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230315_023443_synchronize_rcu
Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/repro.c
Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/kconfig_origin
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/bisect_info.log
v6.3-rc2 issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/eeac8ede17557680855031c6f305ece2378af326_dmesg.log
"
[ 24.844767] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=330 'systemd'
[ 31.392668] hrtimer: interrupt took 10726570 ns
[ 62.552360] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 2942 jiffies old.
[ 92.760386] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 10494 jiffies old.
[ 182.808389] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 33006 jiffies old.
[ 300.120396] INFO: task repro:540 blocked for more than 147 seconds.
[ 300.120937] Not tainted 6.3.0-rc2-eeac8ede1755+ #1
[ 300.121351] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 300.121888] task:repro state:D stack:0 pid:540 ppid:538 flags:0x00004006
[ 300.122552] Call Trace:
[ 300.122835] <TASK>
[ 300.123267] __schedule+0x40a/0xc90
[ 300.124351] ? wait_for_completion+0x7b/0x180
[ 300.124772] schedule+0x5b/0xe0
[ 300.125156] schedule_timeout+0x50b/0x670
[ 300.125829] ? schedule_timeout+0x9/0x670
[ 300.126414] ? wait_for_completion+0x7b/0x180
[ 300.126826] wait_for_completion+0xa6/0x180
[ 300.127478] __wait_rcu_gp+0x136/0x160
[ 300.127922] ? __pfx_arch_ftrace_ops_list_func+0x10/0x10
[ 300.128639] synchronize_rcu_tasks_generic.part.24+0x3b/0x60
[ 300.129092] ? __pfx_call_rcu_tasks+0x10/0x10
[ 300.129499] ? __pfx_wakeme_after_rcu+0x10/0x10
[ 300.130115] ? verify_cpu+0x10/0x100
[ 300.130574] synchronize_rcu_tasks_generic+0x24/0x70
[ 300.131020] synchronize_rcu_tasks+0x19/0x20
[ 300.131438] ftrace_shutdown+0x1cc/0x410
[ 300.132206] unregister_ftrace_function+0x35/0x230
[ 300.132722] ? __sanitizer_cov_trace_switch+0x57/0xa0
[ 300.133311] perf_ftrace_event_register+0x95/0xf0
[ 300.133750] ? __pfx_tp_perf_event_destroy+0x10/0x10
[ 300.134190] perf_trace_destroy+0x3a/0xa0
[ 300.134590] ? __pfx_tp_perf_event_destroy+0x10/0x10
[ 300.135028] tp_perf_event_destroy+0x1e/0x30
[ 300.135473] _free_event+0x101/0x810
[ 300.136191] put_event+0x3c/0x50
[ 300.136605] perf_event_release_kernel+0x2de/0x360
[ 300.137011] ? perf_event_release_kernel+0x9/0x360
[ 300.137577] ? __pfx_perf_release+0x10/0x10
[ 300.138031] perf_release+0x22/0x30
[ 300.138434] __fput+0x11f/0x450
[ 300.139038] ____fput+0x1e/0x30
[ 300.139425] task_work_run+0xb6/0x120
[ 300.140164] do_exit+0x547/0x1360
[ 300.140632] ? write_comp_data+0x2f/0x90
[ 300.141223] do_group_exit+0x5e/0xf0
[ 300.141732] get_signal+0x15d1/0x1600
[ 300.142596] arch_do_signal_or_restart+0x33/0x280
[ 300.143463] exit_to_user_mode_prepare+0x13b/0x210
[ 300.143955] syscall_exit_to_user_mode+0x2d/0x60
[ 300.144623] do_syscall_64+0x4a/0x90
[ 300.145032] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 300.145459] RIP: 0033:0x7f2a7abd859d
[ 300.145823] RSP: 002b:00007ffcc732b338 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
[ 300.146429] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 00007f2a7abd859d
[ 300.146930] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00000000200004c0
[ 300.147436] RBP: 00007ffcc732b350 R08: 0000000000000000 R09: 00007ffcc732b350
[ 300.147938] R10: 00000000ffffffff R11: 0000000000000246 R12: 00000000004011e0
[ 300.148645] R13: 00007ffcc732b470 R14: 0000000000000000 R15: 0000000000000000
[ 300.149713] </TASK>
[ 300.149978]
[ 300.149978] Showing all locks held in the system:
[ 300.150435] 1 lock held by rcu_tasks_kthre/11:
[ 300.150808] #0: ffffffff83f63490 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
[ 300.151849] 1 lock held by rcu_tasks_rude_/12:
[ 300.152593] #0: ffffffff83f63210 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
[ 300.153645] 1 lock held by rcu_tasks_trace/13:
[ 300.154013] #0: ffffffff83f62f50 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
[ 300.155078] 1 lock held by khungtaskd/29:
[ 300.155425] #0: ffffffff83f63ea0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x1b/0x1e0
[ 300.156644] 2 locks held by repro/540:
[ 300.156977] #0: ffffffff840206a8 (event_mutex){+.+.}-{3:3}, at: perf_trace_destroy+0x21/0xa0
[ 300.157971] #1: ffffffff8401cd68 (ftrace_lock){+.+.}-{3:3}, at: unregister_ftrace_function+0x2b/0x230
[ 300.159005]
[ 300.159233] =============================================
[ 300.159233]
"
Bisected and found the first bad commit:
"
0e792b89e6800cd9cb4757a76a96f7ef3e8b6294
ftrace: Fix use-after-free for dynamic ftrace_ops
"
After reverting the above commit on top of the v6.3-rc2 kernel, the issue was gone.

This issue could be reproduced within about 2100 seconds.
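
For reference, the blocked task's backtrace above goes through perf event
teardown (perf_release -> perf_trace_destroy -> perf_ftrace_event_register ->
unregister_ftrace_function -> ftrace_shutdown -> synchronize_rcu_tasks), and
ORIG_RAX 0x12a is the perf_event_open syscall. Below is only a minimal,
hypothetical sketch of that kind of operation, NOT the actual repro.c (which
is linked above); treating config value 1 as the ftrace "function" trace
event id (TRACE_FN) is an assumption made for illustration:

/*
 * Minimal, hypothetical sketch (NOT the real reproducer): open a perf
 * event on the ftrace "function" trace event and close it again.
 * Destroying such an event goes through perf_release() ->
 * perf_trace_destroy() -> perf_ftrace_event_register() ->
 * unregister_ftrace_function() -> ftrace_shutdown() ->
 * synchronize_rcu_tasks(), which is where the reported task hangs.
 *
 * Assumptions: config = 1 is the ftrace "function" trace event id
 * (TRACE_FN) on this kernel, and the program runs as root (or with a
 * relaxed perf_event_paranoid setting).
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
			    int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	struct perf_event_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_TRACEPOINT;
	attr.size = sizeof(attr);
	attr.config = 1;	/* assumed: ftrace "function" event id */
	attr.sample_period = 1;

	fd = perf_event_open(&attr, 0 /* this task */, -1 /* any cpu */, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	/* Closing the event unregisters the ftrace ops and waits in
	 * synchronize_rcu_tasks() inside ftrace_shutdown(). */
	close(fd);
	return 0;
}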

This issue has not been reported on syzbot:
https://syzkaller.appspot.com/upstream

I hope it's helpful.

Thanks!

---

If you don't need the following environment to reproduce the problem or if you
already have one, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
// start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
// You could change the bzImage_xxx as you want
You can use the command below to log in; there is no password for root.
ssh -p 10023 root@localhost

After logging into the VM (virtual machine) successfully, you can transfer the
reproducer binary to the VM as below and reproduce the problem in the VM:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/

Get the bzImage for the target kernel:
Please use the target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage // x should be equal to or less than the number of CPUs your PC has

Point the above start3.sh at that bzImage file to load the target kernel in the VM.


Tips:
If you already have qemu-system-x86_64, please ignore the info below.
If you want to install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl
make
make install

Thanks!
BR.


2023-04-13 03:00:37

by Pengfei Xu

Subject: Re: [Syzkaller & bisect] There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel

Hi Huafei and kernel experts,

This is a gentle reminder.
This issue can still be reproduced in the v6.3-rc6 kernel.
It can be reproduced on Alder Lake, Raptor Lake and other x86 platforms.
After reverting the commit "0e792b89e6800c: ftrace: Fix use-after-free for
dynamic ftrace_ops" on top of the v6.3-rc6 kernel, the issue was gone.

The new syzkaller reproducer code, repro.report, bisect_info.log and detailed logs
are at: https://github.com/xupengfe/syzkaller_logs/tree/main/230412_031722_synchronize_rcu

Thanks!
BR.

On 2023-03-15 at 20:20:44 +0800, Pengfei Xu wrote:
> Hi Li Huafei and kernel experts,
>
> Greeting!
>
> Platform: x86 platforms
> There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel in guest:
>
> All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230315_023443_synchronize_rcu
> Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/repro.c
> Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/kconfig_origin
> Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/bisect_info.log
> v6.3-rc2 issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/eeac8ede17557680855031c6f305ece2378af326_dmesg.log
> "
> [ 24.844767] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=330 'systemd'
> [ 31.392668] hrtimer: interrupt took 10726570 ns
> [ 62.552360] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 2942 jiffies old.
> [ 92.760386] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 10494 jiffies old.
> [ 182.808389] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 33006 jiffies old.
> [ 300.120396] INFO: task repro:540 blocked for more than 147 seconds.
> [ 300.120937] Not tainted 6.3.0-rc2-eeac8ede1755+ #1
> [ 300.121351] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 300.121888] task:repro state:D stack:0 pid:540 ppid:538 flags:0x00004006
> [ 300.122552] Call Trace:
> [ 300.122835] <TASK>
> [ 300.123267] __schedule+0x40a/0xc90
> [ 300.124351] ? wait_for_completion+0x7b/0x180
> [ 300.124772] schedule+0x5b/0xe0
> [ 300.125156] schedule_timeout+0x50b/0x670
> [ 300.125829] ? schedule_timeout+0x9/0x670
> [ 300.126414] ? wait_for_completion+0x7b/0x180
> [ 300.126826] wait_for_completion+0xa6/0x180
> [ 300.127478] __wait_rcu_gp+0x136/0x160
> [ 300.127922] ? __pfx_arch_ftrace_ops_list_func+0x10/0x10
> [ 300.128639] synchronize_rcu_tasks_generic.part.24+0x3b/0x60
> [ 300.129092] ? __pfx_call_rcu_tasks+0x10/0x10
> [ 300.129499] ? __pfx_wakeme_after_rcu+0x10/0x10
> [ 300.130115] ? verify_cpu+0x10/0x100
> [ 300.130574] synchronize_rcu_tasks_generic+0x24/0x70
> [ 300.131020] synchronize_rcu_tasks+0x19/0x20
> [ 300.131438] ftrace_shutdown+0x1cc/0x410
> [ 300.132206] unregister_ftrace_function+0x35/0x230
> [ 300.132722] ? __sanitizer_cov_trace_switch+0x57/0xa0
> [ 300.133311] perf_ftrace_event_register+0x95/0xf0
> [ 300.133750] ? __pfx_tp_perf_event_destroy+0x10/0x10
> [ 300.134190] perf_trace_destroy+0x3a/0xa0
> [ 300.134590] ? __pfx_tp_perf_event_destroy+0x10/0x10
> [ 300.135028] tp_perf_event_destroy+0x1e/0x30
> [ 300.135473] _free_event+0x101/0x810
> [ 300.136191] put_event+0x3c/0x50
> [ 300.136605] perf_event_release_kernel+0x2de/0x360
> [ 300.137011] ? perf_event_release_kernel+0x9/0x360
> [ 300.137577] ? __pfx_perf_release+0x10/0x10
> [ 300.138031] perf_release+0x22/0x30
> [ 300.138434] __fput+0x11f/0x450
> [ 300.139038] ____fput+0x1e/0x30
> [ 300.139425] task_work_run+0xb6/0x120
> [ 300.140164] do_exit+0x547/0x1360
> [ 300.140632] ? write_comp_data+0x2f/0x90
> [ 300.141223] do_group_exit+0x5e/0xf0
> [ 300.141732] get_signal+0x15d1/0x1600
> [ 300.142596] arch_do_signal_or_restart+0x33/0x280
> [ 300.143463] exit_to_user_mode_prepare+0x13b/0x210
> [ 300.143955] syscall_exit_to_user_mode+0x2d/0x60
> [ 300.144623] do_syscall_64+0x4a/0x90
> [ 300.145032] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 300.145459] RIP: 0033:0x7f2a7abd859d
> [ 300.145823] RSP: 002b:00007ffcc732b338 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
> [ 300.146429] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 00007f2a7abd859d
> [ 300.146930] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00000000200004c0
> [ 300.147436] RBP: 00007ffcc732b350 R08: 0000000000000000 R09: 00007ffcc732b350
> [ 300.147938] R10: 00000000ffffffff R11: 0000000000000246 R12: 00000000004011e0
> [ 300.148645] R13: 00007ffcc732b470 R14: 0000000000000000 R15: 0000000000000000
> [ 300.149713] </TASK>
> [ 300.149978]
> [ 300.149978] Showing all locks held in the system:
> [ 300.150435] 1 lock held by rcu_tasks_kthre/11:
> [ 300.150808] #0: ffffffff83f63490 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
> [ 300.151849] 1 lock held by rcu_tasks_rude_/12:
> [ 300.152593] #0: ffffffff83f63210 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
> [ 300.153645] 1 lock held by rcu_tasks_trace/13:
> [ 300.154013] #0: ffffffff83f62f50 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
> [ 300.155078] 1 lock held by khungtaskd/29:
> [ 300.155425] #0: ffffffff83f63ea0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x1b/0x1e0
> [ 300.156644] 2 locks held by repro/540:
> [ 300.156977] #0: ffffffff840206a8 (event_mutex){+.+.}-{3:3}, at: perf_trace_destroy+0x21/0xa0
> [ 300.157971] #1: ffffffff8401cd68 (ftrace_lock){+.+.}-{3:3}, at: unregister_ftrace_function+0x2b/0x230
> [ 300.159005]
> [ 300.159233] =============================================
> [ 300.159233]
> "
> Bisected and found first bad commit:
> "
> 0e792b89e6800cd9cb4757a76a96f7ef3e8b6294
> ftrace: Fix use-after-free for dynamic ftrace_ops
> "
> After reverted the above commit on top of v6.3-rc2 kernel, this issue was gone.
>
> And this issue could be reproduced in 2100s.
>
> There was no this issue report in syzbot link:
> https://syzkaller.appspot.com/upstream
>
> I hope it's helpful.
>
> Thanks!
>
> ---
>
> If you don't need the following environment to reproduce the problem or if you
> already have one, please ignore the following information.
>
> How to reproduce:
> git clone https://gitlab.com/xupengfe/repro_vm_env.git
> cd repro_vm_env
> tar -xvf repro_vm_env.tar.gz
> cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
> // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
> // You could change the bzImage_xxx as you want
> You could use below command to log in, there is no password for root.
> ssh -p 10023 root@localhost
>
> After login vm(virtual machine) successfully, you could transfer reproduced
> binary to the vm by below way, and reproduce the problem in vm:
> gcc -pthread -o repro repro.c
> scp -P 10023 repro root@localhost:/root/
>
> Get the bzImage for target kernel:
> Please use target kconfig and copy it to kernel_src/.config
> make olddefconfig
> make -jx bzImage //x should equal or less than cpu num your pc has
>
> Fill the bzImage file into above start3.sh to load the target kernel in vm.
>
>
> Tips:
> If you already have qemu-system-x86_64, please ignore below info.
> If you want to install qemu v7.1.0 version:
> git clone https://github.com/qemu/qemu.git
> cd qemu
> git checkout -f v7.1.0
> mkdir build
> cd build
> yum install -y ninja-build.x86_64
> ../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl
> make
> make install
>
> Thanks!
> BR.

2023-04-13 08:09:06

by Pengfei Xu

Subject: Re: [Syzkaller & bisect] There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel

Hi Danton,

On 2023-04-13 at 15:00:42 +0800, Hillf Danton wrote:
> On 13 Apr 2023 10:45:35 +0800 Pengfei Xu <[email protected]>
> > Hi Huafei and kernel experts,
> >
> > It's a soft remind.
> > This issue could be reproduced in v6.3-rc6 kernel.
> > It could be reproduced on Alder lake, Raptor lake and so on x86 platforms.
> > After reverted the commit "0e792b89e6800c:ftrace: Fix use-after-free for
> > dynamic ftrace_ops" on top of v6.3-rc6 kernel, this issue was gone.
> >
> > New syzkaller reproduced code, repro.report, bisect_info.log and detailed logs
> > are in link: https://github.com/xupengfe/syzkaller_logs/tree/main/230412_031722_synchronize_rcu
> >
> > Thanks!
> > BR.
> >
> > On 2023-03-15 at 20:20:44 +0800, Pengfei Xu wrote:
> > > Hi Li Huafei and kernel experts,
> > >
> > > Greeting!
> > >
> > > Platform: x86 platforms
> > > There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel in guest:
> > >
> > > All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230315_023443_synchronize_rcu
> > > Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/repro.c
> > > Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/kconfig_origin
> > > Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/bisect_info.log
> > > v6.3-rc2 issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/eeac8ede17557680855031c6f305ece2378af326_dmesg.log
> > > "
> > > [ 24.844767] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=330 'systemd'
> > > [ 31.392668] hrtimer: interrupt took 10726570 ns
>
> What hrtimer/s triggered this print? And for what? The hrtimer trace helps here.
> Info like that helps work out the cause of the task hang.
> Feel free to fold in the debug diff below if it makes sense to you.
>
Thanks for your debug patch.

I have just updated the log link with the syzkaller repro.prog and
machineInfo0 (VM info) from the earlier Alder Lake S reproduction:
https://github.com/xupengfe/syzkaller_logs/tree/main/230315_023443_synchronize_rcu

I also tried the debug kernel on Alder Lake S. There were 2 CPUs in the guest, and
CPU1 reported "interrupt took 331596 ns", so that hrtimer interrupt latency should not
be what triggers "rcu_tasks_wait_gp" for this issue.
"
[ 104.171591] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=334 'systemd'
[ 126.770981] hrtimer CPU1: interrupt took 331596 ns // "rcu_tasks_wait_gp" after 15s
[ 141.127282] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 2571 jiffies old.
[ 171.272100] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 10107 jiffies old.
[ 261.767189] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 32731 jiffies old.
[ 300.102911] INFO: task repro_rcu:398 blocked for more than 147 seconds.
[ 300.103408] Not tainted 6.3.0-rc6-dbg-dirty #1
[ 300.103743] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 300.104230] task:repro_rcu state:D stack:0 pid:398 ppid:396 flags:0x00004006
[ 300.104780] Call Trace:
[ 300.104992] <TASK>
[ 300.105265] __schedule+0x40a/0xc30
[ 300.105706] ? wait_for_completion+0x7b/0x180
[ 300.106041] schedule+0x5b/0xe0
[ 300.106330] schedule_timeout+0x4db/0x5b0
[ 300.106959] ? schedule_timeout+0x9/0x5b0
[ 300.107364] ? wait_for_completion+0x7b/0x180
[ 300.107698] wait_for_completion+0xa6/0x180
[ 300.108141] __wait_rcu_gp+0x136/0x160
[ 300.108474] ? __pfx_arch_ftrace_ops_list_func+0x10/0x10
[ 300.108875] synchronize_rcu_tasks_generic.part.24+0x3b/0x60
[ 300.109263] ? __pfx_call_rcu_tasks+0x10/0x10
[ 300.109592] ? __pfx_wakeme_after_rcu+0x10/0x10
[ 300.110020] ? verify_cpu+0x10/0x100
[ 300.110353] synchronize_rcu_tasks_generic+0x24/0x70
[ 300.110840] synchronize_rcu_tasks+0x19/0x20
[ 300.111173] ftrace_shutdown+0x1cc/0x410
[ 300.111564] unregister_ftrace_function+0x35/0x230
[ 300.111950] ? __sanitizer_cov_trace_switch+0x57/0xa0
[ 300.112380] perf_ftrace_event_register+0x95/0xf0
[ 300.112733] ? __pfx_tp_perf_event_destroy+0x10/0x10
[ 300.113092] perf_trace_destroy+0x3a/0xa0
[ 300.113410] ? __pfx_tp_perf_event_destroy+0x10/0x10
[ 300.113768] tp_perf_event_destroy+0x1e/0x30
[ 300.114109] _free_event+0x101/0x810
[ 300.114470] put_event+0x3c/0x50
[ 300.114920] perf_event_release_kernel+0x2de/0x360
[ 300.115266] ? perf_event_release_kernel+0x9/0x360
[ 300.115685] ? __pfx_perf_release+0x10/0x10
[ 300.116036] perf_release+0x22/0x30
[ 300.116344] __fput+0x11f/0x450
[ 300.116739] ____fput+0x1e/0x30
[ 300.117031] task_work_run+0xb6/0x120
[ 300.117407] do_exit+0x547/0x12b0
[ 300.117739] ? write_comp_data+0x2f/0x90
[ 300.118141] do_group_exit+0x5e/0xf0
[ 300.118498] get_signal+0x1465/0x14a0
[ 300.119142] arch_do_signal_or_restart+0x33/0x280
[ 300.119697] exit_to_user_mode_prepare+0x13b/0x210
[ 300.120078] syscall_exit_to_user_mode+0x2d/0x60
[ 300.120439] do_syscall_64+0x4a/0x90
[ 300.120748] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 300.121102] RIP: 0033:0x7f2c55b8a59d
[ 300.121389] RSP: 002b:00007ffc6ea72078 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
[ 300.121894] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 00007f2c55b8a59d
[ 300.122332] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000080
[ 300.122890] RBP: 00007ffc6ea72090 R08: 0000000000000000 R09: 00007ffc6ea72090
[ 300.123327] R10: 00000000ffffffff R11: 0000000000000246 R12: 00000000004011c0
[ 300.123765] R13: 00007ffc6ea721b0 R14: 0000000000000000 R15: 0000000000000000
[ 300.124479] </TASK>
[ 300.124676]
Showing all locks held in the system:
[ 300.125063] 1 lock held by rcu_tasks_kthre/11:
[ 300.125370] #0: ffffffff83d63450 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
[ 300.126170] 1 lock held by rcu_tasks_rude_/12:
[ 300.126476] #0: ffffffff83d631d0 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
[ 300.127421] 1 lock held by rcu_tasks_trace/13:
[ 300.127728] #0: ffffffff83d62f10 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
[ 300.128551] 1 lock held by khungtaskd/29:
[ 300.128834] #0: ffffffff83d63e60 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x1b/0x1e0
[ 300.129607] 1 lock held by systemd-journal/124:
[ 300.129921] 2 locks held by repro_rcu/398:
[ 300.130209] #0: ffffffff83e20668 (event_mutex){+.+.}-{3:3}, at: perf_trace_destroy+0x21/0xa0
[ 300.131083] #1: ffffffff83e1cd28 (ftrace_lock){+.+.}-{3:3}, at: unregister_ftrace_function+0x2b/0x230

[ 300.132040] =============================================
"

Thanks!
BR.

> +++ b/kernel/time/hrtimer.c
> @@ -1866,7 +1866,7 @@ retry:
> else
> expires_next = ktime_add(now, delta);
> tick_program_event(expires_next, 1);
> - pr_warn_once("hrtimer: interrupt took %llu ns\n", ktime_to_ns(delta));
> + pr_warn_once("hrtimer CPU%u: interrupt took %llu ns\n", cpu_base->cpu, ktime_to_ns(delta));
> }
>
> /* called with interrupts disabled */

2023-04-13 09:08:51

by Pengfei Xu

Subject: Re: [Syzkaller & bisect] There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel

On 2023-04-13 at 16:30:17 +0800, Hillf Danton wrote:
> On Thu, 13 Apr 2023 16:03:51 +0800 Pengfei Xu <[email protected]>
> >
> > I newly updated previous alder lake S reproduced syzkaller repro.prog and
> > machineInfo0(vm info) info in log link:
> > https://github.com/xupengfe/syzkaller_logs/tree/main/230315_023443_synchronize_rcu
> >
> > And tried debug kernel on alder lake S, there were 2 cpu in guest, and
> > CPU1 reported "interrupt took 331596 ns", then kernel should not trigger
> > "rcu_tasks_wait_gp" for this issue.
> > [ 104.171591] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=334 'systemd'
> > [ 126.770981] hrtimer CPU1: interrupt took 331596 ns // "rcu_tasks_wait_gp" after 15s
> > [ 141.127282] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 2571 jiffies old.
> > [ 171.272100] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 10107 jiffies old.
> > [ 261.767189] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 32731 jiffies old.
> > [ 300.102911] INFO: task repro_rcu:398 blocked for more than 147 seconds.
> > [ 300.103408] Not tainted 6.3.0-rc6-dbg-dirty #1
>
> Similar issue, and hrtimer trace helps in both cases.
I enabled "hrtimer_*" and tried again.
I didn't find some clue for this issue due to lack of experience for hrtimer
logs.
Anyway I put the issue dmesg and hrtimer in link:
https://github.com/xupengfe/syzkaller_logs/tree/main/230412_031722_synchronize_rcu/0413_debug_on_alder_lake-s

Thanks!
BR.

2023-04-13 11:35:18

by Frederic Weisbecker

Subject: Re: [Syzkaller & bisect] There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel

On Thu, Apr 13, 2023 at 10:45:35AM +0800, Pengfei Xu wrote:
> Hi Huafei and kernel experts,
>
> It's a soft remind.
> This issue could be reproduced in v6.3-rc6 kernel.
> It could be reproduced on Alder lake, Raptor lake and so on x86 platforms.
> After reverted the commit "0e792b89e6800c:ftrace: Fix use-after-free for
> dynamic ftrace_ops" on top of v6.3-rc6 kernel, this issue was gone.
>
> New syzkaller reproduced code, repro.report, bisect_info.log and detailed logs
> are in link:
> https://github.com/xupengfe/syzkaller_logs/tree/main/230412_031722_synchronize_rcu

I just tested against v6.3-rc6 and again all I get is this io_ring related
issue:

[ 448.290752] INFO: task kworker/u4:0:9 blocked for more than 294 seconds.
[ 448.293868] Not tainted 6.3.0-rc6-kvm #1
[ 448.296019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 448.299592] task:kworker/u4:0 state:D stack:0 pid:9 ppid:2 flags:0x00004000
[ 448.303397] Workqueue: events_unbound io_ring_exit_work
[ 448.305884] Call Trace:
[ 448.307147] <TASK>
[ 448.308166] __schedule+0x422/0xc90
[ 448.309824] ? wait_for_completion+0x77/0x170
[ 448.311870] schedule+0x63/0xd0
[ 448.313346] schedule_timeout+0x2fe/0x4c0
[ 448.315255] ? __this_cpu_preempt_check+0x1c/0x30
[ 448.317360] ? _raw_spin_unlock_irq+0x27/0x60
[ 448.319400] ? lockdep_hardirqs_on+0x88/0x120
[ 448.321395] ? wait_for_completion+0x77/0x170
[ 448.323462] wait_for_completion+0x9e/0x170
[ 448.325356] io_ring_exit_work+0x2b0/0x810
[ 448.327300] ? __pfx_io_tctx_exit_cb+0x10/0x10
[ 448.329345] ? _raw_spin_unlock_irq+0x27/0x60
[ 448.331397] process_one_work+0x34e/0x720
[ 448.333212] ? __pfx_io_ring_exit_work+0x10/0x10
[ 448.335377] ? process_one_work+0x34e/0x720
[ 448.337295] worker_thread+0x4e/0x530
[ 448.339079] ? __pfx_worker_thread+0x10/0x10
[ 448.341008] kthread+0x128/0x160
[ 448.342513] ? __pfx_kthread+0x10/0x10
[ 448.344305] ret_from_fork+0x2c/0x50
[ 448.346016] </TASK>
[ 448.347176]
[ 448.347176] Showing all locks held in the system:
[ 448.349887] 2 locks held by kworker/u4:0/9:
[ 448.351829] #0: ffff88807eb6dd38 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x2b4/0x720
[ 448.356362] #1: ffffc9000005fe68 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x2b4/0x720
[ 448.361052] 1 lock held by rcu_tasks_kthre/11:
[ 448.363142] #0: ffffffff83963450 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x440
[ 448.367396] 1 lock held by rcu_tasks_rude_/12:
[ 448.369387] #0: ffffffff839631d0 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x440
[ 448.373808] 1 lock held by rcu_tasks_trace/13:
[ 448.375852] #0: ffffffff83962f10 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x440
[ 448.380434] 1 lock held by khungtaskd/30:
[ 448.382219] #0: ffffffff83963e60 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x17/0x1d0
[ 448.386291]
[ 448.387159] =============================================
[ 448.387159]

2023-04-13 11:54:04

by Zqiang

Subject: RE: [Syzkaller & bisect] There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel

> Hi Huafei and kernel experts,
>
> It's a soft remind.
> This issue could be reproduced in v6.3-rc6 kernel.
> It could be reproduced on Alder lake, Raptor lake and so on x86 platforms.
> After reverted the commit "0e792b89e6800c:ftrace: Fix use-after-free for
> dynamic ftrace_ops" on top of v6.3-rc6 kernel, this issue was gone.
>
> New syzkaller reproduced code, repro.report, bisect_info.log and detailed logs
> are in link:
> https://github.com/xupengfe/syzkaller_logs/tree/main/230412_031722_synchronize_rcu
>
>I just tested against v6.3-rc6 and again all I get is this io_ring related
>issue:
>

I also did not reproduce this synchronize_rcu_tasks() hang, but from this log:
https://github.com/xupengfe/syzkaller_logs/blob/main/230412_031722_synchronize_rcu/02bf43c7b7f7a19aa59a75f5244f0a3408bace1a_dmesg.log

I found an rcu_tasks stall:

INFO: rcu_tasks detected stalls on tasks:
[ 1243.720952] 000000008aaceb95: .. nvcsw: 1/1 holdout: 1 idle_cpu: -1/0
[ 1243.721915] task:repro state:R running task stack:0 pid:592 ppid:585 flags:0x00004004
[ 1243.723486] Call Trace:
[ 1243.723996] <TASK>
[ 1243.725204] __schedule+0x385/0xb00
[ 1243.726212] ? __cond_resched+0x25/0x30
[ 1243.727108] preempt_schedule_common+0x11/0x90
[ 1243.727964] __cond_resched+0x25/0x30
[ 1243.729076] mousedev_write+0xd2/0x3d0
[ 1243.730168] vfs_write+0x1b6/0x780
[ 1243.730873] ? mousedev_fasync+0x40/0x40
[ 1243.731864] ? vfs_write+0x9/0x780
[ 1243.732874] ? write_comp_data+0x2f/0x90
[ 1243.733926] ksys_write+0x153/0x170
[ 1243.734929] __x64_sys_write+0x27/0x30
[ 1243.735718] do_syscall_64+0x3b/0x90
[ 1243.736760] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 1243.737586] RIP: 0033:0x7f374119759d
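
For what it's worth, the stalled task above is in state R and its backtrace
sits in mousedev_write() with only cond_resched() in between, i.e. it keeps
running inside the kernel, which is exactly what an rcu_tasks grace period
has to wait out. A minimal, hypothetical sketch of a call that would sit in
mousedev_write() like that is below; the device path /dev/input/mice and the
buffer size are assumptions for illustration, not taken from the reproducer:

/*
 * Hypothetical sketch only (not the syzkaller reproducer): issue one
 * very large write() to a mousedev node.  mousedev_write() processes
 * the buffer one byte at a time (note the cond_resched() in the
 * backtrace above), so the task can stay in the kernel for a long time
 * without a voluntary context switch, which is what the rcu_tasks
 * stall detector is complaining about.
 * Assumes /dev/input/mice exists and is writable (usually needs root).
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	size_t len = 64 << 20;		/* 64 MB, arbitrary "large" size */
	char *buf = calloc(1, len);
	int fd = open("/dev/input/mice", O_RDWR);

	if (fd < 0 || !buf) {
		perror("setup");
		return 1;
	}

	/* One huge write keeps the task busy inside mousedev_write(). */
	if (write(fd, buf, len) < 0)
		perror("write");

	close(fd);
	free(buf);
	return 0;
}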

Thanks
Zqiang


>
>[ 448.290752] INFO: task kworker/u4:0:9 blocked for more than 294 seconds.
>[ 448.293868] Not tainted 6.3.0-rc6-kvm #1
>[ 448.296019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>[ 448.299592] task:kworker/u4:0 state:D stack:0 pid:9 ppid:2 flags:0x00004000
>[ 448.303397] Workqueue: events_unbound io_ring_exit_work
>[ 448.305884] Call Trace:
>[ 448.307147] <TASK>
>[ 448.308166] __schedule+0x422/0xc90
>[ 448.309824] ? wait_for_completion+0x77/0x170
>[ 448.311870] schedule+0x63/0xd0
>[ 448.313346] schedule_timeout+0x2fe/0x4c0
>[ 448.315255] ? __this_cpu_preempt_check+0x1c/0x30
>[ 448.317360] ? _raw_spin_unlock_irq+0x27/0x60
>[ 448.319400] ? lockdep_hardirqs_on+0x88/0x120
>[ 448.321395] ? wait_for_completion+0x77/0x170
>[ 448.323462] wait_for_completion+0x9e/0x170
>[ 448.325356] io_ring_exit_work+0x2b0/0x810
>[ 448.327300] ? __pfx_io_tctx_exit_cb+0x10/0x10
>[ 448.329345] ? _raw_spin_unlock_irq+0x27/0x60
>[ 448.331397] process_one_work+0x34e/0x720
>[ 448.333212] ? __pfx_io_ring_exit_work+0x10/0x10
>[ 448.335377] ? process_one_work+0x34e/0x720
>[ 448.337295] worker_thread+0x4e/0x530
>[ 448.339079] ? __pfx_worker_thread+0x10/0x10
>[ 448.341008] kthread+0x128/0x160
>[ 448.342513] ? __pfx_kthread+0x10/0x10
>[ 448.344305] ret_from_fork+0x2c/0x50
>[ 448.346016] </TASK>
>[ 448.347176]
>[ 448.347176] Showing all locks held in the system:
>[ 448.349887] 2 locks held by kworker/u4:0/9:
>[ 448.351829] #0: ffff88807eb6dd38 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x2b4/0x720
>[ 448.356362] #1: ffffc9000005fe68 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x2b4/0x720
>[ 448.361052] 1 lock held by rcu_tasks_kthre/11:
>[ 448.363142] #0: ffffffff83963450 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x440
>[ 448.367396] 1 lock held by rcu_tasks_rude_/12:
>[ 448.369387] #0: ffffffff839631d0 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x440
>[ 448.373808] 1 lock held by rcu_tasks_trace/13:
>[ 448.375852] #0: ffffffff83962f10 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x440
>[ 448.380434] 1 lock held by khungtaskd/30:
>[ 448.382219] #0: ffffffff83963e60 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x17/0x1d0
>[ 448.386291]
>[ 448.387159] =============================================
>[ 448.387159]

2023-04-13 12:11:15

by Pengfei Xu

Subject: Re: [Syzkaller & bisect] There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel

On 2023-04-13 at 13:19:08 +0200, Frederic Weisbecker wrote:
> On Thu, Apr 13, 2023 at 10:45:35AM +0800, Pengfei Xu wrote:
> > Hi Huafei and kernel experts,
> >
> > It's a soft remind.
> > This issue could be reproduced in v6.3-rc6 kernel.
> > It could be reproduced on Alder lake, Raptor lake and so on x86 platforms.
> > After reverted the commit "0e792b89e6800c:ftrace: Fix use-after-free for
> > dynamic ftrace_ops" on top of v6.3-rc6 kernel, this issue was gone.
> >
> > New syzkaller reproduced code, repro.report, bisect_info.log and detailed logs
> > are in link:
> > https://github.com/xupengfe/syzkaller_logs/tree/main/230412_031722_synchronize_rcu
>
> I just tested against v6.3-rc6 and again all I get is this io_ring related
> issue:
>
It seems this issue behaves differently on different platforms.
I could reproduce the synchronize_rcu issue on Alder Lake S (CPU model 151)
and Raptor Lake P (CPU model 186) platforms.

Thanks!
BR.

> [ 448.290752] INFO: task kworker/u4:0:9 blocked for more than 294 seconds.
> [ 448.293868] Not tainted 6.3.0-rc6-kvm #1
> [ 448.296019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 448.299592] task:kworker/u4:0 state:D stack:0 pid:9 ppid:2 flags:0x00004000
> [ 448.303397] Workqueue: events_unbound io_ring_exit_work
> [ 448.305884] Call Trace:
> [ 448.307147] <TASK>
> [ 448.308166] __schedule+0x422/0xc90
> [ 448.309824] ? wait_for_completion+0x77/0x170
> [ 448.311870] schedule+0x63/0xd0
> [ 448.313346] schedule_timeout+0x2fe/0x4c0
> [ 448.315255] ? __this_cpu_preempt_check+0x1c/0x30
> [ 448.317360] ? _raw_spin_unlock_irq+0x27/0x60
> [ 448.319400] ? lockdep_hardirqs_on+0x88/0x120
> [ 448.321395] ? wait_for_completion+0x77/0x170
> [ 448.323462] wait_for_completion+0x9e/0x170
> [ 448.325356] io_ring_exit_work+0x2b0/0x810
> [ 448.327300] ? __pfx_io_tctx_exit_cb+0x10/0x10
> [ 448.329345] ? _raw_spin_unlock_irq+0x27/0x60
> [ 448.331397] process_one_work+0x34e/0x720
> [ 448.333212] ? __pfx_io_ring_exit_work+0x10/0x10
> [ 448.335377] ? process_one_work+0x34e/0x720
> [ 448.337295] worker_thread+0x4e/0x530
> [ 448.339079] ? __pfx_worker_thread+0x10/0x10
> [ 448.341008] kthread+0x128/0x160
> [ 448.342513] ? __pfx_kthread+0x10/0x10
> [ 448.344305] ret_from_fork+0x2c/0x50
> [ 448.346016] </TASK>
> [ 448.347176]
> [ 448.347176] Showing all locks held in the system:
> [ 448.349887] 2 locks held by kworker/u4:0/9:
> [ 448.351829] #0: ffff88807eb6dd38 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x2b4/0x720
> [ 448.356362] #1: ffffc9000005fe68 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x2b4/0x720
> [ 448.361052] 1 lock held by rcu_tasks_kthre/11:
> [ 448.363142] #0: ffffffff83963450 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x440
> [ 448.367396] 1 lock held by rcu_tasks_rude_/12:
> [ 448.369387] #0: ffffffff839631d0 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x440
> [ 448.373808] 1 lock held by rcu_tasks_trace/13:
> [ 448.375852] #0: ffffffff83962f10 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x440
> [ 448.380434] 1 lock held by khungtaskd/30:
> [ 448.382219] #0: ffffffff83963e60 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x17/0x1d0
> [ 448.386291]
> [ 448.387159] =============================================
> [ 448.387159]

2023-04-13 12:41:23

by Jens Axboe

Subject: Re: [Syzkaller & bisect] There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel

On 4/13/23 5:19 AM, Frederic Weisbecker wrote:
> On Thu, Apr 13, 2023 at 10:45:35AM +0800, Pengfei Xu wrote:
>> Hi Huafei and kernel experts,
>>
>> It's a soft remind.
>> This issue could be reproduced in v6.3-rc6 kernel.
>> It could be reproduced on Alder lake, Raptor lake and so on x86 platforms.
>> After reverted the commit "0e792b89e6800c:ftrace: Fix use-after-free for
>> dynamic ftrace_ops" on top of v6.3-rc6 kernel, this issue was gone.
>>
>> New syzkaller reproduced code, repro.report, bisect_info.log and detailed logs
>> are in link:
>> https://github.com/xupengfe/syzkaller_logs/tree/main/230412_031722_synchronize_rcu
>
> I just tested against v6.3-rc6 and again all I get is this io_ring related
> issue:
>
> [ 448.290752] INFO: task kworker/u4:0:9 blocked for more than 294 seconds.
> [ 448.293868] Not tainted 6.3.0-rc6-kvm #1
> [ 448.296019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 448.299592] task:kworker/u4:0 state:D stack:0 pid:9 ppid:2 flags:0x00004000
> [ 448.303397] Workqueue: events_unbound io_ring_exit_work
> [ 448.305884] Call Trace:
> [ 448.307147] <TASK>
> [ 448.308166] __schedule+0x422/0xc90
> [ 448.309824] ? wait_for_completion+0x77/0x170
> [ 448.311870] schedule+0x63/0xd0
> [ 448.313346] schedule_timeout+0x2fe/0x4c0
> [ 448.315255] ? __this_cpu_preempt_check+0x1c/0x30
> [ 448.317360] ? _raw_spin_unlock_irq+0x27/0x60
> [ 448.319400] ? lockdep_hardirqs_on+0x88/0x120
> [ 448.321395] ? wait_for_completion+0x77/0x170
> [ 448.323462] wait_for_completion+0x9e/0x170
> [ 448.325356] io_ring_exit_work+0x2b0/0x810
> [ 448.327300] ? __pfx_io_tctx_exit_cb+0x10/0x10
> [ 448.329345] ? _raw_spin_unlock_irq+0x27/0x60
> [ 448.331397] process_one_work+0x34e/0x720
> [ 448.333212] ? __pfx_io_ring_exit_work+0x10/0x10
> [ 448.335377] ? process_one_work+0x34e/0x720
> [ 448.337295] worker_thread+0x4e/0x530
> [ 448.339079] ? __pfx_worker_thread+0x10/0x10
> [ 448.341008] kthread+0x128/0x160
> [ 448.342513] ? __pfx_kthread+0x10/0x10
> [ 448.344305] ret_from_fork+0x2c/0x50
> [ 448.346016] </TASK>
> [ 448.347176]
> [ 448.347176] Showing all locks held in the system:
> [ 448.349887] 2 locks held by kworker/u4:0/9:
> [ 448.351829] #0: ffff88807eb6dd38 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x2b4/0x720
> [ 448.356362] #1: ffffc9000005fe68 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x2b4/0x720
> [ 448.361052] 1 lock held by rcu_tasks_kthre/11:
> [ 448.363142] #0: ffffffff83963450 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x440
> [ 448.367396] 1 lock held by rcu_tasks_rude_/12:
> [ 448.369387] #0: ffffffff839631d0 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x440
> [ 448.373808] 1 lock held by rcu_tasks_trace/13:
> [ 448.375852] #0: ffffffff83962f10 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x440
> [ 448.380434] 1 lock held by khungtaskd/30:
> [ 448.382219] #0: ffffffff83963e60 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x17/0x1d0
> [ 448.386291]
> [ 448.387159] =============================================
> [ 448.387159]

Is there a reproducer for this one so we can take a look? It's not
impossible to get into this state if you muck with signals, eg ring
exit work is queued but needs requests to complete, and the latter
is prevented by deliberately sending a SIGSTOP to the task that needs
to complete them.

--
Jens Axboe


2023-04-13 13:58:09

by Frederic Weisbecker

Subject: Re: [Syzkaller & bisect] There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel

On Thu, Apr 13, 2023 at 06:40:16AM -0600, Jens Axboe wrote:
> Is there a reproducer for this one so we can take a look? It's not
> impossible to get into this state if you muck with signals, eg ring
> exit work is queued but needs requests to complete, and the latter
> is prevented by deliberately sending a SIGSTOP to the task that needs
> to complete them.

Yep, the original message provides all the necessary steps to reproduce:

https://lore.kernel.org/lkml/[email protected]/

Fingers crossed that you'll be able to trigger it...

Thanks.