2024-03-27 18:39:54

by syzbot

[permalink] [raw]
Subject: [syzbot] [audit?] [bpf?] INFO: rcu detected stall in kauditd_thread (4)

Hello,

syzbot found the following issue on:

HEAD commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=121a9e21180000
kernel config: https://syzkaller.appspot.com/x/.config?x=aef2a55903e5791c
dashboard link: https://syzkaller.appspot.com/bug?extid=81f5ca46b043d4a1b789
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12104f9e180000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=112840ee180000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/089e25869df5/disk-fe46a7dd.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/423b1787914f/vmlinux-fe46a7dd.xz
kernel image: https://storage.googleapis.com/syzbot-assets/4c043e30c07d/bzImage-fe46a7dd.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 1-...!: (1 ticks this GP) idle=9bf4/1/0x4000000000000000 softirq=6591/6591 fqs=5
rcu: (detected by 0, t=10502 jiffies, g=7873, q=65 ncpus=2)
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 28 Comm: kauditd Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
RIP: 0010:write_comp_data+0x0/0x90 kernel/kcov.c:230
Code: 48 8b 05 13 e2 76 7e 48 8b 80 08 16 00 00 c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <49> 89 d2 49 89 f8 49 89 f1 65 48 8b 15 df e1 76 7e 65 8b 05 e0 e1
RSP: 0018:ffffc90000a08d78 EFLAGS: 00000046
RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff88e73a44
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 000000000003d3cc R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000002 R12: ffff8880b952cac0
R13: ffff88802abeb340 R14: ffff88802abeb340 R15: ffffffff88e73220
FS: 0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555582f0dca8 CR3: 000000002bcea000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<NMI>
</NMI>
<IRQ>
advance_sched+0x824/0xc60 net/sched/sch_taprio.c:925
__run_hrtimer kernel/time/hrtimer.c:1692 [inline]
__hrtimer_run_queues+0x20c/0xc20 kernel/time/hrtimer.c:1756
hrtimer_interrupt+0x31b/0x800 kernel/time/hrtimer.c:1818
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1032 [inline]
__sysvec_apic_timer_interrupt+0x10f/0x410 arch/x86/kernel/apic/apic.c:1049
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
sysvec_apic_timer_interrupt+0x90/0xb0 arch/x86/kernel/apic/apic.c:1043
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
RIP: 0010:console_flush_all+0xa19/0xd70 kernel/printk/printk.c:2979
Code: e8 9c dd 25 00 9c 5b 81 e3 00 02 00 00 31 ff 48 89 de e8 0a c9 1e 00 48 85 db 0f 85 8b 01 00 00 e8 ec cd 1e 00 fb 48 8b 04 24 <4c> 89 fa 83 e2 07 0f b6 00 38 d0 7f 08 84 c0 0f 85 a9 02 00 00 41
RSP: 0018:ffffc90000a47a98 EFLAGS: 00000293
RAX: fffff52000148f7a RBX: 0000000000000000 RCX: ffffffff816e2ab6
RDX: ffff888018ee9e00 RSI: ffffffff816e2ac4 RDI: 0000000000000007
RBP: dffffc0000000000 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000000200
R13: ffffffff8e3995f8 R14: ffffffff8e3995a0 R15: ffffc90000a47bd0
console_unlock+0xae/0x290 kernel/printk/printk.c:3042
vprintk_emit kernel/printk/printk.c:2342 [inline]
vprintk_emit+0x11a/0x5a0 kernel/printk/printk.c:2297
vprintk+0x7f/0xa0 kernel/printk/printk_safe.c:45
_printk+0xc8/0x100 kernel/printk/printk.c:2367
kauditd_printk_skb kernel/audit.c:546 [inline]
kauditd_hold_skb+0x1fb/0x240 kernel/audit.c:581
kauditd_send_queue+0x236/0x290 kernel/audit.c:766
kauditd_thread+0x61e/0xa80 kernel/audit.c:890
kthread+0x2c1/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
</TASK>
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.415 msecs
rcu: rcu_preempt kthread starved for 10491 jiffies! g7873 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:28304 pid:16 tgid:16 ppid:2 flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5409 [inline]
__schedule+0xf15/0x5c70 kernel/sched/core.c:6736
__schedule_loop kernel/sched/core.c:6813 [inline]
schedule+0xe7/0x350 kernel/sched/core.c:6828
schedule_timeout+0x136/0x2a0 kernel/time/timer.c:2572
rcu_gp_fqs_loop+0x1eb/0xb00 kernel/rcu/tree.c:1663
rcu_gp_kthread+0x271/0x380 kernel/rcu/tree.c:1862
kthread+0x2c1/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
</TASK>
rcu: Stack dump where RCU GP kthread last ran:
CPU: 0 PID: 43 Comm: kworker/u8:3 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
Workqueue: events_unbound toggle_allocation_gate
RIP: 0010:csd_lock_wait kernel/smp.c:311 [inline]
RIP: 0010:smp_call_function_many_cond+0x4e7/0x1590 kernel/smp.c:855
Code: 0c 00 85 ed 74 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 4b 05 0c 00 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 5d 0e 00 00 8b 43 08 31
RSP: 0018:ffffc90000b37910 EFLAGS: 00000293
RAX: 0000000000000000 RBX: ffff8880b9544380 RCX: ffffffff8180f38b
RDX: ffff8880192f0000 RSI: ffffffff8180f365 RDI: 0000000000000005
RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000006 R12: ffffed10172a8871
R13: 0000000000000001 R14: ffff8880b9544388 R15: ffff8880b943f840
FS: 0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055e176bb4000 CR3: 000000000d57a000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
</IRQ>
<TASK>
on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:1023
on_each_cpu include/linux/smp.h:71 [inline]
text_poke_sync arch/x86/kernel/alternative.c:2086 [inline]
text_poke_bp_batch+0x22b/0x760 arch/x86/kernel/alternative.c:2296
text_poke_flush arch/x86/kernel/alternative.c:2487 [inline]
text_poke_flush arch/x86/kernel/alternative.c:2484 [inline]
text_poke_finish+0x30/0x40 arch/x86/kernel/alternative.c:2494
arch_jump_label_transform_apply+0x1c/0x30 arch/x86/kernel/jump_label.c:146
jump_label_update+0x1d7/0x400 kernel/jump_label.c:829
static_key_enable_cpuslocked+0x1b7/0x270 kernel/jump_label.c:205
static_key_enable+0x1a/0x20 kernel/jump_label.c:218
toggle_allocation_gate mm/kfence/core.c:826 [inline]
toggle_allocation_gate+0xf8/0x250 mm/kfence/core.c:818
process_one_work+0x9a9/0x1a60 kernel/workqueue.c:3254
process_scheduled_works kernel/workqueue.c:3335 [inline]
worker_thread+0x6c8/0xf70 kernel/workqueue.c:3416
kthread+0x2c1/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
</TASK>
watchdog: BUG: soft lockup - CPU#0 stuck for 246s! [kworker/u8:3:43]
Modules linked in:
irq event stamp: 519008
hardirqs last enabled at (519007): [<ffffffff8ad251cb>] irqentry_exit+0x3b/0x90 kernel/entry/common.c:351
hardirqs last disabled at (519008): [<ffffffff8ad239ae>] sysvec_apic_timer_interrupt+0xe/0xb0 arch/x86/kernel/apic/apic.c:1043
softirqs last enabled at (519006): [<ffffffff8ad63156>] softirq_handle_end kernel/softirq.c:400 [inline]
softirqs last enabled at (519006): [<ffffffff8ad63156>] __do_softirq+0x596/0x8de kernel/softirq.c:583
softirqs last disabled at (518991): [<ffffffff8151a149>] invoke_softirq kernel/softirq.c:428 [inline]
softirqs last disabled at (518991): [<ffffffff8151a149>] __irq_exit_rcu kernel/softirq.c:633 [inline]
softirqs last disabled at (518991): [<ffffffff8151a149>] irq_exit_rcu+0xb9/0x120 kernel/softirq.c:645
CPU: 0 PID: 43 Comm: kworker/u8:3 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
Workqueue: events_unbound toggle_allocation_gate
RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:26 [inline]
RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline]
RIP: 0010:write_comp_data+0x11/0x90 kernel/kcov.c:236
Code: cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 49 89 d2 49 89 f8 49 89 f1 65 48 8b 15 df e1 76 7e <65> 8b 05 e0 e1 76 7e a9 00 01 ff 00 74 0f f6 c4 01 74 59 8b 82 14
RSP: 0018:ffffc90000b37908 EFLAGS: 00000202
RAX: 0000000000000001 RBX: ffff8880b9544380 RCX: ffffffff8180f38b
RDX: ffff8880192f0000 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000006 R12: ffffed10172a8871
R13: 0000000000000001 R14: ffff8880b9544388 R15: ffff8880b943f840
FS: 0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055e176bb4000 CR3: 000000000d57a000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
</IRQ>
<TASK>
csd_lock_wait kernel/smp.c:311 [inline]
smp_call_function_many_cond+0x50b/0x1590 kernel/smp.c:855
on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:1023
on_each_cpu include/linux/smp.h:71 [inline]
text_poke_sync arch/x86/kernel/alternative.c:2086 [inline]
text_poke_bp_batch+0x22b/0x760 arch/x86/kernel/alternative.c:2296
text_poke_flush arch/x86/kernel/alternative.c:2487 [inline]
text_poke_flush arch/x86/kernel/alternative.c:2484 [inline]
text_poke_finish+0x30/0x40 arch/x86/kernel/alternative.c:2494
arch_jump_label_transform_apply+0x1c/0x30 arch/x86/kernel/jump_label.c:146
jump_label_update+0x1d7/0x400 kernel/jump_label.c:829
static_key_enable_cpuslocked+0x1b7/0x270 kernel/jump_label.c:205
static_key_enable+0x1a/0x20 kernel/jump_label.c:218
toggle_allocation_gate mm/kfence/core.c:826 [inline]
toggle_allocation_gate+0xf8/0x250 mm/kfence/core.c:818
process_one_work+0x9a9/0x1a60 kernel/workqueue.c:3254
process_scheduled_works kernel/workqueue.c:3335 [inline]
worker_thread+0x6c8/0xf70 kernel/workqueue.c:3416
kthread+0x2c1/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


2024-03-27 19:17:15

by Paul Moore

[permalink] [raw]
Subject: Re: [syzbot] [audit?] [bpf?] INFO: rcu detected stall in kauditd_thread (4)

On Wed, Mar 27, 2024 at 2:39 PM syzbot
<[email protected]> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=121a9e21180000
> kernel config: https://syzkaller.appspot.com/x/.config?x=aef2a55903e5791c
> dashboard link: https://syzkaller.appspot.com/bug?extid=81f5ca46b043d4a1b789
> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12104f9e180000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=112840ee180000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/089e25869df5/disk-fe46a7dd.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/423b1787914f/vmlinux-fe46a7dd.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/4c043e30c07d/bzImage-fe46a7dd.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]

This doesn't look to be audit related, but I do see something that may
be related in the printk code which was just merged into Linus' tree:

commit 8076972468584d4a21dab9aa50e388b3ea9ad8c7
Author: John Ogness <[email protected]>
Date: Mon Feb 26 13:07:24 2024 +0106

printk: Update @console_may_schedule in console_trylock_spinning()

console_trylock_spinning() may takeover the console lock from a
schedulable context. Update @console_may_schedule to make sure it
reflects a trylock acquire.

Reported-by: Mukesh Ojha <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]
Fixes: dbdda842fe96 ("printk: Add console owner and ...")
Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Mukesh Ojha <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>

> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: 1-...!: (1 ticks this GP) idle=9bf4/1/0x4000000000000000 softirq=6591/6591 fqs=5
> rcu: (detected by 0, t=10502 jiffies, g=7873, q=65 ncpus=2)
> Sending NMI from CPU 0 to CPUs 1:
> NMI backtrace for cpu 1
> CPU: 1 PID: 28 Comm: kauditd Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
> RIP: 0010:write_comp_data+0x0/0x90 kernel/kcov.c:230
> Code: 48 8b 05 13 e2 76 7e 48 8b 80 08 16 00 00 c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <49> 89 d2 49 89 f8 49 89 f1 65 48 8b 15 df e1 76 7e 65 8b 05 e0 e1
> RSP: 0018:ffffc90000a08d78 EFLAGS: 00000046
> RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff88e73a44
> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000005
> RBP: 000000000003d3cc R08: 0000000000000005 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000002 R12: ffff8880b952cac0
> R13: ffff88802abeb340 R14: ffff88802abeb340 R15: ffffffff88e73220
> FS: 0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000555582f0dca8 CR3: 000000002bcea000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <NMI>
> </NMI>
> <IRQ>
> advance_sched+0x824/0xc60 net/sched/sch_taprio.c:925
> __run_hrtimer kernel/time/hrtimer.c:1692 [inline]
> __hrtimer_run_queues+0x20c/0xc20 kernel/time/hrtimer.c:1756
> hrtimer_interrupt+0x31b/0x800 kernel/time/hrtimer.c:1818
> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1032 [inline]
> __sysvec_apic_timer_interrupt+0x10f/0x410 arch/x86/kernel/apic/apic.c:1049
> instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
> sysvec_apic_timer_interrupt+0x90/0xb0 arch/x86/kernel/apic/apic.c:1043
> </IRQ>
> <TASK>
> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
> RIP: 0010:console_flush_all+0xa19/0xd70 kernel/printk/printk.c:2979
> Code: e8 9c dd 25 00 9c 5b 81 e3 00 02 00 00 31 ff 48 89 de e8 0a c9 1e 00 48 85 db 0f 85 8b 01 00 00 e8 ec cd 1e 00 fb 48 8b 04 24 <4c> 89 fa 83 e2 07 0f b6 00 38 d0 7f 08 84 c0 0f 85 a9 02 00 00 41
> RSP: 0018:ffffc90000a47a98 EFLAGS: 00000293
> RAX: fffff52000148f7a RBX: 0000000000000000 RCX: ffffffff816e2ab6
> RDX: ffff888018ee9e00 RSI: ffffffff816e2ac4 RDI: 0000000000000007
> RBP: dffffc0000000000 R08: 0000000000000007 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000000200
> R13: ffffffff8e3995f8 R14: ffffffff8e3995a0 R15: ffffc90000a47bd0
> console_unlock+0xae/0x290 kernel/printk/printk.c:3042
> vprintk_emit kernel/printk/printk.c:2342 [inline]
> vprintk_emit+0x11a/0x5a0 kernel/printk/printk.c:2297
> vprintk+0x7f/0xa0 kernel/printk/printk_safe.c:45
> _printk+0xc8/0x100 kernel/printk/printk.c:2367
> kauditd_printk_skb kernel/audit.c:546 [inline]
> kauditd_hold_skb+0x1fb/0x240 kernel/audit.c:581
> kauditd_send_queue+0x236/0x290 kernel/audit.c:766
> kauditd_thread+0x61e/0xa80 kernel/audit.c:890
> kthread+0x2c1/0x3a0 kernel/kthread.c:388
> ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
> </TASK>
> INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.415 msecs
> rcu: rcu_preempt kthread starved for 10491 jiffies! g7873 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> rcu: RCU grace-period kthread stack dump:
> task:rcu_preempt state:R running task stack:28304 pid:16 tgid:16 ppid:2 flags:0x00004000
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5409 [inline]
> __schedule+0xf15/0x5c70 kernel/sched/core.c:6736
> __schedule_loop kernel/sched/core.c:6813 [inline]
> schedule+0xe7/0x350 kernel/sched/core.c:6828
> schedule_timeout+0x136/0x2a0 kernel/time/timer.c:2572
> rcu_gp_fqs_loop+0x1eb/0xb00 kernel/rcu/tree.c:1663
> rcu_gp_kthread+0x271/0x380 kernel/rcu/tree.c:1862
> kthread+0x2c1/0x3a0 kernel/kthread.c:388
> ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
> </TASK>
> rcu: Stack dump where RCU GP kthread last ran:
> CPU: 0 PID: 43 Comm: kworker/u8:3 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
> Workqueue: events_unbound toggle_allocation_gate
> RIP: 0010:csd_lock_wait kernel/smp.c:311 [inline]
> RIP: 0010:smp_call_function_many_cond+0x4e7/0x1590 kernel/smp.c:855
> Code: 0c 00 85 ed 74 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 4b 05 0c 00 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 5d 0e 00 00 8b 43 08 31
> RSP: 0018:ffffc90000b37910 EFLAGS: 00000293
> RAX: 0000000000000000 RBX: ffff8880b9544380 RCX: ffffffff8180f38b
> RDX: ffff8880192f0000 RSI: ffffffff8180f365 RDI: 0000000000000005
> RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000006 R12: ffffed10172a8871
> R13: 0000000000000001 R14: ffff8880b9544388 R15: ffff8880b943f840
> FS: 0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055e176bb4000 CR3: 000000000d57a000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <IRQ>
> </IRQ>
> <TASK>
> on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:1023
> on_each_cpu include/linux/smp.h:71 [inline]
> text_poke_sync arch/x86/kernel/alternative.c:2086 [inline]
> text_poke_bp_batch+0x22b/0x760 arch/x86/kernel/alternative.c:2296
> text_poke_flush arch/x86/kernel/alternative.c:2487 [inline]
> text_poke_flush arch/x86/kernel/alternative.c:2484 [inline]
> text_poke_finish+0x30/0x40 arch/x86/kernel/alternative.c:2494
> arch_jump_label_transform_apply+0x1c/0x30 arch/x86/kernel/jump_label.c:146
> jump_label_update+0x1d7/0x400 kernel/jump_label.c:829
> static_key_enable_cpuslocked+0x1b7/0x270 kernel/jump_label.c:205
> static_key_enable+0x1a/0x20 kernel/jump_label.c:218
> toggle_allocation_gate mm/kfence/core.c:826 [inline]
> toggle_allocation_gate+0xf8/0x250 mm/kfence/core.c:818
> process_one_work+0x9a9/0x1a60 kernel/workqueue.c:3254
> process_scheduled_works kernel/workqueue.c:3335 [inline]
> worker_thread+0x6c8/0xf70 kernel/workqueue.c:3416
> kthread+0x2c1/0x3a0 kernel/kthread.c:388
> ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
> </TASK>
> watchdog: BUG: soft lockup - CPU#0 stuck for 246s! [kworker/u8:3:43]
> Modules linked in:
> irq event stamp: 519008
> hardirqs last enabled at (519007): [<ffffffff8ad251cb>] irqentry_exit+0x3b/0x90 kernel/entry/common.c:351
> hardirqs last disabled at (519008): [<ffffffff8ad239ae>] sysvec_apic_timer_interrupt+0xe/0xb0 arch/x86/kernel/apic/apic.c:1043
> softirqs last enabled at (519006): [<ffffffff8ad63156>] softirq_handle_end kernel/softirq.c:400 [inline]
> softirqs last enabled at (519006): [<ffffffff8ad63156>] __do_softirq+0x596/0x8de kernel/softirq.c:583
> softirqs last disabled at (518991): [<ffffffff8151a149>] invoke_softirq kernel/softirq.c:428 [inline]
> softirqs last disabled at (518991): [<ffffffff8151a149>] __irq_exit_rcu kernel/softirq.c:633 [inline]
> softirqs last disabled at (518991): [<ffffffff8151a149>] irq_exit_rcu+0xb9/0x120 kernel/softirq.c:645
> CPU: 0 PID: 43 Comm: kworker/u8:3 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
> Workqueue: events_unbound toggle_allocation_gate
> RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:26 [inline]
> RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline]
> RIP: 0010:write_comp_data+0x11/0x90 kernel/kcov.c:236
> Code: cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 49 89 d2 49 89 f8 49 89 f1 65 48 8b 15 df e1 76 7e <65> 8b 05 e0 e1 76 7e a9 00 01 ff 00 74 0f f6 c4 01 74 59 8b 82 14
> RSP: 0018:ffffc90000b37908 EFLAGS: 00000202
> RAX: 0000000000000001 RBX: ffff8880b9544380 RCX: ffffffff8180f38b
> RDX: ffff8880192f0000 RSI: 0000000000000000 RDI: 0000000000000005
> RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000006 R12: ffffed10172a8871
> R13: 0000000000000001 R14: ffff8880b9544388 R15: ffff8880b943f840
> FS: 0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055e176bb4000 CR3: 000000000d57a000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <IRQ>
> </IRQ>
> <TASK>
> csd_lock_wait kernel/smp.c:311 [inline]
> smp_call_function_many_cond+0x50b/0x1590 kernel/smp.c:855
> on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:1023
> on_each_cpu include/linux/smp.h:71 [inline]
> text_poke_sync arch/x86/kernel/alternative.c:2086 [inline]
> text_poke_bp_batch+0x22b/0x760 arch/x86/kernel/alternative.c:2296
> text_poke_flush arch/x86/kernel/alternative.c:2487 [inline]
> text_poke_flush arch/x86/kernel/alternative.c:2484 [inline]
> text_poke_finish+0x30/0x40 arch/x86/kernel/alternative.c:2494
> arch_jump_label_transform_apply+0x1c/0x30 arch/x86/kernel/jump_label.c:146
> jump_label_update+0x1d7/0x400 kernel/jump_label.c:829
> static_key_enable_cpuslocked+0x1b7/0x270 kernel/jump_label.c:205
> static_key_enable+0x1a/0x20 kernel/jump_label.c:218
> toggle_allocation_gate mm/kfence/core.c:826 [inline]
> toggle_allocation_gate+0xf8/0x250 mm/kfence/core.c:818
> process_one_work+0x9a9/0x1a60 kernel/workqueue.c:3254
> process_scheduled_works kernel/workqueue.c:3335 [inline]
> worker_thread+0x6c8/0xf70 kernel/workqueue.c:3416
> kthread+0x2c1/0x3a0 kernel/kthread.c:388
> ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
> </TASK>
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at [email protected].
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.
>
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup



--
paul-moore.com

2024-03-28 14:44:46

by Petr Mladek

[permalink] [raw]
Subject: Re: [syzbot] [audit?] [bpf?] INFO: rcu detected stall in kauditd_thread (4)

On Wed 2024-03-27 15:16:25, Paul Moore wrote:
> On Wed, Mar 27, 2024 at 2:39 PM syzbot
> <[email protected]> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=121a9e21180000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=aef2a55903e5791c
> > dashboard link: https://syzkaller.appspot.com/bug?extid=81f5ca46b043d4a1b789
> > compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12104f9e180000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=112840ee180000
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/089e25869df5/disk-fe46a7dd.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/423b1787914f/vmlinux-fe46a7dd.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/4c043e30c07d/bzImage-fe46a7dd.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: [email protected]
>
> This doesn't look to be audit related, but I do see something that may
> be related in the printk code which was just merged into Linus' tree:
>
> commit 8076972468584d4a21dab9aa50e388b3ea9ad8c7
> Author: John Ogness <[email protected]>
> Date: Mon Feb 26 13:07:24 2024 +0106
>
> printk: Update @console_may_schedule in console_trylock_spinning()
>
> console_trylock_spinning() may takeover the console lock from a
> schedulable context. Update @console_may_schedule to make sure it
> reflects a trylock acquire.

This change should not introduce any new solflockup. By other words,
if we see a softlockup with this patch then it was possible even before.

conosle_trylock_spinning() does two things:

1. It call console_trylock(). On success, it has always cleared
@console_may_schedule.

2. As a fallback, it tries to take over the cosnole_lock() from
the current owner. On success, it should clear
@console_may_schedule like the plain console_trylock().

This code path has an effect only when another CPU is just
emitting messages on the console (calling con->write()).
It might happen only when there is a flood of messages.

The bug was that conosle_trylock_spinning() did not clear
@console_may_schedule in the fallback path.

Note that conosle_trylock_spinning() must always clear
@console_may_schedule. It is used only in printk() which might
be called in atomic context.


> > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > rcu: 1-...!: (1 ticks this GP) idle=9bf4/1/0x4000000000000000 softirq=6591/6591 fqs=5
> > rcu: (detected by 0, t=10502 jiffies, g=7873, q=65 ncpus=2)
> > Sending NMI from CPU 0 to CPUs 1:
> > NMI backtrace for cpu 1
> > CPU: 1 PID: 28 Comm: kauditd Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
> > RIP: 0010:write_comp_data+0x0/0x90 kernel/kcov.c:230
> > Code: 48 8b 05 13 e2 76 7e 48 8b 80 08 16 00 00 c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <49> 89 d2 49 89 f8 49 89 f1 65 48 8b 15 df e1 76 7e 65 8b 05 e0 e1
> > RSP: 0018:ffffc90000a08d78 EFLAGS: 00000046
> > RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff88e73a44
> > RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000005
> > RBP: 000000000003d3cc R08: 0000000000000005 R09: 0000000000000000
> > R10: 0000000000000001 R11: 0000000000000002 R12: ffff8880b952cac0
> > R13: ffff88802abeb340 R14: ffff88802abeb340 R15: ffffffff88e73220
> > FS: 0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000555582f0dca8 CR3: 000000002bcea000 CR4: 00000000003506f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > <NMI>
> > </NMI>
> > <IRQ>
> > advance_sched+0x824/0xc60 net/sched/sch_taprio.c:925
> > __run_hrtimer kernel/time/hrtimer.c:1692 [inline]
> > __hrtimer_run_queues+0x20c/0xc20 kernel/time/hrtimer.c:1756
> > hrtimer_interrupt+0x31b/0x800 kernel/time/hrtimer.c:1818
> > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1032 [inline]
> > __sysvec_apic_timer_interrupt+0x10f/0x410 arch/x86/kernel/apic/apic.c:1049
> > instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
> > sysvec_apic_timer_interrupt+0x90/0xb0 arch/x86/kernel/apic/apic.c:1043

I wonder what this timer does and if it might get stuck and cause
the lockup.

> > </IRQ>
> > <TASK>
> > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
> > RIP: 0010:console_flush_all+0xa19/0xd70 kernel/printk/printk.c:2979
> > Code: e8 9c dd 25 00 9c 5b 81 e3 00 02 00 00 31 ff 48 89 de e8 0a c9 1e 00 48 85 db 0f 85 8b 01 00 00 e8 ec cd 1e 00 fb 48 8b 04 24 <4c> 89 fa 83 e2 07 0f b6 00 38 d0 7f 08 84 c0 0f 85 a9 02 00 00 41
> > RSP: 0018:ffffc90000a47a98 EFLAGS: 00000293
> > RAX: fffff52000148f7a RBX: 0000000000000000 RCX: ffffffff816e2ab6
> > RDX: ffff888018ee9e00 RSI: ffffffff816e2ac4 RDI: 0000000000000007
> > RBP: dffffc0000000000 R08: 0000000000000007 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000000200
> > R13: ffffffff8e3995f8 R14: ffffffff8e3995a0 R15: ffffc90000a47bd0
> > console_unlock+0xae/0x290 kernel/printk/printk.c:3042
> > vprintk_emit kernel/printk/printk.c:2342 [inline]
> > vprintk_emit+0x11a/0x5a0 kernel/printk/printk.c:2297
> > vprintk+0x7f/0xa0 kernel/printk/printk_safe.c:45
> > _printk+0xc8/0x100 kernel/printk/printk.c:2367
> > kauditd_printk_skb kernel/audit.c:546 [inline]
> > kauditd_hold_skb+0x1fb/0x240 kernel/audit.c:581
> > kauditd_send_queue+0x236/0x290 kernel/audit.c:766
> > kauditd_thread+0x61e/0xa80 kernel/audit.c:890
> > kthread+0x2c1/0x3a0 kernel/kthread.c:388
> > ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
> > </TASK>
> > INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.415 msecs
> > rcu: rcu_preempt kthread starved for 10491 jiffies! g7873 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> > rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> > rcu: RCU grace-period kthread stack dump:
> > task:rcu_preempt state:R running task stack:28304 pid:16 tgid:16 ppid:2 flags:0x00004000
> > Call Trace:
> > <TASK>
> > context_switch kernel/sched/core.c:5409 [inline]
> > __schedule+0xf15/0x5c70 kernel/sched/core.c:6736
> > __schedule_loop kernel/sched/core.c:6813 [inline]
> > schedule+0xe7/0x350 kernel/sched/core.c:6828
> > schedule_timeout+0x136/0x2a0 kernel/time/timer.c:2572
> > rcu_gp_fqs_loop+0x1eb/0xb00 kernel/rcu/tree.c:1663
> > rcu_gp_kthread+0x271/0x380 kernel/rcu/tree.c:1862
> > kthread+0x2c1/0x3a0 kernel/kthread.c:388
> > ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
> > </TASK>
> > rcu: Stack dump where RCU GP kthread last ran:
> > CPU: 0 PID: 43 Comm: kworker/u8:3 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
> > Workqueue: events_unbound toggle_allocation_gate
> > RIP: 0010:csd_lock_wait kernel/smp.c:311 [inline]
> > RIP: 0010:smp_call_function_many_cond+0x4e7/0x1590 kernel/smp.c:855
> > Code: 0c 00 85 ed 74 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 4b 05 0c 00 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 5d 0e 00 00 8b 43 08 31
> > RSP: 0018:ffffc90000b37910 EFLAGS: 00000293
> > RAX: 0000000000000000 RBX: ffff8880b9544380 RCX: ffffffff8180f38b
> > RDX: ffff8880192f0000 RSI: ffffffff8180f365 RDI: 0000000000000005
> > RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000
> > R10: 0000000000000001 R11: 0000000000000006 R12: ffffed10172a8871
> > R13: 0000000000000001 R14: ffff8880b9544388 R15: ffff8880b943f840
> > FS: 0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 000055e176bb4000 CR3: 000000000d57a000 CR4: 00000000003506f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > <IRQ>
> > </IRQ>
> > <TASK>
> > on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:1023
> > on_each_cpu include/linux/smp.h:71 [inline]
> > text_poke_sync arch/x86/kernel/alternative.c:2086 [inline]

I guess that this wants to run something on each CPU before
it continues. It would block this CPU when the other CPU gets stuck.

> > text_poke_bp_batch+0x22b/0x760 arch/x86/kernel/alternative.c:2296
> > text_poke_flush arch/x86/kernel/alternative.c:2487 [inline]
> > text_poke_flush arch/x86/kernel/alternative.c:2484 [inline]
> > text_poke_finish+0x30/0x40 arch/x86/kernel/alternative.c:2494
> > arch_jump_label_transform_apply+0x1c/0x30 arch/x86/kernel/jump_label.c:146
> > jump_label_update+0x1d7/0x400 kernel/jump_label.c:829
> > static_key_enable_cpuslocked+0x1b7/0x270 kernel/jump_label.c:205
> > static_key_enable+0x1a/0x20 kernel/jump_label.c:218
> > toggle_allocation_gate mm/kfence/core.c:826 [inline]
> > toggle_allocation_gate+0xf8/0x250 mm/kfence/core.c:818
> > process_one_work+0x9a9/0x1a60 kernel/workqueue.c:3254
> > process_scheduled_works kernel/workqueue.c:3335 [inline]
> > worker_thread+0x6c8/0xf70 kernel/workqueue.c:3416
> > kthread+0x2c1/0x3a0 kernel/kthread.c:388
> > ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
> > </TASK>
> > watchdog: BUG: soft lockup - CPU#0 stuck for 246s! [kworker/u8:3:43]

It is interesting that it took 246s. AFAIK, the default watchdog
thresh is much sorter (like 20s). Well, I havn't checked setting
used for this test.

> > Modules linked in:
> > irq event stamp: 519008
> > hardirqs last enabled at (519007): [<ffffffff8ad251cb>] irqentry_exit+0x3b/0x90 kernel/entry/common.c:351
> > hardirqs last disabled at (519008): [<ffffffff8ad239ae>] sysvec_apic_timer_interrupt+0xe/0xb0 arch/x86/kernel/apic/apic.c:1043
> > softirqs last enabled at (519006): [<ffffffff8ad63156>] softirq_handle_end kernel/softirq.c:400 [inline]
> > softirqs last enabled at (519006): [<ffffffff8ad63156>] __do_softirq+0x596/0x8de kernel/softirq.c:583
> > softirqs last disabled at (518991): [<ffffffff8151a149>] invoke_softirq kernel/softirq.c:428 [inline]
> > softirqs last disabled at (518991): [<ffffffff8151a149>] __irq_exit_rcu kernel/softirq.c:633 [inline]
> > softirqs last disabled at (518991): [<ffffffff8151a149>] irq_exit_rcu+0xb9/0x120 kernel/softirq.c:645
> > CPU: 0 PID: 43 Comm: kworker/u8:3 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
> > Workqueue: events_unbound toggle_allocation_gate
> > RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:26 [inline]
> > RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline]
> > RIP: 0010:write_comp_data+0x11/0x90 kernel/kcov.c:236
> > Code: cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 49 89 d2 49 89 f8 49 89 f1 65 48 8b 15 df e1 76 7e <65> 8b 05 e0 e1 76 7e a9 00 01 ff 00 74 0f f6 c4 01 74 59 8b 82 14
> > RSP: 0018:ffffc90000b37908 EFLAGS: 00000202
> > RAX: 0000000000000001 RBX: ffff8880b9544380 RCX: ffffffff8180f38b
> > RDX: ffff8880192f0000 RSI: 0000000000000000 RDI: 0000000000000005
> > RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000
> > R10: 0000000000000001 R11: 0000000000000006 R12: ffffed10172a8871
> > R13: 0000000000000001 R14: ffff8880b9544388 R15: ffff8880b943f840
> > FS: 0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 000055e176bb4000 CR3: 000000000d57a000 CR4: 00000000003506f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > <IRQ>
> > </IRQ>
> > <TASK>
> > csd_lock_wait kernel/smp.c:311 [inline]
> > smp_call_function_many_cond+0x50b/0x1590 kernel/smp.c:855
> > on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:1023
> > on_each_cpu include/linux/smp.h:71 [inline]

It seems that this is really stuck because it is not able to
run something on the other CPU.

IMHO, the primary problem is in the code proceed by the hrtimer.
Adding networking people into Cc.

Best Regards,
Petr

> > text_poke_sync arch/x86/kernel/alternative.c:2086 [inline]
> > text_poke_bp_batch+0x22b/0x760 arch/x86/kernel/alternative.c:2296
> > text_poke_flush arch/x86/kernel/alternative.c:2487 [inline]
> > text_poke_flush arch/x86/kernel/alternative.c:2484 [inline]
> > text_poke_finish+0x30/0x40 arch/x86/kernel/alternative.c:2494
> > arch_jump_label_transform_apply+0x1c/0x30 arch/x86/kernel/jump_label.c:146
> > jump_label_update+0x1d7/0x400 kernel/jump_label.c:829
> > static_key_enable_cpuslocked+0x1b7/0x270 kernel/jump_label.c:205
> > static_key_enable+0x1a/0x20 kernel/jump_label.c:218
> > toggle_allocation_gate mm/kfence/core.c:826 [inline]
> > toggle_allocation_gate+0xf8/0x250 mm/kfence/core.c:818
> > process_one_work+0x9a9/0x1a60 kernel/workqueue.c:3254
> > process_scheduled_works kernel/workqueue.c:3335 [inline]
> > worker_thread+0x6c8/0xf70 kernel/workqueue.c:3416
> > kthread+0x2c1/0x3a0 kernel/kthread.c:388
> > ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
> > </TASK>
> >
> >
> > ---
> > This report is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at [email protected].
> >
> > syzbot will keep track of this issue. See:
> > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> >
> > If the report is already addressed, let syzbot know by replying with:
> > #syz fix: exact-commit-title
> >
> > If you want syzbot to run the reproducer, reply with:
> > #syz test: git://repo/address.git branch-or-commit-hash
> > If you attach or paste a git patch, syzbot will apply it before testing.
> >
> > If you want to overwrite report's subsystems, reply with:
> > #syz set subsystems: new-subsystem
> > (See the list of subsystem names on the web dashboard)
> >
> > If the report is a duplicate of another one, reply with:
> > #syz dup: exact-subject-of-another-report
> >
> > If you want to undo deduplication, reply with:
> > #syz undup
>
>
>
> --
> paul-moore.com

2024-03-28 23:28:40

by Hillf Danton

[permalink] [raw]
Subject: Re: [syzbot] [audit?] [bpf?] INFO: rcu detected stall in kauditd_thread (4)

On Wed, 27 Mar 2024 11:39:42 -0700
> syzbot found the following issue on:
>
> HEAD commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
> git tree: upstream
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=112840ee180000

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

--- x/net/sched/sch_taprio.c
+++ y/net/sched/sch_taprio.c
@@ -980,7 +980,7 @@ first_run:
rcu_assign_pointer(q->current_entry, next);
spin_unlock(&q->current_entry_lock);

- hrtimer_set_expires(&q->advance_timer, end_time);
+ hrtimer_forward_now(&q->advance_timer, ns_to_ktime(NSEC_PER_USEC *200));

rcu_read_lock();
__netif_schedule(sch);
--

2024-03-29 17:12:41

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] [audit?] [bpf?] INFO: rcu detected stall in kauditd_thread (4)

Hello,

syzbot tried to test the proposed patch but the build/boot failed:

onnector registered
[ 8.925501][ T1] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0
[ 8.936918][ T1] [drm] Initialized vkms 1.0.0 20180514 for vkms on minor 1
[ 9.031871][ T1] Console: switching to colour frame buffer device 128x48
[ 9.049570][ T1] platform vkms: [drm] fb0: vkmsdrmfb frame buffer device
[ 9.050924][ T1] usbcore: registered new interface driver udl
[ 9.140069][ T1] brd: module loaded
[ 9.257049][ T1] loop: module loaded
[ 9.412333][ T1] zram: Added device: zram0
[ 9.424889][ T1] null_blk: disk nullb0 created
[ 9.425718][ T1] null_blk: module loaded
[ 9.427782][ T1] Guest personality initialized and is inactive
[ 9.429970][ T1] VMCI host device registered (name=vmci, major=10, minor=118)
[ 9.431095][ T1] Initialized host personality
[ 9.432024][ T1] usbcore: registered new interface driver rtsx_usb
[ 9.433983][ T1] usbcore: registered new interface driver viperboard
[ 9.435713][ T1] usbcore: registered new interface driver dln2
[ 9.437613][ T1] usbcore: registered new interface driver pn533_usb
[ 9.445487][ T1] nfcsim 0.2 initialized
[ 9.446814][ T1] usbcore: registered new interface driver port100
[ 9.448211][ T1] usbcore: registered new interface driver nfcmrvl
[ 9.455761][ T1] Loading iSCSI transport class v2.0-870.
[ 9.481077][ T1] virtio_scsi virtio0: 1/0/0 default/read/poll queues
[ 9.491184][ T1] ------------[ cut here ]------------
[ 9.491949][ T1] refcount_t: decrement hit 0; leaking memory.
[ 9.493197][ T1] WARNING: CPU: 0 PID: 1 at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210
[ 9.494503][ T1] Modules linked in:
[ 9.495197][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc1-syzkaller-00178-g317c7bc0ef03-dirty #0
[ 9.496752][ T1] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
[ 9.498118][ T1] RIP: 0010:refcount_warn_saturate+0x1ed/0x210
[ 9.499016][ T1] Code: 8b e8 d7 34 d2 fc 90 0f 0b 90 90 e9 c3 fe ff ff e8 c8 c6 0f fd c6 05 fa f0 4c 0b 01 90 48 c7 c7 80 e5 8e 8b e8 b4 34 d2 fc 90 <0f> 0b 90 90 e9 a0 fe ff ff 48 89 ef e8 e2 f2 6a fd e9 44 fe ff ff
[ 9.501796][ T1] RSP: 0000:ffffc900000673c0 EFLAGS: 00010286
[ 9.502670][ T1] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff814fe149
[ 9.503753][ T1] RDX: ffff888016ec0000 RSI: ffffffff814fe156 RDI: 0000000000000001
[ 9.504990][ T1] RBP: ffff888147a9da7c R08: 0000000000000001 R09: 0000000000000000
[ 9.506141][ T1] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888147a9da7c
[ 9.507308][ T1] R13: 0000000000000000 R14: 00000000034c00b9 R15: ffff88801687cd88
[ 9.508457][ T1] FS: 0000000000000000(0000) GS:ffff8880b9200000(0000) knlGS:0000000000000000
[ 9.509770][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.510753][ T1] CR2: ffff88823ffff000 CR3: 000000000d97a000 CR4: 00000000003506f0
[ 9.511894][ T1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9.513054][ T1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9.514199][ T1] Call Trace:
[ 9.514874][ T1] <TASK>
[ 9.515320][ T1] ? show_regs+0x8c/0xa0
[ 9.515938][ T1] ? __warn+0xe5/0x3c0
[ 9.516504][ T1] ? __wake_up_klogd.part.0+0x99/0xf0
[ 9.517289][ T1] ? refcount_warn_saturate+0x1ed/0x210
[ 9.518112][ T1] ? report_bug+0x3c0/0x580
[ 9.518784][ T1] ? handle_bug+0x3d/0x70
[ 9.519418][ T1] ? exc_invalid_op+0x17/0x50
[ 9.520133][ T1] ? asm_exc_invalid_op+0x1a/0x20
[ 9.520890][ T1] ? __warn_printk+0x199/0x350
[ 9.521627][ T1] ? __warn_printk+0x1a6/0x350
[ 9.522359][ T1] ? refcount_warn_saturate+0x1ed/0x210
[ 9.523209][ T1] __reset_page_owner+0x2ea/0x370
[ 9.523932][ T1] __free_pages_ok+0x5a9/0xbf0
[ 9.524704][ T1] ? __split_page_owner+0xdd/0x120
[ 9.525413][ T1] make_alloc_exact+0x165/0x260
[ 9.526180][ T1] vring_alloc_queue+0xc1/0x110
[ 9.527000][ T1] vring_alloc_queue_split+0x186/0x510
[ 9.527845][ T1] ? __pfx_vring_alloc_queue_split+0x10/0x10
[ 9.528704][ T1] ? kasan_save_stack+0x42/0x60
[ 9.529397][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 9.530249][ T1] vring_create_virtqueue_split+0xe8/0x280
[ 9.531109][ T1] ? __pfx_vp_notify+0x10/0x10
[ 9.531838][ T1] ? __pfx_vring_create_virtqueue_split+0x10/0x10
[ 9.532706][ T1] ? virtio_scsi_init+0x75/0x100
[ 9.533384][ T1] ? do_one_initcall+0x128/0x700
[ 9.534078][ T1] ? kernel_init_freeable+0x69d/0xca0
[ 9.534939][ T1] vring_create_virtqueue+0xd7/0x140
[ 9.535789][ T1] ? __pfx_vp_notify+0x10/0x10
[ 9.536450][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 9.537233][ T1] setup_vq+0x123/0x2f0
[ 9.537807][ T1] ? __pfx_vp_notify+0x10/0x10
[ 9.538463][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 9.539325][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 9.540143][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 9.540960][ T1] vp_setup_vq+0xb1/0x380
[ 9.541628][ T1] ? iowrite16+0x55/0x80
[ 9.542223][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 9.542989][ T1] vp_find_vqs_msix+0x661/0xe30
[ 9.543719][ T1] vp_find_vqs+0x58/0x560
[ 9.544394][ T1] virtscsi_init+0x459/0x9f0
[ 9.545082][ T1] ? __pfx_virtscsi_init+0x10/0x10
[ 9.545805][ T1] ? __pfx_default_calc_sets+0x10/0x10
[ 9.546590][ T1] ? ioread8+0x4e/0xa0
[ 9.547161][ T1] ? vp_get+0xf6/0x140
[ 9.547727][ T1] virtscsi_probe+0x3b2/0xdd0
[ 9.548378][ T1] ? __pfx_virtscsi_probe+0x10/0x10
[ 9.549118][ T1] ? trace_raw_output_sched_pi_setprio+0xad/0x150
[ 9.550033][ T1] ? vring_transport_features+0x49/0xb0
[ 9.550796][ T1] virtio_dev_probe+0x5ff/0x9b0
[ 9.551467][ T1] ? __pfx_virtio_dev_probe+0x10/0x10
[ 9.552207][ T1] really_probe+0x23e/0xa90
[ 9.552833][ T1] __driver_probe_device+0x1de/0x440
[ 9.553620][ T1] ? _raw_spin_unlock_irqrestore+0x52/0x80
[ 9.554433][ T1] driver_probe_device+0x4c/0x1b0
[ 9.555130][ T1] __driver_attach+0x283/0x580
[ 9.555809][ T1] ? __pfx___driver_attach+0x10/0x10
[ 9.556553][ T1] bus_for_each_dev+0x13c/0x1d0
[ 9.557221][ T1] ? __pfx_bus_for_each_dev+0x10/0x10
[ 9.557958][ T1] bus_add_driver+0x2ed/0x640
[ 9.558607][ T1] driver_register+0x15c/0x4b0
[ 9.559261][ T1] ? __pfx_virtio_scsi_init+0x10/0x10
[ 9.560012][ T1] ? __pfx_virtio_scsi_init+0x10/0x10
[ 9.560746][ T1] virtio_scsi_init+0x75/0x100
[ 9.561401][ T1] do_one_initcall+0x128/0x700
[ 9.562065][ T1] ? __pfx_do_one_initcall+0x10/0x10
[ 9.562813][ T1] ? trace_kmalloc+0x2d/0xe0
[ 9.563451][ T1] ? __kmalloc+0x218/0x440
[ 9.564063][ T1] kernel_init_freeable+0x69d/0xca0
[ 9.564821][ T1] ? __pfx_kernel_init+0x10/0x10
[ 9.565561][ T1] kernel_init+0x1c/0x2b0
[ 9.566160][ T1] ? __pfx_kernel_init+0x10/0x10
[ 9.566837][ T1] ret_from_fork+0x45/0x80
[ 9.567453][ T1] ? __pfx_kernel_init+0x10/0x10
[ 9.568132][ T1] ret_from_fork_asm+0x1a/0x30
[ 9.568792][ T1] </TASK>
[ 9.569217][ T1] Kernel panic - not syncing: kernel: panic_on_warn set ...
[ 9.570203][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc1-syzkaller-00178-g317c7bc0ef03-dirty #0
[ 9.571565][ T1] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
[ 9.572900][ T1] Call Trace:
[ 9.573380][ T1] <TASK>
[ 9.573813][ T1] dump_stack_lvl+0x3d/0x1f0
[ 9.574451][ T1] panic+0x6f5/0x7a0
[ 9.574804][ T1] ? __pfx_panic+0x10/0x10
[ 9.574804][ T1] ? show_trace_log_lvl+0x363/0x500
[ 9.574804][ T1] ? check_panic_on_warn+0x1f/0xb0
[ 9.574804][ T1] ? refcount_warn_saturate+0x1ed/0x210
[ 9.574804][ T1] check_panic_on_warn+0xab/0xb0
[ 9.574804][ T1] __warn+0xf1/0x3c0
[ 9.574804][ T1] ? __wake_up_klogd.part.0+0x99/0xf0
[ 9.574804][ T1] ? refcount_warn_saturate+0x1ed/0x210
[ 9.574804][ T1] report_bug+0x3c0/0x580
[ 9.574804][ T1] handle_bug+0x3d/0x70
[ 9.574804][ T1] exc_invalid_op+0x17/0x50
[ 9.574804][ T1] asm_exc_invalid_op+0x1a/0x20
[ 9.574804][ T1] RIP: 0010:refcount_warn_saturate+0x1ed/0x210
[ 9.574804][ T1] Code: 8b e8 d7 34 d2 fc 90 0f 0b 90 90 e9 c3 fe ff ff e8 c8 c6 0f fd c6 05 fa f0 4c 0b 01 90 48 c7 c7 80 e5 8e 8b e8 b4 34 d2 fc 90 <0f> 0b 90 90 e9 a0 fe ff ff 48 89 ef e8 e2 f2 6a fd e9 44 fe ff ff
[ 9.574804][ T1] RSP: 0000:ffffc900000673c0 EFLAGS: 00010286
[ 9.574804][ T1] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff814fe149
[ 9.574804][ T1] RDX: ffff888016ec0000 RSI: ffffffff814fe156 RDI: 0000000000000001
[ 9.574804][ T1] RBP: ffff888147a9da7c R08: 0000000000000001 R09: 0000000000000000
[ 9.574804][ T1] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888147a9da7c
[ 9.574804][ T1] R13: 0000000000000000 R14: 00000000034c00b9 R15: ffff88801687cd88
[ 9.574804][ T1] ? __warn_printk+0x199/0x350
[ 9.574804][ T1] ? __warn_printk+0x1a6/0x350
[ 9.574804][ T1] __reset_page_owner+0x2ea/0x370
[ 9.574804][ T1] __free_pages_ok+0x5a9/0xbf0
[ 9.574804][ T1] ? __split_page_owner+0xdd/0x120
[ 9.574804][ T1] make_alloc_exact+0x165/0x260
[ 9.574804][ T1] vring_alloc_queue+0xc1/0x110
[ 9.574804][ T1] vring_alloc_queue_split+0x186/0x510
[ 9.574804][ T1] ? __pfx_vring_alloc_queue_split+0x10/0x10
[ 9.574804][ T1] ? kasan_save_stack+0x42/0x60
[ 9.574804][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 9.574804][ T1] vring_create_virtqueue_split+0xe8/0x280
[ 9.574804][ T1] ? __pfx_vp_notify+0x10/0x10
[ 9.574804][ T1] ? __pfx_vring_create_virtqueue_split+0x10/0x10
[ 9.574804][ T1] ? virtio_scsi_init+0x75/0x100
[ 9.574804][ T1] ? do_one_initcall+0x128/0x700
[ 9.574804][ T1] ? kernel_init_freeable+0x69d/0xca0
[ 9.574804][ T1] vring_create_virtqueue+0xd7/0x140
[ 9.574804][ T1] ? __pfx_vp_notify+0x10/0x10
[ 9.574804][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 9.574804][ T1] setup_vq+0x123/0x2f0
[ 9.574804][ T1] ? __pfx_vp_notify+0x10/0x10
[ 9.574804][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 9.574804][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 9.574804][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 9.574804][ T1] vp_setup_vq+0xb1/0x380
[ 9.574804][ T1] ? iowrite16+0x55/0x80
[ 9.574804][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 9.574804][ T1] vp_find_vqs_msix+0x661/0xe30
[ 9.574804][ T1] vp_find_vqs+0x58/0x560
[ 9.574804][ T1] virtscsi_init+0x459/0x9f0
[ 9.574804][ T1] ? __pfx_virtscsi_init+0x10/0x10
[ 9.574804][ T1] ? __pfx_default_calc_sets+0x10/0x10
[ 9.574804][ T1] ? ioread8+0x4e/0xa0
[ 9.574804][ T1] ? vp_get+0xf6/0x140
[ 9.574804][ T1] virtscsi_probe+0x3b2/0xdd0
[ 9.574804][ T1] ? __pfx_virtscsi_probe+0x10/0x10
[ 9.574804][ T1] ? trace_raw_output_sched_pi_setprio+0xad/0x150
[ 9.574804][ T1] ? vring_transport_features+0x49/0xb0
[ 9.574804][ T1] virtio_dev_probe+0x5ff/0x9b0
[ 9.574804][ T1] ? __pfx_virtio_dev_probe+0x10/0x10
[ 9.574804][ T1] really_probe+0x23e/0xa90
[ 9.574804][ T1] __driver_probe_device+0x1de/0x440
[ 9.574804][ T1] ? _raw_spin_unlock_irqrestore+0x52/0x80
[ 9.574804][ T1] driver_probe_device+0x4c/0x1b0
[ 9.574804][ T1] __driver_attach+0x283/0x580
[ 9.574804][ T1] ? __pfx___driver_attach+0x10/0x10
[ 9.574804][ T1] bus_for_each_dev+0x13c/0x1d0
[ 9.574804][ T1] ? __pfx_bus_for_each_dev+0x10/0x10
[ 9.574804][ T1] bus_add_driver+0x2ed/0x640
[ 9.574804][ T1] driver_register+0x15c/0x4b0
[ 9.574804][ T1] ? __pfx_virtio_scsi_init+0x10/0x10
[ 9.574804][ T1] ? __pfx_virtio_scsi_init+0x10/0x10
[ 9.574804][ T1] virtio_scsi_init+0x75/0x100
[ 9.574804][ T1] do_one_initcall+0x128/0x700
[ 9.574804][ T1] ? __pfx_do_one_initcall+0x10/0x10
[ 9.574804][ T1] ? trace_kmalloc+0x2d/0xe0
[ 9.574804][ T1] ? __kmalloc+0x218/0x440
[ 9.574804][ T1] kernel_init_freeable+0x69d/0xca0
[ 9.574804][ T1] ? __pfx_kernel_init+0x10/0x10
[ 9.574804][ T1] kernel_init+0x1c/0x2b0
[ 9.574804][ T1] ? __pfx_kernel_init+0x10/0x10
[ 9.574804][ T1] ret_from_fork+0x45/0x80
[ 9.574804][ T1] ? __pfx_kernel_init+0x10/0x10
[ 9.574804][ T1] ret_from_fork_asm+0x1a/0x30
[ 9.574804][ T1] </TASK>
[ 9.574804][ T1] Kernel Offset: disabled
[ 9.574804][ T1] Rebooting in 86400 seconds..


syzkaller build log:
go env (err=<nil>)
GO111MODULE='auto'
GOARCH='amd64'
GOBIN=''
GOCACHE='/syzkaller/.cache/go-build'
GOENV='/syzkaller/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/syzkaller/jobs/linux/gopath/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/syzkaller/jobs/linux/gopath'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.4'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/syzkaller/jobs/linux/gopath/src/github.com/google/syzkaller/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build932799495=/tmp/go-build -gno-record-gcc-switches'

git status (err=<nil>)
HEAD detached at 454571b6a
nothing to commit, working tree clean


tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
Makefile:31: run command via tools/syz-env for best compatibility, see:
Makefile:32: https://github.com/google/syzkaller/blob/master/docs/contributing.md#using-syz-env
go list -f '{{.Stale}}' ./sys/syz-sysgen | grep -q false || go install ./sys/syz-sysgen
make .descriptions
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
Makefile:31: run command via tools/syz-env for best compatibility, see:
Makefile:32: https://github.com/google/syzkaller/blob/master/docs/contributing.md#using-syz-env
bin/syz-sysgen
touch .descriptions
GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=454571b6a16598f5a6e015b9fb1a04932bce7ab9 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20240326-163935'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-fuzzer github.com/google/syzkaller/syz-fuzzer
GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=454571b6a16598f5a6e015b9fb1a04932bce7ab9 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20240326-163935'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-execprog github.com/google/syzkaller/tools/syz-execprog
GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=454571b6a16598f5a6e015b9fb1a04932bce7ab9 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20240326-163935'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-stress github.com/google/syzkaller/tools/syz-stress
mkdir -p ./bin/linux_amd64
gcc -o ./bin/linux_amd64/syz-executor executor/executor.cc \
-m64 -O2 -pthread -Wall -Werror -Wparentheses -Wunused-const-variable -Wframe-larger-than=16384 -Wno-stringop-overflow -Wno-array-bounds -Wno-format-overflow -Wno-unused-but-set-variable -Wno-unused-command-line-argument -static-pie -fpermissive -w -DGOOS_linux=1 -DGOARCH_amd64=1 \
-DHOSTGOOS_linux=1 -DGIT_REVISION=\"454571b6a16598f5a6e015b9fb1a04932bce7ab9\"


Error text is too large and was truncated, full error text is at:
https://syzkaller.appspot.com/x/error.txt?x=11a50795180000


Tested on:

commit: 317c7bc0 Merge tag 'mmc-v6.9-rc1' of git://git.kernel...
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
kernel config: https://syzkaller.appspot.com/x/.config?x=f64ec427e98bccd7
dashboard link: https://syzkaller.appspot.com/bug?extid=81f5ca46b043d4a1b789
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=17287791180000


2024-03-30 00:44:37

by Hillf Danton

[permalink] [raw]
Subject: Re: [syzbot] [audit?] [bpf?] INFO: rcu detected stall in kauditd_thread (4)

On Wed, 27 Mar 2024 11:39:42 -0700
> syzbot found the following issue on:
>
> HEAD commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
> git tree: upstream
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=112840ee180000

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

--- x/net/sched/sch_taprio.c
+++ y/net/sched/sch_taprio.c
@@ -980,7 +980,7 @@ first_run:
rcu_assign_pointer(q->current_entry, next);
spin_unlock(&q->current_entry_lock);

- hrtimer_set_expires(&q->advance_timer, end_time);
+ hrtimer_forward_now(&q->advance_timer, ns_to_ktime(NSEC_PER_USEC *200));

rcu_read_lock();
__netif_schedule(sch);
--- x/mm/page_owner.c
+++ y/mm/page_owner.c
@@ -196,7 +196,8 @@ static void add_stack_record_to_list(str
spin_unlock_irqrestore(&stack_list_lock, flags);
}

-static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask)
+static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask,
+ int nr_base_pages)
{
struct stack_record *stack_record = __stack_depot_get_stack_record(handle);

@@ -217,20 +218,74 @@ static void inc_stack_record_count(depot
/* Add the new stack_record to our list */
add_stack_record_to_list(stack_record, gfp_mask);
}
- refcount_inc(&stack_record->count);
+ refcount_add(nr_base_pages, &stack_record->count);
}

-static void dec_stack_record_count(depot_stack_handle_t handle)
+static void dec_stack_record_count(depot_stack_handle_t handle,
+ int nr_base_pages)
{
struct stack_record *stack_record = __stack_depot_get_stack_record(handle);

- if (stack_record)
- refcount_dec(&stack_record->count);
+ if (!stack_record)
+ return;
+
+ if (refcount_sub_and_test(nr_base_pages, &stack_record->count))
+ pr_warn("%s: refcount went to 0 for %u handle\n", __func__,
+ handle);
}

-void __reset_page_owner(struct page *page, unsigned short order)
+static inline void __update_page_owner_handle(struct page_ext *page_ext,
+ depot_stack_handle_t handle,
+ unsigned short order,
+ gfp_t gfp_mask,
+ short last_migrate_reason, u64 ts_nsec,
+ pid_t pid, pid_t tgid, char *comm)
{
int i;
+ struct page_owner *page_owner;
+
+ for (i = 0; i < (1 << order); i++) {
+ page_owner = get_page_owner(page_ext);
+ page_owner->handle = handle;
+ page_owner->order = order;
+ page_owner->gfp_mask = gfp_mask;
+ page_owner->last_migrate_reason = last_migrate_reason;
+ page_owner->pid = pid;
+ page_owner->tgid = tgid;
+ page_owner->ts_nsec = ts_nsec;
+ strscpy(page_owner->comm, comm,
+ sizeof(page_owner->comm));
+ __set_bit(PAGE_EXT_OWNER, &page_ext->flags);
+ __set_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
+ page_ext = page_ext_next(page_ext);
+ }
+}
+
+static inline void __update_page_owner_free_handle(struct page_ext *page_ext,
+ depot_stack_handle_t handle,
+ unsigned short order,
+ pid_t pid, pid_t tgid,
+ u64 free_ts_nsec)
+{
+ int i;
+ struct page_owner *page_owner;
+
+ for (i = 0; i < (1 << order); i++) {
+ page_owner = get_page_owner(page_ext);
+ /* Only __reset_page_owner() wants to clear the bit */
+ if (handle) {
+ __clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
+ page_owner->free_handle = handle;
+ }
+ page_owner->free_ts_nsec = free_ts_nsec;
+ page_owner->free_pid = current->pid;
+ page_owner->free_tgid = current->tgid;
+ page_ext = page_ext_next(page_ext);
+ }
+}
+
+void __reset_page_owner(struct page *page, unsigned short order)
+{
struct page_ext *page_ext;
depot_stack_handle_t handle;
depot_stack_handle_t alloc_handle;
@@ -245,16 +300,10 @@ void __reset_page_owner(struct page *pag
alloc_handle = page_owner->handle;

handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
- for (i = 0; i < (1 << order); i++) {
- __clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
- page_owner->free_handle = handle;
- page_owner->free_ts_nsec = free_ts_nsec;
- page_owner->free_pid = current->pid;
- page_owner->free_tgid = current->tgid;
- page_ext = page_ext_next(page_ext);
- page_owner = get_page_owner(page_ext);
- }
+ __update_page_owner_free_handle(page_ext, handle, order, current->pid,
+ current->tgid, free_ts_nsec);
page_ext_put(page_ext);
+
if (alloc_handle != early_handle)
/*
* early_handle is being set as a handle for all those
@@ -263,39 +312,14 @@ void __reset_page_owner(struct page *pag
* the machinery is not ready yet, we cannot decrement
* their refcount either.
*/
- dec_stack_record_count(alloc_handle);
-}
-
-static inline void __set_page_owner_handle(struct page_ext *page_ext,
- depot_stack_handle_t handle,
- unsigned short order, gfp_t gfp_mask)
-{
- struct page_owner *page_owner;
- int i;
- u64 ts_nsec = local_clock();
-
- for (i = 0; i < (1 << order); i++) {
- page_owner = get_page_owner(page_ext);
- page_owner->handle = handle;
- page_owner->order = order;
- page_owner->gfp_mask = gfp_mask;
- page_owner->last_migrate_reason = -1;
- page_owner->pid = current->pid;
- page_owner->tgid = current->tgid;
- page_owner->ts_nsec = ts_nsec;
- strscpy(page_owner->comm, current->comm,
- sizeof(page_owner->comm));
- __set_bit(PAGE_EXT_OWNER, &page_ext->flags);
- __set_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
-
- page_ext = page_ext_next(page_ext);
- }
+ dec_stack_record_count(alloc_handle, 1 << order);
}

noinline void __set_page_owner(struct page *page, unsigned short order,
gfp_t gfp_mask)
{
struct page_ext *page_ext;
+ u64 ts_nsec = local_clock();
depot_stack_handle_t handle;

handle = save_stack(gfp_mask);
@@ -303,9 +327,11 @@ noinline void __set_page_owner(struct pa
page_ext = page_ext_get(page);
if (unlikely(!page_ext))
return;
- __set_page_owner_handle(page_ext, handle, order, gfp_mask);
+ __update_page_owner_handle(page_ext, handle, order, gfp_mask, -1,
+ current->pid, current->tgid, ts_nsec,
+ current->comm);
page_ext_put(page_ext);
- inc_stack_record_count(handle, gfp_mask);
+ inc_stack_record_count(handle, gfp_mask, 1 << order);
}

void __set_page_owner_migrate_reason(struct page *page, int reason)
@@ -342,7 +368,7 @@ void __folio_copy_owner(struct folio *ne
{
struct page_ext *old_ext;
struct page_ext *new_ext;
- struct page_owner *old_page_owner, *new_page_owner;
+ struct page_owner *old_page_owner;

old_ext = page_ext_get(&old->page);
if (unlikely(!old_ext))
@@ -355,31 +381,21 @@ void __folio_copy_owner(struct folio *ne
}

old_page_owner = get_page_owner(old_ext);
- new_page_owner = get_page_owner(new_ext);
- new_page_owner->order = old_page_owner->order;
- new_page_owner->gfp_mask = old_page_owner->gfp_mask;
- new_page_owner->last_migrate_reason =
- old_page_owner->last_migrate_reason;
- new_page_owner->handle = old_page_owner->handle;
- new_page_owner->pid = old_page_owner->pid;
- new_page_owner->tgid = old_page_owner->tgid;
- new_page_owner->free_pid = old_page_owner->free_pid;
- new_page_owner->free_tgid = old_page_owner->free_tgid;
- new_page_owner->ts_nsec = old_page_owner->ts_nsec;
- new_page_owner->free_ts_nsec = old_page_owner->ts_nsec;
- strcpy(new_page_owner->comm, old_page_owner->comm);
-
+ __update_page_owner_handle(new_ext, old_page_owner->handle,
+ old_page_owner->order, old_page_owner->gfp_mask,
+ old_page_owner->last_migrate_reason,
+ old_page_owner->ts_nsec, old_page_owner->pid,
+ old_page_owner->tgid, old_page_owner->comm);
/*
- * We don't clear the bit on the old folio as it's going to be freed
- * after migration. Until then, the info can be useful in case of
- * a bug, and the overall stats will be off a bit only temporarily.
- * Also, migrate_misplaced_transhuge_page() can still fail the
- * migration and then we want the old folio to retain the info. But
- * in that case we also don't need to explicitly clear the info from
- * the new page, which will be freed.
+ * Do not proactively clear PAGE_EXT_OWNER{_ALLOCATED} bits as the folio
+ * will be freed after migration. Keep them until then as they may be
+ * useful.
*/
- __set_bit(PAGE_EXT_OWNER, &new_ext->flags);
- __set_bit(PAGE_EXT_OWNER_ALLOCATED, &new_ext->flags);
+ __update_page_owner_free_handle(new_ext, 0, old_page_owner->order,
+ old_page_owner->free_pid,
+ old_page_owner->free_tgid,
+ old_page_owner->free_ts_nsec);
+
page_ext_put(new_ext);
page_ext_put(old_ext);
}
@@ -787,8 +803,9 @@ static void init_pages_in_zone(pg_data_t
goto ext_put_continue;

/* Found early allocated page */
- __set_page_owner_handle(page_ext, early_handle,
- 0, 0);
+ __update_page_owner_handle(page_ext, early_handle, 0, 0,
+ -1, local_clock(), current->pid,
+ current->tgid, current->comm);
count++;
ext_put_continue:
page_ext_put(page_ext);
@@ -861,11 +878,11 @@ static void *stack_next(struct seq_file
return stack;
}

-static unsigned long page_owner_stack_threshold;
+static unsigned long page_owner_pages_threshold;

static int stack_print(struct seq_file *m, void *v)
{
- int i, stack_count;
+ int i, nr_base_pages;
struct stack *stack = v;
unsigned long *entries;
unsigned long nr_entries;
@@ -876,14 +893,14 @@ static int stack_print(struct seq_file *

nr_entries = stack_record->size;
entries = stack_record->entries;
- stack_count = refcount_read(&stack_record->count) - 1;
+ nr_base_pages = refcount_read(&stack_record->count) - 1;

- if (stack_count < 1 || stack_count < page_owner_stack_threshold)
+ if (nr_base_pages < 1 || nr_base_pages < page_owner_pages_threshold)
return 0;

for (i = 0; i < nr_entries; i++)
seq_printf(m, " %pS\n", (void *)entries[i]);
- seq_printf(m, "stack_count: %d\n\n", stack_count);
+ seq_printf(m, "nr_base_pages: %d\n\n", nr_base_pages);

return 0;
}
@@ -913,13 +930,13 @@ static const struct file_operations page

static int page_owner_threshold_get(void *data, u64 *val)
{
- *val = READ_ONCE(page_owner_stack_threshold);
+ *val = READ_ONCE(page_owner_pages_threshold);
return 0;
}

static int page_owner_threshold_set(void *data, u64 val)
{
- WRITE_ONCE(page_owner_stack_threshold, val);
+ WRITE_ONCE(page_owner_pages_threshold, val);
return 0;
}

--

2024-03-30 15:46:16

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] [audit?] [bpf?] INFO: rcu detected stall in kauditd_thread (4)

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in hrtimer_forward

------------[ cut here ]------------
WARNING: CPU: 1 PID: 5464 at kernel/time/hrtimer.c:1053 hrtimer_forward+0x1d3/0x260 kernel/time/hrtimer.c:1053
Modules linked in:
CPU: 1 PID: 5464 Comm: udevd Not tainted 6.9.0-rc1-syzkaller-00274-g486291a0e624-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
RIP: 0010:hrtimer_forward+0x1d3/0x260 kernel/time/hrtimer.c:1053
Code: 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 49 bc ff ff ff ff ff ff ff 7f eb 85 49 bc ff ff ff ff ff ff ff 7f eb c3 e8 9e d5 12 00 90 <0f> 0b 90 48 c7 04 24 00 00 00 00 eb b9 e8 8b d5 12 00 4c 89 e0 4c
RSP: 0018:ffffc90000a08d30 EFLAGS: 00010046
RAX: 0000000080010001 RBX: ffff88807bc91340 RCX: ffffffff817a9ad4
RDX: ffff888029330000 RSI: ffffffff817a9c12 RDI: 0000000000000001
RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 00000000000035cd
R13: 58f2e2407277c48d R14: 58f2e24072778ec0 R15: 0000000000030d40
FS: 00007f983e6dac80(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa573182b10 CR3: 000000001e630000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
hrtimer_forward_now include/linux/hrtimer.h:355 [inline]
advance_sched+0x670/0xc50 net/sched/sch_taprio.c:983
__run_hrtimer kernel/time/hrtimer.c:1692 [inline]
__hrtimer_run_queues+0x20c/0xcc0 kernel/time/hrtimer.c:1756
hrtimer_interrupt+0x31b/0x800 kernel/time/hrtimer.c:1818
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1032 [inline]
__sysvec_apic_timer_interrupt+0x10f/0x450 arch/x86/kernel/apic/apic.c:1049
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
sysvec_apic_timer_interrupt+0x90/0xb0 arch/x86/kernel/apic/apic.c:1043
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
RIP: 0010:on_stack arch/x86/include/asm/stacktrace.h:59 [inline]
RIP: 0010:stack_access_ok+0x1cd/0x270 arch/x86/kernel/unwind_orc.c:393
Code: 8b 44 24 08 48 01 c5 49 39 ec 0f 83 3e ff ff ff e8 a8 ab 50 00 48 39 eb 41 0f 93 c6 e9 30 ff ff ff e8 97 ab 50 00 4c 8b 3c 24 <4c> 39 fd 0f 83 d3 fe ff ff e8 85 ab 50 00 48 8b 44 24 08 48 01 e8
RSP: 0018:ffffc90003b57870 EFLAGS: 00000293
RAX: 0000000000000000 RBX: ffffc90003b57990 RCX: ffffffff813cc4e6
RDX: ffff888029330000 RSI: ffffffff813cc619 RDI: 0000000000000005
RBP: ffffc90003b57980 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90003b57998
R13: ffffc90003b579a0 R14: ffffc90003b50000 R15: ffffc90003b58000
deref_stack_reg arch/x86/kernel/unwind_orc.c:403 [inline]
unwind_next_frame+0xd9b/0x23a0 arch/x86/kernel/unwind_orc.c:585
__unwind_start+0x5aa/0x880 arch/x86/kernel/unwind_orc.c:760
unwind_start arch/x86/include/asm/unwind.h:64 [inline]
arch_stack_walk+0xb2/0x170 arch/x86/kernel/stacktrace.c:24
stack_trace_save+0x95/0xd0 kernel/stacktrace.c:122
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:579
poison_slab_object mm/kasan/common.c:240 [inline]
__kasan_slab_free+0x11d/0x1a0 mm/kasan/common.c:256
kasan_slab_free include/linux/kasan.h:184 [inline]
slab_free_hook mm/slub.c:2106 [inline]
slab_free mm/slub.c:4280 [inline]
kmem_cache_free+0x12e/0x380 mm/slub.c:4344
putname+0x12e/0x170 fs/namei.c:273
do_sys_openat2+0x160/0x1e0 fs/open.c:1414
do_sys_open fs/open.c:1421 [inline]
__do_sys_openat fs/open.c:1437 [inline]
__se_sys_openat fs/open.c:1432 [inline]
__x64_sys_openat+0x175/0x210 fs/open.c:1432
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd2/0x260 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x72/0x7a
RIP: 0033:0x7f983e3169a4
Code: 24 20 48 8d 44 24 30 48 89 44 24 28 64 8b 04 25 18 00 00 00 85 c0 75 2c 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 76 60 48 8b 15 55 a4 0d 00 f7 d8 64 89 02 48 83
RSP: 002b:00007fff4c215900 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f983e3169a4
RDX: 0000000000080241 RSI: 00007fff4c215e48 RDI: 00000000ffffff9c
RBP: 00007fff4c215e48 R08: 0000000000000004 R09: 0000000000000001
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000080241
R13: 000055d9597f572e R14: 0000000000000001 R15: 000055d959810160
</TASK>
----------------
Code disassembly (best guess):
0: 8b 44 24 08 mov 0x8(%rsp),%eax
4: 48 01 c5 add %rax,%rbp
7: 49 39 ec cmp %rbp,%r12
a: 0f 83 3e ff ff ff jae 0xffffff4e
10: e8 a8 ab 50 00 call 0x50abbd
15: 48 39 eb cmp %rbp,%rbx
18: 41 0f 93 c6 setae %r14b
1c: e9 30 ff ff ff jmp 0xffffff51
21: e8 97 ab 50 00 call 0x50abbd
26: 4c 8b 3c 24 mov (%rsp),%r15
* 2a: 4c 39 fd cmp %r15,%rbp <-- trapping instruction
2d: 0f 83 d3 fe ff ff jae 0xffffff06
33: e8 85 ab 50 00 call 0x50abbd
38: 48 8b 44 24 08 mov 0x8(%rsp),%rax
3d: 48 01 e8 add %rbp,%rax


Tested on:

commit: 486291a0 Merge tag 'drm-fixes-2024-03-30' of https://g..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=129ac3c6180000
kernel config: https://syzkaller.appspot.com/x/.config?x=f64ec427e98bccd7
dashboard link: https://syzkaller.appspot.com/bug?extid=81f5ca46b043d4a1b789
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=15be4109180000


2024-03-31 00:12:04

by Hillf Danton

[permalink] [raw]
Subject: Re: [syzbot] [audit?] [bpf?] INFO: rcu detected stall in kauditd_thread (4)

On Wed, 27 Mar 2024 11:39:42 -0700
> syzbot found the following issue on:
>
> HEAD commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
> git tree: upstream
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=112840ee180000

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

--- x/mm/page_owner.c
+++ y/mm/page_owner.c
@@ -196,7 +196,8 @@ static void add_stack_record_to_list(str
spin_unlock_irqrestore(&stack_list_lock, flags);
}

-static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask)
+static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask,
+ int nr_base_pages)
{
struct stack_record *stack_record = __stack_depot_get_stack_record(handle);

@@ -217,20 +218,74 @@ static void inc_stack_record_count(depot
/* Add the new stack_record to our list */
add_stack_record_to_list(stack_record, gfp_mask);
}
- refcount_inc(&stack_record->count);
+ refcount_add(nr_base_pages, &stack_record->count);
}

-static void dec_stack_record_count(depot_stack_handle_t handle)
+static void dec_stack_record_count(depot_stack_handle_t handle,
+ int nr_base_pages)
{
struct stack_record *stack_record = __stack_depot_get_stack_record(handle);

- if (stack_record)
- refcount_dec(&stack_record->count);
+ if (!stack_record)
+ return;
+
+ if (refcount_sub_and_test(nr_base_pages, &stack_record->count))
+ pr_warn("%s: refcount went to 0 for %u handle\n", __func__,
+ handle);
}

-void __reset_page_owner(struct page *page, unsigned short order)
+static inline void __update_page_owner_handle(struct page_ext *page_ext,
+ depot_stack_handle_t handle,
+ unsigned short order,
+ gfp_t gfp_mask,
+ short last_migrate_reason, u64 ts_nsec,
+ pid_t pid, pid_t tgid, char *comm)
{
int i;
+ struct page_owner *page_owner;
+
+ for (i = 0; i < (1 << order); i++) {
+ page_owner = get_page_owner(page_ext);
+ page_owner->handle = handle;
+ page_owner->order = order;
+ page_owner->gfp_mask = gfp_mask;
+ page_owner->last_migrate_reason = last_migrate_reason;
+ page_owner->pid = pid;
+ page_owner->tgid = tgid;
+ page_owner->ts_nsec = ts_nsec;
+ strscpy(page_owner->comm, comm,
+ sizeof(page_owner->comm));
+ __set_bit(PAGE_EXT_OWNER, &page_ext->flags);
+ __set_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
+ page_ext = page_ext_next(page_ext);
+ }
+}
+
+static inline void __update_page_owner_free_handle(struct page_ext *page_ext,
+ depot_stack_handle_t handle,
+ unsigned short order,
+ pid_t pid, pid_t tgid,
+ u64 free_ts_nsec)
+{
+ int i;
+ struct page_owner *page_owner;
+
+ for (i = 0; i < (1 << order); i++) {
+ page_owner = get_page_owner(page_ext);
+ /* Only __reset_page_owner() wants to clear the bit */
+ if (handle) {
+ __clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
+ page_owner->free_handle = handle;
+ }
+ page_owner->free_ts_nsec = free_ts_nsec;
+ page_owner->free_pid = current->pid;
+ page_owner->free_tgid = current->tgid;
+ page_ext = page_ext_next(page_ext);
+ }
+}
+
+void __reset_page_owner(struct page *page, unsigned short order)
+{
struct page_ext *page_ext;
depot_stack_handle_t handle;
depot_stack_handle_t alloc_handle;
@@ -245,16 +300,10 @@ void __reset_page_owner(struct page *pag
alloc_handle = page_owner->handle;

handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
- for (i = 0; i < (1 << order); i++) {
- __clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
- page_owner->free_handle = handle;
- page_owner->free_ts_nsec = free_ts_nsec;
- page_owner->free_pid = current->pid;
- page_owner->free_tgid = current->tgid;
- page_ext = page_ext_next(page_ext);
- page_owner = get_page_owner(page_ext);
- }
+ __update_page_owner_free_handle(page_ext, handle, order, current->pid,
+ current->tgid, free_ts_nsec);
page_ext_put(page_ext);
+
if (alloc_handle != early_handle)
/*
* early_handle is being set as a handle for all those
@@ -263,39 +312,14 @@ void __reset_page_owner(struct page *pag
* the machinery is not ready yet, we cannot decrement
* their refcount either.
*/
- dec_stack_record_count(alloc_handle);
-}
-
-static inline void __set_page_owner_handle(struct page_ext *page_ext,
- depot_stack_handle_t handle,
- unsigned short order, gfp_t gfp_mask)
-{
- struct page_owner *page_owner;
- int i;
- u64 ts_nsec = local_clock();
-
- for (i = 0; i < (1 << order); i++) {
- page_owner = get_page_owner(page_ext);
- page_owner->handle = handle;
- page_owner->order = order;
- page_owner->gfp_mask = gfp_mask;
- page_owner->last_migrate_reason = -1;
- page_owner->pid = current->pid;
- page_owner->tgid = current->tgid;
- page_owner->ts_nsec = ts_nsec;
- strscpy(page_owner->comm, current->comm,
- sizeof(page_owner->comm));
- __set_bit(PAGE_EXT_OWNER, &page_ext->flags);
- __set_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
-
- page_ext = page_ext_next(page_ext);
- }
+ dec_stack_record_count(alloc_handle, 1 << order);
}

noinline void __set_page_owner(struct page *page, unsigned short order,
gfp_t gfp_mask)
{
struct page_ext *page_ext;
+ u64 ts_nsec = local_clock();
depot_stack_handle_t handle;

handle = save_stack(gfp_mask);
@@ -303,9 +327,11 @@ noinline void __set_page_owner(struct pa
page_ext = page_ext_get(page);
if (unlikely(!page_ext))
return;
- __set_page_owner_handle(page_ext, handle, order, gfp_mask);
+ __update_page_owner_handle(page_ext, handle, order, gfp_mask, -1,
+ current->pid, current->tgid, ts_nsec,
+ current->comm);
page_ext_put(page_ext);
- inc_stack_record_count(handle, gfp_mask);
+ inc_stack_record_count(handle, gfp_mask, 1 << order);
}

void __set_page_owner_migrate_reason(struct page *page, int reason)
@@ -342,7 +368,7 @@ void __folio_copy_owner(struct folio *ne
{
struct page_ext *old_ext;
struct page_ext *new_ext;
- struct page_owner *old_page_owner, *new_page_owner;
+ struct page_owner *old_page_owner;

old_ext = page_ext_get(&old->page);
if (unlikely(!old_ext))
@@ -355,31 +381,21 @@ void __folio_copy_owner(struct folio *ne
}

old_page_owner = get_page_owner(old_ext);
- new_page_owner = get_page_owner(new_ext);
- new_page_owner->order = old_page_owner->order;
- new_page_owner->gfp_mask = old_page_owner->gfp_mask;
- new_page_owner->last_migrate_reason =
- old_page_owner->last_migrate_reason;
- new_page_owner->handle = old_page_owner->handle;
- new_page_owner->pid = old_page_owner->pid;
- new_page_owner->tgid = old_page_owner->tgid;
- new_page_owner->free_pid = old_page_owner->free_pid;
- new_page_owner->free_tgid = old_page_owner->free_tgid;
- new_page_owner->ts_nsec = old_page_owner->ts_nsec;
- new_page_owner->free_ts_nsec = old_page_owner->ts_nsec;
- strcpy(new_page_owner->comm, old_page_owner->comm);
-
+ __update_page_owner_handle(new_ext, old_page_owner->handle,
+ old_page_owner->order, old_page_owner->gfp_mask,
+ old_page_owner->last_migrate_reason,
+ old_page_owner->ts_nsec, old_page_owner->pid,
+ old_page_owner->tgid, old_page_owner->comm);
/*
- * We don't clear the bit on the old folio as it's going to be freed
- * after migration. Until then, the info can be useful in case of
- * a bug, and the overall stats will be off a bit only temporarily.
- * Also, migrate_misplaced_transhuge_page() can still fail the
- * migration and then we want the old folio to retain the info. But
- * in that case we also don't need to explicitly clear the info from
- * the new page, which will be freed.
+ * Do not proactively clear PAGE_EXT_OWNER{_ALLOCATED} bits as the folio
+ * will be freed after migration. Keep them until then as they may be
+ * useful.
*/
- __set_bit(PAGE_EXT_OWNER, &new_ext->flags);
- __set_bit(PAGE_EXT_OWNER_ALLOCATED, &new_ext->flags);
+ __update_page_owner_free_handle(new_ext, 0, old_page_owner->order,
+ old_page_owner->free_pid,
+ old_page_owner->free_tgid,
+ old_page_owner->free_ts_nsec);
+
page_ext_put(new_ext);
page_ext_put(old_ext);
}
@@ -787,8 +803,9 @@ static void init_pages_in_zone(pg_data_t
goto ext_put_continue;

/* Found early allocated page */
- __set_page_owner_handle(page_ext, early_handle,
- 0, 0);
+ __update_page_owner_handle(page_ext, early_handle, 0, 0,
+ -1, local_clock(), current->pid,
+ current->tgid, current->comm);
count++;
ext_put_continue:
page_ext_put(page_ext);
@@ -861,11 +878,11 @@ static void *stack_next(struct seq_file
return stack;
}

-static unsigned long page_owner_stack_threshold;
+static unsigned long page_owner_pages_threshold;

static int stack_print(struct seq_file *m, void *v)
{
- int i, stack_count;
+ int i, nr_base_pages;
struct stack *stack = v;
unsigned long *entries;
unsigned long nr_entries;
@@ -876,14 +893,14 @@ static int stack_print(struct seq_file *

nr_entries = stack_record->size;
entries = stack_record->entries;
- stack_count = refcount_read(&stack_record->count) - 1;
+ nr_base_pages = refcount_read(&stack_record->count) - 1;

- if (stack_count < 1 || stack_count < page_owner_stack_threshold)
+ if (nr_base_pages < 1 || nr_base_pages < page_owner_pages_threshold)
return 0;

for (i = 0; i < nr_entries; i++)
seq_printf(m, " %pS\n", (void *)entries[i]);
- seq_printf(m, "stack_count: %d\n\n", stack_count);
+ seq_printf(m, "nr_base_pages: %d\n\n", nr_base_pages);

return 0;
}
@@ -913,13 +930,13 @@ static const struct file_operations page

static int page_owner_threshold_get(void *data, u64 *val)
{
- *val = READ_ONCE(page_owner_stack_threshold);
+ *val = READ_ONCE(page_owner_pages_threshold);
return 0;
}

static int page_owner_threshold_set(void *data, u64 val)
{
- WRITE_ONCE(page_owner_stack_threshold, val);
+ WRITE_ONCE(page_owner_pages_threshold, val);
return 0;
}

--- x/net/sched/sch_taprio.c
+++ y/net/sched/sch_taprio.c
@@ -980,7 +980,9 @@ first_run:
rcu_assign_pointer(q->current_entry, next);
spin_unlock(&q->current_entry_lock);

- hrtimer_set_expires(&q->advance_timer, end_time);
+ hrtimer_set_expires(&q->advance_timer,
+ ktime_add(timer->base->get_time(),
+ ns_to_ktime(NSEC_PER_USEC *200)));

rcu_read_lock();
__netif_schedule(sch);
--

2024-03-31 06:39:14

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] [audit?] [bpf?] INFO: rcu detected stall in kauditd_thread (4)

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: [email protected]

Tested on:

commit: 712e1425 Merge tag 'xfs-6.9-fixes-1' of git://git.kern..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=13cd811d180000
kernel config: https://syzkaller.appspot.com/x/.config?x=f64ec427e98bccd7
dashboard link: https://syzkaller.appspot.com/bug?extid=81f5ca46b043d4a1b789
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=10c7aa1d180000

Note: testing is done by a robot and is best-effort only.

2024-03-31 07:25:30

by Hillf Danton

[permalink] [raw]
Subject: Re: [syzbot] [audit?] [bpf?] INFO: rcu detected stall in kauditd_thread (4)

On Thu, 28 Mar 2024 15:44:27 +0100 Petr Mladek <[email protected]>
> > > Call Trace:
> > > <IRQ>
> > > </IRQ>
> > > <TASK>
> > > csd_lock_wait kernel/smp.c:311 [inline]
> > > smp_call_function_many_cond+0x50b/0x1590 kernel/smp.c:855
> > > on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:1023
> > > on_each_cpu include/linux/smp.h:71 [inline]
>
> It seems that this is really stuck because it is not able to
> run something on the other CPU.
>
> IMHO, the primary problem is in the code proceed by the hrtimer.
> Adding networking people into Cc.

Yes the taprio hrtimer is a CPU hog [1]

[1] https://lore.kernel.org/lkml/[email protected]/