2024-02-08 02:32:24

by Oliver Sang

[permalink] [raw]
Subject: [mcgrof:20240131-test-xarray] [test_xarray] 3bc2ee924e: WARNING:at_kernel/rcu/tree_stall.h:#rcu_check_gp_start_stall



Hello,

kernel test robot noticed "WARNING:at_kernel/rcu/tree_stall.h:#rcu_check_gp_start_stall" on:

commit: 3bc2ee924e22d16abb6483d266f4c461f665fadf ("test_xarray: add tests for advanced multi-index use")
https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux.git 20240131-test-xarray

in testcase: boot

compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


[ 157.401342][ C0] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 59s!
[ 157.403783][ C0] Showing busy workqueues and worker pools:
[ 157.405264][ C0] workqueue events: flags=0x0
[ 157.406458][ C0] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
[ 157.406481][ C0] pending: psi_avgs_work, stop_one_cpu_nowait_workfn
[ 157.406529][ C0] workqueue events_power_efficient: flags=0x80
[ 157.411631][ C0] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=4/256 refcnt=5
[ 157.411655][ C0] pending: neigh_managed_work, neigh_periodic_work, do_cache_clean, check_lifetime
[ 157.411741][ C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[ 188.121356][ C0] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 89s!
[ 188.123611][ C0] Showing busy workqueues and worker pools:
[ 188.124965][ C0] workqueue events: flags=0x0
[ 188.126133][ C0] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
[ 188.126154][ C0] pending: psi_avgs_work, stop_one_cpu_nowait_workfn
[ 188.126199][ C0] workqueue events_power_efficient: flags=0x80
[ 188.131195][ C0] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=4/256 refcnt=5
[ 188.131212][ C0] pending: neigh_managed_work, neigh_periodic_work, do_cache_clean, check_lifetime
[ 188.131260][ C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[ 202.629415][ C0] ------------[ cut here ]------------
[ 202.630948][ C0] WARNING: CPU: 0 PID: 1 at kernel/rcu/tree_stall.h:990 rcu_check_gp_start_stall+0x17f/0x2c0
[ 202.633356][ C0] Modules linked in:
[ 202.633356][ C0] CPU: 0 PID: 1 Comm: swapper Tainted: G W N 6.8.0-rc2-00001-g3bc2ee924e22 #1 147c3e57e8173b4ddcb54882b8d3ddba616ae71f
[ 202.633356][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 202.633356][ C0] RIP: rcu_check_gp_start_stall+0x17f/0x2c0
[ 202.633356][ C0] Code: d5 00 00 00 48 8b 15 e0 25 c2 02 48 01 f2 48 39 d0 0f 88 c2 00 00 00 b8 01 00 00 00 87 05 f5 49 60 04 85 c0 0f 85 af 00 00 00 <0f> 0b 48 81 fb 00 65 3d 9a 74 2c 8b 15 04 44 f9 02 85 d2 74 16 8b
All code
========
0: d5 (bad)
1: 00 00 add %al,(%rax)
3: 00 48 8b add %cl,-0x75(%rax)
6: 15 e0 25 c2 02 adc $0x2c225e0,%eax
b: 48 01 f2 add %rsi,%rdx
e: 48 39 d0 cmp %rdx,%rax
11: 0f 88 c2 00 00 00 js 0xd9
17: b8 01 00 00 00 mov $0x1,%eax
1c: 87 05 f5 49 60 04 xchg %eax,0x46049f5(%rip) # 0x4604a17
22: 85 c0 test %eax,%eax
24: 0f 85 af 00 00 00 jne 0xd9
2a:* 0f 0b ud2 <-- trapping instruction
2c: 48 81 fb 00 65 3d 9a cmp $0xffffffff9a3d6500,%rbx
33: 74 2c je 0x61
35: 8b 15 04 44 f9 02 mov 0x2f94404(%rip),%edx # 0x2f9443f
3b: 85 d2 test %edx,%edx
3d: 74 16 je 0x55
3f: 8b .byte 0x8b

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 48 81 fb 00 65 3d 9a cmp $0xffffffff9a3d6500,%rbx
9: 74 2c je 0x37
b: 8b 15 04 44 f9 02 mov 0x2f94404(%rip),%edx # 0x2f94415
11: 85 d2 test %edx,%edx
13: 74 16 je 0x2b
15: 8b .byte 0x8b
[ 202.633356][ C0] RSP: 0000:ffffaaae80003f48 EFLAGS: 00010046
[ 202.633356][ C0] RAX: 0000000000000000 RBX: ffffffff9a3d6500 RCX: 0000000000003b10
[ 202.633356][ C0] RDX: 00000000ffff9d65 RSI: 000000000000668a RDI: 0000000000000000
[ 202.633356][ C0] RBP: 0000000000000246 R08: 0000000000000000 R09: 0000000000000000
[ 202.633356][ C0] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000009
[ 202.633356][ C0] R13: 0000000000000100 R14: ffffffff9b1c4368 R15: 0000000000000009
[ 202.633356][ C0] FS: 0000000000000000(0000) GS:ffffffff99e67000(0000) knlGS:0000000000000000
[ 202.633356][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 202.633356][ C0] CR2: ffff96127ffff000 CR3: 00000002c7257000 CR4: 00000000000406b0
[ 202.633356][ C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 202.633356][ C0] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 202.633356][ C0] Call Trace:
[ 202.633356][ C0] <IRQ>
[ 202.633356][ C0] ? show_regs (arch/x86/kernel/dumpstack.c:479)
[ 202.633356][ C0] ? __warn (kernel/panic.c:677)
[ 202.633356][ C0] ? rcu_check_gp_start_stall+0x17f/0x2c0
[ 202.633356][ C0] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 202.633356][ C0] ? handle_bug (arch/x86/kernel/traps.c:238)
[ 202.633356][ C0] ? exc_invalid_op (arch/x86/kernel/traps.c:259 (discriminator 1))
[ 202.633356][ C0] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
[ 202.633356][ C0] ? rcu_check_gp_start_stall+0x17f/0x2c0
[ 202.633356][ C0] ? rcu_check_gp_start_stall+0x8b/0x2c0
[ 202.633356][ C0] rcu_core (kernel/rcu/tree.c:2463)
[ 202.633356][ C0] __do_softirq (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/irq.h:142 kernel/softirq.c:554)
[ 202.633356][ C0] irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632 kernel/softirq.c:644)
[ 202.633356][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 14))
[ 202.633356][ C0] </IRQ>
[ 202.633356][ C0] <TASK>
[ 202.633356][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:649)
[ 202.633356][ C0] RIP: 0010:lock_is_held_type (kernel/locking/lockdep.c:5830)
[ 202.633356][ C0] Code: 0f 00 00 b8 ff ff ff ff 0f c1 05 fe 65 22 01 83 f8 01 75 31 9c 58 f6 c4 02 75 44 48 f7 04 24 00 02 00 00 74 01 fb 48 83 c4 08 <44> 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f 31 d2 31 c9 31 f6 31 ff c3
All code
========
0: 0f 00 00 sldt (%rax)
3: b8 ff ff ff ff mov $0xffffffff,%eax
8: 0f c1 05 fe 65 22 01 xadd %eax,0x12265fe(%rip) # 0x122660d
f: 83 f8 01 cmp $0x1,%eax
12: 75 31 jne 0x45
14: 9c pushf
15: 58 pop %rax
16: f6 c4 02 test $0x2,%ah
19: 75 44 jne 0x5f
1b: 48 f7 04 24 00 02 00 testq $0x200,(%rsp)
22: 00
23: 74 01 je 0x26
25: fb sti
26: 48 83 c4 08 add $0x8,%rsp
2a:* 44 89 e0 mov %r12d,%eax <-- trapping instruction
2d: 5b pop %rbx
2e: 5d pop %rbp
2f: 41 5c pop %r12
31: 41 5d pop %r13
33: 41 5e pop %r14
35: 41 5f pop %r15
37: 31 d2 xor %edx,%edx
39: 31 c9 xor %ecx,%ecx
3b: 31 f6 xor %esi,%esi
3d: 31 ff xor %edi,%edi
3f: c3 ret

Code starting with the faulting instruction
===========================================
0: 44 89 e0 mov %r12d,%eax
3: 5b pop %rbx
4: 5d pop %rbp
5: 41 5c pop %r12
7: 41 5d pop %r13
9: 41 5e pop %r14
b: 41 5f pop %r15
d: 31 d2 xor %edx,%edx
f: 31 c9 xor %ecx,%ecx
11: 31 f6 xor %esi,%esi
13: 31 ff xor %edi,%edi
15: c3 ret
[ 202.633356][ C0] RSP: 0000:ffffaaae80013cd8 EFLAGS: 00000286
[ 202.633356][ C0] RAX: 0000000000000046 RBX: ffffffff9a3d4080 RCX: 0000000000000000
[ 202.633356][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff99b17b04
[ 202.633356][ C0] RBP: ffff960f44e56000 R08: 0000000000000000 R09: 0000000000000000
[ 202.633356][ C0] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 202.633356][ C0] R13: 00000000ffffffff R14: 0000000000000000 R15: ffff960f44e566f8
[ 202.633356][ C0] xas_descend (include/linux/xarray.h:1216 lib/xarray.c:206)
[ 202.633356][ C0] xas_load (lib/xarray.c:245)
[ 202.633356][ C0] test_get_entry+0x8f/0x1c0
[ 202.633356][ C0] check_xa_multi_store_adv+0x41e/0x600
[ 202.633356][ C0] check_multi_store_advanced+0x36/0x80
[ 202.633356][ C0] xarray_checks (lib/test_xarray.c:1971)
[ 202.633356][ C0] ? check_xas_retry+0xb00/0xb00
[ 202.633356][ C0] do_one_initcall (init/main.c:1236)
[ 202.633356][ C0] do_initcalls (init/main.c:1297 init/main.c:1314)
[ 202.633356][ C0] kernel_init_freeable (init/main.c:1553)
[ 202.633356][ C0] ? rest_init (init/main.c:1433)
[ 202.633356][ C0] kernel_init (init/main.c:1443)
[ 202.633356][ C0] ? _raw_spin_unlock_irq (arch/x86/include/asm/preempt.h:103 include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:202)
[ 202.633356][ C0] ret_from_fork (arch/x86/kernel/process.c:153)
[ 202.633356][ C0] ? rest_init (init/main.c:1433)
[ 202.633356][ C0] ret_from_fork_asm (arch/x86/entry/entry_64.S:250)
[ 202.633356][ C0] </TASK>
[ 202.633356][ C0] irq event stamp: 7770391
[ 202.633356][ C0] hardirqs last enabled at (7770390): rcu_core (arch/x86/include/asm/irqflags.h:26 arch/x86/include/asm/irqflags.h:67 arch/x86/include/asm/irqflags.h:127 kernel/rcu/tree_nocb.h:1738 kernel/rcu/tree.c:2457)
[ 202.633356][ C0] hardirqs last disabled at (7770391): _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:108 kernel/locking/spinlock.c:162)
[ 202.633356][ C0] softirqs last enabled at (7770364): __do_softirq (arch/x86/include/asm/preempt.h:26 kernel/softirq.c:400 kernel/softirq.c:582)
[ 202.633356][ C0] softirqs last disabled at (7770379): irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632 kernel/softirq.c:644)
[ 202.633356][ C0] ---[ end trace 0000000000000000 ]---
[ 202.738070][ C0] rcu: rcu_preempt: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x0 ->rt_priority 0 delta ->gp_start 26476 ->gp_activity 26475 ->gp_req_activity 26277 ->gp_wake_time 26277 ->gp_wake_seq 15120 ->gp_seq 15120 ->gp_seq_needed 15124 ->gp_max 2505 ->gp_flags 0x1
[ 202.743977][ C0] rcu: rcu_node 0:0 ->gp_seq 15120 ->gp_seq_needed 15124 ->qsmask 0x0 .... ->n_boosts 0
[ 202.746434][ C0] rcu: RCU callbacks invoked since boot: 121850
[ 202.747976][ C0] rcu_tasks: RTGS_WAIT_CBS(11) since 46815 g:16 i:0/0 k.u. l:63
[ 202.749977][ C0] rcu_tasks_rude: RTGS_WAIT_CBS(11) since 46816 g:16 i:0/4 k.u. l:63


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240207/[email protected]



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki