2022-06-12 16:23:28

by Oliver Sang

[permalink] [raw]
Subject: [cpuidle,intel_idle] 32d4fd5751: WARNING:at_kernel/rcu/tree.c:#rcu_eqs_exit



Greeting,

FYI, we noticed the following commit (built with gcc-11):

commit: 32d4fd5751eadbe1823a37eb38df85ec5c8e6207 ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: kernel-selftests
version: kernel-selftests-x86_64-cef46213-1_20220609
with following parameters:

group: resctrl
ucode: 0x500320a

test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
test-url: https://www.kernel.org/doc/Documentation/kselftest.txt


on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):



If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 29.104402][ T0] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:864 rcu_eqs_exit+0x4b/0xc0
[ 29.104417][ T0]
[ 29.104418][ T0] =============================
[ 29.104419][ T0] WARNING: suspicious RCU usage
[ 29.104421][ T0] 5.19.0-rc1-00001-g32d4fd5751ea #1 Not tainted
[ 29.104424][ T0] -----------------------------
[ 29.104425][ T0] include/trace/events/lock.h:24 suspicious rcu_dereference_check() usage!
[ 29.104428][ T0]
[ 29.104428][ T0] other info that might help us debug this:
[ 29.104428][ T0]
[ 29.104430][ T0]
[ 29.104430][ T0] rcu_scheduler_active = 2, debug_locks = 1
[ 29.104432][ T0] RCU used illegally from extended quiescent state!
[ 29.104433][ T0] no locks held by swapper/0/0.
[ 29.104435][ T0]
[ 29.104435][ T0] stack backtrace:
[ 29.104436][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc1-00001-g32d4fd5751ea #1
[ 29.104439][ T0] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 29.104441][ T0] Call Trace:
[ 29.104443][ T0] <TASK>
[ 29.104449][ T0] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[ 29.104457][ T0] lock_acquire.cold (include/trace/events/lock.h:24 kernel/locking/lockdep.c:5636)
[ 29.104465][ T0] ? rcu_read_unlock (include/linux/rcupdate.h:724 (discriminator 5))
[ 29.104480][ T0] ? check_prev_add (kernel/locking/lockdep.c:3785)
[ 29.104483][ T0] ? console_trylock_spinning (kernel/printk/printk.c:2000)
[ 29.104490][ T0] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[ 29.104495][ T0] ? down_trylock (kernel/locking/semaphore.c:140)
[ 29.104499][ T0] down_trylock (kernel/locking/semaphore.c:140)
[ 29.104501][ T0] ? console_trylock_spinning (kernel/printk/printk.c:2000)
[ 29.104505][ T0] __down_trylock_console_sem (kernel/printk/printk.c:273)
[ 29.104509][ T0] console_trylock (kernel/printk/printk.c:2728)
[ 29.104512][ T0] console_trylock_spinning (kernel/printk/printk.c:2000)
[ 29.104517][ T0] vprintk_emit (kernel/printk/printk.c:2400)
[ 29.104520][ T0] ? __irq_work_queue_local (kernel/irq_work.c:101 (discriminator 1))
[ 29.104528][ T0] ? rcu_eqs_exit+0x4b/0xc0
[ 29.104533][ T0] _printk (kernel/printk/printk.c:2417)
[ 29.104537][ T0] ? record_print_text.cold (kernel/printk/printk.c:2417)
[ 29.104540][ T0] ? rcu_eqs_exit+0x4b/0xc0
[ 29.104545][ T0] ? _printk (kernel/printk/printk.c:2417)
[ 29.104548][ T0] ? record_print_text.cold (kernel/printk/printk.c:2417)
[ 29.104557][ T0] ? rcu_eqs_exit+0x4b/0xc0
[ 29.104561][ T0] __warn (kernel/panic.c:616)
[ 29.104569][ T0] ? rcu_eqs_exit+0x4b/0xc0
[ 29.104574][ T0] report_bug (lib/bug.c:200)
[ 29.104585][ T0] handle_bug (arch/x86/kernel/traps.c:316)
[ 29.104588][ T0] exc_invalid_op (arch/x86/kernel/traps.c:336 (discriminator 1))
[ 29.104592][ T0] asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
[ 29.104596][ T0] RIP: rcu_eqs_exit+0x4b/0xc0
[ 29.104600][ T0] Code: 1d 32 68 43 7c 48 8b 83 40 01 00 00 48 85 c0 74 1d 48 83 c0 01 48 89 83 40 01 00 00 5b 5d c3 65 8b 05 75 ca 43 7c 85 c0 74 cc <0f> 0b eb c8 e8 6c ff ff ff 48 8d ab 50 01 00 00 be 04 00 00 00 48
All code
========
0: 1d 32 68 43 7c sbb $0x7c436832,%eax
5: 48 8b 83 40 01 00 00 mov 0x140(%rbx),%rax
c: 48 85 c0 test %rax,%rax
f: 74 1d je 0x2e
11: 48 83 c0 01 add $0x1,%rax
15: 48 89 83 40 01 00 00 mov %rax,0x140(%rbx)
1c: 5b pop %rbx
1d: 5d pop %rbp
1e: c3 retq
1f: 65 8b 05 75 ca 43 7c mov %gs:0x7c43ca75(%rip),%eax # 0x7c43ca9b
26: 85 c0 test %eax,%eax
28: 74 cc je 0xfffffffffffffff6
2a:* 0f 0b ud2 <-- trapping instruction
2c: eb c8 jmp 0xfffffffffffffff6
2e: e8 6c ff ff ff callq 0xffffffffffffff9f
33: 48 8d ab 50 01 00 00 lea 0x150(%rbx),%rbp
3a: be 04 00 00 00 mov $0x4,%esi
3f: 48 rex.W

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: eb c8 jmp 0xffffffffffffffcc
4: e8 6c ff ff ff callq 0xffffffffffffff75
9: 48 8d ab 50 01 00 00 lea 0x150(%rbx),%rbp
10: be 04 00 00 00 mov $0x4,%esi
15: 48 rex.W
[ 29.104603][ T0] RSP: 0000:ffffffff85207da0 EFLAGS: 00010002
[ 29.104607][ T0] RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff825face9
[ 29.104609][ T0] RDX: 1ffffffff0b823e0 RSI: 0000000000000008 RDI: ffffffff852387c0
[ 29.104611][ T0] RBP: ffffffff85c11e40 R08: 0000000000000000 R09: ffffffff852387c7
[ 29.104612][ T0] R10: fffffbfff0a470f8 R11: 0000000000000001 R12: 0000000000000001
[ 29.104614][ T0] R13: ffffe8ec26c031ec R14: ffffffff85c11ec0 R15: ffffe8ec26c031e8
[ 29.104623][ T0] ? mwait_idle_with_hints (include/linux/instrumented.h:71 include/asm-generic/bitops/instrumented-non-atomic.h:134 include/linux/thread_info.h:118 include/linux/sched/idle.h:88 arch/x86/include/asm/mwait.h:120)
[ 29.104635][ T0] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:239)
[ 29.104648][ T0] cpuidle_enter (drivers/cpuidle/cpuidle.c:353)
[ 29.104653][ T0] cpuidle_idle_call (kernel/sched/idle.c:155 kernel/sched/idle.c:236)
[ 29.104659][ T0] ? arch_cpu_idle_exit+0xc0/0xc0
[ 29.104664][ T0] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526)
[ 29.104667][ T0] ? tick_nohz_idle_enter (arch/x86/include/asm/irqflags.h:45 (discriminator 3) arch/x86/include/asm/irqflags.h:80 (discriminator 3) kernel/time/tick-sched.c:1163 (discriminator 3))
[ 29.104671][ T0] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:50 (discriminator 22))
[ 29.104676][ T0] ? tsc_verify_tsc_adjust (arch/x86/kernel/tsc_sync.c:58)
[ 29.104685][ T0] do_idle (kernel/sched/idle.c:303)
[ 29.104690][ T0] cpu_startup_entry (kernel/sched/idle.c:399 (discriminator 1))
[ 29.104694][ T0] rest_init (include/linux/rcupdate.h:696 init/main.c:697)
[ 29.104698][ T0] arch_call_rest_init+0xf/0x15
[ 29.104704][ T0] start_kernel (init/main.c:1139)
[ 29.104709][ T0] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
[ 29.104728][ T0] </TASK>
[ 29.104730][ T0]
[ 29.104731][ T0] =============================
[ 29.104732][ T0] WARNING: suspicious RCU usage
[ 29.104732][ T0] 5.19.0-rc1-00001-g32d4fd5751ea #1 Not tainted
[ 29.104734][ T0] -----------------------------
[ 29.104734][ T0] include/trace/events/lock.h:69 suspicious rcu_dereference_check() usage!
[ 29.104736][ T0]
[ 29.104736][ T0] other info that might help us debug this:
[ 29.104736][ T0]
[ 29.104737][ T0]
[ 29.104737][ T0] rcu_scheduler_active = 2, debug_locks = 1
[ 29.104738][ T0] RCU used illegally from extended quiescent state!
[ 29.104739][ T0] 1 lock held by swapper/0/0:
[ 29.104740][ T0] #0: ffffffff85778a98 ((console_sem).lock){....}-{2:2}, at: down_trylock (kernel/locking/semaphore.c:140)
[ 29.104748][ T0]
[ 29.104748][ T0] stack backtrace:
[ 29.104749][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc1-00001-g32d4fd5751ea #1
[ 29.104751][ T0] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 29.104752][ T0] Call Trace:
[ 29.104753][ T0] <TASK>
[ 29.104756][ T0] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[ 29.104760][ T0] ? down_trylock (kernel/locking/semaphore.c:145)
[ 29.104761][ T0] lock_release.cold (include/trace/events/lock.h:69 kernel/locking/lockdep.c:5676)
[ 29.104766][ T0] _raw_spin_unlock_irqrestore (include/linux/spinlock_api_smp.h:150 kernel/locking/spinlock.c:194)
[ 29.104770][ T0] down_trylock (kernel/locking/semaphore.c:145)
[ 29.104772][ T0] ? console_trylock_spinning (kernel/printk/printk.c:2000)
[ 29.104775][ T0] __down_trylock_console_sem (kernel/printk/printk.c:273)
[ 29.104779][ T0] console_trylock (kernel/printk/printk.c:2728)
[ 29.104782][ T0] console_trylock_spinning (kernel/printk/printk.c:2000)
[ 29.104787][ T0] vprintk_emit (kernel/printk/printk.c:2400)
[ 29.104790][ T0] ? __irq_work_queue_local (kernel/irq_work.c:101 (discriminator 1))
[ 29.104796][ T0] ? rcu_eqs_exit+0x4b/0xc0
[ 29.104800][ T0] _printk (kernel/printk/printk.c:2417)
[ 29.104804][ T0] ? record_print_text.cold (kernel/printk/printk.c:2417)
[ 29.104806][ T0] ? rcu_eqs_exit+0x4b/0xc0
[ 29.104811][ T0] ? _printk (kernel/printk/printk.c:2417)
[ 29.104815][ T0] ? record_print_text.cold (kernel/printk/printk.c:2417)
[ 29.104825][ T0] ? rcu_eqs_exit+0x4b/0xc0
[ 29.104829][ T0] __warn (kernel/panic.c:616)
[ 29.104833][ T0] ? rcu_eqs_exit+0x4b/0xc0
[ 29.104838][ T0] report_bug (lib/bug.c:200)
[ 29.104847][ T0] handle_bug (arch/x86/kernel/traps.c:316)
[ 29.104850][ T0] exc_invalid_op (arch/x86/kernel/traps.c:336 (discriminator 1))
[ 29.104854][ T0] asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
[ 29.104856][ T0] RIP: rcu_eqs_exit+0x4b/0xc0
[ 29.104859][ T0] Code: 1d 32 68 43 7c 48 8b 83 40 01 00 00 48 85 c0 74 1d 48 83 c0 01 48 89 83 40 01 00 00 5b 5d c3 65 8b 05 75 ca 43 7c 85 c0 74 cc <0f> 0b eb c8 e8 6c ff ff ff 48 8d ab 50 01 00 00 be 04 00 00 00 48
All code
========
0: 1d 32 68 43 7c sbb $0x7c436832,%eax
5: 48 8b 83 40 01 00 00 mov 0x140(%rbx),%rax
c: 48 85 c0 test %rax,%rax
f: 74 1d je 0x2e
11: 48 83 c0 01 add $0x1,%rax
15: 48 89 83 40 01 00 00 mov %rax,0x140(%rbx)
1c: 5b pop %rbx
1d: 5d pop %rbp
1e: c3 retq
1f: 65 8b 05 75 ca 43 7c mov %gs:0x7c43ca75(%rip),%eax # 0x7c43ca9b
26: 85 c0 test %eax,%eax
28: 74 cc je 0xfffffffffffffff6
2a:* 0f 0b ud2 <-- trapping instruction
2c: eb c8 jmp 0xfffffffffffffff6
2e: e8 6c ff ff ff callq 0xffffffffffffff9f
33: 48 8d ab 50 01 00 00 lea 0x150(%rbx),%rbp
3a: be 04 00 00 00 mov $0x4,%esi
3f: 48 rex.W

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: eb c8 jmp 0xffffffffffffffcc
4: e8 6c ff ff ff callq 0xffffffffffffff75
9: 48 8d ab 50 01 00 00 lea 0x150(%rbx),%rbp
10: be 04 00 00 00 mov $0x4,%esi
15: 48 rex.W
[ 29.104861][ T0] RSP: 0000:ffffffff85207da0 EFLAGS: 00010002
[ 29.104863][ T0] RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff825face9
[ 29.104865][ T0] RDX: 1ffffffff0b823e0 RSI: 0000000000000008 RDI: ffffffff852387c0
[ 29.104866][ T0] RBP: ffffffff85c11e40 R08: 0000000000000000 R09: ffffffff852387c7
[ 29.104867][ T0] R10: fffffbfff0a470f8 R11: 0000000000000001 R12: 0000000000000001
[ 29.104868][ T0] R13: ffffe8ec26c031ec R14: ffffffff85c11ec0 R15: ffffe8ec26c031e8
[ 29.104878][ T0] ? mwait_idle_with_hints (include/linux/instrumented.h:71 include/asm-generic/bitops/instrumented-non-atomic.h:134 include/linux/thread_info.h:118 include/linux/sched/idle.h:88 arch/x86/include/asm/mwait.h:120)
[ 29.104888][ T0] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:239)
[ 29.104897][ T0] cpuidle_enter (drivers/cpuidle/cpuidle.c:353)
[ 29.104903][ T0] cpuidle_idle_call (kernel/sched/idle.c:155 kernel/sched/idle.c:236)
[ 29.104907][ T0] ? arch_cpu_idle_exit+0xc0/0xc0
[ 29.104912][ T0] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526)
[ 29.104915][ T0] ? tick_nohz_idle_enter (arch/x86/include/asm/irqflags.h:45 (discriminator 3) arch/x86/include/asm/irqflags.h:80 (discriminator 3) kernel/time/tick-sched.c:1163 (discriminator 3))
[ 29.104917][ T0] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:50 (discriminator 22))
[ 29.104919][ T0] ? tsc_verify_tsc_adjust (arch/x86/kernel/tsc_sync.c:58)
[ 29.104926][ T0] do_idle (kernel/sched/idle.c:303)
[ 29.104931][ T0] cpu_startup_entry (kernel/sched/idle.c:399 (discriminator 1))
[ 29.104935][ T0] rest_init (include/linux/rcupdate.h:696 init/main.c:697)
[ 29.104939][ T0] arch_call_rest_init+0xf/0x15
[ 29.104941][ T0] start_kernel (init/main.c:1139)
[ 29.104947][ T0] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
[ 29.104964][ T0] </TASK>
[ 29.105350][ T0]
[ 29.105350][ T0] =============================
[ 29.105350][ T0] WARNING: suspicious RCU usage
[ 29.105350][ T0] 5.19.0-rc1-00001-g32d4fd5751ea #1 Not tainted
[ 29.105350][ T0] -----------------------------
[ 29.105350][ T0] include/linux/rcupdate.h:696 rcu_read_lock() used illegally while idle!
[ 29.105350][ T0]
[ 29.105350][ T0] other info that might help us debug this:
[ 29.105350][ T0]
[ 29.105350][ T0]
[ 29.105350][ T0] rcu_scheduler_active = 2, debug_locks = 1
[ 29.105350][ T0] RCU used illegally from extended quiescent state!
[ 29.105350][ T0] 4 locks held by swapper/0/0:
[ 29.105350][ T0] #0: ffffffff85778a20 (console_lock){+.+.}-{0:0}, at: console_trylock_spinning (kernel/printk/printk.c:2000)
[ 29.105350][ T0] #1: ffffffff853f8380 (console_owner){....}-{0:0}, at: __console_emit_next_record (kernel/printk/printk.c:2909 (discriminator 3))
[ 29.105350][ T0] #2: ffffffff85ca5098 (printing_lock){+.+.}-{2:2}, at: vt_console_print (drivers/tty/vt/vt.c:3090)
[ 29.105350][ T0] #3: ffffffff85789b40 (rcu_read_lock){....}-{1:2}, at: atomic_notifier_call_chain (kernel/notifier.c:221)
[ 29.105350][ T0]
[ 29.105350][ T0] stack backtrace:
[ 29.105350][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc1-00001-g32d4fd5751ea #1
[ 29.105350][ T0] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 29.105350][ T0] Call Trace:
[ 29.105350][ T0] <TASK>
[ 29.105350][ T0] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[ 29.105350][ T0] atomic_notifier_call_chain (include/linux/rcupdate.h:696 kernel/notifier.c:224)
[ 29.105350][ T0] vt_console_print (drivers/tty/vt/vt.c:3137)
[ 29.105350][ T0] ? __lock_acquire (kernel/locking/lockdep.c:4973)
[ 29.105350][ T0] ? lf (drivers/tty/vt/vt.c:3079)
[ 29.105350][ T0] ? rwlock_bug+0xc0/0xc0
[ 29.105350][ T0] ? __console_emit_next_record (kernel/printk/printk.c:1938 (discriminator 3) kernel/printk/printk.c:2910 (discriminator 3))
[ 29.105350][ T0] __console_emit_next_record (kernel/printk/printk.c:2075 kernel/printk/printk.c:2916)
[ 29.105350][ T0] ? record_print_text.cold (kernel/printk/printk.c:2417)
[ 29.105350][ T0] ? devkmsg_read (kernel/printk/printk.c:2856)
[ 29.105350][ T0] ? __lock_acquire (kernel/locking/lockdep.c:5053)
[ 29.105350][ T0] ? lock_acquire (kernel/locking/lockdep.c:466 kernel/locking/lockdep.c:5667 kernel/locking/lockdep.c:5630)
[ 29.105350][ T0] console_flush_all (kernel/printk/printk.c:2950 kernel/printk/printk.c:3007)
[ 29.105350][ T0] console_unlock (kernel/printk/printk.c:3074)
[ 29.105350][ T0] ? wake_up_klogd (kernel/printk/printk.c:3046)
[ 29.105350][ T0] ? kmsg_dump_register (kernel/printk/printk.c:4043)
[ 29.105350][ T0] ? __down_trylock_console_sem (kernel/printk/printk.c:279)
[ 29.105350][ T0] vprintk_emit (kernel/printk/printk.c:2401)
[ 29.105350][ T0] ? __irq_work_queue_local (kernel/irq_work.c:101 (discriminator 1))
[ 29.105350][ T0] ? rcu_eqs_exit+0x4b/0xc0
[ 29.105350][ T0] _printk (kernel/printk/printk.c:2417)
[ 29.105350][ T0] ? record_print_text.cold (kernel/printk/printk.c:2417)
[ 29.105350][ T0] ? rcu_eqs_exit+0x4b/0xc0
[ 29.105350][ T0] ? _printk (kernel/printk/printk.c:2417)
[ 29.105350][ T0] ? record_print_text.cold (kernel/printk/printk.c:2417)
[ 29.105350][ T0] ? rcu_eqs_exit+0x4b/0xc0
[ 29.105350][ T0] __warn (kernel/panic.c:616)
[ 29.105350][ T0] ? rcu_eqs_exit+0x4b/0xc0
[ 29.105350][ T0] report_bug (lib/bug.c:200)
[ 29.105350][ T0] handle_bug (arch/x86/kernel/traps.c:316)
[ 29.105350][ T0] exc_invalid_op (arch/x86/kernel/traps.c:336 (discriminator 1))
[ 29.105350][ T0] asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
[ 29.105350][ T0] RIP: rcu_eqs_exit+0x4b/0xc0
[ 29.105350][ T0] Code: 1d 32 68 43 7c 48 8b 83 40 01 00 00 48 85 c0 74 1d 48 83 c0 01 48 89 83 40 01 00 00 5b 5d c3 65 8b 05 75 ca 43 7c 85 c0 74 cc <0f> 0b eb c8 e8 6c ff ff ff 48 8d ab 50 01 00 00 be 04 00 00 00 48
All code
========
0: 1d 32 68 43 7c sbb $0x7c436832,%eax
5: 48 8b 83 40 01 00 00 mov 0x140(%rbx),%rax
c: 48 85 c0 test %rax,%rax
f: 74 1d je 0x2e
11: 48 83 c0 01 add $0x1,%rax
15: 48 89 83 40 01 00 00 mov %rax,0x140(%rbx)
1c: 5b pop %rbx
1d: 5d pop %rbp
1e: c3 retq
1f: 65 8b 05 75 ca 43 7c mov %gs:0x7c43ca75(%rip),%eax # 0x7c43ca9b
26: 85 c0 test %eax,%eax
28: 74 cc je 0xfffffffffffffff6
2a:* 0f 0b ud2 <-- trapping instruction
2c: eb c8 jmp 0xfffffffffffffff6
2e: e8 6c ff ff ff callq 0xffffffffffffff9f
33: 48 8d ab 50 01 00 00 lea 0x150(%rbx),%rbp
3a: be 04 00 00 00 mov $0x4,%esi
3f: 48 rex.W

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: eb c8 jmp 0xffffffffffffffcc
4: e8 6c ff ff ff callq 0xffffffffffffff75
9: 48 8d ab 50 01 00 00 lea 0x150(%rbx),%rbp
10: be 04 00 00 00 mov $0x4,%esi
15: 48 rex.W
[ 29.105350][ T0] RSP: 0000:ffffffff85207da0 EFLAGS: 00010002
[ 29.105350][ T0] RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff825face9
[ 29.105350][ T0] RDX: 1ffffffff0b823e0 RSI: 0000000000000008 RDI: ffffffff852387c0
[ 29.105350][ T0] RBP: ffffffff85c11e40 R08: 0000000000000000 R09: ffffffff852387c7
[ 29.105350][ T0] R10: fffffbfff0a470f8 R11: 0000000000000001 R12: 0000000000000001
[ 29.105350][ T0] R13: ffffe8ec26c031ec R14: ffffffff85c11ec0 R15: ffffe8ec26c031e8
[ 29.105350][ T0] ? mwait_idle_with_hints (include/linux/instrumented.h:71 include/asm-generic/bitops/instrumented-non-atomic.h:134 include/linux/thread_info.h:118 include/linux/sched/idle.h:88 arch/x86/include/asm/mwait.h:120)
[ 29.105350][ T0] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:239)
[ 29.105350][ T0] cpuidle_enter (drivers/cpuidle/cpuidle.c:353)
[ 29.105350][ T0] cpuidle_idle_call (kernel/sched/idle.c:155 kernel/sched/idle.c:236)
[ 29.105350][ T0] ? arch_cpu_idle_exit+0xc0/0xc0
[ 29.105350][ T0] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526)
[ 29.105350][ T0] ? tick_nohz_idle_enter (arch/x86/include/asm/irqflags.h:45 (discriminator 3) arch/x86/include/asm/irqflags.h:80 (discriminator 3) kernel/time/tick-sched.c:1163 (discriminator 3))
[ 29.105350][ T0] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:50 (discriminator 22))
[ 29.105350][ T0] ? tsc_verify_tsc_adjust (arch/x86/kernel/tsc_sync.c:58)
[ 29.105350][ T0] do_idle (kernel/sched/idle.c:303)
[ 29.105350][ T0] cpu_startup_entry (kernel/sched/idle.c:399 (discriminator 1))
[ 29.105350][ T0] rest_init (include/linux/rcupdate.h:696 init/main.c:697)
[ 29.105350][ T0] arch_call_rest_init+0xf/0x15
[ 29.105350][ T0] start_kernel (init/main.c:1139)
[ 29.105350][ T0] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
[ 29.105350][ T0] </TASK>
[ 29.105350][ T0]
[ 29.105350][ T0] =============================


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
(No filename) (21.12 kB)
config-5.19.0-rc1-00001-g32d4fd5751ea (170.90 kB)
job-script (6.08 kB)
dmesg.xz (35.30 kB)
kernel-selftests (4.22 kB)
job.yaml (5.03 kB)
reproduce (154.00 B)
Download all attachments

2022-06-23 11:58:37

by Shinichiro Kawasaki

[permalink] [raw]
Subject: Re: [cpuidle,intel_idle] 32d4fd5751: WARNING:at_kernel/rcu/tree.c:#rcu_eqs_exit

On Jun 13, 2022 / 00:00, kernel test robot wrote:
>
>
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-11):
>
> commit: 32d4fd5751eadbe1823a37eb38df85ec5c8e6207 ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: kernel-selftests
> version: kernel-selftests-x86_64-cef46213-1_20220609
> with following parameters:
>
> group: resctrl
> ucode: 0x500320a
>
> test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
>
>
> on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <[email protected]>
>
>
> [ 29.104402][ T0] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:864 rcu_eqs_exit+0x4b/0xc0
> [ 29.104417][ T0]
> [ 29.104418][ T0] =============================
> [ 29.104419][ T0] WARNING: suspicious RCU usage
> [ 29.104421][ T0] 5.19.0-rc1-00001-g32d4fd5751ea #1 Not tainted
> [ 29.104424][ T0] -----------------------------

FYI, I observe this WARNING on my test servers for fstests, with kernel
v5.19-rc3. It was observed at system boot, and was also observed repeatedly
during fstests run. I reverted the commit 32d4fd5751ea then the WARNING
disappeared. The WARNING was observed on systems with 20 threads CPU, but
not observed on systems with 8 threads CPU.

Looking in the commit, I'm not sure how it is related to the RCU warning.
If any further action on my system would help, please let me know.

--
Shin'ichiro Kawasaki

2022-09-14 08:37:00

by Oliver Sang

[permalink] [raw]
Subject: Re: [cpuidle,intel_idle] 32d4fd5751: WARNING:at_kernel/rcu/tree.c:#rcu_eqs_exit

Hi Shin'ichiro Kawasaki and Peter Zijlstra,

On Thu, Jun 23, 2022 at 11:23:59AM +0000, Shinichiro Kawasaki wrote:
> On Jun 13, 2022 / 00:00, kernel test robot wrote:
> >
> >
> > Greeting,
> >
> > FYI, we noticed the following commit (built with gcc-11):
> >
> > commit: 32d4fd5751eadbe1823a37eb38df85ec5c8e6207 ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > in testcase: kernel-selftests
> > version: kernel-selftests-x86_64-cef46213-1_20220609
> > with following parameters:
> >
> > group: resctrl
> > ucode: 0x500320a
> >
> > test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
> > test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
> >
> >
> > on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> >
> >
> >
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot <[email protected]>
> >
> >
> > [ 29.104402][ T0] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:864 rcu_eqs_exit+0x4b/0xc0
> > [ 29.104417][ T0]
> > [ 29.104418][ T0] =============================
> > [ 29.104419][ T0] WARNING: suspicious RCU usage
> > [ 29.104421][ T0] 5.19.0-rc1-00001-g32d4fd5751ea #1 Not tainted
> > [ 29.104424][ T0] -----------------------------
>
> FYI, I observe this WARNING on my test servers for fstests, with kernel
> v5.19-rc3. It was observed at system boot, and was also observed repeatedly
> during fstests run. I reverted the commit 32d4fd5751ea then the WARNING
> disappeared. The WARNING was observed on systems with 20 threads CPU, but
> not observed on systems with 8 threads CPU.
>
> Looking in the commit, I'm not sure how it is related to the RCU warning.
> If any further action on my system would help, please let me know.

recently we made further tests and confirmed the issue is existing on this
commit but clean on parent, still on test machine:
88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory

=========================================================================================
compiler/group/kconfig/rootfs/tbox_group/testcase:
gcc-11/resctrl/x86_64-rhel-8.3-kselftests/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/kernel-selftests

commit:
v5.19-rc1
32d4fd5751eadbe1823a37eb38df85ec5c8e6207

v5.19-rc1 32d4fd5751eadbe1823a37eb38d
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:20 100% 20:20 dmesg.RIP:rcu_eqs_exit <------
:20 95% 19:20 dmesg.RIP:sched_clock_tick
:20 90% 18:20 dmesg.WARNING:at_kernel/rcu/tree.c:#rcu_eqs_exit
:20 90% 18:20 dmesg.WARNING:at_kernel/sched/clock.c:#sched_clock_tick
:20 100% 20:20 dmesg.WARNING:suspicious_RCU_usage
:20 100% 20:20 dmesg.boot_failures
:20 5% 1:20 dmesg.include/linux/rcupdate.h:#rcu_read_lock()used_illegally_while_idle
:20 5% 1:20 dmesg.include/linux/rcupdate.h:#rcu_read_unlock()used_illegally_while_idle
:20 95% 19:20 dmesg.include/trace/events/error_report.h:#suspicious_rcu_dereference_check()usage
:20 100% 20:20 dmesg.include/trace/events/lock.h:#suspicious_rcu_dereference_check()usage


as Shin'ichiro Kawasaki mentioned, the issues seems not be able to reproduce on
systems with small number of threads of CPU. so we tested on a vm which only
have 2 threads
qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

we confirmed the issue cannot be reproduced.

we actually don't have related knolwedge, if need extra data or testing we can
help.

>
> --
> Shin'ichiro Kawasaki