2023-09-15 07:56:40

by kernel test robot

[permalink] [raw]
Subject: [linus:master] [sched/fair] e0c2ff903c: WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_fwd_prog_nr


hi, Peter Zijlstra,

as below table:

af4cf40470c22efa e0c2ff903c320d3fd3c2c604dc4
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
100:100 0% 100:100 dmesg.BUG:workqueue_lockup-pool
:100 39% 39:100 dmesg.EIP:console_flush_all
1:100 -1% :100 dmesg.EIP:kthread_should_stop
1:100 0% 1:100 dmesg.EIP:raw_spin_rq_unlock_irq
6:100 -1% 5:100 dmesg.EIP:rcu_torture_fwd_prog_cr
:100 97% 97:100 dmesg.EIP:rcu_torture_fwd_prog_nr <----- (a)
100:100 0% 100:100 dmesg.EIP:ref_tracker_dir_exit
100:100 0% 100:100 dmesg.EIP:ref_tracker_free
6:100 -1% 5:100 dmesg.WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_fwd_prog_cr
:100 97% 97:100 dmesg.WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_fwd_prog_nr <----- (a)
100:100 0% 100:100 dmesg.WARNING:at_lib/ref_tracker.c:#ref_tracker_dir_exit
100:100 0% 100:100 dmesg.WARNING:at_lib/ref_tracker.c:#ref_tracker_free


we actually saw various issues for both parent and e0c2ff903c in vm boot tests
with a randcofig (the randconfig is same for parent and e0c2ff903c as in [1]),
some of them are very persistent for both commits, and some are very random.

the most obvious difference is about rcu_torture_fwd_prog_nr as above (a).
even when we run tests for both commits to 100 times, parent keeps clean,
but e0c2ff903c almost always show this issue.

frankly, I don't have enough knowledge to connect the commit change with this
issue, or to analyze its relation with other issues, so just report to you with
these data and dmesg (also in [1]) FYI.

if you need more data, log, or tests, please let me know. Thanks!


Hello,

kernel test robot noticed "WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_fwd_prog_nr" on:

commit: e0c2ff903c320d3fd3c2c604dc401b3b7c0a1d13 ("sched/fair: Remove sched_feat(START_DEBIT)")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master 3669558bdf354cd352be955ef2764cde6a9bf5ec]
[test failed on linux-next/master e143016b56ecb0fcda5bb6026b0a25fe55274f56]

in testcase: boot

compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


[1]
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230915/[email protected]


[ 357.942103][ T37] ------------[ cut here ]------------
[ 357.942792][ T37] WARNING: CPU: 0 PID: 37 at kernel/rcu/rcutorture.c:2702 rcu_torture_fwd_prog_nr (kernel/rcu/rcutorture.c:2702 (discriminator 1))
[ 357.943914][ T37] Modules linked in:
[ 357.944379][ T37] CPU: 0 PID: 37 Comm: rcu_torture_fwd Tainted: G W N 6.5.0-rc2-00017-ge0c2ff903c32 #1
[ 357.945561][ T37] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 357.946684][ T37] EIP: rcu_torture_fwd_prog_nr (kernel/rcu/rcutorture.c:2702 (discriminator 1))
[ 357.947330][ T37] Code: c2 83 f8 01 89 c6 0f 96 c0 31 c9 21 c2 0f b6 fa 89 f8 89 fa 88 45 d8 b8 20 ab 68 c2 6a 00 e8 c1 3f 06 00 80 7d d8 00 59 74 02 <0f> 0b 6a 00 31 c9 89 fa b8 08 ab 68 c2 e8 a8 3f 06 00 56 53 ff 75
All code
========
0: c2 83 f8 ret $0xf883
3: 01 89 c6 0f 96 c0 add %ecx,-0x3f69f03a(%rcx)
9: 31 c9 xor %ecx,%ecx
b: 21 c2 and %eax,%edx
d: 0f b6 fa movzbl %dl,%edi
10: 89 f8 mov %edi,%eax
12: 89 fa mov %edi,%edx
14: 88 45 d8 mov %al,-0x28(%rbp)
17: b8 20 ab 68 c2 mov $0xc268ab20,%eax
1c: 6a 00 push $0x0
1e: e8 c1 3f 06 00 call 0x63fe4
23: 80 7d d8 00 cmpb $0x0,-0x28(%rbp)
27: 59 pop %rcx
28: 74 02 je 0x2c
2a:* 0f 0b ud2 <-- trapping instruction
2c: 6a 00 push $0x0
2e: 31 c9 xor %ecx,%ecx
30: 89 fa mov %edi,%edx
32: b8 08 ab 68 c2 mov $0xc268ab08,%eax
37: e8 a8 3f 06 00 call 0x63fe4
3c: 56 push %rsi
3d: 53 push %rbx
3e: ff .byte 0xff
3f: 75 .byte 0x75

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 6a 00 push $0x0
4: 31 c9 xor %ecx,%ecx
6: 89 fa mov %edi,%edx
8: b8 08 ab 68 c2 mov $0xc268ab08,%eax
d: e8 a8 3f 06 00 call 0x63fba
12: 56 push %rsi
13: 53 push %rbx
14: ff .byte 0xff
15: 75 .byte 0x75
[ 357.949385][ T37] EAX: c268ab20 EBX: 00000000 ECX: 00000000 EDX: 00000001
[ 357.950145][ T37] ESI: 00000000 EDI: 00000001 EBP: c6519f4c ESP: c6519f10
[ 357.950909][ T37] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010202
[ 357.958410][ T37] CR0: 80050033 CR2: ffd99000 CR3: 0288f000 CR4: 000406d0
[ 357.959185][ T37] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 357.959952][ T37] DR6: fffe0ff0 DR7: 00000400
[ 357.960488][ T37] Call Trace:
[ 357.960896][ T37] ? show_regs (arch/x86/kernel/dumpstack.c:478)
[ 357.961417][ T37] ? rcu_torture_fwd_prog_nr (kernel/rcu/rcutorture.c:2702 (discriminator 1))
[ 357.962043][ T37] ? __warn (kernel/panic.c:673)
[ 357.962536][ T37] ? report_bug (lib/bug.c:201 lib/bug.c:219)
[ 357.963046][ T37] ? rcu_torture_fwd_prog_nr (kernel/rcu/rcutorture.c:2702 (discriminator 1))
[ 357.963673][ T37] ? exc_overflow (arch/x86/kernel/traps.c:337)
[ 357.964197][ T37] ? handle_bug (arch/x86/kernel/traps.c:324)
[ 357.964703][ T37] ? exc_invalid_op (arch/x86/kernel/traps.c:345 (discriminator 1))
[ 357.965242][ T37] ? handle_exception (arch/x86/entry/entry_32.S:1056)
[ 357.965832][ T37] ? xas_retry (include/linux/xarray.h:1279 include/linux/xarray.h:1508)
[ 357.966328][ T37] ? exc_overflow (arch/x86/kernel/traps.c:337)
[ 357.966852][ T37] ? rcu_torture_fwd_prog_nr (kernel/rcu/rcutorture.c:2702 (discriminator 1))
[ 357.967484][ T37] ? exc_overflow (arch/x86/kernel/traps.c:337)
[ 357.968006][ T37] ? rcu_torture_fwd_prog_nr (kernel/rcu/rcutorture.c:2702 (discriminator 1))
[ 357.968634][ T37] ? rcutorture_seq_diff (kernel/rcu/rcutorture.c:2509)
[ 357.969220][ T37] rcu_torture_fwd_prog (kernel/rcu/rcutorture.c:2914)
[ 357.969817][ T37] kthread (kernel/kthread.c:391)
[ 357.970290][ T37] ? rcu_torture_fwd_prog_cr (kernel/rcu/rcutorture.c:2876)
[ 357.970913][ T37] ? kthread_complete_and_exit (kernel/kthread.c:342)
[ 357.971537][ T37] ret_from_fork (arch/x86/kernel/process.c:151)
[ 357.972047][ T37] ? kthread_complete_and_exit (kernel/kthread.c:342)
[ 357.972684][ T37] ret_from_fork_asm (arch/x86/entry/entry_32.S:741)
[ 357.973232][ T37] entry_INT80_32 (arch/x86/entry/entry_32.S:947)
[ 357.973768][ T37] irq event stamp: 133627
[ 357.974269][ T37] hardirqs last enabled at (133637): __up_console_sem (arch/x86/include/asm/irqflags.h:42 (discriminator 1) arch/x86/include/asm/irqflags.h:77 (discriminator 1) arch/x86/include/asm/irqflags.h:135 (discriminator 1) kernel/printk/printk.c:347 (discriminator 1))
[ 357.975253][ T37] hardirqs last disabled at (133646): __up_console_sem (kernel/printk/printk.c:345 (discriminator 1))
[ 357.976244][ T37] softirqs last enabled at (133664): __do_softirq (arch/x86/include/asm/preempt.h:27 kernel/softirq.c:400 kernel/softirq.c:582)
[ 357.977202][ T37] softirqs last disabled at (133655): do_softirq_own_stack (arch/x86/kernel/irq_32.c:57 arch/x86/kernel/irq_32.c:147)
[ 357.978216][ T37] ---[ end trace 0000000000000000 ]---
[ 357.978834][ T37] rcu_torture_fwd_prog_nr: 0 Duration 2908 cver 0 gps 0
[ 357.979586][ T37] rcu_torture_fwd_prog_nr: Waiting for CBs: rcu_barrier+0x0/0x3b() 0
[ 357.980607][ T1] Testing (7,4)_8 code...
[ 357.981126][ T1] Testing up to error correction capacity...
[ 357.981801][ T1] Testing correction buffer interface...
[ 372.272315][ T1] Testing with caller provided syndrome...
[ 387.604016][ T1] Testing in-place interface...
[ 408.902506][ T1] Testing beyond error correction capacity...



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki