Greetings,
FYI, we noticed the following commit (built with clang-15):
commit: d74b7e8fef5d9d1c6d7aba7b2ac898f77081a18a ("mm/page_alloc: Avoid disabling interruptions on hot paths")
https://git.kernel.org/cgit/linux/kernel/git/nsaenz/linux-rpi.git pcpdrain-sl-v3r1
in testcase: boot
on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):
If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>
[ 595.240111][ C0] BUG: spinlock trylock failure on UP on CPU#0, boot-1-aliyun-x/3533
[ 595.241444][ C0] lock: 0xffff88843ffdcc70, .magic: dead4ead, .owner: boot-1-aliyun-x/3533, .owner_cpu: 0
[ 595.243114][ C0] CPU: 0 PID: 3533 Comm: boot-1-aliyun-x Not tainted 5.17.0-04447-gd74b7e8fef5d #1
[ 595.244782][ C0] Call Trace:
[ 595.245358][ C0] <IRQ>
[ 595.245884][ C0] dump_stack_lvl (lib/dump_stack.c:108)
[ 595.246661][ C0] dump_stack (lib/dump_stack.c:114)
[ 595.247387][ C0] spin_bug (kernel/locking/spinlock_debug.c:? kernel/locking/spinlock_debug.c:77)
[ 595.248093][ C0] do_raw_spin_trylock (include/linux/spinlock_up.h:42 kernel/locking/spinlock_debug.c:122)
[ 595.249033][ C0] _raw_spin_trylock (include/linux/spinlock_api_smp.h:89 kernel/locking/spinlock.c:138)
[ 595.249903][ C0] __rmqueue_pcplist (include/linux/spinlock.h:359 mm/page_alloc.c:3547)
[ 595.250816][ C0] ? tcp_data_ready (net/ipv4/tcp_input.c:4980)
[ 595.251730][ C0] ? tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2127)
[ 595.252576][ C0] ? rcu_read_lock_sched_held (include/linux/lockdep.h:283 kernel/rcu/update.c:125)
[ 595.253620][ C0] ? zone_watermark_fast (mm/page_alloc.c:3763 mm/page_alloc.c:3876)
[ 595.254595][ C0] get_page_from_freelist (mm/page_alloc.c:3603 mm/page_alloc.c:3631 mm/page_alloc.c:4096)
[ 595.255550][ C0] ? validate_chain (kernel/locking/lockdep.c:3696 kernel/locking/lockdep.c:3716 kernel/locking/lockdep.c:3771)
[ 595.256470][ C0] ? fs_reclaim_release (include/linux/sched/mm.h:190 mm/page_alloc.c:4516)
[ 595.257434][ C0] ? prepare_alloc_pages (include/linux/mmzone.h:1194 include/linux/mmzone.h:1220 mm/page_alloc.c:5115)
[ 595.258394][ C0] __alloc_pages (mm/page_alloc.c:?)
[ 595.259246][ C0] ? slob_alloc (mm/slob.c:358)
[ 595.260089][ C0] ? rcu_read_lock_sched_held (include/linux/lockdep.h:283 kernel/rcu/update.c:125)
[ 595.261077][ C0] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4208 kernel/locking/lockdep.c:4226 kernel/locking/lockdep.c:4294)
[ 595.262178][ C0] slob_new_pages (mm/slob.c:202)
[ 595.263008][ C0] slob_alloc (mm/slob.c:360)
[ 595.263786][ C0] ? __napi_alloc_skb (net/core/skbuff.c:570)
[ 595.264769][ C0] __kmalloc_track_caller (mm/slob.c:503 mm/slob.c:533)
[ 595.265774][ C0] ? __napi_alloc_skb (net/core/skbuff.c:570)
[ 595.266726][ C0] ? __napi_alloc_skb (net/core/skbuff.c:570)
[ 595.267561][ C0] __alloc_skb (net/core/skbuff.c:354 net/core/skbuff.c:426)
[ 595.268373][ C0] __napi_alloc_skb (net/core/skbuff.c:570)
[ 595.269275][ C0] e1000_clean_rx_irq (include/linux/skbuff.h:3005 drivers/net/ethernet/intel/e1000/e1000_main.c:4112 drivers/net/ethernet/intel/e1000/e1000_main.c:4331 drivers/net/ethernet/intel/e1000/e1000_main.c:4383)
[ 595.270213][ C0] ? e1000_alloc_jumbo_rx_buffers (drivers/net/ethernet/intel/e1000/e1000_main.c:4353)
[ 595.271290][ C0] e1000_clean (drivers/net/ethernet/intel/e1000/e1000_main.c:3933 drivers/net/ethernet/intel/e1000/e1000_main.c:3801)
[ 595.272132][ C0] ? __lock_acquire (kernel/locking/lockdep.c:?)
[ 595.273066][ C0] __napi_poll (net/core/dev.c:6365)
[ 595.273844][ C0] net_rx_action (net/core/dev.c:6432 net/core/dev.c:6519)
[ 595.274692][ C0] __do_softirq (arch/x86/include/asm/atomic.h:29 include/linux/atomic/atomic-instrumented.h:28 include/linux/jump_label.h:261 include/linux/jump_label.h:271 include/trace/events/irq.h:142 kernel/softirq.c:559)
[ 595.275537][ C0] ? handle_fasteoi_irq (kernel/irq/chip.c:722)
[ 595.276450][ C0] __irq_exit_rcu (kernel/softirq.c:640)
[ 595.277246][ C0] irq_exit_rcu (kernel/softirq.c:651)
[ 595.278011][ C0] common_interrupt (arch/x86/kernel/irq.c:240)
[ 595.278890][ C0] </IRQ>
[ 595.279428][ C0] <TASK>
[ 595.279974][ C0] asm_common_interrupt (??:?)
[ 595.280922][ C0] RIP: 0010:kcsan_setup_watchpoint (kernel/kcsan/core.c:357 kernel/kcsan/core.c:693)
[ 595.282042][ C0] Code: 95 fb ff 48 c7 45 c8 00 00 00 00 9c 8f 45 c8 f7 45 c8 00 02 00 00 0f 85 90 00 00 00 f7 45 98 00 02 00 00 74 01 fb 48 8b 43 30 <49> 89 47 30 48 8b 43 28 49 89 47 28 48 8b 43 20 49 89 47 20 48 8b
All code
========
0: 95 xchg %eax,%ebp
1: fb sti
2: ff 48 c7 decl -0x39(%rax)
5: 45 c8 00 00 00 rex.RB enterq $0x0,$0x0
a: 00 9c 8f 45 c8 f7 45 add %bl,0x45f7c845(%rdi,%rcx,4)
11: c8 00 02 00 enterq $0x200,$0x0
15: 00 0f add %cl,(%rdi)
17: 85 90 00 00 00 f7 test %edx,-0x9000000(%rax)
1d: 45 98 rex.RB cwtl
1f: 00 02 add %al,(%rdx)
21: 00 00 add %al,(%rax)
23: 74 01 je 0x26
25: fb sti
26: 48 8b 43 30 mov 0x30(%rbx),%rax
2a:* 49 89 47 30 mov %rax,0x30(%r15) <-- trapping instruction
2e: 48 8b 43 28 mov 0x28(%rbx),%rax
32: 49 89 47 28 mov %rax,0x28(%r15)
36: 48 8b 43 20 mov 0x20(%rbx),%rax
3a: 49 89 47 20 mov %rax,0x20(%r15)
3e: 48 rex.W
3f: 8b .byte 0x8b
Code starting with the faulting instruction
===========================================
0: 49 89 47 30 mov %rax,0x30(%r15)
4: 48 8b 43 28 mov 0x28(%rbx),%rax
8: 49 89 47 28 mov %rax,0x28(%r15)
c: 48 8b 43 20 mov 0x20(%rbx),%rax
10: 49 89 47 20 mov %rax,0x20(%r15)
14: 48 rex.W
15: 8b .byte 0x8b
[ 595.285406][ C0] RSP: 0000:ffffc90001a378a0 EFLAGS: 00000206
[ 595.286469][ C0] RAX: 0000231c0000231a RBX: ffff88816bcc60e0 RCX: 0000000000000000
[ 595.287838][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 595.289253][ C0] RBP: ffffc90001a37918 R08: 0000000000000000 R09: 0000000000292c74
[ 595.290514][ C0] R10: 0001ffffffffff00 R11: 0000000000000000 R12: 0000000000000008
[ 595.291885][ C0] R13: 0000000000000000 R14: ffff88816bcc60b0 R15: ffff88816bcc4798
[ 595.293348][ C0] ? __list_del_entry_valid (lib/list_debug.c:45)
[ 595.294280][ C0] __tsan_read_write8 (kernel/kcsan/core.c:? kernel/kcsan/core.c:1014)
[ 595.295129][ C0] __list_del_entry_valid (lib/list_debug.c:45)
[ 595.296047][ C0] __rmqueue_pcplist (include/linux/list.h:134 include/linux/list.h:148 mm/page_alloc.c:3574)
[ 595.296969][ C0] ? validate_chain (kernel/locking/lockdep.c:3696 kernel/locking/lockdep.c:3716 kernel/locking/lockdep.c:3771)
[ 595.297812][ C0] get_page_from_freelist (mm/page_alloc.c:3603 mm/page_alloc.c:3631 mm/page_alloc.c:4096)
[ 595.298786][ C0] ? __cond_resched (arch/x86/include/asm/preempt.h:103 kernel/sched/core.c:8153)
[ 595.299605][ C0] ? prepare_alloc_pages (include/linux/mmzone.h:1194 include/linux/mmzone.h:1220 mm/page_alloc.c:5115)
[ 595.300540][ C0] __alloc_pages (mm/page_alloc.c:?)
[ 595.301395][ C0] ? wp_page_copy (include/linux/mmu_notifier.h:491 mm/memory.c:3125)
[ 595.302244][ C0] ? rcu_read_lock_sched_held (include/linux/lockdep.h:283 kernel/rcu/update.c:125)
[ 595.303370][ C0] ? __lock_acquire (kernel/locking/lockdep.c:?)
[ 595.304245][ C0] wp_page_copy (include/linux/gfp.h:? include/linux/gfp.h:595 include/linux/gfp.h:609 mm/memory.c:3018)
[ 595.305098][ C0] ? rcu_read_lock_sched_held (include/linux/lockdep.h:283 kernel/rcu/update.c:125)
[ 595.306107][ C0] ? handle_mm_fault (include/linux/spinlock.h:? mm/memory.c:3317 mm/memory.c:4586 mm/memory.c:4704 mm/memory.c:4802)
[ 595.307071][ C0] ? do_raw_spin_unlock (include/linux/spinlock_up.h:48 kernel/locking/spinlock_debug.c:141)
[ 595.308065][ C0] handle_mm_fault (mm/memory.c:3318 mm/memory.c:4586 mm/memory.c:4704 mm/memory.c:4802)
[ 595.308924][ C0] do_user_addr_fault (arch/x86/mm/fault.c:?)
[ 595.309774][ C0] exc_page_fault (arch/x86/include/asm/irqflags.h:22 arch/x86/include/asm/irqflags.h:70 arch/x86/include/asm/irqflags.h:130 arch/x86/mm/fault.c:1492 arch/x86/mm/fault.c:1540)
[ 595.310587][ C0] ? asm_exc_page_fault (??:?)
[ 595.311378][ C0] asm_exc_page_fault (??:?)
[ 595.312204][ C0] RIP: 0033:0x42e32f
[ 595.312952][ C0] Code: 2e 0f 1f 84 00 00 00 00 00 66 90 53 48 89 fb 48 8b 3f 48 85 ff 74 05 e8 ff cb fe ff 8b 05 31 9c 2b 00 39 05 2f 9c 2b 00 7d 61 <c6> 03 df c6 43 01 df c6 43 02 df c6 43 03 df c6 43 04 df c6 43 05
All code
========
0: 2e 0f 1f 84 00 00 00 nopl %cs:0x0(%rax,%rax,1)
7: 00 00
9: 66 90 xchg %ax,%ax
b: 53 push %rbx
c: 48 89 fb mov %rdi,%rbx
f: 48 8b 3f mov (%rdi),%rdi
12: 48 85 ff test %rdi,%rdi
15: 74 05 je 0x1c
17: e8 ff cb fe ff callq 0xfffffffffffecc1b
1c: 8b 05 31 9c 2b 00 mov 0x2b9c31(%rip),%eax # 0x2b9c53
22: 39 05 2f 9c 2b 00 cmp %eax,0x2b9c2f(%rip) # 0x2b9c57
28: 7d 61 jge 0x8b
2a:* c6 03 df movb $0xdf,(%rbx) <-- trapping instruction
2d: c6 43 01 df movb $0xdf,0x1(%rbx)
31: c6 43 02 df movb $0xdf,0x2(%rbx)
35: c6 43 03 df movb $0xdf,0x3(%rbx)
39: c6 43 04 df movb $0xdf,0x4(%rbx)
3d: c6 .byte 0xc6
3e: 43 rex.XB
3f: 05 .byte 0x5
Code starting with the faulting instruction
===========================================
0: c6 03 df movb $0xdf,(%rbx)
3: c6 43 01 df movb $0xdf,0x1(%rbx)
7: c6 43 02 df movb $0xdf,0x2(%rbx)
b: c6 43 03 df movb $0xdf,0x3(%rbx)
f: c6 43 04 df movb $0xdf,0x4(%rbx)
13: c6 .byte 0xc6
14: 43 rex.XB
15: 05 .byte 0x5
To reproduce:
# build kernel
cd linux
cp config-5.17.0-04447-gd74b7e8fef5d .config
make HOSTCC=clang-15 CC=clang-15 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=clang-15 CC=clang-15 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp dirs to run from a clean state.
--
0-DAY CI Kernel Test Service
https://01.org/lkp