2022-04-25 02:21:10

by kernel test robot

[permalink] [raw]
Subject: [mm/page_owner.c] 73fc16ad40: WARNING:inconsistent_lock_state



Greeting,

FYI, we noticed the following commit (built with gcc-11):

commit: 73fc16ad40634badd5f51083e94792bb267f2919 ("mm/page_owner.c: use get_task_comm() to record task command name with the protection of task_lock()")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):



If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 42.475348][ C1] WARNING: inconsistent lock state
[ 42.476513][ C1] 5.18.0-rc3-00045-g73fc16ad4063 #1 Not tainted
[ 42.477915][ C1] --------------------------------
[ 42.479133][ C1] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 42.480717][ C1] swapper/1/0 [HC0[0]:SC1[3]:HE1:SE0] takes:
[ 42.482139][ C1] ffff888100808cd8 (&p->alloc_lock){+.?.}-{2:2}, at: __get_task_comm (fs/exec.c:1217)
[ 42.484200][ C1] {SOFTIRQ-ON-W} state was registered at:
[ 42.485512][ C1] __lock_acquire (kernel/locking/lockdep.c:4983)
[ 42.490900][ C1] lock_acquire (kernel/locking/lockdep.c:436 kernel/locking/lockdep.c:5643 kernel/locking/lockdep.c:5606)
[ 42.492097][ C1] _raw_spin_lock (include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
[ 42.493338][ C1] __set_task_comm (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/task.h:34 fs/exec.c:1231)
[ 42.494644][ C1] kthreadd (kernel/kthread.c:723)
[ 42.495684][ C1] ret_from_fork (arch/x86/entry/entry_64.S:304)
[ 42.496803][ C1] irq event stamp: 172198
[ 42.497798][ C1] hardirqs last enabled at (172198): _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 arch/x86/include/asm/irqflags.h:138 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
[ 42.500372][ C1] hardirqs last disabled at (172197): _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:108 kernel/locking/spinlock.c:162)
[ 42.502627][ C1] softirqs last enabled at (172176): __do_softirq (arch/x86/include/asm/preempt.h:27 kernel/softirq.c:402 kernel/softirq.c:587)
[ 42.504810][ C1] softirqs last disabled at (172183): __irq_exit_rcu (kernel/softirq.c:432 kernel/softirq.c:637)
[ 42.507080][ C1]
[ 42.507080][ C1] other info that might help us debug this:
[ 42.509073][ C1] Possible unsafe locking scenario:
[ 42.509073][ C1]
[ 42.510935][ C1] CPU0
[ 42.511819][ C1] ----
[ 42.512771][ C1] lock(&p->alloc_lock);
[ 42.513881][ C1] <Interrupt>
[ 42.514897][ C1] lock(&p->alloc_lock);
[ 42.516025][ C1]
[ 42.516025][ C1] *** DEADLOCK ***
[ 42.516025][ C1]
[ 42.518075][ C1] no locks held by swapper/1/0.
[ 42.519348][ C1]
[ 42.519348][ C1] stack backtrace:
[ 42.520964][ C1] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.18.0-rc3-00045-g73fc16ad4063 #1
[ 42.523160][ C1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 42.525319][ C1] Call Trace:
[ 42.526317][ C1] <IRQ>
[ 42.527101][ C1] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[ 42.528143][ C1] mark_lock_irq.cold (kernel/locking/lockdep.c:4178 kernel/locking/lockdep.c:3947 kernel/locking/lockdep.c:4150)
[ 42.529457][ C1] ? secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:303)
[ 42.530942][ C1] ? lock_chain_count (kernel/locking/lockdep.c:4141)
[ 42.532165][ C1] ? stack_trace_save (kernel/stacktrace.c:123)
[ 42.533480][ C1] ? filter_irq_stacks (kernel/stacktrace.c:114)
[ 42.534827][ C1] ? save_trace (kernel/locking/lockdep.c:555)
[ 42.535869][ C1] mark_lock+0x4c9/0xac0
[ 42.537066][ C1] ? mark_lock_irq (kernel/locking/lockdep.c:4566)
[ 42.538311][ C1] ? lock_downgrade (kernel/locking/lockdep.c:5293)
[ 42.539540][ C1] mark_usage (kernel/locking/lockdep.c:4502)
[ 42.540657][ C1] __lock_acquire (kernel/locking/lockdep.c:4983)
[ 42.541858][ C1] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 arch/x86/include/asm/irqflags.h:138 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
[ 42.543367][ C1] lock_acquire (kernel/locking/lockdep.c:436 kernel/locking/lockdep.c:5643 kernel/locking/lockdep.c:5606)
[ 42.544612][ C1] ? __get_task_comm (fs/exec.c:1217)
[ 42.545826][ C1] ? rcu_read_unlock (include/linux/rcupdate.h:723 (discriminator 5))
[ 42.547145][ C1] ? memory_failure_queue_kick (mm/page_owner.c:107)
[ 42.548624][ C1] ? e1000_alloc_rx_buffers (include/linux/skbuff.h:3071 drivers/net/ethernet/intel/e1000/e1000_main.c:2072 drivers/net/ethernet/intel/e1000/e1000_main.c:4583)
[ 42.549998][ C1] ? e1000_clean_rx_irq (drivers/net/ethernet/intel/e1000/e1000_main.c:4485)
[ 42.551339][ C1] ? e1000_clean (drivers/net/ethernet/intel/e1000/e1000_main.c:3805)
[ 42.552510][ C1] ? __napi_poll+0xa1/0x380
[ 42.553784][ C1] ? net_rx_action (net/core/dev.c:6482 net/core/dev.c:6567)
[ 42.555053][ C1] ? __do_softirq (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/irq.h:142 kernel/softirq.c:559)
[ 42.556298][ C1] _raw_spin_lock (include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
[ 42.557427][ C1] ? __get_task_comm (fs/exec.c:1217)
[ 42.558732][ C1] __get_task_comm (fs/exec.c:1217)
[ 42.559817][ C1] __set_page_owner (include/linux/instrumented.h:41 include/asm-generic/bitops/instrumented-non-atomic.h:27 mm/page_owner.c:172 mm/page_owner.c:189)
[ 42.561029][ C1] get_page_from_freelist (mm/page_alloc.c:2452 mm/page_alloc.c:4182)
[ 42.562325][ C1] __alloc_pages (mm/page_alloc.c:5408)
[ 42.563467][ C1] ? __alloc_pages_slowpath+0x1900/0x1900
[ 42.565180][ C1] ? __skb_gro_checksum_complete (net/core/gro.c:774)
[ 42.566702][ C1] ? __lock_release (kernel/locking/lockdep.c:5317)
[ 42.567796][ C1] page_frag_alloc_align (mm/page_alloc.c:5532 mm/page_alloc.c:5561)
[ 42.569066][ C1] __netdev_alloc_frag_align (include/linux/bottom_half.h:33 net/core/skbuff.c:164)
[ 42.570427][ C1] e1000_alloc_rx_buffers (include/linux/skbuff.h:3071 drivers/net/ethernet/intel/e1000/e1000_main.c:2072 drivers/net/ethernet/intel/e1000/e1000_main.c:4583)
[ 42.571776][ C1] ? napi_gro_receive (include/trace/events/net.h:287 net/core/gro.c:631)
[ 42.573118][ C1] e1000_clean_rx_irq (drivers/net/ethernet/intel/e1000/e1000_main.c:4485)
[ 42.574128][ C1] ? e1000_82547_tx_fifo_stall_task (drivers/net/ethernet/intel/e1000/e1000_main.c:4353)
[ 42.575704][ C1] e1000_clean (drivers/net/ethernet/intel/e1000/e1000_main.c:3805)
[ 42.576928][ C1] ? e1000_clean_tx_irq (drivers/net/ethernet/intel/e1000/e1000_main.c:3796)
[ 42.578339][ C1] __napi_poll+0xa1/0x380
[ 42.579552][ C1] net_rx_action (net/core/dev.c:6482 net/core/dev.c:6567)
[ 42.580746][ C1] ? __napi_poll+0x380/0x380
[ 42.582031][ C1] ? sched_clock_cpu (kernel/sched/clock.c:369)
[ 42.583354][ C1] ? lock_downgrade (kernel/locking/lockdep.c:5293)
[ 42.584624][ C1] __do_softirq (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/irq.h:142 kernel/softirq.c:559)
[ 42.585748][ C1] __irq_exit_rcu (kernel/softirq.c:432 kernel/softirq.c:637)
[ 42.586968][ C1] irq_exit_rcu (kernel/softirq.c:651)
[ 42.588004][ C1] common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 14))
[ 42.589199][ C1] </IRQ>
[ 42.589956][ C1] <TASK>
[ 42.590819][ C1] asm_common_interrupt (arch/x86/include/asm/idtentry.h:636)
[ 42.592069][ C1] RIP: 0010:default_idle (arch/x86/kernel/process.c:734)
[ 42.593441][ C1] Code: ff ff ff 48 89 df e8 8f 12 f7 fd e9 55 ff ff ff 4c 89 e7 e8 82 12 f7 fd eb 96 0f 1f 44 00 00 eb 07 0f 00 2d b2 61 48 00 fb f4 <c3> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
All code
========
0: ff (bad)
1: ff (bad)
2: ff 48 89 decl -0x77(%rax)
5: df e8 fucomip %st(0),%st
7: 8f (bad)
8: 12 f7 adc %bh,%dh
a: fd std
b: e9 55 ff ff ff jmpq 0xffffffffffffff65
10: 4c 89 e7 mov %r12,%rdi
13: e8 82 12 f7 fd callq 0xfffffffffdf7129a
18: eb 96 jmp 0xffffffffffffffb0
1a: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
1f: eb 07 jmp 0x28
21: 0f 00 2d b2 61 48 00 verw 0x4861b2(%rip) # 0x4861da
28: fb sti
29: f4 hlt
2a:* c3 retq <-- trapping instruction
2b: cc int3
2c: cc int3
2d: cc int3
2e: cc int3
2f: cc int3
30: cc int3
31: cc int3
32: cc int3
33: cc int3
34: cc int3
35: cc int3
36: cc int3
37: cc int3
38: cc int3
39: cc int3
3a: cc int3
3b: cc int3
3c: cc int3
3d: cc int3
3e: cc int3
3f: cc int3

Code starting with the faulting instruction
===========================================
0: c3 retq
1: cc int3
2: cc int3
3: cc int3
4: cc int3
5: cc int3
6: cc int3
7: cc int3
8: cc int3
9: cc int3
a: cc int3
b: cc int3
c: cc int3
d: cc int3
e: cc int3
f: cc int3
10: cc int3
11: cc int3
12: cc int3
13: cc int3
14: cc int3
15: cc int3


To reproduce:

# build kernel
cd linux
cp config-5.18.0-rc3-00045-g73fc16ad4063 .config
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
(No filename) (10.98 kB)
config-5.18.0-rc3-00045-g73fc16ad4063 (169.67 kB)
job-script (4.63 kB)
dmesg.xz (15.14 kB)
Download all attachments