Greetings,
FYI, we noticed the following commit (built with gcc-9):
commit: 4d8b9319282ae84f5a17b28d8b5b5d1e7e537312 ("mctp: Add neighbour implementation")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: trinity
version: trinity-x86_64-b1a0aef9-1_20210908
with following parameters:
number: 99999
group: group-00
test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/
on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):
+-------------------------------------------------------------------------+------------+------------+
|                                                                         | 06d2f4c583 | 4d8b931928 |
+-------------------------------------------------------------------------+------------+------------+
| net/mctp/route.c:#RCU-list_traversed_in_non-reader_section              | 27         |            |
| WARNING:at_kernel/locking/mutex.c:#__mutex_lock                         | 0          | 23         |
| RIP:__mutex_lock                                                        | 0          | 23         |
+-------------------------------------------------------------------------+------------+------------+
Please note that we previously reported "[mctp] 831119f887: net/mctp/route.c:#RCU-list_traversed_in_non-reader_section"
at https://lore.kernel.org/lkml/20210912132631.GB25450@xsang-OptiPlex-9020/
where we also mentioned this __mutex_lock issue on 4d8b931928.
On further checking, we found that the net/mctp/route.c:#RCU-list_traversed_in_non-reader_section issue
has a similar call trace on 06d2f4c583 (parent of 4d8b931928) and 831119f887 (child of 4d8b931928).
If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>
[ 1034.479060][ T266] WARNING: CPU: 1 PID: 266 at kernel/locking/mutex.c:941 __mutex_lock (kernel/locking/mutex.c:941 kernel/locking/mutex.c:1104)
[ 1034.516551][ T266] Modules linked in:
[ 1034.534194][ T266] CPU: 1 PID: 266 Comm: kworker/u4:6 Not tainted 5.14.0-rc2-00608-g4d8b9319282a #1 dc5c32cf09a0a09c39ca0d2edc0ac77d1d0cd212
[ 1034.571472][ T266] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 1034.591364][ T266] Workqueue: netns cleanup_net
[ 1034.610339][ T266] RIP: 0010:__mutex_lock (kernel/locking/mutex.c:941 kernel/locking/mutex.c:1104)
[ 1034.629526][ T266] Code: 8b 8d 60 ff ff ff 48 8b 95 68 ff ff ff e9 be fe ff ff 90 0f 0b 90 48 c7 c6 5f 93 3a 8e 48 c7 c7 85 54 38 8e e8 4d 1d 87 ff 90 <0f> 0b 90 90 eb b0 90 48 c7 c6 73 93 3a 8e 48 c7 c7 85 54 38 8e e8
All code
========
0: 8b 8d 60 ff ff ff mov -0xa0(%rbp),%ecx
6: 48 8b 95 68 ff ff ff mov -0x98(%rbp),%rdx
d: e9 be fe ff ff jmpq 0xfffffffffffffed0
12: 90 nop
13: 0f 0b ud2
15: 90 nop
16: 48 c7 c6 5f 93 3a 8e mov $0xffffffff8e3a935f,%rsi
1d: 48 c7 c7 85 54 38 8e mov $0xffffffff8e385485,%rdi
24: e8 4d 1d 87 ff callq 0xffffffffff871d76
29: 90 nop
2a:* 0f 0b ud2 <-- trapping instruction
2c: 90 nop
2d: 90 nop
2e: eb b0 jmp 0xffffffffffffffe0
30: 90 nop
31: 48 c7 c6 73 93 3a 8e mov $0xffffffff8e3a9373,%rsi
38: 48 c7 c7 85 54 38 8e mov $0xffffffff8e385485,%rdi
3f: e8 .byte 0xe8
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 90 nop
3: 90 nop
4: eb b0 jmp 0xffffffffffffffb6
6: 90 nop
7: 48 c7 c6 73 93 3a 8e mov $0xffffffff8e3a9373,%rsi
e: 48 c7 c7 85 54 38 8e mov $0xffffffff8e385485,%rdi
15: e8 .byte 0xe8
[ 1034.670425][ T266] RSP: 0018:ffff97bfc092fb50 EFLAGS: 00010282
[ 1034.690094][ T266] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff83d3359f
[ 1034.711079][ T266] RDX: 0000000000000000 RSI: ffff8b0472599000 RDI: 0000000000000002
[ 1034.731031][ T266] RBP: ffff97bfc092fbf0 R08: 0000000000000000 R09: 0000000000000000
[ 1034.751044][ T266] R10: 0000000000000000 R11: 000000002d2d2d2d R12: 0000000000000000
[ 1034.771080][ T266] R13: ffff8b0478fdef80 R14: ffff8b04e5fe53f8 R15: 0000000000000001
[ 1034.791216][ T266] FS: 0000000000000000(0000) GS:ffff8b076fa00000(0000) knlGS:0000000000000000
[ 1034.811874][ T266] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1034.832081][ T266] CR2: 00007f244db90834 CR3: 00000001a5f62000 CR4: 00000000000406e0
[ 1034.852604][ T266] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1034.873016][ T266] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1034.893289][ T266] Call Trace:
[ 1034.912235][ T266] ? mark_held_locks (kernel/locking/lockdep.c:4194)
[ 1034.931722][ T266] ? mctp_neigh_remove_dev (net/mctp/neigh.c:77)
[ 1034.951847][ T266] ? __call_rcu (arch/x86/include/asm/irqflags.h:29 (discriminator 3) arch/x86/include/asm/irqflags.h:70 (discriminator 3) arch/x86/include/asm/irqflags.h:132 (discriminator 3) kernel/rcu/tree.c:3063 (discriminator 3))
[ 1034.971671][ T266] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:50 (discriminator 22))
[ 1034.991324][ T266] ? mctp_neigh_remove_dev (net/mctp/neigh.c:77)
[ 1035.011487][ T266] mctp_neigh_remove_dev (net/mctp/neigh.c:77)
[ 1035.031017][ T266] mctp_dev_notify (net/mctp/device.c:346 net/mctp/device.c:383)
[ 1035.050716][ T266] notifier_call_chain (kernel/notifier.c:83)
[ 1035.074077][ T266] call_netdevice_notifiers_info (net/core/dev.c:2123)
[ 1035.093920][ T266] unregister_netdevice_many (net/core/dev.c:11114)
[ 1035.113733][ T266] default_device_exit_batch (net/core/dev.c:11643)
[ 1035.132921][ T266] ? autoremove_wake_function (kernel/sched/wait.c:469)
[ 1035.152478][ T266] ? unregister_netdev (net/core/dev.c:11612)
[ 1035.171879][ T266] ? __dev_change_net_namespace (net/core/dev.c:11550)
[ 1035.191700][ T266] ops_exit_list+0x7e/0xc0
[ 1035.211330][ T266] cleanup_net (net/core/net_namespace.c:594 (discriminator 9))
[ 1035.230650][ T266] process_one_work (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:212 include/trace/events/workqueue.h:108 kernel/workqueue.c:2281)
[ 1035.249799][ T266] worker_thread (include/linux/list.h:282 kernel/workqueue.c:2423)
[ 1035.268473][ T266] ? process_one_work (kernel/workqueue.c:2365)
[ 1035.286450][ T266] kthread (kernel/kthread.c:319)
[ 1035.303893][ T266] ? set_kthread_struct (kernel/kthread.c:272)
[ 1035.321688][ T266] ret_from_fork (arch/x86/entry/entry_64.S:301)
[ 1035.339275][ T266] irq event stamp: 1226509
[ 1035.356677][ T266] hardirqs last enabled at (1226509): __call_rcu (arch/x86/include/asm/irqflags.h:29 (discriminator 3) arch/x86/include/asm/irqflags.h:70 (discriminator 3) arch/x86/include/asm/irqflags.h:132 (discriminator 3) kernel/rcu/tree.c:3063 (discriminator 3))
[ 1035.375119][ T266] hardirqs last disabled at (1226508): __call_rcu (kernel/rcu/tree.c:3028 (discriminator 1))
[ 1035.392516][ T266] softirqs last enabled at (1226428): __neigh_ifdown (net/core/neighbour.c:810 net/core/neighbour.c:359)
[ 1035.408958][ T266] softirqs last disabled at (1226426): __neigh_ifdown (net/core/neighbour.c:358)
[ 1035.429648][ T266] ---[ end trace 786fb201382b4b2f ]---
[ 1035.550054][ T801] [main] kernel became tainted! (512/0) Last seed was 2524587618
[ 1035.550097][ T801]
[ 1035.673720][ T801] trinity: Detected kernel tainting. Last seed was 2524587618
[ 1035.673771][ T801]
[ 1035.734438][ T801] [main] exit_reason=7, but 3 children still running.
[ 1035.734485][ T801]
[ 1037.664586][ T801] [main] Bailing main loop because kernel became tainted..
[ 1037.668820][ T801]
[ 1037.741020][ T801] [main] Ran 585 syscalls. Successes: 107 Failures: 477
[ 1037.741073][ T801]
[ 1037.983034][ T801] Command exited with non-zero status 1
[ 1037.983085][ T801]
[ 1038.076815][ T801] 1.28user 37.76system 1:23.36elapsed 46%CPU (0avgtext+0avgdata 50208maxresident)k
[ 1038.076868][ T801]
[ 1038.155654][ T801] 0inputs+0outputs (5463major+5873minor)pagefaults 0swaps
[ 1038.155706][ T801]
[ 1050.440550][ T787] sysrq: Emergency Sync
[ 1050.454033][ T4306] Emergency Sync complete
[ 1050.481001][ T787] sysrq: Resetting
Kboot worker: lkp-worker01
Elapsed time: 1260
kvm=(
qemu-system-x86_64
-enable-kvm
-cpu SandyBridge
-kernel $kernel
-initrd initrd-vm-snb-19.cgz
-m 16384
-smp 2
-device e1000,netdev=net0
-netdev user,id=net0,hostfwd=tcp::32032-:22
-boot order=nc
-no-reboot
-watchdog i6300esb
-watchdog-action debug
-rtc base=localtime
-serial stdio
-display none
-monitor null
)
append=(
ip=::::vm-snb-19::dhcp
root=/dev/ram0
user=lkp
job=/job-script
ARCH=x86_64
kconfig=x86_64-allyesconfig
branch=linus/master
commit=4d8b9319282ae84f5a17b28d8b5b5d1e7e537312
BOOT_IMAGE=/pkg/linux/x86_64-allyesconfig/gcc-9/4d8b9319282ae84f5a17b28d8b5b5d1e7e537312/vmlinuz-5.14.0-rc2-00608-g4d8b9319282a
vmalloc=128M
initramfs_async=0
page_owner=on
max_uptime=2100
RESULT_ROOT=/result/trinity/group-00-99999/vm-snb/debian-10.4-x86_64-20200603.cgz/x86_64-allyesconfig/gcc-9/4d8b9319282ae84f5a17b28d8b5b5d1e7e537312/21
result_service=tmpfs
selinux=0
debug
apic=debug
sysrq_always_enabled
rcupdate.rcu_cpu_stall_timeout=100
net.ifnames=0
printk.devkmsg=on
panic=-1
softlockup_panic=1
nmi_watchdog=panic
oops=panic
load_ramdisk=2
prompt_ramdisk=0
drbd.minor_count=8
systemd.log_level=err
)
To reproduce:
# build kernel
cd linux
cp config-5.14.0-rc2-00608-g4d8b9319282a .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email
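To check whether a run reproduced the issue, the captured console output can be searched for the warning signature shown in the trace above. This is only a minimal sketch, assuming the QEMU serial output was saved to a file; the helper name count_mutex_warnings and the file name console.log are hypothetical and not part of lkp-tests:

```shell
# Hypothetical helper: count __mutex_lock warnings in a saved console log.
# Usage: count_mutex_warnings console.log   (console.log is an assumed file name)
count_mutex_warnings() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the helper usable under "set -e".
    grep -c 'WARNING: .* at kernel/locking/mutex\.c:[0-9]* __mutex_lock' "$1" || true
}
```

A non-zero count suggests the warning was hit; the line number in mutex.c may differ on other kernel versions, which is why the pattern does not pin it to 941.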
---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
Thanks,
Oliver Sang