2023-06-09 07:34:32

by kernel test robot

Subject: [mm] 47ebd0310e: kernel_BUG_at_arch/x86/kernel/irqinit.c


hi, Alexander Potapenko,

in the tests below, both 47ebd0310e and its parent show various issues, but
we found some differences:

both 47ebd0310e and its parent first hit
"WARNING:at_mm/vmalloc.c:#__vmap_pages_range_noflush"
for example:

[ 0.010000][ T0] WARNING: CPU: 0 PID: 0 at mm/vmalloc.c:477 __vmap_pages_range_noflush+0x1182/0x1550

after that, 47ebd0310e hits
"kernel_BUG_at_arch/x86/kernel/irqinit.c", for example:

[ 0.010000][ T0] kernel BUG at arch/x86/kernel/irqinit.c:89!

and panics soon afterwards:

[ 0.010000][ T0] Kernel panic - not syncing: Fatal exception

the parent, however, does not hit
"kernel_BUG_at_arch/x86/kernel/irqinit.c".
it runs further and hits the other issues listed in the table in the full
report below, until:

[ 13.341722][ T1] general protection fault, maybe for address 0x0: 0000 [#1] PREEMPT SMP

then:

[ 13.365591][ T1] Kernel panic - not syncing: Fatal exception

the dmesg logs for both 47ebd0310e and its parent are attached.
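
To put the two attached logs side by side, the following is a minimal sketch
and not part of the test robot's tooling. It assumes the attachments were
saved under the names listed at the end of this message; headline_issues is a
hypothetical helper, and its patterns are taken only from the log lines quoted
in this report.

```shell
# Print the headline issue lines from a dmesg stream read on stdin,
# with the "[ time][ Tn]" prefix stripped so two logs can be diffed.
headline_issues() {
    grep -E 'WARNING: CPU|kernel BUG at|general protection fault|Kernel panic' |
        sed -E 's/^\[[^]]*\]\[[^]]*\] *//'
}

# Usage (assumed file names, taken from the attachment list):
#   xz -dc dmesg.xz | headline_issues
#   xz -dc dmesg-parent-a101482421.xz | headline_issues
```

Stripping the timestamp prefix is a design choice for diffing only; the
original lines remain available in the attached logs.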

we are not sure whether these differences are meaningful, since both kernels
first hit the same issue:
"WARNING:at_mm/vmalloc.c:#__vmap_pages_range_noflush"
(if they are not meaningful, this report can be ignored).
we are also not sure how they relate to the code change in 47ebd0310e.

we are just reporting what we found in our tests for your information, and
hope it provides some hints.

if you need more tests, please let us know. Thanks!

the detailed report is below.


Greetings,

FYI, we noticed the following commit (built with clang-15):

commit: 47ebd0310e89c087f56e58c103c44b72a2f6b216 ("mm: kmsan: handle alloc failures in kmsan_vmap_pages_range_noflush()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):


+--------------------------------------------------------------+------------+------------+
|                                                              | a101482421 | 47ebd0310e |
+--------------------------------------------------------------+------------+------------+
| boot_successes                                               | 0          | 0          |
| boot_failures                                                | 9          | 6          |
| WARNING:at_mm/vmalloc.c:#__vmap_pages_range_noflush          | 9          | 6          |
| RIP:__vmap_pages_range_noflush                               | 9          | 6          |
| WARNING:at_mm/kmsan/shadow.c:#kmsan_vmap_pages_range_noflush | 9          |            |
| RIP:kmsan_vmap_pages_range_noflush                           | 9          |            |
| maybe_for_address#:#[##]                                     | 9          |            |
| RIP:memset                                                   | 9          |            |
| Kernel_panic-not_syncing:Fatal_exception                     | 9          | 6          |
| kernel_BUG_at_arch/x86/kernel/irqinit.c                      | 0          | 6          |
| invalid_opcode:#[##]                                         | 0          | 6          |
| RIP:init_IRQ                                                 | 0          | 6          |
+--------------------------------------------------------------+------------+------------+


If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


[ 0.010000][ T0] kernel BUG at arch/x86/kernel/irqinit.c:89!
[ 0.010000][ T0] invalid opcode: 0000 [#1] PREEMPT SMP
[ 0.010000][ T0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.3.0-rc4-00044-g47ebd0310e89 #1
[ 0.010000][ T0] RIP: 0010:init_IRQ (arch/x86/kernel/irqinit.c:89)
[ 0.010000][ T0] Code: 3f cd fb e9 08 fe ff ff 8b 3a e8 56 3f cd fb e9 10 fe ff ff e8 4c 3f cd fb eb 9d 41 8b bc 24 a8 0f 00 00 e8 3d 3f cd fb eb aa <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57
All code
========
0: 3f (bad)
1: cd fb int $0xfb
3: e9 08 fe ff ff jmp 0xfffffffffffffe10
8: 8b 3a mov (%rdx),%edi
a: e8 56 3f cd fb call 0xfffffffffbcd3f65
f: e9 10 fe ff ff jmp 0xfffffffffffffe24
14: e8 4c 3f cd fb call 0xfffffffffbcd3f65
19: eb 9d jmp 0xffffffffffffffb8
1b: 41 8b bc 24 a8 0f 00 mov 0xfa8(%r12),%edi
22: 00
23: e8 3d 3f cd fb call 0xfffffffffbcd3f65
28: eb aa jmp 0xffffffffffffffd4
2a:* 0f 0b ud2 <-- trapping instruction
2c: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
33: 00 00
35: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
3a: 55 push %rbp
3b: 48 89 e5 mov %rsp,%rbp
3e: 41 57 push %r15

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
9: 00 00
b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
10: 55 push %rbp
11: 48 89 e5 mov %rsp,%rbp
14: 41 57 push %r15
[ 0.010000][ T0] RSP: 0000:ffffffff85403e68 EFLAGS: 00010082
[ 0.010000][ T0] RAX: 0000000000000000 RBX: 00000000fffffff4 RCX: 0000000000000000
[ 0.010000][ T0] RDX: ffff88841f48d3d0 RSI: 0000000000000004 RDI: 0000000000262088
[ 0.010000][ T0] RBP: ffffffff85403eb8 R08: ffffea000000000f R09: ffff88843ffbf000
[ 0.010000][ T0] R10: ffff8883ef369a98 R11: 0000000000000000 R12: ffffffff85416b48
[ 0.010000][ T0] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000010
[ 0.010000][ T0] FS: 0000000000000000(0000) GS:ffff88843f600000(0000) knlGS:0000000000000000
[ 0.010000][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.010000][ T0] CR2: ffff88843ffff000 CR3: 000000000545a000 CR4: 00000000000406b0
[ 0.010000][ T0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.010000][ T0] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.010000][ T0] Call Trace:
[ 0.010000][ T0] <TASK>
[ 0.010000][ T0] start_kernel (init/main.c:1050)
[ 0.010000][ T0] ? __msan_metadata_ptr_for_load_8 (arch/x86/include/asm/smap.h:56 mm/kmsan/instrumentation.c:37 mm/kmsan/instrumentation.c:92)
[ 0.010000][ T0] x86_64_start_reservations (arch/x86/kernel/head64.c:557)
[ 0.010000][ T0] x86_64_start_kernel (arch/x86/kernel/head64.c:538)
[ 0.010000][ T0] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
[ 0.010000][ T0] </TASK>
[ 0.010000][ T0] Modules linked in:
[ 0.010000][ T0] ---[ end trace 0000000000000000 ]---
[ 0.010000][ T0] RIP: 0010:init_IRQ (arch/x86/kernel/irqinit.c:89)
[ 0.010000][ T0] Code: 3f cd fb e9 08 fe ff ff 8b 3a e8 56 3f cd fb e9 10 fe ff ff e8 4c 3f cd fb eb 9d 41 8b bc 24 a8 0f 00 00 e8 3d 3f cd fb eb aa <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57
All code
========
0: 3f (bad)
1: cd fb int $0xfb
3: e9 08 fe ff ff jmp 0xfffffffffffffe10
8: 8b 3a mov (%rdx),%edi
a: e8 56 3f cd fb call 0xfffffffffbcd3f65
f: e9 10 fe ff ff jmp 0xfffffffffffffe24
14: e8 4c 3f cd fb call 0xfffffffffbcd3f65
19: eb 9d jmp 0xffffffffffffffb8
1b: 41 8b bc 24 a8 0f 00 mov 0xfa8(%r12),%edi
22: 00
23: e8 3d 3f cd fb call 0xfffffffffbcd3f65
28: eb aa jmp 0xffffffffffffffd4
2a:* 0f 0b ud2 <-- trapping instruction
2c: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
33: 00 00
35: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
3a: 55 push %rbp
3b: 48 89 e5 mov %rsp,%rbp
3e: 41 57 push %r15

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
9: 00 00
b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
10: 55 push %rbp
11: 48 89 e5 mov %rsp,%rbp
14: 41 57 push %r15
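
One detail in the register dump may be worth a look. This is an observation,
not something the test run verified: RBX holds 0x00000000fffffff4, and its
low 32 bits read as a signed integer are -12, which is the negated value of
ENOMEM on Linux. Whether RBX actually carries the return value checked at
irqinit.c:89 is an assumption. The arithmetic can be sketched as follows;
to_s32 is a hypothetical helper.

```shell
# Interpret a 32-bit register value as a signed integer.
to_s32() {
    v=$(( $1 ))
    # Values with the top bit set are negative in two's complement.
    if [ "$v" -ge 2147483648 ]; then
        v=$(( v - 4294967296 ))
    fi
    echo "$v"
}

to_s32 0xfffffff4   # low 32 bits of RBX from the dump above; prints -12
```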


To reproduce:

# build kernel
cd linux
cp config-6.3.0-rc4-00044-g47ebd0310e89 .config
make HOSTCC=clang-15 CC=clang-15 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=clang-15 CC=clang-15 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
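
The call trace above goes through KMSAN instrumentation
(__msan_metadata_ptr_for_load_8) and the commit under test changes KMSAN code,
so the issue presumably reproduces only with CONFIG_KMSAN enabled. A minimal
sketch for checking the copied config before building; config_has is a
hypothetical helper, not part of lkp-tests.

```shell
# Succeed if the kernel config file $1 sets option $2 to "y".
config_has() {
    grep -q "^$2=y\$" "$1"
}

# Usage, run from the kernel tree after copying the attached config:
#   config_has .config CONFIG_KMSAN && echo "KMSAN enabled"
```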



--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
(No filename) (9.10 kB)
config-6.3.0-rc4-00044-g47ebd0310e89 (138.58 kB)
job-script (4.80 kB)
dmesg.xz (4.93 kB)
dmesg-parent-a101482421.xz (7.74 kB)