Greeting,
FYI, we noticed the following commit (built with clang-14):
commit: 1391efa952e8b22088f8626fc63ade26767b92d6 ("gcov: use kvmalloc()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: boot
on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>
[ 119.538115][ T207] BUG: KASAN: slab-out-of-bounds in gcov_info_add (kernel/gcov/clang.c:328)
[ 119.539782][ T207] Read of size 8 at addr ffff88810a985328 by task systemd-udevd/207
[ 119.541504][ T207]
[ 119.542060][ T207] CPU: 1 PID: 207 Comm: systemd-udevd Not tainted 5.12.0-11517-g1391efa952e8 #2
[ 119.544625][ T207] Call Trace:
[ 119.544673][ T207] dump_stack (lib/dump_stack.c:79 lib/dump_stack.c:120)
[ 119.546417][ T207] ? gcov_info_add (kernel/gcov/clang.c:328)
[ 119.547527][ T207] print_address_description (mm/kasan/report.c:234)
[ 119.548808][ T207] ? _raw_spin_lock_irqsave (kernel/locking/spinlock.c:159)
0m] Listening on[ 119.552074][ T207] kasan_report (mm/kasan/report.c:420 mm/kasan/report.c:436)
D-Bus [ 119.553265][ T207] ? gcov_info_add (kernel/gcov/clang.c:328)
System Message B[ 119.554491][ T207] __asan_report_load8_noabort (mm/kasan/report_generic.c:309)
us Socket.[ 119.555899][ T207] gcov_info_add (kernel/gcov/clang.c:328)
[ 119.557116][ T207] gcov_event (kernel/gcov/fs.c:794)
[ 119.558212][ T207] gcov_module_notifier (kernel/gcov/base.c:120)
[ 119.559371][ T207] notifier_call_chain (kernel/notifier.c:83)
[ 119.560600][ T207] blocking_notifier_call_chain (kernel/notifier.c:337)
[ 119.560664][ T207] do_init_module (kernel/module.c:3750)
[ 119.562888][ T207] load_module (kernel/module.c:4060)
[ 119.563941][ T207] ? init_wait_entry (kernel/sched/wait.c:403)
0m] Reached targ[ 119.566476][ T207] do_syscall_64 (arch/x86/entry/common.c:47)
et Sock[ 119.567629][ T207] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:112)
ets.[ 119.569056][ T207] RIP: 0033:0x7fc466b18f59
[ 119.570136][ T207] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 07 6f 0c 00 f7 d8 64 89 01 48
All code
========
0: 00 c3 add %al,%bl
2: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
9: 00 00 00
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 48 89 f8 mov %rdi,%rax
14: 48 89 f7 mov %rsi,%rdi
17: 48 89 d6 mov %rdx,%rsi
1a: 48 89 ca mov %rcx,%rdx
1d: 4d 89 c2 mov %r8,%r10
20: 4d 89 c8 mov %r9,%r8
23: 4c 8b 4c 24 08 mov 0x8(%rsp),%r9
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 retq
33: 48 8b 0d 07 6f 0c 00 mov 0xc6f07(%rip),%rcx # 0xc6f41
3a: f7 d8 neg %eax
3c: 64 89 01 mov %eax,%fs:(%rcx)
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 retq
9: 48 8b 0d 07 6f 0c 00 mov 0xc6f07(%rip),%rcx # 0xc6f17
10: f7 d8 neg %eax
12: 64 89 01 mov %eax,%fs:(%rcx)
15: 48 rex.W
[ 119.574787][ T207] RSP: 002b:00007ffe0dd0f608 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 119.576607][ T207] RAX: ffffffffffffffda RBX: 000055766851e850 RCX: 00007fc466b18f59
[ 119.578303][ T207] RDX: 0000000000000000 RSI: 00007fc466a1dcad RDI: 000000000000000f
[ 119.580018][ T207] RBP: 00007fc466a1dcad R08: 0000000000000000 R09: 0000000000000000
[ 119.581712][ T207] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000000000
[ 119.583416][ T207] R13: 0000557668514ce0 R14: 0000000000020000 R15: 000055766851e850
[ 119.585244][ T207]
[ 119.585788][ T207] Allocated by task 195:
[ 119.586780][ T207] ____kasan_kmalloc (mm/kasan/common.c:39 mm/kasan/common.c:46 mm/kasan/common.c:428 mm/kasan/common.c:507)
[ 119.587889][ T207] __kasan_kmalloc (mm/kasan/common.c:516)
[ 119.588931][ T207] __kmalloc (include/linux/kasan.h:246)
[ 119.588961][ T207] kvmalloc_node (mm/util.c:593)
[ 119.588989][ T207] gcov_info_dup (include/linux/mm.h:804 kernel/gcov/clang.c:371 kernel/gcov/clang.c:404)
[ 119.589017][ T207] gcov_event (kernel/gcov/fs.c:796 kernel/gcov/fs.c:820 kernel/gcov/fs.c:853)
[ 119.596780][ T207] gcov_module_notifier (kernel/gcov/base.c:120)
1;39mDaily apt d[ 119.600624][ T207] do_init_module (kernel/module.c:3750)
ownload activiti[ 119.601829][ T207] load_module (kernel/module.c:4060)
es.[ 119.603037][ T207] __x64_sys_finit_module (kernel/module.c:4151 kernel/module.c:4128 kernel/module.c:4128)
[ 119.604319][ T207] do_syscall_64 (arch/x86/entry/common.c:47)
[ 119.605277][ T207] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:112)
[ 119.606539][ T207]
[ 119.609202][ T207] The buggy address belongs to the object at ffff88810a985320
[ 119.609202][ T207] which belongs to the cache kmalloc-8 of size 8
[ 119.612113][ T207] The buggy address is located 0 bytes to the right of
[ 119.612113][ T207] 8-byte region [ffff88810a985320, ffff88810a985328)
[ 119.615017][ T207] The buggy address belongs to the page:
[ 119.616320][ T207] page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10a985
[ 119.618522][ T207] flags: 0x8000000000000200(slab|zone=2)
[ 119.619804][ T207] raw: 8000000000000200 ffffea00042afec0 0000000300000003 ffff888100041280
[ 119.619832][ T207] raw: 0000000000000000 0000000000660066 00000001ffffffff 0000000000000000
[ 119.619853][ T207] page dumped because: kasan: bad access detected
[ 119.619926][ T207] page_owner tracks the page as allocated
[ 119.625991][ T207] page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 1, ts 34574012785, free_ts 0
1;39mDaily rotat[ 119.631773][ T207] __alloc_pages (mm/page_alloc.c:5201)
ion of log files[ 119.632961][ T207] new_slab (mm/slub.c:344 mm/slub.c:1784 mm/slub.c:1847)
.[ 119.634055][ T207] ___slab_alloc (mm/slub.c:2594 mm/slub.c:2756)
[ 119.635185][ T207] __slab_alloc (mm/slub.c:2796)
[ 119.636168][ T207] __kmalloc (mm/slub.c:2878 mm/slub.c:2920 mm/slub.c:4063)
[ 119.637106][ T207] acpi_ns_internalize_name (drivers/acpi/acpica/nsutils.c:332)
[ 119.638379][ T207] acpi_ns_get_node_unlocked (drivers/acpi/acpica/nsutils.c:667)
[ 119.641517][ T207] acpi_ns_get_node (drivers/acpi/acpica/nsutils.c:726)
[ 119.642568][ T207] acpi_get_handle (drivers/acpi/acpica/nsxfname.c:99)
[ 119.643648][ T207] acpi_has_method (drivers/acpi/utils.c:539)
[ 119.644724][ T207] acpi_is_video_device (drivers/acpi/scan.c:1212)
[ 119.645877][ T207] acpi_init_device_object (drivers/acpi/scan.c:1352 drivers/acpi/scan.c:1662)
[ 119.645908][ T207] acpi_add_single_object (drivers/acpi/scan.c:1705)
[ 119.645936][ T207] acpi_bus_check_add (drivers/acpi/scan.c:1951)
[ 119.645968][ T207] page_owner free stack trace missing
[ 119.650485][ T207]
1;39mDaily apt u[ 119.654379][ T207] ffff88810a985280: 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc 00
pgrade and clean[ 119.656297][ T207] >ffff88810a985300: fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc 00 fc
activities.[ 119.658193][ T207] ^
[ 119.659563][ T207] ffff88810a985380: fc fc fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc
[ 119.661299][ T207] ffff88810a985400: fc fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc
[ 119.663111][ T207] ==================================================================
[ 119.664832][ T207] Disabling lock debugging due to kernel taint
[ OK ] Reached target Timers.
[ OK ] Reached target Basic System.
[ OK ] Started Regular background program processing daemon.
[ OK ] Started D-Bus System Message Bus.
Starting LSB: OpenIPMI Driver init script...
Starting System Logging Service...
[ 120.038787][ C0] random: fast init done
Starting Login Service...
[ 120.181505][ T238] _warn_unseeded_randomness: 80 callbacks suppressed
[ 120.181528][ T238] random: get_random_u64 called from arch_pick_mmap_layout+0x178/0x550 with crng_init=1
[ 120.181573][ T238] random: get_random_u64 called from randomize_stack_top+0x56/0xb0 with crng_init=1
[ 120.181786][ T238] random: get_random_u64 called from arch_mmap_rnd+0x80/0x120 with crng_init=1
Starting Helper to synchronize boot up for ifupdown...
[ OK ] Started Helper to synchronize boot up for ifupdown.
[ OK ] Started Login Service.
Starting LSB: Load kernel image with kexec...
Starting Raise network interfaces...
[ OK ] Started LSB: Load kernel image with kexec.
[ 121.015633][ T258] ipmi_si: IPMI System Interface driver
[ 121.024502][ T258] ipmi_si: Unable to find any System Interface(s)
[ OK ] Started Raise network interfaces.
[ OK ] Reached target Network.
Starting LKP bootstrap...
[ 121.191889][ T272] _warn_unseeded_randomness: 195 callbacks suppressed
[ 121.191917][ T272] random: get_random_bytes called from key_alloc+0x9c8/0x10a0 with crng_init=1
[ 121.196979][ T273] random: get_random_u64 called from arch_pick_mmap_layout+0x178/0x550 with crng_init=1
[ 121.197012][ T273] random: get_random_u64 called from randomize_stack_top+0x56/0xb0 with crng_init=1
Starting Permit User Sessions...
Starting OpenBSD Secure Shell server...
Starting /etc/rc.local Compatibility...
[FAILED] Failed to start LSB: OpenIPMI Driver init script.
See 'systemctl status openipmi.service' for details.
[ OK ] Started Permit User Sessions.
[ OK ] Started LKP bootstrap.
[ 122.253005][ T301] _warn_unseeded_randomness: 176 callbacks suppressed
To reproduce:
# build kernel
cd linux
cp config-5.12.0-11517-g1391efa952e8 .config
make HOSTCC=clang-14 CC=clang-14 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email
# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
Thanks,
Oliver Sang
On Thu, 2021-11-25 at 14:26 +0000, Sang, Oliver wrote:
>
> Greeting,
>
> FYI, we noticed the following commit (built with clang-14):
>
> commit: 1391efa952e8b22088f8626fc63ade26767b92d6 ("gcov: use kvmalloc()")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: boot
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
I don't think it actually *caused* the issue, but who knows? To me it
rather looks like it exposed an issue, which wasn't noticed before
because vmalloc() always allocates a page anyway? Assuming KASAN doesn't
track vmalloc() allocations more granular than their actual page-aligned
memory size (though perhaps it should?)
In any case, reviewing my commit again I see nothing wrong there, and I
don't understand the code/clang well enough to see what might be the
issue.
johannes
On Thu, Nov 25, 2021 at 7:45 AM Johannes Berg <[email protected]> wrote:
>
> In any case, reviewing my commit again I see nothing wrong there, and I
> don't understand the code/clang well enough to see what might be the
> issue.
Yeah, looks like the kvmalloc() conversion simply tightened the
checking boundaries, now that it might be using kmalloc() instead of
full pages.
The report isn't wonderful (the KASAN report is garbled, and the "code
disassembly" is not from the actual KASAN report, it's from the return
to user space code), but it blames a 8-byte read at
gcov_info_add (kernel/gcov/clang.c:328)
which in that kernel version (1391efa952e8) is that
for (i = 0; i < sfn_ptr->num_counters; i++)
-> dfn_ptr->counters[i] += sfn_ptr->counters[i];
and so it looks like that 'sfn_ptr->counters[i]' access is the problem case.
The allocation was
gcov_info_dup (include/linux/mm.h:804 kernel/gcov/clang.c:371
kernel/gcov/clang.c:404)
and in particular, it's the inlined gcov_fn_info_dup() at line 371:
cv_size = fn->num_counters * sizeof(fn->counters[0]);
fn_dup->counters = kvmalloc(cv_size, GFP_KERNEL);
but I'm not seeing how that could fail.
Sure, that code doesn't explicitly ever set "fn_dup->num_counters",
but it should get set thanks to the
fn_dup = kmemdup(fn, sizeof(*fn), GFP_KERNEL);
so I don't see anything wrong there, but I'm adding some of the
involved people to the participants in case they see what's going on..
Linus
On Thu, 2021-11-25 at 09:48 -0800, Linus Torvalds wrote:
>
> The report isn't wonderful (the KASAN report is garbled, and the "code
> disassembly" is not from the actual KASAN report, it's from the return
> to user space code), but it blames a 8-byte read at
>
> gcov_info_add (kernel/gcov/clang.c:328)
>
> which in that kernel version (1391efa952e8) is that
>
> for (i = 0; i < sfn_ptr->num_counters; i++)
> -> dfn_ptr->counters[i] += sfn_ptr->counters[i];
>
> and so it looks like that 'sfn_ptr->counters[i]' access is the problem case.
>
Not sure. Note the +=, so it has to read both sides first. Would KASAN
report the follow-up write as well?
The loop is going to sfn_ptr->num_counters, so that seems less likely to
get confused, unless clang has a bug, which is where the original
counters and num_counters comes from, afaict?
I mean, we seem to always consistently allocate counters from
num_counters for these things, but here we use sfn_ptr and dfn_ptr, and
the loop checks only dfn_ptr. Just being suspicious I guess, but I
really don't know anything about this. All I know that vmalloc made this
really slow on ARCH=um and so I converted all of this ... :)
johannes