Greeting,
FYI, we noticed the following commit (built with gcc-9):
commit: e47110e90584a22e9980510b00d0dfad3a83354e ("mm/vunmap: add cond_resched() in vunmap_pmd_range")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: rcutorture
version:
with following parameters:
runtime: 300s
test: cpuhotplug
torture_type: srcu
test-description: rcutorture is rcutorture kernel module load/unload test.
test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt
on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
+-------------------------------------------------+------------+------------+
| | f3f99d63a8 | e47110e905 |
+-------------------------------------------------+------------+------------+
| WARNING:at_mm/vmalloc.c:#__vunmap | 0 | 12 |
| RIP:__vunmap | 0 | 12 |
| RIP:kfree | 0 | 12 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 12 |
| stack_segment:#[##] | 0 | 11 |
| WARNING:at_lib/kobject.c:#kobject_add_internal | 0 | 1 |
| RIP:kobject_add_internal | 0 | 1 |
+-------------------------------------------------+------------+------------+
If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>
[ 198.731223] WARNING: CPU: 0 PID: 1948 at mm/vmalloc.c:2247 __vunmap (kbuild/src/consumer/mm/vmalloc.c:2247 (discriminator 1))
[ 198.731996] Modules linked in: rcutorture torture intel_rapl_msr intel_rapl_common iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel input_leds bochs_drm serio_raw rtc_cmos qemu_fw_cfg stm_p_basic
[ 198.734078] CPU: 0 PID: 1948 Comm: systemd-udevd Not tainted 5.9.0-rc1-00107-ge47110e90584a #1
[ 198.734996] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 198.735873] RIP: 0010:__vunmap (kbuild/src/consumer/mm/vmalloc.c:2247 (discriminator 1))
[ 198.736322] Code: ff e9 61 ff ff ff 31 d2 31 f6 48 c7 c7 ff ff ff ff e8 11 53 ff ff e9 4c ff ff ff 48 89 fe 48 c7 c7 60 ec 6c bb e8 69 22 bf ff <0f> 0b 48 83 c4 40 5b 5d 41 5c 41 5d 41 5e 41 5f c3 4c 89 e6 48 c7
All code
========
0: ff (bad)
1: e9 61 ff ff ff jmpq 0xffffffffffffff67
6: 31 d2 xor %edx,%edx
8: 31 f6 xor %esi,%esi
a: 48 c7 c7 ff ff ff ff mov $0xffffffffffffffff,%rdi
11: e8 11 53 ff ff callq 0xffffffffffff5327
16: e9 4c ff ff ff jmpq 0xffffffffffffff67
1b: 48 89 fe mov %rdi,%rsi
1e: 48 c7 c7 60 ec 6c bb mov $0xffffffffbb6cec60,%rdi
25: e8 69 22 bf ff callq 0xffffffffffbf2293
2a:* 0f 0b ud2 <-- trapping instruction
2c: 48 83 c4 40 add $0x40,%rsp
30: 5b pop %rbx
31: 5d pop %rbp
32: 41 5c pop %r12
34: 41 5d pop %r13
36: 41 5e pop %r14
38: 41 5f pop %r15
3a: c3 retq
3b: 4c 89 e6 mov %r12,%rsi
3e: 48 rex.W
3f: c7 .byte 0xc7
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 48 83 c4 40 add $0x40,%rsp
6: 5b pop %rbx
7: 5d pop %rbp
8: 41 5c pop %r12
a: 41 5d pop %r13
c: 41 5e pop %r14
e: 41 5f pop %r15
10: c3 retq
11: 4c 89 e6 mov %r12,%rsi
14: 48 rex.W
15: c7 .byte 0xc7
[ 198.738277] RSP: 0018:ffff88838d59fa08 EFLAGS: 00010286
[ 198.738865] RAX: 0000000000000000 RBX: ffffffffc0131f40 RCX: 0000000000000000
[ 198.739618] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed1071ab3f37
[ 198.740411] RBP: ffffffffbc137640 R08: 0000000000000001 R09: ffffed1075d7c2e8
[ 198.741166] R10: ffff8883aebe173b R11: ffffed1075d7c2e7 R12: f8f8f8f8f8f8f8f8
[ 198.741913] R13: 00000000000008f8 R14: 0000000000000017 R15: ffff888362f80000
[ 198.742676] FS: 0000000000000000(0000) GS:ffff8883aea00000(0063) knlGS:00000000f7c71800
[ 198.743553] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 198.744173] CR2: 00000000566ea328 CR3: 00000003914b2000 CR4: 00000000000406b0
[ 198.744933] Call Trace:
[ 198.745229] free_module (kbuild/src/consumer/kernel/module.c:2251)
[ 198.745629] do_init_module (kbuild/src/consumer/kernel/module.c:3705)
[ 198.746054] ? rcu_read_lock_bh_held (kbuild/src/consumer/kernel/rcu/update.c:131)
[ 198.746576] load_module (kbuild/src/consumer/kernel/module.c:3968)
[ 198.747016] ? post_relocation (kbuild/src/consumer/kernel/module.c:3822)
[ 198.747510] ? kernel_read_file (kbuild/src/consumer/arch/x86/include/asm/atomic.h:95 kbuild/src/consumer/include/asm-generic/atomic-instrumented.h:241 kbuild/src/consumer/include/linux/fs.h:2829 kbuild/src/consumer/include/linux/fs.h:2826 kbuild/src/consumer/fs/exec.c:1017 kbuild/src/consumer/fs/exec.c:952)
[ 198.747982] ? __do_sys_finit_module (kbuild/src/consumer/kernel/module.c:4058)
[ 198.748477] __do_sys_finit_module (kbuild/src/consumer/kernel/module.c:4058)
[ 198.748960] ? __ia32_sys_init_module (kbuild/src/consumer/kernel/module.c:4035)
[ 198.749475] ? lockdep_hardirqs_on_prepare (kbuild/src/consumer/kernel/locking/lockdep.c:3637 kbuild/src/consumer/kernel/locking/lockdep.c:3697 kbuild/src/consumer/kernel/locking/lockdep.c:3649)
[ 198.750054] ? syscall_enter_from_user_mode (kbuild/src/consumer/arch/x86/include/asm/paravirt.h:780 kbuild/src/consumer/kernel/entry/common.c:78)
[ 198.750625] ? trace_hardirqs_on (kbuild/src/consumer/kernel/trace/trace_preemptirq.c:50 (discriminator 22))
[ 198.751083] ? lockdep_hardirqs_on (kbuild/src/consumer/kernel/locking/lockdep.c:3747 (discriminator 1))
[ 198.751551] __do_fast_syscall_32 (kbuild/src/consumer/arch/x86/entry/common.c:84 kbuild/src/consumer/arch/x86/entry/common.c:126)
[ 198.752008] do_fast_syscall_32 (kbuild/src/consumer/arch/x86/entry/common.c:149)
[ 198.752445] entry_SYSENTER_compat_after_hwframe (kbuild/src/consumer/arch/x86/entry/entry_64_compat.S:141)
[ 198.753029] RIP: 0023:0xf7fa8549
[ 198.753422] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
All code
========
0: 03 74 c0 01 add 0x1(%rax,%rax,8),%esi
4: 10 05 03 74 b8 01 adc %al,0x1b87403(%rip) # 0x1b8740d
a: 10 06 adc %al,(%rsi)
c: 03 74 b4 01 add 0x1(%rsp,%rsi,4),%esi
10: 10 07 adc %al,(%rdi)
12: 03 74 b0 01 add 0x1(%rax,%rsi,4),%esi
16: 10 08 adc %cl,(%rax)
18: 03 74 d8 01 add 0x1(%rax,%rbx,8),%esi
1c: 00 00 add %al,(%rax)
1e: 00 00 add %al,(%rax)
20: 00 51 52 add %dl,0x52(%rcx)
23: 55 push %rbp
24: 89 e5 mov %esp,%ebp
26: 0f 34 sysenter
28: cd 80 int $0x80
2a:* 5d pop %rbp <-- trapping instruction
2b: 5a pop %rdx
2c: 59 pop %rcx
2d: c3 retq
2e: 90 nop
2f: 90 nop
30: 90 nop
31: 90 nop
32: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
39: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
Code starting with the faulting instruction
===========================================
0: 5d pop %rbp
1: 5a pop %rdx
2: 59 pop %rcx
3: c3 retq
4: 90 nop
5: 90 nop
6: 90 nop
7: 90 nop
8: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
f: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
To reproduce:
# build kernel
cd linux
cp config-5.9.0-rc1-00107-ge47110e90584a .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email
---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
Thanks,
Oliver Sang
On Thu, Apr 22, 2021 at 11:15 PM kernel test robot
<[email protected]> wrote:
>
> commit: e47110e90584a22e9980510b00d0dfad3a83354e ("mm/vunmap: add cond_resched() in vunmap_pmd_range")
Funky. That commit doesn't seem to have anything to do with the oops.
The oops is odd too:
> [ 198.731223] WARNING: CPU: 0 PID: 1948 at mm/vmalloc.c:2247 __vunmap (kbuild/src/consumer/mm/vmalloc.c:2247 (discriminator 1))
That's the warning for an unaligned vunmap():
2247 if (WARN(!PAGE_ALIGNED(addr), "Trying to vfree() bad
address (%p)\n",
2248 addr))
2249 return;
> [ 198.744933] Call Trace:
> [ 198.745229] free_module (kbuild/src/consumer/kernel/module.c:2251)
2248 /* This may be empty, but that's OK */
2249 module_arch_freeing_init(mod);
2250 module_memfree(mod->init_layout.base);
2251 kfree(mod->args);
That's the "module_memfree()" - the return address points to the
return point, which is the next line.
And as far as I can tell, the only thing that assigns anything but
NULL to that init_layout.base is
ptr = module_alloc(mod->init_layout.size);
which uses __vmalloc_node_range() for the allocation.
So absolutely nothing in this report makes sense to me. I suspect it's
some odd memory corruption.
Oliver - how reliable is that bisection?
Does anybody else see what might be up?
Linus
Hi Linus,
On Fri, Apr 23, 2021 at 10:18:18AM -0700, Linus Torvalds wrote:
> On Thu, Apr 22, 2021 at 11:15 PM kernel test robot
> <[email protected]> wrote:
> >
> > commit: e47110e90584a22e9980510b00d0dfad3a83354e ("mm/vunmap: add cond_resched() in vunmap_pmd_range")
>
> Funky. That commit doesn't seem to have anything to do with the oops.
>
> The oops is odd too:
>
> > [ 198.731223] WARNING: CPU: 0 PID: 1948 at mm/vmalloc.c:2247 __vunmap (kbuild/src/consumer/mm/vmalloc.c:2247 (discriminator 1))
>
> That's the warning for an unaligned vunmap():
>
> 2247 if (WARN(!PAGE_ALIGNED(addr), "Trying to vfree() bad
> address (%p)\n",
> 2248 addr))
> 2249 return;
>
> > [ 198.744933] Call Trace:
> > [ 198.745229] free_module (kbuild/src/consumer/kernel/module.c:2251)
>
> 2248 /* This may be empty, but that's OK */
> 2249 module_arch_freeing_init(mod);
> 2250 module_memfree(mod->init_layout.base);
> 2251 kfree(mod->args);
>
> That's the "module_memfree()" - the return address points to the
> return point, which is the next line.
>
> And as far as I can tell, the only thing that assigns anything but
> NULL to that init_layout.base is
>
> ptr = module_alloc(mod->init_layout.size);
>
> which uses __vmalloc_node_range() for the allocation.
>
> So absolutely nothing in this report makes sense to me. I suspect it's
> some odd memory corruption.
>
> Oliver - how reliable is that bisection?
we will check further if any issue in our test env.
by bot auto tests, we saw 12 issue instances out of 74 runs. but not happen
out of 100 runs of parent.
f3f99d63a8156c7a e47110e90584a22e9980510b00d
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
1:100 -1% :74 dmesg.BUG:kernel_reboot-without-warning_in_test_stage
2:100 0% 2:74 dmesg.BUG:unable_to_handle_page_fault_for_address
:100 12% 12:74 dmesg.Kernel_panic-not_syncing:Fatal_exception
2:100 0% 2:74 dmesg.Oops:#[##]
1:100 -1% :74 dmesg.RIP:__is_module_percpu_address
:100 12% 12:74 dmesg.RIP:__vunmap <-----
:100 12% 12:74 dmesg.RIP:kfree
:100 1% 1:74 dmesg.RIP:kobject_add_internal
2:100 -1% 1:74 dmesg.RIP:print_modules
1:100 -1% :74 dmesg.RIP:skip_spaces
1:100 -1% :74 dmesg.RIP:usercopy_abort
:100 1% 1:74 dmesg.WARNING:at_lib/kobject.c:#kobject_add_internal
:100 12% 12:74 dmesg.WARNING:at_mm/vmalloc.c:#__vunmap
3:100 10% 13:74 dmesg.boot_failures
1:100 -1% :74 dmesg.canonical_address#:#[##]
2:100 -2% :74 dmesg.invalid_opcode:#[##]
2:100 -2% :74 dmesg.kernel_BUG_at_mm/usercopy.c
:100 11% 11:74 dmesg.stack_segment:#[##]
>
> Does anybody else see what might be up?
>
> Linus
On Sat, Apr 24, 2021 at 6:31 PM Oliver Sang <[email protected]> wrote:
> >
> > Oliver - how reliable is that bisection?
>
> we will check further if any issue in our test env.
>
> by bot auto tests, we saw 12 issue instances out of 74 runs. but not happen
> out of 100 runs of parent.
Oh, that's interesting. So only 12 out of 74 runs saw that __vunmap
warning, but if I understand your table correctly, there were some
_other_ issues in there?
Are those also for that same commit? (ie those RIP:kfree /
RIP:kobject_add_internal / etc)?
I'm not sure how to read that table of yours - if I understand it
correctly, it looks like the parent commit had some different ones
that the child did not (eg 2 cases of BUG_at_mm/usercopy.c?)
So it feels to me like there's some memory corruption somewhere, and
that commit that it bisected to likely just changed the failure case
(due to timing differences or allocation ordering changes).
IOW, there seem to be other panics even in the parent.
Yes/No?
Linus
Hi Linus,
On 4/24/2021 1:18 AM, Linus Torvalds wrote:
> On Thu, Apr 22, 2021 at 11:15 PM kernel test robot
> <[email protected]> wrote:
>>
>> commit: e47110e90584a22e9980510b00d0dfad3a83354e ("mm/vunmap: add cond_resched() in vunmap_pmd_range")
>
> Funky. That commit doesn't seem to have anything to do with the oops.
>
> The oops is odd too:
>
>> [ 198.731223] WARNING: CPU: 0 PID: 1948 at mm/vmalloc.c:2247 __vunmap (kbuild/src/consumer/mm/vmalloc.c:2247 (discriminator 1))
>
> That's the warning for an unaligned vunmap():
>
> 2247 if (WARN(!PAGE_ALIGNED(addr), "Trying to vfree() bad
> address (%p)\n",
> 2248 addr))
> 2249 return;
>
>> [ 198.744933] Call Trace:
>> [ 198.745229] free_module (kbuild/src/consumer/kernel/module.c:2251)
>
> 2248 /* This may be empty, but that's OK */
> 2249 module_arch_freeing_init(mod);
> 2250 module_memfree(mod->init_layout.base);
We add debug code to print logs when mod->init_layout.base is NULL,
after more than 100 times test, we find that when mod->init_layout.base
is NULL, no align warning happened. From the descriptions of vfree, if
@addr is NULL, no operation is performed. So when the warning happened,
the mod->init_layout.base is not a NULL.
void vfree(const void *addr)
{
BUG_ON(in_nmi());
kmemleak_free(addr);
might_sleep_if(!in_interrupt());
if (!addr)
return;
__vfree(addr);
}
static void __vfree(const void *addr)
{
if (unlikely(in_interrupt()))
__vfree_deferred(addr);
else
__vunmap(addr, 1);
}
> 2251 kfree(mod->args);
>
> That's the "module_memfree()" - the return address points to the
> return point, which is the next line.
>
> And as far as I can tell, the only thing that assigns anything but
> NULL to that init_layout.base is
>
> ptr = module_alloc(mod->init_layout.size);
>
> which uses __vmalloc_node_range() for the allocation.
>
> So absolutely nothing in this report makes sense to me. I suspect it's
> some odd memory corruption.
>
> Oliver - how reliable is that bisection?
>
> Does anybody else see what might be up?
>
> Linus
> _______________________________________________
> LKP mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>
--
Zhengjun Xing
Hi, Linus,
On Sat, Apr 24, 2021 at 07:04:00PM -0700, Linus Torvalds wrote:
> On Sat, Apr 24, 2021 at 6:31 PM Oliver Sang <[email protected]> wrote:
> > >
> > > Oliver - how reliable is that bisection?
> >
> > we will check further if any issue in our test env.
> >
> > by bot auto tests, we saw 12 issue instances out of 74 runs. but not happen
> > out of 100 runs of parent.
>
> Oh, that's interesting. So only 12 out of 74 runs saw that __vunmap
> warning, but if I understand your table correctly, there were some
> _other_ issues in there?
yes, there are.
>
> Are those also for that same commit? (ie those RIP:kfree /
yes. the 1st column (fail:runs) is for parent,
the 3rd colum (fail:runs) is for e47110e905
> RIP:kobject_add_internal / etc)?
the RIP:kfree happens just after __vunmap warning (as attached dmesg):
===================================================================
[ 198.730073] ------------[ cut here ]------------
[ 198.730597] Trying to vfree() bad address (0000000070935066)
[ 198.731223] WARNING: CPU: 0 PID: 1948 at mm/vmalloc.c:2247 __vunmap+0x663/0x990
..
[ 198.763940] ---[ end trace 572fd76a7879a124 ]---
[ 198.764449] stack segment: 0000 [#1] SMP KASAN
[ 198.764933] CPU: 0 PID: 1948 Comm: systemd-udevd Tainted: G W 5.9.0-rc1-00107-ge47110e90584a #1
[ 198.765985] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 198.766872] RIP: 0010:kfree+0x64/0x360
==================================================================
and it did not happen on parent:
f3f99d63a8156c7a e47110e90584a22e9980510b00d
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:998 11% 108:432 dmesg.RIP:kfree
RIP:kobject_add_internal happens on both commits after more runs,
but much lower happen rate than __vunmap (or kfree) on e47110e905.
:998 13% 125:432 dmesg.RIP:__vunmap
:998 11% 108:432 dmesg.RIP:kfree
11:998 -1% 6:432 dmesg.RIP:kobject_add_internal
>
> I'm not sure how to read that table of yours - if I understand it
> correctly, it looks like the parent commit had some different ones
yes, the parent has some different ones that e47110e905 did not have
(below there is a full table after we run both parent and the e47110e905
more times which has more details)
> that the child did not (eg 2 cases of BUG_at_mm/usercopy.c?)
after more runs, the BUG_at_mm/usercopy.c is also reproduced on e47110e905
9:998 -1% 3:432 dmesg.kernel_BUG_at_mm/usercopy.c
>
> So it feels to me like there's some memory corruption somewhere, and
> that commit that it bisected to likely just changed the failure case
> (due to timing differences or allocation ordering changes).
>
> IOW, there seem to be other panics even in the parent.
>
> Yes/No?
Yes. we also concerned that the panics on parent could cover the __vunmap issue,
so we ran it up to almost 1000 times.
f3f99d63a8156c7a e47110e90584a22e9980510b00d
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
3:998 0% 6:432 dmesg.BUG:kernel_NULL_pointer_dereference,address
1:998 0% 2:432 dmesg.BUG:non-zero_pgtables_bytes_on_freeing_mm
24:998 0% 25:432 dmesg.BUG:unable_to_handle_page_fault_for_address
1:998 -0% :432 dmesg.Bad_pagetable:#[##]
2:998 -0% :432 dmesg.INFO:rcu_sched_detected_stalls_on_CPUs/tasks
40:998 14% 180:432 dmesg.Kernel_panic-not_syncing:Fatal_exception
:998 0% 1:432 dmesg.Kernel_panic-not_syncing:Fatal_exception_in_interrupt
27:998 0% 31:432 dmesg.Oops:#[##]
1:998 -0% :432 dmesg.RIP:__asan_report_load8_noabort
6:998 -0% 4:432 dmesg.RIP:__is_module_percpu_address
1:998 -0% :432 dmesg.RIP:__kasan_check_read
1:998 0% 5:432 dmesg.RIP:__lock_acquire
:998 1% 7:432 dmesg.RIP:__migration_entry_wait
1:998 0% 1:432 dmesg.RIP:__module_address
2:998 0% 3:432 dmesg.RIP:__rb_insert_augmented
:998 13% 125:432 dmesg.RIP:__vunmap
2:998 -0% :432 dmesg.RIP:anon_vma_interval_tree_insert
:998 0% 2:432 dmesg.RIP:apply_relocate_add
1:998 -0% :432 dmesg.RIP:check_memory_region
1:998 0% 1:432 dmesg.RIP:copy_pte_range
1:998 -0% :432 dmesg.RIP:copy_user_generic_string
2:998 -0% 1:432 dmesg.RIP:deactivate_slab
:998 0% 1:432 dmesg.RIP:del_usage_links
1:998 -0% :432 dmesg.RIP:do_raw_spin_trylock
:998 0% 1:432 dmesg.RIP:ep_send_events_proc
:998 0% 2:432 dmesg.RIP:find_vma
:998 1% 6:432 dmesg.RIP:free_percpu
1:998 0% 1:432 dmesg.RIP:handle_mm_fault
1:998 -0% :432 dmesg.RIP:kasan_report
1:998 0% 1:432 dmesg.RIP:kernfs_find_ns
1:998 -0% :432 dmesg.RIP:kernfs_link_sibling
:998 11% 108:432 dmesg.RIP:kfree
1:998 0% 1:432 dmesg.RIP:kmem_cache_alloc
:998 0% 2:432 dmesg.RIP:kmem_cache_alloc_trace
11:998 -1% 6:432 dmesg.RIP:kobject_add_internal
:998 0% 1:432 dmesg.RIP:llist_del_first
:998 0% 2:432 dmesg.RIP:load_module
1:998 -0% :432 dmesg.RIP:lock_release
7:998 -1% 1:432 dmesg.RIP:module_put
:998 0% 1:432 dmesg.RIP:module_remove_modinfo_attrs
1:998 -0% :432 dmesg.RIP:native_queued_spin_lock_slowpath
:998 0% 1:432 dmesg.RIP:native_safe_halt
:998 0% 1:432 dmesg.RIP:nmi_uaccess_okay
:998 0% 1:432 dmesg.RIP:paravirt_patch_default.cold
23:998 -1% 17:432 dmesg.RIP:print_modules
2:998 -0% 1:432 dmesg.RIP:qlist_free_all
1:998 -0% :432 dmesg.RIP:rb_erase
:998 0% 1:432 dmesg.RIP:rb_next
26:998 0% 27:432 dmesg.RIP:skip_spaces
:998 0% 1:432 dmesg.RIP:strncmp
:998 0% 1:432 dmesg.RIP:sysfs_file_ops
1:998 -0% :432 dmesg.RIP:sysfs_remove_group
:998 0% 1:432 dmesg.RIP:update_sd_lb_stats
7:998 -1% 1:432 dmesg.RIP:usercopy_abort
:998 0% 2:432 dmesg.RIP:vmalloc_to_page
:998 0% 1:432 dmesg.RIP:zap_p4d_range
1:998 -0% :432 dmesg.WARNING:at_fs/sysfs/group.c:#sysfs_remove_group
:998 0% 2:432 dmesg.WARNING:at_kernel/locking/lockdep.c:#__lock_acquire
7:998 -1% 1:432 dmesg.WARNING:at_kernel/module.c:#module_put
11:998 -1% 6:432 dmesg.WARNING:at_lib/kobject.c:#kobject_add_internal
:998 0% 1:432 dmesg.WARNING:at_mm/memory.c:#wp_page_copy
:998 13% 125:432 dmesg.WARNING:at_mm/vmalloc.c:#__vunmap
:998 0% 2:432 dmesg.WARNING:at_mm/vmalloc.c:#vmalloc_to_page
1:998 -0% :432 dmesg.WARNING:stack_going_in_the_wrong_direction?at_asm_exc_double_fault/0x
1:998 -0% :432 dmesg.WARNING:stack_recursion
42:998 3% 67:432 dmesg.canonical_address#:#[##]
9:998 -1% 4:432 dmesg.invalid_opcode:#[##]
1:998 -0% :432 dmesg.kernel_BUG_at_arch/x86/entry/common.c
:998 0% 1:432 dmesg.kernel_BUG_at_arch/x86/kernel/paravirt.c
9:998 -1% 3:432 dmesg.kernel_BUG_at_mm/usercopy.c
:998 10% 103:432 dmesg.stack_segment:#[##]
BTW, we noticed the e91d8d7823 ("mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING")
is a fix for e47110e90584, so we also tested it, as well as the v5.12-rc8,
found the issue still exists though lower happen rate.
f3f99d63a8156c7a e47110e90584a22e e91d8d78237de8d7 v5.12-rc8
---------------- ---------------- ---------------- ----------------
fail:runs fail:runs fail:runs fail:runs
| | | |
:998 125:432 46:299 35:334 dmesg.RIP:__vunmap
:998 125:432 46:299 35:334 dmesg.WARNING:at_mm/vmalloc.c:#__vunmap
>
> Linus
[ Backgroudn for Jessica:
https://lore.kernel.org/lkml/20210423063227.GA17429@xsang-OptiPlex-9020/
with some more emails in this thread ]
On Thu, Apr 29, 2021 at 12:36 AM Oliver Sang <[email protected]> wrote:
> >
> > Oh, that's interesting. So only 12 out of 74 runs saw that __vunmap
> > warning, but if I understand your table correctly, there were some
> > _other_ issues in there?
>
> yes, there are.
Yeah, so it really does look like the commit you bisected to changes
timing - and thus just exposes an existing problem.
And it's likely a race condition, and it exposes the existing problem
not just in a new place, but much *MORE*.
The fact that you see something like
Trying to vfree() bad address (0000000070935066)
means that clearly something has corrupted the the module data in
'struct module *'. Because that should have been a core_layout.base
pointer that was allocated with moodule_alloc(), which is just a
vmalloc wrapper on x86-64.
The allocations in the module handling are somewhat odd (putting it
mildly), with that 'struct module *' actually being a pointer *into*
the module allocation itself, so I suspect there's some race with
del_module() or something, and the reason you bisect to that commit
e47110e90584a22e9980510b00d0dfad3a83354e is that the added
cond_schedule() in the vfree path now exposes exactly that race of two
things trying to free the same module at the same time.
This all *should* be serialized by various subtle things
- MODULE_STATE_UNFORMED means that modules are bypassed
- the actual allocation/freeing functions use module_mutex to serialize
- some lookup functions use RCU to look up a module but should then
only look at things like the symbol arrays etc.
but if somehow del_module() can race with do_init_module() creating a
module halfway and then doing free_module() (or perhaps multiple
module loads of the same module racing and aborting), then I can see
one freeing the underlying module data, and the other one seeing
corrupt 'struct module *' as a result.
I suspect the other oopses are different symptoms of the same bug, and
I wonder if they woudl give more hints about what the "other side" of
the thing is. Like that BUG:unable_to_handle_page_fault_for_address /
canonical_address one that you have multiple hits for on both side of
that commit you bisected to?
Jessica, have you seen any particular problematic module loading
reports that could give other hints about what goes wrong?
Linus
+++ Linus Torvalds [29/04/21 10:31 -0700]:
>[ Backgroudn for Jessica:
>
> https://lore.kernel.org/lkml/20210423063227.GA17429@xsang-OptiPlex-9020/
>
> with some more emails in this thread ]
>
>On Thu, Apr 29, 2021 at 12:36 AM Oliver Sang <[email protected]> wrote:
>> >
>> > Oh, that's interesting. So only 12 out of 74 runs saw that __vunmap
>> > warning, but if I understand your table correctly, there were some
>> > _other_ issues in there?
>>
>> yes, there are.
>
>Yeah, so it really does look like the commit you bisected to changes
>timing - and thus just exposes an existing problem.
>
>And it's likely a race condition, and it exposes the existing problem
>not just in a new place, but much *MORE*.
>
>The fact that you see something like
>
> Trying to vfree() bad address (0000000070935066)
>
>means that clearly something has corrupted the the module data in
>'struct module *'. Because that should have been a core_layout.base
>pointer that was allocated with moodule_alloc(), which is just a
>vmalloc wrapper on x86-64.
>
>The allocations in the module handling are somewhat odd (putting it
>mildly), with that 'struct module *' actually being a pointer *into*
>the module allocation itself, so I suspect there's some race with
>del_module() or something, and the reason you bisect to that commit
>e47110e90584a22e9980510b00d0dfad3a83354e is that the added
>cond_schedule() in the vfree path now exposes exactly that race of two
>things trying to free the same module at the same time.
>
>This all *should* be serialized by various subtle things
>
> - MODULE_STATE_UNFORMED means that modules are bypassed
>
> - the actual allocation/freeing functions use module_mutex to serialize
>
> - some lookup functions use RCU to look up a module but should then
>only look at things like the symbol arrays etc.
>
>but if somehow del_module() can race with do_init_module() creating a
>module halfway and then doing free_module() (or perhaps multiple
>module loads of the same module racing and aborting), then I can see
>one freeing the underlying module data, and the other one seeing
>corrupt 'struct module *' as a result.
The situation is bizarre to me. From the call trace:
[ 198.744933] Call Trace:
[ 198.745229] free_module (kbuild/src/consumer/kernel/module.c:2251)
[ 198.745629] do_init_module (kbuild/src/consumer/kernel/module.c:3705)
[ 198.746054] ? rcu_read_lock_bh_held (kbuild/src/consumer/kernel/rcu/update.c:131)
[ 198.746576] load_module (kbuild/src/consumer/kernel/module.c:3968)
This tells me that we were in the middle of loading a module and hit one of the
error conditions in do_init_module() - either the kmalloc() call failed or the
module init function returned an error and now we've landed in the error path
of do_init_module(), hence the call to free_module().
But if we failed here, this also means the module never made it to the LIVE
state - it was COMING, then immediately GOING. There is a liveness check in
delete_module(), so any calls to delete_module() on this module should have
returned -EBUSY. So module removal operations should not even be touching this
module. Multiple module loads of the same name should have blocked while
waiting for this module to finish loading or fail loading.
Sorry I could not be more immediately helpful here, I will have to
stare at the error report a bit more and chew through some possible
race scenarios and get back to you on this..