Greeting,
FYI, we noticed the following commit (built with clang-15):
commit: 408579cd627a15bd703fe3eeb8485fd02726e9d3 ("mm: Update do_vmi_align_munmap() return semantics")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: boot
on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>
[ 215.088258][ T1] WARNING: suspicious RCU usage
[ 215.089042][ T1] 6.4.0-09907-g408579cd627a #1 Not tainted
[ 215.090018][ T1] -----------------------------
[ 215.090877][ T1] lib/maple_tree.c:860 suspicious rcu_dereference_check() usage!
[ 215.092160][ T1]
[ 215.092160][ T1] other info that might help us debug this:
[ 215.092160][ T1]
[ 215.093936][ T1]
[ 215.093936][ T1] rcu_scheduler_active = 2, debug_locks = 1
[ 215.095232][ T1] no locks held by init/1.
[ 215.095989][ T1]
[ 215.095989][ T1] stack backtrace:
[ 215.097067][ T1] CPU: 0 PID: 1 Comm: init Not tainted 6.4.0-09907-g408579cd627a #1
[ 215.098372][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 215.099963][ T1] Call Trace:
[ 215.100474][ T1] <TASK>
[ 215.101027][ T1] dump_stack_lvl (lib/dump_stack.c:107)
[ 215.101807][ T1] lockdep_rcu_suspicious (include/linux/context_tracking.h:122)
[ 215.102688][ T1] mas_start (lib/maple_tree.c:?)
[ 215.103355][ T1] mas_find (lib/maple_tree.c:3861 lib/maple_tree.c:4980 lib/maple_tree.c:5924 lib/maple_tree.c:5965)
[ 215.104024][ T1] validate_mm (mm/mmap.c:301)
[ 215.104777][ T1] do_vmi_align_munmap (mm/mmap.c:?)
[ 215.105691][ T1] do_vmi_munmap (mm/mmap.c:2619)
[ 215.106384][ T1] __vm_munmap (mm/mmap.c:2899)
[ 215.107145][ T1] __x64_sys_munmap (mm/mmap.c:2916 mm/mmap.c:2913 mm/mmap.c:2913)
[ 215.107962][ T1] do_syscall_64 (arch/x86/entry/common.c:?)
[ 215.108784][ T1] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 215.109733][ T1] RIP: 0033:0x7f5ef4577e07
[ 215.110443][ T1] Code: 0f 1f 00 f7 d8 89 05 60 03 01 00 48 c7 c0 ff ff ff ff eb 99 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8d 0d 31 03 01 00 f7 d8 89 01 48 83
All code
========
0: 0f 1f 00 nopl (%rax)
3: f7 d8 neg %eax
5: 89 05 60 03 01 00 mov %eax,0x10360(%rip) # 0x1036b
b: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
12: eb 99 jmp 0xffffffffffffffad
14: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
1b: 00 00 00
1e: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
23: b8 0b 00 00 00 mov $0xb,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 ret
33: 48 8d 0d 31 03 01 00 lea 0x10331(%rip),%rcx # 0x1036b
3a: f7 d8 neg %eax
3c: 89 01 mov %eax,(%rcx)
3e: 48 rex.W
3f: 83 .byte 0x83
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 ret
9: 48 8d 0d 31 03 01 00 lea 0x10331(%rip),%rcx # 0x10341
10: f7 d8 neg %eax
12: 89 01 mov %eax,(%rcx)
14: 48 rex.W
15: 83 .byte 0x83
[ 215.113442][ T1] RSP: 002b:00007fff4bee7608 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
[ 215.114734][ T1] RAX: ffffffffffffffda RBX: 0000008d3439883d RCX: 00007f5ef4577e07
[ 215.115953][ T1] RDX: 0000008d00000000 RSI: 00000000000065e0 RDI: 00007f5ef42bf000
[ 215.117299][ T1] RBP: 00007fff4bee78b0 R08: 0000000000000000 R09: 00007f5ef3fd6738
[ 215.118541][ T1] R10: 00007f5ef4586f78 R11: 0000000000000202 R12: 0000008d2ecd462d
[ 215.119813][ T1] R13: 00007f5ef4588180 R14: 00007f5ef4588180 R15: 00007f5ef38ef900
[ 215.121130][ T1] </TASK>
[ 215.121750][ T1]
[ 215.122149][ T1] =============================
[ 215.122955][ T1] WARNING: suspicious RCU usage
[ 215.123772][ T1] 6.4.0-09907-g408579cd627a #1 Not tainted
[ 215.124752][ T1] -----------------------------
[ 215.125549][ T1] lib/maple_tree.c:816 suspicious rcu_dereference_check() usage!
[ 215.126747][ T1]
[ 215.126747][ T1] other info that might help us debug this:
[ 215.126747][ T1]
[ 215.128401][ T1]
[ 215.128401][ T1] rcu_scheduler_active = 2, debug_locks = 1
[ 215.129720][ T1] no locks held by init/1.
[ 215.130417][ T1]
[ 215.130417][ T1] stack backtrace:
[ 215.131392][ T1] CPU: 0 PID: 1 Comm: init Not tainted 6.4.0-09907-g408579cd627a #1
[ 215.132633][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 215.134353][ T1] Call Trace:
[ 215.134896][ T1] <TASK>
[ 215.135380][ T1] dump_stack_lvl (lib/dump_stack.c:107)
[ 215.136130][ T1] lockdep_rcu_suspicious (include/linux/context_tracking.h:122)
[ 215.137047][ T1] mtree_range_walk (lib/maple_tree.c:?)
[ 215.137865][ T1] mas_find (lib/maple_tree.c:3868 lib/maple_tree.c:4980 lib/maple_tree.c:5924 lib/maple_tree.c:5965)
[ 215.138570][ T1] validate_mm (mm/mmap.c:301)
[ 215.139322][ T1] do_vmi_align_munmap (mm/mmap.c:?)
[ 215.140203][ T1] do_vmi_munmap (mm/mmap.c:2619)
[ 215.140985][ T1] __vm_munmap (mm/mmap.c:2899)
[ 215.141721][ T1] __x64_sys_munmap (mm/mmap.c:2916 mm/mmap.c:2913 mm/mmap.c:2913)
[ 215.142511][ T1] do_syscall_64 (arch/x86/entry/common.c:?)
[ 215.143276][ T1] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 215.144276][ T1] RIP: 0033:0x7f5ef4577e07
[ 215.144998][ T1] Code: 0f 1f 00 f7 d8 89 05 60 03 01 00 48 c7 c0 ff ff ff ff eb 99 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8d 0d 31 03 01 00 f7 d8 89 01 48 83
All code
========
0: 0f 1f 00 nopl (%rax)
3: f7 d8 neg %eax
5: 89 05 60 03 01 00 mov %eax,0x10360(%rip) # 0x1036b
b: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
12: eb 99 jmp 0xffffffffffffffad
14: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
1b: 00 00 00
1e: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
23: b8 0b 00 00 00 mov $0xb,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 ret
33: 48 8d 0d 31 03 01 00 lea 0x10331(%rip),%rcx # 0x1036b
3a: f7 d8 neg %eax
3c: 89 01 mov %eax,(%rcx)
3e: 48 rex.W
3f: 83 .byte 0x83
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 ret
9: 48 8d 0d 31 03 01 00 lea 0x10331(%rip),%rcx # 0x10341
10: f7 d8 neg %eax
12: 89 01 mov %eax,(%rcx)
14: 48 rex.W
15: 83 .byte 0x83
[ 215.148009][ T1] RSP: 002b:00007fff4bee7608 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
[ 215.149494][ T1] RAX: ffffffffffffffda RBX: 0000008d3439883d RCX: 00007f5ef4577e07
[ 215.150834][ T1] RDX: 0000008d00000000 RSI: 00000000000065e0 RDI: 00007f5ef42bf000
[ 215.152031][ T1] RBP: 00007fff4bee78b0 R08: 0000000000000000 R09: 00007f5ef3fd6738
[ 215.153251][ T1] R10: 00007f5ef4586f78 R11: 0000000000000202 R12: 0000008d2ecd462d
[ 215.154510][ T1] R13: 00007f5ef4588180 R14: 00007f5ef4588180 R15: 00007f5ef38ef900
[ 215.155756][ T1] </TASK>
[ 215.182678][ T1] systemd[1]: RTC configured in localtime, applying delta of 0 minutes to system time.
Welcome to Debian GNU/Linux 11 (bullseye)!
[ 215.449392][ T79] process 79 ((sd-executor)) attempted a POSIX timer syscall while CONFIG_POSIX_TIMERS is not set
[ 215.758912][ T84] systemd-getty-g (84) used greatest stack depth: 26896 bytes left
[ 215.948756][ T80] systemd-bless-b (80) used greatest stack depth: 26864 bytes left
[ 218.217378][ C1] random: crng init done
To reproduce:
# build kernel
cd linux
cp config-6.4.0-09907-g408579cd627a .config
make HOSTCC=clang-15 CC=clang-15 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=clang-15 CC=clang-15 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
--
0-DAY CI Kernel Test Service
https://01.org/lkp
On Sun, 2 Jul 2023 at 19:04, kernel test robot <[email protected]> wrote:
>
> commit: 408579cd627a15bd703fe3eeb8485fd02726e9d3 ("mm: Update do_vmi_align_munmap() return semantics")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: boot
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <[email protected]>
>
>
> [ 215.088258][ T1] WARNING: suspicious RCU usage
I think the
validate_mm(mm);
call (mm/mmap.c:2561) needs to be moved up to before the
if (unlock)
mmap_read_unlock(mm);
that is just a couple of lines earlier.
Can you verify that that fixes the warning?
Linus
Hey Linus,
On Sun, Jul 02, 2023 at 09:34:35PM -0700, Linus Torvalds wrote:
> On Sun, 2 Jul 2023 at 19:04, kernel test robot <[email protected]> wrote:
> >
> > commit: 408579cd627a15bd703fe3eeb8485fd02726e9d3 ("mm: Update do_vmi_align_munmap() return semantics")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > in testcase: boot
> >
> > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> >
> >
> >
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot <[email protected]>
I'm not entirely sure if it is related, as stuff in the guts of mm like
this is beyond me, but I've been seeing similar warnings on RISC-V. This
is the only hit on lore for "lib/maple_tree.c:{860,816} suspicious
rcu_dereference_check() usage!", so I figured it might be. Apologies if
not!
Running RCU synchronous self tests
Running RCU synchronous self tests
RCU Tasks: Setting shift to 4 and lim to 1 rcu_task_cb_adjust=1.
RCU Tasks Trace: Setting shift to 4 and lim to 1 rcu_task_cb_adjust=1.
Running RCU-tasks wait API self tests
riscv: ELF compat mode supported
ASID allocator using 16 bits (65536 entries)
rcu: Hierarchical SRCU implementation.
rcu: Max phase no-delay instances is 1000.
EFI services will not be available.
Callback from call_rcu_tasks_trace() invoked.
smp: Bringing up secondary CPUs ...
=============================
WARNING: suspicious RCU usage
6.4.0-10173-ga901a3568fd2 #1 Not tainted
-----------------------------
lib/maple_tree.c:860 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
1 lock held by swapper/1/0:
#0: ffffffff8169ceb0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x32
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.4.0-10173-ga901a3568fd2 #1
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff80006a20>] show_stack+0x2c/0x38
[<ffffffff80af3ee0>] dump_stack_lvl+0x5e/0x80
[<ffffffff80af3f16>] dump_stack+0x14/0x1c
[<ffffffff80083ff0>] lockdep_rcu_suspicious+0x19e/0x232
[<ffffffff80ad4802>] mtree_load+0x18a/0x3b6
[<ffffffff80091632>] __irq_get_desc_lock+0x2c/0x82
[<ffffffff80094722>] enable_percpu_irq+0x36/0x9e
[<ffffffff800087d4>] riscv_ipi_enable+0x32/0x4e
[<ffffffff80008692>] smp_callin+0x24/0x66
=============================
WARNING: suspicious RCU usage
6.4.0-10173-ga901a3568fd2 #1 Not tainted
-----------------------------
lib/maple_tree.c:816 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
1 lock held by swapper/1/0:
#0: ffffffff8169ceb0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x32
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.4.0-10173-ga901a3568fd2 #1
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff80006a20>] show_stack+0x2c/0x38
[<ffffffff80af3ee0>] dump_stack_lvl+0x5e/0x80
[<ffffffff80af3f16>] dump_stack+0x14/0x1c
[<ffffffff80083ff0>] lockdep_rcu_suspicious+0x19e/0x232
[<ffffffff80ad4932>] mtree_load+0x2ba/0x3b6
[<ffffffff80091632>] __irq_get_desc_lock+0x2c/0x82
[<ffffffff80094722>] enable_percpu_irq+0x36/0x9e
[<ffffffff800087d4>] riscv_ipi_enable+0x32/0x4e
[<ffffffff80008692>] smp_callin+0x24/0x66
smp: Brought up 1 node, 16 CPUs
> > [ 215.088258][ T1] WARNING: suspicious RCU usage
>
> I think the
>
> validate_mm(mm);
>
> call (mm/mmap.c:2561) needs to be moved up to before the
>
> if (unlock)
> mmap_read_unlock(mm);
>
> that is just a couple of lines earlier.
>
> Can you verify that that fixes the warning?
I can't speak for Oliver & LKP, but that change didn't fix it for me,
although my issue could be different of course.
Cheers,
Conor.
On Mon, 3 Jul 2023 at 10:00, Conor Dooley <[email protected]> wrote:
>
> I'm not entirely sure if it is related, as stuff in the guts of mm like
> this is beyond me, but I've been seeing similar warnings on RISC-V.
No, that RISC-V warning is also about bad RCU usage, but that's a
different thing.
> RCU used illegally from offline CPU!
> rcu_scheduler_active = 1, debug_locks = 1
> 1 lock held by swapper/1/0:
> #0: ffffffff8169ceb0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x32
>
> stack backtrace:
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.4.0-10173-ga901a3568fd2 #1
> Hardware name: riscv-virtio,qemu (DT)
> Call Trace:
> [<ffffffff80006a20>] show_stack+0x2c/0x38
> [<ffffffff80af3ee0>] dump_stack_lvl+0x5e/0x80
> [<ffffffff80af3f16>] dump_stack+0x14/0x1c
> [<ffffffff80083ff0>] lockdep_rcu_suspicious+0x19e/0x232
> [<ffffffff80ad4802>] mtree_load+0x18a/0x3b6
> [<ffffffff80091632>] __irq_get_desc_lock+0x2c/0x82
> [<ffffffff80094722>] enable_percpu_irq+0x36/0x9e
> [<ffffffff800087d4>] riscv_ipi_enable+0x32/0x4e
> [<ffffffff80008692>] smp_callin+0x24/0x66
This is also triggering on the maple tree sanity checks, but it' sa
different maple tree, and a different code sequence.
And a different case of suspicious RCU usage - not a lack of locking,
but simply using RCU before marking the CPU online.
I suspect the riscv_ipi_enable() in the RISC-V version of smp_callin()
needs to be moved down to below the
set_cpu_online(curr_cpuid, 1);
or was there some reason why it needed to be done quite _that_ early
in commit 832f15f42646 ("RISC-V: Treat IPIs as normal Linux IRQs")?
Added guilty parties to the cc.
Linus
hi, Linus,
On Sun, Jul 02, 2023 at 09:34:35PM -0700, Linus Torvalds wrote:
> On Sun, 2 Jul 2023 at 19:04, kernel test robot <[email protected]> wrote:
> >
> > commit: 408579cd627a15bd703fe3eeb8485fd02726e9d3 ("mm: Update do_vmi_align_munmap() return semantics")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > in testcase: boot
> >
> > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> >
> >
> >
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot <[email protected]>
> >
> >
> > [ 215.088258][ T1] WARNING: suspicious RCU usage
>
> I think the
>
> validate_mm(mm);
>
> call (mm/mmap.c:2561) needs to be moved up to before the
>
> if (unlock)
> mmap_read_unlock(mm);
>
> that is just a couple of lines earlier.
>
> Can you verify that that fixes the warning?
by patch [1], we found the warning is not fixed.
we also found there are some changes in stack backtrace. now it's as below:
(detail dmesg is attached)
[ 26.412372][ T1] stack backtrace:
[ 26.412846][ T1] CPU: 0 PID: 1 Comm: systemd Not tainted 6.4.0-09908-gcb226fb1fb7a #1
[ 26.413506][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 26.414326][ T1] Call Trace:
[ 26.414605][ T1] <TASK>
[ 26.414847][ T1] dump_stack_lvl+0x73/0xc0
[ 26.415225][ T1] lockdep_rcu_suspicious+0x1b7/0x280
[ 26.415669][ T1] mas_start+0x280/0x400
[ 26.416037][ T1] mas_find+0x27a/0x400
[ 26.416391][ T1] validate_mm+0x8b/0x2c0
[ 26.416757][ T1] __se_sys_brk+0xa35/0xc00
[ 26.417141][ T1] ? vtime_user_exit+0x1a6/0x280
[ 26.417563][ T1] do_syscall_64+0x2b/0x80
[ 26.417935][ T1] entry_SYSCALL_64_after_hwframe+0x5e/0xc8
[ 26.418410][ T1] RIP: 0033:0x7f8a92160bb7
as a contrast, for commit 408579cd62:
[ 215.095989][ T1] stack backtrace:
[ 215.097067][ T1] CPU: 0 PID: 1 Comm: init Not tainted 6.4.0-09907-g408579cd627a #1
[ 215.098372][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 215.099963][ T1] Call Trace:
[ 215.100474][ T1] <TASK>
[ 215.101027][ T1] dump_stack_lvl+0x73/0xc0
[ 215.101807][ T1] lockdep_rcu_suspicious+0x1b7/0x280
[ 215.102688][ T1] mas_start+0x280/0x400
[ 215.103355][ T1] mas_find+0x27a/0x400
[ 215.104024][ T1] validate_mm+0x8b/0x2c0
[ 215.104777][ T1] do_vmi_align_munmap+0xf92/0x1180
[ 215.105691][ T1] do_vmi_munmap+0x1a4/0x240
[ 215.106384][ T1] __vm_munmap+0x19f/0x300
[ 215.107145][ T1] __x64_sys_munmap+0x60/0x80
[ 215.107962][ T1] do_syscall_64+0x2b/0x80
[ 215.108784][ T1] entry_SYSCALL_64_after_hwframe+0x5e/0xc8
[ 215.109733][ T1] RIP: 0033:0x7f5ef4577e07
[1]
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2554,11 +2554,12 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
mas_set(&mas_detach, start);
remove_mt(mm, &mas_detach);
__mt_destroy(&mt_detach);
+
+ validate_mm(mm);
+
if (unlock)
mmap_read_unlock(mm);
-
- validate_mm(mm);
return 0;
clear_tree_failed:
>
> Linus
On Mon, 3 Jul 2023 at 18:48, Oliver Sang <[email protected]> wrote:
>
> by patch [1], we found the warning is not fixed.
Hmm. I already committed that "fix" as obvious, since the main
difference in commit 408579cd627a ("mm: Update do_vmi_align_munmap()
return semantics") around that validate_mm() call was how it did that
mmap_read_unlock().
> we also found there are some changes in stack backtrace. now it's as below:
> (detail dmesg is attached)
>
> [ 26.412372][ T1] stack backtrace:
> [ 26.412846][ T1] CPU: 0 PID: 1 Comm: systemd Not tainted 6.4.0-09908-gcb226fb1fb7a #1
> [ 26.413506][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 26.414326][ T1] Call Trace:
> [ 26.414605][ T1] <TASK>
> [ 26.414847][ T1] dump_stack_lvl+0x73/0xc0
> [ 26.415225][ T1] lockdep_rcu_suspicious+0x1b7/0x280
> [ 26.415669][ T1] mas_start+0x280/0x400
> [ 26.416037][ T1] mas_find+0x27a/0x400
> [ 26.416391][ T1] validate_mm+0x8b/0x2c0
> [ 26.416757][ T1] __se_sys_brk+0xa35/0xc00
Ok, that is indeed a very different stack trace.
So maybe the fix is a real fix, but the first complaint shut up
lockdep, so this is the *second* and unrelated complaint.
And indeed: it turns out that do_vma_munmap() does this:
ret = do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
validate_mm(mm);
and so we have *another* validate_mm() that is now done outside the lock.
That one is actually pretty pointless. We've *just* validated the mm
already inside do_vmi_align_munmap(), except we only did it in one of
the two return cases.
So I think the fix is to just move that validate_mm() into the other
return case of do_vmi_align_munmap(), and remove it from the caller.
IOW, something like the attached (NOTE! This is in *addition* to the
previous patch, which is the same as the one you quoted, just with
slightly different whitespace as commit ae80b4041984: "mm: validate
the mm before dropping the mmap lock").
Linus
hi, Linus,
On Mon, Jul 03, 2023 at 07:29:48PM -0700, Linus Torvalds wrote:
> On Mon, 3 Jul 2023 at 18:48, Oliver Sang <[email protected]> wrote:
> >
> > by patch [1], we found the warning is not fixed.
>
> Hmm. I already committed that "fix" as obvious, since the main
> difference in commit 408579cd627a ("mm: Update do_vmi_align_munmap()
> return semantics") around that validate_mm() call was how it did that
> mmap_read_unlock().
>
> > we also found there are some changes in stack backtrace. now it's as below:
> > (detail dmesg is attached)
> >
> > [ 26.412372][ T1] stack backtrace:
> > [ 26.412846][ T1] CPU: 0 PID: 1 Comm: systemd Not tainted 6.4.0-09908-gcb226fb1fb7a #1
> > [ 26.413506][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > [ 26.414326][ T1] Call Trace:
> > [ 26.414605][ T1] <TASK>
> > [ 26.414847][ T1] dump_stack_lvl+0x73/0xc0
> > [ 26.415225][ T1] lockdep_rcu_suspicious+0x1b7/0x280
> > [ 26.415669][ T1] mas_start+0x280/0x400
> > [ 26.416037][ T1] mas_find+0x27a/0x400
> > [ 26.416391][ T1] validate_mm+0x8b/0x2c0
> > [ 26.416757][ T1] __se_sys_brk+0xa35/0xc00
>
> Ok, that is indeed a very different stack trace.
>
> So maybe the fix is a real fix, but the first complaint shut up
> lockdep, so this is the *second* and unrelated complaint.
>
> And indeed: it turns out that do_vma_munmap() does this:
>
> ret = do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
> validate_mm(mm);
>
> and so we have *another* validate_mm() that is now done outside the lock.
>
> That one is actually pretty pointless. We've *just* validated the mm
> already inside do_vmi_align_munmap(), except we only did it in one of
> the two return cases.
>
> So I think the fix is to just move that validate_mm() into the other
> return case of do_vmi_align_munmap(), and remove it from the caller.
>
> IOW, something like the attached (NOTE! This is in *addition* to the
> previous patch, which is the same as the one you quoted, just with
> slightly different whitespace as commit ae80b4041984: "mm: validate
> the mm before dropping the mmap lock").
Thanks a lot for guidance!
I applied below patch directly upon ae80b4041984, and confirmed the
WARNING gone. Thanks
>
> Linus
> mm/mmap.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 547b40531791..204ddcd52625 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2571,6 +2571,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> __mt_destroy(&mt_detach);
> start_split_failed:
> map_count_exceeded:
> + validate_mm(mm);
> return error;
> }
>
> @@ -3019,12 +3020,9 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> bool unlock)
> {
> struct mm_struct *mm = vma->vm_mm;
> - int ret;
>
> arch_unmap(mm, start, end);
> - ret = do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
> - validate_mm(mm);
> - return ret;
> + return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
> }
>
> /*
* Oliver Sang <[email protected]> [230704 09:51]:
> hi, Linus,
>
> On Mon, Jul 03, 2023 at 07:29:48PM -0700, Linus Torvalds wrote:
> > On Mon, 3 Jul 2023 at 18:48, Oliver Sang <[email protected]> wrote:
> > >
> > > by patch [1], we found the warning is not fixed.
> >
> > Hmm. I already committed that "fix" as obvious, since the main
> > difference in commit 408579cd627a ("mm: Update do_vmi_align_munmap()
> > return semantics") around that validate_mm() call was how it did that
> > mmap_read_unlock().
> >
> > > we also found there are some changes in stack backtrace. now it's as below:
> > > (detail dmesg is attached)
> > >
> > > [ 26.412372][ T1] stack backtrace:
> > > [ 26.412846][ T1] CPU: 0 PID: 1 Comm: systemd Not tainted 6.4.0-09908-gcb226fb1fb7a #1
> > > [ 26.413506][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > [ 26.414326][ T1] Call Trace:
> > > [ 26.414605][ T1] <TASK>
> > > [ 26.414847][ T1] dump_stack_lvl+0x73/0xc0
> > > [ 26.415225][ T1] lockdep_rcu_suspicious+0x1b7/0x280
> > > [ 26.415669][ T1] mas_start+0x280/0x400
> > > [ 26.416037][ T1] mas_find+0x27a/0x400
> > > [ 26.416391][ T1] validate_mm+0x8b/0x2c0
> > > [ 26.416757][ T1] __se_sys_brk+0xa35/0xc00
> >
> > Ok, that is indeed a very different stack trace.
> >
> > So maybe the fix is a real fix, but the first complaint shut up
> > lockdep, so this is the *second* and unrelated complaint.
> >
> > And indeed: it turns out that do_vma_munmap() does this:
> >
> > ret = do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
> > validate_mm(mm);
> >
> > and so we have *another* validate_mm() that is now done outside the lock.
> >
> > That one is actually pretty pointless. We've *just* validated the mm
> > already inside do_vmi_align_munmap(), except we only did it in one of
> > the two return cases.
> >
> > So I think the fix is to just move that validate_mm() into the other
> > return case of do_vmi_align_munmap(), and remove it from the caller.
> >
> > IOW, something like the attached (NOTE! This is in *addition* to the
> > previous patch, which is the same as the one you quoted, just with
> > slightly different whitespace as commit ae80b4041984: "mm: validate
> > the mm before dropping the mmap lock").
>
> Thanks a lot for guidance!
> I applied below patch directly upon ae80b4041984, and confirmed the
> WARNING gone. Thanks
>
Thanks for testing this.
I can clean more of this up now that the mmap locking has been changed.
For instance, we can drop a number of checks before a write (and all
read cases, if any remain) since there is no alteration without the
write lock.
Thanks,
Liam
* Liam R. Howlett <[email protected]> [230704 11:25]:
> * Oliver Sang <[email protected]> [230704 09:51]:
...
> > > Ok, that is indeed a very different stack trace.
> > >
> > > So maybe the fix is a real fix, but the first complaint shut up
> > > lockdep, so this is the *second* and unrelated complaint.
> > >
> > > And indeed: it turns out that do_vma_munmap() does this:
> > >
> > > ret = do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
> > > validate_mm(mm);
> > >
> > > and so we have *another* validate_mm() that is now done outside the lock.
> > >
> > > That one is actually pretty pointless. We've *just* validated the mm
> > > already inside do_vmi_align_munmap(), except we only did it in one of
> > > the two return cases.
> > >
> > > So I think the fix is to just move that validate_mm() into the other
> > > return case of do_vmi_align_munmap(), and remove it from the caller.
> > >
> > > IOW, something like the attached (NOTE! This is in *addition* to the
> > > previous patch, which is the same as the one you quoted, just with
> > > slightly different whitespace as commit ae80b4041984: "mm: validate
> > > the mm before dropping the mmap lock").
> >
> > Thanks a lot for guidance!
> > I applied below patch directly upon ae80b4041984, and confirmed the
> > WARNING gone. Thanks
> >
>
> Thanks for testing this.
>
> I can clean more of this up now that the mmap locking has been changed.
> For instance, we can drop a number of checks before a write (and all
> read cases, if any remain) since there is no alteration without the
> write lock.
Thanks Linus for fixing the two issues causing rcu warnings - my testing
had lockdep & maple tree debug, so I'm still working on why my testing
failed on the initial patch.
For the time being, I've used the config from the bot to test a larger
cleanup of this code. I'll send it through the mm development as to get
better test coverage.
Regards,
Liam