Hello,
kernel test robot noticed "WARNING:at_mm/gup.c:#__get_user_pages" on:
commit: a425ac5365f6cb3cc47bf83e6bff0213c10445f7 ("gup: add warning if some caller would seem to want stack expansion")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
we noticed this commit 'add a (temporary) warning' for the case that
'anybody actually does anything quite this strange'.
and in our this test, the warning hits. just FYI.
[test failed on linus/master a901a3568fd26ca9c4a82d8bc5ed5b3ed844d451]
[test failed on linux-next/master 296d53d8f84ce50ffaee7d575487058c8d437335]
in testcase: trinity
version: trinity-i386-abe9de86-1_20230429
with following parameters:
runtime: 300s
group: group-00
nr_groups: 5
test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/
compiler: clang-15
on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]
[ 410.961829][ T3941] WARNING: CPU: 1 PID: 3941 at mm/gup.c:1101 __get_user_pages (mm/gup.c:1101)
[ 410.963037][ T3941] Modules linked in: ipmi_devintf ipmi_msghandler crc32c_intel sha512_ssse3 sg pcspkr evdev floppy tiny_power_button button fuse
[ 410.964888][ T3941] CPU: 1 PID: 3941 Comm: trinity-c2 Not tainted 6.4.0-rc7-00013-ga425ac5365f6 #1
[ 410.966162][ T3941] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 410.967315][ T3941] RIP: 0010:__get_user_pages (mm/gup.c:1101)
[ 410.967988][ T3941] Code: f6 ff 49 8b 5e 20 81 e3 00 01 00 00 48 89 dd 48 c1 ed 08 48 c7 c7 40 9c 2a bd 89 ee 31 d2 31 c9 e8 0e cd f3 ff 48 85 db 74 02 <0f> 0b 48 c7 c7 70 9c 2a bd 89 ee 31 d2 31 c9 e8 f5 cc f3 ff 48 8b
All code
========
0: f6 ff idiv %bh
2: 49 8b 5e 20 mov 0x20(%r14),%rbx
6: 81 e3 00 01 00 00 and $0x100,%ebx
c: 48 89 dd mov %rbx,%rbp
f: 48 c1 ed 08 shr $0x8,%rbp
13: 48 c7 c7 40 9c 2a bd mov $0xffffffffbd2a9c40,%rdi
1a: 89 ee mov %ebp,%esi
1c: 31 d2 xor %edx,%edx
1e: 31 c9 xor %ecx,%ecx
20: e8 0e cd f3 ff call 0xfffffffffff3cd33
25: 48 85 db test %rbx,%rbx
28: 74 02 je 0x2c
2a:* 0f 0b ud2 <-- trapping instruction
2c: 48 c7 c7 70 9c 2a bd mov $0xffffffffbd2a9c70,%rdi
33: 89 ee mov %ebp,%esi
35: 31 d2 xor %edx,%edx
37: 31 c9 xor %ecx,%ecx
39: e8 f5 cc f3 ff call 0xfffffffffff3cd33
3e: 48 rex.W
3f: 8b .byte 0x8b
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 48 c7 c7 70 9c 2a bd mov $0xffffffffbd2a9c70,%rdi
9: 89 ee mov %ebp,%esi
b: 31 d2 xor %edx,%edx
d: 31 c9 xor %ecx,%ecx
f: e8 f5 cc f3 ff call 0xfffffffffff3cd09
14: 48 rex.W
15: 8b .byte 0x8b
[ 410.970326][ T3941] RSP: 0018:ffff8881478bfa10 EFLAGS: 00010206
[ 410.971186][ T3941] RAX: 0000000000000000 RBX: 0000000000000100 RCX: 0000000000000000
[ 410.972183][ T3941] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 410.973321][ T3941] RBP: 0000000000000001 R08: 0001ffffffffffff R09: 0000000000000000
[ 410.974484][ T3941] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000f69a9000
[ 410.975470][ T3941] R13: 0000000000000000 R14: ffff8881560d7708 R15: 0000000000000000
[ 410.976511][ T3941] FS: 0000000000000000(0000) GS:ffff88842fa00000(0063) knlGS:00000000f7f1c280
[ 410.977654][ T3941] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 410.978442][ T3941] CR2: 00000000f72ae000 CR3: 0000000155633000 CR4: 00000000000406a0
[ 410.979480][ T3941] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 410.980467][ T3941] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 410.981514][ T3941] Call Trace:
[ 410.981989][ T3941] <TASK>
[ 410.982436][ T3941] ? __warn (kernel/panic.c:673)
[ 410.983007][ T3941] ? __get_user_pages (mm/gup.c:1101)
[ 410.983719][ T3941] ? report_bug (lib/bug.c:?)
[ 410.984500][ T3941] ? handle_bug (arch/x86/kernel/traps.c:324)
[ 410.985177][ T3941] ? exc_invalid_op (arch/x86/kernel/traps.c:345)
[ 410.985772][ T3941] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
[ 410.986410][ T3941] ? __get_user_pages (mm/gup.c:1101)
[ 410.987100][ T3941] ? pvclock_clocksource_read_nowd (arch/x86/include/asm/pvclock.h:36 arch/x86/kernel/pvclock.c:79 arch/x86/kernel/pvclock.c:120)
[ 410.987939][ T3941] __gup_longterm_locked (mm/gup.c:1389)
[ 410.988605][ T3941] ? process_vm_rw (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/linux/mmap_lock.h:35 include/linux/mmap_lock.h:143 mm/process_vm_access.c:104 mm/process_vm_access.c:215 mm/process_vm_access.c:283)
[ 410.989355][ T3941] ? process_vm_rw (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/linux/mmap_lock.h:35 include/linux/mmap_lock.h:143 mm/process_vm_access.c:104 mm/process_vm_access.c:215 mm/process_vm_access.c:283)
[ 410.990202][ T3941] ? is_valid_gup_args (mm/gup.c:2162)
[ 410.991069][ T3941] pin_user_pages_remote (mm/gup.c:3132)
[ 410.991884][ T3941] process_vm_rw (mm/process_vm_access.c:105)
[ 410.992728][ T3941] ? __ct_user_exit (kernel/context_tracking.c:623)
[ 410.993526][ T3941] __ia32_sys_process_vm_readv (mm/process_vm_access.c:295 mm/process_vm_access.c:291 mm/process_vm_access.c:291)
[ 410.994422][ T3941] __do_fast_syscall_32 (arch/x86/entry/common.c:? arch/x86/entry/common.c:178)
[ 410.995197][ T3941] ? __do_fast_syscall_32 (arch/x86/entry/common.c:165)
[ 410.995988][ T3941] ? __do_fast_syscall_32 (arch/x86/entry/common.c:165)
[ 411.000892][ T3941] ? irqentry_exit (kernel/entry/common.c:446)
[ 411.001656][ T3941] do_fast_syscall_32 (arch/x86/entry/common.c:203)
[ 411.002442][ T3941] do_SYSENTER_32 (arch/x86/entry/common.c:246)
[ 411.003178][ T3941] entry_SYSENTER_compat_after_hwframe (arch/x86/entry/entry_64_compat.S:122)
[ 411.004161][ T3941] RIP: 0023:0xf7f21539
[ 411.004859][ T3941] Code: 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
All code
========
0: 03 74 b4 01 add 0x1(%rsp,%rsi,4),%esi
4: 10 07 adc %al,(%rdi)
6: 03 74 b0 01 add 0x1(%rax,%rsi,4),%esi
a: 10 08 adc %cl,(%rax)
c: 03 74 d8 01 add 0x1(%rax,%rbx,8),%esi
...
20: 00 51 52 add %dl,0x52(%rcx)
23: 55 push %rbp
24:* 89 e5 mov %esp,%ebp <-- trapping instruction
26: 0f 34 sysenter
28: cd 80 int $0x80
2a: 5d pop %rbp
2b: 5a pop %rdx
2c: 59 pop %rcx
2d: c3 ret
2e: 90 nop
2f: 90 nop
30: 90 nop
31: 90 nop
32: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
39: 00 00 00
3c: 0f .byte 0xf
3d: 1f (bad)
3e: 44 rex.R
...
Code starting with the faulting instruction
===========================================
0: 5d pop %rbp
1: 5a pop %rdx
2: 59 pop %rcx
3: c3 ret
4: 90 nop
5: 90 nop
6: 90 nop
7: 90 nop
8: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
f: 00 00 00
12: 0f .byte 0xf
13: 1f (bad)
14: 44 rex.R
To reproduce:
# build kernel
cd linux
cp config-6.4.0-rc7-00013-ga425ac5365f6 .config
make HOSTCC=clang-15 CC=clang-15 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=clang-15 CC=clang-15 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Tue, 4 Jul 2023 at 00:03, kernel test robot <[email protected]> wrote:
>
> we noticed this commit 'add a (temporary) warning' for the case that
> 'anybody actually does anything quite this strange'.
> and in our this test, the warning hits. just FYI.
Yeah, so it looks like this is trinity doing system calls with random
arguments, and that will obviously hit the whole
"GUP will no longer expand the stack, warn if somebody seems to want
to do GUP under the stack"
test.
So then it will warn if somebody passes in bogus addresses that *used*
to maybe work.
But with a random argument tester like trinity, passing in random
bogus addresses is obviously expected, so the warning will trigger
even if it's not something that we would not want to keep working.
I do not have a good idea for how to not warn for things like syzbot
and trinity that do random system calls, and only warn for any
potential real applications that do crazy things and expect them to
work.
And I *do* want the backtrace from the warning (in this case, it shows
that it's the "process_vm_readv/writev()" path, which actually might
be worth adding stack expansion to, the same way __access_remote_vm()
does).
I guess I can do the limiting manually, and just avoid WARN_ON_ONCE().
If I do just "dump_stack()", will the kernel test robot react to that
too? IOW, would a patch like the attached make the kernel test robot
not react?
Linus
On 7/4/23 07:12, Linus Torvalds wrote:
...
> And I *do* want the backtrace from the warning (in this case, it shows
> that it's the "process_vm_readv/writev()" path, which actually might
> be worth adding stack expansion to, the same way __access_remote_vm()
> does).
This is good timing, because I was looking for an email to reply to, to
report that I've isolated a somewhat more valid user space behavior that
is triggering this, via __access_remote_vm().
It happens on my main workstation when there is a problem with any app
that uses Chromium's crashpad [1] to report problems. crashpad is a
rather sophisticated app that does extensive forensics, including a
bunch of pread64() system calls of /proc/pid/mem. Fortunately, the unit
test suite included with [1] also reproduces the problem very reliably.
So it boils down to this: crashpad is reading from a valid starting
address, inside the vma for the ld-linux-x86-64.so.2 file, but it reads
a full 4KB page's worth, which takes it past the end of that vma.
And although the expand_stack() logic is there as part of the
__access_remote_vm() path, that logic ignores the size of the read! So
it slips past without trying to expand the stack.
Note that while next vma is indeed the stack, it is 919 GB away--a very
large gap.
The crashpad app is arguably a little bit buggy. It is reading a bit
more than it should, and landing in between vmas for the tail end of the
read. However, the app seems to handle it OK.
/proc/pid/maps:
7f19d671a000-7f19d671c000 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7fffc2c28000-7fffc2c4a000 [stack] (919 GB above previous vma)
"start" address in __get_user_pages(): 0x7f19d671c000
dump_vma() if the WARN_ON_ONCE() fires:
vma ffff8881143e0e60 start 00007fffc2c28000 end 00007fffc2c4a000 mm ffff888106adbd40
prot 8000000000000025 anon_vma ffff88810b0735f0 vm_ops 0000000000000000
pgoff 7ffffffdd file 0000000000000000 private_data 0000000000000000
flags: 0x100173(read|write|mayread|maywrite|mayexec|growsdown|account)
Analysis of the ranges, which overlap:
0x7f19d671a000 start of /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
0x7f19d671b8a0 pread64 start (read size: 4096)
0x7f19d671c000 end of /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
0x7f19d671c8a0 pread64 end
WARNING: CPU: 11 PID: 8490 at mm/gup.c:1180 __get_user_pages+0x53e/0x670
...
Call Trace:
<TASK>
? __warn+0xc5/0x1f0
? __get_user_pages+0x53e/0x670
? report_bug+0xcd/0x160
? handle_bug+0x3d/0x80
? exc_invalid_op+0x16/0x50
? asm_exc_invalid_op+0x16/0x20
? __get_user_pages+0x53e/0x670
get_user_pages_remote+0x17f/0x490
__access_remote_vm+0x106/0x2c0
mem_rw+0x134/0x1e0
vfs_read+0xdf/0x2d0
? __pfx_rcu_lock_release+0x10/0x10
? __fget_files+0x128/0x140
__x64_sys_pread64+0x6b/0xc0
do_syscall_64+0x41/0x90
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[1] https://chromium.googlesource.com/crashpad/crashpad.git
thanks,
--
John Hubbard
NVIDIA
hi Linus,
On Tue, Jul 04, 2023 at 07:12:23AM -0700, Linus Torvalds wrote:
> On Tue, 4 Jul 2023 at 00:03, kernel test robot <[email protected]> wrote:
> >
> > we noticed this commit 'add a (temporary) warning' for the case that
> > 'anybody actually does anything quite this strange'.
> > and in our this test, the warning hits. just FYI.
>
> Yeah, so it looks like this is trinity doing system calls with random
> arguments, and that will obviously hit the whole
>
> "GUP will no longer expand the stack, warn if somebody seems to want
> to do GUP under the stack"
>
> test.
>
> So then it will warn if somebody passes in bogus addresses that *used*
> to maybe work.
>
> But with a random argument tester like trinity, passing in random
> bogus addresses is obviously expected, so the warning will trigger
> even if it's not something that we would not want to keep working.
>
> I do not have a good idea for how to not warn for things like syzbot
> and trinity that do random system calls, and only warn for any
> potential real applications that do crazy things and expect them to
> work.
>
> And I *do* want the backtrace from the warning (in this case, it shows
> that it's the "process_vm_readv/writev()" path, which actually might
> be worth adding stack expansion to, the same way __access_remote_vm()
> does).
>
> I guess I can do the limiting manually, and just avoid WARN_ON_ONCE().
>
> If I do just "dump_stack()", will the kernel test robot react to that
> too? IOW, would a patch like the attached make the kernel test robot
> not react?
by applying below patch upon
"a425ac5365f6c gup: add warning if some caller would seem to want stack expansion"
then runing same trinity tests, we noticed there is no explict WARNING now,
instead, we saw below in dmesg (attached also):
[ 323.996325][ T3994] GUP no longer grows the stack f7197000-f723e000 (f7196000)
[ 323.997613][ T3994] CPU: 1 PID: 3994 Comm: trinity-c1 Not tainted 6.4.0-rc7-00014-ga7fb8f6e6830 #1
[ 323.998883][ T3994] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 324.000288][ T3994] Call Trace:
[ 324.000829][ T3994] <TASK>
[ 324.001326][ T3994] dump_stack_lvl+0xc5/0x140
[ 324.002020][ T3994] dump_stack+0xc/0x10
[ 324.002653][ T3994] __get_user_pages+0x78f/0x8d0
[ 324.003399][ T3994] __gup_longterm_locked+0xa2d/0xef0
[ 324.004170][ T3994] ? process_vm_rw+0x3c8/0x690
[ 324.004873][ T3994] ? process_vm_rw+0x3c8/0x690
[ 324.005594][ T3994] ? is_valid_gup_args+0x2a2/0x2b0
[ 324.006349][ T3994] pin_user_pages_remote+0x70/0xa0
[ 324.007107][ T3994] process_vm_rw+0x3f0/0x690
[ 324.007842][ T3994] ? __ct_user_exit+0x57/0x70
[ 324.008543][ T3994] __ia32_sys_process_vm_readv+0x75/0xa0
[ 324.009362][ T3994] __do_fast_syscall_32+0xed/0x130
[ 324.010116][ T3994] ? __do_fast_syscall_32+0x108/0x130
[ 324.010902][ T3994] ? __do_fast_syscall_32+0x108/0x130
[ 324.011690][ T3994] ? __do_fast_syscall_32+0x108/0x130
[ 324.012469][ T3994] ? irqentry_exit_to_user_mode+0x23/0x40
[ 324.013295][ T3994] ? irqentry_exit+0x6d/0xc0
[ 324.014002][ T3994] do_fast_syscall_32+0x2f/0x70
[ 324.014723][ T3994] do_SYSENTER_32+0x17/0x20
[ 324.015416][ T3994] entry_SYSENTER_compat_after_hwframe+0x53/0x62
[ 324.016311][ T3994] RIP: 0023:0xf7fb3539
[ 324.016942][ T3994] Code: 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd
80 <5d> 5a 59 c3 90 90 90 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
[ 324.019421][ T3994] RSP: 002b:00000000ffa1c20c EFLAGS: 00000292 ORIG_RAX: 000000000000015b
[ 324.020595][ T3994] RAX: ffffffffffffffda RBX: 0000000000000f9a RCX: 0000000057150a40
[ 324.021721][ T3994] RDX: 0000000000000001 RSI: 00000000571509d0 RDI: 0000000000000001
[ 324.022834][ T3994] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 324.023981][ T3994] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 324.025108][ T3994] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 324.026255][ T3994] </TASK>
>
> Linus
> mm/gup.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index ef29641671c7..c9d799d28de7 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1091,6 +1091,21 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
> return 0;
> }
>
> +static void gup_stack_expansion_warning(const struct vm_area_struct *vma,
> + unsigned long addr)
> +{
> + static volatile unsigned long next_warn;
> + unsigned long now = jiffies, next = next_warn;
> +
> + /* Let's not warn more than once an hour.. */
> + if (next && time_before(now, next))
> + return;
> + next_warn = now + 60*60*HZ;
> + pr_warn("GUP no longer grows the stack %lx-%lx (%lx)\n",
> + vma->vm_start, vma->vm_end, addr);
> + dump_stack();
> +}
> +
> /**
> * __get_user_pages() - pin user pages in memory
> * @mm: mm_struct of target mm
> @@ -1170,7 +1185,8 @@ static long __get_user_pages(struct mm_struct *mm,
> if (!vma || start >= vma->vm_end) {
> vma = find_vma(mm, start);
> if (vma && (start < vma->vm_start)) {
> - WARN_ON_ONCE(vma->vm_flags & VM_GROWSDOWN);
> + if (unlikely(vma->vm_flags & VM_GROWSDOWN))
> + gup_stack_expansion_warning(vma, start);
> vma = NULL;
> }
> if (!vma && in_gate_area(mm, start)) {
On Wed, 5 Jul 2023 at 00:27, John Hubbard <[email protected]> wrote:
>
> So it boils down to this: crashpad is reading from a valid starting
> address, inside the vma for the ld-linux-x86-64.so.2 file, but it reads
> a full 4KB page's worth, which takes it past the end of that vma.
>
> And although the expand_stack() logic is there as part of the
> __access_remote_vm() path, that logic ignores the size of the read! So
> it slips past without trying to expand the stack.
>
> Note that while next vma is indeed the stack, it is 919 GB away--a very
> large gap.
Ok, that's just the warning being a bit too simplistic.
For the case of a accessing past the end of the previous vma, old
kernels wouldn't have expanded the stack either, because not only do
we have a stack size ulimit, but even if you set that to infinity we
leave a guard gap between the previous mapping and the stack and don't
allow them to grow together.
I made the warning be about "any access below the stack" rather than
try to limit it, so your warning is basically a situation where no
actual semantic change has happened, and it's just that the warning
was overly broad.
I'll tighten it up, and switch the WARN_ON_ONCE() to just do a
"dump_stack()" so that it won't cause problems with the syzbot tests
either.
Linus
On Wed, 5 Jul 2023 at 08:54, Linus Torvalds
<[email protected]> wrote:
>
> I'll tighten it up, and switch the WARN_ON_ONCE() to just do a
> "dump_stack()" so that it won't cause problems with the syzbot tests
> either.
I pushed it out. It's based on the thing that Oliver already tested,
but expanded a bit from that (and with a better calling convention, so
that when we're done we can just delete the whole helper function and
replace it with "find_vma()" like it is supposed to just be).
It's lightly tested, but I no longer have any trivial ways to trigger
the warning, so the testing was literally "now it doesn't say anything
at all".
It should still trigger the warning by literally doing some direct-IO
GUP below the stack - but at least the obvious cases of false
positives hopefully don't trigger.
At least until somebody comes up with another obvious case that I
didn't think of ;)
Linus