2017-06-21 02:36:23

by kernel test robot

[permalink] [raw]
Subject: [lkp-robot] [mm] 1be7107fbe: kernel_BUG_at_mm/mmap.c


FYI, we noticed the following commit:

commit: 1be7107fbe18eed3e319a6c3e83c78254b693acb ("mm: larger stack guard gap, between vmas")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-i386 -enable-kvm -smp 2 -m 320M

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+------------------------------------------+------------+------------+
| | 1132d5e7b6 | 1be7107fbe |
+------------------------------------------+------------+------------+
| boot_successes | 5 | 4 |
| boot_failures | 0 | 4 |
| kernel_BUG_at_mm/mmap.c | 0 | 4 |
| invalid_opcode:#[##] | 0 | 4 |
| EIP:unmapped_area_topdown | 0 | 4 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 4 |
+------------------------------------------+------------+------------+



[ 87.792040] kernel BUG at mm/mmap.c:1963!
[ 87.793442] invalid opcode: 0000 [#1] DEBUG_PAGEALLOC
[ 87.794812] Modules linked in:
[ 87.795849] CPU: 0 PID: 424 Comm: trinity-c2 Not tainted 4.12.0-rc5-00285-g1be7107f #1
[ 87.798138] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[ 87.800657] task: ce6177c0 task.stack: cd0fc000
[ 87.801877] EIP: unmapped_area_topdown+0x14b/0x15c
[ 87.803063] EFLAGS: 00010206 CPU: 0
[ 87.804075] EAX: 00000000 EBX: b5200000 ECX: 00000000 EDX: b4feb000
[ 87.805469] ESI: 00201000 EDI: b4feb000 EBP: cd0fde84 ESP: cd0fde60
[ 87.806872] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 87.808182] CR0: 80050033 CR2: 00000004 CR3: 0d098c60 CR4: 000006b0
[ 87.809558] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 87.810919] DR6: fffe0ff0 DR7: 00000400
[ 87.812002] Call Trace:
[ 87.812857] arch_get_unmapped_area_topdown+0x74/0x11f
[ 87.814011] ? arch_get_unmapped_area+0xb4/0xb4
[ 87.815095] get_unmapped_area+0x5b/0xae
[ 87.816103] do_mmap+0xc7/0x2ac
[ 87.817061] vm_mmap_pgoff+0x6b/0x94
[ 87.818080] SYSC_mmap_pgoff+0x13f/0x162
[ 87.819004] SyS_mmap_pgoff+0x1a/0x1c
[ 87.819873] do_int80_syscall_32+0x65/0x79
[ 87.820791] entry_INT80_32+0x2a/0x2a
[ 87.821710] EIP: 0x8090aa2
[ 87.822490] EFLAGS: 00000246 CPU: 0
[ 87.823345] EAX: ffffffda EBX: 00000000 ECX: 00201000 EDX: 00000003
[ 87.824489] ESI: 00000022 EDI: ffffffff EBP: 00000000 ESP: bff1c8c8
[ 87.825650] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[ 87.826735] Code: 31 c9 e8 20 15 fb ff 39 7d ec 5a 76 02 0f 0b 31 d2 6a 00 39 fb 0f 97 c2 b8 c0 db b1 c1 31 c9 e8 03 15 fb ff 39 fb 89 fa 58 76 07 <0f> 0b ba f4 ff ff ff 8d 65 f4 89 d0 5b 5e 5f 5d c3 55 89 e5 56
[ 87.830175] EIP: unmapped_area_topdown+0x14b/0x15c SS:ESP: 0068:cd0fde60
[ 87.831396] ---[ end trace 67da11e70888e7ec ]---


To reproduce:

git clone https://github.com/01org/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
Xiaolong


Attachments:
(No filename) (3.29 kB)
config-4.12.0-rc5-00285-g1be7107f (121.18 kB)
job-script (3.79 kB)
dmesg.xz (10.51 kB)
Download all attachments

2017-06-21 02:41:50

by Hugh Dickins

[permalink] [raw]
Subject: Re: [lkp-robot] [mm] 1be7107fbe: kernel_BUG_at_mm/mmap.c

On Wed, 21 Jun 2017, kernel test robot wrote:

>
> FYI, we noticed the following commit:
>
> commit: 1be7107fbe18eed3e319a6c3e83c78254b693acb ("mm: larger stack guard gap, between vmas")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: trinity
> with following parameters:
>
> runtime: 300s
>
> test-description: Trinity is a linux system call fuzz tester.
> test-url: http://codemonkey.org.uk/projects/trinity/
>
>
> on test machine: qemu-system-i386 -enable-kvm -smp 2 -m 320M
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> +------------------------------------------+------------+------------+
> | | 1132d5e7b6 | 1be7107fbe |
> +------------------------------------------+------------+------------+
> | boot_successes | 5 | 4 |
> | boot_failures | 0 | 4 |
> | kernel_BUG_at_mm/mmap.c | 0 | 4 |
> | invalid_opcode:#[##] | 0 | 4 |
> | EIP:unmapped_area_topdown | 0 | 4 |
> | Kernel_panic-not_syncing:Fatal_exception | 0 | 4 |
> +------------------------------------------+------------+------------+
>
>
>
> [ 87.792040] kernel BUG at mm/mmap.c:1963!
> [ 87.793442] invalid opcode: 0000 [#1] DEBUG_PAGEALLOC
> [ 87.794812] Modules linked in:
> [ 87.795849] CPU: 0 PID: 424 Comm: trinity-c2 Not tainted 4.12.0-rc5-00285-g1be7107f #1
> [ 87.798138] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
> [ 87.800657] task: ce6177c0 task.stack: cd0fc000
> [ 87.801877] EIP: unmapped_area_topdown+0x14b/0x15c
> [ 87.803063] EFLAGS: 00010206 CPU: 0
> [ 87.804075] EAX: 00000000 EBX: b5200000 ECX: 00000000 EDX: b4feb000
> [ 87.805469] ESI: 00201000 EDI: b4feb000 EBP: cd0fde84 ESP: cd0fde60
> [ 87.806872] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> [ 87.808182] CR0: 80050033 CR2: 00000004 CR3: 0d098c60 CR4: 000006b0
> [ 87.809558] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 87.810919] DR6: fffe0ff0 DR7: 00000400
> [ 87.812002] Call Trace:
> [ 87.812857] arch_get_unmapped_area_topdown+0x74/0x11f
> [ 87.814011] ? arch_get_unmapped_area+0xb4/0xb4
> [ 87.815095] get_unmapped_area+0x5b/0xae
> [ 87.816103] do_mmap+0xc7/0x2ac
> [ 87.817061] vm_mmap_pgoff+0x6b/0x94
> [ 87.818080] SYSC_mmap_pgoff+0x13f/0x162
> [ 87.819004] SyS_mmap_pgoff+0x1a/0x1c
> [ 87.819873] do_int80_syscall_32+0x65/0x79
> [ 87.820791] entry_INT80_32+0x2a/0x2a
> [ 87.821710] EIP: 0x8090aa2
> [ 87.822490] EFLAGS: 00000246 CPU: 0
> [ 87.823345] EAX: ffffffda EBX: 00000000 ECX: 00201000 EDX: 00000003
> [ 87.824489] ESI: 00000022 EDI: ffffffff EBP: 00000000 ESP: bff1c8c8
> [ 87.825650] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
> [ 87.826735] Code: 31 c9 e8 20 15 fb ff 39 7d ec 5a 76 02 0f 0b 31 d2 6a 00 39 fb 0f 97 c2 b8 c0 db b1 c1 31 c9 e8 03 15 fb ff 39 fb 89 fa 58 76 07 <0f> 0b ba f4 ff ff ff 8d 65 f4 89 d0 5b 5e 5f 5d c3 55 89 e5 56
> [ 87.830175] EIP: unmapped_area_topdown+0x14b/0x15c SS:ESP: 0068:cd0fde60
> [ 87.831396] ---[ end trace 67da11e70888e7ec ]---
>
>
> To reproduce:
>
> git clone https://github.com/01org/lkp-tests.git
> cd lkp-tests
> bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email

Thanks for the report: yes, this is the same one as Dave Jones
found yesterday, which is fixed by this patch posted last night:

[PATCH] mm: fix new crash in unmapped_area_topdown()

Trinity gets kernel BUG at mm/mmap.c:1963! in about 3 minutes of
mmap testing. That's the VM_BUG_ON(gap_end < gap_start) at the
end of unmapped_area_topdown(). Linus points out how MAP_FIXED
(which does not have to respect our stack guard gap intentions)
could result in gap_end below gap_start there. Fix that, and
the similar case in its alternative, unmapped_area().

Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas")
Reported-by: Dave Jones <[email protected]>
Debugged-by: Linus Torvalds <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
---

mm/mmap.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- 4.12-rc6/mm/mmap.c 2017-06-19 09:06:10.035407505 -0700
+++ linux/mm/mmap.c 2017-06-19 21:09:28.616707311 -0700
@@ -1817,7 +1817,8 @@ unsigned long unmapped_area(struct vm_un
/* Check if current node has a suitable gap */
if (gap_start > high_limit)
return -ENOMEM;
- if (gap_end >= low_limit && gap_end - gap_start >= length)
+ if (gap_end >= low_limit &&
+ gap_end > gap_start && gap_end - gap_start >= length)
goto found;

/* Visit right subtree if it looks promising */
@@ -1920,7 +1921,8 @@ unsigned long unmapped_area_topdown(stru
gap_end = vm_start_gap(vma);
if (gap_end < low_limit)
return -ENOMEM;
- if (gap_start <= high_limit && gap_end - gap_start >= length)
+ if (gap_start <= high_limit &&
+ gap_end > gap_start && gap_end - gap_start >= length)
goto found;

/* Visit left subtree if it looks promising */

2017-06-21 18:29:38

by Linus Torvalds

[permalink] [raw]
Subject: Re: [lkp-robot] [mm] 1be7107fbe: kernel_BUG_at_mm/mmap.c

On Tue, Jun 20, 2017 at 7:41 PM, Hugh Dickins <[email protected]> wrote:
> On Wed, 21 Jun 2017, kernel test robot wrote:
>
> Thanks for the report: yes, this is the same one as Dave Jones
> found yesterday, which is fixed by this patch posted last night:

.. and that patch is in current -git now, so hopefully we're all good.

Hugh, Michal - I also merged Helge's drop-up cleanup, is there
anything I've missed? I think Oleg had something, but I can't recall
right now, and I might just have missed it.

Linus

2017-06-21 19:33:44

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [lkp-robot] [mm] 1be7107fbe: kernel_BUG_at_mm/mmap.c

On 06/21, Linus Torvalds wrote:
>
> Hugh, Michal - I also merged Helge's drop-up cleanup, is there
> anything I've missed? I think Oleg had something, but I can't recall
> right now, and I might just have missed it.

Well, I meant, perhaps we need a bit more changes to ensure that a new
GROWSDOWN vma can't come without a gap below. But this is really minor,
we can do this later even if I am right.

However, there is another regression reported by Cyrill. Fixed by the
patch below.

And yes, I think this check should either go away, or we need to make
it more clever.

In short, the vma created by mmap(MAP_GROWSDOWN) does not grow down
automatically, because of this check.

This worked before, because with the stack guard page at ->vm_start
__do_page_fault() hits this expand-stack path only if the stack grows
by more than PAGE_SIZE, now it is called every time. I'll send the
patch tomorrow if nobody else does this before.

Oleg.
---

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 8ad91a0..edc5d68 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1416,7 +1416,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
* and pusha to work. ("enter $65535, $31" pushes
* 32 pointers and then decrements %sp by 65535.)
*/
- if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
+if (0) if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
bad_area(regs, error_code, address);
return;
}

2017-06-21 19:39:36

by Linus Torvalds

[permalink] [raw]
Subject: Re: [lkp-robot] [mm] 1be7107fbe: kernel_BUG_at_mm/mmap.c

On Wed, Jun 21, 2017 at 12:33 PM, Oleg Nesterov <[email protected]> wrote:
> - if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
> +if (0) if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {

This smells bad.

That test is not about grow-down or even the guard page. That test is
that it's always wrong to grow down the stack below %esp.

Except we allow some slop, because certain instructions take the page
fault before actually updating %rsp.

So that patch is not correct. We want a page fault (and *not* expand
the stack) if somebody accesses below the stack pointer.

If we had a regression, it's due to something else.

Linus

2017-06-21 19:40:03

by Hugh Dickins

[permalink] [raw]
Subject: Re: [lkp-robot] [mm] 1be7107fbe: kernel_BUG_at_mm/mmap.c

On Wed, 21 Jun 2017, Linus Torvalds wrote:
> On Tue, Jun 20, 2017 at 7:41 PM, Hugh Dickins <[email protected]> wrote:
> > On Wed, 21 Jun 2017, kernel test robot wrote:
> >
> > Thanks for the report: yes, this is the same one as Dave Jones
> > found yesterday, which is fixed by this patch posted last night:
>
> .. and that patch is in current -git now, so hopefully we're all good.
>
> Hugh, Michal - I also merged Helge's drop-up cleanup, is there
> anything I've missed? I think Oleg had something, but I can't recall
> right now, and I might just have missed it.

No, the tree looks good now, thanks. You're right that Oleg raised a
concern about how mmap(MAP_GROWSDOWN) behaves, and is wondering about
how to make it more sensible: but that's neither a vulnerability nor
a regression - he agreed minor, to think about another time. He's
also been fielding a concern from the CRIU people, we're hoping that's
just something they need to fix at their end - ah, no, I see Oleg has
just replied ahead of me, not digested yet, I'll look into that later...

Hugh

2017-06-21 20:27:56

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [lkp-robot] [mm] 1be7107fbe: kernel_BUG_at_mm/mmap.c

On 06/21, Linus Torvalds wrote:
>
> On Wed, Jun 21, 2017 at 12:33 PM, Oleg Nesterov <[email protected]> wrote:
> > - if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
> > +if (0) if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
>
> This smells bad.

Yes.

> That test is not about grow-down or even the guard page. That test is
> that it's always wrong to grow down the stack below %esp.

Sure.

but let me repeat that this test was essentially dismissed when the stack
guard page was introduced. Simply because do_page_pault() never hits (before
the recent patch) this need-to-grow-VM_GROWSDOWN-vma path if the stack grows
by less than PAGE_SIZE.

IOP. Suppose that an application does

char * p = mmap(MAP_GROWSDOWN);
for (;;)
*p-- = 'x';


before the "larger stack guard gap, between vmas" change the stack was
enlarged by do_anonymous_page(), __do_page_fault() didn't hit this path.

Now __do_page_fault() tries to expand the stack itself, and this check
fails.

Oleg.

2017-06-21 20:30:31

by Linus Torvalds

[permalink] [raw]
Subject: Re: [lkp-robot] [mm] 1be7107fbe: kernel_BUG_at_mm/mmap.c

On Wed, Jun 21, 2017 at 1:27 PM, Oleg Nesterov <[email protected]> wrote:
>
> Now __do_page_fault() tries to expand the stack itself, and this check
> fails.

But we want that check to trigger and cause the access to fail.
Accessing the stack below the stack pointer is wrong.

Do you have a pointer to the report for this regression? I must have missed it.

Linus

2017-06-21 20:56:22

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [lkp-robot] [mm] 1be7107fbe: kernel_BUG_at_mm/mmap.c

On 06/21, Linus Torvalds wrote:
>
> On Wed, Jun 21, 2017 at 1:27 PM, Oleg Nesterov <[email protected]> wrote:
> >
> > Now __do_page_fault() tries to expand the stack itself, and this check
> > fails.
>
> But we want that check to trigger and cause the access to fail.
> Accessing the stack below the stack pointer is wrong.

I understand. My point is that this check was invalidated by stack-guard-page
a long ago, and this means that we add the user-visible change now.

> Do you have a pointer to the report for this regression? I must have missed it.

See http://marc.info/?t=149794523000001&r=1&w=2

this thread is a bit confusing, the most relevant emails are

http://marc.info/?l=linux-kernel&m=149806256621440&w=2 (with test case)
http://marc.info/?l=linux-kernel&m=149806452322233&w=2

Actually, I got another (redhat internal) bug report after this discussion,
and at first glance this is the same thing.

Just in case, I agree that mmap(MAP_GROWSDOWN) is mostly useless, perhaps
we do not even care. But still this is regression.

Oleg.

2017-06-21 22:19:39

by Linus Torvalds

[permalink] [raw]
Subject: Re: [lkp-robot] [mm] 1be7107fbe: kernel_BUG_at_mm/mmap.c

On Wed, Jun 21, 2017 at 1:56 PM, Oleg Nesterov <[email protected]> wrote:
>
> I understand. My point is that this check was invalidated by stack-guard-page
> a long ago, and this means that we add the user-visible change now.

Yeah. I guess we could consider it an *old* regression that got fixed,
but if people started relying on the regression...

>> Do you have a pointer to the report for this regression? I must have missed it.
>
> See http://marc.info/?t=149794523000001&r=1&w=2

Ok.

And thinking about it, while that is a silly test-case, the notion of
"create top-down segment, then start populating it _before_ moving the
stack pointer into it" is actually perfectly valid.

So I guess checking against the stack pointer is wrong in that case -
at least if the stack pointer isn't inside that vma to begin with.

So yes, removing that check looks like the right thing to do for now.

Do you want to send me the patch if you already have a commit message etc?

Linus

2017-06-22 01:07:40

by Hugh Dickins

[permalink] [raw]
Subject: Re: [lkp-robot] [mm] 1be7107fbe: kernel_BUG_at_mm/mmap.c

On Wed, 21 Jun 2017, Linus Torvalds wrote:
> On Wed, Jun 21, 2017 at 1:56 PM, Oleg Nesterov <[email protected]> wrote:
> >
> > I understand. My point is that this check was invalidated by stack-guard-page
> > a long ago, and this means that we add the user-visible change now.
>
> Yeah. I guess we could consider it an *old* regression that got fixed,
> but if people started relying on the regression...
>
> >> Do you have a pointer to the report for this regression? I must have missed it.
> >
> > See http://marc.info/?t=149794523000001&r=1&w=2
>
> Ok.
>
> And thinking about it, while that is a silly test-case, the notion of
> "create top-down segment, then start populating it _before_ moving the
> stack pointer into it" is actually perfectly valid.
>
> So I guess checking against the stack pointer is wrong in that case -
> at least if the stack pointer isn't inside that vma to begin with.
>
> So yes, removing that check looks like the right thing to do for now.
>
> Do you want to send me the patch if you already have a commit message etc?

I have a bit of a bad feeling about this.

Perhaps it's just sentimental attachment to all those weird
and ancient stack pointer checks in arch/<some>/fault.c.

We have been inconsistent: cris frv m32r m68k microblaze mn10300
openrisc powerpc tile um x86 have such checks, the others don't.
So that's a good reason to delete them.

But at least at the moment those checks impose some sanity:
just a page less than we had imagined for several years.
Once we remove them, they cannot go back. Should we now
complicate them with an extra page of slop?

I'm not entirely persuaded by your pre-population argument:
it's perfectly possible to prepare a MAP_GROWSDOWN area with
an initial size, that's populated in a normal way, before handing
off for stack expansion - isn't it?

I'd be interested to hear more about that (redhat internal) bug
report that Oleg mentions: whether it gives stronger grounds for
making this sudden change than the CRIU testcase.

I can go ahead and create a patch if Oleg is not there at the
moment - but I might prefer his or your name on it - particularly
if we're rushing it in before consulting the arch maintainers
whose work we would be deleting.

Queasily,
Hugh

2017-06-22 04:23:31

by Hugh Dickins

[permalink] [raw]
Subject: Re: [lkp-robot] [mm] 1be7107fbe: kernel_BUG_at_mm/mmap.c

On Wed, 21 Jun 2017, Oleg Nesterov wrote:
> On 06/21, Linus Torvalds wrote:
> >
> > Hugh, Michal - I also merged Helge's drop-up cleanup, is there
> > anything I've missed? I think Oleg had something, but I can't recall
> > right now, and I might just have missed it.
>
> Well, I meant, perhaps we need a bit more changes to ensure that a new
> GROWSDOWN vma can't come without a gap below. But this is really minor,
> we can do this later even if I am right.

I'm mortified. At last I understand you, and see that you spelt it out
perfectly clearly in your "unmapped_gap" mail earlier in another thread,
when I was in a rush to prioritize what bugs needed looking at first,
and brusquely persuaded you to say that this is only a minor defect.

Well, yes, it's not a new vulnerability and it's not a new regression,
and the users of MAP_GROWSDOWN are few and far between, and often the
assignment of holes will work out just fine; but it's an embarrassing
oversight on my part, that everything was geared to inflating the
size of the existing VM_GROWS vmas, completely forgetting to inflate
the size of the area to be added.

Without reading you thoroughly (and all the fault mine not yours),
I had thought you were referring to the way that a MAP_GROWSDOWN
area may be assigned a place in which the stack_guard_gap would
immediately prevent it from growing down afterwards (and I'm not
sure what to do about that). But your point is that it may be
assigned a place in which there is not even a stack_guard_gap
below it. (And so the bug that trinity found would not even
depend upon MAP_FIXED.)

I'll go back to read you again and think on the best way to
correct that, I hope it won't need lots of plumbing through
different levels.

Hugh

2017-06-22 10:58:55

by Dmitry Safonov

[permalink] [raw]
Subject: Re: [lkp-robot] [mm] 1be7107fbe: kernel_BUG_at_mm/mmap.c

On 06/22/2017 04:07 AM, Hugh Dickins wrote:
> On Wed, 21 Jun 2017, Linus Torvalds wrote:
>> On Wed, Jun 21, 2017 at 1:56 PM, Oleg Nesterov <[email protected]> wrote:
>>>
>>> I understand. My point is that this check was invalidated by stack-guard-page
>>> a long ago, and this means that we add the user-visible change now.
>>
>> Yeah. I guess we could consider it an *old* regression that got fixed,
>> but if people started relying on the regression...
>>
>>>> Do you have a pointer to the report for this regression? I must have missed it.
>>>
>>> See http://marc.info/?t=149794523000001&r=1&w=2
>>
>> Ok.
>>
>> And thinking about it, while that is a silly test-case, the notion of
>> "create top-down segment, then start populating it _before_ moving the
>> stack pointer into it" is actually perfectly valid.
>>
>> So I guess checking against the stack pointer is wrong in that case -
>> at least if the stack pointer isn't inside that vma to begin with.
>>
>> So yes, removing that check looks like the right thing to do for now.
>>
>> Do you want to send me the patch if you already have a commit message etc?
>
> I have a bit of a bad feeling about this.
>
> Perhaps it's just sentimental attachment to all those weird
> and ancient stack pointer checks in arch/<some>/fault.c.
>
> We have been inconsistent: cris frv m32r m68k microblaze mn10300
> openrisc powerpc tile um x86 have such checks, the others don't.
> So that's a good reason to delete them.
>
> But at least at the moment those checks impose some sanity:
> just a page less than we had imagined for several years.
> Once we remove them, they cannot go back. Should we now
> complicate them with an extra page of slop?
>
> I'm not entirely persuaded by your pre-population argument:
> it's perfectly possible to prepare a MAP_GROWSDOWN area with
> an initial size, that's populated in a normal way, before handing
> off for stack expansion - isn't it?
>
> I'd be interested to hear more about that (redhat internal) bug
> report that Oleg mentions: whether it gives stronger grounds for
> making this sudden change than the CRIU testcase.

Well, if all the deal is in CRIU testcase - it can be easily reworked.
The question - will it break anything else?

Maybe it's better to disable this check on the release and enable it
back for v4.13 kernel, so if it'll break some user-space, it'll be
caught on linux-next.

>
> I can go ahead and create a patch if Oleg is not there at the
> moment - but I might prefer his or your name on it - particularly
> if we're rushing it in before consulting the arch maintainers
> whose work we would be deleting.
>
> Queasily,
> Hugh
>

--
Dmitry

2017-06-22 15:16:30

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [lkp-robot] [mm] 1be7107fbe: kernel_BUG_at_mm/mmap.c

On 06/21, Hugh Dickins wrote:
>
> On Wed, 21 Jun 2017, Linus Torvalds wrote:
> > On Wed, Jun 21, 2017 at 1:56 PM, Oleg Nesterov <[email protected]> wrote:
> > >
> > > I understand. My point is that this check was invalidated by stack-guard-page
> > > a long ago, and this means that we add the user-visible change now.
> >
> > Yeah. I guess we could consider it an *old* regression that got fixed,
> > but if people started relying on the regression...
> >
> > >> Do you have a pointer to the report for this regression? I must have missed it.
> > >
> > > See http://marc.info/?t=149794523000001&r=1&w=2
> >
> > Ok.
> >
> > And thinking about it, while that is a silly test-case, the notion of
> > "create top-down segment, then start populating it _before_ moving the
> > stack pointer into it" is actually perfectly valid.
> >
> > So I guess checking against the stack pointer is wrong in that case -
> > at least if the stack pointer isn't inside that vma to begin with.
> >
> > So yes, removing that check looks like the right thing to do for now.
> >
> > Do you want to send me the patch if you already have a commit message etc?
>
> I have a bit of a bad feeling about this.
>
> Perhaps it's just sentimental attachment to all those weird
> and ancient stack pointer checks in arch/<some>/fault.c.
>
> We have been inconsistent: cris frv m32r m68k microblaze mn10300
> openrisc powerpc tile um x86 have such checks, the others don't.
> So that's a good reason to delete them.

OK, I didn't bother to check other acrhitectures, thanks...

> But at least at the moment those checks impose some sanity:
> just a page less than we had imagined for several years.
> Once we remove them, they cannot go back. Should we now
> complicate them with an extra page of slop?

Something like the patch below? Yes, I thought about this too.

I simply do not know. Honestly, I do not even know why MAP_GROWSDOWN
exists. I mean, I do not understand how user-space can actually use it
to get auto-growing, the usage of MAP_GROWSDOWN in (say) criu is clear.
The main thread's stack can grow, but this is only because it is placed
at the right place, above mm->mmap_base in case of top-down layout.

> I'm not entirely persuaded by your pre-population argument:
> it's perfectly possible to prepare a MAP_GROWSDOWN area with
> an initial size, that's populated in a normal way, before handing
> off for stack expansion - isn't it?

Exactly.

> I'd be interested to hear more about that (redhat internal) bug
> report that Oleg mentions: whether it gives stronger grounds for
> making this sudden change than the CRIU testcase.

Probably not. Well, the customer reported multiple problems, but most
of them were caused by rhel-specific bugs. As for "MAP_GROWSDOWN does
not grow", most probably this was another test-case, not the real
application. I will ask and report back if this is not true.

In short, I agree with any decision. Even with "we do not care if we
break some artificial test-cases".

Oleg.
---

--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1409,7 +1409,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
bad_area(regs, error_code, address);
return;
}
- if (error_code & PF_USER) {
+ if ((error_code & PF_USER) && (address + PAGE_SIZE < vma->vm_start)) {
/*
* Accessing the stack below %sp is always a bug.
* The large cushion allows instructions like enter

2017-06-22 18:04:29

by Hugh Dickins

[permalink] [raw]
Subject: Re: [lkp-robot] [mm] 1be7107fbe: kernel_BUG_at_mm/mmap.c

On Thu, 22 Jun 2017, Oleg Nesterov wrote:
> On 06/21, Hugh Dickins wrote:
> >
> > On Wed, 21 Jun 2017, Linus Torvalds wrote:
> > > On Wed, Jun 21, 2017 at 1:56 PM, Oleg Nesterov <[email protected]> wrote:
> > > >
> > > > I understand. My point is that this check was invalidated by stack-guard-page
> > > > a long ago, and this means that we add the user-visible change now.
> > >
> > > Yeah. I guess we could consider it an *old* regression that got fixed,
> > > but if people started relying on the regression...
> > >
> > > >> Do you have a pointer to the report for this regression? I must have missed it.
> > > >
> > > > See http://marc.info/?t=149794523000001&r=1&w=2
> > >
> > > Ok.
> > >
> > > And thinking about it, while that is a silly test-case, the notion of
> > > "create top-down segment, then start populating it _before_ moving the
> > > stack pointer into it" is actually perfectly valid.
> > >
> > > So I guess checking against the stack pointer is wrong in that case -
> > > at least if the stack pointer isn't inside that vma to begin with.
> > >
> > > So yes, removing that check looks like the right thing to do for now.
> > >
> > > Do you want to send me the patch if you already have a commit message etc?
> >
> > I have a bit of a bad feeling about this.
> >
> > Perhaps it's just sentimental attachment to all those weird
> > and ancient stack pointer checks in arch/<some>/fault.c.
> >
> > We have been inconsistent: cris frv m32r m68k microblaze mn10300
> > openrisc powerpc tile um x86 have such checks, the others don't.
> > So that's a good reason to delete them.
>
> OK, I didn't bother to check other acrhitectures, thanks...
>
> > But at least at the moment those checks impose some sanity:
> > just a page less than we had imagined for several years.
> > Once we remove them, they cannot go back. Should we now
> > complicate them with an extra page of slop?
>
> Something like the patch below? Yes, I thought about this too.

Yes, that patch (times 11 for all the architectures) would be a good
conservative choice: imposing the traditional sanity check, but
weakened by one page to match what we've inadvertently been doing
for the last four years.

Would deserve a comment (since it makes no sense in any tree by
itself), but unfair to ask you to write that: I must get this mail
off before a meeting, can't think what to say now.

But my own preference this morning is to do nothing, until we hear
more complaints and can classify them as genuine userspace breakage,
as opposed to testcases surprised by a new kernel implementation.

Hugh

>
> I simply do not know. Honestly, I do not even know why MAP_GROWSDOWN
> exists. I mean, I do not understand how user-space can actually use it
> to get auto-growing, the usage of MAP_GROWSDOWN in (say) criu is clear.
> The main thread's stack can grow, but this is only because it is placed
> at the right place, above mm->mmap_base in case of top-down layout.
>
> > I'm not entirely persuaded by your pre-population argument:
> > it's perfectly possible to prepare a MAP_GROWSDOWN area with
> > an initial size, that's populated in a normal way, before handing
> > off for stack expansion - isn't it?
>
> Exactly.
>
> > I'd be interested to hear more about that (redhat internal) bug
> > report that Oleg mentions: whether it gives stronger grounds for
> > making this sudden change than the CRIU testcase.
>
> Probably not. Well, the customer reported multiple problems, but most
> of them were caused by rhel-specific bugs. As for "MAP_GROWSDOWN does
> not grow", most probably this was another test-case, not the real
> application. I will ask and report back if this is not true.
>
> In short, I agree with any decision. Even with "we do not care if we
> break some artificial test-cases".
>
> Oleg.
> ---
>
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1409,7 +1409,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
> bad_area(regs, error_code, address);
> return;
> }
> - if (error_code & PF_USER) {
> + if ((error_code & PF_USER) && (address + PAGE_SIZE < vma->vm_start)) {
> /*
> * Accessing the stack below %sp is always a bug.
> * The large cushion allows instructions like enter

2017-06-22 20:51:26

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [lkp-robot] [mm] 1be7107fbe: kernel_BUG_at_mm/mmap.c

On 06/22, Hugh Dickins wrote:
>
> On Thu, 22 Jun 2017, Oleg Nesterov wrote:
> >
> > Something like the patch below? Yes, I thought about this too.
>
> Yes, that patch (times 11 for all the architectures)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Yes, yes, this is clear to me. But only after you have already explained
this in your previous email ;)

> But my own preference this morning is to do nothing, until we hear
> more complaints and can classify them as genuine userspace breakage,
> as opposed to testcases surprised by a new kernel implementation.

OK. Agreed. Lets wait for the "real" bug report.

FYI. I am still investigating that redhat internal bug report. And yes,
it was the real application. But. I still think that it fails by another
reason, just the test-case they provided doesn't match the reality and
it hits another (this) problem by accident.

Oleg.