On Wed, 2017-06-21 at 11:47 +0100, Ben Hutchings wrote:
> On Wed, 2017-06-21 at 11:24 +0200, Michal Hocko wrote:
> > On Wed 21-06-17 02:38:21, Ben Hutchings wrote:
> > > On Mon, 2017-06-19 at 16:23 +0200, Willy Tarreau wrote:
> > > > On Mon, Jun 19, 2017 at 08:44:24PM +0800, Linus Torvalds wrote:
> > > > > The distros are in a different situation and don't have that
> > > > > two-week
> > > > > window until a release, and presumably would not want to cut
> > > > > over to
> > > > > something new and fairly untested on such short notice.
> > > > >
> > > > > The timing for this all sucks, but if somebody has some final
> > > > > comments, please speak up now..
> > > >
> > > > What do you suggest the stable maintainers do here ? I've just
> > > > backported
> > > > this patch back to 3.10 and could boot it on i386 where it
> > > > apparently
> > > > works. But we may need more tests. On the other hand we benefit
> > > > from the
> > > > automated tests on tens of platforms when we push the queues so
> > > > at least
> > > > we'll quickly know if it builds and boots. I just don't feel
> > > > confident in
> > > > my work just because it builds and boots, you know.
> > > >
> > > > I'm appending the patches I currently have if anyone wants to
> > > > have a
> > > > glance. Ben, 3.2 requires much more changes than 3.10 and I'm
> > > > pretty
> > > > sure you won't change your patches at the last minute so I gave
> > > > up.
> > >
> > > Well I'm now dealing with fall-out from the Debian stable updates,
> > > which used a backport of Michal's patch series. That unfortunately
> > > seems to break programs running Java code in the main thread (the
> > > 'java' command doesn't do this, but e.g. 'jsvc' does).
> >
> > Could you share more details please?
>
> https://bugs.debian.org/865303
> https://bugs.debian.org/865311
> https://bugs.debian.org/865343
Unfortunately these regressions have not been completely fixed by
switching to Hugh's fix.
Firstly, some Rust programs are crashing on ppc64el with 64 KiB pages.
Apparently Rust maps its own guard page at the lower limit of the stack
(determined using pthread_getattr_np() and pthread_attr_getstack()). I
don't think this ever actually worked for the main thread stack, but it
now also blocks expansion as the default stack size of 8 MiB is smaller
than the stack gap of 16 MiB. Would it make sense to skip over
PROT_NONE mappings when checking whether it's safe to expand?
Secondly, LibreOffice is crashing on i386 when running components
implemented in Java. I don't have a diagnosis for this yet.
Ben.
--
Ben Hutchings
The world is coming to an end. Please log off.
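For context, the user-space pattern Ben describes - query the main-thread stack bounds with pthread_getattr_np()/pthread_attr_getstack() and map a PROT_NONE page at the low end - looks roughly like the sketch below (hypothetical; Rust's actual code is linked later in the thread):

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	pthread_attr_t attr;
	void *stackaddr;	/* lowest address of the reported stack range */
	size_t stacksize;
	long page = sysconf(_SC_PAGESIZE);

	/* for the main thread glibc reports (roughly) the whole RLIMIT_STACK
	 * range, most of which is not actually mapped yet */
	pthread_getattr_np(pthread_self(), &attr);
	pthread_attr_getstack(&attr, &stackaddr, &stacksize);
	pthread_attr_destroy(&attr);

	/* user-space guard page at the lower limit of the stack */
	void *guard = mmap(stackaddr, page, PROT_NONE,
			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	printf("stack [%p, %p), guard page at %p\n",
	       stackaddr, (void *)((char *)stackaddr + stacksize), guard);
	return 0;
}

With an 8 MiB RLIMIT_STACK and a 16 MiB kernel gap (256 pages of 64 KiB), any such guard page necessarily sits closer to the stack than the gap allows, which is what blocks expansion.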
On Mon, Jul 3, 2017 at 4:55 PM, Ben Hutchings <[email protected]> wrote:
>
> Firstly, some Rust programs are crashing on ppc64el with 64 KiB pages.
> Apparently Rust maps its own guard page at the lower limit of the stack
> (determined using pthread_getattr_np() and pthread_attr_getstack()). I
> don't think this ever actually worked for the main thread stack, but it
> now also blocks expansion as the default stack size of 8 MiB is smaller
> than the stack gap of 16 MiB. Would it make sense to skip over
> PROT_NONE mappings when checking whether it's safe to expand?
Hmm. Maybe.
Also, the whole notion that the gap should be relative to the page
size never made sense to me. So I think we could/should just make the
default gap size be one megabyte, not that "256 pages" abortion.
> Secondly, LibreOffice is crashing on i386 when running components
> implemented in Java. I don't have a diagnosis for this yet.
Ugh. Nobody seeing this inside SuSe/Red Hat? I don't think I've heard
about this..
Linus
On Mon, Jul 3, 2017 at 4:55 PM, Ben Hutchings <[email protected]> wrote:
> On Wed, 2017-06-21 at 11:47 +0100, Ben Hutchings wrote:
>> On Wed, 2017-06-21 at 11:24 +0200, Michal Hocko wrote:
>> > On Wed 21-06-17 02:38:21, Ben Hutchings wrote:
>> > > On Mon, 2017-06-19 at 16:23 +0200, Willy Tarreau wrote:
>> > > > On Mon, Jun 19, 2017 at 08:44:24PM +0800, Linus Torvalds wrote:
>> > > > > The distros are in a different situation and don't have that
>> > > > > two-week
>> > > > > window until a release, and presumably would not want to cut
>> > > > > over to
>> > > > > something new and fairly untested on such short notice.
>> > > > >
>> > > > > The timing for this all sucks, but if somebody has some final
>> > > > > comments, please speak up now..
>> > > >
>> > > > What do you suggest the stable maintainers do here ? I've just
>> > > > backported
>> > > > this patch back to 3.10 and could boot it on i386 where it
>> > > > apparently
>> > > > works. But we may need more tests. On the other hand we benefit
>> > > > from the
>> > > > automated tests on tens of platforms when we push the queues so
>> > > > at least
>> > > > we'll quickly know if it builds and boots. I just don't feel
>> > > > confident in
>> > > > my work just because it builds and boots, you know.
>> > > >
>> > > > I'm appending the patches I currently have if anyone wants to
>> > > > have a
>> > > > glance. Ben, 3.2 requires much more changes than 3.10 and I'm
>> > > > pretty
>> > > > sure you won't change your patches at the last minute so I gave
>> > > > up.
>> > >
>> > > Well I'm now dealing with fall-out from the Debian stable updates,
>> > > which used a backport of Michal's patch series. That unfortunately
>> > > seems to break programs running Java code in the main thread (the
>> > > 'java' command doesn't do this, but e.g. 'jsvc' does).
>> >
>> > Could you share more details please?
>>
>> https://bugs.debian.org/865303
>> https://bugs.debian.org/865311
>> https://bugs.debian.org/865343
>
> Unfortunately these regressions have not been completely fixed by
> switching to Hugh's fix.
>
> Firstly, some Rust programs are crashing on ppc64el with 64 KiB pages.
> Apparently Rust maps its own guard page at the lower limit of the stack
> (determined using pthread_getattr_np() and pthread_attr_getstack()). I
> don't think this ever actually worked for the main thread stack, but it
> now also blocks expansion as the default stack size of 8 MiB is smaller
> than the stack gap of 16 MiB. Would it make sense to skip over
> PROT_NONE mappings when checking whether it's safe to expand?
That change makes sense to me.
On Mon 03-07-17 17:05:27, Linus Torvalds wrote:
> On Mon, Jul 3, 2017 at 4:55 PM, Ben Hutchings <[email protected]> wrote:
> >
> > Firstly, some Rust programs are crashing on ppc64el with 64 KiB pages.
> > Apparently Rust maps its own guard page at the lower limit of the stack
> > (determined using pthread_getattr_np() and pthread_attr_getstack()). I
> > don't think this ever actually worked for the main thread stack, but it
> > now also blocks expansion as the default stack size of 8 MiB is smaller
> > than the stack gap of 16 MiB. Would it make sense to skip over
> > PROT_NONE mappings when checking whether it's safe to expand?
This is what my workaround for the older patch was doing, actually. We
have deployed that as a follow-up fix on our older code bases, and it
has fixed various issues with Java, which does a similar thing.
> Hmm. Maybe.
>
> Also, the whole notion that the gap should be relative to the page
> size never made sense to me. So I think we could/should just make the
> default gap size be one megabyte, not that "256 pages" abortion.
The reason for having this in page units was that MAX_ARG_STRLEN is in
page units as well, and that is used as an on-stack variable quite
often. 1 MB wouldn't be sufficient to cover that - we could go with a
larger gap, but who knows how many other traps are out there.
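To make the arithmetic concrete, a small stand-alone illustration (not kernel code; it assumes the kernel's MAX_ARG_STRLEN of 32 pages and the default stack_guard_gap of 256 pages):

#include <stdio.h>

/* Illustration only: shows why a fixed 1 MiB gap would not cover
 * MAX_ARG_STRLEN on 64 KiB page systems, while a page-scaled gap does. */
int main(void)
{
	const long page_sizes[] = { 4096, 65536 };

	for (int i = 0; i < 2; i++) {
		long ps = page_sizes[i];
		long max_arg_strlen = 32 * ps;	/* include/uapi/linux/binfmts.h */
		long gap = 256 * ps;		/* default stack_guard_gap */

		printf("page %2ld KiB: MAX_ARG_STRLEN %4ld KiB, gap %5ld KiB, fixed 1 MiB gap %s\n",
		       ps >> 10, max_arg_strlen >> 10, gap >> 10,
		       max_arg_strlen > (1L << 20) ? "too small" : "sufficient");
	}
	return 0;
}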
> > Secondly, LibreOffice is crashing on i386 when running components
> > implemented in Java. I don't have a diagnosis for this yet.
>
> Ugh. Nobody seeing this inside SuSe/Red Hat? I don't think I've heard
> about this..
No reports yet, but we do not support 32-bit kernels on the newer
kernels which have the upstream fix.
--
Michal Hocko
SUSE Labs
On Tue 04-07-17 10:41:22, Michal Hocko wrote:
> On Mon 03-07-17 17:05:27, Linus Torvalds wrote:
> > On Mon, Jul 3, 2017 at 4:55 PM, Ben Hutchings <[email protected]> wrote:
> > >
> > > Firstly, some Rust programs are crashing on ppc64el with 64 KiB pages.
> > > Apparently Rust maps its own guard page at the lower limit of the stack
> > > (determined using pthread_getattr_np() and pthread_attr_getstack()). I
> > > don't think this ever actually worked for the main thread stack, but it
> > > now also blocks expansion as the default stack size of 8 MiB is smaller
> > > than the stack gap of 16 MiB. Would it make sense to skip over
> > > PROT_NONE mappings when checking whether it's safe to expand?
>
> This is what my workaround for the older patch was doing, actually. We
> have deployed that as a follow up fix on our older code bases. And this
> has fixed verious issues with Java which was doing the similar thing.
Here is a forward port (on top of the current Linus tree) of my earlier
patch. I have dropped the note about the Java stack trace because it
most likely no longer applies with Hugh's patch. The problem is the
same in principle, though. Note that I didn't get to test this properly
yet, but it should be pretty much obvious.
---
From d9f6faccf2c286ed81fbc860c9b0b7fe23ef0836 Mon Sep 17 00:00:00 2001
From: Michal Hocko <[email protected]>
Date: Tue, 4 Jul 2017 11:27:39 +0200
Subject: [PATCH] mm: mm, mmap: do not blow on PROT_NONE MAP_FIXED holes in the
stack
"mm: enlarge stack guard gap" has introduced a regression in some rust
and Java environments which are trying to implement their own stack
guard page. They are punching a new MAP_FIXED mapping inside the
existing stack Vma.
This will confuse expand_{downwards,upwards} into thinking that the stack
expansion would in fact get us too close to an existing non-stack vma
which is a correct behavior wrt. safety. It is a real regression on
the other hand. Let's work around the problem by considering PROT_NONE
mapping as a part of the stack. This is a gros hack but overflowing to
such a mapping would trap anyway an we only can hope that usespace
knows what it is doing and handle it propely.
Fixes: d4d2d35e6ef9 ("mm: larger stack guard gap, between vmas")
Debugged-by: Vlastimil Babka <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
mm/mmap.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index f60a8bc2869c..2e996cbf4ff3 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2244,7 +2244,8 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 		gap_addr = TASK_SIZE;
 
 	next = vma->vm_next;
-	if (next && next->vm_start < gap_addr) {
+	if (next && next->vm_start < gap_addr &&
+			(next->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
 		if (!(next->vm_flags & VM_GROWSUP))
 			return -ENOMEM;
 		/* Check that both stack segments have the same anon_vma? */
@@ -2325,7 +2326,8 @@ int expand_downwards(struct vm_area_struct *vma,
 	/* Enforce stack_guard_gap */
 	prev = vma->vm_prev;
 	/* Check that both stack segments have the same anon_vma? */
-	if (prev && !(prev->vm_flags & VM_GROWSDOWN)) {
+	if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
+			(prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
 		if (address - prev->vm_end < stack_guard_gap)
 			return -ENOMEM;
 	}
--
2.11.0
--
Michal Hocko
SUSE Labs
On Tue, Jul 04, 2017 at 11:35:38AM +0200, Michal Hocko wrote:
> On Tue 04-07-17 10:41:22, Michal Hocko wrote:
> > On Mon 03-07-17 17:05:27, Linus Torvalds wrote:
> > > On Mon, Jul 3, 2017 at 4:55 PM, Ben Hutchings <[email protected]> wrote:
> > > >
> > > > Firstly, some Rust programs are crashing on ppc64el with 64 KiB pages.
> > > > Apparently Rust maps its own guard page at the lower limit of the stack
> > > > (determined using pthread_getattr_np() and pthread_attr_getstack()). I
> > > > don't think this ever actually worked for the main thread stack, but it
> > > > now also blocks expansion as the default stack size of 8 MiB is smaller
> > > > than the stack gap of 16 MiB. Would it make sense to skip over
> > > > PROT_NONE mappings when checking whether it's safe to expand?
> >
> > This is what my workaround for the older patch was doing, actually. We
> > have deployed that as a follow up fix on our older code bases. And this
> > has fixed verious issues with Java which was doing the similar thing.
>
> Here is a forward port (on top of the current Linus tree) of my earlier
> patch. I have dropped a note about java stack trace because this would
> most likely be not the case with the Hugh's patch. The problem is the
> same in principle though. Note I didn't get to test this properly yet
> but it should be pretty much obvious.
> ---
> >From d9f6faccf2c286ed81fbc860c9b0b7fe23ef0836 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <[email protected]>
> Date: Tue, 4 Jul 2017 11:27:39 +0200
> Subject: [PATCH] mm: mm, mmap: do not blow on PROT_NONE MAP_FIXED holes in the
> stack
>
> "mm: enlarge stack guard gap" has introduced a regression in some rust
> and Java environments which are trying to implement their own stack
> guard page. They are punching a new MAP_FIXED mapping inside the
> existing stack Vma.
>
> This will confuse expand_{downwards,upwards} into thinking that the stack
> expansion would in fact get us too close to an existing non-stack vma
> which is a correct behavior wrt. safety. It is a real regression on
> the other hand. Let's work around the problem by considering PROT_NONE
> mapping as a part of the stack. This is a gros hack but overflowing to
> such a mapping would trap anyway an we only can hope that usespace
> knows what it is doing and handle it propely.
>
> Fixes: d4d2d35e6ef9 ("mm: larger stack guard gap, between vmas")
> Debugged-by: Vlastimil Babka <[email protected]>
> Signed-off-by: Michal Hocko <[email protected]>
> ---
> mm/mmap.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index f60a8bc2869c..2e996cbf4ff3 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2244,7 +2244,8 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> gap_addr = TASK_SIZE;
>
> next = vma->vm_next;
> - if (next && next->vm_start < gap_addr) {
> + if (next && next->vm_start < gap_addr &&
> + (next->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
> if (!(next->vm_flags & VM_GROWSUP))
> return -ENOMEM;
> /* Check that both stack segments have the same anon_vma? */
> @@ -2325,7 +2326,8 @@ int expand_downwards(struct vm_area_struct *vma,
> /* Enforce stack_guard_gap */
> prev = vma->vm_prev;
> /* Check that both stack segments have the same anon_vma? */
> - if (prev && !(prev->vm_flags & VM_GROWSDOWN)) {
> + if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
> + (prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
> if (address - prev->vm_end < stack_guard_gap)
> return -ENOMEM;
> }
But wouldn't this completely disable the check in case such a guard page
is installed, and possibly continue to allow the collision when the stack
allocation is large enough to skip this guard page ? Shouldn't we instead
"skip" such a vma and look for the next one ?
I was thinking about something more like :
	prev = vma->vm_prev;
+	/* Don't consider a possible user-space stack guard page */
+	if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
+	    !(prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC)))
+		prev = prev->vm_prev;
+
	/* Check that both stack segments have the same anon_vma? */
Willy
On Tue 04-07-17 11:47:28, Willy Tarreau wrote:
> On Tue, Jul 04, 2017 at 11:35:38AM +0200, Michal Hocko wrote:
> > On Tue 04-07-17 10:41:22, Michal Hocko wrote:
> > > On Mon 03-07-17 17:05:27, Linus Torvalds wrote:
> > > > On Mon, Jul 3, 2017 at 4:55 PM, Ben Hutchings <[email protected]> wrote:
> > > > >
> > > > > Firstly, some Rust programs are crashing on ppc64el with 64 KiB pages.
> > > > > Apparently Rust maps its own guard page at the lower limit of the stack
> > > > > (determined using pthread_getattr_np() and pthread_attr_getstack()). I
> > > > > don't think this ever actually worked for the main thread stack, but it
> > > > > now also blocks expansion as the default stack size of 8 MiB is smaller
> > > > > than the stack gap of 16 MiB. Would it make sense to skip over
> > > > > PROT_NONE mappings when checking whether it's safe to expand?
> > >
> > > This is what my workaround for the older patch was doing, actually. We
> > > have deployed that as a follow up fix on our older code bases. And this
> > > has fixed verious issues with Java which was doing the similar thing.
> >
> > Here is a forward port (on top of the current Linus tree) of my earlier
> > patch. I have dropped a note about java stack trace because this would
> > most likely be not the case with the Hugh's patch. The problem is the
> > same in principle though. Note I didn't get to test this properly yet
> > but it should be pretty much obvious.
> > ---
> > >From d9f6faccf2c286ed81fbc860c9b0b7fe23ef0836 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <[email protected]>
> > Date: Tue, 4 Jul 2017 11:27:39 +0200
> > Subject: [PATCH] mm: mm, mmap: do not blow on PROT_NONE MAP_FIXED holes in the
> > stack
> >
> > "mm: enlarge stack guard gap" has introduced a regression in some rust
> > and Java environments which are trying to implement their own stack
> > guard page. They are punching a new MAP_FIXED mapping inside the
> > existing stack Vma.
> >
> > This will confuse expand_{downwards,upwards} into thinking that the stack
> > expansion would in fact get us too close to an existing non-stack vma
> > which is a correct behavior wrt. safety. It is a real regression on
> > the other hand. Let's work around the problem by considering PROT_NONE
> > mapping as a part of the stack. This is a gros hack but overflowing to
> > such a mapping would trap anyway an we only can hope that usespace
> > knows what it is doing and handle it propely.
> >
> > Fixes: d4d2d35e6ef9 ("mm: larger stack guard gap, between vmas")
> > Debugged-by: Vlastimil Babka <[email protected]>
> > Signed-off-by: Michal Hocko <[email protected]>
> > ---
> > mm/mmap.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index f60a8bc2869c..2e996cbf4ff3 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -2244,7 +2244,8 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> > gap_addr = TASK_SIZE;
> >
> > next = vma->vm_next;
> > - if (next && next->vm_start < gap_addr) {
> > + if (next && next->vm_start < gap_addr &&
> > + (next->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
> > if (!(next->vm_flags & VM_GROWSUP))
> > return -ENOMEM;
> > /* Check that both stack segments have the same anon_vma? */
> > @@ -2325,7 +2326,8 @@ int expand_downwards(struct vm_area_struct *vma,
> > /* Enforce stack_guard_gap */
> > prev = vma->vm_prev;
> > /* Check that both stack segments have the same anon_vma? */
> > - if (prev && !(prev->vm_flags & VM_GROWSDOWN)) {
> > + if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
> > + (prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
> > if (address - prev->vm_end < stack_guard_gap)
> > return -ENOMEM;
> > }
>
> But wouldn't this completely disable the check in case such a guard page
> is installed, and possibly continue to allow the collision when the stack
> allocation is large enough to skip this guard page ?
Yes, but a PROT_NONE mapping would fault, and as the changelog says, we
_hope_ that userspace does the right thing.
> Shouldn't we instead
> "skip" such a vma and look for the next one ?
Yeah, that would be possible; I am not sure it is worth it, though. The
gap as it is implemented now prevents regular mappings from getting close
to the stack. So we only care about MAP_FIXED mappings, and those can
already screw things up, so we really have to rely on userspace doing
something semi-reasonable.
> I was thinking about something more like :
>
> prev = vma->vm_prev;
> + /* Don't consider a possible user-space stack guard page */
> + if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
> + !(prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC)))
> + prev = prev->vm_prev;
> +
If anything, this would require a loop over all PROT_NONE
mappings so as not to hit other weird use cases.
> /* Check that both stack segments have the same anon_vma? */
>
> Willy
--
Michal Hocko
SUSE Labs
On Tue 04-07-17 11:35:38, Michal Hocko wrote:
> On Tue 04-07-17 10:41:22, Michal Hocko wrote:
> > On Mon 03-07-17 17:05:27, Linus Torvalds wrote:
> > > On Mon, Jul 3, 2017 at 4:55 PM, Ben Hutchings <[email protected]> wrote:
> > > >
> > > > Firstly, some Rust programs are crashing on ppc64el with 64 KiB pages.
> > > > Apparently Rust maps its own guard page at the lower limit of the stack
> > > > (determined using pthread_getattr_np() and pthread_attr_getstack()). I
> > > > don't think this ever actually worked for the main thread stack, but it
> > > > now also blocks expansion as the default stack size of 8 MiB is smaller
> > > > than the stack gap of 16 MiB. Would it make sense to skip over
> > > > PROT_NONE mappings when checking whether it's safe to expand?
> >
> > This is what my workaround for the older patch was doing, actually. We
> > have deployed that as a follow up fix on our older code bases. And this
> > has fixed verious issues with Java which was doing the similar thing.
>
> Here is a forward port (on top of the current Linus tree) of my earlier
> patch. I have dropped a note about java stack trace because this would
> most likely be not the case with the Hugh's patch. The problem is the
> same in principle though. Note I didn't get to test this properly yet
> but it should be pretty much obvious.
Tested with the attached program.
root@test1:~# ./stack_crash
Stack top:0x7fffcdb605ec mmap:0x7fffcc760000
address:0x7fffcc760ff8 aligned:0x7fffcc760000 mapped:[7fffcc760000,7fffcc761000] diff:-8
[...]
so we faulted on the PROT_NONE mapping, while with
#define MAPING_PROT PROT_READ
root@test1:~# ./stack_crash
Stack top:0x7ffe73dde6fc mmap:0x7ffe729de000
address:0x7ffe72adefd8 aligned:0x7ffe72ade000 mapped:[7ffe729de000,7ffe729df000] diff:1048536
[...]
we failed 1MB ahead of the mapping.
--
Michal Hocko
SUSE Labs
On Tue 04-07-17 12:46:52, Michal Hocko wrote:
[...]
> Tested with the attached program.
Err, attached now for real.
--
Michal Hocko
SUSE Labs
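The attachment itself is not reproduced in this archive. A rough, hypothetical sketch of the kind of test described (not Michal's actual program; MAPING_PROT mirrors the #define mentioned in his mail) would map a single page below the main stack, then grow the stack until either the mapping is reached or the kernel refuses expansion:

#define _GNU_SOURCE
#include <alloca.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAPING_PROT PROT_NONE	/* switch to PROT_READ for the second run */

static char *map;		/* single page mapped below the stack */
static long page;

/* runs on the alternate stack; printf is not async-signal-safe but is
 * good enough for a throwaway test */
static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
	(void)sig; (void)ctx;
	printf("address:%p mapped:[%p,%p] diff:%ld\n", si->si_addr,
	       (void *)map, (void *)(map + page),
	       (long)((char *)si->si_addr - (map + page)));
	_exit(0);
}

int main(void)
{
	char stack_top;		/* rough marker near the top of the stack */
	stack_t ss;
	struct sigaction sa;
	uintptr_t target;

	page = sysconf(_SC_PAGESIZE);

	/* the fault handler must run on an alternate stack: the main stack
	 * cannot grow at the point the fault happens */
	memset(&ss, 0, sizeof(ss));
	ss.ss_sp = malloc(SIGSTKSZ);
	ss.ss_size = SIGSTKSZ;
	sigaltstack(&ss, NULL);

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
	sigaction(SIGSEGV, &sa, NULL);
	sigaction(SIGBUS, &sa, NULL);

	/* map one page ~6 MiB below the stack top: inside the default
	 * 8 MiB stack rlimit but below the currently mapped stack vma */
	target = ((uintptr_t)&stack_top - 6 * 1024 * 1024) & ~(uintptr_t)(page - 1);
	map = mmap((void *)target, page, MAPING_PROT,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	printf("Stack top:%p mmap:%p\n", (void *)&stack_top, (void *)map);

	/* grow the stack towards the mapping one page at a time */
	while (1) {
		volatile char *p = alloca(page);
		p[0] = 0;
		if ((char *)p <= map + page)
			break;
	}
	printf("reached the mapping without faulting\n");
	return 0;
}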
On Tue, 2017-07-04 at 12:42 +0200, Michal Hocko wrote:
> On Tue 04-07-17 11:47:28, Willy Tarreau wrote:
> > On Tue, Jul 04, 2017 at 11:35:38AM +0200, Michal Hocko wrote:
[...]
> > But wouldn't this completely disable the check in case such a guard page
> > is installed, and possibly continue to allow the collision when the stack
> > allocation is large enough to skip this guard page ?
>
> Yes and but a PROT_NONE would fault and as the changelog says, we _hope_
> that userspace does the right thing.
It may well not be large enough, because of the same wrong assumptions
that resulted in the kernel's guard page not being large enough. We
should count it as part of the guard gap but not a substitute.
> > Shouldn't we instead
> > "skip" such a vma and look for the next one ?
>
> Yeah, that would be possible, I am not sure it is worth it though. The
> gap as it is implemented now prevents regular mappings to get close to
> the stack. So we only care about those with MAP_FIXED and those can
> screw things already so we really have to rely on userspace doing some
> semi reasonable.
>
> > I was thinking about something more like :
> >
> > prev = vma->vm_prev;
> > + /* Don't consider a possible user-space stack guard page */
> > + if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
> > +     !(prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC)))
> > + prev = prev->vm_prev;
> > +
>
> If anywhing this would require to have a loop over all PROT_NONE
> mappings to not hit into other weird usecases.
That's what I was thinking of. Tried the following patch:
Subject: mmap: Ignore VM_NONE mappings when checking for space to
expand the stack
Some user-space run-times (in particular, Java and Rust) allocate
their own guard pages in the main stack. This didn't work well
before, but it can now block stack expansion where it is safe and would
previously have been allowed. Ignore such mappings when checking the
size of the gap before expanding.
Reported-by: Ximin Luo <[email protected]>
References: https://bugs.debian.org/865416
Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas")
Cc: [email protected]
Signed-off-by: Ben Hutchings <[email protected]>
---
mm/mmap.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index a5e3dcd75e79..19f3ce04f24f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2243,7 +2243,14 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 	if (gap_addr < address || gap_addr > TASK_SIZE)
 		gap_addr = TASK_SIZE;
 
-	next = vma->vm_next;
+	/*
+	 * Allow VM_NONE mappings in the gap as some applications try
+	 * to make their own stack guards
+	 */
+	for (next = vma->vm_next;
+	     next && !(next->vm_flags & (VM_READ | VM_WRITE | VM_EXEC));
+	     next = next->vm_next)
+		;
 	if (next && next->vm_start < gap_addr) {
 		if (!(next->vm_flags & VM_GROWSUP))
 			return -ENOMEM;
@@ -2323,11 +2330,17 @@ int expand_downwards(struct vm_area_struct *vma,
 	if (error)
 		return error;
 
-	/* Enforce stack_guard_gap */
+	/*
+	 * Enforce stack_guard_gap, but allow VM_NONE mappings in the gap
+	 * as some applications try to make their own stack guards
+	 */
 	gap_addr = address - stack_guard_gap;
 	if (gap_addr > address)
 		return -ENOMEM;
-	prev = vma->vm_prev;
+	for (prev = vma->vm_prev;
+	     prev && !(prev->vm_flags & (VM_READ | VM_WRITE | VM_EXEC));
+	     prev = prev->vm_prev)
+		;
 	if (prev && prev->vm_end > gap_addr) {
 		if (!(prev->vm_flags & VM_GROWSDOWN))
 			return -ENOMEM;
--- END ---
I don't have a ppc64el machine where I can change the kernel, but I
tried this on x86_64 with the stack limit reduced to 1 MiB and Rust
is able to expand its stack where previously it would crash.
This *doesn't* fix the LibreOffice regression on i386.
Ben.
--
Ben Hutchings
The world is coming to an end. Please log off.
On Tue 04-07-17 12:36:11, Ben Hutchings wrote:
> On Tue, 2017-07-04 at 12:42 +0200, Michal Hocko wrote:
> > On Tue 04-07-17 11:47:28, Willy Tarreau wrote:
> > > On Tue, Jul 04, 2017 at 11:35:38AM +0200, Michal Hocko wrote:
> [...]
> > > But wouldn't this completely disable the check in case such a guard page
> > > is installed, and possibly continue to allow the collision when the stack
> > > allocation is large enough to skip this guard page ?
> >
> > Yes and but a PROT_NONE would fault and as the changelog says, we _hope_
> > that userspace does the right thing.
>
> It may well not be large enough, because of the same wrong assumptions
> that resulted in the kernel's guard page not being large enough. We
> should count it as part of the guard gap but not a substitute.
Yes, you are right of course. But isn't this a bug on their side,
considering they are managing their _own_ stack gap? Our stack gap
management is a best-effort thing, and two such approaches competing will
always lead to weird corner cases. That was my assumption when saying
that I am not sure this is really _worth_ it. We should definitely try
to work around clashes, but that's about it. If others think that we
should do everything to prevent even those issues I will not oppose
that, of course. It just adds more cycles to something that is a weird
case already.
[...]
> This *doesn't* fix the LibreOffice regression on i386.
Are there any details about this regression?
--
Michal Hocko
SUSE Labs
On Tue 04-07-17 13:59:59, Michal Hocko wrote:
> On Tue 04-07-17 12:36:11, Ben Hutchings wrote:
> > On Tue, 2017-07-04 at 12:42 +0200, Michal Hocko wrote:
> > > On Tue 04-07-17 11:47:28, Willy Tarreau wrote:
> > > > On Tue, Jul 04, 2017 at 11:35:38AM +0200, Michal Hocko wrote:
> > [...]
> > > > But wouldn't this completely disable the check in case such a guard page
> > > > is installed, and possibly continue to allow the collision when the stack
> > > > allocation is large enough to skip this guard page ?
> > >
> > > Yes and but a PROT_NONE would fault and as the changelog says, we _hope_
> > > that userspace does the right thing.
> >
> > It may well not be large enough, because of the same wrong assumptions
> > that resulted in the kernel's guard page not being large enough. We
> > should count it as part of the guard gap but not a substitute.
>
> yes, you are right of course. But isn't this a bug on their side
> considering they are managing their _own_ stack gap? Our stack gap
> management is a best effort thing and two such approaches competing will
> always lead to weird cornercases. That was my assumption when saying
> that I am not sure this is really _worth_ it. We should definitely try
> to workaround clashes but that's about it. If others think that we
> should do everything to prevent even those issues I will not oppose
> of course. It just adds more cycles to something that is a weird case
> already.
Forgot to mention another point. Currently we do not check further
previous vmas if prev->vm_flags & VM_GROWSDOWN. Consider a userspace
stack gap implemented with mprotect: that wouldn't change the VM_GROWSDOWN
flag, and we are back to square one because the gap might be too small. Do
we want/need to handle those cases? Are they too different from
MAP_FIXED gaps? I am not so sure, but I would be inclined to say no.
--
Michal Hocko
SUSE Labs
On Tue, 2017-07-04 at 14:00 +0200, Michal Hocko wrote:
> On Tue 04-07-17 12:36:11, Ben Hutchings wrote:
> > On Tue, 2017-07-04 at 12:42 +0200, Michal Hocko wrote:
> > > On Tue 04-07-17 11:47:28, Willy Tarreau wrote:
> > > > On Tue, Jul 04, 2017 at 11:35:38AM +0200, Michal Hocko wrote:
> >
> > [...]
> > > > But wouldn't this completely disable the check in case such a guard page
> > > > is installed, and possibly continue to allow the collision when the stack
> > > > allocation is large enough to skip this guard page ?
> > >
> > > Yes and but a PROT_NONE would fault and as the changelog says, we _hope_
> > > that userspace does the right thing.
> >
> > It may well not be large enough, because of the same wrong assumptions
> > that resulted in the kernel's guard page not being large enough. We
> > should count it as part of the guard gap but not a substitute.
>
> yes, you are right of course. But isn't this a bug on their side
> considering they are managing their _own_ stack gap?
Yes it's their bug, but you know the rule - don't break user-space.
> Our stack gap
> management is a best effort thing and two such approaches competing will
> always lead to weird cornercases. That was my assumption when saying
> that I am not sure this is really _worth_ it. We should definitely try
> to workaround clashes but that's about it. If others think that we
> should do everything to prevent even those issues I will not oppose
> of course. It just adds more cycles to something that is a weird case
> already.
I don't want odd behaviour to weaken the stack guard.
> [...]
>
> > This *doesn't* fix the LibreOffice regression on i386.
>
> Are there any details about this regression?
Here:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=865303#170
I haven't reproduced it in Writer, but if I use Base to create a new
HSQLDB database it reliably crashes (HSQLDB is implemented in Java).
Ben.
--
Ben Hutchings
The world is coming to an end. Please log off.
On 04/07/17 00:55, Ben Hutchings wrote:
> Unfortunately these regressions have not been completely fixed by
> switching to Hugh's fix.
>
> Firstly, some Rust programs are crashing on ppc64el with 64 KiB pages.
> Apparently Rust maps its own guard page at the lower limit of the stack
> (determined using pthread_getattr_np() and pthread_attr_getstack()). I
> don't think this ever actually worked for the main thread stack, but it
> now also blocks expansion as the default stack size of 8 MiB is smaller
> than the stack gap of 16 MiB. Would it make sense to skip over
> PROT_NONE mappings when checking whether it's safe to expand?
>
> Secondly, LibreOffice is crashing on i386 when running components
> implemented in Java. I don't have a diagnosis for this yet.
We found that we needed f4cb767d76cf ("mm: fix new crash in
unmapped_area_topdown()"). Apologies if you've already covered that.
This may be needed in addition to the other patch you proposed.
jch
On Tue 04-07-17 13:21:02, Ben Hutchings wrote:
> On Tue, 2017-07-04 at 14:00 +0200, Michal Hocko wrote:
> > On Tue 04-07-17 12:36:11, Ben Hutchings wrote:
> > > On Tue, 2017-07-04 at 12:42 +0200, Michal Hocko wrote:
> > > > On Tue 04-07-17 11:47:28, Willy Tarreau wrote:
> > > > > On Tue, Jul 04, 2017 at 11:35:38AM +0200, Michal Hocko wrote:
> > >
> > > [...]
> > > > > But wouldn't this completely disable the check in case such a guard page
> > > > > is installed, and possibly continue to allow the collision when the stack
> > > > > allocation is large enough to skip this guard page ?
> > > >
> > > > Yes and but a PROT_NONE would fault and as the changelog says, we _hope_
> > > > that userspace does the right thing.
> > >
> > > It may well not be large enough, because of the same wrong assumptions
> > > that resulted in the kernel's guard page not being large enough. We
> > > should count it as part of the guard gap but not a substitute.
> >
> > yes, you are right of course. But isn't this a bug on their side
> > considering they are managing their _own_ stack gap?
>
> Yes it's their bug, but you know the rule - don't break user-space.
Absolutely, that is why I believe we should consider the prev VMA, but
doing anything more just risks new regressions. Or why do you think
that not checking them would cause a regression?
> > Our stack gap
> > management is a best effort thing and two such approaches competing will
> > always lead to weird cornercases. That was my assumption when saying
> > that I am not sure this is really _worth_ it. We should definitely try
> > to workaround clashes but that's about it. If others think that we
> > should do everything to prevent even those issues I will not oppose
> > of course. It just adds more cycles to something that is a weird case
> > already.
>
> I don't want odd behaviour to weaken the stack guard.
>
> > [...]
> >
> > > This *doesn't* fix the LibreOffice regression on i386.
> >
> > Are there any details about this regression?
>
> Here:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=865303#170
>
> I haven't reproduced it in Writer, but if I use Base to create a new
> HSQLDB database it reliably crashes (HSQLDB is implemented in Java).
I haven't read through the previous 169 comments, but I do not see any
stack trace. Ideally with "info proc mapping", which would tell us the
memory layout.
--
Michal Hocko
SUSE Labs
Michal Hocko:
> On Tue 04-07-17 13:21:02, Ben Hutchings wrote:
>> On Tue, 2017-07-04 at 14:00 +0200, Michal Hocko wrote:
>>> On Tue 04-07-17 12:36:11, Ben Hutchings wrote:
>>>> On Tue, 2017-07-04 at 12:42 +0200, Michal Hocko wrote:
>>>>> On Tue 04-07-17 11:47:28, Willy Tarreau wrote:
>>>>>> On Tue, Jul 04, 2017 at 11:35:38AM +0200, Michal Hocko wrote:
>>>>
>>>> [...]
>>>>>> But wouldn't this completely disable the check in case such a guard page
>>>>>> is installed, and possibly continue to allow the collision when the stack
>>>>>> allocation is large enough to skip this guard page ?
>>>>>
>>>>> Yes and but a PROT_NONE would fault and as the changelog says, we _hope_
>>>>> that userspace does the right thing.
>>>>
>>>> It may well not be large enough, because of the same wrong assumptions
>>>> that resulted in the kernel's guard page not being large enough. We
>>>> should count it as part of the guard gap but not a substitute.
>>>
>>> yes, you are right of course. But isn't this a bug on their side
>>> considering they are managing their _own_ stack gap?
>>
>> Yes it's their bug, but you know the rule - don't break user-space.
>
> Absolutely, that is why I belive we should consider the prev VMA but
> doing anything more just risks for new regressions. Or why do you think
> that not-checking them would cause a regression?
>
>>> Our stack gap
>>> management is a best effort thing and two such approaches competing will
>>> always lead to weird cornercases. That was my assumption when saying
>>> that I am not sure this is really _worth_ it. We should definitely try
>>> to workaround clashes but that's about it. If others think that we
>>> should do everything to prevent even those issues I will not oppose
>>> of course. It just adds more cycles to something that is a weird case
>>> already.
>>
>> I don't want odd behaviour to weaken the stack guard.
>>
>>> [...]
>>>
>>>> This *doesn't* fix the LibreOffice regression on i386.
>>>
>>> Are there any details about this regression?
>>
>> Here:
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=865303#170
>>
>> I haven't reproduced it in Writer, but if I use Base to create a new
>> HSQLDB database it reliably crashes (HSQLDB is implemented in Java).
>
> I haven't read through previous 169 comments but I do not see any stack
> trace. Ideally with info proc mapping that would tell us the memory
> layout.
>
I've written up an explanation of what happens in the Rust case here:
https://github.com/rust-lang/rust/issues/43052
Hopefully I got the details about Linux correct - I only had them explained to me last night - please reply on that page if not.
X
--
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git
On Tue 04-07-17 14:19:00, Ximin Luo wrote:
[...]
> I've written up an explanation of what happens in the Rust case here:
>
> https://github.com/rust-lang/rust/issues/43052
The most important part is https://github.com/rust-lang/rust/blob/master/src/libstd/sys/unix/thread.rs#L248
// Rellocate the last page of the stack.
// This ensures SIGBUS will be raised on
// stack overflow.
let result = mmap(stackaddr, psize, PROT_NONE, MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0);
so this is basically the same thing Java does, except that Java doesn't
usually do that on the main thread; only some JNI runtimes do.
pthread_attr_getstack() usage on the main thread sounds like a real bug
in Rust to me.
Thanks for the writeup!
--
Michal Hocko
SUSE Labs
On Tue, Jul 04, 2017 at 12:36:11PM +0100, Ben Hutchings wrote:
> > If anywhing this would require to have a loop over all PROT_NONE
> > mappings to not hit into other weird usecases.
>
> That's what I was thinking of. Tried the following patch:
(...)
> - next = vma->vm_next;
> + /*
> + * Allow VM_NONE mappings in the gap as some applications try
> + * to make their own stack guards
> + */
> + for (next = vma->vm_next;
> + next && !(next->vm_flags & (VM_READ | VM_WRITE | VM_EXEC));
> + next = next->vm_next)
> + ;
That's what I wanted to propose but I feared someone would scream at me
for this loop :-)
+1 for me!
Willy
On Tue, Jul 4, 2017 at 4:36 AM, Ben Hutchings <[email protected]> wrote:
>
> That's what I was thinking of. Tried the following patch:
>
> Subject: mmap: Ignore VM_NONE mappings when checking for space to
> expand the stack
This looks sane to me.
I'm going to ignore it in this thread, and assume that it gets sent as
a patch separately, ok?
It would be good to have more acks on it.
Also, separately, John Haxby kind of implied that the LibreOffice
regression on i386 is already fixed by commit f4cb767d76cf ("mm: fix
new crash in unmapped_area_topdown()").
Or was that a separate issue?
Linus
On 04/07/17 17:18, Linus Torvalds wrote:
> Also, separately, John Haxby kind of implied that the LibreOffice
> regression on i386 is already fixed by commit f4cb767d76cf ("mm: fix
> new crash in unmapped_area_topdown()").
I'm not certain. We had two distinct problems that were avoided by
Hugh's original patch together with f4cb767d76cf: the Oracle RDBMS
started and java programs worked. In my mind I'm conflating the second
of these with problems in LibreOffice.
Alas, the people who could confirm this for me are getting ready to
watch fireworks and generally have a good time.
jch
On Tue, Jul 04, 2017 at 05:27:55PM +0100, John Haxby wrote:
> Alas, the people who could confirm this for me are getting ready to
> watch fireworks and generally have a good time.
Let's hope the fireworks is controlled by Java with on up-to-date
kernel so that they can quickly get back to work :-)
Willy
On Tue, Jul 04, 2017 at 12:36:11PM +0100, Ben Hutchings wrote:
> @@ -2323,11 +2330,17 @@ int expand_downwards(struct vm_area_struct *vma,
> if (error)
> return error;
>
> - /* Enforce stack_guard_gap */
> + /*
> + * Enforce stack_guard_gap, but allow VM_NONE mappings in the gap
> + * as some applications try to make their own stack guards
> + */
> gap_addr = address - stack_guard_gap;
> if (gap_addr > address)
> return -ENOMEM;
> - prev = vma->vm_prev;
> + for (prev = vma->vm_prev;
> + prev && !(prev->vm_flags & (VM_READ | VM_WRITE | VM_EXEC));
> + prev = prev->vm_prev)
> + ;
> if (prev && prev->vm_end > gap_addr) {
> if (!(prev->vm_flags & VM_GROWSDOWN))
> return -ENOMEM;
Hmmm, shouldn't we also stop looping when we're out of the gap? Something
like this:
	for (prev = vma->vm_prev;
	     prev && !(prev->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)) &&
	     address - prev->vm_end < stack_guard_gap;
	     prev = prev->vm_prev)
		;
This would limit the risk of runaway loops if someone is having fun
allocating a lot of memory in small chunks (e.g. 4 GB in 1 million
independent mmap() calls).
Willy
On Tue 04-07-17 17:51:40, Willy Tarreau wrote:
> On Tue, Jul 04, 2017 at 12:36:11PM +0100, Ben Hutchings wrote:
> > > If anywhing this would require to have a loop over all PROT_NONE
> > > mappings to not hit into other weird usecases.
> >
> > That's what I was thinking of. Tried the following patch:
> (...)
> > - next = vma->vm_next;
> > + /*
> > + * Allow VM_NONE mappings in the gap as some applications try
> > + * to make their own stack guards
> > + */
> > + for (next = vma->vm_next;
> > + next && !(next->vm_flags & (VM_READ | VM_WRITE | VM_EXEC));
> > + next = next->vm_next)
> > + ;
>
> That's what I wanted to propose but I feared someone would scream at me
> for this loop :-)
Well, I've been thinking about this some more, and the more I think about
it the less I am convinced we should try to be clever here. Why? Because
as soon as somebody tries to manage stacks explicitly, you cannot simply
assume anything about the previous mapping. Say some interpreter uses
[ mngmnt data][red zone] <--[- MAP_GROWSDOWN ]
Now if we consider the red zone's (PROT_NONE) prev mapping, we would fail
the expansion even though we haven't hit the red zone, and that is
essentially what the Java and Rust bugs are about. So we just risk yet
another regression.
Now consider another example:
<--[- MAP_GROWSDOWN][red zone] <--[- MAP_GROWSDOWN]
thread 1 thread 2
Does the more clever code prevent smashing over an unrelated stack?
No, because of our VM_GROWS{DOWN,UP} checks, which are needed for other
cases. Well, we could special-case those as well, but...
That being said, I am not really convinced that mixing two different gap
implementations is sane. I guess it is reasonable to assume that
a PROT_NONE mapping close to the stack is meant to be a red zone, and at
that point we should rather back off and rely on userspace rather
than risk more weird corner cases and regressions.
--
Michal Hocko
SUSE Labs
On Tue, Jul 4, 2017 at 10:22 AM, Michal Hocko <[email protected]> wrote:
>
> Well, I've been thinking about this some more and the more I think about
> it the less I am convinced we should try to be clever here. Why? Because
> as soon as somebody tries to manage stacks explicitly you cannot simply
> assume anything about the previous mapping. Say some interpret uses
> [ mngmnt data][red zone] <--[- MAP_GROWSDOWN ]
>
> Now if we consider the red zone's (PROT_NONE) prev mapping we would fail
> the expansion even though we haven't hit the red zone and that is
> essentially what the Java and rust bugs are about. So we just risk yet
> another regression.
Ack.
Let's make the initial version at least only check the first vma.
The long-term fix for this is to have the binaries do proper stack
expansion probing anyway, and it's quite possible that people who do
their own stack redzoning by adding a PROT_NONE thing already do that
proper fix (eg the Java stack may simply not *have* those big crazy
structures on it in the first place).
Linus
On Tue, Jul 04, 2017 at 11:37:15AM -0700, Linus Torvalds wrote:
> On Tue, Jul 4, 2017 at 10:22 AM, Michal Hocko <[email protected]> wrote:
> >
> > Well, I've been thinking about this some more and the more I think about
> > it the less I am convinced we should try to be clever here. Why? Because
> > as soon as somebody tries to manage stacks explicitly you cannot simply
> > assume anything about the previous mapping. Say some interpret uses
> > [ mngmnt data][red zone] <--[- MAP_GROWSDOWN ]
> >
> > Now if we consider the red zone's (PROT_NONE) prev mapping we would fail
> > the expansion even though we haven't hit the red zone and that is
> > essentially what the Java and rust bugs are about. So we just risk yet
> > another regression.
>
> Ack.
>
> Let's make the initial version at least only check the first vma.
>
> The long-term fix for this is to have the binaries do proper stack
> expansion probing anyway, and it's quite possible that people who do
> their own stack redzoning by adding a PROT_NONE thing already do that
> proper fix (eg the Java stack may simply not *have* those big crazy
> structures on it in the first place).
But what is wrong with stopping the loop as soon as the distance gets
larger than the stack_guard_gap ?
Willy
On Tue, Jul 4, 2017 at 11:39 AM, Willy Tarreau <[email protected]> wrote:
>
> But what is wrong with stopping the loop as soon as the distance gets
> larger than the stack_guard_gap ?
Absolutely nothing. But that's not the problem with the loop. Let's
say that you are using lots of threads, so that you know your stack
space is limited. What you do is to use MAP_FIXED a lot, and you lay
out your stacks fairly densely (with each other, but also possibly
with other mappings), with that PROT_NONE redzoning mapping in between
the "dense" allocations.
So when the kernel wants to grow the stack, it finds the PROT_NONE
redzone mapping - but there's possibly other maps right under it, so
the stack_guard_gap still hits other mappings.
And the fact that this seems to trigger with
(a) 32-bit x86
(b) Java
actually makes sense in the above scenario: that's _exactly_ when
you'd have dense mappings. Java is very thread-happy, and in a 32-bit
VM, the virtual address space allocation for stacks is a primary issue
with lots of threads.
Of course, the downside to this theory is that apparently the Java
problem is not confirmed to actually be due to this (Ben root-caused
the Rust thing on ppc64), but it still sounds like quite a reasonable
thing to do.
The problem with the Java issue may be that they do that "dense stack
mappings in VM space" (for all the usual "lots of threads, limited VM"
reasons), but they may *not* have that PROT_NONE redzoning at all.
So the patch under discussion works for Rust exactly *because* it does
its redzone to show "this is where I expect the stack to end". The
i386 Java load may simply not have that marker for us to use...
Linus
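A rough stand-alone illustration of the layout Linus describes (hypothetical; addresses and offsets are arbitrary): a user-space PROT_NONE redzone sits just below the stack with another mapping packed directly underneath it, so the stack_guard_gap check can still run into the neighbouring mapping even on a kernel that skips PROT_NONE mappings:

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	char probe;
	/* pick a spot a couple of MiB below the current stack pointer */
	uintptr_t below = ((uintptr_t)&probe - 2 * 1024 * 1024)
			  & ~(uintptr_t)(page - 1);

	/* user-space redzone marking where the stack is expected to end */
	char *redzone = mmap((void *)below, page, PROT_NONE,
			     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	/* ... and a "real" mapping packed directly underneath it */
	char *neighbour = mmap((void *)(below - page), page,
			       PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	printf("redzone %p, neighbour %p\n", (void *)redzone, (void *)neighbour);
	/* stack growth that gets within stack_guard_gap of "neighbour" will
	 * be refused regardless of whether the redzone itself is skipped */
	return 0;
}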
On Tue, Jul 04, 2017 at 11:47:37AM -0700, Linus Torvalds wrote:
> Let's
> say that you are using lots of threads, so that you know your stack
> space is limited. What you do is to use MAP_FIXED a lot, and you lay
> out your stacks fairly densely (with each other, but also possibly
> with other mappings), with that PROT_NONE redzoning mapping in between
> the "dense" allocations.
>
> So when the kernel wants to grow the stack, it finds the PROT_NONE
> redzone mapping - but there's possibly other maps right under it, so
> the stack_guard_gap still hits other mappings.
(...)
OK I didn't get that use case, that totally makes sense indeed! So
now we use PROT_NONE not as something that must be skipped to find
the unmapped area but as a hint that the application apparently
wants the stack to stop here.
Thanks for this clear explanation!
Willy
On Tue, 2017-07-04 at 12:36 +0100, Ben Hutchings wrote:
[...]
> This *doesn't* fix the LibreOffice regression on i386.
gdb shows me that the crash is at the last statement in this function:
static void _expand_stack_to(address bottom) {
  address sp;
  size_t size;
  volatile char *p;

  // Adjust bottom to point to the largest address within the same page, it
  // gives us a one-page buffer if alloca() allocates slightly more memory.
  bottom = (address)align_size_down((uintptr_t)bottom, os::Linux::page_size());
  bottom += os::Linux::page_size() - 1;

  // sp might be slightly above current stack pointer; if that's the case, we
  // will alloca() a little more space than necessary, which is OK. Don't use
  // os::current_stack_pointer(), as its result can be slightly below current
  // stack pointer, causing us to not alloca enough to reach "bottom".
  sp = (address)&sp;

  if (sp > bottom) {
    size = sp - bottom;
    p = (volatile char *)alloca(size);
    assert(p != NULL && p <= (volatile char *)bottom, "alloca problem?");
    p[0] = '\0';
  }
}
We have:
bottom = 0xff803fff
sp = 0xffffb178
The relevant mappings are:
ff7fc000-ff7fd000 rwxp 00000000 00:00 0
fffdd000-ffffe000 rw-p 00000000 00:00 0 [stack]
So instead of a useless guard page, we have a dangerous WX page
underneath the stack! I suppose I should find out where and why that's
being allocated.
Ben.
--
Ben Hutchings
The world is coming to an end. Please log off.
On Tue, Jul 4, 2017 at 4:01 PM, Ben Hutchings <[email protected]> wrote:
>
> We have:
>
> bottom = 0xff803fff
> sp = 0xffffb178
>
> The relevant mappings are:
>
> ff7fc000-ff7fd000 rwxp 00000000 00:00 0
> fffdd000-ffffe000 rw-p 00000000 00:00 0 [stack]
Ugh. So that stack is actually 8MB in size, but the alloca() is about
to use up almost all of it, and there's only about 28kB left between
"bottom" and that 'rwx' mapping.
Still, that rwx mapping is interesting: it is a single page, and it
really is almost exactly 8MB below the stack.
In fact, the top of stack (at 0xffffe000) is *exactly* 8MB+4kB from
the top of that odd one-page allocation (0xff7fd000).
Can you find out where that is allocated? Perhaps a breakpoint on
mmap, with a condition to catch that particular one?
Because I'm wondering if it was done explicitly as a 8MB stack
boundary allocation, with the "knowledge" that the kernel then adds a
one-page guard page.
I really don't know why somebody would do that (as opposed to just
limiting the stack with ulimit), but the 8MB+4kB distance is kind of
intriguing.
Maybe that one-page mapping is some hack to make sure that no random
mmap() will ever get too close to the stack, so it really is a "guard
mapping", except it's explicitly designed not so much to guard the
stack from growing down further (ulimit does that), but to guard the
brk() and other mmaps from growing *up* into the stack area..
Sometimes user mode does crazy things just because people are insane.
But sometimes there really is a method to the madness.
I would *not* be surprised if the way somebody allocated the stack was
to basically say:
- let's use "mmap()" with a size of 8MB+2 pages to find a
sufficiently sized virtual memory area
- once we've gotten that virtual address space range, let's over-map
the last page as the new stack using MAP_FIXED
- finally, munmap the 8MB in between so that the new stack can grow
down into that gap the munmap creates.
Notice how you end up with exactly the above pattern of allocations,
and how it guarantees that you get a nice 8MB stack without having to
do any locking (you rely on the kernel to just find the 8MB+8kB areas,
and once one has been allocated, it will be "safe").
And yes, it would have been much nicer to just use PROT_NONE for that
initial sizing allocation, but for somebody who is only interested in
carving out a 8MB stack in virtual space, the protections are actually
kind of immaterial, so 'rwx' might be just their mental default.
Linus
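A loose sketch of the allocation pattern hypothesized above (illustrative only, not the JVM's actual code, which is identified later in the thread):

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t stack_sz = 8UL * 1024 * 1024;

	/* 1) reserve 8 MiB + 2 pages to find a large enough hole; the rwx
	 *    protection matches the single page seen in the maps dump */
	char *base = mmap(NULL, stack_sz + 2 * page,
			  PROT_READ | PROT_WRITE | PROT_EXEC,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return 1;

	/* 2) the top page of the reservation becomes the stack area
	 *    (over-mapped with MAP_FIXED in the description above) */
	char *stack_page = base + page + stack_sz;

	/* 3) unmap the 8 MiB in between; the stack can now grow down into
	 *    the hole, and the bottom page is left as a boundary marker */
	munmap(base + page, stack_sz);

	printf("marker page: %p-%p, stack page: %p, hole: %zu MiB\n",
	       (void *)base, (void *)(base + page), (void *)stack_page,
	       stack_sz >> 20);
	return 0;
}

The resulting layout is exactly the observed one: a single stray page roughly 8 MiB + 4 kB below the top of the stack, with nothing but the hole in between.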
This issue occurs post the stack guard patches, correct? Fixing it sounds
like this might go beyond hardening and into CVE territory.
--
Kurt Seifried -- Red Hat -- Product Security -- Cloud
PGP A90B F995 7350 148F 66BF 7554 160D 4553 5E26 7993
Red Hat Product Security contact: [email protected]
On Tue 04-07-17 16:31:52, Linus Torvalds wrote:
> On Tue, Jul 4, 2017 at 4:01 PM, Ben Hutchings <[email protected]> wrote:
> >
> > We have:
> >
> > bottom = 0xff803fff
> > sp = 0xffffb178
> >
> > The relevant mappings are:
> >
> > ff7fc000-ff7fd000 rwxp 00000000 00:00 0
> > fffdd000-ffffe000 rw-p 00000000 00:00 0 [stack]
>
> Ugh. So that stack is actually 8MB in size, but the alloca() is about
> to use up almost all of it, and there's only about 28kB left between
> "bottom" and that 'rwx' mapping.
>
> Still, that rwx mapping is interesting: it is a single page, and it
> really is almost exactly 8MB below the stack.
>
> In fact, the top of stack (at 0xffffe000) is *exactly* 8MB+4kB from
> the top of that odd one-page allocation (0xff7fd000).
Very interesting! I would be really curious whether changing the ulimit to
something bigger changes the picture, and if this is really the case,
what we are going to do here. We can special-case a single page mapping
under the stack, but that sounds quite dangerous for something that is
dubious in itself. PROT_NONE would explicitly fault, but we would simply
run over this mapping too easily, and who knows what might end up below
it. So to me the guard gap does its job here.
Do you want me to post the earlier patch to ignore PROT_NONE mappings,
or should we rather wait for this one to get more details?
--
Michal Hocko
SUSE Labs
On Wed, Jul 05, 2017 at 08:36:46AM +0200, Michal Hocko wrote:
> PROT_NONE would explicitly fault but we would simply
> run over this mapping too easily and who knows what might end up below
> it. So to me the guard gap does its job here.
I tend to think that applications that implement their own stack guard
using PROT_NONE also assume that they will never perform unchecked stack
allocations larger than their own guard, thus the condition above should
never happen. Otherwise they're bogus and/or vulnerable by design and it
is their responsibility to fix it.
Thus, if that helps, maybe we could even relax some of the stack guard
checks as soon as we meet a PROT_NONE area, allowing VMAs to be tightly
packed if the application knows what it's doing. That wouldn't solve
the LibreOffice issue though, given that the lower page is RWX.
Willy
On Wed 05-07-17 10:14:43, Willy Tarreau wrote:
> On Wed, Jul 05, 2017 at 08:36:46AM +0200, Michal Hocko wrote:
> > PROT_NONE would explicitly fault but we would simply
> > run over this mapping too easily and who knows what might end up below
> > it. So to me the guard gap does its job here.
>
> I tend to think that applications that implement their own stack guard
> using PROT_NONE also assume that they will never perfom unchecked stack
> allocations larger than their own guard, thus the condition above should
> never happen. Otherwise they're bogus and/or vulnerable by design and it
> is their responsibility to fix it.
>
> Thus maybe if that helps we could even relax some of the stack guard
> checks as soon as we meet a PROT_NONE area, allowing VMAs to be tightly
> packed if the application knows what it's doing.
Yes, this is what my patch does [1]. Or did I miss your point?
> That wouldn't solve the libreoffice issue though, given the lower page
> is RWX.
Unfortunately, yes. We only have limited room to address this issue,
though. We could add a per-task (mm) stack_gap limit (controlled either
via proc or prctl) and revert back to 1 page for the specific program,
but I would be really careful about adding more hacks into the stack
expansion code.
[1] http://lkml.kernel.org/r/[email protected]
--
Michal Hocko
SUSE Labs
On Wed, Jul 05, 2017 at 10:24:41AM +0200, Michal Hocko wrote:
> > Thus maybe if that helps we could even relax some of the stack guard
> > checks as soon as we meet a PROT_NONE area, allowing VMAs to be tightly
> > packed if the application knows what it's doing.
>
> Yes, this is what my patch does [1]. Or did I miss your point?
Sorry, you're right. I got confused when looking at the
LibreOffice dump and for whatever reason ended up thinking we were
just considering that page as part of the gap and not as a marker
for the bottom. Never mind.
> > That wouldn't solve the libreoffice issue though, given the lower page
> > is RWX.
>
> unfortunatelly yes. We only have limited room to address this issue
> though. We could add per task (mm) stack_gap limit (controlled either
> via proc or prctl) and revert back to 1 page for the specific program
> but I would be really careful to add some more hack into the stack
> expansion code.
Actually one option could be to have a sysctl causing a warning to be
emitted when hitting the stack guard instead of killing the process. We
could think for example that once this warning is emitted, the guard is
reduced to 64kB (I think it was the size before) and the application can
continue to run. That could help problematic applications get fixed
quickly. And in environments with lots of local users it would be
dissuasive enough to avoid users trying their luck on setuid binaries.
Just my two cents,
Willy
On Tue, 2017-07-04 at 16:31 -0700, Linus Torvalds wrote:
> On Tue, Jul 4, 2017 at 4:01 PM, Ben Hutchings <[email protected]>
> wrote:
> >
> > We have:
> >
> > bottom = 0xff803fff
> > sp = 0xffffb178
> >
> > The relevant mappings are:
> >
> > ff7fc000-ff7fd000 rwxp 00000000 00:00 0
> > fffdd000-ffffe000 rw-p 00000000 00:00
> > 0 [stack]
>
> Ugh. So that stack is actually 8MB in size, but the alloca() is about
> to use up almost all of it, and there's only about 28kB left between
> "bottom" and that 'rwx' mapping.
>
> Still, that rwx mapping is interesting: it is a single page, and it
> really is almost exactly 8MB below the stack.
>
> In fact, the top of stack (at 0xffffe000) is *exactly* 8MB+4kB from
> the top of that odd one-page allocation (0xff7fd000).
>
> Can you find out where that is allocated? Perhaps a breakpoint on
> mmap, with a condition to catch that particular one?
[...]
Found it, and it's now clear why only i386 is affected:
http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/os/linux/vm/os_linux.cpp#l4852
http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/os_cpu/linux_x86/vm/os_linux_x86.cpp#l881
Ben.
--
Ben Hutchings
Anthony's Law of Force: Don't force it, get a larger hammer.
On Wed, 2017-07-05 at 10:14 +0200, Willy Tarreau wrote:
> On Wed, Jul 05, 2017 at 08:36:46AM +0200, Michal Hocko wrote:
> > PROT_NONE would explicitly fault but we would simply
> > run over this mapping too easily and who knows what might end up below
> > it. So to me the guard gap does its job here.
>
> I tend to think that applications that implement their own stack guard
> using PROT_NONE also assume that they will never perform unchecked stack
> allocations larger than their own guard, thus the condition above should
> never happen. Otherwise they're bogus and/or vulnerable by design and it
> is their responsibility to fix it.
>
> Thus maybe if that helps we could even relax some of the stack guard
> checks as soon as we meet a PROT_NONE area, allowing VMAs to be tightly
> packed if the application knows what it's doing. That wouldn't solve
> the libreoffice issue though, given the lower page is RWX.
How about, instead of looking at permissions, we remember whether vmas
were allocated with MAP_FIXED and ignore those when evaluating the gap?
Ben.
--
Ben Hutchings
Anthony's Law of Force: Don't force it, get a larger hammer.
On Tue, 2017-07-04 at 19:11 +0200, Willy Tarreau wrote:
> On Tue, Jul 04, 2017 at 12:36:11PM +0100, Ben Hutchings wrote:
> > @@ -2323,11 +2330,17 @@ int expand_downwards(struct vm_area_struct *vma,
> > if (error)
> > return error;
> >
> > - /* Enforce stack_guard_gap */
> > + /*
> > + * Enforce stack_guard_gap, but allow VM_NONE mappings in the gap
> > + * as some applications try to make their own stack guards
> > + */
> > gap_addr = address - stack_guard_gap;
> > if (gap_addr > address)
> > return -ENOMEM;
> > - prev = vma->vm_prev;
> > + for (prev = vma->vm_prev;
> > + prev && !(prev->vm_flags & (VM_READ | VM_WRITE | VM_EXEC));
> > + prev = prev->vm_prev)
> > + ;
> > if (prev && prev->vm_end > gap_addr) {
> > if (!(prev->vm_flags & VM_GROWSDOWN))
> > return -ENOMEM;
>
> Hmmm shouldn't we also stop looping when we're out of the gap ?
Yes, either that or only allow one such vma.
Ben.
> Something like this :
>
> for (prev = vma->vm_prev;
> prev && !(prev->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)) &&
> address - prev->vm_end < stack_guard_gap;
> prev = prev->vm_prev)
> ;
>
> This would limit the risk of runaway loops if someone is having fun
> allocating a lot of memory in small chunks (eg: 4 GB in 1 million
> independent mmap() calls).
--
Ben Hutchings
Anthony's Law of Force: Don't force it, get a larger hammer.
On Tue, 2017-07-04 at 09:18 -0700, Linus Torvalds wrote:
> > On Tue, Jul 4, 2017 at 4:36 AM, Ben Hutchings <[email protected]> wrote:
> >
> > That's what I was thinking of. Tried the following patch:
> >
> > Subject: mmap: Ignore VM_NONE mappings when checking for space to
> > expand the stack
>
> This looks sane to me.
>
> I'm going to ignore it in this thread, and assume that it gets sent as
> a patch separately, ok?
>
> It would be good to have more acks on it.
>
> Also, separately, John Haxby kind of implied that the LibreOffice
> regression on i386 is already fixed by commit f4cb767d76cf ("mm: fix
> new crash in unmapped_area_topdown()").
>
> Or was that a separate issue?
They are separate issues.
Ben.
--
Ben Hutchings
Anthony's Law of Force: Don't force it, get a larger hammer.
On Wed, Jul 05, 2017 at 01:21:54PM +0100, Ben Hutchings wrote:
> On Wed, 2017-07-05 at 10:14 +0200, Willy Tarreau wrote:
> > On Wed, Jul 05, 2017 at 08:36:46AM +0200, Michal Hocko wrote:
> > > PROT_NONE would explicitly fault but we would simply
> > > run over this mapping too easily and who knows what might end up below
> > > it. So to me the guard gap does its job here.
> >
> > I tend to think that applications that implement their own stack guard
> > using PROT_NONE also assume that they will never perform unchecked stack
> > allocations larger than their own guard, thus the condition above should
> > never happen. Otherwise they're bogus and/or vulnerable by design and it
> > is their responsibility to fix it.
> >
> > Thus maybe if that helps we could even relax some of the stack guard
> > checks as soon as we meet a PROT_NONE area, allowing VMAs to be tightly
> > packed if the application knows what it's doing. That wouldn't solve
> > the libreoffice issue though, given the lower page is RWX.
>
> How about, instead of looking at permissions, we remember whether vmas
> were allocated with MAP_FIXED and ignore those when evaluating the gap?
I like this idea. It leaves complete control to the application. Our
usual principle of letting people shoot themselves in the foot if they
insist on doing so.
Do you think something like this could work (not even build tested) ?
Willy
--
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d16f524..8ad7f40 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -90,6 +90,7 @@
#define VM_PFNMAP 0x00000400 /* Page-ranges managed without "struct page", just pure PFN */
#define VM_DENYWRITE 0x00000800 /* ETXTBSY on write attempts.. */
+#define VM_FIXED 0x00001000 /* MAP_FIXED was used */
#define VM_LOCKED 0x00002000
#define VM_IO 0x00004000 /* Memory mapped I/O or similar */
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 9aa863d..4df2659 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -79,6 +79,7 @@ static inline int arch_validate_prot(unsigned long prot)
{
return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
_calc_vm_trans(flags, MAP_DENYWRITE, VM_DENYWRITE ) |
+ _calc_vm_trans(flags, MAP_FIXED, VM_FIXED ) |
_calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED );
}
#endif /* _LINUX_MMAN_H */
diff --git a/mm/mmap.c b/mm/mmap.c
index 3c4e4d7..b612868 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2145,7 +2145,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
next = vma->vm_next;
if (next && next->vm_start < gap_addr) {
- if (!(next->vm_flags & VM_GROWSUP))
+ if (!(next->vm_flags & (VM_GROWSUP|VM_FIXED)))
return -ENOMEM;
/* Check that both stack segments have the same anon_vma? */
}
@@ -2225,7 +2225,7 @@ int expand_downwards(struct vm_area_struct *vma,
return -ENOMEM;
prev = vma->vm_prev;
if (prev && prev->vm_end > gap_addr) {
- if (!(prev->vm_flags & VM_GROWSDOWN))
+ if (!(prev->vm_flags & (VM_GROWSDOWN|VM_FIXED)))
return -ENOMEM;
/* Check that both stack segments have the same anon_vma? */
}
Hi all,
On Tue, Jul 04, 2017 at 07:16:06PM -0600, [email protected] wrote:
> This issue occurs post stackguard patches correct? Fixing it sounds like
> this might go beyond hardening and into CVE territory.
Since this thread is public on LKML, as it should be, it's no longer
valid to be CC'ed to linux-distros, which is for handling of embargoed
issues only. So please drop linux-distros from further CC's (I moved
linux-distros to Bcc on this reply, just so they know what happened).
If specific security issues are identified (such as with LibreOffice and
Java), then ideally those should be posted to oss-security as separate
reports. I'd appreciate it if anyone takes care of that (regardless of
CVE worthiness).
In fact, I already mentioned this thread in:
http://www.openwall.com/lists/oss-security/2017/07/05/11
Thank you!
Alexander
On Wed 05-07-17 13:21:54, Ben Hutchings wrote:
> On Wed, 2017-07-05 at 10:14 +0200, Willy Tarreau wrote:
> > On Wed, Jul 05, 2017 at 08:36:46AM +0200, Michal Hocko wrote:
> > > PROT_NONE would explicitly fault but we would simply
> > > run over this mapping too easily and who knows what might end up below
> > > it. So to me the guard gap does its job here.
> >
> > I tend to think that applications that implement their own stack guard
> > using PROT_NONE also assume that they will never perform unchecked stack
> > allocations larger than their own guard, thus the condition above should
> > never happen. Otherwise they're bogus and/or vulnerable by design and it
> > is their responsibility to fix it.
> >
> > Thus maybe if that helps we could even relax some of the stack guard
> > checks as soon as we meet a PROT_NONE area, allowing VMAs to be tightly
> > packed if the application knows what it's doing. That wouldn't solve
> > the libreoffice issue though, given the lower page is RWX.
>
> How about, instead of looking at permissions, we remember whether vmas
> were allocated with MAP_FIXED and ignore those when evaluating the gap?
To be honest I really hate this, the same way as any other heuristic
where we try to guess which gap will not fault in order to let userspace
know something is wrong. And the Java example just proves the point
AFAIU. The mapping we clash on is _not_ a gap. It is a real mapping we
should rather not scribble over. It contains code to execute and that
is even more worrying. So I guess the _only_ sane way forward for this
case is to reduce the stack gap for this particular code.
--
Michal Hocko
SUSE Labs
On Wed 05-07-17 13:19:40, Ben Hutchings wrote:
> On Tue, 2017-07-04 at 16:31 -0700, Linus Torvalds wrote:
> > On Tue, Jul 4, 2017 at 4:01 PM, Ben Hutchings <[email protected]>
> > wrote:
> > >
> > > We have:
> > >
> > > bottom = 0xff803fff
> > > sp = 0xffffb178
> > >
> > > The relevant mappings are:
> > >
> > > ff7fc000-ff7fd000 rwxp 00000000 00:00 0
> > > fffdd000-ffffe000 rw-p 00000000 00:00
> > > 0 [stack]
> >
> > Ugh. So that stack is actually 8MB in size, but the alloca() is about
> > to use up almost all of it, and there's only about 28kB left between
> > "bottom" and that 'rwx' mapping.
> >
> > Still, that rwx mapping is interesting: it is a single page, and it
> > really is almost exactly 8MB below the stack.
> >
> > In fact, the top of stack (at 0xffffe000) is *exactly* 8MB+4kB from
> > the top of that odd one-page allocation (0xff7fd000).
> >
> > Can you find out where that is allocated? Perhaps a breakpoint on
> > mmap, with a condition to catch that particular one?
> [...]
>
> Found it, and it's now clear why only i386 is affected:
> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/os/linux/vm/os_linux.cpp#l4852
> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/os_cpu/linux_x86/vm/os_linux_x86.cpp#l881
This is really worrying. This doesn't look like a gap at all. It is a
mapping which actually contains a code and so we should absolutely not
allow to scribble over it. So I am afraid the only way forward is to
allow per process stack gap and run this particular program to have a
smaller gap. We basically have two ways. Either /proc/<pid>/$file or
a prctl inherited on exec. The latter is less code. What do you
think?
--
Michal Hocko
SUSE Labs
On Wed, 2017-07-05 at 16:23 +0200, Michal Hocko wrote:
> On Wed 05-07-17 13:19:40, Ben Hutchings wrote:
> > On Tue, 2017-07-04 at 16:31 -0700, Linus Torvalds wrote:
> > > On Tue, Jul 4, 2017 at 4:01 PM, Ben Hutchings <[email protected]>
> > > wrote:
> > > >
> > > > We have:
> > > >
> > > > bottom = 0xff803fff
> > > > sp = 0xffffb178
> > > >
> > > > The relevant mappings are:
> > > >
> > > > ff7fc000-ff7fd000 rwxp 00000000 00:00 0
> > > > fffdd000-ffffe000 rw-p 00000000 00:00
> > > > 0 [stack]
> > >
> > > Ugh. So that stack is actually 8MB in size, but the alloca() is about
> > > to use up almost all of it, and there's only about 28kB left between
> > > "bottom" and that 'rwx' mapping.
> > >
> > > Still, that rwx mapping is interesting: it is a single page, and it
> > > really is almost exactly 8MB below the stack.
> > >
> > > In fact, the top of stack (at 0xffffe000) is *exactly* 8MB+4kB from
> > > the top of that odd one-page allocation (0xff7fd000).
> > >
> > > Can you find out where that is allocated? Perhaps a breakpoint on
> > > mmap, with a condition to catch that particular one?
> >
> > [...]
> >
> > Found it, and it's now clear why only i386 is affected:
> > http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/os/linux/vm/os_linux.cpp#l4852
> > http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/os_cpu/linux_x86/vm/os_linux_x86.cpp#l881
>
> This is really worrying. This doesn't look like a gap at all. It is a
> mapping which actually contains a code and so we should absolutely not
> allow to scribble over it. So I am afraid the only way forward is to
> allow per process stack gap and run this particular program to have a
> smaller gap. We basically have two ways. Either /proc/<pid>/$file or
> a prctl inherited on exec. The latter is less code. What do you
> think?
Distributions can do that, but what about all the other apps out there
using JNI and private copies of the JRE?
Something I noticed is that Java doesn't immediately use MAP_FIXED.
Look at os::pd_attempt_reserve_memory_at(). If the first, hinted,
mmap() doesn't return the hinted address it then attempts to allocate
huge areas (I'm not sure how intentional this is) and unmaps the
unwanted parts. Then os::workaround_expand_exec_shield_cs_limit() re-
mmap()s the wanted part with MAP_FIXED. If this fails at any point it
is not a fatal error.
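For reference, the hinted half of that dance is roughly the following
(a sketch only, not the actual HotSpot code; the helper name is made up):

#include <stddef.h>
#include <sys/mman.h>

/* Ask for a specific address without MAP_FIXED, and only keep the
 * mapping if the kernel actually honoured the hint. */
static void *reserve_at_hint(void *hint, size_t len)
{
	void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	if (p != hint) {
		/* The kernel picked somewhere else; give it back and
		 * let the caller fall back or give up. */
		munmap(p, len);
		return NULL;
	}
	return p;
}

Only once such a reservation has succeeded does the later MAP_FIXED
re-mmap() pin down the page that ends up just below the stack.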
So if we change vm_start_gap() to take the stack limit into account
(when it's finite) that should neutralise
os::workaround_expand_exec_shield_cs_limit(). I'll try this.
Ben.
--
Ben Hutchings
Anthony's Law of Force: Don't force it, get a larger hammer.
On Wed 05-07-17 16:25:00, Ben Hutchings wrote:
> On Wed, 2017-07-05 at 16:23 +0200, Michal Hocko wrote:
> > On Wed 05-07-17 13:19:40, Ben Hutchings wrote:
> > > On Tue, 2017-07-04 at 16:31 -0700, Linus Torvalds wrote:
> > > > On Tue, Jul 4, 2017 at 4:01 PM, Ben Hutchings <[email protected]>
> > > > wrote:
> > > > >
> > > > > We have:
> > > > >
> > > > > bottom = 0xff803fff
> > > > > sp = 0xffffb178
> > > > >
> > > > > The relevant mappings are:
> > > > >
> > > > > ff7fc000-ff7fd000 rwxp 00000000 00:00 0
> > > > > fffdd000-ffffe000 rw-p 00000000 00:00
> > > > > 0 [stack]
> > > >
> > > > Ugh. So that stack is actually 8MB in size, but the alloca() is about
> > > > to use up almost all of it, and there's only about 28kB left between
> > > > "bottom" and that 'rwx' mapping.
> > > >
> > > > Still, that rwx mapping is interesting: it is a single page, and it
> > > > really is almost exactly 8MB below the stack.
> > > >
> > > > In fact, the top of stack (at 0xffffe000) is *exactly* 8MB+4kB from
> > > > the top of that odd one-page allocation (0xff7fd000).
> > > >
> > > > Can you find out where that is allocated? Perhaps a breakpoint on
> > > > mmap, with a condition to catch that particular one?
> > >
> > > [...]
> > >
> > > Found it, and it's now clear why only i386 is affected:
> > > http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/os/linux/vm/os_linux.cpp#l4852
> > > http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/os_cpu/linux_x86/vm/os_linux_x86.cpp#l881
> >
> > This is really worrying. This doesn't look like a gap at all. It is a
> > mapping which actually contains a code and so we should absolutely not
> > allow to scribble over it. So I am afraid the only way forward is to
> > allow per process stack gap and run this particular program to have a
> > smaller gap. We basically have two ways. Either /proc/<pid>/$file or
> > a prctl inherited on exec. The latter is less code. What do you
> > think?
>
> Distributions can do that, but what about all the other apps out there
> using JNI and private copies of the JRE?
Yes this sucks. I was thinking about something like run_legacy_stack
which would do
prctl(PR_SET_STACK_GAP, 1, 0, 0, 0);
execve(argv[1], argv+1, environ);
so we would have a way to start applications that start crashing with
the new setup without changing the default for all other applications.
The question is what to do if the execed task is suid, because we
definitely do not want to allow tricking anybody into having a smaller gap.
Or maybe just start the Java process with an increased stack rlimit?
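The rlimit route needs nothing new from the kernel; a rough sketch of
such a launcher (the wrapper itself and the 32 MiB figure are arbitrary,
error handling is minimal):

#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	struct rlimit rl;

	if (argc < 2) {
		fprintf(stderr, "usage: %s prog [args...]\n", argv[0]);
		return 1;
	}
	/* Raise the soft stack limit; programs that place their own guard
	 * or workaround mappings at the RLIMIT_STACK-derived stack bottom
	 * then put them further below the live stack. */
	if (getrlimit(RLIMIT_STACK, &rl) == 0) {
		rl.rlim_cur = 32UL * 1024 * 1024;	/* arbitrary figure */
		if (rl.rlim_max != RLIM_INFINITY && rl.rlim_cur > rl.rlim_max)
			rl.rlim_cur = rl.rlim_max;
		setrlimit(RLIMIT_STACK, &rl);
	}
	execvp(argv[1], argv + 1);
	perror("execvp");
	return 127;
}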
> Something I noticed is that Java doesn't immediately use MAP_FIXED.
> Look at os::pd_attempt_reserve_memory_at(). If the first, hinted,
> mmap() doesn't return the hinted address it then attempts to allocate
> huge areas (I'm not sure how intentional this is) and unmaps the
> unwanted parts. Then os::workaround_expand_exec_shield_cs_limit() re-
> mmap()s the wanted part with MAP_FIXED. If this fails at any point it
> is not a fatal error.
>
> So if we change vm_start_gap() to take the stack limit into account
> (when it's finite) that should neutralise
> os::workaround_expand_exec_shield_cs_limit(). I'll try this.
I was already thinking about doing something like that to have better
support for MAP_GROWSDOWN, but then I just gave up because this would
require capping RLIMIT_STACK for large values in order to not break
userspace again. The max value is not really clear to me.
--
Michal Hocko
SUSE Labs
On Wed, Jul 5, 2017 at 5:21 AM, Ben Hutchings <[email protected]> wrote:
>
> How about, instead of looking at permissions, we remember whether vmas
> were allocated with MAP_FIXED and ignore those when evaluating the gap?
No, I think that's a bad idea. There's tons of good reasons to use
MAP_FIXED, and real programs do it all the time.
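One of the common legitimate patterns, as a sketch (the helper name is
made up): reserve address space with PROT_NONE first, then place real
mappings inside the reservation with MAP_FIXED.

#include <stddef.h>
#include <sys/mman.h>

/* MAP_FIXED is safe here because it only ever replaces the program's
 * own PROT_NONE placeholder, never an unrelated mapping. */
static void *reserve_and_commit(size_t reserve, size_t commit)
{
	char *base = mmap(NULL, reserve, PROT_NONE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

	if (base == MAP_FAILED)
		return NULL;
	if (mmap(base, commit, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {
		munmap(base, reserve);
		return NULL;
	}
	return base;
}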
I'd much rather just do something special for the Java case, either
recognizing that particular pattern, or (and this is likely what we'll
have to do) just have a per-process stack limit that
(a) will be reset by suid transitions etc security boundaries
(b) you can just set back to 4kB for the specific Java case.
because I'd rather make this be a very conscious thing rather than a
random hack.
The PROT_NONE thing made tons of conceptual sense ("let people do
their own guard mappings"). The MAP_FIXED thing does not.
Linus
On Wed, Jul 5, 2017 at 7:23 AM, Michal Hocko <[email protected]> wrote:
> On Wed 05-07-17 13:19:40, Ben Hutchings wrote:
>> On Tue, 2017-07-04 at 16:31 -0700, Linus Torvalds wrote:
>> > On Tue, Jul 4, 2017 at 4:01 PM, Ben Hutchings <[email protected]>
>> > wrote:
>> > >
>> > > We have:
>> > >
>> > > bottom = 0xff803fff
>> > > sp = 0xffffb178
>> > >
>> > > The relevant mappings are:
>> > >
>> > > ff7fc000-ff7fd000 rwxp 00000000 00:00 0
>> > > fffdd000-ffffe000 rw-p 00000000 00:00
>> > > 0 [stack]
>> >
>> > Ugh. So that stack is actually 8MB in size, but the alloca() is about
>> > to use up almost all of it, and there's only about 28kB left between
>> > "bottom" and that 'rwx' mapping.
>> >
>> > Still, that rwx mapping is interesting: it is a single page, and it
>> > really is almost exactly 8MB below the stack.
>> >
>> > In fact, the top of stack (at 0xffffe000) is *exactly* 8MB+4kB from
>> > the top of that odd one-page allocation (0xff7fd000).
>> >
>> > Can you find out where that is allocated? Perhaps a breakpoint on
>> > mmap, with a condition to catch that particular one?
>> [...]
>>
>> Found it, and it's now clear why only i386 is affected:
>> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/os/linux/vm/os_linux.cpp#l4852
>> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/os_cpu/linux_x86/vm/os_linux_x86.cpp#l881
>
> This is really worrying. This doesn't look like a gap at all. It is a
> mapping which actually contains a code and so we should absolutely not
> allow to scribble over it. So I am afraid the only way forward is to
> allow per process stack gap and run this particular program to have a
> smaller gap. We basically have two ways. Either /proc/<pid>/$file or
> a prctl inherited on exec. The latter is less code. What do you
> think?
Why inherit on exec?
I think that, if we add a new API, we should do it right rather than
making it even more hackish. Specifically, we'd add a real VMA type
(via flag or whatever) that means "this is a modern stack". A modern
stack wouldn't ever expand and would have no guard page at all. It
would, however, properly account stack space by tracking the pages
used as stack space. Users of the new VMA type would be responsible
for allocating their own guard pages, probably by mapping an extra
page and then mapping PROT_NONE over it.
Also, this doesn't even need a new API, I think. What's wrong with
plain old mmap(2) with MAP_STACK and *without* MAP_GROWSDOWN? Only
new kernels would get the accounting right, but I doubt that matters
much in practice.
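A sketch of what that would look like from userspace (the helper name is
made up; today MAP_STACK is only a hint, so the accounting part is the
piece that would need new kernel behaviour):

#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fixed-size stack with a caller-managed guard page at its low end and
 * no MAP_GROWSDOWN at all.  Returns the lowest usable address. */
static void *alloc_stack(size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	char *base = mmap(NULL, size + page, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);

	if (base == MAP_FAILED)
		return NULL;
	/* The lowest page becomes the guard. */
	if (mprotect(base, page, PROT_NONE) != 0) {
		munmap(base, size + page);
		return NULL;
	}
	return base + page;
}

The returned pointer and size would then go to pthread_attr_setstack()
or clone().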
On Wed, Jul 5, 2017 at 5:19 AM, Ben Hutchings <[email protected]> wrote:
> On Tue, 2017-07-04 at 16:31 -0700, Linus Torvalds wrote:
>>
>> Can you find out where that is allocated? Perhaps a breakpoint on
>> mmap, with a condition to catch that particular one?
>
> Found it, and it's now clear why only i386 is affected:
> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/os/linux/vm/os_linux.cpp#l4852
> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/os_cpu/linux_x86/vm/os_linux_x86.cpp#l881
Thanks, good work.
Well, good work on *your* part. I will try very hard to refrain from
commenting too much on the f*cking stinking pile of sh*t that was
exec-shield.
But yes, I don't think we can sanely recognize this. The code clearly
very intentionally does that mapping under the stack, and it's very
intentionally not PROT_NONE, since it's meant to be both writable and
executable.
As I said earlier (and I see Michal Hocko suggested the same - sudden
email flurry going on here), I think we need to basically allow people
to set the stack gap per-process to something low.
The good news is that this is probably specialized enough that we can
just keep the defaults as "will break this one case, but we give
people the tools to work around it".
I hate doing that, but distros that still support 32-bit (which is
apparently a shrinking number) can maybe hack the libreoffice launch
scripts up?
Linus
On Wed, Jul 5, 2017 at 9:15 AM, Andy Lutomirski <[email protected]> wrote:
> On Wed, Jul 5, 2017 at 7:23 AM, Michal Hocko <[email protected]> wrote:
>>
>> This is really worrying. This doesn't look like a gap at all. It is a
>> mapping which actually contains a code and so we should absolutely not
>> allow to scribble over it. So I am afraid the only way forward is to
>> allow per process stack gap and run this particular program to have a
>> smaller gap. We basically have two ways. Either /proc/<pid>/$file or
>> a prctl inherited on exec. The latter is less code. What do you
>> think?
>
> Why inherit on exec?
.. because the whole point is that you have an existing binary that breaks.
So you need to be able to wrap it in "let's lower the stack gap, then
run that known-problematic binary".
If you think the problem is solved by recompiling existing binaries,
then why are we doing this kernel hack to begin with? The *real*
solution was always to just fix the damn compiler and ABI.
That *real* solution is simple and needs no kernel support at all.
In other words, *ALL* of the kernel work in this area is purely to
support existing binaries. Don't overlook that fact.
Linus
On Wed, Jul 05, 2017 at 04:25:00PM +0100, Ben Hutchings wrote:
[...]
> Something I noticed is that Java doesn't immediately use MAP_FIXED.
> Look at os::pd_attempt_reserve_memory_at(). If the first, hinted,
> mmap() doesn't return the hinted address it then attempts to allocate
> huge areas (I'm not sure how intentional this is) and unmaps the
> unwanted parts. Then os::workaround_expand_exec_shield_cs_limit() re-
> mmap()s the wanted part with MAP_FIXED. If this fails at any point it
> is not a fatal error.
>
> So if we change vm_start_gap() to take the stack limit into account
> (when it's finite) that should neutralise
> os::workaround_expand_exec_shield_cs_limit(). I'll try this.
I ended up with the following two patches, which seem to deal with
both the Java and Rust regressions. These don't touch the
stack-grows-up paths at all because Rust doesn't run on those
architectures and the Java weirdness is i386-specific.
They definitely need longer commit messages and comments, but aside
from that do these look reasonable?
Ben.
Subject: [1/2] mmap: Skip a single VM_NONE mapping when checking the stack gap
Signed-off-by: Ben Hutchings <[email protected]>
---
mm/mmap.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index a5e3dcd75e79..c7906ae1a7a1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2323,11 +2323,16 @@ int expand_downwards(struct vm_area_struct *vma,
if (error)
return error;
- /* Enforce stack_guard_gap */
+ /*
+ * Enforce stack_guard_gap. Some applications allocate a VM_NONE
+ * mapping just below the stack, which we can safely ignore.
+ */
gap_addr = address - stack_guard_gap;
if (gap_addr > address)
return -ENOMEM;
prev = vma->vm_prev;
+ if (prev && !(prev->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)))
+ prev = prev->vm_prev;
if (prev && prev->vm_end > gap_addr) {
if (!(prev->vm_flags & VM_GROWSDOWN))
return -ENOMEM;
Subject: [2/2] mmap: Avoid mapping anywhere within the full stack extent
if finite
Signed-off-by: Ben Hutchings <[email protected]>
---
include/linux/mm.h | 9 ++++-----
mm/mmap.c | 19 +++++++++++++++++++
2 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6f543a47fc92..2240a0505072 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2223,15 +2223,14 @@ static inline struct vm_area_struct * find_vma_intersection(struct mm_struct * m
return vma;
}
+unsigned long __vm_start_gap(struct vm_area_struct *vma);
+
static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
{
unsigned long vm_start = vma->vm_start;
- if (vma->vm_flags & VM_GROWSDOWN) {
- vm_start -= stack_guard_gap;
- if (vm_start > vma->vm_start)
- vm_start = 0;
- }
+ if (vma->vm_flags & VM_GROWSDOWN)
+ vm_start = __vm_start_gap(vma);
return vm_start;
}
diff --git a/mm/mmap.c b/mm/mmap.c
index c7906ae1a7a1..f8131a94e56e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2307,6 +2307,25 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
}
#endif /* CONFIG_STACK_GROWSUP || CONFIG_IA64 */
+unsigned long __vm_start_gap(struct vm_area_struct *vma)
+{
+ unsigned long stack_limit =
+ current->signal->rlim[RLIMIT_STACK].rlim_cur;
+ unsigned long vm_start;
+
+ if (stack_limit != RLIM_INFINITY &&
+ vma->vm_end - vma->vm_start < stack_limit)
+ vm_start = vma->vm_end - PAGE_ALIGN(stack_limit);
+ else
+ vm_start = vma->vm_start;
+
+ vm_start -= stack_guard_gap;
+ if (vm_start > vma->vm_start)
+ vm_start = 0;
+
+ return vm_start;
+}
+
/*
* vma is the first one with address < vma->vm_start. Have to extend vma.
*/
--
Ben Hutchings
For every complex problem
there is a solution that is simple, neat, and wrong.
On Wed 05-07-17 17:58:45, Ben Hutchings wrote:
[...]
> diff --git a/mm/mmap.c b/mm/mmap.c
> index c7906ae1a7a1..f8131a94e56e 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2307,6 +2307,25 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> }
> #endif /* CONFIG_STACK_GROWSUP || CONFIG_IA64 */
>
> +unsigned long __vm_start_gap(struct vm_area_struct *vma)
> +{
> + unsigned long stack_limit =
> + current->signal->rlim[RLIMIT_STACK].rlim_cur;
> + unsigned long vm_start;
> +
> + if (stack_limit != RLIM_INFINITY &&
> + vma->vm_end - vma->vm_start < stack_limit)
> + vm_start = vma->vm_end - PAGE_ALIGN(stack_limit);
This is exactly what I was worried about in my previous email. Say
somebody sets stack ulimit to 1G or so. Should we reduce the available
address space that much? Say you are on 32-bit and you have an application
with multiple stacks each doing its MAP_GROWSDOWN. You are quickly out
of address space. That's why I've said that we would need to find a cap
for the user-defined limit. How much should that be, though? A few (tens,
hundreds of) megs? If we can figure that out I would of course be quite
happy about such a change, because MAP_GROWSDOWN doesn't really work well
these days.
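(With a 1G ulimit, for example, just three such stacks would already
cover the whole ~3G of i386 user address space.)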
> + else
> + vm_start = vma->vm_start;
> +
> + vm_start -= stack_guard_gap;
> + if (vm_start > vma->vm_start)
> + vm_start = 0;
> +
> + return vm_start;
> +}
> +
> /*
> * vma is the first one with address < vma->vm_start. Have to extend vma.
> */
>
> --
> Ben Hutchings
> For every complex problem
> there is a solution that is simple, neat, and wrong.
--
Michal Hocko
SUSE Labs
On Wed, Jul 5, 2017 at 9:58 AM, Ben Hutchings <[email protected]> wrote:
>
> I ended up with the following two patches, which seem to deal with
> both the Java and Rust regressions. These don't touch the
> stack-grows-up paths at all because Rust doesn't run on those
> architectures and the Java weirdness is i386-specific.
>
> They definitely need longer commit messages and comments, but aside
> from that do these look reasonable?
I think they both look reasonable, but I think we might still want to
massage things a bit (cutting down the quoting to a minimum, hopefully
leaving enough context to still make sense):
> Subject: [1/2] mmap: Skip a single VM_NONE mapping when checking the stack gap
>
> prev = vma->vm_prev;
> + if (prev && !(prev->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)))
> + prev = prev->vm_prev;
> if (prev && prev->vm_end > gap_addr) {
Do we just want to ignore the user-supplied guard mapping, or do we
want to say "if the user does a guard mapping, we use that *instead*
of our stack gap"?
IOW, instead of "prev = prev->vm_prev;" and continuing, maybe we want
to just return "ok".
> Subject: [2/2] mmap: Avoid mapping anywhere within the full stack extent if finite
This is good thinking, but no, I don't think the "if finite" is right.
I've seen people use "really big values" as replacement for
RLIM_INFINITY, for various reasons.
We've had huge confusion about RLIM_INFINITY over the years - look for
things like COMPAT_RLIM_OLD_INFINITY to see the kinds of confusions
we've had.
Some people just use MAX_LONG etc, which is *not* the same as
RLIM_INFINITY, but in practice ends up doing the same thing. Yadda
yadda.
So I'm personally leery of checking and depending on "exactly
RLIM_INFINITY", because I've seen it go wrong so many times.
And I think your second patch breaks that "use a really large value to
approximate infinity" case that definitely has existed as a pattern.
Linus
On Wed, Jul 5, 2017 at 9:20 AM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Jul 5, 2017 at 9:15 AM, Andy Lutomirski <[email protected]> wrote:
>> On Wed, Jul 5, 2017 at 7:23 AM, Michal Hocko <[email protected]> wrote:
>>>
>>> This is really worrying. This doesn't look like a gap at all. It is a
>>> mapping which actually contains a code and so we should absolutely not
>>> allow to scribble over it. So I am afraid the only way forward is to
>>> allow per process stack gap and run this particular program to have a
>>> smaller gap. We basically have two ways. Either /proc/<pid>/$file or
>>> a prctl inherited on exec. The latter is less code. What do you
>>> think?
>>
>> Why inherit on exec?
>
> .. because the whole point is that you have an existing binary that breaks.
>
> So you need to be able to wrap it in "let's lower the stack gap, then
> run that known-problematic binary".
>
> If you think the problem is solved by recompiling existing binaries,
> then why are we doing this kernel hack to begin with? The *real*
> solution was always to just fix the damn compiler and ABI.
That's not what I was suggesting at all. I was suggesting that, if
we're going to suggest a new API, that the new API actually be sane.
>
> That *real* solution is simple and needs no kernel support at all.
>
> In other words, *ALL* of the kernel work in this area is purely to
> support existing binaries. Don't overlook that fact.
Right. But I think the approach that we're all taking here is a bit
nutty. We all realize that this issue is a longstanding *GCC* bug
[1], but we're acting like it's a Big Deal (tm) kernel bug that Must
Be Fixed (tm) and therefore is allowed to break ABI. My security hat
is normally pretty hard-line, but I think it may be time to call BS.
Imagine if Kees had sent some symlink hardening patch that was
default-on and broke a stock distro. Or if I had sent a vsyscall
hardening patch that broke real code. It would get reverted right
away, probably along with a diatribe about how we should have known
better. I think this stack gap stuff is the same thing. It's not a
security fix -- it's a hardening patch.
Looking at it that way, I think a new inherited-on-exec flag is nucking futs.
I'm starting to think that the right approach is to mostly revert all
this stuff (the execve fixes are fine). Then start over and think
about it as hardening. I would suggest the following approach:
- The stack gap is one page, just like it's been for years.
- As a hardening feature, if the stack would expand within 64k or
whatever of a non-MAP_FIXED mapping, refuse to expand it. (This might
have to be a non-hinted mapping, not just a non-MAP_FIXED mapping.)
The idea being that, if you deliberately place a mapping under the
stack, you know what you're doing. If you're like LibreOffice and do
something daft and are thus exploitable, you're on your own.
- As a hardening measure, don't let mmap without MAP_FIXED position
something within 64k or whatever of the bottom of the stack unless a
MAP_FIXED mapping is between them.
And that's all. It's not like a 64k gap actually fixes these bugs for
real -- it just makes them harder to exploit.
[1] The code that GCC generates for char buf[big number] and alloca()
is flat-out wrong. Everyone who's ever thought about it at all knows
it and has known about it for years, but no one cared to fix it.
On Wed, 2017-07-05 at 19:05 +0200, Michal Hocko wrote:
> On Wed 05-07-17 17:58:45, Ben Hutchings wrote:
> [...]
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index c7906ae1a7a1..f8131a94e56e 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -2307,6 +2307,25 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> > }
> > #endif /* CONFIG_STACK_GROWSUP || CONFIG_IA64 */
> >
> > +unsigned long __vm_start_gap(struct vm_area_struct *vma)
> > +{
> > + unsigned long stack_limit =
> > + current->signal->rlim[RLIMIT_STACK].rlim_cur;
> > + unsigned long vm_start;
> > +
> > + if (stack_limit != RLIM_INFINITY &&
> > + vma->vm_end - vma->vm_start < stack_limit)
> > + vm_start = vma->vm_end - PAGE_ALIGN(stack_limit);
>
> This is exactly what I was worried about in my previous email. Say
> somebody sets stack ulimit to 1G or so. Should we reduce the available
> address space that much?
It's not ideal, but why would someone set the stack limit that high
unless it's for an application that will actually use most of that
stack space? Do you think that "increase the stack limit" has been
cargo-culted?
> Say you are on 32-bit and you have an application
> with multiple stacks each doing its MAP_GROWSDOWN.
[...]
So this application is using dietlibc or uclibc? glibc uses fixed-size
mappings for new threads.
I suppose there's a risk that by doing this we would make
MAP_GROWSDOWN useful enough that it is more likely to be used for new
thread stacks in future.
Ben.
--
Ben Hutchings
Anthony's Law of Force: Don't force it, get a larger hammer.
On Wed, Jul 05, 2017 at 09:17:59AM -0700, Linus Torvalds wrote:
(...)
> The good news is that this is probably specialized enough that we can
> just keep the defaults as "will break this one case, but we give
> people the tools to work around it".
>
> I hate doing that, but distros that still support 32-bit (which is
> apparently a shrinking number) can maybe hack the libreoffice launch
> scripts up?
Don't you think that the option of having a sysctl to relax the check
per task would be easier for distros and safer overall? Ie, emit
a warning the first time the gap is hit instead of segfaulting, then
reduce it to something that used to work (4k or 64k, I don't remember)
and try again ? It would quickly report all these "special" programs
for end-user distros, without leaving too much room for attacks due
to the warning making it pretty obvious what's going on. I just don't
know how to place this stack gap per process but since this was already
discussed for prctl I think it's doable.
Willy
On Wed, Jul 5, 2017 at 11:59 AM, Willy Tarreau <[email protected]> wrote:
>
> Don't you think that the option of having a sysctl to relax the check
> per task would be easier for distros and safer overall? Ie, emit
> a warning the first time the gap is hit instead of segfaulting, then
> reduce it to something that used to work (4k or 64k, I don't remember)
> and try again ?
It used to be just 4k.
.. and I think that might be a valid way to find these things, but
would it be safer? It basically disables the new stack gap entirely
apart from the warning.
And maybe that's ok and distros prefer that?
Linus
On Wed, Jul 05, 2017 at 12:17:20PM -0700, Linus Torvalds wrote:
> On Wed, Jul 5, 2017 at 11:59 AM, Willy Tarreau <[email protected]> wrote:
> >
> > Don't you think that the option of having a sysctl to relax the check
> > per task would be easier for distros and safer overall? Ie, emit
> > a warning the first time the gap is hit instead of segfaulting, then
> > reduce it to something that used to work (4k or 64k, I don't remember)
> > and try again ?
>
> It used to be just 4k.
>
> .. and I think that might be a valid way to find these things, but
> would it be safer? It basically disables the new stack gap entirely
> apart from the warning.
But only if the sysctl is set. It can simply be recommended to set it
if any program fails. We've done this for many years with other ones
like mmap_min_addr or tcp_ecn.
Willy
On Wed, Jul 5, 2017 at 12:18 PM, Willy Tarreau <[email protected]> wrote:
>
> But only if the sysctl is set. It can simply be recommended to set it
> if any program fails. We've done this for many years with other ones
> like mmap_min_addr or tcp_ecn.
Ok, fair enough. I don't hate the approach, and maybe it's simpler
overall, and would help find other potential problem spots.
*Hopefully* it was just that Rust thing and the nasty Java exec-shield
workaround, but yeah, those might just be the first ones that have
been found so far.
Linus
On Wed, 2017-07-05 at 10:23 -0700, Andy Lutomirski wrote:
[...]
> Looking at it that way, I think a new inherited-on-exec flag is nucking futs.
>
> I'm starting to think that the right approach is to mostly revert all
> this stuff (the execve fixes are fine). Then start over and think
> about it as hardening. I would suggest the following approach:
>
> - The stack gap is one page, just like it's been for years.
Given that in the following points you say that something sounding like
a stack gap would be "64k or whatever", what does "the stack gap" mean
in this first point?
> - As a hardening feature, if the stack would expand within 64k or
> whatever of a non-MAP_FIXED mapping, refuse to expand it. (This might
> have to be a non-hinted mapping, not just a non-MAP_FIXED mapping.)
> The idea being that, if you deliberately place a mapping under the
> stack, you know what you're doing. If you're like LibreOffice and do
> something daft and are thus exploitable, you're on your own.
> - As a hardening measure, don't let mmap without MAP_FIXED position
> something within 64k or whatever of the bottom of the stack unless a
> MAP_FIXED mapping is between them.
Having tested patches along these lines, I think the above would avoid
the reported regressions.
Ben.
> And that's all. It's not like a 64k gap actually fixes these bugs for
> real -- it just makes them harder to exploit.
>
> [1] The code that GCC generates for char buf[big number] and alloca()
> is flat-out wrong. Everyone who's ever thought about it at all knows
> it and has known about it for years, but no one cared to fix it.
--
Ben Hutchings
Anthony's Law of Force: Don't force it, get a larger hammer.
On Wed, Jul 05, 2017 at 08:32:43PM +0100, Ben Hutchings wrote:
> >  - As a hardening feature, if the stack would expand within 64k or
> > whatever of a non-MAP_FIXED mapping, refuse to expand it.  (This might
> > have to be a non-hinted mapping, not just a non-MAP_FIXED mapping.)
> > The idea being that, if you deliberately place a mapping under the
> > stack, you know what you're doing.  If you're like LibreOffice and do
> > something daft and are thus exploitable, you're on your own.
> >  - As a hardening measure, don't let mmap without MAP_FIXED position
> > something within 64k or whatever of the bottom of the stack unless a
> > MAP_FIXED mapping is between them.
>
> Having tested patches along these lines, I think the above would avoid
> the reported regressions.
Stuff like this has already been proposed, but Linus suspects that more
software than we imagine uses MAP_FIXED and could break. I can neither
confirm nor deny that, which probably indicates that there's nothing
fundamentally wrong with this approach from userland's perspective, and
that such software may indeed be more common than we would like.
Willy
> On Jul 5, 2017, at 12:32 PM, Ben Hutchings <[email protected]> wrote:
>
>> On Wed, 2017-07-05 at 10:23 -0700, Andy Lutomirski wrote:
>> [...]
>> Looking at it that way, I think a new inherited-on-exec flag is nucking futs.
>>
>> I'm starting to think that the right approach is to mostly revert all
>> this stuff (the execve fixes are fine). Then start over and think
>> about it as hardening. I would suggest the following approach:
>>
>> - The stack gap is one page, just like it's been for years.
>
> Given that in the following points you say that something sounding like
> a stack gap would be "64k or whatever", what does "the stack gap" mean
> in this first point?
I mean one page, with semantics as close to previous (4.11) behavior as practical.
>
>> - As a hardening feature, if the stack would expand within 64k or
>> whatever of a non-MAP_FIXED mapping, refuse to expand it. (This might
>> have to be a non-hinted mapping, not just a non-MAP_FIXED mapping.)
>> The idea being that, if you deliberately place a mapping under the
>> stack, you know what you're doing. If you're like LibreOffice and do
>> something daft and are thus exploitable, you're on your own.
>> - As a hardening measure, don't let mmap without MAP_FIXED position
>> something within 64k or whatever of the bottom of the stack unless a
>> MAP_FIXED mapping is between them.
>
> Having tested patches along these lines, I think the above would avoid
> the reported regressions.
>
FWIW, even this last part may be problematic. It'll break anything that tries to allocate many small MAP_GROWSDOWN stacks on 32-bit. Hopefully nothing does this, but maybe Java does.
> Ben.
>
>> And that's all. It's not like a 64k gap actually fixes these bugs for
>> real -- it just makes them harder to exploit.
>>
>> [1] The code that GCC generates for char buf[big number] and alloca()
>> is flat-out wrong. Everyone who's ever thought about it at all knows
>> it and has known about it for years, but no one cared to fix it.
> --
> Ben Hutchings
> Anthony's Law of Force: Don't force it, get a larger hammer.
>
On Wed, 2017-07-05 at 10:15 -0700, Linus Torvalds wrote:
> > On Wed, Jul 5, 2017 at 9:58 AM, Ben Hutchings <[email protected]> wrote:
> >
> > I ended up with the following two patches, which seem to deal with
> > both the Java and Rust regressions. These don't touch the
> > stack-grows-up paths at all because Rust doesn't run on those
> > architectures and the Java weirdness is i386-specific.
> >
> > They definitely need longer commit messages and comments, but aside
> > from that do these look reasonable?
>
> I thin kthey both look reasonable, but I think we might still want to
> massage things a bit (cutting down the quoting to a minimum, hopefully
> leaving enough context to still make sense):
>
> > Subject: [1/2] mmap: Skip a single VM_NONE mapping when checking the stack gap
> >
> > prev = vma->vm_prev;
> > + if (prev && !(prev->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)))
> > + prev = prev->vm_prev;
> > if (prev && prev->vm_end > gap_addr) {
>
> Do we just want to ignore the user-supplied guard mapping, or do we
> want to say "if the user does a guard mapping, we use that *instead*
> of our stack gap"?
>
> IOW, instead of "prev = prev->vm_prev;" and continuing, maybe we want
> to just return "ok".
Rust effectively added a second guard page to the main thread stack.
But it does not (yet) implement stack probing
(https://github.com/rust-lang/rust/issues/16012) so I think it will
benefit from the kernel's larger stack guard gap.
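For reference, that pattern is roughly the following (a sketch, not
Rust's actual code; the helper name is made up):

#define _GNU_SOURCE
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

/* Query the stack bounds libc reports for the current thread and map a
 * PROT_NONE page at the reported lower limit as a userspace guard. */
static int install_own_guard(void)
{
	pthread_attr_t attr;
	void *lowest;
	size_t size, page = (size_t)sysconf(_SC_PAGESIZE);
	int ret = -1;

	if (pthread_getattr_np(pthread_self(), &attr) != 0)
		return -1;
	if (pthread_attr_getstack(&attr, &lowest, &size) == 0 &&
	    mmap(lowest, page, PROT_NONE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) != MAP_FAILED)
		ret = 0;
	pthread_attr_destroy(&attr);
	return ret;
}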
> > Subject: [2/2] mmap: Avoid mapping anywhere within the full stack extent if finite
>
> This is good thinking, but no, I don't think the "if finite" is right.
>
> I've seen people use "really big values" as replacement for
> RLIM_INFINITY, for various reasons.
>
> We've had huge confusion about RLIM_INFINITY over the years - look for
> things like COMPAT_RLIM_OLD_INFINITY to see the kinds of confusions
> we've had.
That sounds familiar...
> Some people just use MAX_LONG etc, which is *not* the same as
> RLIM_INFINITY, but in practice ends up doing the same thing. Yadda
> yadda.
>
> So I'm personally leery of checking and depending on "exactly
> RLIM_INFINITY", because I've seen it go wrong so many times.
>
> And I think your second patch breaks that "use a really large value to
> approximate infinity" case that definitely has existed as a pattern.
Right. Well that seems to leave us with remembering the MAP_FIXED flag
and using that as the condition to ignore the previous mapping.
Ben.
--
Ben Hutchings
Man invented language to satisfy his deep need to complain. - Lily
Tomlin
On Wed, Jul 5, 2017 at 10:23 AM, Andy Lutomirski <[email protected]> wrote:
> Right. But I think the approach that we're all taking here is a bit
> nutty. We all realize that this issue is a longstanding *GCC* bug
> [1], but we're acting like it's a Big Deal (tm) kernel bug that Must
> Be Fixed (tm) and therefore is allowed to break ABI. My security hat
> is normally pretty hard-line, but I think it may be time to call BS.
>
> Imagine if Kees had sent some symlink hardening patch that was
> default-on and broke a stock distro. Or if I had sent a vsyscall
> hardening patch that broke real code. It would get reverted right
> away, probably along with a diatribe about how we should have known
> better. I think this stack gap stuff is the same thing. It's not a
> security fix -- it's a hardening patch.
>
> Looking at it that way, I think a new inherited-on-exec flag is nucking futs.
>
> I'm starting to think that the right approach is to mostly revert all
> this stuff (the execve fixes are fine). Then start over and think
> about it as hardening. I would suggest the following approach:
>
> - The stack gap is one page, just like it's been for years.
> - As a hardening feature, if the stack would expand within 64k or
> whatever of a non-MAP_FIXED mapping, refuse to expand it. (This might
> have to be a non-hinted mapping, not just a non-MAP_FIXED mapping.)
> The idea being that, if you deliberately place a mapping under the
> stack, you know what you're doing. If you're like LibreOffice and do
> something daft and are thus exploitable, you're on your own.
> - As a hardening measure, don't let mmap without MAP_FIXED position
> something within 64k or whatever of the bottom of the stack unless a
> MAP_FIXED mapping is between them.
>
> And that's all. It's not like a 64k gap actually fixes these bugs for
> real -- it just makes them harder to exploit.
>
>> [1] The code that GCC generates for char buf[big number] and alloca()
>> is flat-out wrong. Everyone who's ever thought about it at all knows
>> it and has known about it for years, but no one cared to fix it.
As part of that should we put restrictions on the environment of
set*id exec too? Part of the risks demonstrated by Qualys was that
allowing a privilege-elevating binary to inherit rlimits can lead
to nasty memory layout side-effects. That would fall into the
"hardening" bucket as well. And if it turns out there is some set*id
binary out there that can't run with "only", e.g., 128MB of stack, we
can make it configurable...
-Kees
--
Kees Cook
Pixel Security
On Wed, Jul 5, 2017 at 4:35 PM, Ben Hutchings <[email protected]> wrote:
>>
>> And I think your second patch breaks that "use a really large value to
>> approximate infinity" case that definitely has existed as a pattern.
>
> Right. Well that seems to leave us with remembering the MAP_FIXED flag
> and using that as the condition to ignore the previous mapping.
I'm not particularly happy about having a MAP_FIXED special case, but
yeah, I'm not seeing a lot of alternatives.
Linus
On Wed, 2017-07-05 at 13:53 -0700, Andy Lutomirski wrote:
> On Jul 5, 2017, at 12:32 PM, Ben Hutchings <[email protected]> wrote:
> > On Wed, 2017-07-05 at 10:23 -0700, Andy Lutomirski wrote:
[...]
> > > - As a hardening feature, if the stack would expand within 64k or
> > > whatever of a non-MAP_FIXED mapping, refuse to expand it. (This might
> > > have to be a non-hinted mapping, not just a non-MAP_FIXED mapping.)
> > > The idea being that, if you deliberately place a mapping under the
> > > stack, you know what you're doing. If you're like LibreOffice and do
> > > something daft and are thus exploitable, you're on your own.
> > > - As a hardening measure, don't let mmap without MAP_FIXED position
> > > something within 64k or whatever of the bottom of the stack unless a
> > > MAP_FIXED mapping is between them.
> >
> > Having tested patches along these lines, I think the above would avoid
> > the reported regressions.
> >
>
> FWIW, even this last part may be problematic. It'll break anything
> that tries to allocate many small MAP_GROWSDOWN stacks on 32-
> bit. Hopefully nothing does this, but maybe Java does.
glibc (NPTL) does not. Java (at least Hotspot in OpenJDK 6,7, 8) does
not. LinuxThreads *does* and is used by uclibc. dietlibc *does*. I
would be surprised if either was used for applications with very many
threads, but then this issue has thrown up a lot of surprises.
Ben.
--
Ben Hutchings
Man invented language to satisfy his deep need to complain. - Lily
Tomlin
On Wed, Jul 5, 2017 at 4:50 PM, Kees Cook <[email protected]> wrote:
>
> As part of that should we put restrictions on the environment of
> set*id exec too?
I'm not seeing what sane limits you could use.
I think the concept of "reset as much of the environment to sane
things when running suid binaries" is a good concept.
But we simply don't have any sane values to reset things to.
Linus
On Wed, Jul 5, 2017 at 4:50 PM, Kees Cook <[email protected]> wrote:
> On Wed, Jul 5, 2017 at 10:23 AM, Andy Lutomirski <[email protected]> wrote:
>> Right. But I think the approach that we're all taking here is a bit
>> nutty. We all realize that this issue is a longstanding *GCC* bug
>> [1], but we're acting like it's a Big Deal (tm) kernel bug that Must
>> Be Fixed (tm) and therefore is allowed to break ABI. My security hat
>> is normally pretty hard-line, but I think it may be time to call BS.
>>
>> Imagine if Kees had sent some symlink hardening patch that was
>> default-on and broke a stock distro. Or if I had sent a vsyscall
>> hardening patch that broke real code. It would get reverted right
>> away, probably along with a diatribe about how we should have known
>> better. I think this stack gap stuff is the same thing. It's not a
>> security fix -- it's a hardening patch.
>>
>> Looking at it that way, I think a new inherited-on-exec flag is nucking futs.
>>
>> I'm starting to think that the right approach is to mostly revert all
>> this stuff (the execve fixes are fine). Then start over and think
>> about it as hardening. I would suggest the following approach:
>>
>> - The stack gap is one page, just like it's been for years.
>> - As a hardening feature, if the stack would expand within 64k or
>> whatever of a non-MAP_FIXED mapping, refuse to expand it. (This might
>> have to be a non-hinted mapping, not just a non-MAP_FIXED mapping.)
>> The idea being that, if you deliberately place a mapping under the
>> stack, you know what you're doing. If you're like LibreOffice and do
>> something daft and are thus exploitable, you're on your own.
>> - As a hardening measure, don't let mmap without MAP_FIXED position
>> something within 64k or whatever of the bottom of the stack unless a
>> MAP_FIXED mapping is between them.
>>
>> And that's all. It's not like a 64k gap actually fixes these bugs for
>> real -- it just makes them harder to exploit.
>>
>> [1] The code that GCC generates for char buf[big number] and alloca()
>> is flat-out wrong. Everyone who's ever thought about it at all knows
>> it and has known about it for years, but no one cared to fix it.
>
> As part of that should we put restrictions on the environment of
> set*id exec too? Part of the risks demonstrated by Qualys was that
> allowing a privilege-elevating binary to inherit rlimits can lead
> to nasty memory layout side-effects. That would fall into the
> "hardening" bucket as well. And if it turns out there is some set*id
> binary out there that can't run with "only", e.g., 128MB of stack, we
> can make it configurable...
Yes. I think it's ridiculous that you can change rlimits and then
exec a setuid thing. It's not so easy to fix, though. Maybe track,
per-task, inherited by clone and exec, what the rlimits were the last
time the process had privilege and reset to those limits when running
something setuid. But a better approach might be to have some sysctls
that say what the rlimits become when doing setuid.
We need per-user-ns sysctls for stuff like this, and we don't really
have them...
--Andy
On Wed, Jul 5, 2017 at 4:50 PM, Ben Hutchings <[email protected]> wrote:
> On Wed, 2017-07-05 at 13:53 -0700, Andy Lutomirski wrote:
>> On Jul 5, 2017, at 12:32 PM, Ben Hutchings <[email protected]> wrote:
>> > On Wed, 2017-07-05 at 10:23 -0700, Andy Lutomirski wrote:
> [...]
>> > > - As a hardening feature, if the stack would expand within 64k or
>> > > whatever of a non-MAP_FIXED mapping, refuse to expand it. (This might
>> > > have to be a non-hinted mapping, not just a non-MAP_FIXED mapping.)
>> > > The idea being that, if you deliberately place a mapping under the
>> > > stack, you know what you're doing. If you're like LibreOffice and do
>> > > something daft and are thus exploitable, you're on your own.
>> > > - As a hardening measure, don't let mmap without MAP_FIXED position
>> > > something within 64k or whatever of the bottom of the stack unless a
>> > > MAP_FIXED mapping is between them.
>> >
>> > Having tested patches along these lines, I think the above would avoid
>> > the reported regressions.
>> >
>>
>> FWIW, even this last part may be problematic. It'll break anything
>> that tries to allocate many small MAP_GROWSDOWN stacks on 32-
>> bit. Hopefully nothing does this, but maybe Java does.
>
> glibc (NPTL) does not. Java (at least Hotspot in OpenJDK 6,7, 8) does
> not. LinuxThreads *does* and is used by uclibc. dietlibc *does*. I
> would be surprised if either was used for applications with very many
> threads, but then this issue has thrown up a lot of surprises.
>
Ugh. But yeah, I'd be a bit surprised to see heavily threaded apps
using LinuxThreads or dietlibc.
LinuxThreads still uses modify_ldt(), right? modify_ldt() performance
is abysmal, and I have no intention of even trying to optimize it.
Anyhow, you *can't* have more than 8192 threads if you use
modify_ldt() for TLS because you run out of LDT slots. 8192 * 64k
fits in 32 bits with room to spare, so this is unlikely to be a
showstopper.
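(8192 threads * 64k is 512MB, against roughly 3GB of user address space
on i386.)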
--Andy
On Wed, Jul 5, 2017 at 4:55 PM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Jul 5, 2017 at 4:50 PM, Kees Cook <[email protected]> wrote:
>>
>> As part of that should we put restrictions on the environment of
>> set*id exec too?
>
> I'm not seeing what sane limits you could use.
>
> I think the concept of "reset as much of the environment to sane
> things when running suid binaries" is a good concept.
>
> But we simply don't have any sane values to reset things to.
I wonder if we could pull some "sane" values out of our arses and have
it work just fine.
It's worth noting that a lot of the rlimits don't meaningfully
restrict the use of any particular resource, so we could plausibly
drop requirements to have privilege to increase them if we really
cared to. I don't see why we'd make such a change, but it means that,
if we reset on set*id and therefore poke a hole that allows a program
to do "sudo -u $me whatever" and thereby reset limits, it's not so
bad. A tiny survey:
RLIMIT_AS: not a systemwide resource at all.
RLIMIT_CORE: more or less just a policy of what you do when you crash.
I don't see how you could do much damage here.
RLIMIT_CPU: unless you're not allowed to fork(), this doesn't restrict
anything systemwide.
RLIMIT_DATA: ***
RLIMIT_FSIZE: maybe? but I can see this being quite dangerous across set*id
RLIMIT_LOCKS: gone
RLIMIT_MEMLOCK: this one matters, but it also seems nearly worthless
for exploits
RLIMIT_MSGQUEUE: privilege matters here
RLIMIT_NICE: maybe? anyone who actually cares would use cgroups instead
RLIMIT_NOFILE: great for exploits. Only sort of useful for resource management
RLIMIT_NPROC: privilege matters here
RLIMIT_RTTIME: privilege kind of matters. Also dangerous for exploits
(a bit) since it lets you kill your children at controlled times.
RLIMIT_SIGPENDING: not sure
RLIMIT_STACK: ***
*** means that this is a half-arsed resource control. It's half-arsed
because this stuff doesn't cover mmap(2), which seems to me like it
defeats the purpose. This stuff feels like a throwback to the
eighties.
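To make the *** point concrete, here is a minimal sketch; the exact outcome
depends on the kernel version (historically RLIMIT_DATA constrained only
brk(), while some newer kernels also account private writable mappings), but
the gist is that a tiny RLIMIT_DATA does not necessarily stop a large
anonymous mmap():

/* Sketch: RLIMIT_DATA as a half-hearted control.  Lower it to 1 MiB,
 * then ask for 256 MiB of anonymous memory via mmap(); on many kernels
 * the mmap() still succeeds because only brk() was ever accounted.
 */
#include <sys/resource.h>
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
    struct rlimit rl = { 1 << 20, 1 << 20 };   /* 1 MiB */

    if (setrlimit(RLIMIT_DATA, &rl) != 0)
        perror("setrlimit(RLIMIT_DATA)");

    void *p = mmap(NULL, 256UL << 20, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    printf("256 MiB anonymous mmap: %s\n",
           p == MAP_FAILED ? "failed" : "succeeded");
    return 0;
}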
On Wed, Jul 5, 2017 at 5:31 PM, Andy Lutomirski <[email protected]> wrote:
>
> I wonder if we could pull some "sane" values out of our arses and have
> it work just fine.
That approach may work, but it's pretty nasty.
But together with at least some way for the distro to set the values
we pick, it would probably be fairly reasonable.
You're right that most of the rlimits are just not very useful.
Linus
On Wed, Jul 5, 2017 at 5:19 PM, Andy Lutomirski <[email protected]> wrote:
> On Wed, Jul 5, 2017 at 4:50 PM, Kees Cook <[email protected]> wrote:
>> As part of that should we put restrictions on the environment of
>> set*id exec too? Part of the risks demonstrated by Qualys was that
>> allowing a privilege-elevating binary to inherit rlimits can lead
>> to nasty memory layout side-effects. That would fall into the
>> "hardening" bucket as well. And if it turns out there is some set*id
>> binary out there that can't run with "only", e.g., 128MB of stack, we
>> can make it configurable...
>
> Yes. I think it's ridiculous that you can change rlimits and then
> exec a setuid thing. It's not so easy to fix, though. Maybe track,
> per-task, inherited by clone and exec, what the rlimits were the last
> time the process had privilege and reset to those limits when running
> something setuid. But a better approach might be to have some sysctls
> that say what the rlimits become when doing setuid.
>
> We need per-user-ns sysctls for stuff like this, and we don't really
> have them...
In userspace, the way PAM selects sensible rlimit defaults when building
an initial environment is simply to examine the rlimits of init. Maybe
we could do the same thing here, which would also give us some level of
namespace control.
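A minimal sketch of that approach, reading init's limits straight out of
procfs (no parsing, it just dumps the table a policy could then apply):

/* Sketch: inspect the rlimits of init via /proc/1/limits, the same
 * information a PAM-style "copy init's limits" policy would consult.
 */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/1/limits", "r");
    char line[256];

    if (!f) {
        perror("/proc/1/limits");
        return 1;
    }
    while (fgets(line, sizeof(line), f))
        fputs(line, stdout);
    fclose(f);
    return 0;
}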
-Kees
--
Kees Cook
Pixel Security
On Wed, Jul 05, 2017 at 05:19:47PM -0700, Andy Lutomirski wrote:
> I think it's ridiculous that you can change rlimits and then
> exec a setuid thing. It's not so easy to fix, though. Maybe track,
> per-task, inherited by clone and exec, what the rlimits were the last
> time the process had privilege and reset to those limits when running
> something setuid. But a better approach might be to have some sysctls
> that say what the rlimits become when doing setuid.
*Some* rlimits are useful and needed by the user, as you mentioned.
RLIMIT_CORE definitely is one of them, especially for debugging when
combined with suid_dumpable. Some others, like RLIMIT_STACK, should
probably never be configurable at all, as they cause trouble. Probably
simply having a sysctl that sets this one for setuid programs, ignoring
the current limit, would be enough. We could even imagine another one
to set the stack guard gap for setuid programs (this would also limit
the impact of having a large gap for everyone).
> We need per-user-ns sysctls for stuff like this, and we don't really
> have them...
I don't think we need to be this fine-grained. mmap_min_addr is global,
is used to address very similar issues, and nobody seems to complain.
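For comparison, that existing global knob is just a sysctl visible through
procfs, and a hypothetical "stack gap for setuid programs" knob would
presumably be exposed the same way. A trivial sketch reading the current
value:

/* Sketch: read the existing global vm.mmap_min_addr sysctl.  A setuid
 * stack-gap sysctl, if it existed, would be read the same way.
 */
#include <stdio.h>

int main(void)
{
    unsigned long val;
    FILE *f = fopen("/proc/sys/vm/mmap_min_addr", "r");

    if (!f) {
        perror("/proc/sys/vm/mmap_min_addr");
        return 1;
    }
    if (fscanf(f, "%lu", &val) == 1)
        printf("vm.mmap_min_addr = %lu\n", val);
    fclose(f);
    return 0;
}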
Willy
On Wed, Jul 05, 2017 at 04:23:56PM +0200, Michal Hocko wrote:
> On Wed 05-07-17 13:19:40, Ben Hutchings wrote:
> > On Tue, 2017-07-04 at 16:31 -0700, Linus Torvalds wrote:
> > > On Tue, Jul 4, 2017 at 4:01 PM, Ben Hutchings <[email protected]>
> > > wrote:
> > > >
> > > > We have:
> > > >
> > > > bottom = 0xff803fff
> > > > sp     = 0xffffb178
> > > >
> > > > The relevant mappings are:
> > > >
> > > > ff7fc000-ff7fd000 rwxp 00000000 00:00 0
> > > > fffdd000-ffffe000 rw-p 00000000 00:00 0          [stack]
> > >
> > > Ugh. So that stack is actually 8MB in size, but the alloca() is about
> > > to use up almost all of it, and there's only about 28kB left between
> > > "bottom" and that 'rwx' mapping.
> > >
> > > Still, that rwx mapping is interesting: it is a single page, and it
> > > really is almost exactly 8MB below the stack.
> > >
> > > In fact, the top of stack (at 0xffffe000) is *exactly* 8MB+4kB from
> > > the top of that odd one-page allocation (0xff7fd000).
> > >
> > > Can you find out where that is allocated? Perhaps a breakpoint on
> > > mmap, with a condition to catch that particular one?
> > [...]
> >
> > Found it, and it's now clear why only i386 is affected:
> > http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/os/linux/vm/os_linux.cpp#l4852
> > http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/os_cpu/linux_x86/vm/os_linux_x86.cpp#l881
>
> This is really worrying. This doesn't look like a gap at all. It is a
> mapping which actually contains code, so we should absolutely not
> allow scribbling over it. So I am afraid the only way forward is to
> allow a per-process stack gap and run this particular program with a
> smaller gap. We basically have two ways: either a /proc/<pid>/$file or
> a prctl inherited on exec. The latter is less code. What do you
> think?
On the plus side, the code in that page (a single RET) is only executed
once when the workaround function is called. Notice that 'codebuf'
is never even returned out of that function.
The only reason they even leave that page mapped is to stop the exec
shield limit from being lowered on them again.
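For anyone who has not chased the HotSpot links above, the workaround
amounts to roughly the sketch below: map one RWX page, put a single RET in
it, call it exactly once, and deliberately never unmap it. This is a
simplification; the real os_linux_x86.cpp code computes the page's address
from the thread's stack bottom rather than letting the kernel choose:

/* Rough approximation of HotSpot's i386 exec-shield workaround: one RWX
 * page containing a single RET (0xc3), executed once and then left
 * mapped on purpose.  Address selection is simplified here.
 */
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
    unsigned char *codebuf = mmap(NULL, 4096,
                                  PROT_READ | PROT_WRITE | PROT_EXEC,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (codebuf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    codebuf[0] = 0xc3;                   /* x86 RET */
    ((void (*)(void))codebuf)();         /* executed exactly once */
    /* Intentionally never munmap()ed, mirroring the workaround. */
    printf("workaround page at %p stays mapped\n", codebuf);
    return 0;
}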
- Kevin
On Wed 05-07-17 08:36:45, Michal Hocko wrote:
> On Tue 04-07-17 16:31:52, Linus Torvalds wrote:
> > On Tue, Jul 4, 2017 at 4:01 PM, Ben Hutchings <[email protected]> wrote:
> > >
> > > We have:
> > >
> > > bottom = 0xff803fff
> > > sp = 0xffffb178
> > >
> > > The relevant mappings are:
> > >
> > > ff7fc000-ff7fd000 rwxp 00000000 00:00 0
> > > fffdd000-ffffe000 rw-p 00000000 00:00 0 [stack]
> >
> > Ugh. So that stack is actually 8MB in size, but the alloca() is about
> > to use up almost all of it, and there's only about 28kB left between
> > "bottom" and that 'rwx' mapping.
> >
> > Still, that rwx mapping is interesting: it is a single page, and it
> > really is almost exactly 8MB below the stack.
> >
> > In fact, the top of stack (at 0xffffe000) is *exactly* 8MB+4kB from
> > the top of that odd one-page allocation (0xff7fd000).
>
> Very interesting! I would be really curious whether changing ulimit to
> something bigger changes the picture.
It's a public holiday here today, I haven't read all the new emails, and
I will be mostly offline today; I will catch up tomorrow. But before we
go for trickier workarounds, could you double check that simply
increasing RLIMIT_STACK works around the problem here? Because if it
does, and the other workarounds require some manual intervention, then
changing the ulimit sounds like the least tricky option to me.
--
Michal Hocko
SUSE Labs
On Wed, Jul 05, 2017 at 04:51:06PM -0700, Linus Torvalds wrote:
> On Wed, Jul 5, 2017 at 4:35 PM, Ben Hutchings <[email protected]> wrote:
> >>
> >> And I think your second patch breaks that "use a really large value to
> >> approximate infinity" case that definitely has existed as a pattern.
> >
> > Right. Well that seems to leave us with remembering the MAP_FIXED flag
> > and using that as the condition to ignore the previous mapping.
>
> I'm not particularly happy about having a MAP_FIXED special case, but
> yeah, I'm not seeing a lot of alternatives.
We can possibly refine it like this:
- use PROT_NONE as a mark for the end of the stack and consider that the
application doing this knows exactly what it's doing;
- use other MAP_FIXED mappings as a limit for a shorter gap (i.e. 4 kB),
considering that 1) it used to work like this for many years, and 2) if an
application forces a MAP_FIXED mapping just below the stack and at the same
time uses a large alloca() or VLA, it's definitely bogus and asking for
unfixable trouble. Not allowing this means we break existing applications
anyway.
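For reference, the PROT_NONE mark in the first point is essentially the
pattern Rust-style runtimes already use for the main thread: query the stack
extent glibc reports and drop a PROT_NONE page at its lowest address. A
minimal sketch (page size hard-coded, error handling trimmed, build with
-pthread; the real runtime code differs in detail):

/* Sketch: place a userspace guard page at the bottom of the main-thread
 * stack range reported by glibc.  For the main thread that address is
 * normally far below the pages mapped so far, so MAP_FIXED lands on
 * unmapped address space.
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
    pthread_attr_t attr;
    void *stackaddr;
    size_t stacksize;

    pthread_getattr_np(pthread_self(), &attr);
    pthread_attr_getstack(&attr, &stackaddr, &stacksize);
    pthread_attr_destroy(&attr);

    void *guard = mmap(stackaddr, 4096, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (guard == MAP_FAILED)
        perror("mmap(PROT_NONE)");
    else
        printf("guard page at %p, reported stack size %zu\n",
               guard, stacksize);
    return 0;
}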
Willy
On Thu, Jul 06, 2017 at 10:24:06AM +0200, Willy Tarreau wrote:
> On Wed, Jul 05, 2017 at 04:51:06PM -0700, Linus Torvalds wrote:
> > On Wed, Jul 5, 2017 at 4:35 PM, Ben Hutchings <[email protected]> wrote:
> > >>
> > >> And I think your second patch breaks that "use a really large value to
> > >> approximate infinity" case that definitely has existed as a pattern.
> > >
> > > Right. Well that seems to leave us with remembering the MAP_FIXED flag
> > > and using that as the condition to ignore the previous mapping.
> >
> > I'm not particularly happy about having a MAP_FIXED special case, but
> > yeah, I'm not seeing a lot of alternatives.
>
> We can possibly refine it like this:
> - use PROT_NONE as a mark for the end of the stack and consider that the
> application doing this knows exactly what it's doing;
>
> - use other MAP_FIXED mappings as a limit for a shorter gap (i.e. 4 kB),
> considering that 1) it used to work like this for many years, and 2) if an
> application forces a MAP_FIXED mapping just below the stack and at the same
> time uses a large alloca() or VLA, it's definitely bogus and asking for
> unfixable trouble. Not allowing this means we break existing applications
> anyway.
That would probably give the following (only build-tested on x86_64). Do
you think it would make sense and/or be acceptable? It would more easily
avoid the other options, like adding a sysctl plus warnings or making a
special case of setuid.
Willy
---
From 56ae4e57e446bc92fd2647327da281e313930524 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <[email protected]>
Date: Thu, 6 Jul 2017 12:00:54 +0200
Subject: mm: mm, mmap: only apply a one page gap between the stack and
MAP_FIXED
Some programs place a MAP_FIXED mapping below the stack, not leaving enough
room for the stack guard. This patch keeps track of MAP_FIXED, mirroring it
in a new VM_FIXED flag, and reduces the stack guard to a single page (as it
used to be) in that situation, on the assumption that when an application
places a fixed mapping close to the stack, it very likely does so on purpose
and takes full responsibility for the risk of the stack blowing up.
Cc: Ben Hutchings <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
---
include/linux/mm.h | 1 +
include/linux/mman.h | 1 +
mm/mmap.c | 30 ++++++++++++++++++++----------
3 files changed, 22 insertions(+), 10 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6f543a4..41492b9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -188,6 +188,7 @@ extern int overcommit_kbytes_handler(struct ctl_table *, int, void __user *,
#define VM_ACCOUNT 0x00100000 /* Is a VM accounted object */
#define VM_NORESERVE 0x00200000 /* should the VM suppress accounting */
#define VM_HUGETLB 0x00400000 /* Huge TLB Page VM */
+#define VM_FIXED 0x00800000 /* MAP_FIXED was used */
#define VM_ARCH_1 0x01000000 /* Architecture-specific flag */
#define VM_ARCH_2 0x02000000
#define VM_DONTDUMP 0x04000000 /* Do not include in the core dump */
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 634c4c5..3a29069 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -86,6 +86,7 @@ static inline bool arch_validate_prot(unsigned long prot)
{
return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
_calc_vm_trans(flags, MAP_DENYWRITE, VM_DENYWRITE ) |
+ _calc_vm_trans(flags, MAP_FIXED, VM_FIXED ) |
_calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED );
}
diff --git a/mm/mmap.c b/mm/mmap.c
index ece0f6d..7fc1c29 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2244,12 +2244,17 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
gap_addr = TASK_SIZE;
next = vma->vm_next;
+
+ /* PROT_NONE above a MAP_GROWSUP always serves as a mark and inhibits
+ * the stack guard gap.
+ * MAP_FIXED above a MAP_GROWSUP only requires a single page guard.
+ */
if (next && next->vm_start < gap_addr &&
- (next->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
- if (!(next->vm_flags & VM_GROWSUP))
- return -ENOMEM;
- /* Check that both stack segments have the same anon_vma? */
- }
+ !(next->vm_flags & VM_GROWSUP) &&
+ (next->vm_flags & (VM_WRITE|VM_READ|VM_EXEC)) &&
+ (!(next->vm_flags & VM_FIXED) ||
+ next->vm_start < address + PAGE_SIZE))
+ return -ENOMEM;
/* We must make sure the anon_vma is allocated. */
if (unlikely(anon_vma_prepare(vma)))
@@ -2329,12 +2334,17 @@ int expand_downwards(struct vm_area_struct *vma,
if (gap_addr > address)
return -ENOMEM;
prev = vma->vm_prev;
+
+ /* PROT_NONE below a MAP_GROWSDOWN always serves as a mark and inhibits
+ * the stack guard gap.
+ * MAP_FIXED below a MAP_GROWSDOWN only requires a single page guard.
+ */
if (prev && prev->vm_end > gap_addr &&
- (prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
- if (!(prev->vm_flags & VM_GROWSDOWN))
- return -ENOMEM;
- /* Check that both stack segments have the same anon_vma? */
- }
+ !(prev->vm_flags & VM_GROWSDOWN) &&
+ (prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC)) &&
+ (!(prev->vm_flags & VM_FIXED) ||
+ prev->vm_end > address - PAGE_SIZE))
+ return -ENOMEM;
/* We must make sure the anon_vma is allocated. */
if (unlikely(anon_vma_prepare(vma)))
--
1.7.12.1
FYI, we noticed the following commit:
commit: a99d848d3bc6586e922584ce8ec673a451a09cf1 ("mm: larger stack guard gap, between vmas")
url: https://github.com/0day-ci/linux/commits/Ben-Hutchings/mmap-Skip-a-single-VM_NONE-mapping-when-checking-the-stack/20170707-131750
in testcase: trinity
with following parameters:
runtime: 300s
test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/
on test machine: qemu-system-x86_64 -enable-kvm -m 512M
caused the following changes (please refer to the attached dmesg/kmsg for the entire log/backtrace):
+------------------------------------------+------------+------------+
| | 9b51f04424 | a99d848d3b |
+------------------------------------------+------------+------------+
| boot_successes | 88 | 0 |
| boot_failures | 11 | 14 |
| BUG:kernel_hang_in_test_stage | 11 | |
| kernel_BUG_at_mm/mmap.c | 0 | 14 |
| invalid_opcode:#[##] | 0 | 14 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 14 |
+------------------------------------------+------------+------------+
[ 7.169579] kernel BUG at mm/mmap.c:388!
[ 7.170690] invalid opcode: 0000 [#1] PREEMPT SMP
[ 7.171625] CPU: 0 PID: 1 Comm: init Not tainted 4.12.0-06091-ga99d848 #3
[ 7.172985] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[ 7.174982] task: ffff8ab880048000 task.stack: ffffacde40008000
[ 7.176176] RIP: 0010:validate_mm+0x213/0x224
[ 7.177045] RSP: 0000:ffffacde4000bb90 EFLAGS: 00010282
[ 7.178094] RAX: 0000000000000290 RBX: 00000000ffffffff RCX: b0e7f7ea00000000
[ 7.179508] RDX: 00000001b0449a78 RSI: 0000000000000001 RDI: 0000000000000246
[ 7.180915] RBP: ffffacde4000bbd0 R08: ffff8ab880048770 R09: 0000000051472920
[ 7.182313] R10: ffff8ab898919020 R11: ffffffffb12d8eaa R12: ffff8ab89e560b00
[ 7.183758] R13: 0000000000000001 R14: 0000000000000000 R15: 00007fffdd106000
[ 7.185175] FS: 0000000000000000(0000) GS:ffff8ab89f800000(0000) knlGS:0000000000000000
[ 7.186776] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7.187916] CR2: 0000000000000000 CR3: 0000000017e25000 CR4: 00000000000006f0
[ 7.189313] Call Trace:
[ 7.189828] __vma_adjust+0x657/0x6ca
[ 7.190583] ? tlb_flush_mmu+0x15/0x18
[ 7.191331] shift_arg_pages+0x152/0x167
[ 7.192162] setup_arg_pages+0x1c1/0x1f4
[ 7.192970] load_elf_binary+0x344/0xe48
[ 7.193782] ? kvm_clock_read+0x25/0x35
[ 7.194553] ? kvm_sched_clock_read+0x9/0x12
[ 7.195412] ? search_binary_handler+0x52/0xce
[ 7.196281] search_binary_handler+0x5f/0xce
[ 7.197150] do_execveat_common+0x4dc/0x64c
[ 7.198121] ? rest_init+0x143/0x143
[ 7.198851] do_execve+0x1e/0x20
[ 7.199519] run_init_process+0x26/0x28
[ 7.200288] kernel_init+0x4f/0xe6
[ 7.200977] ret_from_fork+0x25/0x30
[ 7.201679] Code: 41 8b 74 24 70 39 de 74 15 83 fb ff 74 15 89 da 48 c7 c7 d6 c8 23 b0 e8 ba f6 fc ff eb 05 45 85 f6 74 0a 4c 89 e7 e8 67 42 ff ff <0f> 0b 48 83 c4 18 5b 41 5c 41 5d 41 5e 41 5f 5d c3 55 48 89 e5
[ 7.205614] RIP: validate_mm+0x213/0x224 RSP: ffffacde4000bb90
[ 7.206830] ---[ end trace 95e0c74c93056b9b ]---
To reproduce:
git clone https://github.com/01org/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email
Thanks,
Xiaolong