Date: Tue, 4 Jul 2017 11:47:28 +0200
From: Willy Tarreau <w@1wt.eu>
To: Michal Hocko <mhocko@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
        Ben Hutchings <ben@decadent.org.uk>, Hugh Dickins <hughd@google.com>,
        Oleg Nesterov <oleg@redhat.com>,
        "Jason A. Donenfeld" <Jason@zx2c4.com>, Rik van Riel <riel@redhat.com>,
        Larry Woodman <lwoodman@redhat.com>,
        "Kirill A. Shutemov" <kirill@shutemov.name>,
        Tony Luck <tony.luck@intel.com>,
        "James E.J. Bottomley" <jejb@parisc-linux.org>,
        Helge Deller <deller@gmx.de>, James Hogan <james.hogan@imgtec.com>,
        Laura Abbott <labbott@redhat.com>, Greg KH <greg@kroah.com>,
        "security@kernel.org" <security@kernel.org>,
        linux-distros@vs.openwall.org,
        Qualys Security Advisory <qsa@qualys.com>,
        LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: larger stack guard gap, between vmas
Message-ID: <20170704094728.GB22013@1wt.eu>
References: <alpine.LSU.2.11.1706190355140.2626@eggly.anvils>
 <CA+55aFx6j4na3BVRC2aQuf-kNp1jzGahN8To_SFpNu+H=gopJA@mail.gmail.com>
 <20170619142358.GA32654@1wt.eu>
 <1498009101.2655.6.camel@decadent.org.uk>
 <20170621092419.GA22051@dhcp22.suse.cz>
 <1498042057.2655.8.camel@decadent.org.uk>
 <1499126133.2707.20.camel@decadent.org.uk>
 <CA+55aFzMX72+Kb=zNgjCf6UfPt+C+e7WDp_rpbSLuOVx1k7iqg@mail.gmail.com>
 <20170704084122.GC14722@dhcp22.suse.cz>
 <20170704093538.GF14722@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170704093538.GF14722@dhcp22.suse.cz>
User-Agent: Mutt/1.6.1 (2016-04-27)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4151
Lines: 90

On Tue, Jul 04, 2017 at 11:35:38AM +0200, Michal Hocko wrote:
> On Tue 04-07-17 10:41:22, Michal Hocko wrote:
> > On Mon 03-07-17 17:05:27, Linus Torvalds wrote:
> > > On Mon, Jul 3, 2017 at 4:55 PM, Ben Hutchings <ben@decadent.org.uk> wrote:
> > > >
> > > > Firstly, some Rust programs are crashing on ppc64el with 64 KiB pages.
> > > > Apparently Rust maps its own guard page at the lower limit of the stack
> > > > (determined using pthread_getattr_np() and pthread_attr_getstack()).  I
> > > > don't think this ever actually worked for the main thread stack, but it
> > > > now also blocks expansion as the default stack size of 8 MiB is smaller
> > > > than the stack gap of 16 MiB.  Would it make sense to skip over
> > > > PROT_NONE mappings when checking whether it's safe to expand?
> > 
> > This is what my workaround for the older patch was doing, actually. We
> > have deployed that as a follow-up fix on our older code bases. And this
> > has fixed various issues with Java, which was doing a similar thing.
> 
> Here is a forward port (on top of the current Linus tree) of my earlier
> patch. I have dropped a note about the Java stack trace because this would
> most likely not be the case with Hugh's patch. The problem is the
> same in principle though. Note I didn't get to test this properly yet
> but it should be pretty much obvious.
> ---
> From d9f6faccf2c286ed81fbc860c9b0b7fe23ef0836 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Tue, 4 Jul 2017 11:27:39 +0200
> Subject: [PATCH] mm: mm, mmap: do not blow on PROT_NONE MAP_FIXED holes in the
>  stack
> 
> "mm: enlarge stack guard gap" has introduced a regression in some Rust
> and Java environments which are trying to implement their own stack
> guard page.  They are punching a new MAP_FIXED mapping inside the
> existing stack VMA.
> 
> This will confuse expand_{downwards,upwards} into thinking that the stack
> expansion would in fact get us too close to an existing non-stack vma
> which is a correct behavior wrt. safety. It is a real regression on
> the other hand. Let's work around the problem by considering PROT_NONE
> mapping as a part of the stack. This is a gross hack but overflowing to
> such a mapping would trap anyway and we can only hope that userspace
> knows what it is doing and handles it properly.
> 
> Fixes: d4d2d35e6ef9 ("mm: larger stack guard gap, between vmas")
> Debugged-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/mmap.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index f60a8bc2869c..2e996cbf4ff3 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2244,7 +2244,8 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  		gap_addr = TASK_SIZE;
>  
>  	next = vma->vm_next;
> -	if (next && next->vm_start < gap_addr) {
> +	if (next && next->vm_start < gap_addr &&
> +			(next->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
>  		if (!(next->vm_flags & VM_GROWSUP))
>  			return -ENOMEM;
>  		/* Check that both stack segments have the same anon_vma? */
> @@ -2325,7 +2326,8 @@ int expand_downwards(struct vm_area_struct *vma,
>  	/* Enforce stack_guard_gap */
>  	prev = vma->vm_prev;
>  	/* Check that both stack segments have the same anon_vma? */
> -	if (prev && !(prev->vm_flags & VM_GROWSDOWN)) {
> +	if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
> +			(prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
>  		if (address - prev->vm_end < stack_guard_gap)
>  			return -ENOMEM;
>  	}

But wouldn't this completely disable the check in case such a guard page
is installed, and possibly continue to allow the collision when the stack
allocation is large enough to skip this guard page? Shouldn't we instead
"skip" such a vma and look for the next one?

I was thinking about something more like:

	prev = vma->vm_prev;
+	/* Don't consider a possible user-space stack guard page */
+	if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
+	    !(prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC)))
+		prev = prev->vm_prev;
+
	/* Check that both stack segments have the same anon_vma? */
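
To make the difference concrete, here is a rough user-space toy model of
the two variants. It is only a sketch and not kernel code: struct toy_vma,
the TOY_VM_* flag values, toy_guard_gap() and both toy_check_*() helpers
are made up for illustration, and it only assumes that the gap is 256
pages (so 16 MiB with 64 KiB pages, as mentioned above).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy flag values for this sketch only, not taken from kernel headers. */
#define TOY_VM_READ      0x1UL
#define TOY_VM_WRITE     0x2UL
#define TOY_VM_EXEC      0x4UL
#define TOY_VM_GROWSDOWN 0x100UL
#define TOY_VM_ACCESS    (TOY_VM_READ | TOY_VM_WRITE | TOY_VM_EXEC)

/* Hypothetical, stripped-down stand-in for a vma. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
	struct toy_vma *vm_prev;
};

/* Assumed gap size: 256 pages, e.g. 16 MiB when page_shift is 16. */
unsigned long toy_guard_gap(unsigned int page_shift)
{
	return 256UL << page_shift;
}

/* Variant A: a PROT_NONE neighbour is exempt from the gap check. */
int toy_check_ignore(const struct toy_vma *vma, unsigned long address,
		     unsigned long gap)
{
	const struct toy_vma *prev;

	assert(vma != NULL);
	prev = vma->vm_prev;
	if (prev && !(prev->vm_flags & TOY_VM_GROWSDOWN) &&
	    (prev->vm_flags & TOY_VM_ACCESS)) {
		if (address - prev->vm_end < gap)
			return -ENOMEM;
	}
	return 0;
}

/* Variant B: step over one PROT_NONE neighbour, then check as usual. */
int toy_check_skip(const struct toy_vma *vma, unsigned long address,
		   unsigned long gap)
{
	const struct toy_vma *prev;

	assert(vma != NULL);
	prev = vma->vm_prev;
	/* Don't consider a possible user-space stack guard page */
	if (prev && !(prev->vm_flags & TOY_VM_GROWSDOWN) &&
	    !(prev->vm_flags & TOY_VM_ACCESS))
		prev = prev->vm_prev;

	if (prev && !(prev->vm_flags & TOY_VM_GROWSDOWN)) {
		if (address - prev->vm_end < gap)
			return -ENOMEM;
	}
	return 0;
}
```

With a readable/writable mapping ending right below a 64 KiB PROT_NONE
guard, and the stack trying to grow down to just above that guard,
toy_check_ignore() lets the expansion through while toy_check_skip()
still returns -ENOMEM. When nothing sits below the guard, both variants
allow the expansion, which is the case the Rust and Java runtimes need.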

Willy