Date: Tue, 4 Jul 2017 11:47:28 +0200
From: Willy Tarreau
To: Michal Hocko
Cc: Linus Torvalds, Ben Hutchings, Hugh Dickins, Oleg Nesterov,
	"Jason A. Donenfeld", Rik van Riel, Larry Woodman,
	"Kirill A. Shutemov", Tony Luck, "James E.J. Bottomley",
	Helge Diller, James Hogan, Laura Abbott, Greg KH,
	"security@kernel.org", linux-distros@vs.openwall.org,
	Qualys Security Advisory, LKML
Subject: Re: [PATCH] mm: larger stack guard gap, between vmas
Message-ID: <20170704094728.GB22013@1wt.eu>
References: <20170619142358.GA32654@1wt.eu>
	<1498009101.2655.6.camel@decadent.org.uk>
	<20170621092419.GA22051@dhcp22.suse.cz>
	<1498042057.2655.8.camel@decadent.org.uk>
	<1499126133.2707.20.camel@decadent.org.uk>
	<20170704084122.GC14722@dhcp22.suse.cz>
	<20170704093538.GF14722@dhcp22.suse.cz>
In-Reply-To: <20170704093538.GF14722@dhcp22.suse.cz>
User-Agent: Mutt/1.6.1 (2016-04-27)
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 04, 2017 at 11:35:38AM +0200, Michal Hocko wrote:
> On Tue 04-07-17 10:41:22, Michal Hocko wrote:
> > On Mon 03-07-17 17:05:27, Linus Torvalds wrote:
> > > On Mon, Jul 3, 2017 at 4:55 PM, Ben Hutchings wrote:
> > > >
> > > > Firstly, some Rust programs are crashing on ppc64el with 64 KiB pages.
> > > > Apparently Rust maps its own guard page at the lower limit of the stack
> > > > (determined using pthread_getattr_np() and pthread_attr_getstack()).
> > > > I don't think this ever actually worked for the main thread stack, but it
> > > > now also blocks expansion as the default stack size of 8 MiB is smaller
> > > > than the stack gap of 16 MiB. Would it make sense to skip over
> > > > PROT_NONE mappings when checking whether it's safe to expand?
> >
> > This is what my workaround for the older patch was doing, actually. We
> > have deployed that as a follow up fix on our older code bases. And this
> > has fixed various issues with Java, which was doing a similar thing.
>
> Here is a forward port (on top of the current Linus tree) of my earlier
> patch. I have dropped a note about the Java stack trace because this would
> most likely not be the case with Hugh's patch. The problem is the
> same in principle though. Note I didn't get to test this properly yet
> but it should be pretty much obvious.
> ---
> From d9f6faccf2c286ed81fbc860c9b0b7fe23ef0836 Mon Sep 17 00:00:00 2001
> From: Michal Hocko
> Date: Tue, 4 Jul 2017 11:27:39 +0200
> Subject: [PATCH] mm: mm, mmap: do not blow on PROT_NONE MAP_FIXED holes in the
>  stack
>
> "mm: enlarge stack guard gap" has introduced a regression in some Rust
> and Java environments which are trying to implement their own stack
> guard page. They are punching a new MAP_FIXED mapping inside the
> existing stack vma.
>
> This will confuse expand_{downwards,upwards} into thinking that the stack
> expansion would in fact get us too close to an existing non-stack vma,
> which is correct behavior wrt. safety. It is a real regression on
> the other hand. Let's work around the problem by considering a PROT_NONE
> mapping as part of the stack. This is a gross hack but overflowing to
> such a mapping would trap anyway and we can only hope that userspace
> knows what it is doing and handles it properly.
>
> Fixes: d4d2d35e6ef9 ("mm: larger stack guard gap, between vmas")
> Debugged-by: Vlastimil Babka
> Signed-off-by: Michal Hocko
> ---
>  mm/mmap.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index f60a8bc2869c..2e996cbf4ff3 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2244,7 +2244,8 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  		gap_addr = TASK_SIZE;
>  
>  	next = vma->vm_next;
> -	if (next && next->vm_start < gap_addr) {
> +	if (next && next->vm_start < gap_addr &&
> +			(next->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
>  		if (!(next->vm_flags & VM_GROWSUP))
>  			return -ENOMEM;
>  		/* Check that both stack segments have the same anon_vma? */
> @@ -2325,7 +2326,8 @@ int expand_downwards(struct vm_area_struct *vma,
>  	/* Enforce stack_guard_gap */
>  	prev = vma->vm_prev;
>  	/* Check that both stack segments have the same anon_vma? */
> -	if (prev && !(prev->vm_flags & VM_GROWSDOWN)) {
> +	if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
> +			(prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
>  		if (address - prev->vm_end < stack_guard_gap)
>  			return -ENOMEM;
>  	}

But wouldn't this completely disable the check in case such a guard page
is installed, and possibly continue to allow the collision when the stack
allocation is large enough to skip this guard page? Shouldn't we instead
"skip" such a vma and look for the next one?

I was thinking about something more like:

 	prev = vma->vm_prev;
+	/* Don't consider a possible user-space stack guard page */
+	if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
+	    !(prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC)))
+		prev = prev->vm_prev;
+
 	/* Check that both stack segments have the same anon_vma? */

Willy
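
For illustration only, the difference between the two approaches to
expand_downwards() can be modelled in user space. This is a rough sketch,
not kernel code: struct toy_vma, the flag values and the three
may_expand_*() helpers are all made up for the example, and the model only
looks at the immediate predecessor (plus one more vma in the "skip" case).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the kernel's VM_* flags (values are made up). */
#define VM_READ      0x1UL
#define VM_WRITE     0x2UL
#define VM_EXEC      0x4UL
#define VM_GROWSDOWN 0x100UL
#define VM_ACCESS    (VM_READ | VM_WRITE | VM_EXEC)

/* Hypothetical toy vma: only the fields the gap check looks at. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
	struct toy_vma *vm_prev;
};

/* A non-stack mapping with no access bits, i.e. a PROT_NONE guard. */
static bool is_user_guard(const struct toy_vma *v)
{
	return !(v->vm_flags & VM_GROWSDOWN) && !(v->vm_flags & VM_ACCESS);
}

/* Check without any exception: every non-stack prev counts. */
bool may_expand_strict(const struct toy_vma *vma, unsigned long address,
		       unsigned long gap)
{
	const struct toy_vma *prev;

	assert(vma != NULL);
	prev = vma->vm_prev;
	if (prev && !(prev->vm_flags & VM_GROWSDOWN))
		return address - prev->vm_end >= gap;
	return true;
}

/* Quoted patch: a PROT_NONE prev switches the gap check off. */
bool may_expand_ignore_guard(const struct toy_vma *vma, unsigned long address,
			     unsigned long gap)
{
	const struct toy_vma *prev;

	assert(vma != NULL);
	prev = vma->vm_prev;
	if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
	    (prev->vm_flags & VM_ACCESS))
		return address - prev->vm_end >= gap;
	return true;
}

/* Suggested alternative: step over the guard, then check the vma below it. */
bool may_expand_skip_guard(const struct toy_vma *vma, unsigned long address,
			   unsigned long gap)
{
	const struct toy_vma *prev;

	assert(vma != NULL);
	prev = vma->vm_prev;
	if (prev && is_user_guard(prev))
		prev = prev->vm_prev;
	if (prev && !(prev->vm_flags & VM_GROWSDOWN))
		return address - prev->vm_end >= gap;
	return true;
}
```

With a 4 MiB gap, a stack at 0x800000, a one-page PROT_NONE guard at
0x400000 and a request to grow down to 0x700000, the strict check refuses
(the guard is too close), while both variants allow it as long as the
next mapping below the guard ends at 0x200000. If that mapping instead
ends at 0x3ff000, right under the guard, the "ignore" variant still allows
the expansion but the "skip" variant refuses it, which is the case the
question above is about.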