Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752068AbdGFKM1 (ORCPT ); Thu, 6 Jul 2017 06:12:27 -0400 Received: from wtarreau.pck.nerim.net ([62.212.114.60]:32861 "EHLO 1wt.eu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751059AbdGFKM0 (ORCPT ); Thu, 6 Jul 2017 06:12:26 -0400 Date: Thu, 6 Jul 2017 12:11:49 +0200 From: Willy Tarreau To: Linus Torvalds Cc: Ben Hutchings , Michal Hocko , Hugh Dickins , Oleg Nesterov , "Jason A. Donenfeld" , Rik van Riel , Larry Woodman , "Kirill A. Shutemov" , Tony Luck , "James E.J. Bottomley" , Helge Diller , James Hogan , Laura Abbott , Greg KH , "security@kernel.org" , Qualys Security Advisory , LKML , Ximin Luo Subject: Re: [PATCH] mm: larger stack guard gap, between vmas Message-ID: <20170706101149.GA25937@1wt.eu> References: <1499209315.2707.29.camel@decadent.org.uk> <1499257180.2707.34.camel@decadent.org.uk> <20170705142354.GB21220@dhcp22.suse.cz> <1499268300.2707.41.camel@decadent.org.uk> <20170705165845.GB4732@decadent.org.uk> <1499297724.2707.56.camel@decadent.org.uk> <20170706082406.GA25812@1wt.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170706082406.GA25812@1wt.eu> User-Agent: Mutt/1.6.1 (2016-04-27) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5389 Lines: 131 On Thu, Jul 06, 2017 at 10:24:06AM +0200, Willy Tarreau wrote: > On Wed, Jul 05, 2017 at 04:51:06PM -0700, Linus Torvalds wrote: > > On Wed, Jul 5, 2017 at 4:35 PM, Ben Hutchings wrote: > > >> > > >> And I think your second patch breaks that "use a really large value to > > >> approximate infinity" case that definitely has existed as a pattern. > > > > > > Right. Well that seems to leave us with remembering the MAP_FIXED flag > > > and using that as the condition to ignore the previous mapping. > > > > I'm not particularly happy about having a MAP_FIXED special case, but > > yeah, I'm not seeing a lot of alternatives. > > We can possibly refine it like this : > - use PROT_NONE as a mark for the end of the stack and consider the > application doing this knows exactly what it's doing ; > > - use other MAP_FIXED as a limit for a shorter gap (ie 4kB), considering > that 1) it used to work like this for many years, and 2) if an application > is forcing a MAP_FIXED just below the stack and at the same time uses > large alloca() or VLA it's definitely bogus and looking for unfixable > trouble. Not allowing this means we break existing applications anyway. That would probably give the following (only build-tested on x86_64). Do you think it would make sense and/or be acceptable ? That would more easily avoid the other options like adding sysctl + warnings or making a special case of setuid. Willy --- >From 56ae4e57e446bc92fd2647327da281e313930524 Mon Sep 17 00:00:00 2001 From: Willy Tarreau Date: Thu, 6 Jul 2017 12:00:54 +0200 Subject: mm: mm, mmap: only apply a one page gap betwen the stack an MAP_FIXED Some programs place a MAP_FIXED below the stack, not leaving enough room for the stack guard. This patch keeps track of MAP_FIXED, mirroring it in a new VM_FIXED flag and reduces the stack guard to a single page (as it used to be) in such a situation, assuming that when an application places a fixed map close to the stack, it very likely does it on purpose and is taking the full responsibility for the risk of the stack blowing up. Cc: Ben Hutchings Cc: Michal Hocko Cc: Kees Cook Cc: Andy Lutomirski Signed-off-by: Willy Tarreau --- include/linux/mm.h | 1 + include/linux/mman.h | 1 + mm/mmap.c | 30 ++++++++++++++++++++---------- 3 files changed, 22 insertions(+), 10 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 6f543a4..41492b9 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -188,6 +188,7 @@ extern int overcommit_kbytes_handler(struct ctl_table *, int, void __user *, #define VM_ACCOUNT 0x00100000 /* Is a VM accounted object */ #define VM_NORESERVE 0x00200000 /* should the VM suppress accounting */ #define VM_HUGETLB 0x00400000 /* Huge TLB Page VM */ +#define VM_FIXED 0x00800000 /* MAP_FIXED was used */ #define VM_ARCH_1 0x01000000 /* Architecture-specific flag */ #define VM_ARCH_2 0x02000000 #define VM_DONTDUMP 0x04000000 /* Do not include in the core dump */ diff --git a/include/linux/mman.h b/include/linux/mman.h index 634c4c5..3a29069 100644 --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -86,6 +86,7 @@ static inline bool arch_validate_prot(unsigned long prot) { return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) | _calc_vm_trans(flags, MAP_DENYWRITE, VM_DENYWRITE ) | + _calc_vm_trans(flags, MAP_FIXED, VM_FIXED ) | _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ); } diff --git a/mm/mmap.c b/mm/mmap.c index ece0f6d..7fc1c29 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2244,12 +2244,17 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address) gap_addr = TASK_SIZE; next = vma->vm_next; + + /* PROT_NONE above a MAP_GROWSUP always serves as a mark and inhibits + * the stack guard gap. + * MAP_FIXED above a MAP_GROWSUP only requires a single page guard. + */ if (next && next->vm_start < gap_addr && - (next->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) { - if (!(next->vm_flags & VM_GROWSUP)) - return -ENOMEM; - /* Check that both stack segments have the same anon_vma? */ - } + !(next->vm_flags & VM_GROWSUP) && + (next->vm_flags & (VM_WRITE|VM_READ|VM_EXEC)) && + (!(next->vm_flags & VM_FIXED) || + next->vm_start < address + PAGE_SIZE)) + return -ENOMEM; /* We must make sure the anon_vma is allocated. */ if (unlikely(anon_vma_prepare(vma))) @@ -2329,12 +2334,17 @@ int expand_downwards(struct vm_area_struct *vma, if (gap_addr > address) return -ENOMEM; prev = vma->vm_prev; + + /* PROT_NONE below a MAP_GROWSDOWN always serves as a mark and inhibits + * the stack guard gap. + * MAP_FIXED below a MAP_GROWSDOWN only requires a single page guard. + */ if (prev && prev->vm_end > gap_addr && - (prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) { - if (!(prev->vm_flags & VM_GROWSDOWN)) - return -ENOMEM; - /* Check that both stack segments have the same anon_vma? */ - } + !(prev->vm_flags & VM_GROWSDOWN) && + (prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC)) && + (!(prev->vm_flags & VM_FIXED) || + prev->vm_end > address - PAGE_SIZE)) + return -ENOMEM; /* We must make sure the anon_vma is allocated. */ if (unlikely(anon_vma_prepare(vma))) -- 1.7.12.1