From: Andy Lutomirski
Date: Wed, 5 Jul 2017 09:15:20 -0700
Subject: Re: [PATCH] mm: larger stack guard gap, between vmas
To: Michal Hocko
Cc: Ben Hutchings, Linus Torvalds, Willy Tarreau, Hugh Dickins,
    Oleg Nesterov, "Jason A. Donenfeld", Rik van Riel, Larry Woodman,
    "Kirill A. Shutemov", Tony Luck, "James E.J. Bottomley",
    Helge Diller, James Hogan, Laura Abbott, Greg KH,
    "security@kernel.org", linux-distros@vs.openwall.org,
    Qualys Security Advisory, LKML, Ximin Luo
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 5, 2017 at 7:23 AM, Michal Hocko wrote:
> On Wed 05-07-17 13:19:40, Ben Hutchings wrote:
>> On Tue, 2017-07-04 at 16:31 -0700, Linus Torvalds wrote:
>> > On Tue, Jul 4, 2017 at 4:01 PM, Ben Hutchings wrote:
>> > >
>> > > We have:
>> > >
>> > > bottom = 0xff803fff
>> > > sp =     0xffffb178
>> > >
>> > > The relevant mappings are:
>> > >
>> > > ff7fc000-ff7fd000 rwxp 00000000 00:00 0
>> > > fffdd000-ffffe000 rw-p 00000000 00:00 0          [stack]
>> >
>> > Ugh. So that stack is actually 8MB in size, but the alloca() is about
>> > to use up almost all of it, and there's only about 28kB left between
>> > "bottom" and that 'rwx' mapping.
>> >
>> > Still, that rwx mapping is interesting: it is a single page, and it
>> > really is almost exactly 8MB below the stack.
>> >
>> > In fact, the top of stack (at 0xffffe000) is *exactly* 8MB+4kB from
>> > the top of that odd one-page allocation (0xff7fd000).
>> >
>> > Can you find out where that is allocated? Perhaps a breakpoint on
>> > mmap, with a condition to catch that particular one?
>> [...]
>>
>> Found it, and it's now clear why only i386 is affected:
>> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/os/linux/vm/os_linux.cpp#l4852
>> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/os_cpu/linux_x86/vm/os_linux_x86.cpp#l881
>
> This is really worrying. This doesn't look like a gap at all. It is a
> mapping which actually contains a code and so we should absolutely not
> allow to scribble over it. So I am afraid the only way forward is to
> allow per process stack gap and run this particular program to have a
> smaller gap. We basically have two ways.
> Either /proc//$file or a prctl inherited on exec. The later is a
> smaller code. What do you think?

Why inherit on exec?

I think that, if we add a new API, we should do it right rather than
making it even more hackish. Specifically, we'd add a real VMA type
(via flag or whatever) that means "this is a modern stack". A modern
stack wouldn't ever expand and would have no guard page at all. It
would, however, properly account stack space by tracking the pages
used as stack space. Users of the new VMA type would be responsible
for allocating their own guard pages, probably by mapping an extra
page and then mapping PROT_NONE over it.

Also, this doesn't even need a new API, I think. What's wrong with
plain old mmap(2) with MAP_STACK and *without* MAP_GROWSDOWN? Only new
kernels would get the accounting right, but I doubt that matters much
in practice.
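For illustration, a minimal C sketch of the userspace side of that idea, assuming Linux with glibc and a downward-growing stack (as on x86): map the stack with MAP_STACK and without MAP_GROWSDOWN so the kernel never expands it, reserve one extra page, and map PROT_NONE over that page as a caller-managed guard. The helper name alloc_stack_with_guard and its signature are hypothetical, made up for this example; it is not an existing kernel or libc interface, and it does not cover the stack-accounting part of the proposal, which would have to live in the kernel.

```c
#define _GNU_SOURCE /* MAP_ANONYMOUS and MAP_STACK on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): reserve a fixed-size stack that
 * the kernel will never grow (MAP_STACK, no MAP_GROWSDOWN), plus one extra
 * page at the low end used as a caller-managed guard page. Assumes the
 * stack grows down. Returns the base of the whole mapping, or NULL on
 * failure; *total_out receives the full length to pass to munmap().
 */
static void *alloc_stack_with_guard(size_t stack_size, size_t *total_out)
{
    assert(stack_size > 0);

    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return NULL;
    size_t page = (size_t)ps;

    /* Round the usable size up to whole pages, then add one guard page. */
    size_t usable = (stack_size + page - 1) / page * page;
    size_t total = usable + page;

    /* No MAP_GROWSDOWN: this VMA keeps its size for its whole lifetime. */
    void *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Map PROT_NONE over the lowest page: running off the bottom of the
     * stack now faults here instead of reaching a neighbouring mapping. */
    if (mmap(base, page, PROT_NONE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {
        munmap(base, total);
        return NULL;
    }

    *total_out = total;
    return base;
}
```

The usable stack is then [base + page, base + total), and an initial stack pointer would start at base + total. A thread library could, for instance, hand that region to pthread_attr_setstack(); when the caller supplies its own stack, POSIX says the guardsize attribute is ignored, which is why the guard page is mapped explicitly here.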