From: Linus Torvalds
Date: Tue, 4 Jul 2017 16:31:52 -0700
Subject: Re: [PATCH] mm: larger stack guard gap, between vmas
To: Ben Hutchings
Cc: Michal Hocko, Willy Tarreau, Hugh Dickins, Oleg Nesterov, "Jason A. Donenfeld", Rik van Riel, Larry Woodman, "Kirill A. Shutemov", Tony Luck, "James E.J. Bottomley", Helge Diller, James Hogan, Laura Abbott, Greg KH, security@kernel.org, linux-distros@vs.openwall.org, Qualys Security Advisory, LKML, Ximin Luo
List-ID: linux-kernel@vger.kernel.org

On Tue, Jul 4, 2017 at 4:01 PM, Ben Hutchings wrote:
>
> We have:
>
>     bottom = 0xff803fff
>     sp     = 0xffffb178
>
> The relevant mappings are:
>
>     ff7fc000-ff7fd000 rwxp 00000000 00:00 0
>     fffdd000-ffffe000 rw-p 00000000 00:00 0      [stack]

Ugh. So that stack is actually 8MB in size, but the alloca() is about
to use up almost all of it, and there's only about 28kB left between
"bottom" and that 'rwx' mapping.
Still, that rwx mapping is interesting: it is a single page, and it really is almost exactly 8MB below the stack. In fact, the top of stack (at 0xffffe000) is *exactly* 8MB+4kB from the top of that odd one-page allocation (0xff7fd000).

Can you find out where that is allocated? Perhaps a breakpoint on mmap, with a condition to catch that particular one?

Because I'm wondering if it was done explicitly as an 8MB stack boundary allocation, with the "knowledge" that the kernel then adds a one-page guard page. I really don't know why somebody would do that (as opposed to just limiting the stack with ulimit), but the 8MB+4kB distance is kind of intriguing.

Maybe that one-page mapping is some hack to make sure that no random mmap() will ever get too close to the stack, so it really is a "guard mapping", except it's explicitly designed not so much to guard the stack from growing down further (ulimit does that), but to guard the brk() and other mmaps from growing *up* into the stack area..

Sometimes user mode does crazy things just because people are insane. But sometimes there really is a method to the madness.

I would *not* be surprised if the way somebody allocated the stack was to basically say:

 - let's use "mmap()" with a size of 8MB+2 pages to find a sufficiently sized virtual memory area

 - once we've gotten that virtual address space range, let's over-map the last page as the new stack using MAP_FIXED

 - finally, munmap the 8MB in between so that the new stack can grow down into the gap the munmap creates.

Notice how you end up with exactly the above pattern of allocations, and how it guarantees that you get a nice 8MB stack without having to do any locking (you rely on the kernel to just find the 8MB+8kB area, and once one has been allocated, it will be "safe").
And yes, it would have been much nicer to just use PROT_NONE for that initial sizing allocation, but for somebody who is only interested in carving out an 8MB stack in virtual address space, the protections are actually kind of immaterial, so 'rwx' might just be their mental default.

              Linus