Date: Wed, 21 Jun 2017 00:05:07 -0700 (PDT)
From: Hugh Dickins
To: Willy Tarreau
cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org, linux@roeck-us.net, Hugh Dickins
Subject: Re: [PATCH 3.10 268/268] mm: larger stack guard gap, between vmas
In-Reply-To: <1497897167-14556-269-git-send-email-w@1wt.eu>
References: <1497897167-14556-1-git-send-email-w@1wt.eu> <1497897167-14556-269-git-send-email-w@1wt.eu>

On Mon, 19 Jun 2017, Willy Tarreau wrote:
> From: Hugh Dickins
> 
> commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream.

Some of these suggested adjustments below are just what comparing mine and
yours showed up, and I'm being anal in passing them on: e.g. I do like your
blank line in mm.h, but Michal chose to leave it out, and I think that the
closer we keep these sources to each other, the less trouble we shall have
patching on top of them in future.

Which is particularly true in expand_upwards() and expand_downwards() (and
you're thinking of backporting Helge's TASK_SIZE enhancement on top of that,
though I don't think it's strictly necessary for a stable tree).  Your patch
is not wrong there: though it's odd to be trying anon_vma_prepare() twice in
expand_downwards(), and tiresome to have to unlock at each error exit.  But
I'd already decided in one of our internal trees just to factor in some of
Konstantin's change, which made the ordering much more sensible there, and
the two functions more like each other; so I recommend that 3.10 do the
same, keeping it closer to the final 4.12 code.  But you may have different
priorities and disagree with that: just suggesting.

And there is the possibility that we shall want another patch or two on top
there.  I've left a question as to whether we should be comparing anon_vmas.
And there's a potential (but I think ignorable) locking issue, in the case
of an architecture that supports both VM_GROWSUP and VM_GROWSDOWN: if two
such vmas expand towards each other at the same instant, they could gobble
up the gap between them (they almost certainly have different anon_vmas, so
the anon_vma locking does not protect against that).  When the expansion
code gets to updating the vma tree, it is careful to take page_table_lock to
keep the tree consistent in such a case, but maybe we should do that earlier
(I've sketched the interleaving I mean just before the diff below).

Then there's the FOLL_MLOCK thing, and the WARN_ON (phew, remembered in time
that you don't have VM_WARN_ON) - but keep in mind that I have not even
built this tree, let alone tested it.

Sorry if I'm being annoying, Willy: you must be heartily sick of these
patches by now!  Or, being a longtime longterm maintainer, perhaps it's all
joy for you ;-?
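To make that interleaving concrete (illustration only, using the 3.10 names;
I have not tried to provoke it): A is the VM_GROWSUP vma, B the VM_GROWSDOWN
vma just above it, and both faults race past their gap checks before either
updates its size:

  CPU0: expand_upwards(A, addr0)        CPU1: expand_downwards(B, addr1)
  ------------------------------        --------------------------------
  gap_addr = addr0 + stack_guard_gap;
  next = A->vm_next;     /* == B */
  next->vm_start >= gap_addr: passes
                                        gap_addr = addr1 - stack_guard_gap;
                                        prev = B->vm_prev;     /* == A */
                                        prev->vm_end <= gap_addr: passes
  vma_lock_anon_vma(A);                 vma_lock_anon_vma(B);
  A->vm_end = addr0;                    B->vm_start = addr1;

Both checks ran against the old vm_end/vm_start, so A and B can end up
separated by less than stack_guard_gap; doing the check under
page_table_lock, which the tree update takes anyway, is the "earlier" I have
in mind.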
Hugh

diff -purN 310n/include/linux/mm.h 310h/include/linux/mm.h
--- 310n/include/linux/mm.h	2017-06-20 16:50:29.809546868 -0700
+++ 310h/include/linux/mm.h	2017-06-20 19:52:59.359942133 -0700
@@ -1595,7 +1595,6 @@ unsigned long ra_submit(struct file_ra_s
 			struct file *filp);
 
 extern unsigned long stack_guard_gap;
-
 /* Generic expand stack which grows the stack according to GROWS{UP,DOWN} */
 extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
 
diff -purN 310n/mm/memory.c 310h/mm/memory.c
--- 310n/mm/memory.c	2017-06-20 16:50:29.809546868 -0700
+++ 310h/mm/memory.c	2017-06-20 19:57:14.537573559 -0700
@@ -1821,9 +1821,6 @@ long __get_user_pages(struct task_struct
 				int ret;
 				unsigned int fault_flags = 0;
 
-				/* mlock all present pages, but do not fault in new pages */
-				if (foll_flags & FOLL_MLOCK)
-					goto next_page;
 				if (foll_flags & FOLL_WRITE)
 					fault_flags |= FAULT_FLAG_WRITE;
 				if (nonblocking)
diff -purN 310n/mm/mmap.c 310h/mm/mmap.c
--- 310n/mm/mmap.c	2017-06-20 16:50:29.809546868 -0700
+++ 310h/mm/mmap.c	2017-06-20 20:48:08.409202485 -0700
@@ -892,7 +892,7 @@ again:			remove_next = 1 + (end > next->
 		else if (next)
 			vma_gap_update(next);
 		else
-			mm->highest_vm_end = end;
+			WARN_ON(mm->highest_vm_end != vm_end_gap(vma));
 	}
 	if (insert && file)
 		uprobe_mmap(insert);
@@ -2123,48 +2123,39 @@ int expand_upwards(struct vm_area_struct
 {
 	struct vm_area_struct *next;
 	unsigned long gap_addr;
-	int error;
+	int error = 0;
 
 	if (!(vma->vm_flags & VM_GROWSUP))
 		return -EFAULT;
 
-	/*
-	 * We must make sure the anon_vma is allocated
-	 * so that the anon_vma locking is not a noop.
-	 */
-	if (unlikely(anon_vma_prepare(vma)))
-		return -ENOMEM;
-	vma_lock_anon_vma(vma);
-
-	/*
-	 * vma->vm_start/vm_end cannot change under us because the caller
-	 * is required to hold the mmap_sem in read mode.  We need the
-	 * anon_vma lock to serialize against concurrent expand_stacks.
-	 * Also guard against wrapping around to address 0.
-	 */
+	/* Guard against wrapping around to address 0. */
 	address &= PAGE_MASK;
 	address += PAGE_SIZE;
-	if (!address) {
-		vma_unlock_anon_vma(vma);
+	if (!address)
 		return -ENOMEM;
-	}
-	error = 0;
 
 	/* Enforce stack_guard_gap */
 	gap_addr = address + stack_guard_gap;
-	if (gap_addr < address) {
-		vma_unlock_anon_vma(vma);
+	if (gap_addr < address)
 		return -ENOMEM;
-	}
 	next = vma->vm_next;
 	if (next && next->vm_start < gap_addr) {
-		if (!(next->vm_flags & VM_GROWSUP)) {
-			vma_unlock_anon_vma(vma);
+		if (!(next->vm_flags & VM_GROWSUP))
 			return -ENOMEM;
-		}
 		/* Check that both stack segments have the same anon_vma? */
 	}
 
+	/* We must make sure the anon_vma is allocated. */
+	if (unlikely(anon_vma_prepare(vma)))
+		return -ENOMEM;
+
+	/*
+	 * vma->vm_start/vm_end cannot change under us because the caller
+	 * is required to hold the mmap_sem in read mode.  We need the
+	 * anon_vma lock to serialize against concurrent expand_stacks.
+	 */
+	vma_lock_anon_vma(vma);
+
 	/* Somebody else might have raced and expanded it already */
 	if (address > vma->vm_end) {
 		unsigned long size, grow;
@@ -2218,46 +2209,32 @@ int expand_downwards(struct vm_area_stru
 	unsigned long gap_addr;
 	int error;
 
-	/*
-	 * We must make sure the anon_vma is allocated
-	 * so that the anon_vma locking is not a noop.
-	 */
-	if (unlikely(anon_vma_prepare(vma)))
-		return -ENOMEM;
-
 	address &= PAGE_MASK;
 	error = security_mmap_addr(address);
 	if (error)
 		return error;
 
-	vma_lock_anon_vma(vma);
-
 	/* Enforce stack_guard_gap */
 	gap_addr = address - stack_guard_gap;
-	if (gap_addr > address) {
-		vma_unlock_anon_vma(vma);
+	if (gap_addr > address)
 		return -ENOMEM;
-	}
 	prev = vma->vm_prev;
 	if (prev && prev->vm_end > gap_addr) {
-		if (!(prev->vm_flags & VM_GROWSDOWN)) {
-			vma_unlock_anon_vma(vma);
+		if (!(prev->vm_flags & VM_GROWSDOWN))
 			return -ENOMEM;
-		}
 		/* Check that both stack segments have the same anon_vma? */
 	}
 
 	/* We must make sure the anon_vma is allocated. */
-	if (unlikely(anon_vma_prepare(vma))) {
-		vma_unlock_anon_vma(vma);
+	if (unlikely(anon_vma_prepare(vma)))
 		return -ENOMEM;
-	}
 
 	/*
 	 * vma->vm_start/vm_end cannot change under us because the caller
 	 * is required to hold the mmap_sem in read mode.  We need the
 	 * anon_vma lock to serialize against concurrent expand_stacks.
 	 */
+	vma_lock_anon_vma(vma);
 
 	/* Somebody else might have raced and expanded it already */
 	if (address < vma->vm_start) {