Date: Sun, 25 Mar 2018 14:50:11 -0700 (PDT)
From: David Rientjes
To: Laurent Dufour
Cc: paulmck@linux.vnet.ibm.com, peterz@infradead.org,
    akpm@linux-foundation.org, kirill@shutemov.name, ak@linux.intel.com,
    mhocko@kernel.org, dave@stgolabs.net, jack@suse.cz, Matthew Wilcox,
    benh@kernel.crashing.org, mpe@ellerman.id.au, paulus@samba.org,
    Thomas Gleixner, Ingo Molnar, hpa@zytor.com, Will Deacon,
    Sergey Senozhatsky, Andrea Arcangeli, Alexei Starovoitov,
    kemi.wang@intel.com, sergey.senozhatsky.work@gmail.com,
    Daniel Jordan, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    haren@linux.vnet.ibm.com, khandual@linux.vnet.ibm.com,
    npiggin@gmail.com, bsingharora@gmail.com, Tim Chen,
    linuxppc-dev@lists.ozlabs.org, x86@kernel.org
Subject: Re: [PATCH v9 04/24] mm: Prepare for FAULT_FLAG_SPECULATIVE
In-Reply-To: <1520963994-28477-5-git-send-email-ldufour@linux.vnet.ibm.com>
References: <1520963994-28477-1-git-send-email-ldufour@linux.vnet.ibm.com> <1520963994-28477-5-git-send-email-ldufour@linux.vnet.ibm.com>

On Tue, 13 Mar 2018, Laurent Dufour wrote:

> From: Peter Zijlstra
>
> When speculating faults (without holding mmap_sem) we need to validate
> that the vma against which we loaded pages is still valid when we're
> ready to install the new PTE.
>
> Therefore, replace the pte_offset_map_lock() calls that (re)take the
> PTL with pte_map_lock() which can fail in case we find the VMA changed
> since we started the fault.
>

Based on how it's used, I would have suspected this to be named
pte_map_trylock().

> Signed-off-by: Peter Zijlstra (Intel)
>
> [Port to 4.12 kernel]
> [Remove the comment about the fault_env structure which has been
>  implemented as the vm_fault structure in the kernel]
> [move pte_map_lock()'s definition upper in the file]
> Signed-off-by: Laurent Dufour
> ---
>  include/linux/mm.h |  1 +
>  mm/memory.c        | 56 ++++++++++++++++++++++++++++++++++++++----------------
>  2 files changed, 41 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 4d02524a7998..2f3e98edc94a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -300,6 +300,7 @@ extern pgprot_t protection_map[16];
>  #define FAULT_FLAG_USER		0x40	/* The fault originated in userspace */
>  #define FAULT_FLAG_REMOTE	0x80	/* faulting for non current tsk/mm */
>  #define FAULT_FLAG_INSTRUCTION  0x100	/* The fault was during an instruction fetch */
> +#define FAULT_FLAG_SPECULATIVE	0x200	/* Speculative fault, not holding mmap_sem */
>
>  #define FAULT_FLAG_TRACE \
>  	{ FAULT_FLAG_WRITE,		"WRITE" }, \

I think FAULT_FLAG_SPECULATIVE should be introduced in the patch that
actually uses it.

> diff --git a/mm/memory.c b/mm/memory.c
> index e0ae4999c824..8ac241b9f370 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2288,6 +2288,13 @@ int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
>  }
>  EXPORT_SYMBOL_GPL(apply_to_page_range);
>
> +static bool pte_map_lock(struct vm_fault *vmf)

inline?

> +{
> +	vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
> +				       vmf->address, &vmf->ptl);
> +	return true;
> +}
> +
>  /*
>   * handle_pte_fault chooses page fault handler according to an entry which was
>   * read non-atomically. Before making any commitment, on those architectures
> @@ -2477,6 +2484,7 @@ static int wp_page_copy(struct vm_fault *vmf)
>  	const unsigned long mmun_start = vmf->address & PAGE_MASK;
>  	const unsigned long mmun_end = mmun_start + PAGE_SIZE;
>  	struct mem_cgroup *memcg;
> +	int ret = VM_FAULT_OOM;
>
>  	if (unlikely(anon_vma_prepare(vma)))
>  		goto oom;
> @@ -2504,7 +2512,11 @@ static int wp_page_copy(struct vm_fault *vmf)
>  	/*
>  	 * Re-check the pte - we dropped the lock
>  	 */
> -	vmf->pte = pte_offset_map_lock(mm, vmf->pmd, vmf->address, &vmf->ptl);
> +	if (!pte_map_lock(vmf)) {
> +		mem_cgroup_cancel_charge(new_page, memcg, false);
> +		ret = VM_FAULT_RETRY;
> +		goto oom_free_new;
> +	}

Ugh, but we aren't oom here, so maybe rename oom_free_new so that it
makes sense for return values other than VM_FAULT_OOM?
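For example (an untested sketch, and the out_* names are only
placeholders I'm making up), the unwind labels could be named for the
action they perform rather than the reason we got there, now that ret
can be VM_FAULT_RETRY as well as VM_FAULT_OOM:

out_free_new:
	put_page(new_page);
out:
	if (old_page)
		put_page(old_page);
	return ret;

Then "goto out_free_new" reads equally well from the oom path and the
retry path.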
>  	if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
>  		if (old_page) {
>  			if (!PageAnon(old_page)) {
> @@ -2596,7 +2608,7 @@ static int wp_page_copy(struct vm_fault *vmf)
>  oom:
>  	if (old_page)
>  		put_page(old_page);
> -	return VM_FAULT_OOM;
> +	return ret;
>  }
>
>  /**
> @@ -2617,8 +2629,8 @@ static int wp_page_copy(struct vm_fault *vmf)
>  int finish_mkwrite_fault(struct vm_fault *vmf)
>  {
>  	WARN_ON_ONCE(!(vmf->vma->vm_flags & VM_SHARED));
> -	vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, vmf->address,
> -				       &vmf->ptl);
> +	if (!pte_map_lock(vmf))
> +		return VM_FAULT_RETRY;
>  	/*
>  	 * We might have raced with another page fault while we released the
>  	 * pte_offset_map_lock.
> @@ -2736,8 +2748,11 @@ static int do_wp_page(struct vm_fault *vmf)
>  			get_page(vmf->page);
>  			pte_unmap_unlock(vmf->pte, vmf->ptl);
>  			lock_page(vmf->page);
> -			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> -					vmf->address, &vmf->ptl);
> +			if (!pte_map_lock(vmf)) {
> +				unlock_page(vmf->page);
> +				put_page(vmf->page);
> +				return VM_FAULT_RETRY;
> +			}
>  			if (!pte_same(*vmf->pte, vmf->orig_pte)) {
>  				unlock_page(vmf->page);
>  				pte_unmap_unlock(vmf->pte, vmf->ptl);
> @@ -2947,8 +2962,10 @@ int do_swap_page(struct vm_fault *vmf)
>  	 * Back out if somebody else faulted in this pte
>  	 * while we released the pte lock.
>  	 */

Comment needs updating: pte_same() isn't the only reason to bail out
here.

> -	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> -			vmf->address, &vmf->ptl);
> +	if (!pte_map_lock(vmf)) {
> +		delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
> +		return VM_FAULT_RETRY;
> +	}
>  	if (likely(pte_same(*vmf->pte, vmf->orig_pte)))
>  		ret = VM_FAULT_OOM;
>  	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);

Not crucial, but it would be nice if this could do goto out instead;
otherwise this is the first mid-function return.

> @@ -3003,8 +3020,11 @@ int do_swap_page(struct vm_fault *vmf)
>  	/*
>  	 * Back out if somebody else already faulted in this pte.
>  	 */

Same as above.

> -	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
> -			&vmf->ptl);
> +	if (!pte_map_lock(vmf)) {
> +		ret = VM_FAULT_RETRY;
> +		mem_cgroup_cancel_charge(page, memcg, false);
> +		goto out_page;
> +	}
>  	if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte)))
>  		goto out_nomap;
>

mem_cgroup_try_charge() is done before grabbing pte_offset_map_lock(),
so why does the out_nomap exit path do mem_cgroup_cancel_charge();
pte_unmap_unlock()?  If the pte lock can be dropped first, there's no
need to embed the mem_cgroup_cancel_charge() here.

> @@ -3133,8 +3153,8 @@ static int do_anonymous_page(struct vm_fault *vmf)
>  			!mm_forbids_zeropage(vma->vm_mm)) {
>  		entry = pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address),
>  						vma->vm_page_prot));
> -		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> -				vmf->address, &vmf->ptl);
> +		if (!pte_map_lock(vmf))
> +			return VM_FAULT_RETRY;
>  		if (!pte_none(*vmf->pte))
>  			goto unlock;
>  		ret = check_stable_address_space(vma->vm_mm);
> @@ -3169,8 +3189,11 @@ static int do_anonymous_page(struct vm_fault *vmf)
>  	if (vma->vm_flags & VM_WRITE)
>  		entry = pte_mkwrite(pte_mkdirty(entry));
>
> -	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
> -			&vmf->ptl);
> +	if (!pte_map_lock(vmf)) {
> +		mem_cgroup_cancel_charge(page, memcg, false);
> +		put_page(page);
> +		return VM_FAULT_RETRY;
> +	}
>  	if (!pte_none(*vmf->pte))
>  		goto release;
>

This is more spaghetti; can the exit path be fixed up so we order things
consistently for all gotos?
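Something along these lines, perhaps (a rough, untested sketch assuming
ret is initialized to 0, with the unlock_uncharge/uncharge labels
invented purely for illustration), so that every bail-out funnels
through the same unwind sequence instead of open-coding parts of it:

	if (!pte_map_lock(vmf)) {
		ret = VM_FAULT_RETRY;
		goto uncharge;
	}
	if (!pte_none(*vmf->pte))
		goto unlock_uncharge;
	/* ... set the pte and finish the fault as before ... */
	pte_unmap_unlock(vmf->pte, vmf->ptl);
	return 0;
unlock_uncharge:
	pte_unmap_unlock(vmf->pte, vmf->ptl);
uncharge:
	mem_cgroup_cancel_charge(page, memcg, false);
	put_page(page);
	return ret;

That drops the pte lock before cancelling the charge, per the earlier
question, so the retry path (which never took the lock) can share the
tail of the unwind.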
> @@ -3294,8 +3317,9 @@ static int pte_alloc_one_map(struct vm_fault *vmf)
>  	 * pte_none() under vmf->ptl protection when we return to
>  	 * alloc_set_pte().
>  	 */
> -	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
> -			&vmf->ptl);
> +	if (!pte_map_lock(vmf))
> +		return VM_FAULT_RETRY;
> +
>  	return 0;
>  }
>
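For the record, my mental model of where pte_map_lock() presumably ends
up once FAULT_FLAG_SPECULATIVE is actually wired up -- purely a guess at
the later patches in the series, not something this patch implements --
is along these lines:

static inline bool pte_map_lock(struct vm_fault *vmf)
{
	if (!(vmf->flags & FAULT_FLAG_SPECULATIVE)) {
		vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
					       vmf->address, &vmf->ptl);
		return true;
	}
	/*
	 * Speculative path (hypothetical): only take the PTL after
	 * revalidating that the VMA is unchanged since the fault
	 * started; shown here conservatively failing, so the caller
	 * returns VM_FAULT_RETRY and retries with mmap_sem held.
	 */
	return false;
}

which is exactly why trylock naming would make the can-fail semantics
obvious at the call sites.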