Received: by 10.213.65.68 with SMTP id h4csp1057101imn; Wed, 28 Mar 2018 19:28:19 -0700 (PDT) X-Google-Smtp-Source: AIpwx49FZPrh2NiwAePjOB19br2KrN/sVuosioTwZJJ3Mybe+F5Ysby5Ih7ryVmqx57M4UJ9vxsC X-Received: by 2002:a17:902:57d2:: with SMTP id g18-v6mr6213321plj.381.1522290499487; Wed, 28 Mar 2018 19:28:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1522290499; cv=none; d=google.com; s=arc-20160816; b=F6VYcygL2mJ9j0IqB+QwA63ZGhRUusWFVE8Yk8Bx3EOEzZchcroS16N+sQU38I1iY9 ZLcJu6NToP/FqYJ1jgrCpWmkP+PG57/HWm6tTxp52SQ8/8JwFAnPqe3ipW9iEZWDO4IU S+DDpG6Lp5TMyPEXBzVuERd1QzbP5taRqnTN9SwH4EaKyU0Zp0JpWDSAAtiOf5KlUZ8R jhB1yEbC00E9h3C4hogshdlYvZbrg+CJ4nvAVP3Wd1VbAoI2NgtRyX1vvyRoTT7dMiUH YdJBhGtuMkgM1QuaIwEssDUDf9IHK8VQzMt04lXGFimM6MTDIgM3C+GYrvY4/CPl4KGS dSZw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature :arc-authentication-results; bh=j821QRPdSDuj3QsUoILuLVcWSDJqQsKz/kAVH1YRNuM=; b=AOFE3N+PYwQ1Zr9pek9eDkXa9j6L4gPhg48w4Qd3juNgiLM0eRSt0cF5ZFHeWfbe+u +3NaLiWMfW17Q4ttyRszJ/09NF1/K0GQ9UCCZsfd5XoL1J/iadqpksY2A1ldI4YkDJO8 wytbzLtiSti83/DZ2BWftRzE6UOWG6kd4l0RzqloSRwkpl7006BvE61aWUqEZmsHfS5q 1q8fi0Np3+rjC2LEE8Lt3cffH2tz14hE2VRas8qR1B/xFtqP9+64cBdvWxItxWvdSwAV MyHZ8evjiN3hnHvl8Ip3U2OA7av/pK7Kyc9+4NoirVUzF6UM2a2lLkuGS19UCw2QQSs1 95Zw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=HtjNwyXG; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g14si3670528pfe.263.2018.03.28.19.28.05; Wed, 28 Mar 2018 19:28:19 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=HtjNwyXG; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751205AbeC2C0t (ORCPT + 99 others); Wed, 28 Mar 2018 22:26:49 -0400 Received: from mail-it0-f65.google.com ([209.85.214.65]:54036 "EHLO mail-it0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751147AbeC2C0q (ORCPT ); Wed, 28 Mar 2018 22:26:46 -0400 Received: by mail-it0-f65.google.com with SMTP id m134-v6so6262754itb.3 for ; Wed, 28 Mar 2018 19:26:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=j821QRPdSDuj3QsUoILuLVcWSDJqQsKz/kAVH1YRNuM=; b=HtjNwyXG134M3gNMhry8BtJ9Swj4soLs2lYrXAAiA9pBUavS0BH394pu79HkJR3ARH aXEZaOG4bN14VJybWXk963jcw9+DlrgxKAtxoz58JdF50G5PdmkoxFLjucjvIYHaVztN kgzIDKIPpWyUJ2hJ2NgIFykGH5PPWhj8IXqpPG/Sy3wcZAKjOh4yT28iYchNdUiDRIxR pAnyqGJYVkuowjvU3++tRkR6rrBbFBHAXO3xYUPfDgOXZUjiIPhkDM+N2Vxh0fe+nmy7 ubbGwUnePVVbOmAxl14OD4dMsRA66NHvlmr90Fo09QO1Yg4m4FC6AgLm0z72DSqK3p/L 6JIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=j821QRPdSDuj3QsUoILuLVcWSDJqQsKz/kAVH1YRNuM=; b=W/VTe+i5sGXRVXl2UJ1uJ433vh6FdUIK6YrwhmVL3LRSBh2liiz3lBFmRt5IxA+v39 yNd4fk2/XiME/FznBu/uImezBlsdh41yCCBX/O/sjd1eKcxJhnn8gQhgEEFk6SKHtbu7 atQId4equl61wskTdGUnaQcjwM+ikgDjQww+zrEb+nZr3Q/uqyenem3dMeoidVEsoiEX BoPXveHJkBswjq/QJXYHI8wLb/6F7hg/EPvVT2NnQ8IFtI39qhaYoTWPfGlFzAH1Crz2 uwIBhWNBqnfUuvSvPUBaspdq/Xs0P/IJvlpDmYhHK26N3k4Xkf5miTVOZopedDyns4eE ZA3w== X-Gm-Message-State: AElRT7FP9KMVppSXMXWkLs73K0tsblmBQ0ZDU0+X1u9wdxbqqiURZ1JP tlfFIWYwiGyPJySsb1Z4SESSC2Iec5YLZ38xCkI= X-Received: by 2002:a24:49a3:: with SMTP id e35-v6mr6214956itd.114.1522290405707; Wed, 28 Mar 2018 19:26:45 -0700 (PDT) MIME-Version: 1.0 Received: by 10.107.168.153 with HTTP; Wed, 28 Mar 2018 19:26:45 -0700 (PDT) In-Reply-To: <1518794738-4186-23-git-send-email-ldufour@linux.vnet.ibm.com> References: <1518794738-4186-1-git-send-email-ldufour@linux.vnet.ibm.com> <1518794738-4186-23-git-send-email-ldufour@linux.vnet.ibm.com> From: Ganesh Mahendran Date: Thu, 29 Mar 2018 10:26:45 +0800 Message-ID: Subject: Re: [PATCH v8 22/24] mm: Speculative page fault handler return VMA To: Laurent Dufour Cc: Paul McKenney , Peter Zijlstra , Andrew Morton , kirill@shutemov.name, ak@linux.intel.com, Michal Hocko , dave@stgolabs.net, jack@suse.cz, Matthew Wilcox , benh@kernel.crashing.org, mpe@ellerman.id.au, paulus@samba.org, Thomas Gleixner , Ingo Molnar , hpa@zytor.com, Will Deacon , Sergey Senozhatsky , Andrea Arcangeli , Alexei Starovoitov , kemi.wang@intel.com, Sergey Senozhatsky , Daniel Jordan , linux-kernel , Linux-MM , haren@linux.vnet.ibm.com, khandual@linux.vnet.ibm.com, npiggin@gmail.com, Balbir Singh , Tim Chen , linuxppc-dev@lists.ozlabs.org, x86@kernel.org Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, Laurent 2018-02-16 23:25 GMT+08:00 Laurent Dufour : > When the speculative page fault handler is returning VM_RETRY, there is a > chance that VMA fetched without grabbing the mmap_sem can be reused by the > legacy page fault handler. By reusing it, we avoid calling find_vma() > again. To achieve, that we must ensure that the VMA structure will not be > freed in our back. This is done by getting the reference on it (get_vma()) > and by assuming that the caller will call the new service > can_reuse_spf_vma() once it has grabbed the mmap_sem. > > can_reuse_spf_vma() is first checking that the VMA is still in the RB tree > , and then that the VMA's boundaries matched the passed address and release > the reference on the VMA so that it can be freed if needed. > > In the case the VMA is freed, can_reuse_spf_vma() will have returned false > as the VMA is no more in the RB tree. when I applied this patch to arm64, I got a crash: [ 6.088296] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 6.088307] pgd = ffffff9d67735000 [ 6.088313] [00000000] *pgd=00000001795e3003, *pud=00000001795e3003, *pmd=0000000000000000 [ 6.088372] ------------[ cut here ]------------ [ 6.088377] Kernel BUG at ffffff9d64f65960 [verbose debug info unavailable] [ 6.088384] Internal error: Oops - BUG: 96000045 [#1] PREEMPT SMP [ 6.088389] BUG: Bad rss-counter state mm:ffffffe8f3861040 idx:0 val:90 [ 6.088393] BUG: Bad rss-counter state mm:ffffffe8f3861040 idx:1 val:58 [ 6.088398] Modules linked in: [ 6.088408] CPU: 1 PID: 621 Comm: qseecomd Not tainted 4.4.78-perf+ #88 [ 6.088413] Hardware name: Qualcomm Technologies, Inc. SDM 636 PM660 + PM660L MTP E7S (DT) [ 6.088419] task: ffffffe8f6208000 ti: ffffffe872a8c000 task.ti: ffffffe872a8c000 [ 6.088432] PC is at __rb_erase_color+0x108/0x240 [ 6.088441] LR is at vma_interval_tree_remove+0x244/0x24c [ 6.088447] pc : [] lr : [] pstate: 604001c5 [ 6.088451] sp : ffffffe872a8fa50 [ 6.088455] x29: ffffffe872a8fa50 x28: 0000000000000008 [ 6.088462] x27: 0000000000000009 x26: 0000000000000000 [ 6.088470] x25: ffffffe8f458fb80 x24: 000000768ff87000 [ 6.088477] x23: 0000000000000000 x22: 0000000000000000 [ 6.088484] x21: ffffff9d64d9be7c x20: ffffffe8f3ff0680 [ 6.088492] x19: ffffffe8f212e9b0 x18: 0000000000000074 [ 6.088499] x17: 0000000000000007 x16: 000000000000000e [ 6.088507] x15: ffffff9d65c88000 x14: 0000000000000001 [ 6.088514] x13: 0000000000192d76 x12: 0000000000989680 [ 6.088521] x11: 00000000001fffff x10: ffffff9d661ded1b [ 6.088528] x9 : 0000007691759000 x8 : 0000000007691759 [ 6.088535] x7 : 0000000000000000 x6 : ffffffe871ebada8 [ 6.088541] x5 : 00000000000000e1 x4 : ffffffe8f212e958 [ 6.088548] x3 : 00000000000000e9 x2 : 0000000000000000 [ 6.088555] x1 : ffffffe8f212f110 x0 : ffffffe8f212e9b1 [ 6.088564] [ 6.088564] PC: 0xffffff9d64f65920: [ 6.088568] 5920 f9000002 aa0103e0 aa1603e1 d63f02a0 aa1603e1 f9400822 f9000662 f9000833 [ 6.088590] 5940 1400003b f9400a61 f9400020 370002c0 f9400436 b2400260 f9000a76 f9000433 [ 6.088610] 5960 f90002c0 f9400260 f9000020 f9000261 f27ef400 54000100 f9400802 eb13005f [ 6.088630] 5980 54000061 f9000801 14000004 f9000401 14000002 f9000281 aa1303e0 d63f02a0 [ 6.088652] [ 6.088652] LR: 0xffffff9d64d9c298: [ 6.088656] c298 f9403083 b4000083 f9400c63 eb03005f 9a832042 f9403883 eb02007f 540000a0 [ 6.088676] c2b8 f9003882 f9402c82 927ef442 b5fffd22 b4000080 f0ffffe2 9139f042 94072561 [ 6.088695] c2d8 a8c17bfd d65f03c0 a9bf7bfd 910003fd f9400003 d2800000 b40000e3 f9400c65 [ 6.088715] c2f8 d1016063 eb0100bf 54000063 aa0303e0 97fffef2 a8c17bfd d65f03c0 a9bf7bfd [ 6.088735] [ 6.088735] SP: 0xffffffe872a8fa10: [ 6.088740] fa10 64d9c2d8 ffffff9d 72a8fa50 ffffffe8 64f65960 ffffff9d 604001c5 00000000 [ 6.088759] fa30 71d67d70 ffffffe8 71c281e8 ffffffe8 00000000 00000080 64daa90c ffffff9d [ 6.088779] fa50 72a8fa90 ffffffe8 64d9c2d8 ffffff9d 71ebada8 ffffffe8 f3ff0678 ffffffe8 [ 6.088799] fa70 72a8fb80 ffffffe8 00000000 00000000 00000000 00000000 00000001 00000000 [ 6.088818] [ 6.088823] Process qseecomd (pid: 621, stack limit = 0xffffffe872a8c028) [ 6.088828] Call trace: [ 6.088834] Exception stack(0xffffffe872a8f860 to 0xffffffe872a8f990) [ 6.088841] f860: ffffffe8f212e9b0 0000008000000000 0000000082b37000 ffffff9d64f65960 [ 6.088848] f880: 00000000604001c5 ffffff9d672c8680 ffffff9d672c9c00 ffffff9d672d3ab7 [ 6.088855] f8a0: ffffffe872a8f8f0 ffffff9d64db9bfc 0000000000000000 ffffffe8f9402c00 [ 6.088861] f8c0: ffffffe872a8c000 0000000000000000 ffffffe872a8f920 ffffff9d64db9bfc [ 6.088867] f8e0: 0000000000000000 ffffffe8f9402b00 ffffffe872a8fa10 ffffff9d64dba568 [ 6.088874] f900: ffffffbe61c759c0 ffffffe871d67d70 ffffffe8f9402c00 1de56fb006cba396 [ 6.088881] f920: ffffffe8f212e9b1 ffffffe8f212f110 0000000000000000 00000000000000e9 [ 6.088888] f940: ffffffe8f212e958 00000000000000e1 ffffffe871ebada8 0000000000000000 [ 6.088895] f960: 0000000007691759 0000007691759000 ffffff9d661ded1b 00000000001fffff [ 6.088901] f980: 0000000000989680 0000000000192d76 [ 6.088908] [] __rb_erase_color+0x108/0x240 [ 6.088915] [] vma_interval_tree_remove+0x244/0x24c [ 6.088924] [] __remove_shared_vm_struct+0x74/0x88 [ 6.088930] [] unlink_file_vma+0x40/0x54 [ 6.088937] [] free_pgtables+0xb8/0xfc [ 6.088945] [] exit_mmap+0x78/0x13c [ 6.088953] [] mmput+0x40/0xe8 [ 6.088961] [] do_exit+0x3ac/0x8d8 [ 6.088966] [] do_group_exit+0x44/0x9c [ 6.088974] [] get_signal+0x4e8/0x524 [ 6.088981] [] do_signal+0xac/0x93c [ 6.088989] [] do_notify_resume+0x18/0x58 [ 6.088995] [] work_pending+0x10/0x14 [ 6.089003] Code: f9400436 b2400260 f9000a76 f9000433 (f90002c0) [ 6.089009] ---[ end trace 224ce5f97841b6a5 ]--- [ 6.110819] Kernel panic - not syncing: Fatal exception Thanks. > > Signed-off-by: Laurent Dufour > --- > include/linux/mm.h | 5 +- > mm/memory.c | 136 +++++++++++++++++++++++++++++++++-------------------- > 2 files changed, 88 insertions(+), 53 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index c383a4e2ceb3..0cd31a37bb3d 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1355,7 +1355,10 @@ extern int handle_mm_fault(struct vm_area_struct *vma, unsigned long address, > unsigned int flags); > #ifdef CONFIG_SPECULATIVE_PAGE_FAULT > extern int handle_speculative_fault(struct mm_struct *mm, > - unsigned long address, unsigned int flags); > + unsigned long address, unsigned int flags, > + struct vm_area_struct **vma); > +extern bool can_reuse_spf_vma(struct vm_area_struct *vma, > + unsigned long address); > #endif /* CONFIG_SPECULATIVE_PAGE_FAULT */ > extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, > unsigned long address, unsigned int fault_flags, > diff --git a/mm/memory.c b/mm/memory.c > index 2ef686405154..1f5ce5ff79af 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -4307,13 +4307,22 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address, > /* This is required by vm_normal_page() */ > #error "Speculative page fault handler requires __HAVE_ARCH_PTE_SPECIAL" > #endif > - > /* > * vm_normal_page() adds some processing which should be done while > * hodling the mmap_sem. > */ > + > +/* > + * Tries to handle the page fault in a speculative way, without grabbing the > + * mmap_sem. > + * When VM_FAULT_RETRY is returned, the vma pointer is valid and this vma must > + * be checked later when the mmap_sem has been grabbed by calling > + * can_reuse_spf_vma(). > + * This is needed as the returned vma is kept in memory until the call to > + * can_reuse_spf_vma() is made. > + */ > int handle_speculative_fault(struct mm_struct *mm, unsigned long address, > - unsigned int flags) > + unsigned int flags, struct vm_area_struct **vma) > { > struct vm_fault vmf = { > .address = address, > @@ -4322,7 +4331,6 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address, > p4d_t *p4d, p4dval; > pud_t pudval; > int seq, ret = VM_FAULT_RETRY; > - struct vm_area_struct *vma; > #ifdef CONFIG_NUMA > struct mempolicy *pol; > #endif > @@ -4331,14 +4339,16 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address, > flags &= ~(FAULT_FLAG_ALLOW_RETRY|FAULT_FLAG_KILLABLE); > flags |= FAULT_FLAG_SPECULATIVE; > > - vma = get_vma(mm, address); > - if (!vma) > + *vma = get_vma(mm, address); > + if (!*vma) > return ret; > + vmf.vma = *vma; > > - seq = raw_read_seqcount(&vma->vm_sequence); /* rmb <-> seqlock,vma_rb_erase() */ > + /* rmb <-> seqlock,vma_rb_erase() */ > + seq = raw_read_seqcount(&vmf.vma->vm_sequence); > if (seq & 1) { > - trace_spf_vma_changed(_RET_IP_, vma, address); > - goto out_put; > + trace_spf_vma_changed(_RET_IP_, vmf.vma, address); > + return ret; > } > > /* > @@ -4346,9 +4356,9 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address, > * with the VMA. > * This include huge page from hugetlbfs. > */ > - if (vma->vm_ops) { > - trace_spf_vma_notsup(_RET_IP_, vma, address); > - goto out_put; > + if (vmf.vma->vm_ops) { > + trace_spf_vma_notsup(_RET_IP_, vmf.vma, address); > + return ret; > } > > /* > @@ -4356,18 +4366,18 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address, > * because vm_next and vm_prev must be safe. This can't be guaranteed > * in the speculative path. > */ > - if (unlikely(!vma->anon_vma)) { > - trace_spf_vma_notsup(_RET_IP_, vma, address); > - goto out_put; > + if (unlikely(!vmf.vma->anon_vma)) { > + trace_spf_vma_notsup(_RET_IP_, vmf.vma, address); > + return ret; > } > > - vmf.vma_flags = READ_ONCE(vma->vm_flags); > - vmf.vma_page_prot = READ_ONCE(vma->vm_page_prot); > + vmf.vma_flags = READ_ONCE(vmf.vma->vm_flags); > + vmf.vma_page_prot = READ_ONCE(vmf.vma->vm_page_prot); > > /* Can't call userland page fault handler in the speculative path */ > if (unlikely(vmf.vma_flags & VM_UFFD_MISSING)) { > - trace_spf_vma_notsup(_RET_IP_, vma, address); > - goto out_put; > + trace_spf_vma_notsup(_RET_IP_, vmf.vma, address); > + return ret; > } > > if (vmf.vma_flags & VM_GROWSDOWN || vmf.vma_flags & VM_GROWSUP) { > @@ -4376,48 +4386,39 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address, > * boundaries but we want to trace it as not supported instead > * of changed. > */ > - trace_spf_vma_notsup(_RET_IP_, vma, address); > - goto out_put; > + trace_spf_vma_notsup(_RET_IP_, vmf.vma, address); > + return ret; > } > > - if (address < READ_ONCE(vma->vm_start) > - || READ_ONCE(vma->vm_end) <= address) { > - trace_spf_vma_changed(_RET_IP_, vma, address); > - goto out_put; > + if (address < READ_ONCE(vmf.vma->vm_start) > + || READ_ONCE(vmf.vma->vm_end) <= address) { > + trace_spf_vma_changed(_RET_IP_, vmf.vma, address); > + return ret; > } > > - if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE, > + if (!arch_vma_access_permitted(vmf.vma, flags & FAULT_FLAG_WRITE, > flags & FAULT_FLAG_INSTRUCTION, > - flags & FAULT_FLAG_REMOTE)) { > - trace_spf_vma_access(_RET_IP_, vma, address); > - ret = VM_FAULT_SIGSEGV; > - goto out_put; > - } > + flags & FAULT_FLAG_REMOTE)) > + goto out_segv; > > /* This is one is required to check that the VMA has write access set */ > if (flags & FAULT_FLAG_WRITE) { > - if (unlikely(!(vmf.vma_flags & VM_WRITE))) { > - trace_spf_vma_access(_RET_IP_, vma, address); > - ret = VM_FAULT_SIGSEGV; > - goto out_put; > - } > - } else if (unlikely(!(vmf.vma_flags & (VM_READ|VM_EXEC|VM_WRITE)))) { > - trace_spf_vma_access(_RET_IP_, vma, address); > - ret = VM_FAULT_SIGSEGV; > - goto out_put; > - } > + if (unlikely(!(vmf.vma_flags & VM_WRITE))) > + goto out_segv; > + } else if (unlikely(!(vmf.vma_flags & (VM_READ|VM_EXEC|VM_WRITE)))) > + goto out_segv; > > #ifdef CONFIG_NUMA > /* > * MPOL_INTERLEAVE implies additional check in mpol_misplaced() which > * are not compatible with the speculative page fault processing. > */ > - pol = __get_vma_policy(vma, address); > + pol = __get_vma_policy(vmf.vma, address); > if (!pol) > pol = get_task_policy(current); > if (pol && pol->mode == MPOL_INTERLEAVE) { > - trace_spf_vma_notsup(_RET_IP_, vma, address); > - goto out_put; > + trace_spf_vma_notsup(_RET_IP_, vmf.vma, address); > + return ret; > } > #endif > > @@ -4479,9 +4480,8 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address, > vmf.pte = NULL; > } > > - vmf.vma = vma; > - vmf.pgoff = linear_page_index(vma, address); > - vmf.gfp_mask = __get_fault_gfp_mask(vma); > + vmf.pgoff = linear_page_index(vmf.vma, address); > + vmf.gfp_mask = __get_fault_gfp_mask(vmf.vma); > vmf.sequence = seq; > vmf.flags = flags; > > @@ -4491,16 +4491,22 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address, > * We need to re-validate the VMA after checking the bounds, otherwise > * we might have a false positive on the bounds. > */ > - if (read_seqcount_retry(&vma->vm_sequence, seq)) { > - trace_spf_vma_changed(_RET_IP_, vma, address); > - goto out_put; > + if (read_seqcount_retry(&vmf.vma->vm_sequence, seq)) { > + trace_spf_vma_changed(_RET_IP_, vmf.vma, address); > + return ret; > } > > mem_cgroup_oom_enable(); > ret = handle_pte_fault(&vmf); > mem_cgroup_oom_disable(); > > - put_vma(vma); > + /* > + * If there is no need to retry, don't return the vma to the caller. > + */ > + if (!(ret & VM_FAULT_RETRY)) { > + put_vma(vmf.vma); > + *vma = NULL; > + } > > /* > * The task may have entered a memcg OOM situation but > @@ -4513,9 +4519,35 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address, > return ret; > > out_walk: > - trace_spf_vma_notsup(_RET_IP_, vma, address); > + trace_spf_vma_notsup(_RET_IP_, vmf.vma, address); > local_irq_enable(); > -out_put: > + return ret; > + > +out_segv: > + trace_spf_vma_access(_RET_IP_, vmf.vma, address); > + /* > + * We don't return VM_FAULT_RETRY so the caller is not expected to > + * retrieve the fetched VMA. > + */ > + put_vma(vmf.vma); > + *vma = NULL; > + return VM_FAULT_SIGSEGV; > +} > + > +/* > + * This is used to know if the vma fetch in the speculative page fault handler > + * is still valid when trying the regular fault path while holding the > + * mmap_sem. > + * The call to put_vma(vma) must be made after checking the vma's fields, as > + * the vma may be freed by put_vma(). In such a case it is expected that false > + * is returned. > + */ > +bool can_reuse_spf_vma(struct vm_area_struct *vma, unsigned long address) > +{ > + bool ret; > + > + ret = !RB_EMPTY_NODE(&vma->vm_rb) && > + vma->vm_start <= address && address < vma->vm_end; > put_vma(vma); > return ret; > } > -- > 2.7.4 >