From: Laurent Dufour
To: paulmck@linux.vnet.ibm.com, peterz@infradead.org,
    akpm@linux-foundation.org, kirill@shutemov.name, ak@linux.intel.com,
    mhocko@kernel.org, dave@stgolabs.net, jack@suse.cz, Matthew Wilcox,
    benh@kernel.crashing.org, mpe@ellerman.id.au, paulus@samba.org,
    Thomas Gleixner, Ingo Molnar, hpa@zytor.com, Will Deacon,
    Sergey Senozhatsky, Andrea Arcangeli, Alexei Starovoitov,
    kemi.wang@intel.com, sergey.senozhatsky.work@gmail.com, Daniel Jordan
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    haren@linux.vnet.ibm.com, khandual@linux.vnet.ibm.com,
    npiggin@gmail.com, bsingharora@gmail.com, Tim Chen,
    linuxppc-dev@lists.ozlabs.org, x86@kernel.org
Subject: [PATCH v9 22/24] mm: Speculative page fault handler return VMA
Date: Tue, 13 Mar 2018 18:59:52 +0100
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1520963994-28477-1-git-send-email-ldufour@linux.vnet.ibm.com>
References: <1520963994-28477-1-git-send-email-ldufour@linux.vnet.ibm.com>
Message-Id: <1520963994-28477-23-git-send-email-ldufour@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

When the speculative page fault handler returns VM_FAULT_RETRY, there is
a chance that the VMA it fetched without grabbing the mmap_sem can be
reused by the legacy page fault handler. Reusing it avoids calling
find_vma() again.

To achieve that, we must ensure that the VMA structure is not freed
behind our back. This is done by taking a reference on it (get_vma())
and by requiring the caller to call the new service can_reuse_spf_vma()
once it has grabbed the mmap_sem.

can_reuse_spf_vma() first checks that the VMA is still in the RB tree,
then that the VMA's boundaries still contain the passed address, and
finally releases the reference on the VMA so that it can be freed if
needed.

If the VMA is freed by that release, can_reuse_spf_vma() will have
returned false, since the VMA is no longer in the RB tree.

Signed-off-by: Laurent Dufour
---
 include/linux/mm.h |   5 +-
 mm/memory.c        | 136 +++++++++++++++++++++++++++++++++--------------------
 2 files changed, 88 insertions(+), 53 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1acc3f4e07d1..38a8c0041fd0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1357,7 +1357,10 @@ extern int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 		unsigned int flags);
 #ifdef CONFIG_SPECULATIVE_PAGE_FAULT
 extern int handle_speculative_fault(struct mm_struct *mm,
-				    unsigned long address, unsigned int flags);
+				    unsigned long address, unsigned int flags,
+				    struct vm_area_struct **vma);
+extern bool can_reuse_spf_vma(struct vm_area_struct *vma,
+			      unsigned long address);
 #endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
 extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 			    unsigned long address, unsigned int fault_flags,
diff --git a/mm/memory.c b/mm/memory.c
index f39c4a4df703..16d3f5f4ffdd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4292,13 +4292,22 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 /* This is required by vm_normal_page() */
 #error "Speculative page fault handler requires __HAVE_ARCH_PTE_SPECIAL"
 #endif
-
 /*
  * vm_normal_page() adds some processing which should be done while
  * hodling the mmap_sem.
  */
+
+/*
+ * Tries to handle the page fault in a speculative way, without grabbing the
+ * mmap_sem.
+ * When VM_FAULT_RETRY is returned, the vma pointer is valid and this vma must
+ * be checked later when the mmap_sem has been grabbed by calling
+ * can_reuse_spf_vma().
+ * This is needed as the returned vma is kept in memory until the call to
+ * can_reuse_spf_vma() is made.
+ */
 int handle_speculative_fault(struct mm_struct *mm, unsigned long address,
-			     unsigned int flags)
+			     unsigned int flags, struct vm_area_struct **vma)
 {
 	struct vm_fault vmf = {
 		.address = address,
@@ -4307,7 +4316,6 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address,
 	p4d_t *p4d, p4dval;
 	pud_t pudval;
 	int seq, ret = VM_FAULT_RETRY;
-	struct vm_area_struct *vma;
 #ifdef CONFIG_NUMA
 	struct mempolicy *pol;
 #endif
@@ -4316,14 +4324,16 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address,
 	flags &= ~(FAULT_FLAG_ALLOW_RETRY|FAULT_FLAG_KILLABLE);
 	flags |= FAULT_FLAG_SPECULATIVE;
 
-	vma = get_vma(mm, address);
-	if (!vma)
+	*vma = get_vma(mm, address);
+	if (!*vma)
 		return ret;
+	vmf.vma = *vma;
 
-	seq = raw_read_seqcount(&vma->vm_sequence); /* rmb <-> seqlock,vma_rb_erase() */
+	/* rmb <-> seqlock,vma_rb_erase() */
+	seq = raw_read_seqcount(&vmf.vma->vm_sequence);
 	if (seq & 1) {
-		trace_spf_vma_changed(_RET_IP_, vma, address);
-		goto out_put;
+		trace_spf_vma_changed(_RET_IP_, vmf.vma, address);
+		return ret;
 	}
 
 	/*
@@ -4331,9 +4341,9 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address,
 	 * with the VMA.
 	 * This include huge page from hugetlbfs.
 	 */
-	if (vma->vm_ops) {
-		trace_spf_vma_notsup(_RET_IP_, vma, address);
-		goto out_put;
+	if (vmf.vma->vm_ops) {
+		trace_spf_vma_notsup(_RET_IP_, vmf.vma, address);
+		return ret;
 	}
 
 	/*
@@ -4341,18 +4351,18 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address,
 	 * because vm_next and vm_prev must be safe. This can't be guaranteed
 	 * in the speculative path.
 	 */
-	if (unlikely(!vma->anon_vma)) {
-		trace_spf_vma_notsup(_RET_IP_, vma, address);
-		goto out_put;
+	if (unlikely(!vmf.vma->anon_vma)) {
+		trace_spf_vma_notsup(_RET_IP_, vmf.vma, address);
+		return ret;
 	}
 
-	vmf.vma_flags = READ_ONCE(vma->vm_flags);
-	vmf.vma_page_prot = READ_ONCE(vma->vm_page_prot);
+	vmf.vma_flags = READ_ONCE(vmf.vma->vm_flags);
+	vmf.vma_page_prot = READ_ONCE(vmf.vma->vm_page_prot);
 
 	/* Can't call userland page fault handler in the speculative path */
 	if (unlikely(vmf.vma_flags & VM_UFFD_MISSING)) {
-		trace_spf_vma_notsup(_RET_IP_, vma, address);
-		goto out_put;
+		trace_spf_vma_notsup(_RET_IP_, vmf.vma, address);
+		return ret;
 	}
 
 	if (vmf.vma_flags & VM_GROWSDOWN || vmf.vma_flags & VM_GROWSUP) {
@@ -4361,48 +4371,39 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address,
 		 * boundaries but we want to trace it as not supported instead
 		 * of changed.
 		 */
-		trace_spf_vma_notsup(_RET_IP_, vma, address);
-		goto out_put;
+		trace_spf_vma_notsup(_RET_IP_, vmf.vma, address);
+		return ret;
 	}
 
-	if (address < READ_ONCE(vma->vm_start)
-	    || READ_ONCE(vma->vm_end) <= address) {
-		trace_spf_vma_changed(_RET_IP_, vma, address);
-		goto out_put;
+	if (address < READ_ONCE(vmf.vma->vm_start)
+	    || READ_ONCE(vmf.vma->vm_end) <= address) {
+		trace_spf_vma_changed(_RET_IP_, vmf.vma, address);
+		return ret;
 	}
 
-	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
+	if (!arch_vma_access_permitted(vmf.vma, flags & FAULT_FLAG_WRITE,
 				       flags & FAULT_FLAG_INSTRUCTION,
-				       flags & FAULT_FLAG_REMOTE)) {
-		trace_spf_vma_access(_RET_IP_, vma, address);
-		ret = VM_FAULT_SIGSEGV;
-		goto out_put;
-	}
+				       flags & FAULT_FLAG_REMOTE))
+		goto out_segv;
 
 	/* This is one is required to check that the VMA has write access set */
 	if (flags & FAULT_FLAG_WRITE) {
-		if (unlikely(!(vmf.vma_flags & VM_WRITE))) {
-			trace_spf_vma_access(_RET_IP_, vma, address);
-			ret = VM_FAULT_SIGSEGV;
-			goto out_put;
-		}
-	} else if (unlikely(!(vmf.vma_flags & (VM_READ|VM_EXEC|VM_WRITE)))) {
-		trace_spf_vma_access(_RET_IP_, vma, address);
-		ret = VM_FAULT_SIGSEGV;
-		goto out_put;
-	}
+		if (unlikely(!(vmf.vma_flags & VM_WRITE)))
+			goto out_segv;
+	} else if (unlikely(!(vmf.vma_flags & (VM_READ|VM_EXEC|VM_WRITE))))
+		goto out_segv;
 
 #ifdef CONFIG_NUMA
 	/*
 	 * MPOL_INTERLEAVE implies additional check in mpol_misplaced() which
 	 * are not compatible with the speculative page fault processing.
 	 */
-	pol = __get_vma_policy(vma, address);
+	pol = __get_vma_policy(vmf.vma, address);
 	if (!pol)
 		pol = get_task_policy(current);
 	if (pol && pol->mode == MPOL_INTERLEAVE) {
-		trace_spf_vma_notsup(_RET_IP_, vma, address);
-		goto out_put;
+		trace_spf_vma_notsup(_RET_IP_, vmf.vma, address);
+		return ret;
 	}
 #endif
 
@@ -4464,9 +4465,8 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address,
 		vmf.pte = NULL;
 	}
 
-	vmf.vma = vma;
-	vmf.pgoff = linear_page_index(vma, address);
-	vmf.gfp_mask = __get_fault_gfp_mask(vma);
+	vmf.pgoff = linear_page_index(vmf.vma, address);
+	vmf.gfp_mask = __get_fault_gfp_mask(vmf.vma);
 	vmf.sequence = seq;
 	vmf.flags = flags;
 
@@ -4476,16 +4476,22 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address,
 	 * We need to re-validate the VMA after checking the bounds, otherwise
 	 * we might have a false positive on the bounds.
 	 */
-	if (read_seqcount_retry(&vma->vm_sequence, seq)) {
-		trace_spf_vma_changed(_RET_IP_, vma, address);
-		goto out_put;
+	if (read_seqcount_retry(&vmf.vma->vm_sequence, seq)) {
+		trace_spf_vma_changed(_RET_IP_, vmf.vma, address);
+		return ret;
 	}
 
 	mem_cgroup_oom_enable();
 	ret = handle_pte_fault(&vmf);
 	mem_cgroup_oom_disable();
 
-	put_vma(vma);
+	/*
+	 * If there is no need to retry, don't return the vma to the caller.
+	 */
+	if (!(ret & VM_FAULT_RETRY)) {
+		put_vma(vmf.vma);
+		*vma = NULL;
+	}
 
 	/*
 	 * The task may have entered a memcg OOM situation but
@@ -4498,9 +4504,35 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address,
 	return ret;
 
 out_walk:
-	trace_spf_vma_notsup(_RET_IP_, vma, address);
+	trace_spf_vma_notsup(_RET_IP_, vmf.vma, address);
 	local_irq_enable();
-out_put:
+	return ret;
+
+out_segv:
+	trace_spf_vma_access(_RET_IP_, vmf.vma, address);
+	/*
+	 * We don't return VM_FAULT_RETRY so the caller is not expected to
+	 * retrieve the fetched VMA.
+	 */
+	put_vma(vmf.vma);
+	*vma = NULL;
+	return VM_FAULT_SIGSEGV;
+}
+
+/*
+ * This is used to know if the vma fetch in the speculative page fault handler
+ * is still valid when trying the regular fault path while holding the
+ * mmap_sem.
+ * The call to put_vma(vma) must be made after checking the vma's fields, as
+ * the vma may be freed by put_vma(). In such a case it is expected that false
+ * is returned.
+ */
+bool can_reuse_spf_vma(struct vm_area_struct *vma, unsigned long address)
+{
+	bool ret;
+
+	ret = !RB_EMPTY_NODE(&vma->vm_rb) &&
+		vma->vm_start <= address && address < vma->vm_end;
 	put_vma(vma);
 	return ret;
 }
-- 
2.7.4
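
Illustration only, not part of the patch: the reference handling described in the changelog can be modeled in plain userspace C. The sketch below is a simplified model under stated assumptions. struct model_vma and every model_* helper are hypothetical stand-ins (in_tree models !RB_EMPTY_NODE(&vma->vm_rb), refcount models the reference taken by get_vma(), and the fallback pointer stands in for the result of find_vma()). It shows why the fields are read before the reference is dropped, and why a VMA that was unmapped in the meantime yields false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct vm_area_struct. */
struct model_vma {
	unsigned long vm_start;	/* first address covered (inclusive) */
	unsigned long vm_end;	/* first address past the range (exclusive) */
	bool in_tree;		/* models !RB_EMPTY_NODE(&vma->vm_rb) */
	int refcount;		/* models the count taken by get_vma() */
	bool freed;		/* set when the last reference is dropped */
};

/* Models get_vma(): pin the VMA so it cannot be freed under us. */
static void model_get_vma(struct model_vma *vma)
{
	vma->refcount++;
}

/* Models put_vma(): the VMA is "freed" when the last reference goes. */
static void model_put_vma(struct model_vma *vma)
{
	assert(vma->refcount > 0);
	if (--vma->refcount == 0)
		vma->freed = true;
}

/* Models an unmap: leave the tree and drop the tree's own reference. */
static void model_unmap(struct model_vma *vma)
{
	vma->in_tree = false;
	model_put_vma(vma);
}

/*
 * Same order as can_reuse_spf_vma() in the patch: read the fields first,
 * then drop the reference, because the drop may free the VMA.
 */
static bool model_can_reuse_spf_vma(struct model_vma *vma,
				    unsigned long address)
{
	bool ret;

	ret = vma->in_tree &&
		vma->vm_start <= address && address < vma->vm_end;
	model_put_vma(vma);
	return ret;
}

/*
 * Caller side, assumed to run with the mmap_sem held: reuse the VMA handed
 * back with VM_FAULT_RETRY if it is still valid, otherwise use the result
 * of a fresh lookup (the fallback argument stands in for find_vma()).
 */
static struct model_vma *model_pick_vma(struct model_vma *spf_vma,
					struct model_vma *fallback,
					unsigned long address)
{
	if (spf_vma && model_can_reuse_spf_vma(spf_vma, address))
		return spf_vma;
	return fallback;
}
```

In this model, an unmap that races with the speculative path leaves the structure alive only because of the extra reference. The later model_can_reuse_spf_vma() call then sees in_tree == false, returns false, and drops that last reference, which mirrors the "freed implies false" statement in the changelog.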