From: Michel Lespinasse
To: Linux-MM
Cc: Laurent Dufour, Peter Zijlstra, Michal Hocko, Matthew Wilcox,
    Rik van Riel, Paul McKenney, Andrew Morton, Suren Baghdasaryan,
    Joel Fernandes, Rom Lemarchand, Linux-Kernel, Michel Lespinasse
Subject: [RFC PATCH 12/37] mm: refactor __handle_mm_fault() / handle_pte_fault()
Date: Tue, 6 Apr 2021 18:44:37 -0700
Message-Id: <20210407014502.24091-13-michel@lespinasse.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210407014502.24091-1-michel@lespinasse.org>
References: <20210407014502.24091-1-michel@lespinasse.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Move the code that initializes vmf->pte and vmf->orig_pte from
handle_pte_fault() to its single call site in __handle_mm_fault().

This ensures vmf->pte is now initialized together with the higher levels
of the page table hierarchy. This also prepares for speculative page fault
handling, where the entire page table walk (higher levels down to ptes)
needs special care in the speculative case.

Signed-off-by: Michel Lespinasse
---
 mm/memory.c | 98 ++++++++++++++++++++++++++---------------------------
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 3691be1f1319..66e7a4554c54 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3516,7 +3516,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	if (pte_alloc(vma->vm_mm, vmf->pmd))
 		return VM_FAULT_OOM;
 
-	/* See comment in handle_pte_fault() */
+	/* See comment in __handle_mm_fault() */
 	if (unlikely(pmd_trans_unstable(vmf->pmd)))
 		return 0;
 
@@ -3797,7 +3797,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 			return VM_FAULT_OOM;
 	}
 
-	/* See comment in handle_pte_fault() */
+	/* See comment in __handle_mm_fault() */
 	if (pmd_devmap_trans_unstable(vmf->pmd))
 		return 0;
 
@@ -4253,53 +4253,6 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 {
 	pte_t entry;
 
-	if (unlikely(pmd_none(*vmf->pmd))) {
-		/*
-		 * Leave __pte_alloc() until later: because vm_ops->fault may
-		 * want to allocate huge page, and if we expose page table
-		 * for an instant, it will be difficult to retract from
-		 * concurrent faults and from rmap lookups.
-		 */
-		vmf->pte = NULL;
-	} else {
-		/*
-		 * If a huge pmd materialized under us just retry later.  Use
-		 * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead
-		 * of pmd_trans_huge() to ensure the pmd didn't become
-		 * pmd_trans_huge under us and then back to pmd_none, as a
-		 * result of MADV_DONTNEED running immediately after a huge pmd
-		 * fault in a different thread of this mm, in turn leading to a
-		 * misleading pmd_trans_huge() retval. All we have to ensure is
-		 * that it is a regular pmd that we can walk with
-		 * pte_offset_map() and we can do that through an atomic read
-		 * in C, which is what pmd_trans_unstable() provides.
-		 */
-		if (pmd_devmap_trans_unstable(vmf->pmd))
-			return 0;
-		/*
-		 * A regular pmd is established and it can't morph into a huge
-		 * pmd from under us anymore at this point because we hold the
-		 * mmap_lock read mode and khugepaged takes it in write mode.
-		 * So now it's safe to run pte_offset_map().
-		 */
-		vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
-		vmf->orig_pte = *vmf->pte;
-
-		/*
-		 * some architectures can have larger ptes than wordsize,
-		 * e.g.ppc44x-defconfig has CONFIG_PTE_64BIT=y and
-		 * CONFIG_32BIT=y, so READ_ONCE cannot guarantee atomic
-		 * accesses.  The code below just needs a consistent view
-		 * for the ifs and we later double check anyway with the
-		 * ptl lock held. So here a barrier will do.
-		 */
-		barrier();
-		if (pte_none(vmf->orig_pte)) {
-			pte_unmap(vmf->pte);
-			vmf->pte = NULL;
-		}
-	}
-
 	if (!vmf->pte) {
 		if (vma_is_anonymous(vmf->vma))
 			return do_anonymous_page(vmf);
@@ -4439,6 +4392,53 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		}
 	}
 
+	if (unlikely(pmd_none(*vmf.pmd))) {
+		/*
+		 * Leave __pte_alloc() until later: because vm_ops->fault may
+		 * want to allocate huge page, and if we expose page table
+		 * for an instant, it will be difficult to retract from
+		 * concurrent faults and from rmap lookups.
+		 */
+		vmf.pte = NULL;
+	} else {
+		/*
+		 * If a huge pmd materialized under us just retry later.  Use
+		 * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead
+		 * of pmd_trans_huge() to ensure the pmd didn't become
+		 * pmd_trans_huge under us and then back to pmd_none, as a
+		 * result of MADV_DONTNEED running immediately after a huge pmd
+		 * fault in a different thread of this mm, in turn leading to a
+		 * misleading pmd_trans_huge() retval. All we have to ensure is
+		 * that it is a regular pmd that we can walk with
+		 * pte_offset_map() and we can do that through an atomic read
+		 * in C, which is what pmd_trans_unstable() provides.
+		 */
+		if (pmd_devmap_trans_unstable(vmf.pmd))
+			return 0;
+		/*
+		 * A regular pmd is established and it can't morph into a huge
+		 * pmd from under us anymore at this point because we hold the
+		 * mmap_lock read mode and khugepaged takes it in write mode.
+		 * So now it's safe to run pte_offset_map().
+		 */
+		vmf.pte = pte_offset_map(vmf.pmd, vmf.address);
+		vmf.orig_pte = *vmf.pte;
+
+		/*
+		 * some architectures can have larger ptes than wordsize,
+		 * e.g.ppc44x-defconfig has CONFIG_PTE_64BIT=y and
+		 * CONFIG_32BIT=y, so READ_ONCE cannot guarantee atomic
+		 * accesses.  The code below just needs a consistent view
+		 * for the ifs and we later double check anyway with the
+		 * ptl lock held. So here a barrier will do.
+		 */
+		barrier();
+		if (pte_none(vmf.orig_pte)) {
+			pte_unmap(vmf.pte);
+			vmf.pte = NULL;
+		}
+	}
+
 	return handle_pte_fault(&vmf);
 }
 
-- 
2.20.1
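
For readers who want to see the shape of the refactor outside the kernel, here is a minimal user-space sketch of the split described in the commit message: the caller takes the pte snapshot as part of the page table walk, and the handler only consumes the already-initialized fields. Everything prefixed with toy_ or TOY_ is hypothetical and stands in for kernel types and helpers; it is not kernel code. compiler_barrier() is an assumption that approximates the kernel's barrier() using GCC/Clang inline assembly syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compiler-only barrier; approximates the kernel's barrier() (GCC/Clang). */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

typedef uint64_t toy_pte_t;	/* hypothetical stand-in for pte_t */

struct toy_fault {		/* hypothetical stand-in for struct vm_fault */
	toy_pte_t *pte;		/* NULL means "no pte to work with yet" */
	toy_pte_t orig_pte;	/* snapshot taken during the walk */
};

enum toy_result { TOY_NO_PTE, TOY_HAVE_PTE };

static int toy_pte_none(toy_pte_t pte)
{
	return pte == 0;
}

/*
 * Models the block that the patch moves into the caller: point at the
 * entry, snapshot it once, then make every decision on the snapshot.
 */
static void toy_init_pte(struct toy_fault *f, toy_pte_t *table, size_t index)
{
	assert(table != NULL);

	f->pte = &table[index];
	f->orig_pte = *f->pte;

	/* Keep the compiler from re-reading *f->pte for the check below. */
	compiler_barrier();
	if (toy_pte_none(f->orig_pte))
		f->pte = NULL;
}

/* Models the handler after the patch: it no longer walks, it only reads f. */
static enum toy_result toy_handle_pte_fault(const struct toy_fault *f)
{
	return f->pte ? TOY_HAVE_PTE : TOY_NO_PTE;
}
```

A caller would run toy_init_pte() right after resolving the higher levels and then pass the same struct to toy_handle_pte_fault(). Keeping the whole walk in one function is what would let a speculative variant guard it in a single place, as the commit message anticipates.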