From: Nadav Amit
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Nadav Amit, Andrea Arcangeli, Andrew Cooper, Andy Lutomirski, Dave Hansen, Peter Xu, Peter Zijlstra, Thomas Gleixner, Will Deacon, Yu Zhao, Nick Piggin, x86@kernel.org
Subject: [RESEND PATCH v3 5/5] mm: avoid unnecessary flush on change_huge_pmd()
Date: Fri, 11 Mar 2022 11:07:49 -0800
Message-Id: <20220311190749.338281-6-namit@vmware.com>
In-Reply-To: <20220311190749.338281-1-namit@vmware.com>
References: <20220311190749.338281-1-namit@vmware.com>

From: Nadav Amit

Calls to change_protection_range() on THP can trigger, at least on x86,
two TLB flushes for one page: one immediately, when pmdp_invalidate() is
called by change_huge_pmd(), and another one later (which can be
batched) when change_protection_range() finishes.

The first TLB flush is only needed to prevent the dirty bit (and, to a
lesser extent, the access bit) from changing while the PTE is modified.
However, this flush is not necessary, as x86 CPUs set the dirty bit
atomically, with an additional check that the PTE is (still) present.
One caveat is Intel's Knights Landing, which has a bug and does not
perform this check.

Leverage this behavior to eliminate the unnecessary TLB flush in
change_huge_pmd(). Introduce a new arch-specific pmdp_invalidate_ad()
that invalidates the PMD only to prevent further changes to the access
and dirty bits.
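The atomicity argument above can be illustrated with a small, single-threaded user-space model. This is a hedged sketch, not kernel code. The sim_* names are hypothetical, and the entry is a C11 _Atomic uint64_t whose bit positions mirror x86 only for illustration. A "CPU write" is modelled as a compare-and-swap that sets the dirty bit only while the present bit is still set, which is the behavior this message attributes to x86. An unconditional atomic_fetch_or stands in for the Knights Landing bug.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy PMD bits; positions mirror x86 (_PAGE_BIT_*) only for illustration. */
#define SIM_PRESENT	(UINT64_C(1) << 0)
#define SIM_RW		(UINT64_C(1) << 1)
#define SIM_ACCESSED	(UINT64_C(1) << 5)
#define SIM_DIRTY	(UINT64_C(1) << 6)

/*
 * Model of a CPU write through a stale TLB entry whose dirty bit is clear.
 * Well-behaved CPU: atomically set DIRTY only if the entry is still present,
 * otherwise report a fault. Buggy CPU: set DIRTY unconditionally.
 * Returns true if the write went through.
 */
static bool sim_cpu_write(_Atomic uint64_t *entry, bool buggy_cpu)
{
	uint64_t old;

	if (buggy_cpu) {
		atomic_fetch_or(entry, SIM_DIRTY);
		return true;
	}
	old = atomic_load(entry);
	do {
		if (!(old & SIM_PRESENT))
			return false;	/* page fault; the write is retried later */
	} while (!atomic_compare_exchange_weak(entry, &old, old | SIM_DIRTY));
	return true;
}

/*
 * Rough analogue of the x86 pmdp_invalidate_ad(): like pmdp_establish(),
 * the exchange returns whatever the entry really held, so A/D bits set
 * after the initial load would still be captured in the returned value.
 */
static uint64_t sim_invalidate_ad(_Atomic uint64_t *entry)
{
	uint64_t cur = atomic_load(entry);

	return atomic_exchange(entry, cur & ~SIM_PRESENT);
}

/*
 * change_huge_pmd()-style write-protect: invalidate, then install a new
 * value derived only from the returned old value. The CPU write happens
 * either before or after the invalidation. Returns true if a completed
 * write is missing from the final entry's dirty bit.
 */
static bool sim_dirty_update_lost(bool write_before_invalidate, bool buggy_cpu)
{
	_Atomic uint64_t entry = SIM_PRESENT | SIM_RW | SIM_ACCESSED;
	bool wrote = false;
	uint64_t old;

	if (write_before_invalidate)
		wrote = sim_cpu_write(&entry, buggy_cpu);
	old = sim_invalidate_ad(&entry);
	if (!write_before_invalidate)
		wrote = sim_cpu_write(&entry, buggy_cpu);

	/* New value: same bits as before, minus write permission. */
	atomic_store(&entry, (old & ~SIM_RW) | SIM_PRESENT);
	return wrote && !(atomic_load(&entry) & SIM_DIRTY);
}
```

In this model, only a buggy CPU that writes after the invalidation loses the dirty bit. That is the case the X86_BUG_PTE_LEAK check in the patch handles by keeping the immediate flush.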

Cc: Andrea Arcangeli
Cc: Andrew Cooper
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Peter Xu
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Yu Zhao
Cc: Nick Piggin
Cc: x86@kernel.org
Signed-off-by: Nadav Amit
---
 arch/x86/include/asm/pgtable.h |  5 +++++
 arch/x86/mm/pgtable.c          | 10 ++++++++++
 include/linux/pgtable.h        | 20 ++++++++++++++++++++
 mm/huge_memory.c               |  4 ++--
 mm/pgtable-generic.c           |  8 ++++++++
 5 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 62ab07e24aef..23ad34edcc4b 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1173,6 +1173,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 	}
 }
 #endif
+
+#define __HAVE_ARCH_PMDP_INVALIDATE_AD
+extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
+				unsigned long address, pmd_t *pmdp);
+
 /*
  * Page table pages are page-aligned. The lower half of the top
  * level is used for userspace and the top half for the kernel.
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 3481b35cb4ec..b2fcb2c749ce 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -608,6 +608,16 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma,
 
 	return young;
 }
+
+pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
+			 pmd_t *pmdp)
+{
+	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
+
+	if (cpu_feature_enabled(X86_BUG_PTE_LEAK))
+		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+	return old;
+}
 #endif
 
 /**
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index f4f4077b97aa..5826e8e52619 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -570,6 +570,26 @@ extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp);
 #endif
 
+#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD
+
+/*
+ * pmdp_invalidate_ad() invalidates the PMD while changing a transparent
+ * hugepage mapping in the page tables. This function is similar to
+ * pmdp_invalidate(), but should only be used if the access and dirty bits would
+ * not be cleared by the software in the new PMD value. The function ensures
+ * that hardware updates of the access and dirty bits are not lost.
+ *
+ * On certain architectures, doing so makes it possible to avoid a TLB flush in
+ * most cases. Another TLB flush might still be necessary later if the PMD
+ * update itself requires one (e.g., if protection was made stricter). Even
+ * when a TLB flush is needed because of the update, the caller may be able to
+ * batch these TLB flushing operations, so fewer TLB flush operations are
+ * needed.
+ */
+extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
+				unsigned long address, pmd_t *pmdp);
+#endif
+
 #ifndef __HAVE_ARCH_PTE_SAME
 static inline int pte_same(pte_t pte_a, pte_t pte_b)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 51b0f3cb1ba0..691d80edcfd7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1781,10 +1781,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	 * The race makes MADV_DONTNEED miss the huge pmd and don't clear it
 	 * which may break userspace.
 	 *
-	 * pmdp_invalidate() is required to make sure we don't miss
+	 * pmdp_invalidate_ad() is required to make sure we don't miss
 	 * dirty/young flags set by hardware.
 	 */
-	oldpmd = pmdp_invalidate(vma, addr, pmd);
+	oldpmd = pmdp_invalidate_ad(vma, addr, pmd);
 
 	entry = pmd_modify(oldpmd, newprot);
 	if (preserve_write)
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 6523fda274e5..90ab721a12a8 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -201,6 +201,14 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 }
 #endif
 
+#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD
+pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
+			 pmd_t *pmdp)
+{
+	return pmdp_invalidate(vma, address, pmdp);
+}
+#endif
+
 #ifndef pmdp_collapse_flush
 pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 			  pmd_t *pmdp)
-- 
2.25.1
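The effect of this patch on the number of TLB flushes per THP, as described in the commit message, can be tallied with a toy model. This is a hedged sketch. The sim_* names and the enum are hypothetical. The model assumes that the protection change itself needs one deferred (batched) flush, which may not hold for every protection change. It encodes the three paths visible in the diff: the generic fallback that calls pmdp_invalidate(), the x86 override, and the x86 override on a CPU with X86_BUG_PTE_LEAK.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU/architecture configurations for the toy model. */
enum sim_cpu {
	SIM_GENERIC_ARCH,	/* no __HAVE_ARCH_PMDP_INVALIDATE_AD override */
	SIM_X86,		/* x86 CPU that re-checks the present bit */
	SIM_X86_PTE_LEAK,	/* x86 CPU with the X86_BUG_PTE_LEAK erratum */
};

/* Immediate flushes issued by pmdp_invalidate() in this model. */
static int sim_invalidate_flushes(void)
{
	return 1;
}

/* Immediate flushes issued by pmdp_invalidate_ad() in this model. */
static int sim_invalidate_ad_flushes(enum sim_cpu cpu)
{
	switch (cpu) {
	case SIM_X86:
		return 0;	/* A/D updates cannot be lost: no flush */
	case SIM_X86_PTE_LEAK:
		return 1;	/* flush_pmd_tlb_range() right away */
	case SIM_GENERIC_ARCH:
	default:
		return sim_invalidate_flushes();	/* generic fallback */
	}
}

/*
 * TLB flushes for one THP in change_protection_range(): the immediate ones
 * from change_huge_pmd(), plus one deferred flush, assuming the protection
 * change itself requires a flush that is batched and issued at the end.
 */
static int sim_flushes_per_thp(bool patched, enum sim_cpu cpu)
{
	int immediate = patched ? sim_invalidate_ad_flushes(cpu)
				: sim_invalidate_flushes();
	int deferred = 1;

	return immediate + deferred;
}
```

Under these assumptions, the patch reduces the count from two flushes to one only on x86 CPUs without the erratum. Other configurations keep both flushes, matching the fallback and the X86_BUG_PTE_LEAK branch in the diff.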