Date: Wed, 31 May 2023 09:54:19 +0100
Message-ID: <86y1l4c24k.wl-maz@kernel.org>
From: Marc Zyngier
To: Raghavendra Rao Ananta
Cc: Oliver Upton, James Morse, Suzuki K Poulose, Ricardo Koller,
    Paolo Bonzini, Jing Zhang, Colton Lewis,
    linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
    linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v4 6/6] KVM: arm64: Use TLBI range-based intructions for unmap
References: <20230519005231.3027912-1-rananta@google.com>
    <20230519005231.3027912-7-rananta@google.com>
    <87ttvvjk5q.wl-maz@kernel.org>

On Tue, 30 May 2023 22:35:57 +0100,
Raghavendra Rao Ananta wrote:
>
> On Mon, May 29, 2023 at 7:18 AM Marc Zyngier wrote:
> >
> > On Fri, 19 May 2023 01:52:31 +0100,
> > Raghavendra Rao Ananta wrote:
> > >
> > > The current implementation of the stage-2 unmap walker traverses
> > > the given range and, as a part of break-before-make, performs
> > > TLB invalidations with a DSB for every PTE. A multitude of this
> > > combination could cause a performance bottleneck.
> > >
> > > Hence, if the system supports FEAT_TLBIRANGE, defer the TLB
> > > invalidations until the entire walk is finished, and then
> > > use range-based instructions to invalidate the TLBs in one go.
> > > Condition this upon S2FWB in order to avoid walking the page-table
> > > again to perform the CMOs after issuing the TLBI.
> >
> > But that's the real bottleneck. TLBIs are cheap compared to CMOs, even
> > on remarkably bad implementations. What is your plan to fix this?
> >
> Correct me if I'm wrong, but my understanding was that a multiple
> issuance of TLBI + DSB was the bottleneck, and this patch tries to
> avoid this by issuing only one TLBI + DSB at the end.

At least on some of the machines I have access to, CMOs are far more
expensive than TLBIs, and they are the ones causing slowdowns. Your
system shows a different behaviour, and that's fine, but you can't
draw a general conclusion from it.

> > >
> > > Rename stage2_put_pte() to stage2_unmap_put_pte() as the function
> > > now serves the stage-2 unmap walker specifically, rather than
> > > acting generic.
> > >
> > > Signed-off-by: Raghavendra Rao Ananta
> > > ---
> > >  arch/arm64/kvm/hyp/pgtable.c | 35 ++++++++++++++++++++++++++++++-----
> > >  1 file changed, 30 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > > index b8f0dbd12f773..5832ee3418fb0 100644
> > > --- a/arch/arm64/kvm/hyp/pgtable.c
> > > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > > @@ -771,16 +771,34 @@ static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t n
> > >  	smp_store_release(ctx->ptep, new);
> > >  }
> > >
> > > -static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
> > > -			   struct kvm_pgtable_mm_ops *mm_ops)
> > > +static bool stage2_unmap_defer_tlb_flush(struct kvm_pgtable *pgt)
> > >  {
> > > +	/*
> > > +	 * If FEAT_TLBIRANGE is implemented, defer the individual PTE
> > > +	 * TLB invalidations until the entire walk is finished, and
> > > +	 * then use the range-based TLBI instructions to do the
> > > +	 * invalidations. Condition this upon S2FWB in order to avoid
> > > +	 * a page-table walk again to perform the CMOs after TLBI.
> > > +	 */
> > > +	return system_supports_tlb_range() && stage2_has_fwb(pgt);
> > > +}
> > > +
> > > +static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
> > > +				struct kvm_s2_mmu *mmu,
> > > +				struct kvm_pgtable_mm_ops *mm_ops)
> > > +{
> > > +	struct kvm_pgtable *pgt = ctx->arg;
> > > +
> > >  	/*
> > >  	 * Clear the existing PTE, and perform break-before-make with
> > >  	 * TLB maintenance if it was valid.
> > >  	 */
> > >  	if (kvm_pte_valid(ctx->old)) {
> > >  		kvm_clear_pte(ctx->ptep);
> > > -		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
> > > +
> > > +		if (!stage2_unmap_defer_tlb_flush(pgt))
> > > +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu,
> > > +					ctx->addr, ctx->level);
> >
> > This really doesn't match the comment anymore.
> >
> Right, I can re-write this in the next spin.
>
> > Overall, I'm very concerned that we lose the consistency property that
> > the current code has: once called, the TLBs and the page tables are
> > synchronised.
> >
> > Yes, this patch looks correct. But it is also really fragile.
> >
> Yeah, we were a little skeptical about this too. Till v2, we had a
> different implementation in which we had an independent fast unmap
> path that disconnects the PTE hierarchy if the unmap range was exactly
> KVM_PGTABLE_MIN_BLOCK_LEVEL [1]. But this had some problems, and we
> pivoted to the current implementation.

Can we at least have some sort of runtime assertions that at the point
we release the write lock, the TLBs have been invalidated? Even if
that's tied to some debug config.

>
> > >  	}
> > >
> > >  	mm_ops->put_page(ctx->ptep);
> > > @@ -1015,7 +1033,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> > >  	 * block entry and rely on the remaining portions being faulted
> > >  	 * back lazily.
> > >  	 */
> > > -	stage2_put_pte(ctx, mmu, mm_ops);
> > > +	stage2_unmap_put_pte(ctx, mmu, mm_ops);
> > >
> > >  	if (need_flush && mm_ops->dcache_clean_inval_poc)
> > >  		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
> > > @@ -1029,13 +1047,20 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> > >
> > >  int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
> > >  {
> > > +	int ret;
> > >  	struct kvm_pgtable_walker walker = {
> > >  		.cb	= stage2_unmap_walker,
> > >  		.arg	= pgt,
> > >  		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> > >  	};
> > >
> > > -	return kvm_pgtable_walk(pgt, addr, size, &walker);
> > > +	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> > > +	if (stage2_unmap_defer_tlb_flush(pgt))
> > > +		/* Perform the deferred TLB invalidations */
> > > +		kvm_call_hyp(__kvm_tlb_flush_vmid_range, pgt->mmu,
> > > +				addr, addr + size);
> >
> > This "kvm_call_hyp(__kvm_tlb_flush_vmid_range,...)" could do with a
> > wrapper from the point where you introduce it.
> >
> Sorry, I didn't get this comment. Do you mind elaborating on it?

All I'm saying is that you should have a wrapper like:

void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
			      phys_addr_t base, size_t size)
{
	kvm_call_hyp(__kvm_tlb_flush_vmid_range, mmu, base, base + size);
}

and use it throughout the code.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
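
[Editorial illustration, not part of the original message.] The request
above for a runtime assertion at write-lock release can be pictured with a
small userspace model. This is only a sketch under stated assumptions:
every name in it (model_mmu, model_write_lock, model_unmap_pte_deferred,
model_tlb_flush_range, model_can_unlock, model_write_unlock) is
hypothetical and does not exist in the kernel, the lock and the TLB are
reduced to plain flags, and a real implementation would presumably sit
behind a debug config option as suggested in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_mmu {
	bool locked;            /* models the MMU write lock being held */
	bool flush_pending;     /* a PTE was cleared, TLBs not yet invalidated */
	unsigned range_flushes; /* number of range invalidations issued */
};

static void model_write_lock(struct model_mmu *mmu)
{
	assert(!mmu->locked);
	mmu->locked = true;
}

/* Models clearing one PTE while deferring its per-PTE TLB invalidation. */
static void model_unmap_pte_deferred(struct model_mmu *mmu)
{
	assert(mmu->locked);
	mmu->flush_pending = true;
}

/* Models the single range-based invalidation issued after the walk. */
static void model_tlb_flush_range(struct model_mmu *mmu,
				  uint64_t base, uint64_t size)
{
	(void)base;
	(void)size;
	mmu->range_flushes++;
	mmu->flush_pending = false;
}

/* True when the lock is held and no invalidation is still outstanding. */
static bool model_can_unlock(const struct model_mmu *mmu)
{
	return mmu->locked && !mmu->flush_pending;
}

static void model_write_unlock(struct model_mmu *mmu)
{
	/* The assertion under discussion: no deferred flush may be pending. */
	assert(model_can_unlock(mmu));
	mmu->locked = false;
}
```

In this model a caller takes the lock, clears any number of PTEs, issues
one range flush and then unlocks; forgetting the flush trips the assertion
in model_write_unlock() instead of silently leaving the TLBs and the page
tables out of sync.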