From: Raghavendra Rao Ananta <rananta@google.com>
Date: Tue, 30 May 2023 14:35:57 -0700
Subject: Re: [PATCH v4 6/6] KVM: arm64: Use TLBI range-based instructions for unmap
References: <20230519005231.3027912-1-rananta@google.com>
	<20230519005231.3027912-7-rananta@google.com>
	<87ttvvjk5q.wl-maz@kernel.org>
In-Reply-To: <87ttvvjk5q.wl-maz@kernel.org>
To: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton, James Morse, Suzuki K Poulose, Ricardo Koller,
	Paolo Bonzini, Jing Zhang, Colton Lewis,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org

On Mon, May 29, 2023 at 7:18 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Fri, 19 May 2023 01:52:31 +0100,
> Raghavendra Rao Ananta <rananta@google.com> wrote:
> >
> > The current implementation of the stage-2 unmap walker traverses
> > the given range and, as part of break-before-make, performs
> > TLB invalidations with a DSB for every PTE. Repeating this
> > combination for every PTE can become a performance bottleneck.
> >
> > Hence, if the system supports FEAT_TLBIRANGE, defer the TLB
> > invalidations until the entire walk is finished, and then
> > use range-based instructions to invalidate the TLBs in one go.
> > Condition this upon S2FWB in order to avoid walking the page-table
> > again to perform the CMOs after issuing the TLBI.
>
> But that's the real bottleneck. TLBIs are cheap compared to CMOs, even
> on remarkably bad implementations. What is your plan to fix this?
>
Correct me if I'm wrong, but my understanding was that the repeated
issuance of TLBI + DSB was the bottleneck, and this patch tries to
avoid that by issuing only one TLBI + DSB at the end.
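To make that concrete: the patch essentially moves from the first
pattern below to the second. This is a simplified sketch, not the
actual walker code -- the callback plumbing and error handling are
elided, and the two function names are made up for illustration; only
the hyp calls, PTE helpers, and walker fields from the series are real:

/*
 * Sketch only: before the patch, every valid PTE torn down by the
 * walk pays for its own TLB invalidation plus synchronization.
 */
static void stage2_unmap_one_pte_strict(const struct kvm_pgtable_visit_ctx *ctx,
					struct kvm_s2_mmu *mmu)
{
	kvm_clear_pte(ctx->ptep);
	/* One invalidate-and-synchronize per PTE: O(PTEs) barriers. */
	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
}

/*
 * Sketch only: with the patch, the walk just clears PTEs, and a
 * single range-based invalidation covers [addr, addr + size) at
 * the very end.
 */
static int stage2_unmap_range_deferred(struct kvm_pgtable *pgt,
				       u64 addr, u64 size)
{
	struct kvm_pgtable_walker walker = {
		.cb	= stage2_unmap_walker,
		.arg	= pgt,
		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
	};
	int ret = kvm_pgtable_walk(pgt, addr, size, &walker);

	/* One invalidate-and-synchronize for the whole range. */
	kvm_call_hyp(__kvm_tlb_flush_vmid_range, pgt->mmu, addr, addr + size);
	return ret;
}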
> >
> > Rename stage2_put_pte() to stage2_unmap_put_pte() as the function
> > now serves the stage-2 unmap walker specifically, rather than
> > being generic.
> >
> > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 35 ++++++++++++++++++++++++++++++-----
> >  1 file changed, 30 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index b8f0dbd12f773..5832ee3418fb0 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -771,16 +771,34 @@ static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t n
> >  	smp_store_release(ctx->ptep, new);
> >  }
> >
> > -static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
> > -			   struct kvm_pgtable_mm_ops *mm_ops)
> > +static bool stage2_unmap_defer_tlb_flush(struct kvm_pgtable *pgt)
> >  {
> > +	/*
> > +	 * If FEAT_TLBIRANGE is implemented, defer the individual PTE
> > +	 * TLB invalidations until the entire walk is finished, and
> > +	 * then use the range-based TLBI instructions to do the
> > +	 * invalidations. Condition this upon S2FWB in order to avoid
> > +	 * a page-table walk again to perform the CMOs after TLBI.
> > +	 */
> > +	return system_supports_tlb_range() && stage2_has_fwb(pgt);
> > +}
> > +
> > +static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
> > +				 struct kvm_s2_mmu *mmu,
> > +				 struct kvm_pgtable_mm_ops *mm_ops)
> > +{
> > +	struct kvm_pgtable *pgt = ctx->arg;
> > +
> >  	/*
> >  	 * Clear the existing PTE, and perform break-before-make with
> >  	 * TLB maintenance if it was valid.
> >  	 */
> >  	if (kvm_pte_valid(ctx->old)) {
> >  		kvm_clear_pte(ctx->ptep);
> > -		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
> > +
> > +		if (!stage2_unmap_defer_tlb_flush(pgt))
> > +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu,
> > +				     ctx->addr, ctx->level);
>
> This really doesn't match the comment anymore.
>
Right, I can re-write this in the next spin.

> Overall, I'm very concerned that we lose the consistency property that
> the current code has: once called, the TLBs and the page tables are
> synchronised.
>
> Yes, this patch looks correct. But it is also really fragile.
>
Yeah, we were a little skeptical about this too. Until v2, we had a
different implementation with an independent fast unmap path that
disconnects the PTE hierarchy if the unmap range is exactly
KVM_PGTABLE_MIN_BLOCK_LEVEL [1]. But this had some problems, and we
pivoted to the current implementation.

> >  	}
> >
> >  	mm_ops->put_page(ctx->ptep);
> > @@ -1015,7 +1033,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> >  	 * block entry and rely on the remaining portions being faulted
> >  	 * back lazily.
> >  	 */
> > -	stage2_put_pte(ctx, mmu, mm_ops);
> > +	stage2_unmap_put_pte(ctx, mmu, mm_ops);
> >
> >  	if (need_flush && mm_ops->dcache_clean_inval_poc)
> >  		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
> >  					       kvm_granule_size(ctx->level));
> > @@ -1029,13 +1047,20 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> >
> >  int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
> >  {
> > +	int ret;
> >  	struct kvm_pgtable_walker walker = {
> >  		.cb	= stage2_unmap_walker,
> >  		.arg	= pgt,
> >  		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> >  	};
> >
> > -	return kvm_pgtable_walk(pgt, addr, size, &walker);
> > +	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> > +	if (stage2_unmap_defer_tlb_flush(pgt))
> > +		/* Perform the deferred TLB invalidations */
> > +		kvm_call_hyp(__kvm_tlb_flush_vmid_range, pgt->mmu,
> > +			     addr, addr + size);
>
> This "kvm_call_hyp(__kvm_tlb_flush_vmid_range,...)" could do with a
> wrapper from the point where you introduce it.
>
Sorry, I didn't get this comment. Do you mind elaborating on it?

Thank you.
Raghavendra

[1]: https://lore.kernel.org/all/20230206172340.2639971-8-rananta@google.com/

> > +
> > +	return ret;
> >  }
> >
>
> Thanks,
>
> 	M.
>
> --
> Without deviation from the norm, progress is not possible.
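On the wrapper remark above: one plausible reading is a named helper
that hides the raw hyp call at the point where
__kvm_tlb_flush_vmid_range() is introduced. A minimal sketch of such a
helper follows; the name and placement are hypothetical, not taken
from the series:

/*
 * Hypothetical wrapper -- name and location are illustrative only.
 * It gives the range-based invalidation a single call site to audit
 * instead of open-coded kvm_call_hyp() invocations in each caller.
 */
static inline void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
					    u64 addr, u64 size)
{
	kvm_call_hyp(__kvm_tlb_flush_vmid_range, mmu, addr, addr + size);
}

With a helper along these lines, kvm_pgtable_stage2_unmap() would end
with kvm_tlb_flush_vmid_range(pgt->mmu, addr, size) rather than
spelling out the hyp call inline.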