Date: Mon, 29 May 2023 15:18:25 +0100
Message-ID: <87ttvvjk5q.wl-maz@kernel.org>
From: Marc Zyngier
To: Raghavendra Rao Ananta
Cc: Oliver Upton, James Morse, Suzuki K Poulose, Ricardo Koller,
	Paolo Bonzini, Jing Zhang, Colton Lewis,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v4 6/6] KVM: arm64: Use TLBI range-based intructions for unmap
In-Reply-To: <20230519005231.3027912-7-rananta@google.com>
References: <20230519005231.3027912-1-rananta@google.com>
	<20230519005231.3027912-7-rananta@google.com>

On Fri, 19 May 2023 01:52:31 +0100,
Raghavendra Rao Ananta wrote:
>
> The current implementation of the stage-2 unmap walker traverses
> the given range and, as a part of break-before-make, performs
> TLB invalidations with a DSB for every PTE. A multitude of this
> combination could cause a performance bottleneck.
>
> Hence, if the system supports FEAT_TLBIRANGE, defer the TLB
> invalidations until the entire walk is finished, and then
> use range-based instructions to invalidate the TLBs in one go.
> Condition this upon S2FWB in order to avoid walking the page-table
> again to perform the CMOs after issuing the TLBI.

But that's the real bottleneck. TLBIs are cheap compared to CMOs,
even on remarkably bad implementations. What is your plan to fix this?

>
> Rename stage2_put_pte() to stage2_unmap_put_pte() as the function
> now serves the stage-2 unmap walker specifically, rather than
> acting generic.
>
> Signed-off-by: Raghavendra Rao Ananta
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 35 ++++++++++++++++++++++++++++++-----
>  1 file changed, 30 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index b8f0dbd12f773..5832ee3418fb0 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -771,16 +771,34 @@ static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t n
>  	smp_store_release(ctx->ptep, new);
>  }
>
> -static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
> -			   struct kvm_pgtable_mm_ops *mm_ops)
> +static bool stage2_unmap_defer_tlb_flush(struct kvm_pgtable *pgt)
>  {
> +	/*
> +	 * If FEAT_TLBIRANGE is implemented, defer the individial PTE
> +	 * TLB invalidations until the entire walk is finished, and
> +	 * then use the range-based TLBI instructions to do the
> +	 * invalidations. Condition this upon S2FWB in order to avoid
> +	 * a page-table walk again to perform the CMOs after TLBI.
> +	 */
> +	return system_supports_tlb_range() && stage2_has_fwb(pgt);
> +}
> +
> +static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
> +				 struct kvm_s2_mmu *mmu,
> +				 struct kvm_pgtable_mm_ops *mm_ops)
> +{
> +	struct kvm_pgtable *pgt = ctx->arg;
> +
>  	/*
>  	 * Clear the existing PTE, and perform break-before-make with
>  	 * TLB maintenance if it was valid.
>  	 */
>  	if (kvm_pte_valid(ctx->old)) {
>  		kvm_clear_pte(ctx->ptep);
> -		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
> +
> +		if (!stage2_unmap_defer_tlb_flush(pgt))
> +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu,
> +					ctx->addr, ctx->level);

This really doesn't match the comment anymore.

Overall, I'm very concerned that we lose the consistency property that
the current code has: once called, the TLBs and the page tables are
synchronised.

Yes, this patch looks correct. But it is also really fragile.
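
As a rough illustration of the consistency concern above, here is a toy
user-space model in C. It is not kernel code: toy_mmu, toy_unmap() and
every other name in it are invented for this sketch, and a real TLB
behaves nothing like a boolean array. It only counts how many
invalidation operations each strategy issues, and how many walk steps
end with a cleared PTE still cached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 8

/* Toy model (hypothetical): one "PTE" valid bit and one "TLB" bit per page. */
struct toy_mmu {
	bool pte_valid[NR_PAGES];
	bool tlb_cached[NR_PAGES];
	unsigned int tlbi_ops;		/* invalidation operations issued */
	unsigned int stale_windows;	/* walk steps that ended with a stale entry */
};

/* Start with every page mapped and cached in the toy TLB. */
static void toy_init(struct toy_mmu *m)
{
	for (size_t i = 0; i < NR_PAGES; i++) {
		m->pte_valid[i] = true;
		m->tlb_cached[i] = true;
	}
	m->tlbi_ops = 0;
	m->stale_windows = 0;
}

/* True if any toy TLB entry still caches a PTE that has been cleared. */
static bool toy_has_stale(const struct toy_mmu *m)
{
	for (size_t i = 0; i < NR_PAGES; i++)
		if (m->tlb_cached[i] && !m->pte_valid[i])
			return true;
	return false;
}

/* Invalidate [start, end) as a single operation. */
static void toy_tlbi_range(struct toy_mmu *m, size_t start, size_t end)
{
	assert(start <= end && end <= NR_PAGES);
	for (size_t i = start; i < end; i++)
		m->tlb_cached[i] = false;
	m->tlbi_ops++;
}

/* Unmap [start, end); 'defer' selects one range invalidation at the end. */
static void toy_unmap(struct toy_mmu *m, size_t start, size_t end, bool defer)
{
	assert(start <= end && end <= NR_PAGES);
	for (size_t i = start; i < end; i++) {
		m->pte_valid[i] = false;
		if (!defer)
			toy_tlbi_range(m, i, i + 1);
		/* Record whether this step left page table and TLB out of sync. */
		if (toy_has_stale(m))
			m->stale_windows++;
	}
	if (defer && start < end)
		toy_tlbi_range(m, start, end);
}
```

Unmapping all 8 pages with defer == false issues 8 invalidations and
never ends a walk step with a stale entry. With defer == true it issues
a single range invalidation, but every one of the 8 steps ends with a
stale entry until the final flush. That window is what the existing
per-PTE code never exposes.
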
>  	}
>
>  	mm_ops->put_page(ctx->ptep);
> @@ -1015,7 +1033,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  	 * block entry and rely on the remaining portions being faulted
>  	 * back lazily.
>  	 */
> -	stage2_put_pte(ctx, mmu, mm_ops);
> +	stage2_unmap_put_pte(ctx, mmu, mm_ops);
>
>  	if (need_flush && mm_ops->dcache_clean_inval_poc)
>  		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
> @@ -1029,13 +1047,20 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>
>  int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>  {
> +	int ret;
>  	struct kvm_pgtable_walker walker = {
>  		.cb	= stage2_unmap_walker,
>  		.arg	= pgt,
>  		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
>  	};
>
> -	return kvm_pgtable_walk(pgt, addr, size, &walker);
> +	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> +	if (stage2_unmap_defer_tlb_flush(pgt))
> +		/* Perform the deferred TLB invalidations */
> +		kvm_call_hyp(__kvm_tlb_flush_vmid_range, pgt->mmu,
> +				addr, addr + size);

This "kvm_call_hyp(__kvm_tlb_flush_vmid_range,...)" could do with a
wrapper from the point where you introduce it.

> +
> +	return ret;
>  }
>

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.