Date: Thu, 30 Mar 2023 00:42:15 +0000
From: Oliver Upton
To: Raghavendra Rao Ananta
Cc: Oliver Upton, Marc Zyngier, Ricardo Koller, Reiji Watanabe,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Will Deacon,
	Paolo Bonzini, Catalin Marinas, Jing Zhang, Colton Lewis,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v2 7/7] KVM: arm64: Create a fast stage-2 unmap path
Message-ID:
References: <20230206172340.2639971-1-rananta@google.com>
	<20230206172340.2639971-8-rananta@google.com>
In-Reply-To: <20230206172340.2639971-8-rananta@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon, Feb 06, 2023 at 05:23:40PM +0000, Raghavendra Rao Ananta wrote:
> The current implementation of the stage-2 unmap walker
> traverses the entire page-table to clear and flush the TLBs
> for each entry. This could be very expensive, especially if
> the VM is not backed by hugepages. The unmap operation could be
> made efficient by disconnecting the table at the very
> top (level at which the largest block mapping can be hosted)
> and do the rest of the unmapping using free_removed_table().
> If the system supports FEAT_TLBIRANGE, flush the entire range
> that has been disconnected from the rest of the page-table.
> 
> Suggested-by: Ricardo Koller
> Signed-off-by: Raghavendra Rao Ananta
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 44 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 0858d1fa85d6b..af3729d0971f2 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -1017,6 +1017,49 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  	return 0;
>  }
>  
> +/*
> + * The fast walker executes only if the unmap size is exactly equal to the
> + * largest block mapping supported (i.e. at KVM_PGTABLE_MIN_BLOCK_LEVEL),
> + * such that the underneath hierarchy at KVM_PGTABLE_MIN_BLOCK_LEVEL can
> + * be disconnected from the rest of the page-table without the need to
> + * traverse all the PTEs, at all the levels, and unmap each and every one
> + * of them. The disconnected table is freed using free_removed_table().
> + */
> +static int fast_stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +				    enum kvm_pgtable_walk_flags visit)
> +{
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
> +	kvm_pte_t *childp = kvm_pte_follow(ctx->old, mm_ops);
> +	struct kvm_s2_mmu *mmu = ctx->arg;
> +
> +	if (!kvm_pte_valid(ctx->old) || ctx->level != KVM_PGTABLE_MIN_BLOCK_LEVEL)
> +		return 0;
> +
> +	if (!stage2_try_break_pte(ctx, mmu))
> +		return -EAGAIN;
> +
> +	/*
> +	 * Gain back a reference for stage2_unmap_walker() to free
> +	 * this table entry from KVM_PGTABLE_MIN_BLOCK_LEVEL - 1.
> +	 */
> +	mm_ops->get_page(ctx->ptep);

Doesn't this run the risk of a potential UAF if the refcount was 1 before
calling stage2_try_break_pte()? IOW, stage2_try_break_pte() will drop the
refcount to 0 on the page before this ever gets called.

Also, AFAICT this misses the CMOs that are required on systems w/o
FEAT_FWB. Without them it is possible that the host will read something
other than what was most recently written by the guest if it is using
noncacheable memory attributes at stage-1.

I imagine the actual bottleneck is the DSB required after every CMO/TLBI.
Theoretically, the unmap path could be updated to:

 - Perform the appropriate CMOs for every valid leaf entry *without*
   issuing a DSB.

 - Elide TLBIs entirely that take place in the middle of the walk.

 - After the walk completes, dsb(ish) to guarantee that the CMOs have
   completed and the invalid PTEs are made visible to the hardware
   walkers. This should be done implicitly by the TLBI implementation.

 - Invalidate the [addr, addr + size) range of IPAs.

This would also avoid over-invalidating stage-1 since we blast the entire
stage-1 context for every stage-2 invalidation.

Thoughts?

> +	mm_ops->free_removed_table(childp, ctx->level);
> +	return 0;
> +}
> +

-- 
Thanks,
Oliver
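
P.S. -- to make the deferred-flush idea above a bit more concrete, here is a
rough, untested and purely illustrative sketch. It only mirrors the shape of
the existing stage-2 walker code; kvm_s2_mmu_flush_range() is a made-up
placeholder for a range-based stage-2 invalidation (FEAT_TLBIRANGE), and the
sketch ignores table-level refcounting, break-before-make and locking details
that a real patch would have to handle.

static int stage2_deferred_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
					enum kvm_pgtable_walk_flags visit)
{
	struct kvm_pgtable *pgt = ctx->arg;
	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;

	if (!kvm_pte_valid(ctx->old))
		return 0;

	/*
	 * CMO for the old mapping, deferring the DSB until the walk has
	 * finished (assumes a clean helper that doesn't issue one itself).
	 */
	if (!stage2_has_fwb(pgt) && stage2_pte_cacheable(pgt, ctx->old))
		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
					       kvm_granule_size(ctx->level));

	/* Zap the leaf PTE without issuing a per-entry TLBI. */
	kvm_clear_pte(ctx->ptep);
	mm_ops->put_page(ctx->ptep);
	return 0;
}

static int stage2_deferred_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
	struct kvm_pgtable_walker walker = {
		.cb	= stage2_deferred_unmap_walker,
		.arg	= pgt,
		.flags	= KVM_PGTABLE_WALK_LEAF,
	};
	int ret = kvm_pgtable_walk(pgt, addr, size, &walker);

	/* Order all of the CMOs and PTE updates with a single barrier... */
	dsb(ish);

	/* ...then invalidate only [addr, addr + size) at stage-2. */
	kvm_s2_mmu_flush_range(pgt->mmu, addr, size);	/* hypothetical helper */
	return ret;
}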