Received: by 2002:a05:6358:11c7:b0:104:8066:f915 with SMTP id i7csp6334011rwl; Tue, 4 Apr 2023 11:00:19 -0700 (PDT) X-Google-Smtp-Source: AKy350bBgr3A+kAytJntYhWvfQojitHkIalTO546vYrSmIr0Q4h1+8IEECp811Bn80JEahWH2yTM X-Received: by 2002:a17:90b:1e46:b0:23f:7d05:8762 with SMTP id pi6-20020a17090b1e4600b0023f7d058762mr3945868pjb.23.1680631219016; Tue, 04 Apr 2023 11:00:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680631218; cv=none; d=google.com; s=arc-20160816; b=FWFXibyKV7Gw9QbpbhNG1vsNDHqL4Gj5sVqnS/zX0TWnAjORdXPTbLCoie9kfLN2Rr nhorlcPnkpB0RXIJSZGRBrp4pqNPnsZUHuZj6hVj+pvEdxnsi9cAzD7dUqSXCLWN5YJz ACCUM4eA5SKxQbxlD5XuixQwnAgyjT4zxj7sTAE8fh/BEd6CFz5eMfSfGu41+eWpyYQX HuavDC2RygyVLdGOZx8C7A9VcXLD3epPEPm7hIQssgYuDyvc3ee5dTfy7Jp4ijaEEdpP woQX8+ZH/FLLl8GM9rFimuEeTBLXKAwYK3pNMy+OUUZpG8ZTcUJ/nbedNXMaNZlmFFip RHQw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=yNm7ut5xw3YZveb3bi0ZcN2eShh+vdiK3vTRJbWEvZA=; b=yttIuo9jKpPMuujbfcqkHAqKUrnBx1fZqpJccMcgYwcQvmMDyg0qfqaMhSme9T2HRK VPI2S/Q3Nr7Gmz7XRWStGtUXOT/L3Z/N3t5szn3vQFOwmh7A/D0aYVWxwT+INozCvF8N 1NwCJucm7uFXQVTZOBtivsbGO5uipqfW0K4j13MioWHdvAvnnL16KHjEWwuMQ72hoyuC jof1YRw+x+coxVRr9p5WABgcz4ahg+kxwHk1Db5JR12h7X8l5n9ZOu9U9Zs8FaWOMcWT X5bwRftxUT7+K6g5Va+8J7JnYWZOre4OcxksSzCeSG2y2HfxkdPULGmUDIFJJO97ah+e fh+w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=h2tJpzO+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b7-20020a631b07000000b0050be9448847si10711861pgb.680.2023.04.04.11.00.07; Tue, 04 Apr 2023 11:00:18 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=h2tJpzO+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235763AbjDDRwU (ORCPT + 99 others); Tue, 4 Apr 2023 13:52:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39766 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235654AbjDDRwS (ORCPT ); Tue, 4 Apr 2023 13:52:18 -0400 Received: from mail-oa1-x2f.google.com (mail-oa1-x2f.google.com [IPv6:2001:4860:4864:20::2f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C28F3AAA for ; Tue, 4 Apr 2023 10:52:14 -0700 (PDT) Received: by mail-oa1-x2f.google.com with SMTP id 586e51a60fabf-17fcc07d6c4so21449643fac.8 for ; Tue, 04 Apr 2023 10:52:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1680630733; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=yNm7ut5xw3YZveb3bi0ZcN2eShh+vdiK3vTRJbWEvZA=; b=h2tJpzO+g3ZA7Mm7KbV9GPDwwOmXm1vnpvbntR367mW0dQav+pdZcj+f50RJIi6beM tVxQJoFUESXI9XyuMrhwrLaOBoK0n+pJvu+rtplmt2eErJQNf032DN6A4hI9pGU7nOqV rA4QUKDX2VY4sFlKjFptHf+4iiEDFyv7prc6Q5h2TOLHRM3Q7b77VNFimBQ+9XvOveN1 DzO6yQYD9660QaKaDSCGQc7rJrxfjCPklNv041UJdiHriaz9/nvtuSsG5es7QrCwbPOP cHsKzjlv8Og+hpT6y5ROIw53CeSSVBe8AverjDtAy1Hc7sxC/vZQE8D2m+d8fvdnbtuF cmVA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680630733; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=yNm7ut5xw3YZveb3bi0ZcN2eShh+vdiK3vTRJbWEvZA=; b=q3OcfA5YJh/HsoMCSZ+lDnCh15Ef/7bwrU674GBSp6mrZemv4E+wAAOTPEKApymk0e aZyO85ZehE68043JbY7Di3REhS9AcWNoLmTv5Bz5zSlotPkqnfi9sdcPSYeNSPM6Xc9Y xv/0pXF3KVZsdcrwI6wkHtne424ZmFXIF+UGQ0ZrIaDVYtjeMipIln0L1Zt5j9H3o2/y QkA9jFZAr7fPVT4DBa0in0y+L4xQkWUZQ9gpYos2g663v6EamlAyhxJGsxS80ABzJalB VCHpAQBTrAtjL2MamKQJEUWfZsiIrwA+ueSEZoNzwYM9FMamerWEXLrSbsVw0vAeSe1A PRYw== X-Gm-Message-State: AAQBX9cs0NsS0v0b6CfX6SAaWd0Wi0OM0NvZoA5EJbtNedwqYNIAfUH+ Mw1ILXpEED4DWiFpxnWi6sGRpzPNjjCr6lEgs1DDSg== X-Received: by 2002:a05:6870:57:b0:17e:e396:cce0 with SMTP id 23-20020a056870005700b0017ee396cce0mr1672458oaz.9.1680630733248; Tue, 04 Apr 2023 10:52:13 -0700 (PDT) MIME-Version: 1.0 References: <20230206172340.2639971-1-rananta@google.com> <20230206172340.2639971-8-rananta@google.com> In-Reply-To: From: Raghavendra Rao Ananta Date: Tue, 4 Apr 2023 10:52:01 -0700 Message-ID: Subject: Re: [PATCH v2 7/7] KVM: arm64: Create a fast stage-2 unmap path To: Oliver Upton Cc: Oliver Upton , Marc Zyngier , Ricardo Koller , Reiji Watanabe , James Morse , Alexandru Elisei , Suzuki K Poulose , Will Deacon , Paolo Bonzini , Catalin Marinas , Jing Zhang , Colton Lewis , linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-15.7 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,ENV_AND_HDR_SPF_MATCH, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL, USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Mar 29, 2023 at 5:42=E2=80=AFPM Oliver Upton wrote: > > On Mon, Feb 06, 2023 at 05:23:40PM +0000, Raghavendra Rao Ananta wrote: > > The current implementation of the stage-2 unmap walker > > traverses the entire page-table to clear and flush the TLBs > > for each entry. This could be very expensive, especially if > > the VM is not backed by hugepages. The unmap operation could be > > made efficient by disconnecting the table at the very > > top (level at which the largest block mapping can be hosted) > > and do the rest of the unmapping using free_removed_table(). > > If the system supports FEAT_TLBIRANGE, flush the entire range > > that has been disconnected from the rest of the page-table. > > > > Suggested-by: Ricardo Koller > > Signed-off-by: Raghavendra Rao Ananta > > --- > > arch/arm64/kvm/hyp/pgtable.c | 44 ++++++++++++++++++++++++++++++++++++ > > 1 file changed, 44 insertions(+) > > > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.= c > > index 0858d1fa85d6b..af3729d0971f2 100644 > > --- a/arch/arm64/kvm/hyp/pgtable.c > > +++ b/arch/arm64/kvm/hyp/pgtable.c > > @@ -1017,6 +1017,49 @@ static int stage2_unmap_walker(const struct kvm_= pgtable_visit_ctx *ctx, > > return 0; > > } > > > > +/* > > + * The fast walker executes only if the unmap size is exactly equal to= the > > + * largest block mapping supported (i.e. at KVM_PGTABLE_MIN_BLOCK_LEVE= L), > > + * such that the underneath hierarchy at KVM_PGTABLE_MIN_BLOCK_LEVEL c= an > > + * be disconnected from the rest of the page-table without the need to > > + * traverse all the PTEs, at all the levels, and unmap each and every = one > > + * of them. The disconnected table is freed using free_removed_table()= . > > + */ > > +static int fast_stage2_unmap_walker(const struct kvm_pgtable_visit_ctx= *ctx, > > + enum kvm_pgtable_walk_flags visit) > > +{ > > + struct kvm_pgtable_mm_ops *mm_ops =3D ctx->mm_ops; > > + kvm_pte_t *childp =3D kvm_pte_follow(ctx->old, mm_ops); > > + struct kvm_s2_mmu *mmu =3D ctx->arg; > > + > > + if (!kvm_pte_valid(ctx->old) || ctx->level !=3D KVM_PGTABLE_MIN_B= LOCK_LEVEL) > > + return 0; > > + > > + if (!stage2_try_break_pte(ctx, mmu)) > > + return -EAGAIN; > > + > > + /* > > + * Gain back a reference for stage2_unmap_walker() to free > > + * this table entry from KVM_PGTABLE_MIN_BLOCK_LEVEL - 1. > > + */ > > + mm_ops->get_page(ctx->ptep); > > Doesn't this run the risk of a potential UAF if the refcount was 1 before > calling stage2_try_break_pte()? IOW, stage2_try_break_pte() will drop > the refcount to 0 on the page before this ever gets called. > > Also, AFAICT this misses the CMOs that are required on systems w/o > FEAT_FWB. Without them it is possible that the host will read something > other than what was most recently written by the guest if it is using > noncacheable memory attributes at stage-1. > > I imagine the actual bottleneck is the DSB required after every > CMO/TLBI. Theoretically, the unmap path could be updated to: > > - Perform the appropriate CMOs for every valid leaf entry *without* > issuing a DSB. > > - Elide TLBIs entirely that take place in the middle of the walk > > - After the walk completes, dsb(ish) to guarantee that the CMOs have > completed and the invalid PTEs are made visible to the hardware > walkers. This should be done implicitly by the TLBI implementation > > - Invalidate the [addr, addr + size) range of IPAs > > This would also avoid over-invalidating stage-1 since we blast the > entire stage-1 context for every stage-2 invalidation. Thoughts? > Correct me if I'm wrong, but if we invalidate the TLB after the walk is complete, don't you think there's a risk of race if the guest can hit in the TLB even though the page was unmapped? Thanks, Raghavendra Raghavendra > > + mm_ops->free_removed_table(childp, ctx->level); > > + return 0; > > +} > > + > > -- > Thanks, > Oliver