Date: Wed, 26 Jul 2023 12:06:57 +0800
Subject: Re: [PATCH v7 12/12] KVM: arm64: Use TLBI range-based instructions
 for unmap
From: Shaoqin Huang
To: Raghavendra Rao Ananta
Cc: Oliver Upton, Marc Zyngier, James Morse, Suzuki K Poulose,
 Paolo Bonzini, Sean Christopherson, Huacai Chen, Zenghui Yu,
 Anup Patel, Atish Patra, Jing Zhang, Reiji Watanabe, Colton Lewis,
 David Matlack, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.linux.dev, linux-mips@vger.kernel.org,
 kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
 linux-kernel@vger.kernel.org, kvm@vger.kernel.org
References: <20230722022251.3446223-1-rananta@google.com>
 <20230722022251.3446223-13-rananta@google.com>
 <0841aca6-2824-6a1b-a568-119f8bd220de@redhat.com>
 <7ea9e7a0-508d-0f00-09ae-ae468f111437@redhat.com>

Hi Raghavendra,

On 7/26/23 01:23, Raghavendra Rao Ananta wrote:
> Hi Shaoqin,
>
> On Mon, Jul 24, 2023 at 7:32 PM Shaoqin Huang wrote:
>>
>> On 7/25/23 00:47, Raghavendra Rao Ananta wrote:
>>> On Mon, Jul 24, 2023 at 2:35 AM Shaoqin Huang wrote:
>>>>
>>>> Hi Raghavendra,
>>>>
>>>> On 7/22/23 10:22, Raghavendra Rao Ananta wrote:
>>>>> The current implementation of the stage-2 unmap walker traverses
>>>>> the given range and, as part of break-before-make, performs
>>>>> TLB invalidations with a DSB for every PTE. A multitude of this
>>>>> combination could cause a performance bottleneck on some systems.
>>>>>
>>>>> Hence, if the system supports FEAT_TLBIRANGE, defer the TLB
>>>>> invalidations until the entire walk is finished, and then
>>>>> use range-based instructions to invalidate the TLBs in one go.
>>>>> Condition deferred TLB invalidation on the system supporting FWB,
>>>>> as the optimization is entirely pointless when the unmap walker
>>>>> needs to perform CMOs.
>>>>>
>>>>> Rename stage2_put_pte() to stage2_unmap_put_pte() as the function
>>>>> now serves the stage-2 unmap walker specifically, rather than
>>>>> acting generic.
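Just to restate the idea before the diff, for anyone skimming the
thread (simplified pseudo-C of my reading, not the literal walker
code; the helpers named here are the ones the patch uses):

	/* Today: break-before-make with a TLBI + DSB for every PTE. */
	for each PTE in [addr, addr + size):
		kvm_clear_pte(ctx->ptep);
		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);

	/*
	 * With FEAT_TLBIRANGE + FWB: clear the PTEs during the walk with
	 * no per-PTE TLBI, then invalidate the whole range in one go once
	 * the walk has finished.
	 */
	for each PTE in [addr, addr + size):
		kvm_clear_pte(ctx->ptep);
	kvm_tlb_flush_vmid_range(pgt->mmu, addr, size);
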
>>>>>
>>>>> Signed-off-by: Raghavendra Rao Ananta
>>>>> ---
>>>>>   arch/arm64/kvm/hyp/pgtable.c | 67 +++++++++++++++++++++++++++++++-----
>>>>>   1 file changed, 58 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>>>>> index 5ef098af1736..cf88933a2ea0 100644
>>>>> --- a/arch/arm64/kvm/hyp/pgtable.c
>>>>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>>>>> @@ -831,16 +831,54 @@ static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t n
>>>>>  	smp_store_release(ctx->ptep, new);
>>>>>  }
>>>>>
>>>>> -static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
>>>>> -			   struct kvm_pgtable_mm_ops *mm_ops)
>>>>> +struct stage2_unmap_data {
>>>>> +	struct kvm_pgtable *pgt;
>>>>> +	bool defer_tlb_flush_init;
>>>>> +};
>>>>> +
>>>>> +static bool __stage2_unmap_defer_tlb_flush(struct kvm_pgtable *pgt)
>>>>> +{
>>>>> +	/*
>>>>> +	 * If FEAT_TLBIRANGE is implemented, defer the individual
>>>>> +	 * TLB invalidations until the entire walk is finished, and
>>>>> +	 * then use the range-based TLBI instructions to do the
>>>>> +	 * invalidations. Condition deferred TLB invalidation on the
>>>>> +	 * system supporting FWB, as the optimization is entirely
>>>>> +	 * pointless when the unmap walker needs to perform CMOs.
>>>>> +	 */
>>>>> +	return system_supports_tlb_range() && stage2_has_fwb(pgt);
>>>>> +}
>>>>> +
>>>>> +static bool stage2_unmap_defer_tlb_flush(struct stage2_unmap_data *unmap_data)
>>>>> +{
>>>>> +	bool defer_tlb_flush = __stage2_unmap_defer_tlb_flush(unmap_data->pgt);
>>>>> +
>>>>> +	/*
>>>>> +	 * Since __stage2_unmap_defer_tlb_flush() is based on alternative
>>>>> +	 * patching and the TLBIs' operations behavior depend on this,
>>>>> +	 * track if there's any change in the state during the unmap sequence.
>>>>> +	 */
>>>>> +	WARN_ON(unmap_data->defer_tlb_flush_init != defer_tlb_flush);
>>>>> +	return defer_tlb_flush;
>>>>> +}
>>>>> +
>>>>> +static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
>>>>> +				 struct kvm_s2_mmu *mmu,
>>>>> +				 struct kvm_pgtable_mm_ops *mm_ops)
>>>>>  {
>>>>> +	struct stage2_unmap_data *unmap_data = ctx->arg;
>>>>> +
>>>>>  	/*
>>>>> -	 * Clear the existing PTE, and perform break-before-make with
>>>>> -	 * TLB maintenance if it was valid.
>>>>> +	 * Clear the existing PTE, and perform break-before-make if it was
>>>>> +	 * valid. Depending on the system support, the TLB maintenance for
>>>>> +	 * the same can be deferred until the entire unmap is completed.
>>>>>  	 */
>>>>>  	if (kvm_pte_valid(ctx->old)) {
>>>>>  		kvm_clear_pte(ctx->ptep);
>>>>> -		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
>>>>> +
>>>>> +		if (!stage2_unmap_defer_tlb_flush(unmap_data))
>>>> Why not directly check (unmap_data->defer_tlb_flush_init) here?
>>>>
>>> (Re-sending the reply as the previous one was formatted as HTML and
>>> was blocked by many lists)
>>>
>>> No particular reason per se, but I was just going with the logic of
>>> keeping the "do we need to defer the flush" determination and the
>>> WARN_ON() parts separate.
>>> Is there any advantage to checking it directly in
>>> stage2_unmap_put_pte() that I missed, or is this purely for
>>> readability?
>>
>> Hi Raghavendra,
>>
>> I was wondering about the following case: before the stage-2 walk we
>> decide to defer the TLB flush, but partway through the walk
>> stage2_unmap_defer_tlb_flush() triggers the WARN_ON() and starts
>> returning "don't defer", so from that point on each unmapped PTE has
>> its TLB entry for ctx->addr flushed individually. The PTEs unmapped
>> before the WARN_ON() triggered were never flushed, though, and once
>> the walk is done, the stage2_unmap_defer_tlb_flush() call in
>> kvm_pgtable_stage2_unmap() triggers the WARN_ON() again and skips the
>> range-based flush. So some of the TLB entries are never invalidated.
>>
> Excellent point!
>> If we directly check unmap_data->defer_tlb_flush_init instead, that
>> value cannot change while walking the stage-2 table, so the WARN_ON()
>> could only ever trigger in
>> kvm_pgtable_stage2_unmap()->stage2_unmap_defer_tlb_flush().
>>
>> I'm not sure this is the right fix; I just think that once we have
>> decided to use the TLBI range-based flush, the result of
>> __stage2_unmap_defer_tlb_flush() shouldn't change. Otherwise there
>> must be stale TLB entries left behind.
>>
> One solution that I can think of is: if the code triggers the
> WARN_ON(), we flush the entire VM's TLB using
> kvm_call_hyp(__kvm_tlb_flush_vmid) after the entire walk is finished.
> In this special/rare situation it'll be a little expensive, but we
> will at least be correct, leaving no stale TLBs behind. WDYT?
>
I think it will be good to have this handling.
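Just to make sure we are thinking of the same shape of fix, here is a
rough sketch of that fallback on top of this patch (untested; using
the init/current mismatch to detect that the WARN_ON() fired is my
assumption, not something the patch does today):

	int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
	{
		int ret;
		struct stage2_unmap_data unmap_data = {
			.pgt = pgt,
			.defer_tlb_flush_init = __stage2_unmap_defer_tlb_flush(pgt),
		};
		struct kvm_pgtable_walker walker = {
			.cb	= stage2_unmap_walker,
			.arg	= &unmap_data,
			.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
		};

		ret = kvm_pgtable_walk(pgt, addr, size, &walker);
		if (stage2_unmap_defer_tlb_flush(&unmap_data)) {
			/* Perform the deferred TLB invalidations. */
			kvm_tlb_flush_vmid_range(pgt->mmu, addr, size);
		} else if (unmap_data.defer_tlb_flush_init) {
			/*
			 * We set out deferring the TLBIs, but the decision
			 * changed mid-walk (the WARN_ON() fired), so earlier
			 * PTEs may have been unmapped without a per-PTE TLBI.
			 * Flush the whole VMID rather than leave stale entries.
			 */
			kvm_call_hyp(__kvm_tlb_flush_vmid, pgt->mmu);
		}

		return ret;
	}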
Thanks,
Shaoqin

> Thank you.
> Raghavendra
>> Thanks,
>> Shaoqin
>>
>>>
>>>>> +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu,
>>>>> +				     ctx->addr, ctx->level);
>>>> A small indentation hint: ctx->addr can be aligned with the first
>>>> argument of __kvm_tlb_flush_vmid_ipa.
>>>>
>>> Ah, yes. I'll adjust this if I send out a v8.
>>>
>>> Thank you.
>>> Raghavendra
>>>> Thanks,
>>>> Shaoqin
>>>>>  	}
>>>>>
>>>>>  	mm_ops->put_page(ctx->ptep);
>>>>> @@ -1070,7 +1108,8 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
>>>>>  static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>>>>>  			       enum kvm_pgtable_walk_flags visit)
>>>>>  {
>>>>> -	struct kvm_pgtable *pgt = ctx->arg;
>>>>> +	struct stage2_unmap_data *unmap_data = ctx->arg;
>>>>> +	struct kvm_pgtable *pgt = unmap_data->pgt;
>>>>>  	struct kvm_s2_mmu *mmu = pgt->mmu;
>>>>>  	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>>>>>  	kvm_pte_t *childp = NULL;
>>>>> @@ -1098,7 +1137,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>>>>>  	 * block entry and rely on the remaining portions being faulted
>>>>>  	 * back lazily.
>>>>>  	 */
>>>>> -	stage2_put_pte(ctx, mmu, mm_ops);
>>>>> +	stage2_unmap_put_pte(ctx, mmu, mm_ops);
>>>>>
>>>>>  	if (need_flush && mm_ops->dcache_clean_inval_poc)
>>>>>  		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
>>>>> @@ -1112,13 +1151,23 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>>>>>
>>>>>  int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>>>>>  {
>>>>> +	int ret;
>>>>> +	struct stage2_unmap_data unmap_data = {
>>>>> +		.pgt = pgt,
>>>>> +		.defer_tlb_flush_init = __stage2_unmap_defer_tlb_flush(pgt),
>>>>> +	};
>>>>>  	struct kvm_pgtable_walker walker = {
>>>>>  		.cb	= stage2_unmap_walker,
>>>>> -		.arg	= pgt,
>>>>> +		.arg	= &unmap_data,
>>>>>  		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
>>>>>  	};
>>>>>
>>>>> -	return kvm_pgtable_walk(pgt, addr, size, &walker);
>>>>> +	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
>>>>> +	if (stage2_unmap_defer_tlb_flush(&unmap_data))
>>>>> +		/* Perform the deferred TLB invalidations */
>>>>> +		kvm_tlb_flush_vmid_range(pgt->mmu, addr, size);
>>>>> +
>>>>> +	return ret;
>>>>>  }
>>>>>
>>>>>  struct stage2_attr_data {
>>>>
>>>> --
>>>> Shaoqin
>>>>
>>>
>>
>> --
>> Shaoqin
>>
>
--
Shaoqin