Subject: Re: [PATCH v7 12/12] KVM: arm64: Use TLBI range-based instructions for unmap
From: Shaoqin Huang
To: Raghavendra Rao Ananta
Cc: Oliver Upton, Marc Zyngier, James Morse, Suzuki K Poulose,
 Paolo Bonzini, Sean Christopherson, Huacai Chen, Zenghui Yu,
 Anup Patel, Atish Patra, Jing Zhang, Reiji Watanabe, Colton Lewis,
 David Matlack, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.linux.dev, linux-mips@vger.kernel.org,
 kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
 linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Date: Tue, 25 Jul 2023 10:32:46 +0800
Message-ID: <7ea9e7a0-508d-0f00-09ae-ae468f111437@redhat.com>
References: <20230722022251.3446223-1-rananta@google.com>
 <20230722022251.3446223-13-rananta@google.com>
 <0841aca6-2824-6a1b-a568-119f8bd220de@redhat.com>

On 7/25/23 00:47, Raghavendra Rao Ananta wrote:
> On Mon, Jul 24, 2023 at 2:35 AM Shaoqin Huang wrote:
>>
>> Hi Raghavendra,
>>
>> On 7/22/23 10:22, Raghavendra Rao Ananta wrote:
>>> The current implementation of the stage-2 unmap walker traverses
>>> the given range and, as a part of break-before-make, performs
>>> TLB invalidations with a DSB for every PTE. Repeating this
>>> combination a large number of times can cause a performance
>>> bottleneck on some systems.
>>>
>>> Hence, if the system supports FEAT_TLBIRANGE, defer the TLB
>>> invalidations until the entire walk is finished, and then
>>> use range-based instructions to invalidate the TLBs in one go.
>>> Condition deferred TLB invalidation on the system supporting FWB,
>>> as the optimization is entirely pointless when the unmap walker
>>> needs to perform CMOs.
>>>
>>> Rename stage2_put_pte() to stage2_unmap_put_pte() as the function
>>> now serves the stage-2 unmap walker specifically, rather than
>>> being generic.
>>>
>>> Signed-off-by: Raghavendra Rao Ananta
>>> ---
>>>  arch/arm64/kvm/hyp/pgtable.c | 67 +++++++++++++++++++++++++++++++-----
>>>  1 file changed, 58 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>>> index 5ef098af1736..cf88933a2ea0 100644
>>> --- a/arch/arm64/kvm/hyp/pgtable.c
>>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>>> @@ -831,16 +831,54 @@ static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t n
>>>  	smp_store_release(ctx->ptep, new);
>>>  }
>>>
>>> -static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
>>> -			   struct kvm_pgtable_mm_ops *mm_ops)
>>> +struct stage2_unmap_data {
>>> +	struct kvm_pgtable *pgt;
>>> +	bool defer_tlb_flush_init;
>>> +};
>>> +
>>> +static bool __stage2_unmap_defer_tlb_flush(struct kvm_pgtable *pgt)
>>> +{
>>> +	/*
>>> +	 * If FEAT_TLBIRANGE is implemented, defer the individual
>>> +	 * TLB invalidations until the entire walk is finished, and
>>> +	 * then use the range-based TLBI instructions to do the
>>> +	 * invalidations. Condition deferred TLB invalidation on the
>>> +	 * system supporting FWB, as the optimization is entirely
>>> +	 * pointless when the unmap walker needs to perform CMOs.
>>> +	 */
>>> +	return system_supports_tlb_range() && stage2_has_fwb(pgt);
>>> +}
>>> +
>>> +static bool stage2_unmap_defer_tlb_flush(struct stage2_unmap_data *unmap_data)
>>> +{
>>> +	bool defer_tlb_flush = __stage2_unmap_defer_tlb_flush(unmap_data->pgt);
>>> +
>>> +	/*
>>> +	 * Since __stage2_unmap_defer_tlb_flush() is based on alternative
>>> +	 * patching and the TLBIs' operations behavior depend on this,
>>> +	 * track if there's any change in the state during the unmap sequence.
>>> +	 */
>>> +	WARN_ON(unmap_data->defer_tlb_flush_init != defer_tlb_flush);
>>> +	return defer_tlb_flush;
>>> +}
>>> +
>>> +static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
>>> +				 struct kvm_s2_mmu *mmu,
>>> +				 struct kvm_pgtable_mm_ops *mm_ops)
>>>  {
>>> +	struct stage2_unmap_data *unmap_data = ctx->arg;
>>> +
>>>  	/*
>>> -	 * Clear the existing PTE, and perform break-before-make with
>>> -	 * TLB maintenance if it was valid.
>>> +	 * Clear the existing PTE, and perform break-before-make if it was
>>> +	 * valid. Depending on the system support, the TLB maintenance for
>>> +	 * the same can be deferred until the entire unmap is completed.
>>>  	 */
>>>  	if (kvm_pte_valid(ctx->old)) {
>>>  		kvm_clear_pte(ctx->ptep);
>>> -		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
>>> +
>>> +		if (!stage2_unmap_defer_tlb_flush(unmap_data))
>> Why not directly check (unmap_data->defer_tlb_flush_init) here?
>>
> (Re-sending the reply, as the previous one was formatted as HTML and
> was blocked by many lists.)
>
> No particular reason per se; I was just keeping the logic that
> determines whether we need to defer the flush, together with the
> WARN_ON() check, in a separate helper.
> Is there an advantage to checking it directly in
> stage2_unmap_put_pte() that I missed, or is this purely for
> readability?

Hi Raghavendra,

What I am wondering about is this sequence: before the stage-2 walk we
decide to defer the TLB flush, but partway through the walk
stage2_unmap_defer_tlb_flush() triggers the WARN_ON() and returns
"don't defer", so the walker flushes the TLB only for the current
ctx->addr. The TLB entries for the PTEs cleared before the WARN_ON()
triggered were never flushed, and once the walk is done, the
stage2_unmap_defer_tlb_flush() call in kvm_pgtable_stage2_unmap()
triggers the WARN_ON() again and skips the range-based flush. Thus
some of the TLB entries are never flushed at all.
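To make the hazard concrete, this is the interleaving I have in mind
(a purely hypothetical timeline, assuming the return value of
__stage2_unmap_defer_tlb_flush() could change mid-walk; the comments
mark what each step would do):

	kvm_pgtable_stage2_unmap()
	  unmap_data.defer_tlb_flush_init = true;      /* decide to defer */
	  kvm_pgtable_walk()
	    stage2_unmap_put_pte()                     /* PTE #1 */
	      stage2_unmap_defer_tlb_flush() -> true   /* per-PTE TLBI skipped */
	    stage2_unmap_put_pte()                     /* PTE #2 */
	      stage2_unmap_defer_tlb_flush() -> false  /* WARN_ON() fires,
	                                                  per-PTE TLBI issued */
	  stage2_unmap_defer_tlb_flush() -> false      /* WARN_ON() fires again,
	                                                  range TLBI skipped */

The TLB entry for PTE #1 is never invalidated.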
If we instead directly check unmap_data->defer_tlb_flush_init, that
value cannot change while walking stage 2, so the WARN_ON() could only
trigger in kvm_pgtable_stage2_unmap()->stage2_unmap_defer_tlb_flush().

I am not sure whether this is right; I just think that once we have
decided to use the range-based TLBI flush, the result of
__stage2_unmap_defer_tlb_flush() should not change, otherwise there
must be some stale TLB entries.
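Concretely, the change I have in mind is something like the below (an
untested sketch on top of your patch, reusing its names, not a tested
change):

static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
				 struct kvm_s2_mmu *mmu,
				 struct kvm_pgtable_mm_ops *mm_ops)
{
	struct stage2_unmap_data *unmap_data = ctx->arg;

	/*
	 * Clear the existing PTE, and perform break-before-make if it
	 * was valid. Honour the deferral decision captured before the
	 * walk started, so that a mid-walk change cannot leave a PTE
	 * with neither a per-PTE nor a range-based TLBI.
	 */
	if (kvm_pte_valid(ctx->old)) {
		kvm_clear_pte(ctx->ptep);

		if (!unmap_data->defer_tlb_flush_init)
			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu,
				     ctx->addr, ctx->level);
	}

	mm_ops->put_page(ctx->ptep);
}

(The WARN_ON() sanity check could then live only in the
stage2_unmap_defer_tlb_flush() call that guards the final range-based
flush in kvm_pgtable_stage2_unmap().)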
Thanks,
Shaoqin

>
>>> +		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu,
>>> +			     ctx->addr, ctx->level);
>> Small indent hint: ctx->addr can be aligned with
>> __kvm_tlb_flush_vmid_ipa.
>>
> Ah, yes. I'll adjust this if I send out a v8.
>
> Thank you.
> Raghavendra
>> Thanks,
>> Shaoqin
>>>  	}
>>>
>>>  	mm_ops->put_page(ctx->ptep);
>>> @@ -1070,7 +1108,8 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
>>>  static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>>>  			       enum kvm_pgtable_walk_flags visit)
>>>  {
>>> -	struct kvm_pgtable *pgt = ctx->arg;
>>> +	struct stage2_unmap_data *unmap_data = ctx->arg;
>>> +	struct kvm_pgtable *pgt = unmap_data->pgt;
>>>  	struct kvm_s2_mmu *mmu = pgt->mmu;
>>>  	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>>>  	kvm_pte_t *childp = NULL;
>>> @@ -1098,7 +1137,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>>>  	 * block entry and rely on the remaining portions being faulted
>>>  	 * back lazily.
>>>  	 */
>>> -	stage2_put_pte(ctx, mmu, mm_ops);
>>> +	stage2_unmap_put_pte(ctx, mmu, mm_ops);
>>>
>>>  	if (need_flush && mm_ops->dcache_clean_inval_poc)
>>>  		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
>>> @@ -1112,13 +1151,23 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>>>
>>>  int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>>>  {
>>> +	int ret;
>>> +	struct stage2_unmap_data unmap_data = {
>>> +		.pgt = pgt,
>>> +		.defer_tlb_flush_init = __stage2_unmap_defer_tlb_flush(pgt),
>>> +	};
>>>  	struct kvm_pgtable_walker walker = {
>>>  		.cb	= stage2_unmap_walker,
>>> -		.arg	= pgt,
>>> +		.arg	= &unmap_data,
>>>  		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
>>>  	};
>>>
>>> -	return kvm_pgtable_walk(pgt, addr, size, &walker);
>>> +	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
>>> +	if (stage2_unmap_defer_tlb_flush(&unmap_data))
>>> +		/* Perform the deferred TLB invalidations */
>>> +		kvm_tlb_flush_vmid_range(pgt->mmu, addr, size);
>>> +
>>> +	return ret;
>>>  }
>>>
>>>  struct stage2_attr_data {
>>
>> --
>> Shaoqin
>

-- 
Shaoqin