From: Ben Gardon
Date: Mon, 28 Feb 2022 14:38:15 -0800
Subject: Re: [PATCH v2 4/7] KVM: x86/mmu: Zap only obsolete roots if a root shadow page is zapped
To: Sean Christopherson
Cc: Paolo Bonzini, Christian Borntraeger, Janosch Frank, David Hildenbrand, Claudio Imbrenda, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, LKML, Lai Jiangshan
References: <20220225182248.3812651-1-seanjc@google.com> <20220225182248.3812651-5-seanjc@google.com>
In-Reply-To: <20220225182248.3812651-5-seanjc@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Feb 25, 2022 at 10:22 AM Sean Christopherson wrote:
>
> Zap only obsolete roots when responding to zapping a single root shadow
> page.
> Because KVM keeps root_count elevated when stuffing a previous
> root into its PGD cache, shadowing a 64-bit guest means that zapping any
> root causes all vCPUs to reload all roots, even if their current root is
> not affected by the zap.
>
> For many kernels, zapping a single root is a frequent operation, e.g. in
> Linux it happens whenever an mm is dropped, e.g. process exits, etc...
>

Reviewed-by: Ben Gardon

> Signed-off-by: Sean Christopherson
> ---
>  arch/x86/include/asm/kvm_host.h |  2 +
>  arch/x86/kvm/mmu.h              |  1 +
>  arch/x86/kvm/mmu/mmu.c          | 65 +++++++++++++++++++++++++++++----
>  arch/x86/kvm/x86.c              |  4 +-
>  4 files changed, 63 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 713e08f62385..343041e892c6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -102,6 +102,8 @@
>  #define KVM_REQ_MSR_FILTER_CHANGED	KVM_ARCH_REQ(29)
>  #define KVM_REQ_UPDATE_CPU_DIRTY_LOGGING \
>  	KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> +#define KVM_REQ_MMU_FREE_OBSOLETE_ROOTS \
> +	KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>
>  #define CR0_RESERVED_BITS \
>  	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 1d0c1904d69a..bf8dbc4bb12a 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -80,6 +80,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
>
>  int kvm_mmu_load(struct kvm_vcpu *vcpu);
>  void kvm_mmu_unload(struct kvm_vcpu *vcpu);
> +void kvm_mmu_free_obsolete_roots(struct kvm_vcpu *vcpu);
>  void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
>  void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu);
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 32c6d4b33d03..825996408465 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2310,7 +2310,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
>  				       struct list_head *invalid_list,
>  				       int *nr_zapped)
>  {
> -	bool list_unstable;
> +	bool list_unstable, zapped_root = false;
>
>  	trace_kvm_mmu_prepare_zap_page(sp);
>  	++kvm->stat.mmu_shadow_zapped;
> @@ -2352,14 +2352,20 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
>  		 * in kvm_mmu_zap_all_fast().  Note, is_obsolete_sp() also
>  		 * treats invalid shadow pages as being obsolete.
>  		 */
> -		if (!is_obsolete_sp(kvm, sp))
> -			kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
> +		zapped_root = !is_obsolete_sp(kvm, sp);
>  	}
>
>  	if (sp->lpage_disallowed)
>  		unaccount_huge_nx_page(kvm, sp);
>
>  	sp->role.invalid = 1;
> +
> +	/*
> +	 * Make the request to free obsolete roots after marking the root
> +	 * invalid, otherwise other vCPUs may not see it as invalid.
> +	 */
> +	if (zapped_root)
> +		kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS);
>  	return list_unstable;
>  }
>
> @@ -3947,7 +3953,7 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
>  	 * previous root, then __kvm_mmu_prepare_zap_page() signals all vCPUs
>  	 * to reload even if no vCPU is actively using the root.
>  	 */
> -	if (!sp && kvm_test_request(KVM_REQ_MMU_RELOAD, vcpu))
> +	if (!sp && kvm_test_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
>  		return true;
>
>  	return fault->slot &&
> @@ -4180,8 +4186,8 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
>  	/*
>  	 * It's possible that the cached previous root page is obsolete because
>  	 * of a change in the MMU generation number. However, changing the
> -	 * generation number is accompanied by KVM_REQ_MMU_RELOAD, which will
> -	 * free the root set here and allocate a new one.
> +	 * generation number is accompanied by KVM_REQ_MMU_FREE_OBSOLETE_ROOTS,
> +	 * which will free the root set here and allocate a new one.
>  	 */
>  	kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
>
> @@ -5085,6 +5091,51 @@ void kvm_mmu_unload(struct kvm_vcpu *vcpu)
>  	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
>  }
>
> +static bool is_obsolete_root(struct kvm *kvm, hpa_t root_hpa)
> +{
> +	struct kvm_mmu_page *sp;
> +
> +	if (!VALID_PAGE(root_hpa))
> +		return false;
> +
> +	/*
> +	 * When freeing obsolete roots, treat roots as obsolete if they don't
> +	 * have an associated shadow page.  This does mean KVM will get false
> +	 * positives and free roots that don't strictly need to be freed, but
> +	 * such false positives are relatively rare:
> +	 *
> +	 *  (a) only PAE paging and nested NPT has roots without shadow pages
> +	 *  (b) remote reloads due to a memslot update obsoletes _all_ roots
> +	 *  (c) KVM doesn't track previous roots for PAE paging, and the guest
> +	 *      is unlikely to zap an in-use PGD.
> +	 */
> +	sp = to_shadow_page(root_hpa);
> +	return !sp || is_obsolete_sp(kvm, sp);
> +}
> +
> +static void __kvm_mmu_free_obsolete_roots(struct kvm *kvm, struct kvm_mmu *mmu)
> +{
> +	unsigned long roots_to_free = 0;
> +	int i;
> +
> +	if (is_obsolete_root(kvm, mmu->root.hpa))
> +		roots_to_free |= KVM_MMU_ROOT_CURRENT;
> +
> +	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
> +		if (is_obsolete_root(kvm, mmu->prev_roots[i].hpa))
> +			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
> +	}
> +
> +	if (roots_to_free)
> +		kvm_mmu_free_roots(kvm, mmu, roots_to_free);
> +}
> +
> +void kvm_mmu_free_obsolete_roots(struct kvm_vcpu *vcpu)
> +{
> +	__kvm_mmu_free_obsolete_roots(vcpu->kvm, &vcpu->arch.root_mmu);
> +	__kvm_mmu_free_obsolete_roots(vcpu->kvm, &vcpu->arch.guest_mmu);
> +}
> +
>  static bool need_remote_flush(u64 old, u64 new)
>  {
>  	if (!is_shadow_present_pte(old))
> @@ -5656,7 +5707,7 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
>  	 * Note: we need to do this under the protection of mmu_lock,
>  	 * otherwise, vcpu would purge shadow page but miss tlb flush.
>  	 */
> -	kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
> +	kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS);
>
>  	kvm_zap_obsolete_pages(kvm);
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 579b26ffc124..d6bf0562c4c4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9856,8 +9856,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  				goto out;
>  			}
>  		}
> -		if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
> -			kvm_mmu_unload(vcpu);
> +		if (kvm_check_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
> +			kvm_mmu_free_obsolete_roots(vcpu);
>  		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
>  			__kvm_migrate_timers(vcpu);
>  		if (kvm_check_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu))
> --
> 2.35.1.574.g5d30c73bfb-goog
>
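
For anyone reading this outside the kernel tree, the per-root selection that the patch performs can be approximated by a small standalone C model. This is only an illustrative sketch, not KVM code: toy_root, toy_mmu, the ROOT_* masks and the explicit generation counter are hypothetical stand-ins for KVM's root cache, its KVM_MMU_ROOT_* flags and is_obsolete_sp(), and the model ignores the "root without a shadow page" case described in the patch comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_PREV_ROOTS   3
#define ROOT_CURRENT     (1UL << 0)
#define ROOT_PREVIOUS(i) (1UL << (1 + (i)))
#define INVALID_ROOT     0UL

/* Hypothetical stand-in for one cached root; this is not KVM's layout. */
struct toy_root {
	unsigned long hpa;     /* INVALID_ROOT means no root is cached here */
	unsigned long mmu_gen; /* generation in which the root was created */
	bool invalid;          /* set once the root's shadow page is zapped */
};

/* Hypothetical stand-in for the current root plus the previous-root cache. */
struct toy_mmu {
	struct toy_root root;
	struct toy_root prev_roots[NUM_PREV_ROOTS];
};

/* A root is obsolete if it was zapped or belongs to an older generation. */
static bool toy_is_obsolete_root(unsigned long cur_gen,
				 const struct toy_root *r)
{
	if (r->hpa == INVALID_ROOT)
		return false;

	return r->invalid || r->mmu_gen != cur_gen;
}

/*
 * Build a mask of only the obsolete roots, checking the current root and
 * each previous root individually, so that unaffected roots are kept.
 */
static unsigned long toy_obsolete_roots(unsigned long cur_gen,
					const struct toy_mmu *mmu)
{
	unsigned long roots_to_free = 0;
	int i;

	assert(mmu != NULL);

	if (toy_is_obsolete_root(cur_gen, &mmu->root))
		roots_to_free |= ROOT_CURRENT;

	for (i = 0; i < NUM_PREV_ROOTS; i++) {
		if (toy_is_obsolete_root(cur_gen, &mmu->prev_roots[i]))
			roots_to_free |= ROOT_PREVIOUS(i);
	}

	return roots_to_free;
}
```

In this model, a vCPU whose current root is valid and from the current generation keeps that root even when some of its cached previous roots are selected for freeing, which is the behavior the commit message is after.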