Received: by 2002:a05:6a10:6744:0:0:0:0 with SMTP id w4csp5364825pxu; Wed, 21 Oct 2020 23:31:27 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzyFN8wUP2b/YgiHaBwCZMBttj8BACElFsjCV9gCnZaxn9uIxm5qnHnpxp1KoaZGXs/oT/q X-Received: by 2002:a17:906:d0d4:: with SMTP id bq20mr876646ejb.257.1603348287603; Wed, 21 Oct 2020 23:31:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1603348287; cv=none; d=google.com; s=arc-20160816; b=O3nNInX9WVIK+HETNK/v7gRiVPO1xkLkK3fWBwtModOZiAGtPYvKxWAguAsaEFwuNZ zbqEGEmlk87wQaRfC0vU35jf1/1NvkMW96dwMw5aAc4La35ZcXqXdtci5ts7EoqtL7hj t3qX2DY6bpX+pjV7rEi8azmhk+xGimO/HCpGiZ4/8WBaobPzolqIubO5CYWgMtnIq+e3 c11I4bSjaUn3CtqXyuVO/ymV6DXpH7h7yHkZzg9b8quEQVb8H2sfpnqjHSwZxi1UwMX5 YmojMK3VbRi01Nl8Gg0zq9unvslrBHyyFiifZ4tjKSbti194GnjYjDYjEpK4nxsQfrk5 TfTg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=6xOMd3ovdN5AdHh0Qp+hJcCgYWSh7PiF+dyOCFeIRp8=; b=Cov7hKBG3Y3oQLmprrjstvnWoehxO9OWJL1SBTSnZ7icPNYTqP7yQgppIkuV8tdm8D DUPS5f3JMej3NXIXWuDKhGXOzMyqrVs1/eiw4wSQG8Pi+edUipLoZnzZnfXM6u4Y+KjQ 54FHkZXxN3i1dSYq3+dIeo8uIUYldjI6Mlerl2ncd26k8dev5BYtVRjelpnsgvofBT8R fb7T2dop/vDEKPuSuIZezEh3g3zsnqrCMER1BaDtdVGpvYXUGgiQ3AF3z1ZwjnIi5EJJ ngXfJqlh99VW+HkZ4JIrcJEaypF7wddPe/oVHDNmFtkqb9CQkt0nL64GqCWwtj16Ybqx QGjA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=hpF8jKCd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id f1si292677ejb.362.2020.10.21.23.31.05; Wed, 21 Oct 2020 23:31:27 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=hpF8jKCd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2503828AbgJUSBZ (ORCPT + 99 others); Wed, 21 Oct 2020 14:01:25 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:41338 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2503767AbgJUSBZ (ORCPT ); Wed, 21 Oct 2020 14:01:25 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1603303282; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6xOMd3ovdN5AdHh0Qp+hJcCgYWSh7PiF+dyOCFeIRp8=; b=hpF8jKCdMdUuCTulc5UHMsJyUvYs67sSNoGxeAKhxlgutkqq+t85/2Q0AMQH4WMR+jO35N 7hOCXMxjEPImJ9C4WjjjPt/o/skDQHD0SXjuK5g53Ty52cWW1O/7MCcEhmuO36NIEqTbcp baIiAjEqYY6e9i/oQgwVUDQ1sFx45Kw= Received: from mail-wr1-f71.google.com (mail-wr1-f71.google.com [209.85.221.71]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-269-xNzMDPHdPJmRCRXoPn7BEA-1; Wed, 21 Oct 2020 14:01:20 -0400 X-MC-Unique: xNzMDPHdPJmRCRXoPn7BEA-1 Received: by mail-wr1-f71.google.com with SMTP id k14so3018792wrd.6 for ; Wed, 21 Oct 2020 11:01:20 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=6xOMd3ovdN5AdHh0Qp+hJcCgYWSh7PiF+dyOCFeIRp8=; b=HBuqyqqmG2mAKPuC8FdcHt67O+w9cwau1sbLg6ElFuZOBcg0LaFGTabXSdmw0ryyAr YO/dCKYMoOTSRrtpVetflozETlGIrCSJD9UF/OL2tQLbzdqJVw7T/n3PkpOuAYMpJtRK LqhtEDgqveye3f/ALgceFaPu1PV5D/RqC2nQCqsBl2CsVs6O/EVT+T7HP7pd67user1U ynFYBuAAEymx1k/ApfBPReXvGL/yiw38pFMilLlVxlIOsa+iOZwwYoVE70CGMJsKxPCg 1S5jfbijODMHjtz/AJm3vzpOT0tg7mLBAuxEFXOOQYSyT3VwvwyM55jvHwFajM+QWJZF kxtg== X-Gm-Message-State: AOAM531hGMdtEk+fP1ZyDHpfHfd9k3XwuHf8kDWRuncLf9eHItESaEOH aACVrWZicocAvOE3vHXQRRA78i7GDU1xdDeibb+SfL0qppXlX6h4P9XnNLQYX9+8MuFYbAeiu5o QW7lY1u5TXSCgP1NiaX7ulmmV X-Received: by 2002:adf:e4c8:: with SMTP id v8mr6017458wrm.72.1603303278946; Wed, 21 Oct 2020 11:01:18 -0700 (PDT) X-Received: by 2002:adf:e4c8:: with SMTP id v8mr6017419wrm.72.1603303278631; Wed, 21 Oct 2020 11:01:18 -0700 (PDT) Received: from ?IPv6:2001:b07:6468:f312:c8dd:75d4:99ab:290a? ([2001:b07:6468:f312:c8dd:75d4:99ab:290a]) by smtp.gmail.com with ESMTPSA id y5sm5114130wrw.52.2020.10.21.11.01.17 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 21 Oct 2020 11:01:17 -0700 (PDT) Subject: Re: [PATCH v2 04/20] kvm: x86/mmu: Allocate and free TDP MMU roots To: Ben Gardon , Yu Zhang Cc: LKML , kvm , Cannon Matthews , Peter Xu , Sean Christopherson , Peter Shier , Peter Feiner , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong References: <20201014182700.2888246-1-bgardon@google.com> <20201014182700.2888246-5-bgardon@google.com> <20201021150917.xkiq74pbb63rqxvu@linux.intel.com> From: Paolo Bonzini Message-ID: Date: Wed, 21 Oct 2020 20:01:14 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.3.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 21/10/20 19:54, Ben Gardon wrote: > On Wed, Oct 21, 2020 at 8:09 AM Yu Zhang wrote: >> >> On Wed, Oct 14, 2020 at 11:26:44AM -0700, Ben Gardon wrote: >>> The TDP MMU must be able to allocate paging structure root pages and track >>> the usage of those pages. Implement a similar, but separate system for root >>> page allocation to that of the x86 shadow paging implementation. When >>> future patches add synchronization model changes to allow for parallel >>> page faults, these pages will need to be handled differently from the >>> x86 shadow paging based MMU's root pages. >>> >>> Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell >>> machine. This series introduced no new failures. >>> >>> This series can be viewed in Gerrit at: >>> https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 >>> >>> Signed-off-by: Ben Gardon >>> --- >>> arch/x86/include/asm/kvm_host.h | 1 + >>> arch/x86/kvm/mmu/mmu.c | 29 +++++--- >>> arch/x86/kvm/mmu/mmu_internal.h | 24 +++++++ >>> arch/x86/kvm/mmu/tdp_mmu.c | 114 ++++++++++++++++++++++++++++++++ >>> arch/x86/kvm/mmu/tdp_mmu.h | 5 ++ >>> 5 files changed, 162 insertions(+), 11 deletions(-) >>> >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h >>> index 6b6dbc20ce23a..e0ec1dd271a32 100644 >>> --- a/arch/x86/include/asm/kvm_host.h >>> +++ b/arch/x86/include/asm/kvm_host.h >>> @@ -989,6 +989,7 @@ struct kvm_arch { >>> * operations. >>> */ >>> bool tdp_mmu_enabled; >>> + struct list_head tdp_mmu_roots; >>> }; >>> >>> struct kvm_vm_stat { >>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c >>> index f53d29e09367c..a3340ed59ad1d 100644 >>> --- a/arch/x86/kvm/mmu/mmu.c >>> +++ b/arch/x86/kvm/mmu/mmu.c >>> @@ -144,11 +144,6 @@ module_param(dbg, bool, 0644); >>> #define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \ >>> | shadow_x_mask | shadow_nx_mask | shadow_me_mask) >>> >>> -#define ACC_EXEC_MASK 1 >>> -#define ACC_WRITE_MASK PT_WRITABLE_MASK >>> -#define ACC_USER_MASK PT_USER_MASK >>> -#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK) >>> - >>> /* The mask for the R/X bits in EPT PTEs */ >>> #define PT64_EPT_READABLE_MASK 0x1ull >>> #define PT64_EPT_EXECUTABLE_MASK 0x4ull >>> @@ -209,7 +204,7 @@ struct kvm_shadow_walk_iterator { >>> __shadow_walk_next(&(_walker), spte)) >>> >>> static struct kmem_cache *pte_list_desc_cache; >>> -static struct kmem_cache *mmu_page_header_cache; >>> +struct kmem_cache *mmu_page_header_cache; >>> static struct percpu_counter kvm_total_used_mmu_pages; >>> >>> static u64 __read_mostly shadow_nx_mask; >>> @@ -3588,9 +3583,13 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa, >>> return; >>> >>> sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK); >>> - --sp->root_count; >>> - if (!sp->root_count && sp->role.invalid) >>> - kvm_mmu_prepare_zap_page(kvm, sp, invalid_list); >>> + >>> + if (kvm_mmu_put_root(sp)) { >>> + if (sp->tdp_mmu_page) >>> + kvm_tdp_mmu_free_root(kvm, sp); >>> + else if (sp->role.invalid) >>> + kvm_mmu_prepare_zap_page(kvm, sp, invalid_list); >>> + } >>> >>> *root_hpa = INVALID_PAGE; >>> } >>> @@ -3680,8 +3679,16 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) >>> hpa_t root; >>> unsigned i; >>> >>> - if (shadow_root_level >= PT64_ROOT_4LEVEL) { >>> - root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level, true); >>> + if (vcpu->kvm->arch.tdp_mmu_enabled) { >>> + root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu); >>> + >>> + if (!VALID_PAGE(root)) >>> + return -ENOSPC; >>> + vcpu->arch.mmu->root_hpa = root; >>> + } else if (shadow_root_level >= PT64_ROOT_4LEVEL) { >>> + root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level, >>> + true); >>> + >>> if (!VALID_PAGE(root)) >>> return -ENOSPC; >>> vcpu->arch.mmu->root_hpa = root; >>> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h >>> index 74ccbf001a42e..6cedf578c9a8d 100644 >>> --- a/arch/x86/kvm/mmu/mmu_internal.h >>> +++ b/arch/x86/kvm/mmu/mmu_internal.h >>> @@ -43,8 +43,12 @@ struct kvm_mmu_page { >>> >>> /* Number of writes since the last time traversal visited this page. */ >>> atomic_t write_flooding_count; >>> + >>> + bool tdp_mmu_page; >>> }; >>> >>> +extern struct kmem_cache *mmu_page_header_cache; >>> + >>> static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page) >>> { >>> struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT); >>> @@ -96,6 +100,11 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm, >>> (PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \ >>> * PT64_LEVEL_BITS))) - 1)) >>> >>> +#define ACC_EXEC_MASK 1 >>> +#define ACC_WRITE_MASK PT_WRITABLE_MASK >>> +#define ACC_USER_MASK PT_USER_MASK >>> +#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK) >>> + >>> /* Functions for interpreting SPTEs */ >>> static inline bool is_mmio_spte(u64 spte) >>> { >>> @@ -126,4 +135,19 @@ static inline kvm_pfn_t spte_to_pfn(u64 pte) >>> return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; >>> } >>> >>> +static inline void kvm_mmu_get_root(struct kvm_mmu_page *sp) >>> +{ >>> + BUG_ON(!sp->root_count); >>> + >>> + ++sp->root_count; >>> +} >>> + >>> +static inline bool kvm_mmu_put_root(struct kvm_mmu_page *sp) >>> +{ >>> + --sp->root_count; >>> + >>> + return !sp->root_count; >>> +} >>> + >>> + >>> #endif /* __KVM_X86_MMU_INTERNAL_H */ >>> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c >>> index b3809835e90b1..09a84a6e157b6 100644 >>> --- a/arch/x86/kvm/mmu/tdp_mmu.c >>> +++ b/arch/x86/kvm/mmu/tdp_mmu.c >>> @@ -1,5 +1,7 @@ >>> // SPDX-License-Identifier: GPL-2.0 >>> >>> +#include "mmu.h" >>> +#include "mmu_internal.h" >>> #include "tdp_mmu.h" >>> >>> static bool __read_mostly tdp_mmu_enabled = false; >>> @@ -29,10 +31,122 @@ void kvm_mmu_init_tdp_mmu(struct kvm *kvm) >>> >>> /* This should not be changed for the lifetime of the VM. */ >>> kvm->arch.tdp_mmu_enabled = true; >>> + >>> + INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots); >>> } >>> >>> void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) >>> { >>> if (!kvm->arch.tdp_mmu_enabled) >>> return; >>> + >>> + WARN_ON(!list_empty(&kvm->arch.tdp_mmu_roots)); >>> +} >>> + >>> +#define for_each_tdp_mmu_root(_kvm, _root) \ >>> + list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link) >>> + >>> +bool is_tdp_mmu_root(struct kvm *kvm, hpa_t hpa) >>> +{ >>> + struct kvm_mmu_page *sp; >>> + >>> + sp = to_shadow_page(hpa); >>> + >>> + return sp->tdp_mmu_page && sp->root_count; >>> +} >>> + >>> +void kvm_tdp_mmu_free_root(struct kvm *kvm, struct kvm_mmu_page *root) >>> +{ >>> + lockdep_assert_held(&kvm->mmu_lock); >>> + >>> + WARN_ON(root->root_count); >>> + WARN_ON(!root->tdp_mmu_page); >>> + >>> + list_del(&root->link); >>> + >>> + free_page((unsigned long)root->spt); >>> + kmem_cache_free(mmu_page_header_cache, root); >>> +} >>> + >>> +static void put_tdp_mmu_root(struct kvm *kvm, struct kvm_mmu_page *root) >>> +{ >>> + if (kvm_mmu_put_root(root)) >>> + kvm_tdp_mmu_free_root(kvm, root); >>> +} >>> + >>> +static void get_tdp_mmu_root(struct kvm *kvm, struct kvm_mmu_page *root) >>> +{ >>> + lockdep_assert_held(&kvm->mmu_lock); >>> + >>> + kvm_mmu_get_root(root); >>> +} >>> + >>> +static union kvm_mmu_page_role page_role_for_level(struct kvm_vcpu *vcpu, >>> + int level) >>> +{ >>> + union kvm_mmu_page_role role; >>> + >>> + role = vcpu->arch.mmu->mmu_role.base; >>> + role.level = vcpu->arch.mmu->shadow_root_level; >> >> role.level = level; >> The role will be calculated for non root pages later. > > Thank you for catching that Yu, that was definitely an error! > I'm guessing this never showed up in my testing because I don't think > the TDP MMU actually uses role.level for anything other than root > pages. I'll fix it up, thanks to both! Paolo