Subject: Re: [PATCH v2 00/20] Introduce the TDP MMU
From: Paolo Bonzini
To: Ben Gardon, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Cannon Matthews, Peter Xu, Sean Christopherson, Peter Shier,
    Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
    Vitaly Kuznetsov, Xiao Guangrong
Date: Fri, 16 Oct 2020 18:50:35 +0200
In-Reply-To: <20201014182700.2888246-1-bgardon@google.com>
References: <20201014182700.2888246-1-bgardon@google.com>

On 14/10/20 20:26, Ben Gardon wrote:
>  arch/x86/include/asm/kvm_host.h |   14 +
>  arch/x86/kvm/Makefile           |    3 +-
>  arch/x86/kvm/mmu/mmu.c          |  487 +++++++------
>  arch/x86/kvm/mmu/mmu_internal.h |  242 +++++++
>  arch/x86/kvm/mmu/paging_tmpl.h  |    3 +-
>  arch/x86/kvm/mmu/tdp_iter.c     |  181 +++++
>  arch/x86/kvm/mmu/tdp_iter.h     |   60 ++
>  arch/x86/kvm/mmu/tdp_mmu.c      | 1154 +++++++++++++++++++++++++++++++
>  arch/x86/kvm/mmu/tdp_mmu.h      |   48 ++
>  include/linux/kvm_host.h        |    2 +
>  virt/kvm/kvm_main.c             |   12 +-
>  11 files changed, 1944 insertions(+), 262 deletions(-)
>  create mode 100644 arch/x86/kvm/mmu/tdp_iter.c
>  create mode 100644 arch/x86/kvm/mmu/tdp_iter.h
>  create mode 100644 arch/x86/kvm/mmu/tdp_mmu.c
>  create mode 100644 arch/x86/kvm/mmu/tdp_mmu.h
> 

My implementation of tdp_iter_set_spte was completely different, but of
course that's not an issue; I would still like to understand, and see a
comment explaining, why the bool arguments to __tdp_mmu_set_spte are
needed.

Apart from splitting tdp_mmu_iter_flush_cond_resched from
tdp_mmu_iter_cond_resched, my remaining changes on top are pretty small
and mostly cosmetic.  I'll give it another go next week and send it
Linus's way if everything's all right.
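To make the intent of that split concrete: some walks zap live SPTEs
and must therefore guarantee a TLB flush before kvm->mmu_lock is
dropped, while others only need to reschedule.  Roughly, the caller-side
pattern looks like this (a sketch only; zap_example is a made-up name,
the real user is zap_gfn_range in the diff below):

	/*
	 * The flush variant flushes remote TLBs itself whenever it is
	 * about to drop kvm->mmu_lock, and returns whether the caller
	 * is still responsible for a final flush after the walk.
	 */
	static bool zap_example(struct kvm *kvm, struct tdp_iter *iter)
	{
		bool flush_needed;

		tdp_mmu_set_spte(kvm, iter, 0);	/* zap a live SPTE */
		flush_needed = tdp_mmu_iter_flush_cond_resched(kvm, iter);

		/* false only if the helper yielded and already flushed */
		return flush_needed;
	}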
Paolo

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index f8525c89fc95..baf260421a56 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -7,20 +7,15 @@
 #include "tdp_mmu.h"
 #include "spte.h"
 
+#ifdef CONFIG_X86_64
 static bool __read_mostly tdp_mmu_enabled = false;
+module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0644);
+#endif
 
 static bool is_tdp_mmu_enabled(void)
 {
 #ifdef CONFIG_X86_64
-	if (!READ_ONCE(tdp_mmu_enabled))
-		return false;
-
-	if (WARN_ONCE(!tdp_enabled,
-		      "Creating a VM with TDP MMU enabled requires TDP."))
-		return false;
-
-	return true;
-
+	return tdp_enabled && READ_ONCE(tdp_mmu_enabled);
 #else
 	return false;
 #endif /* CONFIG_X86_64 */
@@ -277,8 +277,8 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 		unaccount_huge_nx_page(kvm, sp);
 
 		for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
-			old_child_spte = *(pt + i);
-			*(pt + i) = 0;
+			old_child_spte = READ_ONCE(*(pt + i));
+			WRITE_ONCE(*(pt + i), 0);
 			handle_changed_spte(kvm, as_id,
 				gfn + (i * KVM_PAGES_PER_HPAGE(level - 1)),
 				old_child_spte, 0, level - 1);
@@ -309,7 +309,7 @@ static inline void __tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
 	struct kvm_mmu_page *root = sptep_to_sp(root_pt);
 	int as_id = kvm_mmu_page_as_id(root);
 
-	*iter->sptep = new_spte;
+	WRITE_ONCE(*iter->sptep, new_spte);
 
 	__handle_changed_spte(kvm, as_id, iter->gfn, iter->old_spte, new_spte,
 			      iter->level);
@@ -361,16 +361,28 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
 	for_each_tdp_pte(_iter, __va(_mmu->root_hpa), \
 			 _mmu->shadow_root_level, _start, _end)
 
-static bool tdp_mmu_iter_cond_resched(struct kvm *kvm, struct tdp_iter *iter)
+/*
+ * Flush the TLB if the process should drop kvm->mmu_lock.
+ * Return whether the caller still needs to flush the tlb.
+ */
+static bool tdp_mmu_iter_flush_cond_resched(struct kvm *kvm, struct tdp_iter *iter)
 {
 	if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
 		kvm_flush_remote_tlbs(kvm);
 		cond_resched_lock(&kvm->mmu_lock);
 		tdp_iter_refresh_walk(iter);
+		return false;
+	} else {
 		return true;
 	}
+}
 
-	return false;
+static void tdp_mmu_iter_cond_resched(struct kvm *kvm, struct tdp_iter *iter)
+{
+	if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
+		cond_resched_lock(&kvm->mmu_lock);
+		tdp_iter_refresh_walk(iter);
+	}
 }
 
 /*
@@ -407,7 +419,7 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 		tdp_mmu_set_spte(kvm, &iter, 0);
 
 		if (can_yield)
-			flush_needed = !tdp_mmu_iter_cond_resched(kvm, &iter);
+			flush_needed = tdp_mmu_iter_flush_cond_resched(kvm, &iter);
 		else
 			flush_needed = true;
 	}
@@ -479,7 +479,10 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
 					 map_writable, !shadow_accessed_mask,
 					 &new_spte);
 
-	tdp_mmu_set_spte(vcpu->kvm, iter, new_spte);
+	if (new_spte == iter->old_spte)
+		ret = RET_PF_SPURIOUS;
+	else
+		tdp_mmu_set_spte(vcpu->kvm, iter, new_spte);
 
 	/*
 	 * If the page fault was caused by a write but the page is write
@@ -496,7 +496,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
 	}
 
 	/* If a MMIO SPTE is installed, the MMIO will need to be emulated. */
-	if (unlikely(is_mmio_spte(new_spte)))
+	else if (unlikely(is_mmio_spte(new_spte)))
 		ret = RET_PF_EMULATE;
 
 	trace_kvm_mmu_set_spte(iter->level, iter->gfn, iter->sptep);
@@ -528,8 +528,10 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	int level;
 	int req_level;
 
-	BUG_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa));
-	BUG_ON(!is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa));
+	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
+		return RET_PF_RETRY;
+	if (WARN_ON(!is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa)))
+		return RET_PF_RETRY;
 
 	level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn,
 					huge_page_disallowed, &req_level);
@@ -579,7 +581,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 		}
 	}
 
-	BUG_ON(iter.level != level);
+	if (WARN_ON(iter.level != level))
+		return RET_PF_RETRY;
 
 	ret = tdp_mmu_map_handle_target_level(vcpu, write, map_writable, &iter,
 					      pfn, prefault);
@@ -817,9 +829,8 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm, struct kvm_memory_slot *slot,
 		 */
 		kvm_mmu_get_root(kvm, root);
 
-		spte_set = wrprot_gfn_range(kvm, root, slot->base_gfn,
-				slot->base_gfn + slot->npages, min_level) ||
-			   spte_set;
+		spte_set |= wrprot_gfn_range(kvm, root, slot->base_gfn,
+				slot->base_gfn + slot->npages, min_level);
 
 		kvm_mmu_put_root(kvm, root);
 	}
@@ -886,8 +897,8 @@ bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 		 */
 		kvm_mmu_get_root(kvm, root);
 
-		spte_set = clear_dirty_gfn_range(kvm, root, slot->base_gfn,
-				slot->base_gfn + slot->npages) || spte_set;
+		spte_set |= clear_dirty_gfn_range(kvm, root, slot->base_gfn,
+				slot->base_gfn + slot->npages);
 
 		kvm_mmu_put_root(kvm, root);
 	}
@@ -1009,8 +1020,8 @@ bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot)
 		 */
 		kvm_mmu_get_root(kvm, root);
 
-		spte_set = set_dirty_gfn_range(kvm, root, slot->base_gfn,
-				slot->base_gfn + slot->npages) || spte_set;
+		spte_set |= set_dirty_gfn_range(kvm, root, slot->base_gfn,
+				slot->base_gfn + slot->npages);
 
 		kvm_mmu_put_root(kvm, root);
 	}
@@ -1042,9 +1053,9 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 			continue;
 
 		tdp_mmu_set_spte(kvm, &iter, 0);
-		spte_set = true;
 
-		spte_set = !tdp_mmu_iter_cond_resched(kvm, &iter);
+		spte_set = tdp_mmu_iter_flush_cond_resched(kvm, &iter);
 	}
 
 	if (spte_set)