From: Maxim Levitsky
To: kvm@vger.kernel.org
Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
    Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
    Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
    Babu Moger, linux-kernel@vger.kernel.org, Jing Liu, Wyes Karny,
    x86@kernel.org, "H. Peter Anvin", Sean Christopherson,
    Maxim Levitsky, Santosh Shukla
Subject: [PATCH v2 10/11] KVM: SVM: implement support for vNMI
Date: Tue, 29 Nov 2022 21:37:16 +0200
Message-Id: <20221129193717.513824-11-mlevitsk@redhat.com>
In-Reply-To: <20221129193717.513824-1-mlevitsk@redhat.com>
References: <20221129193717.513824-1-mlevitsk@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

This patch implements support for injecting pending NMIs via the
.kvm_x86_set_hw_nmi_pending callback, using AMD's new vNMI feature.

Note that a vNMI can't cause a VM exit, which is needed when a nested
guest intercepts NMIs.
Therefore, to avoid breaking nesting, the vNMI is inhibited while a
nested guest is running, and the legacy NMI window detection and
delivery method is used instead.

While it is possible to pass the vNMI through when a nested guest
doesn't intercept NMIs, such usage is very uncommon and not worth
optimizing for.

Signed-off-by: Santosh Shukla
Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/svm/nested.c |  42 +++++++++++++++
 arch/x86/kvm/svm/svm.c    | 111 ++++++++++++++++++++++++++++++--------
 arch/x86/kvm/svm/svm.h    |  10 ++++
 3 files changed, 140 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index e891318595113e..5bea672bf8b12d 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -623,6 +623,42 @@ static bool is_evtinj_nmi(u32 evtinj)
 	return type == SVM_EVTINJ_TYPE_NMI;
 }
 
+static void nested_svm_save_vnmi(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb01 = svm->vmcb01.ptr;
+
+	/*
+	 * Copy the vNMI state back to software NMI tracking state
+	 * for the duration of the nested run
+	 */
+
+	svm->nmi_masked = vmcb01->control.int_ctl & V_NMI_MASK;
+	svm->vcpu.arch.nmi_pending += vmcb01->control.int_ctl & V_NMI_PENDING;
+}
+
+static void nested_svm_restore_vnmi(struct vcpu_svm *svm)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct vmcb *vmcb01 = svm->vmcb01.ptr;
+
+	/*
+	 * Restore the vNMI state from the software NMI tracking state
+	 * after a nested run
+	 */
+
+	if (svm->nmi_masked)
+		vmcb01->control.int_ctl |= V_NMI_MASK;
+	else
+		vmcb01->control.int_ctl &= ~V_NMI_MASK;
+
+	if (vcpu->arch.nmi_pending) {
+		vcpu->arch.nmi_pending--;
+		vmcb01->control.int_ctl |= V_NMI_PENDING;
+	} else
+		vmcb01->control.int_ctl &= ~V_NMI_PENDING;
+}
+
+
 static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 					  unsigned long vmcb12_rip,
 					  unsigned long vmcb12_csbase)
@@ -646,6 +682,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 	else
 		int_ctl_vmcb01_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
 
+	if (vnmi)
+		nested_svm_save_vnmi(svm);
+
 	/* Copied from vmcb01. msrpm_base can be overwritten later. */
 	vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
 	vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
@@ -1049,6 +1088,9 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 		svm_update_lbrv(vcpu);
 	}
 
+	if (vnmi)
+		nested_svm_restore_vnmi(svm);
+
 	/*
 	 * On vmexit the GIF is set to false and
 	 * no event can be injected in L1.
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cfed6ab29c839a..bf10adcf3170a8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -230,6 +230,8 @@ module_param(dump_invalid_vmcb, bool, 0644);
 bool intercept_smi = true;
 module_param(intercept_smi, bool, 0444);
 
+bool vnmi = true;
+module_param(vnmi, bool, 0444);
 
 static bool svm_gp_erratum_intercept = true;
@@ -1299,6 +1301,9 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm, vmcb);
 
+	if (vnmi)
+		svm->vmcb->control.int_ctl |= V_NMI_ENABLE;
+
 	if (vgif) {
 		svm_clr_intercept(svm, INTERCEPT_STGI);
 		svm_clr_intercept(svm, INTERCEPT_CLGI);
@@ -3487,6 +3492,39 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
 	++vcpu->stat.nmi_injections;
 }
 
+
+static bool svm_get_hw_nmi_pending(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (!is_vnmi_enabled(svm))
+		return false;
+
+	return !!(svm->vmcb->control.int_ctl & V_NMI_MASK);
+}
+
+static bool svm_set_hw_nmi_pending(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (!is_vnmi_enabled(svm))
+		return false;
+
+	if (svm->vmcb->control.int_ctl & V_NMI_PENDING)
+		return false;
+
+	svm->vmcb->control.int_ctl |= V_NMI_PENDING;
+	vmcb_mark_dirty(svm->vmcb, VMCB_INTR);
+
+	/*
+	 * NMI isn't yet technically injected but
+	 * this rough estimation should be good enough
+	 */
+	++vcpu->stat.nmi_injections;
+
+	return true;
+}
+
 static void svm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -3582,11 +3620,38 @@ static void svm_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 		svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
 }
 
+static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (is_vnmi_enabled(svm))
+		return svm->vmcb->control.int_ctl & V_NMI_MASK;
+	else
+		return svm->nmi_masked;
+}
+
+static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (is_vnmi_enabled(svm)) {
+		if (masked)
+			svm->vmcb->control.int_ctl |= V_NMI_MASK;
+		else
+			svm->vmcb->control.int_ctl &= ~V_NMI_MASK;
+	} else {
+		svm->nmi_masked = masked;
+		if (masked)
+			svm_enable_iret_interception(svm);
+		else
+			svm_disable_iret_interception(svm);
+	}
+}
+
 bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb *vmcb = svm->vmcb;
-	bool ret;
 
 	if (!gif_set(svm))
 		return true;
@@ -3594,10 +3659,10 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
 	if (is_guest_mode(vcpu) && nested_exit_on_nmi(svm))
 		return false;
 
-	ret = (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
-	      (svm->nmi_masked);
+	if (svm_get_nmi_mask(vcpu))
+		return true;
 
-	return ret;
+	return vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK;
 }
 
 static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
@@ -3615,24 +3680,6 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 	return 1;
 }
 
-static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
-{
-	return to_svm(vcpu)->nmi_masked;
-}
-
-static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-
-	if (masked) {
-		svm->nmi_masked = true;
-		svm_enable_iret_interception(svm);
-	} else {
-		svm->nmi_masked = false;
-		svm_disable_iret_interception(svm);
-	}
-}
-
 bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -3725,10 +3772,16 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
 	/*
 	 * Something prevents NMI from been injected. Single step over possible
 	 * problem (IRET or exception injection or interrupt shadow)
+	 *
+	 * With vNMI we should never need an NMI window
+	 * (we can always inject vNMI either by setting VNMI_PENDING or by EVENTINJ)
 	 */
+	if (WARN_ON_ONCE(is_vnmi_enabled(svm)))
+		return;
+
 	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
-	svm->nmi_singlestep = true;
 	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
+	svm->nmi_singlestep = true;
 }
 
 static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
@@ -4770,6 +4823,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.patch_hypercall = svm_patch_hypercall,
 	.inject_irq = svm_inject_irq,
 	.inject_nmi = svm_inject_nmi,
+	.get_hw_nmi_pending = svm_get_hw_nmi_pending,
+	.set_hw_nmi_pending = svm_set_hw_nmi_pending,
 	.inject_exception = svm_inject_exception,
 	.cancel_injection = svm_cancel_injection,
 	.interrupt_allowed = svm_interrupt_allowed,
@@ -5058,6 +5113,16 @@ static __init int svm_hardware_setup(void)
 			pr_info("Virtual GIF supported\n");
 	}
 
+
+	vnmi = vgif && vnmi && boot_cpu_has(X86_FEATURE_AMD_VNMI);
+	if (vnmi)
+		pr_info("Virtual NMI enabled\n");
+
+	if (!vnmi) {
+		svm_x86_ops.get_hw_nmi_pending = NULL;
+		svm_x86_ops.set_hw_nmi_pending = NULL;
+	}
+
 	if (lbrv) {
 		if (!boot_cpu_has(X86_FEATURE_LBRV))
 			lbrv = false;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 587ddc150f9f34..0b7e1790fadde1 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 extern bool npt_enabled;
 extern int vgif;
 extern bool intercept_smi;
+extern bool vnmi;
 
 enum avic_modes {
 	AVIC_MODE_NONE = 0,
@@ -553,6 +554,15 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
 	       (msr < (APIC_BASE_MSR + 0x100));
 }
 
+static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
+{
+	/* L1's vNMI is inhibited while nested guest is running */
+	if (is_guest_mode(&svm->vcpu))
+		return false;
+
+	return !!(svm->vmcb01.ptr->control.int_ctl & V_NMI_ENABLE);
+}
+
 /* svm.c */
 
 #define MSR_INVALID				0xffffffffU
-- 
2.26.3
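[Not part of the patch: the save/restore handshake in nested_svm_save_vnmi()/nested_svm_restore_vnmi() can be sketched as a tiny user-space model. Everything here is hypothetical illustration: the `toy_vcpu` struct and the bit positions are made up, standing in for `vmcb01->control.int_ctl`, `svm->nmi_masked` and `vcpu->arch.nmi_pending`. The model increments the pending count by one when V_NMI_PENDING is set, a simplification of the patch's `+=` of the raw bit value.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only; the real
 * definitions live in arch/x86/include/asm/svm.h. */
#define V_NMI_PENDING	(1u << 11)
#define V_NMI_MASK	(1u << 12)

struct toy_vcpu {
	uint32_t int_ctl;	/* models vmcb01->control.int_ctl */
	bool	 nmi_masked;	/* models svm->nmi_masked */
	unsigned nmi_pending;	/* models vcpu->arch.nmi_pending */
};

/* Mirrors nested_svm_save_vnmi(): on nested VMRUN, fold the hardware
 * vNMI bits into the software tracking state, since L1's vNMI is
 * inhibited while L2 runs. */
static void toy_save_vnmi(struct toy_vcpu *v)
{
	v->nmi_masked = v->int_ctl & V_NMI_MASK;
	if (v->int_ctl & V_NMI_PENDING)
		v->nmi_pending++;
}

/* Mirrors nested_svm_restore_vnmi(): on nested VM exit, push the
 * software state back into the vNMI bits; at most one NMI can be
 * pending in hardware at a time. */
static void toy_restore_vnmi(struct toy_vcpu *v)
{
	if (v->nmi_masked)
		v->int_ctl |= V_NMI_MASK;
	else
		v->int_ctl &= ~V_NMI_MASK;

	if (v->nmi_pending) {
		v->nmi_pending--;
		v->int_ctl |= V_NMI_PENDING;
	} else {
		v->int_ctl &= ~V_NMI_PENDING;
	}
}
```

Note the asymmetry the model makes visible: save accumulates into a counter (several NMIs may collapse into software state), while restore can re-arm only one V_NMI_PENDING bit per exit; any remaining count is delivered by later iterations of KVM's normal NMI injection path.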