Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp7931948rwb; Wed, 23 Nov 2022 12:49:22 -0800 (PST) X-Google-Smtp-Source: AA0mqf6HHpt/XQzp4dW/YDmTY2ifrlTqGeaJHzVNYpkEQ8oHEl8diU5MnD0eGrDBXMiokoHq1kbW X-Received: by 2002:aa7:d3cb:0:b0:461:8d98:3d6d with SMTP id o11-20020aa7d3cb000000b004618d983d6dmr13293713edr.116.1669236562526; Wed, 23 Nov 2022 12:49:22 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669236562; cv=none; d=google.com; s=arc-20160816; b=uB72ToDglxagw+qvmHzNMQG1yoT2Lksd1EEAuOBEkJuXz/WA/LRpYbuJLewcmDos+Q isPy5KpMo3xSezqHj7Grt6mt4rMHLy0nb3nqg0NMs4kpYsB2Pvf8nXgXt7oawb2tmZB5 GhrT43t2QztRR+SjYyNy7pbOOw4DxorPFc1+OAT4act474joQVaxTJKfWBXj3XrgZVIs 80p66ke8qHtd+UhFrRWyR4/nzL7aZTagRp5cPmU1HGfurLsflxMxFXQDp91YnS71mJXi tOrffUJ2uLONilDm0Vuc0I3PdRB+ixaZw4wDhNKyG8QqFAk+VMZDWFJFtG8ItDYE7OuO MNCg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=0Jcy3UuErFEL/uxbNGmrLvWKbMJqvOa2yNcvcH8igpg=; b=OEmzqP92bH49hFQ20Kpi+RYPXF52uh3KzLBEhbQqE6XG5N9VI2/CP6+VbLXDL4R0wG WoOEgcJr1H729a4UFLS8Hqqk5PFpcQhzabg9b3mvw0d6tDoG7ovv7yRQqyaKx1BV7DaD kk/YakXQJanOydHx2iKnvmm15TZsAr4OOsYb8/WFfsCREDpMjA45bQH8ToD8HV/sb3IA 19MRT2EiGooO3ogbiafSQPanRIckyvPFLGQQs/tSWbFTNZO4AK4OKlKDkVWGyCVWWUyH GTk7P+Yzm5e6lGa9IsYUPEaH5fOKf+Lse+cJtHw5uSOeXuUmMdBz/oVvfnL91EktNh7k tYhw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=ZLiw78UQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id m11-20020a170906258b00b007ad8bd5a3b2si11716889ejb.263.2022.11.23.12.48.59; Wed, 23 Nov 2022 12:49:22 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=ZLiw78UQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239187AbiKWUnI (ORCPT + 89 others); Wed, 23 Nov 2022 15:43:08 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41104 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238614AbiKWUm6 (ORCPT ); Wed, 23 Nov 2022 15:42:58 -0500 Received: from mail-pf1-x434.google.com (mail-pf1-x434.google.com [IPv6:2607:f8b0:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 902DBE033 for ; Wed, 23 Nov 2022 12:42:56 -0800 (PST) Received: by mail-pf1-x434.google.com with SMTP id b29so18376993pfp.13 for ; Wed, 23 Nov 2022 12:42:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=0Jcy3UuErFEL/uxbNGmrLvWKbMJqvOa2yNcvcH8igpg=; b=ZLiw78UQJAxB7mQsrvnaGSNRr5DMKmak5zi+VeO48Wfj0qfU9Ktx+DM5+1oBhmDMXA J8vpxu4xzLXaacGdmPlzJe0QgG1VEfmI5omBlvCbcdwRGtgZ1SWDyniBMRd2LxdQWEIy 1hFQaA1DeG1GOSfWXtzejQ5tMpjw6MmGtOVWN01k2PqkLHhjjUZ6vqxqnMUNHbFKg2xn epMZsKnzPIGFT5ivk3eDfUmn9s0Nn5SLd1r8ZL5HyhliGC3f6bTseQmNCR8LilQgkCAG hPQm0YBNwPUe6GJYlneJPIJReHfJPjPUCqH1gg85R7tLwifjnl4cqUiqtciAWhvHxXax LuAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=0Jcy3UuErFEL/uxbNGmrLvWKbMJqvOa2yNcvcH8igpg=; b=8IaESwp2zj8hon3Mx5GNqxR4RHX0KnnXQXlQZEn0ewzJZTdunJxn1Xmr8CcqdlKw2S sP6GcDsLMj/nalZPPNjZpoDym5Hl/+aXSGNI5+cOKyi69Utk5sjsFC0x4mTK0WsQCui/ c96fzkYHBRism0bQL/lqLKbC0BG2d2F/5KIbFxqSfViPL/1zSf6ucEK7VRMTdUSOZW0o aCKzkpGZFrOgFa5e73hhnPHS8AiVS1S/qF0jIM+8qdKR2hkruS57cNYJ6ourtAKDkQ0u Do/3BkqueMPsdNtXL+oVFvp2o6uyXhUik1d5BH4XpW0gQRb0pfYcmBsCccLfH4/v5Ada 2WmA== X-Gm-Message-State: ANoB5pnkaIo7pS6ADtTFfYD+P87hfagNQMLut0CjKxDo989ab43hXVnM VjEhFjjHzxOho+9rEw+dsIIagQ== X-Received: by 2002:a63:dc45:0:b0:44e:46f9:7eeb with SMTP id f5-20020a63dc45000000b0044e46f97eebmr9659477pgj.3.1669236175467; Wed, 23 Nov 2022 12:42:55 -0800 (PST) Received: from google.com (7.104.168.34.bc.googleusercontent.com. [34.168.104.7]) by smtp.gmail.com with ESMTPSA id b13-20020aa7950d000000b0056bc742d21esm13488945pfp.176.2022.11.23.12.42.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 23 Nov 2022 12:42:55 -0800 (PST) Date: Wed, 23 Nov 2022 20:42:51 +0000 From: Sean Christopherson To: "Li, Xin3" Cc: Peter Zijlstra , Paolo Bonzini , "linux-kernel@vger.kernel.org" , "x86@kernel.org" , "kvm@vger.kernel.org" , "tglx@linutronix.de" , "mingo@redhat.com" , "bp@alien8.de" , "dave.hansen@linux.intel.com" , "hpa@zytor.com" , "Tian, Kevin" Subject: Re: [RESEND PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Nov 23, 2022, Li, Xin3 wrote: > > Actually, SVM already enables NMIs far earlier, which means the probability of > > breaking something by moving NMI handling earlier is lower. Not zero, as I > > don't trust that SVM gets all the corner right, but definitely low. Duh. Moving NMI handling much earlier in VMX's VM-Exit path _must_ be safe, otherwise we already have problems. On VMX, NMIs are enabled right up until VM-Enter (VMLAUNCH/VMRESUME), and unless the VM-Exit is due to an NMI, NMIs are enabled immediately after VM-Exit. This case is problematic because the CPU can end up running the NMI handler with NMIs enabled, not because it might run with incorrect state loaded. > > E.g. this should be doable > > Do we need to recover *all* host states before switching to NMI stack and > handler? As above, no. > if not what is the minimal set? The absolute minimal set is what is context switched via the VMCS. > If we restore the minimal set and do "int $2" as early as possible, is it a > quick and dirty approach? No, it is simply "the approach". If there are other problems with respect to noinstr, then they can/should be fixed separately. > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index > > cea8c07f5229..2fec93f38960 100644 > > --- a/arch/x86/kvm/vmx/vmx.c > > +++ b/arch/x86/kvm/vmx/vmx.c > > @@ -7249,6 +7249,8 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu > > *vcpu) > > if (unlikely(vmx->exit_reason.failed_vmentry)) > > return EXIT_FASTPATH_NONE; > > > > + > > + > > vmx->loaded_vmcs->launched = 1; > > > > vmx_recover_nmi_blocking(vmx); > > > > > > thouh we'd like want a fair bit of refactoring so that all of vmx_vcpu_run() and > > svm_vcpu_run() don't need to be noinstr. For the record, svm_vcpu_run() is fine, at least as far as NMIs are concerned. > This sounds reasonable to me, however from Documentation/core-api/entry.rst, > we do need it. Why do you say that? I believe this is what we should end up with. Compile tested only, and needs to split up into 4+ patches. I'll test and massage this into a proper series next week (US holiday the rest of this week). --- arch/x86/kvm/kvm_cache_regs.h | 16 +++++------ arch/x86/kvm/vmx/vmenter.S | 4 +-- arch/x86/kvm/vmx/vmx.c | 51 ++++++++++++++++++----------------- arch/x86/kvm/vmx/vmx.h | 2 +- arch/x86/kvm/x86.h | 6 ++--- 5 files changed, 41 insertions(+), 38 deletions(-) diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h index c09174f73a34..af9bd0374915 100644 --- a/arch/x86/kvm/kvm_cache_regs.h +++ b/arch/x86/kvm/kvm_cache_regs.h @@ -50,26 +50,26 @@ BUILD_KVM_GPR_ACCESSORS(r15, R15) * 1 0 register in vcpu->arch * 1 1 register in vcpu->arch, needs to be stored back */ -static inline bool kvm_register_is_available(struct kvm_vcpu *vcpu, - enum kvm_reg reg) +static __always_inline bool kvm_register_is_available(struct kvm_vcpu *vcpu, + enum kvm_reg reg) { return test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail); } -static inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu, - enum kvm_reg reg) +static __always_inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu, + enum kvm_reg reg) { return test_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty); } -static inline void kvm_register_mark_available(struct kvm_vcpu *vcpu, - enum kvm_reg reg) +static __always_inline void kvm_register_mark_available(struct kvm_vcpu *vcpu, + enum kvm_reg reg) { __set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail); } -static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu, - enum kvm_reg reg) +static __always_inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu, + enum kvm_reg reg) { __set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail); __set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty); diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S index 0b5db4de4d09..b104dfd282ed 100644 --- a/arch/x86/kvm/vmx/vmenter.S +++ b/arch/x86/kvm/vmx/vmenter.S @@ -318,7 +318,7 @@ SYM_FUNC_START(vmread_error_trampoline) RET SYM_FUNC_END(vmread_error_trampoline) -SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff) +SYM_FUNC_START(vmx_do_interrupt_irqoff) /* * Unconditionally create a stack frame, getting the correct RSP on the * stack (for x86-64) would take two instructions anyways, and RBP can @@ -349,4 +349,4 @@ SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff) mov %_ASM_BP, %_ASM_SP pop %_ASM_BP RET -SYM_FUNC_END(vmx_do_interrupt_nmi_irqoff) +SYM_FUNC_END(vmx_do_interrupt_irqoff) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index cea8c07f5229..b721fde4f5af 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -5064,8 +5064,13 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu) vect_info = vmx->idt_vectoring_info; intr_info = vmx_get_intr_info(vcpu); + /* + * Machine checks are handled by handle_exception_irqoff(), or by + * vmx_vcpu_run() if a #MC occurs on VM-Entry. NMIs are handled by + * vmx_vcpu_enter_exit(). + */ if (is_machine_check(intr_info) || is_nmi(intr_info)) - return 1; /* handled by handle_exception_nmi_irqoff() */ + return 1; /* * Queue the exception here instead of in handle_nm_fault_irqoff(). @@ -6755,18 +6760,6 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu) memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir)); } -void vmx_do_interrupt_nmi_irqoff(unsigned long entry); - -static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, - unsigned long entry) -{ - bool is_nmi = entry == (unsigned long)asm_exc_nmi_noist; - - kvm_before_interrupt(vcpu, is_nmi ? KVM_HANDLING_NMI : KVM_HANDLING_IRQ); - vmx_do_interrupt_nmi_irqoff(entry); - kvm_after_interrupt(vcpu); -} - static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu) { /* @@ -6787,9 +6780,8 @@ static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu) rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err); } -static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx) +static void handle_exception_irqoff(struct vcpu_vmx *vmx) { - const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist; u32 intr_info = vmx_get_intr_info(&vmx->vcpu); /* if exit due to PF check for async PF */ @@ -6801,11 +6793,10 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx) /* Handle machine checks before interrupts are enabled */ else if (is_machine_check(intr_info)) kvm_machine_check(); - /* We need to handle NMIs before interrupts are enabled */ - else if (is_nmi(intr_info)) - handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry); } +void vmx_do_interrupt_irqoff(unsigned long entry); + static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu) { u32 intr_info = vmx_get_intr_info(vcpu); @@ -6816,7 +6807,10 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu) "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info)) return; - handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc)); + kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ); + vmx_do_interrupt_irqoff(gate_offset(desc)); + kvm_after_interrupt(vcpu); + vcpu->arch.at_instruction_boundary = true; } @@ -6830,7 +6824,7 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu) if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT) handle_external_interrupt_irqoff(vcpu); else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI) - handle_exception_nmi_irqoff(vmx); + handle_exception_irqoff(vmx); } /* @@ -7091,6 +7085,18 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu, vmx_enable_fb_clear(vmx); + if (unlikely(vmx->fail)) + vmx->exit_reason.full = 0xdead; + else + vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON); + + if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI && + is_nmi(vmx_get_intr_info(vcpu))) { + kvm_before_interrupt(vcpu, KVM_HANDLING_NMI); + asm("int $2"); + kvm_after_interrupt(vcpu); + } + guest_state_exit_irqoff(); } @@ -7232,12 +7238,9 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu) vmx->idt_vectoring_info = 0; - if (unlikely(vmx->fail)) { - vmx->exit_reason.full = 0xdead; + if (unlikely(vmx->fail)) return EXIT_FASTPATH_NONE; - } - vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON); if (unlikely((u16)vmx->exit_reason.basic == EXIT_REASON_MCE_DURING_VMENTRY)) kvm_machine_check(); diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index a3da84f4ea45..eb52cfde5ec2 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -680,7 +680,7 @@ static inline unsigned long vmx_get_exit_qual(struct kvm_vcpu *vcpu) return vmx->exit_qualification; } -static inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu) +static __always_inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 9de72586f406..44d1827f0a30 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -382,13 +382,13 @@ enum kvm_intr_type { KVM_HANDLING_NMI, }; -static inline void kvm_before_interrupt(struct kvm_vcpu *vcpu, - enum kvm_intr_type intr) +static __always_inline void kvm_before_interrupt(struct kvm_vcpu *vcpu, + enum kvm_intr_type intr) { WRITE_ONCE(vcpu->arch.handling_intr_from_guest, (u8)intr); } -static inline void kvm_after_interrupt(struct kvm_vcpu *vcpu) +static __always_inline void kvm_after_interrupt(struct kvm_vcpu *vcpu) { WRITE_ONCE(vcpu->arch.handling_intr_from_guest, 0); } base-commit: 0fa32dad1e78629cb42999dacd82489503fdf4c2 --