On Tue, 2021-05-04 at 21:50 +0200, Thomas Gleixner wrote:
> From: Lai Jiangshan <[email protected]>
>
> In VMX, the host NMI handler needs to be invoked after NMI VM-Exit.
> Before commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect
> call instead of INTn"), this was done by INTn ("int $2"). But INTn
> microcode is relatively expensive, so the commit reworked NMI VM-Exit
> handling to invoke the kernel handler by function call.
>
> But this missed a detail. The NMI entry point for direct invocation is
> fetched from the IDT table and called on the kernel stack. But on 64-bit
> the NMI entry installed in the IDT expects to be invoked on the IST stack.
> It relies on the "NMI executing" variable on the IST stack to work
> correctly, which is at a fixed position in the IST stack. When the entry
> point is unexpectedly called on the kernel stack, the RSP-addressed "NMI
> executing" variable is obviously also on the kernel stack and is
> "uninitialized" and can cause the NMI entry code to run in the wrong way.
>
> Provide a non-ist entry point for VMX which shares the C-function with
> the regular NMI entry and invoke the new asm entry point instead.
I haven't followed this closely, so this was probably already discussed,
but anyway: do I understand correctly that if, while the NMI handler
invoked from VMX code is running, another NMI arrives, the two handlers
won't share a stack (one uses the IST, the other doesn't) and thus the
NMI nesting prevention won't be activated?
Does this mean that we still rely on hardware NMI masking being active?
Or in other words, that we still can't have an IRET between VM exit and
the entry to the NMI handler?
Best regards,
Maxim Levitsky
>
> On 32-bit this just maps to the regular NMI entry point as 32-bit has no
> ISTs and is not affected.
>
> [ tglx: Made it independent for backporting, massaged changelog ]
>
> Fixes: 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn")
> Signed-off-by: Lai Jiangshan <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Cc: [email protected]
> ---
>
> Note: That's the minimal fix which needs to be backported and the other
> stuff is cleanup material on top for 5.14.
>
> ---
> arch/x86/include/asm/idtentry.h | 15 +++++++++++++++
> arch/x86/kernel/nmi.c | 10 ++++++++++
> arch/x86/kvm/vmx/vmx.c | 16 +++++++++-------
> 3 files changed, 34 insertions(+), 7 deletions(-)
>
> --- a/arch/x86/include/asm/idtentry.h
> +++ b/arch/x86/include/asm/idtentry.h
> @@ -588,6 +588,21 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC, xenpv_
> #endif
>
> /* NMI */
> +
> +#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
> +/*
> + * Special NOIST entry point for VMX which invokes this on the kernel
> + * stack. asm_exc_nmi() requires an IST to work correctly vs. the NMI
> + * 'executing' marker.
> + *
> + * On 32bit this just uses the regular NMI entry point because 32-bit does
> + * not have ISTs.
> + */
> +DECLARE_IDTENTRY(X86_TRAP_NMI, exc_nmi_noist);
> +#else
> +#define asm_exc_nmi_noist asm_exc_nmi
> +#endif
> +
> DECLARE_IDTENTRY_NMI(X86_TRAP_NMI, exc_nmi);
> #ifdef CONFIG_XEN_PV
> DECLARE_IDTENTRY_RAW(X86_TRAP_NMI, xenpv_exc_nmi);
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -524,6 +524,16 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
> mds_user_clear_cpu_buffers();
> }
>
> +#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
> +DEFINE_IDTENTRY_RAW(exc_nmi_noist)
> +{
> + exc_nmi(regs);
> +}
> +#endif
> +#if IS_MODULE(CONFIG_KVM_INTEL)
> +EXPORT_SYMBOL_GPL(asm_exc_nmi_noist);
> +#endif
> +
> void stop_nmi(void)
> {
> ignore_nmis++;
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -36,6 +36,7 @@
> #include <asm/debugreg.h>
> #include <asm/desc.h>
> #include <asm/fpu/internal.h>
> +#include <asm/idtentry.h>
> #include <asm/io.h>
> #include <asm/irq_remapping.h>
> #include <asm/kexec.h>
> @@ -6415,18 +6416,17 @@ static void vmx_apicv_post_state_restore
>
> void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
>
> -static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
> +static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
> + unsigned long entry)
> {
> - unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
> - gate_desc *desc = (gate_desc *)host_idt_base + vector;
> -
> kvm_before_interrupt(vcpu);
> - vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
> + vmx_do_interrupt_nmi_irqoff(entry);
> kvm_after_interrupt(vcpu);
> }
>
> static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
> {
> + const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
> u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
>
> /* if exit due to PF check for async PF */
> @@ -6437,18 +6437,20 @@ static void handle_exception_nmi_irqoff(
> kvm_machine_check();
> /* We need to handle NMIs before interrupts are enabled */
> else if (is_nmi(intr_info))
> - handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
> + handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
> }
>
> static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
> {
> u32 intr_info = vmx_get_intr_info(vcpu);
> + unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
> + gate_desc *desc = (gate_desc *)host_idt_base + vector;
>
> if (WARN_ONCE(!is_external_intr(intr_info),
> "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
> return;
>
> - handle_interrupt_nmi_irqoff(vcpu, intr_info);
> + handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
> }
>
> static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
>
On 04/05/21 23:05, Maxim Levitsky wrote:
> Does this mean that we still rely on hardware NMI masking to be activated?
No, the NMI code already handles reentrancy at both the assembly and C
levels.
> Or in other words, that is we still can't have an IRET between VM exit and
> the entry to the NMI handler?
No, because NMIs are not masked on VM exit. This in fact makes things
potentially messy; unlike AMD's CLGI/STGI, the VM exit atomically
restores only the MSRs and other state that Intel thought of.
Paolo
On Tue, May 04, 2021, Paolo Bonzini wrote:
> On 04/05/21 23:05, Maxim Levitsky wrote:
> > Does this mean that we still rely on hardware NMI masking to be activated?
>
> No, the NMI code already handles reentrancy at both the assembly and C
> levels.
>
> > Or in other words, that is we still can't have an IRET between VM exit and
> > the entry to the NMI handler?
>
> No, because NMIs are not masked on VM exit. This in fact makes things
> potentially messy; unlike with AMD's CLGI/STGI, only MSRs and other things
> that Intel thought can be restored atomically with the VM exit.
FWIW, NMIs are masked if the VM-Exit was due to an NMI.
> On May 4, 2021, at 2:21 PM, Sean Christopherson <[email protected]> wrote:
>
> On Tue, May 04, 2021, Paolo Bonzini wrote:
>>> On 04/05/21 23:05, Maxim Levitsky wrote:
>>> Does this mean that we still rely on hardware NMI masking to be activated?
>>
>> No, the NMI code already handles reentrancy at both the assembly and C
>> levels.
>>
>>> Or in other words, that is we still can't have an IRET between VM exit and
>>> the entry to the NMI handler?
>>
>> No, because NMIs are not masked on VM exit. This in fact makes things
>> potentially messy; unlike with AMD's CLGI/STGI, only MSRs and other things
>> that Intel thought can be restored atomically with the VM exit.
>
> FWIW, NMIs are masked if the VM-Exit was due to an NMI.
Then this whole change is busted, since nothing will unmask NMIs. Revert it?
On 04/05/21 23:23, Andy Lutomirski wrote:
>> On May 4, 2021, at 2:21 PM, Sean Christopherson <[email protected]> wrote:
>> FWIW, NMIs are masked if the VM-Exit was due to an NMI.
Huh, indeed: "An NMI causes subsequent NMIs to be blocked, but only
after the VM exit completes".
> Then this whole change is busted, since nothing will unmask NMIs. Revert it?
Looks like the easiest way out indeed.
Paolo
On Tue, May 04, 2021, Paolo Bonzini wrote:
> On 04/05/21 23:23, Andy Lutomirski wrote:
> > > On May 4, 2021, at 2:21 PM, Sean Christopherson <[email protected]> wrote:
> > > FWIW, NMIs are masked if the VM-Exit was due to an NMI.
>
> Huh, indeed: "An NMI causes subsequent NMIs to be blocked, but only after
> the VM exit completes".
>
> > Then this whole change is busted, since nothing will unmask NMIs. Revert it?
SMI? #MC? :-)
> Looks like the easiest way out indeed.
I've no objection to reverting to INTn, but what does reverting versus handling
NMI on the kernel stack have to do with NMIs being blocked on VM-Exit due to NMI?
I'm struggling mightily to connect the dots.
On Wed, May 5, 2021 at 5:23 AM Andy Lutomirski <[email protected]> wrote:
>
>
> > On May 4, 2021, at 2:21 PM, Sean Christopherson <[email protected]> wrote:
> >
> > On Tue, May 04, 2021, Paolo Bonzini wrote:
> >>> On 04/05/21 23:05, Maxim Levitsky wrote:
> >>> Does this mean that we still rely on hardware NMI masking to be activated?
> >>
> >> No, the NMI code already handles reentrancy at both the assembly and C
> >> levels.
> >>
> >>> Or in other words, that is we still can't have an IRET between VM exit and
> >>> the entry to the NMI handler?
> >>
> >> No, because NMIs are not masked on VM exit. This in fact makes things
> >> potentially messy; unlike with AMD's CLGI/STGI, only MSRs and other things
> >> that Intel thought can be restored atomically with the VM exit.
> >
> > FWIW, NMIs are masked if the VM-Exit was due to an NMI.
>
> Then this whole change is busted, since nothing will unmask NMIs. Revert it?
There is some instrumentable code between VMEXIT and
handle_exception_nmi_irqoff().
A #DB or #BP can happen in this gap, and the IRET of the #DB/#BP
handler will unmask NMIs.
Another way to fix this would be to change the VMX code to call the NMI
handler immediately after VMEXIT, before leaving the noinstr section.
Reverting can't fix the problem.
> On May 4, 2021, at 6:08 PM, Lai Jiangshan <[email protected]> wrote:
>
> On Wed, May 5, 2021 at 5:23 AM Andy Lutomirski <[email protected]> wrote:
>>
>>
>>>> On May 4, 2021, at 2:21 PM, Sean Christopherson <[email protected]> wrote:
>>>
>>> On Tue, May 04, 2021, Paolo Bonzini wrote:
>>>>> On 04/05/21 23:05, Maxim Levitsky wrote:
>>>>> Does this mean that we still rely on hardware NMI masking to be activated?
>>>>
>>>> No, the NMI code already handles reentrancy at both the assembly and C
>>>> levels.
>>>>
>>>>> Or in other words, that is we still can't have an IRET between VM exit and
>>>>> the entry to the NMI handler?
>>>>
>>>> No, because NMIs are not masked on VM exit. This in fact makes things
>>>> potentially messy; unlike with AMD's CLGI/STGI, only MSRs and other things
>>>> that Intel thought can be restored atomically with the VM exit.
>>>
>>> FWIW, NMIs are masked if the VM-Exit was due to an NMI.
>>
>> Then this whole change is busted, since nothing will unmask NMIs. Revert it?
>
> There is some instrumentable code between VMEXIT and
> handle_exception_nmi_irqoff().
>
> A #DB or #BP can happen in this gap, and the IRET of the #DB/#BP
> handler will unmask NMIs.
>
> Another way to fix this would be to change the VMX code to call the NMI
> handler immediately after VMEXIT, before leaving the noinstr section.
>
> Reverting can't fix the problem.
I was indeed wrong, and the helper properly unmasks NMIs. So all should be well.
I will contemplate how this all interacts with FRED.