2021-04-27 08:55:38

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack

From: Lai Jiangshan <[email protected]>

In VMX, the NMI handler needs to be invoked after NMI VM-Exit.

Before the commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via
indirect call instead of INTn"), the work is done by INTn ("int $2").

But INTn microcode is relatively expensive, so the commit reworked
NMI VM-Exit handling to invoke the kernel handler by function call.
And INTn doesn't set the NMI blocked flag required by the linux kernel
NMI entry. So moving away from INTn are very reasonable.

Yet some details were missed. After the said commit applied, the NMI
entry pointer is fetched from the IDT table and called from the kernel
stack. But the NMI entry pointer installed on the IDT table is
asm_exc_nmi() which expects to be invoked on the IST stack by the ISA.
And it relies on the "NMI executing" variable on the IST stack to work
correctly. When it is unexpectedly called from the kernel stack, the
RSP-located "NMI executing" variable is also on the kernel stack and
is "uninitialized" and can cause the NMI entry to run in the wrong way.

During fixing the problem for KVM, I found that there might be the same
problem for early booting stage where the IST is not set up. asm_exc_nmi()
is not allowed to be used in this stage for the same reason about
the RSP-located "NMI executing" variable.

For both cases, we should use asm_noist_exc_nmi() which is introduced
in the patch 1 via renaming from an existing asm_xenpv_exc_nmi() and
which is safe on the kernel stack.

https://lore.kernel.org/lkml/[email protected]/

Cc: Thomas Gleixner <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Jim Mattson <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: [email protected]
Cc: Josh Poimboeuf <[email protected]>
Cc: Uros Bizjak <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>

Lai Jiangshan (4):
x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage
KVM/VMX: Invoke NMI non-IST entry instead of IST entry
KVM/VMX: fold handle_interrupt_nmi_irqoff() into its solo caller

arch/x86/include/asm/idtentry.h | 4 +---
arch/x86/kernel/idt.c | 8 +++++++-
arch/x86/kernel/nmi.c | 12 ++++++++++++
arch/x86/kvm/vmx/vmx.c | 27 ++++++++++++++-------------
arch/x86/xen/enlighten_pv.c | 9 +++------
arch/x86/xen/xen-asm.S | 2 +-
6 files changed, 38 insertions(+), 24 deletions(-)

--
2.19.1.6.gb485710b


2021-04-27 08:56:03

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage

From: Lai Jiangshan <[email protected]>

While the other entries for the exceptions which use Interrupt stacks can
be also used on the kernel stack, asm_exc_nmi() can not be used on the
kernel stack for it relies on the RSP-located "NMI executing" variable
which expects to on a fixed location in the NMI IST stack. When it is
unexpectedly called from the kernel stack, the RSP-located "NMI executing"
variable is also on the kernel stack and is "uninitialized" and can cause
the NMI entry to run in the wrong way.

Cc: Thomas Gleixner <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Jim Mattson <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Uros Bizjak <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/idtentry.h | 2 --
arch/x86/kernel/idt.c | 8 +++++++-
arch/x86/kernel/nmi.c | 7 ++++---
3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 5b11d2ddbb5c..0831c0da5957 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -589,9 +589,7 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC, xenpv_exc_machine_check);

/* NMI */
DECLARE_IDTENTRY_NMI(X86_TRAP_NMI, exc_nmi);
-#ifdef CONFIG_XEN_PV
DECLARE_IDTENTRY_RAW(X86_TRAP_NMI, noist_exc_nmi);
-#endif

/* #DB */
#ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index d552f177eca0..c75409633f16 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -71,10 +71,16 @@ static const __initconst struct idt_data early_idts[] = {
* cpu_init() is invoked. Interrupt stacks cannot be used at that point and
* the traps which use them are reinitialized with IST after cpu_init() has
* set up TSS.
+ *
+ * While the other entries for the exceptions which use Interrupt stacks can
+ * be also used on the kernel stack, asm_exc_nmi() can not be used on the
+ * kernel stack for it relies on the RSP-located "NMI executing" variable
+ * which expects to on a fixed location in the NMI IST stack. For early
+ * booting stage, asm_noist_exc_nmi() is used for NMI.
*/
static const __initconst struct idt_data def_idts[] = {
INTG(X86_TRAP_DE, asm_exc_divide_error),
- INTG(X86_TRAP_NMI, asm_exc_nmi),
+ INTG(X86_TRAP_NMI, asm_noist_exc_nmi),
INTG(X86_TRAP_BR, asm_exc_bounds),
INTG(X86_TRAP_UD, asm_exc_invalid_op),
INTG(X86_TRAP_NM, asm_exc_device_not_available),
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 2b907a76d0a1..2fb1fd59d714 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -524,13 +524,14 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
mds_user_clear_cpu_buffers();
}

-#ifdef CONFIG_XEN_PV
DEFINE_IDTENTRY_RAW(noist_exc_nmi)
{
- /* On Xen PV, NMI doesn't use IST. The C part is the same as native. */
+ /*
+ * On Xen PV and early booting stage, NMI doesn't use IST.
+ * The C part is the same as native.
+ */
exc_nmi(regs);
}
-#endif

void stop_nmi(void)
{
--
2.19.1.6.gb485710b

2021-04-27 08:56:03

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi

From: Lai Jiangshan <[email protected]>

There is no any functionality change intended. Just rename it and
move it to arch/x86/kernel/nmi.c so that we can resue it later in
next patch for early NMI and kvm.

Cc: Thomas Gleixner <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Jim Mattson <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: [email protected]
Cc: Josh Poimboeuf <[email protected]>
Cc: Uros Bizjak <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/idtentry.h | 2 +-
arch/x86/kernel/nmi.c | 8 ++++++++
arch/x86/xen/enlighten_pv.c | 9 +++------
arch/x86/xen/xen-asm.S | 2 +-
4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index e35e342673c7..5b11d2ddbb5c 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -590,7 +590,7 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC, xenpv_exc_machine_check);
/* NMI */
DECLARE_IDTENTRY_NMI(X86_TRAP_NMI, exc_nmi);
#ifdef CONFIG_XEN_PV
-DECLARE_IDTENTRY_RAW(X86_TRAP_NMI, xenpv_exc_nmi);
+DECLARE_IDTENTRY_RAW(X86_TRAP_NMI, noist_exc_nmi);
#endif

/* #DB */
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index bf250a339655..2b907a76d0a1 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -524,6 +524,14 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
mds_user_clear_cpu_buffers();
}

+#ifdef CONFIG_XEN_PV
+DEFINE_IDTENTRY_RAW(noist_exc_nmi)
+{
+ /* On Xen PV, NMI doesn't use IST. The C part is the same as native. */
+ exc_nmi(regs);
+}
+#endif
+
void stop_nmi(void)
{
ignore_nmis++;
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 4f18cd9eacd8..5efbdb0905b7 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -565,12 +565,6 @@ static void xen_write_ldt_entry(struct desc_struct *dt, int entrynum,

void noist_exc_debug(struct pt_regs *regs);

-DEFINE_IDTENTRY_RAW(xenpv_exc_nmi)
-{
- /* On Xen PV, NMI doesn't use IST. The C part is the same as native. */
- exc_nmi(regs);
-}
-
DEFINE_IDTENTRY_RAW_ERRORCODE(xenpv_exc_double_fault)
{
/* On Xen PV, DF doesn't use IST. The C part is the same as native. */
@@ -626,6 +620,9 @@ struct trap_array_entry {
.xen = xen_asm_xenpv_##func, \
.ist_okay = ist_ok }

+/* Alias to make TRAP_ENTRY_REDIR() happy for nmi */
+#define xen_asm_xenpv_exc_nmi xen_asm_noist_exc_nmi
+
static struct trap_array_entry trap_array[] = {
TRAP_ENTRY_REDIR(exc_debug, true ),
TRAP_ENTRY_REDIR(exc_double_fault, true ),
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 1e626444712b..12e7cbbb2a8d 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -130,7 +130,7 @@ _ASM_NOKPROBE(xen_\name)
xen_pv_trap asm_exc_divide_error
xen_pv_trap asm_xenpv_exc_debug
xen_pv_trap asm_exc_int3
-xen_pv_trap asm_xenpv_exc_nmi
+xen_pv_trap asm_noist_exc_nmi
xen_pv_trap asm_exc_overflow
xen_pv_trap asm_exc_bounds
xen_pv_trap asm_exc_invalid_op
--
2.19.1.6.gb485710b

2021-04-27 08:56:23

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry

From: Lai Jiangshan <[email protected]>

In VMX, the NMI handler needs to be invoked after NMI VM-Exit.

Before the commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via
indirect call instead of INTn"), the work is done by INTn ("int $2").

But INTn microcode is relatively expensive, so the commit reworked
NMI VM-Exit handling to invoke the kernel handler by function call.
And INTn doesn't set the NMI blocked flag required by the linux kernel
NMI entry. So moving away from INTn are very reasonable.

Yet some details were missed. After the said commit applied, the NMI
entry pointer is fetched from the IDT table and called from the kernel
stack. But the NMI entry pointer installed on the IDT table is
asm_exc_nmi() which expects to be invoked on the IST stack by the ISA.
And it relies on the "NMI executing" variable on the IST stack to work
correctly. When it is unexpectedly called from the kernel stack, the
RSP-located "NMI executing" variable is also on the kernel stack and
is "uninitialized" and can cause the NMI entry to run in the wrong way.

So we should not used the NMI entry installed on the IDT table. Rather,
we should use the NMI entry allowed to be used on the kernel stack which
is asm_noist_exc_nmi() which is also used for XENPV and early booting.

Link: https://lore.kernel.org/lkml/[email protected]/
Cc: Thomas Gleixner <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Jim Mattson <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: [email protected]
Cc: Josh Poimboeuf <[email protected]>
Cc: Uros Bizjak <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kernel/nmi.c | 3 +++
arch/x86/kvm/vmx/vmx.c | 8 ++++++--
2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 2fb1fd59d714..919f0400d931 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -528,10 +528,13 @@ DEFINE_IDTENTRY_RAW(noist_exc_nmi)
{
/*
* On Xen PV and early booting stage, NMI doesn't use IST.
+ * And when it is manually called from VMX NMI VM-Exit handler,
+ * it doesn't use IST either.
* The C part is the same as native.
*/
exc_nmi(regs);
}
+EXPORT_SYMBOL_GPL(asm_noist_exc_nmi);

void stop_nmi(void)
{
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bcbf0d2139e9..96e59d912637 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -36,6 +36,7 @@
#include <asm/debugreg.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
+#include <asm/idtentry.h>
#include <asm/io.h>
#include <asm/irq_remapping.h>
#include <asm/kexec.h>
@@ -6416,8 +6417,11 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
else if (is_machine_check(intr_info))
kvm_machine_check();
/* We need to handle NMIs before interrupts are enabled */
- else if (is_nmi(intr_info))
- handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
+ else if (is_nmi(intr_info)) {
+ kvm_before_interrupt(&vmx->vcpu);
+ vmx_do_interrupt_nmi_irqoff((unsigned long)asm_noist_exc_nmi);
+ kvm_after_interrupt(&vmx->vcpu);
+ }
}

static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
--
2.19.1.6.gb485710b

2021-04-27 08:56:45

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller

From: Lai Jiangshan <[email protected]>

The function handle_interrupt_nmi_irqoff() is called only once and
it doesn't handle for NMI, so its name is outdated.

Cc: Sean Christopherson <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 96e59d912637..92c22211203e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6396,16 +6396,6 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)

void vmx_do_interrupt_nmi_irqoff(unsigned long entry);

-static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
-{
- unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
- gate_desc *desc = (gate_desc *)host_idt_base + vector;
-
- kvm_before_interrupt(vcpu);
- vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
- kvm_after_interrupt(vcpu);
-}
-
static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
{
u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
@@ -6427,12 +6417,19 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
{
u32 intr_info = vmx_get_intr_info(vcpu);
+ unsigned int vector;
+ gate_desc *desc;

if (WARN_ONCE(!is_external_intr(intr_info),
"KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
return;

- handle_interrupt_nmi_irqoff(vcpu, intr_info);
+ vector = intr_info & INTR_INFO_VECTOR_MASK;
+ desc = (gate_desc *)host_idt_base + vector;
+
+ kvm_before_interrupt(vcpu);
+ vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
+ kvm_after_interrupt(vcpu);
}

static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
--
2.19.1.6.gb485710b

2021-04-28 21:29:28

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi

On Tue, 27 Apr 2021 07:09:46 +0800
Lai Jiangshan <[email protected]> wrote:

> From: Lai Jiangshan <[email protected]>
>
> There is no any functionality change intended. Just rename it and
> move it to arch/x86/kernel/nmi.c so that we can resue it later in
> next patch for early NMI and kvm.

Nit, but in change logs, please avoid stating "next patch" as searching git
history (via git blame or whatever) there is no such thing as "next patch".

Just state: "so that we can reuse it for early NMI and KVM."

I also just noticed the typo in "resue". Or maybe both NMI and KVM should
be sued again ;-)

-- Steve

2021-04-28 21:36:11

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage

On Tue, 27 Apr 2021 07:09:47 +0800
Lai Jiangshan <[email protected]> wrote:

> While the other entries for the exceptions which use Interrupt stacks can
> be also used on the kernel stack, asm_exc_nmi() can not be used on the
> kernel stack for it relies on the RSP-located "NMI executing" variable
> which expects to on a fixed location in the NMI IST stack. When it is
> unexpectedly called from the kernel stack, the RSP-located "NMI executing"
> variable is also on the kernel stack and is "uninitialized" and can cause
> the NMI entry to run in the wrong way.

Acked-by: Steven Rostedt (VMware) <[email protected]>

-- Steve

2021-04-30 02:50:46

by Lai Jiangshan

[permalink] [raw]
Subject: Re: [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry



On 2021/4/27 07:09, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> In VMX, the NMI handler needs to be invoked after NMI VM-Exit.
>
> Before the commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via
> indirect call instead of INTn"), the work is done by INTn ("int $2").
>
> But INTn microcode is relatively expensive, so the commit reworked
> NMI VM-Exit handling to invoke the kernel handler by function call.
> And INTn doesn't set the NMI blocked flag required by the linux kernel
> NMI entry. So moving away from INTn are very reasonable.
>
> Yet some details were missed. After the said commit applied, the NMI
> entry pointer is fetched from the IDT table and called from the kernel
> stack. But the NMI entry pointer installed on the IDT table is
> asm_exc_nmi() which expects to be invoked on the IST stack by the ISA.
> And it relies on the "NMI executing" variable on the IST stack to work
> correctly. When it is unexpectedly called from the kernel stack, the
> RSP-located "NMI executing" variable is also on the kernel stack and
> is "uninitialized" and can cause the NMI entry to run in the wrong way.
>
> So we should not used the NMI entry installed on the IDT table. Rather,
> we should use the NMI entry allowed to be used on the kernel stack which
> is asm_noist_exc_nmi() which is also used for XENPV and early booting.
>

The problem can be tested by the following testing-patch.

1) the testing-patch can be applied without conflict before this patch 3.
And it shows the problem that the NMI is missed in the case.

2) you need to manually copy the same logic of this testing-patch to verify
this patch 3. And it shows that the problem is fixed.

3) the only one line of code in vmenter.S just emulates the situation that
a "uninitialized" garbage in the kernel stack happens to be 1 and it happens
to be at the same location of the RSP-located "NMI executing" variable.


diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 3a6461694fc2..32096049c2a2 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -316,6 +316,7 @@ SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
#endif
pushf
push $__KERNEL_CS
+ movq $1, -24(%rsp) // "NMI executing": 1 = nested, non-1 = not-nested
CALL_NOSPEC _ASM_ARG1

/*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bcbf0d2139e9..9509d2edd50d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6416,8 +6416,12 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
else if (is_machine_check(intr_info))
kvm_machine_check();
/* We need to handle NMIs before interrupts are enabled */
- else if (is_nmi(intr_info))
+ else if (is_nmi(intr_info)) {
+ unsigned long count = this_cpu_read(irq_stat.__nmi_count);
handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
+ if (count == this_cpu_read(irq_stat.__nmi_count))
+ pr_info("kvm nmi miss\n");
+ }
}

static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)

2021-04-30 07:15:14

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack

On 27/04/21 01:09, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> In VMX, the NMI handler needs to be invoked after NMI VM-Exit.
>
> Before the commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via
> indirect call instead of INTn"), the work is done by INTn ("int $2").
>
> But INTn microcode is relatively expensive, so the commit reworked
> NMI VM-Exit handling to invoke the kernel handler by function call.
> And INTn doesn't set the NMI blocked flag required by the linux kernel
> NMI entry. So moving away from INTn are very reasonable.
>
> Yet some details were missed. After the said commit applied, the NMI
> entry pointer is fetched from the IDT table and called from the kernel
> stack. But the NMI entry pointer installed on the IDT table is
> asm_exc_nmi() which expects to be invoked on the IST stack by the ISA.
> And it relies on the "NMI executing" variable on the IST stack to work
> correctly. When it is unexpectedly called from the kernel stack, the
> RSP-located "NMI executing" variable is also on the kernel stack and
> is "uninitialized" and can cause the NMI entry to run in the wrong way.
>
> During fixing the problem for KVM, I found that there might be the same
> problem for early booting stage where the IST is not set up. asm_exc_nmi()
> is not allowed to be used in this stage for the same reason about
> the RSP-located "NMI executing" variable.
>
> For both cases, we should use asm_noist_exc_nmi() which is introduced
> in the patch 1 via renaming from an existing asm_xenpv_exc_nmi() and
> which is safe on the kernel stack.
>
> https://lore.kernel.org/lkml/[email protected]/

For the KVM part,

Acked-by: Paolo Bonzini <[email protected]>

Thanks,

Paolo

> Cc: Thomas Gleixner <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Sean Christopherson <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Vitaly Kuznetsov <[email protected]>
> Cc: Wanpeng Li <[email protected]>
> Cc: Jim Mattson <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: [email protected]
> Cc: Josh Poimboeuf <[email protected]>
> Cc: Uros Bizjak <[email protected]>
> Cc: Maxim Levitsky <[email protected]>
> Signed-off-by: Lai Jiangshan <[email protected]>
>
> Lai Jiangshan (4):
> x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi
> x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage
> KVM/VMX: Invoke NMI non-IST entry instead of IST entry
> KVM/VMX: fold handle_interrupt_nmi_irqoff() into its solo caller
>
> arch/x86/include/asm/idtentry.h | 4 +---
> arch/x86/kernel/idt.c | 8 +++++++-
> arch/x86/kernel/nmi.c | 12 ++++++++++++
> arch/x86/kvm/vmx/vmx.c | 27 ++++++++++++++-------------
> arch/x86/xen/enlighten_pv.c | 9 +++------
> arch/x86/xen/xen-asm.S | 2 +-
> 6 files changed, 38 insertions(+), 24 deletions(-)
>

2021-04-30 07:17:35

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi

On 28/04/21 23:27, Steven Rostedt wrote:
> On Tue, 27 Apr 2021 07:09:46 +0800
> Lai Jiangshan <[email protected]> wrote:
>
>> From: Lai Jiangshan <[email protected]>
>>
>> There is no any functionality change intended. Just rename it and
>> move it to arch/x86/kernel/nmi.c so that we can resue it later in
>> next patch for early NMI and kvm.
>
> Nit, but in change logs, please avoid stating "next patch" as searching git
> history (via git blame or whatever) there is no such thing as "next patch".

Interesting, I use next patch(es) relatively often, though you're right
that something like "in preparation for" works just as well. Yes, it's
the previous in "git log", but you get what it's meant in practice. :)

Paolo

> Just state: "so that we can reuse it for early NMI and KVM."
>
> I also just noticed the typo in "resue". Or maybe both NMI and KVM should
> be sued again ;-)
>
> -- Steve
>

2021-04-30 09:08:44

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller

Lai,

On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
> u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
> @@ -6427,12 +6417,19 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
> static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
> {
> u32 intr_info = vmx_get_intr_info(vcpu);
> + unsigned int vector;
> + gate_desc *desc;
>
> if (WARN_ONCE(!is_external_intr(intr_info),
> "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
> return;
>
> - handle_interrupt_nmi_irqoff(vcpu, intr_info);
> + vector = intr_info & INTR_INFO_VECTOR_MASK;
> + desc = (gate_desc *)host_idt_base + vector;
> +
> + kvm_before_interrupt(vcpu);
> + vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
> + kvm_after_interrupt(vcpu);

So the previous patch does:

+ kvm_before_interrupt(&vmx->vcpu);
+ vmx_do_interrupt_nmi_irqoff((unsigned long)asm_noist_exc_nmi);
+ kvm_after_interrupt(&vmx->vcpu);

What is this idt gate descriptor dance for in this code?

Thanks,

tglx

2021-04-30 09:09:57

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller

On 30/04/21 11:03, Thomas Gleixner wrote:
> Lai,
>
> On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
>> u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
>> @@ -6427,12 +6417,19 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
>> static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
>> {
>> u32 intr_info = vmx_get_intr_info(vcpu);
>> + unsigned int vector;
>> + gate_desc *desc;
>>
>> if (WARN_ONCE(!is_external_intr(intr_info),
>> "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
>> return;
>>
>> - handle_interrupt_nmi_irqoff(vcpu, intr_info);
>> + vector = intr_info & INTR_INFO_VECTOR_MASK;
>> + desc = (gate_desc *)host_idt_base + vector;
>> +
>> + kvm_before_interrupt(vcpu);
>> + vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
>> + kvm_after_interrupt(vcpu);
>
> So the previous patch does:
>
> + kvm_before_interrupt(&vmx->vcpu);
> + vmx_do_interrupt_nmi_irqoff((unsigned long)asm_noist_exc_nmi);
> + kvm_after_interrupt(&vmx->vcpu);
>
> What is this idt gate descriptor dance for in this code?

NMIs are sent through a different vmexit code (the same one as
exceptions). This one is for interrupts.

Paolo

2021-04-30 12:07:44

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi

On Fri, 30 Apr 2021 09:15:51 +0200
Paolo Bonzini <[email protected]> wrote:

> > Nit, but in change logs, please avoid stating "next patch" as searching git
> > history (via git blame or whatever) there is no such thing as "next patch".
>
> Interesting, I use next patch(es) relatively often, though you're right
> that something like "in preparation for" works just as well. Yes, it's
> the previous in "git log", but you get what it's meant in practice. :)

It's not always the previous in a git log. Git log sorts by time, and
if an unrelated commit was created in between those two patches, it
will be in between them.

-- Steve

2021-04-30 23:31:31

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 4/4] KVM/VMX: Fold handle_interrupt_nmi_irqoff() into its solo caller

On Fri, Apr 30 2021 at 11:06, Paolo Bonzini wrote:

> On 30/04/21 11:03, Thomas Gleixner wrote:
>> Lai,
>>
>> On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
>>> u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
>>> @@ -6427,12 +6417,19 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
>>> static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
>>> {
>>> u32 intr_info = vmx_get_intr_info(vcpu);
>>> + unsigned int vector;
>>> + gate_desc *desc;
>>>
>>> if (WARN_ONCE(!is_external_intr(intr_info),
>>> "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
>>> return;
>>>
>>> - handle_interrupt_nmi_irqoff(vcpu, intr_info);
>>> + vector = intr_info & INTR_INFO_VECTOR_MASK;
>>> + desc = (gate_desc *)host_idt_base + vector;
>>> +
>>> + kvm_before_interrupt(vcpu);
>>> + vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
>>> + kvm_after_interrupt(vcpu);
>>
>> So the previous patch does:
>>
>> + kvm_before_interrupt(&vmx->vcpu);
>> + vmx_do_interrupt_nmi_irqoff((unsigned long)asm_noist_exc_nmi);
>> + kvm_after_interrupt(&vmx->vcpu);
>>
>> What is this idt gate descriptor dance for in this code?
>
> NMIs are sent through a different vmexit code (the same one as
> exceptions). This one is for interrupts.

Duh. Yes. The ability to read is clearly an advantage...

2021-05-03 18:09:14

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 0/4] x86: Don't invoke asm_exc_nmi() on the kernel stack

On Fri, Apr 30 2021 at 09:14, Paolo Bonzini wrote:
> On 27/04/21 01:09, Lai Jiangshan wrote:
>
> Acked-by: Paolo Bonzini <[email protected]>

Thanks Paolo. I'm working through it now...

2021-05-03 19:07:22

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi

On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> There is no any functionality change intended. Just rename it and
> move it to arch/x86/kernel/nmi.c so that we can resue it later in
> next patch for early NMI and kvm.

'Reuse it later' is not really a proper explanation why this change it
necessary.

Also this can be simplified by using aliasing which keeps the name
spaces intact.

Thanks,

tglx
---

--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -135,6 +135,9 @@ static __always_inline void __##func(str
#define DEFINE_IDTENTRY_RAW(func) \
__visible noinstr void func(struct pt_regs *regs)

+#define DEFINE_IDTENTRY_RAW_ALIAS(alias, func) \
+__visible noinstr void func(struct pt_regs *regs) __alias(alias)
+
/**
* DECLARE_IDTENTRY_RAW_ERRORCODE - Declare functions for raw IDT entry points
* Error code pushed by hardware
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -524,6 +524,8 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
mds_user_clear_cpu_buffers();
}

+DEFINE_IDTENTRY_RAW_ALIAS(exc_nmi, xenpv_exc_nmi);
+
void stop_nmi(void)
{
ignore_nmis++;
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -565,12 +565,6 @@ static void xen_write_ldt_entry(struct d

void noist_exc_debug(struct pt_regs *regs);

-DEFINE_IDTENTRY_RAW(xenpv_exc_nmi)
-{
- /* On Xen PV, NMI doesn't use IST. The C part is the same as native. */
- exc_nmi(regs);
-}
-
DEFINE_IDTENTRY_RAW_ERRORCODE(xenpv_exc_double_fault)
{
/* On Xen PV, DF doesn't use IST. The C part is the same as native. */

2021-05-03 19:39:36

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry

On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> In VMX, the NMI handler needs to be invoked after NMI VM-Exit.
>
> Before the commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via
> indirect call instead of INTn"), the work is done by INTn ("int $2").
>
> But INTn microcode is relatively expensive, so the commit reworked
> NMI VM-Exit handling to invoke the kernel handler by function call.
> And INTn doesn't set the NMI blocked flag required by the linux kernel
> NMI entry. So moving away from INTn are very reasonable.
>
> Yet some details were missed. After the said commit applied, the NMI
> entry pointer is fetched from the IDT table and called from the kernel
> stack. But the NMI entry pointer installed on the IDT table is
> asm_exc_nmi() which expects to be invoked on the IST stack by the ISA.
> And it relies on the "NMI executing" variable on the IST stack to work
> correctly. When it is unexpectedly called from the kernel stack, the
> RSP-located "NMI executing" variable is also on the kernel stack and
> is "uninitialized" and can cause the NMI entry to run in the wrong way.
>
> So we should not used the NMI entry installed on the IDT table. Rather,
> we should use the NMI entry allowed to be used on the kernel stack which
> is asm_noist_exc_nmi() which is also used for XENPV and early booting.

It's not used by XENPV. XENPV only uses the C entry point, but the ASM
entry is separate.

Thanks,

tglx

2021-05-03 19:43:33

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi

On Mon, May 03 2021 at 21:05, Thomas Gleixner wrote:

> On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
>> From: Lai Jiangshan <[email protected]>
>>
>> There is no any functionality change intended. Just rename it and
>> move it to arch/x86/kernel/nmi.c so that we can resue it later in
>> next patch for early NMI and kvm.
>
> 'Reuse it later' is not really a proper explanation why this change it
> necessary.
>
> Also this can be simplified by using aliasing which keeps the name
> spaces intact.

Aside of that this is not required to be part of a fixes series which
needs to be backported.

Thanks,

tglx

2021-05-03 20:03:42

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry

On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index bcbf0d2139e9..96e59d912637 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -36,6 +36,7 @@
> #include <asm/debugreg.h>
> #include <asm/desc.h>
> #include <asm/fpu/internal.h>
> +#include <asm/idtentry.h>
> #include <asm/io.h>
> #include <asm/irq_remapping.h>
> #include <asm/kexec.h>
> @@ -6416,8 +6417,11 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
> else if (is_machine_check(intr_info))
> kvm_machine_check();
> /* We need to handle NMIs before interrupts are enabled */
> - else if (is_nmi(intr_info))
> - handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
> + else if (is_nmi(intr_info)) {

Lacks curly braces for all of the above conditions according to coding style.

> + kvm_before_interrupt(&vmx->vcpu);
> + vmx_do_interrupt_nmi_irqoff((unsigned long)asm_noist_exc_nmi);
> + kvm_after_interrupt(&vmx->vcpu);
> + }

but this and the next patch are not really needed. The below avoids the
extra kvm_before/after() dance in both places. Hmm?

Thanks,

tglx
---
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -526,6 +526,10 @@ DEFINE_IDTENTRY_RAW(exc_nmi)

DEFINE_IDTENTRY_RAW_ALIAS(exc_nmi, exc_nmi_noist);

+#if IS_MODULE(CONFIG_KVM_INTEL)
+EXPORT_SYMBOL_GPL(asm_exc_nmi_noist);
+#endif
+
void stop_nmi(void)
{
ignore_nmis++;
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -36,6 +36,7 @@
#include <asm/debugreg.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
+#include <asm/idtentry.h>
#include <asm/io.h>
#include <asm/irq_remapping.h>
#include <asm/kexec.h>
@@ -6395,18 +6396,17 @@ static void vmx_apicv_post_state_restore

void vmx_do_interrupt_nmi_irqoff(unsigned long entry);

-static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
+static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
+ unsigned long entry)
{
- unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
- gate_desc *desc = (gate_desc *)host_idt_base + vector;
-
kvm_before_interrupt(vcpu);
- vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
+ vmx_do_interrupt_nmi_irqoff(entry);
kvm_after_interrupt(vcpu);
}

static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
{
+ const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
u32 intr_info = vmx_get_intr_info(&vmx->vcpu);

/* if exit due to PF check for async PF */
@@ -6417,18 +6417,20 @@ static void handle_exception_nmi_irqoff(
kvm_machine_check();
/* We need to handle NMIs before interrupts are enabled */
else if (is_nmi(intr_info))
- handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
+ handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
}

static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
{
u32 intr_info = vmx_get_intr_info(vcpu);
+ unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
+ gate_desc *desc = (gate_desc *)host_idt_base + vector;

if (WARN_ONCE(!is_external_intr(intr_info),
"KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
return;

- handle_interrupt_nmi_irqoff(vcpu, intr_info);
+ handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
}

static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)

2021-05-03 20:14:51

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage

On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
> + *
> + * While the other entries for the exceptions which use Interrupt stacks can
> + * be also used on the kernel stack, asm_exc_nmi() can not be used on the
> + * kernel stack for it relies on the RSP-located "NMI executing" variable
> + * which expects to on a fixed location in the NMI IST stack. For early
> + * booting stage, asm_noist_exc_nmi() is used for NMI.
> */
> static const __initconst struct idt_data def_idts[] = {
> INTG(X86_TRAP_DE, asm_exc_divide_error),
> - INTG(X86_TRAP_NMI, asm_exc_nmi),
> + INTG(X86_TRAP_NMI, asm_noist_exc_nmi),

Actually this is a x86_64 only problem. The 32bit variant is fine, but
for consistency there is no problem to have that extra entry point on
32bit as well.

Thanks,

tglx

2021-05-03 20:25:45

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage

On Mon, May 03 2021 at 22:13, Thomas Gleixner wrote:

> On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
>> + *
>> + * While the other entries for the exceptions which use Interrupt stacks can
>> + * be also used on the kernel stack, asm_exc_nmi() can not be used on the
>> + * kernel stack for it relies on the RSP-located "NMI executing" variable
>> + * which expects to on a fixed location in the NMI IST stack. For early
>> + * booting stage, asm_noist_exc_nmi() is used for NMI.
>> */
>> static const __initconst struct idt_data def_idts[] = {
>> INTG(X86_TRAP_DE, asm_exc_divide_error),
>> - INTG(X86_TRAP_NMI, asm_exc_nmi),
>> + INTG(X86_TRAP_NMI, asm_noist_exc_nmi),
>
> Actually this is a x86_64 only problem. The 32bit variant is fine, but
> for consistency there is no problem to have that extra entry point on
> 32bit as well.

Bah, no. This patch breaks 32bit because on 32bit nothing sets the entry
to asm_exc_nmi() later on.

Thanks,

tglx

2021-05-03 21:47:04

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage

On Mon, May 03 2021 at 22:24, Thomas Gleixner wrote:

> On Mon, May 03 2021 at 22:13, Thomas Gleixner wrote:
>
>> On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
>>> + *
>>> + * While the other entries for the exceptions which use Interrupt stacks can
>>> + * be also used on the kernel stack, asm_exc_nmi() can not be used on the
>>> + * kernel stack for it relies on the RSP-located "NMI executing" variable
>>> + * which expects to on a fixed location in the NMI IST stack. For early
>>> + * booting stage, asm_noist_exc_nmi() is used for NMI.
>>> */
>>> static const __initconst struct idt_data def_idts[] = {
>>> INTG(X86_TRAP_DE, asm_exc_divide_error),
>>> - INTG(X86_TRAP_NMI, asm_exc_nmi),
>>> + INTG(X86_TRAP_NMI, asm_noist_exc_nmi),
>>
>> Actually this is a x86_64 only problem. The 32bit variant is fine, but
>> for consistency there is no problem to have that extra entry point on
>> 32bit as well.
>
> Bah, no. This patch breaks 32bit because on 32bit nothing sets the entry
> to asm_exc_nmi() later on.

Sigh. Finding a fixes tag for this is complicated.

The problem was introduced in 4.14 with b70543a0b2b6 ("x86/idt: Move
regular trap init to tables").

Before that trap_init() installed an IST gate right away, but looking
deeper this was broken forever because there is a hen and egg problem.

ISTs only work after TSS is initialized and the ordering here is:

trap_init()
init_idt()
cpu_init()
init_tss()

So the original code had a race window between init_idt() and
init_tss(). Any IST using exception in that window goes south because
TSS is not initialized.

b70543a0b2b6 traded the above with that NMI issue. All other
exceptions are fine...

I'll think about it tomorrow some more...

2021-05-04 08:14:06

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 3/4] KVM/VMX: Invoke NMI non-IST entry instead of IST entry

On 03/05/21 22:02, Thomas Gleixner wrote:
> but this and the next patch are not really needed. The below avoids the
> extra kvm_before/after() dance in both places. Hmm?

Sure, that's good as well.

Paolo

> Thanks,
>
> tglx
> ---
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -526,6 +526,10 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
>
> DEFINE_IDTENTRY_RAW_ALIAS(exc_nmi, exc_nmi_noist);
>
> +#if IS_MODULE(CONFIG_KVM_INTEL)
> +EXPORT_SYMBOL_GPL(asm_exc_nmi_noist);
> +#endif
> +
> void stop_nmi(void)
> {
> ignore_nmis++;
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -36,6 +36,7 @@
> #include <asm/debugreg.h>
> #include <asm/desc.h>
> #include <asm/fpu/internal.h>
> +#include <asm/idtentry.h>
> #include <asm/io.h>
> #include <asm/irq_remapping.h>
> #include <asm/kexec.h>
> @@ -6395,18 +6396,17 @@ static void vmx_apicv_post_state_restore
>
> void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
>
> -static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
> +static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
> + unsigned long entry)
> {
> - unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
> - gate_desc *desc = (gate_desc *)host_idt_base + vector;
> -
> kvm_before_interrupt(vcpu);
> - vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
> + vmx_do_interrupt_nmi_irqoff(entry);
> kvm_after_interrupt(vcpu);
> }
>
> static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
> {
> + const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
> u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
>
> /* if exit due to PF check for async PF */
> @@ -6417,18 +6417,20 @@ static void handle_exception_nmi_irqoff(
> kvm_machine_check();
> /* We need to handle NMIs before interrupts are enabled */
> else if (is_nmi(intr_info))
> - handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
> + handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
> }
>
> static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
> {
> u32 intr_info = vmx_get_intr_info(vcpu);
> + unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
> + gate_desc *desc = (gate_desc *)host_idt_base + vector;
>
> if (WARN_ONCE(!is_external_intr(intr_info),
> "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
> return;
>
> - handle_interrupt_nmi_irqoff(vcpu, intr_info);
> + handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
> }
>
> static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)

2021-05-04 15:01:28

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage

On Mon, May 03 2021 at 23:45, Thomas Gleixner wrote:
> The problem was introduced in 4.14 with b70543a0b2b6 ("x86/idt: Move
> regular trap init to tables").
>
> Before that trap_init() installed an IST gate right away, but looking
> deeper this was broken forever because there is a hen and egg problem.
>
> ISTs only work after TSS is initialized and the ordering here is:
>
> trap_init()
> init_idt()
> cpu_init()
> init_tss()
>
> So the original code had a race window between init_idt() and
> init_tss(). Any IST using exception in that window goes south because
> TSS is not initialized.
>
> b70543a0b2b6 traded the above with that NMI issue. All other
> exceptions are fine...
>
> I'll think about it tomorrow some more...

It does not really matter which way around we do it. Even if we do that
noist dance then still any NMI hitting _before_ init_idt() is going to
lala land. So having this tiny step in between is more or less cosmetic.

And just for completness sake, I don't see a reason why we have to set
up the idt gates _before_ the TSS muck, i.e. before cpu_init().

The only thing cpu_init() needs working which is not installed in the
early_idt is #GP because some cpu init code uses rd/wrmsrl_safe(). But
that's pretty much all of it.

So this wants a proper cleanup and not some paper over it with an extra
step and I don't see a reason why any of this should be backported
simply because it does not matter at all whether the early idt which
only populates a few essential gates is active for a bit longer.

So what we need is a solution for that KVM wreckage but that can be
stand alone.

Thanks,

tglx



2021-05-04 21:54:46

by Thomas Gleixner

[permalink] [raw]
Subject: [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry

From: Lai Jiangshan <[email protected]>

In VMX, the host NMI handler needs to be invoked after NMI VM-Exit.
Before commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect
call instead of INTn"), this was done by INTn ("int $2"). But INTn
microcode is relatively expensive, so the commit reworked NMI VM-Exit
handling to invoke the kernel handler by function call.

But this missed a detail. The NMI entry point for direct invocation is
fetched from the IDT table and called on the kernel stack. But on 64-bit
the NMI entry installed in the IDT expects to be invoked on the IST stack.
It relies on the "NMI executing" variable on the IST stack to work
correctly, which is at a fixed position in the IST stack. When the entry
point is unexpectedly called on the kernel stack, the RSP-addressed "NMI
executing" variable is obviously also on the kernel stack and is
"uninitialized" and can cause the NMI entry code to run in the wrong way.

Provide a non-ist entry point for VMX which shares the C-function with
the regular NMI entry and invoke the new asm entry point instead.

On 32-bit this just maps to the regular NMI entry point as 32-bit has no
ISTs and is not affected.

[ tglx: Made it independent for backporting, massaged changelog ]

Fixes: 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn")
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
---

Note: That's the minimal fix which needs to be backported and the other
stuff is cleanup material on top for 5.14.

---
arch/x86/include/asm/idtentry.h | 15 +++++++++++++++
arch/x86/kernel/nmi.c | 10 ++++++++++
arch/x86/kvm/vmx/vmx.c | 16 +++++++++-------
3 files changed, 34 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -588,6 +588,21 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC, xenpv_
#endif

/* NMI */
+
+#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
+/*
+ * Special NOIST entry point for VMX which invokes this on the kernel
+ * stack. asm_exc_nmi() requires an IST to work correctly vs. the NMI
+ * 'executing' marker.
+ *
+ * On 32bit this just uses the regular NMI entry point because 32-bit does
+ * not have ISTs.
+ */
+DECLARE_IDTENTRY(X86_TRAP_NMI, exc_nmi_noist);
+#else
+#define asm_exc_nmi_noist asm_exc_nmi
+#endif
+
DECLARE_IDTENTRY_NMI(X86_TRAP_NMI, exc_nmi);
#ifdef CONFIG_XEN_PV
DECLARE_IDTENTRY_RAW(X86_TRAP_NMI, xenpv_exc_nmi);
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -524,6 +524,16 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
mds_user_clear_cpu_buffers();
}

+#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
+DEFINE_IDTENTRY_RAW(exc_nmi_noist)
+{
+ exc_nmi(regs);
+}
+#endif
+#if IS_MODULE(CONFIG_KVM_INTEL)
+EXPORT_SYMBOL_GPL(asm_exc_nmi_noist);
+#endif
+
void stop_nmi(void)
{
ignore_nmis++;
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -36,6 +36,7 @@
#include <asm/debugreg.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
+#include <asm/idtentry.h>
#include <asm/io.h>
#include <asm/irq_remapping.h>
#include <asm/kexec.h>
@@ -6415,18 +6416,17 @@ static void vmx_apicv_post_state_restore

void vmx_do_interrupt_nmi_irqoff(unsigned long entry);

-static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
+static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
+ unsigned long entry)
{
- unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
- gate_desc *desc = (gate_desc *)host_idt_base + vector;
-
kvm_before_interrupt(vcpu);
- vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
+ vmx_do_interrupt_nmi_irqoff(entry);
kvm_after_interrupt(vcpu);
}

static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
{
+ const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
u32 intr_info = vmx_get_intr_info(&vmx->vcpu);

/* if exit due to PF check for async PF */
@@ -6437,18 +6437,20 @@ static void handle_exception_nmi_irqoff(
kvm_machine_check();
/* We need to handle NMIs before interrupts are enabled */
else if (is_nmi(intr_info))
- handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
+ handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
}

static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
{
u32 intr_info = vmx_get_intr_info(vcpu);
+ unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
+ gate_desc *desc = (gate_desc *)host_idt_base + vector;

if (WARN_ONCE(!is_external_intr(intr_info),
"KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
return;

- handle_interrupt_nmi_irqoff(vcpu, intr_info);
+ handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
}

static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)

2021-05-04 22:14:12

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry

On 04/05/21 23:51, Sean Christopherson wrote:
> On Tue, May 04, 2021, Paolo Bonzini wrote:
>> On 04/05/21 23:23, Andy Lutomirski wrote:
>>>> On May 4, 2021, at 2:21 PM, Sean Christopherson <[email protected]> wrote:
>>>> FWIW, NMIs are masked if the VM-Exit was due to an NMI.
>>
>> Huh, indeed: "An NMI causes subsequent NMIs to be blocked, but only after
>> the VM exit completes".
>>
>>> Then this whole change is busted, since nothing will unmask NMIs. Revert it?
>> Looks like the easiest way out indeed.
>
> I've no objection to reverting to intn, but what does reverting versus handling
> NMI on the kernel stack have to do with NMIs being blocked on VM-Exit due to NMI?
> I'm struggling mightily to connect the dots.

Nah, you're right: vmx_do_interrupt_nmi_irqoff will not call the handler
directly, rather it calls the IDT entrypoint which *will* do an IRET and
unmask NMIs. I trusted Andy too much on this one. :)

Thomas's posted patch ("[PATCH] KVM/VMX: Invoke NMI non-IST entry
instead of IST entry") looks good.

Paolo

2021-05-05 01:31:04

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry

On Tue, May 04 2021 at 23:56, Paolo Bonzini wrote:
> On 04/05/21 23:51, Sean Christopherson wrote:
>> On Tue, May 04, 2021, Paolo Bonzini wrote:
>>> On 04/05/21 23:23, Andy Lutomirski wrote:
>>>>> On May 4, 2021, at 2:21 PM, Sean Christopherson <[email protected]> wrote:
>>>>> FWIW, NMIs are masked if the VM-Exit was due to an NMI.
>>>
>>> Huh, indeed: "An NMI causes subsequent NMIs to be blocked, but only after
>>> the VM exit completes".
>>>
>>>> Then this whole change is busted, since nothing will unmask NMIs. Revert it?
>>> Looks like the easiest way out indeed.
>>
>> I've no objection to reverting to intn, but what does reverting versus handling
>> NMI on the kernel stack have to do with NMIs being blocked on VM-Exit due to NMI?
>> I'm struggling mightily to connect the dots.
>
> Nah, you're right: vmx_do_interrupt_nmi_irqoff will not call the handler
> directly, rather it calls the IDT entrypoint which *will* do an IRET and
> unmask NMIs. I trusted Andy too much on this one. :)
>
> Thomas's posted patch ("[PATCH] KVM/VMX: Invoke NMI non-IST entry
> instead of IST entry") looks good.

Well, looks good is one thing.

It would be more helpful if someone would actually review and/or test it.

Thanks,

tglx

2021-05-05 15:46:50

by Lai Jiangshan

[permalink] [raw]
Subject: Re: [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry



On 2021/5/5 08:00, Thomas Gleixner wrote:
> On Tue, May 04 2021 at 23:56, Paolo Bonzini wrote:
>> On 04/05/21 23:51, Sean Christopherson wrote:
>>> On Tue, May 04, 2021, Paolo Bonzini wrote:
>>>> On 04/05/21 23:23, Andy Lutomirski wrote:
>>>>>> On May 4, 2021, at 2:21 PM, Sean Christopherson <[email protected]> wrote:
>>>>>> FWIW, NMIs are masked if the VM-Exit was due to an NMI.
>>>>
>>>> Huh, indeed: "An NMI causes subsequent NMIs to be blocked, but only after
>>>> the VM exit completes".
>>>>
>>>>> Then this whole change is busted, since nothing will unmask NMIs. Revert it?
>>>> Looks like the easiest way out indeed.
>>>
>>> I've no objection to reverting to intn, but what does reverting versus handling
>>> NMI on the kernel stack have to do with NMIs being blocked on VM-Exit due to NMI?
>>> I'm struggling mightily to connect the dots.
>>
>> Nah, you're right: vmx_do_interrupt_nmi_irqoff will not call the handler
>> directly, rather it calls the IDT entrypoint which *will* do an IRET and
>> unmask NMIs. I trusted Andy too much on this one. :)
>>
>> Thomas's posted patch ("[PATCH] KVM/VMX: Invoke NMI non-IST entry
>> instead of IST entry") looks good.
>
> Well, looks good is one thing.
>
> It would be more helpful if someone would actually review and/or test it.
>
> Thanks,
>
> tglx
>

I tested it with the following testing-patch applied, it shows that the
problem is fixed.

The only one line of code in vmenter.S in the testing-patch just emulates
the situation that a "uninitialized" garbage in the kernel stack happens
to be 1 and it happens to be at the same location of the RSP-located
"NMI executing" variable.


First round:
# apply the testing-patch
# perf record events of a vm which does kbuild inside
# dmesg shows that there are the same number of "kvm nmi" and "kvm nmi miss"
It shows that the problem exists with regard to the invocation of the NMI
handler.

Second Round:
# apply the fix from tglx
# apply the testing-patch
# perf record events of a vm which does kbuild inside
# dmesg shows that there are some "kvm nmi" but no "kvm nmi miss".
It shows that the problem is fixed.


diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 3a6461694fc2..32096049c2a2 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -316,6 +316,7 @@ SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
#endif
pushf
push $__KERNEL_CS
+ movq $1, -24(%rsp) // "NMI executing": 1 = nested, non-1 = not-nested
CALL_NOSPEC _ASM_ARG1

/*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8586eca349a9..eefd22d22fce 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6439,8 +6439,17 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)

if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT)
handle_external_interrupt_irqoff(vcpu);
- else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)
+ else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI) {
+ unsigned long count = this_cpu_read(irq_stat.__nmi_count);
+
handle_exception_nmi_irqoff(vmx);
+
+ if (is_nmi(vmx_get_intr_info(&vmx->vcpu))) {
+ pr_info("kvm nmi\n");
+ if (count == this_cpu_read(irq_stat.__nmi_count))
+ pr_info("kvm nmi miss\n");
+ }
+ }
}

/*

Subject: [tip: x86/urgent] KVM/VMX: Invoke NMI non-IST entry instead of IST entry

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: a217a6593cec8b315d4c2f344bae33660b39b703
Gitweb: https://git.kernel.org/tip/a217a6593cec8b315d4c2f344bae33660b39b703
Author: Lai Jiangshan <[email protected]>
AuthorDate: Tue, 04 May 2021 21:50:14 +02:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Wed, 05 May 2021 22:54:10 +02:00

KVM/VMX: Invoke NMI non-IST entry instead of IST entry

In VMX, the host NMI handler needs to be invoked after NMI VM-Exit.
Before commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect
call instead of INTn"), this was done by INTn ("int $2"). But INTn
microcode is relatively expensive, so the commit reworked NMI VM-Exit
handling to invoke the kernel handler by function call.

But this missed a detail. The NMI entry point for direct invocation is
fetched from the IDT table and called on the kernel stack. But on 64-bit
the NMI entry installed in the IDT expects to be invoked on the IST stack.
It relies on the "NMI executing" variable on the IST stack to work
correctly, which is at a fixed position in the IST stack. When the entry
point is unexpectedly called on the kernel stack, the RSP-addressed "NMI
executing" variable is obviously also on the kernel stack and is
"uninitialized" and can cause the NMI entry code to run in the wrong way.

Provide a non-ist entry point for VMX which shares the C-function with
the regular NMI entry and invoke the new asm entry point instead.

On 32-bit this just maps to the regular NMI entry point as 32-bit has no
ISTs and is not affected.

[ tglx: Made it independent for backporting, massaged changelog ]

Fixes: 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn")
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Lai Jiangshan <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]

---
arch/x86/include/asm/idtentry.h | 15 +++++++++++++++
arch/x86/kernel/nmi.c | 10 ++++++++++
arch/x86/kvm/vmx/vmx.c | 16 +++++++++-------
3 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index e35e342..73d45b0 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -588,6 +588,21 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC, xenpv_exc_machine_check);
#endif

/* NMI */
+
+#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
+/*
+ * Special NOIST entry point for VMX which invokes this on the kernel
+ * stack. asm_exc_nmi() requires an IST to work correctly vs. the NMI
+ * 'executing' marker.
+ *
+ * On 32bit this just uses the regular NMI entry point because 32-bit does
+ * not have ISTs.
+ */
+DECLARE_IDTENTRY(X86_TRAP_NMI, exc_nmi_noist);
+#else
+#define asm_exc_nmi_noist asm_exc_nmi
+#endif
+
DECLARE_IDTENTRY_NMI(X86_TRAP_NMI, exc_nmi);
#ifdef CONFIG_XEN_PV
DECLARE_IDTENTRY_RAW(X86_TRAP_NMI, xenpv_exc_nmi);
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index bf250a3..2ef961c 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -524,6 +524,16 @@ nmi_restart:
mds_user_clear_cpu_buffers();
}

+#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
+DEFINE_IDTENTRY_RAW(exc_nmi_noist)
+{
+ exc_nmi(regs);
+}
+#endif
+#if IS_MODULE(CONFIG_KVM_INTEL)
+EXPORT_SYMBOL_GPL(asm_exc_nmi_noist);
+#endif
+
void stop_nmi(void)
{
ignore_nmis++;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cbe0cda..b21d751 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -36,6 +36,7 @@
#include <asm/debugreg.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
+#include <asm/idtentry.h>
#include <asm/io.h>
#include <asm/irq_remapping.h>
#include <asm/kexec.h>
@@ -6415,18 +6416,17 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)

void vmx_do_interrupt_nmi_irqoff(unsigned long entry);

-static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
+static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
+ unsigned long entry)
{
- unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
- gate_desc *desc = (gate_desc *)host_idt_base + vector;
-
kvm_before_interrupt(vcpu);
- vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
+ vmx_do_interrupt_nmi_irqoff(entry);
kvm_after_interrupt(vcpu);
}

static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
{
+ const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
u32 intr_info = vmx_get_intr_info(&vmx->vcpu);

/* if exit due to PF check for async PF */
@@ -6437,18 +6437,20 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
kvm_machine_check();
/* We need to handle NMIs before interrupts are enabled */
else if (is_nmi(intr_info))
- handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
+ handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
}

static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
{
u32 intr_info = vmx_get_intr_info(vcpu);
+ unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
+ gate_desc *desc = (gate_desc *)host_idt_base + vector;

if (WARN_ONCE(!is_external_intr(intr_info),
"KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
return;

- handle_interrupt_nmi_irqoff(vcpu, intr_info);
+ handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
}

static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)

2021-05-10 08:02:06

by Juergen Gross

[permalink] [raw]
Subject: Re: [PATCH 1/4] x86/xen/entry: Rename xenpv_exc_nmi to noist_exc_nmi

On 27.04.21 01:09, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> There is no any functionality change intended. Just rename it and
> move it to arch/x86/kernel/nmi.c so that we can resue it later in
> next patch for early NMI and kvm.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Sean Christopherson <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Vitaly Kuznetsov <[email protected]>
> Cc: Wanpeng Li <[email protected]>
> Cc: Jim Mattson <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: [email protected]
> Cc: Josh Poimboeuf <[email protected]>
> Cc: Uros Bizjak <[email protected]>
> Cc: Maxim Levitsky <[email protected]>
> Signed-off-by: Lai Jiangshan <[email protected]>

Acked-by: Juergen Gross <[email protected]>


Juergen


Attachments:
OpenPGP_0xB0DE9DD628BF132F.asc (3.06 kB)
OpenPGP_signature (505.00 B)
OpenPGP digital signature
Download all attachments