2022-11-17 15:23:03

by Maxim Levitsky

[permalink] [raw]
Subject: [PATCH 00/13] SVM: vNMI (with my fixes)

Hi!

This is the vNMI patch series from Santosh Shukla with a few
small fixes from me:

1. When a vNMI injection is pending, another NMI could arrive before the
first injection is done. To avoid delaying that second NMI's injection for
an unbounded time, I added code that intercepts IRET/RSM/STGI and then
retries the injection (see the sketch at the end of this letter).

2. I slightly modified 'KVM: SVM: Add VNMI support in get/set_nmi_mask'
to WARN in the vNMI functions when they are called without vNMI enabled.
Also, NMI masking/unmasking should be allowed regardless of whether SMM
is active, to support migration.

3. I did some refactoring in the code which updates the int_ctl in vmcb12
on nested VM exit, and updated the patch 'KVM: nSVM: implement nested VNMI'
to use this.

4. I added my Reviewed-by to all the patches which I didn't change.

I only tested this on a machine which doesn't have vNMI, so this does need
some testing to ensure that nothing is broken.

Another thing I haven't looked at in depth yet is migration. I think there
is a bug, because with vNMI we can now in practice have two NMIs injected
into the guest: one in service, and one 'pending injection' but no longer
pending from KVM's point of view. KVM doesn't take this into account in
kvm_vcpu_ioctl_x86_get_vcpu_events, and maybe elsewhere.
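
To illustrate fix 1, a rough sketch of the retry idea (hypothetical code,
not the actual patch; the real handlers also advance RIP and handle
SEV-ES):

static int iret_interception(struct kvm_vcpu *vcpu)
{
	/*
	 * IRET/RSM/STGI are points where NMIs may become unblocked; raise
	 * KVM_REQ_EVENT so that kvm_check_and_inject_events() retries the
	 * vNMI injection that was deferred because one was already pending.
	 */
	++vcpu->stat.nmi_window_exits;
	kvm_make_request(KVM_REQ_EVENT, vcpu);
	return 1;
}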



Best regards,
Maxim Levitsky

Maxim Levitsky (5):
  KVM: nSVM: don't sync back tlb_ctl on nested VM exit
  KVM: nSVM: don't call nested_sync_control_from_vmcb02 on each VM exit
  KVM: nSVM: rename nested_sync_control_from_vmcb02 to
    nested_sync_int_ctl_from_vmcb02
  KVM: nSVM: clean up copying of int_ctl fields back to vmcb01/vmcb12
  KVM: SVM: allow NMI window with vNMI

Santosh Shukla (8):
  x86/cpu: Add CPUID feature bit for VNMI
  KVM: SVM: Add VNMI bit definition
  KVM: SVM: Add VNMI support in get/set_nmi_mask
  KVM: SVM: Report NMI not allowed when Guest busy handling VNMI
  KVM: SVM: Add VNMI support in inject_nmi
  KVM: nSVM: implement nested VNMI
  KVM: nSVM: emulate VMEXIT_INVALID case for nested VNMI
  KVM: SVM: Enable VNMI feature

 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/svm.h         |  7 +++
 arch/x86/kvm/svm/nested.c          | 84 +++++++++++++++++++++---------
 arch/x86/kvm/svm/svm.c             | 60 ++++++++++++++++++---
 arch/x86/kvm/svm/svm.h             | 70 ++++++++++++++++++++++++-
 5 files changed, 189 insertions(+), 33 deletions(-)

--
2.34.3

2022-11-17 15:23:09

by Maxim Levitsky

[permalink] [raw]
Subject: [PATCH 13/13] KVM: SVM: Enable VNMI feature

From: Santosh Shukla <[email protected]>

Enable NMI virtualization by setting V_NMI_ENABLE in the VMCB interrupt
control when the vnmi module parameter is set.
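
For reference, a minimal sketch of how the vnmi module parameter could be
wired up in svm.c (hypothetical; the actual declaration is added earlier
in the series, and the default value and permissions here are assumptions):

/* Declared extern in svm.h so the inline helpers there can see it. */
bool vnmi = true;
module_param(vnmi, bool, 0444);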

Signed-off-by: Santosh Shukla <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/svm.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c9190a8ee03273..5b61d89c644da6 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1307,6 +1307,9 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm, vmcb);
 
+	if (vnmi)
+		svm->vmcb->control.int_ctl |= V_NMI_ENABLE;
+
 	if (vgif) {
 		svm_clr_intercept(svm, INTERCEPT_STGI);
 		svm_clr_intercept(svm, INTERCEPT_CLGI);
--
2.34.3


2022-11-17 15:23:26

by Maxim Levitsky

[permalink] [raw]
Subject: [PATCH 07/13] KVM: SVM: Add VNMI support in get/set_nmi_mask

From: Santosh Shukla <[email protected]>

VMCB intr_ctrl bit 12 (V_NMI_MASK) is set by the processor when it is
handling an NMI in the guest and is cleared after the NMI is handled.
Treat V_NMI_MASK as read-only in the hypervisor, except for the SMM case,
where the hypervisor needs to set V_NMI_MASK before entering SMM and
clear it after leaving SMM.

Add an API (get_vnmi_vmcb) to return the correct VMCB for L1 or L2.
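
(For context, KVM's SMM entry code uses these hooks roughly as follows;
this paraphrases enter_smm() in arch/x86/kvm/x86.c and is not part of
this patch:)

	if (static_call(kvm_x86_get_nmi_mask)(vcpu))
		vcpu->arch.hflags |= HF_SMM_INSIDE_NMI_MASK;
	else
		static_call(kvm_x86_set_nmi_mask)(vcpu, true);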

Maxim:
- made set_vnmi_mask/clear_vnmi_mask/is_vnmi_mask warn if called
without vNMI enabled
- clear IRET intercept in svm_set_nmi_mask even with vNMI

Signed-off-by: Santosh Shukla <[email protected]>
Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/svm.c | 18 ++++++++++++++-
arch/x86/kvm/svm/svm.h | 52 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 08a7b2a0a29f3a..c16f68f6c4f7d7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3618,13 +3618,29 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 
 static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
 {
-	return !!(vcpu->arch.hflags & HF_NMI_MASK);
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (is_vnmi_enabled(svm))
+		return is_vnmi_mask_set(svm);
+	else
+		return !!(vcpu->arch.hflags & HF_NMI_MASK);
 }
 
 static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	if (is_vnmi_enabled(svm)) {
+		if (masked)
+			set_vnmi_mask(svm);
+		else {
+			clear_vnmi_mask(svm);
+			if (!sev_es_guest(vcpu->kvm))
+				svm_clr_intercept(svm, INTERCEPT_IRET);
+		}
+		return;
+	}
+
 	if (masked) {
 		vcpu->arch.hflags |= HF_NMI_MASK;
 		if (!sev_es_guest(vcpu->kvm))
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f5383104d00580..bf7f4851dee204 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 extern bool npt_enabled;
 extern int vgif;
 extern bool intercept_smi;
+extern bool vnmi;
 
 enum avic_modes {
 	AVIC_MODE_NONE = 0,
@@ -531,6 +532,57 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
 	       (msr < (APIC_BASE_MSR + 0x100));
 }
 
+static inline struct vmcb *get_vnmi_vmcb(struct vcpu_svm *svm)
+{
+	if (!vnmi)
+		return NULL;
+
+	if (is_guest_mode(&svm->vcpu))
+		return svm->nested.vmcb02.ptr;
+	else
+		return svm->vmcb01.ptr;
+}
+
+static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb = get_vnmi_vmcb(svm);
+
+	if (vmcb)
+		return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
+	else
+		return false;
+}
+
+static inline bool is_vnmi_mask_set(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb = get_vnmi_vmcb(svm);
+
+	if (WARN_ON_ONCE(!vmcb))
+		return false;
+
+	return !!(vmcb->control.int_ctl & V_NMI_MASK);
+}
+
+static inline void set_vnmi_mask(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb = get_vnmi_vmcb(svm);
+
+	if (WARN_ON_ONCE(!vmcb))
+		return;
+
+	vmcb->control.int_ctl |= V_NMI_MASK;
+}
+
+static inline void clear_vnmi_mask(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb = get_vnmi_vmcb(svm);
+
+	if (WARN_ON_ONCE(!vmcb))
+		return;
+
+	vmcb->control.int_ctl &= ~V_NMI_MASK;
+}
+
 /* svm.c */
 #define MSR_INVALID 0xffffffffU

--
2.34.3


2022-11-17 15:23:27

by Maxim Levitsky

[permalink] [raw]
Subject: [PATCH 10/13] KVM: SVM: Add VNMI support in inject_nmi

From: Santosh Shukla <[email protected]>

Inject the NMI by setting V_NMI_PENDING in the VMCB interrupt control. The
processor will clear V_NMI_PENDING to acknowledge that processing has
started and will keep V_NMI_MASK set until it is done processing the NMI
event.

Also, handle the nmi_l1_to_l2 case: when it is true, the NMI to be
injected originally comes from L1's VMCB12 EVENTINJ field, so add a check
for that case.

Signed-off-by: Santosh Shukla <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/svm.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index eaa30f8ace518d..9ebfbd0d4b467e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3479,7 +3479,14 @@ static void pre_svm_run(struct kvm_vcpu *vcpu)
 static void svm_inject_nmi(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb *vmcb = NULL;
 
+	if (is_vnmi_enabled(svm) && !svm->nmi_l1_to_l2) {
+		vmcb = get_vnmi_vmcb(svm);
+		vmcb->control.int_ctl |= V_NMI_PENDING;
+		++vcpu->stat.nmi_injections;
+		return;
+	}
 	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
 
 	if (svm->nmi_l1_to_l2)
--
2.34.3


2022-11-17 19:27:25

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 07/13] KVM: SVM: Add VNMI support in get/set_nmi_mask

On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> From: Santosh Shukla <[email protected]>
>
> VMCB intr_ctrl bit12 (V_NMI_MASK) is set by the processor when handling
> NMI in guest and is cleared after the NMI is handled. Treat V_NMI_MASK
> as read-only in the hypervisor except for the SMM case where hypervisor
> before entring and after leaving SMM mode requires to set and unset
> V_NMI_MASK.
>
> Adding API(get_vnmi_vmcb) in order to return the correct vmcb for L1 or
> L2.
>
> Maxim:
> - made set_vnmi_mask/clear_vnmi_mask/is_vnmi_mask warn if called
> without vNMI enabled
> - clear IRET intercept in svm_set_nmi_mask even with vNMI
>
> Signed-off-by: Santosh Shukla <[email protected]>
> Signed-off-by: Maxim Levitsky <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 18 ++++++++++++++-
> arch/x86/kvm/svm/svm.h | 52 ++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 69 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 08a7b2a0a29f3a..c16f68f6c4f7d7 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3618,13 +3618,29 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>
> static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> {
> - return !!(vcpu->arch.hflags & HF_NMI_MASK);
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + if (is_vnmi_enabled(svm))
> + return is_vnmi_mask_set(svm);
> + else
> + return !!(vcpu->arch.hflags & HF_NMI_MASK);
> }
>
> static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
>
> + if (is_vnmi_enabled(svm)) {
> + if (masked)
> + set_vnmi_mask(svm);

I believe not setting INTERCEPT_IRET is correct, but only because the existing
code is unnecessary. And this all very subtly relies on KVM_REQ_EVENT being set
and/or KVM already being in kvm_check_and_inject_events().

When NMIs become unblocked, INTERCEPT_IRET can be cleared, but KVM should also
pend KVM_REQ_EVENT. AFAICT, that doesn't happen when this is called via the
emulator. Ah, because em_iret() only handles RM for Intel's restricted guest
crap. I.e. it "works" only because it never happens. All other flows set
KVM_REQ_EVENT when toggling NMI blocking, e.g. the RSM path of kvm_smm_changed().

And when NMIs become blocked, there's no need to force INTERCEPT_IRET in this
code because kvm_check_and_inject_events() will request an NMI window and set the
intercept if necessary, and all paths that set NMI blocking are guaranteed to
reach kvm_check_and_inject_events() before entering the guest.

1. RSM => kvm_smm_changed() sets KVM_REQ_EVENT
2. enter_smm() is only called from within kvm_check_and_inject_events(),
before pending NMIs are processed (yay priority)
3. emulator_set_nmi_mask() never blocks NMIs, only does the half-baked IRET emulation
4. kvm_vcpu_ioctl_x86_set_vcpu_events() sets KVM_REQ_EVENT

So, can you add a prep patch to drop the forced INTERCEPT_IRET? That way the
logic for vNMI and !vNMI is the same.
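
For illustration, a sketch of what that prep patch could look like in the
!vNMI path (hypothetical; the exact context may differ):

	if (masked) {
		vcpu->arch.hflags |= HF_NMI_MASK;
	} else {
		vcpu->arch.hflags &= ~HF_NMI_MASK;
		if (!sev_es_guest(vcpu->kvm))
			svm_clr_intercept(svm, INTERCEPT_IRET);
	}

i.e. drop only the svm_set_intercept(svm, INTERCEPT_IRET) call from the
masked branch, and let kvm_check_and_inject_events() set the intercept
when an NMI window is actually needed.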

> + else {
> + clear_vnmi_mask(svm);

This is the only code that sets/clears the vNMI mask, so rather than have set/clear
helpers, what about a single helper to do the dirty work?

> + if (!sev_es_guest(vcpu->kvm))
> + svm_clr_intercept(svm, INTERCEPT_IRET);
> + }
> + return;
> + }
> +
> if (masked) {
> vcpu->arch.hflags |= HF_NMI_MASK;
> if (!sev_es_guest(vcpu->kvm))
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index f5383104d00580..bf7f4851dee204 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
> extern bool npt_enabled;
> extern int vgif;
> extern bool intercept_smi;
> +extern bool vnmi;
>
> enum avic_modes {
> AVIC_MODE_NONE = 0,
> @@ -531,6 +532,57 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
> (msr < (APIC_BASE_MSR + 0x100));
> }
>
> +static inline struct vmcb *get_vnmi_vmcb(struct vcpu_svm *svm)
> +{
> + if (!vnmi)
> + return NULL;
> +
> + if (is_guest_mode(&svm->vcpu))
> + return svm->nested.vmcb02.ptr;
> + else
> + return svm->vmcb01.ptr;
> +}
> +
> +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
> +{
> + struct vmcb *vmcb = get_vnmi_vmcb(svm);
> +
> + if (vmcb)
> + return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
> + else
> + return false;

Maybe just this?

	return vmcb && (vmcb->control.int_ctl & V_NMI_ENABLE);

Or if an inner helper is added:

	return vmcb && __is_vnmi_enabled(vmcb);

> +}
> +
> +static inline bool is_vnmi_mask_set(struct vcpu_svm *svm)
> +{
> + struct vmcb *vmcb = get_vnmi_vmcb(svm);
> +
> + if (!WARN_ON_ONCE(!vmcb))

Rather than WARN, add an inner __is_vnmi_enabled() that takes the vnmi_vmcb.
Actually, if you do that, the test/set/clear helpers can go away entirely.

> + return false;
> +
> + return !!(vmcb->control.int_ctl & V_NMI_MASK);
> +}
> +
> +static inline void set_vnmi_mask(struct vcpu_svm *svm)
> +{
> + struct vmcb *vmcb = get_vnmi_vmcb(svm);
> +
> + if (!WARN_ON_ONCE(!vmcb))
> + return;
> +
> + vmcb->control.int_ctl |= V_NMI_MASK;
> +}
> +
> +static inline void clear_vnmi_mask(struct vcpu_svm *svm)
> +{
> + struct vmcb *vmcb = get_vnmi_vmcb(svm);
> +
> + if (!WARN_ON_ONCE(!vmcb))
> + return;
> +
> + vmcb->control.int_ctl &= ~V_NMI_MASK;
> +}

These helpers can all go in svm.c. There are no users outside of svm.c, and
unless I'm misunderstanding how nested works, there should never be outside users.

E.g. with HF_NMI_MASK => svm->nmi_masked, the end result can be something like:

static bool __is_vnmi_enabled(struct vmcb *vmcb)
{
	return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
}

static bool is_vnmi_enabled(struct vcpu_svm *svm)
{
	struct vmcb *vmcb = get_vnmi_vmcb(svm);

	return vmcb && __is_vnmi_enabled(vmcb);
}

static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);
	struct vmcb *vmcb = get_vnmi_vmcb(svm);

	if (vmcb && __is_vnmi_enabled(vmcb))
		return !!(vmcb->control.int_ctl & V_NMI_MASK);
	else
		return !!(vcpu->arch.hflags & HF_NMI_MASK);
}

static void svm_set_or_clear_vnmi_mask(struct vmcb *vmcb, bool set)
{
	if (set)
		vmcb->control.int_ctl |= V_NMI_MASK;
	else
		vmcb->control.int_ctl &= ~V_NMI_MASK;
}

static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
{
	struct vcpu_svm *svm = to_svm(vcpu);
	struct vmcb *vmcb = get_vnmi_vmcb(svm);

	if (vmcb && __is_vnmi_enabled(vmcb)) {
		if (masked)
			vmcb->control.int_ctl |= V_NMI_MASK;
		else
			vmcb->control.int_ctl &= ~V_NMI_MASK;
	} else {
		svm->nmi_masked = masked;
	}

	if (!masked)
		svm_disable_iret_interception(svm);
}

2022-11-21 13:25:08

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 07/13] KVM: SVM: Add VNMI support in get/set_nmi_mask

On Thu, 2022-11-17 at 18:54 +0000, Sean Christopherson wrote:
> On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> > From: Santosh Shukla <[email protected]>
> >
> > VMCB intr_ctrl bit12 (V_NMI_MASK) is set by the processor when handling
> > NMI in guest and is cleared after the NMI is handled. Treat V_NMI_MASK
> > as read-only in the hypervisor except for the SMM case where hypervisor
> > before entring and after leaving SMM mode requires to set and unset
> > V_NMI_MASK.
> >
> > Adding API(get_vnmi_vmcb) in order to return the correct vmcb for L1 or
> > L2.
> >
> > Maxim:
> >    - made set_vnmi_mask/clear_vnmi_mask/is_vnmi_mask warn if called
> >      without vNMI enabled
> >    - clear IRET intercept in svm_set_nmi_mask even with vNMI
> >
> > Signed-off-by: Santosh Shukla <[email protected]>
> > Signed-off-by: Maxim Levitsky <[email protected]>
> > ---
> >  arch/x86/kvm/svm/svm.c | 18 ++++++++++++++-
> >  arch/x86/kvm/svm/svm.h | 52 ++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 69 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 08a7b2a0a29f3a..c16f68f6c4f7d7 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3618,13 +3618,29 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> >  
> >  static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> >  {
> > -       return !!(vcpu->arch.hflags & HF_NMI_MASK);
> > +       struct vcpu_svm *svm = to_svm(vcpu);
> > +
> > +       if (is_vnmi_enabled(svm))
> > +               return is_vnmi_mask_set(svm);
> > +       else
> > +               return !!(vcpu->arch.hflags & HF_NMI_MASK);
> >  }
> >  
> >  static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> >  {
> >         struct vcpu_svm *svm = to_svm(vcpu);
> >  
> > +       if (is_vnmi_enabled(svm)) {
> > +               if (masked)
> > +                       set_vnmi_mask(svm);
>
> I believe not setting INTERCEPT_IRET is correct, but only because the existing
> code is unnecessary.  And this all very subtly relies on KVM_REQ_EVENT being set
> and/or KVM already being in kvm_check_and_inject_events().
>
> When NMIs become unblocked, INTERCEPT_IRET can be cleared, but KVM should also
> pending KVM_REQ_EVENT.  AFAICT, that doesn't happen when this is called via the
> emulator.  Ah, because em_iret() only handles RM for Intel's restricted guest
> crap.  I.e. it "works" only because it never happens.  All other flows set
> KVM_REQ_EVENT when toggling NMI blocking, e.g. the RSM path of kvm_smm_changed().
Makes sense


>
> And when NMIs become blocked, there's no need to force INTERCEPT_IRET in this
> code because kvm_check_and_inject_events() will request an NMI window and set the
> intercept if necessary, and all paths that set NMI blocking are guaranteed to
> reach kvm_check_and_inject_events() before entering the guest.

I think I understand what you mean.


When a normal NMI is injected, we do have to intercept IRET, because this
is how we know that the NMI is done executing.

But if NMIs become masked, it can be for one of two reasons:

1. We are already in an NMI, so masking it is essentially a NOP, and there
is no need to intercept IRET since it is already intercepted.

2. We are not in an NMI, and then we don't need to intercept IRET either,
since its interception is not needed to track that an NMI is over, but
only to detect an NMI window (IRET can be used without a paired NMI to
unblock NMIs, which is IMHO a very wrong x86 design decision, but that
ship sailed long ago), so we can intercept IRET only when we request an
actual NMI window.

Makes sense, and I'll send a patch to do it.
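
A rough sketch of that patch's direction (hypothetical code, assuming the
existing svm_enable_nmi_window() flow; the posted patch may differ):

static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
		return; /* the pending IRET will cause a vm exit */

	/*
	 * Request the IRET intercept here, when an NMI window is actually
	 * wanted, instead of unconditionally whenever NMIs become blocked.
	 * (The existing single-step-over-the-IRET logic stays as is.)
	 */
	if (!sev_es_guest(vcpu->kvm))
		svm_set_intercept(svm, INTERCEPT_IRET);
}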


>
>   1. RSM => kvm_smm_changed() sets KVM_REQ_EVENT
>   2. enter_smm() is only called from within kvm_check_and_inject_events(),
>      before pending NMIs are processed (yay priority)
>   3. emulator_set_nmi_mask() never blocks NMIs, only does the half-baked IRET emulation
>   4. kvm_vcpu_ioctl_x86_set_vcpu_event() sets KVM_REQ_EVENT
>
> So, can you add a prep patch to drop the forced INTERCEPT_IRET?  That way the
> logic for vNMI and !vNMI is the same.
>
> > +               else {
> > +                       clear_vnmi_mask(svm);
>
> This is the only code that sets/clears the vNMI mask, so rather than have set/clear
> helpers, what about a single helper to do the dirty work?

Or maybe not have any helper at all, since it is only done once? I don't know;
I just wanted to avoid changing the original patches too much, and only changed
the minimum to make it work.

>
> > +                       if (!sev_es_guest(vcpu->kvm))
> > +                               svm_clr_intercept(svm, INTERCEPT_IRET);
> > +               }
> > +               return;
> > +       }
> > +
> >         if (masked) {
> >                 vcpu->arch.hflags |= HF_NMI_MASK;
> >                 if (!sev_es_guest(vcpu->kvm))
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index f5383104d00580..bf7f4851dee204 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
> >  extern bool npt_enabled;
> >  extern int vgif;
> >  extern bool intercept_smi;
> > +extern bool vnmi;
> >  
> >  enum avic_modes {
> >         AVIC_MODE_NONE = 0,
> > @@ -531,6 +532,57 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
> >                (msr < (APIC_BASE_MSR + 0x100));
> >  }
> >  
> > +static inline struct vmcb *get_vnmi_vmcb(struct vcpu_svm *svm)
> > +{
> > +       if (!vnmi)
> > +               return NULL;
> > +
> > +       if (is_guest_mode(&svm->vcpu))
> > +               return svm->nested.vmcb02.ptr;
> > +       else
> > +               return svm->vmcb01.ptr;
> > +}
> > +
> > +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
> > +{
> > +       struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > +
> > +       if (vmcb)
> > +               return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
> > +       else
> > +               return false;
>
> Maybe just this?
>
>         return vmcb && (vmcb->control.int_ctl & V_NMI_ENABLE);
>
> Or if an inner helper is added:
>
>         return vmcb && __is_vnmi_enabled(vmcb);
>
> > +}
> > +
> > +static inline bool is_vnmi_mask_set(struct vcpu_svm *svm)
> > +{
> > +       struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > +
> > +       if (!WARN_ON_ONCE(!vmcb))
>
> Rather than WARN, add an inner __is_vnmi_enabled() that takes the vnmi_vmcb.
> Actually, if you do that, the test/set/clear helpers can go away entirely.
>
> > +               return false;
> > +
> > +       return !!(vmcb->control.int_ctl & V_NMI_MASK);
> > +}
> > +
> > +static inline void set_vnmi_mask(struct vcpu_svm *svm)
> > +{
> > +       struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > +
> > +       if (!WARN_ON_ONCE(!vmcb))
> > +               return;
> > +
> > +       vmcb->control.int_ctl |= V_NMI_MASK;
> > +}
> > +
> > +static inline void clear_vnmi_mask(struct vcpu_svm *svm)
> > +{
> > +       struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > +
> > +       if (!WARN_ON_ONCE(!vmcb))
> > +               return;
> > +
> > +       vmcb->control.int_ctl &= ~V_NMI_MASK;
> > +}
>
> These helpers can all go in svm.  There are no users oustide of svm.c, and
> unless I'm misunderstanding how nested works, there should never be oustide users.
>
> E.g. with HF_NMI_MASK => svm->nmi_masked, the end result can be something like:
>
> static bool __is_vnmi_enabled(struct *vmcb)
> {
>         return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
> }
>
> static bool is_vnmi_enabled(struct vcpu_svm *svm)
> {
>         struct vmcb *vmcb = get_vnmi_vmcb(svm);
>
>         return vmcb && __is_vnmi_enabled(vmcb);
> }
>
> static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> {
>         struct vcpu_svm *svm = to_svm(vcpu);
>         struct vmcb *vmcb = get_vnmi_vmcb(svm);
>
>         if (vmcb && __is_vnmi_enabled(vmcb))
>                 return !!(vmcb->control.int_ctl & V_NMI_MASK);
>         else
>                 return !!(vcpu->arch.hflags & HF_NMI_MASK);
> }
>
> static void svm_set_or_clear_vnmi_mask(struct vmcb *vmcb, bool set)
> {
>         if (set)
>                 vmcb->control.int_ctl |= V_NMI_MASK;
>         else
>                 vmcb->control.int_ctl &= ~V_NMI_MASK;
> }
>
> static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> {
>         struct vcpu_svm *svm = to_svm(vcpu);
>         struct vmcb *vmcb = get_vnmi_vmcb(svm);
>
>         if (vmcb && __is_vnmi_enabled(vmcb)) {
>                 if (masked)
>                         vmcb->control.int_ctl |= V_NMI_MASK;
>                 else
>                         vmcb->control.int_ctl &= ~V_NMI_MASK;
>         } else {
>                 svm->nmi_masked = masked;
>         }
>
>         if (!masked)
>                 svm_disable_iret_interception(svm);
> }

OK, this is one of the ways to do it; makes sense overall.
I actually wanted to do something like that, but opted not to touch
the original code beyond what I needed. I can do this
in the next version.

Best regards,
Maxim Levitsky

>



2022-11-21 17:29:10

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 07/13] KVM: SVM: Add VNMI support in get/set_nmi_mask

On Mon, Nov 21, 2022, Maxim Levitsky wrote:
> On Thu, 2022-11-17 at 18:54 +0000, Sean Christopherson wrote:
> > E.g. with HF_NMI_MASK => svm->nmi_masked, the end result can be something like:
> >
> > static bool __is_vnmi_enabled(struct vmcb *vmcb)
> > {
> > 	return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
> > }
> >
> > static bool is_vnmi_enabled(struct vcpu_svm *svm)
> > {
> > 	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> >
> > 	return vmcb && __is_vnmi_enabled(vmcb);
> > }
> >
> > static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> > {
> > 	struct vcpu_svm *svm = to_svm(vcpu);
> > 	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> >
> > 	if (vmcb && __is_vnmi_enabled(vmcb))
> > 		return !!(vmcb->control.int_ctl & V_NMI_MASK);
> > 	else
> > 		return !!(vcpu->arch.hflags & HF_NMI_MASK);
> > }
> >
> > static void svm_set_or_clear_vnmi_mask(struct vmcb *vmcb, bool set)
> > {
> > 	if (set)
> > 		vmcb->control.int_ctl |= V_NMI_MASK;
> > 	else
> > 		vmcb->control.int_ctl &= ~V_NMI_MASK;
> > }
> >
> > static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> > {
> > 	struct vcpu_svm *svm = to_svm(vcpu);
> > 	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> >
> > 	if (vmcb && __is_vnmi_enabled(vmcb)) {
> > 		if (masked)
> > 			vmcb->control.int_ctl |= V_NMI_MASK;
> > 		else
> > 			vmcb->control.int_ctl &= ~V_NMI_MASK;
> > 	} else {
> > 		svm->nmi_masked = masked;
> > 	}
> >
> > 	if (!masked)
> > 		svm_disable_iret_interception(svm);
> > }
>
> OK, this is one of the ways to do it, makes sense overall.
> I actualy wanted to do something like that but opted to not touch
> the original code too much, but only what I needed. I can do this
> in a next version.

After looking at more of this code, I think having get_vnmi_vmcb() is a mistake.
It just ends up being a funky wrapper to the current svm->vmcb. And the manual
check on the "vnmi" global is pointless. If KVM sets V_NMI_ENABLE in any VMCB
when vnmi=false, then that's a KVM bug.

Dropping the wrapper eliminates the possibility of a NULL VMCB pointer, and IMO
yields far more readable code.


static bool is_vnmi_enabled(struct vcpu_svm *svm)
{
	return !!(svm->vmcb->control.int_ctl & V_NMI_ENABLE);
}

static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	if (is_vnmi_enabled(svm))
		return !!(svm->vmcb->control.int_ctl & V_NMI_MASK);
	else
		return !!(vcpu->arch.hflags & HF_NMI_MASK);
}

static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	if (is_vnmi_enabled(svm)) {
		if (masked)
			svm->vmcb->control.int_ctl |= V_NMI_MASK;
		else
			svm->vmcb->control.int_ctl &= ~V_NMI_MASK;
	} else {
		svm->nmi_masked = masked;
	}

	if (!masked)
		svm_disable_iret_interception(svm);
}

2022-11-21 17:51:47

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 10/13] KVM: SVM: Add VNMI support in inject_nmi

On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> From: Santosh Shukla <[email protected]>
>
> Inject the NMI by setting V_NMI in the VMCB interrupt control. processor
> will clear V_NMI to acknowledge processing has started and will keep the
> V_NMI_MASK set until the processor is done with processing the NMI event.
>
> Also, handle the nmi_l1_to_l2 case such that when it is true then
> NMI to be injected originally comes from L1's VMCB12 EVENTINJ field.
> So adding a check for that case.
>
> Signed-off-by: Santosh Shukla <[email protected]>
> Reviewed-by: Maxim Levitsky <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index eaa30f8ace518d..9ebfbd0d4b467e 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3479,7 +3479,14 @@ static void pre_svm_run(struct kvm_vcpu *vcpu)
> static void svm_inject_nmi(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> + struct vmcb *vmcb = NULL;

As written, no need to initialize vmcb. Might be a moot point depending on the
final form of the code.

> + if (is_vnmi_enabled(svm) && !svm->nmi_l1_to_l2) {

Checking nmi_l1_to_l2 is wrong. KVM should directly re-inject any NMI that was
already recognized by hardware, not just those that were originally injected by
L1.

If another event comes along, e.g. an SMI, while an event (the NMI) is already
injected, KVM will send a hardware IRQ to interrupt the guest and force a
VM-Exit so that the SMI can be injected. If hardware does the (IMO) sane thing
and prioritizes "real" IRQs over virtual NMIs, the IRQ VM-Exit will occur
before the virtual NMI is processed and KVM will incorrectly service the SMI
before the NMI.

I believe the correct way to handle this is to add a @reinjected param to
->inject_nmi(), a la ->inject_irq(). That would also allow adding a sanity check
that KVM never attempts to inject an NMI into L2 if NMIs are supposed to trigger
VM-Exit.

This is the least ugly code I could come up with. Note, if vNMI is enabled,
hardware sets V_NMI_MASK if an NMI is injected through event_inj.

static void svm_inject_nmi(struct kvm_vcpu *vcpu, bool reinjected)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	/*
	 * Except for re-injection, KVM should never inject an NMI into L2 if
	 * NMIs are supposed to exit from L2 to L1.
	 */
	WARN_ON_ONCE(!reinjected && is_guest_mode(vcpu) && nested_exit_on_nmi(svm));

	if (is_vnmi_enabled(svm)) {
		if (!reinjected)
			svm->vmcb->control.int_ctl |= V_NMI_PENDING;
		else
			svm->vmcb->control.event_inj = SVM_EVTINJ_VALID |
						       SVM_EVTINJ_TYPE_NMI;
		++vcpu->stat.nmi_injections;
		return;
	}

	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;

	if (svm->nmi_l1_to_l2)
		return;

	vcpu->arch.hflags |= HF_NMI_MASK;
	if (!sev_es_guest(vcpu->kvm))
		svm_set_intercept(svm, INTERCEPT_IRET);
	++vcpu->stat.nmi_injections;
}

2022-12-04 19:57:02

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 07/13] KVM: SVM: Add VNMI support in get/set_nmi_mask

On Thu, 2022-11-17 at 18:54 +0000, Sean Christopherson wrote:
> On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> > From: Santosh Shukla <[email protected]>
> >
> > VMCB intr_ctrl bit12 (V_NMI_MASK) is set by the processor when handling
> > NMI in guest and is cleared after the NMI is handled. Treat V_NMI_MASK
> > as read-only in the hypervisor except for the SMM case where hypervisor
> > before entring and after leaving SMM mode requires to set and unset
> > V_NMI_MASK.
> >
> > Adding API(get_vnmi_vmcb) in order to return the correct vmcb for L1 or
> > L2.
> >
> > Maxim:
> > - made set_vnmi_mask/clear_vnmi_mask/is_vnmi_mask warn if called
> > without vNMI enabled
> > - clear IRET intercept in svm_set_nmi_mask even with vNMI
> >
> > Signed-off-by: Santosh Shukla <[email protected]>
> > Signed-off-by: Maxim Levitsky <[email protected]>
> > ---
> > arch/x86/kvm/svm/svm.c | 18 ++++++++++++++-
> > arch/x86/kvm/svm/svm.h | 52 ++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 69 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 08a7b2a0a29f3a..c16f68f6c4f7d7 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3618,13 +3618,29 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> >
> > static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> > {
> > - return !!(vcpu->arch.hflags & HF_NMI_MASK);
> > + struct vcpu_svm *svm = to_svm(vcpu);
> > +
> > + if (is_vnmi_enabled(svm))
> > + return is_vnmi_mask_set(svm);
> > + else
> > + return !!(vcpu->arch.hflags & HF_NMI_MASK);
> > }
> >
> > static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> > {
> > struct vcpu_svm *svm = to_svm(vcpu);
> >
> > + if (is_vnmi_enabled(svm)) {
> > + if (masked)
> > + set_vnmi_mask(svm);
>
> I believe not setting INTERCEPT_IRET is correct, but only because the existing
> code is unnecessary. And this all very subtly relies on KVM_REQ_EVENT being set
> and/or KVM already being in kvm_check_and_inject_events().

Actually, after thinking about this again, I am almost sure that you are
wrong about this:

KVM sets INTERCEPT_IRET not only to detect an NMI window but also to know
when the NMI handler has finished executing.

Even if no additional NMI arrives, if you don't intercept IRET, KVM will
think that NMIs are never unmasked, and in particular, when another NMI
arrives, KVM will try to open an NMI window while it is already open.

Now, svm_set_nmi_mask() masks NMIs either when it is called by the SMM
entry code or on migration.

On SMM entry, most of the time NMIs will be unmasked by RSM, but an
unsolicited IRET is also an (official) way to unmask NMIs, which we would
miss if we didn't intercept IRET.

On migration, likewise, if it happened during an NMI, we have to intercept
IRET once the migration stream is loaded, so that we can tell in the same
way when the NMI has ended.

All of this is of course only true for the !vNMI case.

For the vNMI case, it turns out that we don't need to intercept IRET at
all after all:

When a vNMI is pending, you can still EVENTINJ another NMI; the pending
vNMI will be kept pending, NMIs will become masked due to the EVENTINJ,
and on IRET the pending vNMI will be serviced as well, so in total both
NMIs will be serviced.

I confirmed this with Santosh Shukla, and in my patch series I take
advantage of this by failing the delayed NMI injection and making KVM
fall back to EVENTINJ.

If NMIs are blocked (the V_NMI_MASK bit is set), on the other hand, and a
vNMI is pending, then we can't use EVENTINJ to inject the second NMI, but
in this case KVM drops the second NMI anyway.
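
Expressed as a sketch (hypothetical helper, not the posted code):

static bool svm_vnmi_has_room(struct vcpu_svm *svm)
{
	u32 int_ctl = get_vnmi_vmcb(svm)->control.int_ctl;

	/* No vNMI pending: the next NMI can simply set V_NMI_PENDING. */
	if (!(int_ctl & V_NMI_PENDING))
		return true;

	/*
	 * A vNMI is already pending.  If NMIs are unmasked, fail the delayed
	 * injection so that KVM falls back to EVENTINJ; the injected NMI
	 * masks NMIs, and the still-pending vNMI is serviced on its IRET.
	 * If NMIs are masked, KVM drops the second NMI anyway.
	 */
	return false;
}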

It all seems to add up very nicely, please take a look at the patch series
([PATCH v2 00/11] SVM: vNMI (with my fixes))


Best regards,
Maxim Levitsky

>
> When NMIs become unblocked, INTERCEPT_IRET can be cleared, but KVM should also
> pending KVM_REQ_EVENT. AFAICT, that doesn't happen when this is called via the
> emulator. Ah, because em_iret() only handles RM for Intel's restricted guest
> crap. I.e. it "works" only because it never happens. All other flows set
> KVM_REQ_EVENT when toggling NMI blocking, e.g. the RSM path of kvm_smm_changed().
>
> And when NMIs become blocked, there's no need to force INTERCEPT_IRET in this
> code because kvm_check_and_inject_events() will request an NMI window and set the
> intercept if necessary, and all paths that set NMI blocking are guaranteed to
> reach kvm_check_and_inject_events() before entering the guest.
>
> 1. RSM => kvm_smm_changed() sets KVM_REQ_EVENT
> 2. enter_smm() is only called from within kvm_check_and_inject_events(),
> before pending NMIs are processed (yay priority)
> 3. emulator_set_nmi_mask() never blocks NMIs, only does the half-baked IRET emulation
> 4. kvm_vcpu_ioctl_x86_set_vcpu_event() sets KVM_REQ_EVENT
>
> So, can you add a prep patch to drop the forced INTERCEPT_IRET? That way the
> logic for vNMI and !vNMI is the same.
>
> > + else {
> > + clear_vnmi_mask(svm);
>
> This is the only code that sets/clears the vNMI mask, so rather than have set/clear
> helpers, what about a single helper to do the dirty work?
>
> > + if (!sev_es_guest(vcpu->kvm))
> > + svm_clr_intercept(svm, INTERCEPT_IRET);
> > + }
> > + return;
> > + }
> > +
> > if (masked) {
> > vcpu->arch.hflags |= HF_NMI_MASK;
> > if (!sev_es_guest(vcpu->kvm))
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index f5383104d00580..bf7f4851dee204 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
> > extern bool npt_enabled;
> > extern int vgif;
> > extern bool intercept_smi;
> > +extern bool vnmi;
> >
> > enum avic_modes {
> > AVIC_MODE_NONE = 0,
> > @@ -531,6 +532,57 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
> > (msr < (APIC_BASE_MSR + 0x100));
> > }
> >
> > +static inline struct vmcb *get_vnmi_vmcb(struct vcpu_svm *svm)
> > +{
> > + if (!vnmi)
> > + return NULL;
> > +
> > + if (is_guest_mode(&svm->vcpu))
> > + return svm->nested.vmcb02.ptr;
> > + else
> > + return svm->vmcb01.ptr;
> > +}
> > +
> > +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
> > +{
> > + struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > +
> > + if (vmcb)
> > + return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
> > + else
> > + return false;
>
> Maybe just this?
>
> return vmcb && (vmcb->control.int_ctl & V_NMI_ENABLE);
>
> Or if an inner helper is added:
>
> return vmcb && __is_vnmi_enabled(vmcb);
>
> > +}
> > +
> > +static inline bool is_vnmi_mask_set(struct vcpu_svm *svm)
> > +{
> > + struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > +
> > + if (!WARN_ON_ONCE(!vmcb))
>
> Rather than WARN, add an inner __is_vnmi_enabled() that takes the vnmi_vmcb.
> Actually, if you do that, the test/set/clear helpers can go away entirely.
>
> > + return false;
> > +
> > + return !!(vmcb->control.int_ctl & V_NMI_MASK);
> > +}
> > +
> > +static inline void set_vnmi_mask(struct vcpu_svm *svm)
> > +{
> > + struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > +
> > + if (!WARN_ON_ONCE(!vmcb))
> > + return;
> > +
> > + vmcb->control.int_ctl |= V_NMI_MASK;
> > +}
> > +
> > +static inline void clear_vnmi_mask(struct vcpu_svm *svm)
> > +{
> > + struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > +
> > + if (!WARN_ON_ONCE(!vmcb))
> > + return;
> > +
> > + vmcb->control.int_ctl &= ~V_NMI_MASK;
> > +}
>
> These helpers can all go in svm. There are no users oustide of svm.c, and
> unless I'm misunderstanding how nested works, there should never be oustide users.
>
> E.g. with HF_NMI_MASK => svm->nmi_masked, the end result can be something like:
>
> static bool __is_vnmi_enabled(struct *vmcb)
> {
> return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
> }
>
> static bool is_vnmi_enabled(struct vcpu_svm *svm)
> {
> struct vmcb *vmcb = get_vnmi_vmcb(svm);
>
> return vmcb && __is_vnmi_enabled(vmcb);
> }
>
> static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> struct vmcb *vmcb = get_vnmi_vmcb(svm);
>
> if (vmcb && __is_vnmi_enabled(vmcb))
> return !!(vmcb->control.int_ctl & V_NMI_MASK);
> else
> return !!(vcpu->arch.hflags & HF_NMI_MASK);
> }
>
> static void svm_set_or_clear_vnmi_mask(struct vmcb *vmcb, bool set)
> {
> if (set)
> vmcb->control.int_ctl |= V_NMI_MASK;
> else
> vmcb->control.int_ctl &= ~V_NMI_MASK;
> }
>
> static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> struct vmcb *vmcb = get_vnmi_vmcb(svm);
>
> if (vmcb && __is_vnmi_enabled(vmcb)) {
> if (masked)
> vmcb->control.int_ctl |= V_NMI_MASK;
> else
> vmcb->control.int_ctl &= ~V_NMI_MASK;
> } else {
> svm->nmi_masked = masked;
> }
>
> if (!masked)
> svm_disable_iret_interception(svm);
> }
>


2022-12-06 19:00:11

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 07/13] KVM: SVM: Add VNMI support in get/set_nmi_mask

On Sun, Dec 04, 2022, Maxim Levitsky wrote:
> For vNMI case it turns out that we don't need to intercept IRET at all after all:
>
> Turns out that when vNMI is pending, you can still EVENTINJ another NMI, and
> the pending vNMI will be kept pending, vNMI will became masked due to
> EVENTINJ, and on IRET the pending vNMI will be serviced as well, so in total
> both NMIs will be serviced.

I believe past me was thinking that the "merging" of pending NMIs happened in the
context of the sender, but it always happens in the context of the target vCPU.
Senders always do KVM_REQ_NMI, i.e. always kick if the vCPU is running, which gives
KVM the hook it needs to update the VMCB.

So yeah, as long as KVM can stuff two NMIs into the VMCB, I think we're good.
I'll give the series a proper review in the next week or so.