2022-10-27 09:21:29

by Santosh Shukla

Subject: [PATCHv5 0/8] Virtual NMI feature

VNMI Spec is at [1].

Change History:

v5 (6.1-rc2):
01,02,06 - Renamed s/X86_FEATURE_V_NMI/X86_FEATURE_AMD_VNMI (Jim Mattson)

v4 (v6.0-rc3):
https://lore.kernel.org/all/[email protected]/

v3 (rebased on eb555cb5b794f):
https://lore.kernel.org/all/[email protected]/

v2:
https://lore.kernel.org/lkml/[email protected]/T/#m4bf8a131748688fed00ab0fefdcac209a169e202

v1:
https://lore.kernel.org/all/[email protected]/

Description:
Currently, NMIs are delivered to the guest using the Event Injection
mechanism [2]. The Event Injection mechanism does not block the delivery
of subsequent NMIs, so the hypervisor needs to track each NMI delivery
and its completion (by intercepting IRET) before sending a new NMI.

Virtual NMI (VNMI) allows the hypervisor to inject an NMI into the guest
without using the Event Injection mechanism, so it no longer needs to
track the guest's NMI state or intercept IRET. To achieve that, the VNMI
feature provides virtualized NMI and NMI-mask capability bits in the VMCB
intr_control field:
V_NMI(11) - Indicates whether a virtual NMI is pending in the guest.
V_NMI_MASK(12) - Indicates whether a virtual NMI is masked in the guest.
V_NMI_ENABLE(26) - Enables the NMI virtualization feature for the guest.

When the hypervisor wants to inject an NMI, it sets the V_NMI bit. The
processor then clears V_NMI and sets V_NMI_MASK, which means the guest is
handling the NMI. After the guest has handled the NMI, the processor
clears V_NMI_MASK on successful completion of the IRET instruction,
or if a VMEXIT occurs while the virtual NMI is being delivered.

If NMI virtualization is enabled and the NMI_INTERCEPT bit is unset,
then the hardware will exit with an #INVALID exit reason.

To enable the VNMI capability, the hypervisor needs to set the
V_NMI_ENABLE bit to 1.

The presence of this feature is indicated via the CPUID function
0x8000000A_EDX[25].

Testing -
* Used qemu's `inject_nmi` for testing.
* Tested with and without AVIC.
* Tested with kvm-unit-tests.
* Tested with vGIF enabled and disabled.
* Tested nested environments:
- L1+L2 using VNMI
- L1 using VNMI and L2 not


Thanks,
Santosh
[1] https://www.amd.com/en/support/tech-docs/amd64-architecture-programmers-manual-volumes-1-5
(Ch-15.21.10 - NMI Virtualization)

[2] https://www.amd.com/en/support/tech-docs/amd64-architecture-programmers-manual-volumes-1-5
(Ch-15.20 - Event Injection)


Santosh Shukla (8):
x86/cpu: Add CPUID feature bit for VNMI
KVM: SVM: Add VNMI bit definition
KVM: SVM: Add VNMI support in get/set_nmi_mask
KVM: SVM: Report NMI not allowed when Guest busy handling VNMI
KVM: SVM: Add VNMI support in inject_nmi
KVM: nSVM: implement nested VNMI
KVM: nSVM: emulate VMEXIT_INVALID case for nested VNMI
KVM: SVM: Enable VNMI feature

arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/svm.h | 7 +++
arch/x86/kvm/svm/nested.c | 32 ++++++++++++++
arch/x86/kvm/svm/svm.c | 44 ++++++++++++++++++-
arch/x86/kvm/svm/svm.h | 68 ++++++++++++++++++++++++++++++
5 files changed, 151 insertions(+), 1 deletion(-)

--
2.25.1



2022-10-27 09:31:59

by Santosh Shukla

Subject: [PATCHv5 1/8] x86/cpu: Add CPUID feature bit for VNMI

The VNMI feature allows the hypervisor to inject an NMI into the guest
without using the Event Injection mechanism. The benefit of VNMI over
Event Injection is that it does not require tracking the guest's NMI
state or intercepting IRET for NMI completion. VNMI achieves that by
exposing 3 capability bits in the VMCB intr_cntrl field which help with
virtualizing NMI injection and NMI masking.

The presence of this feature is indicated via the CPUID function
0x8000000A_EDX[25].

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Santosh Shukla <[email protected]>
---
v5:
- Rename s/X86_FEATURE_V_NMI/X86_FEATURE_AMD_VNMI

arch/x86/include/asm/cpufeatures.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index b71f4f2ecdd5..23bd69848d27 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -356,6 +356,7 @@
#define X86_FEATURE_VGIF (15*32+16) /* Virtual GIF */
#define X86_FEATURE_X2AVIC (15*32+18) /* Virtual x2apic */
#define X86_FEATURE_V_SPEC_CTRL (15*32+20) /* Virtual SPEC_CTRL */
+#define X86_FEATURE_AMD_VNMI (15*32+25) /* Virtual NMI */
#define X86_FEATURE_SVME_ADDR_CHK (15*32+28) /* "" SVME addr check */

/* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */
--
2.25.1


2022-11-14 08:49:05

by Santosh Shukla

Subject: Re: [PATCHv5 0/8] Virtual NMI feature



On 10/27/2022 2:08 PM, Santosh Shukla wrote:
> VNMI Spec is at [1].
>
> Change History:
>
> v5 (6.1-rc2)
> 01,02,06 - Renamed s/X86_FEATURE_V_NMI/X86_FEATURE_AMD_VNMI (Jim Mattson)
>

Gentle reminder.

Thanks,
Santosh


2022-11-14 15:03:43

by Maxim Levitsky

Subject: Re: [PATCHv5 0/8] Virtual NMI feature

On Mon, 2022-11-14 at 13:32 +0530, Santosh Shukla wrote:
>
>
> On 10/27/2022 2:08 PM, Santosh Shukla wrote:
> > VNMI Spec is at [1].
> >
> > Change History:
> >
> > v5 (6.1-rc2)
> > 01,02,06 - Renamed s/X86_FEATURE_V_NMI/X86_FEATURE_AMD_VNMI (Jim Mattson)
> >
>
> Gentle reminder.
>
> Thanks,
> Santosh
>

I started reviewing it today and I think there are still a few issues,
and the biggest one is that if an NMI arrives while vNMI injection
is pending, the current code just drops such an NMI.

We had a discussion about this, like forcing an immediate vm exit
in this case and such, but I have a simpler idea:

In this case we can just open the NMI window in the good old way
by intercepting IRET, STGI, and/or RSM (which is intercepted anyway),

and only if we already *just* intercepted IRET, only then drop
the new NMI instead of single stepping over it, on the reasoning that
it's the 3rd NMI (one is almost done being serviced (its IRET is executing),
one is pending injection, and we want to inject another one).

Does this sound good to you? It won't work for SEV-ES, as it looks
like it doesn't intercept IRET, but it might be a reasonable tradeoff
for SEV-ES guests to accept that we can't inject an NMI if one is
already pending injection.

Best regards,
Maxim Levitsky


2022-11-16 06:01:25

by Santosh Shukla

Subject: Re: [PATCHv5 0/8] Virtual NMI feature

Hi Maxim,

On 11/14/2022 8:01 PM, Maxim Levitsky wrote:
> On Mon, 2022-11-14 at 13:32 +0530, Santosh Shukla wrote:
>>
>>
>> On 10/27/2022 2:08 PM, Santosh Shukla wrote:
>>> VNMI Spec is at [1].
>>>
>>> Change History:
>>>
>>> v5 (6.1-rc2)
>>> 01,02,06 - Renamed s/X86_FEATURE_V_NMI/X86_FEATURE_AMD_VNMI (Jim Mattson)
>>>
>>
>> Gentle reminder.
>>
>> Thanks,
>> Santosh
>>
>
> I started reviewing it today and I think there are still few issues,
> and the biggest one is that if a NMI arrives while vNMI injection
> is pending, current code just drops such NMI.
>
> We had a discussion about this, like forcing immeditate vm exit

I believe we discussed the above case in [1], i.e. HW can handle
the second (pending) virtual NMI while the guest is processing the first virtual NMI w/o a vmexit.
Is it the same scenario you are mentioning, or a different one?

[1] https://lore.kernel.org/lkml/[email protected]/

Thanks,
Santosh

> in this case and such but I have a simplier idea:
>
> In this case we can just open the NMI window in the good old way
> by intercepting IRET, STGI, and or RSM (which is intercepted anyway),
>
> and only if we already *just* intercepted IRET, only then just drop
> the new NMI instead of single stepping over it based on reasoning that
> its 3rd NMI (one is almost done the servicing (its IRET is executing),
> one is pending injection, and we want to inject another one.
>
> Does this sound good to you? It won't work for SEV-ES as it looks
> like it doesn't intercept IRET, but it might be a reasonable tradeof
> for SEV-ES guests to accept that we can't inject a NMI if one is
> already pending injection.
>
> Best regards,
> Maxim Levitsky
>


2022-11-16 09:36:12

by Maxim Levitsky

Subject: Re: [PATCHv5 0/8] Virtual NMI feature

On Wed, 2022-11-16 at 11:10 +0530, Santosh Shukla wrote:
> Hi Maxim,
>
> On 11/14/2022 8:01 PM, Maxim Levitsky wrote:
> > On Mon, 2022-11-14 at 13:32 +0530, Santosh Shukla wrote:
> > >
> > >
> > > On 10/27/2022 2:08 PM, Santosh Shukla wrote:
> > > > VNMI Spec is at [1].
> > > >
> > > > Change History:
> > > >
> > > > v5 (6.1-rc2)
> > > > 01,02,06 - Renamed s/X86_FEATURE_V_NMI/X86_FEATURE_AMD_VNMI (Jim Mattson)
> > > >
> > >
> > > Gentle reminder.
> > >
> > > Thanks,
> > > Santosh
> > >
> >
> > I started reviewing it today and I think there are still few issues,
> > and the biggest one is that if a NMI arrives while vNMI injection
> > is pending, current code just drops such NMI.
> >
> > We had a discussion about this, like forcing immeditate vm exit
>
> I believe, We discussed above case in [1] i.e.. HW can handle
> the second (/pending)virtual NMI while the guest processing first virtual NMI w/o vmexit.
> is it same scenario or different one that you are mentioning?
>
> [1] https://lore.kernel.org/lkml/[email protected]/

You misunderstood the issue.

Hardware can handle the case when a NMI is in service (that is V_NMI_MASK is set) and another one is injected 
(V_NMI_PENDING can be set),

but it is not possible to handle the case when an NMI is already injected (V_NMI_PENDING set) and
KVM wants to inject another one before the first one has gone into service (that is, V_NMI_MASK is not set
yet).

Also, the same can happen when NMIs are blocked in SMM: since V_NMI_MASK is set despite no NMI being in service,
we will be able to inject only one NMI by setting V_NMI_PENDING.

I think I was able to solve all these issues and I will post a modified version of your patch series today,
which should cover all these cases and include some nice refactoring as well.


Best regards,
Maxim Levitsky


>
> Thanks,
> Santosh
>
> > in this case and such but I have a simplier idea:
> >
> > In this case we can just open the NMI window in the good old way
> > by intercepting IRET, STGI, and or RSM (which is intercepted anyway),
> >
> > and only if we already *just* intercepted IRET, only then just drop
> > the new NMI instead of single stepping over it based on reasoning that
> > its 3rd NMI (one is almost done the servicing (its IRET is executing),
> > one is pending injection, and we want to inject another one.
> >
> > Does this sound good to you? It won't work for SEV-ES as it looks
> > like it doesn't intercept IRET, but it might be a reasonable tradeof
> > for SEV-ES guests to accept that we can't inject a NMI if one is
> > already pending injection.
> >
> > Best regards,
> >         Maxim Levitsky
> >
>



2022-11-16 16:04:53

by Santosh Shukla

Subject: Re: [PATCHv5 0/8] Virtual NMI feature

Hello Maxim,

On 11/16/2022 2:51 PM, Maxim Levitsky wrote:
> On Wed, 2022-11-16 at 11:10 +0530, Santosh Shukla wrote:
>> Hi Maxim,
>>
>> On 11/14/2022 8:01 PM, Maxim Levitsky wrote:
>>> On Mon, 2022-11-14 at 13:32 +0530, Santosh Shukla wrote:
>>>>
>>>>
>>>> On 10/27/2022 2:08 PM, Santosh Shukla wrote:
>>>>> VNMI Spec is at [1].
>>>>>
>>>>> Change History:
>>>>>
>>>>> v5 (6.1-rc2)
>>>>> 01,02,06 - Renamed s/X86_FEATURE_V_NMI/X86_FEATURE_AMD_VNMI (Jim Mattson)
>>>>>
>>>>
>>>> Gentle reminder.
>>>>
>>>> Thanks,
>>>> Santosh
>>>>
>>>
>>> I started reviewing it today and I think there are still few issues,
>>> and the biggest one is that if a NMI arrives while vNMI injection
>>> is pending, current code just drops such NMI.
>>>
>>> We had a discussion about this, like forcing immeditate vm exit
>>
>> I believe, We discussed above case in [1] i.e.. HW can handle
>> the second (/pending)virtual NMI while the guest processing first virtual NMI w/o vmexit.
>> is it same scenario or different one that you are mentioning?
>>
>> [1] https://lore.kernel.org/lkml/[email protected]/
>>
> You misunderstood the issue.
>
> Hardware can handle the case when a NMI is in service (that is V_NMI_MASK is set) and another one is injected 
> (V_NMI_PENDING can be set),
>
> but it is not possible to handle the case when a NMI is already injected (V_NMI_PENDING set) but
> and KVM wants to inject another one before the first one went into the service (that is V_NMI_MASK is not set
> yet).
>

In this case, HW will collapse the NMIs.

Note that the HW will take the pending NMI at the boundary of the IRET instruction:
it will check V_NMI_PENDING and, if it is set, the HW will *take* the NMI,
clear the V_NMI_PENDING bit, and set V_NMI_MASK, all without a VMEXIT.


> Also same can happen when NMIs are blocked in SMM, since V_NMI_MASK is set despite no NMI in service,
> we will be able to inject only one NMI by setting the V_NMI_PENDING.
>

Ditto, HW will collapse the NMI.

Thanks,
Santosh

> I think I was able to solve all these issues and I will today post a modified patch series of yours,
> which should cover all these cases and have some nice refactoring as well.
>
>
> Best regards,
> Maxim Levitsky
>
>
>>
>> Thanks,
>> Santosh
>>
>>> in this case and such but I have a simplier idea:
>>>
>>> In this case we can just open the NMI window in the good old way
>>> by intercepting IRET, STGI, and or RSM (which is intercepted anyway),
>>>
>>> and only if we already *just* intercepted IRET, only then just drop
>>> the new NMI instead of single stepping over it based on reasoning that
>>> its 3rd NMI (one is almost done the servicing (its IRET is executing),
>>> one is pending injection, and we want to inject another one.
>>>
>>> Does this sound good to you? It won't work for SEV-ES as it looks
>>> like it doesn't intercept IRET, but it might be a reasonable tradeof
>>> for SEV-ES guests to accept that we can't inject a NMI if one is
>>> already pending injection.
>>>
>>> Best regards,
>>>         Maxim Levitsky
>>>
>>
>
>


2022-11-16 16:20:14

by Tom Lendacky

Subject: Re: [PATCHv5 0/8] Virtual NMI feature

On 11/16/22 09:44, Santosh Shukla wrote:
> Hello Maxim,.
>
> On 11/16/2022 2:51 PM, Maxim Levitsky wrote:
>> On Wed, 2022-11-16 at 11:10 +0530, Santosh Shukla wrote:
>>> Hi Maxim,
>>>
>>> On 11/14/2022 8:01 PM, Maxim Levitsky wrote:
>>>> On Mon, 2022-11-14 at 13:32 +0530, Santosh Shukla wrote:
>>>>>
>>>>>
>>>>> On 10/27/2022 2:08 PM, Santosh Shukla wrote:
>>>>>> VNMI Spec is at [1].
>>>>>>
>>>>>> Change History:
>>>>>>
>>>>>> v5 (6.1-rc2)
>>>>>> 01,02,06 - Renamed s/X86_FEATURE_V_NMI/X86_FEATURE_AMD_VNMI (Jim Mattson)
>>>>>>
>>>>>
>>>>> Gentle reminder.
>>>>>
>>>>> Thanks,
>>>>> Santosh
>>>>>
>>>>
>>>> I started reviewing it today and I think there are still few issues,
>>>> and the biggest one is that if a NMI arrives while vNMI injection
>>>> is pending, current code just drops such NMI.
>>>>
>>>> We had a discussion about this, like forcing immeditate vm exit
>>>
>>> I believe, We discussed above case in [1] i.e.. HW can handle
>>> the second (/pending)virtual NMI while the guest processing first virtual NMI w/o vmexit.
>>> is it same scenario or different one that you are mentioning?
>>>
>>> [1] https://lore.kernel.org/lkml/[email protected]/
>>>
>> You misunderstood the issue.
>>
>> Hardware can handle the case when a NMI is in service (that is V_NMI_MASK is set) and another one is injected
>> (V_NMI_PENDING can be set),
>>
>> but it is not possible to handle the case when a NMI is already injected (V_NMI_PENDING set) but
>> and KVM wants to inject another one before the first one went into the service (that is V_NMI_MASK is not set
>> yet).
>>
>
> In this case, HW will collapse the NMI.
>
> Note that the HW will take the pending NMI at the boundary of IRET instruction such that
> it will check for the V_NMI_PENDING and if its set then HW will *take* the NMI,
> HW will clear the V_NMI_PENDING bit and set the V_NMI_MASK w/o the VMEXIT!,.
>
>
>> Also same can happen when NMIs are blocked in SMM, since V_NMI_MASK is set despite no NMI in service,
>> we will be able to inject only one NMI by setting the V_NMI_PENDING.
>>
>
> Ditto,. HW will collapse the NMI.

Note, this is how bare-metal NMIs are also handled. Multiple NMIs are
collapsed into a single NMI if they are received while an NMI is currently
being processed.

Thanks,
Tom

>
> Thanks,
> Santosh
>
>> I think I was able to solve all these issues and I will today post a modified patch series of yours,
>> which should cover all these cases and have some nice refactoring as well.
>>
>>
>> Best regards,
>> Maxim Levitsky
>>
>>
>>>
>>> Thanks,
>>> Santosh
>>>
>>>> in this case and such but I have a simplier idea:
>>>>
>>>> In this case we can just open the NMI window in the good old way
>>>> by intercepting IRET, STGI, and or RSM (which is intercepted anyway),
>>>>
>>>> and only if we already *just* intercepted IRET, only then just drop
>>>> the new NMI instead of single stepping over it based on reasoning that
>>>> its 3rd NMI (one is almost done the servicing (its IRET is executing),
>>>> one is pending injection, and we want to inject another one.
>>>>
>>>> Does this sound good to you? It won't work for SEV-ES as it looks
>>>> like it doesn't intercept IRET, but it might be a reasonable tradeof
>>>> for SEV-ES guests to accept that we can't inject a NMI if one is
>>>> already pending injection.
>>>>
>>>> Best regards,
>>>>         Maxim Levitsky
>>>>
>>>
>>
>>
>

2022-11-16 17:25:42

by Sean Christopherson

Subject: Re: [PATCHv5 0/8] Virtual NMI feature

On Wed, Nov 16, 2022, Tom Lendacky wrote:
> On 11/16/22 09:44, Santosh Shukla wrote:
> > Hello Maxim,.
> >
> > On 11/16/2022 2:51 PM, Maxim Levitsky wrote:
> > > On Wed, 2022-11-16 at 11:10 +0530, Santosh Shukla wrote:
> > > > Hi Maxim,
> > > >
> > > > On 11/14/2022 8:01 PM, Maxim Levitsky wrote:
> > > > > On Mon, 2022-11-14 at 13:32 +0530, Santosh Shukla wrote:
> > > > > I started reviewing it today and I think there are still few issues,
> > > > > and the biggest one is that if a NMI arrives while vNMI injection
> > > > > is pending, current code just drops such NMI.

I don't think it gets dropped, just improperly delayed.

> > > > > We had a discussion about this, like forcing immeditate vm exit
> > > >
> > > > I believe, We discussed above case in [1] i.e.. HW can handle
> > > > the second (/pending)virtual NMI while the guest processing first virtual NMI w/o vmexit.
> > > > is it same scenario or different one that you are mentioning?
> > > >
> > > > [1] https://lore.kernel.org/lkml/[email protected]/
> > > You misunderstood the issue.
> > >
> > > Hardware can handle the case when a NMI is in service (that is V_NMI_MASK
> > > is set) and another one is injected (V_NMI_PENDING can be set),
> > >
> > > but it is not possible to handle the case when a NMI is already injected
> > > (V_NMI_PENDING set) but and KVM wants to inject another one before the
> > > first one went into the service (that is V_NMI_MASK is not set yet).
> >
> > In this case, HW will collapse the NMI.

Yes, but practically speaking two NMIs can't arrive at the exact same instant on
bare metal. One NMI will always arrive first and get vectored, and then the second
NMI will arrive and be pended. In a virtual environment, two NMIs that are sent
far apart can arrive together, e.g. if the vCPU is scheduled out for an extended
period of time. KVM mitigates this side effect by allowing two NMIs to be pending.

The problem here isn't that the second NMI is lost, it's that KVM can't get control
when the first NMI completes (IRETs). KVM can pend both NMIs and queue the first
for injection/vectoring (set V_NMI_PENDING), but AIUI there is no mechanism (and no
code) to force a VM-Exit on the IRET so that KVM can inject the second NMI.

Santosh's response in v2[*] suggested that hardware would allow KVM to "post" an
NMI while the vCPU is running, but I don't see any code in this series to actually
do that. svm_inject_nmi() is only called from the vCPU's run loop, i.e. requires
a VM-Exit.

[*] https://lore.kernel.org/all/[email protected]

> > Note that the HW will take the pending NMI at the boundary of IRET instruction such that
> > it will check for the V_NMI_PENDING and if its set then HW will *take* the NMI,
> > HW will clear the V_NMI_PENDING bit and set the V_NMI_MASK w/o the VMEXIT!,.
> >
> >
> > > Also same can happen when NMIs are blocked in SMM, since V_NMI_MASK is
> > > set despite no NMI in service, we will be able to inject only one NMI by
> > > setting the V_NMI_PENDING.

I believe this one is a non-issue. Like bare metal, KVM only allows one NMI to
be pending if NMIs are blocked.

static void process_nmi(struct kvm_vcpu *vcpu)
{
        unsigned limit = 2;

        /*
         * x86 is limited to one NMI running, and one NMI pending after it.
         * If an NMI is already in progress, limit further NMIs to just one.
         * Otherwise, allow two (and we'll inject the first one immediately).
         */
        if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
                limit = 1;

        vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
        vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
        kvm_make_request(KVM_REQ_EVENT, vcpu);
}


> > Ditto,. HW will collapse the NMI.
>
> Note, this is how bare-metal NMIs are also handled. Multiple NMIs are
> collapsed into a single NMI if they are received while an NMI is currently
> being processed.