Hi!
This is the vNMI patch series based on Santosh Shukla's vNMI patch series.
In this version of the patch series I addressed most of the review feedback,
added some more refactoring, and I believe also fixed the issue with migration.
I only tested this on a machine which doesn't have vNMI, so this does need
some testing to ensure that nothing is broken.
Best regards,
Maxim Levitsky
Maxim Levitsky (9):
KVM: nSVM: don't sync back tlb_ctl on nested VM exit
KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12
KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1
doesn't intercept interrupts
KVM: SVM: drop the SVM specific H_FLAGS
KVM: x86: emulator: stop using raw host flags
KVM: SVM: add wrappers to enable/disable IRET interception
KVM: x86: add a delayed hardware NMI injection interface
KVM: SVM: implement support for vNMI
KVM: nSVM: implement support for nested VNMI
Santosh Shukla (2):
x86/cpu: Add CPUID feature bit for VNMI
KVM: SVM: Add VNMI bit definition
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/kvm-x86-ops.h | 2 +
arch/x86/include/asm/kvm_host.h | 24 +++--
arch/x86/include/asm/svm.h | 7 ++
arch/x86/kvm/emulate.c | 11 +--
arch/x86/kvm/kvm_emulate.h | 7 +-
arch/x86/kvm/smm.c | 2 -
arch/x86/kvm/svm/nested.c | 102 ++++++++++++++++---
arch/x86/kvm/svm/svm.c | 154 ++++++++++++++++++++++-------
arch/x86/kvm/svm/svm.h | 41 +++++++-
arch/x86/kvm/x86.c | 50 ++++++++--
11 files changed, 318 insertions(+), 83 deletions(-)
--
2.26.3
The CPU doesn't change the TLB_CTL value, as stated in the APM (15.16.1 - TLB Flush):
"The VMRUN instruction reads, but does not change, the
value of the TLB_CONTROL field"
Therefore KVM shouldn't do that either.
Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/nested.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index bc9cd7086fa972..37af0338da7c32 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1010,7 +1010,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
vmcb12->control.next_rip = vmcb02->control.next_rip;
vmcb12->control.int_ctl = svm->nested.ctl.int_ctl;
- vmcb12->control.tlb_ctl = svm->nested.ctl.tlb_ctl;
vmcb12->control.event_inj = svm->nested.ctl.event_inj;
vmcb12->control.event_inj_err = svm->nested.ctl.event_inj_err;
--
2.26.3
From: Santosh Shukla <[email protected]>
VNMI exposes 3 capability bits (V_NMI, V_NMI_MASK, and V_NMI_ENABLE) to
virtualize NMI and NMI_MASK. These capability bits are part of
VMCB::intr_ctrl:
V_NMI(11) - Indicates whether a virtual NMI is pending in the guest.
V_NMI_MASK(12) - Indicates whether virtual NMI is masked in the guest.
V_NMI_ENABLE(26) - Enables the NMI virtualization feature for the guest.
When the hypervisor wants to inject an NMI, it sets the V_NMI bit. The
processor then clears V_NMI and sets V_NMI_MASK, which means the guest
is handling the NMI. After the guest has handled the NMI, the processor
clears V_NMI_MASK on successful completion of the IRET instruction, or
if a VMEXIT occurs while delivering the virtual NMI.
To enable the VNMI capability, the hypervisor needs to program the
V_NMI_ENABLE bit to 1.
Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Santosh Shukla <[email protected]>
---
arch/x86/include/asm/svm.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index cb1ee53ad3b189..26d6f549ce2b46 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -203,6 +203,13 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define X2APIC_MODE_SHIFT 30
#define X2APIC_MODE_MASK (1 << X2APIC_MODE_SHIFT)
+#define V_NMI_PENDING_SHIFT 11
+#define V_NMI_PENDING (1 << V_NMI_PENDING_SHIFT)
+#define V_NMI_MASK_SHIFT 12
+#define V_NMI_MASK (1 << V_NMI_MASK_SHIFT)
+#define V_NMI_ENABLE_SHIFT 26
+#define V_NMI_ENABLE (1 << V_NMI_ENABLE_SHIFT)
+
#define LBR_CTL_ENABLE_MASK BIT_ULL(0)
#define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
--
2.26.3
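The pending/masked transitions described in the commit message above can be modeled in plain C. The following is an illustrative userspace sketch of the documented behavior, not KVM code; the helper names (vnmi_inject, vnmi_deliver, vnmi_iret) are invented for this example.

```c
#include <stdint.h>

/* Bit layout from the patch above. */
#define V_NMI_PENDING_SHIFT 11
#define V_NMI_PENDING       (1u << V_NMI_PENDING_SHIFT)
#define V_NMI_MASK_SHIFT    12
#define V_NMI_MASK          (1u << V_NMI_MASK_SHIFT)
#define V_NMI_ENABLE_SHIFT  26
#define V_NMI_ENABLE        (1u << V_NMI_ENABLE_SHIFT)

/* Hypervisor wants to inject an NMI: set the pending bit. */
static inline uint32_t vnmi_inject(uint32_t int_ctl)
{
	return int_ctl | V_NMI_PENDING;
}

/* Processor takes the virtual NMI: clear pending and set the mask bit,
 * i.e. further NMIs are blocked while the guest handles this one. */
static inline uint32_t vnmi_deliver(uint32_t int_ctl)
{
	if ((int_ctl & V_NMI_ENABLE) && (int_ctl & V_NMI_PENDING) &&
	    !(int_ctl & V_NMI_MASK))
		int_ctl = (int_ctl & ~V_NMI_PENDING) | V_NMI_MASK;
	return int_ctl;
}

/* Guest completes IRET (or a VMEXIT occurs while delivering the
 * virtual NMI): the mask bit is cleared. */
static inline uint32_t vnmi_iret(uint32_t int_ctl)
{
	return int_ctl & ~V_NMI_MASK;
}
```

Walking an int_ctl value through vnmi_inject, vnmi_deliver, and vnmi_iret reproduces the pending -> masked -> unmasked sequence the commit message describes.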
If L1 doesn't intercept interrupts, then KVM will use vmcb02's
V_IRQ for L1 (to detect an interrupt window).
In this case, on nested VM exit KVM might need to copy the V_IRQ bit
from vmcb02 to vmcb01, to continue waiting for an
interrupt window.
To keep it simple, just raise the KVM_REQ_EVENT request, whose
handling will lead to re-enabling the interrupt
window if needed.
Note that this is a theoretical bug, because KVM already raises
KVM_REQ_EVENT on each nested VM exit: the nested
VM exit resets RFLAGS and kvm_set_rflags() raises
KVM_REQ_EVENT in response.
However, raising this request explicitly, together with
documenting why it is needed, is still preferred.
Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/nested.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index aad3145b2f62fe..e891318595113e 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1016,6 +1016,31 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
svm_switch_vmcb(svm, &svm->vmcb01);
+ /*
+ * Note on synchronizing some int_ctl bits from vmcb02 to vmcb01:
+ *
+ * V_IRQ, V_IRQ_VECTOR, V_INTR_PRIO_MASK, V_IGN_TPR:
+ * If L1 doesn't intercept interrupts, then (even if L1 does use
+ * virtual interrupt masking), KVM uses vmcb02's V_IRQ to detect
+ * an interrupt window.  In this case, KVM raises KVM_REQ_EVENT
+ * to ensure that the interrupt window is not lost, which
+ * implicitly copies these bits from vmcb02 to vmcb01.
+ *
+ * V_TPR:
+ * If L1 doesn't use virtual interrupt masking, then L1's vTPR is
+ * stored in vmcb02, but its value doesn't need to be copied
+ * from/to vmcb01 because it is copied from/to the APIC's TPR
+ * register on each VM entry/exit.
+ *
+ * V_GIF:
+ * If nested vGIF is not used, KVM uses vmcb02's V_GIF for L1's
+ * V_GIF; however, L1's vGIF is reset to false on each VM exit,
+ * thus there is no need to copy it from vmcb02 to vmcb01.
+ */
+
+ if (!nested_exit_on_intr(svm))
+ kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
+
if (unlikely(svm->lbrv_enabled && (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
svm_copy_lbrs(vmcb12, vmcb02);
svm_update_lbrv(vcpu);
--
2.26.3
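As a sanity check of the rule this patch adds, here is a tiny standalone model of the decision made on nested VM exit. This is plain C with invented names, not KVM code; it only mirrors the "request event re-evaluation when L1 does not intercept interrupts" logic.

```c
#include <stdbool.h>

/* Toy model of the nested VM exit path touched by this patch: when L1
 * does not intercept interrupts, KVM must re-evaluate pending events
 * (KVM_REQ_EVENT) so an interrupt window tracked via vmcb02's V_IRQ is
 * not lost.  Field and function names are invented for illustration. */
struct toy_vcpu {
	bool l1_intercepts_intr;	/* stand-in for nested_exit_on_intr() */
	bool event_requested;		/* stand-in for KVM_REQ_EVENT */
};

static void toy_nested_vmexit(struct toy_vcpu *vcpu)
{
	if (!vcpu->l1_intercepts_intr)
		vcpu->event_requested = true;
}
```

When L1 does intercept interrupts, no explicit request is needed in this path, because the VM exit is delivered to L1 rather than tracked via vmcb02's V_IRQ.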
Hi Maxim,
On 11/30/2022 1:07 AM, Maxim Levitsky wrote:
> The CPU doesn't change TLB_CTL value as stated in the PRM (15.16.2):
>
nits:
s / PRM (15.16.2) / APM (15.16.1 - TLB Flush)
> "The VMRUN instruction reads, but does not change, the
> value of the TLB_CONTROL field"
>
> Therefore the KVM shouldn't do that either.
>
> Signed-off-by: Maxim Levitsky <[email protected]>
> ---
> arch/x86/kvm/svm/nested.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index bc9cd7086fa972..37af0338da7c32 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1010,7 +1010,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
> vmcb12->control.next_rip = vmcb02->control.next_rip;
>
> vmcb12->control.int_ctl = svm->nested.ctl.int_ctl;
> - vmcb12->control.tlb_ctl = svm->nested.ctl.tlb_ctl;
> vmcb12->control.event_inj = svm->nested.ctl.event_inj;
> vmcb12->control.event_inj_err = svm->nested.ctl.event_inj_err;
>
On 11/30/2022 1:07 AM, Maxim Levitsky wrote:
> Hi!
>
> This is the vNMI patch series based on Santosh Shukla's vNMI patch series.
>
> In this version of this patch series I addressed most of the review feedback
> added some more refactoring and also I think fixed the issue with migration.
>
> I only tested this on a machine which doesn't have vNMI, so this does need
> some testing to ensure that nothing is broken.
>
> Best regards,
> Maxim Levitsky
>
Series tested on EPYC-v4.
Tested-By: Santosh Shukla <[email protected]>
>
> Maxim Levitsky (9):
> KVM: nSVM: don't sync back tlb_ctl on nested VM exit
> KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12
> KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1
> doesn't intercept interrupts
> KVM: SVM: drop the SVM specific H_FLAGS
> KVM: x86: emulator: stop using raw host flags
> KVM: SVM: add wrappers to enable/disable IRET interception
> KVM: x86: add a delayed hardware NMI injection interface
> KVM: SVM: implement support for vNMI
> KVM: nSVM: implement support for nested VNMI
>
> Santosh Shukla (2):
> x86/cpu: Add CPUID feature bit for VNMI
> KVM: SVM: Add VNMI bit definition
>
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/kvm-x86-ops.h | 2 +
> arch/x86/include/asm/kvm_host.h | 24 +++--
> arch/x86/include/asm/svm.h | 7 ++
> arch/x86/kvm/emulate.c | 11 +--
> arch/x86/kvm/kvm_emulate.h | 7 +-
> arch/x86/kvm/smm.c | 2 -
> arch/x86/kvm/svm/nested.c | 102 ++++++++++++++++---
> arch/x86/kvm/svm/svm.c | 154 ++++++++++++++++++++++-------
> arch/x86/kvm/svm/svm.h | 41 +++++++-
> arch/x86/kvm/x86.c | 50 ++++++++--
> 11 files changed, 318 insertions(+), 83 deletions(-)
>
On Mon, 2022-12-05 at 19:35 +0530, Santosh Shukla wrote:
> Hi Maxim,
>
> On 11/30/2022 1:07 AM, Maxim Levitsky wrote:
> > The CPU doesn't change TLB_CTL value as stated in the PRM (15.16.2):
> >
> nits:
> s / PRM (15.16.2) / APM (15.16.1 - TLB Flush)
True for both changes, thanks!
Best regards,
Maxim Levitsky
>
> > "The VMRUN instruction reads, but does not change, the
> > value of the TLB_CONTROL field"
> >
> > Therefore the KVM shouldn't do that either.
> >
> > Signed-off-by: Maxim Levitsky <[email protected]>
> > ---
> > arch/x86/kvm/svm/nested.c | 1 -
> > 1 file changed, 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index bc9cd7086fa972..37af0338da7c32 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -1010,7 +1010,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
> > vmcb12->control.next_rip = vmcb02->control.next_rip;
> >
> > vmcb12->control.int_ctl = svm->nested.ctl.int_ctl;
> > - vmcb12->control.tlb_ctl = svm->nested.ctl.tlb_ctl;
> > vmcb12->control.event_inj = svm->nested.ctl.event_inj;
> > vmcb12->control.event_inj_err = svm->nested.ctl.event_inj_err;
> >
On Tue, 2022-11-29 at 21:37 +0200, Maxim Levitsky wrote:
> Hi!
>
> This is the vNMI patch series based on Santosh Shukla's vNMI patch series.
>
> In this version of this patch series I addressed most of the review feedback
> added some more refactoring and also I think fixed the issue with migration.
>
> I only tested this on a machine which doesn't have vNMI, so this does need
> some testing to ensure that nothing is broken.
>
> Best regards,
> Maxim Levitsky
>
> Maxim Levitsky (9):
> KVM: nSVM: don't sync back tlb_ctl on nested VM exit
> KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12
> KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1
> doesn't intercept interrupts
> KVM: SVM: drop the SVM specific H_FLAGS
> KVM: x86: emulator: stop using raw host flags
> KVM: SVM: add wrappers to enable/disable IRET interception
> KVM: x86: add a delayed hardware NMI injection interface
> KVM: SVM: implement support for vNMI
> KVM: nSVM: implement support for nested VNMI
>
> Santosh Shukla (2):
> x86/cpu: Add CPUID feature bit for VNMI
> KVM: SVM: Add VNMI bit definition
>
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/kvm-x86-ops.h | 2 +
> arch/x86/include/asm/kvm_host.h | 24 +++--
> arch/x86/include/asm/svm.h | 7 ++
> arch/x86/kvm/emulate.c | 11 +--
> arch/x86/kvm/kvm_emulate.h | 7 +-
> arch/x86/kvm/smm.c | 2 -
> arch/x86/kvm/svm/nested.c | 102 ++++++++++++++++---
> arch/x86/kvm/svm/svm.c | 154 ++++++++++++++++++++++-------
> arch/x86/kvm/svm/svm.h | 41 +++++++-
> arch/x86/kvm/x86.c | 50 ++++++++--
> 11 files changed, 318 insertions(+), 83 deletions(-)
>
> --
> 2.26.3
>
>
A very kind ping on these patches.
Best regards,
Maxim Levitsky
On Tue, Dec 20, 2022, Maxim Levitsky wrote:
> On Tue, 2022-11-29 at 21:37 +0200, Maxim Levitsky wrote:
> > Hi!
> >
> > This is the vNMI patch series based on Santosh Shukla's vNMI patch series.
> >
> > In this version of this patch series I addressed most of the review feedback
> > added some more refactoring and also I think fixed the issue with migration.
> >
> > I only tested this on a machine which doesn't have vNMI, so this does need
> > some testing to ensure that nothing is broken.
> >
> > Best regards,
> > Maxim Levitsky
> >
> > Maxim Levitsky (9):
> > KVM: nSVM: don't sync back tlb_ctl on nested VM exit
> > KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12
> > KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1
> > doesn't intercept interrupts
> > KVM: SVM: drop the SVM specific H_FLAGS
> > KVM: x86: emulator: stop using raw host flags
> > KVM: SVM: add wrappers to enable/disable IRET interception
> > KVM: x86: add a delayed hardware NMI injection interface
> > KVM: SVM: implement support for vNMI
> > KVM: nSVM: implement support for nested VNMI
> >
> > Santosh Shukla (2):
> > x86/cpu: Add CPUID feature bit for VNMI
> > KVM: SVM: Add VNMI bit definition
> >
> > arch/x86/include/asm/cpufeatures.h | 1 +
> > arch/x86/include/asm/kvm-x86-ops.h | 2 +
> > arch/x86/include/asm/kvm_host.h | 24 +++--
> > arch/x86/include/asm/svm.h | 7 ++
> > arch/x86/kvm/emulate.c | 11 +--
> > arch/x86/kvm/kvm_emulate.h | 7 +-
> > arch/x86/kvm/smm.c | 2 -
> > arch/x86/kvm/svm/nested.c | 102 ++++++++++++++++---
> > arch/x86/kvm/svm/svm.c | 154 ++++++++++++++++++++++-------
> > arch/x86/kvm/svm/svm.h | 41 +++++++-
> > arch/x86/kvm/x86.c | 50 ++++++++--
> > 11 files changed, 318 insertions(+), 83 deletions(-)
> >
> > --
> > 2.26.3
> >
> >
> A very kind ping on these patches.
Sorry, I won't get to this (or anything else) until the new year.
On Tue, 2022-11-29 at 21:37 +0200, Maxim Levitsky wrote:
> Hi!
>
> This is the vNMI patch series based on Santosh Shukla's vNMI patch series.
>
> In this version of this patch series I addressed most of the review feedback
> added some more refactoring and also I think fixed the issue with migration.
>
> I only tested this on a machine which doesn't have vNMI, so this does need
> some testing to ensure that nothing is broken.
>
> Best regards,
> Maxim Levitsky
>
> Maxim Levitsky (9):
> KVM: nSVM: don't sync back tlb_ctl on nested VM exit
> KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12
> KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1
> doesn't intercept interrupts
> KVM: SVM: drop the SVM specific H_FLAGS
> KVM: x86: emulator: stop using raw host flags
> KVM: SVM: add wrappers to enable/disable IRET interception
> KVM: x86: add a delayed hardware NMI injection interface
> KVM: SVM: implement support for vNMI
> KVM: nSVM: implement support for nested VNMI
>
> Santosh Shukla (2):
> x86/cpu: Add CPUID feature bit for VNMI
> KVM: SVM: Add VNMI bit definition
>
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/kvm-x86-ops.h | 2 +
> arch/x86/include/asm/kvm_host.h | 24 +++--
> arch/x86/include/asm/svm.h | 7 ++
> arch/x86/kvm/emulate.c | 11 +--
> arch/x86/kvm/kvm_emulate.h | 7 +-
> arch/x86/kvm/smm.c | 2 -
> arch/x86/kvm/svm/nested.c | 102 ++++++++++++++++---
> arch/x86/kvm/svm/svm.c | 154 ++++++++++++++++++++++-------
> arch/x86/kvm/svm/svm.h | 41 +++++++-
> arch/x86/kvm/x86.c | 50 ++++++++--
> 11 files changed, 318 insertions(+), 83 deletions(-)
>
> --
> 2.26.3
>
>
Another kind ping on this patch series.
Best regards,
Maxim Levitsky
Shortlog is too long, maybe this?
KVM: nSVM: Raise event on nested VM exit if L1 doesn't intercept IRQs
On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> If the L2 doesn't intercept interrupts, then the KVM will use vmcb02's
s/the L2/L2, though don't you mean L1?
> V_IRQ for L1 (to detect the interrupt window)
"an interrupt window", i.e. there's not just one window.
> In this case on the nested VM exit KVM might need to copy the V_IRQ bit
s/the nested/nested
> from the vmcb02 to the vmcb01, to continue waiting for the
> interrupt window.
>
> To make it simple, just raise the KVM_REQ_EVENT request, which
> execution will lead to the reenabling of the interrupt
> window if needed.
>
> Note that this is a theoretical bug because the KVM already does raise
s/the KVM/KVM
> the KVM_REQ_EVENT request one each nested VM exit because the nested
s/the KVM_REQ_EVENT/KVM_REQ_EVENT, and s/one/on
> VM exit resets RFLAGS and the kvm_set_rflags() raises the
> KVM_REQ_EVENT request in the response.
>
> However raising this request explicitly, together with
> documenting why this is needed, is still preferred.
>
> Signed-off-by: Maxim Levitsky <[email protected]>
> ---
> arch/x86/kvm/svm/nested.c | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index aad3145b2f62fe..e891318595113e 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1016,6 +1016,31 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>
> svm_switch_vmcb(svm, &svm->vmcb01);
>
> + /* Note about synchronizing some of int_ctl bits from vmcb02 to vmcb01:
/*
* Preferred block comment style...
> + *
> + * - V_IRQ, V_IRQ_VECTOR, V_INTR_PRIO_MASK, V_IGN_TPR:
Drop the "-" to be consistent with the rest of the paragraphs.
> + * If the L2 doesn't intercept interrupts, then
> + * (even if the L2 does use virtual interrupt masking),
KVM uses "L2" to refer to the thing running at L2. I think what you are referring
to here is vmcb12? And that's controlled by L1.
> + * KVM will use the vmcb02's V_INTR to detect interrupt window.
s/the vmcb02/vmcb02
Which of the V_INTR fields does this refer to? Oooh, you're saying the KVM injects
a virtual interrupt into L2 using vmcb02 in order to determine when L2 has IRQs
enabled.
Why does KVM do that? Why not pend the actual IRQ directly?
> + *
> + * In this case, the KVM raises the KVM_REQ_EVENT to ensure that interrupt window
s/the KVM_REQ_EVENT/KVM_REQ_EVENT
> + * is not lost and this implicitly copies these bits from vmcb02 to vmcb01
Too many pronouns. What do "this" and "these bits" refer to?
> + *
> + * V_TPR:
> + * If the L2 doesn't use virtual interrupt masking, then the L1's vTPR
> + * is stored in the vmcb02 but its value doesn't need to be copied from/to
> + * vmcb01 because it is copied from/to the TPR APIC's register on
> + * each VM entry/exit.
> + *
> + * V_GIF:
> + * - If the nested vGIF is not used, KVM uses vmcb02's V_GIF for L1's V_GIF,
Drop this "-" too.
> + * however, the L1 vGIF is reset to false on each VM exit, thus
> + * there is no need to copy it from vmcb02 to vmcb01.
> + */
> +
> + if (!nested_exit_on_intr(svm))
> + kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
> +
> if (unlikely(svm->lbrv_enabled && (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
> svm_copy_lbrs(vmcb12, vmcb02);
> svm_update_lbrv(vcpu);
> --
> 2.26.3
>
On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> Hi!
>
> This is the vNMI patch series based on Santosh Shukla's vNMI patch series.
>
> In this version of this patch series I addressed most of the review feedback
> added some more refactoring and also I think fixed the issue with migration.
>
> I only tested this on a machine which doesn't have vNMI, so this does need
> some testing to ensure that nothing is broken.
Apologies for the slow review.
Did a fast run through, mostly have questions to address my lack of knowledge.
I'll give this a much more thorough review first thing next week (my brain is
fried), and am planning on queueing it unless I see something truly busted (I'll
fixup my nits when applying).
On Sat, Jan 28, 2023, Sean Christopherson wrote:
> > + * If the L2 doesn't intercept interrupts, then
> > + * (even if the L2 does use virtual interrupt masking),
>
> KVM uses "L2" to refer to the thing running at L2. I think what you are referring
> to here is vmcb12? And that's controlled by L1.
>
> > + * KVM will use the vmcb02's V_INTR to detect interrupt window.
>
> s/the vmcb02/vmcb02
>
> Which of the V_INTR fields does this refer to? Oooh, you're saying the KVM injects
> a virtual interrupt into L2 using vmcb02 in order to determine when L2 has IRQs
> enabled.
>
> Why does KVM do that? Why not pend the actual IRQ directly?
Duh, because KVM needs to gain control if there are multiple pending events.
On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> From: Santosh Shukla <[email protected]>
>
> VNMI exposes 3 capability bits (V_NMI, V_NMI_MASK, and V_NMI_ENABLE) to
> virtualize NMI and NMI_MASK, Those capability bits are part of
> VMCB::intr_ctrl -
> V_NMI(11) - Indicates whether a virtual NMI is pending in the guest.
> V_NMI_MASK(12) - Indicates whether virtual NMI is masked in the guest.
> V_NMI_ENABLE(26) - Enables the NMI virtualization feature for the guest.
>
> When Hypervisor wants to inject NMI, it will set V_NMI bit, Processor
> will clear the V_NMI bit and Set the V_NMI_MASK which means the Guest is
> handling NMI, After the guest handled the NMI, The processor will clear
> the V_NMI_MASK on the successful completion of IRET instruction Or if
> VMEXIT occurs while delivering the virtual NMI.
>
> To enable the VNMI capability, Hypervisor need to program
> V_NMI_ENABLE bit 1.
>
> Reviewed-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Santosh Shukla <[email protected]>
> ---
> arch/x86/include/asm/svm.h | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index cb1ee53ad3b189..26d6f549ce2b46 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -203,6 +203,13 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
> #define X2APIC_MODE_SHIFT 30
> #define X2APIC_MODE_MASK (1 << X2APIC_MODE_SHIFT)
>
> +#define V_NMI_PENDING_SHIFT 11
> +#define V_NMI_PENDING (1 << V_NMI_PENDING_SHIFT)
> +#define V_NMI_MASK_SHIFT 12
> +#define V_NMI_MASK (1 << V_NMI_MASK_SHIFT)
Argh, more KVM warts. The existing INT_CTL defines all use "mask" in the name,
so looking at V_NMI_MASK in the context of other code reads "vNMI is pending",
not "vNMIs are blocked".
IMO, the existing _MASK terminology is the one that's wrong, but there's an absurd
amount of prior art in svm.h :-(
And the really annoying one is V_INTR_MASKING_MASK, which IIRC says "virtual INTR
masking is enabled", not "virtual INTRs are blocked".
So maybe call this V_NMI_BLOCKING_MASK? And tack on _MASK to the others (even
though I agree it's ugly).
> +#define V_NMI_ENABLE_SHIFT 26
> +#define V_NMI_ENABLE (1 << V_NMI_ENABLE_SHIFT)
Hrm. I think I would prefer to keep the defines ordered by bit position. Knowing
that there's an enable bit isn't all that critical for understanding vNMI pending
and blocked.
On Tue, Dec 06, 2022, Santosh Shukla wrote:
> Series tested on EPYC-v4.
> Tested-By: Santosh Shukla <[email protected]>
In the future, please use Tested-by, not Tested-By. For whatever reason, the
preferred kernel style for tags is to capitalize only the first word, e.g.
Co-developed-by, Tested-by, Reviewed-by, etc...
On Tue, 29 Nov 2022 21:37:06 +0200, Maxim Levitsky wrote:
> This is the vNMI patch series based on Santosh Shukla's vNMI patch series.
>
> In this version of this patch series I addressed most of the review feedback
> added some more refactoring and also I think fixed the issue with migration.
>
> I only tested this on a machine which doesn't have vNMI, so this does need
> some testing to ensure that nothing is broken.
>
> [...]
Applied 1, 4, and 5 to kvm-x86 svm. I split patch 4 as doing so made the
HF_GIF_MASK change super trivial.
vNMI support will get pushed beyond v6.3, but I will do my best to promptly
review future versions, while I still have all of this paged in...
[01/11] KVM: nSVM: Don't sync tlb_ctl back to vmcb12 on nested VM-Exit
https://github.com/kvm-x86/linux/commit/8957cbcfed0a
[04/11] KVM: x86: Move HF_GIF_MASK into "struct vcpu_svm" as "guest_gif"
https://github.com/kvm-x86/linux/commit/c760e86f27fe
[04/11] KVM: x86: Move HF_NMI_MASK and HF_IRET_MASK into "struct vcpu_svm"
https://github.com/kvm-x86/linux/commit/916b54a7688b
[05/11] KVM: x86: Use emulator callbacks instead of duplicating "host flags"
https://github.com/kvm-x86/linux/commit/32e69f232db4
--
https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/fixes
On 2/1/2023 4:12 AM, Sean Christopherson wrote:
> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
>> From: Santosh Shukla <[email protected]>
>>
>> VNMI exposes 3 capability bits (V_NMI, V_NMI_MASK, and V_NMI_ENABLE) to
>> virtualize NMI and NMI_MASK, Those capability bits are part of
>> VMCB::intr_ctrl -
>> V_NMI(11) - Indicates whether a virtual NMI is pending in the guest.
>> V_NMI_MASK(12) - Indicates whether virtual NMI is masked in the guest.
>> V_NMI_ENABLE(26) - Enables the NMI virtualization feature for the guest.
>>
>> When Hypervisor wants to inject NMI, it will set V_NMI bit, Processor
>> will clear the V_NMI bit and Set the V_NMI_MASK which means the Guest is
>> handling NMI, After the guest handled the NMI, The processor will clear
>> the V_NMI_MASK on the successful completion of IRET instruction Or if
>> VMEXIT occurs while delivering the virtual NMI.
>>
>> To enable the VNMI capability, Hypervisor need to program
>> V_NMI_ENABLE bit 1.
>>
>> Reviewed-by: Maxim Levitsky <[email protected]>
>> Signed-off-by: Santosh Shukla <[email protected]>
>> ---
>> arch/x86/include/asm/svm.h | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
>> index cb1ee53ad3b189..26d6f549ce2b46 100644
>> --- a/arch/x86/include/asm/svm.h
>> +++ b/arch/x86/include/asm/svm.h
>> @@ -203,6 +203,13 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>> #define X2APIC_MODE_SHIFT 30
>> #define X2APIC_MODE_MASK (1 << X2APIC_MODE_SHIFT)
>>
>> +#define V_NMI_PENDING_SHIFT 11
>> +#define V_NMI_PENDING (1 << V_NMI_PENDING_SHIFT)
>> +#define V_NMI_MASK_SHIFT 12
>> +#define V_NMI_MASK (1 << V_NMI_MASK_SHIFT)
>
> Argh, more KVM warts. The existing INT_CTL defines all use "mask" in the name,
> so looking at V_NMI_MASK in the context of other code reads "vNMI is pending",
> not "vNMIs are blocked".
>
> IMO, the existing _MASK terminology is the one that's wrong, but there's an absurd
> amount of prior art in svm.h :-(
>
> And the really annoying one is V_INTR_MASKING_MASK, which IIRC says "virtual INTR
> masking is enabled", not "virtual INTRs are blocked".
>
> So maybe call this V_NMI_BLOCKING_MASK? And tack on _MASK too the others (even
> though I agree it's ugly).
>
Sure.
>> +#define V_NMI_ENABLE_SHIFT 26
>> +#define V_NMI_ENABLE (1 << V_NMI_ENABLE_SHIFT)
>
> Hrm. I think I would prefer to keep the defines ordered by bit position. Knowing
> that there's an enable bit isn't all that critical for understanding vNMI pending
> and blocked.
Sure, Sean. Will include in V3.
Thanks,
Santosh