Although L2 is in halt state, it will be in the active state after
VM entry if the VM entry is vectoring. Halting the vcpu here means
the event won't be injected to L2 and this decision isn't reported
to L1. Thus L0 drops an event that should be injected to L2.
Because virtual interrupt delivery may wake L2 vcpu, if VID is enabled,
do the same thing -- don't halt L2.
Signed-off-by: Chao Gao <[email protected]>
---
arch/x86/kvm/vmx.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bb5b488..e1fe4e4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10985,8 +10985,14 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
if (ret)
return ret;
- if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
- return kvm_vcpu_halt(vcpu);
+ if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) {
+ u32 intr_info = vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
+ u32 exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+
+ if (!(intr_info & VECTORING_INFO_VALID_MASK) &&
+ !(exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY))
+ return kvm_vcpu_halt(vcpu);
+ }
vmx->nested.nested_run_pending = 1;
--
1.9.1
On 08/02/2018 06:13, Chao Gao wrote:
> Although L2 is in halt state, it will be in the active state after
> VM entry if the VM entry is vectoring. Halting the vcpu here means
> the event won't be injected to L2 and this decision isn't reported
> to L1. Thus L0 drops an event that should be injected to L2.
>
> Because virtual interrupt delivery may wake L2 vcpu, if VID is enabled,
> do the same thing -- don't halt L2.
This second part seems wrong to me, or at least overly general. Perhaps
you mean if RVI>0?
Thanks,
Paolo
> Signed-off-by: Chao Gao <[email protected]>
> ---
> arch/x86/kvm/vmx.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index bb5b488..e1fe4e4 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -10985,8 +10985,14 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
> if (ret)
> return ret;
>
> - if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
> - return kvm_vcpu_halt(vcpu);
> + if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) {
> + u32 intr_info = vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
> + u32 exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
> +
> + if (!(intr_info & VECTORING_INFO_VALID_MASK) &&
> + !(exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY))
> + return kvm_vcpu_halt(vcpu);
> + }
> vmx->nested.nested_run_pending = 1;
>
>
----- [email protected] wrote:
> On 08/02/2018 06:13, Chao Gao wrote:
> > Although L2 is in halt state, it will be in the active state after
> > VM entry if the VM entry is vectoring. Halting the vcpu here means
> > the event won't be injected to L2 and this decision isn't reported
> > to L1. Thus L0 drops an event that should be injected to L2.
> >
> > Because virtual interrupt delivery may wake L2 vcpu, if VID is
> enabled,
> > do the same thing -- don't halt L2.
>
> This second part seems wrong to me, or at least overly general.
> Perhaps
> you mean if RVI>0?
>
> Thanks,
>
> Paolo
I would first recommend to split this commit.
The first commit should handle only the case of vectoring VM entry.
It should also specify in commit message it is based on Intel SDM 26.6.2 Activity State:
("If the VM entry is vectoring, the logical processor is in the active state after VM entry.")
That part in code seems correct to me.
The second commit seems wrong to me as-well.
(I would also mention here it is based on Intel SDM 26.6.5
Interrupt-Window Exiting and Virtual-Interrupt Delivery:
"These events wake the logical processor if it just entered the HLT state because of a VM entry")
Paolo, I think that your suggestion is not sufficient as well.
Consider the case that APIC's TPR blocks interrupt specified in RVI.
I think that we should just remove the check for VID from commit,
and instead fix another bug which I describe below.
If lapic_in_kernel()==false, then APICv is not active and VID is not exposed to L1
(In current KVM code. Jim already criticize that this is wrong.).
Otherwise, kvm_vcpu_halt() will change mp_state to KVM_MP_STATE_HALTED.
Eventually, vcpu_run() will call vcpu_block() which will reach kvm_vcpu_has_events().
That function is responsible for checking if there is any pending interrupts.
Including, pending interrupts as a result of VID enabled and RVI>0
(While also taking into account the APIC's TPR).
The logic that checks for pending interrupts is kvm_cpu_has_interrupt()
which eventually reach apic_has_interrupt_for_ppr().
If APICv is enabled, apic_has_interrupt_for_ppr() will call vmx_sync_pir_to_irr()
which calls vmx_hwapic_irr_update().
However, max_irr returned to apic_has_interrupt_for_ppr() does not consider the interrupt
pending in RVI. Which I think is the real bug to fix here.
In the non-nested case, RVI can never be larger than max_irr because that is how L0 KVM manages RVI.
However, in the nested case, L1 can set RVI in VMCS arbitrary
(we just copy GUEST_INTR_STATUS from vmcs01 into vmcs02).
A possible patch to fix this is to change vmx_hwapic_irr_update() such that
if is_guest_mode(vcpu)==true, we should return max(max_irr, rvi) and return
that value into apic_has_interrupt_for_ppr().
Need to verify that it doesn't break other flows but I think it makes sense.
What do you think?
Regards,
-Liran
>
> > Signed-off-by: Chao Gao <[email protected]>
> > ---
> > arch/x86/kvm/vmx.c | 10 ++++++++--
> > 1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index bb5b488..e1fe4e4 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -10985,8 +10985,14 @@ static int nested_vmx_run(struct kvm_vcpu
> *vcpu, bool launch)
> > if (ret)
> > return ret;
> >
> > - if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
> > - return kvm_vcpu_halt(vcpu);
> > + if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) {
> > + u32 intr_info = vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
> > + u32 exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
> > +
> > + if (!(intr_info & VECTORING_INFO_VALID_MASK) &&
> > + !(exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY))
> > + return kvm_vcpu_halt(vcpu);
> > + }
>
> > vmx->nested.nested_run_pending = 1;
> >
> >
On Thu, Feb 08, 2018 at 11:29:49AM +0100, Paolo Bonzini wrote:
>On 08/02/2018 06:13, Chao Gao wrote:
>> Although L2 is in halt state, it will be in the active state after
>> VM entry if the VM entry is vectoring. Halting the vcpu here means
>> the event won't be injected to L2 and this decision isn't reported
>> to L1. Thus L0 drops an event that should be injected to L2.
>>
>> Because virtual interrupt delivery may wake L2 vcpu, if VID is enabled,
>> do the same thing -- don't halt L2.
>
>This second part seems wrong to me, or at least overly general. Perhaps
>you mean if RVI>0?
Yes. It is a little general.
How about this patch:
-- 8> --
Subject: [PATCH] x86/kvm/vmx: Don't halt vcpu when L1 is injecting events to
L2
Although L2 is in halt state, it will be in the active state after
VM entry if the VM entry is vectoring. Halting the vcpu here means
the event won't be injected to L2 and this decision isn't reported
to L1. Thus L0 drops an event that should be injected to L2.
Because virtual interrupt delivery may wake L2 vcpu, if VID is enabled
and RVI > 0, do the same thing -- don't halt L2.
Signed-off-by: Chao Gao <[email protected]>
---
arch/x86/kvm/vmx.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bb5b488..fa889c8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10985,7 +10985,13 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
if (ret)
return ret;
- if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
+ /*
+ * If we're entering a halted L2 vcpu and the L2 vcpu won't be woken
+ * by event injection or VID, halt vcpu for optimization.
+ */
+ if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
+ !(vmcs12->vm_entry_intr_info_field & VECTORING_INFO_VALID_MASK) &&
+ !(nested_cpu_has_vid(vmcs12) && (vmcs12->guest_intr_status & 0xff)))
return kvm_vcpu_halt(vcpu);
vmx->nested.nested_run_pending = 1;
--
>
>Thanks,
>
>Paolo
>
>> Signed-off-by: Chao Gao <[email protected]>
>> ---
>> arch/x86/kvm/vmx.c | 10 ++++++++--
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index bb5b488..e1fe4e4 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -10985,8 +10985,14 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>> if (ret)
>> return ret;
>>
>> - if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
>> - return kvm_vcpu_halt(vcpu);
>> + if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) {
>> + u32 intr_info = vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
>> + u32 exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
>> +
>> + if (!(intr_info & VECTORING_INFO_VALID_MASK) &&
>> + !(exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY))
>> + return kvm_vcpu_halt(vcpu);
>> + }
>
>> vmx->nested.nested_run_pending = 1;
>>
>>
>
On Thu, Feb 08, 2018 at 04:09:36AM -0800, Liran Alon wrote:
>
>----- [email protected] wrote:
>
>> On 08/02/2018 06:13, Chao Gao wrote:
>> > Although L2 is in halt state, it will be in the active state after
>> > VM entry if the VM entry is vectoring. Halting the vcpu here means
>> > the event won't be injected to L2 and this decision isn't reported
>> > to L1. Thus L0 drops an event that should be injected to L2.
>> >
>> > Because virtual interrupt delivery may wake L2 vcpu, if VID is
>> enabled,
>> > do the same thing -- don't halt L2.
>>
>> This second part seems wrong to me, or at least overly general.
>> Perhaps
>> you mean if RVI>0?
>>
>> Thanks,
>>
>> Paolo
>
>I would first recommend to split this commit.
>The first commit should handle only the case of vectoring VM entry.
>It should also specify in commit message it is based on Intel SDM 26.6.2 Activity State:
>("If the VM entry is vectoring, the logical processor is in the active state after VM entry.")
>That part in code seems correct to me.
>
>The second commit seems wrong to me as-well.
>(I would also mention here it is based on Intel SDM 26.6.5
>Interrupt-Window Exiting and Virtual-Interrupt Delivery:
>"These events wake the logical processor if it just entered the HLT state because of a VM entry")
>
>Paolo, I think that your suggestion is not sufficient as well.
>Consider the case that APIC's TPR blocks interrupt specified in RVI.
>
>I think that we should just remove the check for VID from commit,
>and instead fix another bug which I describe below.
>
>If lapic_in_kernel()==false, then APICv is not active and VID is not exposed to L1
>(In current KVM code. Jim already criticize that this is wrong.).
>
>Otherwise, kvm_vcpu_halt() will change mp_state to KVM_MP_STATE_HALTED.
>Eventually, vcpu_run() will call vcpu_block() which will reach kvm_vcpu_has_events().
>That function is responsible for checking if there is any pending interrupts.
>Including, pending interrupts as a result of VID enabled and RVI>0
The difference between checking VID and RVI here and in vcpu_run() is
"nested_run_pending" is set for the former. Would it cause any problem
if we don't set it?
Thanks
Chao
>(While also taking into account the APIC's TPR).
>The logic that checks for pending interrupts is kvm_cpu_has_interrupt()
>which eventually reach apic_has_interrupt_for_ppr().
>If APICv is enabled, apic_has_interrupt_for_ppr() will call vmx_sync_pir_to_irr()
>which calls vmx_hwapic_irr_update().
>
>However, max_irr returned to apic_has_interrupt_for_ppr() does not consider the interrupt
>pending in RVI. Which I think is the real bug to fix here.
>In the non-nested case, RVI can never be larger than max_irr because that is how L0 KVM manages RVI.
>However, in the nested case, L1 can set RVI in VMCS arbitrary
>(we just copy GUEST_INTR_STATUS from vmcs01 into vmcs02).
>
>A possible patch to fix this is to change vmx_hwapic_irr_update() such that
>if is_guest_mode(vcpu)==true, we should return max(max_irr, rvi) and return
>that value into apic_has_interrupt_for_ppr().
>Need to verify that it doesn't break other flows but I think it makes sense.
>What do you think?
>
>Regards,
>-Liran
>
>>
>> > Signed-off-by: Chao Gao <[email protected]>
>> > ---
>> > arch/x86/kvm/vmx.c | 10 ++++++++--
>> > 1 file changed, 8 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> > index bb5b488..e1fe4e4 100644
>> > --- a/arch/x86/kvm/vmx.c
>> > +++ b/arch/x86/kvm/vmx.c
>> > @@ -10985,8 +10985,14 @@ static int nested_vmx_run(struct kvm_vcpu
>> *vcpu, bool launch)
>> > if (ret)
>> > return ret;
>> >
>> > - if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
>> > - return kvm_vcpu_halt(vcpu);
>> > + if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) {
>> > + u32 intr_info = vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
>> > + u32 exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
>> > +
>> > + if (!(intr_info & VECTORING_INFO_VALID_MASK) &&
>> > + !(exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY))
>> > + return kvm_vcpu_halt(vcpu);
>> > + }
>>
>> > vmx->nested.nested_run_pending = 1;
>> >
>> >
On 08/02/2018 13:09, Liran Alon wrote:
> ----- [email protected] wrote:
>> On 08/02/2018 06:13, Chao Gao wrote:
>>> Because virtual interrupt delivery may wake L2 vcpu, if VID is
>>> enabled, do the same thing -- don't halt L2.
>>
>> This second part seems wrong to me, or at least overly general.
>> Perhaps you mean if RVI>0?
>
> I would first recommend to split this commit.
> The first commit should handle only the case of vectoring VM entry.
> It should also specify in commit message it is based on Intel SDM 26.6.2 Activity State:
> ("If the VM entry is vectoring, the logical processor is in the active state after VM entry.")
> That part in code seems correct to me.
I agree.
> The second commit seems wrong to me as-well.
> (I would also mention here it is based on Intel SDM 26.6.5
> Interrupt-Window Exiting and Virtual-Interrupt Delivery:
> "These events wake the logical processor if it just entered the HLT state because of a VM entry")
>
> Paolo, I think that your suggestion is not sufficient as well.
> Consider the case that APIC's TPR blocks interrupt specified in RVI.
That's true. It should be RVI>PPR.
> Otherwise, kvm_vcpu_halt() will change mp_state to KVM_MP_STATE_HALTED.
> Eventually, vcpu_run() will call vcpu_block() which will reach kvm_vcpu_has_events().
> That function is responsible for checking if there is any pending interrupts.
> Including, pending interrupts as a result of VID enabled and RVI>0
> (While also taking into account the APIC's TPR).
> The logic that checks for pending interrupts is kvm_cpu_has_interrupt()
> which eventually reach apic_has_interrupt_for_ppr().
> If APICv is enabled, apic_has_interrupt_for_ppr() will call vmx_sync_pir_to_irr()
> which calls vmx_hwapic_irr_update().
>
> However, max_irr returned to apic_has_interrupt_for_ppr() does not consider the interrupt
> pending in RVI. Which I think is the real bug to fix here.
> In the non-nested case, RVI can never be larger than max_irr because that is how L0 KVM manages RVI.
> However, in the nested case, L1 can set RVI in VMCS arbitrary
> (we just copy GUEST_INTR_STATUS from vmcs01 into vmcs02).
>
> A possible patch to fix this is to change vmx_hwapic_irr_update() such that
> if is_guest_mode(vcpu)==true, we should return max(max_irr, rvi) and return
> that value into apic_has_interrupt_for_ppr().
> Need to verify that it doesn't break other flows but I think it makes sense.
> What do you think?
Yeah, I think it makes sense though I'd need to look a lot more at
arch/x86/kvm/lapic.c and arch/x86/kvm/vmx.c to turn that into a patch!
Paolo
----- [email protected] wrote:
> On 08/02/2018 13:09, Liran Alon wrote:
> > ----- [email protected] wrote:
> >> On 08/02/2018 06:13, Chao Gao wrote:
> >
> > A possible patch to fix this is to change vmx_hwapic_irr_update()
> such that
> > if is_guest_mode(vcpu)==true, we should return max(max_irr, rvi) and
> return
> > that value into apic_has_interrupt_for_ppr().
> > Need to verify that it doesn't break other flows but I think it
> makes sense.
> > What do you think?
>
> Yeah, I think it makes sense though I'd need to look a lot more at
> arch/x86/kvm/lapic.c and arch/x86/kvm/vmx.c to turn that into a
> patch!
>
> Paolo
After thinking about this a bit more, I don't like my previous suggestion.
As we don't semantically want to change the value returned from kvm_apic_has_interrupt().
Instead, it makes more sense to change kvm_cpu_has_interrupt() to check for RVI>PPR
in case is_guest_mode(vcpu)==true.
Something like (partial theoretical patch):
@@ -97,6 +97,14 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
if (kvm_cpu_has_extint(v))
return 1;
+ /*
+ * When running L2, L1 controls vmcs02 RVI via vmcs12.
+ * Therefore, it is possible RVI indicates pending interrupt
+ * for vCPU while LAPIC IRR is empty.
+ */
+ if (is_guest_mode(v) &&
+ (kvm_x86_ops->hwapic_has_interrupt(v) != -1))
+ return 1;
+
return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
}
Where:
+static int vmx_get_rvi(void)
+{
+ return ((u8)vmcs_read16(GUEST_INTR_STATUS) & 0xff);
+}
+static int vmx_hwapic_has_interrupt(struct kvm_vcpu *vcpu)
+{
+ int vector = vmx_get_rvi(vcpu);
+ return kvm_apic_has_interrupt_for_vector(vector);
+}
+int kvm_apic_has_interrupt_for_vector(struct kvm_vcpu *vcpu, int vector)
+{
+ struct kvm_lapic *apic = vcpu->arch.apic;
+ u32 ppr;
+
+ if (!apic_enabled(apic))
+ return -1;
+
+ __apic_update_ppr(apic, &ppr);
+ return (((vector & 0xF0) > ppr) ? (vector) : (-1));
+}
+EXPORT_SYMBOL_GPL(kvm_apic_has_interrupt_for_vector);
Regards,
-Liran