Some Posted-Interrupts from passthrough devices may be lost or
overwritten when the vCPU is in runnable state.
The SN (Suppress Notification) of PID (Posted Interrupt Descriptor) will
be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE state
but not running on physical CPU). If a posted interrupt coming at this
time, the irq remmaping facility will set the bit of PIR (Posted
Interrupt Requests) without ON (Outstanding Notification).
So this interrupt can't be sync to APIC virtualization register and
will not be handled by Guest because ON is zero.
Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 2 +-
arch/x86/kvm/x86.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f6915f1..820a03b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6048,7 +6048,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
bool max_irr_updated;
WARN_ON(!vcpu->arch.apicv_active);
- if (pi_test_on(&vmx->pi_desc)) {
+ if (!bitmap_empty((unsigned long *)vmx->pi_desc.pir, NR_VECTORS)) {
pi_clear_on(&vmx->pi_desc);
/*
* IOMMU can write to PIR.ON, so the barrier matters even on UP.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 02c8e09..c31b608 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7793,7 +7793,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
* 1) We should set ->mode before checking ->requests. Please see
* the comment in kvm_vcpu_exiting_guest_mode().
*
- * 2) For APICv, we should set ->mode before checking PIR.ON. This
+ * 2) For APICv, we should set ->mode before checking PID.PIR. This
* pairs with the memory barrier implicit in pi_test_and_set_on
* (see vmx_deliver_posted_interrupt).
*
--
1.8.3.1
On 30/01/19 01:59, Luwei Kang wrote:
> Some Posted-Interrupts from passthrough devices may be lost or
> overwritten when the vCPU is in runnable state.
>
> The SN (Suppress Notification) of PID (Posted Interrupt Descriptor) will
> be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE state
> but not running on physical CPU). If a posted interrupt coming at this
> time, the irq remmaping facility will set the bit of PIR (Posted
> Interrupt Requests) without ON (Outstanding Notification).
> So this interrupt can't be sync to APIC virtualization register and
> will not be handled by Guest because ON is zero.
>
> Signed-off-by: Luwei Kang <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 2 +-
> arch/x86/kvm/x86.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index f6915f1..820a03b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6048,7 +6048,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> bool max_irr_updated;
>
> WARN_ON(!vcpu->arch.apicv_active);
> - if (pi_test_on(&vmx->pi_desc)) {
> + if (!bitmap_empty((unsigned long *)vmx->pi_desc.pir, NR_VECTORS)) {
> pi_clear_on(&vmx->pi_desc);
> /*
> * IOMMU can write to PIR.ON, so the barrier matters even on UP.
This is not what I asked. You should instead do the check after
pi_clear_sn.
Paolo
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 02c8e09..c31b608 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7793,7 +7793,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> * 1) We should set ->mode before checking ->requests. Please see
> * the comment in kvm_vcpu_exiting_guest_mode().
> *
> - * 2) For APICv, we should set ->mode before checking PIR.ON. This
> + * 2) For APICv, we should set ->mode before checking PID.PIR. This
> * pairs with the memory barrier implicit in pi_test_and_set_on
> * (see vmx_deliver_posted_interrupt).
> *
>
> > Some Posted-Interrupts from passthrough devices may be lost or
> > overwritten when the vCPU is in runnable state.
> >
> > The SN (Suppress Notification) of PID (Posted Interrupt Descriptor)
> > will be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE
> > state but not running on physical CPU). If a posted interrupt coming
> > at this time, the irq remmaping facility will set the bit of PIR
> > (Posted Interrupt Requests) without ON (Outstanding Notification).
> > So this interrupt can't be sync to APIC virtualization register and
> > will not be handled by Guest because ON is zero.
> >
> > Signed-off-by: Luwei Kang <[email protected]>
> > ---
> > arch/x86/kvm/vmx/vmx.c | 2 +-
> > arch/x86/kvm/x86.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> > f6915f1..820a03b 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6048,7 +6048,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> > bool max_irr_updated;
> >
> > WARN_ON(!vcpu->arch.apicv_active);
> > - if (pi_test_on(&vmx->pi_desc)) {
> > + if (!bitmap_empty((unsigned long *)vmx->pi_desc.pir, NR_VECTORS)) {
> > pi_clear_on(&vmx->pi_desc);
> > /*
> > * IOMMU can write to PIR.ON, so the barrier matters even on UP.
>
> This is not what I asked. You should instead do the check after pi_clear_sn.
>
I think the SN has been cleared here before test the bitmap.
The SN will be set when the vCPU is schedule out. ID: 28b835d60fcc2498e717cf5e6f0c3691c24546f7
But SN will be cleared when sched in.
Another place is when vCPU run out of the vcpu_run() function:
kvm_arch_vcpu_ioctl_run()
vcpu_load(vcpu); -> kvm_arch_vcpu_load -> vmx_vcpu_load -> vmx_vcpu_pi_load -> new.sn = 0;
vcpu_run(vcpu);
for(;;)
vcpu_put(vcpu); -> kvm_arch_vcpu_put -> vmx_vcpu_put -> vmx_vcpu_pi_put -> pi_set_sn()
But SN will be cleared in vcpu_load() before back to vcpu_run()
Thanks,
Luwei Kang
On 30/01/19 10:01, Kang, Luwei wrote:
>> This is not what I asked. You should instead do the check after pi_clear_sn.
>>
>
> I think the SN has been cleared here before test the bitmap.
> The SN will be set when the vCPU is schedule out. ID: 28b835d60fcc2498e717cf5e6f0c3691c24546f7
> But SN will be cleared when sched in.
>
> Another place is when vCPU run out of the vcpu_run() function:
> kvm_arch_vcpu_ioctl_run()
> vcpu_load(vcpu); -> kvm_arch_vcpu_load -> vmx_vcpu_load -> vmx_vcpu_pi_load -> new.sn = 0;
> vcpu_run(vcpu);
> for(;;)
> vcpu_put(vcpu); -> kvm_arch_vcpu_put -> vmx_vcpu_put -> vmx_vcpu_pi_put -> pi_set_sn()
> But SN will be cleared in vcpu_load() before back to vcpu_run()
Yes, but you're changing the wrong path. The patch is affecting _all_
vmentries, not just those after PID.SN has been cleared.
As I mentioned in the previous email, KVM relies on the SDM's invariant
that ON where PID.ON=1 whenever PID.PIR!=0. Invariants are your best
friend when dealing with complicated multi-processor code so I don't
want to change that.
It's the VT-d pi_clear_sn path that I want to be changed, because it's
VT-d and specifically SN that complicates the very simple definition in
the SDM. By modifying the pi_clear_sn path, you ensure the invariant is
respected and everyone is happy.
Paolo
> >> This is not what I asked. You should instead do the check after pi_clear_sn.
> >>
> >
> > I think the SN has been cleared here before test the bitmap.
> > The SN will be set when the vCPU is schedule out. ID:
> > 28b835d60fcc2498e717cf5e6f0c3691c24546f7
> > But SN will be cleared when sched in.
> >
> > Another place is when vCPU run out of the vcpu_run() function:
> > kvm_arch_vcpu_ioctl_run()
> > vcpu_load(vcpu); -> kvm_arch_vcpu_load -> vmx_vcpu_load -> vmx_vcpu_pi_load -> new.sn = 0;
> > vcpu_run(vcpu);
> > for(;;)
> > vcpu_put(vcpu); -> kvm_arch_vcpu_put -> vmx_vcpu_put ->
> > vmx_vcpu_pi_put -> pi_set_sn() But SN will be cleared in vcpu_load()
> > before back to vcpu_run()
>
> Yes, but you're changing the wrong path. The patch is affecting _all_ vmentries, not just those after PID.SN has been cleared.
>
> As I mentioned in the previous email, KVM relies on the SDM's invariant that ON where PID.ON=1 whenever PID.PIR!=0. Invariants are your
> best friend when dealing with complicated multi-processor code so I don't want to change that.
>
> It's the VT-d pi_clear_sn path that I want to be changed, because it's VT-d and specifically SN that complicates the very simple definition in
> the SDM. By modifying the pi_clear_sn path, you ensure the invariant is respected and everyone is happy.
Hi Paolo,
How about like this:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 820a03b..dfc5e3d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1219,6 +1219,9 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
new.ndst = (dest << 8) & 0xFF00;
new.sn = 0;
+
+ if (!bitmap_empty((unsigned long *)new.pir, NR_VECTORS))
+ new.on = 1;
} while (cmpxchg64(&pi_desc->control, old.control,
new.control) != old.control);
}
Thanks,
Luwei Kang
> > >> This is not what I asked. You should instead do the check after pi_clear_sn.
> > >>
> > >
> > > I think the SN has been cleared here before test the bitmap.
> > > The SN will be set when the vCPU is schedule out. ID:
> > > 28b835d60fcc2498e717cf5e6f0c3691c24546f7
> > > But SN will be cleared when sched in.
> > >
> > > Another place is when vCPU run out of the vcpu_run() function:
> > > kvm_arch_vcpu_ioctl_run()
> > > vcpu_load(vcpu); -> kvm_arch_vcpu_load -> vmx_vcpu_load -> vmx_vcpu_pi_load -> new.sn = 0;
> > > vcpu_run(vcpu);
> > > for(;;)
> > > vcpu_put(vcpu); -> kvm_arch_vcpu_put -> vmx_vcpu_put ->
> > > vmx_vcpu_pi_put -> pi_set_sn() But SN will be cleared in vcpu_load()
> > > before back to vcpu_run()
> >
> > Yes, but you're changing the wrong path. The patch is affecting _all_ vmentries, not just those after PID.SN has been cleared.
> >
> > As I mentioned in the previous email, KVM relies on the SDM's
> > invariant that ON where PID.ON=1 whenever PID.PIR!=0. Invariants are your best friend when dealing with complicated multi-processor
> code so I don't want to change that.
> >
> > It's the VT-d pi_clear_sn path that I want to be changed, because it's
> > VT-d and specifically SN that complicates the very simple definition in the SDM. By modifying the pi_clear_sn path, you ensure the
> invariant is respected and everyone is happy.
>
> Hi Paolo,
> How about like this:
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 820a03b..dfc5e3d 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1219,6 +1219,9 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
> new.ndst = (dest << 8) & 0xFF00;
>
> new.sn = 0;
> +
> + if (!bitmap_empty((unsigned long *)new.pir, NR_VECTORS))
> + new.on = 1;
Sorry, should be:
+ if (!bitmap_empty((unsigned long *)pi_desc->pir, NR_VECTORS))
Luwei Kang
> } while (cmpxchg64(&pi_desc->control, old.control,
> new.control) != old.control); }
>
> Thanks,
> Luwei Kang
On 30/01/19 11:38, Kang, Luwei wrote:
>>>> This is not what I asked. You should instead do the check after pi_clear_sn.
>>>>
>>>
>>> I think the SN has been cleared here before test the bitmap.
>>> The SN will be set when the vCPU is schedule out. ID:
>>> 28b835d60fcc2498e717cf5e6f0c3691c24546f7
>>> But SN will be cleared when sched in.
>>>
>>> Another place is when vCPU run out of the vcpu_run() function:
>>> kvm_arch_vcpu_ioctl_run()
>>> vcpu_load(vcpu); -> kvm_arch_vcpu_load -> vmx_vcpu_load -> vmx_vcpu_pi_load -> new.sn = 0;
>>> vcpu_run(vcpu);
>>> for(;;)
>>> vcpu_put(vcpu); -> kvm_arch_vcpu_put -> vmx_vcpu_put ->
>>> vmx_vcpu_pi_put -> pi_set_sn() But SN will be cleared in vcpu_load()
>>> before back to vcpu_run()
>>
>> Yes, but you're changing the wrong path. The patch is affecting _all_ vmentries, not just those after PID.SN has been cleared.
>>
>> As I mentioned in the previous email, KVM relies on the SDM's invariant that ON where PID.ON=1 whenever PID.PIR!=0. Invariants are your
>> best friend when dealing with complicated multi-processor code so I don't want to change that.
>>
>> It's the VT-d pi_clear_sn path that I want to be changed, because it's VT-d and specifically SN that complicates the very simple definition in
>> the SDM. By modifying the pi_clear_sn path, you ensure the invariant is respected and everyone is happy.
>
> Hi Paolo,
> How about like this:
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 820a03b..dfc5e3d 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1219,6 +1219,9 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
> new.ndst = (dest << 8) & 0xFF00;
>
> new.sn = 0;
> +
> + if (!bitmap_empty((unsigned long *)new.pir, NR_VECTORS))
> + new.on = 1;
This is racy, the bitmap can change after the cmpxchg64; and the "if"
that calls pi_clear_sn needs to check the bitmap as well.
So you have to remove that "if", and move the check outside the
do/while. You also need an smp_mb__after_atomic() after the do/while.
Paolo
> } while (cmpxchg64(&pi_desc->control, old.control,
> new.control) != old.control);
> }