2019-01-18 06:52:16

by Luwei Kang

[permalink] [raw]
Subject: [PATCH] KVM: x86: Sync the pending Posted-Interrupts

Some Posted-Interrupts from passthrough devices may be lost or
overwritten when the vCPU is in runnable state.

The SN (Suppress Notification) of PID (Posted Interrupt Descriptor) will
be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE state
but not running on physical CPU). If a posted interrupt coming at this
time, the irq remmaping facility will set the bit of PIR (Posted
Interrupt Requests) but ON (Outstanding Notification).
So this interrupt can't be sync to APIC virtualization register and
will not be handled by Guest because ON is zero.

Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f6915f1..820a03b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6048,7 +6048,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
bool max_irr_updated;

WARN_ON(!vcpu->arch.apicv_active);
- if (pi_test_on(&vmx->pi_desc)) {
+ if (!bitmap_empty((unsigned long *)vmx->pi_desc.pir, NR_VECTORS)) {
pi_clear_on(&vmx->pi_desc);
/*
* IOMMU can write to PIR.ON, so the barrier matters even on UP.
--
1.8.3.1



2019-01-25 16:34:48

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86: Sync the pending Posted-Interrupts

On Fri, Jan 18, 2019 at 02:34:00PM +0800, Luwei Kang wrote:
> Some Posted-Interrupts from passthrough devices may be lost or
> overwritten when the vCPU is in runnable state.
>
> The SN (Suppress Notification) of PID (Posted Interrupt Descriptor) will
> be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE state
> but not running on physical CPU). If a posted interrupt coming at this
> time, the irq remmaping facility will set the bit of PIR (Posted
> Interrupt Requests) but ON (Outstanding Notification).

s/but ON/and ON is set too/?
> So this interrupt can't be sync to APIC virtualization register and
> will not be handled by Guest because ON is zero.
>
> Signed-off-by: Luwei Kang <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index f6915f1..820a03b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6048,7 +6048,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> bool max_irr_updated;
>
> WARN_ON(!vcpu->arch.apicv_active);
> - if (pi_test_on(&vmx->pi_desc)) {
> + if (!bitmap_empty((unsigned long *)vmx->pi_desc.pir, NR_VECTORS)) {
> pi_clear_on(&vmx->pi_desc);
> /*
> * IOMMU can write to PIR.ON, so the barrier matters even on UP.

2019-01-25 18:28:33

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86: Sync the pending Posted-Interrupts

On 18/01/19 07:34, Luwei Kang wrote:
> Some Posted-Interrupts from passthrough devices may be lost or
> overwritten when the vCPU is in runnable state.
>
> The SN (Suppress Notification) of PID (Posted Interrupt Descriptor) will
> be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE state
> but not running on physical CPU). If a posted interrupt coming at this
> time, the irq remmaping facility will set the bit of PIR (Posted
> Interrupt Requests) but ON (Outstanding Notification).
> So this interrupt can't be sync to APIC virtualization register and
> will not be handled by Guest because ON is zero.
>
> Signed-off-by: Luwei Kang <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index f6915f1..820a03b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6048,7 +6048,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> bool max_irr_updated;
>
> WARN_ON(!vcpu->arch.apicv_active);
> - if (pi_test_on(&vmx->pi_desc)) {
> + if (!bitmap_empty((unsigned long *)vmx->pi_desc.pir, NR_VECTORS)) {
> pi_clear_on(&vmx->pi_desc);
> /*
> * IOMMU can write to PIR.ON, so the barrier matters even on UP.
>

This is a very delicate path. The bitmap check here is ordered after
the vcpu->mode write in vcpu_enter_guest, matching the check of
vcpu->mode in vmx_deliver_posted_interrupt (which comes after a write of
PIR.ON):

sender receiver
write PIR
write PIR.ON vcpu->mode = IN_GUEST_MODE
smp_mb() smp_mb()
read vcpu->mode sync_pir_to_irr
read PIR.ON

What you did should work, since PIR is written after PIR.ON anyway.
However, you should at least change the comment in vcpu_enter_guest to
mention "before reading PIR" instead of "before reading PIR.ON".

Alternatively, would it be possible to instead set ON when SN is
cleared? The clearing of SN is in pi_clear_sn, and you would have
instead something like

WRITE_ONCE(u16 *)&pi_desc->on_sn, POSTED_INTR_ON);

where on_sn is added to struct pi_desc like this:

@@ -61,4 +60,5 @@ struct pi_desc {
u32 ndst;
};
+ u16 on_sn;
u64 control;
};

Paolo

2019-01-28 06:18:22

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH] KVM: x86: Sync the pending Posted-Interrupts

> > Some Posted-Interrupts from passthrough devices may be lost or
> > overwritten when the vCPU is in runnable state.
> >
> > The SN (Suppress Notification) of PID (Posted Interrupt Descriptor)
> > will be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE
> > state but not running on physical CPU). If a posted interrupt coming
> > at this time, the irq remmaping facility will set the bit of PIR
> > (Posted Interrupt Requests) but ON (Outstanding Notification).
>
> s/but ON/and ON is set too/?

Sorry. If the interrupt remapping facility received a interrupt from device but the current SN bit is 1, the remapping facility will not set ON and not send notification event to CPU, just set the corresponding bit of PIR.

Thanks,
Luwei Kang

> > So this interrupt can't be sync to APIC virtualization register and
> > will not be handled by Guest because ON is zero.
> >
> > Signed-off-by: Luwei Kang <[email protected]>
> > ---
> > arch/x86/kvm/vmx/vmx.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> > f6915f1..820a03b 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6048,7 +6048,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> > bool max_irr_updated;
> >
> > WARN_ON(!vcpu->arch.apicv_active);
> > - if (pi_test_on(&vmx->pi_desc)) {
> > + if (!bitmap_empty((unsigned long *)vmx->pi_desc.pir, NR_VECTORS)) {
> > pi_clear_on(&vmx->pi_desc);
> > /*
> > * IOMMU can write to PIR.ON, so the barrier matters even on UP.

2019-01-28 08:09:21

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH] KVM: x86: Sync the pending Posted-Interrupts

> > Some Posted-Interrupts from passthrough devices may be lost or
> > overwritten when the vCPU is in runnable state.
> >
> > The SN (Suppress Notification) of PID (Posted Interrupt Descriptor)
> > will be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE
> > state but not running on physical CPU). If a posted interrupt coming
> > at this time, the irq remmaping facility will set the bit of PIR
> > (Posted Interrupt Requests) but ON (Outstanding Notification).
> > So this interrupt can't be sync to APIC virtualization register and
> > will not be handled by Guest because ON is zero.
> >
> > Signed-off-by: Luwei Kang <[email protected]>
> > ---
> > arch/x86/kvm/vmx/vmx.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> > f6915f1..820a03b 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6048,7 +6048,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> > bool max_irr_updated;
> >
> > WARN_ON(!vcpu->arch.apicv_active);
> > - if (pi_test_on(&vmx->pi_desc)) {
> > + if (!bitmap_empty((unsigned long *)vmx->pi_desc.pir, NR_VECTORS)) {
> > pi_clear_on(&vmx->pi_desc);
> > /*
> > * IOMMU can write to PIR.ON, so the barrier matters even on UP.
> >
>
> This is a very delicate path. The bitmap check here is ordered after the vcpu->mode write in vcpu_enter_guest, matching the check of
> vcpu->mode in vmx_deliver_posted_interrupt (which comes after a write of
> PIR.ON):
>
> sender receiver
> write PIR
> write PIR.ON vcpu->mode = IN_GUEST_MODE
> smp_mb() smp_mb()
> read vcpu->mode sync_pir_to_irr
> read PIR.ON
>
> What you did should work, since PIR is written after PIR.ON anyway.

Hi Paolo:
I think what you mentioned PIR.ON here is PID.ON;
PID: Posted Interrupt Descriptor (a structure for PI which include 512 bits)
ON: Outstanding Notification (one bit of PID)
PIR: Posted Interrupt Requests (256 bits for each interrupt vector in PID)

Before VT-d irq remapping, there just have PIR and ON in PID. Posted interrupt introduced by APICv can be used for send IPI. The working flow of sent a posted interrupt show in vmx_deliver_posted_interrupt().
1. Set PIR of PID
2. Set ON of PID
3. Send a IPI to target vCPU with notification vector (POSTED_INTR_VECTOR) if target vCPU is Running on a pCPU; have no vm-exit and handed by guest interrupt handler directly.
4. if the target vCPU is not Running on pCPU, invoke kvm_vcpu_kick() to wake up this vCPU or send RESCHEDULE_VECTOR IPI to target pCPU to make the vCPU running as soon as possible.
5. follow 4. The vCPU prepare to run (vcpu_enter_guest) and sync the posted interrupt of ON is set.

It looks like work well.
VT-d irq remapping facility introduce SN, NV, NDST in PID. These are used by irq remapping facility and CPU don’t care these flags.
6. The bit of SN will be set when vCPU id preempted (runnable but not running).
commit 28b835d60fcc2498e717cf5e6f0c3691c24546f7
KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
7. if a interrupt coming at this moment, irq remapping facility just set PIR but *not* set ON (VT-d spec 9.12).
So, here, the interrupt can't be sync to IRR because ON is 0.

I add some log here and found some interrupt recorded in PIR but ON is zero. It will impact the performance of pass through device.

> However, you should at least change the comment in vcpu_enter_guest to mention "before reading PIR" instead of "before reading
> PIR.ON".

Will do that. I think the "checking PIR.ON" should be PID.ON. I will fix it.

>
> Alternatively, would it be possible to instead set ON when SN is cleared? The clearing of SN is in pi_clear_sn, and you would have instead
> something like

SN is cleared when the corresponding vCPU is running on pCPU. I think we can't set ON when SN is cleared. Because there have some words in VT-d spec 9.12:
If ON is set at the time of hardware posting an interrupt to PIR field, notification event is not generated.

>
> WRITE_ONCE(u16 *)&pi_desc->on_sn, POSTED_INTR_ON);

We already have a function (pi_test_on) to check the bit of POSTED_INTR_ON. So I think it is unnecessary.

Thanks,
Luwei Kang

>
> where on_sn is added to struct pi_desc like this:
>
> @@ -61,4 +60,5 @@ struct pi_desc {
> u32 ndst;
> };
> + u16 on_sn;
> u64 control;
> };
>
> Paolo

2019-01-28 09:19:13

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86: Sync the pending Posted-Interrupts

On 28/01/19 09:08, Kang, Luwei wrote:
>> However, you should at least change the comment in vcpu_enter_guest to mention "before reading PIR" instead of "before reading
>> PIR.ON".
>
> Will do that. I think the "checking PIR.ON" should be PID.ON. I will fix it.

Yes.

>> Alternatively, would it be possible to instead set ON when SN is
>> cleared? The clearing of SN is in pi_clear_sn, and you would have instead
>> something like

>
> SN is cleared when the corresponding vCPU is running on pCPU. I think we can't set ON when SN is cleared. Because there have some words in VT-d spec 9.12:
> If ON is set at the time of hardware posting an interrupt to PIR field, notification event is not generated.

This is okay, because you are setting ON and vmx_sync_pir_to_irr will
read ON before the next vCPU entry and move the interrupts from PIR to IRR.

Paolo

>>
>> WRITE_ONCE(u16 *)&pi_desc->on_sn, POSTED_INTR_ON);
>
> We already have a function (pi_test_on) to check the bit of POSTED_INTR_ON. So I think it is unnecessary.
>
> Thanks,
> Luwei Kang
>
>>
>> where on_sn is added to struct pi_desc like this:
>>
>> @@ -61,4 +60,5 @@ struct pi_desc {
>> u32 ndst;
>> };
>> + u16 on_sn;
>> u64 control;
>> };
>>
>> Paolo


2019-01-29 09:35:03

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH] KVM: x86: Sync the pending Posted-Interrupts

> > However, you should at least change the comment in vcpu_enter_guest to
> > mention "before reading PIR" instead of "before reading PIR.ON".
>
> Will do that. I think the "checking PIR.ON" should be PID.ON. I will fix it.
>
Hi Paolo,
I reconsidered the comment in vcpu_enter_guest() and think it may don't need to change. The original comment as below:
* 2) For APICv, we should set ->mode before checking PIR.ON. This
* pairs with the memory barrier implicit in pi_test_and_set_on
* (see vmx_deliver_posted_interrupt).

I think "before checking PIR.ON" is mean "before checking PID.PIR and PID.ON". Because there indeed have this two flag check in vmx_deliver_posted_interrupt() function. If PID.ON is already Set at the time of hardware posting an interrupt to PIR field, notification event is not generated (from VT-d spec 9.12).

Thanks,
Luwei Kang

2019-01-29 13:32:46

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86: Sync the pending Posted-Interrupts

On 29/01/19 10:32, Kang, Luwei wrote:
>>> However, you should at least change the comment in vcpu_enter_guest to
>>> mention "before reading PIR" instead of "before reading PIR.ON".
>>
>> Will do that. I think the "checking PIR.ON" should be PID.ON. I will fix it.
>>
> Hi Paolo,
> I reconsidered the comment in vcpu_enter_guest() and think it may don't need to change. The original comment as below:
> * 2) For APICv, we should set ->mode before checking PIR.ON. This
> * pairs with the memory barrier implicit in pi_test_and_set_on
> * (see vmx_deliver_posted_interrupt).
>
> I think "before checking PIR.ON" is mean "before checking PID.PIR and PID.ON".

I can say it only means PID.ON because I wrote the comment. :)

The idea is that checking ON is enough: KVM assumes that PID.PIR is only
set if PID.ON is set, because it follows the definition of ON in table
29-1 of the SDM: "If this bit is set, there is a notification
outstanding for one or more posted interrupts in bits 255:0".

VT-D breaks this assumption whenever SN=1 ("hardware does not generate
notification event nor modify the ON field"), resulting in nonzero
PID.PIR but PID.ON=0. I'm sure there was a reason for that, but it does
result in inconsistency between the PID definitions in the SDM and the
VT-D specification. The right fix is definitely to reconcile this
difference and test the bitmap after SN is cleared (with
smp_mb__after_atomic after clearing SN), and set ON=1 if the bitmap is
not clear.

Paolo

2019-01-30 00:38:29

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH] KVM: x86: Sync the pending Posted-Interrupts

> >>> However, you should at least change the comment in vcpu_enter_guest
> >>> to mention "before reading PIR" instead of "before reading PIR.ON".
> >>
> >> Will do that. I think the "checking PIR.ON" should be PID.ON. I will fix it.
> >>
> > Hi Paolo,
> > I reconsidered the comment in vcpu_enter_guest() and think it may don't need to change. The original comment as below:
> > * 2) For APICv, we should set ->mode before checking PIR.ON. This
> > * pairs with the memory barrier implicit in pi_test_and_set_on
> > * (see vmx_deliver_posted_interrupt).
> >
> > I think "before checking PIR.ON" is mean "before checking PID.PIR and PID.ON".
>
> I can say it only means PID.ON because I wrote the comment. :)
>
> The idea is that checking ON is enough: KVM assumes that PID.PIR is only set if PID.ON is set, because it follows the definition of ON in table
> 29-1 of the SDM: "If this bit is set, there is a notification outstanding for one or more posted interrupts in bits 255:0".
>
> VT-D breaks this assumption whenever SN=1 ("hardware does not generate notification event nor modify the ON field"), resulting in
> nonzero PID.PIR but PID.ON=0. I'm sure there was a reason for that, but it does result in inconsistency between the PID definitions in the
> SDM and the VT-D specification. The right fix is definitely to reconcile this difference and test the bitmap after SN is cleared (with
> smp_mb__after_atomic after clearing SN), and set ON=1 if the bitmap is not clear.
>
Hi Paolo
Thanks for your elaboration, very clear to me. I will fix it in next version.

Thanks,
Luwei Kang