Hi list,
We have observed on GICv4.1 systems that, after a guest reset, the
secondary VCPU would receive an IPI_CPU_STOP accidently and failed to
come online eventually.
| Guest User-space
|
| reset (with a vSGI pending in the
| hardware) [0]
|
| disable the distributor (write 0
| into GICD_CTLR) [1]
|
| clear pending status (write 0 into
| GICR_ISPENDR0 for each RD) [2]
|
| disable the distributor (write 0
| into GICD_CTLR) [3]
|
| enable the distributor with ARE,
| Group1 and nASSGIreq [4]
The problem is that even if user-space tries to disable the distributor
and clear pending bits for all SGIs, we don't actually propogate it into
HW (we only record it via vgic_dist->{enabled, nassgireq} and
vgic_irq->pending_latch) and the vSGI remains pending across the guest
reset.
And when we're at [4], all vSGI's vgic_irq->hw are *true* and
vgic_v4_enable_vsgis() becomes a nop.. That's not good.
The following solution can solve the problem. Not sure if this is a good
solution.Sent out early for suggestions or solutions.
diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
index 89117ba2528a..3678ab33f5b9 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
@@ -374,6 +374,10 @@ static int vgic_v3_uaccess_write_pending(struct
kvm_vcpu *vcpu,
irq->pending_latch = true;
vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
} else {
+ if (irq->hw && vgic_irq_is_sgi(irq->intid))
+ irq_set_irqchip_state(irq->host_irq,
+ IRQCHIP_STATE_PENDING,
+ false);
irq->pending_latch = false;
raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
}
On Thu, 14 Dec 2023 12:13:57 +0000,
Kunkun Jiang <[email protected]> wrote:
>
> Hi list,
>
> We have observed on GICv4.1 systems that, after a guest reset, the
> secondary VCPU would receive an IPI_CPU_STOP accidently and failed to
> come online eventually.
>
> | Guest User-space
Guest userspace????
> |
> | reset (with a vSGI pending in the
> | hardware) [0]
> |
> | disable the distributor (write 0
> | into GICD_CTLR) [1]
> |
> | clear pending status (write 0 into
> | GICR_ISPENDR0 for each RD) [2]
This cannot clear the pending bits. Writing 0 to any of the
ISPEND/ICPEND registers is effectively a NOP.
If you want to remove the pending SGIs, you need to write a bunch of
1s to ICPENDR0.
> |
> | disable the distributor (write 0
> | into GICD_CTLR) [3]
> |
> | enable the distributor with ARE,
> | Group1 and nASSGIreq [4]
>
> The problem is that even if user-space tries to disable the distributor
I don't understand what userspace does here. Why is it significant
that it is an EL0 access? I don't know of any SW doing that, but even
if it existed, this should make no difference.
> and clear pending bits for all SGIs, we don't actually propogate it into
> HW (we only record it via vgic_dist->{enabled, nassgireq} and
> vgic_irq->pending_latch) and the vSGI remains pending across the guest
> reset.
>
> And when we're at [4], all vSGI's vgic_irq->hw are *true* and
> vgic_v4_enable_vsgis() becomes a nop.. That's not good.
>
> The following solution can solve the problem. Not sure if this is a good
> solution.Sent out early for suggestions or solutions.
>
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> index 89117ba2528a..3678ab33f5b9 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> @@ -374,6 +374,10 @@ static int vgic_v3_uaccess_write_pending(struct
> kvm_vcpu *vcpu,
> irq->pending_latch = true;
> vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> } else {
> + if (irq->hw && vgic_irq_is_sgi(irq->intid))
> + irq_set_irqchip_state(irq->host_irq,
> + IRQCHIP_STATE_PENDING,
> + false);
> irq->pending_latch = false;
> raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
> }
>
But this has *nothing* to do with the guest. This is the *host*
userspace performing a write to the redistributor view, which has
different semantics. Which is why your earlier description made no
sense to me.
I think the problem is slightly larger than what you describe. A write
to ISPENDR0 should be propagated to the ITS for any values of the
latch, just like this happens on enabling HW-backed SGIs.
Can you please give this a go?
Thanks,
M.
From 9932d74392d969057e84bc3c18bd15f1769b6211 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <[email protected]>
Date: Sun, 17 Dec 2023 11:15:09 +0000
Subject: [PATCH] KVM: arm64: vgic-v4: Restore pending state on host userspace
write
When the VMM writes to ISPENDR0 to set the state pending state of
an SGI, we fail to convey this to the HW if this SGI is already
backed by a GICv4.1 vSGI.
This is a bit of a corner case, as this would only occur if the
vgic state is changed on an already running VM, but this can
apparently happen across a guest reset driven by the VMM.
Fix this by always writing out the pending_latch value to the
HW, and reseting it to false.
Reported-by: Kunkun Jiang <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/arm64/kvm/vgic/vgic-mmio-v3.c | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
index a764b0ab8bf9..2533f264b954 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
@@ -365,19 +365,26 @@ static int vgic_v3_uaccess_write_pending(struct kvm_vcpu *vcpu,
struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
raw_spin_lock_irqsave(&irq->irq_lock, flags);
- if (test_bit(i, &val)) {
- /*
- * pending_latch is set irrespective of irq type
- * (level or edge) to avoid dependency that VM should
- * restore irq config before pending info.
- */
- irq->pending_latch = true;
- vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
- } else {
+
+ /*
+ * pending_latch is set irrespective of irq type
+ * (level or edge) to avoid dependency that VM should
+ * restore irq config before pending info.
+ */
+ irq->pending_latch = test_bit(i, &val);
+
+ if (irq->hw && vgic_irq_is_sgi(irq->intid)) {
+ irq_set_irqchip_state(irq->host_irq,
+ IRQCHIP_STATE_PENDING,
+ irq->pending_latch);
irq->pending_latch = false;
- raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
}
+ if (irq->pending_latch)
+ vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
+ else
+ raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+
vgic_put_irq(vcpu->kvm, irq);
}
--
2.39.2
--
Without deviation from the norm, progress is not possible.
On Sun, Dec 17, 2023 at 11:26:15AM +0000, Marc Zyngier wrote:
[...]
> But this has *nothing* to do with the guest. This is the *host*
> userspace performing a write to the redistributor view, which has
> different semantics. Which is why your earlier description made no
> sense to me.
>
> I think the problem is slightly larger than what you describe. A write
> to ISPENDR0 should be propagated to the ITS for any values of the
> latch, just like this happens on enabling HW-backed SGIs.
>
> Can you please give this a go?
What do you think about using this as an opportunity for a bit of
cleanup? It'd be nice unify the various MMIO and uaccess handlers for
SPENDING + CPENDING while being careful about the arch_timer interrupt.
clear = ~val;
vgic_uaccess_write_spending(val);
vgic_uaccess_write_cpending(clear);
Happy to take your fix too, especially in case I missed something
obvious :)
[*] https://git.kernel.org/pub/scm/linux/kernel/git/oupton/linux.git/log/?h=kvm-arm64/vsgi-spending-fixes
--
Thanks,
Oliver
On Sun, Dec 17, 2023 at 05:33:16PM +0000, Oliver Upton wrote:
> On Sun, Dec 17, 2023 at 11:26:15AM +0000, Marc Zyngier wrote:
>
> [...]
>
> > But this has *nothing* to do with the guest. This is the *host*
> > userspace performing a write to the redistributor view, which has
> > different semantics. Which is why your earlier description made no
> > sense to me.
> >
> > I think the problem is slightly larger than what you describe. A write
> > to ISPENDR0 should be propagated to the ITS for any values of the
> > latch, just like this happens on enabling HW-backed SGIs.
> >
> > Can you please give this a go?
>
> What do you think about using this as an opportunity for a bit of
> cleanup? It'd be nice unify the various MMIO and uaccess handlers for
> SPENDING + CPENDING while being careful about the arch_timer interrupt.
Cut myself off... Meant to say that user writes to SPENDING for GICv3
can then be treated as:
> clear = ~val;
> vgic_uaccess_write_spending(val);
> vgic_uaccess_write_cpending(clear);
--
Thanks,
Oliver
On Sun, 17 Dec 2023 17:34:38 +0000,
Oliver Upton <[email protected]> wrote:
>
> On Sun, Dec 17, 2023 at 05:33:16PM +0000, Oliver Upton wrote:
> > On Sun, Dec 17, 2023 at 11:26:15AM +0000, Marc Zyngier wrote:
> >
> > [...]
> >
> > > But this has *nothing* to do with the guest. This is the *host*
> > > userspace performing a write to the redistributor view, which has
> > > different semantics. Which is why your earlier description made no
> > > sense to me.
> > >
> > > I think the problem is slightly larger than what you describe. A write
> > > to ISPENDR0 should be propagated to the ITS for any values of the
> > > latch, just like this happens on enabling HW-backed SGIs.
> > >
> > > Can you please give this a go?
> >
> > What do you think about using this as an opportunity for a bit of
> > cleanup? It'd be nice unify the various MMIO and uaccess handlers for
> > SPENDING + CPENDING while being careful about the arch_timer interrupt.
What is special about the timer interrupt?
> Cut myself off... Meant to say that user writes to SPENDING for GICv3
> can then be treated as:
>
> > clear = ~val;
> > vgic_uaccess_write_spending(val);
> > vgic_uaccess_write_cpending(clear);
Could be. But I'd rather have separate fixes from more invasive
reworks. Specially given that we have had multiple ugly bugs around
this code in the past, which is why we ended up splitting userspace
from guest accessors.
M.
--
Without deviation from the norm, progress is not possible.
On Sun, Dec 17, 2023 at 06:52:55PM +0000, Marc Zyngier wrote:
> On Sun, 17 Dec 2023 17:34:38 +0000,
> Oliver Upton <[email protected]> wrote:
> >
> > On Sun, Dec 17, 2023 at 05:33:16PM +0000, Oliver Upton wrote:
> > > On Sun, Dec 17, 2023 at 11:26:15AM +0000, Marc Zyngier wrote:
> > >
> > > [...]
> > >
> > > > But this has *nothing* to do with the guest. This is the *host*
> > > > userspace performing a write to the redistributor view, which has
> > > > different semantics. Which is why your earlier description made no
> > > > sense to me.
> > > >
> > > > I think the problem is slightly larger than what you describe. A write
> > > > to ISPENDR0 should be propagated to the ITS for any values of the
> > > > latch, just like this happens on enabling HW-backed SGIs.
> > > >
> > > > Can you please give this a go?
> > >
> > > What do you think about using this as an opportunity for a bit of
> > > cleanup? It'd be nice unify the various MMIO and uaccess handlers for
> > > SPENDING + CPENDING while being careful about the arch_timer interrupt.
>
> What is special about the timer interrupt?
Isn't that the case where we have a physical IRQ mapped and wind up
forwarding state to the physical GIC?
> Could be. But I'd rather have separate fixes from more invasive
> reworks. Specially given that we have had multiple ugly bugs around
> this code in the past, which is why we ended up splitting userspace
> from guest accessors.
Fine by me. I had felt like a common helper w/ the user v. guest
exclusions is a bit easier to understand than diffing two very similar
functions, but it isn't a big deal.
Anyway, I'm happy with your fix. I'd like Kunkun to give it a go but
either way I can pick it up for 6.7.
--
Thanks,
Oliver
On Mon, 18 Dec 2023 17:20:04 +0000,
Oliver Upton <[email protected]> wrote:
>
> On Sun, Dec 17, 2023 at 06:52:55PM +0000, Marc Zyngier wrote:
> > On Sun, 17 Dec 2023 17:34:38 +0000,
> > Oliver Upton <[email protected]> wrote:
> > >
> > > On Sun, Dec 17, 2023 at 05:33:16PM +0000, Oliver Upton wrote:
> > > > On Sun, Dec 17, 2023 at 11:26:15AM +0000, Marc Zyngier wrote:
> > > >
> > > > [...]
> > > >
> > > > > But this has *nothing* to do with the guest. This is the *host*
> > > > > userspace performing a write to the redistributor view, which has
> > > > > different semantics. Which is why your earlier description made no
> > > > > sense to me.
> > > > >
> > > > > I think the problem is slightly larger than what you describe. A write
> > > > > to ISPENDR0 should be propagated to the ITS for any values of the
> > > > > latch, just like this happens on enabling HW-backed SGIs.
> > > > >
> > > > > Can you please give this a go?
> > > >
> > > > What do you think about using this as an opportunity for a bit of
> > > > cleanup? It'd be nice unify the various MMIO and uaccess handlers for
> > > > SPENDING + CPENDING while being careful about the arch_timer interrupt.
> >
> > What is special about the timer interrupt?
>
> Isn't that the case where we have a physical IRQ mapped and wind up
> forwarding state to the physical GIC?
Indeed. But that's not specific to the timer. That's a general
infrastructure that NV also uses it for the GIC maintenance interrupt.
>
> > Could be. But I'd rather have separate fixes from more invasive
> > reworks. Specially given that we have had multiple ugly bugs around
> > this code in the past, which is why we ended up splitting userspace
> > from guest accessors.
>
> Fine by me. I had felt like a common helper w/ the user v. guest
> exclusions is a bit easier to understand than diffing two very similar
> functions, but it isn't a big deal.
To be clear, I'm not objecting to that rework. It's just that we
should carefully review these patches on the list, as they would have
a wider ranging impact than a pure GICv4.1 change.
> Anyway, I'm happy with your fix. I'd like Kunkun to give it a go but
> either way I can pick it up for 6.7.
Yup, same here. But we also shouldn't hold the 6.7 fixes for too long.
If we don't get feedback from Kunkun shortly, I'd suggest to move this
patch to 6.8, and at this point your approach makes complete sense.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
Hi Marc and Oliver,
On 2023/12/17 19:26, Marc Zyngier wrote:
> On Thu, 14 Dec 2023 12:13:57 +0000,
> Kunkun Jiang <[email protected]> wrote:
>> Hi list,
>>
>> We have observed on GICv4.1 systems that, after a guest reset, the
>> secondary VCPU would receive an IPI_CPU_STOP accidently and failed to
>> come online eventually.
>>
>> | Guest User-space
> Guest userspace????
Sorry. The formatting is a bit messy.
| Guest User-space
|
| reset (with a vSGI pending in the
| hardware) [0]
|
| disable the distributor (write 0
| into GICD_CTLR) [1]
|
| clear pending status (write 0 into
| GICR_ISPENDR0 for each RD) [2]
|
| disable the distributor (write 0
| into GICD_CTLR) [3]
|
| enable the distributor with ARE,
| Group1 and nASSGIreq [4]
>> |
>> | reset (with a vSGI pending in the
>> | hardware) [0]
>> |
>> | disable the distributor (write 0
>> | into GICD_CTLR) [1]
>> |
>> | clear pending status (write 0 into
>> | GICR_ISPENDR0 for each RD) [2]
> This cannot clear the pending bits. Writing 0 to any of the
> ISPEND/ICPEND registers is effectively a NOP.
>
> If you want to remove the pending SGIs, you need to write a bunch of
> 1s to ICPENDR0.
>
>> |
>> | disable the distributor (write 0
>> | into GICD_CTLR) [3]
>> |
>> | enable the distributor with ARE,
>> | Group1 and nASSGIreq [4]
>>
>> The problem is that even if user-space tries to disable the distributor
> I don't understand what userspace does here. Why is it significant
> that it is an EL0 access? I don't know of any SW doing that, but even
> if it existed, this should make no difference.
>
>> and clear pending bits for all SGIs, we don't actually propogate it into
>> HW (we only record it via vgic_dist->{enabled, nassgireq} and
>> vgic_irq->pending_latch) and the vSGI remains pending across the guest
>> reset.
>>
>> And when we're at [4], all vSGI's vgic_irq->hw are *true* and
>> vgic_v4_enable_vsgis() becomes a nop.. That's not good.
>>
>> The following solution can solve the problem. Not sure if this is a good
>> solution.Sent out early for suggestions or solutions.
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> index 89117ba2528a..3678ab33f5b9 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> @@ -374,6 +374,10 @@ static int vgic_v3_uaccess_write_pending(struct
>> kvm_vcpu *vcpu,
>> irq->pending_latch = true;
>> vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
>> } else {
>> + if (irq->hw && vgic_irq_is_sgi(irq->intid))
>> + irq_set_irqchip_state(irq->host_irq,
>> + IRQCHIP_STATE_PENDING,
>> + false);
>> irq->pending_latch = false;
>> raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
>> }
>>
> But this has *nothing* to do with the guest. This is the *host*
> userspace performing a write to the redistributor view, which has
> different semantics. Which is why your earlier description made no
> sense to me.
>
> I think the problem is slightly larger than what you describe. A write
> to ISPENDR0 should be propagated to the ITS for any values of the
> latch, just like this happens on enabling HW-backed SGIs.
>
> Can you please give this a go?
I have given it a go. I tested reseting the guest 100 times in succession,
and the problem did not occur again.
Thanks,
Kunkun Jiang
> Thanks,
>
> M.
>
> >From 9932d74392d969057e84bc3c18bd15f1769b6211 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <[email protected]>
> Date: Sun, 17 Dec 2023 11:15:09 +0000
> Subject: [PATCH] KVM: arm64: vgic-v4: Restore pending state on host userspace
> write
>
> When the VMM writes to ISPENDR0 to set the state pending state of
> an SGI, we fail to convey this to the HW if this SGI is already
> backed by a GICv4.1 vSGI.
>
> This is a bit of a corner case, as this would only occur if the
> vgic state is changed on an already running VM, but this can
> apparently happen across a guest reset driven by the VMM.
>
> Fix this by always writing out the pending_latch value to the
> HW, and reseting it to false.
>
> Reported-by: Kunkun Jiang <[email protected]>
> Signed-off-by: Marc Zyngier <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> ---
> arch/arm64/kvm/vgic/vgic-mmio-v3.c | 27 +++++++++++++++++----------
> 1 file changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> index a764b0ab8bf9..2533f264b954 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> @@ -365,19 +365,26 @@ static int vgic_v3_uaccess_write_pending(struct kvm_vcpu *vcpu,
> struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>
> raw_spin_lock_irqsave(&irq->irq_lock, flags);
> - if (test_bit(i, &val)) {
> - /*
> - * pending_latch is set irrespective of irq type
> - * (level or edge) to avoid dependency that VM should
> - * restore irq config before pending info.
> - */
> - irq->pending_latch = true;
> - vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> - } else {
> +
> + /*
> + * pending_latch is set irrespective of irq type
> + * (level or edge) to avoid dependency that VM should
> + * restore irq config before pending info.
> + */
> + irq->pending_latch = test_bit(i, &val);
> +
> + if (irq->hw && vgic_irq_is_sgi(irq->intid)) {
> + irq_set_irqchip_state(irq->host_irq,
> + IRQCHIP_STATE_PENDING,
> + irq->pending_latch);
> irq->pending_latch = false;
> - raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
> }
>
> + if (irq->pending_latch)
> + vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> + else
> + raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
> +
> vgic_put_irq(vcpu->kvm, irq);
> }
>
On 2023/12/19 1:20, Oliver Upton wrote:
> Anyway, I'm happy with your fix. I'd like Kunkun to give it a go but
> either way I can pick it up for 6.7.
+1. When applying, feel free to add my
Reviewed-by: Zenghui Yu <[email protected]>
and perhaps
Cc: [email protected] # 5.10+
Thanks,
Zenghui