2023-01-06 08:37:04

by Yipeng Zou

Subject: [RFC PATCH] irqchip/gic-v3: wait irq done to set affinity

Recently we ran into a problem with GIC affinity setting in our tests.

This patch is only meant to start a discussion about this problem.

Currently, a GIC affinity change takes effect immediately, without
checking whether the interrupt is still being processed.

This can lead to a problem. Consider the following scenario:

1. A device generates an irq.

2. While this irq is being processed (after the event has been
handled, but before the IRQD_IRQ_INPROGRESS flag is cleared), we change
its routing. The GIC applies the change immediately, and at the same
time the irq fires again.

3. The new irq is handled on a different CPU from the old one.

4. The new irq is discarded because the IRQD_IRQ_INPROGRESS flag is
still set.

I noticed that if we set IRQF_ONESHOT when registering the irq, the
problem goes away.

But I'm also thinking about changing gic_set_affinity() so that it
waits for the current irq to complete on all CPUs before calling
gic_write_irouter(). I'm not sure whether that is appropriate.

Is using IRQF_ONESHOT to prevent reentrancy the best workaround?

Please let me know if you have any other suggestions on this issue.

Signed-off-by: Yipeng Zou <[email protected]>
---
drivers/irqchip/irq-gic-v3.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 997104d4338e..e9b9f15f07f8 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1348,6 +1348,8 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
        reg = gic_dist_base(d) + offset + (index * 8);
        val = gic_mpidr_to_affinity(cpu_logical_map(cpu));

+        // wait irq done on all cpus
+
        gic_write_irouter(val, reg);

        /*
--
2.17.1


2023-01-06 12:02:41

by Marc Zyngier

Subject: Re: [RFC PATCH] irqchip/gic-v3: wait irq done to set affinity

On Fri, 06 Jan 2023 08:21:36 +0000,
Yipeng Zou <[email protected]> wrote:
>
> Recently we ran into a problem with GIC affinity setting in our tests.
>
> This patch is only meant to start a discussion about this problem.
>
> Currently, a GIC affinity change takes effect immediately, without
> checking whether the interrupt is still being processed.
>
> This can lead to a problem. Consider the following scenario:
>
> 1. A device generates an irq.
>
> 2. While this irq is being processed (after the event has been
> handled, but before the IRQD_IRQ_INPROGRESS flag is cleared), we change
> its routing. The GIC applies the change immediately, and at the same
> time the irq fires again.

How is that possible?

If it is affected by GICD_IROUTERn (as your patch suggests), then it
is an SPI. If it is an SPI, it has an active state, which means it
cannot fire again without a deactivation (EOI if EOImode=0, EOI+DIR if
EOImode=1) having taken place.

So either something has deactivated the interrupt without masking it
beforehand, or the active state is not honoured. Either way, this is
wrong.

>
> 3. The new irq is handled on a different CPU from the old one.
>
> 4. The new irq is discarded because the IRQD_IRQ_INPROGRESS flag is
> still set.
>
> I noticed that if we set IRQF_ONESHOT when registering the irq, the
> problem goes away.
>
> But I'm also thinking about changing gic_set_affinity() so that it
> waits for the current irq to complete on all CPUs before calling
> gic_write_irouter(). I'm not sure whether that is appropriate.

The base architecture should guarantee that this is not a problem,
thanks to the active state. If it were an LPI (LPIs do not have an
active state), that'd be a different problem. But this doesn't seem to
be the case here.

I'm afraid to say that what you describe seems like a bug of some sort,
either HW or SW.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2023-01-09 12:47:43

by Yipeng Zou

Subject: Re: [RFC PATCH] irqchip/gic-v3: wait irq done to set affinity


On 2023/1/6 19:55, Marc Zyngier wrote:
> On Fri, 06 Jan 2023 08:21:36 +0000,
> Yipeng Zou <[email protected]> wrote:
>> Recently we ran into a problem with GIC affinity setting in our tests.
>>
>> This patch is only meant to start a discussion about this problem.
>>
>> Currently, a GIC affinity change takes effect immediately, without
>> checking whether the interrupt is still being processed.
>>
>> This can lead to a problem. Consider the following scenario:
>>
>> 1. A device generates an irq.
>>
>> 2. While this irq is being processed (after the event has been
>> handled, but before the IRQD_IRQ_INPROGRESS flag is cleared), we change
>> its routing. The GIC applies the change immediately, and at the same
>> time the irq fires again.
> How is that possible?
>
> If it is affected by GICD_IROUTERn (as your patch suggests), then it
> is an SPI. If it is an SPI, it has an active state, which means it
> cannot fire again without a deactivation (EOI if EOImode=0, EOI+DIR if
> EOImode=1) having taken place.
>
> So either something has deactivated the interrupt without masking it
> beforehand, or the active state is not honoured. Either way, this is
> wrong.
Yes, I agree: this is not possible in the SPI case.
>> 3. The new irq is handled on a different CPU from the old one.
>>
>> 4. The new irq is discarded because the IRQD_IRQ_INPROGRESS flag is
>> still set.
>>
>> I noticed that if we set IRQF_ONESHOT when registering the irq, the
>> problem goes away.
>>
>> But I'm also thinking about changing gic_set_affinity() so that it
>> waits for the current irq to complete on all CPUs before calling
>> gic_write_irouter(). I'm not sure whether that is appropriate.
> The base architecture should guarantee that this is not a problem,
> thanks to the active state. If it were an LPI (LPIs do not have an
> active state), that'd be a different problem. But this doesn't seem to
> be the case here.

Hi, thank you very much for the reply.

I have rechecked our test. It was actually an LPI in our test case.

The problem is caused by the MOVI command sent by its_send_movi().

I made a mistake when modifying the code; it should be as follows.
Sorry for misleading you.


diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 973ede0197e3..fad08ccb7fd9 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1667,6 +1667,9 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,

        /* don't set the affinity when the target cpu is same as current one */
        if (cpu != prev_cpu) {
+
+               // wait irq done on all cpus
+
                target_col = &its_dev->its->collections[cpu];
                its_send_movi(its_dev, target_col, id);
                its_dev->event_map.col_map[id] = cpu;

> I'm afraid to say that what you describe seems like a bug of some sort,
> either HW or SW.
>
> Thanks,
>
> M.

--
Regards,
Yipeng Zou

2023-01-17 10:14:44

by Thomas Gleixner

Subject: Re: [RFC PATCH] irqchip/gic-v3: wait irq done to set affinity

On Mon, Jan 09 2023 at 20:26, Yipeng Zou wrote:
> On 2023/1/6 19:55, Marc Zyngier wrote:
> index 973ede0197e3..fad08ccb7fd9 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1667,6 +1667,9 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
>
>         /* don't set the affinity when the target cpu is same as current one */
>         if (cpu != prev_cpu) {
> +
> +               // wait irq done on all cpus
> +

There is no way to wait here. The caller holds the interrupt descriptor
lock.

If this is really an issue for LPIs, then the only way to deal with it
is CONFIG_GENERIC_PENDING_IRQ, which delays the affinity change to
interrupt context.

Why on earth must all the known hardware mistakes be repeated over and
over?

Thanks,

tglx