It will always enable the interrupt after calling plic_set_affinity()
however it should set to its previous setting. Staying disabled or enabled.
This patch can also fix this pwm hang issue in Unleashed board.
[ 919.015783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 919.020922] rcu: 0-...0: (0 ticks this GP)
idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=105807
[ 919.030295] (detected by 1, t=225825 jiffies, g=1561, q=3496)
[ 919.036109] Task dump for CPU 0:
[ 919.039321] kworker/0:1 R running task 0 30 2 0x00000008
[ 919.046359] Workqueue: events set_brightness_delayed
[ 919.051302] Call Trace:
[ 919.053738] [<ffffffe000930d92>] __schedule+0x194/0x4de
[ 982.035783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 982.040923] rcu: 0-...0: (0 ticks this GP)
idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=113325
[ 982.050294] (detected by 1, t=241580 jiffies, g=1561, q=3509)
[ 982.056108] Task dump for CPU 0:
[ 982.059321] kworker/0:1 R running task 0 30 2 0x00000008
[ 982.066359] Workqueue: events set_brightness_delayed
[ 982.071302] Call Trace:
[ 982.073739] [<ffffffe000930d92>] __schedule+0x194/0x4de
[..]
Fixes: bb0fed1c60cc ("irqchip/sifive-plic: Switch to fasteoi flow")
Signed-off-by: Greentime Hu <[email protected]>
---
drivers/irqchip/irq-sifive-plic.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index eaa3e9fe54e9..4cc8a2657a6d 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -137,6 +137,7 @@ static int plic_set_affinity(struct irq_data *d,
const struct cpumask *mask_val, bool force)
{
unsigned int cpu;
+ bool enable;
struct cpumask amask;
struct plic_priv *priv = irq_get_chip_data(d->irq);
@@ -150,8 +151,10 @@ static int plic_set_affinity(struct irq_data *d,
if (cpu >= nr_cpu_ids)
return -EINVAL;
+ enable = !irqd_irq_disabled(d);
plic_irq_toggle(&priv->lmask, d, 0);
- plic_irq_toggle(cpumask_of(cpu), d, 1);
+ /* Keep its original setting. */
+ plic_irq_toggle(cpumask_of(cpu), d, enable);
irq_data_update_effective_affinity(d, cpumask_of(cpu));
--
2.28.0
In commit 2ca0b460bbcb ("genirq/affinity: Make affinity setting if activated opt-in"),
it added irqd_affinity_on_activate() checking in the function
irq_set_affinity_deactivated() so it will return false here.
In that case, it will call irq_try_set_affinity() -> plic_irq_toggle()
which will enable the interrupt to cause the CPU hang.
if (irq_set_affinity_deactivated())
return 0;
...
irq_try_set_affinity(data, mask, force);
[ 919.015783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 919.020922] rcu: 0-...0: (0 ticks this GP)
idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=105807
[ 919.030295] (detected by 1, t=225825 jiffies, g=1561, q=3496)
[ 919.036109] Task dump for CPU 0:
[ 919.039321] kworker/0:1 R running task 0 30 2 0x00000008
[ 919.046359] Workqueue: events set_brightness_delayed
[ 919.051302] Call Trace:
[ 919.053738] [<ffffffe000930d92>] __schedule+0x194/0x4de
[ 982.035783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 982.040923] rcu: 0-...0: (0 ticks this GP)
idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=113325
[ 982.050294] (detected by 1, t=241580 jiffies, g=1561, q=3509)
[ 982.056108] Task dump for CPU 0:
[ 982.059321] kworker/0:1 R running task 0 30 2 0x00000008
[ 982.066359] Workqueue: events set_brightness_delayed
[ 982.071302] Call Trace:
[ 982.073739] [<ffffffe000930d92>] __schedule+0x194/0x4de
[..]
After applying this patch, the CPU hang issue can be fixed.
Signed-off-by: Greentime Hu <[email protected]>
---
drivers/irqchip/irq-sifive-plic.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index 4cc8a2657a6d..8a673d9cff69 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -183,10 +183,14 @@ static int plic_irqdomain_map(struct irq_domain *d, unsigned int irq,
irq_hw_number_t hwirq)
{
struct plic_priv *priv = d->host_data;
+ struct irq_data *irqd;
irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data,
handle_fasteoi_irq, NULL, NULL);
irq_set_noprobe(irq);
+ irqd = irq_get_irq_data(irq);
+ irqd_set_single_target(irqd);
+ irqd_set_affinity_on_activate(irqd);
irq_set_affinity(irq, &priv->lmask);
return 0;
}
--
2.28.0
On Mon, Oct 12, 2020 at 7:27 PM Greentime Hu <[email protected]> wrote:
>
> It will always enable the interrupt after calling plic_set_affinity()
> however it should set to its previous setting. Staying disabled or enabled.
>
> This patch can also fix this pwm hang issue in Unleashed board.
>
> [ 919.015783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 919.020922] rcu: 0-...0: (0 ticks this GP)
> idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=105807
> [ 919.030295] (detected by 1, t=225825 jiffies, g=1561, q=3496)
> [ 919.036109] Task dump for CPU 0:
> [ 919.039321] kworker/0:1 R running task 0 30 2 0x00000008
> [ 919.046359] Workqueue: events set_brightness_delayed
> [ 919.051302] Call Trace:
> [ 919.053738] [<ffffffe000930d92>] __schedule+0x194/0x4de
> [ 982.035783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 982.040923] rcu: 0-...0: (0 ticks this GP)
> idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=113325
> [ 982.050294] (detected by 1, t=241580 jiffies, g=1561, q=3509)
> [ 982.056108] Task dump for CPU 0:
> [ 982.059321] kworker/0:1 R running task 0 30 2 0x00000008
> [ 982.066359] Workqueue: events set_brightness_delayed
> [ 982.071302] Call Trace:
> [ 982.073739] [<ffffffe000930d92>] __schedule+0x194/0x4de
> [..]
>
> Fixes: bb0fed1c60cc ("irqchip/sifive-plic: Switch to fasteoi flow")
> Signed-off-by: Greentime Hu <[email protected]>
> ---
> drivers/irqchip/irq-sifive-plic.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
> index eaa3e9fe54e9..4cc8a2657a6d 100644
> --- a/drivers/irqchip/irq-sifive-plic.c
> +++ b/drivers/irqchip/irq-sifive-plic.c
> @@ -137,6 +137,7 @@ static int plic_set_affinity(struct irq_data *d,
> const struct cpumask *mask_val, bool force)
> {
> unsigned int cpu;
> + bool enable;
> struct cpumask amask;
> struct plic_priv *priv = irq_get_chip_data(d->irq);
>
> @@ -150,8 +151,10 @@ static int plic_set_affinity(struct irq_data *d,
> if (cpu >= nr_cpu_ids)
> return -EINVAL;
>
> + enable = !irqd_irq_disabled(d);
> plic_irq_toggle(&priv->lmask, d, 0);
> - plic_irq_toggle(cpumask_of(cpu), d, 1);
> + /* Keep its original setting. */
> + plic_irq_toggle(cpumask_of(cpu), d, enable);
>
> irq_data_update_effective_affinity(d, cpumask_of(cpu));
>
> --
> 2.28.0
>
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Good catch. Your fix looks good to me.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
On Mon, Oct 12, 2020 at 6:57 AM Greentime Hu <[email protected]> wrote:
>
> It will always enable the interrupt after calling plic_set_affinity()
> however it should set to its previous setting. Staying disabled or enabled.
>
> This patch can also fix this pwm hang issue in Unleashed board.
>
> [ 919.015783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 919.020922] rcu: 0-...0: (0 ticks this GP)
> idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=105807
> [ 919.030295] (detected by 1, t=225825 jiffies, g=1561, q=3496)
> [ 919.036109] Task dump for CPU 0:
> [ 919.039321] kworker/0:1 R running task 0 30 2 0x00000008
> [ 919.046359] Workqueue: events set_brightness_delayed
> [ 919.051302] Call Trace:
> [ 919.053738] [<ffffffe000930d92>] __schedule+0x194/0x4de
> [ 982.035783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 982.040923] rcu: 0-...0: (0 ticks this GP)
> idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=113325
> [ 982.050294] (detected by 1, t=241580 jiffies, g=1561, q=3509)
> [ 982.056108] Task dump for CPU 0:
> [ 982.059321] kworker/0:1 R running task 0 30 2 0x00000008
> [ 982.066359] Workqueue: events set_brightness_delayed
> [ 982.071302] Call Trace:
> [ 982.073739] [<ffffffe000930d92>] __schedule+0x194/0x4de
> [..]
>
> Fixes: bb0fed1c60cc ("irqchip/sifive-plic: Switch to fasteoi flow")
> Signed-off-by: Greentime Hu <[email protected]>
> ---
> drivers/irqchip/irq-sifive-plic.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
> index eaa3e9fe54e9..4cc8a2657a6d 100644
> --- a/drivers/irqchip/irq-sifive-plic.c
> +++ b/drivers/irqchip/irq-sifive-plic.c
> @@ -137,6 +137,7 @@ static int plic_set_affinity(struct irq_data *d,
> const struct cpumask *mask_val, bool force)
> {
> unsigned int cpu;
> + bool enable;
> struct cpumask amask;
> struct plic_priv *priv = irq_get_chip_data(d->irq);
>
> @@ -150,8 +151,10 @@ static int plic_set_affinity(struct irq_data *d,
> if (cpu >= nr_cpu_ids)
> return -EINVAL;
>
> + enable = !irqd_irq_disabled(d);
> plic_irq_toggle(&priv->lmask, d, 0);
> - plic_irq_toggle(cpumask_of(cpu), d, 1);
> + /* Keep its original setting. */
> + plic_irq_toggle(cpumask_of(cpu), d, enable);
>
> irq_data_update_effective_affinity(d, cpumask_of(cpu));
>
> --
> 2.28.0
>
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv
LGTM.
Reviewed-by: Atish Patra <[email protected]>
--
Regards,
Atish
On 2020-10-12 14:57, Greentime Hu wrote:
> In commit 2ca0b460bbcb ("genirq/affinity: Make affinity setting if
> activated opt-in"),
> it added irqd_affinity_on_activate() checking in the function
> irq_set_affinity_deactivated() so it will return false here.
> In that case, it will call irq_try_set_affinity() -> plic_irq_toggle()
> which will enable the interrupt to cause the CPU hang.
>
> if (irq_set_affinity_deactivated())
> return 0;
> ...
> irq_try_set_affinity(data, mask, force);
>
> [ 919.015783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 919.020922] rcu: 0-...0: (0 ticks this GP)
> idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=105807
> [ 919.030295] (detected by 1, t=225825 jiffies, g=1561, q=3496)
> [ 919.036109] Task dump for CPU 0:
> [ 919.039321] kworker/0:1 R running task 0 30 2
> 0x00000008
> [ 919.046359] Workqueue: events set_brightness_delayed
> [ 919.051302] Call Trace:
> [ 919.053738] [<ffffffe000930d92>] __schedule+0x194/0x4de
> [ 982.035783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 982.040923] rcu: 0-...0: (0 ticks this GP)
> idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=113325
> [ 982.050294] (detected by 1, t=241580 jiffies, g=1561, q=3509)
> [ 982.056108] Task dump for CPU 0:
> [ 982.059321] kworker/0:1 R running task 0 30 2
> 0x00000008
> [ 982.066359] Workqueue: events set_brightness_delayed
> [ 982.071302] Call Trace:
> [ 982.073739] [<ffffffe000930d92>] __schedule+0x194/0x4de
> [..]
>
> After applying this patch, the CPU hang issue can be fixed.
>
> Signed-off-by: Greentime Hu <[email protected]>
> ---
> drivers/irqchip/irq-sifive-plic.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/irqchip/irq-sifive-plic.c
> b/drivers/irqchip/irq-sifive-plic.c
> index 4cc8a2657a6d..8a673d9cff69 100644
> --- a/drivers/irqchip/irq-sifive-plic.c
> +++ b/drivers/irqchip/irq-sifive-plic.c
> @@ -183,10 +183,14 @@ static int plic_irqdomain_map(struct irq_domain
> *d, unsigned int irq,
> irq_hw_number_t hwirq)
> {
> struct plic_priv *priv = d->host_data;
> + struct irq_data *irqd;
>
> irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data,
> handle_fasteoi_irq, NULL, NULL);
> irq_set_noprobe(irq);
> + irqd = irq_get_irq_data(irq);
> + irqd_set_single_target(irqd);
> + irqd_set_affinity_on_activate(irqd);
> irq_set_affinity(irq, &priv->lmask);
> return 0;
> }
How does this fix anything? The plic driver doesn't have an activate
callback, so how does it make any difference? I get the feeling this
papers over another issue.
M.
--
Jazz is not dead. It just smells funny...
On Thu, Oct 15 2020 at 22:23, Marc Zyngier wrote:
> On 2020-10-12 14:57, Greentime Hu wrote:
>> In commit 2ca0b460bbcb ("genirq/affinity: Make affinity setting if
>> activated opt-in"),
This commit sha does not exist in mainline. Referencing a random stable
kernel tree is useless.
>> it added irqd_affinity_on_activate() checking in the function
>> irq_set_affinity_deactivated() so it will return false here.
>> In that case, it will call irq_try_set_affinity() -> plic_irq_toggle()
>> which will enable the interrupt to cause the CPU hang.
>> @@ -183,10 +183,14 @@ static int plic_irqdomain_map(struct irq_domain
>> *d, unsigned int irq,
>> irq_hw_number_t hwirq)
>> {
>> struct plic_priv *priv = d->host_data;
>> + struct irq_data *irqd;
>>
>> irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data,
>> handle_fasteoi_irq, NULL, NULL);
>> irq_set_noprobe(irq);
>> + irqd = irq_get_irq_data(irq);
>> + irqd_set_single_target(irqd);
>> + irqd_set_affinity_on_activate(irqd);
>> irq_set_affinity(irq, &priv->lmask);
>> return 0;
>> }
>
> How does this fix anything? The plic driver doesn't have an activate
> callback, so how does it make any difference? I get the feeling this
> papers over another issue.
The existence of the activate callback is not really the interesting
part. And of course the analysis is completely bogus.
The referenced commit is a follow up to:
baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly")
which was added on Jul 17 and the fix was added on Jul 24:
f0c7baca1800 ("genirq/affinity: Make affinity setting if activated opt-in")
The latter restored the original behaviour within 7 days, except for
x86. And the original behaviour was that the core did not care about the
activated state at all. It just invoked irqchip->irq_set_affinity()
unconditionally, which is still does except when the interrupt is marked
with irqd_set_affinity_on_activate().
So the plic code had this problem before that change already:
irq_set_noprobe(irq);
irq_set_affinity(irq, &priv->lmask);
Mapping happens way before anything else, so Hu is definitely barking up
the wrong tree.
What the patch works around is the irq_set_affinity() callback which is
completely bogus and still broken even after that duct tape which"
fixes" the boot fail.
irq_set_affinity() can be called on masked interrupts so this:
plic_irq_toggle(&priv->lmask, d, 0);
plic_irq_toggle(cpumask_of(cpu), d, 1);
will always unconditionally unmask the interrupt when affinity is
changed.
plic_irq_toggle(cpumask_of(cpu), d, !irqd_irq_masked(d));
is all it needs to work everywhere. Obvious, right?
The changelog wants to be "...: Fix broken irq_set_affinity() callback"
and the Fixes: tag referencing the commit which introduced that
unconditional unmask. Oh well.
Thanks,
tglx