2016-04-01 03:29:32

by majun (Euler7)

Subject: [RFC PATCH] genirq: Change the non-balanced irq to balance irq when the cpu of the irq bounded off line

From: Ma Jun <[email protected]>

When the CPU that a non-balanced irq is bound to goes offline, the irq is
migrated to another CPU, usually the first online CPU.

Consider a system that has more than one non-balanced irq. In the extreme
case, these irqs are all migrated to the same CPU, which then runs under
high irq pressure and may even bring the system down.

So I think we may need to turn a non-balanced irq into one that can be
balanced, to avoid the problem described above.

Maybe this is not a good solution to this problem; please suggest a better
one if you have it.

Signed-off-by: Ma Jun <[email protected]>
---
kernel/irq/cpuhotplug.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 011f8c4..80d54a5 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -30,6 +30,8 @@ static bool migrate_one_irq(struct irq_desc *desc)
 		return false;
 
 	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
+		if (irq_settings_has_no_balance_set(desc))
+			irqd_clear(d, IRQD_NO_BALANCING);
 		affinity = cpu_online_mask;
 		ret = true;
 	}
--
1.7.1
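
For readers following the thread, the fallback that this hunk touches can be modelled in userspace. The sketch below is only an illustration under stated assumptions: `struct model_irq`, `first_cpu()` and `model_migrate_one_irq()` are hypothetical names, CPU masks are plain 64-bit integers rather than `struct cpumask`, and the "lowest-numbered online CPU" choice stands in for whatever the interrupt controller driver actually picks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for an irq descriptor; not a kernel type. */
struct model_irq {
	uint64_t affinity;	/* bit n set: CPU n is an allowed target */
	bool no_balancing;	/* models the IRQD_NO_BALANCING state bit */
};

/* Lowest-numbered CPU in mask, or -1 if the mask is empty. */
static int first_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/*
 * Models the fallback: if no CPU in the irq's affinity mask is still
 * online, the affinity is widened to every online CPU. Returns true when
 * the affinity had to be broken. *target receives the CPU this model
 * picks (assumption: the lowest-numbered allowed online CPU).
 */
static bool model_migrate_one_irq(struct model_irq *irq, uint64_t online,
				  bool clear_no_balancing, int *target)
{
	bool broken = false;

	assert(online != 0);	/* at least one CPU must stay online */

	if ((irq->affinity & online) == 0) {
		if (clear_no_balancing && irq->no_balancing)
			irq->no_balancing = false;	/* what the RFC adds */
		irq->affinity = online;
		broken = true;
	}
	*target = first_cpu(irq->affinity & online);
	return broken;
}
```

In this model, an irq pinned to CPU 3 lands on CPU 0 once CPU 3 goes offline, with or without the proposed change. The only difference is whether the no-balancing bit is still set afterwards.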



2016-04-01 08:30:40

by Marc Zyngier

Subject: Re: [RFC PATCH] genirq: Change the non-balanced irq to balance irq when the cpu of the irq bounded off line

Hi Ma Jun,

On 01/04/16 04:28, MaJun wrote:
> From: Ma Jun <[email protected]>
>
> When the CPU that a non-balanced irq is bound to goes offline, the irq is
> migrated to another CPU, usually the first online CPU.
>
> Consider a system that has more than one non-balanced irq. In the extreme
> case, these irqs are all migrated to the same CPU, which then runs under
> high irq pressure and may even bring the system down.

It would take a hell of a lot of interrupts (and a very badly designed
system) for that system to collapse under the interrupt load. Whatever
people tend to think, interrupts are a very rare event.

Any moderately ancient CPU can take several hundred thousand
interrupts per second, and you still barely notice it (try any embedded
platform with a bunch of MMC controllers...).
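
One way to put a number on the interrupt load is to sample a row of /proc/interrupts twice and divide the difference by the interval. The helpers below are a minimal sketch of that arithmetic, not something from this thread: `sum_irq_row()` and `irq_rate()` are hypothetical names, and the row layout ("N:" followed by one counter per CPU, then the chip and device names) is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on one /proc/interrupts row such as
 * " 29:  1200  3400  GIC  mmc0". Returns 0 if the row has no "N:" prefix.
 */
static uint64_t sum_irq_row(const char *row, int ncpus)
{
	const char *p = strchr(row, ':');
	uint64_t total = 0;

	if (!p)
		return 0;
	p++;
	for (int i = 0; i < ncpus; i++) {
		char *end;
		unsigned long long v = strtoull(p, &end, 10);

		if (end == p)	/* ran out of numeric columns */
			break;
		total += v;
		p = end;
	}
	return total;
}

/* Interrupts per second between two samples taken `seconds` apart. */
static double irq_rate(uint64_t before, uint64_t after, double seconds)
{
	assert(seconds > 0.0);
	return (double)(after - before) / seconds;
}
```

Comparing the resulting rate with the figure quoted above gives a rough idea of whether a single CPU is anywhere near saturation.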

Now, let's get to the actual question:

> So I think we may need to turn a non-balanced irq into one that can be
> balanced, to avoid the problem described above.

But what makes you think that you can safely clear that flag? If it has
been excluded from balancing, that's surely for a good reason, and the
device driver that requested this probably doesn't expect the interrupt
affinity to change, other than by the effect of CPU hotplug itself.

So if you're seeing a problem with an interrupt not being balanced,
please first investigate *why* the driver asked for it in the first place.

But to the best of my understanding, this patch doesn't solve anything.

Thanks,

	M.
--
Jazz is not dead. It just smells funny...