2024-04-23 07:37:23

by Dongli Zhang

[permalink] [raw]
Subject: [PATCH v2 1/1] genirq/cpuhotplug: retry with cpu_online_mask when irq_do_set_affinity return -ENOSPC

When a CPU goes offline, the interrupts pinned to that CPU are
re-configured.

Its managed interrupts undergo either migration to other CPUs or shutdown
if all CPUs listed in the affinity are offline. This patch doesn't affect
managed interrupts.

For regular interrupts, they are migrated to other selected online CPUs.
The target CPUs are chosen from either desc->pending_mask (suppose
CONFIG_GENERIC_PENDING_IRQ) or d->common->affinity (suppose CONFIG_SMP).
The cpu_online_mask is used as target CPUs only when CPUs in both
desc->pending_mask and d->common->affinity are offline.

However, there is a bad corner case, when desc->pending_mask or
d->common->affinity is selected as the target cpumask, but none of their
CPUs has any available vectors.

In this case the migration fails and the device interrupt becomes
stale. This is not any different from the case where the affinity
mask does not contain any online CPU, but there is no fallback
operation for this.

Instead of giving up, retry the migration attempt with the online CPU
mask if the interrupt is not managed, as managed interrupts cannot be
affected by this problem.

Cc: Joe Jin <[email protected]>
Signed-off-by: Dongli Zhang <[email protected]>
[tglx: massage some changelog]
---
Changed since v1:
- Re-work the commit message
- Move pr_debug before setting affinity
- Remove 'all' from pr_debug message

kernel/irq/cpuhotplug.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 1ed2b1739363..19babb914949 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -130,6 +130,17 @@ static bool migrate_one_irq(struct irq_desc *desc)
* CPU.
*/
err = irq_do_set_affinity(d, affinity, false);
+
+ if (err == -ENOSPC && !irqd_affinity_is_managed(d) && affinity != cpu_online_mask) {
+ pr_debug("IRQ%u: set affinity failed for %*pbl, re-try with online CPUs\n",
+ d->irq, cpumask_pr_args(affinity));
+
+ affinity = cpu_online_mask;
+ brokeaff = true;
+
+ err = irq_do_set_affinity(d, affinity, false);
+ }
+
if (err) {
pr_warn_ratelimited("IRQ%u: set affinity failed(%d).\n",
d->irq, err);
--
2.34.1



Subject: [tip: irq/core] genirq/cpuhotplug: Retry with cpu_online_mask when migration fails

The following commit has been merged into the irq/core branch of tip:

Commit-ID: 88d724e2301a69c1ab805cd74fc27aa36ae529e0
Gitweb: https://git.kernel.org/tip/88d724e2301a69c1ab805cd74fc27aa36ae529e0
Author: Dongli Zhang <[email protected]>
AuthorDate: Tue, 23 Apr 2024 00:34:13 -07:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Wed, 24 Apr 2024 20:42:57 +02:00

genirq/cpuhotplug: Retry with cpu_online_mask when migration fails

When a CPU goes offline, the interrupts affine to that CPU are
re-configured.

Managed interrupts undergo either migration to other CPUs or shutdown if
all CPUs listed in the affinity are offline. The migration of managed
interrupts is guaranteed on x86 because there are interrupt vectors
reserved.

Regular interrupts are migrated to a still online CPU in the affinity mask
or if there is no online CPU to any online CPU.

This works as long as the still online CPUs in the affinity mask have
interrupt vectors available, but in case that none of those CPUs has a
vector available the migration fails and the device interrupt becomes
stale.

This is not any different from the case where the affinity mask does not
contain any online CPU, but there is no fallback operation for this.

Instead of giving up, retry the migration attempt with the online CPU mask
if the interrupt is not managed, as managed interrupts cannot be affected
by this problem.

Signed-off-by: Dongli Zhang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/irq/cpuhotplug.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 43340e0..75cadbc 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -130,6 +130,22 @@ static bool migrate_one_irq(struct irq_desc *desc)
* CPU.
*/
err = irq_do_set_affinity(d, affinity, false);
+
+ /*
+ * If there are online CPUs in the affinity mask, but they have no
+ * vectors left to make the migration work, try to break the
+ * affinity by migrating to any online CPU.
+ */
+ if (err == -ENOSPC && !irqd_affinity_is_managed(d) && affinity != cpu_online_mask) {
+ pr_debug("IRQ%u: set affinity failed for %*pbl, re-try with online CPUs\n",
+ d->irq, cpumask_pr_args(affinity));
+
+ affinity = cpu_online_mask;
+ brokeaff = true;
+
+ err = irq_do_set_affinity(d, affinity, false);
+ }
+
if (err) {
pr_warn_ratelimited("IRQ%u: set affinity failed(%d).\n",
d->irq, err);