2024-04-19 01:36:04

by Dongli Zhang

Subject: [PATCH 1/1] genirq/cpuhotplug: retry with online CPUs on irq_do_set_affinity failure

When a CPU is offline, its IRQs may migrate to other CPUs. For managed
IRQs, they are migrated, or shutdown (if all CPUs of the managed IRQ
affinity are offline). For regular IRQs, there will only be a migration.

The migrate_one_irq() first uses pending_mask or affinity_mask of the IRQ.

104 if (irq_fixup_move_pending(desc, true))
105 affinity = irq_desc_get_pending_mask(desc);
106 else
107 affinity = irq_data_get_affinity_mask(d);

The migrate_one_irq() may use all online CPUs, if all CPUs in
pending_mask/affinity_mask are already offline.

113 if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
114 /*
115 * If the interrupt is managed, then shut it down and leave
116 * the affinity untouched.
117 */
118 if (irqd_affinity_is_managed(d)) {
119 irqd_set_managed_shutdown(d);
120 irq_shutdown_and_deactivate(desc);
121 return false;
122 }
123 affinity = cpu_online_mask;
124 brokeaff = true;
125 }

However, there is a corner case. Although some CPUs in
pending_mask/affinity_mask are still online, they are lack of available
vectors. If the kernel continues calling irq_do_set_affinity() with those CPUs,
there will be -ENOSPC error.

This is not reasonable as other online CPUs still have many available vectors.

name: VECTOR
size: 0
mapped: 529
flags: 0x00000103
Online bitmaps: 7
Global available: 884
Global reserved: 6
Total allocated: 539
System: 36: 0-19,21,50,128,236,243-244,246-255
| CPU | avl | man | mac | act | vectors
0 147 0 0 55 32-49,51-87
1 147 0 0 55 32-49,51-87
2 0 0 0 202 32-49,51-127,129-235
4 147 0 0 55 32-49,51-87
5 147 0 0 55 32-49,51-87
6 148 0 0 54 32-49,51-86
7 148 0 0 54 32-49,51-86

This issue should not happen for managed IRQs because the vectors are already
reserved before CPU hotplug. For regular IRQs, do a re-try with all online
CPUs if the prior irq_do_set_affinity() is failed with -ENOSPC.

Cc: Joe Jin <[email protected]>
Signed-off-by: Dongli Zhang <[email protected]>
---
kernel/irq/cpuhotplug.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 1ed2b1739363..d1666a6b73f4 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -130,6 +130,19 @@ static bool migrate_one_irq(struct irq_desc *desc)
* CPU.
*/
err = irq_do_set_affinity(d, affinity, false);
+
+ if (err == -ENOSPC &&
+ !irqd_affinity_is_managed(d) &&
+ affinity != cpu_online_mask) {
+ affinity = cpu_online_mask;
+ brokeaff = true;
+
+ pr_debug("IRQ%u: set affinity failed for %*pbl, re-try with all online CPUs\n",
+ d->irq, cpumask_pr_args(affinity));
+
+ err = irq_do_set_affinity(d, affinity, false);
+ }
+
if (err) {
pr_warn_ratelimited("IRQ%u: set affinity failed(%d).\n",
d->irq, err);
--
2.34.1



2024-04-22 21:00:01

by Thomas Gleixner

Subject: Re: [PATCH 1/1] genirq/cpuhotplug: retry with online CPUs on irq_do_set_affinity failure

On Thu, Apr 18 2024 at 18:33, Dongli Zhang wrote:

> When a CPU is offline, its IRQs may migrate to other CPUs. For managed
> IRQs, they are migrated, or shutdown (if all CPUs of the managed IRQ
> affinity are offline). For regular IRQs, there will only be a
> migration.

Please write out interrupts. There is enough space for it and IRQ is
just not a regular word.

> The migrate_one_irq() first uses pending_mask or affinity_mask of the IRQ.
>
> 104 if (irq_fixup_move_pending(desc, true))
> 105 affinity = irq_desc_get_pending_mask(desc);
> 106 else
> 107 affinity = irq_data_get_affinity_mask(d);
>
> The migrate_one_irq() may use all online CPUs, if all CPUs in
> pending_mask/affinity_mask are already offline.
>
> 113 if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
> 114 /*
> 115 * If the interrupt is managed, then shut it down and leave
> 116 * the affinity untouched.
> 117 */
> 118 if (irqd_affinity_is_managed(d)) {
> 119 irqd_set_managed_shutdown(d);
> 120 irq_shutdown_and_deactivate(desc);
> 121 return false;
> 122 }
> 123 affinity = cpu_online_mask;
> 124 brokeaff = true;
> 125 }

Please don't copy code into the change log. Describe the problem in
text.

> However, there is a corner case. Although some CPUs in
> pending_mask/affinity_mask are still online, they are lack of available
> vectors. If the kernel continues calling irq_do_set_affinity() with those CPUs,
> there will be -ENOSPC error.
>
> This is not reasonable as other online CPUs still have many available
> vectors.

Reasonable is not the question here. It's either correct or not.

> name: VECTOR
> size: 0
> mapped: 529
> flags: 0x00000103
> Online bitmaps: 7
> Global available: 884
> Global reserved: 6
> Total allocated: 539
> System: 36: 0-19,21,50,128,236,243-244,246-255
> | CPU | avl | man | mac | act | vectors
> 0 147 0 0 55 32-49,51-87
> 1 147 0 0 55 32-49,51-87
> 2 0 0 0 202 32-49,51-127,129-235

Just out of curiosity. How did this end up with CPU2 completely
occupied?

> 4 147 0 0 55 32-49,51-87
> 5 147 0 0 55 32-49,51-87
> 6 148 0 0 54 32-49,51-86
> 7 148 0 0 54 32-49,51-86
>
> This issue should not happen for managed IRQs because the vectors are already
> reserved before CPU hotplug.

Should not? It either does or it does not.

> For regular IRQs, do a re-try with all online
> CPUs if the prior irq_do_set_affinity() is failed with -ENOSPC.
>
> Cc: Joe Jin <[email protected]>
> Signed-off-by: Dongli Zhang <[email protected]>
> ---
> kernel/irq/cpuhotplug.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
> index 1ed2b1739363..d1666a6b73f4 100644
> --- a/kernel/irq/cpuhotplug.c
> +++ b/kernel/irq/cpuhotplug.c
> @@ -130,6 +130,19 @@ static bool migrate_one_irq(struct irq_desc *desc)
> * CPU.
> */
> err = irq_do_set_affinity(d, affinity, false);
> +
> + if (err == -ENOSPC &&
> + !irqd_affinity_is_managed(d) &&
> + affinity != cpu_online_mask) {

This really wants to be a single line conditional.

> + affinity = cpu_online_mask;
> + brokeaff = true;
> +
> + pr_debug("IRQ%u: set affinity failed for %*pbl, re-try with all online CPUs\n",
> + d->irq, cpumask_pr_args(affinity));

How is it useful to print cpu_online_mask here?

Thanks,

tglx

2024-04-22 23:19:43

by Dongli Zhang

Subject: Re: [PATCH 1/1] genirq/cpuhotplug: retry with online CPUs on irq_do_set_affinity failure

Hi Thomas,

On 4/22/24 13:58, Thomas Gleixner wrote:
> On Thu, Apr 18 2024 at 18:33, Dongli Zhang wrote:
>
>> When a CPU is offline, its IRQs may migrate to other CPUs. For managed
>> IRQs, they are migrated, or shutdown (if all CPUs of the managed IRQ
>> affinity are offline). For regular IRQs, there will only be a
>> migration.
>
> Please write out interrupts. There is enough space for it and IRQ is
> just not a regular word.

I will use "interrupts".

>
>> The migrate_one_irq() first uses pending_mask or affinity_mask of the IRQ.
>>
>> 104 if (irq_fixup_move_pending(desc, true))
>> 105 affinity = irq_desc_get_pending_mask(desc);
>> 106 else
>> 107 affinity = irq_data_get_affinity_mask(d);
>>
>> The migrate_one_irq() may use all online CPUs, if all CPUs in
>> pending_mask/affinity_mask are already offline.
>>
>> 113 if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
>> 114 /*
>> 115 * If the interrupt is managed, then shut it down and leave
>> 116 * the affinity untouched.
>> 117 */
>> 118 if (irqd_affinity_is_managed(d)) {
>> 119 irqd_set_managed_shutdown(d);
>> 120 irq_shutdown_and_deactivate(desc);
>> 121 return false;
>> 122 }
>> 123 affinity = cpu_online_mask;
>> 124 brokeaff = true;
>> 125 }
>
> Please don't copy code into the change log. Describe the problem in
> text.

Would you mind checking whether the commit message below looks good to you?


genirq/cpuhotplug: retry with cpu_online_mask when irq_do_set_affinity returns
-ENOSPC

When a CPU goes offline, the interrupts pinned to that CPU are
re-configured.

Its managed interrupts undergo either migration to other CPUs or shutdown
if all CPUs listed in the affinity are offline. This patch doesn't affect
managed interrupts.

For regular interrupts, they are migrated to other selected online CPUs.
The target CPUs are chosen from either desc->pending_mask (assuming
CONFIG_GENERIC_PENDING_IRQ) or d->common->affinity (assuming CONFIG_SMP).
The cpu_online_mask is used as target CPUs only when CPUs in both
desc->pending_mask and d->common->affinity are offline.

However, there is a bad corner case, when desc->pending_mask or
d->common->affinity is selected as the target cpumask, but none of their
CPUs has any available vectors.

As a result, an -ENOSPC error happens:

"IRQ151: set affinity failed(-28)."

This is from the debugfs. The allocation fails although other online CPUs
(except CPU=2) have many free vectors.

name: VECTOR
size: 0
mapped: 529
flags: 0x00000103
Online bitmaps: 7
Global available: 884
Global reserved: 6
Total allocated: 539
System: 36: 0-19,21,50,128,236,243-244,246-255
| CPU | avl | man | mac | act | vectors
0 147 0 0 55 32-49,51-87
1 147 0 0 55 32-49,51-87
2 0 0 0 202 32-49,51-127,129-235
4 147 0 0 55 32-49,51-87
5 147 0 0 55 32-49,51-87
6 148 0 0 54 32-49,51-86
7 148 0 0 54 32-49,51-86

The steps to reproduce the issue are in [1]. The core idea is:

1. Create a KVM guest with many virtio-net PCI devices, each configured
with a very large number of queues/vectors.

2. Set the affinity of all virtio-net interrupts to "2,3".

3. Offline many CPUs, excluding "2,3".

4. Offline CPU=2, and irq_do_set_affinity() returns -ENOSPC.

For regular interrupts, if irq_do_set_affinity() returns -ENOSPC, retry it
with all online CPUs. The issue does not happen for managed interrupts
because the vectors are always reserved (in cm->managed_map) before the CPU
offline operation.

[1] https://lore.kernel.org/all/[email protected]/

Cc: Joe Jin <[email protected]>
Signed-off-by: Dongli Zhang <[email protected]>


>
>> However, there is a corner case. Although some CPUs in
>> pending_mask/affinity_mask are still online, they are lack of available
>> vectors. If the kernel continues calling irq_do_set_affinity() with those CPUs,
>> there will be -ENOSPC error.
>>
>> This is not reasonable as other online CPUs still have many available
>> vectors.
>
> Reasonable is not the question here. It's either correct or not.

This has been re-written in the new commit message.

>
>> name: VECTOR
>> size: 0
>> mapped: 529
>> flags: 0x00000103
>> Online bitmaps: 7
>> Global available: 884
>> Global reserved: 6
>> Total allocated: 539
>> System: 36: 0-19,21,50,128,236,243-244,246-255
>> | CPU | avl | man | mac | act | vectors
>> 0 147 0 0 55 32-49,51-87
>> 1 147 0 0 55 32-49,51-87
>> 2 0 0 0 202 32-49,51-127,129-235
>
> Just out of curiosity. How did this end up with CPU2 completely
> occupied?

The details are in the link:
https://lore.kernel.org/all/[email protected]/

Here is the core idea:

1. Create a KVM guest with many virtio-net PCI devices, each configured
with a very large number of queues/vectors.

2. Set the affinity of all virtio-net interrupts to "2,3".

3. Offline many CPUs, excluding "2,3".

4. Offline CPU=2, and irq_do_set_affinity() returns -ENOSPC.

>
>> 4 147 0 0 55 32-49,51-87
>> 5 147 0 0 55 32-49,51-87
>> 6 148 0 0 54 32-49,51-86
>> 7 148 0 0 54 32-49,51-86
>>
>> This issue should not happen for managed IRQs because the vectors are already
>> reserved before CPU hotplug.
>
> Should not? It either does or it does not.

It is "does not". I will fix it. Please see new commit message.

>
>> For regular IRQs, do a re-try with all online
>> CPUs if the prior irq_do_set_affinity() is failed with -ENOSPC.
>>
>> Cc: Joe Jin <[email protected]>
>> Signed-off-by: Dongli Zhang <[email protected]>
>> ---
>> kernel/irq/cpuhotplug.c | 13 +++++++++++++
>> 1 file changed, 13 insertions(+)
>>
>> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
>> index 1ed2b1739363..d1666a6b73f4 100644
>> --- a/kernel/irq/cpuhotplug.c
>> +++ b/kernel/irq/cpuhotplug.c
>> @@ -130,6 +130,19 @@ static bool migrate_one_irq(struct irq_desc *desc)
>> * CPU.
>> */
>> err = irq_do_set_affinity(d, affinity, false);
>> +
>> + if (err == -ENOSPC &&
>> + !irqd_affinity_is_managed(d) &&
>> + affinity != cpu_online_mask) {
>
> This really wants to be a single line conditional.

I will change it to a single line.

Would you mind suggesting which is preferred? !cpumask_equal(affinity,
cpu_online_mask) or (affinity != cpu_online_mask)?


>
>> + affinity = cpu_online_mask;
>> + brokeaff = true;
>> +
>> + pr_debug("IRQ%u: set affinity failed for %*pbl, re-try with all online CPUs\n",
>> + d->irq, cpumask_pr_args(affinity));
>
> How is it useful to print cpu_online_mask here?

My bad, and sorry about that. I should print the log before changing
'affinity'. I will fix that in the new version.
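
For the next version, the hunk would look roughly like the below (just an
untested sketch, with the conditional on a single line and the pr_debug
moved before 'affinity' is changed):

	err = irq_do_set_affinity(d, affinity, false);

	if (err == -ENOSPC && !irqd_affinity_is_managed(d) && affinity != cpu_online_mask) {
		/*
		 * The chosen mask has online CPUs, but none of them has a
		 * free vector. Log the failing mask first, then fall back
		 * to all online CPUs and retry.
		 */
		pr_debug("IRQ%u: set affinity failed for %*pbl, re-try with all online CPUs\n",
			 d->irq, cpumask_pr_args(affinity));

		affinity = cpu_online_mask;
		brokeaff = true;

		err = irq_do_set_affinity(d, affinity, false);
	}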


Thank you very much for the feedback!

Dongli Zhang

2024-04-23 01:02:25

by Thomas Gleixner

Subject: Re: [PATCH 1/1] genirq/cpuhotplug: retry with online CPUs on irq_do_set_affinity failure

On Mon, Apr 22 2024 at 16:09, Dongli Zhang wrote:
> On 4/22/24 13:58, Thomas Gleixner wrote:
>> On Thu, Apr 18 2024 at 18:33, Dongli Zhang wrote:
> Would you mind checking whether the commit message below looks good to you?
>
>
> genirq/cpuhotplug: retry with cpu_online_mask when irq_do_set_affinity returns
> -ENOSPC
>
> When a CPU goes offline, the interrupts pinned to that CPU are
> re-configured.
>
> Its managed interrupts undergo either migration to other CPUs or shutdown
> if all CPUs listed in the affinity are offline. This patch doesn't affect
> managed interrupts.
>
> For regular interrupts, they are migrated to other selected online CPUs.
> The target CPUs are chosen from either desc->pending_mask (assuming
> CONFIG_GENERIC_PENDING_IRQ) or d->common->affinity (assuming CONFIG_SMP).
> The cpu_online_mask is used as target CPUs only when CPUs in both
> desc->pending_mask and d->common->affinity are offline.
>
> However, there is a bad corner case, when desc->pending_mask or
> d->common->affinity is selected as the target cpumask, but none of their
> CPUs has any available vectors.

Up to here it's fine.

> As a result, an -ENOSPC error happens:
>
> "IRQ151: set affinity failed(-28)."
>
> This is from the debugfs. The allocation fails although other online CPUs
> (except CPU=2) have many free vectors.

The debugfs output is not really providing more information than the
last sentence. It just occupies space :)

> The steps to reproduce the issue are in [1]. The core idea is:
>
> 1. Create a KVM guest with many virtio-net PCI devices, each configured
> with a very large number of queues/vectors.
>
> 2. Set the affinity of all virtio-net interrupts to "2,3".

That makes absolutely no sense at all. :)

But yes, I can see the non-real-world problem with that.

> For regular interrupts, if irq_do_set_affinity() returns -ENOSPC, retry it
> with all online CPUs. The issue does not happen for managed interrupts
> because the vectors are always reserved (in cm->managed_map) before the CPU
> offline operation.
>
> [1]
> https://lore.kernel.org/all/[email protected]/

The reproduction instructions are just additional information and not
necessarily change log material.

So I'd just say after the above:
> However, there is a bad corner case, when desc->pending_mask or
> d->common->affinity is selected as the target cpumask, but none of their
> CPUs has any available vectors.

In this case the migration fails and the device interrupt becomes
stale. This is not any different from the case where the affinity
mask does not contain any online CPU, but there is no fallback
operation for this.

Instead of giving up, retry the migration attempt with the online CPU
mask if the interrupt is not managed, as managed interrupts cannot be
affected by this problem.

Hmm?

> I will change it to a single line.
>
> Would you mind suggesting which is preferred? !cpumask_equal(affinity,
> cpu_online_mask) or (affinity != cpu_online_mask)?

If at all you want !cpumask_subset(cpu_online_mask, affinity), but as
this is a corner case 'affinity != cpu_online_mask' should be good
enough.
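
Just to illustrate the difference: the pointer comparison only skips the
retry when 'affinity' literally points to cpu_online_mask, while

	!cpumask_subset(cpu_online_mask, affinity)

would also skip it when the affinity mask already covers every online CPU,
i.e. when the retry could not add any CPU anyway. For this corner case
either check works.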

Thanks,

tglx