2023-10-20 07:26:43

by Chen Yu

[permalink] [raw]
Subject: [RFC PATCH] genirq: Exclude managed irq during irq migration

The managed IRQ will be shutdown and not be migrated to
other CPUs during CPU offline. Later when the CPU is online,
the managed IRQ will be re-enabled on this CPU. The managed
IRQ can be used to reduce the IRQ migration during CPU hotplug.

Before putting the CPU offline, the number of the already allocated
IRQs on this offlining CPU will be compared to the total number
of available IRQ vectors on the remaining online CPUs. If there is
not enough slot for these IRQs to be migrated to, the CPU offline
will be terminated. However, currently the code treats the managed
IRQ as migratable, which is not true, and brings false negative
during CPU hotplug and hibernation stress test.

For example:

cat /sys/kernel/debug/irq/domains/VECTOR

name: VECTOR
size: 0
mapped: 338
flags: 0x00000103
Online bitmaps: 168
Global available: 33009
Global reserved: 83
Total allocated: 255 <------
System: 43: 0-21,50,128,192,233-236,240-242,244,246-255
| CPU | avl | man | mac | act | vectors
0 180 1 1 18 32-49
1 196 1 1 2 32-33
...
166 197 1 1 1 32
167 197 1 1 1 32

//put CPU167 offline
pepc.standalone cpu-hotplug offline --cpus 167

cat /sys/kernel/debug/irq/domains/VECTOR

name: VECTOR
size: 0
mapped: 338
flags: 0x00000103
Online bitmaps: 167
Global available: 32812
Global reserved: 83
Total allocated: 254 <------
System: 43: 0-21,50,128,192,233-236,240-242,244,246-255
| CPU | avl | man | mac | act | vectors
0 180 1 1 18 32-49
1 196 1 1 2 32-33
...
166 197 1 1 1 32

After CPU167 is offline, the number of allocated vectors
decreases from 255 to 254. Since the only IRQ on CPU167 is
managed(mac field), it is not migrated. But the current
code thinks that there is 1 IRQ to be migrated.

Fix the check by substracting the number of managed IRQ from
allocated one.

Fixes: 2f75d9e1c905 ("genirq: Implement bitmap matrix allocator")
Reported-by: Wendy Wang <[email protected]>
Signed-off-by: Chen Yu <[email protected]>
---
kernel/irq/matrix.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 1698e77645ac..d245ad76661e 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -475,7 +475,7 @@ unsigned int irq_matrix_allocated(struct irq_matrix *m)
{
struct cpumap *cm = this_cpu_ptr(m->maps);

- return cm->allocated;
+ return cm->allocated - cm->managed_allocated;
}

#ifdef CONFIG_GENERIC_IRQ_DEBUGFS
--
2.25.1


2023-10-25 14:35:41

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH] genirq: Exclude managed irq during irq migration

Chen!

On Fri, Oct 20 2023 at 15:25, Chen Yu wrote:
> The managed IRQ will be shutdown and not be migrated to

Please write out interrupts in change logs, this is not twitter.

> other CPUs during CPU offline. Later when the CPU is online,
> the managed IRQ will be re-enabled on this CPU. The managed
> IRQ can be used to reduce the IRQ migration during CPU hotplug.
>
> Before putting the CPU offline, the number of the already allocated
> IRQs on this offlining CPU will be compared to the total number

The usage of IRQs and vectors is slightly confusing all over the
place.

> of available IRQ vectors on the remaining online CPUs. If there is
> not enough slot for these IRQs to be migrated to, the CPU offline
> will be terminated. However, currently the code treats the managed
> IRQ as migratable, which is not true, and brings false negative
> during CPU hotplug and hibernation stress test.

Your assumption that managed interrupts cannot be migrated is only
correct when the managed interrupts affinity mask has exactly one online
target CPU. Otherwise the interrupt is migrated to one of the other
online CPUs in the affinity mask.

Though that does not affect the migrateability calculation because in
case that a managed interrupt has an affinity mask with more than one
target CPU set, the vectors on the currently not targeted CPUs are
already reserved and accounted for in matrix->global_available. IOW,
migrateability for such managed interrupts is already guaranteed.

I'll amend the changelog to make this clear.

Thanks,

tglx

2023-10-25 14:36:52

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH] genirq: Exclude managed irq during irq migration

On Fri, Oct 20 2023 at 15:25, Chen Yu wrote:
> diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
> index 1698e77645ac..d245ad76661e 100644
> --- a/kernel/irq/matrix.c
> +++ b/kernel/irq/matrix.c
> @@ -475,7 +475,7 @@ unsigned int irq_matrix_allocated(struct irq_matrix *m)
> {
> struct cpumap *cm = this_cpu_ptr(m->maps);
>
> - return cm->allocated;
> + return cm->allocated - cm->managed_allocated;

The kernel documentation above this function is now misleading. Sigh...

Subject: [tip: irq/core] genirq/matrix: Exclude managed interrupts in irq_matrix_allocated()

The following commit has been merged into the irq/core branch of tip:

Commit-ID: a0b0bad10587ae2948a7c36ca4ffc206007fbcf3
Gitweb: https://git.kernel.org/tip/a0b0bad10587ae2948a7c36ca4ffc206007fbcf3
Author: Chen Yu <[email protected]>
AuthorDate: Fri, 20 Oct 2023 15:25:22 +08:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Wed, 25 Oct 2023 17:25:57 +02:00

genirq/matrix: Exclude managed interrupts in irq_matrix_allocated()

When a CPU is about to be offlined, x86 validates that all active
interrupts which are targeted to this CPU can be migrated to the remaining
online CPUs. If not, the offline operation is aborted.

The validation uses irq_matrix_allocated() to retrieve the number of
vectors which are allocated on the outgoing CPU. The returned number of
allocated vectors includes also vectors which are associated to managed
interrupts.

That's overaccounting because managed interrupts are:

- not migrated when the affinity mask of the interrupt targets only
the outgoing CPU

- migrated to another CPU, but in that case the vector is already
pre-allocated on the potential target CPUs and must not be taken into
account.

As a consequence the check whether the remaining online CPUs have enough
capacity for migrating the allocated vectors from the outgoing CPU might
fail incorrectly.

Let irq_matrix_allocated() return only the number of allocated non-managed
interrupts to make this validation check correct.

[ tglx: Amend changelog and fixup kernel-doc comment ]

Fixes: 2f75d9e1c905 ("genirq: Implement bitmap matrix allocator")
Reported-by: Wendy Wang <[email protected]>
Signed-off-by: Chen Yu <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/irq/matrix.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 1698e77..75d0ae4 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -466,16 +466,16 @@ unsigned int irq_matrix_reserved(struct irq_matrix *m)
}

/**
- * irq_matrix_allocated - Get the number of allocated irqs on the local cpu
+ * irq_matrix_allocated - Get the number of allocated non-managed irqs on the local CPU
* @m: Pointer to the matrix to search
*
- * This returns number of allocated irqs
+ * This returns number of allocated non-managed interrupts.
*/
unsigned int irq_matrix_allocated(struct irq_matrix *m)
{
struct cpumap *cm = this_cpu_ptr(m->maps);

- return cm->allocated;
+ return cm->allocated - cm->managed_allocated;
}

#ifdef CONFIG_GENERIC_IRQ_DEBUGFS

2023-10-26 07:03:30

by Chen Yu

[permalink] [raw]
Subject: Re: [RFC PATCH] genirq: Exclude managed irq during irq migration

Hi Thomas,

On 2023-10-25 at 16:34:59 +0200, Thomas Gleixner wrote:
> Chen!
>
> On Fri, Oct 20 2023 at 15:25, Chen Yu wrote:
> > The managed IRQ will be shutdown and not be migrated to
>
> Please write out interrupts in change logs, this is not twitter.
>
> > other CPUs during CPU offline. Later when the CPU is online,
> > the managed IRQ will be re-enabled on this CPU. The managed
> > IRQ can be used to reduce the IRQ migration during CPU hotplug.
> >
> > Before putting the CPU offline, the number of the already allocated
> > IRQs on this offlining CPU will be compared to the total number
>
> The usage of IRQs and vectors is slightly confusing all over the
> place.
>
> > of available IRQ vectors on the remaining online CPUs. If there is
> > not enough slot for these IRQs to be migrated to, the CPU offline
> > will be terminated. However, currently the code treats the managed
> > IRQ as migratable, which is not true, and brings false negative
> > during CPU hotplug and hibernation stress test.
>
> Your assumption that managed interrupts cannot be migrated is only
> correct when the managed interrupts affinity mask has exactly one online
> target CPU. Otherwise the interrupt is migrated to one of the other
> online CPUs in the affinity mask.
>
> Though that does not affect the migrateability calculation because in
> case that a managed interrupt has an affinity mask with more than one
> target CPU set, the vectors on the currently not targeted CPUs are
> already reserved and accounted for in matrix->global_available. IOW,
> migrateability for such managed interrupts is already guaranteed.
>

Got it, the percpu cm->managed has been pre-reserved already and is
excluded from m->global_available. So we still should substract the
number of allocated managed interrupts to avoid duplicated calculation.

> I'll amend the changelog to make this clear.
>

Thanks for helping on this.

thanks,
Chenyu