John (and later on David) reported[1] a while ago that, when booting with
maxcpus=1, devices with managed affinity would fail to get the interrupts
that were associated with offlined CPUs.
Similarly, Xiongfeng reported[2] that the GICv3 ITS would sometimes use
non-housekeeping CPUs instead of the affinity that was passed down as
a parameter.
[1] can be fixed by not trying to activate these interrupts if no online
CPU can satisfy the affinity (a patch addressing this was already
posted[3]).
[2] is a consequence of affinities containing non-online CPUs being
passed down to the interrupt controller driver and the ITS driver
trying to paper over that by ignoring the affinity parameter and doing
its own (stupid) thing. It would be better to (a) get the core code to
remove the offline CPUs from the affinity mask at all times, and (b)
fix the drivers so that they can trust the core code not to trip them.
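A minimal userspace sketch of the intended split of responsibilities, (a) the core clamps and (b) the driver trusts, might look like the following. It models a cpumask as a 64-bit bitmap, and every name in it (cpumask_bits_t, core_limit_to_online(), driver_select_cpu()) is hypothetical, not a kernel API. Picking the lowest CPU is a simplification of the real ITS selection policy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a cpumask: bit N set means CPU N is in the set. */
typedef uint64_t cpumask_bits_t;

/* (a) Core (modelled): always strip offline CPUs before calling the driver. */
static cpumask_bits_t core_limit_to_online(cpumask_bits_t requested,
					   cpumask_bits_t online)
{
	return requested & online;
}

/*
 * (b) Driver (modelled): trust the core's contract instead of re-deriving
 * its own mask. The caller guarantees at least one online CPU in 'aff'.
 * "Lowest CPU wins" is a simplification, not the actual ITS policy.
 */
static int driver_select_cpu(cpumask_bits_t aff)
{
	assert(aff != 0);
	for (int cpu = 0; cpu < 64; cpu++)
		if (aff & ((cpumask_bits_t)1 << cpu))
			return cpu;
	return -1; /* unreachable given the assertion above */
}
```

For example, with CPUs 0-1 online (0x3) and a requested affinity of CPUs 1-3 (0xe), the core hands down 0x2 and the driver picks CPU 1 without second-guessing the mask.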
This small series, based on 5.18-rc1, addresses the above.
Thanks,
M.
From v2 [4]:
- Rebased on 5.18-rc1
[1] https://lore.kernel.org/r/[email protected]
[2] https://lore.kernel.org/r/[email protected]
[3] https://lore.kernel.org/r/[email protected]
[4] https://lore.kernel.org/r/[email protected]
Marc Zyngier (3):
genirq/msi: Shutdown managed interrupts with unsatifiable affinities
genirq: Always limit the affinity to online CPUs
irqchip/gic-v3: Always trust the managed affinity provided by the core
code
drivers/irqchip/irq-gic-v3-its.c | 2 +-
kernel/irq/manage.c | 25 +++++++++++++++++--------
kernel/irq/msi.c | 15 +++++++++++++++
3 files changed, 33 insertions(+), 9 deletions(-)
--
2.34.1
Now that the core code has been fixed to always give us an affinity
that only includes online CPUs, directly use this affinity when
computing a target CPU.
Signed-off-by: Marc Zyngier <[email protected]>
---
drivers/irqchip/irq-gic-v3-its.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index cd772973114a..2656efd5d2b6 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1624,7 +1624,7 @@ static int its_select_cpu(struct irq_data *d,
cpu = cpumask_pick_least_loaded(d, tmpmask);
} else {
- cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
+ cpumask_copy(tmpmask, aff_mask);
/* If we cannot cross sockets, limit the search to that node */
if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&
--
2.34.1
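The hunk above changes which mask its_select_cpu() starts from: the old line ignored the aff_mask argument and re-derived a mask from the interrupt's stored affinity, while the new line copies aff_mask as passed in. The following hypothetical C model (plain 64-bit bitmaps, made-up function names, covering only this branch of the selection logic) sketches why that matters for the housekeeping case from [2].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a cpumask: bit N set means CPU N is in the set. */
typedef uint64_t cpumask_bits_t;

/*
 * Old behaviour (modelled): the requested mask is ignored and the target
 * set is re-derived from the interrupt's stored affinity.
 */
static cpumask_bits_t old_target_mask(cpumask_bits_t stored_aff,
				      cpumask_bits_t requested,
				      cpumask_bits_t online)
{
	(void)requested;	/* effectively ignored by the old code */
	return stored_aff & online;
}

/*
 * New behaviour (modelled): copy the requested mask as-is, relying on the
 * core having already removed offline CPUs from it.
 */
static cpumask_bits_t new_target_mask(cpumask_bits_t requested,
				      cpumask_bits_t online)
{
	assert((requested & ~online) == 0);	/* core-provided invariant */
	(void)online;
	return requested;
}
```

For instance, with CPUs 0-3 online and present in the stored affinity (0xf), but a caller passing only housekeeping CPUs 2-3 (0xc), the old computation yields 0xf (including non-housekeeping CPUs 0-1) while the new one yields 0xc.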
When booting with maxcpus=<small number>, interrupt controllers
such as the GICv3 ITS may not be able to satisfy the affinity of
some managed interrupts, as some of the HW resources are simply
not available.
The same thing happens when loading a driver using managed interrupts
while CPUs are offline.
In order to deal with this, do not try to activate such an interrupt
if there is no online CPU capable of handling it. Instead, place
it in the shutdown state. Once a capable CPU shows up, it will be
activated.
Reported-by: John Garry <[email protected]>
Tested-by: John Garry <[email protected]>
Reported-by: David Decotigny <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
---
kernel/irq/msi.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 2bdfce5edafd..a9ee535293eb 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -818,6 +818,21 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
irqd_clr_can_reserve(irqd);
if (vflags & VIRQ_NOMASK_QUIRK)
irqd_set_msi_nomask_quirk(irqd);
+
+ /*
+ * If the interrupt is managed but no CPU is available to
+ * service it, shut it down until better times. Note that
+ * we only do this on the !RESERVE path as x86 (the only
+ * architecture using this flag) deals with this in a
+ * different way by using a catch-all vector.
+ */
+ if ((vflags & VIRQ_ACTIVATE) &&
+ irqd_affinity_is_managed(irqd) &&
+ !cpumask_intersects(irq_data_get_affinity_mask(irqd),
+ cpu_online_mask)) {
+ irqd_set_managed_shutdown(irqd);
+ return 0;
+ }
}
if (!(vflags & VIRQ_ACTIVATE))
--
2.34.1
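The decision added to msi_init_virq() can be sketched in isolation as below. This is a hypothetical userspace model, not kernel code: masks are 64-bit bitmaps, the state names and both functions are made up for illustration, the reservation-mode handling is left out, and cpu_online_model() merely stands in for the core's hotplug path which, per the commit message, activates the interrupt once a capable CPU shows up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a cpumask: bit N set means CPU N is in the set. */
typedef uint64_t cpumask_bits_t;

enum virq_model_state {
	MODEL_LEFT_INACTIVE,	/* caller did not ask for activation */
	MODEL_MANAGED_SHUTDOWN,	/* managed, but no online CPU in its affinity */
	MODEL_ACTIVATED,
};

static enum virq_model_state msi_init_virq_model(bool activate, bool managed,
						 cpumask_bits_t aff,
						 cpumask_bits_t online)
{
	/* New check: activation requested, managed, no online CPU can serve it. */
	if (activate && managed && (aff & online) == 0)
		return MODEL_MANAGED_SHUTDOWN;
	if (!activate)
		return MODEL_LEFT_INACTIVE;
	return MODEL_ACTIVATED;
}

/* Stand-in for CPU hotplug: restart a shut-down interrupt if 'cpu' can serve it. */
static enum virq_model_state cpu_online_model(enum virq_model_state st,
					      cpumask_bits_t aff, int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	if (st == MODEL_MANAGED_SHUTDOWN && (aff & ((cpumask_bits_t)1 << cpu)))
		return MODEL_ACTIVATED;
	return st;
}
```

With CPUs 0-1 online (0x3), a managed interrupt whose affinity is CPUs 2-3 (0xc) is placed in shutdown; bringing CPU 2 online activates it, while bringing CPU 1 online leaves it shut down.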
On 05/04/2022 19:50, Marc Zyngier wrote:
> John (and later on David) reported[1] a while ago that booting with
> maxcpus=1, managed affinity devices would fail to get the interrupts
> that were associated with offlined CPUs.
>
> Similarly, Xiongfeng reported[2] that the GICv3 ITS would sometime use
> non-housekeeping CPUs instead of the affinity that was passed down as
> a parameter.
>
> [1] can be fixed by not trying to activate these interrupts if no CPU
> that can satisfy the affinity is present (a patch addressing this was
> already posted[3])
>
> [2] is a consequence of affinities containing non-online CPUs being
> passed down to the interrupt controller driver and the ITS driver
> trying to paper over that by ignoring the affinity parameter and doing
> its own (stupid) thing. It would be better to (a) get the core code to
> remove the offline CPUs from the affinity mask at all times, and (b)
> fix the drivers so that they can trust the core code not to trip them.
>
> This small series, based on 5.18-rc1, addresses the above.
Hi Marc,
Please let me know if you require anything more from me on this one. I
was hoping that Xiongfeng would verify that his "housekeeping" issues
were fixed.
Cheers
Hi,
On 2022/4/8 1:29, John Garry wrote:
> On 05/04/2022 19:50, Marc Zyngier wrote:
>> John (and later on David) reported[1] a while ago that booting with
>> maxcpus=1, managed affinity devices would fail to get the interrupts
>> that were associated with offlined CPUs.
>>
>> Similarly, Xiongfeng reported[2] that the GICv3 ITS would sometime use
>> non-housekeeping CPUs instead of the affinity that was passed down as
>> a parameter.
>>
>> [1] can be fixed by not trying to activate these interrupts if no CPU
>> that can satisfy the affinity is present (a patch addressing this was
>> already posted[3])
>>
>> [2] is a consequence of affinities containing non-online CPUs being
>> passed down to the interrupt controller driver and the ITS driver
>> trying to paper over that by ignoring the affinity parameter and doing
>> its own (stupid) thing. It would be better to (a) get the core code to
>> remove the offline CPUs from the affinity mask at all times, and (b)
>> fix the drivers so that they can trust the core code not to trip them.
>>
>> This small series, based on 5.18-rc1, addresses the above.
>
> Hi Marc,
>
> Please let me know if you require anything more from me on this one. I was
> hoping that Xiongfeng would verify that his "housekeeping" issues were fixed.
I have tested the V2 version. It works well and fixes both issues: the
'maxcpus=1' issue and the 'housekeeping' issue. Let me know if you need me to
test this V3 version. I don't see much change, only context changes.
Thanks,
Xiongfeng
The following commit has been merged into the irq/core branch of tip:
Commit-ID: d802057c7c553ad426520a053da9f9fe08e2c35a
Gitweb: https://git.kernel.org/tip/d802057c7c553ad426520a053da9f9fe08e2c35a
Author: Marc Zyngier <[email protected]>
AuthorDate: Tue, 05 Apr 2022 19:50:38 +01:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Sun, 10 Apr 2022 21:06:30 +02:00
genirq/msi: Shutdown managed interrupts with unsatifiable affinities
When booting with maxcpus=<small number>, interrupt controllers
such as the GICv3 ITS may not be able to satisfy the affinity of
some managed interrupts, as some of the HW resources are simply
not available.
The same thing happens when loading a driver using managed interrupts
while CPUs are offline.
In order to deal with this, do not try to activate such an interrupt
if there is no online CPU capable of handling it. Instead, place
it in the shutdown state. Once a capable CPU shows up, it will be
activated.
Reported-by: John Garry <[email protected]>
Reported-by: David Decotigny <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: John Garry <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/irq/msi.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 2bdfce5..a9ee535 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -818,6 +818,21 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
irqd_clr_can_reserve(irqd);
if (vflags & VIRQ_NOMASK_QUIRK)
irqd_set_msi_nomask_quirk(irqd);
+
+ /*
+ * If the interrupt is managed but no CPU is available to
+ * service it, shut it down until better times. Note that
+ * we only do this on the !RESERVE path as x86 (the only
+ * architecture using this flag) deals with this in a
+ * different way by using a catch-all vector.
+ */
+ if ((vflags & VIRQ_ACTIVATE) &&
+ irqd_affinity_is_managed(irqd) &&
+ !cpumask_intersects(irq_data_get_affinity_mask(irqd),
+ cpu_online_mask)) {
+ irqd_set_managed_shutdown(irqd);
+ return 0;
+ }
}
if (!(vflags & VIRQ_ACTIVATE))
The following commit has been merged into the irq/core branch of tip:
Commit-ID: 3f893a5962d31c0164efdbf6174ed0784f1d7603
Gitweb: https://git.kernel.org/tip/3f893a5962d31c0164efdbf6174ed0784f1d7603
Author: Marc Zyngier <[email protected]>
AuthorDate: Tue, 05 Apr 2022 19:50:40 +01:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Sun, 10 Apr 2022 21:06:30 +02:00
irqchip/gic-v3: Always trust the managed affinity provided by the core code
Now that the core code has been fixed to always give us an affinity
that only includes online CPUs, directly use this affinity when
computing a target CPU.
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/irqchip/irq-gic-v3-its.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index cd77297..2656efd 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1624,7 +1624,7 @@ static int its_select_cpu(struct irq_data *d,
cpu = cpumask_pick_least_loaded(d, tmpmask);
} else {
- cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
+ cpumask_copy(tmpmask, aff_mask);
/* If we cannot cross sockets, limit the search to that node */
if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&