LinuxLists.cc - Re: [PATCH 1/2] perf/arm-cmn: Cope with spurious IRQs better

2022-11-30 16:22:57

Subject: Re: [PATCH 1/2] perf/arm-cmn: Cope with spurious IRQs better

Robin,

From my perspective, this is a worse solution as now we're sweeping an
issue under the rug and consuming CPU cycles handling IRQs we should not
be getting in the first place. While an overflow IRQ from the cmn should
not be high frequency, there is a non-zero chance in the future it could
be and this could lead to a very hard to debug performance issue instead
of the current problem, which is discovering we need to clean up better
from a noisy kernel message.

The driver as best I can grok currently is optimized to limit the amount
of register writes for the common use-case, which is setting and unsetting
events, so all the wiring for the PMU to feed events to the DTC is done up
front on load: DTC_CTL's DT_EN bit is set immediately during probe, as is
OVFL_INTR_EN. All the DN states and DTM PMU_CONFIG_PMU_EN is deferred
for when an event is actually set, and here we go through all of them
anyways for each event unless its bynodeid, so the expense of setting
events grows linearly with the mesh size anyways.

Doing a one time scan on remove/shutdown to reset everything to a
clean state seems like a wash performance wise and wanting to keep just a
single register write doesn't make much sense to me.

Should I instead pull this step out as a
generic initialization step that can live as a single chunk of code to
make it better for maintainability? Would that be a better solution?

- Geoff

2022-11-30 19:22:37

by Robin Murphy

[permalink] [raw]

Subject: Re: [PATCH 1/2] perf/arm-cmn: Cope with spurious IRQs better

On 2022-11-30 16:02, Geoff Blake wrote:
> Robin,
>
> From my perspective, this is a worse solution as now we're sweeping an
> issue under the rug and consuming CPU cycles handling IRQs we should not
> be getting in the first place. While an overflow IRQ from the cmn should
> not be high frequency, there is a non-zero chance in the future it could
> be and this could lead to a very hard to debug performance issue instead
> of the current problem, which is discovering we need to clean up better
> from a noisy kernel message.

Kexec is not the only possible source of spurious IRQs. If they cause a
problem for this driver, that cannot be robustly addressed by trying to
rely on whatever software might happen to run before this driver.

> The driver as best I can grok currently is optimized to limit the amount
> of register writes for the common use-case, which is setting and unsetting
> events, so all the wiring for the PMU to feed events to the DTC is done up
> front on load: DTC_CTL's DT_EN bit is set immediately during probe, as is
> OVFL_INTR_EN. All the DN states and DTM PMU_CONFIG_PMU_EN is deferred
> for when an event is actually set, and here we go through all of them
> anyways for each event unless its bynodeid, so the expense of setting
> events grows linearly with the mesh size anyways.

If arm_cmn_init_dtc() writing 0 to PMCR didn't stop the PMU then we've
got bigger problems, because that's how we expect to start and stop it
in normal operation. I'm not ruling out that some subtle bug in that
regard might exist, since I've still not yet had a chance to reproduce
and observe this behaviour on my board, but I've also not seen
sufficient evidence to suggest that that is the case either. (Now that
I'm looking closely, I think there *is* actually a small oversight for
the DTMs, but that would lead to different symptoms than you reported)

At least the writes to PMOVSR_CLR *did* clearly work, because you're
seeing the "nobody cared" message from the IRQ core rather than the
WARN_ON(!dtc->counters[i]) which would happen if a fresh overflow was
actually asserted. Currently I would expect to see up to 4 of those
messages since there can be up to 4 IRQs, but once those are all
requested, enabled, and "handled", all the spurious initially-latched
state should be cleared and any *new* overflows will be indicated in
PMOVSR. I don't see how a single IRQ could ever be unhandled more than
once anyway, if the first time disables it.

Thanks,
Robin.

2022-12-01 00:43:06

by Geoff Blake

[permalink] [raw]

Subject: RE: [PATCH 1/2] perf/arm-cmn: Cope with spurious IRQs better

> > From my perspective, this is a worse solution as now we're sweeping an
> > issue under the rug and consuming CPU cycles handling IRQs we should not
> > be getting in the first place. While an overflow IRQ from the cmn should
> > not be high frequency, there is a non-zero chance in the future it could
> > be and this could lead to a very hard to debug performance issue instead
> > of the current problem, which is discovering we need to clean up better
> > from a noisy kernel message.
>
> Kexec is not the only possible source of spurious IRQs. If they cause a
> problem for this driver, that cannot be robustly addressed by trying to
> rely on whatever software might happen to run before this driver.

Sure, I can agree with the assertion a spurious IRQ could come from
anywhere, in that case though, shouldn't the behavior still be to log
spurious IRQs as a warning instead of silently sinking them?

> > The driver as best I can grok currently is optimized to limit the amount
> > of register writes for the common use-case, which is setting and unsetting
> > events, so all the wiring for the PMU to feed events to the DTC is done up
> > front on load: DTC_CTL's DT_EN bit is set immediately during probe, as is
> > OVFL_INTR_EN. All the DN states and DTM PMU_CONFIG_PMU_EN is deferred
> > for when an event is actually set, and here we go through all of them
> > anyways for each event unless its bynodeid, so the expense of setting
> > events grows linearly with the mesh size anyways.
>
> If arm_cmn_init_dtc() writing 0 to PMCR didn't stop the PMU then we've
> got bigger problems, because that's how we expect to start and stop it
> in normal operation. I'm not ruling out that some subtle bug in that
> regard might exist, since I've still not yet had a chance to reproduce
> and observe this behaviour on my board, but I've also not seen
> sufficient evidence to suggest that that is the case either. (Now that
> I'm looking closely, I think there *is* actually a small oversight for
> the DTMs, but that would lead to different symptoms than you reported)

> At least the writes to PMOVSR_CLR *did* clearly work, because you're
> seeing the "nobody cared" message from the IRQ core rather than the
> WARN_ON(!dtc->counters[i]) which would happen if a fresh overflow was
> actually asserted. Currently I would expect to see up to 4 of those
> messages since there can be up to 4 IRQs, but once those are all
> requested, enabled, and "handled", all the spurious initially-latched
> state should be cleared and any *new* overflows will be indicated in
> PMOVSR. I don't see how a single IRQ could ever be unhandled more than
> once anyway, if the first time disables it.

I do see 4 of these "nobody cared" messages in all the times I've
reproduced it, but saw no need to copy paste all of them in with the
original post. Looking back over the code I see why more clearly your
assertion we only need to clear the DT_EN bit as the PMU is off at
the DTC with the PMCR set to 0 on init, but it is really hard to
see why that is with all the various places bits of configuration is done,
but it is still not easy to verify if unsetting that bit is sufficient to
not get into some odd corner cases.

Is there any argument against me taking another pass and try separating
out discovery, from a shared reset/initialization code path?

-Geoff

2022-12-01 18:42:22

by Robin Murphy

[permalink] [raw]

Subject: Re: [PATCH 1/2] perf/arm-cmn: Cope with spurious IRQs better

On 2022-11-30 23:13, Geoff Blake wrote:
>
>>> From my perspective, this is a worse solution as now we're sweeping an
>>> issue under the rug and consuming CPU cycles handling IRQs we should not
>>> be getting in the first place. While an overflow IRQ from the cmn should
>>> not be high frequency, there is a non-zero chance in the future it could
>>> be and this could lead to a very hard to debug performance issue instead
>>> of the current problem, which is discovering we need to clean up better
>>> from a noisy kernel message.
>>
>> Kexec is not the only possible source of spurious IRQs. If they cause a
>> problem for this driver, that cannot be robustly addressed by trying to
>> rely on whatever software might happen to run before this driver.
>
> Sure, I can agree with the assertion a spurious IRQ could come from
> anywhere, in that case though, shouldn't the behavior still be to log
> spurious IRQs as a warning instead of silently sinking them?

We still have to handle the interrupt anyway to avoid it getting
disabled behind our back, and beyond that it's not really something
that's actionable by the user. What would we say?

dev_warn(dev, "Something harmless, and in some cases expected,
happened! If you've just rebooted after a kernel panic, maybe try having
the kernel not panic?");

Perhaps that should be a core IRQ helper so that many other drivers can
also call it too?

Furthermore if you're worried about performance implications from a
theoretical interrupt storm, I can tell you from experience that logging
to a serial console from a high-frequency interrupt handler is one of
the best ways to cripple a system to the point where reaching for the
power switch is the only option.

>>> The driver as best I can grok currently is optimized to limit the amount
>>> of register writes for the common use-case, which is setting and unsetting
>>> events, so all the wiring for the PMU to feed events to the DTC is done up
>>> front on load: DTC_CTL's DT_EN bit is set immediately during probe, as is
>>> OVFL_INTR_EN. All the DN states and DTM PMU_CONFIG_PMU_EN is deferred
>>> for when an event is actually set, and here we go through all of them
>>> anyways for each event unless its bynodeid, so the expense of setting
>>> events grows linearly with the mesh size anyways.
>>
>> If arm_cmn_init_dtc() writing 0 to PMCR didn't stop the PMU then we've
>> got bigger problems, because that's how we expect to start and stop it
>> in normal operation. I'm not ruling out that some subtle bug in that
>> regard might exist, since I've still not yet had a chance to reproduce
>> and observe this behaviour on my board, but I've also not seen
>> sufficient evidence to suggest that that is the case either. (Now that
>> I'm looking closely, I think there *is* actually a small oversight for
>> the DTMs, but that would lead to different symptoms than you reported)
>
>> At least the writes to PMOVSR_CLR *did* clearly work, because you're
>> seeing the "nobody cared" message from the IRQ core rather than the
>> WARN_ON(!dtc->counters[i]) which would happen if a fresh overflow was
>> actually asserted. Currently I would expect to see up to 4 of those
>> messages since there can be up to 4 IRQs, but once those are all
>> requested, enabled, and "handled", all the spurious initially-latched
>> state should be cleared and any *new* overflows will be indicated in
>> PMOVSR. I don't see how a single IRQ could ever be unhandled more than
>> once anyway, if the first time disables it.
>
> I do see 4 of these "nobody cared" messages in all the times I've
> reproduced it, but saw no need to copy paste all of them in with the
> original post.

For reference, the key point to look for is that the IRQ numbers are
different, so it's not actually the *same* message multiple times, it's
multiple messages representing distinct causes.

> Looking back over the code I see why more clearly your
> assertion we only need to clear the DT_EN bit as the PMU is off at
> the DTC with the PMCR set to 0 on init, but it is really hard to
> see why that is with all the various places bits of configuration is done,
> but it is still not easy to verify if unsetting that bit is sufficient to
> not get into some odd corner cases.

The DTC_CTL documentation seems fairly unambiguous:

[0] dt_en Enables debug, trace, and PMU features

The design intent is that the PMU counters do not count when the entire
PMU feature is disabled. I'm pretty sure I did confirm that empirically
during development too (I recall the sheer number of different "enable"
bits baffled me at the beginning, and there was actually one that did
nothing, which I think did eventually get removed from the documentation).

Of course clearing PMCR_PMU_EN is sufficient to simply stop counting,
which we also depend on for correct operation, but I believe clearing
DT_EN allows it to put all of the DT logic into a quiescent state.

> Is there any argument against me taking another pass and try separating
> out discovery, from a shared reset/initialization code path?

Frankly, yes. Discovery and initialisation are already as distinct as I
could realistically make them:

err = arm_cmn_discover(cmn, rootnode);
if (err)
return err;

err = arm_cmn_init_dtcs(cmn);
if (err)
return err;

err = arm_cmn_init_irqs(cmn);
if (err)
return err;

It's true that that DTM initialisation is still entangled in the middle
of the discovery walk, but the alternatives are to separately initialise
the DTM data and registers from two different places (yuck), or
duplicate a chunk of the discovery process in a subsequent
initialisation phase to establish the XP/DTM relationship (also yuck).

With the possible exception of arm_cmn_init_irqs() which may contain a
teeny bit of self-indulgence, I've tried hard to make most of the code
as concise and clear as it can be within the bounds of the perf API and
the way the CMN PMU fundamentally works, also balanced with trying to
keep memory footprint and runtime overhead from getting out of hand
(still a bit of a losing battle when updating a single event count might
require 148 register reads...). I accept that it's a bit of a personal
preference that less code is easier to follow than more code, but I
don't think it's a particularly rare one, and the fact is that the way
the CMN PMU works is rather complicated compared to most other PMUs, so
that brings an unavoidable level of complexity to the driver, and adding
more code to do additional unnecessary work cannot conceivably make it
any less complex.

Thanks,
Robin.

2022-12-05 15:53:57

by Geoff Blake

[permalink] [raw]

Subject: RE: [PATCH 1/2] perf/arm-cmn: Cope with spurious IRQs better

> > > > From my perspective, this is a worse solution as now we're sweeping an
> > > > issue under the rug and consuming CPU cycles handling IRQs we should not
> > > > be getting in the first place. While an overflow IRQ from the cmn
> > > > should
> > > > not be high frequency, there is a non-zero chance in the future it could
> > > > be and this could lead to a very hard to debug performance issue instead
> > > > of the current problem, which is discovering we need to clean up better
> > > > from a noisy kernel message.
> > >
> > > Kexec is not the only possible source of spurious IRQs. If they cause a
> > > problem for this driver, that cannot be robustly addressed by trying to
> > > rely on whatever software might happen to run before this driver.
> >
> > Sure, I can agree with the assertion a spurious IRQ could come from
> > anywhere, in that case though, shouldn't the behavior still be to log
> > spurious IRQs as a warning instead of silently sinking them?
>
> We still have to handle the interrupt anyway to avoid it getting
> disabled behind our back, and beyond that it's not really something
> that's actionable by the user. What would we say?
>
> dev_warn(dev, "Something harmless, and in some cases expected,
> happened! If you've just rebooted after a kernel panic, maybe try having
> the kernel not panic?");
>
> Perhaps that should be a core IRQ helper so that many other drivers can
> also call it too?
>
> Furthermore if you're worried about performance implications from a
> theoretical interrupt storm, I can tell you from experience that logging
> to a serial console from a high-frequency interrupt handler is one of
> the best ways to cripple a system to the point where reaching for the
> power switch is the only option.

Logging unexpected events is necessary to give clues of what is going
wrong before they implode on fully remote machines. If you prefer to
handle the IRQ here rather than in the bad_irq section, then can we at
least have a WARN_ON() in the case where a spurious IRQ happens but no
overflow bit is set.

> The DTC_CTL documentation seems fairly unambiguous:
>
> [0] dt_en Enables debug, trace, and PMU features
>
> The design intent is that the PMU counters do not count when the entire
> PMU feature is disabled. I'm pretty sure I did confirm that empirically
> during development too (I recall the sheer number of different "enable"
> bits baffled me at the beginning, and there was actually one that did
> nothing, which I think did eventually get removed from the documentation).
>
> Of course clearing PMCR_PMU_EN is sufficient to simply stop counting,
> which we also depend on for correct operation, but I believe clearing
> DT_EN allows it to put all of the DT logic into a quiescent state.

I took the other patch that writes 0 to DTC_CTL.dt_en only and put it in a
loop of kexec'ing when the PMU is active for a few hours, I did not see
anymore spurious IRQs (whereas with the stock driver I could reproduce in under 10 tries).
You are correct Robin, that is all that is needed, and my code was overly
cautious.

- Geoff

2022-12-15 18:25:14

by Geoff Blake

[permalink] [raw]

Subject: [PATCH] perf/arm-cmn: Add shutdown routine

Attempt #2 with all the feedback from Robin to do the minimal amount of
shutdown and handle spurious IRQs within the CMN driver but still do
limited logging in the event a spurious IRQ still occurs in the future.
Tested over 100's of kexec's and have no reproduced the spurious IRQs.

The CMN driver does not gracefully handle all
restart cases, such as kexec. On a kexec if the
arm-cmn driver is in use it can be left in a state
with still active events that can cause spurious and/or
unhandled interrupts that appear as non-fatal kernel errors
like below, that can be confusing and misleading:

[ 3.895093] irq 28: nobody cared (try booting with the "irqpoll" option)
[ 3.895170] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-1011-aws #12
[ 3.895172] Hardware name: Amazon EC2 c6g.metal/Not Specified, BIOS 1.0 10/16/2017
[ 3.895174] Call trace:
[ 3.895175] dump_backtrace+0xe8/0x150
[ 3.895181] show_stack+0x28/0x70
[ 3.895183] dump_stack_lvl+0x68/0x9c
[ 3.895188] dump_stack+0x1c/0x48
[ 3.895190] __report_bad_irq+0x58/0x138
[ 3.895193] note_interrupt+0x23c/0x360
[ 3.895196] handle_irq_event+0x108/0x1a0
[ 3.895198] handle_fasteoi_irq+0xd0/0x24c
[ 3.895201] generic_handle_domain_irq+0x3c/0x70
[ 3.895203] __gic_handle_irq_from_irqson.isra.0+0xcc/0x2c0
[ 3.895207] gic_handle_irq+0x34/0xb0
[ 3.895209] call_on_irq_stack+0x40/0x50
[ 3.895211] do_interrupt_handler+0xb0/0xb4
[ 3.895214] el1_interrupt+0x4c/0xe0
[ 3.895217] el1h_64_irq_handler+0x1c/0x40
[ 3.895220] el1h_64_irq+0x78/0x7c
[ 3.895222] __do_softirq+0xd0/0x450
[ 3.895223] __irq_exit_rcu+0xcc/0x120
[ 3.895227] irq_exit_rcu+0x20/0x40
[ 3.895229] el1_interrupt+0x50/0xe0
[ 3.895231] el1h_64_irq_handler+0x1c/0x40
[ 3.895233] el1h_64_irq+0x78/0x7c
[ 3.895235] arch_cpu_idle+0x1c/0x6c
[ 3.895238] default_idle_call+0x4c/0x19c
[ 3.895240] cpuidle_idle_call+0x18c/0x1f0
[ 3.895243] do_idle+0xb0/0x11c
[ 3.895245] cpu_startup_entry+0x34/0x40
[ 3.895248] rest_init+0xec/0x104
[ 3.895250] arch_post_acpi_subsys_init+0x0/0x30
[ 3.895254] start_kernel+0x4d0/0x534
[ 3.895256] __primary_switched+0xc4/0xcc
[ 3.895259] handlers:
[ 3.895292] [<000000008f5364c7>] arm_cmn_handle_irq [arm_cmn]
[ 3.895369] Disabling IRQ #28

This type of kernel error can be reproduced by running perf with
an arm_cmn event active and then forcing a kexec. On return from
the kexec, this message can appear semi-regularly.

Signed-off-by: Geoff Blake <[email protected]>
---
drivers/perf/arm-cmn.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index b80a9b74662b..5e661a9aa0fe 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -112,6 +112,7 @@
#define CMN_DTM_UNIT_INFO 0x0910

#define CMN_DTM_NUM_COUNTERS 4
+#define CMN_DTM_NUM_WPS 4
/* Want more local counters? Why not replicate the whole DTM! Ugh... */
#define CMN_DTM_OFFSET(n) ((n) * 0x200)

@@ -1797,6 +1798,7 @@ static int arm_cmn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_no

static irqreturn_t arm_cmn_handle_irq(int irq, void *dev_id)
{
+ static int spurious_count = 100;
struct arm_cmn_dtc *dtc = dev_id;
irqreturn_t ret = IRQ_NONE;

@@ -1825,8 +1827,13 @@ static irqreturn_t arm_cmn_handle_irq(int irq, void *dev_id)

writel_relaxed(status, dtc->base + CMN_DT_PMOVSR_CLR);

- if (!dtc->irq_friend)
- return ret;
+ if (!dtc->irq_friend) {
+ if (ret != IRQ_HANDLED && spurious_count > 0) {
+ spurious_count--;
+ WARN_ON(ret != IRQ_HANDLED);
+ }
+ return IRQ_HANDLED;
+ }
dtc += dtc->irq_friend;
}
}
@@ -1865,7 +1872,7 @@ static void arm_cmn_init_dtm(struct arm_cmn_dtm *dtm, struct arm_cmn_node *xp, i

dtm->base = xp->pmu_base + CMN_DTM_OFFSET(idx);
dtm->pmu_config_low = CMN_DTM_PMU_CONFIG_PMU_EN;
- for (i = 0; i < 4; i++) {
+ for (i = 0; i < CMN_DTM_NUM_WPS; i++) {
dtm->wp_event[i] = -1;
writeq_relaxed(0, dtm->base + CMN_DTM_WPn_MASK(i));
writeq_relaxed(~0ULL, dtm->base + CMN_DTM_WPn_VAL(i));
@@ -2312,11 +2319,18 @@ static int arm_cmn_probe(struct platform_device *pdev)
return err;
}

-static int arm_cmn_remove(struct platform_device *pdev)
+static void arm_cmn_shutdown(struct platform_device *pdev)
{
struct arm_cmn *cmn = platform_get_drvdata(pdev);

writel_relaxed(0, cmn->dtc[0].base + CMN_DT_DTC_CTL);
+}
+
+static int arm_cmn_remove(struct platform_device *pdev)
+{
+ struct arm_cmn *cmn = platform_get_drvdata(pdev);
+
+ arm_cmn_shutdown(pdev);

perf_pmu_unregister(&cmn->pmu);
cpuhp_state_remove_instance_nocalls(arm_cmn_hp_state, &cmn->cpuhp_node);
@@ -2353,6 +2367,7 @@ static struct platform_driver arm_cmn_driver = {
},
.probe = arm_cmn_probe,
.remove = arm_cmn_remove,
+ .shutdown = arm_cmn_shutdown,
};

static int __init arm_cmn_init(void)
--
2.24.3 (Apple Git-128)

2023-01-04 16:01:43

by Geoff Blake

[permalink] [raw]

Subject: Re: [PATCH] perf/arm-cmn: Add shutdown routine

Robin, Will,

Happy new year! Hope I can get some attention back on this patch.
Only difference from Robin's is it will do limited logging if spurious
interrupts still happen to occur on current or future CMN implementations.

Thanks,
Geoff

2023-01-19 16:09:11

by Geoff Blake

[permalink] [raw]

Subject: Re: [PATCH] perf/arm-cmn: Add shutdown routine

Robin, Will,

Still looking for a follow-up on this patch, would appreciate another
review cycle.

-Geoff

2023-01-19 16:12:41

by Robin Murphy

[permalink] [raw]

Subject: Re: [PATCH] perf/arm-cmn: Add shutdown routine

On 04/01/2023 3:55 pm, Geoff Blake wrote:
> Robin, Will,
>
> Happy new year! Hope I can get some attention back on this patch.
> Only difference from Robin's is it will do limited logging if spurious
> interrupts still happen to occur on current or future CMN implementations.

In all honesty I'm not sure what you want me to say... now you've
written the same patch that I already sent, but still with an incorrect
commit message, and with some unrelated changes that aren't mentioned
and have nothing to do with shutdown anyway. Please see:

https://www.kernel.org/doc/html/latest/process/submitting-patches.html#separate-your-changes

If you have a convincing argument that returning IRQ_NONE for unexpected
spurious interrupts is a real and important concern, then please propose
a general solution, because if it matters for arm-cmn then it matters
for hundreds of other drivers too, by rough estimate:

$ git grep -l IRQ_NONE '*.c' | xargs git grep -L IRQF_SHARED | wc -l
834

Thanks,
Robin.

2023-01-19 17:22:43

by Geoff Blake

[permalink] [raw]

Subject: RE: [PATCH] perf/arm-cmn: Add shutdown routine

On Thu, 19 Jan 2023, Robin Murphy wrote:

> If you have a convincing argument that returning IRQ_NONE for unexpected
> spurious interrupts is a real and important concern, then please propose
> a general solution, because if it matters for arm-cmn then it matters
> for hundreds of other drivers too, by rough estimate:
>
> $ git grep -l IRQ_NONE '*.c' | xargs git grep -L IRQF_SHARED | wc -l
> 834

The general solution for IRQ_NONE exists in the layer above the
driver, it complains with a visible warning that something might be wrong
and then moves on. Nothing more is needed.

Your shutdown routine that flips DT_EN in CMN_DT_DTC_CTL is sufficient, as
after some testing it solves the problem with left over IRQs for my kexec
use case.

-Geoff

2023-01-19 18:28:12

by Will Deacon

[permalink] [raw]

Subject: Re: [PATCH] perf/arm-cmn: Add shutdown routine

On Thu, Jan 19, 2023 at 11:10:29AM -0600, Geoff Blake wrote:
> On Thu, 19 Jan 2023, Robin Murphy wrote:
>
> > If you have a convincing argument that returning IRQ_NONE for unexpected
> > spurious interrupts is a real and important concern, then please propose
> > a general solution, because if it matters for arm-cmn then it matters
> > for hundreds of other drivers too, by rough estimate:
> >
> > $ git grep -l IRQ_NONE '*.c' | xargs git grep -L IRQF_SHARED | wc -l
> > 834
>
> The general solution for IRQ_NONE exists in the layer above the
> driver, it complains with a visible warning that something might be wrong
> and then moves on. Nothing more is needed.
>
> Your shutdown routine that flips DT_EN in CMN_DT_DTC_CTL is sufficient, as
> after some testing it solves the problem with left over IRQs for my kexec
> use case.

Please can you two let me know when you've settled on a fix for this? I'm
not going to queue something that you don't both agree on, although it seems
like we're quibbling over details at this point so maybe let's get
_something_ in and then work to improve it later?

Cheers,

Will