Subject: [PATCH] arch: fix 'unexpected IRQ trap at vector' warnings

All archs, except Alpha, print out the irq number in hex, but the message
looks like it was a decimal number, which is quite confusing. Fixing this
by adding "0x" prefix.

Signed-off-by: Enrico Weigelt, metux IT consult <[email protected]>
---
arch/arm/include/asm/hw_irq.h | 2 +-
arch/parisc/include/asm/hardirq.h | 2 +-
arch/powerpc/include/asm/hardirq.h | 2 +-
arch/s390/include/asm/hardirq.h | 2 +-
arch/um/include/asm/hardirq.h | 2 +-
arch/x86/kernel/irq.c | 2 +-
6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/hw_irq.h b/arch/arm/include/asm/hw_irq.h
index cecc13214ef1..2749f19271d9 100644
--- a/arch/arm/include/asm/hw_irq.h
+++ b/arch/arm/include/asm/hw_irq.h
@@ -9,7 +9,7 @@ static inline void ack_bad_irq(int irq)
{
extern unsigned long irq_err_count;
irq_err_count++;
- pr_crit("unexpected IRQ trap at vector %02x\n", irq);
+ pr_crit("unexpected IRQ trap at vector 0x%02x\n", irq);
}

#define ARCH_IRQ_INIT_FLAGS (IRQ_NOREQUEST | IRQ_NOPROBE)
diff --git a/arch/parisc/include/asm/hardirq.h b/arch/parisc/include/asm/hardirq.h
index 7f7039516e53..c3348af88d3f 100644
--- a/arch/parisc/include/asm/hardirq.h
+++ b/arch/parisc/include/asm/hardirq.h
@@ -35,6 +35,6 @@ DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
#define __IRQ_STAT(cpu, member) (irq_stat[cpu].member)
#define inc_irq_stat(member) this_cpu_inc(irq_stat.member)
#define __inc_irq_stat(member) __this_cpu_inc(irq_stat.member)
-#define ack_bad_irq(irq) WARN(1, "unexpected IRQ trap at vector %02x\n", irq)
+#define ack_bad_irq(irq) WARN(1, "unexpected IRQ trap at vector 0x%02x\n", irq)

#endif /* _PARISC_HARDIRQ_H */
diff --git a/arch/powerpc/include/asm/hardirq.h b/arch/powerpc/include/asm/hardirq.h
index f133b5930ae1..ec8cf3cf6e49 100644
--- a/arch/powerpc/include/asm/hardirq.h
+++ b/arch/powerpc/include/asm/hardirq.h
@@ -29,7 +29,7 @@ DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);

static inline void ack_bad_irq(unsigned int irq)
{
- printk(KERN_CRIT "unexpected IRQ trap at vector %02x\n", irq);
+ printk(KERN_CRIT "unexpected IRQ trap at vector 0x%02x\n", irq);
}

extern u64 arch_irq_stat_cpu(unsigned int cpu);
diff --git a/arch/s390/include/asm/hardirq.h b/arch/s390/include/asm/hardirq.h
index dfbc3c6c0674..aaaec5cdd4fe 100644
--- a/arch/s390/include/asm/hardirq.h
+++ b/arch/s390/include/asm/hardirq.h
@@ -23,7 +23,7 @@

static inline void ack_bad_irq(unsigned int irq)
{
- printk(KERN_CRIT "unexpected IRQ trap at vector %02x\n", irq);
+ printk(KERN_CRIT "unexpected IRQ trap at vector 0x%02x\n", irq);
}

#endif /* __ASM_HARDIRQ_H */
diff --git a/arch/um/include/asm/hardirq.h b/arch/um/include/asm/hardirq.h
index b426796d26fd..2a2e6eae034b 100644
--- a/arch/um/include/asm/hardirq.h
+++ b/arch/um/include/asm/hardirq.h
@@ -15,7 +15,7 @@ typedef struct {
#ifndef ack_bad_irq
static inline void ack_bad_irq(unsigned int irq)
{
- printk(KERN_CRIT "unexpected IRQ trap at vector %02x\n", irq);
+ printk(KERN_CRIT "unexpected IRQ trap at vector 0x%02x\n", irq);
}
#endif

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index c5dd50369e2f..957c716f2df7 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -37,7 +37,7 @@ atomic_t irq_err_count;
void ack_bad_irq(unsigned int irq)
{
if (printk_ratelimit())
- pr_err("unexpected IRQ trap at vector %02x\n", irq);
+ pr_err("unexpected IRQ trap at vector 0x%02x\n", irq);

/*
* Currently unexpected vectors happen only on SMP and APIC.
--
2.11.0


2020-12-08 02:15:58

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH] arch: fix 'unexpected IRQ trap at vector' warnings

"Enrico Weigelt, metux IT consult" <[email protected]> writes:
> All archs, except Alpha, print out the irq number in hex, but the message
> looks like it was a decimal number, which is quite confusing. Fixing this
> by adding "0x" prefix.

Arguably decimal would be better, /proc/interrupts and /proc/irq/ both
use decimal.

The whole message is very dated IMO, these days the number it prints is
(possibly) virtualised via IRQ domains, ie. it's not necessarily a
"vector" if that even makes sense on all arches). Arguably "trap" is the
wrong term on some arches too.

So it would be better reworded entirely IMO, and also switched to
decimal to match other sources of information on interrupts.

Perhaps:
"Unexpected Linux IRQ %d."


If anyone else is having deja vu like me, yes this has come up before:
https://lore.kernel.org/lkml/20150712220211.7166.42035.stgit@bhelgaas-glaptop2.roam.corp.google.com/

cheers



> diff --git a/arch/arm/include/asm/hw_irq.h b/arch/arm/include/asm/hw_irq.h
> index cecc13214ef1..2749f19271d9 100644
> --- a/arch/arm/include/asm/hw_irq.h
> +++ b/arch/arm/include/asm/hw_irq.h
> @@ -9,7 +9,7 @@ static inline void ack_bad_irq(int irq)
> {
> extern unsigned long irq_err_count;
> irq_err_count++;
> - pr_crit("unexpected IRQ trap at vector %02x\n", irq);
> + pr_crit("unexpected IRQ trap at vector 0x%02x\n", irq);
> }
>
> #define ARCH_IRQ_INIT_FLAGS (IRQ_NOREQUEST | IRQ_NOPROBE)
> diff --git a/arch/parisc/include/asm/hardirq.h b/arch/parisc/include/asm/hardirq.h
> index 7f7039516e53..c3348af88d3f 100644
> --- a/arch/parisc/include/asm/hardirq.h
> +++ b/arch/parisc/include/asm/hardirq.h
> @@ -35,6 +35,6 @@ DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
> #define __IRQ_STAT(cpu, member) (irq_stat[cpu].member)
> #define inc_irq_stat(member) this_cpu_inc(irq_stat.member)
> #define __inc_irq_stat(member) __this_cpu_inc(irq_stat.member)
> -#define ack_bad_irq(irq) WARN(1, "unexpected IRQ trap at vector %02x\n", irq)
> +#define ack_bad_irq(irq) WARN(1, "unexpected IRQ trap at vector 0x%02x\n", irq)
>
> #endif /* _PARISC_HARDIRQ_H */
> diff --git a/arch/powerpc/include/asm/hardirq.h b/arch/powerpc/include/asm/hardirq.h
> index f133b5930ae1..ec8cf3cf6e49 100644
> --- a/arch/powerpc/include/asm/hardirq.h
> +++ b/arch/powerpc/include/asm/hardirq.h
> @@ -29,7 +29,7 @@ DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
>
> static inline void ack_bad_irq(unsigned int irq)
> {
> - printk(KERN_CRIT "unexpected IRQ trap at vector %02x\n", irq);
> + printk(KERN_CRIT "unexpected IRQ trap at vector 0x%02x\n", irq);
> }
>
> extern u64 arch_irq_stat_cpu(unsigned int cpu);
> diff --git a/arch/s390/include/asm/hardirq.h b/arch/s390/include/asm/hardirq.h
> index dfbc3c6c0674..aaaec5cdd4fe 100644
> --- a/arch/s390/include/asm/hardirq.h
> +++ b/arch/s390/include/asm/hardirq.h
> @@ -23,7 +23,7 @@
>
> static inline void ack_bad_irq(unsigned int irq)
> {
> - printk(KERN_CRIT "unexpected IRQ trap at vector %02x\n", irq);
> + printk(KERN_CRIT "unexpected IRQ trap at vector 0x%02x\n", irq);
> }
>
> #endif /* __ASM_HARDIRQ_H */
> diff --git a/arch/um/include/asm/hardirq.h b/arch/um/include/asm/hardirq.h
> index b426796d26fd..2a2e6eae034b 100644
> --- a/arch/um/include/asm/hardirq.h
> +++ b/arch/um/include/asm/hardirq.h
> @@ -15,7 +15,7 @@ typedef struct {
> #ifndef ack_bad_irq
> static inline void ack_bad_irq(unsigned int irq)
> {
> - printk(KERN_CRIT "unexpected IRQ trap at vector %02x\n", irq);
> + printk(KERN_CRIT "unexpected IRQ trap at vector 0x%02x\n", irq);
> }
> #endif
>
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index c5dd50369e2f..957c716f2df7 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -37,7 +37,7 @@ atomic_t irq_err_count;
> void ack_bad_irq(unsigned int irq)
> {
> if (printk_ratelimit())
> - pr_err("unexpected IRQ trap at vector %02x\n", irq);
> + pr_err("unexpected IRQ trap at vector 0x%02x\n", irq);
>
> /*
> * Currently unexpected vectors happen only on SMP and APIC.
> --
> 2.11.0

2020-12-08 14:47:46

by Helge Deller

[permalink] [raw]
Subject: Re: [PATCH] arch: fix 'unexpected IRQ trap at vector' warnings

On 12/8/20 3:11 AM, Michael Ellerman wrote:
> "Enrico Weigelt, metux IT consult" <[email protected]> writes:
>> All archs, except Alpha, print out the irq number in hex, but the message
>> looks like it was a decimal number, which is quite confusing. Fixing this
>> by adding "0x" prefix.
>
> Arguably decimal would be better, /proc/interrupts and /proc/irq/ both
> use decimal.

I agree.

> The whole message is very dated IMO, these days the number it prints is
> (possibly) virtualised via IRQ domains, ie. it's not necessarily a
> "vector" if that even makes sense on all arches). Arguably "trap" is the
> wrong term on some arches too.
>
> So it would be better reworded entirely IMO, and also switched to
> decimal to match other sources of information on interrupts.
>
> Perhaps:
> "Unexpected Linux IRQ %d."

Yes.

and while cleaning it up, introducing a default weak implementation of ack_bad_irq()
which adds and increases irq_err_count for all platforms would be a nice cleanup.

Helge

> If anyone else is having deja vu like me, yes this has come up before:
> https://lore.kernel.org/lkml/20150712220211.7166.42035.stgit@bhelgaas-glaptop2.roam.corp.google.com/
>
> cheers
>
>
>
>> diff --git a/arch/arm/include/asm/hw_irq.h b/arch/arm/include/asm/hw_irq.h
>> index cecc13214ef1..2749f19271d9 100644
>> --- a/arch/arm/include/asm/hw_irq.h
>> +++ b/arch/arm/include/asm/hw_irq.h
>> @@ -9,7 +9,7 @@ static inline void ack_bad_irq(int irq)
>> {
>> extern unsigned long irq_err_count;
>> irq_err_count++;
>> - pr_crit("unexpected IRQ trap at vector %02x\n", irq);
>> + pr_crit("unexpected IRQ trap at vector 0x%02x\n", irq);
>> }
>>
>> #define ARCH_IRQ_INIT_FLAGS (IRQ_NOREQUEST | IRQ_NOPROBE)
>> diff --git a/arch/parisc/include/asm/hardirq.h b/arch/parisc/include/asm/hardirq.h
>> index 7f7039516e53..c3348af88d3f 100644
>> --- a/arch/parisc/include/asm/hardirq.h
>> +++ b/arch/parisc/include/asm/hardirq.h
>> @@ -35,6 +35,6 @@ DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
>> #define __IRQ_STAT(cpu, member) (irq_stat[cpu].member)
>> #define inc_irq_stat(member) this_cpu_inc(irq_stat.member)
>> #define __inc_irq_stat(member) __this_cpu_inc(irq_stat.member)
>> -#define ack_bad_irq(irq) WARN(1, "unexpected IRQ trap at vector %02x\n", irq)
>> +#define ack_bad_irq(irq) WARN(1, "unexpected IRQ trap at vector 0x%02x\n", irq)
>>
>> #endif /* _PARISC_HARDIRQ_H */
>> diff --git a/arch/powerpc/include/asm/hardirq.h b/arch/powerpc/include/asm/hardirq.h
>> index f133b5930ae1..ec8cf3cf6e49 100644
>> --- a/arch/powerpc/include/asm/hardirq.h
>> +++ b/arch/powerpc/include/asm/hardirq.h
>> @@ -29,7 +29,7 @@ DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
>>
>> static inline void ack_bad_irq(unsigned int irq)
>> {
>> - printk(KERN_CRIT "unexpected IRQ trap at vector %02x\n", irq);
>> + printk(KERN_CRIT "unexpected IRQ trap at vector 0x%02x\n", irq);
>> }
>>
>> extern u64 arch_irq_stat_cpu(unsigned int cpu);
>> diff --git a/arch/s390/include/asm/hardirq.h b/arch/s390/include/asm/hardirq.h
>> index dfbc3c6c0674..aaaec5cdd4fe 100644
>> --- a/arch/s390/include/asm/hardirq.h
>> +++ b/arch/s390/include/asm/hardirq.h
>> @@ -23,7 +23,7 @@
>>
>> static inline void ack_bad_irq(unsigned int irq)
>> {
>> - printk(KERN_CRIT "unexpected IRQ trap at vector %02x\n", irq);
>> + printk(KERN_CRIT "unexpected IRQ trap at vector 0x%02x\n", irq);
>> }
>>
>> #endif /* __ASM_HARDIRQ_H */
>> diff --git a/arch/um/include/asm/hardirq.h b/arch/um/include/asm/hardirq.h
>> index b426796d26fd..2a2e6eae034b 100644
>> --- a/arch/um/include/asm/hardirq.h
>> +++ b/arch/um/include/asm/hardirq.h
>> @@ -15,7 +15,7 @@ typedef struct {
>> #ifndef ack_bad_irq
>> static inline void ack_bad_irq(unsigned int irq)
>> {
>> - printk(KERN_CRIT "unexpected IRQ trap at vector %02x\n", irq);
>> + printk(KERN_CRIT "unexpected IRQ trap at vector 0x%02x\n", irq);
>> }
>> #endif
>>
>> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
>> index c5dd50369e2f..957c716f2df7 100644
>> --- a/arch/x86/kernel/irq.c
>> +++ b/arch/x86/kernel/irq.c
>> @@ -37,7 +37,7 @@ atomic_t irq_err_count;
>> void ack_bad_irq(unsigned int irq)
>> {
>> if (printk_ratelimit())
>> - pr_err("unexpected IRQ trap at vector %02x\n", irq);
>> + pr_err("unexpected IRQ trap at vector 0x%02x\n", irq);
>>
>> /*
>> * Currently unexpected vectors happen only on SMP and APIC.
>> --
>> 2.11.0

2020-12-09 01:15:00

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] arch: fix 'unexpected IRQ trap at vector' warnings

On Tue, Dec 08 2020 at 13:11, Michael Ellerman wrote:
> "Enrico Weigelt, metux IT consult" <[email protected]> writes:
>> All archs, except Alpha, print out the irq number in hex, but the message
>> looks like it was a decimal number, which is quite confusing. Fixing this
>> by adding "0x" prefix.
>
> Arguably decimal would be better, /proc/interrupts and /proc/irq/ both
> use decimal.
>
> The whole message is very dated IMO, these days the number it prints is
> (possibly) virtualised via IRQ domains, ie. it's not necessarily a
> "vector" if that even makes sense on all arches). Arguably "trap" is the
> wrong term on some arches too.
>
> So it would be better reworded entirely IMO, and also switched to
> decimal to match other sources of information on interrupts.

So much for the theory.

The printk originates from the very early days of i386 Linux where it
was called from the low level entry code when there was no interrupt
assigned to a vector, which is an x86'ism.

That was copied to other architectures without actually thinking about
whether the vector concept made sense on that architecture and at some
point it got completely bonkers because it moved to core code without
thought.

There are a few situations why it is invoked or not:

1) The original x86 usage is not longer using it because it complains
rightfully about a vector being raised which has no interrupt
descriptor associated to it. So the original reason for naming it
vector is gone long ago. It emits:

pr_emerg_ratelimited("%s: %d.%u No irq handler for vector\n",
__func__, smp_processor_id(), vector);

Directly from the x86 C entry point without ever invoking that
function. Pretty popular error message due to some AMD BIOS
wreckage. :)

2) It's invoked when there is an interrupt descriptor installed but
not configured/requested. In that case some architectures need to
ack it in order not to block further interrupt delivery. In that
case 'vector is bogus' and really want's to be 'irqnr' or such
because there is a Linux virq number associated to it.

3) It's invoked from __handle_domain_irq() when the 'hwirq' which is
handed in by the caller does not resolve to a mapped Linux
interrupt which is pretty much the same as the x86 situation above
in #1, but it prints useless data.

It prints 'irq' which is invalid but it does not print the really
interesting 'hwirq' which was handed in by the caller and did
not resolve.

In this case the Linux irq number is uninteresting as it is known
to be invalid and simply is not mapped and therefore does not
exist.

This has to print out 'hwirq' which is kinda the equivalent to the
original 'vector' message.

4) It's invoked from the dummy irq chip which is installed for a
couple of truly virtual interrupts where the invocation of
dummy_irq_chip::irq_ack() is indicating wreckage.

In that case the Linux irq number is the thing which is printed.

So no. It's not just inconsistent it's in some places outright
wrong. What we really want is:

ack_bad_irq(int hwirq, int virq)
{
if (hwirq >= 0)
print_useful_info(hwirq);
if (virq > 0)
print_useful_info(virq);
arch_try_to_ack(hwirq, virq);
}

for this to make sense. Just fixing the existing printk() to be less
wrong is not really an improvement.

Thanks,

tglx


Subject: Re: [PATCH] arch: fix 'unexpected IRQ trap at vector' warnings

On 09.12.20 00:01, Thomas Gleixner wrote:

> There are a few situations why it is invoked or not:
>
> 1) The original x86 usage is not longer using it because it complains
> rightfully about a vector being raised which has no interrupt
> descriptor associated to it. So the original reason for naming it
> vector is gone long ago. It emits:
>
> pr_emerg_ratelimited("%s: %d.%u No irq handler for vector\n",
> __func__, smp_processor_id(), vector);
>
> Directly from the x86 C entry point without ever invoking that
> function. Pretty popular error message due to some AMD BIOS
> wreckage. :)

Of course, the term "vector" should be replaced by something like
"irqnr" or "virq", but I didn't have name changes within scope - just
wanted to fix the printing of that number, as i've stupled over it while
working on something different and wondered why the number differed from
what I had expected, until I seen that it prints hex instead of decimal.

But if you prefer a more complete cleanup, I'll be happy to do it.

> 3) It's invoked from __handle_domain_irq() when the 'hwirq' which is
> handed in by the caller does not resolve to a mapped Linux
> interrupt which is pretty much the same as the x86 situation above
> in #1, but it prints useless data.
>
> It prints 'irq' which is invalid but it does not print the really
> interesting 'hwirq' which was handed in by the caller and did
> not resolve.

I wouldn't say the irq-nr isn't interesting. In my particular case it
was quite what I've been looking for. But you're right, hwirq should
also be printed.

> In this case the Linux irq number is uninteresting as it is known
> to be invalid and simply is not mapped and therefore does not
> exist.

In my case it came in from generic_handle_irq(), and in this case this
irq number (IMHO) has been valid, but nobody handled it, so it went to
ack_bad_irq.

Of course, if this function is meant as a fallback to ack some not
otherwise handled IRQ on the hw, the linux irq number indeed isn't quite
helpful (unless we expect that code to do a lookup to the hw irq).

... rethinking this further ... shouldn't we also pass in even more data
(eg. irq_desc, irqchip, ...), so this function can check which hw to
actually talk to ?

> 4) It's invoked from the dummy irq chip which is installed for a
> couple of truly virtual interrupts where the invocation of
> dummy_irq_chip::irq_ack() is indicating wreckage.
>
> In that case the Linux irq number is the thing which is printed.
>
> So no. It's not just inconsistent it's in some places outright
> wrong. What we really want is:
>
> ack_bad_irq(int hwirq, int virq)

is 'int' correct here ?

BTW: I also wonder why the virq is unsigned int, while hwirq (eg. in
struct irq_data) is unsigned long. shouldn't the virtual number space
be at least as big (or even bigger) than the hw one ?

{
> if (hwirq >= 0)
> print_useful_info(hwirq);
> if (virq > 0)
> print_useful_info(virq);
> arch_try_to_ack(hwirq, virq);
> }
>
> for this to make sense. Just fixing the existing printk() to be less
> wrong is not really an improvement.

Okay, makes sense.

OTOH: since both callers (dummychip.c, handle.c) already dump out before
ack_bad_irq(), do we need to print out anything at all ?

I've also seen that many archs increase a counter (some use long, others
atomic_t) - should we also consolidate this in an arch-independent way
in handle.c (or does kstat_incr_irqs_this_cpu already do this) ?

--mtx

--
---
Hinweis: unverschlüsselte E-Mails können leicht abgehört und manipuliert
werden ! Für eine vertrauliche Kommunikation senden Sie bitte ihren
GPG/PGP-Schlüssel zu.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
[email protected] -- +49-151-27565287

2020-12-15 22:17:33

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] arch: fix 'unexpected IRQ trap at vector' warnings

On Tue, Dec 15 2020 at 21:12, Enrico Weigelt wrote:
> On 09.12.20 00:01, Thomas Gleixner wrote:
>> 3) It's invoked from __handle_domain_irq() when the 'hwirq' which is
>> handed in by the caller does not resolve to a mapped Linux
>> interrupt which is pretty much the same as the x86 situation above
>> in #1, but it prints useless data.
>>
>> It prints 'irq' which is invalid but it does not print the really
>> interesting 'hwirq' which was handed in by the caller and did
>> not resolve.
>
> I wouldn't say the irq-nr isn't interesting. In my particular case it
> was quite what I've been looking for. But you're right, hwirq should
> also be printed.

The number is _not_ interesting in this case. It's useless because the
function does:

irq = hwirq;

if (lookup)
irq = find_mapping(hwirq);

if (!irq || irq >= nr_irqs)
-> BAD

So irq is completely useless because find_mapping() returns 0 if there
is no mapping and if irq >= nr_irqs then there was no lookup and the
hwirq number is bogus.

In both cases the only interesting information is that hwirq does not
resolve to a valid Linux interrupt number and which hwirq number caused
that.

>> In this case the Linux irq number is uninteresting as it is known
>> to be invalid and simply is not mapped and therefore does not
>> exist.
>
> In my case it came in from generic_handle_irq(), and in this case this
> irq number (IMHO) has been valid, but nobody handled it, so it went to
> ack_bad_irq.

generic_handle_irq() _is_ a different function which is only invoked
when there is a valid Linux interrupt number and then the ack_bad_irq()
is invoked from a different place. See below.

> Of course, if this function is meant as a fallback to ack some not
> otherwise handled IRQ on the hw, the linux irq number indeed isn't quite
> helpful (unless we expect that code to do a lookup to the hw irq).

If there is no valid linux irq number then there is no lookup. And you
can't look it up from the hardware either.

If you look really then you find out that there is exactly _ONE_
architecture which does anything else than incrementing a counter and/or
printing stuff: X86, which has a big fat comment explaining why. The
only way to ack an interrupt on X86 is to issue EOI on the local APIC,
i.e. it does _not_ need any further information.

> ... rethinking this further ... shouldn't we also pass in even more data
> (eg. irq_desc, irqchip, ...), so this function can check which hw to
> actually talk to ?

There are 3 ways to get there:

1) via dummy chip which obviously has no hardware associated

2) via handle_bad_irq() which prints the info already

3) __handle_domain_irq() which cannot print anything and obviously
cannot figure out the hw to talk to because there is no irq
descriptor associated.

>> 4) It's invoked from the dummy irq chip which is installed for a
>> couple of truly virtual interrupts where the invocation of
>> dummy_irq_chip::irq_ack() is indicating wreckage.
>>
>> In that case the Linux irq number is the thing which is printed.
>>
>> So no. It's not just inconsistent it's in some places outright
>> wrong. What we really want is:
>>
>> ack_bad_irq(int hwirq, int virq)
>
> is 'int' correct here ?

This was just for illustration.

> BTW: I also wonder why the virq is unsigned int, while hwirq (eg. in
> struct irq_data) is unsigned long. shouldn't the virtual number space
> be at least as big (or even bigger) than the hw one ?

Only if there are no irqdomain mappings and the virq space is 1:1 mapped
to the hwirq space. Systems with > 4G interrupts are pretty unlikely.

Also hwirq can be completely artificial and encode information about
interrupts which are composed, i.e. PCI/MSI. See pci_msi_domain_calc_hwirq().

> {
>> if (hwirq >= 0)
>> print_useful_info(hwirq);
>> if (virq > 0)
>> print_useful_info(virq);
>> arch_try_to_ack(hwirq, virq);
>> }
>>
>> for this to make sense. Just fixing the existing printk() to be less
>> wrong is not really an improvement.
>
> Okay, makes sense.
>
> OTOH: since both callers (dummychip.c, handle.c) already dump out before
> ack_bad_irq(), do we need to print out anything at all ?

Not all callers print something, but yes this could do with some general
cleanup.

> I've also seen that many archs increase a counter (some use long, others
> atomic_t) - should we also consolidate this in an arch-independent way
> in handle.c (or does kstat_incr_irqs_this_cpu already do this) ?

kstat_incr_irqs_this_cpu(desc) operates on the irq descriptor which
requires that an irq descriptor exists in the first place.

The error counter is independent of that, but yes there is room for
consolidation.

Thanks,

tglx