2004-09-02 19:28:37

by Chris Wedgwood

Subject: [PATCH] i386 reduce spurious interrupt noise

i386 hardware can (and does) see spurious interrupts from time to
time. Ideally I would like the printk removed completely but this is
probably good enough for now.

Signed-off-by: Chris Wedgwood <[email protected]>

===== arch/i386/kernel/apic.c 1.58 vs edited =====
--- 1.58/arch/i386/kernel/apic.c 2004-08-26 23:30:31 -07:00
+++ edited/arch/i386/kernel/apic.c 2004-09-02 12:19:19 -07:00
@@ -1190,7 +1190,7 @@
6: Received illegal vector
7: Illegal register address
*/
- printk (KERN_INFO "APIC error on CPU%d: %02lx(%02lx)\n",
+ printk (KERN_DEBUG "APIC error on CPU%d: %02lx(%02lx)\n",
smp_processor_id(), v , v1);
irq_exit();
}
===== arch/i386/kernel/i8259.c 1.36 vs edited =====
--- 1.36/arch/i386/kernel/i8259.c 2004-08-23 12:48:32 -07:00
+++ edited/arch/i386/kernel/i8259.c 2004-09-02 12:20:49 -07:00
@@ -226,7 +226,7 @@
* lets ACK and report it. [once per IRQ]
*/
if (!(spurious_irq_mask & irqmask)) {
- printk("spurious 8259A interrupt: IRQ%d.\n", irq);
+ printk(KERN_DEBUG "spurious 8259A interrupt: IRQ%d.\n", irq);
spurious_irq_mask |= irqmask;
}
atomic_inc(&irq_err_count);



2004-09-02 19:35:08

by William Lee Irwin III

Subject: Re: [PATCH] i386 reduce spurious interrupt noise

On Thu, Sep 02, 2004 at 12:28:20PM -0700, Chris Wedgwood wrote:
> i386 hardware can (and does) see spurious interrupts from time to
> time. Ideally I would like the printk removed completely but this is
> probably good enough for now.

Please check printk_ratelimit().


-- wli

2004-09-02 19:48:02

by Chris Wedgwood

Subject: Re: [PATCH] i386 reduce spurious interrupt noise

On Thu, Sep 02, 2004 at 12:34:54PM -0700, William Lee Irwin III wrote:

> Please check printk_ratelimit().

I don't want them displayed by default at *all* --- it wakes up the
monitor on console machines and that's annoying.

You get about 1 or 2 a day --- rate limiting isn't useful, nor is
reporting them IMO.


--cw

2004-09-02 20:00:03

by William Lee Irwin III

Subject: Re: [PATCH] i386 reduce spurious interrupt noise

On Thu, Sep 02, 2004 at 12:34:54PM -0700, William Lee Irwin III wrote:
>> Please check printk_ratelimit().

On Thu, Sep 02, 2004 at 12:47:39PM -0700, Chris Wedgwood wrote:
> I don't want them displayed by default at *all* --- it wakes up the
> monitor on console machines and that's annoying.
> You get about 1 or 2 a day --- rate limiting isn't useful, nor is
> reporting them IMO.
> --cw

That's okay. The reason why is that this is in response to an external
stimulus which can, in principle, scream out of control, so even at
KERN_DEBUG or other loglevels it's meaningful to rate limit it.
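
To make the rate-limiting idea concrete, here is a rough userspace sketch
of a credit-based limiter. It is not the kernel's printk_ratelimit()
implementation; the struct, the function names and the tick-based clock
are all hypothetical and chosen only for illustration.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a message rate limiter.
 * None of these names come from the kernel. */
struct ratelimit {
    long interval;  /* ticks needed to earn one message credit */
    long burst;     /* maximum number of messages that can be saved up */
    long credit;    /* current credit, measured in ticks */
    long last;      /* tick at which the credit was last updated */
};

void ratelimit_init(struct ratelimit *rl, long interval, long burst, long now)
{
    rl->interval = interval;
    rl->burst = burst;
    rl->credit = interval * burst;  /* start with a full burst allowance */
    rl->last = now;
}

/* Returns true if a message may be emitted at tick 'now'. */
bool ratelimit_ok(struct ratelimit *rl, long now)
{
    /* Earn credit for the time that has passed, capped at the burst size. */
    rl->credit += now - rl->last;
    rl->last = now;
    if (rl->credit > rl->interval * rl->burst)
        rl->credit = rl->interval * rl->burst;

    /* Spend one interval's worth of credit per message. */
    if (rl->credit >= rl->interval) {
        rl->credit -= rl->interval;
        return true;
    }
    return false;
}
```

A caller would wrap the report as "if (ratelimit_ok(&rl, now)) print the
message", so a screaming interrupt source costs at most 'burst' messages
up front and one message per 'interval' after that, whatever the loglevel.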


-- wli

2004-09-02 20:00:17

by Chris Wedgwood

Subject: Re: [PATCH] i386 reduce spurious interrupt noise

On Thu, Sep 02, 2004 at 12:52:19PM -0700, William Lee Irwin III wrote:

> That's okay. The reason why is that this is in response to an
> external stimulus which can, in principle, scream out of control, so
> even at KERN_DEBUG or other loglevels it's meaningful to rate limit
> it.

If you have enough to be a problem something is wrong and you're dead
already. For such cases a more generic interrupt throttling approach
is required.


--cw

2004-09-02 20:16:46

by Nathan Bryant

Subject: Re: [PATCH] i386 reduce spurious interrupt noise

Chris Wedgwood wrote:
> On Thu, Sep 02, 2004 at 12:34:54PM -0700, William Lee Irwin III wrote:
>
>
>>Please check printk_ratelimit().
>
>
> I don't want them displayed by default at *all* --- it wakes up the
> monitor on console machines and that's annoying.
>
> You get about 1 or 2 a day --- rate limiting isn't useful, nor is
> reporting them IMO.

Right, spurious interrupts aren't a big deal on i386. They happen now
and then with some devices because some hardware timing tolerances are a
little too tight. See for example sections 5.7.1.3/5.7.4 of the intel
850 chipset databook
(http://developer.intel.com/design/chipsets/datashts/29068702.pdf) :

"6. Upon receiving the second internally generated INTA# pulse, the PIC
returns the interrupt vector. If no interrupt request is present because
the request was too short in duration, the PIC will return vector 7 from
the master controller."

"In both the edge-triggered and level-triggered modes, the IRQ inputs
must remain active until after the falling edge of the first internal
INTA#. If the IRQ input goes inactive before this time, a default IRQ7
vector will be returned."


2004-09-02 20:33:33

by Alan

Subject: Re: [PATCH] i386 reduce spurious interrupt noise

On Thu, 2004-09-02 at 21:13, Nathan Bryant wrote:
> Right, spurious interrupts aren't a big deal on i386. They happen now
> and then with some devices because some hardware timing tolerances are a
> little too tight.

It also happens on a lot of hardware on the odd instance a non-IRQ code
path clears down an interrupt just as it's being raised. IDE does it now
and then for example.

2004-09-02 20:35:35

by Chris Wedgwood

Subject: Re: [PATCH] i386 reduce spurious interrupt noise

On Thu, Sep 02, 2004 at 08:27:28PM +0100, Alan Cox wrote:

> It also happens on a lot of hardware on the odd instance a non-IRQ
> code path clears down an interrupt just as it's being raised. IDE
> does it now and then for example.

So how about we just remove those printk statements completely then?
I've never heard of a single need for them other than reporting things
we don't really care about....

2004-09-10 21:23:26

by Maciej W. Rozycki

Subject: Re: [PATCH] i386 reduce spurious interrupt noise

On Thu, 2 Sep 2004, Chris Wedgwood wrote:

> i386 hardware can (and does) see spurious interrupts from time to
> time. Ideally I would like the printk removed completely but this is
> probably good enough for now.
>
> Signed-off-by: Chris Wedgwood <[email protected]>
>
> ===== arch/i386/kernel/apic.c 1.58 vs edited =====
> --- 1.58/arch/i386/kernel/apic.c 2004-08-26 23:30:31 -07:00
> +++ edited/arch/i386/kernel/apic.c 2004-09-02 12:19:19 -07:00
> @@ -1190,7 +1190,7 @@
> 6: Received illegal vector
> 7: Illegal register address
> */
> - printk (KERN_INFO "APIC error on CPU%d: %02lx(%02lx)\n",
> + printk (KERN_DEBUG "APIC error on CPU%d: %02lx(%02lx)\n",
> smp_processor_id(), v , v1);
> irq_exit();
> }

This should probably be KERN_ERR even. This is a serious condition -- if
you ever get such a message, then inter-APIC messages get corrupted and
this affects the system's stability. E.g. with a badly corrupted message you
may get one or more of your processors halted if the matching destinations
misinterpret the delivery mode as a result. You certainly want to know
about these errors and perhaps get your hardware replaced (starting with
the PSU, as PSUs have repeatedly been reported to be the culprits).

> ===== arch/i386/kernel/i8259.c 1.36 vs edited =====
> --- 1.36/arch/i386/kernel/i8259.c 2004-08-23 12:48:32 -07:00
> +++ edited/arch/i386/kernel/i8259.c 2004-09-02 12:20:49 -07:00
> @@ -226,7 +226,7 @@
> * lets ACK and report it. [once per IRQ]
> */
> if (!(spurious_irq_mask & irqmask)) {
> - printk("spurious 8259A interrupt: IRQ%d.\n", irq);
> + printk(KERN_DEBUG "spurious 8259A interrupt: IRQ%d.\n", irq);
> spurious_irq_mask |= irqmask;
> }
> atomic_inc(&irq_err_count);

You may only ever get a single message per system boot from this line. It
encourages one to have a look at the ERR counter in /proc/interrupts to check
for possible problems, though admittedly the suggestion isn't especially
clear.
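
The report-once-then-count behaviour of the quoted hunk can be modelled in
userspace roughly as follows. This is only a sketch: the "_model" names are
made up to avoid suggesting that this is the kernel's code, and a plain
counter stands in for the atomic_inc() on irq_err_count.

```c
#include <stdbool.h>

/* Userspace model of the logic in the quoted i8259.c hunk.
 * The "_model" names are hypothetical stand-ins. */
unsigned int spurious_irq_mask_model;  /* one bit per IRQ already reported */
unsigned long irq_err_count_model;     /* total spurious interrupts seen */

/* Returns true when this IRQ should be reported (first occurrence only);
 * every call is counted, reported or not. */
bool note_spurious_irq(unsigned int irq)
{
    unsigned int irqmask = 1u << irq;
    bool first = !(spurious_irq_mask_model & irqmask);

    if (first)
        spurious_irq_mask_model |= irqmask;  /* never report this IRQ again */

    irq_err_count_model++;  /* the count keeps growing silently */
    return first;
}
```

After the first report for a given IRQ, later occurrences only show up in
the counter, which is the role the ERR line in /proc/interrupts plays in
the real code.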

Maciej

2004-09-10 23:11:34

by Chris Wedgwood

Subject: Re: [PATCH] i386 reduce spurious interrupt noise

On Fri, Sep 10, 2004 at 11:23:20PM +0200, Maciej W. Rozycki wrote:

> > - printk (KERN_INFO "APIC error on CPU%d: %02lx(%02lx)\n",
> > + printk (KERN_DEBUG "APIC error on CPU%d: %02lx(%02lx)\n",

> This should probably be KERN_ERR even. This is a serious condition -- if
> you ever get such a message, then inter-APIC messages get corrupted and
> this affects the system's stability.

These messages are very common on many platforms, infrequent (once
every few days to twice a day at most in my observations) and seemingly
harmless.

I agree that if you get *many* of these certainly that would indicate
there is a problem but I've not heard of a single instance of this
and if that is the case we need to deal with it differently.

> > - printk("spurious 8259A interrupt: IRQ%d.\n", irq);
> > + printk(KERN_DEBUG "spurious 8259A interrupt: IRQ%d.\n", irq);

> You may only ever get a single message per system boot from this line.

Sometimes at boot, though often in my experience several minutes after
boot.

> It encourages one to have a look at the ERR counter in /proc/interrupts
> to check for possible problems, though admittedly the suggestion
> isn't especially clear.

I think in *both* cases we want to detect a largish (more than 1 every
n seconds or so) number of these and then complain, not before and
even then not so excessively that we printk ourselves to death.
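
A minimal sketch of that "complain only past a threshold" idea, written as
a userspace model. No such patch was posted in this thread; the fixed
window, the threshold and every name below are assumptions made for the
example.

```c
#include <stdbool.h>

/* Hypothetical detector: stay silent until more than 'threshold' events
 * arrive within one window, then complain once per window. */
struct storm_detector {
    long window;            /* window length in seconds */
    unsigned int threshold; /* events tolerated per window */
    long window_start;      /* start time of the current window */
    unsigned int count;     /* events seen in the current window */
    bool reported;          /* already complained in this window? */
};

void storm_init(struct storm_detector *d, long window,
                unsigned int threshold, long now)
{
    d->window = window;
    d->threshold = threshold;
    d->window_start = now;
    d->count = 0;
    d->reported = false;
}

/* Record one event at time 'now'; returns true if the caller should log. */
bool storm_event(struct storm_detector *d, long now)
{
    if (now - d->window_start >= d->window) {
        /* A new window begins: forget the old count and report state. */
        d->window_start = now;
        d->count = 0;
        d->reported = false;
    }

    d->count++;
    if (d->count > d->threshold && !d->reported) {
        d->reported = true;  /* at most one complaint per window */
        return true;
    }
    return false;
}
```

With a threshold well above the one or two events a day mentioned earlier,
the occasional spurious interrupt never logs anything, while a genuine
storm produces one line per window instead of one line per interrupt.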

I'm not inclined to offer such a patch right now as it feels like it's
fixing a problem nobody has reported.


--cw

2004-09-10 23:23:54

by Alan

Subject: Re: [PATCH] i386 reduce spurious interrupt noise

On Sat, 2004-09-11 at 00:10, Chris Wedgwood wrote:
> On Fri, Sep 10, 2004 at 11:23:20PM +0200, Maciej W. Rozycki wrote:
>
> > > - printk (KERN_INFO "APIC error on CPU%d: %02lx(%02lx)\n",
> > > + printk (KERN_DEBUG "APIC error on CPU%d: %02lx(%02lx)\n",
>
> > This should probably be KERN_ERR even. This is a serious condition -- if
> > you ever get such a message, then inter-APIC messages get corrupted and
> > this affects the system's stability.
>
> These messages are very common on many platforms, infrequent (once
> every few days to twice a day at most in my observations) and seemingly
> harmless.

On a lot of 2.4 boxes they aren't harmless but that's 2.4 IPI message
handling bugs. People sometimes assume an IPI is delivered once - but
it's not; it's delivered "at least once" and when you get a checksum error
like on old dual celerons you get replays.

They also identify kernel bugs in some other bit combinations so they
are useful there too. I'd say this should only go if we are sure 2.6.x
handles IPI replay properly and we mask bits off to see if it's real news
or a retry.
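
One way to picture "at least once" delivery is a handler that is safe to
run twice for the same request. The sketch below is a single-threaded
userspace model with made-up names, not how 2.4 or 2.6 actually handles
IPIs; real code would need atomic operations on the flag.

```c
#include <stdbool.h>

/* Hypothetical per-CPU mailbox: the sender marks work as pending before
 * raising the IPI, and the handler acts only if work is still pending. */
struct cpu_mailbox {
    bool work_pending;        /* set by the sender, cleared by the handler */
    unsigned long work_done;  /* how many times the work really ran */
};

/* Sender side: record the request before raising the (modelled) IPI. */
void post_work(struct cpu_mailbox *mb)
{
    mb->work_pending = true;
}

/* Handler side: returns true if it did work, false for a replayed IPI. */
bool handle_ipi(struct cpu_mailbox *mb)
{
    if (!mb->work_pending)
        return false;  /* replay: the request was already consumed */

    mb->work_pending = false;
    mb->work_done++;
    return true;
}
```

A handler that assumed exactly-once delivery would do the work on every
invocation, so a replay after a checksum error would run it twice; here
the second invocation is a no-op.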

> > > - printk("spurious 8259A interrupt: IRQ%d.\n", irq);
> > > + printk(KERN_DEBUG "spurious 8259A interrupt: IRQ%d.\n", irq);
>
> > You may only ever get a single message per system boot from this line.
>
> Sometimes at boot, though often in my experience several minutes after
> boot.

The IDE layer will generate these naturally, as will any other code that
happens to clear an IRQ-causing event in non-IRQ context. Eventually you
clear it just as the IRQ is raised, and the pulse causes the error.

This should really go.

2004-09-10 23:30:20

by Chris Wedgwood

Subject: Re: [PATCH] i386 reduce spurious interrupt noise

On Fri, Sep 10, 2004 at 11:21:26PM +0100, Alan Cox wrote:

> On a lot of 2.4 boxes they aren't harmless but that's 2.4 IPI
> message handling bugs. People sometimes assume an IPI is delivered
> once - but it's not; it's delivered "at least once" and when you get a
> checksum error like on old dual celerons you get replays.


it sounds like leaving it as KERN_DEBUG is the right thing to do then


> > > > - printk("spurious 8259A interrupt: IRQ%d.\n", irq);
> > > > + printk(KERN_DEBUG "spurious 8259A interrupt: IRQ%d.\n", irq);

> This should really go.

do we want counters for this? what about the APIC case?

2004-09-11 00:14:20

by Maciej W. Rozycki

Subject: Re: [PATCH] i386 reduce spurious interrupt noise

On Fri, 10 Sep 2004, Chris Wedgwood wrote:

> > > - printk (KERN_INFO "APIC error on CPU%d: %02lx(%02lx)\n",
> > > + printk (KERN_DEBUG "APIC error on CPU%d: %02lx(%02lx)\n",
>
> > This should probably be KERN_ERR even. This is a serious condition -- if
> > you ever get such a message, then inter-APIC messages get corrupted and
> > this affects the system's stability.
>
> These messages are very common on many platforms, infrequent (once
> every few days to twice a day at most in my observations) and seemingly
> harmless.

These are just as harmless as single-bit RAM errors with ECC working.
In both cases you want the problem to be reported.

> I agree that if you get *many* of these certainly that would indicate
> there is a problem but I've not heard of a single instance of this
> and if that is the case we need to deal with it differently.

Please search list archives for lots of such reports.

> > > - printk("spurious 8259A interrupt: IRQ%d.\n", irq);
> > > + printk(KERN_DEBUG "spurious 8259A interrupt: IRQ%d.\n", irq);
>
> > You may only ever get a single message per system boot from this line.
>
> Sometimes at boot, though often in my experience several minutes after
> boot.

And never again until you reboot. That's what I mean.

> > It encourages one to have a look at the ERR counter in /proc/interrupts
> > to check for possible problems, though admittedly the suggestion
> > isn't especially clear.
>
> I think in *both* cases we want to detect a largish (more than 1 every
> n seconds or so) number of these and then complain, not before and
> even then not so excessively that we printk ourselves to death.

I agree for the latter case. I won't mind the message going away either.
For the former you only really want to rate-limit the report -- some
people apparently want or need to run broken hardware and they'd probably
appreciate limiting the output.

Maciej

2004-09-11 00:17:43

by Chris Wedgwood

Subject: Re: [PATCH] i386 reduce spurious interrupt noise

On Sat, Sep 11, 2004 at 02:14:13AM +0200, Maciej W. Rozycki wrote:

> These are just as harmless as single-bit RAM errors with ECC
> working.

Hence KERN_DEBUG

> For the former you only really want to rate-limit the report -- some
> people apparently want or need to run broken hardware and they'd
> probably appreciate limiting the output.

A little more than rate-limit as I mentioned. I don't want the
occasional spurious APIC message waking up consoles that are asleep.
This was the reason for the change.


--cw

2004-09-11 00:33:42

by Maciej W. Rozycki

Subject: Re: [PATCH] i386 reduce spurious interrupt noise

On Fri, 10 Sep 2004, Chris Wedgwood wrote:

> > These are just as harmless as single-bit RAM errors with ECC
> > working.
>
> Hence KERN_DEBUG

Both are serious hardware failures. KERN_DEBUG is for stuff that's
normally of no interest to most system operators.

> > For the former you only really want to rate-limit the report -- some
> > people apparently want or need to run broken hardware and they'd
> > probably appreciate limiting the output.
>
> A little more than rate-limit as I mentioned. I don't want the
> occasional spurious APIC message waking up consoles that are asleep.
> This was the reason for the change.

If that's the sole reason, then how about setting console_loglevel
appropriately on the systems where you want the console to remain asleep?
It's there for exactly this kind of purpose. You can eliminate other
messages you consider unimportant this way, too, without tweaking the log
level of all of them.
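
For reference, the console filter can be modelled in a few lines. The
sketch assumes the usual numeric values behind the KERN_* prefixes
(KERN_ERR is 3, KERN_INFO is 6, KERN_DEBUG is 7) and the rule that a
message reaches the console only when its level is numerically lower than
console_loglevel. The enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Numeric levels behind the KERN_* prefixes used in this thread.
 * The LVL_* names are illustrative only. */
enum {
    LVL_ERR = 3,    /* KERN_ERR */
    LVL_INFO = 6,   /* KERN_INFO */
    LVL_DEBUG = 7   /* KERN_DEBUG */
};

/* Model of the console filter: a lower number means higher priority, and
 * only messages strictly below console_loglevel reach the console. */
bool reaches_console(int msg_level, int console_loglevel)
{
    return msg_level < console_loglevel;
}
```

Under this model a console_loglevel of 7 shows KERN_INFO but hides
KERN_DEBUG, which is the effect the patch relies on, while lowering
console_loglevel to 6 hides KERN_INFO as well and still lets KERN_ERR
through. Either way the message remains in the kernel log.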

Maciej

2004-09-11 14:38:57

by Alan

Subject: Re: [PATCH] i386 reduce spurious interrupt noise

On Sat, 2004-09-11 at 00:28, Chris Wedgwood wrote:
> > > > > - printk("spurious 8259A interrupt: IRQ%d.\n", irq);
> > > > > + printk(KERN_DEBUG "spurious 8259A interrupt: IRQ%d.\n", irq);
>
> > This should really go.
>
> do we want counters for this? what about the APIC case?

I don't know enough about the APIC version to comment, just the PIC one.