2003-11-10 21:58:48

by tip-bot for Jack Steiner

[permalink] [raw]
Subject: hot cache line due to note_interrupt()


I dont know the background on note_interrupt() in arch/ia64/kernel/irq.c,
but I had to disable the function on our large systems (IA64).

The function updates a counter in the irq_desc_t table. An entry in this table
is shared by all cpus that take a specific interrupt #. For most interrupt #'s,
this is a problem but it is prohibitive for the timer tick on big systems.

Updating the counter causes a cache line to be bounced between
cpus at a rate of at least HZ*active_cpus. (The number of bus transactions
is at least 2X higher because the line is first obtained for
shared usage, then upgraded to modified. In addition, multiple references
are made to the line for each interrupt. On a big system, it is unlikely that
a cpu can hold the line for entire time that the interrupt is being
serviced).

On a 500p system, the system was VERY slowly making forward progres during boot,
but I didnt have the patience to wait for it finish. After I disabled
note_interrupt(), I had no problem booting.

Certainly, the problem is much less severe on smaller systems. However, even
moderate sized systems may see some degradation due to this hot cache line.



I also verified on a system simulator that the counter in note_interrupt() is the
only line that is bounced between cpus at a HZ rate on an idle system.

The IPI & reschedule interrupts have a similar problem but at a lower rate.


(initialially sent to [email protected])

--
Thanks

Jack Steiner ([email protected]) 651-683-5302
Principal Engineer SGI - Silicon Graphics, Inc.



2003-11-10 23:05:07

by Anton Blanchard

[permalink] [raw]
Subject: Re: hot cache line due to note_interrupt()


Hi,

> I also verified on a system simulator that the counter in
> note_interrupt() is the only line that is bounced between cpus at a HZ
> rate on an idle system.
>
> The IPI & reschedule interrupts have a similar problem but at a lower
> rate.

We havent seen this on ppc64 because our timer interrupt doesnt go via
the standard interrupt path (its a separate exception). IPIs do however
so Id expect these to be a problem for us.

Anton

2003-11-11 08:23:28

by Ravikiran G Thirumalai

[permalink] [raw]
Subject: Re: hot cache line due to note_interrupt()

On Mon, Nov 10, 2003 at 03:58:44PM -0600, Jack Steiner wrote:
>
> I dont know the background on note_interrupt() in arch/ia64/kernel/irq.c,
> but I had to disable the function on our large systems (IA64).
>
> The function updates a counter in the irq_desc_t table. An entry in this table
> is shared by all cpus that take a specific interrupt #. For most interrupt #'s,
> this is a problem but it is prohibitive for the timer tick on big systems.
>
> Updating the counter causes a cache line to be bounced between
> cpus at a rate of at least HZ*active_cpus. (The number of bus transactions

The answer to this is probably alloc_percpu for the counters.
right now this might not possible because irq_desc_t table might be used very
early, and alloc_percpu uses slab underneath. alloc_percpu will have to be
made to work early enough for this....

Thanks,
Kiran

2003-11-11 16:28:13

by Anton Blanchard

[permalink] [raw]
Subject: Re: hot cache line due to note_interrupt()


> The answer to this is probably alloc_percpu for the counters.
> right now this might not possible because irq_desc_t table might be used very
> early, and alloc_percpu uses slab underneath. alloc_percpu will have to be
> made to work early enough for this....

Also keep in mind the memory bloat with high NR_IRQS and NR_IRQS.

Anton

2003-11-11 19:54:10

by Andrew Morton

[permalink] [raw]
Subject: Re: hot cache line due to note_interrupt()

Ravikiran G Thirumalai <[email protected]> wrote:
>
> On Mon, Nov 10, 2003 at 03:58:44PM -0600, Jack Steiner wrote:
> >
> > I dont know the background on note_interrupt() in arch/ia64/kernel/irq.c,
> > but I had to disable the function on our large systems (IA64).
> >
> > The function updates a counter in the irq_desc_t table. An entry in this table
> > is shared by all cpus that take a specific interrupt #. For most interrupt #'s,
> > this is a problem but it is prohibitive for the timer tick on big systems.
> >
> > Updating the counter causes a cache line to be bounced between
> > cpus at a rate of at least HZ*active_cpus. (The number of bus transactions
>
> The answer to this is probably alloc_percpu for the counters.

Or just make noirqdebug the default.

The note_interrupt() stuff is only useful for diagnosing mysterious lockups
(and hasn't proven useful for that, actually). It should be disabled for
production use.


2003-11-14 17:46:22

by tip-bot for Jack Steiner

[permalink] [raw]
Subject: Re: hot cache line due to note_interrupt()

On Tue, Nov 11, 2003 at 11:58:04AM -0800, Andrew Morton wrote:
> Ravikiran G Thirumalai <[email protected]> wrote:
> >
> > On Mon, Nov 10, 2003 at 03:58:44PM -0600, Jack Steiner wrote:
> > >
> > > I dont know the background on note_interrupt() in arch/ia64/kernel/irq.c,
> > > but I had to disable the function on our large systems (IA64).
> > >
> > > The function updates a counter in the irq_desc_t table. An entry in this table
> > > is shared by all cpus that take a specific interrupt #. For most interrupt #'s,
> > > this is a problem but it is prohibitive for the timer tick on big systems.
> > >
> > > Updating the counter causes a cache line to be bounced between
> > > cpus at a rate of at least HZ*active_cpus. (The number of bus transactions
> >
> > The answer to this is probably alloc_percpu for the counters.
>
> Or just make noirqdebug the default.
>
> The note_interrupt() stuff is only useful for diagnosing mysterious lockups
> (and hasn't proven useful for that, actually). It should be disabled for
> production use.
>


Probably too late for 2.6.0, but here is a patch that disables noirqdebug:



# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.1438 -> 1.1439
# arch/i386/kernel/irq.c 1.45 -> 1.46
# arch/mips/kernel/irq.c 1.14 -> 1.15
# arch/ppc64/kernel/irq.c 1.36 -> 1.37
# arch/ia64/kernel/irq.c 1.31 -> 1.32
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/11/14 [email protected] 1.1439
# Change the default value for "irqdebug" so that it is disabled.
# Rename option from "noirqdebug" to "irqdebug".
# --------------------------------------------
#
diff -Nru a/arch/i386/kernel/irq.c b/arch/i386/kernel/irq.c
--- a/arch/i386/kernel/irq.c Fri Nov 14 11:18:07 2003
+++ b/arch/i386/kernel/irq.c Fri Nov 14 11:18:07 2003
@@ -259,16 +259,16 @@
}
}

-static int noirqdebug;
+static int irqdebug;

-static int __init noirqdebug_setup(char *str)
+static int __init irqdebug_setup(char *str)
{
- noirqdebug = 1;
- printk("IRQ lockup detection disabled\n");
+ irqdebug = 1;
+ printk("IRQ lockup detection enabled\n");
return 1;
}

-__setup("noirqdebug", noirqdebug_setup);
+__setup("irqdebug", irqdebug_setup);

/*
* If 99,900 of the previous 100,000 interrupts have not been handled then
@@ -484,7 +484,7 @@
spin_unlock(&desc->lock);
action_ret = handle_IRQ_event(irq, &regs, action);
spin_lock(&desc->lock);
- if (!noirqdebug)
+ if (irqdebug)
note_interrupt(irq, desc, action_ret);
if (likely(!(desc->status & IRQ_PENDING)))
break;
diff -Nru a/arch/ia64/kernel/irq.c b/arch/ia64/kernel/irq.c
--- a/arch/ia64/kernel/irq.c Fri Nov 14 11:18:07 2003
+++ b/arch/ia64/kernel/irq.c Fri Nov 14 11:18:07 2003
@@ -283,16 +283,16 @@
}
}

-static int noirqdebug;
+static int irqdebug;

-static int __init noirqdebug_setup(char *str)
+static int __init irqdebug_setup(char *str)
{
- noirqdebug = 1;
- printk("IRQ lockup detection disabled\n");
+ irqdebug = 1;
+ printk("IRQ lockup detection enabled\n");
return 1;
}

-__setup("noirqdebug", noirqdebug_setup);
+__setup("irqdebug", irqdebug_setup);

/*
* If 99,900 of the previous 100,000 interrupts have not been handled then
@@ -457,7 +457,7 @@
desc->handler->ack(irq);
action_ret = handle_IRQ_event(irq, regs, desc->action);
desc->handler->end(irq);
- if (!noirqdebug)
+ if (irqdebug)
note_interrupt(irq, desc, action_ret);
} else {
spin_lock(&desc->lock);
@@ -504,7 +504,7 @@
spin_unlock(&desc->lock);
action_ret = handle_IRQ_event(irq, regs, action);
spin_lock(&desc->lock);
- if (!noirqdebug)
+ if (irqdebug)
note_interrupt(irq, desc, action_ret);
if (!(desc->status & IRQ_PENDING))
break;
diff -Nru a/arch/mips/kernel/irq.c b/arch/mips/kernel/irq.c
--- a/arch/mips/kernel/irq.c Fri Nov 14 11:18:07 2003
+++ b/arch/mips/kernel/irq.c Fri Nov 14 11:18:07 2003
@@ -191,16 +191,16 @@
}
}

-static int noirqdebug;
+static int irqdebug;

-static int __init noirqdebug_setup(char *str)
+static int __init irqdebug_setup(char *str)
{
- noirqdebug = 1;
- printk("IRQ lockup detection disabled\n");
+ irqdebug = 1;
+ printk("IRQ lockup detection enabled\n");
return 1;
}

-__setup("noirqdebug", noirqdebug_setup);
+__setup("irqdebug", irqdebug_setup);

/*
* If 99,900 of the previous 100,000 interrupts have not been handled then
@@ -396,7 +396,7 @@
spin_unlock(&desc->lock);
action_ret = handle_IRQ_event(irq, regs, action);
spin_lock(&desc->lock);
- if (!noirqdebug)
+ if (irqdebug)
note_interrupt(irq, desc, action_ret);
if (likely(!(desc->status & IRQ_PENDING)))
break;
diff -Nru a/arch/ppc64/kernel/irq.c b/arch/ppc64/kernel/irq.c
--- a/arch/ppc64/kernel/irq.c Fri Nov 14 11:18:07 2003
+++ b/arch/ppc64/kernel/irq.c Fri Nov 14 11:18:07 2003
@@ -416,16 +416,16 @@
}
}

-static int noirqdebug;
+static int irqdebug;

-static int __init noirqdebug_setup(char *str)
+static int __init irqdebug_setup(char *str)
{
- noirqdebug = 1;
- printk("IRQ lockup detection disabled\n");
+ irqdebug = 1;
+ printk("IRQ lockup detection enabled\n");
return 1;
}

-__setup("noirqdebug", noirqdebug_setup);
+__setup("irqdebug", irqdebug_setup);

/*
* If 99,900 of the previous 100,000 interrupts have not been handled then
@@ -538,7 +538,7 @@
spin_unlock(&desc->lock);
action_ret = handle_irq_event(irq, regs, action);
spin_lock(&desc->lock);
- if (!noirqdebug)
+ if (irqdebug)
note_interrupt(irq, desc, action_ret);
if (likely(!(desc->status & IRQ_PENDING)))
break;

--
Thanks

Jack Steiner ([email protected]) 651-683-5302
Principal Engineer SGI - Silicon Graphics, Inc.


2003-11-14 17:53:12

by Andrew Morton

[permalink] [raw]
Subject: Re: hot cache line due to note_interrupt()

Jack Steiner <[email protected]> wrote:
>
> Probably too late for 2.6.0

Hard to see how it can break anything; I'll queue it up, thanks.

> but here is a patch that disables noirqdebug:

No update to Documentation/kernel-parameters.txt! No cookie for you.

2003-11-14 18:10:30

by David Mosberger

[permalink] [raw]
Subject: Re: hot cache line due to note_interrupt()

>>>>> On Fri, 14 Nov 2003 11:45:35 -0600, Jack Steiner <[email protected]> said:

Jack> Probably too late for 2.6.0, but here is a patch that disables
Jack> noirqdebug:

The ia64-portion looks good to me. Andrew, if you queue this patch,
please feel free to include the ia64-portion.

Thanks,

--david

2003-11-14 18:18:13

by Linus Torvalds

[permalink] [raw]
Subject: Re: hot cache line due to note_interrupt()


On Fri, 14 Nov 2003, Jack Steiner wrote:
> >
> > The note_interrupt() stuff is only useful for diagnosing mysterious lockups
> > (and hasn't proven useful for that, actually). It should be disabled for
> > production use.
>
> Probably too late for 2.6.0, but here is a patch that disables noirqdebug:

Why do people hate irqdebug?

Sure, there's some overhead on big machines, but there aren't that many of
them, and you can just disable them for those.

The fact is, irqdebug _has_ resulted in a few reports where instead of a
silent and total lock-up, the kernel just said "I'll disable this irq" and
the machine continued limping along.

Which is a huge improvement, since otherwise especially newcomers really
have nothing to even report. Just "it locked up at boot" is not very
useful, while a "it said nobody cared about irq5 and now my PCMCIA card
doesn't work" is a _huge_ cluebat.

It doesn't catch everything, but it doesn't really cost anything on any
normal machines either..

Linus

2003-11-14 18:11:58

by tip-bot for Jack Steiner

[permalink] [raw]
Subject: Re: hot cache line due to note_interrupt()

On Fri, Nov 14, 2003 at 09:57:37AM -0800, Andrew Morton wrote:
> Jack Steiner <[email protected]> wrote:
> >
> > Probably too late for 2.6.0
>
> Hard to see how it can break anything; I'll queue it up, thanks.
>
> > but here is a patch that disables noirqdebug:
>
> No update to Documentation/kernel-parameters.txt! No cookie for you.

Do I get my cookie now :-)


# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.1439 -> 1.1440
# Documentation/kernel-parameters.txt 1.31 -> 1.32
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/11/14 [email protected] 1.1440
# Update documentation for irqdebug.
# --------------------------------------------
#
diff -Nru a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
--- a/Documentation/kernel-parameters.txt Fri Nov 14 12:09:40 2003
+++ b/Documentation/kernel-parameters.txt Fri Nov 14 12:09:40 2003
@@ -427,6 +427,9 @@
ips= [HW,SCSI] Adaptec / IBM ServeRAID controller
See header of drivers/scsi/ips.c.

+ irqdebug [IA-32] Enables the code which attempts to detect and
+ disable unhandled interrupt sources.
+
isapnp= [ISAPNP]
Format: <RDP>, <reset>, <pci_scan>, <verbosity>

@@ -624,9 +627,6 @@
no-hlt [BUGS=IA-32] Tells the kernel that the hlt
instruction doesn't work correctly and not to
use it.
-
- noirqdebug [IA-32] Disables the code which attempts to detect and
- disable unhandled interrupt sources.

noisapnp [ISAPNP] Disables ISA PnP code.

--
Thanks

Jack Steiner ([email protected]) 651-683-5302
Principal Engineer SGI - Silicon Graphics, Inc.


2003-11-14 19:14:23

by Anton Blanchard

[permalink] [raw]
Subject: Re: hot cache line due to note_interrupt()


> Probably too late for 2.6.0, but here is a patch that disables noirqdebug:

Why dont you just disable it during boot somewhere in sgi ia64 specific
code? It doesnt seem right to disable this after all the driver effort
it took to make it work.

And yes Im a paid up member of the "we build stupidly big machines"
club. I'll disable where applicable in ppc64 specific code.

Anton

2003-11-14 19:28:58

by Mike Fedyk

[permalink] [raw]
Subject: Re: hot cache line due to note_interrupt()

On Sat, Nov 15, 2003 at 06:10:46AM +1100, Anton Blanchard wrote:
>
> > Probably too late for 2.6.0, but here is a patch that disables noirqdebug:
>
> Why dont you just disable it during boot somewhere in sgi ia64 specific
> code? It doesnt seem right to disable this after all the driver effort
> it took to make it work.
>
> And yes Im a paid up member of the "we build stupidly big machines"
> club. I'll disable where applicable in ppc64 specific code.

Since the problem only starts when the cache line bounces between the CPUs,
then it might be better to just disable if more than 150 processors are
detected?

If it's a layering voilation to disable this during SMP init, then maybe
there can be a smp-quirks.c file where exceptions can be put.

2003-11-15 05:18:28

by Jeff Garzik

[permalink] [raw]
Subject: Re: hot cache line due to note_interrupt()

Linus Torvalds wrote:
> The fact is, irqdebug _has_ resulted in a few reports where instead of a
> silent and total lock-up, the kernel just said "I'll disable this irq" and
> the machine continued limping along.


I strongly agree.

There are two huge reasons I develop and debug drivers solely in 2.6
now: kksymoops and irqdebug. Both have helped a great deal in tracking
down problems. Indeed, screaming irqs that lead to lockups are
instantly detected in 2.6.

But hey... I'm just a developer. Who can say whether it should be on by
default?

Jeff