2002-10-17 03:26:54

by Doug Ledford

Subject: 2.5.43 IO-APIC bug and spinlock deadlock

IO-APIC bug: regular kernel, UP, no IO-APIC or APIC on UP enabled, compile
fails (does *everyone* run SMP or at least UP + APIC now?)

spinlock deadlock: run an smp kernel on a up machine. On mine here all I
have to do is try to boot to multiuser mode, it won't make it through the
startup scripts before it locks up by trying to reenter common_interrupt
on the only CPU. Seems like an SMP kernel on UP hardware doesn't disable
interrupts properly maybe? I get task lists via alt-sysrq when the
machine should be hardlocked I think. Anyway, this is what has been
tricking me into thinking I had an IDE problem. IDE is innocent, it's the
core interrupt handling code.

Back to work on scsi stuff now that I have a decently running 2.5.43
system, I'll let someone else deal with these.

--
Doug Ledford <[email protected]> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606


2002-10-17 04:22:38

by Doug Ledford

Subject: Re: 2.5.43 IO-APIC bug and spinlock deadlock

On Wed, Oct 16, 2002 at 11:33:02PM -0400, Doug Ledford wrote:
> IO-APIC bug: regular kernel, UP, no IO-APIC or APIC on UP enabled, compile
> fails (does *everyone* run SMP or at least UP + APIC now?)

OK, this is real.

> spinlock deadlock: run an smp kernel on a up machine. On mine here all I
> have to do is try to boot to multiuser mode, it won't make it through the

This turned out to be a red herring. UP on UP failed for me too. I did
finally track it down to the area of the problem. That will go out in a
separate email with a different subject so it gets the right person's
attention.

--
Doug Ledford <[email protected]> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606

2002-10-17 04:34:36

by Andrew Morton

Subject: Re: 2.5.43 IO-APIC bug and spinlock deadlock

Doug Ledford wrote:
>
> On Wed, Oct 16, 2002 at 11:33:02PM -0400, Doug Ledford wrote:
> > IO-APIC bug: regular kernel, UP, no IO-APIC or APIC on UP enabled, compile
> > fails (does *everyone* run SMP or at least UP + APIC now?)
>
> OK, this is real.
>

Linus has merged a patch for this. Does it work for you? I don't
think you've sent us any error output.


include/asm-i386/apic.h | 4 ++--
include/asm-i386/smp.h | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

--- 2.5.43/include/asm-i386/smp.h~mpparse-fix Tue Oct 15 21:26:18 2002
+++ 2.5.43-akpm/include/asm-i386/smp.h Tue Oct 15 21:26:31 2002
@@ -37,6 +37,7 @@
 #endif /* CONFIG_CLUSTERED_APIC */
 #endif

+#define BAD_APICID 0xFFu
 #ifdef CONFIG_SMP
 #ifndef __ASSEMBLY__

@@ -65,7 +66,6 @@ extern void zap_low_mappings (void);
  * the real APIC ID <-> CPU # mapping.
  */
 #define MAX_APICID 256
-#define BAD_APICID 0xFFu
 extern volatile int cpu_to_physical_apicid[NR_CPUS];
 extern volatile int physical_apicid_to_cpu[MAX_APICID];
 extern volatile int cpu_to_logical_apicid[NR_CPUS];
--- 2.5.43/include/asm-i386/apic.h~mpparse-fix Tue Oct 15 21:34:03 2002
+++ 2.5.43-akpm/include/asm-i386/apic.h Tue Oct 15 21:34:05 2002
@@ -7,8 +7,6 @@
 #include <asm/apicdef.h>
 #include <asm/system.h>

-#ifdef CONFIG_X86_LOCAL_APIC
-
 #define APIC_DEBUG 0

 #if APIC_DEBUG
@@ -17,6 +15,8 @@
 #define Dprintk(x...)
 #endif

+#ifdef CONFIG_X86_LOCAL_APIC
+
 /*
  * Basic functions accessing APICs.
  */

.
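
For readers following the patch, here is a minimal standalone sketch of
what the two hunks appear to do: a name defined inside a config guard is
invisible to builds with that option off, so moving the definition ahead
of the guard lets code built in every configuration keep compiling. The
names CONFIG_SMP, CONFIG_X86_LOCAL_APIC, BAD_APICID, MAX_APICID, APIC_DEBUG
and Dprintk come from the patch; apicid_is_valid() is a hypothetical helper
invented for illustration, and none of this is actual kernel code.

```c
#include <stdio.h>

/* Standalone illustration, not kernel code. Compile without -DCONFIG_SMP and
 * without -DCONFIG_X86_LOCAL_APIC to mimic the UP, no-APIC configuration. */

/* smp.h after the patch: BAD_APICID is defined before the CONFIG_SMP guard. */
#define BAD_APICID 0xFFu
#ifdef CONFIG_SMP
#define MAX_APICID 256          /* SMP-only definitions stay guarded */
#endif

/* apic.h after the patch: the debug macro is set up before the
 * CONFIG_X86_LOCAL_APIC guard. C99 variadic syntax is used here in place of
 * the GNU-style Dprintk(x...) from the header. */
#define APIC_DEBUG 0
#if APIC_DEBUG
#define Dprintk(...) printf(__VA_ARGS__)
#else
#define Dprintk(...)
#endif

/* Hypothetical caller that is built in every configuration. It compiles on
 * UP only because both names above are visible outside their guards. */
static int apicid_is_valid(unsigned int apicid)
{
    Dprintk("checking APIC ID %u\n", apicid);
    return apicid != BAD_APICID;
}
```

Moving the BAD_APICID definition back inside the #ifdef CONFIG_SMP block
and building without -DCONFIG_SMP should make apicid_is_valid() fail to
compile, which is the kind of breakage the patch is aimed at.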

2002-10-17 04:57:37

by Zwane Mwaikambo

Subject: Re: 2.5.43 IO-APIC bug and spinlock deadlock

On Wed, 16 Oct 2002, Doug Ledford wrote:

> IO-APIC bug: regular kernel, UP, no IO-APIC or APIC on UP enabled, compile
> fails (does *everyone* run SMP or at least UP + APIC now?)
>
> spinlock deadlock: run an smp kernel on a up machine. On mine here all I
> have to do is try to boot to multiuser mode, it won't make it through the
> startup scripts before it locks up by trying to reenter common_interrupt
> on the only CPU. Seems like an SMP kernel on UP hardware doesn't disable
> interrupts properly maybe? I get task lists via alt-sysrq when the
> machine should be hardlocked I think. Anyway, this is what has been
> tricking me into thinking I had an IDE problem. IDE is innocent, it's the
> core interrupt handling code.

Hmm, I'm running an SMP kernel on a UP machine (only a local APIC present)
and the machine is currently running X. I've also gotten an SMP kernel on a
UP box without any APICs to go multiuser (currently 30 minutes uptime).

Zwane
--
function.linuxpower.ca


2002-10-17 05:06:57

by Doug Ledford

Subject: Re: 2.5.43 IO-APIC bug and spinlock deadlock

On Wed, Oct 16, 2002 at 09:40:24PM -0700, Andrew Morton wrote:
> Doug Ledford wrote:
> >
> > On Wed, Oct 16, 2002 at 11:33:02PM -0400, Doug Ledford wrote:
> > > IO-APIC bug: regular kernel, UP, no IO-APIC or APIC on UP enabled, compile
> > > fails (does *everyone* run SMP or at least UP + APIC now?)
> >
> > OK, this is real.
> >
>
> Linus has merged a patch for this. Does it work for you? I don't
> think you've sent us any error output.
>
>
> include/asm-i386/apic.h | 4 ++--
> include/asm-i386/smp.h | 2 +-
> 2 files changed, 3 insertions(+), 3 deletions(-)

No, tried that, didn't work. Turn off SMP in your config and also turn
off APIC support entirely, that's when it breaks the compile.

--
Doug Ledford <[email protected]> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606

2002-10-17 08:23:38

by Mikael Pettersson

Subject: Re: 2.5.43 IO-APIC bug and spinlock deadlock

Doug Ledford writes:
> On Wed, Oct 16, 2002 at 09:40:24PM -0700, Andrew Morton wrote:
> > Doug Ledford wrote:
> > >
> > > On Wed, Oct 16, 2002 at 11:33:02PM -0400, Doug Ledford wrote:
> > > > IO-APIC bug: regular kernel, UP, no IO-APIC or APIC on UP enabled, compile
> > > > fails (does *everyone* run SMP or at least UP + APIC now?)
> > >
> > > OK, this is real.
> > >
> >
> > Linus has merged a patch for this. Does it work for you? I don't
> > think you've sent us any error output.
> >
> >
> > include/asm-i386/apic.h | 4 ++--
> > include/asm-i386/smp.h | 2 +-
> > 2 files changed, 3 insertions(+), 3 deletions(-)
>
> No, tried that, didn't work. Turn off SMP in your config and also turn
> off APIC support entirely, that's when it breaks the compile.

Ah, that rings a bell. Does 'grep MPPARSE' find a match in your .config
even though SMP and *APIC are disabled? That's a scripts/Configure bug:
it enables MPPARSE because CONFIG_X86_LOCAL_APIC was enabled at the start
of this Configure run, even though CONFIG_X86_LOCAL_APIC was disabled later.
Another 'make oldconfig' fixes it.

2002-10-18 19:33:36

by Bill Davidsen

Subject: Re: 2.5.43 IO-APIC bug and spinlock deadlock

On Wed, 16 Oct 2002, Doug Ledford wrote:

> IO-APIC bug: regular kernel, UP, no IO-APIC or APIC on UP enabled, compile
> fails (does *everyone* run SMP or at least UP + APIC now?)
>
> spinlock deadlock: run an smp kernel on a up machine. On mine here all I
> have to do is try to boot to multiuser mode, it won't make it through the
> startup scripts before it locks up by trying to reenter common_interrupt
> on the only CPU. Seems like an SMP kernel on UP hardware doesn't disable
> interrupts properly maybe? I get task lists via alt-sysrq when the
> machine should be hardlocked I think. Anyway, this is what has been
> tricking me into thinking I had an IDE problem. IDE is innocent, it's the
> core interrupt handling code.

Doug, I noted a similar bug back around 2.5.38, which went away if I booted
the SMP kernel on a uniprocessor with "nosmp" in the boot parameters. If you
are curious I'd love to know if that's related, and I'm sure someone looking
at the problem would like to know as well.

The 2.4 kernels seem fine in that regard. I tested that the hard way when
I got a batch of Xeons with the lifespan of a mayfly: after one died and
the system was rebooted, every one came up cleanly with only one CPU.
However, most of the SMP hardware was still there, and I boot "noapic"
because it seems to help uptime.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.