2001-04-01 04:15:27

by Simon Garner

Subject: Asus CUV4X-D, 2.4.3 crashes at boot

Hi,

I've compiled kernel 2.4.3 on the following RH7 system, and I'm now getting
random crashes at boot, during IO-APIC initialisation. Random meaning that
sometimes it boots fine, other times it doesn't, and it hangs in different
places (but always around IO-APIC stuff). It almost always hangs after a
cold boot - if I do a Ctrl+Alt+Del then it will usually boot up OK.

System: Asus CUV4X-D motherboard, Dual P3 800EB.

The last thing I see on the screen when it hangs is, for example:



CPU1: Intel Pentium III (Coppermine) stepping 06
CPU has booted.
Before bogomips.
Total of 2 processors activated (3207.98 BogoMIPS).
Before bogocount - setting activated=1.
Boot done.
ENABLING IO-APIC IRQs
...changing IO-APIC physical APIC ID to 2 ... ok.
Synchronizing Arb IDs.
...TIMER: vector=49 pin1=2 pin2=0



Sometimes it gets a little further, but it's always somewhere near the
IO-APIC stuff.

When it does boot, I get:



CPU1: Intel Pentium III (Coppermine) stepping 06
CPU has booted.
Before bogomips.
Total of 2 processors activated (3207.98 BogoMIPS).
Before bogocount - setting activated=1.
Boot done.
ENABLING IO-APIC IRQs
...changing IO-APIC physical APIC ID to 2 ... ok.
Synchronizing Arb IDs.
init IO_APIC IRQs
IO-APIC (apicid-pin) 2-5, 2-10, 2-11, 2-13, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=49 pin1=2 pin2=0
number of MP IRQ sources: 17.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................

IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
.... register #01: 00178011
....... : max redirection entries: 0017
....... : IO APIC version: 0011
WARNING: unexpected IO-APIC, please mail
to [email protected]
.... register #02: 00000000
....... : arbitration: 00
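
As a reading aid for the dump above, here is a minimal sketch of how the value of register #01 (00178011) splits into the fields the kernel prints. The variable names are illustrative only. Whether the nonzero middle byte is what prompts the "unexpected IO-APIC" warning is an assumption; nothing in this thread confirms it.

```shell
# Split the IO-APIC register #01 value from the dump into byte fields.
reg=$((0x00178011))

version=$(( reg & 0xff ))            # bits 0-7:   IO APIC version
middle=$(( (reg >> 8) & 0xff ))      # bits 8-15:  not decoded in the dump above
entries=$(( (reg >> 16) & 0xff ))    # bits 16-23: max redirection entries

# Print each field as two hex digits, matching the dump's notation.
printf 'version=%02x middle=%02x entries=%02x\n' "$version" "$middle" "$entries"
```

The version and max redirection entries fields agree with the 0011 and 0017 shown in the dump.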



Full dmesg output:
http://www.expio.co.nz/~sgarner/orion/smp/dmesg.txt
My kernel .config:
http://www.expio.co.nz/~sgarner/orion/smp/config.txt
Output from lspci -xx:
http://www.expio.co.nz/~sgarner/orion/smp/lspcixx.txt


Any ideas?


Thanks in advance,

Simon Garner


2001-04-01 05:14:48

by Allen Campbell

Subject: Re: Asus CUV4X-D, 2.4.3 crashes at boot

On Sun, Apr 01, 2001 at 04:15:38PM +1200, Simon Garner wrote:
> Hi,
>
> I've compiled kernel 2.4.3 on the following RH7 system, and I'm now getting
> random crashes at boot, during IO-APIC initialisation. Random meaning that
> sometimes it boots fine, other times it doesn't, and it hangs in different
> places (but always around IO-APIC stuff). It almost always hangs after a
> cold boot - if I do a Ctrl+Alt+Del then it will usually boot up OK.
>
> System: Asus CUV4X-D motherboard, Dual P3 800EB.
>
> The last thing I see on the screen when it hangs is, for example:
[snip]

I've seen the exact same behavior with my CUV4X-D (2x1GHz) under
2.4.2 (debian woody). In addition, the kernel would sometimes hang
around NMI watchdog enable. At least, I think it's trying to
`enable'. The hang would occur around 50% of boot attempts. Once
booted, everything was stable. A non-SMP 2.4.2 kernel (no IO-APIC
either, sorry, didn't test that) always booted without hangs.

Strangely, (happily for me,) the boot hangs stopped with 2.4.3.
I've booted maybe 10 times (hot and cold) since I built 2.4.3 and
I've had no hangs. When I get back to the box, I'll try booting
a few dozen more times and see if I can confirm your observation.

--
Allen Campbell
[email protected]

2001-04-01 09:18:51

by Simon Garner

Subject: Re: Asus CUV4X-D, 2.4.3 crashes at boot

From: "Allen Campbell" <[email protected]>

> I've seen the exact same behavior with my CUV4X-D (2x1GHz) under
> 2.4.2 (debian woody). In addition, the kernel would sometimes hang
> around NMI watchdog enable. At least, I think it's trying to
> `enable'. The hang would occur around 50% of boot attempts. Once
> booted, everything was stable. A non-SMP 2.4.2 kernel (no IO-APIC
> either, sorry, didn't test that) always booted without hangs.

Yep, sounds like the same problem.


>
> Strangely, (happily for me,) the boot hangs stopped with 2.4.3.
> I've booted maybe 10 times (hot and cold) since I built 2.4.3 and
> I've had no hangs. When I get back to the box, I'll try booting
> a few dozen more times and see if I can confirm your observation.
>

Please do test it. I think you'll find the problem is still very much
present.


Cheers

Simon Garner

2001-04-01 09:48:23

by Allen Campbell

Subject: Re: Asus CUV4X-D, 2.4.3 crashes at boot

On Sun, Apr 01, 2001 at 09:18:25PM +1200, Simon Garner wrote:
> From: "Allen Campbell" <[email protected]>
>
> > I've seen the exact same behavior with my CUV4X-D (2x1GHz) under
> > 2.4.2 (debian woody). In addition, the kernel would sometimes hang
> > around NMI watchdog enable. At least, I think it's trying to
> > `enable'. The hang would occur around 50% of boot attempts. Once
> > booted, everything was stable. A non-SMP 2.4.2 kernel (no IO-APIC
> > either, sorry, didn't test that) always booted without hangs.
>
> Yep, sounds like the same problem.
>
>
> >
> > Strangely, (happily for me,) the boot hangs stopped with 2.4.3.
> > I've booted maybe 10 times (hot and cold) since I built 2.4.3 and
> > I've had no hangs. When I get back to the box, I'll try booting
> > a few dozen more times and see if I can confirm your observation.
> >
>
> Please do test it. I think you'll find the problem is still very much
> present.

Yeah, still there. Cold boot only.

2001-04-01 09:57:04

by Mikael Pettersson

Subject: Re: Asus CUV4X-D, 2.4.3 crashes at boot

Simon Garner wrote:

>I've compiled kernel 2.4.3 on the following RH7 system, and I'm now getting
>random crashes at boot, during IO-APIC initialisation. Random meaning that
>sometimes it boots fine, other times it doesn't, and it hangs in different
>places (but always around IO-APIC stuff). It almost always hangs after a
>cold boot - if I do a Ctrl+Alt+Del then it will usually boot up OK.
>
>System: Asus CUV4X-D motherboard, Dual P3 800EB.
>...
>Any ideas?

Boot with "nmi_watchdog=0" as a boot parameter. Does it work now?

Some people have reported before here that the IO-APIC driven NMI
watchdog itself can cause boot-time hangs.

/Mikael

2001-04-01 10:04:04

by Simon Garner

Subject: Re: Asus CUV4X-D, 2.4.3 crashes at boot

From: "Mikael Pettersson" <[email protected]>

> Boot with "nmi_watchdog=0" as a boot parameter. Does it work now?
>
> Some people have reported before here that the IO-APIC driven NMI
> watchdog itself can cause boot-time hangs.
>
> /Mikael


Thanks, but I do not have watchdog support compiled into the kernel.


2001-04-01 10:11:14

by David Weinehall

Subject: Re: Asus CUV4X-D, 2.4.3 crashes at boot

On Sun, Apr 01, 2001 at 10:04:17PM +1200, Simon Garner wrote:
> From: "Mikael Pettersson" <[email protected]>
>
> > Boot with "nmi_watchdog=0" as a boot parameter. Does it work now?
> >
> > Some people have reported before here that the IO-APIC driven NMI
> > watchdog itself can cause boot-time hangs.
> >
> > /Mikael
>
>
> Thanks, but I do not have watchdog support compiled into the kernel.

Doesn't matter. The NMI-watchdog tries to detect SMP-lockups, and is
always present. Unless you specifically disable it on boot.


/David Weinehall
_ _
// David Weinehall <[email protected]> /> Northern lights wander \\
// Project MCA Linux hacker // Dance across the winter sky //
\> http://www.acc.umu.se/~tao/ </ Full colour fire </

2001-04-01 12:52:41

by Keith Owens

Subject: Re: Asus CUV4X-D, 2.4.3 crashes at boot

On Sun, 1 Apr 2001 12:09:18 +0200,
David Weinehall <[email protected]> wrote:
>On Sun, Apr 01, 2001 at 10:04:17PM +1200, Simon Garner wrote:
>> Thanks, but I do not have watchdog support compiled into the kernel.
>
>Doesn't matter. The NMI-watchdog tries to detect SMP-lockups, and is
>always present. Unless you specifically disable it on boot.

Not any more. In 2.4.3-ac* the default is no watchdog and it must be
specifically enabled at boot.

2001-04-01 23:50:26

by Simon Garner

Subject: Re: Asus CUV4X-D, 2.4.3 crashes at boot

From: "Keith Owens" <[email protected]>

> >Doesn't matter. The NMI-watchdog tries to detect SMP-lockups, and is
> >always present. Unless you specifically disable it on boot.
>
> Not any more. In 2.4.3-ac* the default is no watchdog and it must be
> specifically enabled at boot.
>


nmi_watchdog=0 didn't help - the above would explain why.

Any more ideas? My expensive server is basically useless because of this. :(

2001-04-02 00:48:09

by Simon Garner

Subject: Re: Asus CUV4X-D, 2.4.3 crashes at boot

From: "Jeff Garzik" <[email protected]>

> (private reply, because I have lost discussion context)
>
> Have you tried booting with 'noapic'?
>
>


Thanks Jeff, this seems to fix the problem, and also fixes my problem with
the aic7xxx scsi driver ABORTing multiple times at startup (which I presumed
was unrelated).

However, the machine now crashes at "Configuring Kernel Parameters" during
rc initialisation:


Welcome to Red Hat Linux
Press 'I' for interactive startup

Mounting /proc filesystem... [ OK ]
Configuring Kernel Parameters...


This is if I type "linux noapic" at the Lilo boot prompt.

Also, what do I lose by running with noapic?


Thanks

Simon Garner

2001-04-02 02:59:17

by Simon Garner

Subject: Re: Asus CUV4X-D, 2.4.3 crashes at boot

Hi all,

>
> However, the machine now crashes at "Configuring Kernel Parameters" during
> rc initialisation:
>
>
> Welcome to Red Hat Linux
> Press 'I' for interactive startup
>
> Mounting /proc filesystem... [ OK ]
> Configuring Kernel Parameters...
>
>
> This is if I type "linux noapic" at the Lilo boot prompt.
>
> Also, what do I lose by running with noapic?
>
>


Just discovered the above is not quite correct - it actually says [ OK ]
after Configuring Kernel Parameters, and crashes on the next line.

Reading through /etc/rc.d/rc.sysinit, the next line is where it sets the
system clock. If I comment out the line:

/sbin/hwclock $CLOCKFLAGS

Then the system will boot OK with 'noapic'. So presumably the system RTC is
not accessed in an SMP-compatible way without APIC.
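
A less drastic alternative to commenting the line out would be to skip hwclock only when the kernel was actually booted with 'noapic'. The following is a minimal sketch, not a tested change to rc.sysinit: the helper name has_boot_param is hypothetical, and it assumes /proc is already mounted so that /proc/cmdline can be read.

```shell
# Hypothetical helper: succeeds if the exact word $2 appears in the
# space-separated kernel command line passed as $1.
has_boot_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
    esac
    return 1
}

# Sketch of how the rc.sysinit line could be guarded (left as a comment
# so that this block does not touch the RTC when run):
#   if ! has_boot_param "$(cat /proc/cmdline)" noapic; then
#       /sbin/hwclock $CLOCKFLAGS
#   fi
```

Matching on whole words keeps a parameter that merely contains the string "noapic" from triggering the skip.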

Anyway, I'm not too happy about having to run without APIC - seems more of a
workaround than a fix. I'm happy to test patches etc if anyone has any
ideas - this problem I presume affects all motherboards using the VIA 694XDP
chipset.


Thanks in advance,

Simon Garner

2001-04-02 22:27:30

by Alan Cox

Subject: Re: Asus CUV4X-D, 2.4.3 crashes at boot

> I've seen the exact same behavior with my CUV4X-D (2x1GHz) under
> 2.4.2 (debian woody). In addition, the kernel would sometimes hang
> around NMI watchdog enable. At least, I think it's trying to

Known problem. That's one reason why -ac trees had nmi watchdog turned off.

2001-04-02 22:40:50

by Simon Garner

Subject: Re: Asus CUV4X-D, 2.4.3 crashes at boot

From: "Alan Cox" <[email protected]>

> > I've seen the exact same behavior with my CUV4X-D (2x1GHz) under
> > 2.4.2 (debian woody). In addition, the kernel would sometimes hang
> > around NMI watchdog enable. At least, I think it's trying to
>
> Known problem. That's one reason why -ac trees had nmi watchdog turned off.


It still crashes with nmi_watchdog turned off.

Running with noapic fixes it but then the system crashes if you access the
RTC with hwclock (and probably creates a hundred other problems...).

How can I get this chipset/motherboard supported properly under Linux? I'm
happy to test patches etc. on the box. *pleading*


Cheers

Simon Garner

2001-04-03 06:48:51

by Allen Campbell

Subject: Re: Asus CUV4X-D, 2.4.3 crashes at boot

On Tue, Apr 03, 2001 at 10:40:36AM +1200, Simon Garner wrote:
> From: "Alan Cox" <[email protected]>
>
> > > I've seen the exact same behavior with my CUV4X-D (2x1GHz) under
> > > 2.4.2 (debian woody). In addition, the kernel would sometimes hang
> > > around NMI watchdog enable. At least, I think it's trying to
> >
> > Known problem. That's one reason why -ac trees had nmi watchdog turned off.
>
> It still crashes with nmi_watchdog turned off.
>
> Running with noapic fixes it but then the system crashes if you access the
> RTC with hwclock (and probably creates a hundred other problems...).
>
> How can I get this chipset/motherboard supported properly under Linux? I'm
> happy to test patches etc. on the box. *pleading*

Patience is likely to be effective. The chipset isn't exactly rare,
being on SMP boards from Gigabyte, MSI, Tyan and Asus, and likely
others. I'm betting it will be fixed soon enough. UP and 2.2.x
kernels worked fine here if you're really desperate. OTOH, the
board is stable once you get past the boot problems... What sort
of production system needs frequent unattended boots?

Sorry about this, I just don't remember signing any paychecks for
what I know is likely to be a non-issue probably before the next
time I actually have to do something drastic, like reboot.

--
Allen Campbell | Lurking at the bottom of the
[email protected] | gravity well, getting old.

2001-04-03 06:54:41

by Simon Garner

Subject: Re: Asus CUV4X-D, 2.4.3 crashes at boot

From: "Allen Campbell" <[email protected]>

> Patience is likely to be effective. The chipset isn't exactly rare,
> being on SMP boards from Gigabyte, MSI, Tyan and Asus, and likely
> others. I'm betting it will be fixed soon enough. UP and 2.2.x
> kernels worked fine here if you're really desperate. OTOH, the
> board is stable once you get past the boot problems... What sort
> of production system needs frequent unattended boots?
>


I was planning to install the box as a colocated production webserver in 1-2
weeks' time.

I don't want to colocate a box that I cannot reboot, so I'll just have to
sit on it until it's fixed I guess.


Regards

Simon Garner