2003-11-11 19:47:43

by Julien Oster

Subject: A7N8X (Deluxe) Madness


Hello,

seriously, I'm pretty fed up with it.

I have an ASUS A7N8X Deluxe mainboard. Yeah, right, the one causing
serious trouble. I'm getting hard lockups all the time. No panic, no
message, no sysrq, no blinking cursor in the framebuffer. Gone for good.

I went through the mailing list archives and tried out many
things. This is how far I got:

With 2.6.0-test9, the machine locks up while booting or shortly
after. This is clearly connected to high IDE (PATA) load, since it
locks up every single time it runs an fsck. If I manage to boot
it (that is, if it doesn't run an fsck while booting), I can lock it
up immediately by running hdparm -t /dev/hda. I don't know what SATA
load would do on that kernel; I never got that far.

Specifying "noapic nolapic acpi=off pci=noacpi" helps; I got no
lockups. However, I don't like this because of the performance penalty
(I'll talk about this later).
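To check which of these workaround parameters a running kernel was actually booted with, one can look at /proc/cmdline. The following is a minimal Python sketch, not something from this thread: the helper name `active_workarounds` is hypothetical, and the parameter list is simply the "noapic nolapic acpi=off pci=noacpi" line quoted further down in this message.

```python
# Workaround boot parameters quoted in this thread (assumed list).
WORKAROUND_PARAMS = ("noapic", "nolapic", "acpi=off", "pci=noacpi")


def active_workarounds(cmdline: str) -> list[str]:
    """Return the workaround parameters present in a kernel command line.

    Parameters are matched as whole whitespace-separated tokens, so
    "acpi=off" does not match something like "acpi=offline".
    """
    tokens = set(cmdline.split())
    return [param for param in WORKAROUND_PARAMS if param in tokens]
```

For example, `active_workarounds(open("/proc/cmdline").read())` lists the ones the running kernel received.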

So one might suspect that something between APIC or ACPI (or both) and
the IDE controller is broken, with nothing to fix there; that's life.
Right? Wrong, because:

With 2.4.22-ac4 it actually works *better*. Not perfectly, but
better. I can achieve uptimes of up to *several days*. However, it still
locks up, sometimes after several days, sometimes a few minutes after
booting. But basically I can actually use my computer with
2.4.22-ac4. Strangely, the lockups don't seem to be connected to IDE
load with that kernel. When the machine locks up, it simply does,
without any apparent cause. I can create as much CPU, disk, network
or whatever load as I want, and all goes fine. Then I leave the
computer idle, come back, and it has crashed. I even have the
impression that it only crashes when it has no load at all. Frankly,
I can't remember it ever locking up while I was sitting in front of
the computer. Moving the mouse or typing seems to create enough load
to keep it from locking up?!

So, things are totally different between 2.6.0-test9 and
2.4.22-ac4. On that mainboard, 2.6.0-test9 can't handle even the
slightest IDE load. 2.4.22-ac4 doesn't care, runs for hours or days
and then locks up when it just gets bored or something similar.

The solution might look simple: why don't I just use 2.6.0-test9 with
the enormous "noapic nolapic acpi=off pci=noacpi" command line?
Because then my SATA performance is really painful compared to what I
can get with 2.4.22-ac4. A simple example with hdparm -t (I tried
other things too, but this one illustrates it well): with
2.4.22-ac4 I get an amazing 100 to 110 MB/s on the SATA RAID. With
2.6.0-test9 and the nasty command line, I get at most 40 MB/s. To feel
the difference, I just have to fire up Oracle and let it do some
I/O-intensive work.
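For a comparison that doesn't depend on hdparm, a rough sequential-read timing can be sketched in Python. This is an illustration under stated assumptions, not what produced the numbers above: it reads a fixed amount of data from a file or device node and reports MiB/s. Unlike `hdparm -t`, which is meant to measure reads without prior caching of data, this sketch does not drop any caches, so repeated runs over the same data can look much faster. The function name is hypothetical, and reading a raw device such as /dev/hda needs root.

```python
import time


def sequential_read_mb_per_s(path: str, total_mb: int = 64, chunk_kb: int = 1024) -> float:
    """Time a sequential read of up to total_mb MiB from path; return MiB/s."""
    chunk_size = chunk_kb * 1024
    limit = total_mb * 1024 * 1024
    done = 0
    start = time.perf_counter()
    # Unbuffered binary read so Python adds no buffering layer of its own;
    # the kernel page cache is still in play (see the caveat above).
    with open(path, "rb", buffering=0) as f:
        while done < limit:
            data = f.read(min(chunk_size, limit - done))
            if not data:
                break  # end of file or device reached before the limit
            done += len(data)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise RuntimeError("elapsed time too small to measure")
    return done / (1024 * 1024) / elapsed
```

Running it a few times against the same SATA device under each kernel gives a second, independent data point next to hdparm.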

Does anybody have an idea what it could be? It's just strange: both
kernels are unstable on that mainboard, but one is much more stable
than the other, and they lock up in completely different situations.

If this goes on, I'll begin to feel the urge to hunt down
ASUS and NVIDIA.

Well, I hope I could give you some useful information.

In great despair,
Julien


2003-11-11 19:55:37

by Maciej Żenczykowski

Subject: Re: A7N8X (Deluxe) Madness

I'd guess one is locking up due to hard disk load,
and the other is locking up due to automatic suspend/standby issues.
Can you verify that the ac kernel isn't locking up due to a 'screensaver'
type problem?

2003-11-11 20:18:25

by Julien Oster

Subject: Re: A7N8X (Deluxe) Madness

Maciej Zenczykowski writes:

Hello Maciej,

>> So, things are totally different between 2.6.0-test9 and
>> 2.4.22-ac4. 2.6.0-test9 doesn't like the slightest IDE load with that
>> mainboard at all. 2.4.22-ac4 doesn't care, runs for hours or for days
>> and then locks up when it just gets bored or something similar.

> I'd guess one is locking up due to hard disk load,
> and the other is locking up due to automatic suspend/standby issues.
> Can you verify that the ac kernel isn't locking up due to a 'screensaver'
> type problem?

Interesting question. I also thought about that one. However,
regarding X, the machine sometimes crashes before the X server
screensaver (nothing special there, just the built-in one that turns
the screen black) clears the screen, and sometimes afterwards. If
it crashes afterwards, I of course can't tell when it crashed, since
I can no longer see the clock on the screen.

And there's nothing else I can think of. I have set the spindown
time for the hard disks to "never" (for other reasons), and
I don't think any power saving features are enabled in the BIOS
setup. I'll check that. However, I'm afraid there really isn't any
screensaver or power saving feature in my system, except for the
standard X screensaver, which doesn't seem related to it.

Regards,
Julien

2003-11-11 20:09:24

by Erik Andersen

Subject: Re: A7N8X (Deluxe) Madness

On Tue Nov 11, 2003 at 08:47:38PM +0100, Julien Oster wrote:
>
> Hello,
>
> seriously, I'm pretty fed up with it.
>
> I have an ASUS A7N8X Deluxe mainboard. Yeah, right, that thing causing
> serious trouble. I'm getting hard lockups all the time. No panic, no
> message, no sysrq, no blinking cursor in the framebuffer. Gone for good.

Does it help if you go into the BIOS and set the IDE controller
to "Compatible Mode" rather than "Enhanced Mode"?

-Erik

--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--

2003-11-11 20:24:07

by Julien Oster

Subject: Re: A7N8X (Deluxe) Madness

Erik Andersen writes:

Hello Erik,

>> I have an ASUS A7N8X Deluxe mainboard. Yeah, right, that thing causing
>> serious trouble. I'm getting hard lockups all the time. No panic, no
>> message, no sysrq, no blinking cursor in the framebuffer. Gone for good.

> Does it help if you go into the BIOS and set the IDE controller
> to "Compatible Mode" rather than "Enhanced Mode"?

I'm sorry, but I just can't find that option... It's the newest BIOS
version, though.

Regards,
Julien

2003-11-11 20:26:10

by Maciej Żenczykowski

Subject: Re: A7N8X (Deluxe) Madness

> > I'd guess one is locking up due to hard disk load,
> > and the other is locking up due to automatic suspend/standby issues.
> > Can you verify that the ac kernel isn't locking up due to a 'screensaver'
> > type problem?
>
> Interesting question. I also thought about that one. However,
> regarding X, the machine sometimes crashes before the X Server
> screensaver (nothing special there, just the built in one that turns
> the screen black) is clearing the screen and sometimes afterwards. If
> it crashes afterwards, I can of course not see when it crashed, since
> I don't see the clock on the screen anymore.
>
> And there's nothing else which I could think of. I have resetted the
> spinout time for the harddisks to "never" (for different reasons) and
> I don't think that there's any power saving stuff enabled in BIOS
> setup. I'll check that. However, I'm afraid there really isn't any
> screensaver or powersaving thing within my system, of course for the
> standard X screensaver, which doesn't seem related to it.

Indeed; however, I didn't mean the X server's xscreensaver and family.
I meant the BIOS DPMS, the kernel console screen saver, and similar
functionality. I had this kind of problem with my desktop computer (it
locked solid when the screen was blanked) with some older kernel
version (around 2.4.9). I think kernel screen saving can be turned off
with some sort of escape code...

Indeed:
$ man console_codes
/timeout
gives:
ESC [ 9 ; n ]   where n is screen blank timeout in minutes
ESC [ 13 ]      to unblank
ESC [ 14 ; n ]  to set the VESA powerdown interval in minutes
So try something like
echo -e "\e[13]\e[9;10080]\e[14;10080]"
to make it blank after a week and see if it still locks.
You can also try turning off VESA/DPMS blanking in the BIOS.
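The same escape sequences can be generated programmatically. This is a minimal Python sketch built only from the three console_codes entries listed above; the function names and the default /dev/tty1 target are hypothetical, and writing to a virtual console needs the right permissions (the codes only affect the console they are written to). One caveat: setterm(1) documents the blank and powerdown intervals as 0 to 60 minutes, with 0 meaning never, so a value like 10080 may be capped rather than honored, and passing 0 is the more direct way to rule console blanking out.

```python
ESC = "\x1b"  # same byte that `echo -e "\e"` emits


def console_blank_codes(blank_minutes: int = 0, powerdown_minutes: int = 0) -> str:
    """Build the Linux console escape sequence from console_codes(4):

    ESC [ 13 ]      unblank the screen now
    ESC [ 9 ; n ]   set the screen blank timeout to n minutes
    ESC [ 14 ; n ]  set the VESA powerdown interval to n minutes
    """
    if blank_minutes < 0 or powerdown_minutes < 0:
        raise ValueError("intervals must be non-negative")
    return (
        f"{ESC}[13]"
        f"{ESC}[9;{blank_minutes}]"
        f"{ESC}[14;{powerdown_minutes}]"
    )


def apply_to_console(seq: str, tty: str = "/dev/tty1") -> None:
    """Write seq to a virtual console (hypothetical default; needs permission)."""
    with open(tty, "w") as console:
        console.write(seq)
```

`console_blank_codes(10080, 10080)` reproduces the echo example above (without the trailing newline), while `apply_to_console(console_blank_codes())` would try to disable blanking on tty1.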

Cheers,
MaZe.


2003-11-11 21:09:23

by Erik Andersen

Subject: Re: A7N8X (Deluxe) Madness

On Tue Nov 11, 2003 at 09:24:03PM +0100, Julien Oster wrote:
> Erik Andersen writes:
>
> Hello Erik,
>
> >> I have an ASUS A7N8X Deluxe mainboard. Yeah, right, that thing causing
> >> serious trouble. I'm getting hard lockups all the time. No panic, no
> >> message, no sysrq, no blinking cursor in the framebuffer. Gone for good.
>
> > Does it help if you go into the BIOS and set the IDE controller
> > to "Compatible Mode" rather than "Enhanced Mode"?
>
> I'm sorry, but I just can't find that option... it's the newest BIOS
> version, however?

I have an ASUS mb with that option, but I just checked
your manual and it indeed does not have that option.
Anyway, the problem I had was that my SATA ports
and all USB devices were sharing the same interrupt,
and the resulting interrupt storm was easy to see by
watching /proc/interrupts.
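An interrupt storm shows up as one IRQ line's counter climbing far faster than the others. Below is a minimal Python sketch of "watching /proc/interrupts", assuming the usual layout (IRQ label, a colon, one counter per CPU, then the controller type and device names). It samples the file twice and reports per-second rates; all function names are hypothetical.

```python
import time


def parse_interrupt_counts(text: str) -> dict[str, int]:
    """Map each /proc/interrupts label (e.g. "10", "NMI") to its total count."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # no colon: the "CPU0 CPU1 ..." header line
        total, seen = 0, False
        for field in rest.split():
            if not field.isdigit():
                break  # reached the controller type, e.g. "XT-PIC"
            total += int(field)  # one counter per CPU, summed
            seen = True
        if seen:
            counts[label.strip()] = total
    return counts


def interrupt_rates(before: dict[str, int], after: dict[str, int],
                    seconds: float) -> dict[str, float]:
    """Interrupts per second for each line present in the second sample."""
    return {irq: (n - before.get(irq, 0)) / seconds for irq, n in after.items()}


def sample_interrupt_rates(seconds: float = 1.0,
                           path: str = "/proc/interrupts") -> dict[str, float]:
    """Read path twice, `seconds` apart, and return per-IRQ rates."""
    with open(path) as f:
        before = parse_interrupt_counts(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_interrupt_counts(f.read())
    return interrupt_rates(before, after, seconds)
```

Sorting the result of `sample_interrupt_rates()` by value should put a storming line at the top.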

-Erik


2003-11-11 21:31:45

by Julien Oster

Subject: Re: A7N8X (Deluxe) Madness

Erik Andersen writes:

Hello Erik,

>> > Does it help if you go into the BIOS and set the IDE controller
>> > to "Compatible Mode" rather than "Enhanced Mode"?

> I have an ASUS mb with that option, but I just checked
> your manual and it indeed does not have that option.

Unfortunately, yes...

> Anyway, the problem I had was that I had my SATA ports
> as well as all usb devices sharing the same interrupt
> and the resulting interrupt storm was easily seen by
> watching /proc/interrupts

Well, I guess that may be the point. With APIC enabled, I have a lot
of interrupts available. Without APIC, there are only the ones that
have been available ever since the IBM AT. So an excerpt of
/proc/interrupts without APIC looks like this:

10: 224131 XT-PIC ide2, ide3, usb-ohci, usb-ohci, eth0, EMU10K1
11: 0 XT-PIC NVidia nForce2
14: 61649 XT-PIC ide0
15: 60954 XT-PIC ide1

As you see, IRQ 10 is really crowded. ide2 and ide3 are my
SATA channels, on USB there's my mouse and sometimes my mobile phone
or my pocket PC, eth0 is a fairly heavily used Ethernet card, and
EMU10K1 is my sound card... well, sometimes it's playing music.
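Finding the shared lines can also be automated. This is a minimal Python sketch, assuming the layout shown above, where the device names follow the controller type and are separated by commas (multi-word names such as "NVidia nForce2" are kept intact); `shared_irqs` is a hypothetical helper name.

```python
def shared_irqs(text: str) -> dict[str, list[str]]:
    """Return {irq: [devices]} for every /proc/interrupts line with >1 device."""
    shared = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        tokens = rest.split()
        i = 0
        while i < len(tokens) and tokens[i].isdigit():
            i += 1  # skip the per-CPU counters
        # Need a colon, at least one counter, a controller type and a device.
        if not sep or i == 0 or i + 1 >= len(tokens):
            continue
        # tokens[i] is the controller type (e.g. "XT-PIC"); the rest is the
        # comma-separated device list, which may contain multi-word names.
        joined = " ".join(tokens[i + 1:])
        devices = [d.strip() for d in joined.split(",") if d.strip()]
        if len(devices) > 1:
            shared[label.strip()] = devices
    return shared
```

Fed the four lines above, it reports only IRQ 10, with its six devices.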

And I just typed "ifconfig eth2 up" (I have a 4-port DEC network card
in my workstation); it's unused today, but just to see:

10: 233008 XT-PIC ide2, ide3, usb-ohci, usb-ohci, eth0, EMU10K1, eth2

Uh.

With ISA cards, a long time ago, I was able to select the interrupt for
each card myself, either through jumpers or later by using PnP. Is
there any such possibility for PCI, or do I just have to accept what
the kernel or the mainboard gives me?

Just balancing my devices on the available interrupts might already
help. Currently, according to /proc/interrupts, IRQ 3, 4 and 7 are
completely unused!

Regards,
Julien

2003-11-12 02:57:08

by Josh McKinney

Subject: Re: A7N8X (Deluxe) Madness

I thought I would share some of my experiences with the ASUS A7N8X. I
just got this mobo last week, so I haven't had a whole lot of time with
it, nor do I have anything on the SATA controller. 2.6.0-test9-mm2 would
crash hard on any IDE activity with APIC and IO-APIC enabled. After
recompiling the kernel without APIC or IO-APIC but with ACPI still
enabled, and with *no* pci=noacpi on the command line, the board is
perfectly stable and I see no performance hit with the IDE disks. Here
is my /proc/interrupts with the working config:

$ cat /proc/interrupts
CPU0
0: 90624732 XT-PIC timer
1: 21404 XT-PIC i8042
2: 0 XT-PIC cascade
5: 35712 XT-PIC ohci_hcd
8: 1 XT-PIC rtc
9: 0 XT-PIC acpi
11: 6930402 XT-PIC nvidia
12: 114340 XT-PIC ehci_hcd, ohci_hcd, eth0, NVidia nForce2
14: 887 XT-PIC ide0
15: 133930 XT-PIC ide1
NMI: 0
ERR: 0
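To reproduce the configuration described above, one could check a kernel .config for it. This is a minimal Python sketch, assuming a uniprocessor i386 2.6 kernel where local APIC and IO-APIC support are controlled by CONFIG_X86_UP_APIC and CONFIG_X86_UP_IOAPIC and ACPI by CONFIG_ACPI (SMP kernels handle the APIC differently, so the check does not apply there). The helper names are hypothetical.

```python
def parse_kconfig(text: str) -> dict[str, str]:
    """Parse .config lines into {option: value}; "is not set" becomes "n"."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def apic_off_acpi_on(options: dict[str, str]) -> bool:
    """True if UP local APIC and IO-APIC are off while ACPI stays enabled."""
    return (
        options.get("CONFIG_X86_UP_APIC", "n") == "n"
        and options.get("CONFIG_X86_UP_IOAPIC", "n") == "n"
        and options.get("CONFIG_ACPI") == "y"
    )
```

For example, `apic_off_acpi_on(parse_kconfig(open(".config").read()))` run in the kernel source tree tells whether a build matches this setup.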

If there is anything else I could test or any more info I could give to
help track down this problem, I would be more than happy to help. I am
planning on buying some SATA drives soon, but might change my mind if
this issue isn't cleared up.

Thanks

--
Josh McKinney | Webmaster: http://joshandangie.org
--------------------------------------------------------------------------
| They that can give up essential liberty
Linux, the choice -o) | to obtain a little temporary safety deserve
of the GNU generation /\ | neither liberty or safety.
_\_v | -Benjamin Franklin