2002-08-14 12:04:11

by Matt Bernstein

Subject: GA-7DX+ crashes

Hi,

We're very much at a loss as to why the 60 new PCs we've bought largely
don't run Linux (various 2.4 kernels including 2.4.19, limbo1-BOOT) for
very long without crashing. One of them seems to work OK; its /proc/pci is
identical, but the batch number on the southbridge seems one lower--is
this dodgy VIA hardware again? We'll be trying a different IDE controller
next, but 60 of those ain't cheap..

Has anyone else had success or failure stories in particular with this
motherboard? We don't really have a significant number of data points just
yet, but are willing to try pretty much anything anyone might suggest!

Matt

symptoms
- random data corruption (sometimes memory, more often HDD)
- sometimes oopsing, but never in the same place

what we think we've ascertained so far
- they pass memtest86
- we've tried different HDDs, no effect
- tried ide=nodma, possibly makes it crash after longer
- tried noapic, no effect
- tried all sorts of BIOS settings, no effect (except--possibly--turning
off the on board IDE controller and playing nfsroot games)
- ..and yet they seem to run that other OS fine :-(
- extra cooling/underclocking doesn't seem to help
- seems to be fs-independent (tried ext3, reiserfs, jfs)

hardware
- GA-7DX+ motherboard
- AMD 761 northbridge
- VIA 686B southbridge
- Athlon 2000XP
- 256MB DDR RAM

/proc/pci and /proc/cpuinfo:

PCI devices found:
Bus 0, device 0, function 0:
Host bridge: Advanced Micro Devices [AMD] AMD-760 [IGD4-1P] System Controller (rev 20).
Master Capable. Latency=32.
Prefetchable 32 bit memory at 0xe8000000 [0xebffffff].
Prefetchable 32 bit memory at 0xee006000 [0xee006fff].
I/O at 0xd000 [0xd003].
Bus 0, device 1, function 0:
PCI bridge: Advanced Micro Devices [AMD] AMD-760 [IGD4-1P] AGP Bridge (rev 0).
Master Capable. Latency=32. Min Gnt=14.
Bus 0, device 7, function 0:
ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 64).
Bus 0, device 7, function 1:
IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 6).
Master Capable. Latency=32.
I/O at 0xd400 [0xd40f].
Bus 0, device 7, function 2:
USB Controller: VIA Technologies, Inc. UHCI USB (rev 26).
IRQ 11.
Master Capable. Latency=32.
I/O at 0xd800 [0xd81f].
Bus 0, device 7, function 3:
USB Controller: VIA Technologies, Inc. UHCI USB (#2) (rev 26).
IRQ 11.
Master Capable. Latency=32.
I/O at 0xdc00 [0xdc1f].
Bus 0, device 7, function 4:
SMBus: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 64).
IRQ 9.
Bus 0, device 7, function 5:
Multimedia audio controller: VIA Technologies, Inc. VT82C686 AC97 Audio Controller (rev 80).
IRQ 5.
I/O at 0xe000 [0xe0ff].
I/O at 0xe400 [0xe403].
I/O at 0xe800 [0xe803].
Bus 0, device 13, function 0:
Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C (rev 16).
IRQ 11.
Master Capable. Latency=32. Min Gnt=32.Max Lat=64.
I/O at 0xec00 [0xecff].
Non-prefetchable 32 bit memory at 0xee004000 [0xee0040ff].
Bus 0, device 15, function 0:
FireWire (IEEE 1394): Texas Instruments TSB12LV23 IEEE-1394 Controller (rev 0).
IRQ 11.
Master Capable. Latency=32. Min Gnt=15.Max Lat=15.
Non-prefetchable 32 bit memory at 0xee005000 [0xee0057ff].
Non-prefetchable 32 bit memory at 0xee000000 [0xee003fff].
Bus 1, device 5, function 0:
VGA compatible controller: nVidia Corporation NV11 (GeForce2 MX) (rev 178).
IRQ 10.
Master Capable. Latency=32. Min Gnt=5.Max Lat=1.
Non-prefetchable 32 bit memory at 0xec000000 [0xecffffff].
Prefetchable 32 bit memory at 0xe0000000 [0xe7ffffff].

processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 6
model name : AMD Athlon(tm) XP 2000+
stepping : 2
cpu MHz : 1675.283
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 3342.33




2002-08-14 12:34:18

by Alan

Subject: Re: GA-7DX+ crashes

On Wed, 2002-08-14 at 13:12, Matt Bernstein wrote:
> We're very much at a loss as to why the 60 new PCs we've bought largely
> don't run Linux (various 2.4 kernels including 2.4.19, limbo1-BOOT) for
> very long without crashing. One of them seems to work OK; its /proc/pci is
> identical, but the batch number on the southbridge seems one lower--is
> this dodgy VIA hardware again? We'll be trying a different IDE controller
> next, but 60 of those ain't cheap..

My immediate assumption would be a batch of bad hardware or a faulty BIOS.

> Has anyone else had success or failure stories in particular with this
> motherboard? We don't really have a significant number of data points just
> yet, but are willing to try pretty much anything anyone might suggest!
>
> symptoms
> - random data corruption (sometimes memory, more often HDD)
> - sometimes oopsing, but never in the same place

We've seen similar on pure VIA chipset machines. The kernel has fixes to
work around the hardware problems there. We have no information on
workarounds (or if they are needed) for the AMD/VIA combo other than the
fact that the APIC cannot be used on them according to AMD docs.

> what we think we've ascertained so far
> - they pass memtest86
> - we've tried different HDDs, no effect
> - tried ide=nodma, possibly makes it crash after longer
> - tried noapic, no effect
> - tried all sorts of BIOS settings, no effect (except--possibly--turning
> off the on board IDE controller and playing nfsroot games)
> - ..and yet they seem to run that other OS fine :-(
> - extra cooling/underclocking doesn't seem to help
> - seems to be fs-independent (tried ext3, reiserfs, jfs)

Boot it up in Linux and grab an lspci with the full config space of each
chip. Next, boot it into Windows and do the same. That may show what is
being patched up. New BIOS versions may also help.
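
A minimal sketch of the comparison step, in Python, assuming both captures are saved as text in the `lspci -xxx` layout (a `bus:dev.fn` header line per device, followed by hex lines such as `40: 0b f2 09 35 ...`). The script name and file names are hypothetical, and the Windows-side capture is assumed to have been converted into the same layout by whatever tool produced it. It only reports which config-space bytes differ; working out what each byte means still requires the chipset datasheets.

```python
import re
import sys
from typing import Dict, List, Optional, Tuple

# Assumed input layout (as printed by `lspci -xxx`, run as root):
#   00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
#   00: 06 11 71 05 07 00 90 02 06 8a 01 01 00 20 00 00
#   ...
DEVICE_RE = re.compile(r"^([0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s")
DUMP_RE = re.compile(r"^([0-9a-fA-F]{2}):((?:\s[0-9a-fA-F]{2})+)\s*$")


def parse_dump(text: str) -> Dict[str, Dict[int, int]]:
    """Map each 'bus:dev.fn' to {config-space offset: byte value}."""
    devices: Dict[str, Dict[int, int]] = {}
    current: Optional[Dict[int, int]] = None
    for line in text.splitlines():
        m = DEVICE_RE.match(line)
        if m:
            # Start collecting bytes for a new device.
            current = devices.setdefault(m.group(1).lower(), {})
            continue
        m = DUMP_RE.match(line)
        if m and current is not None:
            # The leading hex field is the offset of the first byte on the line.
            base = int(m.group(1), 16)
            for i, byte in enumerate(m.group(2).split()):
                current[base + i] = int(byte, 16)
    return devices


def diff_dumps(a: str, b: str) -> List[Tuple[str, int, int, int]]:
    """Return (device, offset, byte_in_a, byte_in_b) for every byte that differs.

    Only devices and offsets present in both dumps are compared.
    """
    da, db = parse_dump(a), parse_dump(b)
    diffs = []
    for dev in sorted(set(da) & set(db)):
        for off in sorted(set(da[dev]) & set(db[dev])):
            if da[dev][off] != db[dev][off]:
                diffs.append((dev, off, da[dev][off], db[dev][off]))
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 cfgdiff.py linux.txt windows.txt
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        for dev, off, x, y in diff_dumps(fa.read(), fb.read()):
            print(f"{dev} offset 0x{off:02x}: {x:02x} -> {y:02x}")
```

Comparing byte by byte, rather than the decoded `lspci -vv` text, keeps chipset-specific registers (which lspci does not decode) visible in the output.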

Other things to try

Disable all ACPI/APM
Disable any BIOS 'usb keyboard/mouse' support

Finally, if they still don't work and your purchase order stated that you
intended to also run Linux on them, then they are "not fit for the
purpose for which they were sold". If your purchase order didn't mention
that detail, then someone wants kicking.


2002-08-14 12:45:50

by grendel

Subject: Re: GA-7DX+ crashes

On Wed, Aug 14, 2002 at 01:12:43PM +0100, Matt Bernstein scribbled:
> Hi,
>
> We're very much at a loss as to why the 60 new PCs we've bought largely
> don't run Linux (various 2.4 kernels including 2.4.19, limbo1-BOOT) for
[snip]
> Has anyone else had success or failure stories in particular with this
> motherboard? We don't really have a significant number of data points just
> yet, but are willing to try pretty much anything anyone might suggest!
>
> Matt
>
> symptoms
> - random data corruption (sometimes memory, more often HDD)
> - sometimes oopsing, but never in the same place
>
> what we think we've ascertained so far
> - they pass memtest86
> - we've tried different HDDs, no effect
> - tried ide=nodma, possibly makes it crash after longer
> - tried noapic, no effect
> - tried all sorts of BIOS settings, no effect (except--possibly--turning
> off the on board IDE controller and playing nfsroot games)
> - ..and yet they seem to run that other OS fine :-(
> - extra cooling/underclocking doesn't seem to help
> - seems to be fs-independent (tried ext3, reiserfs, jfs)
I've had very similar (actually identical) problems with the ASUS A7V333
mobo. The mobo is completely VIA-based (both north and south), but the
southbridge seems to be much the same as yours. What I did to make the
machine run stably was to turn the USB2 support off (in hardware, via a
solder point on the mobo) and to short the solder point responsible for the
CPU functional settings readout (the ROMSIP setting), so that the data is
read from a BIOS table instead of from the CPU itself. That seems to have
been enough for me: the mobo is now stable (I'm gonna get rid of it, though...).

Another thing you might check is the CPU voltage - make sure it is the
standard 3.3 and not 3.5 as some manufacturers set it.

hope that helps a bit,

marek



2002-08-14 17:05:07

by Alex Davis

Subject: Re: GA-7DX+ crashes

>We have no information on
>workarounds (or if they are needed) for the AMD/VIA combo other than the
>fact that the APIC cannot be used on them according to AMD docs.

Really?? I have an Epox 8k7a (AMD761 North / VIA South) that I've been using for
over a year now with APIC configured. In fact I'm sure I remember (haven't checked
recently) one of the boot messages being 'found and enabled local APIC'. Am I
missing something??




2002-08-14 18:37:22

by Mikael Pettersson

Subject: Re: GA-7DX+ crashes

On Wed, 14 Aug 2002 10:08:50 -0700 (PDT), Alex Davis wrote:
>>We have no information on
>>workarounds (or if they are needed) for the AMD/VIA combo other than the
>>fact that the APIC cannot be used on them according to AMD docs.
>
>Really?? I have an Epox 8k7a (AMD761 North / VIA South) that I've been using for
>over a year now with APIC configured. In fact I'm sure I remember (haven't checked
>recently) one of the boot messages being 'found and enabled local APIC'. Am I
>missing something??

Alan was a bit careless in his wording. He most likely meant the
IO-APIC, which is broken in some/all(?) AMD north / VIA south combos.
AMD has an errata sheet on this issue.

/Mikael

2002-08-16 12:12:57

by Matt Bernstein

Subject: Re: GA-7DX+ crashes

On Aug 14 Alan Cox wrote:
>On Wed, 2002-08-14 at 13:12, Matt Bernstein wrote:
>> We're very much at a loss as to why the 60 new PCs we've bought largely
>> don't run Linux (various 2.4 kernels including 2.4.19, limbo1-BOOT) for
>> very long without crashing. One of them seems to work OK; its /proc/pci is
>> identical, but the batch number on the southbridge seems one lower--is
>> this dodgy VIA hardware again? We'll be trying a different IDE controller
>> next, but 60 of those ain't cheap..
>
>My immediate assumption would be a batch of bad hardware or faulty bios

The former seems to be the case, after talking to Gigabyte. (We got
Windows to crash much more spectacularly after stressing it a little
harder--that gets the company who sold them to us to come over personally
:)

Thanks very much--hopefully our dual-boot lab will continue to exist for
the next academic year!

Matt