2002-01-18 15:17:09

by Jani Forssell

Subject: VIA KT133 & HPT 370 IDE disk corruption

We first reported disk corruption with a VIA KT133A-based board (Abit KT7A)

http://marc.theaimsgroup.com/?l=linux-kernel&m=100651892331843&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=100669782329815&w=2

and then switched to a VIA KT133 board (Abit KT7) that showed the same
symptoms. It finally seems to be working, so I'm going to try to summarise
our experiences.

The test configuration is:

VIA KT133
kernels: stock 2.4.18pre2, 2.2.21pre2 and 2.2.20 (both with Hedrick
IDE patch 05042001)
Two HDDs and a CD-ROM attached to the onboard HPT370 controller
3Com 905B-TX
Matrox G200 AGP

In this configuration, we could force an oops

http://marc.theaimsgroup.com/?l=linux-kernel&m=101052001508211&w=2

with all the kernels we tested on each boot by running:

cat /dev/hde > /dev/null &
cat /dev/hdg > /dev/null &
ping -f -s 64000 otherMachine &

It usually took about 15 seconds for the oops to trigger. We also verified
that fetching a file with wget (instead of running ping -f) over the local
100 Mbit network would trigger the oops.
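
For anyone who wants to repeat the disk half of this load with a byte count
attached, here is a minimal sketch in Python. It is not what we ran (we used
the cat commands above); the function names read_all and parallel_read are
made up for this illustration, and the 1 MB read size is an arbitrary choice.

```python
import concurrent.futures

CHUNK = 1024 * 1024  # read size per call; an arbitrary choice


def read_all(path, chunk=CHUNK):
    """Read a file or block device to the end, discarding the data.

    Returns the number of bytes read, much like `cat path > /dev/null`
    but with a byte count.
    """
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
    return total


def parallel_read(paths):
    """Read all paths at the same time, one thread per path.

    Returns a dict mapping each path to the number of bytes read.
    """
    workers = max(1, len(paths))
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {path: pool.submit(read_all, path) for path in paths}
        return {path: fut.result() for path, fut in futures.items()}
```

On a machine like ours this would be called as
parallel_read(["/dev/hde", "/dev/hdg"]) as root, with the network load
started separately.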

The peculiar thing is that with certain BIOS settings the disk read & write
test didn't show errors, even when left running over the weekend. But as soon
as the ping -f was launched, the test started showing disk corruption (from
4 to ~1000 bytes in 64 megabyte blocks). The data corruption most likely
happened when both disks were read in parallel. We weren't able to trigger
disk corruption with disks on the VIA IDE controller (686A & 686B).
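
This message does not describe how the read & write test counts corrupted
bytes. As an illustration only, a check of that kind could compare a file read
back from disk against a known-good reference in 64 megabyte blocks and count
the differing bytes per block. The function name and its parameters below are
hypothetical, not taken from the actual test program.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB blocks, matching the figure in the report


def corrupt_bytes_per_block(reference_path, test_path, block_size=BLOCK_SIZE):
    """Compare two files block by block.

    Returns a list of (block_index, differing_byte_count) for every block
    that differs. If one file is shorter, the missing bytes are counted
    as differing.
    """
    results = []
    with open(reference_path, "rb") as ref, open(test_path, "rb") as test:
        index = 0
        while True:
            a = ref.read(block_size)
            b = test.read(block_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                diff = sum(x != y for x, y in zip(a, b))
                diff += abs(len(a) - len(b))
                results.append((index, diff))
            index += 1
    return results
```

An empty result means the two files are identical; each tuple otherwise names
a damaged block and the number of bytes in it that differ.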

It turned out that the main culprit was the NIC, which was attached to PCI
slot 4. Moving it to slot 3 resolved both the disk corruption and the oopses.
Other PCI slots to avoid for the NIC were 5 and 6. Slots 4 and 6 share an IRQ
with the VIA USB controller. I tried disabling the USB controller in the BIOS,
but that didn't help (lspci didn't show the device after it had been
disabled). Slot 5 shares an IRQ with the Highpoint controller.
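
To see which devices end up on the same interrupt line after moving cards
around, one option is to look at /proc/interrupts, where devices sharing an
IRQ are listed comma-separated at the end of the line. The sketch below parses
that text. The function name shared_irqs is made up for this example, and it
assumes device names contain no spaces.

```python
def shared_irqs(interrupts_text):
    """Return {irq_number: [device, ...]} for IRQs with more than one device.

    Expects text in the format of /proc/interrupts. Only numbered IRQ
    lines are considered; entries such as NMI or ERR are skipped.
    """
    shared = {}
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # header line or non-numeric entry
        parts = [p.strip() for p in rest.split(",")]
        if len(parts) < 2:
            continue  # only one device on this line
        # The first part still carries the counters and controller type;
        # the device name is its last whitespace-separated token.
        tokens = parts[0].split()
        if not tokens:
            continue
        shared[int(label)] = [tokens[-1]] + parts[1:]
    return shared
```

On a running Linux system it can be fed with
shared_irqs(open("/proc/interrupts").read()).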

Finally, we tested that it works with an Adaptec 2940UW SCSI card in PCI
slot 1 and the NIC in PCI slot 3.

More details on request. Does anyone have any idea what causes this?

Jani Forssell


2002-01-19 00:36:19

by Tim Moore

Subject: Re: VIA KT133 & HPT 370 IDE disk corruption

Jani Forssell wrote:
>
> We first reported disk corruption with a VIA KT133A based board (Abit KT7A)
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=100651892331843&w=2
> http://marc.theaimsgroup.com/?l=linux-kernel&m=100669782329815&w=2
>
> ...
> It turned out that the main culprit was the NIC that was attached to PCI
> slot 4. Moving it to slot 3 resolved the disk corruption as well as the
> oopses that occurred. Other PCI slots to avoid for the NIC were 5 and 6.
> Slot 4 & 6 shares an IRQ with the VIA USB controller, but I did try
> disabling it from the BIOS but it didn't help (lspci didn't show the
> device after it had been disabled). Slot 5 shares an IRQ with the
> Highpoint controller.
>
> Finally, we tested that it works with an Adaptec 2940UW SCSI card in PCI
> slot 1 and the NIC in PCI slot 3.
>
> More details on request. Does anyone have any idea what causes this?

My BP6's [hpt366] had similar sustained I/O lockup issues, especially
when running a RAID stripe. From the v1.01 BP6 manual:
...
PCI slots 4 and 5 use the same bus master control signal.

PCI slot 3 shares IRQ signals with the HPT366 IDE controller
(Ultra ATA/66). The driver for the HPT 366 IDE controller
supports IRQ sharing with other PCI devices. But if you
install a PCI card that doesn't allow IRQ sharing with other
devices into PCI slot 3, you may encounter some problems.
Furthermore, if your Operating System doesn't allow peripheral
devices to share IRQ signals with each other--Windows NT for
example, you can't install a PCI card into PCI slot 3.
...

Of course, I didn't read this until much later.

rgds,
tim.
--

2002-01-19 00:55:23

by Aaron Tiensivu

Subject: Re: VIA KT133 & HPT 370 IDE disk corruption

> My BP6's [hpt366] had similar sustained I/O lockup issues, especially
> when running a RAID stripe. From the v1.01 BP6 manual:

Unfortunately, I suspect that is due to the older HPT drivers still in the
current kernels (the HPT366 is a very broken beast by design, and from what
I've gathered from others, Abit did a poor job of connecting it on the
BP6).

Another reason for those lockups could be due to the noisy APIC bus on the
BP6.

As much as I love my BP6, as an "ultimate dirty hack not approved by Intel"
motherboard, it has its flaws.
I'm just thankful it is still running. :)


2002-01-19 01:05:49

by Tim Moore

Subject: Re: VIA KT133 & HPT 370 IDE disk corruption

Aaron Tiensivu wrote:
>
> > My BP6's [hpt366] had similar sustained I/O lockup issues, especially
> > when running a RAID stripe. From the v1.01 BP6 manual:
>
> Unfortunately, I suspect that is due to the older HPT drivers still in the
> current kernels (the HPT366 is a very broken beast by design, and from what
> I've gathered from others, Abit did a poor job of connecting it on the
> BP6).
>
> Another reason for those lockups could be due to the noisy APIC bus on the
> BP6.
>
> As much as I love my BP6, as an "ultimate dirty hack not approved by Intel"
> motherboard, it has its flaws.
> I'm just thankful it is still running. :)

Yes, the board you love to hate. I also did the EC10 capacitor fix,
which is why I still don't have the heart to retire/upgrade them.

--

2002-01-19 11:02:23

by Ville Herva

Subject: Re: VIA KT133 & HPT 370 IDE disk corruption

On Fri, Jan 18, 2002 at 04:35:48PM -0800, you [Tim Moore] claimed:
> Jani Forssell wrote:
> >
> > It turned out that the main culprit was the NIC that was attached to PCI
> > slot 4. Moving it to slot 3 resolved the disk corruption as well as the
> > oopses that occurred. Other PCI slots to avoid for the NIC were 5 and 6.
> > Slot 4 & 6 shares an IRQ with the VIA USB controller, but I did try
> > disabling it from the BIOS but it didn't help (lspci didn't show the
> > device after it had been disabled). Slot 5 shares an IRQ with the
> > Highpoint controller.
>
> My BP6's [hpt366] had similar sustained I/O lockup issues, especially
> when running a RAID stripe. From the v1.01 BP6 manual:
> ...
> PCI slots 4 and 5 use the same bus master control signal.
>
> PCI slot 3 shares IRQ signals with the HPT366 IDE controller
> (Ultra ATA/66). The driver for the HPT 366 IDE controller
> supports IRQ sharing with other PCI devices. But if you
> install a PCI card that doesn't allow IRQ sharing with other
> devices into PCI slot 3, you may encounter some problems.

Note that the culprit wasn't the slot that shares an IRQ with the Highpoint
controller (HPT370 on this board). We knew to avoid that slot from the
beginning (I have a BP6 at home), but I think we tried slot 5 out of
interest after we had verified that slot 3 works. I think slot 5 showed the
problem as well - Jani?

Anyhow, we were more puzzled as to how the VIA USB controller, which is
disabled in both the BIOS and the kernel config, can cause these problems. Or
is there something else wrong with the board's PCI routing?

As for the BP6, I find it bearably stable after upgrading to the latest BIOS
ages ago (was it RU that solved the lockup issue?). It still locks
up once or twice a month - which I can live with. But I digress.


-- v --


2002-01-19 20:50:20

by Jani Forssell

Subject: Re: VIA KT133 & HPT 370 IDE disk corruption

> Note that the culprit wasn't the slot that shares an IRQ with the Highpoint
> controller (HPT370 on this board). We knew to avoid that slot from the
> beginning (I have a BP6 at home), but I think we tried slot 5 out of
> interest after we had verified slot 3 works. I think slot 5 showed the
> problem as well - Jani?

That's right, when the NIC was in slot 4, 5 or 6, it oopsed almost
immediately when both the drives on HPT370 IDE and the NIC
were stressed simultaneously.

Jani Forssell