2000-12-01 11:35:22

by Gerard Sharp

Subject: HPT366 + SMP = slight corruption in 2.3.99 - 2.4.0-11

Hello.
[1.] One line summary of the problem:
Intermittent corruption of 4 bytes in SMP kernels using HPT366

[2.] Full description of the problem/report:
First noticed in 2.3.99-preX, but it was hard to track down then.
When the system is under load (e.g. cp -aR /usr/src/linux /usr/src/l2),
it occasionally and randomly corrupts some files: possibly several times
per file, possibly in several files. Each corruption alters exactly 4 bytes.
Nothing shows up in the logs: no oopses, no messages.
Tests on 2.3.99 found the problem to be unreproducible on UP kernels.
Tests on the current kernel found the problem to be unreproducible on
the BX chipset's own ATA33 controller.

[3.] Keywords (i.e., modules, networking, kernel):
IDE, HPT366, EXT2, SMP, Corruption, Worrying

[4.] Kernel version (from /proc/version):
#cat /proc/version
Linux version 2.4.0-test11-ac4-smp (root@midnight) (gcc version
egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #2 SMP Tue Nov 28
22:38:21 NZDT 2000

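[5.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/oops-tracing.txt)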
Nada

[6.] A small shell script or example program which triggers the
problem (if possible)
cp -aR /usr/src/linux /usr/src/l2 ; diff -dur /usr/src/linux /usr/src/l2
The problem has shown up if diff produces any output.
The system may 'survive' two copies (I tend to use a different, uncached
kernel tree for each attempt, to rule out or minimise the effect of caching)
but 'fail' the third,
where 'survive' = no corruption and 'fail' = some or lots of corruption.
High memory usage increases the likelihood; hitting swap at ALL seems to
increase it further (swap is on the same drive).
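
The copy-and-compare test above can be repeated automatically. What follows is a minimal sketch, assuming a POSIX shell plus a cp and diff that accept the options already used above; copy_and_check and stress_copy are hypothetical helper names, not part of any existing tool. Note that diff may read the copy back from the page cache rather than from the disk, so a tree larger than RAM probably gives a more honest check; that is an assumption about the test setup, not something verified here.

```sh
#!/bin/sh
# Sketch only: helper names are made up for illustration.

# Copy the tree SRC to a fresh DST and compare the two.
# Returns 0 if identical, 1 if diff found differences, 2 if the copy failed.
copy_and_check() {
    src=$1
    dst=$2
    rm -rf "$dst"
    cp -aR "$src" "$dst" || return 2
    diff -dur "$src" "$dst" > /dev/null
}

# Run PASSES copies of SRC, each into its own destination BASE.1, BASE.2, ...
# and print how many passes did not come back identical.
stress_copy() {
    src=$1
    base=$2
    passes=$3
    failed=0
    i=1
    while [ "$i" -le "$passes" ]; do
        if ! copy_and_check "$src" "$base.$i"; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed"
}
```

For example, stress_copy /usr/src/linux /usr/src/l2 3 would make three copies (l2.1, l2.2, l2.3) and print the number of passes that showed differences.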

[7.] Environment
Based on Red Hat 6.2.
Abit BP6 motherboard.
Dual Celeron 466s
128 MB RAM; 13.6 GB Seagate Barracuda HDD
"hda: ST313620A, ATA DISK drive"
CD-ROM on hdd

[7.1.] Software (add the output of the ver_linux script here)

-- Versions installed: (if some fields are empty or look
-- unusual then possibly you have very old versions)
Linux midnight 2.4.0-test11-ac4-smp #2 SMP Tue Nov 28 22:38:21 NZDT 2000
i686 unknown
Kernel modules 2.3.13
Gnu C egcs-2.91.66
Gnu Make 3.78.1
Binutils 2.9.5.0.22
Linux C Library 2.1.3
Dynamic linker ldd (GNU libc) 2.1.3
Procps 2.0.6
Mount 2.10q
Net-tools 1.54
Console-tools 0.3.3
Sh-utils 2.0

[7.2.] Processor information (from /proc/cpuinfo):

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 6
model name : Celeron (Mendocino)
stepping : 5
cpu MHz : 467.000741
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 mmx fxsr
bogomips : 933.89

processor : 0
vendor_id : GenuineIntel
...

[7.3.] Module information (from /proc/modules):
Doesn't Impact Problem.

[7.4.] Loaded driver and hardware information (/proc/ioports,
/proc/iomem)
#cat /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
0220-022f : soundblaster
02f8-02ff : serial(auto)
0376-0376 : ide1
03c0-03df : vga+
03c0-03df : matrox
03f6-03f6 : ide0
03f8-03ff : serial(auto)
0cf8-0cff : PCI conf1
4000-403f : Intel Corporation 82371AB PIIX4 ACPI
5000-501f : Intel Corporation 82371AB PIIX4 ACPI
5000-5007 : piix4-smbus
d000-d01f : Intel Corporation 82371AB PIIX4 USB
d400-d4ff : Realtek Semiconductor Co., Ltd. RTL-8139
d400-d4ff : eth0
d800-d807 : Triones Technologies, Inc. HPT366
dc00-dc03 : Triones Technologies, Inc. HPT366
e000-e0ff : Triones Technologies, Inc. HPT366
e000-e007 : ide2
e010-e0ff : HPT366
e400-e407 : Triones Technologies, Inc. HPT366 (#2)
e800-e803 : Triones Technologies, Inc. HPT366 (#2)
ec00-ecff : Triones Technologies, Inc. HPT366 (#2)
ec00-ec07 : ide3
ec10-ecff : HPT366
f000-f00f : Intel Corporation 82371AB PIIX4 IDE
f000-f007 : ide0
f008-f00f : ide1

#cat /proc/iomem
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000f0000-000fffff : System ROM
00100000-07ffffff : System RAM
00100000-0021232f : Kernel code
00212330-002239ff : Kernel data
e0000000-e3ffffff : Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge
e4000000-e4003fff : Matrox Graphics, Inc. MGA 1064SG [Mystique]
e4000000-e4003fff : matroxfb MMIO
e5000000-e57fffff : Matrox Graphics, Inc. MGA 1064SG [Mystique]
e5000000-e57fffff : matroxfb FB
e6000000-e67fffff : Matrox Graphics, Inc. MGA 1064SG [Mystique]
e9000000-e90000ff : Realtek Semiconductor Co., Ltd. RTL-8139
e9000000-e90000ff : eth0
fec00000-fec00fff : reserved
fee00000-fee00fff : reserved
ffff0000-ffffffff : reserved


[7.5.] PCI information ('lspci -vvv' as root)
===
#lspci -vvv | less
00:00.0 Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 03)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 32 set
Region 0: Memory at e0000000 (32-bit, prefetchable) [size=64M]
Capabilities: [a0] AGP version 1.0
Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2
Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>

00:01.0 PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 03) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 set
Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: fff00000-000fffff
Prefetchable memory behind bridge: fff00000-000fffff
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B+

00:07.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0 set

00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01) (prog-if 80 [Master])
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 set
Region 4: I/O ports at f000 [size=16]

00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01) (prog-if 00 [UHCI])
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 set
Interrupt: pin D routed to IRQ 19
Region 4: I/O ports at d000 [size=32]

00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:0b.0 VGA compatible controller: Matrox Graphics, Inc. MGA 1064SG [Mystique] (rev 02) (prog-if 00 [VGA])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 set
Interrupt: pin A routed to IRQ 18
Region 0: Memory at e4000000 (32-bit, non-prefetchable) [size=16K]
Region 1: Memory at e5000000 (32-bit, prefetchable) [size=8M]
Region 2: Memory at e6000000 (32-bit, non-prefetchable) [size=8M]
Expansion ROM at <unassigned> [disabled] [size=64K]

00:0f.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
Subsystem: Realtek Semiconductor Co., Ltd. RT8139
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 min, 64 max, 32 set
Interrupt: pin A routed to IRQ 16
Region 0: I/O ports at d400 [size=256]
Region 1: Memory at e9000000 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management version 2
Flags: PMEClk- AuxPwr- DSI- D1+ D2+ PME-
Status: D0 PME-Enable+ DSel=0 DScale=0 PME-
Capabilities: [60] Vital Product Data

00:13.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366 (rev 01)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 8 min, 8 max, 120 set, cache line size 08
Interrupt: pin A routed to IRQ 18
Region 0: I/O ports at d800 [size=8]
Region 1: I/O ports at dc00 [size=4]
Region 4: I/O ports at e000 [size=256]
Expansion ROM at e8000000 [disabled] [size=128K]

00:13.1 Unknown mass storage controller: Triones Technologies, Inc. HPT366 (rev 01)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 8 min, 8 max, 120 set, cache line size 08
Interrupt: pin B routed to IRQ 18
Region 0: I/O ports at e400 [size=8]
Region 1: I/O ports at e800 [size=4]
Region 4: I/O ports at ec00 [size=256]
===

[7.6.] SCSI information (from /proc/scsi/scsi)
Nada

[7.7.] Other information that might be relevant to the problem
Snippets from dmesg:
=== <hard drive on hde> ===
HPT366: onboard version of chipset, pin1=1 pin2=2
HPT366: IDE controller on PCI bus 00 dev 98
PCI: Enabling device 00:13.0 (0005 -> 0007)
HPT366: chipset revision 1
HPT366: not 100% native mode: will probe irqs later
ide2: BM-DMA at 0xe000-0xe007, BIOS settings: hde:DMA, hdf:pio
HPT366: IDE controller on PCI bus 00 dev 99
HPT366: chipset revision 1
HPT366: not 100% native mode: will probe irqs later
ide3: BM-DMA at 0xec00-0xec07, BIOS settings: hdg:pio, hdh:pio
hdd: FX240S, ATAPI CDROM drive
hde: ST313620A, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
ide2 at 0xd800-0xd807,0xdc02 on irq 18
hde: 26692776 sectors (13667 MB) w/512KiB Cache, CHS=26480/16/63, UDMA(66)
=== </hard drive on hde> ===

=== <hard drive on hda> ===
HPT366: onboard version of chipset, pin1=1 pin2=2
HPT366: IDE controller on PCI bus 00 dev 98
PCI: Enabling device 00:13.0 (0005 -> 0007)
HPT366: chipset revision 1
HPT366: not 100% native mode: will probe irqs later
ide2: BM-DMA at 0xe000-0xe007, BIOS settings: hde:pio, hdf:pio
HPT366: IDE controller on PCI bus 00 dev 99
HPT366: chipset revision 1
HPT366: not 100% native mode: will probe irqs later
ide3: BM-DMA at 0xec00-0xec07, BIOS settings: hdg:pio, hdh:pio
hda: ST313620A, ATA DISK drive
hdd: FX240S, ATAPI CDROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: 26692776 sectors (13667 MB) w/512KiB Cache, CHS=1661/255/63, UDMA(33)
=== </hard drive on hda> ===


[X.] Other notes, patches, fixes, workarounds:

Only current workaround is to avoid the HPT chip :(

I can't help but worry (especially given the volume of this email) that
this is a simple problem or my own fault; however, I have not seen anything
specific to it in the past few months.

I can offer to help debug, but my time is limited by the twin evils
of Work and Sleep, and I don't have many leads: there is no error
output, just silent corruption :(


Gerard Sharp
Two Penguins at 1024x768


2000-12-02 17:11:12

by Scott Prader

Subject: Re: HPT366 + SMP = slight corruption in 2.3.99 - 2.4.0-11


On Sat, 02 Dec 2000 00:04:27 +1300, Gerard Sharp blurted forth:

> Hello.
> [1.] One line summary of the problem:
> Intermittent corruption of 4 bytes in SMP kernels using HPT366
[snip]
> [7.] Environment
> Redhat 6.2 basis.
> Abit BP6 Motherboard.
> Dual Celeron 466's
> 128 Mb ram; 13.6 Gb Seagate Barracuda HDD
> "hda: ST313620A, ATA DISK drive"
> CD-ROM on hdd
[snip]

Have you tried updating the BIOS on the BP6? This solved a LOT of
problems for me, and AFAIK, RU is the latest. If you need a hand with
it, I've put together a DOS boot disk with everything you'll need at:
http://garson.org/~gnea/bp6-biosupdate.img

Just dd if=bp6-biosupdate.img of=/dev/fd0 and boot it, run awdflash.exe,
and tell it to use bp6_ru.bin when it asks for a file. Have it back
up the current BIOS (just in case) and reboot when ready. You will of
course need to go into the BIOS on reboot and reset everything to
defaults, then go through and re-tweak all of your settings until you
are satisfied (this is the proper method; not doing so can create
further problems). Also, the overclocking might be a bad thing in
this case unless you have the proper cooling for it (lm-sensors is
great for this sort of thing :) there's a neat wm applet called wmbp6
too), so you may want to try clocking it straight at 300 for a while and
see what effect that has. Hope this helps.

--
.oO gnea at rochester dot rr dot com Oo.
.oO url: http://garson.org/~gnea Oo.

"You can tune a filesystem, but you can't tuna fish" -unknown

2000-12-04 16:42:03

by Gerard Sharp

Subject: Re: HPT366 + SMP = slight corruption in 2.3.99 - 2.4.0-11

Gnea wrote:
> > [1.] One line summary of the problem:
> > Intermittent corruption of 4 bytes in SMP kernels using HPT366
> [snip]
> Have you tried updating the bios on the bp6? This solved a LOT of
> problems for me, and afaik, ru is the latest...

RU seems the latest. Flashed bios as per your nicely detailed
instructions.
No improvement in condition, alas.

> also, the overclocking might be a bad thing in this case unless you
> have the proper cooling for it (lm-sensors is great for this sort of
> thing :) there's a neat wm applet called wmbp6 too) so u may want to
> try clocking it straight at 300 for awhile and see what effect that
> has.. hope this helps

Err. "300(66)" probably won't help too much. The CPUs are multiplier
locked (love that one, Intel), so they run at 7 * 66 (466) when set to
chip defaults, which is how I am currently running them, to rule out
flaky or stressed hardware.

Temperatures are a nice frosty 30 deg C across all the temperature
sensors lm_sensors offers; and the thermal probe I have dangling in the
psu exhaust :)
[this is with two rc5des clients running - loadavg of 2 - btw]

Back to the original topic: I've done some more 'research', and I'm not
_certain_ of my findings, but there are a few coincidences here...

I think I'll make a more general post to lkml direct in a minute.


Gerard Sharp
Two penguins at 1024x768

2000-12-04 21:19:44

by Dan Hollis

Subject: Re: HPT366 + SMP = slight corruption in 2.3.99 - 2.4.0-11

On Tue, 5 Dec 2000, Gerard Sharp wrote:
> Gnea wrote:
> > > [1.] One line summary of the problem:
> > > Intermittent corruption of 4 bytes in SMP kernels using HPT366
> > [snip]
> > Have you tried updating the bios on the bp6? This solved a LOT of
> > problems for me, and afaik, ru is the latest...
> RU seems the latest. Flashed bios as per your nicely detailed
> instructions.
> No improvement in condition, alas.

HPT366 on BP6 is just broken. Corruption and lockups happen under
microsoft-windoze as well.

-Dan

2000-12-04 21:58:49

by Richard Torkar

Subject: Re: HPT366 + SMP = slight corruption in 2.3.99 - 2.4.0-11

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dan Hollis wrote:

> On Tue, 5 Dec 2000, Gerard Sharp wrote:
> > Gnea wrote:
> > > > [1.] One line summary of the problem:
> > > > Intermittent corruption of 4 bytes in SMP kernels using HPT366
> > > [snip]
> > > Have you tried updating the bios on the bp6? This solved a LOT of
> > > problems for me, and afaik, ru is the latest...
> > RU seems the latest. Flashed bios as per your nicely detailed
> > instructions.
> > No improvement in condition, alas.
>
> HPT366 on BP6 is just broken. Corruption and lockups happen under
> microsoft-windoze as well.
>

Not my experience Dan.

I've used my BP6 + HPT366 for a while now and I haven't had one lockup.
No corruption either.

Presently I use 2.4.0-test11-p4 and I have been following the 2.3.* kernel
since the day I got the BP6.

I have two Celeron 500 which are *not* o/c.
I have seti@home running on this box 24/7.
I use the latest BIOS.

I guess I'm lucky *grin*



/Richard
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.4 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE6LAwsUSLExYo23RsRAtY+AKCOuqpfcSa73zzpHQfddSY/7JG8IACffPRe
UzfNUJ7t3y2jdsS4jmS4Ggg=
=FdqO
-----END PGP SIGNATURE-----


2000-12-04 22:11:22

by Dan Hollis

Subject: Re: HPT366 + SMP = slight corruption in 2.3.99 - 2.4.0-11

On Mon, 4 Dec 2000, Richard Torkar wrote:
> Dan Hollis wrote:
> > On Tue, 5 Dec 2000, Gerard Sharp wrote:
> > > Gnea wrote:
> > > > > [1.] One line summary of the problem:
> > > > > Intermittent corruption of 4 bytes in SMP kernels using HPT366
> > > > [snip]
> > > > Have you tried updating the bios on the bp6? This solved a LOT of
> > > > problems for me, and afaik, ru is the latest...
> > > RU seems the latest. Flashed bios as per your nicely detailed
> > > instructions.
> > > No improvement in condition, alas.
> > HPT366 on BP6 is just broken. Corruption and lockups happen under
> > microsoft-windoze as well.
> Not my experience Dan.
> I've used my BP6 + HPT366 for a while now and I haven't had one lockup.
> No corruption either.
> I guess I'm lucky *grin*

Yours is 1 success against maybe 500-1000 people's failures. Not exactly a
great average for this motherboard. The BP6 is notorious for instability,
and the HPT366 on it accounts for about 50% of the problems.

-Dan

2000-12-04 22:22:23

by Mike Dresser

Subject: Re: HPT366 + SMP = slight corruption in 2.3.99 - 2.4.0-11

Agreed. I've got one of these beasts running NT Server: dual 433, not o/c,
4x12.7 gig software RAID. Before I put the Promise Ultra/33 card in, I was
using the HPT366 and got random lockups every couple of weeks. Since I
stopped using the HPT366, the machine has been stable. In hindsight, I think
the HPT366 was also the reason the Onstream 50 gig drive locked up so
frequently, before I shipped that back to Onstream. One thing that did help
stability was putting a CPU fan on the chipset.

Dan Hollis wrote:

>
> Your 1 success out of maybe 500-1000 people's failures. Not exactly a great
> average for this motherboard. BP6 is notorious for instability, HPT366 on
> it is about 50% of the problems.
>
> -Dan

2000-12-06 11:05:04

by Gerard Sharp

Subject: Re: HPT366 + SMP = slight corruption in 2.3.99 - 2.4.0-11

Dan Hollis wrote:

> > No improvement in condition, alas.
> HPT366 on BP6 is just broken. Corruption and lockups happen under
> microsoft-windoze as well.

I think I'll run with leaving the HDD on the ATA-33 controller.
After all; the "100% speedup" isn't really that
A) noticeable, or
B) worth this.

> -Dan

Thanks to all the little people that helped - Appreciated :)


Gerard Sharp
Two Penguins at 1024x768

2000-12-06 11:59:05

by [email protected]

Subject: Re: HPT366 + SMP = slight corruption in 2.3.99 - 2.4.0-11

On 2.2.17 I had good luck with BP6.

EX:

[root@animal /root]# uptime
3:17am up 51 days, 20:17, 2 users, load average: 0.00, 0.04, 0.02

from /proc/pci:

Unknown mass storage controller: Triones Technologies, Inc. HPT366 IDE Ultra DMA/66 (rev 1).
Medium devsel. IRQ 5. Master Capable. Latency=120. Min Gnt=8.Max Lat=8.
I/O at 0xd400 [0xd401].
I/O at 0xd800 [0xd801].
I/O at 0xdc00 [0xdc01].
Bus 0, device 19, function 1:

Can you post/e-mail any additional details about the lockups? I am very
curious about this. We have a huge mosix cluster in production with BP6
mobo's and have plans to upgrade to 2.4 as soon as an official stable
kernel is released.

[email protected]
On Wed, 6 Dec 2000, Gerard Sharp wrote:

> Dan Hollis wrote:
>
> > > No improvement in condition, alas.
> > HPT366 on BP6 is just broken. Corruption and lockups happen under
> > microsoft-windoze as well.
>
> I think I'll run with leaving the HDD on the ATA-33 controller.
> After all; the "100% speedup" isn't really that
> A) noticeable, or
> B) worth this.
>
> > -Dan
>
> Thanks to all the little people that helped - Appreciated :)
>
>
> Gerard Sharp
> Two Penguins at 1024x768

2000-12-07 09:54:27

by Gerard Sharp

Subject: Re: HPT366 + SMP = slight corruption in 2.3.99 - 2.4.0-11

"[email protected]" wrote:
> On 2.2.17 I had good luck with BP6.

Never tried the 2.2 series with this controller - probably should get
around to doing that this weekend. :S

> EX:
> [root@animal /root]# uptime
> 3:17am up 51 days, 20:17, 2 users, load average: 0.00, 0.04, 0.02

Uptime proves nothing alas; I could get 30+ days if I wanted :S

> Can you post/e-mail any additional details about the lockups. I am very
> curious about this. We have a huge mosix cluster in production with BP6
> mobo's and have plans to upgrade to 2.4 as soon as an official stable
> kernel is released.

The problem is not one of lockups. The system in question doesn't lock,
it doesn't crash, and it doesn't even log anything at the time: not even
APIC errors.
Instead it silently corrupts exactly four bytes at a time,
mostly the last four bytes of a 4096-byte block...
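
One way to check where the altered bytes fall is to list every differing byte offset between an original file and its copy, together with the offset inside its 4096-byte block. A minimal sketch, assuming POSIX cmp and awk; diff_offsets is a hypothetical helper name, and the block size defaults to the 4096 mentioned above.

```sh
#!/bin/sh
# Sketch only: diff_offsets is a made-up helper name.

# Print "OFFSET OFFSET_IN_BLOCK" (both 0-based) for each byte that differs
# between file $1 and file $2. The optional $3 is the block size.
diff_offsets() {
    # cmp -l prints the 1-based byte number of every difference;
    # its message about differing file lengths goes to stderr and is dropped.
    cmp -l "$1" "$2" 2>/dev/null |
        awk -v bs="${3:-4096}" '{ off = $1 - 1; print off, off % bs }'
}
```

On a corrupted pair of files, a run of four lines whose second column reads 4092 through 4095 would match the "last four bytes of a block" pattern; other values would suggest the corruption is not tied to block ends.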

I can most easily cause the corruption by copying a large amount of
known data across the disk, and then checking for differences:
cp -aR /usr/src/linux /usr/src/l2 ; diff -dur /usr/src/linux /usr/src/l2

When it corrupts in this manner, it is not so much of a concern: I
can detect, delete, and recopy the corrupted file(s).
What is more worrying is the possibility that it is quietly and happily
corrupting other data when I'm NOT staring at it with paranoid
concern; for example, while compiling binaries or altering large data
files...
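
For data that is not supposed to change, one possible safeguard is to record checksums once and re-verify them later. A minimal sketch, assuming GNU coreutils md5sum (the --quiet option in particular) and a POSIX find; make_manifest and verify_manifest are hypothetical helper names. The manifest must be given as an absolute path outside the tree being checked. This only catches files that changed after the manifest was written, and a check run shortly after writing may be served from the page cache rather than the disk.

```sh
#!/bin/sh
# Sketch only: helper names are made up for illustration.

# Record a checksum for every regular file under directory $1
# into the manifest file $2 (absolute path, outside $1).
make_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + ) > "$2"
}

# Re-read every file listed in manifest $2 relative to directory $1.
# Prints only the files that fail; returns non-zero if any file differs.
verify_manifest() {
    ( cd "$1" && md5sum -c --quiet "$2" )
}
```

Files that are legitimately modified will also be reported, so this fits static data such as an unpacked source tree better than a working directory.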

As such, I cannot at this time pin the problem down as either a Software
or a Hardware Error; and while it is A Major Problem in my eyes, I avoid it
currently by not using the hpt366 controller :)


Gerard Sharp
Two Penguins at 1024x768