2001-02-05 19:55:41

by Peter Horton

Subject: VIA silent disk corruption - bad news

The patch doesn't work for me. Maybe I need to disable some more of
those North bridge features :-(

Oh bum. Back to testing with "normal" ...

P.

----- CORRUPTING SETUP -----

00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 02)
Subsystem: Asustek Computer, Inc.: Unknown device 8033
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+
Latency: 0 set
Region 0: Memory at e4000000 (32-bit, prefetchable) [size=64M]
Capabilities: [a0] AGP version 2.0
Status: RQ=31 SBA+ 64bit- FW- Rate=421
Command: RQ=0 SBA- AGP- 64bit- FW- Rate=
Capabilities: [c0] Power Management version 2
Flags: PMEClk- AuxPwr- DSI- D1- D2- PME-
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 06 11 05 03 06 00 10 a2 02 00 00 06 00 00 00 00
10: 08 00 00 e4 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 33 80
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 17 a4 6b b4 4f 81 08 08 80 00 04 08 08 08 08 08
60: 03 ff 00 a0 52 e5 e5 00 44 7c 86 0f 08 3f 00 00
70: de 80 cc 0c 0e a1 d2 00 01 b4 11 02 00 00 00 00
80: 0f 40 00 00 c0 00 00 00 02 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 02 c0 20 00 07 02 00 1f 00 00 00 00 6e 02 04 00
b0: 59 ec 80 b5 32 33 28 00 00 00 00 00 00 00 00 00
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 0e 22 00 00 00 00 00 91 06
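
For anyone who wants to cross-check the decoded AGP line against the raw bytes: the capability at a0 has its status dword at a4..a7 (07 02 00 1f above). A minimal sketch, assuming the standard AGP 2.0 status layout (rate in bits 2:0, FW bit 4, 64bit/4G bit 5, SBA bit 9, RQ in bits 31:24); the function name is made up for illustration.

```python
def decode_agp_status(raw):
    """Decode an AGP 2.0 status dword given as 4 little-endian bytes."""
    value = int.from_bytes(bytes(raw), "little")
    return {
        "rq": (value >> 24) & 0xFF,           # max outstanding requests
        "sba": bool(value & (1 << 9)),        # sideband addressing
        "64bit": bool(value & (1 << 5)),      # addresses above 4G
        "fw": bool(value & (1 << 4)),         # fast writes
        # bit 0 = x1, bit 1 = x2, bit 2 = x4
        "rates": [f"x{1 << i}" for i in range(3) if value & (1 << i)],
    }

# Bytes a4..a7 from the corrupting setup's dump.
status = decode_agp_status([0x07, 0x02, 0x00, 0x1F])
print(status)
```

This agrees with the "RQ=31 SBA+ 64bit- FW- Rate=421" line lspci printed for the same register.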

----- DIFF FOR NON-CORRUPTING SETUP -----

@@ -5,7 +5,7 @@
Latency: 0 set
Region 0: Memory at e4000000 (32-bit, prefetchable) [size=64M]
Capabilities: [a0] AGP version 2.0
- Status: RQ=31 SBA+ 64bit- FW- Rate=421
+ Status: RQ=31 SBA+ 64bit- FW- Rate=21
Command: RQ=0 SBA- AGP- 64bit- FW- Rate=
Capabilities: [c0] Power Management version 2
Flags: PMEClk- AuxPwr- DSI- D1- D2- PME-
@@ -15,12 +15,12 @@
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 33 80
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-50: 17 a4 6b b4 4f 81 08 08 80 00 04 08 08 08 08 08
-60: 03 ff 00 a0 52 e5 e5 00 44 7c 86 0f 08 3f 00 00
-70: de 80 cc 0c 0e a1 d2 00 01 b4 11 02 00 00 00 00
+50: 17 a4 6b b4 06 81 08 08 80 00 04 08 08 08 08 08
+60: 03 ff 00 a0 50 e4 e4 00 40 78 86 0f 08 3f 00 00
+70: d8 80 cc 0c 0e a1 d2 00 01 b4 01 02 00 00 00 00
80: 0f 40 00 00 c0 00 00 00 02 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-a0: 02 c0 20 00 07 02 00 1f 00 00 00 00 6e 02 04 00
+a0: 02 c0 20 00 03 02 00 1f 00 00 00 00 6e 02 00 00
b0: 59 ec 80 b5 32 33 28 00 00 00 00 00 00 00 00 00
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
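
The row-level diff above hides which individual registers changed. A small sketch that lists the differing byte offsets, using only the four changed rows from the two dumps; the helper names are hypothetical, and it says nothing about what the VIA-specific registers mean.

```python
def parse_dump(text):
    """Map config-space offset -> byte value from 'NN: xx xx ...' rows."""
    regs = {}
    for line in text.strip().splitlines():
        base, _, rest = line.partition(":")
        for i, byte in enumerate(rest.split()):
            regs[int(base, 16) + i] = int(byte, 16)
    return regs

def diff_regs(a, b):
    """Return (offset, old, new) for every common offset whose value differs."""
    return [(off, a[off], b[off])
            for off in sorted(a.keys() & b.keys()) if a[off] != b[off]]

CORRUPTING = """
50: 17 a4 6b b4 4f 81 08 08 80 00 04 08 08 08 08 08
60: 03 ff 00 a0 52 e5 e5 00 44 7c 86 0f 08 3f 00 00
70: de 80 cc 0c 0e a1 d2 00 01 b4 11 02 00 00 00 00
a0: 02 c0 20 00 07 02 00 1f 00 00 00 00 6e 02 04 00
"""

NON_CORRUPTING = """
50: 17 a4 6b b4 06 81 08 08 80 00 04 08 08 08 08 08
60: 03 ff 00 a0 50 e4 e4 00 40 78 86 0f 08 3f 00 00
70: d8 80 cc 0c 0e a1 d2 00 01 b4 01 02 00 00 00 00
a0: 02 c0 20 00 03 02 00 1f 00 00 00 00 6e 02 00 00
"""

changes = diff_regs(parse_dump(CORRUPTING), parse_dump(NON_CORRUPTING))
for off, old, new in changes:
    print(f"0x{off:02x}: {old:02x} -> {new:02x}")
```

That gives ten differing bytes, at 0x54, 0x64-0x66, 0x68, 0x69, 0x70, 0x7a, 0xa4 and 0xae.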


2001-02-05 22:03:28

by Udo A. Steinberg

Subject: Re: VIA silent disk corruption - bad news

Peter Horton wrote:
>
> The patch doesn't work for me. Maybe I need to disable some more of
> those North bridge features :-(
>
> Oh bum. Back to testing with "normal" ...

FWIW, here's the output of my lspci for A7V with working 1003 BIOS
and still no corruption (after 2 hours stresstest).

00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 02)
Subsystem: Asustek Computer, Inc.: Unknown device 8033
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+
Latency: 0
Region 0: Memory at e0000000 (32-bit, prefetchable) [size=128M]
Capabilities: [a0] AGP version 2.0
Status: RQ=31 SBA+ 64bit- FW+ Rate=x1,x2
Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>
Capabilities: [c0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 06 11 05 03 06 00 10 a2 02 00 00 06 00 00 00 00
10: 08 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 33 80
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00

I'll leave the comparing work to you. If you need more info, just holler.

-Udo.

2001-02-05 22:09:11

by Udo A. Steinberg

Subject: Re: VIA silent disk corruption - bad news

"Udo A. Steinberg" wrote:
>
> FWIW, here's the output of my lspci for A7V with working 1003 BIOS
> and still no corruption (after 2 hours stresstest).

Bugger, forgot the end bit. Here it is again:

00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 02)
Subsystem: Asustek Computer, Inc.: Unknown device 8033
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+
Latency: 0
Region 0: Memory at e0000000 (32-bit, prefetchable) [size=128M]
Capabilities: [a0] AGP version 2.0
Status: RQ=31 SBA+ 64bit- FW+ Rate=x1,x2
Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>
Capabilities: [c0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 06 11 05 03 06 00 10 a2 02 00 00 06 00 00 00 00
10: 08 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 33 80
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 17 a4 eb b4 06 81 10 10 88 00 04 08 0c 10 10 10
60: 0f ff 0f b0 e6 e6 e5 00 40 78 86 0f 08 7f 00 00
70: de c0 cc 0c 0e a1 d2 00 01 b4 11 02 00 00 00 01
80: 0f 40 00 00 80 00 00 00 03 00 4c 01 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 02 c0 20 00 17 02 00 1f 00 00 00 00 6a 02 14 00
b0: 5a ec 80 a5 32 33 28 00 00 00 00 00 00 00 00 00
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 0e 22 00 00 00 00 00 91 06

-Udo.

2001-02-06 13:49:06

by Richard B. Johnson

Subject: Re: VIA silent disk corruption - bad news

On Mon, 5 Feb 2001, Udo A. Steinberg wrote:

> Peter Horton wrote:
> >
> > The patch doesn't work for me. Maybe I need to disable some more of
> > those North bridge features :-(
> >
> > Oh bum. Back to testing with "normal" ...
>
> FWIW, here's the output of my lspci for A7V with working 1003 BIOS
> and still no corruption (after 2 hours stresstest).
>
> 00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 02)
> Subsystem: Asustek Computer, Inc.: Unknown device 8033
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
> Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+
> Latency: 0
> Region 0: Memory at e0000000 (32-bit, prefetchable) [size=128M]
> Capabilities: [a0] AGP version 2.0
> Status: RQ=31 SBA+ 64bit- FW+ Rate=x1,x2
> Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>
> Capabilities: [c0] Power Management version 2
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 00: 06 11 05 03 06 00 10 a2 02 00 00 06 00 00 00 00
> 10: 08 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 33 80
> 30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
>
> I'll leave the comparing work to you. If you need more info, just holler.
>
> -Udo.

I have found a way to reproduce the file-system problem! Just build
the kernel and, from another terminal, execute `sync`. The build will
fail with multiple errors caused by apparently corrupted files. The files
are not actually corrupted, though. Just restart the build and it
will complete.

So there is something that `sync` triggers that results in apparent
file-system corruption. Once the files are re-read, they are fine.
This does not explain the permanent file-system corruption that
sometimes occurs; that may happen during the `umount`.
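
A rough way to script a check in the same spirit, as a sketch only: write some files, call sync, then re-read and compare checksums. This is a simplified stand-in for the kernel build, not the exact procedure above; the function names, file count and sizes are arbitrary assumptions, and the re-read may well be served from the page cache rather than the disk.

```python
import hashlib
import os
import tempfile

def md5_of(path):
    """Checksum a file by reading it back in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def sync_reread_test(directory, files=8, size=1 << 20):
    """Write random files, sync, re-read; return paths whose checksum changed."""
    expected = {}
    for i in range(files):
        data = os.urandom(size)
        path = os.path.join(directory, f"blob{i}.bin")
        with open(path, "wb") as f:
            f.write(data)
        expected[path] = hashlib.md5(data).hexdigest()
    os.sync()  # flush dirty buffers, as `sync` does from the shell (Unix only)
    return [p for p, digest in expected.items() if md5_of(p) != digest]

with tempfile.TemporaryDirectory() as d:
    bad = sync_reread_test(d)
    print("mismatches:", bad)
```

On a healthy machine the list should come back empty; a non-empty list would only show that a re-read disagreed with what was written, not where the damage happened.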

I don't use IDE. I use BusLogic SCSI, 250 MB RAM, Pentium-III(2), SMP.
Linux 2.4.1 (no postfix).

PCI stuff.....

Device Vendor Type
0 Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge
I/O memory : 0xe6000000->0xe7fffff7
1 Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge
I/O memory : 0x40010100->0x470101ff
I/O memory : 0x22a0d0e0->0x1fffdfef
I/O memory : 0xe5c0e5d0->0xe5cfe5df
I/O memory : 0xe5f0e600->0xe5ffe60f
4 Intel Corporation 82371AB PIIX4 ISA
9 S3 Inc. 86c968 [Vision 968 VRAM] rev 0
IRQ 15 Pin 1
I/O memory : 0x14000000->0x15ffffff
10 Advanced Micro Devices [AMD] 79c970 [PCnet LANCE]
IRQ 12 Pin 1
I/O ports : 0xd000->0xd01e
I/O memory : 0xe1800000->0xe180001f
11 3Com Corporation 3c905B 100BaseTX [Cyclone]
IRQ 10 Pin 1
I/O ports : 0xb800->0xb87e
I/O memory : 0xe1000000->0xe100007f
12 BusLogic BT-946C (BA80C30), [MultiMaster 10]
IRQ 11 Pin 1
I/O ports : 0xb400->0xb402
I/O memory : 0xe0800000->0xe0800fff




Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2001-02-06 14:25:06

by Udo A. Steinberg

Subject: Re: VIA silent disk corruption - bad news

Petr Vandrovec wrote:
>
> On 5 Feb 01 at 23:08, Udo A. Steinberg wrote:
>
> > 00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 02)
> > Subsystem: Asustek Computer, Inc.: Unknown device 8033
> > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
> > Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+
> ^^^^^^
> I tried all different settings in BIOS, and I even programmed values
> from your lspci to my VIA (except for SDRAM timings) - and although
> it is a bit better, it is still not perfect.

> So for today I'm back on [UMS]DMA disabled. I'll try downgrading BIOS
> today, but it looks to me like something is severely broken here.

Are your drives connected to the VIA or the Promise controller? Mine
are both connected to the PDC20265 and running in UDMA-100 mode. There
have been several threads on lkml about corruption on disks connected
to Via chipset IDE controllers, although I didn't follow them in great
detail. Maybe your problem is not related to the host bridge, but to
the IDE controller?

-Udo.

2001-02-06 14:35:08

by Petr Vandrovec

Subject: Re: VIA silent disk corruption - bad news

On 6 Feb 01 at 15:24, Udo A. Steinberg wrote:
> Petr Vandrovec wrote:
> > On 5 Feb 01 at 23:08, Udo A. Steinberg wrote:
> >
> > > 00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 02)
> > > Subsystem: Asustek Computer, Inc.: Unknown device 8033
> > > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
> > > Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+
> > ^^^^^^
> > I tried all different settings in BIOS, and I even programmed values
> > from your lspci to my VIA (except for SDRAM timings) - and although
> > it is a bit better, it is still not perfect.
>
> > So for today I'm back on [UMS]DMA disabled. I'll try downgrading BIOS
> > today, but it looks to me like something is severely broken here.
>
> Are your drives connected to the VIA or the Promise controller? Mine
> are both connected to the PDC20265 and running in UDMA-100 mode. There
> have been several threads on lkml about corruption on disks connected
> to Via chipset IDE controllers, although I didn't follow them in great
> detail. Maybe your problem is not related to the host bridge, but to
> the IDE controller?

They are connected to the Promise; I reserved the VIA for the CD-ROM drive.
One HDD runs in UDMA5 mode, the other in UDMA2. Corruption is frequent
when I run md5sum in parallel on both HDDs - in that case almost
no file produces the same checksum that was generated using PIO4.
When I run md5sum on only one HDD, there are about 4 checksum errors
in 6GB of data. But I'm more and more inclined to throw this A7V away,
as it is impossible to get a datasheet from Promise, and for the VIA host
bridge all I managed was to slow down normal system operation by a factor
of 3... but still with the same corruption :-( If only a page could have
4092 and not 4096 bytes ;-)
Petr
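
The PIO4-versus-UDMA comparison described above can be scripted by saving the md5sum output from each run and listing the files whose digests differ. A minimal sketch, assuming the usual md5sum output format (digest, a space, a space or '*', then the path); the function names, file names and digests below are placeholders, not data from the A7V.

```python
def parse_manifest(text):
    """Parse md5sum output ('<digest>  <path>' or '<digest> *<path>')."""
    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, _, rest = line.partition(" ")
        sums[rest[1:]] = digest.lower()  # drop the mode character (' ' or '*')
    return sums

def changed_files(reference, candidate):
    """Paths present in both runs whose digests disagree."""
    return sorted(p for p in reference.keys() & candidate.keys()
                  if reference[p] != candidate[p])

# Placeholder manifests: one taken in PIO mode, one with UDMA enabled.
PIO_RUN = """\
d41d8cd98f00b204e9800998ecf8427e  data/a.bin
0cc175b9c0f1b6a831c399e269772661  data/b.bin
"""
UDMA_RUN = """\
d41d8cd98f00b204e9800998ecf8427e  data/a.bin
900150983cd24fb0d6963f7d28e17f72  data/b.bin
"""

suspect = changed_files(parse_manifest(PIO_RUN), parse_manifest(UDMA_RUN))
print(suspect)
```

Treating the PIO run as the reference is itself an assumption; it only tells you which files read differently, not which read was the correct one.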