2007-10-20 06:55:57

by Soeren Sonnenburg

Subject: sata sil3114 vs. certain seagate drives results in filesystem corruptions

Dear all,

I finally managed to find a *reproducible* setup and way to trigger
random corruptions using a sata sil 3114 controller connected to 4
seagate drives:

port 1: ST3400832AS sda
port 2: ST3400620AS sdb
port 3: ST3750640AS sdc
port 4: ST3750640AS sdd

sda & sdb form md0 via a raid1 setup followed by an additional
devicemapper layer ( root ). sdc and sdd are separate and also have an
additional device mapper layer ( public ) and ( backups ).

Now when I write large files of zeros to root(sda&sdb) and read the file
back in, it contains a few nonzero entries:

# dd if=/dev/zero of=/foo bs=1M count=2000
# hexdump /foo
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
<after >1GB random parts, within large blocks of zeroes>

I can reliably trigger this on the md0 / devmapper-root setup when I
write about 2GB of data (note that this machine has 1.5G of memory - and
still 1GB is often enough to see this problem). Here it does not matter
where in the filesystem I do these writes.
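
The read-back check can also be scripted so that it reports byte offsets instead of a hex dump. What follows is only a minimal sketch in Python, not what was actually run here; the function name `first_nonzero_offsets` and its parameters are made up for illustration, and it assumes the test file was written purely from /dev/zero as in the dd command above.

```python
def first_nonzero_offsets(path, chunk_size=1 << 20, limit=16):
    """Return up to `limit` byte offsets in `path` whose value is not zero."""
    offsets = []
    position = 0  # file offset of the first byte of the current chunk
    with open(path, "rb") as handle:
        while len(offsets) < limit:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            # An all-zero chunk is empty after stripping NUL bytes, so the
            # slow per-byte loop only runs for chunks that contain damage.
            if chunk.strip(b"\x00"):
                for index, value in enumerate(chunk):
                    if value and len(offsets) < limit:
                        offsets.append(position + index)
            position += len(chunk)
    return offsets
```

Calling `first_nonzero_offsets("/foo")` on a clean file should return an empty list; anything else gives the positions to look at with hexdump.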

As a test I did the same on sdc / devmapper-public and
sdd / devmapper-backups with even 30G of zeros. Nothing: no errors,
everything is perfectly OK.

So I thought that this is also the sil mod15write problem
http://home-tj.org/wiki/index.php/Sil_m15w and applied patches 1 & 2
from http://lkml.org/lkml/2007/10/11/115 (adding my two disks) and
rebooted. Now there was some MOD15 stuff in dmesg for the two disks, but
apart from the disks being even slower it was of no use - the
corruption problem was still there (I then also tried patch 3 from Bernd,
but that immediately caused oopses and fs errors). So it looks like the
problem I am having is different...

Now I remembered that this machine also has two idle promise pdc20376
sata ports, on which I first tried the ST3400832AS (sda) and ST3400620AS
(sdb) about a year ago
http://lists.openwall.net/linux-kernel/2006/08/27/106 . At that time I
just saw random error messages and then finally hangs - quoting Tejun
Heo:

"I see. your drive is reporting error for some reason and libata is
failing to recover."

Now sata_promise is converted to the new EH, so I simply gave it a go, i.e.
I attached ST3400832AS and ST3400620AS to the promise controller and
rebooted and redid the experiments from above.

No data corruptions whatsoever. I even ran the dd on all three devmapped
mount points simultaneously with a size of 30GB each, still no
corruption. However the error messages I've seen a year ago are back for
the ST3400832AS and ST3400620AS attached to the promise controller (see
below).

Please find all the details below:

- uname

Linux 2.6.23.1 #3 PREEMPT Fri Oct 19 20:39:45 CEST 2007 i686 GNU/Linux

- lspci

00:0e.0 RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
00:0e.0 0104: 1095:3114 (rev 02)

00:08.0 RAID bus controller: Promise Technology, Inc. PDC20376 (FastTrak 376) (rev 02)
00:08.0 0104: 105a:3376 (rev 02)

- proc interrupts

17: 4434549 IO-APIC-fasteoi sata_promise, sata_sil, ohci1394

- dmesg

sata_sil 0000:00:0e.0: version 2.3
ACPI: PCI Interrupt 0000:00:0e.0[A] -> GSI 17 (level, low) -> IRQ 17
sata_sil 0000:00:0e.0: Applying R_ERR on DMA activate FIS errata fix
scsi3 : sata_sil
scsi4 : sata_sil
scsi5 : sata_sil
scsi6 : sata_sil
ata4: SATA max UDMA/100 cmd 0xf882e080 ctl 0xf882e08a bmdma 0xf882e000 irq 17
ata5: SATA max UDMA/100 cmd 0xf882e0c0 ctl 0xf882e0ca bmdma 0xf882e008 irq 17
ata6: SATA max UDMA/100 cmd 0xf882e280 ctl 0xf882e28a bmdma 0xf882e200 irq 17
ata7: SATA max UDMA/100 cmd 0xf882e2c0 ctl 0xf882e2ca bmdma 0xf882e208 irq 17
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata4.00: ATA-7: ST3400832AS, 3.01, max UDMA/133
ata4.00: 781422768 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata4.00: configured for UDMA/100
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata5.00: ATA-7: ST3400620AS, 3.AAE, max UDMA/133
ata5.00: 781422768 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata5.00: configured for UDMA/100
ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata6.00: ATA-7: ST3750640AS, 3.AAE, max UDMA/133
ata6.00: 1465149168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata6.00: configured for UDMA/100
ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata7.00: ATA-7: ST3750640AS, 3.AAC, max UDMA/133
ata7.00: 1465149168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata7.00: configured for UDMA/100
scsi 3:0:0:0: Direct-Access ATA ST3400832AS 3.01 PQ: 0 ANSI: 5
sd 3:0:0:0: [sda] 781422768 512-byte hardware sectors (400088 MB)
sd 3:0:0:0: [sda] Write Protect is off
sd 3:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 3:0:0:0: [sda] 781422768 512-byte hardware sectors (400088 MB)
sd 3:0:0:0: [sda] Write Protect is off
sd 3:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: unknown partition table
sd 3:0:0:0: [sda] Attached SCSI disk
sd 3:0:0:0: Attached scsi generic sg0 type 0
scsi 4:0:0:0: Direct-Access ATA ST3400620AS 3.AA PQ: 0 ANSI: 5
sd 4:0:0:0: [sdb] 781422768 512-byte hardware sectors (400088 MB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 4:0:0:0: [sdb] 781422768 512-byte hardware sectors (400088 MB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdb: unknown partition table
sd 4:0:0:0: [sdb] Attached SCSI disk
sd 4:0:0:0: Attached scsi generic sg1 type 0
scsi 5:0:0:0: Direct-Access ATA ST3750640AS 3.AA PQ: 0 ANSI: 5
sd 5:0:0:0: [sdc] 1465149168 512-byte hardware sectors (750156 MB)
sd 5:0:0:0: [sdc] Write Protect is off
sd 5:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:0:0:0: [sdc] 1465149168 512-byte hardware sectors (750156 MB)
sd 5:0:0:0: [sdc] Write Protect is off
sd 5:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdc: unknown partition table
sd 5:0:0:0: [sdc] Attached SCSI disk
sd 5:0:0:0: Attached scsi generic sg2 type 0
scsi 6:0:0:0: Direct-Access ATA ST3750640AS 3.AA PQ: 0 ANSI: 5
sd 6:0:0:0: [sdd] 1465149168 512-byte hardware sectors (750156 MB)
sd 6:0:0:0: [sdd] Write Protect is off
sd 6:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 6:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 6:0:0:0: [sdd] 1465149168 512-byte hardware sectors (750156 MB)
sd 6:0:0:0: [sdd] Write Protect is off
sd 6:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 6:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdd: unknown partition table
sd 6:0:0:0: [sdd] Attached SCSI disk
sd 6:0:0:0: Attached scsi generic sg3 type 0


- promise errors:

ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x100 action 0x2
ata1.00: port_status 0x20200000
ata1.00: cmd 25/00:00:c0:b6:74/00:01:20:00:00/e0 tag 0 cdb 0x0 data 131072 in
res 51/0c:00:c0:b6:74/0c:01:20:00:00/e0 Emask 0x10 (ATA bus error)
ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 781422768 512-byte hardware sectors (400088 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x100 action 0x2
ata2.00: port_status 0x20200000
ata2.00: cmd c8/00:00:40:16:fe/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
res 51/0c:00:40:16:fe/00:00:00:00:00/e1 Emask 0x10 (ATA bus error)
ata2: soft resetting port
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: configured for UDMA/133
ata2: EH complete
sd 1:0:0:0: [sdb] 781422768 512-byte hardware sectors (400088 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x100 action 0x2
ata2.00: port_status 0x20200000
ata2.00: cmd c8/00:50:58:25:e3/00:00:00:00:00/ec tag 0 cdb 0x0 data 40960 in
res 51/0c:50:58:25:e3/00:00:00:00:00/ec Emask 0x10 (ATA bus error)
ata2: soft resetting port
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: configured for UDMA/133
ata2: EH complete
sd 1:0:0:0: [sdb] 781422768 512-byte hardware sectors (400088 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x100 action 0x2
ata2.00: port_status 0x20200000
ata2.00: cmd c8/00:08:b0:54:b0/00:00:00:00:00/ed tag 0 cdb 0x0 data 4096 in
res 51/0c:08:b0:54:b0/00:00:00:00:00/ed Emask 0x10 (ATA bus error)
ata2: soft resetting port
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: configured for UDMA/133
ata2: EH complete
sd 1:0:0:0: [sdb] 781422768 512-byte hardware sectors (400088 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x100 action 0x2
ata2.00: port_status 0x20200000
ata2.00: cmd 25/00:00:f8:af:c0/00:02:12:00:00/e0 tag 0 cdb 0x0 data 262144 in
res 51/0c:00:f8:af:c0/0c:02:12:00:00/e0 Emask 0x10 (ATA bus error)
ata2: soft resetting port
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: configured for UDMA/133
ata2: EH complete
sd 1:0:0:0: [sdb] 781422768 512-byte hardware sectors (400088 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x100 action 0x2
ata2.00: port_status 0x20200000
ata2.00: cmd c8/00:00:60:f5:e2/00:00:00:00:00/e2 tag 0 cdb 0x0 data 131072 in
res 51/0c:00:60:f5:e2/0c:02:12:00:00/e2 Emask 0x10 (ATA bus error)
ata2: soft resetting port
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: configured for UDMA/133
ata2: EH complete
sd 1:0:0:0: [sdb] 781422768 512-byte hardware sectors (400088 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x100 action 0x2
ata2.00: port_status 0x20200000
ata2.00: cmd 25/00:f0:80:54:da/00:00:12:00:00/e0 tag 0 cdb 0x0 data 122880 in
res 51/0c:f0:80:54:da/0c:00:12:00:00/e0 Emask 0x10 (ATA bus error)
ata2: soft resetting port
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: configured for UDMA/133
ata2: EH complete
sd 1:0:0:0: [sdb] 781422768 512-byte hardware sectors (400088 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
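
For reading the promise errors above: the `cmd` field is the ATA taskfile as libata prints it (command/feature:nsect:lbal:lbam:lbah/ the five hob_* bytes /device). Below is a rough sketch of a decoder, assuming that layout and handling only the two opcodes that occur in this log (0x25 READ DMA EXT and 0xc8 READ DMA). The name `decode_cmd` and the returned dictionary are hypothetical, for illustration only.

```python
# Only the two opcodes seen in the log above: name and LBA width in bits.
OPCODES = {0x25: ("READ DMA EXT", 48), 0xC8: ("READ DMA", 28)}


def decode_cmd(taskfile):
    """Decode a libata 'cmd' field such as
    '25/00:00:c0:b6:74/00:01:20:00:00/e0'."""
    command, low, high, device = taskfile.split("/")
    command = int(command, 16)
    device = int(device, 16)
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":"))
    if command not in OPCODES:
        raise ValueError("opcode 0x%02x not handled by this sketch" % command)
    name, lba_bits = OPCODES[command]
    lba = lbah << 16 | lbam << 8 | lbal
    if lba_bits == 48:
        # LBA48: the hob_* bytes carry address bits 24..47 and count bits 8..15.
        lba |= hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
        sectors = (hob_nsect << 8 | nsect) or 65536  # a count of 0 means 65536
    else:
        # LBA28: the low nibble of the device register holds address bits 24..27.
        lba |= (device & 0x0F) << 24
        sectors = nsect or 256  # a count of 0 means 256
    return {"name": name, "lba": lba, "sectors": sectors,
            "bytes": sectors * 512}
```

For the first ata1 error, `decode_cmd("25/00:00:c0:b6:74/00:01:20:00:00/e0")` gives 256 sectors, i.e. 131072 bytes, which agrees with the `data 131072 in` printed on the same line. So the failing commands are plain DMA reads of up to 512 sectors.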

Open for suggestions/ideas,
Soeren.


2007-10-22 02:13:17

by Tejun Heo

Subject: Re: sata sil3114 vs. certain seagate drives results in filesystem corruptions

Hello,

Soeren Sonnenburg wrote:
> I finally managed to find a *reproducible* setup and way to trigger
> random corruptions using a sata sil 3114 controller connected to 4
> seagate drives
>
> port 1: ST3400832AS sda
> port 2: ST3400620AS sdb
> port 3: ST3750640AS sdc
> port 4: ST3750640AS sdd
>
> sda & sdb form md0 via a raid1 setup followed by an additional
> devicemapper layer ( root ). sdc and sdd are separate and also have an
> additional device mapper layer ( public ) and ( backups ).
>
> Now when I write large files of zeros to root(sda&sdb) and read the file
> back in it contains a few nonzero entries:
>
> # dd if=/dev/zero of=/foo bs=1M count=2000
> # hexdump /foo
> 0000000 0000 0000 0000 0000 0000 0000 0000 0000
> *
> <after >1GB random parts, within large blocks of zeroes>
>
> I can reliably trigger this on the md0 / devmapper-root setup when I
> write about 2GB of data (note that this machine has 1.5G of memory - and
> still 1GB is often enough to see this problem). Here it does not matter
> where in the filesystem I do these writes.

Thanks. I'll try to reproduce the problem here. What's your motherboard?

> Now sata_promise is converted to the new EH, so I simply gave it a go, i.e.
> I attached ST3400832AS and ST3400620AS to the promise controller and
> rebooted and redid the experiments from above.
>
> No data corruptions whatsoever. I even ran the dd on all three devmapped
> mount points simultaneously with a size of 30GB each, still no
> corruption. However the error messages I've seen a year ago are back for
> the ST3400832AS and ST3400620AS attached to the promise controller (see
> below).
[--snip--]
> ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x100 action 0x2
> ata1.00: port_status 0x20200000
> ata1.00: cmd 25/00:00:c0:b6:74/00:01:20:00:00/e0 tag 0 cdb 0x0 data 131072 in
> res 51/0c:00:c0:b6:74/0c:01:20:00:00/e0 Emask 0x10 (ATA bus error)
> ata1: soft resetting port

Yeah, still the same. Your drives don't like the way the promise controller
speaks to them (e.g. promise generates signals which are ) but now that
sata_promise has proper EH, it can recover from those errors. As long
as nothing worse happens, it should be okay.

Thanks.

--
tejun

2007-10-22 05:56:27

by Soeren Sonnenburg

Subject: Re: sata sil3114 vs. certain seagate drives results in filesystem corruptions

On Mon, 2007-10-22 at 11:12 +0900, Tejun Heo wrote:
> Hello,
>
> Soeren Sonnenburg wrote:
> > I finally managed to find a *reproducible* setup and way to trigger
> > random corruptions using a sata sil 3114 controller connected to 4
> > seagate drives
> >
> > port 1: ST3400832AS sda
> > port 2: ST3400620AS sdb
> > port 3: ST3750640AS sdc
> > port 4: ST3750640AS sdd
> >
> > sda & sdb form md0 via a raid1 setup followed by an additional
> > devicemapper layer ( root ). sdc and sdd are separate and also have an
> > additional device mapper layer ( public ) and ( backups ).
> >
> > Now when I write large files of zeros to root(sda&sdb) and read the file
> > back in it contains a few nonzero entries:
> >
> > # dd if=/dev/zero of=/foo bs=1M count=2000
> > # hexdump /foo
> > 0000000 0000 0000 0000 0000 0000 0000 0000 0000
> > *
> > <after >1GB random parts, within large blocks of zeroes>
> >
> > I can reliably trigger this on the md0 / devmapper-root setup when I
> > write about 2GB of data (note that this machine has 1.5G of memory - and
> > still 1GB is often enough to see this problem). Here it does not matter
> > where in the filesystem I do these writes.
>
> Thanks. I'll try to reproduce the problem here. What's your motherboard?

It is an asus a7v8x with an AMD Athlon(TM) XP 3000+ and admittedly
almost completely filled pci slots (4 dvb cards, 1 with the sil3114; 1
empty; in the agp slot a radeon 9200). Nevertheless, I would not expect
the power supply to be the problem (it was recently replaced by a 500W
one), and there is enough cooling (it is winter in Germany, plus several fans).

> > Now sata_promise is converted to the new EH, so I simply gave it a go, i.e.
> > I attached ST3400832AS and ST3400620AS to the promise controller and
> > rebooted and redid the experiments from above.
> >
> > No data corruptions whatsoever. I even ran the dd on all three devmapped
> > mount points simultaneously with a size of 30GB each, still no
> > corruption. However the error messages I've seen a year ago are back for
> > the ST3400832AS and ST3400620AS attached to the promise controller (see
> > below).
> [--snip--]
> > ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x100 action 0x2
> > ata1.00: port_status 0x20200000
> > ata1.00: cmd 25/00:00:c0:b6:74/00:01:20:00:00/e0 tag 0 cdb 0x0 data 131072 in
> > res 51/0c:00:c0:b6:74/0c:01:20:00:00/e0 Emask 0x10 (ATA bus error)
> > ata1: soft resetting port
>
> Yeah, still the same. Your drives don't like the way the promise controller
> speaks to them (e.g. promise generates signals which are ) but now that
> sata_promise has proper EH, it can recover from those errors. As long
> as nothing worse happens, it should be okay.

These errors only appear when I generate some stress (like with the dd).
The machine is now up 2 days 8hrs and no further such warnings in the
log.

Soeren

2007-10-22 09:48:22

by Bernd Schubert

Subject: Re: sata sil3114 vs. certain seagate drives results in filesystem corruptions

Hello,

On Monday 22 October 2007 04:12:44 Tejun Heo wrote:
> Hello,
>
> Soeren Sonnenburg wrote:
> > I finally managed to find a *reproducible* setup and way to trigger
> > random corruptions using a sata sil 3114 controller connected to 4
> > seagate drives
> >
> > port 1: ST3400832AS sda
> > port 2: ST3400620AS sdb
> > port 3: ST3750640AS sdc
> > port 4: ST3750640AS sdd
> >
> > sda & sdb form md0 via a raid1 setup followed by an additional
> > devicemapper layer ( root ). sdc and sdd are separate and also have an
> > additional device mapper layer ( public ) and ( backups ).
> >
> > Now when I write large files of zeros to root(sda&sdb) and read the file
> > back in it contains a few nonzero entries:
> >
> > # dd if=/dev/zero of=/foo bs=1M count=2000
> > # hexdump /foo
> > 0000000 0000 0000 0000 0000 0000 0000 0000 0000
> > *
> > <after >1GB random parts, within large blocks of zeroes>
> >
> > I can reliably trigger this on the md0 / devmapper-root setup when I
> > write about 2GB of data (note that this machine has 1.5G of memory - and
> > still 1GB is often enough to see this problem). Here it does not matter
> > where in the filesystem I do these writes.

That's almost the same test as I'm always doing. Only I do not write only 2GB,
but as much as it fits onto the disk. On reading back this file, the
filesystem will report errors somewhere between 50GB and 230GB (disk size is
250GB).

>
> Thanks. I'll try to reproduce the problem here. What's your motherboard?

All tested S2882 boards here.


Cheers,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH

2007-10-22 10:36:45

by Soeren Sonnenburg

Subject: Re: sata sil3114 vs. certain seagate drives results in filesystem corruptions

On Mon, 2007-10-22 at 11:48 +0200, Bernd Schubert wrote:
> Hello,
>
> On Monday 22 October 2007 04:12:44 Tejun Heo wrote:
> > Hello,
> > [...]
> > > Now when I write large files of zeros to root(sda&sdb) and read the file
> > > back in it contains a few nonzero entries:
> > >
> > > # dd if=/dev/zero of=/foo bs=1M count=2000
> > > # hexdump /foo
> > > 0000000 0000 0000 0000 0000 0000 0000 0000 0000
> > > *
> > > <after >1GB random parts, within large blocks of zeroes>
> > >
> > > I can reliably trigger this on the md0 / devmapper-root setup when I
> > > write about 2GB of data (note that this machine has 1.5G of memory - and
> > > still 1GB is often enough to see this problem). Here it does not matter
> > > where in the filesystem I do these writes.
>
> That's almost the same test as I'm always doing. Only I do not write only 2GB,

Well when I read your mail I thought that I could be seeing exactly the
same bug... it still may be. However ``my'' problem does not go away
with the mod15fix ...

> but as much as it fits onto the disk. On reading back this file, the
> filesystem will report errors somewhere between 50GB and 230GB (disk size is
> 250GB).

Wow, I really see lots of corruptions (well every 1-2 GB a couple of
bytes are corrupted). Are you getting similarly many in the 50G - 230G
region?

> > Thanks. I'll try to reproduce the problem here. What's your motherboard?
>
> All tested S2882 boards here.

I assume all equipped with lots of memory and mostly empty pci slots?

Soeren

2007-10-22 10:59:40

by Bernd Schubert

Subject: Re: sata sil3114 vs. certain seagate drives results in filesystem corruptions

On Monday 22 October 2007 12:36:32 Soeren Sonnenburg wrote:
> On Mon, 2007-10-22 at 11:48 +0200, Bernd Schubert wrote:
> > Hello,
> >
> > On Monday 22 October 2007 04:12:44 Tejun Heo wrote:
> > > Hello,
> > > [...]
> > >
> > > > Now when I write large files of zeros to root(sda&sdb) and read the
> > > > file back in it contains a few nonzero entries:
> > > >
> > > > # dd if=/dev/zero of=/foo bs=1M count=2000
> > > > # hexdump /foo
> > > > 0000000 0000 0000 0000 0000 0000 0000 0000 0000
> > > > *
> > > > <after >1GB random parts, within large blocks of zeroes>
> > > >
> > > > I can reliably trigger this on the md0 / devmapper-root setup when I
> > > > write about 2GB of data (note that this machine has 1.5G of memory -
> > > > and still 1GB is often enough to see this problem). Here it does not
> > > > matter where in the filesystem I do these writes.
> >
> > That's almost the same test as I'm always doing. Only I do not write only
> > 2GB,
>
> Well when I read your mail I thought that I could be seeing exactly the
> same bug... it still may be. However ``my'' problem does not go away
> with the mod15fix ...

Yeah, pity it did not fix it :( I will try to port Tejun's patch
(http://home-tj.org/wiki/index.php/Sil_m15w#Patches) to 2.6.23 today or
tomorrow. If you are testing anyway, could you then also try this?

>
> > but as much as it fits onto the disk. On reading back this file, the
> > filesystem will report errors somewhere between 50GB and 230GB (disk size
> > is 250GB).
>
> Wow, I really see lots of corruptions (well every 1-2 GB a couple of
> bytes are corrupted). Are you getting similarly many in the 50G - 230G
> region?
>
> > > Thanks. I'll try to reproduce the problem here. What's your
> > > motherboard?
> >
> > All tested S2882 boards here.
>
> I assume all equipped with lots of memory and mostly empty pci slots?

Yes, all pci-slots are free and the systems have between 4 and 16GB of memory
(ecc, monitored with edac). Well, those are cluster systems (actually tyan
names those B2882).
Do you think the configuration is related? Here it also happens with O_DIRECT;
we tested this to minimize memory effects.


Cheers,
Bernd


--
Bernd Schubert
Q-Leap Networks GmbH

2007-10-22 11:02:24

by Bernd Schubert

Subject: Re: sata sil3114 vs. certain seagate drives results in filesystem corruptions

On Monday 22 October 2007 12:36:32 Soeren Sonnenburg wrote:
> > but as much as it fits onto the disk. On reading back this file, the
> > filesystem will report errors somewhere between 50GB and 230GB (disk size
> > is 250GB).
>
> Wow, I really see lots of corruptions (well every 1-2 GB a couple of
> bytes are corrupted). Are you getting similarly many in the 50G - 230G
> region?

I never tested what is corrupted. Well, a diff over 250GB would take quite a
lot of time...

--
Bernd Schubert
Q-Leap Networks GmbH

2007-10-22 12:56:29

by Soeren Sonnenburg

Subject: Re: sata sil3114 vs. certain seagate drives results in filesystem corruptions

On Mon, 2007-10-22 at 13:02 +0200, Bernd Schubert wrote:
> On Monday 22 October 2007 12:36:32 Soeren Sonnenburg wrote:
> > > but as much as it fits onto the disk. On reading back this file, the
> > > filesystem will report errors somewhere between 50GB and 230GB (disk size
> > > is 250GB).
> >
> > Wow, I really see lots of corruptions (well every 1-2 GB a couple of
> > bytes are corrupted). Are you getting similarly many in the 50G - 230G
> > region?
>
> I never tested what is corrupted. Well, a diff over 250GB would take quite a
> lot of time...

Actually hexdump does not display duplicate lines, so if your file is
really all zeros it will only display a single line, a `*` and the final
offset; however, I think it is not particularly optimized...

Soeren

2007-10-23 07:00:56

by Soeren Sonnenburg

Subject: Re: sata sil3114 vs. certain seagate drives results in filesystem corruptions

On Mon, 2007-10-22 at 12:59 +0200, Bernd Schubert wrote:
> On Monday 22 October 2007 12:36:32 Soeren Sonnenburg wrote:
> > On Mon, 2007-10-22 at 11:48 +0200, Bernd Schubert wrote:
> > > Hello,
> > >
> > > On Monday 22 October 2007 04:12:44 Tejun Heo wrote:
> > > > Hello,
> > > > [...]
> > > >
> > > > > Now when I write large files of zeros to root(sda&sdb) and read the
> > > > > file back in it contains a few nonzero entries:
> > > > >
> > > > > # dd if=/dev/zero of=/foo bs=1M count=2000
> > > > > # hexdump /foo
> > > > > 0000000 0000 0000 0000 0000 0000 0000 0000 0000
> > > > > *
> > > > > <after >1GB random parts, within large blocks of zeroes>
> > > > >
> > > > > I can reliably trigger this on the md0 / devmapper-root setup when I
> > > > > write about 2GB of data (note that this machine has 1.5G of memory -
> > > > > and still 1GB is often enough to see this problem). Here it does not
> > > > > matter where in the filesystem I do these writes.
> > >
> > > That's almost the same test as I'm always doing. Only I do not write only
> > > 2GB,
> >
> > Well when I read your mail I thought that I could be seeing exactly the
> > same bug... it still may be. However ``my'' problem does not go away
> > with the mod15fix ...
>
> Yeah, pity it did not fix it :( I will try to port Tejun's patch
> (http://home-tj.org/wiki/index.php/Sil_m15w#Patches) to 2.6.23 today or
> tomorrow. If you are testing anyway, could you then also try this?

Hmmhh, dmesg said the m15 fix was turned on (at least it appeared for
the 2 drives in question in dmesg), so I fear it is something different.
On the other hand this is a 'production' machine so I am not too eager
to try very experimental things...

> > > but as much as it fits onto the disk. On reading back this file, the
> > > filesystem will report errors somewhere between 50GB and 230GB (disk size
> > > is 250GB).
> >
> > Wow, I really see lots of corruptions (well every 1-2 GB a couple of
> > bytes are corrupted). Are you getting similarly many in the 50G - 230G
> > region?
> >
> > > > Thanks. I'll try to reproduce the problem here. What's your
> > > > motherboard?
> > >
> > > All tested S2882 boards here.
> >
> > I assume all equipped with lots of memory and mostly empty pci slots?
>
> Yes, all pci-slots are free and the systems have between 4 and 16GB of memory
> (ecc, monitored with edac). Well, those are cluster systems (actually tyan
> names those B2882).
> Do you think the configuration is related? Here it also happens with O_DIRECT;
> we tested this to minimize memory effects.

Mine is just an a7v8x with a VIA KT400 chipset... really old, and several
of the pci slots are filled, so the problem may be more likely to happen
here... on the other hand I never tried writing 50-250G on
the drives I considered OK. Will do. It could also be helpful to check
whether we both see patterns in the corruptions, e.g. whether corrupted
regions are always 512 bytes long or so (IIRC in my case they were only
up to 64 bytes).
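
To compare such patterns, something along these lines could list each corrupted region of a zero-filled test file with its start offset and length. This is only a sketch under assumptions: the name `corrupted_extents` and the `merge_gap` parameter are invented here, and the idea of treating non-zero bytes separated by at most `merge_gap` zero bytes as one region is my guess at a useful grouping (corrupted data may itself contain a few zero bytes).

```python
def corrupted_extents(path, merge_gap=16, chunk_size=1 << 20):
    """Return (start_offset, length) for each region of non-zero bytes.

    Non-zero bytes separated by at most `merge_gap` zero bytes are
    reported as one region.
    """
    extents = []         # finished (start_offset, length) pairs
    start = last = None  # first and last non-zero offset of the open region
    position = 0         # file offset of the first byte of the current chunk
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            if chunk.strip(b"\x00"):  # skip the byte loop for all-zero chunks
                for index, value in enumerate(chunk):
                    if not value:
                        continue
                    offset = position + index
                    # Too many zero bytes since the last hit: close the region.
                    if start is not None and offset - last - 1 > merge_gap:
                        extents.append((start, last - start + 1))
                        start = None
                    if start is None:
                        start = offset
                    last = offset
            position += len(chunk)
    if start is not None:
        extents.append((start, last - start + 1))
    return extents
```

Printing `start % 512` and the length for each entry should show quickly whether the damage is sector-sized and sector-aligned or consists of shorter bursts like the ones of up to 64 bytes mentioned above.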

Soeren