2003-11-01 08:28:02

by Ville Herva

[permalink] [raw]
Subject: ide write cache issue? [Re: Something corrupts raid5 disks slightly during reboot]

On Fri, Oct 31, 2003 at 07:41:30PM -0600, you [Jeffrey E. Hundstad] wrote:
> Try:
>
> hdparm -W0 /dev/hdX
>
> for each of your ide drives. This turns off write-caching which is
> usually a bad thing with ide drives anyway.

According to hdparm, write caching is indeed enabled for all the drives.
I find it somewhat odd that this would be the cause, though. Before the
reboot, the drives had not been written to for quite a while (the fs had
been unmounted and the raid array had been stopped).

I suppose it _is_ possible that the drives were still holding the ext2
superblock update in their write cache when power went off. The md5sum of
the first 1MB of the drives was probably in sync before the reboot because I
got it from the kernel's cache (or the drive's cache), although the
up-to-date data had not been written onto the platter yet. Also, as this is
a raid5 array, one of the drives could have been clean because the ext2
superblock (which I assume was being updated) is physically located on only
two of the drives.

I can try to turn off write caching well before the next reboot. I don't
suppose there is a way to boot so that write caching would be off all the
time - the best I can do is turn it off early in the boot scripts, no?
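
For what it's worth, the earliest I can think of doing it from userspace is a
tiny init script run before anything mounts the arrays - just a sketch, and
the script name is hypothetical (the device list is the md4 components from
my setup):

#!/bin/sh
# /etc/init.d/ide-nowritecache (hypothetical name) - run as early as possible
# Disable the on-drive write cache on each raid component before any mounts.
for d in /dev/hdb /dev/hdc /dev/hdg; do
        hdparm -W0 $d
done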

Does anyone know if there is a crucial write caching / flushing fix in
2.4/2.6 that hasn't been merged into 2.2? (I am using the newest 2.4 ide
backport from Krzysztof Olędzki (ide-2.2.21-06162002).)

I don't suppose there is a way to explicitly flush the IDE drive write
cache from user space?
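
The closest I can come up with is abusing hdparm: disabling the write-back
cache is supposed to make the drive commit any pending data first, so
toggling it off and back on should act as a crude flush. That is an
assumption on my part, though, not something I have verified:

sync                   # flush the kernel's own buffers first
hdparm -W0 /dev/hdg    # disabling write caching should force the drive to flush (assumption)
hdparm -W1 /dev/hdg    # re-enable write caching afterwards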

Or is this likely to be a drive firmware problem (the kernel tries to flush
the drives, but they don't complete it early enough)? How long do ide drives
normally hold data in their write cache if they are idle?

The drives are SAMSUNG SV8004H, FwRev=QR100-07, fwiw.

Turning off write caching permanently doesn't sound inviting though, as
it'll probably ruin the raid performance completely...


-- v --

[email protected]


2003-11-01 15:56:14

by Willy Tarreau

[permalink] [raw]
Subject: Re: ide write cache issue? [Re: Something corrupts raid5 disks slightly during reboot]

Hi Ville,

do you have the ability to reboot this beast on a DOS floppy equipped with a
disk editor or even debug? It would tell you whether it's the IDE
initialization or the shutdown which harms the disks. BTW, it may even be
your bios which believes for an unknown reason that it has to write to the
partition table (which in fact is not one).

just my 2 cents,
Willy

2003-11-01 18:25:35

by Ville Herva

[permalink] [raw]
Subject: Re: ide write cache issue? [Re: Something corrupts raid5 disks slightly during reboot]

On Sat, Nov 01, 2003 at 04:56:04PM +0100, you [Willy Tarreau] wrote:
> Hi Ville,
>
> do you have the ability to reboot this beast on a DOS floppy equipped with a
> disk editor or even debug?

I have been planning (as someone else suggested) to boot into a different
kernel, but unfortunately I think my off-the-shelf solution, knoppix, won't
do, as it probably includes raid autodetection in its kernel, and I'd rather
rule raidstart out as well.
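
Though if the knoppix boot prompt lets me pass kernel options, something like
this ought to at least keep the in-kernel autodetection quiet - assuming
their kernel honours the parameter (the "2" is just the textmode cheat code):

knoppix 2 raid=noautodetect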

Is there anything special about booting to DOS instead of a different linux
kernel, other than that it would rule out some strange kernel bug that is
present in both 2.2 and 2.4?

> BTW, it may even be your bios which believes for an unknown reason that it
> has to write to the partition table which is not one.

Yes, but I find it unlikely. The partition table is within the first 512
bytes and the corruption was in bytes 1060-1080. Also, one of the corrupted
disks is on the i815 and another is on the HPT370.
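
For reference, that range is easy to snapshot and compare from a shell, e.g.
saving the first few kilobytes of the disk before the reboot and diffing
afterwards (the save path is just an example):

dd if=/dev/hdg of=/save/hdg.head bs=1k count=8            # snapshot the first 8k before rebooting
# ...after the reboot:
dd if=/dev/hdg bs=1k count=8 | cmp -l /save/hdg.head -    # print any byte offsets that differ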

BTW: the corruption happens on warm reboots (running the reboot command), not
just on power off / on.


-- v --

[email protected]

2003-11-01 19:01:23

by Willy Tarreau

[permalink] [raw]
Subject: Re: ide write cache issue? [Re: Something corrupts raid5 disks slightly during reboot]

On Sat, Nov 01, 2003 at 08:25:18PM +0200, Ville Herva wrote:

> Is there anything special in booting to DOS instead of different linux
> kernel, other than that it would rule out some strange kernel bug that is
> present in 2.2 and 2.4?

No, it was just to quickly confirm or deny that it's the kernel which causes
the problem. It could have been a long-standing bug in the IDE or partition
code, one that is present in several kernels. But as you say that it affects
two different controllers, there's little chance that it's caused by
anything except linux itself. Then, the reboot on DOS will only tell you
whether the drives were corrupted at startup or at shutdown.

> Yes, but I find it unlikely. The partition table is within the first 512
> bytes and the corruption was in bytes 1060-1080. Also, one of the corrupted
> disks is on the i815 and another is on the HPT370.

I agree, but I proposed it just because it was simple to test.

> BTW: the corruption happens on warm reboots (running reboot command), not
> just on power off / on.

OK, but the BIOS scans your disks even during warm reboots. Though I don't
think it comes from there because of your two different controllers.

Willy

2003-11-01 21:02:31

by Ville Herva

[permalink] [raw]
Subject: Re: ide write cache issue? [Re: Something corrupts raid5 disks slightly during reboot]

On Sat, Nov 01, 2003 at 08:01:14PM +0100, you [Willy Tarreau] wrote:
> On Sat, Nov 01, 2003 at 08:25:18PM +0200, Ville Herva wrote:
>
> > Is there anything special in booting to DOS instead of different linux
> > kernel, other than that it would rule out some strange kernel bug that is
> > present in 2.2 and 2.4?
>
> No, it was just to quicky confirm or deny the fact that it's the kernel
> which causes the problem. It could have been a long standing bug in the IDE
> or partition code, and which is present in several kernels.

I vaguely recall some ide write cache flushing code was fixed some time ago,
but I can't find it in the archives. Maybe I dreamed that up. But I still
wonder why an otherwise idle drive would hold the data in write cache for so
long (several minutes.)

> But as you say that it affects two different controllers, there's little
> chance that it's caused by anything except linux itself.

Unless the drive is buggy wrt. flushing its write cache. But I think that's
quite a distant possibility.

> Then, the reboot on DOS will only tell you if the drives were corrupted at
> startup or at shutdown.

Yep. I'll try to find a moment to boot the beast into something other than
the current kernel / distro (it could in theory be something in userspace,
though I can't think what).

> > BTW: the corruption happens on warm reboots (running reboot command), not
> > just on power off / on.
>
> OK, but the BIOS scans your disks even during warm reboots.

True, I mainly made this note because I hadn't mentioned it before in the
thread, and I thought it might have some relevance wrt. possible ide write
caching problems. I didn't mean it as a response to the BIOS theory.


-- v --

[email protected]

2003-11-02 06:06:15

by Andre Hedrick

[permalink] [raw]
Subject: Re: ide write cache issue? [Re: Something corrupts raid5 disks slightly during reboot]


I added the flush code to flush a drive in several places but it got
pulled and munged.

The original model was to flush each time a device was closed, whenever any
partition mount point was released, and when called by the notifier.

With a minimal partition count of 1, you had at least two flushes before
shutdown or reboot.

So it was not the code, because I fixed it; but then again I am retiring
from formal maintainership.

Cheers,

Andre Hedrick
LAD Storage Consulting Group

On Sat, 1 Nov 2003, Ville Herva wrote:

> On Sat, Nov 01, 2003 at 08:01:14PM +0100, you [Willy Tarreau] wrote:
> > On Sat, Nov 01, 2003 at 08:25:18PM +0200, Ville Herva wrote:
> >
> > > Is there anything special in booting to DOS instead of different linux
> > > kernel, other than that it would rule out some strange kernel bug that is
> > > present in 2.2 and 2.4?
> >
> > No, it was just to quicky confirm or deny the fact that it's the kernel
> > which causes the problem. It could have been a long standing bug in the IDE
> > or partition code, and which is present in several kernels.
>
> I vaguely recall some ide write cache flushing code was fixed some time ago,
> but I can't find it in the archives. Maybe I dreamed that up. But I still
> wonder why an otherwise idle drive would hold the data in write cache for so
> long (several minutes.)
>
> > But as you say that it affects two different controllers, there's little
> > chance that it's caused by anything except linux itself.
>
> Unless the drive is buggy wrt. flushing its write cache. But I think it's
> a quite distant possibility.
>
> > Then, the reboot on DOS will only tell you if the drives were corrupted at
> > startup or at shutdown.
>
> Yep. I'll try to find the moment to boot the beast into something else than
> the current kernel / distro (it could in theory be something in userspace,
> though I cannot think what).
>
> > > BTW: the corruption happens on warm reboots (running reboot command), not
> > > just on power off / on.
> >
> > OK, but the BIOS scans your disks even during warm reboots.
>
> True, I mainly made this note because I hadn't mentioned it before in the
> thread, and I thought it might have some relevance wrt. possible ide write
> caching problems. I didn't mean it as a response to the BIOS theory.
>
>
> -- v --
>
> [email protected]

2003-11-02 08:28:41

by Ville Herva

[permalink] [raw]
Subject: Re: ide write cache issue? [Re: Something corrupts raid5 disks slightly during reboot]

On Sat, Nov 01, 2003 at 10:05:31PM -0800, you [Andre Hedrick] wrote:
>
> I added the flush code to flush a drive in several places but it got
> pulled and munged.
>
> The original model was to flush each time a device was closed, when any
> partition mount point was released, and called by notifier.
>
> In a minimal partition count of 1, you had at least two flush before
> shutdown or reboot.
>
> So it was not the code because I fixed it, but then again I am retiring
> from formal maintainership.

Thanks, Andre :(.

As an^Wthe IDE expert, can you clarify a few points:

- How long can the unwritten data linger in the drive cache if the drive
is otherwise idle? (Without an explicit flush and with write caching
enabled.)

I had unmounted the fs and raidstopped the md minutes before the boot.

- Can this corruption happen on warmboot or only on poweroff?

- What kind of corruption can one see if the boot takes place "too fast"
and the drive hasn't got enough time to flush its cache?



-- v --

[email protected]

2003-11-02 20:58:05

by Matthias Andree

[permalink] [raw]
Subject: Re: ide write cache issue? [Re: Something corrupts raid5 disks slightly during reboot]

On Sun, 02 Nov 2003, Ville Herva wrote:

> As an^Wthe IDE expert, can you clarify a few points:
>
> - How long can the unwritten data linger in the drive cache if the drive
> is otherwise idle? (Without an explicit flush and with write caching
> enabled.)

Several seconds. This is usually detailed in the OEM integrator manual, at
least it used to be for several IBM and Fujitsu drives when I looked two
years ago. Drives usually start flushing cached data before they go idle,
and some drives guarantee maximum times before data hits the disk. IIRC,
Fujitsu MAH drives (SCSI though, not ATA) for instance guarantee not to
cache data for longer than 3 s, even if that means interrupting write
reordering and hurting write performance (because it might involve seeks).
I seem to recall some IBM ATA drive claimed 15 s, but don't quote me on
that; I don't even recall if that was 2.5" or 3.5".

I don't recall the exact wording, so it may mean that the drive will not
VOLUNTARILY DELAY the write for more than 3 s. It's quite hard to write
4,096 scattered blocks on individual cylinders in 3 s even on 10,025/min
drives and requires knowing the block offset from the current rotational
angle of the platter... I wonder if drive firmware makes such scheduling
efforts.
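
Back-of-the-envelope: 10,025/min is about 167 revolutions per second, so 3 s
is roughly 500 revolutions; committing 4,096 scattered writes in that window
means about 8 writes per revolution, i.e. well under a millisecond per write
including any seek - hence my doubt above.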

> I had unmounted the fs an raidstopped the md minutes before the boot.

Ugly if it still corrupts. :-(

> - Can this corruption happen on warmboot or only on poweroff?

On ATA drives, the cache contents must persist across soft or hard reset
(warmboot).

> - What kind of corruption can one see the if boot takes place "too fast"
> and drive hasn't got enough time to flush its cache?

None with intact drives and bug-free firmware (I doubt such a thing
exists). Anyway, on powering down or with firmware bugs, anything is
possible.

--
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95

2003-11-03 05:35:31

by Andre Hedrick

[permalink] [raw]
Subject: Re: ide write cache issue? [Re: Something corrupts raid5 disks slightly during reboot]

On Sun, 2 Nov 2003, Ville Herva wrote:

> On Sat, Nov 01, 2003 at 10:05:31PM -0800, you [Andre Hedrick] wrote:
> >
> > I added the flush code to flush a drive in several places but it got
> > pulled and munged.
> >
> > The original model was to flush each time a device was closed, when any
> > partition mount point was released, and called by notifier.
> >
> > In a minimal partition count of 1, you had at least two flush before
> > shutdown or reboot.
> >
> > So it was not the code because I fixed it, but then again I am retiring
> > from formal maintainership.
>
> Thanks, Andre :(.
>
> As an^Wthe IDE expert, can you clarify a few points:
>
> - How long can the unwritten data linger in the drive cache if the drive
> is otherwise idle? (Without an explicit flush and with write caching
> enabled.)

Basically forever - until a read is issued to a range of LBAs which starts
below the uncommitted contents' LBA and includes the content in question, or
until a flush cache or a disable write-back cache command is issued.

> I had unmounted the fs an raidstopped the md minutes before the boot.

The problem, imho, is a breakdown of the fundamental cascading callers.

Unmount MD -> flush MD

MD is a fakie device :-/

MD fakie calls for flush of R_DEV's

Likewise, unloading or stopping MD operations should repeat this regardless
of whether anything is mounted or not.

> - Can this corruption happen on warmboot or only on poweroff?

Given that POST (assume x86 for only a brief moment) will issue execute
diagnostics to hunt for device signatures on the ribbon, that basically
whacks the content. A cold cycle obviously whacks the buffer.

> - What kind of corruption can one see the if boot takes place "too fast"
> and drive hasn't got enough time to flush its cache?

erm, I am lost with the above.
Flush Cache is a hold and wait on completion, period.
However, a cache error at this point is a wasted effort to attempt
recovery.

Not sure I helped or not ...

Cheers,

Andre

2003-11-03 06:38:36

by Ville Herva

[permalink] [raw]
Subject: Re: ide write cache issue? [Re: Something corrupts raid5 disks slightly during reboot]

On Sun, Nov 02, 2003 at 09:34:30PM -0800, you [Andre Hedrick] wrote:
>
> > - How long can the unwritten data linger in the drive cache if the drive
> > is otherwise idle? (Without an explicit flush and with write caching
> > enabled.)
>
> Basically forever, until a read is issued to a range of lba's which starts
> smaller than the uncommitted contents's lba, and includes the content in
> question. Or if a flush cache or disable write-back cache is issued.

Huh. Sounds stunning.

I mean, if the drive is otherwise idle, why would it hold the data in its
cache without trying to write it onto the platter? But I'll take your word
for it.

> > I had unmounted the fs an raidstopped the md minutes before the boot.
>
> The problem imho, is a break down of fundamental cascading callers.
>
> Unmount MD -> flush MD
>
> MD is a fakie device :-/
>
> MD fakie calls for flush of R_DEV's
>
> Likewise unloading or stopping MD operations should repeat regardless of
> mount or not.

Yep. You wouldn't happen to know if it could make a difference, wrt. how and
when the IDE flushes get triggered, whether the md consists of raw devices
(hdb,hdc,hdg) or of partitions (hdc1,hdb1,hdg1)? Is there code that does it
for partitions but is lacking for whole devices?

(The other MDs on the same box that consist of partitions do not get
corrupted, but they are on Maxtors, not Samsungs.)

> > - Can this corruption happen on warmboot or only on poweroff?
>
> Given POST (assume x86 for only a brief moment) will issue execute

x86 in this case, yes.

> diagnositics to hunt for signatures on the ribbon, that basically wacks
> the content. Cool cycle obviously wacks the buffer.

Ack.

> > - What kind of corruption can one see the if boot takes place "too fast"
> > and drive hasn't got enough time to flush its cache?
>
> erm, I am lost with the above.
> Flush Cache is a hold and wait on completion, period.
> However, a cache error at this point is a wasted effort to attempt
> recovery.

I meant: if the drive does not flush its cache before the reboot, is one
likely to see the sectors either up-to-date or containing the old data? Or
can one see half-written or otherwise corrupted sectors?

The corruption I saw didn't look like the sector just had the old data, but
I'm not sure.

Then again, this may very well be something completely unrelated to ide
write caching.

> Not sure I helped or not ...

Yes you did, thanks!


-- v --

[email protected]

2004-01-02 19:42:15

by Ville Herva

[permalink] [raw]
Subject: Re: Something corrupts raid5 disks slightly during reboot

Summary:

I've been experiencing strange corruption on a raid5 volume for some time.
The kernel is 2.2.x + RAID-0.90 patch. Fs is ext2 (+e2compr). After
unmounting the filesystem, I can mount it again without problems. I can also
raidstop the raid device in between and all is still fine:

> umount /dev/md4; mount /dev/md4
- no corruption
> umount /dev/md4; raidstop /dev/md4; raidstart /dev/md4; mount /dev/md4
- no corruption

But after a reboot, the filesystem is corrupted - a few bytes differ in the
beginning of /dev/md4, between 1k and 5k.

See the threads
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=utf-8&threadm=MMYt.4B2.1%40gated-at.bofh.it&rnum=1&prev=/groups%3Fnum%3D50%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3Dutf-8%26q%3DSomething%2Bcorrupts%2Braid5%2Bdisks%2Bslightly%2Bduring%2Breboot%26sa%3DN%26tab%3Dwg
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=utf-8&threadm=MZsH.72R.5%40gated-at.bofh.it&rnum=4&prev=/groups%3Fnum%3D50%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3Dutf-8%26q%3DSomething%2Bcorrupts%2Braid5%2Bdisks%2Bslightly%2Bduring%2Breboot%26sa%3DN%26tab%3Dwg
for details.

I did some further research.

First I thought this was an artifact of using a "non-normal" blocksize on
the fs, 4096 bytes (the other raid partitions I have on the system use 1024
and do not get corrupted). Also, the corrupting fs is on raid5 on bare disks
(hdb+hdc+hdg), while the others are on partitions (hda1+hdd1+hdf1 and so
on).

I tried to reproduce this under vmware with a 3-disk raid5 (hda+hdb+hdd)
using 4096-byte ext2 and the exact same kernel. Initially, I thought I was
able to trigger it by mounting the fs while the raid rebuild was in
progress. The kernel spat out this:

set_blocksize: b_count 1, dev md(9,4), block 15642112, from c014c3fb
set_blocksize: b_count 1, dev md(9,4), block 15642113, from c014c3fb
set_blocksize: b_count 1, dev md(9,4), block 15642114, from c014c3fb
...
set_blocksize: b_count 2, dev md(9,4), block 15642367, from c014c3fb
md4: blocksize changed during read
nr_blocks changed to 64 (blocksize 4096, j 3910528, max_blocks 39091968)

and fsck reported problems, but only once (the set_blocksize stuff appeared
each time). It seems the "set_blocksize" outpouring is a known issue, and
not severe:

http://www.ussg.iu.edu/hypermail/linux/kernel/0110.1/0493.html

The fsck errors were probably just a side-effect of the unclean shutdown I
used to force a raid rebuild.


After the failed vmware experiment, I tried to isolate when exactly the
corruption happens: at shutdown or at boot. Also, in the mentioned threads,
people had suggested turning off the write cache of the IDE disks.

I found out that the difference (corruption) usually affects three bytes on
/dev/hdg, but sometimes a few bytes on /dev/hdc as well. (/dev/md4 =
hdb+hdc+hdg; hdb & hdc are on the i810, hdg is on the hpt370.)

First, I did
umount /dev/md4
raidstop /dev/md4
head -c 50k /dev/hdg > /save/hdg
reboot

To rule out kernel raid autodetect and raid code in general, I
booted 2.2.25-1-secure with "single init=/bin/bash raid=noautodetect".
Did
head -c50k /dev/hdg | cmp -l /save/hdg
Three bytes differed:
4641 0 35
4642 0 205
4643 0 10
(columns: byte position, value after boot, value before boot)

wrote the original stuff back:
dd if=/save/hdg of=/dev/hdg
sync
hdparm -W0 /dev/hdg
sync
reboot

Booted 2.2.25-1-secure with "single init=/bin/bash raid=noautodetect"
again.
Did
head -c50k /dev/hdg | cmp -l /save/hdg
The same three bytes differed again.
Wrote the stuff back, sync'ed, did the hdparm, and powered off. Still, the
bytes differed on the next boot.

Then I booted 2.4.21-jam1 with "single init=/bin/bash raid=noautodetect" (I
happened to have 2.4.21-jam1 compiled with suitable drivers at hand).
Wrote the same stuff back with dd, synced, turned ide cache off.
Booted 2.4.21-jam1 with "single init=/bin/bash raid=noautodetect" again.
Did the diff; the three bytes differed again.

Note that sometimes a few bytes on hdc differed, too. Usually it was just
the three hdg bytes.

So this is not a 2.2 kernel issue. I very much doubt it's a kernel issue at
all. Unless it is a bug in kernel partition detection that is still present
in 2.4.x.

I tried to turn off the ide write cache with hdparm -W0, so it shouldn't
be a write caching issue.
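
Whether the -W0 actually stuck can at least be queried back from the drive;
newer hdparm versions report the current setting, something along these
lines (I haven't verified how far back this works):

hdparm -W /dev/hdg                           # with no value, print the current write-caching setting
hdparm -I /dev/hdg | grep -i 'write cache'   # the identify feature list marks enabled features with '*'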

If it's a bios issue, it's a really strange one, since it affects disks both
on the i810 ide and on the hpt370. The disks have no partition table,
though, which _could_ confuse the bios.
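
A quick way to see what the BIOS would find there is to dump the first
sector; if the two bytes at offset 0x1fe are not 55 aa, there is no valid
MBR signature for it to interpret:

dd if=/dev/hdg bs=512 count=1 | od -Ax -tx1 | tail -4   # bytes 0x1fe-0x1ff hold the 55 aa MBR signature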

Any ideas? Who the heck could write to those three bytes, and why?


-- v --

[email protected]

2004-01-02 20:02:39

by Ville Herva

[permalink] [raw]
Subject: Re: Something corrupts raid5 disks slightly during reboot

> So this is not a 2.2 kernel issue. I very much doubt it's a kernel issue at
> all. Unless it is a bug in kernel partition detection that is still present
> in 2.4.x.

Short addition: in the earlier thread, it was suggested to inspect the disk
with another OS (DOS, Windows, something else) to rule out the Linux kernel
completely. I couldn't easily find anything that boots from cd, or
preferably from floppy (since I don't have a cdrom attached due to an ide
cable shortage), *and* supports the HPT370 ide controller /dev/hdg is
connected to.

If I find something that fits the bill, I'll give it a shot.


-- v --

[email protected]

2004-01-14 14:47:35

by Ville Herva

[permalink] [raw]
Subject: Re: Something corrupts raid5 disks slightly during reboot

On Fri, Jan 02, 2004 at 09:42:00PM +0200, you [Ville Herva] wrote:
> Summary:
>
> I've been experiencing strange corruption on a raid5 volume for some time.
> The kernel is 2.2.x + RAID-0.90 patch. Fs is ext2 (+e2compr). After
> unmounting the filesystem, I can mount it again without problems. I can also
> raidstop the raid device in between and all is still fine:
>
> > umount /dev/md4; mount /dev/md4
> - no corruption
> > umount /dev/md4; raidstop /dev/md4; raidstart /dev/md4; mount /dev/md4
> - no corruption
>
> But after a reboot, the filesystem is corrupted - a few bytes differ in the
> beginning of /dev/md4, between 1k and 5k.
>
> See the threads
> http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=utf-8&threadm=MMYt.4B2.1%40gated-at.bofh.it&rnum=1&prev=/groups%3Fnum%3D50%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3Dutf-8%26q%3DSomething%2Bcorrupts%2Braid5%2Bdisks%2Bslightly%2Bduring%2Breboot%26sa%3DN%26tab%3Dwg
> http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=utf-8&threadm=MZsH.72R.5%40gated-at.bofh.it&rnum=4&prev=/groups%3Fnum%3D50%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3Dutf-8%26q%3DSomething%2Bcorrupts%2Braid5%2Bdisks%2Bslightly%2Bduring%2Breboot%26sa%3DN%26tab%3Dwg
> for details.
(...)
> I found out that the difference (corruption) is usually on three bytes on
> /dev/hdg, but sometimes on /dev/hdc, too. (/dev/md4 = hdb+hdc+hdg; hdb&hdc
> are on i810, hdg is on hpt370).
>
> First, I did
> umount /dev/md4
> raidstop /dev/md4
> head -c 50k /dev/hdg > /save/hdg
> reboot
>
> To rule out kernel raid autodetect and raid code in general, I
> booted 2.2.25-1-secure with "single init=/bin/bash raid=noautodetect".
> Did
> head -c50k /dev/hdg | cmp -l /save/hdg
> Three bytes differed:
> 4641 0 35
> 4642 0 205
> 4643 0 10
> (columns: byte position, value after boot, value before boot)
>
> wrote the original stuff back:
> dd if=/save/hdg of=/dev/hdg
> sync
> hdparm -W0 /dev/hdg
> sync
> reboot
>
> Booted 2.2.25-1-secure with "single init=/bin/bash raid=noautodetect"
> again.
> Did
> head -c50k /dev/hdg | cmp -l /save/hdg
> The same three bytes differed again.
> Wrote the stuff back, sync'ed, did the hdparm, and powered off. Still, the
> bytes differed on the next boot.
>
> Then I booted 2.4.21-jam1 with "single init=/bin/bash raid=noautodetect" (I
> happened to have 2.4.21-jam1 compiled with suitable drivers at hand).
> Wrote the same stuff back with dd, synced, turned ide cache off.
> Booted 2.4.21-jam1 with "single init=/bin/bash raid=noautodetect" again.
> Did the diff; the three bytes differed again.
>
> Note that sometimes few bytes on hdc differed, too. Usually it was just the
> three hdg bytes.
>
> So this is not a 2.2 kernel issue. I very much doubt it's a kernel issue at
> all. Unless it is a bug in kernel partition detection that is still present
> in 2.4.x.
>
> I tried to turn off the ide write cache with hdparm -W0, so it shouldn't
> be a write caching issue.
>
> If it's a bios issue, it's really a strange one, since it affects both disks
> on i810 ide and on hpt370. The disks have no partition table, though, which
> _could_ confuse the bios.

Addition:

- I tried booting from 2.6.1 single user mode to 2.6.1 single user
mode (booting with sysrq-b to avoid the shutdown process):
-> The corruption on /dev/hdg happens just like with 2.2 and 2.4

- I booted from 2.6.1 single user mode to 2.6.1 single user
mode with the kexec patch (rough commands below), to avoid going through
the BIOS in between:
-> The corruption DOES NOT happen
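
For the record, the kexec reboot was along these lines (syntax from memory,
the kexec-tools flags may differ between versions, and the vmlinuz path is
just an example):

kexec -l /boot/vmlinuz-2.6.1 --append="single init=/bin/bash raid=noautodetect"   # stage the new kernel
kexec -e                                                                          # jump straight into it, bypassing the BIOS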

I'm pretty much out of ideas.


-- v --

[email protected]

2004-01-14 22:22:27

by Willy Tarreau

[permalink] [raw]
Subject: Re: Something corrupts raid5 disks slightly during reboot

Hi Ville,

On Wed, Jan 14, 2004 at 04:46:46PM +0200, Ville Herva wrote:

> - I tried booting from 2.6.1 single user mode to 2.6.1 single user
> mode (booting with sysrq-b to avoid shutdown process):
> -> The corruption on /dev/hdg happens like with 2.2 and 2.4
>
> - I booted from 2.6.1 single user mode to 2.6.1 single user
> mode with kexec patch to avoid entering BIOS in between
> -> The corruption DOES NOT happen
>
> I'm pretty much out of ideas.

To me, it proves that the bios triggers the problem. It could also be that
it does this thing in its device enumeration or device initialization
functions. Perhaps it's even something nastier, such as a pending DMA write
which completes during a device reset. That's very odd anyway. I don't
remember your whole setup well. Have you tried enabling/disabling shadow
ram/caching on bios regions to check whether faster/slower code execution in
the bios changes something? Also do it on the additional ROMs if you have an
onboard bios on your secondary controller.

I'm also getting stuck without any other idea :-/

Regards,
Willy

2004-01-14 22:47:24

by Ville Herva

[permalink] [raw]
Subject: Re: Something corrupts raid5 disks slightly during reboot

On Wed, Jan 14, 2004 at 11:22:14PM +0100, you [Willy Tarreau] wrote:
> Hi Ville,
>
> On Wed, Jan 14, 2004 at 04:46:46PM +0200, Ville Herva wrote:
>
> > - I tried booting from 2.6.1 single user mode to 2.6.1 single user
> > mode (booting with sysrq-b to avoid shutdown process):
> > -> The corruption on /dev/hdg happens like with 2.2 and 2.4
> >
> > - I booted from 2.6.1 single user mode to 2.6.1 single user
> > mode with kexec patch to avoid entering BIOS in between
> > -> The corruption DOES NOT happen
> >
> > I'm pretty much out of ideas.
>
> To me, it proves that the bios triggers the problem.

Or lilo. Abit BIOS, Adaptec SCSI BIOS, Highpoint HPT370 BIOS and lilo are
the only pieces of code that get executed between power on and the kernel.

Unfortunately, I was unable to rule that (unlikely) alternative out just
yet, because I found out that the box doesn't have a working floppy either
(the cdrom is not plugged in because of a lack of cables - I guess I
miswired the floppy drive too when I last messed with the power cables).
This is also why I didn't try your DOS disk on the box. It seems its
diskedit can recognize at least scsi disks, so it could well handle the disk
on the Highpoint controller, too. Anyway, thanks for that (and for reminding
me how rusty my French is - and always has been :). I plan to try booting
the DOS editor from a floppy, without lilo, when I next open the box and can
fix the floppy wiring. It's a server, so I don't take it down all the
time...

> It could also be in the device enumeration functions or device
> initialization that it does this thing. Perhaps even a more nasty thing
> such as a pending DMA write which completes during a device reset.

Something like that crossed my mind initially, but waiting >10min between
the write and boot didn't help, nor did "hdparm -W 0"...

> That's very odd anyway. I don't quite remember well all your setup.

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=utf-8&threadm=MMYt.4B2.1%40gated-at.bofh.it&rnum=1&prev=/groups%3Fnum%3D50%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3Dutf-8%26q%3DSomething%2Bcorrupts%2Braid5%2Bdisks%2Bslightly%2Bduring%2Breboot%26sa%3DN%26tab%3Dwg

gives some details. Basically it's an Abit ST6R mobo (i815 and HPT370
IDEs), with three Maxtor 250GB disks (root and the first data fs) and three
Samsung 80GB's (the second data fs). One of the Samsungs on the HPT370 is
the one that exhibits the corruption.

> Have you tried enabling/disabling shadow ram/caching on bios regions to
> check if a faster/slower code execution in the bios changes something ?

No. I could try that.

> Also do it on additionnal ROMs if you have an onboard bios on your
> secondary controller.

Ok, if only I can manage to find such options in the BIOS.

> I'm also getting stuck without any other idea :-/

No wonder. So far you have been most helpful - big thanks for that.

PS: Again, the next round of results will only be in after some time - as I
said, I'll need to wait for a suitable reboot time for the box... Sorry for
the trickle.


-- v --

[email protected]