(please CC on replies)
Hi!
I recently had a case where one disk in a two-disk RAID1 array went
subtly bad, effectively refusing to write to certain sectors without
reporting an error. Basically, parts of the disk went undetectably
read-only, causing file system corruption that wouldn't go away after
fsck, and all kinds of other fun.
Would it be hard/wise to add an option for RAID1 mode to read from all
devices on a read, and report an error to syslog or simply return an
I/O error if there is a mismatch? (Or use majority voting and tell
people to use 3-disk RAID1 arrays from now on ;-)
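
To make the idea concrete, here is a minimal sketch of a read path that reads every mirror and votes. It is an illustration only, not md/raid1 code: `read_all_and_vote`, `MirrorMismatch` and the dict-based "mirrors" are hypothetical names invented for this example.

```python
from collections import Counter


class MirrorMismatch(IOError):
    """Raised when the mirror copies disagree and no strict majority exists."""


def read_all_and_vote(mirrors, block_no):
    """Read one block from every mirror and return (data, bad_mirror_indexes).

    `mirrors` is a hypothetical list of dict-like objects that map a block
    number to the bytes stored on that mirror.
    """
    copies = [m[block_no] for m in mirrors]
    value, votes = Counter(copies).most_common(1)[0]
    if votes == len(copies):
        return value, []  # all mirrors agree
    if votes * 2 <= len(copies):
        # e.g. a 1-1 split on a two-disk array: we cannot tell who is right
        raise MirrorMismatch(f"block {block_no}: mirrors disagree, no majority")
    # A strict majority exists; report which mirrors held a different copy.
    bad = [i for i, c in enumerate(copies) if c != value]
    return value, bad
```

With two disks a mismatch can only be reported as an error, since neither copy can be trusted over the other; with three disks the odd one out can be identified.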
thanks,
Lennert
Lennert Buytenhek wrote:
> (please CC on replies)
>
> Hi!
>
> I recently had a case where one disk in a two-disk RAID1 array went
> subtly bad, effectively refusing to write to certain sectors without
> reporting an error. Basically, parts of the disk went undetectably
> read-only, causing file system corruption that wouldn't go away after
> fsck, and all kinds of other fun.
>
> Would it be hard/wise to add an option for RAID1 mode to read from all
> devices on a read, and report an error to syslog or simply return an
> I/O error if there is a mismatch? (Or use majority voting and tell
> people to use 3-disk RAID1 arrays from now on ;-)
Well, an option might be good for debugging, but it will lower performance
a lot, IMHO.
As far as 3-drive RAID1 is concerned :-) You'd better get RAID3 or 2
more drives and RAID5.
Kalin.
--
|[ ~~~~~~~~~~~~~~~~~~~~~~ ]|
+-> http://ThinRope.net/ <-+
|[ ______________________ ]|
Lennert Buytenhek wrote:
> (please CC on replies)
>
> Hi!
>
> I recently had a case where one disk in a two-disk RAID1 array went
> subtly bad, effectively refusing to write to certain sectors without
> reporting an error. Basically, parts of the disk went undetectably
> read-only, causing file system corruption that wouldn't go away after
> fsck, and all kinds of other fun.
>
> Would it be hard/wise to add an option for RAID1 mode to read from all
> devices on a read, and report an error to syslog or simply return an
> I/O error if there is a mismatch? (Or use majority voting and tell
> people to use 3-disk RAID1 arrays from now on ;-)
>
>
> thanks,
> Lennert
Would a two-disk raid-5 not do just what you want?
--
Eyal Lebedinsky ([email protected]) <http://samba.org/eyal/>
attach .zip as .dat
On Saturday September 10, [email protected] wrote:
> (please CC on replies)
>
> Hi!
>
> I recently had a case where one disk in a two-disk RAID1 array went
> subtly bad, effectively refusing to write to certain sectors without
> reporting an error. Basically, parts of the disk went undetectably
> read-only, causing file system corruption that wouldn't go away after
> fsck, and all kinds of other fun.
That really isn't something that a drive should do. If a write fails,
you need to be told that it failed. If anything else happens, maybe
you should consider boycotting that manufacturer, or at least buying
more expensive drives (do I guess right that they were fairly
cheap??).
>
> Would it be hard/wise to add an option for RAID1 mode to read from all
> devices on a read, and report an error to syslog or simply return an
> I/O error if there is a mismatch? (Or use majority voting and tell
> people to use 3-disk RAID1 arrays from now on ;-)
>
No, I don't think so. The overhead would be substantial, so people
would be very unlikely to use it.
Checking of the correctness of the data is really best done in
hardware - in the drive itself. That's what CRC fields (or whatever
they use today) in the physical sectors are for...
Sun's new ZFS file system (don't know if it's released yet) has a
fairly cute idea. Instead of just storing the address of each data
block in a file's index information, they also store a checksum and
potentially multiple physical addresses. When loading the data, they
(maybe optionally) check the checksum (it would be nice if that could
be hardware accelerated!). If the check fails, either flag an error,
or try to read from another location.
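
A rough sketch of that scheme, as I understand it from the description above. This is not ZFS code: `BlockPointer`, `write_block` and `read_block` are hypothetical names, the "devices" are plain dicts, and SHA-256 is just a convenient checksum for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class BlockPointer:
    """Index entry: where the copies live, plus a checksum of the data."""
    addresses: list  # (device_index, block_no) pairs, one per copy
    checksum: bytes


def write_block(devices, addresses, data):
    """Store `data` at every address and return the pointer to keep in the index."""
    for dev, blk in addresses:
        devices[dev][blk] = data
    return BlockPointer(list(addresses), hashlib.sha256(data).digest())


def read_block(devices, ptr):
    """Return the first copy whose checksum matches the one in the pointer."""
    for dev, blk in ptr.addresses:
        data = devices[dev].get(blk)
        if data is not None and hashlib.sha256(data).digest() == ptr.checksum:
            return data
    raise IOError("no copy matches the stored checksum")
```

Because the checksum is stored apart from the data it describes, a copy that silently kept its old contents fails the check, and the reader can tell which copy is the good one even with only two copies.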
I think doing this in the filesystem is a much better idea than trying
to do it in the raid layer.
The only raid-layer option that I can think of that makes much sense
is to have a regular background scan that reads all blocks and makes
sure all mirrors are consistent. If an error is found, you generate a
warning and possibly fix it. This wouldn't report errors immediately,
but at least you would find out proactively instead of through weird
data corruption.
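
In outline, such a scan might look like the sketch below. It is only a toy model, not the md implementation: `scrub` is a hypothetical name and each "mirror" is a plain list of blocks.

```python
def scrub(mirrors, n_blocks):
    """Compare every block across all mirrors.

    `mirrors` is a hypothetical list of sequences, each holding the bytes of
    every block on one mirror. Returns the block numbers where at least one
    mirror differs from the first.
    """
    mismatches = []
    for blk in range(n_blocks):
        copies = [m[blk] for m in mirrors]
        if any(c != copies[0] for c in copies[1:]):
            mismatches.append(blk)
    return mismatches
```

A caller would log a warning for each returned block. Note that on a two-disk mirror the scan can say that the copies differ, but not which one is correct.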
I'm working towards this functionality, but it is still a little way
off.
NeilBrown
On 9/12/05, Neil Brown <[email protected]> wrote:
> On Saturday September 10, [email protected] wrote:
> > (please CC on replies)
> >
> > Hi!
> >
> > I recently had a case where one disk in a two-disk RAID1 array went
> > subtly bad, effectively refusing to write to certain sectors without
> > reporting an error. Basically, parts of the disk went undetectably
> > read-only, causing file system corruption that wouldn't go away after
> > fsck, and all kinds of other fun.
>
> That really isn't something that a drive should do. If a write fails,
> you need to be told that it failed. If anything else happens, maybe
> you should consider boycotting that manufacturer, or at least buying
> more expensive drives (do I guess right that they were fairly
> cheap??).
>
>
> >
> > Would it be hard/wise to add an option for RAID1 mode to read from all
> > devices on a read, and report an error to syslog or simply return an
> > I/O error if there is a mismatch? (Or use majority voting and tell
> > people to use 3-disk RAID1 arrays from now on ;-)
> >
>
> No, I don't think so. The overhead would be substantial, so people
> would be very unlikely to use it.
There are situations where data integrity is far more important than speed.
On AIX I usually use the Mirror Write Consistency and Write Verify
options on my mirrored volumes that store data where integrity is more
important than speed.
I guess something like those options would also satisfy Lennert's
needs, but I don't know if it's currently possible with the Linux LVM
or elsewhere.
You can read a bit about the MWC and WV options in AIX at :
http://publib.boulder.ibm.com/infocenter/pseries/index.jsp?topic=/com.ibm.aix.doc/aixbman/prftungd/diskperf2.htm
--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html
On Monday September 12, [email protected] wrote:
> > No, I don't think so. The overhead would be substantial, so people
> > would be very unlikely to use it.
>
> There are situations where data integrity is far more important than speed.
> On AIX I usually use the Mirror Write Consistency and Write Verify
> options on my mirrored volumes that store data where integrity is more
> important than speed.
> I guess something like those options would also satisfy Lennert's
> needs, but I don't know if it's currently possible with the Linux LVM
> or elsewhere.
>
> You can read a bit about the MWC and WV options in AIX at :
> http://publib.boulder.ibm.com/infocenter/pseries/index.jsp?topic=/com.ibm.aix.doc/aixbman/prftungd/diskperf2.htm
Thanks for the link.
If I understand the (fairly brief) descriptions correctly:
Passive mirror-write-consistency has always been part of md/raid1.
Active mirror-write-consistency is equivalent to the new
write-intent bitmap support.
WV means read-after-write, which we don't do, but which might be useful.
However, I'm not 100% certain that WV would really be useful. Modern
drives will almost certainly return a read-after-write request out of
the drive's cache rather than going to the media. We would need some
way to tell the drive to ignore the cache for this read. I suspect
this is possible, but might not be trivial...
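
For reference, the basic shape of a write-verify step from user space is sketched below. `write_verify` is a hypothetical helper; as written, the read-back will most likely be satisfied from the operating system's page cache or the drive's cache, which is exactly the weakness described above.

```python
import os


def write_verify(fd, offset, data):
    """Write `data` at `offset`, flush, read it back and compare.

    Returns True if the bytes read back equal the bytes written. This does
    NOT force the read to come from the media: nothing here bypasses the
    page cache or the drive's own cache.
    """
    os.pwrite(fd, data, offset)
    os.fsync(fd)  # ask the kernel to push the write out to the device
    readback = os.pread(fd, len(data), offset)
    return readback == data
```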
NeilBrown
In article <[email protected]> you wrote:
> However, I'm not 100% certain that WV would really be useful. Modern
> drives will almost certainly return a read-after-write request out of
> the drive's cache rather than going to the media. We would need some
> way to tell the drive to ignore the cache for this read. I suspect
> this is possible, but might not be trivial...
I too think a background disk scrubbing job to detect bit errors
(especially useful for raid5, but also helpful to detect bad hardware)
would be good. Some user-mode API is needed to address parts of mirrors or
parity sets.
Gruss
Bernd
On Mon, Sep 12, 2005 at 09:52:09AM +1000, Neil Brown wrote:
> > I recently had a case where one disk in a two-disk RAID1 array went
> > subtly bad, effectively refusing to write to certain sectors without
> > reporting an error. Basically, parts of the disk went undetectably
> > read-only, causing file system corruption that wouldn't go away after
> > fsck, and all kinds of other fun.
>
> That really isn't something that a drive should do. If a write fails,
> you need to be told that it failed.
Agreed.
> If anything else happens, maybe you should consider boycotting that
> manufacturer, or at least buying more expensive drives (do I guess
> right that they were fairly cheap??).
The drive was a Western Digital Protege WD400 40G PATA drive, no idea
whether that is to be considered 'cheap' but it wasn't exactly cheap
when I bought it.
I've had drives from lots of drive manufacturers (IBM, Hitachi, Samsung,
Maxtor, Western Digital, Spinpoint (?), Seagate) fail on me so far. I
don't mind always buying known-good brand X, but how will I be sure that
one of brand X's drives won't eventually fail in a similar manner?
If you say that "The RAID1 driver design doesn't tolerate these kinds
of failures.", that sounds fair enough, but it would still be nice to
have an option to enable some extra consistency checking that would
catch this.
> > Would it be hard/wise to add an option for RAID1 mode to read from all
> > devices on a read, and report an error to syslog or simply return an
> > I/O error if there is a mismatch? (Or use majority voting and tell
> > people to use 3-disk RAID1 arrays from now on ;-)
>
> No, I don't think so. The overhead would be substantial, so people
> would be very unlikely to use it.
I personally wouldn't care if I had to put three disks in every box I
build, or if reads and writes were 500% slower, as long as I could then
be sure that this silent failure would not occur. I value data integrity
above performance even if that means slowing stuff down substantially.
> The only raid-layer option that I can think of that makes much sense
> is to have a regular background scan that reads all blocks and makes
> sure all mirrors are consistent. If an error is found, you generate a
> warning and possibly fix it. This wouldn't report errors immediately,
> but at least you would find out proactively instead of through weird
> data corruption.
That sounds like a nice thing to have in any case.
--L