Message-ID: <43CE1E52.3030907@aitel.hist.no>
Date: Wed, 18 Jan 2006 11:54:10 +0100
From: Helge Hafting <helge.hafting@aitel.hist.no>
User-Agent: Debian Thunderbird 1.0.7 (X11/20051017)
MIME-Version: 1.0
To: Cynbe ru Taren <cynbe@muq.org>
CC: linux-kernel@vger.kernel.org
Subject: Re: FYI: RAID5 unusably unstable through 2.6.14
References: <E1EywcM-0004Oz-IE@laurel.muq.org>
In-Reply-To: <E1EywcM-0004Oz-IE@laurel.muq.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1881
Lines: 54

Cynbe ru Taren wrote:

>Just in case the RAID5 maintainers aren't aware of it:
>
>The current Linux kernel RAID5 implementation is just
>too fragile to be used for most of the applications
>where it would be most useful.
>
>In principle, RAID5 should allow construction of a
>disk-based store which is considerably MORE reliable
>than any individual drive.
>
>In my experience, at least, using Linux RAID5 results
>in a disk storage system which is considerably LESS
>reliable than the underlying drives.
>
>What happens repeatedly, at least in my experience over
>a variety of boxes running a variety of 2.4 and 2.6
>Linux kernel releases, is that any transient I/O problem
>results in a critical mass of RAID5 drives being marked
>'failed', 
>
What kind of "transient I/O error" would that be?
That is not supposed to happen regularly...

You do replace failed drives immediately, I hope?  Allowing
systems to run "for a while" in degraded mode is
surely a recipe for disaster.  Degraded mode
has no redundancy left at all; it is just RAID-0 with a
performance overhead added in. :-/

Having hot spares is a nice way of replacing the failed
drive quickly, since md starts rebuilding onto a spare
automatically.

>at which point there is no longer any supported
>way of retrieving the data on the RAID5 device, even
>though the underlying drives are all fine, and the underlying
>data on those drives almost certainly intact.
>  
>
As others have shown, "mdadm" can reassemble your
broken RAID, and it'll work well in those cases where
the underlying drives really are OK.  It will fail
spectacularly if you have a real double fault, but
then nothing short of RAID-6 can save you.


Helge Hafting
