From: Neil Brown <neilb@suse.de>
To: Mark Lord <lkml@rtr.ca>
Date: Thu, 19 Jan 2006 10:37:03 +1100
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <17358.53535.449726.814333@cse.unsw.edu.au>
Cc: Helge Hafting <helge.hafting@aitel.hist.no>,
       Cynbe ru Taren <cynbe@muq.org>, linux-kernel@vger.kernel.org
Subject: Re: FYI: RAID5 unusably unstable through 2.6.14
In-Reply-To: message from Mark Lord on Wednesday January 18
References: <E1EywcM-0004Oz-IE@laurel.muq.org>
	<43CE1E52.3030907@aitel.hist.no>
	<43CE6997.6090005@rtr.ca>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1568
Lines: 37

On Wednesday January 18, lkml@rtr.ca wrote:
> Helge Hafting wrote:
>  >
>  > As other have showed - "mdadm" can reassemble your
>  > broken raid - and it'll work well in those cases where
>  > the underlying drives indeed are ok.  It will fail
>  > spectacularly if you have a real double fault though,
>  > but then nothing short of raid-6 can save you.
> 
> No, actually there are several things we *could* do,
> if only the will-to-do-so existed.

You not only need the will.  You also need the ability and the time,
and the three must be combined into the one person...

> 
> For example, one bad sector on a drive doesn't mean that
> the entire drive has failed.  It just means that one 512-byte
> chunk of the drive has failed.
> 
> We could rewrite the failed area of the drive, allowing the
> onboard firmware to repair the fault internally, likely by
> remapping physical sectors.  This is nothing unusual, as all
> drives these days ship from the factory with many bad sectors
> that have already been remapped to "fix" them.  One or two
> more in the field is no reason to toss a perfectly good drive.

Very recent 2.6 kernels do exactly this.  They don't drop a drive on a
read error, only on a write error.  On a read error they generate the
data from elsewhere and schedule a write, then a re-read.

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/