Message-Id: <200212200049.gBK0n5NS016297@chaos.ao.net>
To: linux-kernel@vger.kernel.org
Subject: raid5 multi-drive-failure and recovery?
Date: Thu, 19 Dec 2002 19:49:05 -0500
From: Dan Merillat <harik@chaos.ao.net>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1495
Lines: 38


I've got a problem.  Two drives in my array developed bad sectors at
the same time.  They're in completely different places, but I can't
read the disk because md simply fails out both of them, and marks the
array unusable.

Now, I can either buy new drives and dd the raw partition over, or I
can hack the kernel to make it a bit smarter about unrecoverable
reads.

Obviously, if I have raid5_error not mark the drive bad, it will hammer
away on it, failing over and over.  My thought was to let the read
fail, catch it in raid5_end_read_request then tag the stripe_head
with the device that's failed.  If one has already failed, return
EIO.   This way further reads on the stripe_head will go to the parity
disk (until it's eventually freed.  One IO error per stripe isn't too
harsh a price to pay for disaster recovery)

in 2.4.20, I'm at raid5.c:421 where we're about to call md_error.
What happens to the bh from that point?  Obviously, it's not up-to-date,
so when 1 drive fails how does it get re-issued to be pulled from a parity
drive to reconstruct it?

Please CC me, I read via the archives.

Thanks,

--Dan


Obviously, this would ONLY be for recovery in the face of bad sectors.
As quickly as possible the bad drives need to be replaced
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/