2004-04-20 22:13:10

by Hubert Tonneau

Subject: The missing RAID level

Assuming that I want to do long-term archiving (many, many gigabytes of data,
but a tiny load) on disks, the cheapest and easiest solution nowadays seems to be
to connect large 300 GB IDE disks through USB 2 to a commodity PC.

Now the problem is how best to recover from disk failures.

For production storage, I use software RAID 5 on internal disks, but
for huge external ones it might no longer be a good idea, for two
reasons:
. having several disks fail at once is more likely
. in case of a catastrophe, I would like to be able to recover at least some
of the data, so no RAID at all is better than RAID 5, since I have DVD
backups and what I want to optimise is operator time

So, one very interesting possibility would be to have an extra RAID level that
would do the following:
assuming that you connect 5+1 partitions, you get 5 md partitions, not a
single one, with the following properties:
. any read from mdX goes straight through to reading the underlying partition.
. any write goes straight through to writing the underlying partition, but also
updates the parity on the extra partition.
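
The two properties above can be sketched in Python, assuming plain XOR parity as used by RAID 4 and RAID 5. The `ParityGroup` class and its method names are hypothetical and exist only for illustration; nothing here is part of md.

```python
class ParityGroup:
    """Hypothetical model: N independent data partitions plus one XOR parity partition."""

    def __init__(self, n_data, size):
        self.data = [bytearray(size) for _ in range(n_data)]
        self.parity = bytearray(size)  # XOR of all-zero partitions is all zeros

    def read(self, disk, offset, length):
        # A read touches only the underlying data partition.
        return bytes(self.data[disk][offset:offset + length])

    def write(self, disk, offset, payload):
        # Read-modify-write: parity ^= old_data ^ new_data, then store the data.
        for i, new in enumerate(payload):
            old = self.data[disk][offset + i]
            self.parity[offset + i] ^= old ^ new
            self.data[disk][offset + i] = new


group = ParityGroup(n_data=5, size=16)
group.write(0, 0, b"archive A")
group.write(3, 4, b"archive B")
```

Each write costs two reads and two writes (old data, old parity, new data, new parity), which is where the slow write path comes from.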

So, at the expense of slow writes, which are not a problem for long-term
archiving, I get a system with very interesting properties not covered by
existing Linux software RAID levels:
. in case of one disk failure, I can plug in a new one, then rebuild just as with
classical RAID
. in case of more disk failures, I only lose part of the archives (so I spend
less operator time on recovery from DVD).
. all partitions can be read while simply ignoring the RAID details, so it is
possible to unplug any of the disks and connect it somewhere else with no extra
constraints
. adding or removing disks from the raidset is trivial: just rebuild the parity
partition
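
A small sketch of the recovery properties listed above, again assuming XOR parity. The helper `xor_blocks` and the sample block contents are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"disk0...", b"disk1...", b"disk2...", b"disk3...", b"disk4..."]
parity = xor_blocks(data)

# One failure: XOR the survivors with the parity to rebuild the lost disk.
lost = 2
survivors = [d for i, d in enumerate(data) if i != lost]
rebuilt = xor_blocks(survivors + [parity])

# Two failures: parity cannot rebuild either disk, but the other
# partitions are plain data and stay readable on their own.
failed = {1, 4}
still_readable = [i for i in range(len(data)) if i not in failed]

# Adding a disk: recompute parity over the new set (with XOR this
# is the same as folding the new disk into the existing parity).
new_disk = b"disk5..."
parity_after_add = xor_blocks([parity, new_disk])
```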

On the 'use what's available instead of requesting new features' side, I'm also
interested in feedback from users of large (8 to 16 SATA disks), cheap external
towers that would make RAID 5 a reasonable solution (anything that raises the
price for 8 disks from 8 x 350 euro to more than 16 x 360 euro is no solution,
since clustering is then the way to go), and in how they connect to the Linux
kernel (each disk seen individually, RAID handled by the controller, need for a
driver outside the stock kernel, etc.).

Regards,
Hubert Tonneau


2004-04-21 21:03:58

by Eric D. Mudama

Subject: Re: The missing RAID level

On Tue, Apr 20 at 19:50, Hubert Tonneau wrote:
>Assuming that I want to do long term archiving (many many gigabytes of data,
>but tiny load) on disks, the cheapest and easiest solution nowadays seems to
>connect large 300 GB IDE disks through USB 2 to a commodity PC.
>
>Now the problem is how to best recover from some disk failures?

Can you give a bit more detail? How much data is there? "many many"
gigabytes can mean a lot. You say you use DVDs for recovery, but at
~5GB/DVD, that's a LOT of DVDs to store a 5 drive array, especially
since you're talking about 300GB IDE drives.
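
As a rough check of that estimate, assuming five 300 GB drives and the ~5 GB per DVD figure quoted above (variable names are illustrative only):

```python
import math

drives = 5       # size of the array discussed in the thread
drive_gb = 300   # capacity per drive, in GB
dvd_gb = 5       # "~5GB/DVD" as stated above

# Number of DVDs needed to hold the full capacity of the array.
dvds_needed = math.ceil(drives * drive_gb / dvd_gb)
print(dvds_needed)  # 300
```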

Can you just use RAID1 and then, at your backup interval, break the mirror,
store one half of it in the closet, and replace that half with fresh
drives?

In theory SATA will support hotplugging and already supports long
cables, so that might be a better solution than external USB
enclosures.

--eric



--
Eric D. Mudama
[email protected]

2004-04-21 21:52:35

by Junio C Hamano

Subject: Re: The missing RAID level

>>>>> "HT" == Hubert Tonneau <[email protected]> writes:

HT> So, one very interesting possibility would be to have an
HT> extra RAID level that would do the following:

HT> assuming that you connect 5+1 partitions, then you get 5 md
HT> partitions, not a single one, with the following properties:

HT> . any read to mdX goes straight forward to reading the
HT> underlying partition.

HT> . any write goes straight forward to writing the underlying
HT> partition, but also updates the parity on the extra
HT> partition.

It seems to me that you can create a single RAID-4 device out of
5 data and 1 parity disks, and run device mapper on top of that
RAID-4 device, picking every 5th chunk's worth of data and
collecting it into a device (totalling 5 devices, since the parity
disk is not part of the RAID-4 device's address space). Each
such device mapper device would end up with the blocks from one of
the underlying 5 data disks. I do not know offhand if such a dm
target already exists, though.
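
A sketch of the chunk arithmetic behind this idea, in Python. It assumes the RAID-4 device stripes its logical chunks round-robin across the 5 data disks and keeps parity on the dedicated sixth disk, outside the device's address space. The function names are hypothetical; this is not an existing dm target.

```python
N_DATA = 5  # data disks; the parity disk is not addressable through the array


def locate(chunk):
    """Map a logical chunk of the RAID-4 device to (data_disk, chunk_on_disk)."""
    return chunk % N_DATA, chunk // N_DATA


def chunks_for_disk(disk, n_stripes):
    """Logical chunks a per-disk mapped device would collect, in order."""
    return [stripe * N_DATA + disk for stripe in range(n_stripes)]


# The device built for data disk 2 collects logical chunks 2, 7, 12, ...
print(chunks_for_disk(2, 3))
```

Under that assumption, every chunk of a given mapped device lands on the same physical data disk, so each mapped device mirrors one disk's contents.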


2004-04-22 02:55:16

by H. Peter Anvin

Subject: Re: The missing RAID level

Followup to: <[email protected]>
By author: Hubert Tonneau <[email protected]>
In newsgroup: linux.dev.kernel
>
> So, one very interesting possibility would be to have an extra RAID level that
> would do the following:
> assuming that you connect 5+1 partitions, then you get 5 md partitions, not a
> single one, with the following properties:
> . any read to mdX goes straight forward to reading the underlying partition.
> . any write goes straight forward to writing the underlying partition, but also
> updates the parity on the extra partition.
>

You have just described RAID 4.

-hpa