2006-08-01 17:25:59

by Bill Davidsen

Subject: Re: let md auto-detect 128+ raid members, fix potential race condition

Neil Brown wrote:

>[linux-raid added to cc.
> Background: patch was submitted to remove the current hard limit
> of 127 partitions that can be auto-detected - limit set by
> 'detected_devices' array in md.c.
>]
>
>My first inclination is not to fix this problem.
>
>I consider md auto-detect to be a legacy feature.
>I don't use it and I recommend that other people don't use it.
>However I cannot justify removing it, so it stays there.
>Having this limitation could be seen as a good motivation for some
>more users to stop using it.
>
>Why not use auto-detect?
>I have three issues with it.
>
> 1/
> It just isn't "right". We don't mount filesystems from partitions
> just because they have type 'Linux'. We don't enable swap on
> partitions just because they have type 'Linux swap'. So why do we
> assemble md/raid from partitions that have type 'Linux raid
> autodetect'?
>
>

I rarely think you are totally wrong about anything RAID, but I do
believe you have missed the point of autodetect. It is intended to work
as it does now, building the array without depending on some user level
functionality. The name "autodetect" clearly differentiates this type
from the others you mentioned; there is no implication that swap or
Linux partitions should do anything automatically.

This is not a case of my using a feature and defending it, I don't use
it currently, for all of the reasons you enumerate. That doesn't mean
that I haven't used the autodetect in the past or that I won't in the
future, particularly with embedded systems.

> 2/
> It can cause problems when moving devices. If you have two
> machines, both with an 'md0' array and you move the drives from one
> on to the other - say because the first lost a power supply - and
> then reboot the machine that received the drives, which array gets
> assembled as 'md0' ?? You might be lucky, you might not. This
> isn't purely theoretical - there have been pleas for help on
> linux-raid resulting from exactly this - though they have been
> few.
>
> 3/
> The information redundancy can cause a problem when it gets out of
> sync. i.e. you add a partition to a raid array without setting
> the partition type to 'fd'. This works, but on the next reboot
> the partition doesn't get added back into the array and you have
> to manually add it yourself.
> This too is not purely theory - it has been reported slightly more
> often than '2'.
>
>So my preferred solution to the problem is to tell people not to use
>autodetect. Quite possibly this should be documented in the code, and
>maybe even have a KERN_INFO message if more than 64 devices are
>autodetected.
>
>
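
Issue 3 above (the partition type and the array membership drifting out of sync) can be illustrated with a toy model. This is only a sketch under stated assumptions: the dictionary fields and the autodetect() helper are invented for illustration and are not the kernel's data structures. The only facts taken from the thread are that autodetect considers partitions of type 'fd' and skips the rest.

```python
# Toy model of boot-time md autodetect; NOT kernel code.
# Field names and autodetect() are hypothetical, for illustration only.
RAID_AUTODETECT = 0xFD  # partition type 'Linux raid autodetect'
LINUX = 0x83            # partition type 'Linux'


def autodetect(partitions):
    """Return the names of partitions that autodetect would pick up."""
    return [
        p["name"]
        for p in partitions
        if p["type"] == RAID_AUTODETECT and p["has_md_superblock"]
    ]


partitions = [
    {"name": "sda1", "type": RAID_AUTODETECT, "has_md_superblock": True},
    {"name": "sdb1", "type": RAID_AUTODETECT, "has_md_superblock": True},
    # Added to the array at run time, but the type was never changed to 'fd'.
    {"name": "sdc1", "type": LINUX, "has_md_superblock": True},
]

found = autodetect(partitions)
# Members that carry a superblock but are skipped on the next boot.
missing = [
    p["name"]
    for p in partitions
    if p["has_md_superblock"] and p["name"] not in found
]
print("assembled from:", found)
print("left out:", missing)
```

In this model sdc1 is a perfectly valid member, yet it is left out after a reboot until someone adds it back by hand.
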
I don't personally see the value of autodetect for putting together the
huge number of drives people configure. I see this as a way to improve
boot reliability, if someone needs 64 drives for root and boot, they
need to read a few essays on filesystem configuration. However, I'm
aware that there are some really bizarre special cases out there.

Maybe the limit should be in KCONFIG, with a default of 16 or so.
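
A configurable cap of this kind might behave roughly as follows. This is a sketch only: MD_AUTODETECT_MAX and collect() are hypothetical names, not an existing Kconfig option or kernel function, and the log line merely mimics the informational message suggested earlier in the thread.

```python
# Toy model of a configurable autodetect limit; NOT kernel code.
# MD_AUTODETECT_MAX is a hypothetical build-time setting.
MD_AUTODETECT_MAX = 16


def collect(candidates, limit=MD_AUTODETECT_MAX):
    """Accept at most `limit` candidates and report the rest as skipped."""
    accepted = candidates[:limit]
    skipped = candidates[limit:]
    if skipped:
        # Stand-in for an informational kernel log message.
        print(f"md: autodetect limit {limit} reached, "
              f"{len(skipped)} device(s) skipped")
    return accepted, skipped


# Twenty candidate partitions: sda1 .. sdt1.
candidates = [f"sd{chr(ord('a') + i)}1" for i in range(20)]
accepted, skipped = collect(candidates)
```
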

--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979


2006-08-01 21:32:52

by Alexandre Oliva

Subject: Re: let md auto-detect 128+ raid members, fix potential race condition

On Aug 1, 2006, Bill Davidsen <[email protected]> wrote:

> I rarely think you are totally wrong about anything RAID, but I do
> believe you have missed the point of autodetect. It is intended to
> work as it does now, building the array without depending on some user
> level functionality.

Well, it clearly depends on at least some user level functionality
(the ioctl that triggers autodetect). Going from that to a
full-fledged mdadm doesn't sound like such a big deal to me.

> I don't personally see the value of autodetect for putting together
> the huge number of drives people configure. I see this as a way to
> improve boot reliability, if someone needs 64 drives for root and
> boot, they need to read a few essays on filesystem
> configuration. However, I'm aware that there are some really bizarre
> special cases out there.

There's LVM. If you have to keep root out of the VG just because
people say so, you lose lots of benefits from LVM, such as being able
to grow root with the system running, take snapshots of root, etc.

Sure enough the LVM subsystem could make things better for one to not
need all of the PVs in the root-containing VG in order to be able to
mount root read-write, or at all, but if you think about it, if initrd
is set up such that you only bring up the devices that hold the actual
root device within the VG and then you change that, say by taking a
snapshot of root, moving it around, growing it, etc, you'd be better
off if you could still boot. So you do want all of the VG members to
be around, just in case.

This is trivially accomplished for regular disks whose drivers are
loaded by initrd, but for raid devices, you need to tentatively bring
up every raid member you can, just in case some piece of root is
there, otherwise you may end up unable to boot.
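
The failure mode described above can be sketched with a toy mapping from logical volumes to the physical volumes that hold their extents. Everything here is an assumption made for illustration: the segment table, the device names md0 and md1, and the pvs_needed() helper do not correspond to any real LVM interface.

```python
# Toy model of a stale initrd device list; NOT a real LVM API.
# The segment table and pvs_needed() are hypothetical.


def pvs_needed(lv_segments, lv):
    """Return the set of PVs holding at least one segment of `lv`."""
    return {pv for pv, _extents in lv_segments[lv]}


# Each LV maps to a list of (physical volume, extent count) segments.
segments = {
    "root": [("md0", 2000)],
    "home": [("md1", 5000)],
}

# Recorded when the initrd was built: only md0 is brought up at boot.
initrd_devices = pvs_needed(segments, "root")

# Later, root is grown and the new extents land on md1.
segments["root"].append(("md1", 1000))

# PVs that root now needs but that the old initrd does not bring up.
stale = pvs_needed(segments, "root") - initrd_devices
print("initrd brings up:", sorted(initrd_devices))
print("now also needed:", sorted(stale))
```

With the old initrd, md1 never gets assembled, so part of root is missing at boot; bringing up every raid member avoids having to track this.
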

Yes, this is an argument against root on LVM, but there are arguments
*for* root on LVM as well, and there's no reason to not support both
behaviors equally well and let people figure out what works best for
them.

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
Secretary for FSF Latin America http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}

2006-08-02 06:47:45

by Luca Berra

Subject: Re: let md auto-detect 128+ raid members, fix potential race condition

On Tue, Aug 01, 2006 at 06:32:33PM -0300, Alexandre Oliva wrote:
>Sure enough the LVM subsystem could make things better for one to not
>need all of the PVs in the root-containing VG in order to be able to
>mount root read-write, or at all, but if you think about it, if initrd
It shouldn't need all of the PVs; you just need all the PVs where the
rootfs is.

>is set up such that you only bring up the devices that hold the actual
>root device within the VG and then you change that, say by taking a
>snapshot of root, moving it around, growing it, etc, you'd be better
>off if you could still boot. So you do want all of the VG members to
>be around, just in case.
In this case, just regenerate the initramfs after modifying the VG that
contains root. I am fairly sure that kernel upgrades are far more
frequent than the addition of PVs to the root VG.

>Yes, this is an argument against root on LVM, but there are arguments
>*for* root on LVM as well, and there's no reason to not support both
>behaviors equally well and let people figure out what works best for
>them.

No, this is just an argument against misusing root on LVM.

L.

--
Luca Berra -- [email protected]
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \

2006-08-02 16:44:11

by Bill Davidsen

Subject: Re: let md auto-detect 128+ raid members, fix potential race condition

Alexandre Oliva wrote:
> On Aug 1, 2006, Bill Davidsen <[email protected]> wrote:
>
>> I rarely think you are totally wrong about anything RAID, but I do
>> believe you have missed the point of autodetect. It is intended to
>> work as it does now, building the array without depending on some user
>> level functionality.
>
> Well, it clearly depends on at least some user level functionality
> (the ioctl that triggers autodetect). Going from that to a
> full-fledged mdadm doesn't sound like such a big deal to me.
>
>> I don't personally see the value of autodetect for putting together
>> the huge number of drives people configure. I see this as a way to
>> improve boot reliability, if someone needs 64 drives for root and
>> boot, they need to read a few essays on filesystem
>> configuration. However, I'm aware that there are some really bizarre
>> special cases out there.
>
> There's LVM. If you have to keep root out of the VG just because
> people say so, you lose lots of benefits from LVM, such as being able
> to grow root with the system running, take snapshots of root, etc.
>
But it's MY system. I don't have to do anything. More to the point, growing
root while the system is running is done a lot less than booting. In
general the root f/s has very little in it, and that's a good thing.

--
Bill Davidsen <[email protected]>
Obscure bug of 2004: BASH BUFFER OVERFLOW - if bash is being run by a
normal user and is setuid root, with the "vi" line edit mode selected,
and the character set is "big5," an off-by-one error occurs during
wildcard (glob) expansion.