2006-01-31 00:52:27

by H. Peter Anvin

Subject: Exporting which partitions to md-configure

I'm putting the final touches on kinit, which is the user-space
replacement (based on klibc) for the whole in-kernel root-mount complex.
Pretty much the one thing remaining -- other than lots of testing --
is to handle automatically mounted md devices. In order to do that,
without adding userspace versions of all the partition code (which may
be a future change, but a pretty big one) it would be good if the
partition flag to auto-configure RAID was available in userspace,
presumably through sysfs.

Any feeling how best to do that? My current thinking is to export a
"flags" entry in addition to the current ones, presumably based on
"struct parsed_partitions->parts[].flags" (fs/partitions/check.h), which
seems to be what causes md_autodetect_dev() to be called.

Note that this should be available even if md isn't compiled into the
kernel, thus making it possible to load md as a module before running
kinit, or to make the equivalent of the kernel mounting sequence from a
totally runtime user tool.

-hpa


2006-01-31 01:10:39

by NeilBrown

Subject: Re: Exporting which partitions to md-configure

On Monday January 30, [email protected] wrote:
>
> Any feeling how best to do that? My current thinking is to export a
> "flags" entry in addition to the current ones, presumably based on
> "struct parsed_partitions->parts[].flags" (fs/partitions/check.h), which
> seems to be what causes md_autodetect_dev() to be called.
>

I think I would prefer a 'type' attribute in each partition that
records the 'type' from the partition table. This might be more
generally useful than just for md.
Then your userspace code would have to look for '253' and use just
those partitions.

NeilBrown

2006-01-31 01:43:08

by Kyle Moffett

Subject: Re: Exporting which partitions to md-configure

On Jan 30, 2006, at 20:10, Neil Brown wrote:
> On Monday January 30, [email protected] wrote:
>> Any feeling how best to do that? My current thinking is to export
>> a "flags" entry in addition to the current ones, presumably based
>> on "struct parsed_partitions->parts[].flags"
>> (fs/partitions/check.h), which seems to be what causes
>> md_autodetect_dev() to be called.
>
> I think I would prefer a 'type' attribute in each partition that
> records the 'type' from the partition table. This might be more
> generally useful than just for md. Then your userspace code would
> have to look for '253' and use just those partitions.

Well, for an MSDOS partition table, you would look for '253', for a
Mac partition table you could look for something like 'Linux_RAID' or
similar (just arbitrarily define some name beginning with the Linux_
prefix), etc. This means that the partition table type would need to
be exposed as well (I don't know if it is already).

Cheers,
Kyle Moffett

--
I lost interest in "blade servers" when I found they didn't throw
knives at people who weren't supposed to be in your machine room.
-- Anthony de Boer


2006-01-31 01:42:59

by H. Peter Anvin

Subject: Re: Exporting which partitions to md-configure

Neil Brown wrote:
> On Monday January 30, [email protected] wrote:
>
>>Any feeling how best to do that? My current thinking is to export a
>>"flags" entry in addition to the current ones, presumably based on
>>"struct parsed_partitions->parts[].flags" (fs/partitions/check.h), which
>>seems to be what causes md_autodetect_dev() to be called.
>
> I think I would prefer a 'type' attribute in each partition that
> records the 'type' from the partition table. This might be more
> generally useful than just for md.
> Then your userspace code would have to look for '253' and use just
> those partitions.
>

What about non-DOS partitions?

-hpa

2006-01-31 01:46:59

by H. Peter Anvin

Subject: Re: Exporting which partitions to md-configure

Kyle Moffett wrote:
>
> Well, for an MSDOS partition table, you would look for '253', for a Mac
> partition table you could look for something like 'Linux_RAID' or
> similar (just arbitrarily define some name beginning with the Linux_
> prefix), etc. This means that the partition table type would need to
> be exposed as well (I don't know if it is already).
>

It's not, but perhaps exporting "format" and "type" as distinct
attributes is the way to go. The policy for which partitions to
consider would live entirely in kinit that way.

type would be format-specific; in EFI it's a UUID.

This, of course, is a bigger change, but it just might be worth it.

-hpa

2006-01-31 02:01:12

by NeilBrown

Subject: Re: Exporting which partitions to md-configure

On Monday January 30, [email protected] wrote:
> Neil Brown wrote:
> > On Monday January 30, [email protected] wrote:
> >
> >>Any feeling how best to do that? My current thinking is to export a
> >>"flags" entry in addition to the current ones, presumably based on
> >>"struct parsed_partitions->parts[].flags" (fs/partitions/check.h), which
> >>seems to be what causes md_autodetect_dev() to be called.
> >
> > I think I would prefer a 'type' attribute in each partition that
> > records the 'type' from the partition table. This might be more
> > generally useful than just for md.
> > Then your userspace code would have to look for '253' and use just
> > those partitions.
> >
>
> What about non-DOS partitions?

Well, grepping through fs/partitions/*.c, the 'flags' thing is set by
efi.c, msdos.c, sgi.c and sun.c.

Of these, efi.c compares something against PARTITION_LINUX_RAID_GUID,
and msdos.c, sgi.c and sun.c compare something against
LINUX_RAID_PARTITION.

The former would look like
e6d6d379-f507-44c2-a23c-238f2a3df928
in sysfs (I think);
The latter would look like
fd
(I suspect).

These are both easily recognisable with no real room for confusion.

And if other partition styles wanted to add support for raid auto
detect, tell them "no". It is perfectly possible and even preferable
to live without autodetect. We should support legacy usage (those
above) but should discourage any new usage.

NeilBrown
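
As an illustration of the check described in this message, here is a
minimal sketch in Python of how userspace might recognise the two
legacy markers. It assumes a per-partition type string is exported in
the forms guessed at above (a hex byte for msdos, sgi and sun; a
lowercase GUID for efi). No such attribute exists yet, and the names
below are hypothetical. One caveat: the GUID in the sketch is the
Linux RAID type GUID as the kernel's efi.h appears to define it; the
value quoted in the message looks like the Linux LVM type GUID, so
both should be verified against the header before use.

```python
# Hypothetical constants for illustration only.
# Linux RAID partition type GUID (assumed from the kernel's efi.h).
EFI_LINUX_RAID_GUID = "a19d880f-05fc-4d3b-a006-743f0f84911e"
# LINUX_RAID_PARTITION is 0xfd, i.e. decimal 253.
LEGACY_RAID_TYPE = "fd"


def is_raid_autodetect(type_str):
    """Return True if an exported type string marks a RAID autodetect partition."""
    value = type_str.strip().lower()
    return value in (EFI_LINUX_RAID_GUID, LEGACY_RAID_TYPE)
```

Because one value is a two-digit hex byte and the other a full GUID,
a single comparison can cover both without knowing the table format,
which is the "no real room for confusion" point made above.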

2006-01-31 02:02:06

by NeilBrown

Subject: Re: Exporting which partitions to md-configure

On Monday January 30, [email protected] wrote:
> On Jan 30, 2006, at 20:10, Neil Brown wrote:
> > On Monday January 30, [email protected] wrote:
> >> Any feeling how best to do that? My current thinking is to export
> >> a "flags" entry in addition to the current ones, presumably based
> >> on "struct parsed_partitions->parts[].flags"
> >> (fs/partitions/check.h), which seems to be what causes
> >> md_autodetect_dev() to be called.
> >
> > I think I would prefer a 'type' attribute in each partition that
> > records the 'type' from the partition table. This might be more
> > generally useful than just for md. Then your userspace code would
> > have to look for '253' and use just those partitions.
>
> Well, for an MSDOS partition table, you would look for '253', for a
> Mac partition table you could look for something like 'Linux_RAID' or
> similar (just arbitrarily define some name beginning with the Linux_
> prefix), etc. This means that the partition table type would need to
> be exposed as well (I don't know if it is already).

Mac partition tables don't currently support autodetect (as far as I
can tell). Let's keep it that way.

NeilBrown

2006-01-31 02:05:28

by H. Peter Anvin

Subject: Re: Exporting which partitions to md-configure

Neil Brown wrote:
>
> Well, grepping through fs/partitions/*.c, the 'flags' thing is set by
> efi.c, msdos.c, sgi.c and sun.c.
>
> Of these, efi.c compares something against PARTITION_LINUX_RAID_GUID,
> and msdos.c, sgi.c and sun.c compare something against
> LINUX_RAID_PARTITION.
>
> The former would look like
> e6d6d379-f507-44c2-a23c-238f2a3df928
> in sysfs (I think);
> The latter would look like
> fd
> (I suspect).
>
> These are both easily recognisable with no real room for confusion.

Well, if we're going to have a generic facility it should make sense
across the board. If all we're doing is supporting legacy usage we
might as well export a flag.

I guess we could have a single entry with a string of the form
"efi:e6d6d379-f507-44c2-a23c-238f2a3df928" or "msdos:fd" etc -- it
really doesn't make any difference to me, but it seems cleaner to have
two pieces of data in two different sysfs entries.

>
> And if other partition styles wanted to add support for raid auto
> detect, tell them "no". It is perfectly possible and even preferable
> to live without autodetect. We should support legacy usage (those
> above) but should discourage any new usage.
>

Why is that, keeping in mind this will all be done in userspace?

-hpa
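
The two representations discussed here carry the same information,
which a short Python sketch can make concrete. The helper names are
hypothetical, and the "format:type" syntax is only the form floated
in this message, not an existing sysfs interface.

```python
def split_partition_id(composite):
    """Split a composite 'format:type' string into (format, type)."""
    # Split on the first colon only, so a type that itself contains
    # colons would survive intact.
    fmt, sep, ptype = composite.strip().partition(":")
    if not sep or not fmt or not ptype:
        raise ValueError("expected 'format:type', got %r" % composite)
    return fmt, ptype


def join_partition_id(fmt, ptype):
    """Build the composite string from two separately exported values."""
    return "%s:%s" % (fmt, ptype)
```

Either form can be derived from the other in a line or two, so the
choice is mostly about which is cleaner to export, not about what
userspace is able to do with it.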

2006-01-31 02:39:53

by H. Peter Anvin

Subject: Re: Exporting which partitions to md-configure

Neil Brown wrote:
>
> Mac partition tables don't currently support autodetect (as far as I
> can tell). Let's keep it that way.
>

For now I guess I'll just take the code from init/do_mounts_md.c; we can
worry about ripping the RAID_AUTORUN code out of the kernel later.

-hpa

2006-01-31 03:21:48

by Greg KH

Subject: Re: [klibc] Exporting which partitions to md-configure

On Mon, Jan 30, 2006 at 04:52:08PM -0800, H. Peter Anvin wrote:
> I'm putting the final touches on kinit, which is the user-space
> replacement (based on klibc) for the whole in-kernel root-mount complex.
> Pretty much the one thing remaining -- other than lots of testing --
> is to handle automatically mounted md devices. In order to do that,
> without adding userspace versions of all the partition code (which may
> be a future change, but a pretty big one) it would be good if the
> partition flag to auto-configure RAID was available in userspace,
> presumably through sysfs.

What are you looking for exactly? udev has a great helper program,
volume_id, that identifies any type of filesystem that Linux knows about
(it was based on the ext2 lib code, but smaller, and much more sane, and
works better.)

Would that help out here?

thanks,

greg k-h

2006-01-31 03:24:49

by Greg KH

Subject: Re: [klibc] Exporting which partitions to md-configure

On Mon, Jan 30, 2006 at 07:21:33PM -0800, Greg KH wrote:
> On Mon, Jan 30, 2006 at 04:52:08PM -0800, H. Peter Anvin wrote:
> > I'm putting the final touches on kinit, which is the user-space
> > replacement (based on klibc) for the whole in-kernel root-mount complex.
> > Pretty much the one thing remaining -- other than lots of testing --
> > is to handle automatically mounted md devices. In order to do that,
> > without adding userspace versions of all the partition code (which may
> > be a future change, but a pretty big one) it would be good if the
> > partition flag to auto-configure RAID was available in userspace,
> > presumably through sysfs.
>
> What are you looking for exactly? udev has a great helper program,
> volume_id, that identifies any type of filesystem that Linux knows about
> (it was based on the ext2 lib code, but smaller, and much more sane, and
> works better.)
>
> Would that help out here?

Oh, an example of it working:
# vol_id /dev/sda3
ID_FS_USAGE=filesystem
ID_FS_TYPE=ext3
ID_FS_VERSION=1.0
ID_FS_UUID=9d2efd53-6b5a-4f84-86cc-def71269b7ca
ID_FS_LABEL=
ID_FS_LABEL_SAFE=
# vol_id -t /dev/sda3
ext3
# vol_id -u /dev/sda3
9d2efd53-6b5a-4f84-86cc-def71269b7ca

It also shows just the label if you have one set (I don't on this
disk...)

thanks,

greg k-h
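
For a userspace tool that wants to consume output like the above, the
KEY=VALUE lines are easy to pick apart. A minimal Python sketch,
assuming the output format stays as shown in this message;
parse_vol_id is a hypothetical helper, not part of udev.

```python
def parse_vol_id(output):
    """Turn KEY=VALUE lines into a dict; empty values are kept as ''."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that are not KEY=VALUE
            info[key.strip()] = value.strip()
    return info
```

A caller could then test info.get("ID_FS_TYPE") or
info.get("ID_FS_UUID") instead of running the tool once per field.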

2006-01-31 03:53:56

by H. Peter Anvin

Subject: Re: [klibc] Exporting which partitions to md-configure

Greg KH wrote:
>
> What are you looking for exactly? udev has a great helper program,
> volume_id, that identifies any type of filesystem that Linux knows about
> (it was based on the ext2 lib code, but smaller, and much more sane, and
> works better.)
>
> Would that help out here?
>

It might, but it's also rather ugly to have two pieces of code,
especially in the presence of very dynamic partitions. In other words,
if the kernel deals with partitions, you want to be able to get at the
kernel's view of partitions, not necessarily the actual set of
partitions on disk, which can be quite different.

-hpa

2006-01-31 06:42:56

by Kyle Moffett

Subject: Re: Exporting which partitions to md-configure

On Jan 30, 2006, at 21:01, Neil Brown wrote:
> On Monday January 30, [email protected] wrote:
>> On Jan 30, 2006, at 20:10, Neil Brown wrote:
>>> On Monday January 30, [email protected] wrote:
>>>> Any feeling how best to do that? My current thinking is to
>>>> export a "flags" entry in addition to the current ones,
>>>> presumably based on "struct parsed_partitions->parts[].flags"
>>>> (fs/partitions/check.h), which seems to be what causes
>>>> md_autodetect_dev() to be called.
>>>
>>> I think I would prefer a 'type' attribute in each partition that
>>> records the 'type' from the partition table. This might be more
>>> generally useful than just for md. Then your userspace code
>>> would have to look for '253' and use just those partitions.
>>
>> Well, for an MSDOS partition table, you would look for '253', for
>> a Mac partition table you could look for something like
>> 'Linux_RAID' or similar (just arbitrarily define some name
>> beginning with the Linux_ prefix), etc. This means that the
>> partition table type would need to
>> be exposed as well (I don't know if it is already).
>
> Mac partition tables don't currently support autodetect (as far
> as I can tell). Let's keep it that way.

Well, no, the point would definitely *NOT* be to add kernel-level
autodetect of stuff in the Mac partition tables. The point would be
to export the partition-table-format and partition-type information
to userspace. That way a custom mdadm-control-script could have a
config file with "AUTODETECT_TYPE=Linux_RAID" or
"AUTODETECT_TYPE=253" or "AUTODETECT_TYPE=<insert EFI UUID here>",
etc. The whole detection thing could be configured and done in
userspace based on partition table info provided by the kernel if
desired, or mdadm could just scan all disks for RAID headers like it
does now. The idea would be that any autodetection would be
completely out of the kernel and userspace's responsibility; the
kernel would just export info to make it easier.

Cheers,
Kyle Moffett

--
Unix was not designed to stop people from doing stupid things,
because that would also stop them from doing clever things.
-- Doug Gwyn
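
The userspace-only scheme described in this message can be sketched
as follows. This is a rough illustration in Python, assuming the
kernel exported a "type" file in each partition's sysfs directory
(it does not at the time of this thread) and that the control script
reads AUTODETECT_TYPE= lines as suggested above. Both function names
and the directory layout are hypothetical.

```python
import os


def read_autodetect_types(config_text):
    """Collect the values of all AUTODETECT_TYPE= lines in a config file."""
    types = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("AUTODETECT_TYPE="):
            types.add(line.split("=", 1)[1].strip('"'))
    return types


def find_candidates(sysfs_block, wanted_types):
    """Return partition names whose exported 'type' is in wanted_types."""
    found = []
    for disk in sorted(os.listdir(sysfs_block)):
        disk_dir = os.path.join(sysfs_block, disk)
        if not os.path.isdir(disk_dir):
            continue
        for part in sorted(os.listdir(disk_dir)):
            # Hypothetical attribute: <sysfs_block>/<disk>/<part>/type
            type_file = os.path.join(disk_dir, part, "type")
            if not os.path.isfile(type_file):
                continue
            with open(type_file) as f:
                if f.read().strip() in wanted_types:
                    found.append(part)
    return found
```

The kernel side only exports strings here; which strings count as
"assemble this" is decided entirely by the config file.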


2006-01-31 06:49:13

by Chris Wedgwood

Subject: Re: [klibc] Re: Exporting which partitions to md-configure

On Mon, Jan 30, 2006 at 05:42:45PM -0800, H. Peter Anvin wrote:

> What about non-DOS partitions?

Is something like libblkid suitable as a starting point for something
you can cut down to size?

   text    data     bss     dec     hex filename
  24978    2272      12   27262    6a7e /lib/libblkid.so.1

2006-01-31 06:53:39

by Kyle Moffett

Subject: Re: [klibc] Exporting which partitions to md-configure

On Jan 30, 2006, at 22:24, Greg KH wrote:
> Oh, an example of it working:
> # vol_id /dev/sda3
> ID_FS_USAGE=filesystem
> ID_FS_TYPE=ext3
> ID_FS_VERSION=1.0
> ID_FS_UUID=9d2efd53-6b5a-4f84-86cc-def71269b7ca
> ID_FS_LABEL=
> ID_FS_LABEL_SAFE=
> # vol_id -t /dev/sda3
> ext3
> # vol_id -u /dev/sda3
> 9d2efd53-6b5a-4f84-86cc-def71269b7ca

That shows filesystem information, but not at all the partition table
information. If I look at my mac partition table (this is using the
apple-provided tool under OS X, but it's the same using the Linux
tool), for example:

hc6524e32:~ kyle$ sudo -H /usr/sbin/pdisk -l /dev/disk1

Partition map (with 512 byte blocks) on '/dev/disk1'
 #:                type name                               length   base      ( size )
 1: Apple_partition_map Apple                                  63 @ 1
 2:     Apple_Bootstrap bootstrap                            1600 @ 64
 3:     Apple_UNIX_SVR2 linux_boot                        1048576 @ 1664      (512.0M)
 4:     Apple_UNIX_SVR2 linux_swap                        2097152 @ 1050240   (  1.0G)
 5:     Apple_UNIX_SVR2 linux_lvm                       241051200 @ 3147392   (114.9G)
 6:          Apple_Boot eXternal booter                    262144 @ 244198592 (128.0M)
 7:          Apple_RAID Apple_RAID_OfflineV2_Untitled_2 243936432 @ 244460736 (116.3G)

Device block size=512, Number of Blocks=488397168 (232.9G)
DeviceType=0x0, DeviceId=0x0


Now obviously linux_boot, linux_swap, and linux_lvm are not
_actually_ Apple_UNIX_SVR2, but that's the type stored in the
partition map. They also have partition map labels "linux_*", but
they do *not* have ext3 volume labels. In fact, linux_boot is one
slice of a RAID1 of 3 drives, linux_swap is one slice of a RAID5 of 3
drives, and linux_lvm is one slice of a RAID5 of 3 drives that
altogether make an LVM PV. If I examine each of those partitions
individually, I get a lot of other information that is totally
unrelated to that stored in the partition table. It would be nice to
be able to change the type of linux_* from Apple_UNIX_SVR2 to
Linux_RAID (Max length is 31 characters), and have my userspace tools
be able to detect that and do useful things with it and the pmap label.


Cheers,
Kyle Moffett

--
If you don't believe that a case based on [nothing] could potentially
drag on in court for _years_, then you have no business playing with
the legal system at all.
-- Rob Landley



2006-02-06 01:46:51

by NeilBrown

Subject: Re: Exporting which partitions to md-configure

On Monday January 30, [email protected] wrote:
> Neil Brown wrote:
> >
> > Well, grepping through fs/partitions/*.c, the 'flags' thing is set by
> > efi.c, msdos.c, sgi.c and sun.c.
> >
> > Of these, efi.c compares something against PARTITION_LINUX_RAID_GUID,
> > and msdos.c, sgi.c and sun.c compare something against
> > LINUX_RAID_PARTITION.
> >
> > The former would look like
> > e6d6d379-f507-44c2-a23c-238f2a3df928
> > in sysfs (I think);
> > The latter would look like
> > fd
> > (I suspect).
> >
> > These are both easily recognisable with no real room for confusion.
>
> Well, if we're going to have a generic facility it should make sense
> across the board. If all we're doing is supporting legacy usage we
> might as well export a flag.
>
> I guess we could have a single entry with a string of the form
> "efi:e6d6d379-f507-44c2-a23c-238f2a3df928" or "msdos:fd" etc -- it
> really doesn't make any difference to me, but it seems cleaner to have
> two pieces of data in two different sysfs entries.

What constitutes 'a piece of data'? A bit? a byte?

I would say that
msdos:fd
is one piece of data. The 'fd' is useless without the 'msdos'.
The 'msdos' is, I guess, not completely useless without the fd.

I would lean towards the composite, but I wouldn't fight a separation.


>
> >
> > And if other partition styles wanted to add support for raid auto
> > detect, tell them "no". It is perfectly possible and even preferable
> > to live without autodetect. We should support legacy usage (those
> > above) but should discourage any new usage.
> >
>
> Why is that, keeping in mind this will all be done in userspace?
>

partition-type based autodetect is easily fooled.
If you take a pair of drives from a failed computer, plug them into a
similar computer for data recovery, and boot: you might have two
different pairs of drives that both want to be 'md0'. Which wins?

I believe there needs to be a clear, unambiguous causal path from
the kernel parameters, initramfs, or machine hardware to the arrays
that are assembled, and hence to the filesystem that is mounted. These
should identify the array by some reasonably unique identifier. The
'minor number' stored in the raid superblock and used by
partition-based autodetect is not 'reasonably unique'.

There are many, many possibilities. Some are:

  kernel parameter md=0,/dev/hda,/dev/hdc
     'hda' and 'hdc' on 'this' machine are (I think) still unique
     identifiers of hardware, and so this can identify drives to
     assemble into an array. Note that this would *not* be reliable
     with md=0,/dev/sda,/dev/sdb

  kernel parameter md_root_uuid=xxyy:zzyy:aabb:ccdd...
     This could be interpreted by an initramfs script to run mdadm
     to find and assemble the array with that uuid. The uuid of
     each array is reasonably unique.

  initramfs containing preconfigured /etc/mdadm.conf
     This mdadm.conf would contain the uuids of the arrays to
     assemble.

  kernel hardware MAC address
     This could be mapped through DHCP to a host name. The host name
     is then given to mdadm such that it finds and assembles the array
     with 'name' (only supported in version-1 superblocks) of
     $HOST-root
     or whatever.


Just as there is a direct unambiguous causal path from something
present at early boot to the root filesystem that is mounted (and the
root filesystem specifies all other filesystems through fstab)
similarly there should be an unambiguous causal path from something
present at early boot to the array which holds the root filesystem -
and the root filesystem should describe all other arrays via
mdadm.conf

Does that make sense?

NeilBrown
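
The md_root_uuid= option above could be handled by an initramfs
helper along these lines. This is only a sketch in Python:
md_root_uuid= is a parameter proposed in this message, not one the
kernel or mdadm is known to interpret, and both function names are
hypothetical. The generated text uses the ordinary mdadm.conf DEVICE
and ARRAY keywords; the UUID in the usage note is made up.

```python
def md_root_uuid(cmdline):
    """Return the value of md_root_uuid= from a kernel command line, or None."""
    for word in cmdline.split():
        key, sep, value = word.partition("=")
        if key == "md_root_uuid" and sep and value:
            return value
    return None


def mdadm_conf_for(uuid, device="/dev/md0"):
    """Build a minimal mdadm.conf that names one array by its UUID."""
    # 'DEVICE partitions' asks mdadm to consider every partition the
    # kernel knows about; the ARRAY line then selects by UUID only.
    return "DEVICE partitions\nARRAY %s UUID=%s\n" % (device, uuid)
```

For example, with a command line containing
md_root_uuid=01234567:89abcdef:01234567:89abcdef, the helper would
write a two-line config and leave the search for matching member
devices to mdadm, so device names such as sda or sdb never enter
into it.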

2006-02-06 03:29:22

by Kyle Moffett

Subject: Re: Exporting which partitions to md-configure

On Feb 05, 2006, at 20:46, Neil Brown wrote:
> On Monday January 30, [email protected] wrote:
>> Well, if we're going to have a generic facility it should make
>> sense across the board. If all we're doing is supporting legacy
>> usage we might as well export a flag.
>>
>> I guess we could have a single entry with a string of the form
>> "efi:e6d6d379-f507-44c2-a23c-238f2a3df928" or "msdos:fd" etc -- it
>> really doesn't make any difference to me, but it seems cleaner to
>> have two pieces of data in two different sysfs entries.
>
> What constitutes 'a piece of data'? A bit? a byte?
>
> I would say that
> msdos:fd
> is one piece of data. The 'fd' is useless without the 'msdos'. The
> 'msdos' is, I guess, not completely useless without the fd. I would
> lean towards the composite, but I wouldn't fight a separation.

I think this boundary is blurred by the fact that several partition
tables allow mostly-arbitrary partition type strings. It would be
convenient to not have to worry about the prefix in that case. You
would need just the partition type in the parent device anyways, so
why munge it into the partition label too?

>>> And if other partition styles wanted to add support for raid auto
>>> detect, tell them "no". It is perfectly possible and even preferable
>>> to live without autodetect. We should support legacy usage (those
>>> above) but should discourage any new usage.
>>
>> Why is that, keeping in mind this will all be done in userspace?
>
> partition-type based autodetect is easily fooled. If you take a
> pair of drives from a failed computer, plug them into a similar
> computer for data recovery, and boot: you might have two different
> pairs of drives that both want to be 'md0'. Which wins?

Nonono, not _just_ partition-type based autodetect, but a more
complicated solution done _completely_ in userspace. Essentially, by
exporting this data you would merely be providing _extra_ pieces of
data that could be verified on boot. If I know that my boot RAID
volumes for my desktop always have a partition table type string of
"Linux_RAID_<unique-id>", then I can _also_ verify that in my
initramdisk. This isn't as useful on x86, but that's no reason to
prevent it from being used on archs that do allow 31+ character
strings for partition types.

> I believe there needs to be a clear, unambiguous causal path
> from the kernel parameters, initramfs, or machine hardware that
> leads to the arrays to be assembled and hence the filesystem to be
> mounted.

This is one way of doing that on systems with Mac partition
tables. The autoprobing is mostly useless on x86 hardware due to the
limited range of partition types, but that's x86's problem.

Cheers,
Kyle Moffett

--
If you don't believe that a case based on [nothing] could potentially
drag on in court for _years_, then you have no business playing with
the legal system at all.
-- Rob Landley



2006-02-07 02:48:04

by H. Peter Anvin

Subject: Re: [klibc] Re: Exporting which partitions to md-configure

Neil Brown wrote:
>
> What constitutes 'a piece of data'? A bit? a byte?
>
> I would say that
> msdos:fd
> is one piece of data. The 'fd' is useless without the 'msdos'.
> The 'msdos' is, I guess, not completely useless without the fd.
>
> I would lean towards the composite, but I wouldn't fight a separation.
>

Well, the two pieces come from different sources.

>
> Just as there is a direct unambiguous causal path from something
> present at early boot to the root filesystem that is mounted (and the
> root filesystem specifies all other filesystems through fstab)
> similarly there should be an unambiguous causal path from something
> present at early boot to the array which holds the root filesystem -
> and the root filesystem should describe all other arrays via
> mdadm.conf
>
> Does that make sense?
>

It makes sense, but I disagree. I believe you are correct in that the
current "preferred minor" bit causes an invalid assumption that, e.g.
/dev/md3 is always a certain thing, but since each array has a UUID, and
one should be able to mount by either filesystem UUID or array UUID,
there should be no need for such a conflict if one allows for dynamic md
numbers.

Requiring that mdadm.conf describes the actual state of all volumes
would be an enormous step in the wrong direction. Right now, the Linux
md system can handle some very oddball hardware changes (such as on
hera.kernel.org, when the disks not just completely changed names due to
a controller change, but changed from hd* to sd*!)

Dynamicity is a good thing, although it needs to be harnessed.

> kernel parameter md_root_uuid=xxyy:zzyy:aabb:ccdd...
> This could be interpreted by an initramfs script to run mdadm
> to find and assemble the array with that uuid. The uuid of
> each array is reasonably unique.

This, in fact is *EXACTLY* what we're talking about; it does require
autoassemble. Why do we care about the partition types at all? The
reason is that since the md superblock is at the end, it doesn't get
automatically wiped if the partition is used as a raw filesystem, and so
it's important that there is a qualifier for it.

-hpa
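
The point about the superblock living at the end of the device can be
made concrete for version-0.90 superblocks. The Python sketch below
assumes the layout used by the kernel's md_p.h as best understood
here: the superblock sits in the last 64 KiB-aligned 64 KiB block of
the device and starts with the magic 0xa92b4efc. It also assumes a
little-endian host, since the 0.90 superblock is stored in host byte
order. The helper names are hypothetical.

```python
import os
import struct

MD_SB_MAGIC = 0xA92B4EFC          # magic at the start of a 0.90 superblock
MD_RESERVED_BYTES = 64 * 1024     # size and alignment of the reserved area


def sb_offset(device_size):
    """Byte offset of a version-0.90 superblock on a device of this size."""
    return (device_size & ~(MD_RESERVED_BYTES - 1)) - MD_RESERVED_BYTES


def has_md_superblock(path):
    """Check for the 0.90 magic number at the expected offset."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        offset = sb_offset(f.tell())
        if offset < 0:            # device too small to hold a superblock
            return False
        f.seek(offset)
        raw = f.read(4)
    return len(raw) == 4 and struct.unpack("<I", raw)[0] == MD_SB_MAGIC
```

A filesystem created directly on such a partition writes from the
start of the device, so the block at sb_offset() stays intact until
the filesystem is nearly full. That is why a stale superblock can
outlive the array it belonged to.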

2006-02-07 10:43:14

by Luca Berra

Subject: Re: [klibc] Re: Exporting which partitions to md-configure

On Mon, Feb 06, 2006 at 06:47:54PM -0800, H. Peter Anvin wrote:
>Neil Brown wrote:
>Requiring that mdadm.conf describes the actual state of all volumes
>would be an enormous step in the wrong direction. Right now, the Linux
>md system can handle some very oddball hardware changes (such as on
>hera.kernel.org, when the disks not just completely changed names due to
>a controller change, but changed from hd* to sd*!)
DEVICE partitions
ARRAY /dev/md0 UUID=xxyy:zzyy:aabb:ccdd

would catch that


>Dynamicity is a good thing, although it needs to be harnessed.
>
> > kernel parameter md_root_uuid=xxyy:zzyy:aabb:ccdd...
> > This could be interpreted by an initramfs script to run mdadm
> > to find and assemble the array with that uuid. The uuid of
> > each array is reasonably unique.
I could change mdassemble to accept a uuid on the command line
and assemble a /dev/md0 with the specified uuid (at the moment it only
accepts a configuration file, which I thought was enough for
initrd/initramfs).

>This, in fact is *EXACTLY* what we're talking about; it does require
>autoassemble. Why do we care about the partition types at all? The
>reason is that since the md superblock is at the end, it doesn't get
>automatically wiped if the partition is used as a raw filesystem, and so
>it's important that there is a qualifier for it.
I don't like using partition type as a qualifier: there are people who do
not wish to partition their drives, there are systems not supporting
msdos-like partitions, heck even m$ is migrating away from those.

In any case if that has to be done it should be done in mdadm, not
in a different script that is going to call mdadm (behaviour should be
consistent between mdadm invoked by initramfs and mdadm invoked on a
running system).

If the user wants to reutilize a device that was previously a member of
an md array he/she should use mdadm --zero-superblock to remove the
superblock.
I see no point in having a system that tries to compensate for users not
following correct procedures. sorry.

L.

--
Luca Berra -- [email protected]
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \

2006-02-07 15:47:18

by H. Peter Anvin

Subject: Re: [klibc] Re: Exporting which partitions to md-configure

Luca Berra wrote:
>
>> This, in fact is *EXACTLY* what we're talking about; it does require
>> autoassemble. Why do we care about the partition types at all? The
>> reason is that since the md superblock is at the end, it doesn't get
>> automatically wiped if the partition is used as a raw filesystem, and
>> so it's important that there is a qualifier for it.
>
> I don't like using partition type as a qualifier, there are people who do
> not wish to partition their drives, there are systems not supporting
> msdos like partitions, heck even m$ is migrating away from those.
>

That's why we're talking about non-msdos partitioning schemes.

> In any case if that has to be done it should be done in mdadm, not
> in a different script that is going to call mdadm (behaviour should be
> consistent between mdadm invoked by initramfs and mdadm invoked on a
> running system).

Agreed. mdadm is the best place for it.

> If the user wants to reutilize a device that was previously a member of
> an md array he/she should use mdadm --zero-superblock to remove the
> superblock.
> I see no point in having a system that tries to compensate for users not
> following correct procedures. sorry.

You don't? That surprises me... making it harder for the user to have
accidental data loss sounds like a very good thing to me.

-hpa

2006-02-07 16:47:34

by Luca Berra

Subject: Re: [klibc] Re: Exporting which partitions to md-configure

On Tue, Feb 07, 2006 at 07:46:59AM -0800, H. Peter Anvin wrote:
>Luca Berra wrote:
>>
>>I don't like using partition type as a qualifier, there are people who do
>>not wish to partition their drives, there are systems not supporting
>>msdos like partitions, heck even m$ is migrating away from those.
>>
>
>That's why we're talking about non-msdos partitioning schemes.

this still leaves whole disks

>>If the user wants to reutilize a device that was previously a member of
>>an md array he/she should use mdadm --zero-superblock to remove the
>>superblock.
>>I see no point in having a system that tries to compensate for users not
>>following correct procedures. sorry.
>
>You don't? That surprises me... making it harder for the user to have
>accidental data loss sounds like a very good thing to me.

making it harder for the user is a good thing, but please not at the
expense of usability

the only way i see a user can have data loss is if
- a md array is stopped
- two different filesystems are created on the component devices
- these filesystems are filled with data, but not to the point of
damaging the superblock
- then the array is started again.

if only one device is removed using mdadm the event counter would
prevent the array from being assembled again.

there are a lot of easier ways for shooting yourself in the feet :)

if we really want to be paranoid we should modify mkXXXfs to refuse
creating a filesystem if the device has an md superblock on it. (lvm2
tools are already able to ignore devices with md superblocks on them,
no clue about EVMS)

L.
--
Luca Berra -- [email protected]
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \

2006-02-07 16:55:31

by H. Peter Anvin

Subject: Re: [klibc] Re: Exporting which partitions to md-configure

Luca Berra wrote:
>
> making it harder for the user is a good thing, but please not at the
> expense of usability
>

What's the usability problem?

-hpa

2006-02-07 17:03:18

by Luca Berra

Subject: Re: [klibc] Re: Exporting which partitions to md-configure

On Tue, Feb 07, 2006 at 08:55:21AM -0800, H. Peter Anvin wrote:
>Luca Berra wrote:
>>
>>making it harder for the user is a good thing, but please not at the
>>expense of usability
>>
>
>What's the usability problem?
>
if we fail to support all partitioning schemes and we do not support
non partitioned devices.

if we manage to support all this without too much code bloat i'll shut
up.

L.

--
Luca Berra -- [email protected]
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \