2006-10-31 06:00:54

by NeilBrown

Subject: [PATCH 001 of 6] md: Send online/offline uevents when an md array starts/stops.


This allows udev to do something intelligent when an
array becomes available.

cc: [email protected]
Signed-off-by: Neil Brown <[email protected]>

### Diffstat output
./drivers/md/md.c | 2 ++
1 file changed, 2 insertions(+)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c 2006-10-31 16:40:52.000000000 +1100
+++ ./drivers/md/md.c 2006-10-31 16:41:02.000000000 +1100
@@ -3200,6 +3200,7 @@ static int do_md_run(mddev_t * mddev)

mddev->changed = 1;
md_new_event(mddev);
+ kobject_uevent(&mddev->gendisk->kobj, KOBJ_ONLINE);
return 0;
}

@@ -3313,6 +3314,7 @@ static int do_md_stop(mddev_t * mddev, i

module_put(mddev->pers->owner);
mddev->pers = NULL;
+ kobject_uevent(&mddev->gendisk->kobj, KOBJ_OFFLINE);
if (mddev->ro)
mddev->ro = 0;
}
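To see what these events look like from userspace, a listener can open a netlink socket with socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT), bind it to multicast group 1, and parse each datagram. A datagram is a header of the form "action@devpath" followed by NUL-separated KEY=VALUE strings such as ACTION, DEVPATH and SUBSYSTEM. The helper below is a minimal sketch of the parsing step only. The names bounded_len and uevent_get are hypothetical, and any sample message is illustrative rather than captured from a real md array.

```c
#include <stddef.h>
#include <string.h>

/* Length of s, never looking past max bytes. */
static size_t bounded_len(const char *s, size_t max)
{
	size_t n = 0;
	while (n < max && s[n] != '\0')
		n++;
	return n;
}

/*
 * Hypothetical helper: look up KEY in a uevent netlink message of len
 * bytes.  The message starts with "action@devpath", followed by
 * NUL-separated "KEY=VALUE" strings.  The caller must ensure that
 * msg[len - 1] == '\0' so the returned value is terminated.
 * Returns NULL if KEY is not present.
 */
static const char *uevent_get(const char *msg, size_t len, const char *key)
{
	size_t keylen = strlen(key);
	size_t off = bounded_len(msg, len) + 1;	/* skip the header */

	while (off < len) {
		const char *field = msg + off;
		size_t flen = bounded_len(field, len - off);

		if (flen > keylen && strncmp(field, key, keylen) == 0 &&
		    field[keylen] == '=')
			return field + keylen + 1;
		off += flen + 1;	/* step over the field and its NUL */
	}
	return NULL;
}
```

With this patch applied, starting an array would be expected to produce a message whose ACTION value is "online", and stopping it one whose ACTION value is "offline".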


2006-10-31 21:16:50

by Greg KH

Subject: Re: [PATCH 001 of 6] md: Send online/offline uevents when an md array starts/stops.

On Tue, Oct 31, 2006 at 05:00:46PM +1100, NeilBrown wrote:
>
> This allows udev to do something intelligent when an
> array becomes available.
>
> cc: [email protected]
> Signed-off-by: Neil Brown <[email protected]>

Acked-by: Greg Kroah-Hartman <[email protected]>

2006-11-02 12:13:58

by Kay Sievers

Subject: Re: [PATCH 001 of 6] md: Send online/offline uevents when an md array starts/stops.

On 10/31/06, Greg KH <[email protected]> wrote:
> On Tue, Oct 31, 2006 at 05:00:46PM +1100, NeilBrown wrote:
> >
> > This allows udev to do something intelligent when an
> > array becomes available.
> >
> > cc: [email protected]
> > Signed-off-by: Neil Brown <[email protected]>
>
> Acked-by: Greg Kroah-Hartman <[email protected]>

I don't agree with this, and asked several times to change this to
"change" events, like device-mapper is doing it to address the same
problem. Online/offline is not supported by udev/HAL and will not work
as expected. Please fix this.

Thanks,
Kay

2006-11-02 12:33:03

by NeilBrown

Subject: Re: [PATCH 001 of 6] md: Send online/offline uevents when an md array starts/stops.

On Thursday November 2, [email protected] wrote:
> On 10/31/06, Greg KH <[email protected]> wrote:
> > On Tue, Oct 31, 2006 at 05:00:46PM +1100, NeilBrown wrote:
> > >
> > > This allows udev to do something intelligent when an
> > > array becomes available.
> > >
> > > cc: [email protected]
> > > Signed-off-by: Neil Brown <[email protected]>
> >
> > Acked-by: Greg Kroah-Hartman <[email protected]>
>
> I don't agree with this, and asked several times to change this to
> "change" events, like device-mapper is doing it to address the same
> problem. Online/offline is not supported by udev/HAL and will not work
> as expected. Please fix this.

I don't remember who suggested "online/offline", and I don't remember
you suggesting "change", but my memory isn't what it used to be(*), so you
probably did.

Is there some document somewhere that explains exactly what each of
the kobject_actions are meant to mean and how they can be
interpreted?

Anyway, I am happy to change it. What exactly do you want?
KOBJ_CHANGE both when the array is activated and when it is
deactivated? Or only when it is activated?
Should ONLINE and OFFLINE remain and CHANGE be added, or should they
go away?
If they remain, should CHANGE come before or after ONLINE (and
OFFLINE)?


I must admit that it feels more like an ONLINE/OFFLINE event than a
CHANGE event to me, but they are just words after all.

What does udev/HAL do with ONLINE/OFFLINE? Could it be changed to do
"the right thing" for ONLINE? (Not implying that it should be, just
wanting to understand as much of the picture as possible).

Thanks,
NeilBrown


(*) At least I think it isn't what it used to be, but I cannot
remember what it used to be, so I'm not sure :-)

2006-11-02 13:52:01

by Kay Sievers

Subject: Re: [PATCH 001 of 6] md: Send online/offline uevents when an md array starts/stops.

On Thu, 2006-11-02 at 23:32 +1100, Neil Brown wrote:
> On Thursday November 2, [email protected] wrote:
> > On 10/31/06, Greg KH <[email protected]> wrote:
> > > On Tue, Oct 31, 2006 at 05:00:46PM +1100, NeilBrown wrote:
> > > > This allows udev to do something intelligent when an
> > > > array becomes available.
> > > >
> > > Acked-by: Greg Kroah-Hartman <[email protected]>
> >
> > I don't agree with this, and asked several times to change this to
> > "change" events, like device-mapper is doing it to address the same
> > problem. Online/offline is not supported by udev/HAL and will not work
> > as expected. Please fix this.
>
> I don't remember who suggested "online/offline",

It was probably the first version of the patch for device-mapper which
we got into SLE10, but it changed to "change" in the upstream kernel,
after we all met at OLS and talked about it.

> and I don't remember
> you suggesting "change", but my memory isn't what it used to be(*), so you
> probably did.

It was in the Czech Republic, but we got a few beers... :) And in the
"virtual md devices" conversation.

> Is there some document somewhere that explains exactly what each of
> the kobject_actions are meant to mean and how they can be
> interpreted?

No, there isn't. The thing is that "online/offline" events always need to be
symmetric in their order. There can't be two "online" events without an
"offline" event in between. We decided at OLS, for the device-mapper events, that we
can't be sure there will always be "online/offline" sequences, and
can't be sure to always emit them in the right order. Therefore we
decided to go for a simple "change", and let userspace find out the
current state of the device if needed.
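One way to read the symmetry requirement: a consumer that tracks online/offline must be able to assume the two strictly alternate, whereas a consumer of "change" keeps no state and simply re-reads the device. The hypothetical checker below (not part of udev or the kernel) encodes that invariant, so a sequence with two "online" events in a row is rejected while any run of "change" events is always acceptable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical checker: true if "online"/"offline" actions strictly
 * alternate, starting from the offline state.  Other actions (such as
 * "change") carry no state and are accepted anywhere.
 */
static bool online_offline_balanced(const char *const *actions, size_t n)
{
	bool online = false;

	for (size_t i = 0; i < n; i++) {
		if (strcmp(actions[i], "online") == 0) {
			if (online)
				return false;	/* two "online" with no "offline" between */
			online = true;
		} else if (strcmp(actions[i], "offline") == 0) {
			if (!online)
				return false;	/* "offline" without a preceding "online" */
			online = false;
		}
	}
	return true;
}
```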

> Anyway, I am happy to change it. What exactly do you want?
> KOBJ_CHANGE both when the array is activated and when it is
> deactivated? Or only when it is activated?

We couldn't think of any use of an "offline" event. So we removed the
event when the device-mapper device is suspended.

> Should ONLINE and OFFLINE remain and CHANGE be added, or should they
> go away?

The current idea is to send only a "change" event if something happens
that makes it necessary for udev to reinvestigate the device, like
possible filesystem content that creates /dev/disk/by-* links.

Finer grained device-monitoring is likely better placed by using the
poll() infrastructure for a sysfs file, instead of sending pretty
expensive uevents.
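The poll() approach mentioned here follows the usual sysfs notification pattern: open the attribute, read it once, then poll() for POLLPRI|POLLERR, and on wake-up seek back to offset 0 and read the new value. This only works for attributes whose kernel code calls sysfs_notify(); whether a given md attribute did so in this kernel is an assumption, and the helper names read_attr and wait_attr_change are hypothetical.

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read the current value of an open sysfs attribute into buf,
 * NUL-terminated and with the trailing newline removed.
 * Returns the length, or -1 on error.
 */
static ssize_t read_attr(int fd, char *buf, size_t size)
{
	if (size == 0 || lseek(fd, 0, SEEK_SET) < 0)	/* always re-read from 0 */
		return -1;
	ssize_t n = read(fd, buf, size - 1);
	if (n < 0)
		return -1;
	if (n > 0 && buf[n - 1] == '\n')
		n--;
	buf[n] = '\0';
	return n;
}

/*
 * Wait until the attribute is notified (sysfs_notify() wakes pollers
 * with POLLPRI|POLLERR) or timeout_ms passes.  The attribute must have
 * been read once before the first call.
 * Returns 1 on notification, 0 on timeout, -1 on error.
 */
static int wait_attr_change(int fd, int timeout_ms)
{
	struct pollfd p = { .fd = fd, .events = POLLPRI | POLLERR };
	int r = poll(&p, 1, timeout_ms);

	if (r < 0)
		return -1;
	return r > 0 ? 1 : 0;
}
```

A monitor would loop: read_attr(), then wait_attr_change(), then read_attr() again, acting only when the value differs.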

Udev only hooks into "change" and revalidates all current symlinks for
the device. Udev can run programs on "online", but currently, it will
not update any /dev/disk/by-* link, if the device changes its content.

> If they remain, should CHANGE come before or after ONLINE (and
> OFFLINE)?

> I must admit that it feels more like an ONLINE/OFFLINE event than a
> CHANGE event to me, but they are just words after all.

Yeah, "online/offline" sounds nice, but it will get messy, if you have a
case where you don't need to go offline, but still want to notify a
change ... :)

> What does udev/HAL do with ONLINE/OFFLINE? Could it be changed to do
> "the right thing" for ONLINE? (Not implying that it should be, just
> wanting to understand as much of the picture as possible).

Sure, it's just software, it definitely could be made to match on
anything. It's just that "change" already works fine today. :)

Thanks,
Kay


2006-11-03 06:57:43

by NeilBrown

Subject: Re: [PATCH 001 of 6] md: Send online/offline uevents when an md array starts/stops.

On Thursday November 2, [email protected] wrote:
> On Thu, 2006-11-02 at 23:32 +1100, Neil Brown wrote:
> > and I don't remember
> > you suggesting "change", but my memory isn't what it used to be(*), so you
> > probably did.
>
> It was in the Czech Republic, but we got a few beers... :) And in the
> "virtual md devices" conversation.

Hmm... rings a bell. I guess I didn't appreciate the important
difference between 'change' and 'online' at the time. Thanks for
clearing that up.

>
> We couldn't think of any use of an "offline" event. So we removed the
> event when the device-mapper device is suspended.
>
> > Should ONLINE and OFFLINE remain and CHANGE be added, or should they
> > go away?
>
> The current idea is to send only a "change" event if something happens
> that makes it necessary for udev to reinvestigate the device, like
> possible filesystem content that creates /dev/disk/by-* links.
>
> Finer grained device-monitoring is likely better placed by using the
> poll() infrastructure for a sysfs file, instead of sending pretty
> expensive uevents.
>
> Udev only hooks into "change" and revalidates all current symlinks for
> the device. Udev can run programs on "online", but currently, it will
> not update any /dev/disk/by-* link, if the device changes its content.
>

OK. Makes sense.
I tried it and got an interesting result....

This is with md generating 'CHANGE' events when an array goes on-line
and when it goes off line, and also with another patch which causes md
devices to disappear when not active so that we get ADD and REMOVE
events at reasonably appropriate times.

It all works fine until I stop an array.
We get a CHANGE event and then a REMOVE event.
And then a seemingly infinite series of ADD/REMOVE pairs.

I guess that udev sees the CHANGE and so opens the device to see what
is there. By that time the device has disappeared so the open causes
an ADD. udev doesn't find anything and closes the device which causes
it to disappear and we get a REMOVE.
Now udev sees that ADD and so opens the device again to see what is
there, triggering an ADD. Nothing is there so we close it and get a
REMOVE.
Now udev sees the second ADD and ....
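The feedback loop can be reproduced with a toy model (hypothetical code, not md or udev): opening a missing node creates it and produces an ADD, the last close of an inactive array removes it again, and a naive handler opens the node for every CHANGE or ADD it sees while ignoring REMOVE. A handler that checks whether the array is active before opening stops after the first event.

```c
#include <stdbool.h>

/* Toy state for one md minor; not kernel code. */
struct toy_md {
	bool exists;	/* /sys/block/mdX present */
	bool active;	/* array running */
	int adds;	/* ADD uevents emitted so far */
};

/* One open()+close() of /dev/mdX in the toy model. */
static void toy_open_close(struct toy_md *md)
{
	if (!md->exists) {	/* open of a missing node creates it */
		md->exists = true;
		md->adds++;
	}
	if (!md->active)	/* last close of an inactive array removes it */
		md->exists = false;
}

/*
 * Start just after an array was stopped and removed, with the shutdown
 * CHANGE event queued.  The handler opens the node for each queued
 * event unless check_state_first is set and the array is inactive.
 * Returns how many events were handled, capped at limit.
 */
static int toy_events_after_stop(bool check_state_first, int limit)
{
	struct toy_md md = { .exists = false, .active = false, .adds = 0 };
	int queued = 1;	/* the CHANGE sent when the array stopped */
	int handled = 0;

	while (queued > 0 && handled < limit) {
		queued--;
		handled++;
		if (check_state_first && !md.active)
			continue;	/* nothing to probe: leave the node alone */
		int before = md.adds;
		toy_open_close(&md);
		queued += md.adds - before;	/* every new ADD is another event */
	}
	return handled;
}
```

The naive handler runs until the cap, while the state-checking one handles only the initial CHANGE.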

A bit unfortunate really. This didn't happen when I had
ONLINE/OFFLINE as udev ignored the OFFLINE.
I guess I can remove the CHANGE at shutdown, but as there really is a
change there, that doesn't seem right.

The real problem is that udev opens the device, and md interprets an
'open' as a request to create the device. And udev sees the open as an
ADD and so opens the device....

It's not clear to me what the 'right' thing to do here is:
- I could stop removing the device on last-close, but I still
think that (the current situation) is ugly.
- I could delay the remove until udev will have stopped poking,
but that is even more ugly
- udev could avoid opening md devices until it has poked in
/sys/block/mdX to see what the status is, but that is very specific
to md

It would be nice if I could delay the add until later, but that would
require major surgery and probably break the model badly.

On the whole, it seems that udev was designed without thought to the
special needs of md, and md was designed (long ago) without thought
for the ugliness that "open creates a device" causes.

Any clever ideas anyone?


NeilBrown

2006-11-03 08:23:09

by Kay Sievers

Subject: Re: [PATCH 001 of 6] md: Send online/offline uevents when an md array starts/stops.

On Fri, 2006-11-03 at 17:57 +1100, Neil Brown wrote:
> On Thursday November 2, [email protected] wrote:
> > On Thu, 2006-11-02 at 23:32 +1100, Neil Brown wrote:

> > We couldn't think of any use of an "offline" event. So we removed the
> > event when the device-mapper device is suspended.
> >
> > > Should ONLINE and OFFLINE remain and CHANGE be added, or should they
> > > go away?
> >
> > The current idea is to send only a "change" event if something happens
> > that makes it necessary for udev to reinvestigate the device, like
> > possible filesystem content that creates /dev/disk/by-* links.
> >
> > Finer grained device-monitoring is likely better placed by using the
> > poll() infrastructure for a sysfs file, instead of sending pretty
> > expensive uevents.
> >
> > Udev only hooks into "change" and revalidates all current symlinks for
> > the device. Udev can run programs on "online", but currently, it will
> > not update any /dev/disk/by-* link, if the device changes its content.
> >
>
> OK. Makes sense.
> I tried it and got an interesting result....
>
> This is with md generating 'CHANGE' events when an array goes on-line
> and when it goes off line, and also with another patch which causes md
> devices to disappear when not active so that we get ADD and REMOVE
> events at reasonably appropriate times.
>
> It all works fine until I stop an array.
> We get a CHANGE event and then a REMOVE event.
> And then a seemingly infinite series of ADD/REMOVE pairs.
>
> I guess that udev sees the CHANGE and so opens the device to see what
> is there. By that time the device has disappeared so the open causes
> an ADD. udev doesn't find anything and closes the device which causes
> it to disappear and we get a REMOVE.
> Now udev sees that ADD and so opens the device again to see what is
> there, triggering an ADD. Nothing is there so we close it and get a
> REMOVE.
> Now udev sees the second ADD and ....

Hmm, why does the open() of the device node of a stopped device cause an "add"?
Shouldn't it just return a failure, instead of creating a device?

> A bit unfortunate really. This didn't happen when I had
> ONLINE/OFFLINE as udev ignored the OFFLINE.
> I guess I can remove the CHANGE at shutdown, but as there really is a
> change there, that doesn't seem right.

Yeah, it's the same problem we had with device-mapper, nobody could
think of any useful action at a dm-device suspend "change"-event, so we
didn't add it. :)

> The real problem is that udev opens the device, and md interprets an
> 'open' as a request to create the device. And udev sees the open as an
> ADD and so opens the device....

Yes, current udev rules are written to do so; md needs to be excluded
from the list of block devices which are handled by the default
persistent naming rules, and moved to its own rules file. We did the
same for device-mapper to ignore some "private" dm-* volumes like
snapshot devices.

> It's not clear to me what the 'right' thing to do here is:
> - I could stop removing the device on last-close, but I still
> think that (the current situation) is ugly.
> - I could delay the remove until udev will have stopped poking,
> but that is even more ugly
> - udev could avoid opening md devices until it has poked in
> /sys/block/mdX to see what the status is, but that is very specific
> to md
>
> It would be nice if I could delay the add until later, but that would
> require major surgery and probably break the model badly.
>
> On the whole, it seems that udev was designed without thought to the
> special needs of md, and md was designed (long ago) without thought
> for the ugliness that "open creates a device" causes.

The persistent naming rules for /dev/disk/by-* are causing this. Md
devices will probably just get their own rules file, which will handle
this and which can be packaged and installed along with the md tools.

If it's acceptable to you, please leave the shutdown "change" event out for
now, until someone has the need for it.
We will update the rules in the meantime to read a sysfs file or call
an md tool to query the current state of the device on "add" and "change"
events; this will prevent the device from being opened when it's not
supposed to be.

Thanks,
Kay

2006-11-06 00:19:04

by NeilBrown

Subject: Re: [PATCH 001 of 6] md: Send online/offline uevents when an md array starts/stops.

On Friday November 3, [email protected] wrote:
>
> Hmm, why does the open() of the device node of a stopped device cause an "add"?
> Shouldn't it just return a failure, instead of creating a device?

Because that is the API I inherited. To create an MD array, you open
/dev/mdX and issue some IOCTLs. Originally I think the devices were
all created at boot/module-load time much like they still are for
loop.c. But when Al Viro did all that work with kmap and blkdev_get
ages ago he changed it so they didn't have to be pre-created but rather
were created on-the-fly by an attempt to open the block device (this
calls in to md_probe which does the add_disk).

This creates a deep disconnect between udev and md.
udev expects a device to appear first, then it creates the
device-special-file in /dev.
md expects the device-special-file to exist first, and then creates the
device on the first open.

>
> > A bit unfortunate really. This didn't happen when I had
> > ONLINE/OFFLINE as udev ignored the OFFLINE.
> > I guess I can remove the CHANGE at shutdown, but as there really is a
> > change there, that doesn't seem right.
>
> Yeah, it's the same problem we had with device-mapper, nobody could
> think of any useful action at a dm-device suspend "change"-event, so we
> didn't add it. :)
>

Yes... the device cannot disappear until no-one is using it, so no-one
will be interested in it going away.

>
> The persistent naming rules for /dev/disk/by-* are causing this. Md
> devices will probably just get their own rules file, which will handle
> this and which can be packaged and installed along with the md tools.
>
> If it's acceptable to you, please leave the shutdown "change" event out for
> now, until someone has the need for it.

Yes, I'll get rid of the online/offline events and just put in a
CHANGE when the array becomes available.

I'm still a bit concerned about the open->add->open infinite loop.
If anyone opens /dev/mdX while it isn't active (e.g. to check if it is
active), that will (given a patch that I would like to include) cause
an ADD event which will cause udev to start its loop again.
Can we make udev ignore ADD for md and only watch for CHANGE?

Thanks,
NeilBrown

2006-11-06 08:38:34

by dean gaudet

Subject: Re: [PATCH 001 of 6] md: Send online/offline uevents when an md array starts/stops.

On Mon, 6 Nov 2006, Neil Brown wrote:

> This creates a deep disconnect between udev and md.
> udev expects a device to appear first, then it creates the
> device-special-file in /dev.
> md expects the device-special-file to exist first, and then creates the
> device on the first open.

could you create a special /dev/mdx device which is used to
assemble/create arrays only? i mean literally "mdx" not "mdX" where X is
a number. mdx would always be there if md module is loaded... so udev
would see the driver appear and then create the /dev/mdx. then mdadm
would use /dev/mdx to do assemble/creates/whatever and cause other devices
to appear/disappear in a manner which udev is happy with.

(much like how /dev/ptmx is used to create /dev/pts/N entries.)

doesn't help legacy mdadm binaries... but seems like it fits the New World
Order.

or hm i suppose the New World Order is to eschew binary interfaces and
suggest a /sys/class/md/ hierarchy with a bunch of files you have to splat
ascii data into to cause an array to be created/assembled.

-dean

2006-11-07 05:05:32

by NeilBrown

Subject: Re: [PATCH 001 of 6] md: Send online/offline uevents when an md array starts/stops.

On Monday November 6, [email protected] wrote:
> On Mon, 6 Nov 2006, Neil Brown wrote:
>
> > This creates a deep disconnect between udev and md.
> > udev expects a device to appear first, then it creates the
> > device-special-file in /dev.
> > md expects the device-special-file to exist first, and then creates the
> > device on the first open.
>
> could you create a special /dev/mdx device which is used to
> assemble/create arrays only? i mean literally "mdx" not "mdX" where X is
> a number. mdx would always be there if md module is loaded... so udev
> would see the driver appear and then create the /dev/mdx. then mdadm
> would use /dev/mdx to do assemble/creates/whatever and cause other devices
> to appear/disappear in a manner which udev is happy with.
>
> (much like how /dev/ptmx is used to create /dev/pts/N entries.)
>
> doesn't help legacy mdadm binaries... but seems like it fits the New World
> Order.
>
> or hm i suppose the New World Order is to eschew binary interfaces and
> suggest a /sys/class/md/ hierarchy with a bunch of files you have to splat
> ascii data into to cause an array to be created/assembled.

I have the following patch sitting in my patch queue (since about
March).
It does what you suggest via /sys/module/md-mod/parameters/MAGIC_FILE
which is the only md-specific part of the /sys namespace that I could
find.

However I'm not at all convinced that it is a good idea. I would much
rather have mdadm control device naming than leave it up to udev.

And in any case, we have the semantic that opening an md device-file
creates the device, and we cannot get rid of that semantic without a
lot of warning and a lot of pain. And adding a new semantic isn't
really going to help.

We simply need to find the best way for udev and md to play together,
and I think we can achieve something quite workable. Both sides just
have to give a bit.

NeilBrown


Allow md devices to be created by writing to sysfs.

Until now, to create an md device, you needed to open the relevant
device-special file. This created a catch-22 with udev.

This patch provides an alternative.
Options include

echo 10 > /sys/module/md-mod/parameters/create
to create the legacy device with minor 10,
echo d10 > /sys/module/md-mod/parameters/create
to create the partitionable device with minor 10<<6,
cat /sys/module/md-mod/parameters/next_free_legacy
to return the major:minor of an unused legacy array, which will then exist,
cat /sys/module/md-mod/parameters/next_free_partitionable
to return the major:minor of an unused partitionable array, which will then exist.


Signed-off-by: Neil Brown <[email protected]>

### Diffstat output
./drivers/md/md.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 56 insertions(+)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2006-03-27 09:20:58.000000000 +1100
+++ ./drivers/md/md.c 2006-03-27 09:20:56.000000000 +1100
@@ -5525,6 +5525,62 @@ module_param_call(start_ro, set_ro, get_
module_param(start_dirty_degraded, int, 0644);


+static int md_create(const char *val, struct kernel_param *kp)
+{
+	/* NN or dNN creates the numbered device */
+	int part = 0;
+	int num;
+	char *e;
+	if (*val == 'd' || *val == 'p') {
+		part = 1;
+		val++;
+	}
+	num = simple_strtoul(val, &e, 10);
+	if (*val && (*e == '\0' || *e == '\n')) {
+		/* success! */
+		dev_t dev;
+		if (part)
+			dev = MKDEV(mdp_major, num << MdpMinorShift);
+		else
+			dev = MKDEV(MD_MAJOR, num);
+		md_probe(dev, NULL, NULL);
+		return 0;
+	}
+	return -EINVAL;
+}
+static int md_next_free(char *buffer, struct kernel_param *kp)
+{
+	mddev_t *mddev;
+	int major = MD_MAJOR;
+	int inc = 1;
+	int next = MKDEV(MD_MAJOR,0);
+	if (kp->arg) {
+		next = MKDEV(mdp_major,0);
+		major = mdp_major;
+		inc = 1 << MdpMinorShift;
+	}
+	spin_lock(&all_mddevs_lock);
+	list_for_each_entry(mddev, &all_mddevs, all_mddevs)
+		if (MAJOR(mddev->unit) == major) {
+			if (atomic_read(&mddev->active)<=1 &&
+			    mddev->pers == NULL &&
+			    mddev->raid_disks == 0) {
+				next = mddev->unit;
+				break;
+			} else if (mddev->unit >= next)
+				next = mddev->unit + inc;
+		}
+	spin_unlock(&all_mddevs_lock);
+	md_probe(next, NULL, NULL);
+	return sprintf(buffer, "%d:%d", major, MINOR(next));
+}
+static int ignore(const char *val, struct kernel_param *kp) { return -EINVAL; }
+
+
+module_param_call(create, md_create, NULL, NULL, 0200);
+module_param_call(next_free_legacy, ignore, md_next_free, (void*)0, 0400);
+module_param_call(next_free_partitionable, ignore, md_next_free, (void*)1, 0400);
+
EXPORT_SYMBOL(register_md_personality);
EXPORT_SYMBOL(unregister_md_personality);
EXPORT_SYMBOL(md_error);
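To make the accepted syntax of the "create" parameter concrete, here is a userspace model of the parsing in md_create(). "NN" selects legacy minor NN on MD_MAJOR (9), while "dNN" or "pNN" selects minor NN << MdpMinorShift (6) on the mdp major, which the kernel allocates dynamically and the model therefore takes as an argument. It is a sketch for illustration, not the kernel code. The names md_parse_create and struct md_devnum are hypothetical, and the model additionally insists on at least one digit after the optional prefix.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MD_MAJOR_LEGACY 9	/* MD_MAJOR in the kernel */
#define MDP_MINOR_SHIFT 6	/* MdpMinorShift: 64 minors per partitionable array */

struct md_devnum {
	unsigned int major;
	unsigned int minor;
};

/*
 * Hypothetical model of md_create()'s parsing: fills *out and returns
 * true for "NN", "dNN" or "pNN", optionally followed by the newline
 * that echo appends; returns false otherwise.
 */
static bool md_parse_create(const char *val, unsigned int mdp_major,
			    struct md_devnum *out)
{
	bool part = false;
	char *end;
	unsigned long num;

	if (*val == 'd' || *val == 'p') {	/* partitionable array */
		part = true;
		val++;
	}
	if (*val < '0' || *val > '9')
		return false;			/* need at least one digit */
	num = strtoul(val, &end, 10);
	if (*end != '\0' && *end != '\n')	/* reject trailing garbage */
		return false;
	if (part) {
		out->major = mdp_major;
		out->minor = (unsigned int)num << MDP_MINOR_SHIFT;
	} else {
		out->major = MD_MAJOR_LEGACY;
		out->minor = (unsigned int)num;
	}
	return true;
}
```

With the partitionable major at 254 (a made-up value), "d10\n" maps to 254:640, that is 10 << 6, while "10" maps to 9:10.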

2006-11-08 11:14:46

by Kay Sievers

Subject: Re: [PATCH 001 of 6] md: Send online/offline uevents when an md array starts/stops.

On Mon, 2006-11-06 at 11:18 +1100, Neil Brown wrote:
> On Friday November 3, [email protected] wrote:

> > The persistent naming rules for /dev/disk/by-* are causing this. Md
> > devices will probably just get their own rules file, which will handle
> > this and which can be packaged and installed along with the md tools.

> I'm still a bit concerned about the open->add->open infinite loop.
> If anyone opens /dev/mdX while it isn't active (e.g. to check if it is
> active), that will (given a patch that I would like to include) cause
> an ADD event which will cause udev to start its loop again.
> Can we make udev ignore ADD for md and only watch for CHANGE?

Is there a sysfs file or something similar (we could also call an md tool)
udev could look at, before it tries to open the device? Like:
KERNEL=="md*", ATTR{state}=="active", IMPORT{program}= ...

If we currently ignore the "add" event, then we will not hook into the
coldplug logic, where "add" events are requested for all devices to do
the initial setup after bootup.

If we can't read the state of the md device, to see if it's safe to open
the device, we would need to be smarter with the coldplug logic by
requesting "change" events if necessary, or by passing a "coldplug" flag
with the synthesized event.
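Requesting a "change" for a single device can be done from userspace by writing the action name to the device's uevent file in sysfs; current kernels accept action names such as "add" and "change" there, and udevadm trigger --action=change does the same in bulk. Whether the kernel of this thread accepted anything other than "add" is not assumed here. The helper names uevent_path and request_uevent are hypothetical, and request_uevent needs root.

```c
#include <stddef.h>
#include <stdio.h>

/* Build "/sys/block/<name>/uevent"; returns 0, or -1 if buf is too small. */
static int uevent_path(char *buf, size_t size, const char *name)
{
	int n = snprintf(buf, size, "/sys/block/%s/uevent", name);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Ask the kernel to re-emit a uevent for a block device by writing an
 * action name (e.g. "change") to its uevent file.  stdio buffers the
 * write, so a rejected action shows up as an fclose() failure.
 * Returns 0 on success, -1 on failure.
 */
static int request_uevent(const char *name, const char *action)
{
	char path[256];
	FILE *f;
	int ok;

	if (uevent_path(path, sizeof path, name) < 0)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = fputs(action, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A coldplug pass for md could call request_uevent("md0", "change") only for arrays already known to be running, instead of replaying "add" for every md node.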

Thanks,
Kay

2006-11-09 00:17:52

by NeilBrown

Subject: Re: [PATCH 001 of 6] md: Send online/offline uevents when an md array starts/stops.

On Wednesday November 8, [email protected] wrote:
>
> Is there a sysfs file or something similar (we could also call an md tool)
> udev could look at, before it tries to open the device? Like:
> KERNEL=="md*", ATTR{state}=="active", IMPORT{program}= ...

If the /sys/block/mdX directory exists at all, it is safe to open the
device-special file. But that is racy. It could disappear between
checking that the dir exists, and opening the device-special-file.

I still think it would make SO much sense if /sys/block/md4/dev were a
device-special-file instead of a (silly) ascii file with 9:4. Then
this race could be closed. But I feel that is a battle I'm never
going to win.
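The ascii format in question is "major:minor" followed by a newline. A userspace helper has to turn it into a dev_t before it can, for example, mknod() a private node; the parser below is a hypothetical sketch using makedev() from <sys/sysmacros.h>. It does not close the race, since the array can still go away between reading the file and using the number.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysmacros.h>

/*
 * Parse the "major:minor" text from /sys/block/<name>/dev (for
 * example "9:4\n") into a dev_t.  Returns 0 on success, -1 otherwise.
 */
static int parse_sysfs_dev(const char *text, dev_t *out)
{
	unsigned int maj, min;
	char extra;
	int n = sscanf(text, "%u:%u%c", &maj, &min, &extra);

	/* Accept exactly two numbers, optionally followed by one newline. */
	if (n != 2 && !(n == 3 && extra == '\n'))
		return -1;
	*out = makedev(maj, min);
	return 0;
}
```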

You could look at /sys/block/mdX/md/array_state. If that contains
'clear' or 'inactive' then there is no point opening the device.
Otherwise there might be a point, and the race would be a lot harder
to lose.
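A helper called from a udev rule could apply that test before opening anything. This sketch treats "clear" (no devices configured) and "inactive" (not running), the array_state values md documents for an array that is not running, as not worth opening, and treats a missing array_state file as "do not open". The function names are hypothetical, and the race described above remains.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* True unless array_state says there is nothing to probe. */
static bool md_state_worth_opening(const char *state)
{
	return strcmp(state, "clear") != 0 && strcmp(state, "inactive") != 0;
}

/* Read /sys/block/<name>/md/array_state into buf; returns 0 or -1. */
static int md_read_array_state(const char *name, char *buf, size_t size)
{
	char path[256];
	int n = snprintf(path, sizeof path, "/sys/block/%s/md/array_state", name);
	FILE *f;

	if (n < 0 || (size_t)n >= sizeof path || size == 0)
		return -1;
	f = fopen(path, "r");
	if (f == NULL)
		return -1;		/* no such array: do not open it */
	if (fgets(buf, (int)size, f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';	/* drop the trailing newline */
	return 0;
}

/* Open /dev/<name> only if its state could be read and is worth it. */
static bool md_should_open(const char *name)
{
	char state[32];

	return md_read_array_state(name, state, sizeof state) == 0 &&
	       md_state_worth_opening(state);
}
```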

I guess it is time for me to learn about udev config files...

NeilBrown

2006-11-09 10:10:36

by Michael Tokarev

Subject: Re: [PATCH 001 of 6] md: Send online/offline uevents when an md array starts/stops.

Neil Brown wrote:
[/dev/mdx...]
>> (much like how /dev/ptmx is used to create /dev/pts/N entries.)
[]
> I have the following patch sitting in my patch queue (since about
> March).
> It does what you suggest via /sys/module/md-mod/parameters/MAGIC_FILE
> which is the only md-specific part of the /sys namespace that I could
> find.
>
> However I'm not at all convinced that it is a good idea. I would much
> rather have mdadm control device naming than leave it up to udev.

This is again the same "device naming" question that pops up every time
someone mentions udev. And as usual, I'm suggesting the following, which
should - hopefully - make everyone happy:

create kernel names *always*, be it /dev/mdN or /dev/sdF or whatever,
so that things like /proc/partitions, /proc/mdstat etc will be useful.
For this, the ideal solution - IMHO - is to have a mini-devfs-like filesystem
mounted as /dev, so that it is possible to have "bare" names without any
help from any external programs like udev, but I don't want to start another
flamewar here, esp. since it's off-topic to *this* discussion.
Note /dev/mdN is as good as /dev/md/N - because there are only a few active
devices which appear in /dev, there's no "risk" of having "too many" files in
/dev, hence no need to put them into subdirs like /dev/md/, /dev/sd/ etc.

if so desired, create *symlinks* at /dev with appropriate user-controlled
names to those official kernel device nodes. Be it like /dev/disk/by-label/
or /dev/cdrom0 or whatever.
The links can be created by mdadm, OR by udev - in this case, it's really
irrelevant. Udev rules do a good job of creating the /dev/disk/ hierarchy
already, and that seems to be sufficient - I see no reason to make other
device nodes (symlinks) by mdadm.
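The "kernel name plus symlink" arrangement is easy to maintain from any tool. A minimal sketch, with a hypothetical helper name make_alias: create the link under a temporary name and rename() it over the final name, so a reader never sees the alias missing while it is being replaced.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Create or replace a friendly-name symlink pointing at a kernel
 * device node, e.g. linkpath "/dev/disk/by-label/root" with target
 * "../../md1".  Returns 0, or -1 with errno set.
 */
static int make_alias(const char *target, const char *linkpath)
{
	char tmp[4096];
	int n = snprintf(tmp, sizeof tmp, "%s.tmp", linkpath);

	if (n < 0 || (size_t)n >= sizeof tmp) {
		errno = ENAMETOOLONG;
		return -1;
	}
	unlink(tmp);			/* clear a stale temp link, if any */
	if (symlink(target, tmp) < 0)
		return -1;
	if (rename(tmp, linkpath) < 0) {	/* atomically replaces the old link */
		int saved = errno;
		unlink(tmp);
		errno = saved;
		return -1;
	}
	return 0;
}
```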

By the way, unlike /dev/sdE and /dev/hdF entries, /dev/mdN nodes are pretty
stable. Even if SCSI disks get reordered, mdadm finds the component devices
by UUID (if "DEVICE partitions" is given in the config file), and /dev/md1
points to the same "logical partition" (with the same filesystem or data)
regardless of how you shuffle your disks (IF mdadm was able to find all components
and assemble the array, anyway). So sometimes I use md/mdadm on systems
WITHOUT any "raided" drives, but where I suspect disk devices may change for
whatever reason - I just create raid0 "arrays" composed of a single partition
and let mdadm find them in /dev/sd* and assemble stable-numbered /dev/mdN
devices - without any help from udev or anything else (I for one dislike udev for
several reasons).

> And in any case, we have the semantic that opening an md device-file
> creates the device, and we cannot get rid of that semantic without a
> lot of warning and a lot of pain. And adding a new semantic isn't
> really going to help.

I don't think so. With the new semantic in place, we have two options (provided
the current semantic stays, and I don't see a strong reason why it should be
removed except for the bloat):

a) with new mdadm utilizing new semantics, there's nothing to change in udev --
it will all Just Work, by mdadm opening /dev/md-control-node (whatever it's called)
and assembling devices using that, and during assemble, udev will receive proper
events about new "disks" appearing and will handle that as usual.

b) without new mdadm, it will work as before (now). And in this case, let's not
send any udev events, as mdadm already created the nodes etc.

So if a user wants neat and nice md/udev integration, the way to go is case "a".
If it's not required, either case will do.

Sure, eventually, in the long term, support for case "b" can be removed. Or not - depending
on how things are implemented, because when done properly, both cases will
call the same routine(s), but case "b" will just skip sending uevents, so the ioctl handlers
become two- or one-liners (two in case a and one in case b), which isn't really bloat ;)

/mjt

2006-11-09 10:17:59

by Michael Tokarev

Subject: Re: [PATCH 001 of 6] md: Send online/offline uevents when an md array starts/stops.

Michael Tokarev wrote:
> Neil Brown wrote:
> [/dev/mdx...]
[]
>> And in any case, we have the semantic that opening an md device-file
>> creates the device, and we cannot get rid of that semantic without a
>> lot of warning and a lot of pain. And adding a new semantic isn't
>> really going to help.
>
> I don't think so. With the new semantic in place, we have two options (provided
> the current semantic stays, and I don't see a strong reason why it should be
> removed except for the bloat):
>
> a) with new mdadm utilizing new semantics, there's nothing to change in udev --
> it will all Just Work, by mdadm opening /dev/md-control-node (whatever it's called)
> and assembling devices using that, and during assemble, udev will receive proper
> events about new "disks" appearing and will handle that as usual.
>
> b) without new mdadm, it will work as before (now). And in this case, let's not
> send any udev events, as mdadm already created the nodes etc.

I forgot to add an important point: do NOT change the current behaviour wrt uevents,
i.e., don't add uevents for the current semantics at all. Only send uevents (and in this
case they will be normal "add" and "remove" events) when assembling arrays "the new way",
using a (stable!) /dev/mdcontrol misc device, after the RUN_ARRAY and STOP_ARRAY actions have
been performed.

/mjt

> So if a user wants neat and nice md/udev integration, the way to go is case "a".
> If it's not required, either case will do.