2022-04-15 09:57:16

by NeilBrown

Subject: [PATCH/RFC] md: remove media-change code


md only ever used the media-change interfaces to trigger a partition
rescan once the array became active. Normally partition scan only
happens when the disk is first added, and with md the disk is typically
inactive when first added.

This rescan can now be achieved by simply setting GD_NEED_PART_SCAN.
So do that, and remove all the rescan.

This has the side effect of causing 'diskseq' to be stable for md devices.
Previously diskseq would be incremented once the device became active
but no uevent would be generated to report this increment. This was
confusing to systemd.

https://github.com/systemd/systemd/pull/23011

Signed-off-by: NeilBrown <[email protected]>
---
drivers/md/md.c | 19 ++-----------------
drivers/md/md.h | 2 --
2 files changed, 2 insertions(+), 19 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 309b3af906ad..0ea4d34ec682 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5713,7 +5713,6 @@ static int md_alloc(dev_t dev, char *name)
mddev->queue = disk->queue;
blk_set_stacking_limits(&mddev->queue->limits);
blk_queue_write_cache(mddev->queue, true, true);
- disk->events |= DISK_EVENT_MEDIA_CHANGE;
mddev->gendisk = disk;
error = add_disk(disk);
if (error)
@@ -6089,7 +6088,7 @@ int do_md_run(struct mddev *mddev)

set_capacity_and_notify(mddev->gendisk, mddev->array_sectors);
clear_bit(MD_NOT_READY, &mddev->flags);
- mddev->changed = 1;
+ set_bit(GD_NEED_PART_SCAN, &mddev->gendisk->state);
kobject_uevent(&disk_to_dev(mddev->gendisk)->kobj, KOBJ_CHANGE);
sysfs_notify_dirent_safe(mddev->sysfs_state);
sysfs_notify_dirent_safe(mddev->sysfs_action);
@@ -6191,7 +6190,6 @@ static void md_clean(struct mddev *mddev)
mddev->sync_speed_min = mddev->sync_speed_max = 0;
mddev->recovery = 0;
mddev->in_sync = 0;
- mddev->changed = 0;
mddev->degraded = 0;
mddev->safemode = 0;
mddev->private = NULL;
@@ -6407,7 +6405,7 @@ static int do_md_stop(struct mddev *mddev, int mode,

set_capacity_and_notify(disk, 0);
mutex_unlock(&mddev->open_mutex);
- mddev->changed = 1;
+ set_bit(GD_NEED_PART_SCAN, &mddev->gendisk->state);

if (mddev->ro)
mddev->ro = 0;
@@ -7839,7 +7837,6 @@ static int md_open(struct block_device *bdev, fmode_t mode)
atomic_inc(&mddev->openers);
mutex_unlock(&mddev->open_mutex);

- bdev_check_media_change(bdev);
out:
if (err)
mddev_put(mddev);
@@ -7855,17 +7852,6 @@ static void md_release(struct gendisk *disk, fmode_t mode)
mddev_put(mddev);
}

-static unsigned int md_check_events(struct gendisk *disk, unsigned int clearing)
-{
- struct mddev *mddev = disk->private_data;
- unsigned int ret = 0;
-
- if (mddev->changed)
- ret = DISK_EVENT_MEDIA_CHANGE;
- mddev->changed = 0;
- return ret;
-}
-
const struct block_device_operations md_fops =
{
.owner = THIS_MODULE,
@@ -7877,7 +7863,6 @@ const struct block_device_operations md_fops =
.compat_ioctl = md_compat_ioctl,
#endif
.getgeo = md_getgeo,
- .check_events = md_check_events,
.set_read_only = md_set_read_only,
};

diff --git a/drivers/md/md.h b/drivers/md/md.h
index 6ac283864533..aec433ae5947 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -405,8 +405,6 @@ struct mddev {
atomic_t active; /* general refcount */
atomic_t openers; /* number of active opens */

- int changed; /* True if we might need to
- * reread partition info */
int degraded; /* whether md should consider
* adding a spare
*/
--
2.35.2
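
The diskseq side effect described in the commit message can be observed from
user space. The following is a minimal sketch, not part of the patch: it
assumes the disk exposes its sequence number through the sysfs attribute
/sys/block/<disk>/diskseq, and the helper names read_diskseq() and
diskseq_unchanged() are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read a diskseq value from a sysfs-style attribute file, for example
 * /sys/block/md0/diskseq (the device name md0 is only an example).
 * Returns 0 and stores the value in *seq on success, -1 on failure.
 */
static int read_diskseq(const char *path, uint64_t *seq)
{
	FILE *f;
	int ok;

	assert(path != NULL && seq != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	/* The attribute holds a single unsigned decimal number. */
	ok = fscanf(f, "%" SCNu64, seq) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Hypothetical helper: compare the current diskseq with a value sampled
 * earlier (e.g. before the array was started).
 * Returns 1 if unchanged, 0 if it changed, -1 if it could not be read.
 */
static int diskseq_unchanged(const char *path, uint64_t before)
{
	uint64_t now;

	if (read_diskseq(path, &now) != 0)
		return -1;
	return now == before;
}
```

A caller would sample the value with read_diskseq() before starting the
array and call diskseq_unchanged() afterwards. If the commit message is
right, the value should stay the same across activation once the patch is
applied, whereas before it would change without a uevent reporting it.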


2022-04-16 01:16:18

by Mariusz Tkaczyk

Subject: Re: [PATCH/RFC] md: remove media-change code

On Thu, 14 Apr 2022 16:06:05 +1000
"NeilBrown" <[email protected]> wrote:

> md only ever used the media-change interfaces to trigger a partition
> rescan once the array became active. Normally partition scan only
> happens when the disk is first added, and with md the disk is typically
> inactive when first added.
>
> This rescan can now be achieved by simply setting GD_NEED_PART_SCAN.
> So do that, and remove all the rescan.
>
Hi Neil,

I experimented in this area in the past, mainly on IMSM (external
metadata). My problem is described here:
https://lore.kernel.org/linux-raid/SA0PR11MB4542ECA84F72506B39C3C9F1FFEE0@SA0PR11MB4542.namprd11.prod.outlook.com/
I can no longer reproduce it on newer kernels. Changes in the block layer
probably hide the issue, which looks like a timing race. The change you
propose could bring it back.

The current approach works, so I consider your change potentially dangerous.
Anyway, I will help with testing if Song decides to take it.

For external metadata we should force the partition read after mdmon starts
(when the md device becomes RW), so it should be synchronized with mdadm.
It could break autostart functionality for native metadata (if it is still
in use).

Eventually, external metadata should be handled separately.

Thanks,
Mariusz
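
For reference, a user-space tool can explicitly ask the kernel to re-read a
partition table with the BLKRRPART ioctl. The sketch below only illustrates
how a partition read could be forced at a chosen point, such as after the
array becomes read-write. It is an assumption for illustration and does not
describe what mdadm or mdmon actually do; reread_partitions() is a
hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>	/* BLKRRPART */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Ask the kernel to re-read the partition table of a block device.
 * Returns 0 on success or a negative errno value on failure.
 */
static int reread_partitions(const char *devpath)
{
	int fd, ret;

	assert(devpath != NULL);

	fd = open(devpath, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* Save the result before close() can overwrite errno. */
	ret = ioctl(fd, BLKRRPART) == 0 ? 0 : -errno;
	close(fd);
	return ret;
}
```

Calling reread_partitions("/dev/md0") (the device name is only an example)
needs sufficient privileges and is expected to fail if the request cannot be
honoured, for instance when a partition on the device is in use.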