2023-08-03 13:39:10

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 00/29] md: synchronize io with array reconfiguration

From: Yu Kuai <[email protected]>

After previous four patchset of preparatory work, this patchset impelement
a new version of mddev_suspend(), the new apis:
- reconfig_mutex is not required;
- the weird logical that suspend array hold 'reconfig_mutex' for
mddev_check_recovery() to update superblock is not needed;
- the special handling, 'pers->prepare_suspend', for raid456 is not
needed;
- It's safe to be called at any time once mddev is allocated, and it's
designed to be used from slow path where array configuration is changed;

And use the new api to replace:

mddev_lock
mddev_suspend or not
// array reconfiguration
mddev_resume or not
mddev_unlock

With:

mddev_suspend
mddev_lock
// array reconfiguration
mddev_unlock
mddev_resume

However, the above change is not possible for raid5 and raid-cluster in
some corner cases, and mddev_suspend/resume() is replaced with quiesce()
callback, which will suspend the array as well.

This patchset is tested in my VM with mdadm testsuite with loop device
except for 10ddf tests(they always fail before this patchset).

A lot of cleanups will be started after this patchset.

Yu Kuai (29):
md: use READ_ONCE/WRITE_ONCE for 'suspend_lo' and 'suspend_hi'
md: use 'mddev->suspended' for is_md_suspended()
md: add new helpers to suspend/resume array
md: add new helpers to suspend/resume and lock/unlock array
md: use new apis to suspend array for suspend_lo/hi/store()
md: use new apis to suspend array for level_store()
md: use new apis to suspend array for serialize_policy_store()
md/dm-raid: use new apis to suspend array
md/md-bitmap: use new apis to suspend array for location_store()
md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log'
md/raid5-cache: use new apis to suspend array for r5c_disable_writeback_async()
md/raid5-cache: use new apis to suspend array for r5c_journal_mode_store()
md/raid5: use new apis to suspend array for raid5_store_stripe_size()
md/raid5: use new apis to suspend array for raid5_store_skip_copy()
md/raid5: use new apis to suspend array for raid5_store_group_thread_cnt()
md/raid5: use new apis to suspend array for raid5_change_consistency_policy()
md/raid5: replace suspend with quiesce() callback
md: quiesce before md_kick_rdev_from_array() for md-cluster
md: use new apis to suspend array for ioctls involed array reconfiguration
md: use new apis to suspend array for adding/removing rdev from state_store()
md: use new apis to suspend array for bind_rdev_to_array()
md: use new apis to suspend array related to serial pool in state_store()
md: use new apis to suspend array in backlog_store()
md: suspend array in md_start_sync() if array need reconfiguration
md: cleanup mddev_create/destroy_serial_pool()
md/md-linear: cleanup linear_add()
md: remove mddev_suspend() and mddev_resume()
md/raid5: Revert "md/raid5: fix a deadlock in the case that reshape is interrupted"
md: Revert "md: add a new api prepare_suspend() in md_personality"

drivers/md/dm-raid.c | 12 +-
drivers/md/md-autodetect.c | 4 +-
drivers/md/md-bitmap.c | 18 ++-
drivers/md/md-linear.c | 2 -
drivers/md/md.c | 267 ++++++++++++++++++++++---------------
drivers/md/md.h | 57 ++++++--
drivers/md/raid5-cache.c | 65 +++++----
drivers/md/raid5.c | 100 +++-----------
8 files changed, 267 insertions(+), 258 deletions(-)

--
2.39.2



2023-08-03 13:39:11

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 07/29] md: use new apis to suspend array for serialize_policy_store()

From: Yu Kuai <[email protected]>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index fc1646f02ef3..765667b5fa59 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5548,7 +5548,7 @@ serialize_policy_store(struct mddev *mddev, const char *buf, size_t len)
if (value == mddev->serialize_policy)
return len;

- err = mddev_lock(mddev);
+ err = mddev_suspend_and_lock(mddev);
if (err)
return err;
if (mddev->pers == NULL || (mddev->pers->level != 1)) {
@@ -5557,15 +5557,13 @@ serialize_policy_store(struct mddev *mddev, const char *buf, size_t len)
goto unlock;
}

- mddev_suspend(mddev);
if (value)
mddev_create_serial_pool(mddev, NULL, true);
else
mddev_destroy_serial_pool(mddev, NULL, true);
mddev->serialize_policy = value;
- mddev_resume(mddev);
unlock:
- mddev_unlock(mddev);
+ mddev_unlock_and_resume(mddev);
return err ?: len;
}

--
2.39.2


2023-08-03 13:39:24

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 09/29] md/md-bitmap: use new apis to suspend array for location_store()

From: Yu Kuai <[email protected]>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md-bitmap.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 0c661e5036bb..7d21e2a5b06e 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2348,11 +2348,10 @@ location_store(struct mddev *mddev, const char *buf, size_t len)
{
int rv;

- rv = mddev_lock(mddev);
+ rv = mddev_suspend_and_lock(mddev);
if (rv)
return rv;

- mddev_suspend(mddev);
if (mddev->pers) {
if (mddev->recovery || mddev->sync_thread) {
rv = -EBUSY;
@@ -2429,8 +2428,7 @@ location_store(struct mddev *mddev, const char *buf, size_t len)
}
rv = 0;
out:
- mddev_resume(mddev);
- mddev_unlock(mddev);
+ mddev_unlock_and_resume(mddev);
if (rv)
return rv;
return len;
--
2.39.2


2023-08-03 13:40:44

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 13/29] md/raid5: use new apis to suspend array for raid5_store_stripe_size()

From: Yu Kuai <[email protected]>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/raid5.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index d6695fc718c1..624899e95b4d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7025,7 +7025,7 @@ raid5_store_stripe_size(struct mddev *mddev, const char *page, size_t len)
new != roundup_pow_of_two(new))
return -EINVAL;

- err = mddev_lock(mddev);
+ err = mddev_suspend_and_lock(mddev);
if (err)
return err;

@@ -7049,7 +7049,6 @@ raid5_store_stripe_size(struct mddev *mddev, const char *page, size_t len)
goto out_unlock;
}

- mddev_suspend(mddev);
mutex_lock(&conf->cache_size_mutex);
size = conf->max_nr_stripes;

@@ -7064,10 +7063,9 @@ raid5_store_stripe_size(struct mddev *mddev, const char *page, size_t len)
err = -ENOMEM;
}
mutex_unlock(&conf->cache_size_mutex);
- mddev_resume(mddev);

out_unlock:
- mddev_unlock(mddev);
+ mddev_unlock_and_resume(mddev);
return err ?: len;
}

--
2.39.2


2023-08-03 13:41:06

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 10/29] md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log'

From: Yu Kuai <[email protected]>

'conf->log' is set with 'reconfig_mutex' grabbed, however, readers are
not procted, hence use READ_ONCE/WRITE_ONCE to prevent reading abnormal
value.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/raid5-cache.c | 47 +++++++++++++++++++++-------------------
1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 47ba7d9e81e1..b6e0a6bb3965 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -327,8 +327,9 @@ void r5l_wake_reclaim(struct r5l_log *log, sector_t space);
void r5c_check_stripe_cache_usage(struct r5conf *conf)
{
int total_cached;
+ struct r5l_log *log = READ_ONCE(conf->log);

- if (!r5c_is_writeback(conf->log))
+ if (!r5c_is_writeback(log))
return;

total_cached = atomic_read(&conf->r5c_cached_partial_stripes) +
@@ -344,7 +345,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
*/
if (total_cached > conf->min_nr_stripes * 1 / 2 ||
atomic_read(&conf->empty_inactive_list_nr) > 0)
- r5l_wake_reclaim(conf->log, 0);
+ r5l_wake_reclaim(log, 0);
}

/*
@@ -353,7 +354,9 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
*/
void r5c_check_cached_full_stripe(struct r5conf *conf)
{
- if (!r5c_is_writeback(conf->log))
+ struct r5l_log *log = READ_ONCE(conf->log);
+
+ if (!r5c_is_writeback(log))
return;

/*
@@ -363,7 +366,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
if (atomic_read(&conf->r5c_cached_full_stripes) >=
min(R5C_FULL_STRIPE_FLUSH_BATCH(conf),
conf->chunk_sectors >> RAID5_STRIPE_SHIFT(conf)))
- r5l_wake_reclaim(conf->log, 0);
+ r5l_wake_reclaim(log, 0);
}

/*
@@ -396,7 +399,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
*/
static sector_t r5c_log_required_to_flush_cache(struct r5conf *conf)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = READ_ONCE(conf->log);

if (!r5c_is_writeback(log))
return 0;
@@ -449,7 +452,7 @@ static inline void r5c_update_log_state(struct r5l_log *log)
void r5c_make_stripe_write_out(struct stripe_head *sh)
{
struct r5conf *conf = sh->raid_conf;
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = READ_ONCE(conf->log);

BUG_ON(!r5c_is_writeback(log));

@@ -491,7 +494,7 @@ static void r5c_handle_parity_cached(struct stripe_head *sh)
*/
static void r5c_finish_cache_stripe(struct stripe_head *sh)
{
- struct r5l_log *log = sh->raid_conf->log;
+ struct r5l_log *log = READ_ONCE(sh->raid_conf->log);

if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH) {
BUG_ON(test_bit(STRIPE_R5C_CACHING, &sh->state));
@@ -692,7 +695,7 @@ static void r5c_disable_writeback_async(struct work_struct *work)

/* wait superblock change before suspend */
wait_event(mddev->sb_wait,
- conf->log == NULL ||
+ !READ_ONCE(conf->log) ||
(!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
(locked = mddev_trylock(mddev))));
if (locked) {
@@ -1151,7 +1154,7 @@ static void r5l_run_no_space_stripes(struct r5l_log *log)
static sector_t r5c_calculate_new_cp(struct r5conf *conf)
{
struct stripe_head *sh;
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = READ_ONCE(conf->log);
sector_t new_cp;
unsigned long flags;

@@ -1159,12 +1162,12 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf)
return log->next_checkpoint;

spin_lock_irqsave(&log->stripe_in_journal_lock, flags);
- if (list_empty(&conf->log->stripe_in_journal_list)) {
+ if (list_empty(&log->stripe_in_journal_list)) {
/* all stripes flushed */
spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
return log->next_checkpoint;
}
- sh = list_first_entry(&conf->log->stripe_in_journal_list,
+ sh = list_first_entry(&log->stripe_in_journal_list,
struct stripe_head, r5c);
new_cp = sh->log_start;
spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
@@ -1400,7 +1403,7 @@ void r5c_flush_cache(struct r5conf *conf, int num)
struct stripe_head *sh, *next;

lockdep_assert_held(&conf->device_lock);
- if (!conf->log)
+ if (!READ_ONCE(conf->log))
return;

count = 0;
@@ -1421,7 +1424,7 @@ void r5c_flush_cache(struct r5conf *conf, int num)

static void r5c_do_reclaim(struct r5conf *conf)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = READ_ONCE(conf->log);
struct stripe_head *sh;
int count = 0;
unsigned long flags;
@@ -1550,7 +1553,7 @@ static void r5l_reclaim_thread(struct md_thread *thread)
{
struct mddev *mddev = thread->mddev;
struct r5conf *conf = mddev->private;
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = READ_ONCE(conf->log);

if (!log)
return;
@@ -1592,7 +1595,7 @@ void r5l_quiesce(struct r5l_log *log, int quiesce)

bool r5l_log_disk_error(struct r5conf *conf)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = READ_ONCE(conf->log);

/* don't allow write if journal disk is missing */
if (!log)
@@ -2636,7 +2639,7 @@ int r5c_try_caching_write(struct r5conf *conf,
struct stripe_head_state *s,
int disks)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = READ_ONCE(conf->log);
int i;
struct r5dev *dev;
int to_cache = 0;
@@ -2803,7 +2806,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
struct stripe_head *sh,
struct stripe_head_state *s)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = READ_ONCE(conf->log);
int i;
int do_wakeup = 0;
sector_t tree_index;
@@ -2942,7 +2945,7 @@ int r5c_cache_data(struct r5l_log *log, struct stripe_head *sh)
/* check whether this big stripe is in write back cache. */
bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = READ_ONCE(conf->log);
sector_t tree_index;
void *slot;

@@ -3050,14 +3053,14 @@ int r5l_start(struct r5l_log *log)
void r5c_update_on_rdev_error(struct mddev *mddev, struct md_rdev *rdev)
{
struct r5conf *conf = mddev->private;
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = READ_ONCE(conf->log);

if (!log)
return;

if ((raid5_calc_degraded(conf) > 0 ||
test_bit(Journal, &rdev->flags)) &&
- conf->log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
+ log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
schedule_work(&log->disable_writeback_work);
}

@@ -3146,7 +3149,7 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
spin_lock_init(&log->stripe_in_journal_lock);
atomic_set(&log->stripe_in_journal_count, 0);

- conf->log = log;
+ WRITE_ONCE(conf->log, log);

set_bit(MD_HAS_JOURNAL, &conf->mddev->flags);
return 0;
@@ -3173,7 +3176,7 @@ void r5l_exit_log(struct r5conf *conf)
flush_work(&log->disable_writeback_work);
md_unregister_thread(&log->reclaim_thread);

- conf->log = NULL;
+ WRITE_ONCE(conf->log, NULL);

mempool_exit(&log->meta_pool);
bioset_exit(&log->bs);
--
2.39.2


2023-08-03 13:42:28

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 17/29] md/raid5: replace suspend with quiesce() callback

From: Yu Kuai <[email protected]>

raid5 is the only personality to suspend array in check_reshape() and
start_reshape() callback, suspend and quiesce() callback can both wait
for all normal io to be done, and prevent new io to be dispatched, the
difference is that suspend is implemented in common layer, and quiesce()
callback is implemented in raid5.

In order to cleanup all the usage of mddev_suspend(), the new apis
__mddev_suspend() need to be called before 'reconfig_mutex' is held,
and it's not good to affect all the personalities in common layer just
for raid5. Hence replace suspend with quiesce() callaback, prepare to
reomove all the users of mddev_suspend().

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/raid5.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b4b2d0d5855d..4709b7ad0317 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -70,6 +70,8 @@ MODULE_PARM_DESC(devices_handle_discard_safely,
"Set to Y if all devices in each array reliably return zeroes on reads from discarded regions");
static struct workqueue_struct *raid5_wq;

+static void raid5_quiesce(struct mddev *mddev, int quiesce);
+
static inline struct hlist_head *stripe_hash(struct r5conf *conf, sector_t sect)
{
int hash = (sect >> RAID5_STRIPE_SHIFT(conf)) & HASH_MASK;
@@ -2492,15 +2494,12 @@ static int resize_chunks(struct r5conf *conf, int new_disks, int new_sectors)
unsigned long cpu;
int err = 0;

- /*
- * Never shrink. And mddev_suspend() could deadlock if this is called
- * from raid5d. In that case, scribble_disks and scribble_sectors
- * should equal to new_disks and new_sectors
- */
+ /* Never shrink. */
if (conf->scribble_disks >= new_disks &&
conf->scribble_sectors >= new_sectors)
return 0;
- mddev_suspend(conf->mddev);
+
+ raid5_quiesce(conf->mddev, true);
cpus_read_lock();

for_each_present_cpu(cpu) {
@@ -2514,7 +2513,8 @@ static int resize_chunks(struct r5conf *conf, int new_disks, int new_sectors)
}

cpus_read_unlock();
- mddev_resume(conf->mddev);
+ raid5_quiesce(conf->mddev, false);
+
if (!err) {
conf->scribble_disks = new_disks;
conf->scribble_sectors = new_sectors;
@@ -8550,8 +8550,8 @@ static int raid5_start_reshape(struct mddev *mddev)
* the reshape wasn't running - like Discard or Read - have
* completed.
*/
- mddev_suspend(mddev);
- mddev_resume(mddev);
+ raid5_quiesce(mddev, true);
+ raid5_quiesce(mddev, false);

/* Add some new drives, as many as will fit.
* We know there are enough to make the newly sized array work.
--
2.39.2


2023-08-03 13:42:51

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 12/29] md/raid5-cache: use new apis to suspend array for r5c_journal_mode_store()

From: Yu Kuai <[email protected]>

r5c_journal_mode_set() will suspend array and it has only 2 caller, the
other caller raid_ctl() already suspend the array with new apis.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/raid5-cache.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index c71cb5c954e0..51a68fbc241c 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -2585,9 +2585,7 @@ int r5c_journal_mode_set(struct mddev *mddev, int mode)
mode == R5C_JOURNAL_MODE_WRITE_BACK)
return -EINVAL;

- mddev_suspend(mddev);
conf->log->r5c_journal_mode = mode;
- mddev_resume(mddev);

pr_debug("md/raid:%s: setting r5c cache mode to %d: %s\n",
mdname(mddev), mode, r5c_journal_mode_str[mode]);
@@ -2612,11 +2610,11 @@ static ssize_t r5c_journal_mode_store(struct mddev *mddev,
if (strlen(r5c_journal_mode_str[mode]) == len &&
!strncmp(page, r5c_journal_mode_str[mode], len))
break;
- ret = mddev_lock(mddev);
+ ret = mddev_suspend_and_lock(mddev);
if (ret)
return ret;
ret = r5c_journal_mode_set(mddev, mode);
- mddev_unlock(mddev);
+ mddev_unlock_and_resume(mddev);
return ret ?: length;
}

--
2.39.2


2023-08-03 13:44:34

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 19/29] md: use new apis to suspend array for ioctls involed array reconfiguration

From: Yu Kuai <[email protected]>

'reconfig_mutex' will be grabbed before these ioctls, suspend array
before holding the lock, so that io won't concurrent with array
reconfiguration through ioctls.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index d550bacd0efc..66bb6a585291 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7183,7 +7183,6 @@ static int set_bitmap_file(struct mddev *mddev, int fd)
struct bitmap *bitmap;

bitmap = md_bitmap_create(mddev, -1);
- mddev_suspend(mddev);
if (!IS_ERR(bitmap)) {
mddev->bitmap = bitmap;
err = md_bitmap_load(mddev);
@@ -7193,11 +7192,8 @@ static int set_bitmap_file(struct mddev *mddev, int fd)
md_bitmap_destroy(mddev);
fd = -1;
}
- mddev_resume(mddev);
} else if (fd < 0) {
- mddev_suspend(mddev);
md_bitmap_destroy(mddev);
- mddev_resume(mddev);
}
}
if (fd < 0) {
@@ -7486,7 +7482,6 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
mddev->bitmap_info.space =
mddev->bitmap_info.default_space;
bitmap = md_bitmap_create(mddev, -1);
- mddev_suspend(mddev);
if (!IS_ERR(bitmap)) {
mddev->bitmap = bitmap;
rv = md_bitmap_load(mddev);
@@ -7494,7 +7489,6 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
rv = PTR_ERR(bitmap);
if (rv)
md_bitmap_destroy(mddev);
- mddev_resume(mddev);
} else {
/* remove the bitmap */
if (!mddev->bitmap) {
@@ -7519,9 +7513,7 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
module_put(md_cluster_mod);
mddev->safemode_delay = DEFAULT_SAFEMODE_DELAY;
}
- mddev_suspend(mddev);
md_bitmap_destroy(mddev);
- mddev_resume(mddev);
mddev->bitmap_info.offset = 0;
}
}
@@ -7592,6 +7584,19 @@ static inline bool md_ioctl_valid(unsigned int cmd)
}
}

+static bool md_ioctl_need_suspend(unsigned int cmd)
+{
+ switch (cmd) {
+ case ADD_NEW_DISK:
+ case HOT_ADD_DISK:
+ case HOT_REMOVE_DISK:
+ case SET_BITMAP_FILE:
+ return true;
+ default:
+ return false;
+ }
+}
+
static int __md_set_array_info(struct mddev *mddev, void __user *argp)
{
mdu_array_info_t info;
@@ -7720,6 +7725,9 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
mutex_unlock(&mddev->open_mutex);
sync_blockdev(bdev);
}
+
+ if (md_ioctl_need_suspend(cmd))
+ __mddev_suspend(mddev);
err = mddev_lock(mddev);
if (err) {
pr_debug("md: ioctl lock interrupted, reason %d, cmd %d\n",
@@ -7848,6 +7856,9 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
if (mddev->hold_active == UNTIL_IOCTL &&
err != -EINVAL)
mddev->hold_active = 0;
+
+ if (md_ioctl_need_suspend(cmd))
+ __mddev_resume(mddev);
mddev_unlock(mddev);
out:
if(did_set_md_closing)
--
2.39.2


2023-08-03 13:45:02

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 23/29] md: use new apis to suspend array in backlog_store()

From: Yu Kuai <[email protected]>

mddev_create/destroy_serial_pool() will be called from backlog_store(),
and mddev_suspend() will be called later.

Prepare to remove the mddev_suspend() from
mddev_create/destroy_serial_pool().

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md-bitmap.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 7d21e2a5b06e..b3d701c5c461 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2537,7 +2537,7 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
if (backlog > COUNTER_MAX)
return -EINVAL;

- rv = mddev_lock(mddev);
+ rv = mddev_suspend_and_lock(mddev);
if (rv)
return rv;

@@ -2562,16 +2562,16 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
if (!backlog && mddev->serial_info_pool) {
/* serial_info_pool is not needed if backlog is zero */
if (!mddev->serialize_policy)
- mddev_destroy_serial_pool(mddev, NULL, false);
+ mddev_destroy_serial_pool(mddev, NULL, true);
} else if (backlog && !mddev->serial_info_pool) {
/* serial_info_pool is needed since backlog is not zero */
rdev_for_each(rdev, mddev)
- mddev_create_serial_pool(mddev, rdev, false);
+ mddev_create_serial_pool(mddev, rdev, true);
}
if (old_mwb != backlog)
md_bitmap_update_sb(mddev->bitmap);

- mddev_unlock(mddev);
+ mddev_unlock_and_resume(mddev);
return len;
}

--
2.39.2


2023-08-03 13:45:09

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 22/29] md: use new apis to suspend array related to serial pool in state_store()

From: Yu Kuai <[email protected]>

mddev_create/destroy_serial_pool() will be called from state_store() if
user write 'writemostly'/'-writemostly', and mddev_suspend() will be
called later.

Prepare to remove the mddev_suspend() from
mddev_create/destroy_serial_pool().

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index ca1fe1d17ec1..d5c061c87b2e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3067,11 +3067,11 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
err = remove_rdev(rdev);
} else if (cmd_match(buf, "writemostly")) {
set_bit(WriteMostly, &rdev->flags);
- mddev_create_serial_pool(rdev->mddev, rdev, false);
+ mddev_create_serial_pool(rdev->mddev, rdev, true);
need_update_sb = true;
err = 0;
} else if (cmd_match(buf, "-writemostly")) {
- mddev_destroy_serial_pool(rdev->mddev, rdev, false);
+ mddev_destroy_serial_pool(rdev->mddev, rdev, true);
clear_bit(WriteMostly, &rdev->flags);
need_update_sb = true;
err = 0;
@@ -3695,7 +3695,9 @@ rdev_attr_store(struct kobject *kobj, struct attribute *attr,
if (entry->store == state_store) {
if (cmd_match(page, "remove"))
kn = sysfs_break_active_protection(kobj, attr);
- if (cmd_match(page, "remove") || cmd_match(page, "re-add")) {
+ if (cmd_match(page, "remove") || cmd_match(page, "re-add") ||
+ cmd_match(page, "writemostly") ||
+ cmd_match(page, "-writemostly")) {
__mddev_suspend(mddev);
suspended = true;
}
--
2.39.2


2023-08-03 13:45:11

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 25/29] md: cleanup mddev_create/destroy_serial_pool()

From: Yu Kuai <[email protected]>

Now that except for stopping the array, all the callers already suspend
the array, there is no need to suspend anymore, hence remove the second
parameter.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md-bitmap.c | 8 ++++----
drivers/md/md.c | 33 ++++++++++-----------------------
drivers/md/md.h | 7 +++----
3 files changed, 17 insertions(+), 31 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index b3d701c5c461..9672f75c3050 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1861,7 +1861,7 @@ void md_bitmap_destroy(struct mddev *mddev)

md_bitmap_wait_behind_writes(mddev);
if (!mddev->serialize_policy)
- mddev_destroy_serial_pool(mddev, NULL, true);
+ mddev_destroy_serial_pool(mddev, NULL);

mutex_lock(&mddev->bitmap_info.mutex);
spin_lock(&mddev->lock);
@@ -1977,7 +1977,7 @@ int md_bitmap_load(struct mddev *mddev)
goto out;

rdev_for_each(rdev, mddev)
- mddev_create_serial_pool(mddev, rdev, true);
+ mddev_create_serial_pool(mddev, rdev);

if (mddev_is_clustered(mddev))
md_cluster_ops->load_bitmaps(mddev, mddev->bitmap_info.nodes);
@@ -2562,11 +2562,11 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
if (!backlog && mddev->serial_info_pool) {
/* serial_info_pool is not needed if backlog is zero */
if (!mddev->serialize_policy)
- mddev_destroy_serial_pool(mddev, NULL, true);
+ mddev_destroy_serial_pool(mddev, NULL);
} else if (backlog && !mddev->serial_info_pool) {
/* serial_info_pool is needed since backlog is not zero */
rdev_for_each(rdev, mddev)
- mddev_create_serial_pool(mddev, rdev, true);
+ mddev_create_serial_pool(mddev, rdev);
}
if (old_mwb != backlog)
md_bitmap_update_sb(mddev->bitmap);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9bfa15b2fa97..915a6f15d726 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -205,8 +205,7 @@ static int rdev_need_serial(struct md_rdev *rdev)
* 1. rdev is the first device which return true from rdev_enable_serial.
* 2. rdev is NULL, means we want to enable serialization for all rdevs.
*/
-void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
- bool is_suspend)
+void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev)
{
int ret = 0;

@@ -214,15 +213,12 @@ void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
!test_bit(CollisionCheck, &rdev->flags))
return;

- if (!is_suspend)
- mddev_suspend(mddev);
-
if (!rdev)
ret = rdevs_init_serial(mddev);
else
ret = rdev_init_serial(rdev);
if (ret)
- goto abort;
+ return;

if (mddev->serial_info_pool == NULL) {
/*
@@ -237,10 +233,6 @@ void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
pr_err("can't alloc memory pool for serialization\n");
}
}
-
-abort:
- if (!is_suspend)
- mddev_resume(mddev);
}

/*
@@ -249,8 +241,7 @@ void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
* 2. when bitmap is destroyed while policy is not enabled.
* 3. for disable policy, the pool is destroyed only when no rdev needs it.
*/
-void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
- bool is_suspend)
+void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev)
{
if (rdev && !test_bit(CollisionCheck, &rdev->flags))
return;
@@ -259,8 +250,6 @@ void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
struct md_rdev *temp;
int num = 0; /* used to track if other rdevs need the pool */

- if (!is_suspend)
- mddev_suspend(mddev);
rdev_for_each(temp, mddev) {
if (!rdev) {
if (!mddev->serialize_policy ||
@@ -282,8 +271,6 @@ void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
mempool_destroy(mddev->serial_info_pool);
mddev->serial_info_pool = NULL;
}
- if (!is_suspend)
- mddev_resume(mddev);
}
}

@@ -2534,7 +2521,7 @@ static int bind_rdev_to_array(struct md_rdev *rdev, struct mddev *mddev)
pr_debug("md: bind<%s>\n", b);

if (mddev->raid_disks)
- mddev_create_serial_pool(mddev, rdev, true);
+ mddev_create_serial_pool(mddev, rdev);

if ((err = kobject_add(&rdev->kobj, &mddev->kobj, "dev-%s", b)))
goto fail;
@@ -2586,7 +2573,7 @@ static void md_kick_rdev_from_array(struct md_rdev *rdev)
bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
list_del_rcu(&rdev->same_set);
pr_debug("md: unbind<%pg>\n", rdev->bdev);
- mddev_destroy_serial_pool(rdev->mddev, rdev, false);
+ mddev_destroy_serial_pool(rdev->mddev, rdev);
rdev->mddev = NULL;
sysfs_remove_link(&rdev->kobj, "block");
sysfs_put(rdev->sysfs_state);
@@ -3067,11 +3054,11 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
err = remove_rdev(rdev);
} else if (cmd_match(buf, "writemostly")) {
set_bit(WriteMostly, &rdev->flags);
- mddev_create_serial_pool(rdev->mddev, rdev, true);
+ mddev_create_serial_pool(rdev->mddev, rdev);
need_update_sb = true;
err = 0;
} else if (cmd_match(buf, "-writemostly")) {
- mddev_destroy_serial_pool(rdev->mddev, rdev, true);
+ mddev_destroy_serial_pool(rdev->mddev, rdev);
clear_bit(WriteMostly, &rdev->flags);
need_update_sb = true;
err = 0;
@@ -5566,9 +5553,9 @@ serialize_policy_store(struct mddev *mddev, const char *buf, size_t len)
}

if (value)
- mddev_create_serial_pool(mddev, NULL, true);
+ mddev_create_serial_pool(mddev, NULL);
else
- mddev_destroy_serial_pool(mddev, NULL, true);
+ mddev_destroy_serial_pool(mddev, NULL);
mddev->serialize_policy = value;
unlock:
mddev_unlock_and_resume(mddev);
@@ -6334,7 +6321,7 @@ static void __md_stop_writes(struct mddev *mddev)
}
/* disable policy to guarantee rdevs free resources for serialization */
mddev->serialize_policy = 0;
- mddev_destroy_serial_pool(mddev, NULL, true);
+ mddev_destroy_serial_pool(mddev, NULL);
}

void md_stop_writes(struct mddev *mddev)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 3a2488231fd2..24adb603461d 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -809,10 +809,9 @@ extern void __mddev_resume(struct mddev *mddev);

extern void md_reload_sb(struct mddev *mddev, int raid_disk);
extern void md_update_sb(struct mddev *mddev, int force);
-extern void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
- bool is_suspend);
-extern void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
- bool is_suspend);
+extern void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev);
+extern void mddev_destroy_serial_pool(struct mddev *mddev,
+ struct md_rdev *rdev);
struct md_rdev *md_find_rdev_nr_rcu(struct mddev *mddev, int nr);
struct md_rdev *md_find_rdev_rcu(struct mddev *mddev, dev_t dev);

--
2.39.2


2023-08-03 13:45:45

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 24/29] md: suspend array in md_start_sync() if array need reconfiguration

From: Yu Kuai <[email protected]>

So that io won't concurrent with array reconfiguration, and now that
all callers of md_kick_rdev_from_array() suspend the array, change the
second parameter to true and prepare to remove the mddev_suspend() from
mddev_create/destroy_serial_pool().

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index d5c061c87b2e..9bfa15b2fa97 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9358,12 +9358,20 @@ static void md_start_sync(struct work_struct *ws)
{
struct mddev *mddev = container_of(ws, struct mddev, sync_work);
int spares = 0;
+ bool suspended = false;
+
+ if (md_spares_need_change(mddev)) {
+ __mddev_suspend(mddev);
+ suspended = true;
+ }

mddev_lock_nointr(mddev);

if (!md_is_rdwr(mddev)) {
remove_and_add_spares(mddev);
mddev_unlock(mddev);
+ if (suspended)
+ __mddev_resume(mddev);
return;
}

@@ -9412,6 +9420,9 @@ static void md_start_sync(struct work_struct *ws)
}

mddev_unlock(mddev);
+ if (suspended)
+ __mddev_resume(mddev);
+
md_wakeup_thread(mddev->sync_thread);
sysfs_notify_dirent_safe(mddev->sysfs_action);
md_new_event();
@@ -9424,6 +9435,8 @@ static void md_start_sync(struct work_struct *ws)
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
mddev_unlock(mddev);
+ if (suspended)
+ __mddev_resume(mddev);

wake_up(&resync_wait);
if (test_and_clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery) &&
--
2.39.2


2023-08-03 13:45:46

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 29/29] md: Revert "md: add a new api prepare_suspend() in md_personality"

From: Yu Kuai <[email protected]>

This reverts commit 3e00777d51572bdd75cef29c9c31106b52d7cc8f.

Because this new api is not used anymore.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.h | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index 492c5a130da3..dbe7f4d0b7d6 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -633,7 +633,6 @@ struct md_personality
int (*start_reshape) (struct mddev *mddev);
void (*finish_reshape) (struct mddev *mddev);
void (*update_reshape_pos) (struct mddev *mddev);
- void (*prepare_suspend) (struct mddev *mddev);
/* quiesce suspends or resumes internal processing.
* 1 - stop new actions and wait for action io to complete
* 0 - return to normal behaviour
--
2.39.2


2023-08-03 13:45:55

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 26/29] md/md-linear: cleanup linear_add()

From: Yu Kuai <[email protected]>

Now that caller already suspend the array, there is no need to suspend
array in liner_add().

Note that mddev_suspend/resume() is not used anymore.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md-linear.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
index 71ac99646827..66412397cef0 100644
--- a/drivers/md/md-linear.c
+++ b/drivers/md/md-linear.c
@@ -183,7 +183,6 @@ static int linear_add(struct mddev *mddev, struct md_rdev *rdev)
* in linear_congested(), therefore kfree_rcu() is used to free
* oldconf until no one uses it anymore.
*/
- mddev_suspend(mddev);
oldconf = rcu_dereference_protected(mddev->private,
lockdep_is_held(&mddev->reconfig_mutex));
mddev->raid_disks++;
@@ -192,7 +191,6 @@ static int linear_add(struct mddev *mddev, struct md_rdev *rdev)
rcu_assign_pointer(mddev->private, newconf);
md_set_array_sectors(mddev, linear_size(mddev, 0, 0));
set_capacity_and_notify(mddev->gendisk, mddev->array_sectors);
- mddev_resume(mddev);
kfree_rcu(oldconf, rcu);
return 0;
}
--
2.39.2


2023-08-03 13:46:18

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 20/29] md: use new apis to suspend array for adding/removing rdev from state_store()

From: Yu Kuai <[email protected]>

User can write 'remove' and 're-add' to trigger array reconfiguration
through sysfs, suspend array in this case so that io won't concurrent
with array reconfiguration.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 66bb6a585291..4a522558a570 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2916,11 +2916,7 @@ static int add_bound_rdev(struct md_rdev *rdev)
*/
super_types[mddev->major_version].
validate_super(mddev, rdev);
- if (add_journal)
- mddev_suspend(mddev);
err = mddev->pers->hot_add_disk(mddev, rdev);
- if (add_journal)
- mddev_resume(mddev);
if (err) {
md_kick_rdev_from_array(rdev);
return err;
@@ -3687,6 +3683,7 @@ rdev_attr_store(struct kobject *kobj, struct attribute *attr,
struct rdev_sysfs_entry *entry = container_of(attr, struct rdev_sysfs_entry, attr);
struct md_rdev *rdev = container_of(kobj, struct md_rdev, kobj);
struct kernfs_node *kn = NULL;
+ bool suspended = false;
ssize_t rv;
struct mddev *mddev = rdev->mddev;

@@ -3695,8 +3692,14 @@ rdev_attr_store(struct kobject *kobj, struct attribute *attr,
if (!capable(CAP_SYS_ADMIN))
return -EACCES;

- if (entry->store == state_store && cmd_match(page, "remove"))
- kn = sysfs_break_active_protection(kobj, attr);
+ if (entry->store == state_store) {
+ if (cmd_match(page, "remove"))
+ kn = sysfs_break_active_protection(kobj, attr);
+ if (cmd_match(page, "remove") || cmd_match(page, "re-add")) {
+ __mddev_suspend(mddev);
+ suspended = true;
+ }
+ }

rv = mddev ? mddev_lock(mddev) : -ENODEV;
if (!rv) {
@@ -3705,6 +3708,9 @@ rdev_attr_store(struct kobject *kobj, struct attribute *attr,
else
rv = entry->store(rdev, page, length);
mddev_unlock(mddev);
+
+ if (suspended)
+ __mddev_resume(mddev);
}

if (kn)
--
2.39.2


2023-08-03 14:00:53

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 15/29] md/raid5: use new apis to suspend array for raid5_store_group_thread_cnt()

From: Yu Kuai <[email protected]>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/raid5.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b8dd23357430..11fcaa1f7de5 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7221,15 +7221,13 @@ raid5_store_group_thread_cnt(struct mddev *mddev, const char *page, size_t len)
if (new > 8192)
return -EINVAL;

- err = mddev_lock(mddev);
+ err = mddev_suspend_and_lock(mddev);
if (err)
return err;
conf = mddev->private;
if (!conf)
err = -ENODEV;
else if (new != conf->worker_cnt_per_group) {
- mddev_suspend(mddev);
-
old_groups = conf->worker_groups;
if (old_groups)
flush_workqueue(raid5_wq);
@@ -7246,9 +7244,8 @@ raid5_store_group_thread_cnt(struct mddev *mddev, const char *page, size_t len)
kfree(old_groups[0].workers);
kfree(old_groups);
}
- mddev_resume(mddev);
}
- mddev_unlock(mddev);
+ mddev_unlock_and_resume(mddev);

return err ?: len;
}
--
2.39.2


2023-08-03 14:03:03

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 16/29] md/raid5: use new apis to suspend array for raid5_change_consistency_policy()

From: Yu Kuai <[email protected]>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/raid5.c | 19 ++++++-------------
1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 11fcaa1f7de5..b4b2d0d5855d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8966,12 +8966,12 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
struct r5conf *conf;
int err;

- err = mddev_lock(mddev);
+ err = mddev_suspend_and_lock(mddev);
if (err)
return err;
conf = mddev->private;
if (!conf) {
- mddev_unlock(mddev);
+ mddev_unlock_and_resume(mddev);
return -ENODEV;
}

@@ -8981,19 +8981,14 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
err = log_init(conf, NULL, true);
if (!err) {
err = resize_stripes(conf, conf->pool_size);
- if (err) {
- mddev_suspend(mddev);
+ if (err)
log_exit(conf);
- mddev_resume(mddev);
- }
}
} else
err = -EINVAL;
} else if (strncmp(buf, "resync", 6) == 0) {
if (raid5_has_ppl(conf)) {
- mddev_suspend(mddev);
log_exit(conf);
- mddev_resume(mddev);
err = resize_stripes(conf, conf->pool_size);
} else if (test_bit(MD_HAS_JOURNAL, &conf->mddev->flags) &&
r5l_log_disk_error(conf)) {
@@ -9006,11 +9001,9 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
break;
}

- if (!journal_dev_exists) {
- mddev_suspend(mddev);
+ if (!journal_dev_exists)
clear_bit(MD_HAS_JOURNAL, &mddev->flags);
- mddev_resume(mddev);
- } else /* need remove journal device first */
+ else /* need remove journal device first */
err = -EBUSY;
} else
err = -EINVAL;
@@ -9021,7 +9014,7 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
if (!err)
md_update_sb(mddev, 1);

- mddev_unlock(mddev);
+ mddev_unlock_and_resume(mddev);

return err;
}
--
2.39.2


2023-08-03 14:06:39

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 03/29] md: add new helpers to suspend/resume array

From: Yu Kuai <[email protected]>

Advantages for new apis:
- reconfig_mutex is not required;
- the weird logical that suspend array hold 'reconfig_mutex' for
mddev_check_recovery() to update superblock is not needed;
- the specail handling, 'pers->prepare_suspend', for raid456 is not
needed;
- It's safe to be called at any time once mddev is allocated, and it's
designed to be used from slow path where array configuration is changed;

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++--
drivers/md/md.h | 3 ++
2 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 89f5175b1295..3c98f253b980 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -442,12 +442,22 @@ void mddev_suspend(struct mddev *mddev)
lockdep_is_held(&mddev->reconfig_mutex));

WARN_ON_ONCE(thread && current == thread->tsk);
- if (mddev->suspended++)
+
+ /* can't concurrent with __mddev_suspend() and __mddev_resume() */
+ mutex_lock(&mddev->suspend_mutex);
+ if (mddev->suspended++) {
+ mutex_unlock(&mddev->suspend_mutex);
return;
+ }
+
wake_up(&mddev->sb_wait);
set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
percpu_ref_kill(&mddev->active_io);

+ /*
+ * TODO: cleanup 'pers->prepare_suspend after all callers are replaced
+ * by __mddev_suspend().
+ */
if (mddev->pers && mddev->pers->prepare_suspend)
mddev->pers->prepare_suspend(mddev);

@@ -458,14 +468,21 @@ void mddev_suspend(struct mddev *mddev)
del_timer_sync(&mddev->safemode_timer);
/* restrict memory reclaim I/O during raid array is suspend */
mddev->noio_flag = memalloc_noio_save();
+
+ mutex_unlock(&mddev->suspend_mutex);
}
EXPORT_SYMBOL_GPL(mddev_suspend);

void mddev_resume(struct mddev *mddev)
{
lockdep_assert_held(&mddev->reconfig_mutex);
- if (--mddev->suspended)
+
+ /* can't concurrent with __mddev_suspend() and __mddev_resume() */
+ mutex_lock(&mddev->suspend_mutex);
+ if (--mddev->suspended) {
+ mutex_unlock(&mddev->suspend_mutex);
return;
+ }

/* entred the memalloc scope from mddev_suspend() */
memalloc_noio_restore(mddev->noio_flag);
@@ -476,9 +493,72 @@ void mddev_resume(struct mddev *mddev)
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
md_wakeup_thread(mddev->thread);
md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
+
+ mutex_unlock(&mddev->suspend_mutex);
}
EXPORT_SYMBOL_GPL(mddev_resume);

+void __mddev_suspend(struct mddev *mddev)
+{
+
+ /*
+ * hold reconfig_mutex to wait for normal io will deadlock, because
+ * other context can't update super_block, and normal io can rely on
+ * updating super_block.
+ */
+ lockdep_assert_not_held(&mddev->reconfig_mutex);
+
+ mutex_lock(&mddev->suspend_mutex);
+
+ if (mddev->suspended) {
+ WRITE_ONCE(mddev->suspended, mddev->suspended + 1);
+ mutex_unlock(&mddev->suspend_mutex);
+ return;
+ }
+
+ percpu_ref_kill(&mddev->active_io);
+ wait_event(mddev->sb_wait, percpu_ref_is_zero(&mddev->active_io));
+
+ /*
+ * For raid456, io might be waiting for reshape to make progress,
+ * allow new reshape to start while waiting for io to be done to
+ * prevent deadlock.
+ */
+ WRITE_ONCE(mddev->suspended, mddev->suspended + 1);
+
+ del_timer_sync(&mddev->safemode_timer);
+ /* restrict memory reclaim I/O during raid array is suspend */
+ mddev->noio_flag = memalloc_noio_save();
+
+ mutex_unlock(&mddev->suspend_mutex);
+}
+EXPORT_SYMBOL_GPL(__mddev_suspend);
+
+void __mddev_resume(struct mddev *mddev)
+{
+ lockdep_assert_not_held(&mddev->reconfig_mutex);
+
+ mutex_lock(&mddev->suspend_mutex);
+ WRITE_ONCE(mddev->suspended, mddev->suspended - 1);
+ if (mddev->suspended) {
+ mutex_unlock(&mddev->suspend_mutex);
+ return;
+ }
+
+ /* entred the memalloc scope from __mddev_suspend() */
+ memalloc_noio_restore(mddev->noio_flag);
+
+ percpu_ref_resurrect(&mddev->active_io);
+ wake_up(&mddev->sb_wait);
+
+ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
+
+ mutex_unlock(&mddev->suspend_mutex);
+}
+EXPORT_SYMBOL_GPL(__mddev_resume);
+
/*
* Generic flush handling for md
*/
@@ -666,6 +746,7 @@ int mddev_init(struct mddev *mddev)
mutex_init(&mddev->open_mutex);
mutex_init(&mddev->reconfig_mutex);
mutex_init(&mddev->sync_mutex);
+ mutex_init(&mddev->suspend_mutex);
mutex_init(&mddev->bitmap_info.mutex);
INIT_LIST_HEAD(&mddev->disks);
INIT_LIST_HEAD(&mddev->all_mddevs);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 82d2c4ed9aca..32bfe0fe0d97 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -308,6 +308,7 @@ struct mddev {
unsigned long sb_flags;

int suspended;
+ struct mutex suspend_mutex;
struct percpu_ref active_io;
int ro;
int sysfs_active; /* set when sysfs deletes
@@ -803,6 +804,8 @@ extern void md_rdev_clear(struct md_rdev *rdev);
extern void md_handle_request(struct mddev *mddev, struct bio *bio);
extern void mddev_suspend(struct mddev *mddev);
extern void mddev_resume(struct mddev *mddev);
+extern void __mddev_suspend(struct mddev *mddev);
+extern void __mddev_resume(struct mddev *mddev);

extern void md_reload_sb(struct mddev *mddev, int raid_disk);
extern void md_update_sb(struct mddev *mddev, int force);
--
2.39.2


2023-08-03 14:07:12

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 28/29] md/raid5: Revert "md/raid5: fix a deadlock in the case that reshape is interrupted"

From: Yu Kuai <[email protected]>

This reverts commit 868bba54a3bcbfc34314e963d5a7e66717f39d4e.

Because the old apis mddev_suspend() and mddev_resume() is removed, and
the deadlock is not possible anymore.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 1 -
drivers/md/raid5.c | 44 +-------------------------------------------
2 files changed, 1 insertion(+), 44 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3f4537876cc2..f480f225781a 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9165,7 +9165,6 @@ void md_do_sync(struct md_thread *thread)
spin_unlock(&mddev->lock);

wake_up(&resync_wait);
- wake_up(&mddev->sb_wait);
md_wakeup_thread(mddev->thread);
return;
}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4709b7ad0317..8a3045e4d43f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5953,19 +5953,6 @@ static int add_all_stripe_bios(struct r5conf *conf,
return ret;
}

-static bool reshape_inprogress(struct mddev *mddev)
-{
- return test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
- test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
- !test_bit(MD_RECOVERY_DONE, &mddev->recovery) &&
- !test_bit(MD_RECOVERY_INTR, &mddev->recovery);
-}
-
-static bool reshape_disabled(struct mddev *mddev)
-{
- return is_md_suspended(mddev) || !md_is_rdwr(mddev);
-}
-
static enum stripe_result make_stripe_request(struct mddev *mddev,
struct r5conf *conf, struct stripe_request_ctx *ctx,
sector_t logical_sector, struct bio *bi)
@@ -5997,8 +5984,7 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
if (ahead_of_reshape(mddev, logical_sector,
conf->reshape_safe)) {
spin_unlock_irq(&conf->device_lock);
- ret = STRIPE_SCHEDULE_AND_RETRY;
- goto out;
+ return STRIPE_SCHEDULE_AND_RETRY;
}
}
spin_unlock_irq(&conf->device_lock);
@@ -6077,15 +6063,6 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,

out_release:
raid5_release_stripe(sh);
-out:
- if (ret == STRIPE_SCHEDULE_AND_RETRY && !reshape_inprogress(mddev) &&
- reshape_disabled(mddev)) {
- bi->bi_status = BLK_STS_IOERR;
- ret = STRIPE_FAIL;
- pr_err("md/raid456:%s: io failed across reshape position while reshape can't make progress.\n",
- mdname(mddev));
- }
-
return ret;
}

@@ -9026,22 +9003,6 @@ static int raid5_start(struct mddev *mddev)
return r5l_start(conf->log);
}

-static void raid5_prepare_suspend(struct mddev *mddev)
-{
- struct r5conf *conf = mddev->private;
-
- wait_event(mddev->sb_wait, !reshape_inprogress(mddev) ||
- percpu_ref_is_zero(&mddev->active_io));
- if (percpu_ref_is_zero(&mddev->active_io))
- return;
-
- /*
- * Reshape is not in progress, and array is suspended, io that is
- * waiting for reshpape can never be done.
- */
- wake_up(&conf->wait_for_overlap);
-}
-
static struct md_personality raid6_personality =
{
.name = "raid6",
@@ -9062,7 +9023,6 @@ static struct md_personality raid6_personality =
.check_reshape = raid6_check_reshape,
.start_reshape = raid5_start_reshape,
.finish_reshape = raid5_finish_reshape,
- .prepare_suspend = raid5_prepare_suspend,
.quiesce = raid5_quiesce,
.takeover = raid6_takeover,
.change_consistency_policy = raid5_change_consistency_policy,
@@ -9087,7 +9047,6 @@ static struct md_personality raid5_personality =
.check_reshape = raid5_check_reshape,
.start_reshape = raid5_start_reshape,
.finish_reshape = raid5_finish_reshape,
- .prepare_suspend = raid5_prepare_suspend,
.quiesce = raid5_quiesce,
.takeover = raid5_takeover,
.change_consistency_policy = raid5_change_consistency_policy,
@@ -9113,7 +9072,6 @@ static struct md_personality raid4_personality =
.check_reshape = raid5_check_reshape,
.start_reshape = raid5_start_reshape,
.finish_reshape = raid5_finish_reshape,
- .prepare_suspend = raid5_prepare_suspend,
.quiesce = raid5_quiesce,
.takeover = raid4_takeover,
.change_consistency_policy = raid5_change_consistency_policy,
--
2.39.2


2023-08-03 14:09:29

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 05/29] md: use new apis to suspend array for suspend_lo/hi/store()

From: Yu Kuai <[email protected]>

Convert to use new apis, the old apis will be removed eventually.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3c98f253b980..e516c5000a00 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5271,15 +5271,10 @@ suspend_lo_store(struct mddev *mddev, const char *buf, size_t len)
if (new != (sector_t)new)
return -EINVAL;

- err = mddev_lock(mddev);
- if (err)
- return err;
-
- mddev_suspend(mddev);
+ __mddev_suspend(mddev);
WRITE_ONCE(mddev->suspend_lo, new);
- mddev_resume(mddev);
+ __mddev_resume(mddev);

- mddev_unlock(mddev);
return len;
}
static struct md_sysfs_entry md_suspend_lo =
@@ -5304,15 +5299,10 @@ suspend_hi_store(struct mddev *mddev, const char *buf, size_t len)
if (new != (sector_t)new)
return -EINVAL;

- err = mddev_lock(mddev);
- if (err)
- return err;
-
- mddev_suspend(mddev);
+ __mddev_suspend(mddev);
WRITE_ONCE(mddev->suspend_hi, new);
- mddev_resume(mddev);
+ __mddev_resume(mddev);

- mddev_unlock(mddev);
return len;
}
static struct md_sysfs_entry md_suspend_hi =
--
2.39.2


2023-08-03 14:09:30

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 21/29] md: use new apis to suspend array for bind_rdev_to_array()

From: Yu Kuai <[email protected]>

mddev_create_serial_pool() will be called from bind_rdev_to_array(), and
mddev_suspend() will be called if serial pool is used.

Prepare to remove the mddev_suspend() from mddev_create_serial_pool().

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md-autodetect.c | 4 ++--
drivers/md/md.c | 14 +++++++-------
2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/md/md-autodetect.c b/drivers/md/md-autodetect.c
index 6eaa0eab40f9..4b80165afd23 100644
--- a/drivers/md/md-autodetect.c
+++ b/drivers/md/md-autodetect.c
@@ -175,7 +175,7 @@ static void __init md_setup_drive(struct md_setup_args *args)
return;
}

- err = mddev_lock(mddev);
+ err = mddev_suspend_and_lock(mddev);
if (err) {
pr_err("md: failed to lock array %s\n", name);
goto out_mddev_put;
@@ -221,7 +221,7 @@ static void __init md_setup_drive(struct md_setup_args *args)
if (err)
pr_warn("md: starting %s failed\n", name);
out_unlock:
- mddev_unlock(mddev);
+ mddev_unlock_and_resume(mddev);
out_mddev_put:
mddev_put(mddev);
}
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4a522558a570..ca1fe1d17ec1 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2534,7 +2534,7 @@ static int bind_rdev_to_array(struct md_rdev *rdev, struct mddev *mddev)
pr_debug("md: bind<%s>\n", b);

if (mddev->raid_disks)
- mddev_create_serial_pool(mddev, rdev, false);
+ mddev_create_serial_pool(mddev, rdev, true);

if ((err = kobject_add(&rdev->kobj, &mddev->kobj, "dev-%s", b)))
goto fail;
@@ -4666,7 +4666,7 @@ new_dev_store(struct mddev *mddev, const char *buf, size_t len)
minor != MINOR(dev))
return -EOVERFLOW;

- err = mddev_lock(mddev);
+ err = mddev_suspend_and_lock(mddev);
if (err)
return err;
if (mddev->persistent) {
@@ -4687,14 +4687,14 @@ new_dev_store(struct mddev *mddev, const char *buf, size_t len)
rdev = md_import_device(dev, -1, -1);

if (IS_ERR(rdev)) {
- mddev_unlock(mddev);
+ mddev_unlock_and_resume(mddev);
return PTR_ERR(rdev);
}
err = bind_rdev_to_array(rdev, mddev);
out:
if (err)
export_rdev(rdev, mddev);
- mddev_unlock(mddev);
+ mddev_unlock_and_resume(mddev);
if (!err)
md_new_event();
return err ? err : len;
@@ -6622,13 +6622,13 @@ static void autorun_devices(int part)
if (IS_ERR(mddev))
break;

- if (mddev_lock(mddev))
+ if (mddev_suspend_and_lock(mddev))
pr_warn("md: %s locked, cannot run\n", mdname(mddev));
else if (mddev->raid_disks || mddev->major_version
|| !list_empty(&mddev->disks)) {
pr_warn("md: %s already running, cannot run %pg\n",
mdname(mddev), rdev0->bdev);
- mddev_unlock(mddev);
+ mddev_unlock_and_resume(mddev);
} else {
pr_debug("md: created %s\n", mdname(mddev));
mddev->persistent = 1;
@@ -6638,7 +6638,7 @@ static void autorun_devices(int part)
export_rdev(rdev, mddev);
}
autorun_array(mddev);
- mddev_unlock(mddev);
+ mddev_unlock_and_resume(mddev);
}
/* on success, candidates will be empty, on error
* it won't...
--
2.39.2


2023-08-03 14:09:37

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 06/29] md: use new apis to suspend array for level_store()

From: Yu Kuai <[email protected]>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index e516c5000a00..fc1646f02ef3 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4001,7 +4001,7 @@ level_store(struct mddev *mddev, const char *buf, size_t len)
if (slen == 0 || slen >= sizeof(clevel))
return -EINVAL;

- rv = mddev_lock(mddev);
+ rv = mddev_suspend_and_lock(mddev);
if (rv)
return rv;

@@ -4094,7 +4094,6 @@ level_store(struct mddev *mddev, const char *buf, size_t len)
}

/* Looks like we have a winner */
- mddev_suspend(mddev);
mddev_detach(mddev);

spin_lock(&mddev->lock);
@@ -4180,14 +4179,13 @@ level_store(struct mddev *mddev, const char *buf, size_t len)
blk_set_stacking_limits(&mddev->queue->limits);
pers->run(mddev);
set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
- mddev_resume(mddev);
if (!mddev->thread)
md_update_sb(mddev, 1);
sysfs_notify_dirent_safe(mddev->sysfs_level);
md_new_event();
rv = len;
out_unlock:
- mddev_unlock(mddev);
+ mddev_unlock_and_resume(mddev);
return rv;
}

--
2.39.2


2023-08-03 14:11:12

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 02/29] md: use 'mddev->suspended' for is_md_suspended()

From: Yu Kuai <[email protected]>

'pers->prepare_suspend' is introduced to prevent a deadlock for raid456,
this change prepares to clean this up in later patches while refactoring
mddev_suspend().

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 2 +-
drivers/md/md.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5aa9f62a7c56..89f5175b1295 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -354,7 +354,7 @@ static DEFINE_SPINLOCK(all_mddevs_lock);
*/
static bool is_suspended(struct mddev *mddev, struct bio *bio)
{
- if (is_md_suspended(mddev))
+ if (is_md_suspended(mddev) || percpu_ref_is_dying(&mddev->active_io))
return true;
if (bio_data_dir(bio) != WRITE)
return false;
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 343dd89c13cf..82d2c4ed9aca 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -576,7 +576,7 @@ static inline bool md_is_rdwr(struct mddev *mddev)

static inline bool is_md_suspended(struct mddev *mddev)
{
- return percpu_ref_is_dying(&mddev->active_io);
+ return READ_ONCE(mddev->suspended) != 0;
}

static inline int __must_check mddev_lock(struct mddev *mddev)
--
2.39.2


2023-08-03 14:11:59

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 14/29] md/raid5: use new apis to suspend array for raid5_store_skip_copy()

From: Yu Kuai <[email protected]>

Convert to use new apis, the old apis will be removed eventually.

This is not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/raid5.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 624899e95b4d..b8dd23357430 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7151,7 +7151,7 @@ raid5_store_skip_copy(struct mddev *mddev, const char *page, size_t len)
return -EINVAL;
new = !!new;

- err = mddev_lock(mddev);
+ err = mddev_suspend_and_lock(mddev);
if (err)
return err;
conf = mddev->private;
@@ -7160,15 +7160,13 @@ raid5_store_skip_copy(struct mddev *mddev, const char *page, size_t len)
else if (new != conf->skip_copy) {
struct request_queue *q = mddev->queue;

- mddev_suspend(mddev);
conf->skip_copy = new;
if (new)
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q);
else
blk_queue_flag_clear(QUEUE_FLAG_STABLE_WRITES, q);
- mddev_resume(mddev);
}
- mddev_unlock(mddev);
+ mddev_unlock_and_resume(mddev);
return err ?: len;
}

--
2.39.2


2023-08-03 14:12:13

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 11/29] md/raid5-cache: use new apis to suspend array for r5c_disable_writeback_async()

From: Yu Kuai <[email protected]>

Convert to use new apis, the old apis will be removed eventually.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/raid5-cache.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index b6e0a6bb3965..c71cb5c954e0 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -686,7 +686,6 @@ static void r5c_disable_writeback_async(struct work_struct *work)
disable_writeback_work);
struct mddev *mddev = log->rdev->mddev;
struct r5conf *conf = mddev->private;
- int locked = 0;

if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH)
return;
@@ -696,13 +695,12 @@ static void r5c_disable_writeback_async(struct work_struct *work)
/* wait superblock change before suspend */
wait_event(mddev->sb_wait,
!READ_ONCE(conf->log) ||
- (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
- (locked = mddev_trylock(mddev))));
- if (locked) {
- mddev_suspend(mddev);
+ !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
+
+ if (READ_ONCE(conf->log)) {
+ __mddev_suspend(mddev);
log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_THROUGH;
- mddev_resume(mddev);
- mddev_unlock(mddev);
+ __mddev_resume(mddev);
}
}

--
2.39.2


2023-08-03 14:12:15

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 27/29] md: remove mddev_suspend() and mddev_resume()

From: Yu Kuai <[email protected]>

Now that mddev_suspend() and mddev_resume() is not used anywhere, remove
them, and remove 'MD_ALLOW_SB_UPDATE' and 'MD_UPDATING_SB' as well.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 80 -------------------------------------------------
drivers/md/md.h | 8 -----
2 files changed, 88 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 915a6f15d726..3f4537876cc2 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -417,74 +417,6 @@ static void md_submit_bio(struct bio *bio)
md_handle_request(mddev, bio);
}

-/* mddev_suspend makes sure no new requests are submitted
- * to the device, and that any requests that have been submitted
- * are completely handled.
- * Once mddev_detach() is called and completes, the module will be
- * completely unused.
- */
-void mddev_suspend(struct mddev *mddev)
-{
- struct md_thread *thread = rcu_dereference_protected(mddev->thread,
- lockdep_is_held(&mddev->reconfig_mutex));
-
- WARN_ON_ONCE(thread && current == thread->tsk);
-
- /* can't concurrent with __mddev_suspend() and __mddev_resume() */
- mutex_lock(&mddev->suspend_mutex);
- if (mddev->suspended++) {
- mutex_unlock(&mddev->suspend_mutex);
- return;
- }
-
- wake_up(&mddev->sb_wait);
- set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
- percpu_ref_kill(&mddev->active_io);
-
- /*
- * TODO: cleanup 'pers->prepare_suspend after all callers are replaced
- * by __mddev_suspend().
- */
- if (mddev->pers && mddev->pers->prepare_suspend)
- mddev->pers->prepare_suspend(mddev);
-
- wait_event(mddev->sb_wait, percpu_ref_is_zero(&mddev->active_io));
- clear_bit_unlock(MD_ALLOW_SB_UPDATE, &mddev->flags);
- wait_event(mddev->sb_wait, !test_bit(MD_UPDATING_SB, &mddev->flags));
-
- del_timer_sync(&mddev->safemode_timer);
- /* restrict memory reclaim I/O during raid array is suspend */
- mddev->noio_flag = memalloc_noio_save();
-
- mutex_unlock(&mddev->suspend_mutex);
-}
-EXPORT_SYMBOL_GPL(mddev_suspend);
-
-void mddev_resume(struct mddev *mddev)
-{
- lockdep_assert_held(&mddev->reconfig_mutex);
-
- /* can't concurrent with __mddev_suspend() and __mddev_resume() */
- mutex_lock(&mddev->suspend_mutex);
- if (--mddev->suspended) {
- mutex_unlock(&mddev->suspend_mutex);
- return;
- }
-
- /* entred the memalloc scope from mddev_suspend() */
- memalloc_noio_restore(mddev->noio_flag);
-
- percpu_ref_resurrect(&mddev->active_io);
- wake_up(&mddev->sb_wait);
-
- set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
- md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
-
- mutex_unlock(&mddev->suspend_mutex);
-}
-EXPORT_SYMBOL_GPL(mddev_resume);
-
void __mddev_suspend(struct mddev *mddev)
{

@@ -9455,18 +9387,6 @@ static void md_start_sync(struct work_struct *ws)
*/
void md_check_recovery(struct mddev *mddev)
{
- if (test_bit(MD_ALLOW_SB_UPDATE, &mddev->flags) && mddev->sb_flags) {
- /* Write superblock - thread that called mddev_suspend()
- * holds reconfig_mutex for us.
- */
- set_bit(MD_UPDATING_SB, &mddev->flags);
- smp_mb__after_atomic();
- if (test_bit(MD_ALLOW_SB_UPDATE, &mddev->flags))
- md_update_sb(mddev, 0);
- clear_bit_unlock(MD_UPDATING_SB, &mddev->flags);
- wake_up(&mddev->sb_wait);
- }
-
if (is_md_suspended(mddev))
return;

diff --git a/drivers/md/md.h b/drivers/md/md.h
index 24adb603461d..492c5a130da3 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -240,10 +240,6 @@ struct md_cluster_info;
* become failed.
* @MD_HAS_PPL: The raid array has PPL feature set.
* @MD_HAS_MULTIPLE_PPLS: The raid array has multiple PPLs feature set.
- * @MD_ALLOW_SB_UPDATE: md_check_recovery is allowed to update the metadata
- * without taking reconfig_mutex.
- * @MD_UPDATING_SB: md_check_recovery is updating the metadata without
- * explicitly holding reconfig_mutex.
* @MD_NOT_READY: do_md_run() is active, so 'array_state', ust not report that
* array is ready yet.
* @MD_BROKEN: This is used to stop writes and mark array as failed.
@@ -260,8 +256,6 @@ enum mddev_flags {
MD_FAILFAST_SUPPORTED,
MD_HAS_PPL,
MD_HAS_MULTIPLE_PPLS,
- MD_ALLOW_SB_UPDATE,
- MD_UPDATING_SB,
MD_NOT_READY,
MD_BROKEN,
MD_DELETED,
@@ -802,8 +796,6 @@ extern int md_rdev_init(struct md_rdev *rdev);
extern void md_rdev_clear(struct md_rdev *rdev);

extern void md_handle_request(struct mddev *mddev, struct bio *bio);
-extern void mddev_suspend(struct mddev *mddev);
-extern void mddev_resume(struct mddev *mddev);
extern void __mddev_suspend(struct mddev *mddev);
extern void __mddev_resume(struct mddev *mddev);

--
2.39.2


2023-08-03 14:12:26

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 08/29] md/dm-raid: use new apis to suspend array

From: Yu Kuai <[email protected]>

Convert to use new apis, the old apis will be removed eventually.

These are not hot path, so performance is not concerned.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/dm-raid.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 020598c10db0..2ff33b5d9a1b 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3244,7 +3244,7 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
set_bit(MD_RECOVERY_FROZEN, &rs->md.recovery);

/* Has to be held on running the array */
- mddev_lock_nointr(&rs->md);
+ mddev_suspend_and_lock_nointr(&rs->md);
r = md_run(&rs->md);
rs->md.in_sync = 0; /* Assume already marked dirty */
if (r) {
@@ -3270,7 +3270,6 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
}
}

- mddev_suspend(&rs->md);
set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags);

/* Try to adjust the raid4/5/6 stripe cache size to the stripe size */
@@ -3800,9 +3799,7 @@ static void raid_postsuspend(struct dm_target *ti)
if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
md_stop_writes(&rs->md);

- mddev_lock_nointr(&rs->md);
- mddev_suspend(&rs->md);
- mddev_unlock(&rs->md);
+ __mddev_suspend(&rs->md);
}
}

@@ -4014,7 +4011,7 @@ static int raid_preresume(struct dm_target *ti)
}

/* Check for any resize/reshape on @rs and adjust/initiate */
- /* Be prepared for mddev_resume() in raid_resume() */
+ /* Be prepared for __mddev_resume() in raid_resume() */
set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
if (mddev->recovery_cp && mddev->recovery_cp < MaxSector) {
set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
@@ -4061,8 +4058,7 @@ static void raid_resume(struct dm_target *ti)
clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
mddev->ro = 0;
mddev->in_sync = 0;
- mddev_resume(mddev);
- mddev_unlock(mddev);
+ mddev_unlock_and_resume(mddev);
}
}

--
2.39.2


2023-08-03 14:12:51

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 18/29] md: quiesce before md_kick_rdev_from_array() for md-cluster

From: Yu Kuai <[email protected]>

md_kick_rdev_from_array() can be called from md_check_recovery() and
md_reload_sb() for md-cluster, it's very complicated to use new apis to
suspend the array before holding 'reconfig_mutex' in this case.

Fortunately, md-cluster is only supported for raid1 and raid10, and they
both impelement quiesce() callback that is safe to be called from daemon
thread. Hence use quiesce() callback to prevent io concurrent with
removing rdev from the array.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 765667b5fa59..d550bacd0efc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9514,6 +9514,21 @@ void md_check_recovery(struct mddev *mddev)

if (mddev_is_clustered(mddev)) {
struct md_rdev *rdev, *tmp;
+ bool suspended = false;
+
+ /*
+ * md-cluster is used for raid1/raid10, and they both
+ * implement quiesce() callback that is safe to be
+ * called from daemon thread.
+ */
+ rdev_for_each(rdev, mddev)
+ if (test_bit(ClusterRemove, &rdev->flags) &&
+ rdev->raid_disk < 0) {
+ mddev->pers->quiesce(mddev, true);
+ suspended = true;
+ break;
+ }
+
/* kick the device if another node issued a
* remove disk.
*/
@@ -9522,6 +9537,9 @@ void md_check_recovery(struct mddev *mddev)
rdev->raid_disk < 0)
md_kick_rdev_from_array(rdev);
}
+
+ if (suspended)
+ mddev->pers->quiesce(mddev, false);
}

if (try_set_sync && !mddev->external && !mddev->in_sync) {
@@ -9814,6 +9832,7 @@ static void check_sb_changes(struct mddev *mddev, struct md_rdev *rdev)
{
struct mdp_superblock_1 *sb = page_address(rdev->sb_page);
struct md_rdev *rdev2, *tmp;
+ bool suspended = false;
int role, ret;

/*
@@ -9828,6 +9847,22 @@ static void check_sb_changes(struct mddev *mddev, struct md_rdev *rdev)
md_bitmap_update_sb(mddev->bitmap);
}

+ /*
+ * md-cluster is used for raid1/raid10, and they both
+ * implement quiesce() callback.
+ */
+ rdev_for_each(rdev2, mddev) {
+ if (test_bit(Faulty, &rdev2->flags))
+ continue;
+ role = le16_to_cpu(sb->dev_roles[rdev2->desc_nr]);
+ if (test_bit(Candidate, &rdev2->flags) &&
+ role == MD_DISK_ROLE_FAULTY) {
+ mddev->pers->quiesce(mddev, true);
+ suspended = true;
+ break;
+ }
+ }
+
/* Check for change of roles in the active devices */
rdev_for_each_safe(rdev2, tmp, mddev) {
if (test_bit(Faulty, &rdev2->flags))
@@ -9883,6 +9918,9 @@ static void check_sb_changes(struct mddev *mddev, struct md_rdev *rdev)
}
}

+ if (suspended)
+ mddev->pers->quiesce(mddev, false);
+
if (mddev->raid_disks != le32_to_cpu(sb->raid_disks)) {
ret = update_raid_disks(mddev, le32_to_cpu(sb->raid_disks));
if (ret)
--
2.39.2


2023-08-22 01:41:01

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH -next 10/29] md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log'

Hi,

在 2023/08/22 8:15, Song Liu 写道:
> On Thu, Aug 3, 2023 at 6:32 AM Yu Kuai <[email protected]> wrote:
>>
>> From: Yu Kuai <[email protected]>
>>
>> 'conf->log' is set with 'reconfig_mutex' grabbed, however, readers are
>> not procted, hence use READ_ONCE/WRITE_ONCE to prevent reading abnormal
>> value.
>>
>> Signed-off-by: Yu Kuai <[email protected]>
>
> It appears to me that this patch doesn't apply to md-next. Please rebase
> and resend all 3 sets.

Yes, I'll rebase previous 2 sets first, and wait for then before rebase
this set.

Thanks,
Kuai

>
> Thanks,
> Song
> .
>


2023-08-22 07:37:18

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH -next 10/29] md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log'

On Thu, Aug 3, 2023 at 6:32 AM Yu Kuai <[email protected]> wrote:
>
> From: Yu Kuai <[email protected]>
>
> 'conf->log' is set with 'reconfig_mutex' grabbed, however, readers are
> not procted, hence use READ_ONCE/WRITE_ONCE to prevent reading abnormal
> value.
>
> Signed-off-by: Yu Kuai <[email protected]>

It appears to me that this patch doesn't apply to md-next. Please rebase
and resend all 3 sets.

Thanks,
Song