2023-11-29 04:32:39

by Yu Kuai

[permalink] [raw]
Subject: [PATCH v3 0/3] md: fix stopping sync thread

From: Yu Kuai <[email protected]>

Changes in v3:
- split bugfix patches for md-fixes

Changes in v2:
- add patch 2;
- split some patches from v1 that will be sent separately;
- rework some commit message;
- rework patch 5;

Yu Kuai (3):
md: fix missing flush of sync_work
md: don't leave 'MD_RECOVERY_FROZEN' in error path of
md_set_readonly()
md: fix stopping sync thread

drivers/md/md.c | 120 +++++++++++++++++++++++-------------------------
1 file changed, 58 insertions(+), 62 deletions(-)

--
2.39.2


2023-11-29 04:32:40

by Yu Kuai

[permalink] [raw]
Subject: [PATCH v3 2/3] md: don't leave 'MD_RECOVERY_FROZEN' in error path of md_set_readonly()

From: Yu Kuai <[email protected]>

If md_set_readonly() failed, the array could still be read-write, however
'MD_RECOVERY_FROZEN' could still be set, which leave the array in an
abnormal state that sync or recovery can't continue anymore.
Hence make sure the flag is cleared after md_set_readonly() returns.

Signed-off-by: Yu Kuai <[email protected]>
Acked-by: Xiao Ni <[email protected]>
---
drivers/md/md.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5640a948086b..2d8e45a1af23 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6355,6 +6355,9 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
int err = 0;
int did_freeze = 0;

+ if (mddev->external && test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
+ return -EBUSY;
+
if (!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
did_freeze = 1;
set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
@@ -6369,8 +6372,6 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
*/
md_wakeup_thread_directly(mddev->sync_thread);

- if (mddev->external && test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
- return -EBUSY;
mddev_unlock(mddev);
wait_event(resync_wait, !test_bit(MD_RECOVERY_RUNNING,
&mddev->recovery));
@@ -6383,29 +6384,30 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
mddev->sync_thread ||
test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
pr_warn("md: %s still in use.\n",mdname(mddev));
- if (did_freeze) {
- clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
- set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
- }
err = -EBUSY;
goto out;
}
+
if (mddev->pers) {
__md_stop_writes(mddev);

- err = -ENXIO;
- if (mddev->ro == MD_RDONLY)
+ if (mddev->ro == MD_RDONLY) {
+ err = -ENXIO;
goto out;
+ }
+
mddev->ro = MD_RDONLY;
set_disk_ro(mddev->gendisk, 1);
+ }
+
+out:
+ if ((mddev->pers && !err) || did_freeze) {
clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
md_wakeup_thread(mddev->thread);
sysfs_notify_dirent_safe(mddev->sysfs_state);
- err = 0;
}
-out:
+
mutex_unlock(&mddev->open_mutex);
return err;
}
--
2.39.2

2023-11-29 04:51:11

by Yu Kuai

[permalink] [raw]
Subject: [PATCH v3 3/3] md: fix stopping sync thread

From: Yu Kuai <[email protected]>

Currently sync thread is stopped from multiple contex:
- idle_sync_thread
- frozen_sync_thread
- __md_stop_writes
- md_set_readonly
- do_md_stop

And there are some problems:
1) sync_work is flushed while reconfig_mutex is grabbed, this can
deadlock because the work function will grab reconfig_mutex as well.
2) md_reap_sync_thread() can't be called directly while md_do_sync() is
not finished yet, for example, commit 130443d60b1b ("md: refactor
idle/frozen_sync_thread() to fix deadlock").
3) If MD_RECOVERY_RUNNING is not set, there is no need to stop
sync_thread at all because sync_thread must not be registered.

Factor out a helper prepare_to_stop_sync_thread(), so that above contex
will behave the same. Fix 1) by flushing sync_work after reconfig_mutex
is released, before waiting for sync_thread to be done; Fix 2) bt
letting daemon thread to unregister sync_thread; Fix 3) by always
checking MD_RECOVERY_RUNNING first.

Fixes: db5e653d7c9f ("md: delay choosing sync action to md_start_sync()")
Acked-by: Xiao Ni <[email protected]>

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 96 +++++++++++++++++++++++--------------------------
1 file changed, 45 insertions(+), 51 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2d8e45a1af23..05902e36db66 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4840,26 +4840,9 @@ action_show(struct mddev *mddev, char *page)
return sprintf(page, "%s\n", type);
}

-static void stop_sync_thread(struct mddev *mddev)
+static void prepare_to_stop_sync_thread(struct mddev *mddev)
+ __releases(&mddev->reconfig_mutex)
{
- if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
- return;
-
- if (mddev_lock(mddev))
- return;
-
- /*
- * Check again in case MD_RECOVERY_RUNNING is cleared before lock is
- * held.
- */
- if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
- mddev_unlock(mddev);
- return;
- }
-
- if (work_pending(&mddev->sync_work))
- flush_workqueue(md_misc_wq);
-
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
/*
* Thread might be blocked waiting for metadata update which will now
@@ -4868,6 +4851,8 @@ static void stop_sync_thread(struct mddev *mddev)
md_wakeup_thread_directly(mddev->sync_thread);

mddev_unlock(mddev);
+ if (work_pending(&mddev->sync_work))
+ flush_work(&mddev->sync_work);
}

static void idle_sync_thread(struct mddev *mddev)
@@ -4876,10 +4861,20 @@ static void idle_sync_thread(struct mddev *mddev)

mutex_lock(&mddev->sync_mutex);
clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
- stop_sync_thread(mddev);

- wait_event(resync_wait, sync_seq != atomic_read(&mddev->sync_seq) ||
+ if (mddev_lock(mddev)) {
+ mutex_unlock(&mddev->sync_mutex);
+ return;
+ }
+
+ if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
+ prepare_to_stop_sync_thread(mddev);
+ wait_event(resync_wait,
+ sync_seq != atomic_read(&mddev->sync_seq) ||
!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
+ } else {
+ mddev_unlock(mddev);
+ }

mutex_unlock(&mddev->sync_mutex);
}
@@ -4888,10 +4883,19 @@ static void frozen_sync_thread(struct mddev *mddev)
{
mutex_lock(&mddev->sync_mutex);
set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
- stop_sync_thread(mddev);

- wait_event(resync_wait, mddev->sync_thread == NULL &&
+ if (mddev_lock(mddev)) {
+ mutex_unlock(&mddev->sync_mutex);
+ return;
+ }
+
+ if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
+ prepare_to_stop_sync_thread(mddev);
+ wait_event(resync_wait,
!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
+ } else {
+ mddev_unlock(mddev);
+ }

mutex_unlock(&mddev->sync_mutex);
}
@@ -6265,11 +6269,11 @@ static void md_clean(struct mddev *mddev)
static void __md_stop_writes(struct mddev *mddev)
{
set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
- if (work_pending(&mddev->sync_work))
- flush_workqueue(md_misc_wq);
- if (mddev->sync_thread) {
- set_bit(MD_RECOVERY_INTR, &mddev->recovery);
- md_reap_sync_thread(mddev);
+ if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
+ prepare_to_stop_sync_thread(mddev);
+ wait_event(resync_wait, !test_bit(MD_RECOVERY_RUNNING,
+ &mddev->recovery));
+ mddev_lock_nointr(mddev);
}

del_timer_sync(&mddev->safemode_timer);
@@ -6363,18 +6367,15 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
md_wakeup_thread(mddev->thread);
}
- if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
- set_bit(MD_RECOVERY_INTR, &mddev->recovery);

- /*
- * Thread might be blocked waiting for metadata update which will now
- * never happen
- */
- md_wakeup_thread_directly(mddev->sync_thread);
+ if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
+ prepare_to_stop_sync_thread(mddev);
+ wait_event(resync_wait, !test_bit(MD_RECOVERY_RUNNING,
+ &mddev->recovery));
+ } else {
+ mddev_unlock(mddev);
+ }

- mddev_unlock(mddev);
- wait_event(resync_wait, !test_bit(MD_RECOVERY_RUNNING,
- &mddev->recovery));
wait_event(mddev->sb_wait,
!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
mddev_lock_nointr(mddev);
@@ -6428,20 +6429,13 @@ static int do_md_stop(struct mddev *mddev, int mode,
set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
md_wakeup_thread(mddev->thread);
}
- if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
- set_bit(MD_RECOVERY_INTR, &mddev->recovery);

- /*
- * Thread might be blocked waiting for metadata update which will now
- * never happen
- */
- md_wakeup_thread_directly(mddev->sync_thread);
-
- mddev_unlock(mddev);
- wait_event(resync_wait, (mddev->sync_thread == NULL &&
- !test_bit(MD_RECOVERY_RUNNING,
- &mddev->recovery)));
- mddev_lock_nointr(mddev);
+ if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
+ prepare_to_stop_sync_thread(mddev);
+ wait_event(resync_wait, !test_bit(MD_RECOVERY_RUNNING,
+ &mddev->recovery));
+ mddev_lock_nointr(mddev);
+ }

mutex_lock(&mddev->open_mutex);
if ((mddev->pers && atomic_read(&mddev->openers) > !!bdev) ||
--
2.39.2

2023-12-01 20:53:56

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] md: don't leave 'MD_RECOVERY_FROZEN' in error path of md_set_readonly()

On Tue, Nov 28, 2023 at 8:32 PM Yu Kuai <[email protected]> wrote:
>
> From: Yu Kuai <[email protected]>
>
> If md_set_readonly() failed, the array could still be read-write, however
> 'MD_RECOVERY_FROZEN' could still be set, which leave the array in an
> abnormal state that sync or recovery can't continue anymore.
> Hence make sure the flag is cleared after md_set_readonly() returns.
>
> Signed-off-by: Yu Kuai <[email protected]>
> Acked-by: Xiao Ni <[email protected]>

Since we are shipping this via the md-fixes branch, we need a Fixes tag.

> ---
> drivers/md/md.c | 24 +++++++++++++-----------
> 1 file changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 5640a948086b..2d8e45a1af23 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6355,6 +6355,9 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
> int err = 0;
> int did_freeze = 0;
>
> + if (mddev->external && test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
> + return -EBUSY;
> +
> if (!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
> did_freeze = 1;
> set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> @@ -6369,8 +6372,6 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
> */
> md_wakeup_thread_directly(mddev->sync_thread);
>
> - if (mddev->external && test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
> - return -EBUSY;
> mddev_unlock(mddev);
> wait_event(resync_wait, !test_bit(MD_RECOVERY_RUNNING,
> &mddev->recovery));
> @@ -6383,29 +6384,30 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
> mddev->sync_thread ||
> test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
> pr_warn("md: %s still in use.\n",mdname(mddev));
> - if (did_freeze) {
> - clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> - md_wakeup_thread(mddev->thread);
> - }

This change (move did_freeze, etc.) is not explained in the commit log.
Is it just refactor?

Thanks,
Song


> err = -EBUSY;
> goto out;
> }
> +
> if (mddev->pers) {
> __md_stop_writes(mddev);
>
> - err = -ENXIO;
> - if (mddev->ro == MD_RDONLY)
> + if (mddev->ro == MD_RDONLY) {
> + err = -ENXIO;
> goto out;
> + }
> +
> mddev->ro = MD_RDONLY;
> set_disk_ro(mddev->gendisk, 1);
> + }
> +
> +out:
> + if ((mddev->pers && !err) || did_freeze) {
> clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> md_wakeup_thread(mddev->thread);
> sysfs_notify_dirent_safe(mddev->sysfs_state);
> - err = 0;
> }
> -out:
> +
> mutex_unlock(&mddev->open_mutex);
> return err;
> }
> --
> 2.39.2
>

2023-12-01 22:11:08

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH v3 3/3] md: fix stopping sync thread

On Tue, Nov 28, 2023 at 8:32 PM Yu Kuai <[email protected]> wrote:
>
> From: Yu Kuai <[email protected]>
>
> Currently sync thread is stopped from multiple contex:
> - idle_sync_thread
> - frozen_sync_thread
> - __md_stop_writes
> - md_set_readonly
> - do_md_stop
>
> And there are some problems:
> 1) sync_work is flushed while reconfig_mutex is grabbed, this can
> deadlock because the work function will grab reconfig_mutex as well.
> 2) md_reap_sync_thread() can't be called directly while md_do_sync() is
> not finished yet, for example, commit 130443d60b1b ("md: refactor
> idle/frozen_sync_thread() to fix deadlock").
> 3) If MD_RECOVERY_RUNNING is not set, there is no need to stop
> sync_thread at all because sync_thread must not be registered.
>
> Factor out a helper prepare_to_stop_sync_thread(), so that above contex
> will behave the same. Fix 1) by flushing sync_work after reconfig_mutex
> is released, before waiting for sync_thread to be done; Fix 2) bt
> letting daemon thread to unregister sync_thread; Fix 3) by always
> checking MD_RECOVERY_RUNNING first.
>
> Fixes: db5e653d7c9f ("md: delay choosing sync action to md_start_sync()")
> Acked-by: Xiao Ni <[email protected]>
>
> Signed-off-by: Yu Kuai <[email protected]>
> ---
> drivers/md/md.c | 96 +++++++++++++++++++++++--------------------------
> 1 file changed, 45 insertions(+), 51 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 2d8e45a1af23..05902e36db66 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4840,26 +4840,9 @@ action_show(struct mddev *mddev, char *page)
> return sprintf(page, "%s\n", type);
> }
>
> -static void stop_sync_thread(struct mddev *mddev)
> +static void prepare_to_stop_sync_thread(struct mddev *mddev)
> + __releases(&mddev->reconfig_mutex)
> {
> - if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
> - return;
> -
> - if (mddev_lock(mddev))
> - return;
> -
> - /*
> - * Check again in case MD_RECOVERY_RUNNING is cleared before lock is
> - * held.
> - */
> - if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
> - mddev_unlock(mddev);
> - return;
> - }
> -
> - if (work_pending(&mddev->sync_work))
> - flush_workqueue(md_misc_wq);
> -
> set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> /*
> * Thread might be blocked waiting for metadata update which will now
> @@ -4868,6 +4851,8 @@ static void stop_sync_thread(struct mddev *mddev)
> md_wakeup_thread_directly(mddev->sync_thread);
>
> mddev_unlock(mddev);
> + if (work_pending(&mddev->sync_work))
> + flush_work(&mddev->sync_work);
> }
>
> static void idle_sync_thread(struct mddev *mddev)
> @@ -4876,10 +4861,20 @@ static void idle_sync_thread(struct mddev *mddev)
>
> mutex_lock(&mddev->sync_mutex);
> clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> - stop_sync_thread(mddev);
>
> - wait_event(resync_wait, sync_seq != atomic_read(&mddev->sync_seq) ||
> + if (mddev_lock(mddev)) {
> + mutex_unlock(&mddev->sync_mutex);
> + return;
> + }
> +
> + if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
> + prepare_to_stop_sync_thread(mddev);
> + wait_event(resync_wait,
> + sync_seq != atomic_read(&mddev->sync_seq) ||
> !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
> + } else {
> + mddev_unlock(mddev);
> + }

We have a few patterns like this:

if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
prepare_to_stop_sync_thread(mddev);
wait_event(resync_wait,
sync_seq != atomic_read(&mddev->sync_seq) ||
!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
} else {
mddev_unlock(mddev);
}

This is pretty confusing. Maybe try something like (untested)

static void stop_sync_thread(struct mddev *mddev,
bool do_unlock, bool check_sync_seq)
{
if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
if (do_unlock)
mddev_unlock(mddev);
return;
}
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
/*
* Thread might be blocked waiting for metadata update which will now
* never happen
*/
md_wakeup_thread_directly(mddev->sync_thread);

mddev_unlock(mddev);
if (work_pending(&mddev->sync_work))
flush_work(&mddev->sync_work);
wait_event(resync_wait,
!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
(check_sync_seq && sync_seq !=
atomic_read(&mddev->sync_seq)));
if (!do_unlock)
mddev_lock_nointr(mddev);
}

Thanks,
Song

2023-12-02 07:45:20

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] md: don't leave 'MD_RECOVERY_FROZEN' in error path of md_set_readonly()

Hi,

在 2023/12/02 4:53, Song Liu 写道:
> On Tue, Nov 28, 2023 at 8:32 PM Yu Kuai <[email protected]> wrote:
>>
>> From: Yu Kuai <[email protected]>
>>
>> If md_set_readonly() failed, the array could still be read-write, however
>> 'MD_RECOVERY_FROZEN' could still be set, which leave the array in an
>> abnormal state that sync or recovery can't continue anymore.
>> Hence make sure the flag is cleared after md_set_readonly() returns.
>>
>> Signed-off-by: Yu Kuai <[email protected]>
>> Acked-by: Xiao Ni <[email protected]>
>
> Since we are shipping this via the md-fixes branch, we need a Fixes tag.

Okay, I'll add following fix tag:

Fixes: 88724bfa68be ("md: wait for pending superblock updates before
switching to read-only")
>
>> ---
>> drivers/md/md.c | 24 +++++++++++++-----------
>> 1 file changed, 13 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 5640a948086b..2d8e45a1af23 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -6355,6 +6355,9 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
>> int err = 0;
>> int did_freeze = 0;
>>
>> + if (mddev->external && test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
>> + return -EBUSY;
>> +
>> if (!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
>> did_freeze = 1;
>> set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>> @@ -6369,8 +6372,6 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
>> */
>> md_wakeup_thread_directly(mddev->sync_thread);
>>
>> - if (mddev->external && test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
>> - return -EBUSY;
>> mddev_unlock(mddev);
>> wait_event(resync_wait, !test_bit(MD_RECOVERY_RUNNING,
>> &mddev->recovery));
>> @@ -6383,29 +6384,30 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
>> mddev->sync_thread ||
>> test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
>> pr_warn("md: %s still in use.\n",mdname(mddev));
>> - if (did_freeze) {
>> - clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>> - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>> - md_wakeup_thread(mddev->thread);
>> - }
>
> This change (move did_freeze, etc.) is not explained in the commit log.
> Is it just refactor?

It is refactor, but it is also part of "make sure the flag is cleared
after md_set_readonly() returns", because now that MD_RECOVERY_FROZEN
will be cleared:

if ((mddev->pers && !err) || did_freeze)

Which means,
- If set readonly succeed, or;
- if something is wrong and did_freeze is set, exactly what this patch
tries to do;

Thanks,
Kuai

>
> Thanks,
> Song
>
>
>> err = -EBUSY;
>> goto out;
>> }
>> +
>> if (mddev->pers) {
>> __md_stop_writes(mddev);
>>
>> - err = -ENXIO;
>> - if (mddev->ro == MD_RDONLY)
>> + if (mddev->ro == MD_RDONLY) {
>> + err = -ENXIO;
>> goto out;
>> + }
>> +
>> mddev->ro = MD_RDONLY;
>> set_disk_ro(mddev->gendisk, 1);
>> + }
>> +
>> +out:
>> + if ((mddev->pers && !err) || did_freeze) {
>> clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>> set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>> md_wakeup_thread(mddev->thread);
>> sysfs_notify_dirent_safe(mddev->sysfs_state);
>> - err = 0;
>> }
>> -out:
>> +
>> mutex_unlock(&mddev->open_mutex);
>> return err;
>> }
>> --
>> 2.39.2
>>
> .
>

2023-12-02 07:51:45

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH v3 3/3] md: fix stopping sync thread

Hi,

在 2023/12/02 6:08, Song Liu 写道:
> On Tue, Nov 28, 2023 at 8:32 PM Yu Kuai <[email protected]> wrote:
>>
>> From: Yu Kuai <[email protected]>
>>
>> Currently sync thread is stopped from multiple contex:
>> - idle_sync_thread
>> - frozen_sync_thread
>> - __md_stop_writes
>> - md_set_readonly
>> - do_md_stop
>>
>> And there are some problems:
>> 1) sync_work is flushed while reconfig_mutex is grabbed, this can
>> deadlock because the work function will grab reconfig_mutex as well.
>> 2) md_reap_sync_thread() can't be called directly while md_do_sync() is
>> not finished yet, for example, commit 130443d60b1b ("md: refactor
>> idle/frozen_sync_thread() to fix deadlock").
>> 3) If MD_RECOVERY_RUNNING is not set, there is no need to stop
>> sync_thread at all because sync_thread must not be registered.
>>
>> Factor out a helper prepare_to_stop_sync_thread(), so that above contex
>> will behave the same. Fix 1) by flushing sync_work after reconfig_mutex
>> is released, before waiting for sync_thread to be done; Fix 2) bt
>> letting daemon thread to unregister sync_thread; Fix 3) by always
>> checking MD_RECOVERY_RUNNING first.
>>
>> Fixes: db5e653d7c9f ("md: delay choosing sync action to md_start_sync()")
>> Acked-by: Xiao Ni <[email protected]>
>>
>> Signed-off-by: Yu Kuai <[email protected]>
>> ---
>> drivers/md/md.c | 96 +++++++++++++++++++++++--------------------------
>> 1 file changed, 45 insertions(+), 51 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 2d8e45a1af23..05902e36db66 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -4840,26 +4840,9 @@ action_show(struct mddev *mddev, char *page)
>> return sprintf(page, "%s\n", type);
>> }
>>
>> -static void stop_sync_thread(struct mddev *mddev)
>> +static void prepare_to_stop_sync_thread(struct mddev *mddev)
>> + __releases(&mddev->reconfig_mutex)
>> {
>> - if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
>> - return;
>> -
>> - if (mddev_lock(mddev))
>> - return;
>> -
>> - /*
>> - * Check again in case MD_RECOVERY_RUNNING is cleared before lock is
>> - * held.
>> - */
>> - if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
>> - mddev_unlock(mddev);
>> - return;
>> - }
>> -
>> - if (work_pending(&mddev->sync_work))
>> - flush_workqueue(md_misc_wq);
>> -
>> set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> /*
>> * Thread might be blocked waiting for metadata update which will now
>> @@ -4868,6 +4851,8 @@ static void stop_sync_thread(struct mddev *mddev)
>> md_wakeup_thread_directly(mddev->sync_thread);
>>
>> mddev_unlock(mddev);
>> + if (work_pending(&mddev->sync_work))
>> + flush_work(&mddev->sync_work);
>> }
>>
>> static void idle_sync_thread(struct mddev *mddev)
>> @@ -4876,10 +4861,20 @@ static void idle_sync_thread(struct mddev *mddev)
>>
>> mutex_lock(&mddev->sync_mutex);
>> clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>> - stop_sync_thread(mddev);
>>
>> - wait_event(resync_wait, sync_seq != atomic_read(&mddev->sync_seq) ||
>> + if (mddev_lock(mddev)) {
>> + mutex_unlock(&mddev->sync_mutex);
>> + return;
>> + }
>> +
>> + if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
>> + prepare_to_stop_sync_thread(mddev);
>> + wait_event(resync_wait,
>> + sync_seq != atomic_read(&mddev->sync_seq) ||
>> !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
>> + } else {
>> + mddev_unlock(mddev);
>> + }
>
> We have a few patterns like this:
>
> if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
> prepare_to_stop_sync_thread(mddev);
> wait_event(resync_wait,
> sync_seq != atomic_read(&mddev->sync_seq) ||
> !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
> } else {
> mddev_unlock(mddev);
> }
>
> This is pretty confusing. Maybe try something like (untested)
>
> static void stop_sync_thread(struct mddev *mddev,
> bool do_unlock, bool check_sync_seq)
Yes, if we can accept use parameter to distinguish different use case.

Thanks,
Kuai

> {
> if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
> if (do_unlock)
> mddev_unlock(mddev);
> return;
> }
> set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> /*
> * Thread might be blocked waiting for metadata update which will now
> * never happen
> */
> md_wakeup_thread_directly(mddev->sync_thread);
>
> mddev_unlock(mddev);
> if (work_pending(&mddev->sync_work))
> flush_work(&mddev->sync_work);
> wait_event(resync_wait,
> !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
> (check_sync_seq && sync_seq !=
> atomic_read(&mddev->sync_seq)));
> if (!do_unlock)
> mddev_lock_nointr(mddev);
> }
>
> Thanks,
> Song
>
> .
>