2024-01-20 10:42:08

by Yu Kuai

[permalink] [raw]
Subject: [PATCH 0/5] md: fix/prevent dm-raid regressions

From: Yu Kuai <[email protected]>

There are some problems that we fixed in md/raid, and some APIs changed
along the way. However, dm-raid still relies on the old APIs (note that
the old APIs are problematic in corner cases), and there are now
regressions in the lvm2 testsuite.

This patchset fixes some regressions (patches 1-3) and reverts changes
to prevent further regressions (patches 4-5). Note that the problems
behind patches 4 and 5 are not understood yet and I'm not able to locate
the root cause quickly, hence I decided to revert the changes first to
prevent regressions.

Yu Kuai (5):
md: don't ignore suspended array in md_check_recovery()
md: don't ignore read-only array in md_check_recovery()
md: make sure md_do_sync() will set MD_RECOVERY_DONE
md: revert commit fa2bbff7b0b4 ("md: synchronize flush io with array
reconfiguration") for dm-raid
md: use md_reap_sync_thread() directly for dm-raid

drivers/md/md.c | 58 ++++++++++++++++++++++++++++++-------------------
1 file changed, 36 insertions(+), 22 deletions(-)

--
2.39.2



2024-01-20 10:42:10

by Yu Kuai

[permalink] [raw]
Subject: [PATCH 2/5] md: don't ignore read-only array in md_check_recovery()

From: Yu Kuai <[email protected]>

If the array is read-only and md_do_sync() is done, md_check_recovery()
currently ignores this case, hence the sync_thread can't be
unregistered.

Before this patch, calling stop_sync_thread() directly for a read-only
array would hang because md_check_recovery() can't clear
MD_RECOVERY_RUNNING, which is possible for dm-raid.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 07b80278eaa5..6906d023f1d6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9445,6 +9445,20 @@ static void md_start_sync(struct work_struct *ws)
sysfs_notify_dirent_safe(mddev->sysfs_action);
}

+static void unregister_sync_thread(struct mddev *mddev)
+{
+ if (!test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
+ /* resync/recovery still happening */
+ clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ return;
+ }
+
+ if (WARN_ON_ONCE(!mddev->sync_thread))
+ return;
+
+ md_reap_sync_thread(mddev);
+}
+
/*
* This routine is regularly called by all per-raid-array threads to
* deal with generic issues like resync and super-block update.
@@ -9482,7 +9496,8 @@ void md_check_recovery(struct mddev *mddev)
}

if (!md_is_rdwr(mddev) &&
- !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
+ !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) &&
+ !test_bit(MD_RECOVERY_DONE, &mddev->recovery))
return;
if ( ! (
(mddev->sb_flags & ~ (1<<MD_SB_CHANGE_PENDING)) ||
@@ -9504,8 +9519,7 @@ void md_check_recovery(struct mddev *mddev)
struct md_rdev *rdev;

if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
- /* sync_work already queued. */
- clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ unregister_sync_thread(mddev);
goto unlock;
}

@@ -9568,16 +9582,7 @@ void md_check_recovery(struct mddev *mddev)
* still set.
*/
if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
- if (!test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
- /* resync/recovery still happening */
- clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- goto unlock;
- }
-
- if (WARN_ON_ONCE(!mddev->sync_thread))
- goto unlock;
-
- md_reap_sync_thread(mddev);
+ unregister_sync_thread(mddev);
goto unlock;
}

--
2.39.2


2024-01-20 10:42:54

by Yu Kuai

[permalink] [raw]
Subject: [PATCH 1/5] md: don't ignore suspended array in md_check_recovery()

From: Yu Kuai <[email protected]>

mddev_suspend() never stops the sync_thread, hence it doesn't make sense
to ignore a suspended array in md_check_recovery(), which might leave
the sync_thread unable to be unregistered.

Before this patch, calling stop_sync_thread() directly for a suspended
array would hang because md_check_recovery() can't clear
MD_RECOVERY_RUNNING, which is possible for dm-raid.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2266358d8074..07b80278eaa5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9469,9 +9469,6 @@ static void md_start_sync(struct work_struct *ws)
*/
void md_check_recovery(struct mddev *mddev)
{
- if (READ_ONCE(mddev->suspended))
- return;
-
if (mddev->bitmap)
md_bitmap_daemon_work(mddev);

--
2.39.2


2024-01-20 10:43:17

by Yu Kuai

[permalink] [raw]
Subject: [PATCH RFC 4/5] md: revert commit fa2bbff7b0b4 ("md: synchronize flush io with array reconfiguration") for dm-raid

From: Yu Kuai <[email protected]>

The reverted commit fixed a problem for md/raid, where the rdev lifetime
in conf differs from that of the array. However, on the one hand, rdev
management in dm-raid is completely different; on the other hand, this
commit breaks dm-raid, and the test shell/integrity-caching.sh hangs.

The root cause of the hang is not clear yet, so revert the commit for
the dm-raid case to prevent the regression first. We can decide what to
do after figuring out the root cause.

Fixes: fa2bbff7b0b4 ("md: synchronize flush io with array reconfiguration")
Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index ba45c7be3dbe..7db749ba7e60 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -544,7 +544,8 @@ static void md_end_flush(struct bio *bio)

if (atomic_dec_and_test(&mddev->flush_pending)) {
/* The pair is percpu_ref_get() from md_flush_request() */
- percpu_ref_put(&mddev->active_io);
+ if (mddev->gendisk)
+ percpu_ref_put(&mddev->active_io);

/* The pre-request flush has finished */
queue_work(md_wq, &mddev->flush_work);
@@ -640,7 +641,8 @@ bool md_flush_request(struct mddev *mddev, struct bio *bio)
* concurrently.
*/
WARN_ON(percpu_ref_is_zero(&mddev->active_io));
- percpu_ref_get(&mddev->active_io);
+ if (mddev->gendisk)
+ percpu_ref_get(&mddev->active_io);
mddev->flush_bio = bio;
bio = NULL;
}
--
2.39.2


2024-01-20 10:43:39

by Yu Kuai

[permalink] [raw]
Subject: [PATCH RFC 5/5] md: use md_reap_sync_thread() directly for dm-raid

From: Yu Kuai <[email protected]>

The previous patches make sure that stop_sync_thread() can successfully
stop the sync_thread, so the lvm2 tests won't hang anymore. However, the
test lvconvert-raid-reshape.sh still fails and complains that ext4 is
corrupted.

The root cause is not clear yet, so convert dm-raid back to using
md_reap_sync_thread() directly. This is not entirely safe, but at least
there won't be new regressions. We can decide what to do after figuring
out the root cause.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7db749ba7e60..3e8dd020bf9f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4909,6 +4909,14 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
if (work_pending(&mddev->sync_work))
flush_work(&mddev->sync_work);

+ if (!mddev->gendisk) {
+ mddev_lock_nointr(mddev);
+ md_reap_sync_thread(mddev);
+ if (!locked)
+ mddev_unlock(mddev);
+ return;
+ }
+
wait_event(resync_wait,
!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
(check_seq && sync_seq != atomic_read(&mddev->sync_seq)));
--
2.39.2


2024-01-21 04:41:55

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH 0/5] md: fix/prevent dm-raid regressions

On Sat, Jan 20, 2024 at 2:41 AM Yu Kuai <[email protected]> wrote:
>
> From: Yu Kuai <[email protected]>
>
> There are some problems that we fixed in md/raid, and some APIs changed
> along the way. However, dm-raid still relies on the old APIs (note that
> the old APIs are problematic in corner cases), and there are now
> regressions in the lvm2 testsuite.
>
> This patchset fixes some regressions (patches 1-3) and reverts changes
> to prevent further regressions (patches 4-5). Note that the problems
> behind patches 4 and 5 are not understood yet and I'm not able to locate
> the root cause quickly, hence I decided to revert the changes first to
> prevent regressions.

Thanks for looking into this!

Patches 1-3 look good to me. But since we need to backport these fixes
to 6.7 kernels, let's make it very clear what issues are being fixed.
Please:
1. Test on both Linus' master branch and 6.7.y, and explain which tests
are failing before the fixes. (From my tests, the two branches don't have
the same test results). We can put these results in the cover letter and
include them in a merge commit.
2. If possible, add Fixes tag to all patches.
3. Add more details in the commit log, so it is clear what is being fixed.
4. Add "reported-by" and maybe also "closes" tag.

For patch 4-5, especially 5, I wonder whether the same issue also
happens with md. We can probably ship 4-5 now, with the same
improvements as patch 1-3.

I will run more tests on my side.

Mikulas, please also review and test these patches.

Thanks,
Song



>
> Yu Kuai (5):
> md: don't ignore suspended array in md_check_recovery()
> md: don't ignore read-only array in md_check_recovery()
> md: make sure md_do_sync() will set MD_RECOVERY_DONE
> md: revert commit fa2bbff7b0b4 ("md: synchronize flush io with array
> reconfiguration") for dm-raid
> md: use md_reap_sync_thread() directly for dm-raid
>
> drivers/md/md.c | 58 ++++++++++++++++++++++++++++++-------------------
> 1 file changed, 36 insertions(+), 22 deletions(-)
>
> --
> 2.39.2
>
>

2024-01-22 01:19:39

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH 0/5] md: fix/prevent dm-raid regressions

Hi,

On 2024/01/21 12:41, Song Liu wrote:
> On Sat, Jan 20, 2024 at 2:41 AM Yu Kuai <[email protected]> wrote:
>>
>> From: Yu Kuai <[email protected]>
>>
>> There are some problems that we fixed in md/raid, and some APIs changed
>> along the way. However, dm-raid still relies on the old APIs (note that
>> the old APIs are problematic in corner cases), and there are now
>> regressions in the lvm2 testsuite.
>>
>> This patchset fixes some regressions (patches 1-3) and reverts changes
>> to prevent further regressions (patches 4-5). Note that the problems
>> behind patches 4 and 5 are not understood yet and I'm not able to locate
>> the root cause quickly, hence I decided to revert the changes first to
>> prevent regressions.
>
> Thanks for looking into this!
>
> Patch 1-3 look good to me. But since we need to back port these fixes
> to 6.7 kernels, let's make it very clear what issues are being fixed.
> Please:
> 1. Test on both Linus' master branch and 6.7.y, and explain which tests
> are failing before the fixes. (From my tests, the two branches don't have
> the same test results). We can put these results in the cover letter and
> include them in a merge commit.
> 2. If possible, add Fixes tag to all patches.
> 3. Add more details in the commit log, so it is clear what is being fixed.
> 4. Add "reported-by" and maybe also "closes" tag.
>

Will do this in the next version. I verified that the following tests
now pass in my VM:

shell/integrity-caching.sh
shell/lvconvert-raid-reshape.sh

> For patch 4-5, especially 5, I wonder whether the same issue also
> happens with md. We can probably ship 4-5 now, with the same
> improvements as patch 1-3.

With patches 1-3, the test lvconvert-raid-reshape.sh won't hang anymore;
however, it still fails and complains that ext4 is corrupted, and I'm
still trying to understand how reshape works in dm-raid. :(
>
> I will run more tests on my side.

Note that the problem Mikulas mentioned in the patch "md: partially
revert 'md/raid6: use valid sector values to determine if an I/O should
wait on the reshape'" still exists. And again, I'm still trying to
understand how raid5 works in detail.
>
> Mikulas, please also review and test these patches.
>
> Thanks,
> Song
>
>
>
>>
>> Yu Kuai (5):
>> md: don't ignore suspended array in md_check_recovery()
>> md: don't ignore read-only array in md_check_recovery()
>> md: make sure md_do_sync() will set MD_RECOVERY_DONE
>> md: revert commit fa2bbff7b0b4 ("md: synchronize flush io with array
>> reconfiguration") for dm-raid
>> md: use md_reap_sync_thread() directly for dm-raid
>>
>> drivers/md/md.c | 58 ++++++++++++++++++++++++++++++-------------------
>> 1 file changed, 36 insertions(+), 22 deletions(-)
>>
>> --
>> 2.39.2
>>
>>
> .
>


2024-01-22 08:46:32

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH 0/5] md: fix/prevent dm-raid regressions

Hi,

On 2024/01/21 12:41, Song Liu wrote:
> On Sat, Jan 20, 2024 at 2:41 AM Yu Kuai <[email protected]> wrote:
>>
>> From: Yu Kuai <[email protected]>
>>
>> There are some problems that we fixed in md/raid, and some APIs changed
>> along the way. However, dm-raid still relies on the old APIs (note that
>> the old APIs are problematic in corner cases), and there are now
>> regressions in the lvm2 testsuite.
>>
>> This patchset fixes some regressions (patches 1-3) and reverts changes
>> to prevent further regressions (patches 4-5). Note that the problems
>> behind patches 4 and 5 are not understood yet and I'm not able to locate
>> the root cause quickly, hence I decided to revert the changes first to
>> prevent regressions.
>
> Thanks for looking into this!
>
> Patch 1-3 look good to me. But since we need to back port these fixes
> to 6.7 kernels, let's make it very clear what issues are being fixed.
> Please:

I'm attaching my test results here before I send the next version.

The tested version adds the following changes to patch 5:

@@ -9379,6 +9387,15 @@ static void md_start_sync(struct work_struct *ws)
suspend ? mddev_suspend_and_lock_nointr(mddev) :
mddev_lock_nointr(mddev);

+ if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
+ /*
+ * dm-raid calls md_reap_sync_thread() directly to unregister
+ * sync_thread, and md/raid should never trigger this.
+ */
+ WARN_ON_ONCE(mddev->gendisk);
+ goto not_running;
+ }
+
if (!md_is_rdwr(mddev)) {

Failed tests for v6.6:
### failed: [ndev-vanilla] shell/duplicate-vgid.sh
### failed: [ndev-vanilla] shell/fsadm-crypt.sh
### failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
### failed: [ndev-vanilla] shell/lvconvert-cache-abort.sh
### failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
### failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
### failed: [ndev-vanilla] shell/lvextend-raid.sh
### failed: [ndev-vanilla] shell/select-report.sh

Failed tests for next-20240117 (latest linux-next, between v6.7 and v6.8-rc1):
### failed: [ndev-vanilla] shell/duplicate-vgid.sh
### failed: [ndev-vanilla] shell/fsadm-crypt.sh
### failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
### failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
### failed: [ndev-vanilla] shell/lvextend-raid.sh
### failed: [ndev-vanilla] shell/select-report.sh

Please note that the test lvconvert-raid-reshape.sh can still fail due
to commit c467e97f079f ("md/raid6: use valid sector values to determine
if an I/O should wait on the reshape").

Thanks,
Kuai

> 1. Test on both Linus' master branch and 6.7.y, and explain which tests
> are failing before the fixes. (From my tests, the two branches don't have
> the same test results). We can put these results in the cover letter and
> include them in a merge commit.
> 2. If possible, add Fixes tag to all patches.
> 3. Add more details in the commit log, so it is clear what is being fixed.
> 4. Add "reported-by" and maybe also "closes" tag.
>
> For patch 4-5, especially 5, I wonder whether the same issue also
> happens with md. We can probably ship 4-5 now, with the same
> improvements as patch 1-3.
>
> I will run more tests on my side.
>
> Mikulas, please also review and test these patches.
>
> Thanks,
> Song
>
>
>
>>
>> Yu Kuai (5):
>> md: don't ignore suspended array in md_check_recovery()
>> md: don't ignore read-only array in md_check_recovery()
>> md: make sure md_do_sync() will set MD_RECOVERY_DONE
>> md: revert commit fa2bbff7b0b4 ("md: synchronize flush io with array
>> reconfiguration") for dm-raid
>> md: use md_reap_sync_thread() directly for dm-raid
>>
>> drivers/md/md.c | 58 ++++++++++++++++++++++++++++++-------------------
>> 1 file changed, 36 insertions(+), 22 deletions(-)
>>
>> --
>> 2.39.2
>>
>>
> .
>


2024-01-22 10:10:00

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH 0/5] md: fix/prevent dm-raid regressions

On Mon, Jan 22, 2024 at 12:24 AM Yu Kuai <[email protected]> wrote:
>
> Hi,
>
> On 2024/01/21 12:41, Song Liu wrote:
> > On Sat, Jan 20, 2024 at 2:41 AM Yu Kuai <[email protected]> wrote:
> >>
> >> From: Yu Kuai <[email protected]>
> >>
> >> There are some problems that we fixed in md/raid, and some APIs changed
> >> along the way. However, dm-raid still relies on the old APIs (note that
> >> the old APIs are problematic in corner cases), and there are now
> >> regressions in the lvm2 testsuite.
> >>
> >> This patchset fixes some regressions (patches 1-3) and reverts changes
> >> to prevent further regressions (patches 4-5). Note that the problems
> >> behind patches 4 and 5 are not understood yet and I'm not able to locate
> >> the root cause quickly, hence I decided to revert the changes first to
> >> prevent regressions.
> >
> > Thanks for looking into this!
> >
> > Patch 1-3 look good to me. But since we need to back port these fixes
> > to 6.7 kernels, let's make it very clear what issues are being fixed.
> > Please:
>
> I'm attaching my test results here before I send the next version.
>
> The tested version adds the following changes to patch 5:
>
> @@ -9379,6 +9387,15 @@ static void md_start_sync(struct work_struct *ws)
> suspend ? mddev_suspend_and_lock_nointr(mddev) :
> mddev_lock_nointr(mddev);
>
> + if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
> + /*
> + * dm-raid calls md_reap_sync_thread() directly to unregister
> + * sync_thread, and md/raid should never trigger this.
> + */
> + WARN_ON_ONCE(mddev->gendisk);
> + goto not_running;
> + }
> +
> if (!md_is_rdwr(mddev)) {
>
> Failed tests for v6.6:
> ### failed: [ndev-vanilla] shell/duplicate-vgid.sh
> ### failed: [ndev-vanilla] shell/fsadm-crypt.sh
> ### failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
> ### failed: [ndev-vanilla] shell/lvconvert-cache-abort.sh
> ### failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
> ### failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
> ### failed: [ndev-vanilla] shell/lvextend-raid.sh
> ### failed: [ndev-vanilla] shell/select-report.sh
>
> Failed tests for next-20240117 (latest linux-next, between v6.7 and v6.8-rc1):
> ### failed: [ndev-vanilla] shell/duplicate-vgid.sh
> ### failed: [ndev-vanilla] shell/fsadm-crypt.sh
> ### failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
> ### failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
> ### failed: [ndev-vanilla] shell/lvextend-raid.sh
> ### failed: [ndev-vanilla] shell/select-report.sh
>
> Please note that the test lvconvert-raid-reshape.sh can still fail due
> to commit c467e97f079f ("md/raid6: use valid sector values to determine
> if an I/O should wait on the reshape").

Thanks for the information!

I will look closer into the raid6 issue.

Song

2024-01-22 13:49:44

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH RFC 5/5] md: use md_reap_sync_thread() directly for dm-raid

Hi,

On 2024/01/20 18:37, Yu Kuai wrote:
> The root cause is still not clear yet, however, let's convert dm-raid
> back to use md_reap_sync_thread() directly. This is not safe but at
> least there won't be new regressions. We can decide what to do after
> figuring out the root cause.

I think I finally figured out the root cause here. This patch is no
longer needed after the following patch. I have already verified, 3
times in my VM, that lvconvert-raid-reshape.sh won't fail (with raid6
patch 2c265ac5ffde reverted).

I'll run more tests in case there are new regressions. Meanwhile, I'll
try to locate the root cause of the problem described in patch 4.

Thanks,
Kuai

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index eb009d6bb03a..108e7e313631 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3241,7 +3241,7 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
rs->md.in_sync = 1;

/* Keep array frozen until resume. */
- set_bit(MD_RECOVERY_FROZEN, &rs->md.recovery);
+ md_frozen_sync_thread(&rs->md);

/* Has to be held on running the array */
mddev_suspend_and_lock_nointr(&rs->md);
@@ -3722,6 +3722,9 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
if (!mddev->pers || !mddev->pers->sync_request)
return -EINVAL;

+ if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags))
+ return -EBUSY;
+
if (!strcasecmp(argv[0], "frozen"))
set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
else
@@ -3796,10 +3799,8 @@ static void raid_postsuspend(struct dm_target *ti)
struct raid_set *rs = ti->private;

if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
- /* Writes have to be stopped before suspending to avoid deadlocks. */
- if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
- md_stop_writes(&rs->md);
-
+ md_frozen_sync_thread(&rs->md);
+ md_stop_writes(&rs->md);
mddev_suspend(&rs->md, false);
}
}
@@ -4011,9 +4012,6 @@ static int raid_preresume(struct dm_target *ti)
DMERR("Failed to resize bitmap");
}

- /* Check for any resize/reshape on @rs and adjust/initiate */
- /* Be prepared for mddev_resume() in raid_resume() */
- set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
if (mddev->recovery_cp && mddev->recovery_cp < MaxSector) {
set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
mddev->resync_min = mddev->recovery_cp;
@@ -4056,10 +4054,11 @@ static void raid_resume(struct dm_target *ti)
rs_set_capacity(rs);

mddev_lock_nointr(mddev);
- clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
mddev->ro = 0;
mddev->in_sync = 0;
mddev_unlock_and_resume(mddev);
+
+ md_unfrozen_sync_thread(mddev);
}
}
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9ef17a769cc2..0638d104fe26 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4939,7 +4939,7 @@ static void idle_sync_thread(struct mddev *mddev)
mutex_unlock(&mddev->sync_mutex);
}

-static void frozen_sync_thread(struct mddev *mddev)
+void md_frozen_sync_thread(struct mddev *mddev)
{
mutex_lock(&mddev->sync_mutex);
set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
@@ -4952,6 +4952,18 @@ static void frozen_sync_thread(struct mddev *mddev)
stop_sync_thread(mddev, false, false);
mutex_unlock(&mddev->sync_mutex);
}
+EXPORT_SYMBOL_GPL(md_frozen_sync_thread);
+
+void md_unfrozen_sync_thread(struct mddev *mddev)
+{
+ mutex_lock(&mddev->sync_mutex);
+ clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ md_wakeup_thread(mddev->thread);
+ sysfs_notify_dirent_safe(mddev->sysfs_action);
+ mutex_unlock(&mddev->sync_mutex);
+}
+EXPORT_SYMBOL_GPL(md_unfrozen_sync_thread);

static ssize_t
action_store(struct mddev *mddev, const char *page, size_t len)
@@ -4963,7 +4975,7 @@ action_store(struct mddev *mddev, const char *page, size_t len)
if (cmd_match(page, "idle"))
idle_sync_thread(mddev);
else if (cmd_match(page, "frozen"))
- frozen_sync_thread(mddev);
+ md_frozen_sync_thread(mddev);
else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
return -EBUSY;
else if (cmd_match(page, "resync"))
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 8d881cc59799..332520595ed8 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -781,6 +781,8 @@ extern void md_rdev_clear(struct md_rdev *rdev);
extern void md_handle_request(struct mddev *mddev, struct bio *bio);
extern int mddev_suspend(struct mddev *mddev, bool interruptible);
extern void mddev_resume(struct mddev *mddev);
+extern void md_frozen_sync_thread(struct mddev *mddev);
+extern void md_unfrozen_sync_thread(struct mddev *mddev);

extern void md_reload_sb(struct mddev *mddev, int raid_disk);
extern void md_update_sb(struct mddev *mddev, int force);