From: Yu Kuai <[email protected]>
Changes in v2:
- Change the title from '-next' to 'md-6.8';
- Add Acked-by tag from Mike;
- Add Singed-off-by tag from Xiao;
link to part1: https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@mail.gmail.com/
part1 contains fixes for deadlocks for stopping sync_thread
This set contains fixes:
- reshape can start unexpected, cause data corruption, patch 1,5,6;
- deadlocks that reshape concurrent with IO, patch 8;
- a lockdep warning, patch 9;
I'm running lvm2 tests with following scripts with a few rounds now,
for t in `ls test/shell`; do
if cat test/shell/$t | grep raid &> /dev/null; then
make check T=shell/$t
fi
done
There are no deadlocks now, there are still some fs corruption, however,
it's verified there are no new failed tests compared to v6.6.
Yu Kuai (9):
md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
md: export helpers to stop sync_thread
md: export helper md_is_rdwr()
md: add a new helper reshape_interrupted()
dm-raid: really frozen sync_thread during suspend
md/dm-raid: don't call md_reap_sync_thread() directly
dm-raid: add a new helper prepare_suspend() in md_personality
dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io
concurrent with reshape
dm-raid: fix lockdep waring in "pers->hot_add_disk"
drivers/md/dm-raid.c | 93 ++++++++++++++++++++++++++++++++++----------
drivers/md/md.c | 73 ++++++++++++++++++++++++++--------
drivers/md/md.h | 38 +++++++++++++++++-
drivers/md/raid5.c | 32 ++++++++++++++-
4 files changed, 196 insertions(+), 40 deletions(-)
--
2.39.2
From: Yu Kuai <[email protected]>
The helper will be used for dm-raid456 later to detect the case that
reshape can't make progress.
Signed-off-by: Yu Kuai <[email protected]>
Signed-off-by: Xiao Ni <[email protected]>
Acked-by: Mike Snitzer <[email protected]>
---
drivers/md/md.h | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 684cc866402a..7f955115d78d 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -570,6 +570,25 @@ static inline bool md_is_rdwr(struct mddev *mddev)
return (mddev->ro == MD_RDWR);
}
+static inline bool reshape_interrupted(struct mddev *mddev)
+{
+ /* reshape never start */
+ if (mddev->reshape_position == MaxSector)
+ return false;
+
+ /* interrupted */
+ if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+ return true;
+
+ /* running reshape will be interrupted soon. */
+ if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
+ test_bit(MD_RECOVERY_INTR, &mddev->recovery) ||
+ test_bit(MD_RECOVERY_FROZEN, &mddev->recovery))
+ return true;
+
+ return false;
+}
+
static inline int __must_check mddev_lock(struct mddev *mddev)
{
return mutex_lock_interruptible(&mddev->reconfig_mutex);
--
2.39.2
From: Yu Kuai <[email protected]>
Currently md_reap_sync_thread() is called from raid_message() directly
without holding 'reconfig_mutex', this is definitely unsafe because
md_reap_sync_thread() can change many fields that is protected by
'reconfig_mutex'.
However, hold 'reconfig_mutex' here is still problematic because this
will cause deadlock, for example, commit 130443d60b1b ("md: refactor
idle/frozen_sync_thread() to fix deadlock").
Fix this problem by using stop_sync_thread() to unregister sync_thread,
like md/raid did.
Fixes: be83651f0050 ("DM RAID: Add message/status support for changing sync action")
Signed-off-by: Yu Kuai <[email protected]>
Signed-off-by: Xiao Ni <[email protected]>
Acked-by: Mike Snitzer <[email protected]>
---
drivers/md/dm-raid.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index e2d7a73c0f87..47c4b1b6e532 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3719,6 +3719,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
{
struct raid_set *rs = ti->private;
struct mddev *mddev = &rs->md;
+ int ret = 0;
if (!mddev->pers || !mddev->pers->sync_request)
return -EINVAL;
@@ -3726,17 +3727,24 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags))
return -EBUSY;
- if (!strcasecmp(argv[0], "frozen"))
- set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
- else
- clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+ if (!strcasecmp(argv[0], "frozen")) {
+ ret = mddev_lock(mddev);
+ if (ret)
+ return ret;
- if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
- if (mddev->sync_thread) {
- set_bit(MD_RECOVERY_INTR, &mddev->recovery);
- md_reap_sync_thread(mddev);
- }
- } else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
+ md_frozen_sync_thread(mddev);
+ mddev_unlock(mddev);
+ } else if (!strcasecmp(argv[0], "idle")) {
+ ret = mddev_lock(mddev);
+ if (ret)
+ return ret;
+
+ md_idle_sync_thread(mddev);
+ mddev_unlock(mddev);
+ }
+
+ clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+ if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
return -EBUSY;
else if (!strcasecmp(argv[0], "resync"))
; /* MD_RECOVERY_NEEDED set below */
--
2.39.2
From: Yu Kuai <[email protected]>
1) commit f52f5c71f3d4 ("md: fix stopping sync thread") remove
MD_RECOVERY_FROZEN from __md_stop_writes() and doesn't realize that
dm-raid relies on __md_stop_writes() to frozen sync_thread
indirectly. Fix this problem by adding MD_RECOVERY_FROZEN in
md_stop_writes(), and since stop_sync_thread() is only used for
dm-raid in this case, also move stop_sync_thread() to
md_stop_writes().
2) The flag MD_RECOVERY_FROZEN doesn't mean that sync thread is frozen,
it only prevent new sync_thread to start, and it can't stop the
running sync thread; In order to frozen sync_thread, after seting the
flag, stop_sync_thread() should be used.
3) The flag MD_RECOVERY_FROZEN doesn't mean that writes are stopped, use
it as condition for md_stop_writes() in raid_postsuspend() doesn't
look correct. Consider that reentrant stop_sync_thread() do nothing,
always call md_stop_writes() in raid_postsuspend().
4) raid_message can set/clear the flag MD_RECOVERY_FROZEN at anytime,
and if MD_RECOVERY_FROZEN is cleared while the array is suspended,
new sync_thread can start unexpected. Fix this by disallow
raid_message() to change sync_thread status during suspend.
Note that after commit f52f5c71f3d4 ("md: fix stopping sync thread"), the
test shell/lvconvert-raid-reshape.sh start to hang in stop_sync_thread(),
and with previous fixes, the test won't hang there anymore, however, the
test will still fail and complain that ext4 is corrupted. And with this
patch, the test won't hang due to stop_sync_thread() or fail due to ext4
is corrupted anymore. However, there is still a deadlock related to
dm-raid456 that will be fixed in following patches.
Reported-by: Mikulas Patocka <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Fixes: 1af2048a3e87 ("dm raid: fix deadlock caused by premature md_stop_writes()")
Fixes: 9dbd1aa3a81c ("dm raid: add reshaping support to the target")
Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Signed-off-by: Yu Kuai <[email protected]>
Signed-off-by: Xiao Ni <[email protected]>
Acked-by: Mike Snitzer <[email protected]>
---
drivers/md/dm-raid.c | 25 +++++++++++++++----------
drivers/md/md.c | 3 ++-
2 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index eb009d6bb03a..e2d7a73c0f87 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3240,11 +3240,12 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
rs->md.ro = 1;
rs->md.in_sync = 1;
- /* Keep array frozen until resume. */
- set_bit(MD_RECOVERY_FROZEN, &rs->md.recovery);
-
/* Has to be held on running the array */
mddev_suspend_and_lock_nointr(&rs->md);
+
+ /* Keep array frozen until resume. */
+ md_frozen_sync_thread(&rs->md);
+
r = md_run(&rs->md);
rs->md.in_sync = 0; /* Assume already marked dirty */
if (r) {
@@ -3722,6 +3723,9 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
if (!mddev->pers || !mddev->pers->sync_request)
return -EINVAL;
+ if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags))
+ return -EBUSY;
+
if (!strcasecmp(argv[0], "frozen"))
set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
else
@@ -3796,10 +3800,11 @@ static void raid_postsuspend(struct dm_target *ti)
struct raid_set *rs = ti->private;
if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
- /* Writes have to be stopped before suspending to avoid deadlocks. */
- if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
- md_stop_writes(&rs->md);
-
+ /*
+ * sync_thread must be stopped during suspend, and writes have
+ * to be stopped before suspending to avoid deadlocks.
+ */
+ md_stop_writes(&rs->md);
mddev_suspend(&rs->md, false);
}
}
@@ -4012,8 +4017,6 @@ static int raid_preresume(struct dm_target *ti)
}
/* Check for any resize/reshape on @rs and adjust/initiate */
- /* Be prepared for mddev_resume() in raid_resume() */
- set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
if (mddev->recovery_cp && mddev->recovery_cp < MaxSector) {
set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
mddev->resync_min = mddev->recovery_cp;
@@ -4055,10 +4058,12 @@ static void raid_resume(struct dm_target *ti)
if (mddev->delta_disks < 0)
rs_set_capacity(rs);
+ WARN_ON_ONCE(!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery));
+ WARN_ON_ONCE(test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
mddev_lock_nointr(mddev);
- clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
mddev->ro = 0;
mddev->in_sync = 0;
+ md_unfrozen_sync_thread(mddev);
mddev_unlock_and_resume(mddev);
}
}
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 306cbbf92fcf..e9e666240e9f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6335,7 +6335,6 @@ static void md_clean(struct mddev *mddev)
static void __md_stop_writes(struct mddev *mddev)
{
- stop_sync_thread(mddev, true, false);
del_timer_sync(&mddev->safemode_timer);
if (mddev->pers && mddev->pers->quiesce) {
@@ -6360,6 +6359,8 @@ static void __md_stop_writes(struct mddev *mddev)
void md_stop_writes(struct mddev *mddev)
{
mddev_lock_nointr(mddev);
+ set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+ stop_sync_thread(mddev, true, false);
__md_stop_writes(mddev);
mddev_unlock(mddev);
}
--
2.39.2
From: Yu Kuai <[email protected]>
There are no functional changes for now, prepare to fix a deadlock for
dm-raid456.
Signed-off-by: Yu Kuai <[email protected]>
Signed-off-by: Xiao Ni <[email protected]>
Acked-by: Mike Snitzer <[email protected]>
---
drivers/md/md.c | 12 ------------
drivers/md/md.h | 12 ++++++++++++
2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 22e7512a5b1e..306cbbf92fcf 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -99,18 +99,6 @@ static void mddev_detach(struct mddev *mddev);
static void export_rdev(struct md_rdev *rdev, struct mddev *mddev);
static void md_wakeup_thread_directly(struct md_thread __rcu *thread);
-enum md_ro_state {
- MD_RDWR,
- MD_RDONLY,
- MD_AUTO_READ,
- MD_MAX_STATE
-};
-
-static bool md_is_rdwr(struct mddev *mddev)
-{
- return (mddev->ro == MD_RDWR);
-}
-
/*
* Default number of read corrections we'll attempt on an rdev
* before ejecting it from the array. We divide the read error
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 437ab70ce79b..684cc866402a 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -558,6 +558,18 @@ enum recovery_flags {
MD_RESYNCING_REMOTE, /* remote node is running resync thread */
};
+enum md_ro_state {
+ MD_RDWR,
+ MD_RDONLY,
+ MD_AUTO_READ,
+ MD_MAX_STATE
+};
+
+static inline bool md_is_rdwr(struct mddev *mddev)
+{
+ return (mddev->ro == MD_RDWR);
+}
+
static inline int __must_check mddev_lock(struct mddev *mddev)
{
return mutex_lock_interruptible(&mddev->reconfig_mutex);
--
2.39.2
From: Yu Kuai <[email protected]>
The lockdep assert is added by commit a448af25becf ("md/raid10: remove
rcu protection to access rdev from conf") in print_conf(). And I didn't
notice that dm-raid is calling "pers->hot_add_disk" without holding
'reconfig_mutex'.
"pers->hot_add_disk" read and write many fields that is protected by
'reconfig_mutex', and raid_resume() already grab the lock in other
contex. Hence fix this problem by protecting "pers->host_add_disk"
with the lock.
Fixes: 9092c02d9435 ("DM RAID: Add ability to restore transiently failed devices on resume")
Fixes: a448af25becf ("md/raid10: remove rcu protection to access rdev from conf")
Signed-off-by: Yu Kuai <[email protected]>
Signed-off-by: Xiao Ni <[email protected]>
Acked-by: Mike Snitzer <[email protected]>
---
drivers/md/dm-raid.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index ea45f777691c..17e9af60bbf7 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -4091,7 +4091,9 @@ static void raid_resume(struct dm_target *ti)
* Take this opportunity to check whether any failed
* devices are reachable again.
*/
+ mddev_lock_nointr(mddev);
attempt_restore_of_faulty_devices(rs);
+ mddev_unlock(mddev);
}
if (test_and_clear_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
--
2.39.2
From: Yu Kuai <[email protected]>
For raid456, if reshape is still in progress, then IO across reshape
position will wait for reshape to make progress. However, for dm-raid,
in following cases reshape will never make progress hence IO will hang:
1) the array is read-only;
2) MD_RECOVERY_WAIT is set;
3) MD_RECOVERY_FROZEN is set;
After commit c467e97f079f ("md/raid6: use valid sector values to determine
if an I/O should wait on the reshape") fix the problem that IO across
reshape position doesn't wait for reshape, the dm-raid test
shell/lvconvert-raid-reshape.sh start to hang:
[root@fedora ~]# cat /proc/979/stack
[<0>] wait_woken+0x7d/0x90
[<0>] raid5_make_request+0x929/0x1d70 [raid456]
[<0>] md_handle_request+0xc2/0x3b0 [md_mod]
[<0>] raid_map+0x2c/0x50 [dm_raid]
[<0>] __map_bio+0x251/0x380 [dm_mod]
[<0>] dm_submit_bio+0x1f0/0x760 [dm_mod]
[<0>] __submit_bio+0xc2/0x1c0
[<0>] submit_bio_noacct_nocheck+0x17f/0x450
[<0>] submit_bio_noacct+0x2bc/0x780
[<0>] submit_bio+0x70/0xc0
[<0>] mpage_readahead+0x169/0x1f0
[<0>] blkdev_readahead+0x18/0x30
[<0>] read_pages+0x7c/0x3b0
[<0>] page_cache_ra_unbounded+0x1ab/0x280
[<0>] force_page_cache_ra+0x9e/0x130
[<0>] page_cache_sync_ra+0x3b/0x110
[<0>] filemap_get_pages+0x143/0xa30
[<0>] filemap_read+0xdc/0x4b0
[<0>] blkdev_read_iter+0x75/0x200
[<0>] vfs_read+0x272/0x460
[<0>] ksys_read+0x7a/0x170
[<0>] __x64_sys_read+0x1c/0x30
[<0>] do_syscall_64+0xc6/0x230
[<0>] entry_SYSCALL_64_after_hwframe+0x6c/0x74
This is because reshape can't make progress.
For md/raid, the problem doesn't exist because register new sync_thread
doesn't rely on the IO to be done any more:
1) If array is read-only, it can switch to read-write by ioctl/sysfs;
2) md/raid never set MD_RECOVERY_WAIT;
3) If MD_RECOVERY_FROZEN is set, mddev_suspend() doesn't hold
'reconfig_mutex', hence it can be cleared and reshape can continue by
sysfs api 'sync_action'.
However, I'm not sure yet how to avoid the problem in dm-raid yet. This
patch on the one hand make sure raid_message() can't change
sync_thread() through raid_message() after presuspend(), on the other
hand detect the above 3 cases before wait for IO do be done in
dm_suspend(), and let dm-raid requeue those IO.
Signed-off-by: Yu Kuai <[email protected]>
Signed-off-by: Xiao Ni <[email protected]>
Acked-by: Mike Snitzer <[email protected]>
---
drivers/md/dm-raid.c | 22 ++++++++++++++++++++--
drivers/md/md.c | 24 ++++++++++++++++++++++--
drivers/md/md.h | 3 ++-
drivers/md/raid5.c | 32 ++++++++++++++++++++++++++++++--
4 files changed, 74 insertions(+), 7 deletions(-)
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 7d48943acd57..ea45f777691c 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -213,6 +213,7 @@ struct raid_dev {
#define RT_FLAG_RS_IN_SYNC 6
#define RT_FLAG_RS_RESYNCING 7
#define RT_FLAG_RS_GROW 8
+#define RT_FLAG_RS_FROZEN 9
/* Array elements of 64 bit needed for rebuild/failed disk bits */
#define DISKS_ARRAY_ELEMS ((MAX_RAID_DEVICES + (sizeof(uint64_t) * 8 - 1)) / sizeof(uint64_t) / 8)
@@ -3340,7 +3341,8 @@ static int raid_map(struct dm_target *ti, struct bio *bio)
if (unlikely(bio_end_sector(bio) > mddev->array_sectors))
return DM_MAPIO_REQUEUE;
- md_handle_request(mddev, bio);
+ if (unlikely(!md_handle_request(mddev, bio)))
+ return DM_MAPIO_REQUEUE;
return DM_MAPIO_SUBMITTED;
}
@@ -3724,7 +3726,8 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
if (!mddev->pers || !mddev->pers->sync_request)
return -EINVAL;
- if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags))
+ if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags) ||
+ test_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags))
return -EBUSY;
if (!strcasecmp(argv[0], "frozen")) {
@@ -3808,6 +3811,12 @@ static void raid_presuspend(struct dm_target *ti)
struct raid_set *rs = ti->private;
struct mddev *mddev = &rs->md;
+ /*
+ * From now on, disallow raid_message() to change sync_thread until
+ * resume, raid_postsuspend() is too late.
+ */
+ set_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
+
if (!reshape_interrupted(mddev))
return;
@@ -3820,6 +3829,13 @@ static void raid_presuspend(struct dm_target *ti)
mddev->pers->prepare_suspend(mddev);
}
+static void raid_presuspend_undo(struct dm_target *ti)
+{
+ struct raid_set *rs = ti->private;
+
+ clear_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
+}
+
static void raid_postsuspend(struct dm_target *ti)
{
struct raid_set *rs = ti->private;
@@ -4085,6 +4101,7 @@ static void raid_resume(struct dm_target *ti)
WARN_ON_ONCE(!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery));
WARN_ON_ONCE(test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
+ clear_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
mddev_lock_nointr(mddev);
mddev->ro = 0;
mddev->in_sync = 0;
@@ -4105,6 +4122,7 @@ static struct target_type raid_target = {
.iterate_devices = raid_iterate_devices,
.io_hints = raid_io_hints,
.presuspend = raid_presuspend,
+ .presuspend_undo = raid_presuspend_undo,
.postsuspend = raid_postsuspend,
.preresume = raid_preresume,
.resume = raid_resume,
diff --git a/drivers/md/md.c b/drivers/md/md.c
index e9e666240e9f..50d7bfa3c6f9 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -366,7 +366,7 @@ static bool is_suspended(struct mddev *mddev, struct bio *bio)
return true;
}
-void md_handle_request(struct mddev *mddev, struct bio *bio)
+bool md_handle_request(struct mddev *mddev, struct bio *bio)
{
check_suspended:
if (is_suspended(mddev, bio)) {
@@ -374,7 +374,7 @@ void md_handle_request(struct mddev *mddev, struct bio *bio)
/* Bail out if REQ_NOWAIT is set for the bio */
if (bio->bi_opf & REQ_NOWAIT) {
bio_wouldblock_error(bio);
- return;
+ return true;
}
for (;;) {
prepare_to_wait(&mddev->sb_wait, &__wait,
@@ -390,10 +390,13 @@ void md_handle_request(struct mddev *mddev, struct bio *bio)
if (!mddev->pers->make_request(mddev, bio)) {
percpu_ref_put(&mddev->active_io);
+ if (!mddev->gendisk && mddev->pers->prepare_suspend)
+ return false;
goto check_suspended;
}
percpu_ref_put(&mddev->active_io);
+ return true;
}
EXPORT_SYMBOL(md_handle_request);
@@ -8758,6 +8761,23 @@ void md_account_bio(struct mddev *mddev, struct bio **bio)
}
EXPORT_SYMBOL_GPL(md_account_bio);
+void md_free_cloned_bio(struct bio *bio)
+{
+ struct md_io_clone *md_io_clone = bio->bi_private;
+ struct bio *orig_bio = md_io_clone->orig_bio;
+ struct mddev *mddev = md_io_clone->mddev;
+
+ if (bio->bi_status && !orig_bio->bi_status)
+ orig_bio->bi_status = bio->bi_status;
+
+ if (md_io_clone->start_time)
+ bio_end_io_acct(orig_bio, md_io_clone->start_time);
+
+ bio_put(bio);
+ percpu_ref_put(&mddev->active_io);
+}
+EXPORT_SYMBOL_GPL(md_free_cloned_bio);
+
/* md_allow_write(mddev)
* Calling this ensures that the array is marked 'active' so that writes
* may proceed without blocking. It is important to call this before
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 2db0a75febf9..3133650b57d4 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -782,6 +782,7 @@ extern void md_finish_reshape(struct mddev *mddev);
void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
struct bio *bio, sector_t start, sector_t size);
void md_account_bio(struct mddev *mddev, struct bio **bio);
+void md_free_cloned_bio(struct bio *bio);
extern bool __must_check md_flush_request(struct mddev *mddev, struct bio *bio);
extern void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
@@ -810,7 +811,7 @@ extern void md_stop_writes(struct mddev *mddev);
extern int md_rdev_init(struct md_rdev *rdev);
extern void md_rdev_clear(struct md_rdev *rdev);
-extern void md_handle_request(struct mddev *mddev, struct bio *bio);
+extern bool md_handle_request(struct mddev *mddev, struct bio *bio);
extern int mddev_suspend(struct mddev *mddev, bool interruptible);
extern void mddev_resume(struct mddev *mddev);
extern void md_idle_sync_thread(struct mddev *mddev);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 6a7a32f7fb91..38a49f6eb54b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -760,6 +760,7 @@ enum stripe_result {
STRIPE_RETRY,
STRIPE_SCHEDULE_AND_RETRY,
STRIPE_FAIL,
+ STRIPE_WAIT_RESHAPE,
};
struct stripe_request_ctx {
@@ -5946,7 +5947,8 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
if (ahead_of_reshape(mddev, logical_sector,
conf->reshape_safe)) {
spin_unlock_irq(&conf->device_lock);
- return STRIPE_SCHEDULE_AND_RETRY;
+ ret = STRIPE_SCHEDULE_AND_RETRY;
+ goto out;
}
}
spin_unlock_irq(&conf->device_lock);
@@ -6025,6 +6027,12 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
out_release:
raid5_release_stripe(sh);
+out:
+ if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) {
+ bi->bi_status = BLK_STS_RESOURCE;
+ ret = STRIPE_WAIT_RESHAPE;
+ pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress");
+ }
return ret;
}
@@ -6146,7 +6154,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
while (1) {
res = make_stripe_request(mddev, conf, &ctx, logical_sector,
bi);
- if (res == STRIPE_FAIL)
+ if (res == STRIPE_FAIL || res == STRIPE_WAIT_RESHAPE)
break;
if (res == STRIPE_RETRY)
@@ -6184,6 +6192,11 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
if (rw == WRITE)
md_write_end(mddev);
+ if (res == STRIPE_WAIT_RESHAPE) {
+ md_free_cloned_bio(bi);
+ return false;
+ }
+
bio_endio(bi);
return true;
}
@@ -8909,6 +8922,18 @@ static int raid5_start(struct mddev *mddev)
return r5l_start(conf->log);
}
+/*
+ * This is only used for dm-raid456, caller already frozen sync_thread, hence
+ * if rehsape is still in progress, io that is waiting for reshape can never be
+ * done now, hence wake up and handle those IO.
+ */
+static void raid5_prepare_suspend(struct mddev *mddev)
+{
+ struct r5conf *conf = mddev->private;
+
+ wake_up(&conf->wait_for_overlap);
+}
+
static struct md_personality raid6_personality =
{
.name = "raid6",
@@ -8932,6 +8957,7 @@ static struct md_personality raid6_personality =
.quiesce = raid5_quiesce,
.takeover = raid6_takeover,
.change_consistency_policy = raid5_change_consistency_policy,
+ .prepare_suspend = raid5_prepare_suspend,
};
static struct md_personality raid5_personality =
{
@@ -8956,6 +8982,7 @@ static struct md_personality raid5_personality =
.quiesce = raid5_quiesce,
.takeover = raid5_takeover,
.change_consistency_policy = raid5_change_consistency_policy,
+ .prepare_suspend = raid5_prepare_suspend,
};
static struct md_personality raid4_personality =
@@ -8981,6 +9008,7 @@ static struct md_personality raid4_personality =
.quiesce = raid5_quiesce,
.takeover = raid4_takeover,
.change_consistency_policy = raid5_change_consistency_policy,
+ .prepare_suspend = raid5_prepare_suspend,
};
static int __init raid5_init(void)
--
2.39.2
From: Yu Kuai <[email protected]>
The new heleprs will be used in dm-raid in later patches to fix
regressions and prevent calling md_reap_sync_thread() directly.
Signed-off-by: Yu Kuai <[email protected]>
Signed-off-by: Xiao Ni <[email protected]>
Acked-by: Mike Snitzer <[email protected]>
---
drivers/md/md.c | 29 +++++++++++++++++++++++++++++
drivers/md/md.h | 3 +++
2 files changed, 32 insertions(+)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 23f31fd1d024..22e7512a5b1e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4919,6 +4919,35 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
mddev_lock_nointr(mddev);
}
+void md_idle_sync_thread(struct mddev *mddev)
+{
+ lockdep_assert_held(&mddev->reconfig_mutex);
+
+ clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+ stop_sync_thread(mddev, true, true);
+}
+EXPORT_SYMBOL_GPL(md_idle_sync_thread);
+
+void md_frozen_sync_thread(struct mddev *mddev)
+{
+ lockdep_assert_held(&mddev->reconfig_mutex);
+
+ set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+ stop_sync_thread(mddev, true, false);
+}
+EXPORT_SYMBOL_GPL(md_frozen_sync_thread);
+
+void md_unfrozen_sync_thread(struct mddev *mddev)
+{
+ lockdep_assert_held(&mddev->reconfig_mutex);
+
+ clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ md_wakeup_thread(mddev->thread);
+ sysfs_notify_dirent_safe(mddev->sysfs_action);
+}
+EXPORT_SYMBOL_GPL(md_unfrozen_sync_thread);
+
static void idle_sync_thread(struct mddev *mddev)
{
mutex_lock(&mddev->sync_mutex);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 8d881cc59799..437ab70ce79b 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -781,6 +781,9 @@ extern void md_rdev_clear(struct md_rdev *rdev);
extern void md_handle_request(struct mddev *mddev, struct bio *bio);
extern int mddev_suspend(struct mddev *mddev, bool interruptible);
extern void mddev_resume(struct mddev *mddev);
+extern void md_idle_sync_thread(struct mddev *mddev);
+extern void md_frozen_sync_thread(struct mddev *mddev);
+extern void md_unfrozen_sync_thread(struct mddev *mddev);
extern void md_reload_sb(struct mddev *mddev, int raid_disk);
extern void md_update_sb(struct mddev *mddev, int force);
--
2.39.2
From: Yu Kuai <[email protected]>
There are no functional changes for now, prepare to fix a deadlock for
dm-raid456.
Signed-off-by: Yu Kuai <[email protected]>
Signed-off-by: Xiao Ni <[email protected]>
Acked-by: Mike Snitzer <[email protected]>
---
drivers/md/dm-raid.c | 18 ++++++++++++++++++
drivers/md/md.h | 1 +
2 files changed, 19 insertions(+)
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 47c4b1b6e532..7d48943acd57 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3803,6 +3803,23 @@ static void raid_io_hints(struct dm_target *ti, struct queue_limits *limits)
blk_limits_io_opt(limits, chunk_size_bytes * mddev_data_stripes(rs));
}
+static void raid_presuspend(struct dm_target *ti)
+{
+ struct raid_set *rs = ti->private;
+ struct mddev *mddev = &rs->md;
+
+ if (!reshape_interrupted(mddev))
+ return;
+
+ /*
+ * For raid456, if reshape is interrupted, IO across reshape position
+ * will never make progress, while caller will wait for IO to be done.
+ * Inform raid456 to handle those IO to prevent deadlock.
+ */
+ if (mddev->pers && mddev->pers->prepare_suspend)
+ mddev->pers->prepare_suspend(mddev);
+}
+
static void raid_postsuspend(struct dm_target *ti)
{
struct raid_set *rs = ti->private;
@@ -4087,6 +4104,7 @@ static struct target_type raid_target = {
.message = raid_message,
.iterate_devices = raid_iterate_devices,
.io_hints = raid_io_hints,
+ .presuspend = raid_presuspend,
.postsuspend = raid_postsuspend,
.preresume = raid_preresume,
.resume = raid_resume,
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 7f955115d78d..2db0a75febf9 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -648,6 +648,7 @@ struct md_personality
int (*start_reshape) (struct mddev *mddev);
void (*finish_reshape) (struct mddev *mddev);
void (*update_reshape_pos) (struct mddev *mddev);
+ void (*prepare_suspend) (struct mddev *mddev);
/* quiesce suspends or resumes internal processing.
* 1 - stop new actions and wait for action io to complete
* 0 - return to normal behaviour
--
2.39.2
Dear Kuai,
Thank you for your patch.
Am 05.03.24 um 08:22 schrieb Yu Kuai:
> From: Yu Kuai <[email protected]>
>
> The new heleprs will be used in dm-raid in later patches to fix
hel*pe*rs
> regressions and prevent calling md_reap_sync_thread() directly.
Please list the new functions.
1. md_idle_sync_thread(struct mddev *mddev);
2. md_frozen_sync_thread(struct mddev *mddev);
3. md_unfrozen_sync_thread(struct mddev *mddev);
I do not like the naming so much. `md_reap_sync_thread()` has the verb
in imperative mood. At least myself, I would not know what the functions
do exactly without looking at the implementation.
Kind regards,
Paul
> Signed-off-by: Yu Kuai <[email protected]>
> Signed-off-by: Xiao Ni <[email protected]>
> Acked-by: Mike Snitzer <[email protected]>
> ---
> drivers/md/md.c | 29 +++++++++++++++++++++++++++++
> drivers/md/md.h | 3 +++
> 2 files changed, 32 insertions(+)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 23f31fd1d024..22e7512a5b1e 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4919,6 +4919,35 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
> mddev_lock_nointr(mddev);
> }
>
> +void md_idle_sync_thread(struct mddev *mddev)
> +{
> + lockdep_assert_held(&mddev->reconfig_mutex);
> +
> + clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> + stop_sync_thread(mddev, true, true);
> +}
> +EXPORT_SYMBOL_GPL(md_idle_sync_thread);
> +
> +void md_frozen_sync_thread(struct mddev *mddev)
> +{
> + lockdep_assert_held(&mddev->reconfig_mutex);
> +
> + set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> + stop_sync_thread(mddev, true, false);
> +}
> +EXPORT_SYMBOL_GPL(md_frozen_sync_thread);
> +
> +void md_unfrozen_sync_thread(struct mddev *mddev)
> +{
> + lockdep_assert_held(&mddev->reconfig_mutex);
> +
> + clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> + set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> + md_wakeup_thread(mddev->thread);
> + sysfs_notify_dirent_safe(mddev->sysfs_action);
> +}
> +EXPORT_SYMBOL_GPL(md_unfrozen_sync_thread);
> +
> static void idle_sync_thread(struct mddev *mddev)
> {
> mutex_lock(&mddev->sync_mutex);
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 8d881cc59799..437ab70ce79b 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -781,6 +781,9 @@ extern void md_rdev_clear(struct md_rdev *rdev);
> extern void md_handle_request(struct mddev *mddev, struct bio *bio);
> extern int mddev_suspend(struct mddev *mddev, bool interruptible);
> extern void mddev_resume(struct mddev *mddev);
> +extern void md_idle_sync_thread(struct mddev *mddev);
> +extern void md_frozen_sync_thread(struct mddev *mddev);
> +extern void md_unfrozen_sync_thread(struct mddev *mddev);
>
> extern void md_reload_sb(struct mddev *mddev, int raid_disk);
> extern void md_update_sb(struct mddev *mddev, int force);
Hi,
在 2024/03/05 16:08, Paul Menzel 写道:
> Dear Kuai,
>
>
> Thank you for your patch.
>
> Am 05.03.24 um 08:22 schrieb Yu Kuai:
>> From: Yu Kuai <[email protected]>
>>
>> The new heleprs will be used in dm-raid in later patches to fix
>
> hel*pe*rs
>
>> regressions and prevent calling md_reap_sync_thread() directly.
>
> Please list the new functions.
>
> 1. md_idle_sync_thread(struct mddev *mddev);
> 2. md_frozen_sync_thread(struct mddev *mddev);
> 3. md_unfrozen_sync_thread(struct mddev *mddev);
>
> I do not like the naming so much. `md_reap_sync_thread()` has the verb
> in imperative mood. At least myself, I would not know what the functions
> do exactly without looking at the implementation.
>
Thanks for the suggestions, I'm not that good at naming :(
Usually I'll send a new version with updated naming, however, we must
merge this set ASAP now, perhaps can we live with this for now?
Thanks,
Kuai
>
> Kind regards,
>
> Paul
>
>
>> Signed-off-by: Yu Kuai <[email protected]>
>> Signed-off-by: Xiao Ni <[email protected]>
>> Acked-by: Mike Snitzer <[email protected]>
>> ---
>> drivers/md/md.c | 29 +++++++++++++++++++++++++++++
>> drivers/md/md.h | 3 +++
>> 2 files changed, 32 insertions(+)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 23f31fd1d024..22e7512a5b1e 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -4919,6 +4919,35 @@ static void stop_sync_thread(struct mddev
>> *mddev, bool locked, bool check_seq)
>> mddev_lock_nointr(mddev);
>> }
>> +void md_idle_sync_thread(struct mddev *mddev)
>> +{
>> + lockdep_assert_held(&mddev->reconfig_mutex);
>> +
>> + clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>> + stop_sync_thread(mddev, true, true);
>> +}
>> +EXPORT_SYMBOL_GPL(md_idle_sync_thread);
>> +
>> +void md_frozen_sync_thread(struct mddev *mddev)
>> +{
>> + lockdep_assert_held(&mddev->reconfig_mutex);
>> +
>> + set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>> + stop_sync_thread(mddev, true, false);
>> +}
>> +EXPORT_SYMBOL_GPL(md_frozen_sync_thread);
>> +
>> +void md_unfrozen_sync_thread(struct mddev *mddev)
>> +{
>> + lockdep_assert_held(&mddev->reconfig_mutex);
>> +
>> + clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>> + set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>> + md_wakeup_thread(mddev->thread);
>> + sysfs_notify_dirent_safe(mddev->sysfs_action);
>> +}
>> +EXPORT_SYMBOL_GPL(md_unfrozen_sync_thread);
>> +
>> static void idle_sync_thread(struct mddev *mddev)
>> {
>> mutex_lock(&mddev->sync_mutex);
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index 8d881cc59799..437ab70ce79b 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -781,6 +781,9 @@ extern void md_rdev_clear(struct md_rdev *rdev);
>> extern void md_handle_request(struct mddev *mddev, struct bio *bio);
>> extern int mddev_suspend(struct mddev *mddev, bool interruptible);
>> extern void mddev_resume(struct mddev *mddev);
>> +extern void md_idle_sync_thread(struct mddev *mddev);
>> +extern void md_frozen_sync_thread(struct mddev *mddev);
>> +extern void md_unfrozen_sync_thread(struct mddev *mddev);
>> extern void md_reload_sb(struct mddev *mddev, int raid_disk);
>> extern void md_update_sb(struct mddev *mddev, int force);
> .
>
Dear Kuai,
Am 05.03.24 um 09:13 schrieb Yu Kuai:
> 在 2024/03/05 16:08, Paul Menzel 写道:
>> Am 05.03.24 um 08:22 schrieb Yu Kuai:
>>> From: Yu Kuai <[email protected]>
>>>
>>> The new heleprs will be used in dm-raid in later patches to fix
>>
>> hel*pe*rs
>>
>>> regressions and prevent calling md_reap_sync_thread() directly.
>>
>> Please list the new functions.
>>
>> 1. md_idle_sync_thread(struct mddev *mddev);
>> 2. md_frozen_sync_thread(struct mddev *mddev);
>> 3. md_unfrozen_sync_thread(struct mddev *mddev);
>>
>> I do not like the naming so much. `md_reap_sync_thread()` has the verb
>> in imperative mood. At least myself, I would not know what the
>> functions do exactly without looking at the implementation.
>
> Thanks for the suggestions, I'm not that good at naming :(
>
> Usually I'll send a new version with updated naming, however, we must
> merge this set ASAP now, perhaps can we live with this for now?
Fine by me. Maybe when applying the typo can be fixed, and the naming
than later.
Kind regards,
Paul
>>> Signed-off-by: Yu Kuai <[email protected]>
>>> Signed-off-by: Xiao Ni <[email protected]>
>>> Acked-by: Mike Snitzer <[email protected]>
>>> ---
>>> drivers/md/md.c | 29 +++++++++++++++++++++++++++++
>>> drivers/md/md.h | 3 +++
>>> 2 files changed, 32 insertions(+)
>>>
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index 23f31fd1d024..22e7512a5b1e 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -4919,6 +4919,35 @@ static void stop_sync_thread(struct mddev
>>> *mddev, bool locked, bool check_seq)
>>> mddev_lock_nointr(mddev);
>>> }
>>> +void md_idle_sync_thread(struct mddev *mddev)
>>> +{
>>> + lockdep_assert_held(&mddev->reconfig_mutex);
>>> +
>>> + clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>>> + stop_sync_thread(mddev, true, true);
>>> +}
>>> +EXPORT_SYMBOL_GPL(md_idle_sync_thread);
>>> +
>>> +void md_frozen_sync_thread(struct mddev *mddev)
>>> +{
>>> + lockdep_assert_held(&mddev->reconfig_mutex);
>>> +
>>> + set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>>> + stop_sync_thread(mddev, true, false);
>>> +}
>>> +EXPORT_SYMBOL_GPL(md_frozen_sync_thread);
>>> +
>>> +void md_unfrozen_sync_thread(struct mddev *mddev)
>>> +{
>>> + lockdep_assert_held(&mddev->reconfig_mutex);
>>> +
>>> + clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>>> + set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>> + md_wakeup_thread(mddev->thread);
>>> + sysfs_notify_dirent_safe(mddev->sysfs_action);
>>> +}
>>> +EXPORT_SYMBOL_GPL(md_unfrozen_sync_thread);
>>> +
>>> static void idle_sync_thread(struct mddev *mddev)
>>> {
>>> mutex_lock(&mddev->sync_mutex);
>>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>>> index 8d881cc59799..437ab70ce79b 100644
>>> --- a/drivers/md/md.h
>>> +++ b/drivers/md/md.h
>>> @@ -781,6 +781,9 @@ extern void md_rdev_clear(struct md_rdev *rdev);
>>> extern void md_handle_request(struct mddev *mddev, struct bio *bio);
>>> extern int mddev_suspend(struct mddev *mddev, bool interruptible);
>>> extern void mddev_resume(struct mddev *mddev);
>>> +extern void md_idle_sync_thread(struct mddev *mddev);
>>> +extern void md_frozen_sync_thread(struct mddev *mddev);
>>> +extern void md_unfrozen_sync_thread(struct mddev *mddev);
>>> extern void md_reload_sb(struct mddev *mddev, int raid_disk);
>>> extern void md_update_sb(struct mddev *mddev, int force);
Hi Paul,
Thanks for reviewing the patch!
On Wed, Mar 6, 2024 at 7:10 AM Paul Menzel <[email protected]> wrote:
>
> Dear Kuai,
>
>
> Am 05.03.24 um 09:13 schrieb Yu Kuai:
>
> > 在 2024/03/05 16:08, Paul Menzel 写道:
>
> >> Am 05.03.24 um 08:22 schrieb Yu Kuai:
> >>> From: Yu Kuai <[email protected]>
> >>>
> >>> The new heleprs will be used in dm-raid in later patches to fix
> >>
> >> hel*pe*rs
> >>
> >>> regressions and prevent calling md_reap_sync_thread() directly.
> >>
> >> Please list the new functions.
> >>
> >> 1. md_idle_sync_thread(struct mddev *mddev);
> >> 2. md_frozen_sync_thread(struct mddev *mddev);
> >> 3. md_unfrozen_sync_thread(struct mddev *mddev);
> >>
> >> I do not like the naming so much. `md_reap_sync_thread()` has the verb
> >> in imperative mood. At least myself, I would not know what the
> >> functions do exactly without looking at the implementation.
> >
> > Thanks for the suggestions, I'm not that good at naming :(
> >
> > Usually I'll send a new version with updated naming, however, we must
> > merge this set ASAP now, perhaps can we live with this for now?
>
> Fine by me. Maybe when applying the typo can be fixed, and the naming
> than later.
Yes, I did exactly this when applying the patches, but forgot to mention it
here.
Song