2024-03-01 10:05:35

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 0/9] dm-raid, md/raid: fix v6.7 regressions part2

From: Yu Kuai <[email protected]>

link to part1: https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@mail.gmail.com/

part1 contains fixes for deadlocks for stopping sync_thread

This set contains fixes:
- reshape can start unexpected, cause data corruption, patch 1,5,6;
- deadlocks that reshape concurrent with IO, patch 8;
- a lockdep warning, patch 9;

I'm runing lvm2 tests with following scripts with a few rounds now,

for t in `ls test/shell`; do
if cat test/shell/$t | grep raid &> /dev/null; then
make check T=shell/$t
fi
done

There are no deadlock and no fs corrupt now, however, there are still four
failed tests:

### failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
### failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
### failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
### failed: [ndev-vanilla] shell/lvextend-raid.sh

And failed reasons are the same:

## ERROR: The test started dmeventd (147856) unexpectedly

I have no clue yet, and it seems other folks doesn't have this issue.

Yu Kuai (9):
md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
md: export helpers to stop sync_thread
md: export helper md_is_rdwr()
md: add a new helper reshape_interrupted()
dm-raid: really frozen sync_thread during suspend
md/dm-raid: don't call md_reap_sync_thread() directly
dm-raid: add a new helper prepare_suspend() in md_personality
dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io
concurrent with reshape
dm-raid: fix lockdep waring in "pers->hot_add_disk"

drivers/md/dm-raid.c | 93 ++++++++++++++++++++++++++++++++++----------
drivers/md/md.c | 73 ++++++++++++++++++++++++++--------
drivers/md/md.h | 38 +++++++++++++++++-
drivers/md/raid5.c | 32 ++++++++++++++-
4 files changed, 196 insertions(+), 40 deletions(-)

--
2.39.2



2024-03-01 10:05:46

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 4/9] md: add a new helper reshape_interrupted()

From: Yu Kuai <[email protected]>

The helper will be used for dm-raid456 later to detect the case that
reshape can't make progress.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.h | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index 09368517cc6c..b961c1b4ead7 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -570,6 +570,25 @@ static inline bool md_is_rdwr(struct mddev *mddev)
return (mddev->ro == MD_RDWR);
}

+static inline bool reshape_interrupted(struct mddev *mddev)
+{
+ /* reshape never start */
+ if (mddev->reshape_position == MaxSector)
+ return false;
+
+ /* interrupted */
+ if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+ return true;
+
+ /* running reshape will be interrupted soon. */
+ if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
+ test_bit(MD_RECOVERY_INTR, &mddev->recovery) ||
+ test_bit(MD_RECOVERY_FROZEN, &mddev->recovery))
+ return true;
+
+ return false;
+}
+
static inline int __must_check mddev_lock(struct mddev *mddev)
{
return mutex_lock_interruptible(&mddev->reconfig_mutex);
--
2.39.2


2024-03-01 10:06:00

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 3/9] md: export helper md_is_rdwr()

From: Yu Kuai <[email protected]>

There are no functional changes for now, prepare to fix a deadlock for
dm-raid456.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 12 ------------
drivers/md/md.h | 12 ++++++++++++
2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5f6496cf43f5..2fe8b937998b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -99,18 +99,6 @@ static void mddev_detach(struct mddev *mddev);
static void export_rdev(struct md_rdev *rdev, struct mddev *mddev);
static void md_wakeup_thread_directly(struct md_thread __rcu *thread);

-enum md_ro_state {
- MD_RDWR,
- MD_RDONLY,
- MD_AUTO_READ,
- MD_MAX_STATE
-};
-
-static bool md_is_rdwr(struct mddev *mddev)
-{
- return (mddev->ro == MD_RDWR);
-}
-
/*
* Default number of read corrections we'll attempt on an rdev
* before ejecting it from the array. We divide the read error
diff --git a/drivers/md/md.h b/drivers/md/md.h
index a6d33f10b107..09368517cc6c 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -558,6 +558,18 @@ enum recovery_flags {
MD_RESYNCING_REMOTE, /* remote node is running resync thread */
};

+enum md_ro_state {
+ MD_RDWR,
+ MD_RDONLY,
+ MD_AUTO_READ,
+ MD_MAX_STATE
+};
+
+static inline bool md_is_rdwr(struct mddev *mddev)
+{
+ return (mddev->ro == MD_RDWR);
+}
+
static inline int __must_check mddev_lock(struct mddev *mddev)
{
return mutex_lock_interruptible(&mddev->reconfig_mutex);
--
2.39.2


2024-03-01 10:06:52

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 5/9] dm-raid: really frozen sync_thread during suspend

From: Yu Kuai <[email protected]>

1) commit f52f5c71f3d4 ("md: fix stopping sync thread") remove
MD_RECOVERY_FROZEN from __md_stop_writes() and doesn't realize that
dm-raid relies on __md_stop_writes() to frozen sync_thread
indirectly. Fix this problem by adding MD_RECOVERY_FROZEN in
md_stop_writes(), and since stop_sync_thread() is only used for
dm-raid in this case, also move stop_sync_thread() to
md_stop_writes().
2) The flag MD_RECOVERY_FROZEN doesn't mean that sync thread is frozen,
it only prevent new sync_thread to start, and it can't stop the
running sync thread; In order to frozen sync_thread, after seting the
flag, stop_sync_thread() should be used.
3) The flag MD_RECOVERY_FROZEN doesn't mean that writes are stopped, use
it as condition for md_stop_writes() in raid_postsuspend() doesn't
look correct. Consider that reentrant stop_sync_thread() do nothing,
always call md_stop_writes() in raid_postsuspend().
4) raid_message can set/clear the flag MD_RECOVERY_FROZEN at anytime,
and if MD_RECOVERY_FROZEN is cleared while the array is suspended,
new sync_thread can start unexpected. Fix this by disallow
raid_message() to change sync_thread status during suspend.

Note that after commit f52f5c71f3d4 ("md: fix stopping sync thread"), the
test shell/lvconvert-raid-reshape.sh start to hang in stop_sync_thread(),
and with previous fixes, the test won't hang there anymore, however, the
test will still fail and complain that ext4 is corrupted. And with this
patch, the test won't hang due to stop_sync_thread() or fail due to ext4
is corrupted anymore. However, there is still a deadlock related to
dm-raid456 that will be fixed in following patches.

Reported-by: Mikulas Patocka <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Fixes: 1af2048a3e87 ("dm raid: fix deadlock caused by premature md_stop_writes()")
Fixes: 9dbd1aa3a81c ("dm raid: add reshaping support to the target")
Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/dm-raid.c | 25 +++++++++++++++----------
drivers/md/md.c | 3 ++-
2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 6bb1765be1e5..ac8b37fcf76f 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3240,11 +3240,12 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
rs->md.ro = 1;
rs->md.in_sync = 1;

- /* Keep array frozen until resume. */
- set_bit(MD_RECOVERY_FROZEN, &rs->md.recovery);
-
/* Has to be held on running the array */
mddev_suspend_and_lock_nointr(&rs->md);
+
+ /* Keep array frozen until resume. */
+ md_frozen_sync_thread(&rs->md);
+
r = md_run(&rs->md);
rs->md.in_sync = 0; /* Assume already marked dirty */
if (r) {
@@ -3722,6 +3723,9 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
if (!mddev->pers || !mddev->pers->sync_request)
return -EINVAL;

+ if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags))
+ return -EBUSY;
+
if (!strcasecmp(argv[0], "frozen"))
set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
else
@@ -3796,10 +3800,11 @@ static void raid_postsuspend(struct dm_target *ti)
struct raid_set *rs = ti->private;

if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
- /* Writes have to be stopped before suspending to avoid deadlocks. */
- if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
- md_stop_writes(&rs->md);
-
+ /*
+ * sync_thread must be stopped during suspend, and writes have
+ * to be stopped before suspending to avoid deadlocks.
+ */
+ md_stop_writes(&rs->md);
mddev_suspend(&rs->md, false);
}
}
@@ -4012,8 +4017,6 @@ static int raid_preresume(struct dm_target *ti)
}

/* Check for any resize/reshape on @rs and adjust/initiate */
- /* Be prepared for mddev_resume() in raid_resume() */
- set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
if (mddev->recovery_cp && mddev->recovery_cp < MaxSector) {
set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
mddev->resync_min = mddev->recovery_cp;
@@ -4055,10 +4058,12 @@ static void raid_resume(struct dm_target *ti)
if (mddev->delta_disks < 0)
rs_set_capacity(rs);

+ WARN_ON_ONCE(!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery));
+ WARN_ON_ONCE(test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
mddev_lock_nointr(mddev);
- clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
mddev->ro = 0;
mddev->in_sync = 0;
+ md_unfrozen_sync_thread(mddev);
mddev_unlock_and_resume(mddev);
}
}
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2fe8b937998b..94dff077fbf4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6326,7 +6326,6 @@ static void md_clean(struct mddev *mddev)

static void __md_stop_writes(struct mddev *mddev)
{
- stop_sync_thread(mddev, true, false);
del_timer_sync(&mddev->safemode_timer);

if (mddev->pers && mddev->pers->quiesce) {
@@ -6351,6 +6350,8 @@ static void __md_stop_writes(struct mddev *mddev)
void md_stop_writes(struct mddev *mddev)
{
mddev_lock_nointr(mddev);
+ set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+ stop_sync_thread(mddev, true, false);
__md_stop_writes(mddev);
mddev_unlock(mddev);
}
--
2.39.2


2024-03-01 10:06:57

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 6/9] md/dm-raid: don't call md_reap_sync_thread() directly

From: Yu Kuai <[email protected]>

Currently md_reap_sync_thread() is called from raid_message() directly
without holding 'reconfig_mutex', this is definitely unsafe because
md_reap_sync_thread() can change many fields that is protected by
'reconfig_mutex'.

However, hold 'reconfig_mutex' here is still problematic because this
will cause deadlock, for example, commit 130443d60b1b ("md: refactor
idle/frozen_sync_thread() to fix deadlock").

Fix this problem by using stop_sync_thread() to unregister sync_thread,
like md/raid did.

Fixes: be83651f0050 ("DM RAID: Add message/status support for changing sync action")
Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/dm-raid.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index ac8b37fcf76f..766a0334460e 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3719,6 +3719,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
{
struct raid_set *rs = ti->private;
struct mddev *mddev = &rs->md;
+ int ret = 0;

if (!mddev->pers || !mddev->pers->sync_request)
return -EINVAL;
@@ -3726,17 +3727,24 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags))
return -EBUSY;

- if (!strcasecmp(argv[0], "frozen"))
- set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
- else
- clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+ if (!strcasecmp(argv[0], "frozen")) {
+ ret = mddev_lock(mddev);
+ if (ret)
+ return ret;

- if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
- if (mddev->sync_thread) {
- set_bit(MD_RECOVERY_INTR, &mddev->recovery);
- md_reap_sync_thread(mddev);
- }
- } else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
+ md_frozen_sync_thread(mddev);
+ mddev_unlock(mddev);
+ } else if (!strcasecmp(argv[0], "idle")) {
+ ret = mddev_lock(mddev);
+ if (ret)
+ return ret;
+
+ md_idle_sync_thread(mddev);
+ mddev_unlock(mddev);
+ }
+
+ clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+ if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
return -EBUSY;
else if (!strcasecmp(argv[0], "resync"))
; /* MD_RECOVERY_NEEDED set below */
--
2.39.2


2024-03-01 10:06:59

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 9/9] dm-raid: fix lockdep waring in "pers->hot_add_disk"

From: Yu Kuai <[email protected]>

The lockdep assert is added by commit a448af25becf ("md/raid10: remove
rcu protection to access rdev from conf") in print_conf(). And I didn't
notice that dm-raid is calling "pers->hot_add_disk" without holding
'reconfig_mutex'.

"pers->hot_add_disk" read and write many fields that is protected by
'reconfig_mutex', and raid_resume() already grab the lock in other
contex. Hence fix this problem by protecting "pers->host_add_disk"
with the lock.

Fixes: 9092c02d9435 ("DM RAID: Add ability to restore transiently failed devices on resume")
Fixes: a448af25becf ("md/raid10: remove rcu protection to access rdev from conf")

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/dm-raid.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 64d381123ce3..97ad4a8582c4 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -4091,7 +4091,9 @@ static void raid_resume(struct dm_target *ti)
* Take this opportunity to check whether any failed
* devices are reachable again.
*/
+ mddev_lock_nointr(mddev);
attempt_restore_of_faulty_devices(rs);
+ mddev_unlock(mddev);
}

if (test_and_clear_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
--
2.39.2


2024-03-01 10:07:07

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 7/9] dm-raid: add a new helper prepare_suspend() in md_personality

From: Yu Kuai <[email protected]>

There are no functional changes for now, prepare to fix a deadlock for
dm-raid456.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/dm-raid.c | 18 ++++++++++++++++++
drivers/md/md.h | 1 +
2 files changed, 19 insertions(+)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 766a0334460e..002dcac20403 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3803,6 +3803,23 @@ static void raid_io_hints(struct dm_target *ti, struct queue_limits *limits)
blk_limits_io_opt(limits, chunk_size_bytes * mddev_data_stripes(rs));
}

+static void raid_presuspend(struct dm_target *ti)
+{
+ struct raid_set *rs = ti->private;
+ struct mddev *mddev = &rs->md;
+
+ if (!reshape_interrupted(mddev))
+ return;
+
+ /*
+ * For raid456, if reshape is interrupted, IO across reshape position
+ * will never make progress, while caller will wait for IO to be done.
+ * Inform raid456 to handle those IO to prevent deadlock.
+ */
+ if (mddev->pers && mddev->pers->prepare_suspend)
+ mddev->pers->prepare_suspend(mddev);
+}
+
static void raid_postsuspend(struct dm_target *ti)
{
struct raid_set *rs = ti->private;
@@ -4087,6 +4104,7 @@ static struct target_type raid_target = {
.message = raid_message,
.iterate_devices = raid_iterate_devices,
.io_hints = raid_io_hints,
+ .presuspend = raid_presuspend,
.postsuspend = raid_postsuspend,
.preresume = raid_preresume,
.resume = raid_resume,
diff --git a/drivers/md/md.h b/drivers/md/md.h
index b961c1b4ead7..23080727e75b 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -648,6 +648,7 @@ struct md_personality
int (*start_reshape) (struct mddev *mddev);
void (*finish_reshape) (struct mddev *mddev);
void (*update_reshape_pos) (struct mddev *mddev);
+ void (*prepare_suspend) (struct mddev *mddev);
/* quiesce suspends or resumes internal processing.
* 1 - stop new actions and wait for action io to complete
* 0 - return to normal behaviour
--
2.39.2


2024-03-01 10:07:22

by Yu Kuai

[permalink] [raw]
Subject: [PATCH -next 8/9] dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io concurrent with reshape

From: Yu Kuai <[email protected]>

For raid456, if reshape is still in progress, then IO across reshape
position will wait for reshape to make progress. However, for dm-raid,
in following cases reshape will never make progress hence IO will hang:

1) the array is read-only;
2) MD_RECOVERY_WAIT is set;
3) MD_RECOVERY_FROZEN is set;

After commit c467e97f079f ("md/raid6: use valid sector values to determine
if an I/O should wait on the reshape") fix the problem that IO across
reshape position doesn't wait for reshape, the dm-raid test
shell/lvconvert-raid-reshape.sh start to hang:

[root@fedora ~]# cat /proc/979/stack
[<0>] wait_woken+0x7d/0x90
[<0>] raid5_make_request+0x929/0x1d70 [raid456]
[<0>] md_handle_request+0xc2/0x3b0 [md_mod]
[<0>] raid_map+0x2c/0x50 [dm_raid]
[<0>] __map_bio+0x251/0x380 [dm_mod]
[<0>] dm_submit_bio+0x1f0/0x760 [dm_mod]
[<0>] __submit_bio+0xc2/0x1c0
[<0>] submit_bio_noacct_nocheck+0x17f/0x450
[<0>] submit_bio_noacct+0x2bc/0x780
[<0>] submit_bio+0x70/0xc0
[<0>] mpage_readahead+0x169/0x1f0
[<0>] blkdev_readahead+0x18/0x30
[<0>] read_pages+0x7c/0x3b0
[<0>] page_cache_ra_unbounded+0x1ab/0x280
[<0>] force_page_cache_ra+0x9e/0x130
[<0>] page_cache_sync_ra+0x3b/0x110
[<0>] filemap_get_pages+0x143/0xa30
[<0>] filemap_read+0xdc/0x4b0
[<0>] blkdev_read_iter+0x75/0x200
[<0>] vfs_read+0x272/0x460
[<0>] ksys_read+0x7a/0x170
[<0>] __x64_sys_read+0x1c/0x30
[<0>] do_syscall_64+0xc6/0x230
[<0>] entry_SYSCALL_64_after_hwframe+0x6c/0x74

This is because reshape can't make progress.

For md/raid, the problem doesn't exist because register new sync_thread
doesn't rely on the IO to be done any more:

1) If array is read-only, it can switch to read-write by ioctl/sysfs;
2) md/raid never set MD_RECOVERY_WAIT;
3) If MD_RECOVERY_FROZEN is set, mddev_suspend() doesn't hold
'reconfig_mutex', hence it can be cleared and reshape can continue by
sysfs api 'sync_action'.

However, I'm not sure yet how to avoid the problem in dm-raid yet. This
patch one the one hand make sure raid_message() can't change
sync_thread() through raid_message() after presuspend(), on the other
hand detect the above 3 cases before wait for IO do be done in
dm_suspend(), and let dm-raid requeue those IO.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/dm-raid.c | 22 ++++++++++++++++++++--
drivers/md/md.c | 24 ++++++++++++++++++++++--
drivers/md/md.h | 3 ++-
drivers/md/raid5.c | 32 ++++++++++++++++++++++++++++++--
4 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 002dcac20403..64d381123ce3 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -213,6 +213,7 @@ struct raid_dev {
#define RT_FLAG_RS_IN_SYNC 6
#define RT_FLAG_RS_RESYNCING 7
#define RT_FLAG_RS_GROW 8
+#define RT_FLAG_RS_FROZEN 9

/* Array elements of 64 bit needed for rebuild/failed disk bits */
#define DISKS_ARRAY_ELEMS ((MAX_RAID_DEVICES + (sizeof(uint64_t) * 8 - 1)) / sizeof(uint64_t) / 8)
@@ -3340,7 +3341,8 @@ static int raid_map(struct dm_target *ti, struct bio *bio)
if (unlikely(bio_end_sector(bio) > mddev->array_sectors))
return DM_MAPIO_REQUEUE;

- md_handle_request(mddev, bio);
+ if (unlikely(!md_handle_request(mddev, bio)))
+ return DM_MAPIO_REQUEUE;

return DM_MAPIO_SUBMITTED;
}
@@ -3724,7 +3726,8 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
if (!mddev->pers || !mddev->pers->sync_request)
return -EINVAL;

- if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags))
+ if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags) ||
+ test_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags))
return -EBUSY;

if (!strcasecmp(argv[0], "frozen")) {
@@ -3808,6 +3811,12 @@ static void raid_presuspend(struct dm_target *ti)
struct raid_set *rs = ti->private;
struct mddev *mddev = &rs->md;

+ /*
+ * From now on, disallow raid_message() to change sync_thread until
+ * resume, raid_postsuspend() is too late.
+ */
+ set_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
+
if (!reshape_interrupted(mddev))
return;

@@ -3820,6 +3829,13 @@ static void raid_presuspend(struct dm_target *ti)
mddev->pers->prepare_suspend(mddev);
}

+static void raid_presuspend_undo(struct dm_target *ti)
+{
+ struct raid_set *rs = ti->private;
+
+ clear_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
+}
+
static void raid_postsuspend(struct dm_target *ti)
{
struct raid_set *rs = ti->private;
@@ -4085,6 +4101,7 @@ static void raid_resume(struct dm_target *ti)

WARN_ON_ONCE(!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery));
WARN_ON_ONCE(test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
+ clear_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
mddev_lock_nointr(mddev);
mddev->ro = 0;
mddev->in_sync = 0;
@@ -4105,6 +4122,7 @@ static struct target_type raid_target = {
.iterate_devices = raid_iterate_devices,
.io_hints = raid_io_hints,
.presuspend = raid_presuspend,
+ .presuspend_undo = raid_presuspend_undo,
.postsuspend = raid_postsuspend,
.preresume = raid_preresume,
.resume = raid_resume,
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 94dff077fbf4..f37903786fc5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -366,7 +366,7 @@ static bool is_suspended(struct mddev *mddev, struct bio *bio)
return true;
}

-void md_handle_request(struct mddev *mddev, struct bio *bio)
+bool md_handle_request(struct mddev *mddev, struct bio *bio)
{
check_suspended:
if (is_suspended(mddev, bio)) {
@@ -374,7 +374,7 @@ void md_handle_request(struct mddev *mddev, struct bio *bio)
/* Bail out if REQ_NOWAIT is set for the bio */
if (bio->bi_opf & REQ_NOWAIT) {
bio_wouldblock_error(bio);
- return;
+ return true;
}
for (;;) {
prepare_to_wait(&mddev->sb_wait, &__wait,
@@ -390,10 +390,13 @@ void md_handle_request(struct mddev *mddev, struct bio *bio)

if (!mddev->pers->make_request(mddev, bio)) {
percpu_ref_put(&mddev->active_io);
+ if (!mddev->gendisk && mddev->pers->prepare_suspend)
+ return false;
goto check_suspended;
}

percpu_ref_put(&mddev->active_io);
+ return true;
}
EXPORT_SYMBOL(md_handle_request);

@@ -8738,6 +8741,23 @@ void md_account_bio(struct mddev *mddev, struct bio **bio)
}
EXPORT_SYMBOL_GPL(md_account_bio);

+void md_free_cloned_bio(struct bio *bio)
+{
+ struct md_io_clone *md_io_clone = bio->bi_private;
+ struct bio *orig_bio = md_io_clone->orig_bio;
+ struct mddev *mddev = md_io_clone->mddev;
+
+ if (bio->bi_status && !orig_bio->bi_status)
+ orig_bio->bi_status = bio->bi_status;
+
+ if (md_io_clone->start_time)
+ bio_end_io_acct(orig_bio, md_io_clone->start_time);
+
+ bio_put(bio);
+ percpu_ref_put(&mddev->active_io);
+}
+EXPORT_SYMBOL_GPL(md_free_cloned_bio);
+
/* md_allow_write(mddev)
* Calling this ensures that the array is marked 'active' so that writes
* may proceed without blocking. It is important to call this before
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 23080727e75b..588381fa25de 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -782,6 +782,7 @@ extern void md_finish_reshape(struct mddev *mddev);
void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
struct bio *bio, sector_t start, sector_t size);
void md_account_bio(struct mddev *mddev, struct bio **bio);
+void md_free_cloned_bio(struct bio *bio);

extern bool __must_check md_flush_request(struct mddev *mddev, struct bio *bio);
extern void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
@@ -810,7 +811,7 @@ extern void md_stop_writes(struct mddev *mddev);
extern int md_rdev_init(struct md_rdev *rdev);
extern void md_rdev_clear(struct md_rdev *rdev);

-extern void md_handle_request(struct mddev *mddev, struct bio *bio);
+extern bool md_handle_request(struct mddev *mddev, struct bio *bio);
extern int mddev_suspend(struct mddev *mddev, bool interruptible);
extern void mddev_resume(struct mddev *mddev);
extern void md_idle_sync_thread(struct mddev *mddev);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index edf368fdc733..f1c41fd0f636 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -760,6 +760,7 @@ enum stripe_result {
STRIPE_RETRY,
STRIPE_SCHEDULE_AND_RETRY,
STRIPE_FAIL,
+ STRIPE_WAIT_RESHAPE,
};

struct stripe_request_ctx {
@@ -5946,7 +5947,8 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
if (ahead_of_reshape(mddev, logical_sector,
conf->reshape_safe)) {
spin_unlock_irq(&conf->device_lock);
- return STRIPE_SCHEDULE_AND_RETRY;
+ ret = STRIPE_SCHEDULE_AND_RETRY;
+ goto out;
}
}
spin_unlock_irq(&conf->device_lock);
@@ -6025,6 +6027,12 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,

out_release:
raid5_release_stripe(sh);
+out:
+ if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) {
+ bi->bi_status = BLK_STS_RESOURCE;
+ ret = STRIPE_WAIT_RESHAPE;
+ pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress");
+ }
return ret;
}

@@ -6146,7 +6154,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
while (1) {
res = make_stripe_request(mddev, conf, &ctx, logical_sector,
bi);
- if (res == STRIPE_FAIL)
+ if (res == STRIPE_FAIL || res == STRIPE_WAIT_RESHAPE)
break;

if (res == STRIPE_RETRY)
@@ -6184,6 +6192,11 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)

if (rw == WRITE)
md_write_end(mddev);
+ if (res == STRIPE_WAIT_RESHAPE) {
+ md_free_cloned_bio(bi);
+ return false;
+ }
+
bio_endio(bi);
return true;
}
@@ -8907,6 +8920,18 @@ static int raid5_start(struct mddev *mddev)
return r5l_start(conf->log);
}

+/*
+ * This is only used for dm-raid456, caller already frozen sync_thread, hence
+ * if rehsape is still in progress, io that is waiting for reshape can never be
+ * done now, hence wake up and handle those IO.
+ */
+static void raid5_prepare_suspend(struct mddev *mddev)
+{
+ struct r5conf *conf = mddev->private;
+
+ wake_up(&conf->wait_for_overlap);
+}
+
static struct md_personality raid6_personality =
{
.name = "raid6",
@@ -8930,6 +8955,7 @@ static struct md_personality raid6_personality =
.quiesce = raid5_quiesce,
.takeover = raid6_takeover,
.change_consistency_policy = raid5_change_consistency_policy,
+ .prepare_suspend = raid5_prepare_suspend,
};
static struct md_personality raid5_personality =
{
@@ -8954,6 +8980,7 @@ static struct md_personality raid5_personality =
.quiesce = raid5_quiesce,
.takeover = raid5_takeover,
.change_consistency_policy = raid5_change_consistency_policy,
+ .prepare_suspend = raid5_prepare_suspend,
};

static struct md_personality raid4_personality =
@@ -8979,6 +9006,7 @@ static struct md_personality raid4_personality =
.quiesce = raid5_quiesce,
.takeover = raid4_takeover,
.change_consistency_policy = raid5_change_consistency_policy,
+ .prepare_suspend = raid5_prepare_suspend,
};

static int __init raid5_init(void)
--
2.39.2


2024-03-01 22:36:28

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH -next 0/9] dm-raid, md/raid: fix v6.7 regressions part2

On Fri, Mar 1, 2024 at 2:03 AM Yu Kuai <[email protected]> wrote:
>
> From: Yu Kuai <[email protected]>
>
> link to part1: https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@mail.gmail.com/
>
> part1 contains fixes for deadlocks for stopping sync_thread
>
> This set contains fixes:
> - reshape can start unexpected, cause data corruption, patch 1,5,6;
> - deadlocks that reshape concurrent with IO, patch 8;
> - a lockdep warning, patch 9;
>
> I'm runing lvm2 tests with following scripts with a few rounds now,
>
> for t in `ls test/shell`; do
> if cat test/shell/$t | grep raid &> /dev/null; then
> make check T=shell/$t
> fi
> done
>
> There are no deadlock and no fs corrupt now, however, there are still four
> failed tests:
>
> ### failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
> ### failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
> ### failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
> ### failed: [ndev-vanilla] shell/lvextend-raid.sh
>
> And failed reasons are the same:
>
> ## ERROR: The test started dmeventd (147856) unexpectedly
>
> I have no clue yet, and it seems other folks doesn't have this issue.
>
> Yu Kuai (9):
> md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
> md: export helpers to stop sync_thread
> md: export helper md_is_rdwr()
> md: add a new helper reshape_interrupted()
> dm-raid: really frozen sync_thread during suspend
> md/dm-raid: don't call md_reap_sync_thread() directly
> dm-raid: add a new helper prepare_suspend() in md_personality
> dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io
> concurrent with reshape
> dm-raid: fix lockdep waring in "pers->hot_add_disk"

This set looks good to me and passes the tests: reshape tests from
lvm2, mdadm tests, and the reboot test that catches some issue in
Xiao's version.

DM folks, please help review and test this set. If it looks good, we
can route it either via the md tree (I am thinking about md-6.8
branch) or the dm tree.

CC Jens,

I understand it is already late in the release cycle for 6.8 kernel.
Please let us know your thoughts on this set. These patches fixes
a crash when running lvm2 tests that are related to md-raid
reshape.

Thanks,
Song

2024-03-02 15:56:53

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH -next 0/9] dm-raid, md/raid: fix v6.7 regressions part2

On Fri, Mar 01 2024 at 5:36P -0500,
Song Liu <[email protected]> wrote:

> On Fri, Mar 1, 2024 at 2:03 AM Yu Kuai <[email protected]> wrote:
> >
> > From: Yu Kuai <[email protected]>
> >
> > link to part1: https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@mail.gmail.com/
> >
> > part1 contains fixes for deadlocks for stopping sync_thread
> >
> > This set contains fixes:
> > - reshape can start unexpected, cause data corruption, patch 1,5,6;
> > - deadlocks that reshape concurrent with IO, patch 8;
> > - a lockdep warning, patch 9;
> >
> > I'm runing lvm2 tests with following scripts with a few rounds now,
> >
> > for t in `ls test/shell`; do
> > if cat test/shell/$t | grep raid &> /dev/null; then
> > make check T=shell/$t
> > fi
> > done
> >
> > There are no deadlock and no fs corrupt now, however, there are still four
> > failed tests:
> >
> > ### failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
> > ### failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
> > ### failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
> > ### failed: [ndev-vanilla] shell/lvextend-raid.sh
> >
> > And failed reasons are the same:
> >
> > ## ERROR: The test started dmeventd (147856) unexpectedly
> >
> > I have no clue yet, and it seems other folks doesn't have this issue.
> >
> > Yu Kuai (9):
> > md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
> > md: export helpers to stop sync_thread
> > md: export helper md_is_rdwr()
> > md: add a new helper reshape_interrupted()
> > dm-raid: really frozen sync_thread during suspend
> > md/dm-raid: don't call md_reap_sync_thread() directly
> > dm-raid: add a new helper prepare_suspend() in md_personality
> > dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io
> > concurrent with reshape
> > dm-raid: fix lockdep waring in "pers->hot_add_disk"
>
> This set looks good to me and passes the tests: reshape tests from
> lvm2, mdadm tests, and the reboot test that catches some issue in
> Xiao's version.
>
> DM folks, please help review and test this set. If it looks good, we
> can route it either via the md tree (I am thinking about md-6.8
> branch) or the dm tree.

Please send these changes through md-6.8.

There are a few typos in patch subjects and headers but:

Acked-by: Mike Snitzer <[email protected]>

> CC Jens,
>
> I understand it is already late in the release cycle for 6.8 kernel.
> Please let us know your thoughts on this set. These patches fixes
> a crash when running lvm2 tests that are related to md-raid
> reshape.

Would be good to get these into 6.8, but worst case if they slip to
the 6.9 merge is they'll go to relevant stable kernels (due to
"Fixes:" tags, though not all commits have Fixes).

Mike

2024-03-03 13:18:04

by Xiao Ni

[permalink] [raw]
Subject: Re: [PATCH -next 0/9] dm-raid, md/raid: fix v6.7 regressions part2

Hi all

There is a error report from lvm regression tests. The case is
lvconvert-raid-reshape-stripes-load-reload.sh. I saw this error when I
tried to fix dmraid regression problems too. In my patch set, after
reverting ad39c08186f8a0f221337985036ba86731d6aafe (md: Don't register
sync_thread for reshape directly), this problem doesn't appear.

I put the log in the attachment.

On Fri, Mar 1, 2024 at 6:03 PM Yu Kuai <[email protected]> wrote:
>
> From: Yu Kuai <[email protected]>
>
> link to part1: https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@mail.gmail.com/
>
> part1 contains fixes for deadlocks for stopping sync_thread
>
> This set contains fixes:
> - reshape can start unexpected, cause data corruption, patch 1,5,6;
> - deadlocks that reshape concurrent with IO, patch 8;
> - a lockdep warning, patch 9;
>
> I'm runing lvm2 tests with following scripts with a few rounds now,
>
> for t in `ls test/shell`; do
> if cat test/shell/$t | grep raid &> /dev/null; then
> make check T=shell/$t
> fi
> done
>
> There are no deadlock and no fs corrupt now, however, there are still four
> failed tests:
>
> ### failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
> ### failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
> ### failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
> ### failed: [ndev-vanilla] shell/lvextend-raid.sh
>
> And failed reasons are the same:
>
> ## ERROR: The test started dmeventd (147856) unexpectedly
>
> I have no clue yet, and it seems other folks doesn't have this issue.
>
> Yu Kuai (9):
> md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
> md: export helpers to stop sync_thread
> md: export helper md_is_rdwr()
> md: add a new helper reshape_interrupted()
> dm-raid: really frozen sync_thread during suspend
> md/dm-raid: don't call md_reap_sync_thread() directly
> dm-raid: add a new helper prepare_suspend() in md_personality
> dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io
> concurrent with reshape
> dm-raid: fix lockdep waring in "pers->hot_add_disk"
>
> drivers/md/dm-raid.c | 93 ++++++++++++++++++++++++++++++++++----------
> drivers/md/md.c | 73 ++++++++++++++++++++++++++--------
> drivers/md/md.h | 38 +++++++++++++++++-
> drivers/md/raid5.c | 32 ++++++++++++++-
> 4 files changed, 196 insertions(+), 40 deletions(-)
>
> --
> 2.39.2
>


Attachments:
shell_lvconvert-raid-reshape-stripes-load-reload.sh.txt (159.47 kB)

2024-03-04 01:07:45

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH -next 0/9] dm-raid, md/raid: fix v6.7 regressions part2

Hi,

在 2024/03/03 21:16, Xiao Ni 写道:
> Hi all
>
> There is a error report from lvm regression tests. The case is
> lvconvert-raid-reshape-stripes-load-reload.sh. I saw this error when I
> tried to fix dmraid regression problems too. In my patch set, after
> reverting ad39c08186f8a0f221337985036ba86731d6aafe (md: Don't register
> sync_thread for reshape directly), this problem doesn't appear.

How often did you see this tes failed? I'm running the tests for over
two days now, for 30+ rounds, and this test never fail in my VM.

Thanks,
Kuai

>
> I put the log in the attachment.
>
> On Fri, Mar 1, 2024 at 6:03 PM Yu Kuai <[email protected]> wrote:
>>
>> From: Yu Kuai <[email protected]>
>>
>> link to part1: https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@mail.gmail.com/
>>
>> part1 contains fixes for deadlocks for stopping sync_thread
>>
>> This set contains fixes:
>> - reshape can start unexpected, cause data corruption, patch 1,5,6;
>> - deadlocks that reshape concurrent with IO, patch 8;
>> - a lockdep warning, patch 9;
>>
>> I'm runing lvm2 tests with following scripts with a few rounds now,
>>
>> for t in `ls test/shell`; do
>> if cat test/shell/$t | grep raid &> /dev/null; then
>> make check T=shell/$t
>> fi
>> done
>>
>> There are no deadlock and no fs corrupt now, however, there are still four
>> failed tests:
>>
>> ### failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
>> ### failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
>> ### failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
>> ### failed: [ndev-vanilla] shell/lvextend-raid.sh
>>
>> And failed reasons are the same:
>>
>> ## ERROR: The test started dmeventd (147856) unexpectedly
>>
>> I have no clue yet, and it seems other folks doesn't have this issue.
>>
>> Yu Kuai (9):
>> md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
>> md: export helpers to stop sync_thread
>> md: export helper md_is_rdwr()
>> md: add a new helper reshape_interrupted()
>> dm-raid: really frozen sync_thread during suspend
>> md/dm-raid: don't call md_reap_sync_thread() directly
>> dm-raid: add a new helper prepare_suspend() in md_personality
>> dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io
>> concurrent with reshape
>> dm-raid: fix lockdep waring in "pers->hot_add_disk"
>>
>> drivers/md/dm-raid.c | 93 ++++++++++++++++++++++++++++++++++----------
>> drivers/md/md.c | 73 ++++++++++++++++++++++++++--------
>> drivers/md/md.h | 38 +++++++++++++++++-
>> drivers/md/raid5.c | 32 ++++++++++++++-
>> 4 files changed, 196 insertions(+), 40 deletions(-)
>>
>> --
>> 2.39.2
>>


2024-03-04 01:24:18

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH -next 0/9] dm-raid, md/raid: fix v6.7 regressions part2

Hi,

在 2024/03/04 9:07, Yu Kuai 写道:
> Hi,
>
> 在 2024/03/03 21:16, Xiao Ni 写道:
>> Hi all
>>
>> There is a error report from lvm regression tests. The case is
>> lvconvert-raid-reshape-stripes-load-reload.sh. I saw this error when I
>> tried to fix dmraid regression problems too. In my patch set,  after
>> reverting ad39c08186f8a0f221337985036ba86731d6aafe (md: Don't register
>> sync_thread for reshape directly), this problem doesn't appear.
>
> How often did you see this tes failed? I'm running the tests for over
> two days now, for 30+ rounds, and this test never fail in my VM.

Take a quick look, there is still a path from raid10 that
MD_RECOVERY_FROZEN can be cleared, and in theroy this problem can be
triggered. Can you test the following patch on the top of this set?
I'll keep running the test myself.

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a5f8419e2df1..7ca29469123a 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4575,7 +4575,8 @@ static int raid10_start_reshape(struct mddev *mddev)
return 0;

abort:
- mddev->recovery = 0;
+ if (mddev->gendisk)
+ mddev->recovery = 0;
spin_lock_irq(&conf->device_lock);
conf->geo = conf->prev;
mddev->raid_disks = conf->geo.raid_disks;

Thanks,
Kuai
>
> Thanks,
> Kuai
>
>>
>> I put the log in the attachment.
>>
>> On Fri, Mar 1, 2024 at 6:03 PM Yu Kuai <[email protected]> wrote:
>>>
>>> From: Yu Kuai <[email protected]>
>>>
>>> link to part1:
>>> https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@mail.gmail.com/
>>>
>>>
>>> part1 contains fixes for deadlocks for stopping sync_thread
>>>
>>> This set contains fixes:
>>>   - reshape can start unexpected, cause data corruption, patch 1,5,6;
>>>   - deadlocks that reshape concurrent with IO, patch 8;
>>>   - a lockdep warning, patch 9;
>>>
>>> I'm runing lvm2 tests with following scripts with a few rounds now,
>>>
>>> for t in `ls test/shell`; do
>>>          if cat test/shell/$t | grep raid &> /dev/null; then
>>>                  make check T=shell/$t
>>>          fi
>>> done
>>>
>>> There are no deadlock and no fs corrupt now, however, there are still
>>> four
>>> failed tests:
>>>
>>> ###       failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
>>> ###       failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
>>> ###       failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
>>> ###       failed: [ndev-vanilla] shell/lvextend-raid.sh
>>>
>>> And failed reasons are the same:
>>>
>>> ## ERROR: The test started dmeventd (147856) unexpectedly
>>>
>>> I have no clue yet, and it seems other folks doesn't have this issue.
>>>
>>> Yu Kuai (9):
>>>    md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
>>>    md: export helpers to stop sync_thread
>>>    md: export helper md_is_rdwr()
>>>    md: add a new helper reshape_interrupted()
>>>    dm-raid: really frozen sync_thread during suspend
>>>    md/dm-raid: don't call md_reap_sync_thread() directly
>>>    dm-raid: add a new helper prepare_suspend() in md_personality
>>>    dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io
>>>      concurrent with reshape
>>>    dm-raid: fix lockdep waring in "pers->hot_add_disk"
>>>
>>>   drivers/md/dm-raid.c | 93 ++++++++++++++++++++++++++++++++++----------
>>>   drivers/md/md.c      | 73 ++++++++++++++++++++++++++--------
>>>   drivers/md/md.h      | 38 +++++++++++++++++-
>>>   drivers/md/raid5.c   | 32 ++++++++++++++-
>>>   4 files changed, 196 insertions(+), 40 deletions(-)
>>>
>>> --
>>> 2.39.2
>>>
>
>
> .
>


2024-03-04 01:26:26

by Xiao Ni

[permalink] [raw]
Subject: Re: [PATCH -next 0/9] dm-raid, md/raid: fix v6.7 regressions part2

On Mon, Mar 4, 2024 at 9:24 AM Yu Kuai <[email protected]> wrote:
>
> Hi,
>
> 在 2024/03/04 9:07, Yu Kuai 写道:
> > Hi,
> >
> > 在 2024/03/03 21:16, Xiao Ni 写道:
> >> Hi all
> >>
> >> There is a error report from lvm regression tests. The case is
> >> lvconvert-raid-reshape-stripes-load-reload.sh. I saw this error when I
> >> tried to fix dmraid regression problems too. In my patch set, after
> >> reverting ad39c08186f8a0f221337985036ba86731d6aafe (md: Don't register
> >> sync_thread for reshape directly), this problem doesn't appear.
> >

Hi Kuai
> > How often did you see this tes failed? I'm running the tests for over
> > two days now, for 30+ rounds, and this test never fail in my VM.

I ran 5 times and it failed 2 times just now.

>
> Take a quick look, there is still a path from raid10 that
> MD_RECOVERY_FROZEN can be cleared, and in theroy this problem can be
> triggered. Can you test the following patch on the top of this set?
> I'll keep running the test myself.

Sure, I'll give the result later.

Regards
Xiao
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index a5f8419e2df1..7ca29469123a 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -4575,7 +4575,8 @@ static int raid10_start_reshape(struct mddev *mddev)
> return 0;
>
> abort:
> - mddev->recovery = 0;
> + if (mddev->gendisk)
> + mddev->recovery = 0;
> spin_lock_irq(&conf->device_lock);
> conf->geo = conf->prev;
> mddev->raid_disks = conf->geo.raid_disks;
>
> Thanks,
> Kuai
> >
> > Thanks,
> > Kuai
> >
> >>
> >> I put the log in the attachment.
> >>
> >> On Fri, Mar 1, 2024 at 6:03 PM Yu Kuai <[email protected]> wrote:
> >>>
> >>> From: Yu Kuai <[email protected]>
> >>>
> >>> link to part1:
> >>> https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@mail.gmail.com/
> >>>
> >>>
> >>> part1 contains fixes for deadlocks for stopping sync_thread
> >>>
> >>> This set contains fixes:
> >>> - reshape can start unexpected, cause data corruption, patch 1,5,6;
> >>> - deadlocks that reshape concurrent with IO, patch 8;
> >>> - a lockdep warning, patch 9;
> >>>
> >>> I'm runing lvm2 tests with following scripts with a few rounds now,
> >>>
> >>> for t in `ls test/shell`; do
> >>> if cat test/shell/$t | grep raid &> /dev/null; then
> >>> make check T=shell/$t
> >>> fi
> >>> done
> >>>
> >>> There are no deadlock and no fs corrupt now, however, there are still
> >>> four
> >>> failed tests:
> >>>
> >>> ### failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
> >>> ### failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
> >>> ### failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
> >>> ### failed: [ndev-vanilla] shell/lvextend-raid.sh
> >>>
> >>> And failed reasons are the same:
> >>>
> >>> ## ERROR: The test started dmeventd (147856) unexpectedly
> >>>
> >>> I have no clue yet, and it seems other folks doesn't have this issue.
> >>>
> >>> Yu Kuai (9):
> >>> md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
> >>> md: export helpers to stop sync_thread
> >>> md: export helper md_is_rdwr()
> >>> md: add a new helper reshape_interrupted()
> >>> dm-raid: really frozen sync_thread during suspend
> >>> md/dm-raid: don't call md_reap_sync_thread() directly
> >>> dm-raid: add a new helper prepare_suspend() in md_personality
> >>> dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io
> >>> concurrent with reshape
> >>> dm-raid: fix lockdep waring in "pers->hot_add_disk"
> >>>
> >>> drivers/md/dm-raid.c | 93 ++++++++++++++++++++++++++++++++++----------
> >>> drivers/md/md.c | 73 ++++++++++++++++++++++++++--------
> >>> drivers/md/md.h | 38 +++++++++++++++++-
> >>> drivers/md/raid5.c | 32 ++++++++++++++-
> >>> 4 files changed, 196 insertions(+), 40 deletions(-)
> >>>
> >>> --
> >>> 2.39.2
> >>>
> >
> >
> > .
> >
>


2024-03-04 08:28:29

by Xiao Ni

[permalink] [raw]
Subject: Re: [PATCH -next 0/9] dm-raid, md/raid: fix v6.7 regressions part2

On Mon, Mar 4, 2024 at 9:25 AM Xiao Ni <[email protected]> wrote:
>
> On Mon, Mar 4, 2024 at 9:24 AM Yu Kuai <[email protected]> wrote:
> >
> > Hi,
> >
> > 在 2024/03/04 9:07, Yu Kuai 写道:
> > > Hi,
> > >
> > > 在 2024/03/03 21:16, Xiao Ni 写道:
> > >> Hi all
> > >>
> > >> There is a error report from lvm regression tests. The case is
> > >> lvconvert-raid-reshape-stripes-load-reload.sh. I saw this error when I
> > >> tried to fix dmraid regression problems too. In my patch set, after
> > >> reverting ad39c08186f8a0f221337985036ba86731d6aafe (md: Don't register
> > >> sync_thread for reshape directly), this problem doesn't appear.
> > >
>
> Hi Kuai
> > > How often did you see this tes failed? I'm running the tests for over
> > > two days now, for 30+ rounds, and this test never fail in my VM.
>
> I ran 5 times and it failed 2 times just now.
>
> >
> > Take a quick look, there is still a path from raid10 that
> > MD_RECOVERY_FROZEN can be cleared, and in theroy this problem can be
> > triggered. Can you test the following patch on the top of this set?
> > I'll keep running the test myself.
>
> Sure, I'll give the result later.

Hi all

It's not stable to reproduce this. After applying this raid10 patch it
failed once 28 times. Without the raid10 patch, it failed once 30
times, but it failed frequently this morning.

Regards
Xiao
>
> Regards
> Xiao
> >
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index a5f8419e2df1..7ca29469123a 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -4575,7 +4575,8 @@ static int raid10_start_reshape(struct mddev *mddev)
> > return 0;
> >
> > abort:
> > - mddev->recovery = 0;
> > + if (mddev->gendisk)
> > + mddev->recovery = 0;
> > spin_lock_irq(&conf->device_lock);
> > conf->geo = conf->prev;
> > mddev->raid_disks = conf->geo.raid_disks;
> >
> > Thanks,
> > Kuai
> > >
> > > Thanks,
> > > Kuai
> > >
> > >>
> > >> I put the log in the attachment.
> > >>
> > >> On Fri, Mar 1, 2024 at 6:03 PM Yu Kuai <[email protected]> wrote:
> > >>>
> > >>> From: Yu Kuai <[email protected]>
> > >>>
> > >>> link to part1:
> > >>> https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@mail.gmail.com/
> > >>>
> > >>>
> > >>> part1 contains fixes for deadlocks for stopping sync_thread
> > >>>
> > >>> This set contains fixes:
> > >>> - reshape can start unexpected, cause data corruption, patch 1,5,6;
> > >>> - deadlocks that reshape concurrent with IO, patch 8;
> > >>> - a lockdep warning, patch 9;
> > >>>
> > >>> I'm runing lvm2 tests with following scripts with a few rounds now,
> > >>>
> > >>> for t in `ls test/shell`; do
> > >>> if cat test/shell/$t | grep raid &> /dev/null; then
> > >>> make check T=shell/$t
> > >>> fi
> > >>> done
> > >>>
> > >>> There are no deadlock and no fs corrupt now, however, there are still
> > >>> four
> > >>> failed tests:
> > >>>
> > >>> ### failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
> > >>> ### failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
> > >>> ### failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
> > >>> ### failed: [ndev-vanilla] shell/lvextend-raid.sh
> > >>>
> > >>> And failed reasons are the same:
> > >>>
> > >>> ## ERROR: The test started dmeventd (147856) unexpectedly
> > >>>
> > >>> I have no clue yet, and it seems other folks doesn't have this issue.
> > >>>
> > >>> Yu Kuai (9):
> > >>> md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
> > >>> md: export helpers to stop sync_thread
> > >>> md: export helper md_is_rdwr()
> > >>> md: add a new helper reshape_interrupted()
> > >>> dm-raid: really frozen sync_thread during suspend
> > >>> md/dm-raid: don't call md_reap_sync_thread() directly
> > >>> dm-raid: add a new helper prepare_suspend() in md_personality
> > >>> dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io
> > >>> concurrent with reshape
> > >>> dm-raid: fix lockdep waring in "pers->hot_add_disk"
> > >>>
> > >>> drivers/md/dm-raid.c | 93 ++++++++++++++++++++++++++++++++++----------
> > >>> drivers/md/md.c | 73 ++++++++++++++++++++++++++--------
> > >>> drivers/md/md.h | 38 +++++++++++++++++-
> > >>> drivers/md/raid5.c | 32 ++++++++++++++-
> > >>> 4 files changed, 196 insertions(+), 40 deletions(-)
> > >>>
> > >>> --
> > >>> 2.39.2
> > >>>
> > >
> > >
> > > .
> > >
> >


2024-03-04 11:06:50

by Xiao Ni

[permalink] [raw]
Subject: Re: [PATCH -next 0/9] dm-raid, md/raid: fix v6.7 regressions part2

On Mon, Mar 4, 2024 at 4:27 PM Xiao Ni <[email protected]> wrote:
>
> On Mon, Mar 4, 2024 at 9:25 AM Xiao Ni <[email protected]> wrote:
> >
> > On Mon, Mar 4, 2024 at 9:24 AM Yu Kuai <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > 在 2024/03/04 9:07, Yu Kuai 写道:
> > > > Hi,
> > > >
> > > > 在 2024/03/03 21:16, Xiao Ni 写道:
> > > >> Hi all
> > > >>
> > > >> There is a error report from lvm regression tests. The case is
> > > >> lvconvert-raid-reshape-stripes-load-reload.sh. I saw this error when I
> > > >> tried to fix dmraid regression problems too. In my patch set, after
> > > >> reverting ad39c08186f8a0f221337985036ba86731d6aafe (md: Don't register
> > > >> sync_thread for reshape directly), this problem doesn't appear.
> > > >
> >
> > Hi Kuai
> > > > How often did you see this tes failed? I'm running the tests for over
> > > > two days now, for 30+ rounds, and this test never fail in my VM.
> >
> > I ran 5 times and it failed 2 times just now.
> >
> > >
> > > Take a quick look, there is still a path from raid10 that
> > > MD_RECOVERY_FROZEN can be cleared, and in theroy this problem can be
> > > triggered. Can you test the following patch on the top of this set?
> > > I'll keep running the test myself.
> >
> > Sure, I'll give the result later.
>
> Hi all
>
> It's not stable to reproduce this. After applying this raid10 patch it
> failed once 28 times. Without the raid10 patch, it failed once 30
> times, but it failed frequently this morning.

Hi all

After running 152 times with kernel 6.6, the problem can appear too.
So it can return the state of 6.6. This patch set can make this
problem appear quickly.

Best Regards
Xiao


>
> Regards
> Xiao
> >
> > Regards
> > Xiao
> > >
> > > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > > index a5f8419e2df1..7ca29469123a 100644
> > > --- a/drivers/md/raid10.c
> > > +++ b/drivers/md/raid10.c
> > > @@ -4575,7 +4575,8 @@ static int raid10_start_reshape(struct mddev *mddev)
> > > return 0;
> > >
> > > abort:
> > > - mddev->recovery = 0;
> > > + if (mddev->gendisk)
> > > + mddev->recovery = 0;
> > > spin_lock_irq(&conf->device_lock);
> > > conf->geo = conf->prev;
> > > mddev->raid_disks = conf->geo.raid_disks;
> > >
> > > Thanks,
> > > Kuai
> > > >
> > > > Thanks,
> > > > Kuai
> > > >
> > > >>
> > > >> I put the log in the attachment.
> > > >>
> > > >> On Fri, Mar 1, 2024 at 6:03 PM Yu Kuai <[email protected]> wrote:
> > > >>>
> > > >>> From: Yu Kuai <[email protected]>
> > > >>>
> > > >>> link to part1:
> > > >>> https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@mail.gmail.com/
> > > >>>
> > > >>>
> > > >>> part1 contains fixes for deadlocks for stopping sync_thread
> > > >>>
> > > >>> This set contains fixes:
> > > >>> - reshape can start unexpected, cause data corruption, patch 1,5,6;
> > > >>> - deadlocks that reshape concurrent with IO, patch 8;
> > > >>> - a lockdep warning, patch 9;
> > > >>>
> > > >>> I'm runing lvm2 tests with following scripts with a few rounds now,
> > > >>>
> > > >>> for t in `ls test/shell`; do
> > > >>> if cat test/shell/$t | grep raid &> /dev/null; then
> > > >>> make check T=shell/$t
> > > >>> fi
> > > >>> done
> > > >>>
> > > >>> There are no deadlock and no fs corrupt now, however, there are still
> > > >>> four
> > > >>> failed tests:
> > > >>>
> > > >>> ### failed: [ndev-vanilla] shell/lvchange-raid1-writemostlysh
> > > >>> ### failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
> > > >>> ### failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
> > > >>> ### failed: [ndev-vanilla] shell/lvextend-raid.sh
> > > >>>
> > > >>> And failed reasons are the same:
> > > >>>
> > > >>> ## ERROR: The test started dmeventd (147856) unexpectedly
> > > >>>
> > > >>> I have no clue yet, and it seems other folks doesn't have this issue.
> > > >>>
> > > >>> Yu Kuai (9):
> > > >>> md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
> > > >>> md: export helpers to stop sync_thread
> > > >>> md: export helper md_is_rdwr()
> > > >>> md: add a new helper reshape_interrupted()
> > > >>> dm-raid: really frozen sync_thread during suspend
> > > >>> md/dm-raid: don't call md_reap_sync_thread() directly
> > > >>> dm-raid: add a new helper prepare_suspend() in md_personality
> > > >>> dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io
> > > >>> concurrent with reshape
> > > >>> dm-raid: fix lockdep waring in "pers->hot_add_disk"
> > > >>>
> > > >>> drivers/md/dm-raid.c | 93 ++++++++++++++++++++++++++++++++++----------
> > > >>> drivers/md/md.c | 73 ++++++++++++++++++++++++++--------
> > > >>> drivers/md/md.h | 38 +++++++++++++++++-
> > > >>> drivers/md/raid5.c | 32 ++++++++++++++-
> > > >>> 4 files changed, 196 insertions(+), 40 deletions(-)
> > > >>>
> > > >>> --
> > > >>> 2.39.2
> > > >>>
> > > >
> > > >
> > > > .
> > > >
> > >


2024-03-04 11:53:17

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH -next 0/9] dm-raid, md/raid: fix v6.7 regressions part2

Hi,

在 2024/03/04 19:06, Xiao Ni 写道:
> On Mon, Mar 4, 2024 at 4:27 PM Xiao Ni <[email protected]> wrote:
>>
>> On Mon, Mar 4, 2024 at 9:25 AM Xiao Ni <[email protected]> wrote:
>>>
>>> On Mon, Mar 4, 2024 at 9:24 AM Yu Kuai <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> 在 2024/03/04 9:07, Yu Kuai 写道:
>>>>> Hi,
>>>>>
>>>>> 在 2024/03/03 21:16, Xiao Ni 写道:
>>>>>> Hi all
>>>>>>
>>>>>> There is a error report from lvm regression tests. The case is
>>>>>> lvconvert-raid-reshape-stripes-load-reload.sh. I saw this error when I
>>>>>> tried to fix dmraid regression problems too. In my patch set, after
>>>>>> reverting ad39c08186f8a0f221337985036ba86731d6aafe (md: Don't register
>>>>>> sync_thread for reshape directly), this problem doesn't appear.
>>>>>
>>>
>>> Hi Kuai
>>>>> How often did you see this tes failed? I'm running the tests for over
>>>>> two days now, for 30+ rounds, and this test never fail in my VM.
>>>
>>> I ran 5 times and it failed 2 times just now.
>>>
>>>>
>>>> Take a quick look, there is still a path from raid10 that
>>>> MD_RECOVERY_FROZEN can be cleared, and in theroy this problem can be
>>>> triggered. Can you test the following patch on the top of this set?
>>>> I'll keep running the test myself.
>>>
>>> Sure, I'll give the result later.
>>
>> Hi all
>>
>> It's not stable to reproduce this. After applying this raid10 patch it
>> failed once 28 times. Without the raid10 patch, it failed once 30
>> times, but it failed frequently this morning.
>
> Hi all
>
> After running 152 times with kernel 6.6, the problem can appear too.
> So it can return the state of 6.6. This patch set can make this
> problem appear quickly.

I verified in my VM that after testing 100+ times, this problem can both
triggered with v6.6 and v6.8-rc5 + this set.

I think we can merge this patchset, and figure out why the test can fail
later.

Thanks,
Kuai


>
> Best Regards
> Xiao
>
>
>>
>> Regards
>> Xiao
>>>
>>> Regards
>>> Xiao
>>>>
>>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>>>> index a5f8419e2df1..7ca29469123a 100644
>>>> --- a/drivers/md/raid10.c
>>>> +++ b/drivers/md/raid10.c
>>>> @@ -4575,7 +4575,8 @@ static int raid10_start_reshape(struct mddev *mddev)
>>>> return 0;
>>>>
>>>> abort:
>>>> - mddev->recovery = 0;
>>>> + if (mddev->gendisk)
>>>> + mddev->recovery = 0;
>>>> spin_lock_irq(&conf->device_lock);
>>>> conf->geo = conf->prev;
>>>> mddev->raid_disks = conf->geo.raid_disks;
>>>>
>>>> Thanks,
>>>> Kuai
>>>>>
>>>>> Thanks,
>>>>> Kuai
>>>>>
>>>>>>
>>>>>> I put the log in the attachment.
>>>>>>
>>>>>> On Fri, Mar 1, 2024 at 6:03 PM Yu Kuai <[email protected]> wrote:
>>>>>>>
>>>>>>> From: Yu Kuai <[email protected]>
>>>>>>>
>>>>>>> link to part1:
>>>>>>> https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@mail.gmail.com/
>>>>>>>
>>>>>>>
>>>>>>> part1 contains fixes for deadlocks for stopping sync_thread
>>>>>>>
>>>>>>> This set contains fixes:
>>>>>>> - reshape can start unexpected, cause data corruption, patch 1,5,6;
>>>>>>> - deadlocks that reshape concurrent with IO, patch 8;
>>>>>>> - a lockdep warning, patch 9;
>>>>>>>
>>>>>>> I'm runing lvm2 tests with following scripts with a few rounds now,
>>>>>>>
>>>>>>> for t in `ls test/shell`; do
>>>>>>> if cat test/shell/$t | grep raid &> /dev/null; then
>>>>>>> make check T=shell/$t
>>>>>>> fi
>>>>>>> done
>>>>>>>
>>>>>>> There are no deadlock and no fs corrupt now, however, there are still
>>>>>>> four
>>>>>>> failed tests:
>>>>>>>
>>>>>>> ### failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
>>>>>>> ### failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
>>>>>>> ### failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
>>>>>>> ### failed: [ndev-vanilla] shell/lvextend-raid.sh
>>>>>>>
>>>>>>> And failed reasons are the same:
>>>>>>>
>>>>>>> ## ERROR: The test started dmeventd (147856) unexpectedly
>>>>>>>
>>>>>>> I have no clue yet, and it seems other folks doesn't have this issue.
>>>>>>>
>>>>>>> Yu Kuai (9):
>>>>>>> md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
>>>>>>> md: export helpers to stop sync_thread
>>>>>>> md: export helper md_is_rdwr()
>>>>>>> md: add a new helper reshape_interrupted()
>>>>>>> dm-raid: really frozen sync_thread during suspend
>>>>>>> md/dm-raid: don't call md_reap_sync_thread() directly
>>>>>>> dm-raid: add a new helper prepare_suspend() in md_personality
>>>>>>> dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io
>>>>>>> concurrent with reshape
>>>>>>> dm-raid: fix lockdep waring in "pers->hot_add_disk"
>>>>>>>
>>>>>>> drivers/md/dm-raid.c | 93 ++++++++++++++++++++++++++++++++++----------
>>>>>>> drivers/md/md.c | 73 ++++++++++++++++++++++++++--------
>>>>>>> drivers/md/md.h | 38 +++++++++++++++++-
>>>>>>> drivers/md/raid5.c | 32 ++++++++++++++-
>>>>>>> 4 files changed, 196 insertions(+), 40 deletions(-)
>>>>>>>
>>>>>>> --
>>>>>>> 2.39.2
>>>>>>>
>>>>>
>>>>>
>>>>> .
>>>>>
>>>>
>
> .
>