2023-03-15 06:19:05

by Yu Kuai

Subject: [PATCH v2 0/5] md: fix uaf for sync_thread

From: Yu Kuai <[email protected]>

Changes in v2:
- fix a compile error for md-cluster in patch 2
- replace spin_lock/unlock with spin_lock/unlock_irq in patch 5
- don't wake up inside the new lock in md_wakeup_thread in patch 5

Our test reports a uaf for 'mddev->sync_thread':

T1                              T2
md_start_sync
 md_register_thread
                                raid1d
                                 md_check_recovery
                                  md_reap_sync_thread
                                   md_unregister_thread
                                    kfree

md_wakeup_thread
 wake_up
 ->sync_thread was freed

Currently, the global spinlock 'pers_lock' is borrowed to protect
'mddev->thread'. This problem could be fixed the same way; however, other
md_threads might have similar problems, and I really don't like the idea
of borrowing a global lock.

This patchset does some refactoring, then uses a disk-level spinlock to
protect md_thread in the relevant APIs.
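The scheme can be modeled in userspace roughly as follows (stand-in types
and pthread mutexes, not the kernel API): every reader and writer of the
per-disk thread pointer goes through a per-mddev lock, so a waker can no
longer dereference a pointer that md_unregister_thread() has freed.

```c
#include <pthread.h>
#include <stdlib.h>

/* Stand-in structures modeling the idea, not the kernel types. */
struct md_thread {
	int wakeups;
};

struct mddev {
	pthread_mutex_t thread_lock;	/* models mddev->thread_lock */
	struct md_thread *sync_thread;
};

static void wakeup_thread(struct md_thread **threadp, struct mddev *mddev)
{
	struct md_thread *thread;

	pthread_mutex_lock(&mddev->thread_lock);
	thread = *threadp;		/* snapshot taken under the lock */
	pthread_mutex_unlock(&mddev->thread_lock);

	if (thread)
		thread->wakeups++;	/* kernel: set THREAD_WAKEUP + wake_up() */
}

static void unregister_thread(struct md_thread **threadp, struct mddev *mddev)
{
	struct md_thread *thread;

	pthread_mutex_lock(&mddev->thread_lock);
	thread = *threadp;
	*threadp = NULL;		/* unpublish before freeing */
	pthread_mutex_unlock(&mddev->thread_lock);

	free(thread);			/* kernel: kthread_stop() + kfree() */
}
```

Once the pointer is cleared under the lock, a concurrent waker either sees
the thread and wakes it, or sees NULL and does nothing; it can never see a
freed pointer, which is the uaf in the trace above.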

I tested this patchset with the mdadm tests and there are no new
regressions. Note that the following tests fail with or without this
patchset:

01raid6integ
04r1update
05r6tor0
10ddf-create
10ddf-fail-spare
10ddf-fail-stop-readd
10ddf-geometry

Yu Kuai (5):
md: pass a md_thread pointer to md_register_thread()
md: refactor md_wakeup_thread()
md: use md_thread api to wake up sync_thread
md: pass a mddev to md_unregister_thread()
md: protect md_thread with a new disk level spin lock

drivers/md/dm-raid.c | 6 +-
drivers/md/md-bitmap.c | 6 +-
drivers/md/md-cluster.c | 39 +++++----
drivers/md/md-multipath.c | 8 +-
drivers/md/md.c | 162 ++++++++++++++++++++------------------
drivers/md/md.h | 15 ++--
drivers/md/raid1.c | 19 +++--
drivers/md/raid10.c | 31 ++++----
drivers/md/raid5-cache.c | 19 +++--
drivers/md/raid5-ppl.c | 2 +-
drivers/md/raid5.c | 48 ++++++-----
11 files changed, 177 insertions(+), 178 deletions(-)

--
2.31.1



2023-03-15 06:19:08

by Yu Kuai

Subject: [PATCH v2 1/5] md: pass a md_thread pointer to md_register_thread()

From: Yu Kuai <[email protected]>

Prepare to use a disk-level spinlock to protect md_thread. There are no
functional changes.
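The calling-convention change can be sketched in userspace as follows
(stand-in types; `md_register_thread_sketch` is a hypothetical name):
instead of returning a pointer that every caller must NULL-check and map
back to an errno, the function returns an error code and publishes the
thread through 'threadp' only on success.

```c
#include <errno.h>
#include <stdlib.h>

/* Stand-in type, not the kernel struct. */
struct md_thread {
	int dummy;
};

static int md_register_thread_sketch(struct md_thread **threadp)
{
	struct md_thread *thread = calloc(1, sizeof(*thread));

	if (!thread)
		return -ENOMEM;	/* precise error, not inferred by the caller */

	/* kernel version would start the kthread here and propagate
	 * PTR_ERR() on failure instead of collapsing it to NULL */

	*threadp = thread;	/* written only on success */
	return 0;
}
```

Callers change from `thread = md_register_thread(...); if (!thread)` to
`err = md_register_thread(&thread, ...); if (err)`, which also lets a
kthread_run() failure propagate its real error code.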

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md-cluster.c | 11 +++++------
drivers/md/md-multipath.c | 6 +++---
drivers/md/md.c | 27 ++++++++++++++-------------
drivers/md/md.h | 7 +++----
drivers/md/raid1.c | 5 ++---
drivers/md/raid10.c | 15 ++++++---------
drivers/md/raid5-cache.c | 5 ++---
drivers/md/raid5.c | 15 ++++++---------
8 files changed, 41 insertions(+), 50 deletions(-)

diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 10e0c5381d01..c19e29cb73bf 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -362,9 +362,8 @@ static void __recover_slot(struct mddev *mddev, int slot)

set_bit(slot, &cinfo->recovery_map);
if (!cinfo->recovery_thread) {
- cinfo->recovery_thread = md_register_thread(recover_bitmaps,
- mddev, "recover");
- if (!cinfo->recovery_thread) {
+ if (md_register_thread(&cinfo->recovery_thread, recover_bitmaps,
+ mddev, "recover")) {
pr_warn("md-cluster: Could not create recovery thread\n");
return;
}
@@ -888,9 +887,9 @@ static int join(struct mddev *mddev, int nodes)
goto err;
}
/* Initiate the communication resources */
- ret = -ENOMEM;
- cinfo->recv_thread = md_register_thread(recv_daemon, mddev, "cluster_recv");
- if (!cinfo->recv_thread) {
+ ret = md_register_thread(&cinfo->recv_thread, recv_daemon, mddev,
+ "cluster_recv");
+ if (ret) {
pr_err("md-cluster: cannot allocate memory for recv_thread!\n");
goto err;
}
diff --git a/drivers/md/md-multipath.c b/drivers/md/md-multipath.c
index 66edf5e72bd6..ceec9e4b2a60 100644
--- a/drivers/md/md-multipath.c
+++ b/drivers/md/md-multipath.c
@@ -400,9 +400,9 @@ static int multipath_run (struct mddev *mddev)
if (ret)
goto out_free_conf;

- mddev->thread = md_register_thread(multipathd, mddev,
- "multipath");
- if (!mddev->thread)
+ ret = md_register_thread(&mddev->thread, multipathd, mddev,
+ "multipath");
+ if (ret)
goto out_free_conf;

pr_info("multipath: array %s active with %d out of %d IO paths\n",
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 98970bbe32bf..0bbdde29a41f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7896,29 +7896,32 @@ void md_wakeup_thread(struct md_thread *thread)
}
EXPORT_SYMBOL(md_wakeup_thread);

-struct md_thread *md_register_thread(void (*run) (struct md_thread *),
- struct mddev *mddev, const char *name)
+int md_register_thread(struct md_thread **threadp,
+ void (*run)(struct md_thread *),
+ struct mddev *mddev, const char *name)
{
struct md_thread *thread;

thread = kzalloc(sizeof(struct md_thread), GFP_KERNEL);
if (!thread)
- return NULL;
+ return -ENOMEM;

init_waitqueue_head(&thread->wqueue);

thread->run = run;
thread->mddev = mddev;
thread->timeout = MAX_SCHEDULE_TIMEOUT;
- thread->tsk = kthread_run(md_thread, thread,
- "%s_%s",
- mdname(thread->mddev),
- name);
+ thread->tsk = kthread_run(md_thread, thread, "%s_%s",
+ mdname(thread->mddev), name);
if (IS_ERR(thread->tsk)) {
+ int err = PTR_ERR(thread->tsk);
+
kfree(thread);
- return NULL;
+ return err;
}
- return thread;
+
+ *threadp = thread;
+ return 0;
}
EXPORT_SYMBOL(md_register_thread);

@@ -9199,10 +9202,8 @@ static void md_start_sync(struct work_struct *ws)
{
struct mddev *mddev = container_of(ws, struct mddev, del_work);

- mddev->sync_thread = md_register_thread(md_do_sync,
- mddev,
- "resync");
- if (!mddev->sync_thread) {
+ if (md_register_thread(&mddev->sync_thread, md_do_sync, mddev,
+ "resync")) {
pr_warn("%s: could not start resync thread...\n",
mdname(mddev));
/* leave the spares where they are, it shouldn't hurt */
diff --git a/drivers/md/md.h b/drivers/md/md.h
index e148e3c83b0d..344e055e4d0f 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -730,10 +730,9 @@ extern int register_md_cluster_operations(struct md_cluster_operations *ops,
extern int unregister_md_cluster_operations(void);
extern int md_setup_cluster(struct mddev *mddev, int nodes);
extern void md_cluster_stop(struct mddev *mddev);
-extern struct md_thread *md_register_thread(
- void (*run)(struct md_thread *thread),
- struct mddev *mddev,
- const char *name);
+int md_register_thread(struct md_thread **threadp,
+ void (*run)(struct md_thread *thread),
+ struct mddev *mddev, const char *name);
extern void md_unregister_thread(struct md_thread **threadp);
extern void md_wakeup_thread(struct md_thread *thread);
extern void md_check_recovery(struct mddev *mddev);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 68a9e2d9985b..1217c1db0a40 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3083,9 +3083,8 @@ static struct r1conf *setup_conf(struct mddev *mddev)
}
}

- err = -ENOMEM;
- conf->thread = md_register_thread(raid1d, mddev, "raid1");
- if (!conf->thread)
+ err = md_register_thread(&conf->thread, raid1d, mddev, "raid1");
+ if (err)
goto abort;

return conf;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 2f1522bba80d..f1e54c62f930 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4096,9 +4096,8 @@ static struct r10conf *setup_conf(struct mddev *mddev)
init_waitqueue_head(&conf->wait_barrier);
atomic_set(&conf->nr_pending, 0);

- err = -ENOMEM;
- conf->thread = md_register_thread(raid10d, mddev, "raid10");
- if (!conf->thread)
+ err = md_register_thread(&conf->thread, raid10d, mddev, "raid10");
+ if (err)
goto out;

conf->mddev = mddev;
@@ -4286,9 +4285,8 @@ static int raid10_run(struct mddev *mddev)
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
- mddev->sync_thread = md_register_thread(md_do_sync, mddev,
- "reshape");
- if (!mddev->sync_thread)
+ if (md_register_thread(&mddev->sync_thread, md_do_sync, mddev,
+ "reshape"))
goto out_free_conf;
}

@@ -4688,9 +4686,8 @@ static int raid10_start_reshape(struct mddev *mddev)
set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);

- mddev->sync_thread = md_register_thread(md_do_sync, mddev,
- "reshape");
- if (!mddev->sync_thread) {
+ if (md_register_thread(&mddev->sync_thread, md_do_sync, mddev,
+ "reshape")) {
ret = -EAGAIN;
goto abort;
}
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 46182b955aef..0464d4d551fc 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -3121,9 +3121,8 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
spin_lock_init(&log->tree_lock);
INIT_RADIX_TREE(&log->big_stripe_tree, GFP_NOWAIT | __GFP_NOWARN);

- log->reclaim_thread = md_register_thread(r5l_reclaim_thread,
- log->rdev->mddev, "reclaim");
- if (!log->reclaim_thread)
+ if (md_register_thread(&log->reclaim_thread, r5l_reclaim_thread,
+ log->rdev->mddev, "reclaim"))
goto reclaim_thread;
log->reclaim_thread->timeout = R5C_RECLAIM_WAKEUP_INTERVAL;

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7b820b81d8c2..04b1093195d0 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7665,11 +7665,10 @@ static struct r5conf *setup_conf(struct mddev *mddev)
}

sprintf(pers_name, "raid%d", mddev->new_level);
- conf->thread = md_register_thread(raid5d, mddev, pers_name);
- if (!conf->thread) {
+ ret = md_register_thread(&conf->thread, raid5d, mddev, pers_name);
+ if (ret) {
pr_warn("md/raid:%s: couldn't allocate thread.\n",
mdname(mddev));
- ret = -ENOMEM;
goto abort;
}

@@ -7989,9 +7988,8 @@ static int raid5_run(struct mddev *mddev)
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
- mddev->sync_thread = md_register_thread(md_do_sync, mddev,
- "reshape");
- if (!mddev->sync_thread)
+ if (md_register_thread(&mddev->sync_thread, md_do_sync, mddev,
+ "reshape"))
goto abort;
}

@@ -8567,9 +8565,8 @@ static int raid5_start_reshape(struct mddev *mddev)
clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
- mddev->sync_thread = md_register_thread(md_do_sync, mddev,
- "reshape");
- if (!mddev->sync_thread) {
+ if (md_register_thread(&mddev->sync_thread, md_do_sync, mddev,
+ "reshape")) {
mddev->recovery = 0;
spin_lock_irq(&conf->device_lock);
write_seqcount_begin(&conf->gen_lock);
--
2.31.1


2023-03-15 06:19:12

by Yu Kuai

Subject: [PATCH v2 3/5] md: use md_thread api to wake up sync_thread

From: Yu Kuai <[email protected]>

Convert to md_wakeup_thread() instead of calling wake_up_process()
directly.
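A side effect of the conversion, visible in the hunks below, is that the
caller-side `if (mddev->sync_thread)` test can be dropped: the wrapper
tolerates an absent thread itself. A minimal sketch of that behavior
(stand-in types, hypothetical name):

```c
#include <stddef.h>

/* Stand-in type, not the kernel struct. */
struct md_thread {
	int wakeups;
};

static void md_wakeup_thread_sketch(struct md_thread **threadp)
{
	struct md_thread *thread = *threadp;

	if (thread)		/* absent thread: silently do nothing */
		thread->wakeups++;
}
```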

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 97e87df4ee43..4ecfd0508afb 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6267,10 +6267,12 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
}
if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
- if (mddev->sync_thread)
- /* Thread might be blocked waiting for metadata update
- * which will now never happen */
- wake_up_process(mddev->sync_thread->tsk);
+
+ /*
+ * Thread might be blocked waiting for metadata update
+ * which will now never happen
+ */
+ md_wakeup_thread(&mddev->sync_thread, mddev);

if (mddev->external && test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
return -EBUSY;
@@ -6331,10 +6333,12 @@ static int do_md_stop(struct mddev *mddev, int mode,
}
if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
- if (mddev->sync_thread)
- /* Thread might be blocked waiting for metadata update
- * which will now never happen */
- wake_up_process(mddev->sync_thread->tsk);
+
+ /*
+ * Thread might be blocked waiting for metadata update
+ * which will now never happen
+ */
+ md_wakeup_thread(&mddev->sync_thread, mddev);

mddev_unlock(mddev);
wait_event(resync_wait, (mddev->sync_thread == NULL &&
--
2.31.1


2023-03-15 06:19:15

by Yu Kuai

Subject: [PATCH v2 4/5] md: pass a mddev to md_unregister_thread()

From: Yu Kuai <[email protected]>

Prepare to use a disk-level spinlock to protect md_thread. There are no
functional changes.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/dm-raid.c | 2 +-
drivers/md/md-cluster.c | 8 ++++----
drivers/md/md.c | 13 +++++++------
drivers/md/md.h | 3 ++-
drivers/md/raid1.c | 4 ++--
drivers/md/raid10.c | 2 +-
drivers/md/raid5-cache.c | 2 +-
drivers/md/raid5.c | 2 +-
8 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 257c9c9f2b4d..1393c80b083b 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3729,7 +3729,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
if (mddev->sync_thread) {
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
- md_unregister_thread(&mddev->sync_thread);
+ md_unregister_thread(&mddev->sync_thread, mddev);
md_reap_sync_thread(mddev);
}
} else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 85fbcf5bae27..064211fe9830 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -946,8 +946,8 @@ static int join(struct mddev *mddev, int nodes)
return 0;
err:
set_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state);
- md_unregister_thread(&cinfo->recovery_thread);
- md_unregister_thread(&cinfo->recv_thread);
+ md_unregister_thread(&cinfo->recovery_thread, mddev);
+ md_unregister_thread(&cinfo->recv_thread, mddev);
lockres_free(cinfo->message_lockres);
lockres_free(cinfo->token_lockres);
lockres_free(cinfo->ack_lockres);
@@ -1009,8 +1009,8 @@ static int leave(struct mddev *mddev)
resync_bitmap(mddev);

set_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state);
- md_unregister_thread(&cinfo->recovery_thread);
- md_unregister_thread(&cinfo->recv_thread);
+ md_unregister_thread(&cinfo->recovery_thread, mddev);
+ md_unregister_thread(&cinfo->recv_thread, mddev);
lockres_free(cinfo->message_lockres);
lockres_free(cinfo->token_lockres);
lockres_free(cinfo->ack_lockres);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4ecfd0508afb..ab9299187cfe 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4775,7 +4775,8 @@ action_store(struct mddev *mddev, const char *page, size_t len)

mddev_unlock(mddev);
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
- md_unregister_thread(&mddev->sync_thread);
+ md_unregister_thread(&mddev->sync_thread,
+ mddev);
mddev_lock_nointr(mddev);
/*
* set RECOVERY_INTR again and restore reshape
@@ -6175,7 +6176,7 @@ static void __md_stop_writes(struct mddev *mddev)
flush_workqueue(md_misc_wq);
if (mddev->sync_thread) {
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
- md_unregister_thread(&mddev->sync_thread);
+ md_unregister_thread(&mddev->sync_thread, mddev);
md_reap_sync_thread(mddev);
}

@@ -6215,7 +6216,7 @@ static void mddev_detach(struct mddev *mddev)
mddev->pers->quiesce(mddev, 1);
mddev->pers->quiesce(mddev, 0);
}
- md_unregister_thread(&mddev->thread);
+ md_unregister_thread(&mddev->thread, mddev);
if (mddev->queue)
blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/
}
@@ -7933,7 +7934,7 @@ int md_register_thread(struct md_thread **threadp,
}
EXPORT_SYMBOL(md_register_thread);

-void md_unregister_thread(struct md_thread **threadp)
+void md_unregister_thread(struct md_thread **threadp, struct mddev *mddev)
{
struct md_thread *thread;

@@ -9324,7 +9325,7 @@ void md_check_recovery(struct mddev *mddev)
* ->spare_active and clear saved_raid_disk
*/
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
- md_unregister_thread(&mddev->sync_thread);
+ md_unregister_thread(&mddev->sync_thread, mddev);
md_reap_sync_thread(mddev);
clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
@@ -9360,7 +9361,7 @@ void md_check_recovery(struct mddev *mddev)
goto unlock;
}
if (mddev->sync_thread) {
- md_unregister_thread(&mddev->sync_thread);
+ md_unregister_thread(&mddev->sync_thread, mddev);
md_reap_sync_thread(mddev);
goto unlock;
}
diff --git a/drivers/md/md.h b/drivers/md/md.h
index aeb2fc6b65c7..8f4137ad2dde 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -733,7 +733,8 @@ extern void md_cluster_stop(struct mddev *mddev);
int md_register_thread(struct md_thread **threadp,
void (*run)(struct md_thread *thread),
struct mddev *mddev, const char *name);
-extern void md_unregister_thread(struct md_thread **threadp);
+extern void md_unregister_thread(struct md_thread **threadp,
+ struct mddev *mddev);
extern void md_wakeup_thread(struct md_thread **threadp, struct mddev *mddev);
extern void md_check_recovery(struct mddev *mddev);
extern void md_reap_sync_thread(struct mddev *mddev);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 391ff239c711..8329a1ba9d12 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3158,7 +3158,7 @@ static int raid1_run(struct mddev *mddev)
* RAID1 needs at least one disk in active
*/
if (conf->raid_disks - mddev->degraded < 1) {
- md_unregister_thread(&conf->thread);
+ md_unregister_thread(&conf->thread, mddev);
ret = -EINVAL;
goto abort;
}
@@ -3185,7 +3185,7 @@ static int raid1_run(struct mddev *mddev)

ret = md_integrity_register(mddev);
if (ret) {
- md_unregister_thread(&mddev->thread);
+ md_unregister_thread(&mddev->thread, mddev);
goto abort;
}
return 0;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 920e5722040f..47d18d56000e 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4293,7 +4293,7 @@ static int raid10_run(struct mddev *mddev)
return 0;

out_free_conf:
- md_unregister_thread(&mddev->thread);
+ md_unregister_thread(&mddev->thread, mddev);
raid10_free_conf(conf);
mddev->private = NULL;
out:
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index d6ee6a7a83b7..588c3d1f7467 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -3166,7 +3166,7 @@ void r5l_exit_log(struct r5conf *conf)
/* Ensure disable_writeback_work wakes up and exits */
wake_up(&conf->mddev->sb_wait);
flush_work(&log->disable_writeback_work);
- md_unregister_thread(&log->reclaim_thread);
+ md_unregister_thread(&log->reclaim_thread, conf->mddev);

conf->log = NULL;

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 2c0695d41436..b9f2688b141f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8070,7 +8070,7 @@ static int raid5_run(struct mddev *mddev)

return 0;
abort:
- md_unregister_thread(&mddev->thread);
+ md_unregister_thread(&mddev->thread, mddev);
print_raid5_conf(conf);
free_conf(conf);
mddev->private = NULL;
--
2.31.1


2023-03-15 06:19:19

by Yu Kuai

Subject: [PATCH v2 5/5] md: protect md_thread with a new disk level spin lock

From: Yu Kuai <[email protected]>

Our test reports a uaf for 'mddev->sync_thread':

T1                              T2
md_start_sync
 md_register_thread
                                raid1d
                                 md_check_recovery
                                  md_reap_sync_thread
                                   md_unregister_thread
                                    kfree

md_wakeup_thread
 wake_up
 ->sync_thread was freed

Currently, the global spinlock 'pers_lock' is borrowed to protect
'mddev->thread'. This problem could be fixed the same way; however, other
md_threads might have similar problems, and I really don't like the idea
of borrowing a global lock.

This patch uses a disk-level spinlock to protect md_thread in the
relevant APIs.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 28 ++++++++++++----------------
drivers/md/md.h | 1 +
2 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index ab9299187cfe..926285bece5d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -663,6 +663,7 @@ void mddev_init(struct mddev *mddev)
atomic_set(&mddev->active, 1);
atomic_set(&mddev->openers, 0);
spin_lock_init(&mddev->lock);
+ spin_lock_init(&mddev->thread_lock);
atomic_set(&mddev->flush_pending, 0);
init_waitqueue_head(&mddev->sb_wait);
init_waitqueue_head(&mddev->recovery_wait);
@@ -801,13 +802,8 @@ void mddev_unlock(struct mddev *mddev)
} else
mutex_unlock(&mddev->reconfig_mutex);

- /* As we've dropped the mutex we need a spinlock to
- * make sure the thread doesn't disappear
- */
- spin_lock(&pers_lock);
md_wakeup_thread(&mddev->thread, mddev);
wake_up(&mddev->sb_wait);
- spin_unlock(&pers_lock);
}
EXPORT_SYMBOL_GPL(mddev_unlock);

@@ -7895,8 +7891,11 @@ static int md_thread(void *arg)

void md_wakeup_thread(struct md_thread **threadp, struct mddev *mddev)
{
- struct md_thread *thread = *threadp;
+ struct md_thread *thread;

+ spin_lock_irq(&mddev->thread_lock);
+ thread = *threadp;
+ spin_unlock_irq(&mddev->thread_lock);
if (thread) {
pr_debug("md: waking up MD thread %s.\n", thread->tsk->comm);
set_bit(THREAD_WAKEUP, &thread->flags);
@@ -7929,7 +7928,9 @@ int md_register_thread(struct md_thread **threadp,
return err;
}

+ spin_lock_irq(&mddev->thread_lock);
*threadp = thread;
+ spin_unlock_irq(&mddev->thread_lock);
return 0;
}
EXPORT_SYMBOL(md_register_thread);
@@ -7938,18 +7939,13 @@ void md_unregister_thread(struct md_thread **threadp, struct mddev *mddev)
{
struct md_thread *thread;

- /*
- * Locking ensures that mddev_unlock does not wake_up a
- * non-existent thread
- */
- spin_lock(&pers_lock);
+ spin_lock_irq(&mddev->thread_lock);
thread = *threadp;
- if (!thread) {
- spin_unlock(&pers_lock);
- return;
- }
*threadp = NULL;
- spin_unlock(&pers_lock);
+ spin_unlock_irq(&mddev->thread_lock);
+
+ if (!thread)
+ return;

pr_debug("interrupting MD-thread pid %d\n", task_pid_nr(thread->tsk));
kthread_stop(thread->tsk);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 8f4137ad2dde..ca182d21dd8d 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -367,6 +367,7 @@ struct mddev {
int new_chunk_sectors;
int reshape_backwards;

+ spinlock_t thread_lock;
struct md_thread *thread; /* management thread */
struct md_thread *sync_thread; /* doing resync or reconstruct */

--
2.31.1


2023-03-15 06:19:24

by Yu Kuai

Subject: [PATCH v2 2/5] md: refactor md_wakeup_thread()

From: Yu Kuai <[email protected]>

Pass an md_thread pointer and an mddev to md_wakeup_thread(), preparing
to use a disk-level spinlock to protect md_thread. There are no
functional changes.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/dm-raid.c | 4 +--
drivers/md/md-bitmap.c | 6 ++--
drivers/md/md-cluster.c | 20 +++++------
drivers/md/md-multipath.c | 2 +-
drivers/md/md.c | 76 ++++++++++++++++++++-------------------
drivers/md/md.h | 4 +--
drivers/md/raid1.c | 10 +++---
drivers/md/raid10.c | 14 ++++----
drivers/md/raid5-cache.c | 12 +++----
drivers/md/raid5-ppl.c | 2 +-
drivers/md/raid5.c | 31 ++++++++--------
11 files changed, 93 insertions(+), 88 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 60632b409b80..257c9c9f2b4d 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3755,11 +3755,11 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
*/
mddev->ro = 0;
if (!mddev->suspended && mddev->sync_thread)
- md_wakeup_thread(mddev->sync_thread);
+ md_wakeup_thread(&mddev->sync_thread, mddev);
}
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
if (!mddev->suspended && mddev->thread)
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);

return 0;
}
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index e7cc6ba1b657..9489510405f7 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1942,7 +1942,7 @@ int md_bitmap_load(struct mddev *mddev)
set_bit(MD_RECOVERY_NEEDED, &bitmap->mddev->recovery);

mddev->thread->timeout = mddev->bitmap_info.daemon_sleep;
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);

md_bitmap_update_sb(bitmap);

@@ -2363,7 +2363,7 @@ location_store(struct mddev *mddev, const char *buf, size_t len)
* metadata promptly.
*/
set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}
rv = 0;
out:
@@ -2454,7 +2454,7 @@ timeout_store(struct mddev *mddev, const char *buf, size_t len)
*/
if (mddev->thread->timeout < MAX_SCHEDULE_TIMEOUT) {
mddev->thread->timeout = timeout;
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}
}
return len;
diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index c19e29cb73bf..85fbcf5bae27 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -325,7 +325,7 @@ static void recover_bitmaps(struct md_thread *thread)
if (test_bit(MD_RESYNCING_REMOTE, &mddev->recovery) &&
test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
mddev->reshape_position != MaxSector)
- md_wakeup_thread(mddev->sync_thread);
+ md_wakeup_thread(&mddev->sync_thread, mddev);

if (hi > 0) {
if (lo < mddev->recovery_cp)
@@ -340,7 +340,7 @@ static void recover_bitmaps(struct md_thread *thread)
clear_bit(MD_RESYNCING_REMOTE,
&mddev->recovery);
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}
}
clear_bit:
@@ -368,7 +368,7 @@ static void __recover_slot(struct mddev *mddev, int slot)
return;
}
}
- md_wakeup_thread(cinfo->recovery_thread);
+ md_wakeup_thread(&cinfo->recovery_thread, mddev);
}

static void recover_slot(void *arg, struct dlm_slot *slot)
@@ -422,7 +422,7 @@ static void ack_bast(void *arg, int mode)

if (mode == DLM_LOCK_EX) {
if (test_bit(MD_CLUSTER_ALREADY_IN_CLUSTER, &cinfo->state))
- md_wakeup_thread(cinfo->recv_thread);
+ md_wakeup_thread(&cinfo->recv_thread, res->mddev);
else
set_bit(MD_CLUSTER_PENDING_RECV_EVENT, &cinfo->state);
}
@@ -454,7 +454,7 @@ static void process_suspend_info(struct mddev *mddev,
clear_bit(MD_RESYNCING_REMOTE, &mddev->recovery);
remove_suspend_info(mddev, slot);
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
return;
}

@@ -546,7 +546,7 @@ static void process_remove_disk(struct mddev *mddev, struct cluster_msg *msg)
if (rdev) {
set_bit(ClusterRemove, &rdev->flags);
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}
else
pr_warn("%s: %d Could not find disk(%d) to REMOVE\n",
@@ -696,7 +696,7 @@ static int lock_comm(struct md_cluster_info *cinfo, bool mddev_locked)
rv = test_and_set_bit_lock(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD,
&cinfo->state);
WARN_ON_ONCE(rv);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
set_bit = 1;
}

@@ -971,7 +971,7 @@ static void load_bitmaps(struct mddev *mddev, int total_slots)
set_bit(MD_CLUSTER_ALREADY_IN_CLUSTER, &cinfo->state);
/* wake up recv thread in case something need to be handled */
if (test_and_clear_bit(MD_CLUSTER_PENDING_RECV_EVENT, &cinfo->state))
- md_wakeup_thread(cinfo->recv_thread);
+ md_wakeup_thread(&cinfo->recv_thread, mddev);
}

static void resync_bitmap(struct mddev *mddev)
@@ -1052,7 +1052,7 @@ static int metadata_update_start(struct mddev *mddev)
ret = test_and_set_bit_lock(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD,
&cinfo->state);
WARN_ON_ONCE(ret);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);

wait_event(cinfo->wait,
!test_and_set_bit(MD_CLUSTER_SEND_LOCK, &cinfo->state) ||
@@ -1430,7 +1430,7 @@ static int add_new_disk(struct mddev *mddev, struct md_rdev *rdev)
/* Since MD_CHANGE_DEVS will be set in add_bound_rdev which
* will run soon after add_new_disk, the below path will be
* invoked:
- * md_wakeup_thread(mddev->thread)
+ * md_wakeup_thread(&mddev->thread)
* -> conf->thread (raid1d)
* -> md_check_recovery -> md_update_sb
* -> metadata_update_start/finish
diff --git a/drivers/md/md-multipath.c b/drivers/md/md-multipath.c
index ceec9e4b2a60..482536ec8850 100644
--- a/drivers/md/md-multipath.c
+++ b/drivers/md/md-multipath.c
@@ -57,7 +57,7 @@ static void multipath_reschedule_retry (struct multipath_bh *mp_bh)
spin_lock_irqsave(&conf->device_lock, flags);
list_add(&mp_bh->retry_list, &conf->retry_list);
spin_unlock_irqrestore(&conf->device_lock, flags);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}

/*
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0bbdde29a41f..97e87df4ee43 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -487,8 +487,9 @@ void mddev_resume(struct mddev *mddev)
mddev->pers->quiesce(mddev, 0);

set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
- md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
+ md_wakeup_thread(&mddev->thread, mddev);
+ /* possibly kick off a reshape */
+ md_wakeup_thread(&mddev->sync_thread, mddev);
}
EXPORT_SYMBOL_GPL(mddev_resume);

@@ -804,7 +805,7 @@ void mddev_unlock(struct mddev *mddev)
* make sure the thread doesn't disappear
*/
spin_lock(&pers_lock);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
wake_up(&mddev->sb_wait);
spin_unlock(&pers_lock);
}
@@ -2814,7 +2815,7 @@ static int add_bound_rdev(struct md_rdev *rdev)
set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
md_new_event();
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
return 0;
}

@@ -2931,7 +2932,7 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
md_kick_rdev_from_array(rdev);
if (mddev->pers) {
set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}
md_new_event();
}
@@ -2962,7 +2963,7 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
clear_bit(BlockedBadBlocks, &rdev->flags);
wake_up(&rdev->blocked_wait);
set_bit(MD_RECOVERY_NEEDED, &rdev->mddev->recovery);
- md_wakeup_thread(rdev->mddev->thread);
+ md_wakeup_thread(&rdev->mddev->thread, rdev->mddev);

err = 0;
} else if (cmd_match(buf, "insync") && rdev->raid_disk == -1) {
@@ -3000,7 +3001,7 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
!test_bit(Replacement, &rdev->flags))
set_bit(WantReplacement, &rdev->flags);
set_bit(MD_RECOVERY_NEEDED, &rdev->mddev->recovery);
- md_wakeup_thread(rdev->mddev->thread);
+ md_wakeup_thread(&rdev->mddev->thread, rdev->mddev);
err = 0;
} else if (cmd_match(buf, "-want_replacement")) {
/* Clearing 'want_replacement' is always allowed.
@@ -3127,7 +3128,7 @@ slot_store(struct md_rdev *rdev, const char *buf, size_t len)
if (rdev->raid_disk >= 0)
return -EBUSY;
set_bit(MD_RECOVERY_NEEDED, &rdev->mddev->recovery);
- md_wakeup_thread(rdev->mddev->thread);
+ md_wakeup_thread(&rdev->mddev->thread, rdev->mddev);
} else if (rdev->mddev->pers) {
/* Activating a spare .. or possibly reactivating
* if we ever get bitmaps working here.
@@ -4359,7 +4360,7 @@ array_state_store(struct mddev *mddev, const char *buf, size_t len)
if (st == active) {
restart_array(mddev);
clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
wake_up(&mddev->sb_wait);
} else /* st == clean */ {
restart_array(mddev);
@@ -4826,10 +4827,10 @@ action_store(struct mddev *mddev, const char *page, size_t len)
* canceling read-auto mode
*/
mddev->ro = MD_RDWR;
- md_wakeup_thread(mddev->sync_thread);
+ md_wakeup_thread(&mddev->sync_thread, mddev);
}
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
sysfs_notify_dirent_safe(mddev->sysfs_action);
return len;
}
@@ -5733,7 +5734,7 @@ static void md_safemode_timeout(struct timer_list *t)
if (mddev->external)
sysfs_notify_dirent_safe(mddev->sysfs_state);

- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}

static int start_dirty_degraded;
@@ -6045,8 +6046,9 @@ int do_md_run(struct mddev *mddev)
/* run start up tasks that require md_thread */
md_start(mddev);

- md_wakeup_thread(mddev->thread);
- md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
+ md_wakeup_thread(&mddev->thread, mddev);
+ /* possibly kick off a reshape */
+ md_wakeup_thread(&mddev->sync_thread, mddev);

set_capacity_and_notify(mddev->gendisk, mddev->array_sectors);
clear_bit(MD_NOT_READY, &mddev->flags);
@@ -6066,10 +6068,10 @@ int md_start(struct mddev *mddev)

if (mddev->pers->start) {
set_bit(MD_RECOVERY_WAIT, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
ret = mddev->pers->start(mddev);
clear_bit(MD_RECOVERY_WAIT, &mddev->recovery);
- md_wakeup_thread(mddev->sync_thread);
+ md_wakeup_thread(&mddev->sync_thread, mddev);
}
return ret;
}
@@ -6111,8 +6113,8 @@ static int restart_array(struct mddev *mddev)
pr_debug("md: %s switched to read-write mode.\n", mdname(mddev));
/* Kick recovery or resync if necessary */
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
- md_wakeup_thread(mddev->sync_thread);
+ md_wakeup_thread(&mddev->thread, mddev);
+ md_wakeup_thread(&mddev->sync_thread, mddev);
sysfs_notify_dirent_safe(mddev->sysfs_state);
return 0;
}
@@ -6261,7 +6263,7 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
if (!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
did_freeze = 1;
set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}
if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
@@ -6287,7 +6289,7 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
if (did_freeze) {
clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}
err = -EBUSY;
goto out;
@@ -6302,7 +6304,7 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
set_disk_ro(mddev->gendisk, 1);
clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
sysfs_notify_dirent_safe(mddev->sysfs_state);
err = 0;
}
@@ -6325,7 +6327,7 @@ static int do_md_stop(struct mddev *mddev, int mode,
if (!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
did_freeze = 1;
set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}
if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
@@ -6350,7 +6352,7 @@ static int do_md_stop(struct mddev *mddev, int mode,
if (did_freeze) {
clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}
return -EBUSY;
}
@@ -6893,7 +6895,7 @@ static int hot_remove_disk(struct mddev *mddev, dev_t dev)
md_kick_rdev_from_array(rdev);
set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
if (mddev->thread)
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
else
md_update_sb(mddev, 1);
md_new_event();
@@ -6976,7 +6978,7 @@ static int hot_add_disk(struct mddev *mddev, dev_t dev)
* array immediately.
*/
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
md_new_event();
return 0;

@@ -7886,8 +7888,10 @@ static int md_thread(void *arg)
return 0;
}

-void md_wakeup_thread(struct md_thread *thread)
+void md_wakeup_thread(struct md_thread **threadp, struct mddev *mddev)
{
+ struct md_thread *thread = *threadp;
+
if (thread) {
pr_debug("md: waking up MD thread %s.\n", thread->tsk->comm);
set_bit(THREAD_WAKEUP, &thread->flags);
@@ -7963,7 +7967,7 @@ void md_error(struct mddev *mddev, struct md_rdev *rdev)
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
if (!test_bit(MD_BROKEN, &mddev->flags)) {
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}
if (mddev->event_work.func)
queue_work(md_misc_wq, &mddev->event_work);
@@ -8474,7 +8478,7 @@ void md_done_sync(struct mddev *mddev, int blocks, int ok)
if (!ok) {
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
set_bit(MD_RECOVERY_ERROR, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
// stop recovery, signal do_sync ....
}
}
@@ -8499,8 +8503,8 @@ bool md_write_start(struct mddev *mddev, struct bio *bi)
/* need to switch to read/write */
mddev->ro = MD_RDWR;
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
- md_wakeup_thread(mddev->sync_thread);
+ md_wakeup_thread(&mddev->thread, mddev);
+ md_wakeup_thread(&mddev->sync_thread, mddev);
did_change = 1;
}
rcu_read_lock();
@@ -8515,7 +8519,7 @@ bool md_write_start(struct mddev *mddev, struct bio *bi)
mddev->in_sync = 0;
set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags);
set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
did_change = 1;
}
spin_unlock(&mddev->lock);
@@ -8558,7 +8562,7 @@ void md_write_end(struct mddev *mddev)
percpu_ref_put(&mddev->writes_pending);

if (mddev->safemode == 2)
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
else if (mddev->safemode_delay)
/* The roundup() ensures this only performs locking once
* every ->safemode_delay jiffies
@@ -9100,7 +9104,7 @@ void md_do_sync(struct md_thread *thread)
spin_unlock(&mddev->lock);

wake_up(&resync_wait);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
return;
}
EXPORT_SYMBOL_GPL(md_do_sync);
@@ -9218,7 +9222,7 @@ static void md_start_sync(struct work_struct *ws)
if (mddev->sysfs_action)
sysfs_notify_dirent_safe(mddev->sysfs_action);
} else
- md_wakeup_thread(mddev->sync_thread);
+ md_wakeup_thread(&mddev->sync_thread, mddev);
sysfs_notify_dirent_safe(mddev->sysfs_action);
md_new_event();
}
@@ -9534,7 +9538,7 @@ int rdev_set_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
sysfs_notify_dirent_safe(rdev->sysfs_state);
set_mask_bits(&mddev->sb_flags, 0,
BIT(MD_SB_CHANGE_CLEAN) | BIT(MD_SB_CHANGE_PENDING));
- md_wakeup_thread(rdev->mddev->thread);
+ md_wakeup_thread(&rdev->mddev->thread, mddev);
return 1;
} else
return 0;
@@ -9699,7 +9703,7 @@ static void check_sb_changes(struct mddev *mddev, struct md_rdev *rdev)
/* wakeup mddev->thread here, so array could
* perform resync with the new activated disk */
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}
/* device faulty
* We just want to do the minimum to mark the disk
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 344e055e4d0f..aeb2fc6b65c7 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -734,7 +734,7 @@ int md_register_thread(struct md_thread **threadp,
void (*run)(struct md_thread *thread),
struct mddev *mddev, const char *name);
extern void md_unregister_thread(struct md_thread **threadp);
-extern void md_wakeup_thread(struct md_thread *thread);
+extern void md_wakeup_thread(struct md_thread **threadp, struct mddev *mddev);
extern void md_check_recovery(struct mddev *mddev);
extern void md_reap_sync_thread(struct mddev *mddev);
extern int mddev_init_writes_pending(struct mddev *mddev);
@@ -805,7 +805,7 @@ static inline void rdev_dec_pending(struct md_rdev *rdev, struct mddev *mddev)
int faulty = test_bit(Faulty, &rdev->flags);
if (atomic_dec_and_test(&rdev->nr_pending) && faulty) {
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}
}

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 1217c1db0a40..391ff239c711 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -289,7 +289,7 @@ static void reschedule_retry(struct r1bio *r1_bio)
spin_unlock_irqrestore(&conf->device_lock, flags);

wake_up(&conf->wait_barrier);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}

/*
@@ -1180,7 +1180,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
bio_list_merge(&conf->pending_bio_list, &plug->pending);
spin_unlock_irq(&conf->device_lock);
wake_up(&conf->wait_barrier);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
kfree(plug);
return;
}
@@ -1585,7 +1585,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
spin_lock_irqsave(&conf->device_lock, flags);
bio_list_add(&conf->pending_bio_list, mbio);
spin_unlock_irqrestore(&conf->device_lock, flags);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}
}

@@ -2501,7 +2501,7 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
* get_unqueued_pending() == extra to be true.
*/
wake_up(&conf->wait_barrier);
- md_wakeup_thread(conf->mddev->thread);
+ md_wakeup_thread(&conf->mddev->thread, conf->mddev);
} else {
if (test_bit(R1BIO_WriteError, &r1_bio->state))
close_write(r1_bio);
@@ -3344,7 +3344,7 @@ static int raid1_reshape(struct mddev *mddev)

set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);

mempool_exit(&oldpool);
return 0;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index f1e54c62f930..920e5722040f 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -309,7 +309,7 @@ static void reschedule_retry(struct r10bio *r10_bio)
/* wake up frozen array... */
wake_up(&conf->wait_barrier);

- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}

/*
@@ -1114,7 +1114,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
bio_list_merge(&conf->pending_bio_list, &plug->pending);
spin_unlock_irq(&conf->device_lock);
wake_up(&conf->wait_barrier);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
kfree(plug);
return;
}
@@ -1329,7 +1329,7 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
spin_lock_irqsave(&conf->device_lock, flags);
bio_list_add(&conf->pending_bio_list, mbio);
spin_unlock_irqrestore(&conf->device_lock, flags);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}
}

@@ -1441,7 +1441,7 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
mddev->reshape_position = conf->reshape_progress;
set_mask_bits(&mddev->sb_flags, 0,
BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
if (bio->bi_opf & REQ_NOWAIT) {
allow_barrier(conf);
bio_wouldblock_error(bio);
@@ -3079,7 +3079,7 @@ static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio)
* nr_pending == nr_queued + extra to be true.
*/
wake_up(&conf->wait_barrier);
- md_wakeup_thread(conf->mddev->thread);
+ md_wakeup_thread(&conf->mddev->thread, conf->mddev);
} else {
if (test_bit(R10BIO_WriteError,
&r10_bio->state))
@@ -4692,7 +4692,7 @@ static int raid10_start_reshape(struct mddev *mddev)
goto abort;
}
conf->reshape_checkpoint = jiffies;
- md_wakeup_thread(mddev->sync_thread);
+ md_wakeup_thread(&mddev->sync_thread, mddev);
md_new_event();
return 0;

@@ -4874,7 +4874,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
mddev->curr_resync_completed = conf->reshape_progress;
conf->reshape_checkpoint = jiffies;
set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
wait_event(mddev->sb_wait, mddev->sb_flags == 0 ||
test_bit(MD_RECOVERY_INTR, &mddev->recovery));
if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 0464d4d551fc..d6ee6a7a83b7 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -600,7 +600,7 @@ static void r5l_log_endio(struct bio *bio)
spin_unlock_irqrestore(&log->io_list_lock, flags);

if (log->need_cache_flush)
- md_wakeup_thread(log->rdev->mddev->thread);
+ md_wakeup_thread(&log->rdev->mddev->thread, log->rdev->mddev);

/* finish flush only io_unit and PAYLOAD_FLUSH only io_unit */
if (has_null_flush) {
@@ -1491,7 +1491,7 @@ static void r5c_do_reclaim(struct r5conf *conf)
if (!test_bit(R5C_LOG_CRITICAL, &conf->cache_state))
r5l_run_no_space_stripes(log);

- md_wakeup_thread(conf->mddev->thread);
+ md_wakeup_thread(&conf->mddev->thread, conf->mddev);
}

static void r5l_do_reclaim(struct r5l_log *log)
@@ -1519,7 +1519,7 @@ static void r5l_do_reclaim(struct r5l_log *log)
list_empty(&log->finished_ios)))
break;

- md_wakeup_thread(log->rdev->mddev->thread);
+ md_wakeup_thread(&log->rdev->mddev->thread, log->rdev->mddev);
wait_event_lock_irq(log->iounit_wait,
r5l_reclaimable_space(log) > reclaimable,
log->io_list_lock);
@@ -1571,7 +1571,7 @@ void r5l_wake_reclaim(struct r5l_log *log, sector_t space)
if (new < target)
return;
} while (!try_cmpxchg(&log->reclaim_target, &target, new));
- md_wakeup_thread(log->reclaim_thread);
+ md_wakeup_thread(&log->reclaim_thread, log->rdev->mddev);
}

void r5l_quiesce(struct r5l_log *log, int quiesce)
@@ -2776,7 +2776,7 @@ void r5c_release_extra_page(struct stripe_head *sh)

if (using_disk_info_extra_page) {
clear_bit(R5C_EXTRA_PAGE_IN_USE, &conf->cache_state);
- md_wakeup_thread(conf->mddev->thread);
+ md_wakeup_thread(&conf->mddev->thread, conf->mddev);
}
}

@@ -2832,7 +2832,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,

if (test_and_clear_bit(STRIPE_FULL_WRITE, &sh->state))
if (atomic_dec_and_test(&conf->pending_full_writes))
- md_wakeup_thread(conf->mddev->thread);
+ md_wakeup_thread(&conf->mddev->thread, conf->mddev);

if (do_wakeup)
wake_up(&conf->wait_for_overlap);
diff --git a/drivers/md/raid5-ppl.c b/drivers/md/raid5-ppl.c
index e495939bb3e0..47cf1e85c48d 100644
--- a/drivers/md/raid5-ppl.c
+++ b/drivers/md/raid5-ppl.c
@@ -601,7 +601,7 @@ static void ppl_flush_endio(struct bio *bio)

if (atomic_dec_and_test(&io->pending_flushes)) {
ppl_io_unit_finished(io);
- md_wakeup_thread(conf->mddev->thread);
+ md_wakeup_thread(&conf->mddev->thread, conf->mddev);
}
}

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 04b1093195d0..2c0695d41436 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -195,7 +195,7 @@ static void raid5_wakeup_stripe_thread(struct stripe_head *sh)
}

if (conf->worker_cnt_per_group == 0) {
- md_wakeup_thread(conf->mddev->thread);
+ md_wakeup_thread(&conf->mddev->thread, conf->mddev);
return;
}

@@ -268,13 +268,14 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
return;
}
}
- md_wakeup_thread(conf->mddev->thread);
+ md_wakeup_thread(&conf->mddev->thread, conf->mddev);
} else {
BUG_ON(stripe_operations_active(sh));
if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
if (atomic_dec_return(&conf->preread_active_stripes)
< IO_THRESHOLD)
- md_wakeup_thread(conf->mddev->thread);
+ md_wakeup_thread(&conf->mddev->thread,
+ conf->mddev);
atomic_dec(&conf->active_stripes);
if (!test_bit(STRIPE_EXPANDING, &sh->state)) {
if (!r5c_is_writeback(conf->log))
@@ -356,7 +357,7 @@ static void release_inactive_stripe_list(struct r5conf *conf,
if (atomic_read(&conf->active_stripes) == 0)
wake_up(&conf->wait_for_quiescent);
if (conf->retry_read_aligned)
- md_wakeup_thread(conf->mddev->thread);
+ md_wakeup_thread(&conf->mddev->thread, conf->mddev);
}
}

@@ -407,7 +408,7 @@ void raid5_release_stripe(struct stripe_head *sh)
goto slow_path;
wakeup = llist_add(&sh->release_list, &conf->released_stripes);
if (wakeup)
- md_wakeup_thread(conf->mddev->thread);
+ md_wakeup_thread(&conf->mddev->thread, conf->mddev);
return;
slow_path:
/* we are ok here if STRIPE_ON_RELEASE_LIST is set or not */
@@ -981,7 +982,7 @@ static void stripe_add_to_batch_list(struct r5conf *conf,
if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
if (atomic_dec_return(&conf->preread_active_stripes)
< IO_THRESHOLD)
- md_wakeup_thread(conf->mddev->thread);
+ md_wakeup_thread(&conf->mddev->thread, conf->mddev);

if (test_and_clear_bit(STRIPE_BIT_DELAY, &sh->state)) {
int seq = sh->bm_seq;
@@ -3759,7 +3760,7 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,

if (test_and_clear_bit(STRIPE_FULL_WRITE, &sh->state))
if (atomic_dec_and_test(&conf->pending_full_writes))
- md_wakeup_thread(conf->mddev->thread);
+ md_wakeup_thread(&conf->mddev->thread, conf->mddev);
}

static void
@@ -4156,7 +4157,7 @@ static void handle_stripe_clean_event(struct r5conf *conf,

if (test_and_clear_bit(STRIPE_FULL_WRITE, &sh->state))
if (atomic_dec_and_test(&conf->pending_full_writes))
- md_wakeup_thread(conf->mddev->thread);
+ md_wakeup_thread(&conf->mddev->thread, conf->mddev);

if (head_sh->batch_head && do_endio)
break_stripe_batch_list(head_sh, STRIPE_EXPAND_SYNC_FLAGS);
@@ -5369,7 +5370,7 @@ static void handle_stripe(struct stripe_head *sh)
atomic_dec(&conf->preread_active_stripes);
if (atomic_read(&conf->preread_active_stripes) <
IO_THRESHOLD)
- md_wakeup_thread(conf->mddev->thread);
+ md_wakeup_thread(&conf->mddev->thread, conf->mddev);
}

clear_bit_unlock(STRIPE_ACTIVE, &sh->state);
@@ -5436,7 +5437,7 @@ static void add_bio_to_retry(struct bio *bi,struct r5conf *conf)
conf->retry_read_aligned_list = bi;

spin_unlock_irqrestore(&conf->device_lock, flags);
- md_wakeup_thread(conf->mddev->thread);
+ md_wakeup_thread(&conf->mddev->thread, conf->mddev);
}

static struct bio *remove_bio_from_retry(struct r5conf *conf,
@@ -6045,7 +6046,7 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
* Stripe is busy expanding or add failed due to
* overlap. Flush everything and wait a while.
*/
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
ret = STRIPE_SCHEDULE_AND_RETRY;
goto out_release;
}
@@ -6345,7 +6346,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk

conf->reshape_checkpoint = jiffies;
set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
wait_event(mddev->sb_wait, mddev->sb_flags == 0 ||
test_bit(MD_RECOVERY_INTR, &mddev->recovery));
if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
@@ -6453,7 +6454,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk
rdev->recovery_offset = sector_nr;
conf->reshape_checkpoint = jiffies;
set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
wait_event(mddev->sb_wait,
!test_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags)
|| test_bit(MD_RECOVERY_INTR, &mddev->recovery));
@@ -8585,7 +8586,7 @@ static int raid5_start_reshape(struct mddev *mddev)
return -EAGAIN;
}
conf->reshape_checkpoint = jiffies;
- md_wakeup_thread(mddev->sync_thread);
+ md_wakeup_thread(&mddev->sync_thread, mddev);
md_new_event();
return 0;
}
@@ -8815,7 +8816,7 @@ static int raid5_check_reshape(struct mddev *mddev)
mddev->chunk_sectors = new_chunk;
}
set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(&mddev->thread, mddev);
}
return check_reshape(mddev);
}
--
2.31.1


2023-03-15 08:30:41

by Paul Menzel

Subject: Re: [PATCH v2 0/5] md: fix uaf for sync_thread

Dear Logan,


On 15.03.23 at 07:18, Yu Kuai wrote:
> From: Yu Kuai <[email protected]>
>
> Changes in v2:
> - fix a compile error for for md-cluster in patch 2
> - replace spin_lock/unlock with spin_lock/unlock_irq in patch 5
> - don't wake up inside the new lock in md wakeup_thread in patch 5
>
> Our test reports a uaf for 'mddev->sync_thread':
>
> T1 T2
> md_start_sync
> md_register_thread
> raid1d
> md_check_recovery
> md_reap_sync_thread
> md_unregister_thread
> kfree
>
> md_wakeup_thread
> wake_up
> ->sync_thread was freed
>
> Currently, a global spinlock 'pers_lock' is borrowed to protect
> 'mddev->thread', this problem can be fixed likewise, however, there might
> be similar problem for other md_thread, and I really don't like the idea to
> borrow a global lock.
>
> This patchset do some refactor, and then use a disk level spinlock to
> protect md_thread in relevant apis.
>
> I tested this pathset with mdadm tests, and there are no new regression,
> by the way, following test will failed with or without this patchset:
>
> 01raid6integ
> 04r1update
> 05r6tor0
> 10ddf-create
> 10ddf-fail-spare
> 10ddf-fail-stop-readd
> 10ddf-geometry

As you improved the tests in the past, can you confirm, these failed on
your test systems too and are fixed now?

[…]


Kind regards,

Paul

2023-03-15 09:49:24

by Guoqing Jiang

Subject: Re: [PATCH v2 5/5] md: protect md_thread with a new disk level spin lock



On 3/15/23 14:18, Yu Kuai wrote:
> From: Yu Kuai <[email protected]>
>
> Our test reports a uaf for 'mddev->sync_thread':
>
> T1 T2
> md_start_sync
> md_register_thread
> raid1d
> md_check_recovery
> md_reap_sync_thread
> md_unregister_thread
> kfree
>
> md_wakeup_thread
> wake_up
> ->sync_thread was freed

Better to provide the relevant uaf (use after free, perhaps you mean)
log from the test.

> Currently, a global spinlock 'pers_lock' is borrowed to protect
> 'mddev->thread', this problem can be fixed likewise, however, there might
> be similar problem for other md_thread, and I really don't like the idea to
> borrow a global lock.
>
> This patch use a disk level spinlock to protect md_thread in relevant apis.

It is array level I think, and you probably want to remove the comment.

* pers_lock does extra service to protect accesses to
* mddev->thread when the mutex cannot be held.

Thanks,
Guoqing

2023-03-15 10:03:57

by Yu Kuai

Subject: Re: [PATCH v2 5/5] md: protect md_thread with a new disk level spin lock

Hi,

On 2023/03/15 17:39, Guoqing Jiang wrote:
>
>
> On 3/15/23 14:18, Yu Kuai wrote:
>> From: Yu Kuai <[email protected]>
>>
>> Our test reports a uaf for 'mddev->sync_thread':
>>
>> T1                      T2
>> md_start_sync
>>   md_register_thread
>>             raid1d
>>              md_check_recovery
>>               md_reap_sync_thread
>>                md_unregister_thread
>>                 kfree
>>
>>   md_wakeup_thread
>>    wake_up
>>    ->sync_thread was freed
>
> Better to provide the relevant uaf (use after free, perhaps you mean)
> log from the test.
Ok, I'll add the uaf report (the report is from v5.10) in the next version.
>
>> Currently, a global spinlock 'pers_lock' is borrowed to protect
>> 'mddev->thread', this problem can be fixed likewise, however, there might
>> be similar problem for other md_thread, and I really don't like the
>> idea to
>> borrow a global lock.
>>
>> This patch use a disk level spinlock to protect md_thread in relevant
>> apis.
>
> It is array level I think, and you probably want to remove the comment.
>
> * pers_lock does extra service to protect accesses to
> * mddev->thread when the mutex cannot be held.

Yes, I missed this.

Thanks,
Kuai
>
> Thanks,
> Guoqing
> .
>


2023-03-15 10:40:02

by Guoqing Jiang

Subject: Re: [PATCH v2 5/5] md: protect md_thread with a new disk level spin lock



On 3/15/23 18:02, Yu Kuai wrote:
> Hi,
>
> On 2023/03/15 17:39, Guoqing Jiang wrote:
>>
>>
>> On 3/15/23 14:18, Yu Kuai wrote:
>>> From: Yu Kuai <[email protected]>
>>>
>>> Our test reports a uaf for 'mddev->sync_thread':
>>>
>>> T1                      T2
>>> md_start_sync
>>>   md_register_thread
>>>             raid1d
>>>              md_check_recovery
>>>               md_reap_sync_thread
>>>                md_unregister_thread
>>>                 kfree
>>>
>>>   md_wakeup_thread
>>>    wake_up
>>>    ->sync_thread was freed
>>
>> Better to provide the relevant uaf (use after free, perhaps you mean)
>> log from the test.
> Ok, I'll add uaf report(the report is from v5.10) in the next version.

Can you also try with the latest mainline instead of just against the 5.10 kernel?

Thanks,
Guoqing

2023-03-15 22:56:36

by Logan Gunthorpe

Subject: Re: [PATCH v2 0/5] md: fix uaf for sync_thread



On 2023-03-15 02:30, Paul Menzel wrote:
> On 15.03.23 at 07:18, Yu Kuai wrote:
>> I tested this pathset with mdadm tests, and there are no new regression,
>> by the way, following test will failed with or without this patchset:
>>
>> 01raid6integ
>> 04r1update
>> 05r6tor0
>> 10ddf-create
>> 10ddf-fail-spare
>> 10ddf-fail-stop-readd
>> 10ddf-geometry
>
> As you improved the tests in the past, can you confirm, these failed on
> your test systems too and are fixed now?

Hmm, well Yu did not claim that those tests were fixed. If you re-read
what was said, the tests listed failed with or without the new changes.
As I read it, Yu asserts no new regressions were created with the patch
set, not that failing tests were fixed.

Unfortunately, the tests listed are largely not ones I saw failing the
last time I ran the tests (though it's been a few months since I last
tried). I know 01raid6integ used to fail some of the time, but the other
6 tests mentioned worked the last time I ran them; and there are many
other tests that failed when I ran them. (My notes on which tests are
broken are included in the most recent mdadm tree in tests/*.broken)

I was going to try and confirm that no new regressions were introduced
by Yu's patches, but seems the tests are getting worse. I tried running
the tests on the current md-next branch and found that one of the early
tests, 00raid5-zero, hangs indefinitely. I quickly ran the same test on
v6.3-rc2 and found that it runs just fine there. So it looks like
there's already a regression in md-next that is not part of this series
and I don't have the time to dig into the root cause right now.

Yu's patches don't apply cleanly to v6.3-rc2 and I can't run the tests
against md-next; so I didn't bother running them, but I did do a quick
review. The locking changes make sense to me so it might be worth
merging for correctness. However, I'm not entirely sure it's the best
solution -- the md thread stuff seems like a bit of a mess and passing
an mddev to thread functions that were not related to the mddev to get a
lock seems to just make the mess a bit worse.

For example, it seems a bit ugly to me for the lock mddev->thread_lock
to protect the access of a pointer in struct r5l_log. Just spit-balling,
but perhaps RCU would be more appropriate here. Then md_wakeup_thread()
would just need to hold the RCU read lock when dereferencing, and
md_unregister_thread() would just need to synchronize_rcu() before
stopping and freeing the thread. This has the benefit of not requiring
the mddev object for every md_thread and would probably require a lot
less churn than the current patches.

Logan





2023-03-16 01:27:00

by Yu Kuai

Subject: Re: [PATCH v2 0/5] md: fix uaf for sync_thread

Hi,

On 2023/03/16 6:55, Logan Gunthorpe wrote:
>
>
> On 2023-03-15 02:30, Paul Menzel wrote:
>> On 15.03.23 at 07:18, Yu Kuai wrote:
>>> I tested this pathset with mdadm tests, and there are no new regression,
>>> by the way, following test will failed with or without this patchset:
>>>
>>> 01raid6integ
>>> 04r1update
>>> 05r6tor0
>>> 10ddf-create
>>> 10ddf-fail-spare
>>> 10ddf-fail-stop-readd
>>> 10ddf-geometry
>>
>> As you improved the tests in the past, can you confirm, these failed on
>> your test systems too and are fixed now?
>
> Hmm, well Yu did not claim that those tests were fixed. If you re-read
> what was said, the tests listed failed with or without the new changes.
> As I read it, Yu asserts no new regressions were created with the patch
> set, not that failing tests were fixed.
>
> Unfortunately, the tests listed are largely not ones I saw failing the
> last time I ran the tests (though it's been a few months since I last
> tried). I know 01raid6integ used to fail some of the time, but the other
> 6 tests mentioned worked the last time I ran them; and there are many
> other tests that failed when I ran them. (My notes on which tests are
> broken are included in the most recent mdadm tree in tests/*.broken)
>
> I was going to try and confirm that no new regressions were introduced
> by Yu's patches, but seems the tests are getting worse. I tried running
> the tests on the current md-next branch and found that one of the early
> tests, 00raid5-zero, hangs indefinitely. I quickly ran the same test on
> v6.3-rc2 and found that it runs just fine there. So it looks like
> there's already a regression in md-next that is not part of this series
> and I don't have the time to dig into the root cause right now.
>
> Yu's patches don't apply cleanly to v6.3-rc2 and I can't run the tests
> against md-next; so I didn't bother running them, but I did do a quick
> review. The locking changes make sense to me so it might be worth
> merging for correctness. However, I'm not entirely sure it's the best
> solution -- the md thread stuff seems like a bit of a mess and passing
> an mddev to thread functions that were not related to the mddev to get a
> lock seems to just make the mess a bit worse.
>
> For example, it seems a bit ugly to me for the lock mddev->thread_lock
> to protect the access of a pointer in struct r5l_log. Just spit-balling,
> but perhaps RCU would be more appropriate here. Then md_wakeup_thread()
> would just need to hold the RCU read lock when dereferencing, and
> md_unregister_thread() would just need to synchronize_rcu() before
> stopping and freeing the thread. This has the benefit of not requiring
> the mddev object for every md_thread and would probably require a lot
> less churn than the current patches.

Thanks for your suggestion, this makes sense to me. I'll try to use RCU.

Thanks,
Kuai
>
> Logan
>
>
>
>
> .
>


2023-03-17 02:30:38

by kernel test robot

Subject: Re: [PATCH v2 3/5] md: use md_thread api to wake up sync_thread


Greetings,

FYI, we noticed INFO:task_blocked_for_more_than#seconds due to commit (built with gcc-11):

commit: af2203c7e88c00d3ce072f18c18a36e2936372fd ("[PATCH v2 3/5] md: use md_thread api to wake up sync_thread")
url: https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/md-pass-a-md_thread-pointer-to-md_register_thread/20230315-142018
base: git://git.kernel.org/cgit/linux/kernel/git/song/md.git md-next
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [PATCH v2 3/5] md: use md_thread api to wake up sync_thread

in testcase: mdadm-selftests
version: mdadm-selftests-x86_64-5f41845-1_20220826
with following parameters:

disk: 1HDD
test_prefix: 07revert



on test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4790T CPU @ 2.70GHz (Haswell) with 16G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):



If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]


kern :err : [ 990.133727] INFO: task mdadm:1040 blocked for more than 491 seconds.
kern :err : [ 990.140824] Tainted: G S 6.3.0-rc1-00019-gaf2203c7e88c #1
kern :err : [ 990.148856] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kern :info : [ 990.157410] task:mdadm state:D stack:0 pid:1040 ppid:963 flags:0x00000000
kern :info : [ 990.166483] Call Trace:
kern :info : [ 990.169666] <TASK>
kern :info : [ 990.172504] __schedule (kernel/sched/core.c:5304 kernel/sched/core.c:6622)
kern :info : [ 990.176810] ? io_schedule_timeout (kernel/sched/core.c:6504)
kern :info : [ 990.181994] ? _raw_spin_lock_irqsave (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:186 include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
kern :info : [ 990.187250] ? _raw_write_lock_irq (kernel/locking/spinlock.c:153)
kern :info : [ 990.192255] schedule (arch/x86/include/asm/preempt.h:85 (discriminator 1) kernel/sched/core.c:6699 (discriminator 1))
kern :info : [ 990.196291] do_md_stop (drivers/md/md.c:6364 (discriminator 29))
kern :info : [ 990.200603] ? md_kick_rdev_from_array (drivers/md/md.c:6344)
kern :info : [ 990.206111] ? add_ptr_to_bulk_krc_lock (arch/x86/include/asm/atomic.h:95 include/linux/atomic/atomic-instrumented.h:191 kernel/rcu/tree.c:3273)
kern :info : [ 990.211711] ? cpuacct_stats_show (kernel/sched/wait.c:418)
kern :info : [ 990.216787] ? schedule_delayed_monitor_work (arch/x86/include/asm/bitops.h:207 arch/x86/include/asm/bitops.h:239 include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/rcu/tree.c:3044)
kern :info : [ 990.222736] ? __cond_resched (kernel/sched/core.c:8486)
kern :info : [ 990.227292] ? mutex_lock_interruptible (arch/x86/include/asm/atomic64_64.h:190 include/linux/atomic/atomic-long.h:443 include/linux/atomic/atomic-instrumented.h:1781 kernel/locking/mutex.c:171 kernel/locking/mutex.c:981)
kern :info : [ 990.232725] ? __mutex_lock_interruptible_slowpath (kernel/locking/mutex.c:978)
kern :info : [ 990.239095] ? __mutex_unlock_slowpath+0x2a0/0x2a0
kern :info : [ 990.245668] md_ioctl (drivers/md/md.c:7642)
kern :info : [ 990.249809] ? hot_add_disk (drivers/md/md.c:7513)
kern :info : [ 990.254360] ? __call_rcu_common+0x363/0x900
kern :info : [ 990.260389] ? blkdev_bszset (block/ioctl.c:473)
kern :info : [ 990.265047] ? rcu_nocb_gp_kthread (kernel/rcu/tree.c:2596)
kern :info : [ 990.270224] ? __fput (include/linux/fs.h:2554 fs/internal.h:110 fs/file_table.c:328)
kern :info : [ 990.274269] ? __cond_resched (kernel/sched/core.c:8486)
kern :info : [ 990.278826] blkdev_ioctl (block/ioctl.c:615)
kern :info : [ 990.283223] ? blkdev_common_ioctl (block/ioctl.c:560)
kern :info : [ 990.288568] ? _raw_spin_lock (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:186 include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
kern :info : [ 990.293139] ? _raw_write_lock_irq (kernel/locking/spinlock.c:153)
kern :info : [ 990.298137] __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:870 fs/ioctl.c:856 fs/ioctl.c:856)
kern :info : [ 990.302800] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
kern :info : [ 990.307108] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
kern :info : [ 990.312889] RIP: 0033:0x7f00b07c8cc7
kern :info : [ 990.317187] RSP: 002b:00007ffc27cbbc88 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
kern :info : [ 990.325483] RAX: ffffffffffffffda RBX: 0000000000000019 RCX: 00007f00b07c8cc7
kern :info : [ 990.333342] RDX: 0000000000000000 RSI: 0000000000000932 RDI: 0000000000000004
kern :info : [ 990.341195] RBP: 0000000000000004 R08: 0000000000000002 R09: 0000000000000000
kern :info : [ 990.349044] R10: fffffffffffff28e R11: 0000000000000206 R12: 00007ffc27cbe994
kern :info : [ 990.356888] R13: 0000000000000005 R14: 00007ffc27cbbd30 R15: 0000557160bdec10
kern :info : [ 990.364750] </TASK>
user :err : [ 1006.832241] Terminated

user :notice: [ 1006.968217] /lkp/benchmarks/mdadm-selftests/tests/07revert-grow... FAILED - see /var/tmp/07revert-grow.log and /var/tmp/fail07revert-grow.log for details

user :notice: [ 1007.123127] 07revert-grow TIMEOUT

user :err : [ 1008.444219] mdadm: cannot open /dev/md0: No such device

user :notice: [ 1012.202178] test: please run test suite without running RAIDs environment.

user :notice: [ 1017.437744] test: please run test suite without running RAIDs environment.

user :notice: [ 1017.836121] /usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 http://internal-lkp-server:80/~lkp/cgi-bin/lkp-jobfile-append-var?job_file=/lkp/jobs/scheduled/lkp-hsw-d05/mdadm-selftests-1HDD-07revert-debian-11.1-x86_64-20220510.cgz-af2203c7e88c00d3ce072f18c18a36e2936372fd-20230317-32283-1997139-5.yaml&job_state=post_run -O /dev/null

user :notice: [ 1019.573946] kill 528 vmstat --timestamp -n 10

user :notice: [ 1019.588893] kill 526 dmesg --follow --decode



To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests



Attachments:
(No filename) (6.66 kB)
config-6.3.0-rc1-00019-gaf2203c7e88c (166.32 kB)
job-script (5.70 kB)
kmsg.xz (35.71 kB)
mdadm-selftests (1.33 kB)
job.yaml (4.84 kB)
reproduce (97.00 B)

2023-03-17 03:30:56

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH v2 3/5] md: use md_thread api to wake up sync_thread

Hi,

On 2023/03/17 10:30, kernel test robot wrote:
>
> Greeting,
>
> FYI, we noticed INFO:task_blocked_for_more_than#seconds due to commit (built with gcc-11):
>
> commit: af2203c7e88c00d3ce072f18c18a36e2936372fd ("[PATCH v2 3/5] md: use md_thread api to wake up sync_thread")
> url: https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/md-pass-a-md_thread-pointer-to-md_register_thread/20230315-142018
> base: git://git.kernel.org/cgit/linux/kernel/git/song/md.git md-next
> patch link: https://lore.kernel.org/all/[email protected]/
> patch subject: [PATCH v2 3/5] md: use md_thread api to wake up sync_thread
>

I don't expect there is any difference between:

if (mddev->sync_thread)
wake_up_process(mddev->sync_thread->tsk);

and:

md_wakeup_thread(mddev->sync_thread);

I still don't understand the problem. I'll try to reproduce this;
until then, I guess I'll just drop this patch.

Thanks,
Kuai
> [...]


2023-03-18 02:32:05

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH v2 3/5] md: use md_thread api to wake up sync_thread

Hi,

On 2023/03/17 11:30, Yu Kuai wrote:
> Hi,
>
> On 2023/03/17 10:30, kernel test robot wrote:
>>
>> Greeting,
>>
>> FYI, we noticed INFO:task_blocked_for_more_than#seconds due to commit
>> (built with gcc-11):
>>
>> commit: af2203c7e88c00d3ce072f18c18a36e2936372fd ("[PATCH v2 3/5] md:
>> use md_thread api to wake up sync_thread")
>> url:
>> https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/md-pass-a-md_thread-pointer-to-md_register_thread/20230315-142018
>>
>> base: git://git.kernel.org/cgit/linux/kernel/git/song/md.git md-next
>> patch link:
>> https://lore.kernel.org/all/[email protected]/
>>
>> patch subject: [PATCH v2 3/5] md: use md_thread api to wake up
>> sync_thread
>>
>
> I don't expect there is any difference between:
>
> if (mddev->sync_thread)
>     wake_up_process(mddev->sync_thread->tsk);
>
> and:
>
> md_wakeup_thread(mddev->sync_thread);

I understand that they are different now.

md_wakeup_thread() only wakes up the wait_event() inside md_thread(); it
will not wake up 'md_thread->tsk' if the task is already running (or
sleeping somewhere other than the thread's own wait queue).

Hence this patch is wrong.

Thanks,
Kuai


2023-03-28 23:36:18

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH v2 0/5] md: fix uaf for sync_thread

On Wed, Mar 15, 2023 at 6:26 PM Yu Kuai <[email protected]> wrote:
>
> Hi,
>
On 2023/03/16 6:55, Logan Gunthorpe wrote:
[...]
> > I was going to try and confirm that no new regressions were introduced
> > by Yu's patches, but seems the tests are getting worse. I tried running
> > the tests on the current md-next branch and found that one of the early
> > tests, 00raid5-zero, hangs indefinitely. I quickly ran the same test on

I am not able to repro the issue with 00raid5-zero. (I did a rebase before
running the test, so that might be the reason).

> > v6.3-rc2 and found that it runs just fine there. So it looks like
> > there's already a regression in md-next that is not part of this series
> > and I don't have the time to dig into the root cause right now.
> >
> > Yu's patches don't apply cleanly to v6.3-rc2 and I can't run the tests
> > against md-next; so I didn't bother running them, but I did do a quick
> > review. The locking changes make sense to me so it might be worth
> > merging for correctness. However, I'm not entirely sure it's the best
> > solution -- the md thread stuff seems like a bit of a mess and passing
> > an mddev to thread functions that were not related to the mddev to get a
> > lock seems to just make the mess a bit worse.
> >
> > For example, it seems a bit ugly to me for the lock mddev->thread_lock
> > to protect the access of a pointer in struct r5l_log. Just spit-balling,
> > but perhaps RCU would be more appropriate here. Then md_wakeup_thread()
> > would just need to hold the RCU read lock when dereferencing, and
> > md_unregister_thread() would just need to synchronize_rcu() before
> > stopping and freeing the thread. This has the benefit of not requiring
> > the mddev object for every md_thread and would probably require a lot
> > less churn than the current patches.
>
> Thanks for your suggestion, this makes sense to me. I'll try to use RCU.

Yu Kuai, do you plan to resend the set with Logan suggestions?

Thanks,
Song

2023-03-29 01:16:40

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH v2 0/5] md: fix uaf for sync_thread

Hi, Song!

On 2023/03/29 7:31, Song Liu wrote:
> On Wed, Mar 15, 2023 at 6:26 PM Yu Kuai <[email protected]> wrote:
>>
>> Hi,
>>
>> On 2023/03/16 6:55, Logan Gunthorpe wrote:
> [...]
>>> I was going to try and confirm that no new regressions were introduced
>>> by Yu's patches, but seems the tests are getting worse. I tried running
>>> the tests on the current md-next branch and found that one of the early
>>> tests, 00raid5-zero, hangs indefinitely. I quickly ran the same test on
>
> I am not able to repro the issue with 00raid5-zero. (I did a rebase before
> running the test, so that might be the reason).
>
>>> v6.3-rc2 and found that it runs just fine there. So it looks like
>>> there's already a regression in md-next that is not part of this series
>>> and I don't have the time to dig into the root cause right now.
>>>
>>> Yu's patches don't apply cleanly to v6.3-rc2 and I can't run the tests
>>> against md-next; so I didn't bother running them, but I did do a quick
>>> review. The locking changes make sense to me so it might be worth
>>> merging for correctness. However, I'm not entirely sure it's the best
>>> solution -- the md thread stuff seems like a bit of a mess and passing
>>> an mddev to thread functions that were not related to the mddev to get a
>>> lock seems to just make the mess a bit worse.
>>>
>>> For example, it seems a bit ugly to me for the lock mddev->thread_lock
>>> to protect the access of a pointer in struct r5l_log. Just spit-balling,
>>> but perhaps RCU would be more appropriate here. Then md_wakeup_thread()
>>> would just need to hold the RCU read lock when dereferencing, and
>>> md_unregister_thread() would just need to synchronize_rcu() before
>>> stopping and freeing the thread. This has the benefit of not requiring
>>> the mddev object for every md_thread and would probably require a lot
>>> less churn than the current patches.
>>
>> Thanks for your suggestion, this makes sense to me. I'll try to use RCU.
>
> Yu Kuai, do you plan to resend the set with Logan suggestions?

Yes, of course. It's just that some other problems were triggered while
I was testing the patchset; I'll resend the set once all tests pass.

Thanks,
Kuai
>
> Thanks,
> Song
> .
>