2023-04-02 09:15:24

by Yu Kuai

[permalink] [raw]
Subject: [PATCH v4 0/5] md: fix uaf for sync_thread

From: Yu Kuai <[email protected]>

Changes in v4:
- remove patch 2 from v3
- fix sparse errors and warnings from v3, in order to do that, all access
to md_thread need to be modified, patch 2-4 is splited to avoid a huge
patch.

Changes in v3:
- remove patch 3 from v2
- use rcu instead of a new lock

Changes in v2:
- fix a compile error for md-cluster in patch 2
- replace spin_lock/unlock with spin_lock/unlock_irq in patch 5
- don't wake up inside the new lock in md wakeup_thread in patch 5

Yu Kuai (5):
md: pass a md_thread pointer to md_register_thread()
md: factor out a helper to wake up md_thread directly
md: add a helper to access md_thread() directly
dm-raid: remove useless checking in raid_message()
md: protect md_thread with rcu

drivers/md/dm-raid.c | 4 +-
drivers/md/md-bitmap.c | 28 +++++++---
drivers/md/md-cluster.c | 11 ++--
drivers/md/md-multipath.c | 6 +--
drivers/md/md.c | 108 ++++++++++++++++++++------------------
drivers/md/md.h | 21 +++++---
drivers/md/raid1.c | 9 ++--
drivers/md/raid1.h | 2 +-
drivers/md/raid10.c | 21 ++++----
drivers/md/raid10.h | 2 +-
drivers/md/raid5-cache.c | 14 ++---
drivers/md/raid5.c | 19 +++----
drivers/md/raid5.h | 2 +-
13 files changed, 132 insertions(+), 115 deletions(-)

--
2.39.2


2023-04-02 09:15:32

by Yu Kuai

[permalink] [raw]
Subject: [PATCH v4 3/5] md: add a helper to access md_thread() directly

From: Yu Kuai <[email protected]>

In some context it's safe to access md_thread directly without
protection, this patch add a helper to do that. There are no functional
changes, prepare to protect md_thread with rcu.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md-bitmap.c | 7 ++++---
drivers/md/md.c | 4 +++-
drivers/md/md.h | 6 ++++++
drivers/md/raid10.c | 2 +-
drivers/md/raid5-cache.c | 7 ++++---
5 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index e7cc6ba1b657..f670c72d97be 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1246,7 +1246,7 @@ void md_bitmap_daemon_work(struct mddev *mddev)

bitmap->daemon_lastrun = jiffies;
if (bitmap->allclean) {
- mddev->thread->timeout = MAX_SCHEDULE_TIMEOUT;
+ get_md_thread(mddev->thread)->timeout = MAX_SCHEDULE_TIMEOUT;
goto done;
}
bitmap->allclean = 1;
@@ -1343,7 +1343,7 @@ void md_bitmap_daemon_work(struct mddev *mddev)

done:
if (bitmap->allclean == 0)
- mddev->thread->timeout =
+ get_md_thread(mddev->thread)->timeout =
mddev->bitmap_info.daemon_sleep;
mutex_unlock(&mddev->bitmap_info.mutex);
}
@@ -1941,7 +1941,8 @@ int md_bitmap_load(struct mddev *mddev)
/* Kick recovery in case any bits were set */
set_bit(MD_RECOVERY_NEEDED, &bitmap->mddev->recovery);

- mddev->thread->timeout = mddev->bitmap_info.daemon_sleep;
+ get_md_thread(mddev->thread)->timeout =
+ mddev->bitmap_info.daemon_sleep;
md_wakeup_thread(mddev->thread);

md_bitmap_update_sb(bitmap);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 139c7b0202e3..d5a29ccb24ec 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -458,7 +458,9 @@ static void md_submit_bio(struct bio *bio)
*/
void mddev_suspend(struct mddev *mddev)
{
- WARN_ON_ONCE(mddev->thread && current == mddev->thread->tsk);
+ struct md_thread *thread = get_md_thread(mddev->thread);
+
+ WARN_ON_ONCE(thread && current == thread->tsk);
lockdep_assert_held(&mddev->reconfig_mutex);
if (mddev->suspended++)
return;
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 344e055e4d0f..5acdd704a922 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -718,6 +718,12 @@ struct md_io_acct {

#define THREAD_WAKEUP 0

+/* caller need to make sured returned md_thread won't be freed */
+static inline struct md_thread *get_md_thread(struct md_thread *t)
+{
+ return t;
+}
+
static inline void safe_put_page(struct page *p)
{
if (p) put_page(p);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 0171ba4f19b0..fc8d07fb1c7d 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -996,7 +996,7 @@ static bool stop_waiting_barrier(struct r10conf *conf)
return true;

/* move on if recovery thread is blocked by us */
- if (conf->mddev->thread->tsk == current &&
+ if (get_md_thread(conf->mddev->thread)->tsk == current &&
test_bit(MD_RECOVERY_RUNNING, &conf->mddev->recovery) &&
conf->nr_queued > 0)
return true;
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 0464d4d551fc..7e45df3e093f 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -1582,11 +1582,11 @@ void r5l_quiesce(struct r5l_log *log, int quiesce)
/* make sure r5l_write_super_and_discard_space exits */
mddev = log->rdev->mddev;
wake_up(&mddev->sb_wait);
- kthread_park(log->reclaim_thread->tsk);
+ kthread_park(get_md_thread(log->reclaim_thread)->tsk);
r5l_wake_reclaim(log, MaxSector);
r5l_do_reclaim(log);
} else
- kthread_unpark(log->reclaim_thread->tsk);
+ kthread_unpark(get_md_thread(log->reclaim_thread)->tsk);
}

bool r5l_log_disk_error(struct r5conf *conf)
@@ -3124,7 +3124,8 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
if (md_register_thread(&log->reclaim_thread, r5l_reclaim_thread,
log->rdev->mddev, "reclaim"))
goto reclaim_thread;
- log->reclaim_thread->timeout = R5C_RECLAIM_WAKEUP_INTERVAL;
+ get_md_thread(log->reclaim_thread)->timeout =
+ R5C_RECLAIM_WAKEUP_INTERVAL;

init_waitqueue_head(&log->iounit_wait);

--
2.39.2

2023-04-02 09:17:58

by Yu Kuai

[permalink] [raw]
Subject: [PATCH v4 1/5] md: pass a md_thread pointer to md_register_thread()

From: Yu Kuai <[email protected]>

Prepare to protect md_thread with rcu, there are no functional changes.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md-cluster.c | 11 +++++------
drivers/md/md-multipath.c | 6 +++---
drivers/md/md.c | 27 ++++++++++++++-------------
drivers/md/md.h | 7 +++----
drivers/md/raid1.c | 5 ++---
drivers/md/raid10.c | 15 ++++++---------
drivers/md/raid5-cache.c | 5 ++---
drivers/md/raid5.c | 15 ++++++---------
8 files changed, 41 insertions(+), 50 deletions(-)

diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 10e0c5381d01..c19e29cb73bf 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -362,9 +362,8 @@ static void __recover_slot(struct mddev *mddev, int slot)

set_bit(slot, &cinfo->recovery_map);
if (!cinfo->recovery_thread) {
- cinfo->recovery_thread = md_register_thread(recover_bitmaps,
- mddev, "recover");
- if (!cinfo->recovery_thread) {
+ if (md_register_thread(&cinfo->recovery_thread, recover_bitmaps,
+ mddev, "recover")) {
pr_warn("md-cluster: Could not create recovery thread\n");
return;
}
@@ -888,9 +887,9 @@ static int join(struct mddev *mddev, int nodes)
goto err;
}
/* Initiate the communication resources */
- ret = -ENOMEM;
- cinfo->recv_thread = md_register_thread(recv_daemon, mddev, "cluster_recv");
- if (!cinfo->recv_thread) {
+ ret = md_register_thread(&cinfo->recv_thread, recv_daemon, mddev,
+ "cluster_recv");
+ if (ret) {
pr_err("md-cluster: cannot allocate memory for recv_thread!\n");
goto err;
}
diff --git a/drivers/md/md-multipath.c b/drivers/md/md-multipath.c
index 66edf5e72bd6..ceec9e4b2a60 100644
--- a/drivers/md/md-multipath.c
+++ b/drivers/md/md-multipath.c
@@ -400,9 +400,9 @@ static int multipath_run (struct mddev *mddev)
if (ret)
goto out_free_conf;

- mddev->thread = md_register_thread(multipathd, mddev,
- "multipath");
- if (!mddev->thread)
+ ret = md_register_thread(&mddev->thread, multipathd, mddev,
+ "multipath");
+ if (ret)
goto out_free_conf;

pr_info("multipath: array %s active with %d out of %d IO paths\n",
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9bc05f451d42..1459c2cfb0dd 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7896,29 +7896,32 @@ void md_wakeup_thread(struct md_thread *thread)
}
EXPORT_SYMBOL(md_wakeup_thread);

-struct md_thread *md_register_thread(void (*run) (struct md_thread *),
- struct mddev *mddev, const char *name)
+int md_register_thread(struct md_thread **threadp,
+ void (*run)(struct md_thread *),
+ struct mddev *mddev, const char *name)
{
struct md_thread *thread;

thread = kzalloc(sizeof(struct md_thread), GFP_KERNEL);
if (!thread)
- return NULL;
+ return -ENOMEM;

init_waitqueue_head(&thread->wqueue);

thread->run = run;
thread->mddev = mddev;
thread->timeout = MAX_SCHEDULE_TIMEOUT;
- thread->tsk = kthread_run(md_thread, thread,
- "%s_%s",
- mdname(thread->mddev),
- name);
+ thread->tsk = kthread_run(md_thread, thread, "%s_%s",
+ mdname(thread->mddev), name);
if (IS_ERR(thread->tsk)) {
+ int err = PTR_ERR(thread->tsk);
+
kfree(thread);
- return NULL;
+ return err;
}
- return thread;
+
+ *threadp = thread;
+ return 0;
}
EXPORT_SYMBOL(md_register_thread);

@@ -9199,10 +9202,8 @@ static void md_start_sync(struct work_struct *ws)
{
struct mddev *mddev = container_of(ws, struct mddev, del_work);

- mddev->sync_thread = md_register_thread(md_do_sync,
- mddev,
- "resync");
- if (!mddev->sync_thread) {
+ if (md_register_thread(&mddev->sync_thread, md_do_sync, mddev,
+ "resync")) {
pr_warn("%s: could not start resync thread...\n",
mdname(mddev));
/* leave the spares where they are, it shouldn't hurt */
diff --git a/drivers/md/md.h b/drivers/md/md.h
index e148e3c83b0d..344e055e4d0f 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -730,10 +730,9 @@ extern int register_md_cluster_operations(struct md_cluster_operations *ops,
extern int unregister_md_cluster_operations(void);
extern int md_setup_cluster(struct mddev *mddev, int nodes);
extern void md_cluster_stop(struct mddev *mddev);
-extern struct md_thread *md_register_thread(
- void (*run)(struct md_thread *thread),
- struct mddev *mddev,
- const char *name);
+int md_register_thread(struct md_thread **threadp,
+ void (*run)(struct md_thread *thread),
+ struct mddev *mddev, const char *name);
extern void md_unregister_thread(struct md_thread **threadp);
extern void md_wakeup_thread(struct md_thread *thread);
extern void md_check_recovery(struct mddev *mddev);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 68a9e2d9985b..1217c1db0a40 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3083,9 +3083,8 @@ static struct r1conf *setup_conf(struct mddev *mddev)
}
}

- err = -ENOMEM;
- conf->thread = md_register_thread(raid1d, mddev, "raid1");
- if (!conf->thread)
+ err = md_register_thread(&conf->thread, raid1d, mddev, "raid1");
+ if (err)
goto abort;

return conf;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6c66357f92f5..0171ba4f19b0 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4077,9 +4077,8 @@ static struct r10conf *setup_conf(struct mddev *mddev)
init_waitqueue_head(&conf->wait_barrier);
atomic_set(&conf->nr_pending, 0);

- err = -ENOMEM;
- conf->thread = md_register_thread(raid10d, mddev, "raid10");
- if (!conf->thread)
+ err = md_register_thread(&conf->thread, raid10d, mddev, "raid10");
+ if (err)
goto out;

conf->mddev = mddev;
@@ -4273,9 +4272,8 @@ static int raid10_run(struct mddev *mddev)
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
- mddev->sync_thread = md_register_thread(md_do_sync, mddev,
- "reshape");
- if (!mddev->sync_thread)
+ if (md_register_thread(&mddev->sync_thread, md_do_sync, mddev,
+ "reshape"))
goto out_free_conf;
}

@@ -4686,9 +4684,8 @@ static int raid10_start_reshape(struct mddev *mddev)
set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);

- mddev->sync_thread = md_register_thread(md_do_sync, mddev,
- "reshape");
- if (!mddev->sync_thread) {
+ if (md_register_thread(&mddev->sync_thread, md_do_sync, mddev,
+ "reshape")) {
ret = -EAGAIN;
goto abort;
}
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 46182b955aef..0464d4d551fc 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -3121,9 +3121,8 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
spin_lock_init(&log->tree_lock);
INIT_RADIX_TREE(&log->big_stripe_tree, GFP_NOWAIT | __GFP_NOWARN);

- log->reclaim_thread = md_register_thread(r5l_reclaim_thread,
- log->rdev->mddev, "reclaim");
- if (!log->reclaim_thread)
+ if (md_register_thread(&log->reclaim_thread, r5l_reclaim_thread,
+ log->rdev->mddev, "reclaim"))
goto reclaim_thread;
log->reclaim_thread->timeout = R5C_RECLAIM_WAKEUP_INTERVAL;

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7b820b81d8c2..04b1093195d0 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7665,11 +7665,10 @@ static struct r5conf *setup_conf(struct mddev *mddev)
}

sprintf(pers_name, "raid%d", mddev->new_level);
- conf->thread = md_register_thread(raid5d, mddev, pers_name);
- if (!conf->thread) {
+ ret = md_register_thread(&conf->thread, raid5d, mddev, pers_name);
+ if (ret) {
pr_warn("md/raid:%s: couldn't allocate thread.\n",
mdname(mddev));
- ret = -ENOMEM;
goto abort;
}

@@ -7989,9 +7988,8 @@ static int raid5_run(struct mddev *mddev)
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
- mddev->sync_thread = md_register_thread(md_do_sync, mddev,
- "reshape");
- if (!mddev->sync_thread)
+ if (md_register_thread(&mddev->sync_thread, md_do_sync, mddev,
+ "reshape"))
goto abort;
}

@@ -8567,9 +8565,8 @@ static int raid5_start_reshape(struct mddev *mddev)
clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
- mddev->sync_thread = md_register_thread(md_do_sync, mddev,
- "reshape");
- if (!mddev->sync_thread) {
+ if (md_register_thread(&mddev->sync_thread, md_do_sync, mddev,
+ "reshape")) {
mddev->recovery = 0;
spin_lock_irq(&conf->device_lock);
write_seqcount_begin(&conf->gen_lock);
--
2.39.2

2023-04-02 09:43:18

by Yu Kuai

[permalink] [raw]
Subject: [PATCH v4 5/5] md: protect md_thread with rcu

From: Yu Kuai <[email protected]>

Our test reports a uaf for 'mddev->sync_thread':

T1 T2
md_start_sync
md_register_thread
// mddev->sync_thread is set
raid1d
md_check_recovery
md_reap_sync_thread
md_unregister_thread
kfree

md_wakeup_thread
wake_up
->sync_thread was freed

Root cause is that there is a small windown between register thread and
wake up thread, where the thread can be freed concurrently.

Currently, a global spinlock 'pers_lock' is borrowed to protect
'mddev->thread', this problem can be fixed likewise, however, there might
be similar problem for other md_thread, and I really don't like the idea to
borrow a global lock.

This patch protect md_thread with rcu.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md-bitmap.c | 21 ++++++++++----
drivers/md/md.c | 62 ++++++++++++++++++----------------------
drivers/md/md.h | 14 ++++-----
drivers/md/raid1.c | 4 +--
drivers/md/raid1.h | 2 +-
drivers/md/raid10.c | 4 +--
drivers/md/raid10.h | 2 +-
drivers/md/raid5-cache.c | 2 +-
drivers/md/raid5.c | 4 +--
drivers/md/raid5.h | 2 +-
10 files changed, 61 insertions(+), 56 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index f670c72d97be..dc2ea2ce0ae9 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1784,6 +1784,7 @@ void md_bitmap_wait_behind_writes(struct mddev *mddev)
void md_bitmap_destroy(struct mddev *mddev)
{
struct bitmap *bitmap = mddev->bitmap;
+ struct md_thread *thread;

if (!bitmap) /* there was no bitmap */
return;
@@ -1797,8 +1798,12 @@ void md_bitmap_destroy(struct mddev *mddev)
mddev->bitmap = NULL; /* disconnect from the md device */
spin_unlock(&mddev->lock);
mutex_unlock(&mddev->bitmap_info.mutex);
- if (mddev->thread)
- mddev->thread->timeout = MAX_SCHEDULE_TIMEOUT;
+
+ rcu_read_lock();
+ thread = rcu_dereference(mddev->thread);
+ if (thread)
+ thread->timeout = MAX_SCHEDULE_TIMEOUT;
+ rcu_read_unlock();

md_bitmap_free(bitmap);
}
@@ -2433,6 +2438,7 @@ timeout_store(struct mddev *mddev, const char *buf, size_t len)
{
/* timeout can be set at any time */
unsigned long timeout;
+ struct md_thread *thread;
int rv = strict_strtoul_scaled(buf, &timeout, 4);
if (rv)
return rv;
@@ -2448,16 +2454,21 @@ timeout_store(struct mddev *mddev, const char *buf, size_t len)
if (timeout < 1)
timeout = 1;
mddev->bitmap_info.daemon_sleep = timeout;
- if (mddev->thread) {
+
+ rcu_read_lock();
+ thread = rcu_dereference(mddev->thread);
+ if (thread) {
/* if thread->timeout is MAX_SCHEDULE_TIMEOUT, then
* the bitmap is all clean and we don't need to
* adjust the timeout right now
*/
- if (mddev->thread->timeout < MAX_SCHEDULE_TIMEOUT) {
- mddev->thread->timeout = timeout;
+ if (thread->timeout < MAX_SCHEDULE_TIMEOUT) {
+ thread->timeout = timeout;
md_wakeup_thread(mddev->thread);
}
}
+ rcu_read_unlock();
+
return len;
}

diff --git a/drivers/md/md.c b/drivers/md/md.c
index d5a29ccb24ec..5609b7e3abab 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -70,11 +70,7 @@
#include "md-bitmap.h"
#include "md-cluster.h"

-/* pers_list is a list of registered personalities protected
- * by pers_lock.
- * pers_lock does extra service to protect accesses to
- * mddev->thread when the mutex cannot be held.
- */
+/* pers_list is a list of registered personalities protected by pers_lock. */
static LIST_HEAD(pers_list);
static DEFINE_SPINLOCK(pers_lock);

@@ -92,7 +88,7 @@ static struct workqueue_struct *md_rdev_misc_wq;
static int remove_and_add_spares(struct mddev *mddev,
struct md_rdev *this);
static void mddev_detach(struct mddev *mddev);
-static void md_wakeup_thread_directly(struct md_thread *thread);
+static void md_wakeup_thread_directly(struct md_thread __rcu *thread);

enum md_ro_state {
MD_RDWR,
@@ -803,13 +799,8 @@ void mddev_unlock(struct mddev *mddev)
} else
mutex_unlock(&mddev->reconfig_mutex);

- /* As we've dropped the mutex we need a spinlock to
- * make sure the thread doesn't disappear
- */
- spin_lock(&pers_lock);
md_wakeup_thread(mddev->thread);
wake_up(&mddev->sb_wait);
- spin_unlock(&pers_lock);
}
EXPORT_SYMBOL_GPL(mddev_unlock);

@@ -7893,23 +7884,33 @@ static int md_thread(void *arg)
return 0;
}

-static void md_wakeup_thread_directly(struct md_thread *thread)
+static void md_wakeup_thread_directly(struct md_thread __rcu *thread)
{
- if (thread)
- wake_up_process(thread->tsk);
+ struct md_thread *t;
+
+ rcu_read_lock();
+ t = rcu_dereference(thread);
+ if (t)
+ wake_up_process(t->tsk);
+ rcu_read_unlock();
}

-void md_wakeup_thread(struct md_thread *thread)
+void md_wakeup_thread(struct md_thread __rcu *thread)
{
- if (thread) {
- pr_debug("md: waking up MD thread %s.\n", thread->tsk->comm);
- set_bit(THREAD_WAKEUP, &thread->flags);
- wake_up(&thread->wqueue);
+ struct md_thread *t;
+
+ rcu_read_lock();
+ t = rcu_dereference(thread);
+ if (t) {
+ pr_debug("md: waking up MD thread %s.\n", t->tsk->comm);
+ set_bit(THREAD_WAKEUP, &t->flags);
+ wake_up(&t->wqueue);
}
+ rcu_read_unlock();
}
EXPORT_SYMBOL(md_wakeup_thread);

-int md_register_thread(struct md_thread **threadp,
+int md_register_thread(struct md_thread __rcu **threadp,
void (*run)(struct md_thread *),
struct mddev *mddev, const char *name)
{
@@ -7933,27 +7934,20 @@ int md_register_thread(struct md_thread **threadp,
return err;
}

- *threadp = thread;
+ rcu_assign_pointer(*threadp, thread);
return 0;
}
EXPORT_SYMBOL(md_register_thread);

-void md_unregister_thread(struct md_thread **threadp)
+void md_unregister_thread(struct md_thread __rcu **threadp)
{
- struct md_thread *thread;
+ struct md_thread *thread = rcu_access_pointer(*threadp);

- /*
- * Locking ensures that mddev_unlock does not wake_up a
- * non-existent thread
- */
- spin_lock(&pers_lock);
- thread = *threadp;
- if (!thread) {
- spin_unlock(&pers_lock);
+ if (!thread)
return;
- }
- *threadp = NULL;
- spin_unlock(&pers_lock);
+
+ rcu_assign_pointer(*threadp, NULL);
+ synchronize_rcu();

pr_debug("interrupting MD-thread pid %d\n", task_pid_nr(thread->tsk));
kthread_stop(thread->tsk);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 5acdd704a922..0525f6d66a4d 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -367,8 +367,8 @@ struct mddev {
int new_chunk_sectors;
int reshape_backwards;

- struct md_thread *thread; /* management thread */
- struct md_thread *sync_thread; /* doing resync or reconstruct */
+ struct md_thread __rcu *thread; /* management thread */
+ struct md_thread __rcu *sync_thread; /* doing resync or reconstruct */

/* 'last_sync_action' is initialized to "none". It is set when a
* sync operation (i.e "data-check", "requested-resync", "resync",
@@ -719,9 +719,9 @@ struct md_io_acct {
#define THREAD_WAKEUP 0

/* caller need to make sured returned md_thread won't be freed */
-static inline struct md_thread *get_md_thread(struct md_thread *t)
+static inline struct md_thread *get_md_thread(struct md_thread __rcu *t)
{
- return t;
+ return rcu_access_pointer(t);
}

static inline void safe_put_page(struct page *p)
@@ -736,11 +736,11 @@ extern int register_md_cluster_operations(struct md_cluster_operations *ops,
extern int unregister_md_cluster_operations(void);
extern int md_setup_cluster(struct mddev *mddev, int nodes);
extern void md_cluster_stop(struct mddev *mddev);
-int md_register_thread(struct md_thread **threadp,
+int md_register_thread(struct md_thread __rcu **threadp,
void (*run)(struct md_thread *thread),
struct mddev *mddev, const char *name);
-extern void md_unregister_thread(struct md_thread **threadp);
-extern void md_wakeup_thread(struct md_thread *thread);
+extern void md_unregister_thread(struct md_thread __rcu **threadp);
+extern void md_wakeup_thread(struct md_thread __rcu *thread);
extern void md_check_recovery(struct mddev *mddev);
extern void md_reap_sync_thread(struct mddev *mddev);
extern int mddev_init_writes_pending(struct mddev *mddev);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 1217c1db0a40..64f019e3615f 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3176,8 +3176,8 @@ static int raid1_run(struct mddev *mddev)
/*
* Ok, everything is just fine now
*/
- mddev->thread = conf->thread;
- conf->thread = NULL;
+ rcu_assign_pointer(mddev->thread, conf->thread);
+ rcu_assign_pointer(conf->thread, NULL);
mddev->private = conf;
set_bit(MD_FAILFAST_SUPPORTED, &mddev->flags);

diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index ebb6788820e7..468f189da7a0 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -130,7 +130,7 @@ struct r1conf {
/* When taking over an array from a different personality, we store
* the new thread here until we fully activate the array.
*/
- struct md_thread *thread;
+ struct md_thread __rcu *thread;

/* Keep track of cluster resync window to send to other
* nodes.
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index fc8d07fb1c7d..5509955b79ec 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4140,8 +4140,8 @@ static int raid10_run(struct mddev *mddev)
}
}

- mddev->thread = conf->thread;
- conf->thread = NULL;
+ rcu_assign_pointer(mddev->thread, conf->thread);
+ rcu_assign_pointer(conf->thread, NULL);

if (mddev->queue) {
blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
index 8c072ce0bc54..63e48b11b552 100644
--- a/drivers/md/raid10.h
+++ b/drivers/md/raid10.h
@@ -100,7 +100,7 @@ struct r10conf {
/* When taking over an array from a different personality, we store
* the new thread here until we fully activate the array.
*/
- struct md_thread *thread;
+ struct md_thread __rcu *thread;

/*
* Keep track of cluster resync window to send to other nodes.
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 7e45df3e093f..b677441d8889 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -120,7 +120,7 @@ struct r5l_log {
struct bio_set bs;
mempool_t meta_pool;

- struct md_thread *reclaim_thread;
+ struct md_thread __rcu *reclaim_thread;
unsigned long reclaim_target; /* number of space that need to be
* reclaimed. if it's 0, reclaim spaces
* used by io_units which are in
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 04b1093195d0..a7e47c37daf3 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7888,8 +7888,8 @@ static int raid5_run(struct mddev *mddev)
}

conf->min_offset_diff = min_offset_diff;
- mddev->thread = conf->thread;
- conf->thread = NULL;
+ rcu_assign_pointer(mddev->thread, conf->thread);
+ rcu_assign_pointer(conf->thread, NULL);
mddev->private = conf;

for (i = 0; i < conf->raid_disks && conf->previous_raid_disks;
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index e873938a6125..f19707189a7b 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -679,7 +679,7 @@ struct r5conf {
/* When taking over an array from a different personality, we store
* the new thread here until we fully activate the array.
*/
- struct md_thread *thread;
+ struct md_thread __rcu *thread;
struct list_head temp_inactive_list[NR_STRIPE_HASH_LOCKS];
struct r5worker_group *worker_groups;
int group_cnt;
--
2.39.2

2023-04-03 15:55:54

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 5/5] md: protect md_thread with rcu



On 2023-04-02 03:12, Yu Kuai wrote:
> From: Yu Kuai <[email protected]>
>
> Our test reports a uaf for 'mddev->sync_thread':
>
> T1 T2
> md_start_sync
> md_register_thread
> // mddev->sync_thread is set
> raid1d
> md_check_recovery
> md_reap_sync_thread
> md_unregister_thread
> kfree
>
> md_wakeup_thread
> wake_up
> ->sync_thread was freed
>
> Root cause is that there is a small windown between register thread and
> wake up thread, where the thread can be freed concurrently.
>
> Currently, a global spinlock 'pers_lock' is borrowed to protect
> 'mddev->thread', this problem can be fixed likewise, however, there might
> be similar problem for other md_thread, and I really don't like the idea to
> borrow a global lock.
>
> This patch protect md_thread with rcu.
>
> Signed-off-by: Yu Kuai <[email protected]>
> ---
> drivers/md/md-bitmap.c | 21 ++++++++++----
> drivers/md/md.c | 62 ++++++++++++++++++----------------------
> drivers/md/md.h | 14 ++++-----
> drivers/md/raid1.c | 4 +--
> drivers/md/raid1.h | 2 +-
> drivers/md/raid10.c | 4 +--
> drivers/md/raid10.h | 2 +-
> drivers/md/raid5-cache.c | 2 +-
> drivers/md/raid5.c | 4 +--
> drivers/md/raid5.h | 2 +-
> 10 files changed, 61 insertions(+), 56 deletions(-)
>
> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
> index f670c72d97be..dc2ea2ce0ae9 100644
> --- a/drivers/md/md-bitmap.c
> +++ b/drivers/md/md-bitmap.c
> @@ -1784,6 +1784,7 @@ void md_bitmap_wait_behind_writes(struct mddev *mddev)
> void md_bitmap_destroy(struct mddev *mddev)
> {
> struct bitmap *bitmap = mddev->bitmap;
> + struct md_thread *thread;
>
> if (!bitmap) /* there was no bitmap */
> return;
> @@ -1797,8 +1798,12 @@ void md_bitmap_destroy(struct mddev *mddev)
> mddev->bitmap = NULL; /* disconnect from the md device */
> spin_unlock(&mddev->lock);
> mutex_unlock(&mddev->bitmap_info.mutex);
> - if (mddev->thread)
> - mddev->thread->timeout = MAX_SCHEDULE_TIMEOUT;
> +
> + rcu_read_lock();
> + thread = rcu_dereference(mddev->thread);
> + if (thread)
> + thread->timeout = MAX_SCHEDULE_TIMEOUT;
> + rcu_read_unlock();
>
> md_bitmap_free(bitmap);
> }
> @@ -2433,6 +2438,7 @@ timeout_store(struct mddev *mddev, const char *buf, size_t len)
> {
> /* timeout can be set at any time */
> unsigned long timeout;
> + struct md_thread *thread;
> int rv = strict_strtoul_scaled(buf, &timeout, 4);
> if (rv)
> return rv;
> @@ -2448,16 +2454,21 @@ timeout_store(struct mddev *mddev, const char *buf, size_t len)
> if (timeout < 1)
> timeout = 1;
> mddev->bitmap_info.daemon_sleep = timeout;
> - if (mddev->thread) {
> +
> + rcu_read_lock();
> + thread = rcu_dereference(mddev->thread);
> + if (thread) {
> /* if thread->timeout is MAX_SCHEDULE_TIMEOUT, then
> * the bitmap is all clean and we don't need to
> * adjust the timeout right now
> */
> - if (mddev->thread->timeout < MAX_SCHEDULE_TIMEOUT) {
> - mddev->thread->timeout = timeout;
> + if (thread->timeout < MAX_SCHEDULE_TIMEOUT) {
> + thread->timeout = timeout;
> md_wakeup_thread(mddev->thread);
> }
> }
> + rcu_read_unlock();
> +
> return len;
> }
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index d5a29ccb24ec..5609b7e3abab 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -70,11 +70,7 @@
> #include "md-bitmap.h"
> #include "md-cluster.h"
>
> -/* pers_list is a list of registered personalities protected
> - * by pers_lock.
> - * pers_lock does extra service to protect accesses to
> - * mddev->thread when the mutex cannot be held.
> - */
> +/* pers_list is a list of registered personalities protected by pers_lock. */
> static LIST_HEAD(pers_list);
> static DEFINE_SPINLOCK(pers_lock);
>
> @@ -92,7 +88,7 @@ static struct workqueue_struct *md_rdev_misc_wq;
> static int remove_and_add_spares(struct mddev *mddev,
> struct md_rdev *this);
> static void mddev_detach(struct mddev *mddev);
> -static void md_wakeup_thread_directly(struct md_thread *thread);
> +static void md_wakeup_thread_directly(struct md_thread __rcu *thread);
>
> enum md_ro_state {
> MD_RDWR,
> @@ -803,13 +799,8 @@ void mddev_unlock(struct mddev *mddev)
> } else
> mutex_unlock(&mddev->reconfig_mutex);
>
> - /* As we've dropped the mutex we need a spinlock to
> - * make sure the thread doesn't disappear
> - */
> - spin_lock(&pers_lock);
> md_wakeup_thread(mddev->thread);
> wake_up(&mddev->sb_wait);
> - spin_unlock(&pers_lock);
> }
> EXPORT_SYMBOL_GPL(mddev_unlock);
>
> @@ -7893,23 +7884,33 @@ static int md_thread(void *arg)
> return 0;
> }
>
> -static void md_wakeup_thread_directly(struct md_thread *thread)
> +static void md_wakeup_thread_directly(struct md_thread __rcu *thread)
> {
> - if (thread)
> - wake_up_process(thread->tsk);
> + struct md_thread *t;
> +
> + rcu_read_lock();
> + t = rcu_dereference(thread);
> + if (t)
> + wake_up_process(t->tsk);
> + rcu_read_unlock();
> }
>
> -void md_wakeup_thread(struct md_thread *thread)
> +void md_wakeup_thread(struct md_thread __rcu *thread)
> {
> - if (thread) {
> - pr_debug("md: waking up MD thread %s.\n", thread->tsk->comm);
> - set_bit(THREAD_WAKEUP, &thread->flags);
> - wake_up(&thread->wqueue);
> + struct md_thread *t;
> +
> + rcu_read_lock();
> + t = rcu_dereference(thread);
> + if (t) {
> + pr_debug("md: waking up MD thread %s.\n", t->tsk->comm);
> + set_bit(THREAD_WAKEUP, &t->flags);
> + wake_up(&t->wqueue);
> }
> + rcu_read_unlock();
> }
> EXPORT_SYMBOL(md_wakeup_thread);
>
> -int md_register_thread(struct md_thread **threadp,
> +int md_register_thread(struct md_thread __rcu **threadp,
> void (*run)(struct md_thread *),
> struct mddev *mddev, const char *name)
> {
> @@ -7933,27 +7934,20 @@ int md_register_thread(struct md_thread **threadp,
> return err;
> }
>
> - *threadp = thread;
> + rcu_assign_pointer(*threadp, thread);
> return 0;
> }
> EXPORT_SYMBOL(md_register_thread);
>
> -void md_unregister_thread(struct md_thread **threadp)
> +void md_unregister_thread(struct md_thread __rcu **threadp)
> {
> - struct md_thread *thread;
> + struct md_thread *thread = rcu_access_pointer(*threadp);
> - /*
> - * Locking ensures that mddev_unlock does not wake_up a
> - * non-existent thread
> - */
> - spin_lock(&pers_lock);
> - thread = *threadp;
> - if (!thread) {
> - spin_unlock(&pers_lock);
> + if (!thread)
> return;
> - }
> - *threadp = NULL;
> - spin_unlock(&pers_lock);
> +
> + rcu_assign_pointer(*threadp, NULL);
> + synchronize_rcu();
>
> pr_debug("interrupting MD-thread pid %d\n", task_pid_nr(thread->tsk));
> kthread_stop(thread->tsk);
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 5acdd704a922..0525f6d66a4d 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -367,8 +367,8 @@ struct mddev {
> int new_chunk_sectors;
> int reshape_backwards;
>
> - struct md_thread *thread; /* management thread */
> - struct md_thread *sync_thread; /* doing resync or reconstruct */
> + struct md_thread __rcu *thread; /* management thread */
> + struct md_thread __rcu *sync_thread; /* doing resync or reconstruct */
>
> /* 'last_sync_action' is initialized to "none". It is set when a
> * sync operation (i.e "data-check", "requested-resync", "resync",
> @@ -719,9 +719,9 @@ struct md_io_acct {
> #define THREAD_WAKEUP 0
>
> /* caller need to make sured returned md_thread won't be freed */
> -static inline struct md_thread *get_md_thread(struct md_thread *t)
> +static inline struct md_thread *get_md_thread(struct md_thread __rcu *t)
> {
> - return t;
> + return rcu_access_pointer(t);

This should not be using rcu_access_pointer(). That function is only
appropriate when the value of t is not being dereferenced. This should
be using rcu_dereference_protected() with some reasoning as to why it's
safe to use this function. It might make sense to open code this for
every call site if the reasoning is different in each location.
Preferrably the second argument in the check should be some lockdep
condition that ensures this. If that's not possible, a comment
explaining the reasoning why it is safe in all the call sites should be
added here.

On one hand this is looking like my idea of using RCU is producing more
churn than the spin lock. On the other hand I think it's cleaning up and
documenting more unsafe use cases (like other potentially unsafe
accesses of the the thread pointer). So I still think the RCU is a good
approach here.

Thanks,

Logan

2023-04-04 01:46:37

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH v4 5/5] md: protect md_thread with rcu

Hi, Logan!

在 2023/04/03 23:53, Logan Gunthorpe 写道:
>>
>> /* caller need to make sured returned md_thread won't be freed */
>> -static inline struct md_thread *get_md_thread(struct md_thread *t)
>> +static inline struct md_thread *get_md_thread(struct md_thread __rcu *t)
>> {
>> - return t;
>> + return rcu_access_pointer(t);
>
> This should not be using rcu_access_pointer(). That function is only
> appropriate when the value of t is not being dereferenced. This should
> be using rcu_dereference_protected() with some reasoning as to why it's
> safe to use this function. It might make sense to open code this for
> every call site if the reasoning is different in each location.
> Preferrably the second argument in the check should be some lockdep
> condition that ensures this. If that's not possible, a comment
> explaining the reasoning why it is safe in all the call sites should be
> added here.

Yes, it's right rcu_dereference_protected() should be used here, I need
to take a look at each call site from patch 3 and figure out if they're
safe without rcu protection.

>
> On one hand this is looking like my idea of using RCU is producing more
> churn than the spin lock. On the other hand I think it's cleaning up and
> documenting more unsafe use cases (like other potentially unsafe
> accesses of the the thread pointer). So I still think the RCU is a good
> approach here.

Yes, some other unsafe accesses is protected now in this patch. I'll
send a new version soon.

Thanks,
Kuai