Hi,
This series includes fixes to fix all the kernel panics in the mdadm
tests and some, related, sparse issues. The first 10 patches
clean refactor the raid5-cache code so that the RCU usage of conf->log
can be cleaned up which is done in patches 11 and 12 -- fixing some
actual kernel NULL pointer dereference crashes in the mdadm test.
Patch 13 fixes some of the remaining sparse warnings that are just
missing __rcu annotations.
Patches 14 and 15 fix a couple additional hangs in an mdadm test.
This series also originally included a patch[1] to fix the
mddev->private=NULL issue in raid0. That bug caused an mdadm tests to
crash, but it seems Xiao beat me to the fix by a few days. Hopefully,
this work to improve mdadm tests will mean these types of bugs will
be caught much sooner, before merging.
This series will be followed by another series for mdadm which fixes
the segfaults and annotates some failing tests to make mdadm tests
runnable fairly reliably, but I'll wait for a stable hash for this
series to note the kernel version tested against. Following that,
v3 of my lock contention series will be sent with more confidence
of its correctness.
This series is based on the current md/md-next branch as of today
(6ad84d559b8c). A git branch is available here:
https://github.com/sbates130272/linux-p2pmem md-bug
Thanks,
Logan
[1] https://github.com/sbates130272/linux-p2pmem/commit/5a538f9f48d77cba111773759256bbc3ccaaa74a
--
Logan Gunthorpe (15):
md/raid5-log: Drop extern decorators for function prototypes
md/raid5-cache: Refactor r5c_is_writeback() to take a struct r5conf
md/raid5-cache: Refactor r5l_start() to take a struct r5conf
md/raid5-cache: Refactor r5l_flush_stripe_to_raid() to take a struct
r5conf
md/raid5-cache: Refactor r5l_wake_reclaim() to take a struct r5conf
md/raid5-cache: Refactor remaining functions to take a r5conf
md/raid5-ppl: Drop unused argument from ppl_handle_flush_request()
md/raid5-cache: Pass the log through to r5c_finish_cache_stripe()
md/raid5-cache: Don't pass conf to r5c_calculate_new_cp()
md/raid5-cache: Take struct r5l_log in
r5c_log_required_to_flush_cache()
md/raid5: Ensure array is suspended for calls to log_exit()
md/raid5-cache: Add RCU protection to conf->log accesses
md/raid5-cache: Annotate pslot with __rcu notation
md: Ensure resync is reported after it starts
md: Notify sysfs sync_completed in md_reap_sync_thread()
drivers/md/md.c | 13 ++-
drivers/md/raid5-cache.c | 240 ++++++++++++++++++++++++---------------
drivers/md/raid5-log.h | 103 ++++++++---------
drivers/md/raid5-ppl.c | 2 +-
drivers/md/raid5.c | 50 ++++----
drivers/md/raid5.h | 2 +-
6 files changed, 231 insertions(+), 179 deletions(-)
base-commit: 6ad84d559b8cbce9ab27a3a2658c438de867c98e
--
2.30.2
The mdadm test 21raid5cache randomly fails with NULL pointer accesses
conf->log when run repeatedly. conf->log was sort of protected with
a RCU, but most dereferences were not done with the correct functions.
Add rcu_read_locks() and rcu_access_pointers() to the appropriate
places.
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/md/raid5-cache.c | 135 +++++++++++++++++++++++++++------------
drivers/md/raid5-log.h | 14 ++--
drivers/md/raid5.c | 4 +-
drivers/md/raid5.h | 2 +-
4 files changed, 104 insertions(+), 51 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index f7b402138d16..1dbc7c4b9a15 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -254,7 +254,14 @@ static bool __r5c_is_writeback(struct r5l_log *log)
bool r5c_is_writeback(struct r5conf *conf)
{
- return __r5c_is_writeback(conf->log);
+ struct r5l_log *log;
+ bool ret;
+
+ rcu_read_lock();
+ log = rcu_dereference(conf->log);
+ ret = __r5c_is_writeback(log);
+ rcu_read_unlock();
+ return ret;
}
static sector_t r5l_ring_add(struct r5l_log *log, sector_t start, sector_t inc)
@@ -340,12 +347,23 @@ static void __r5l_wake_reclaim(struct r5l_log *log, sector_t space)
md_wakeup_thread(log->reclaim_thread);
}
+static struct r5l_log *get_log_for_io(struct r5conf *conf)
+{
+ /*
+ * rcu_dereference_protected is safe because the array will be
+ * quiesced before log_exit() so it can't be called while
+ * an IO is in progress.
+ */
+ return rcu_dereference_protected(conf->log, 1);
+}
+
/* Check whether we should flush some stripes to free up stripe cache */
void r5c_check_stripe_cache_usage(struct r5conf *conf)
{
+ struct r5l_log *log = get_log_for_io(conf);
int total_cached;
- if (!r5c_is_writeback(conf))
+ if (!__r5c_is_writeback(log))
return;
total_cached = atomic_read(&conf->r5c_cached_partial_stripes) +
@@ -361,7 +379,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
*/
if (total_cached > conf->min_nr_stripes * 1 / 2 ||
atomic_read(&conf->empty_inactive_list_nr) > 0)
- __r5l_wake_reclaim(conf->log, 0);
+ __r5l_wake_reclaim(log, 0);
}
/*
@@ -370,8 +388,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
*/
void r5c_check_cached_full_stripe(struct r5conf *conf)
{
- if (!r5c_is_writeback(conf))
- return;
+ struct r5l_log *log = get_log_for_io(conf);
/*
* wake up reclaim for R5C_FULL_STRIPE_FLUSH_BATCH cached stripes
@@ -380,7 +397,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
if (atomic_read(&conf->r5c_cached_full_stripes) >=
min(R5C_FULL_STRIPE_FLUSH_BATCH(conf),
conf->chunk_sectors >> RAID5_STRIPE_SHIFT(conf)))
- __r5l_wake_reclaim(conf->log, 0);
+ __r5l_wake_reclaim(log, 0);
}
/*
@@ -703,7 +720,7 @@ static void r5c_disable_writeback_async(struct work_struct *work)
/* wait superblock change before suspend */
wait_event(mddev->sb_wait,
- conf->log == NULL ||
+ rcu_access_pointer(conf->log) ||
(!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
(locked = mddev_trylock(mddev))));
if (locked) {
@@ -1001,7 +1018,7 @@ static inline void r5l_add_no_space_stripe(struct r5l_log *log,
*/
int r5l_write_stripe(struct r5conf *conf, struct stripe_head *sh)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
int write_disks = 0;
int data_pages, parity_pages;
int reserve;
@@ -1099,7 +1116,8 @@ int r5l_write_stripe(struct r5conf *conf, struct stripe_head *sh)
void r5l_write_stripe_run(struct r5conf *conf)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
+
if (!log)
return;
mutex_lock(&log->io_mutex);
@@ -1109,7 +1127,7 @@ void r5l_write_stripe_run(struct r5conf *conf)
int r5l_handle_flush_request(struct r5conf *conf, struct bio *bio)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH) {
/*
@@ -1297,7 +1315,7 @@ static void r5l_log_flush_endio(struct bio *bio)
*/
void r5l_flush_stripe_to_raid(struct r5conf *conf)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
bool do_flush;
if (!log || !log->need_cache_flush)
@@ -1412,7 +1430,7 @@ void r5c_flush_cache(struct r5conf *conf, int num)
struct stripe_head *sh, *next;
lockdep_assert_held(&conf->device_lock);
- if (!conf->log)
+ if (!rcu_access_pointer(conf->log))
return;
count = 0;
@@ -1431,9 +1449,8 @@ void r5c_flush_cache(struct r5conf *conf, int num)
}
}
-static void r5c_do_reclaim(struct r5conf *conf)
+static void r5c_do_reclaim(struct r5conf *conf, struct r5l_log *log)
{
- struct r5l_log *log = conf->log;
struct stripe_head *sh;
int count = 0;
unsigned long flags;
@@ -1441,7 +1458,7 @@ static void r5c_do_reclaim(struct r5conf *conf)
int stripes_to_flush;
int flushing_partial, flushing_full;
- if (!r5c_is_writeback(conf))
+ if (!__r5c_is_writeback(log))
return;
flushing_partial = atomic_read(&conf->r5c_flushing_partial_stripes);
@@ -1561,22 +1578,30 @@ static void r5l_reclaim_thread(struct md_thread *thread)
{
struct mddev *mddev = thread->mddev;
struct r5conf *conf = mddev->private;
- struct r5l_log *log = conf->log;
+ struct r5l_log *log;
+ /*
+ * This is safe, because the reclaim thread will be stopped before
+ * the log is freed in log_exit()
+ */
+ log = rcu_dereference_protected(conf->log, 1);
if (!log)
return;
- r5c_do_reclaim(conf);
+
+ r5c_do_reclaim(conf, log);
r5l_do_reclaim(log);
}
void r5l_wake_reclaim(struct r5conf *conf, sector_t space)
{
- __r5l_wake_reclaim(conf->log, space);
+ struct r5l_log *log = get_log_for_io(conf);
+
+ __r5l_wake_reclaim(log, space);
}
void r5l_quiesce(struct r5conf *conf, int quiesce)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
struct mddev *mddev;
if (quiesce) {
@@ -2534,16 +2559,20 @@ static void r5l_write_super(struct r5l_log *log, sector_t cp)
static ssize_t r5c_journal_mode_show(struct mddev *mddev, char *page)
{
struct r5conf *conf;
- int ret;
+ struct r5l_log *log;
+ int ret = 0;
spin_lock(&mddev->lock);
conf = mddev->private;
- if (!conf || !conf->log) {
- spin_unlock(&mddev->lock);
- return 0;
- }
+ if (!conf)
+ goto out_spin_unlock;
+
+ rcu_read_lock();
+ log = rcu_dereference(conf->log);
+ if (!log)
+ goto out;
- switch (conf->log->r5c_journal_mode) {
+ switch (log->r5c_journal_mode) {
case R5C_JOURNAL_MODE_WRITE_THROUGH:
ret = snprintf(
page, PAGE_SIZE, "[%s] %s\n",
@@ -2559,6 +2588,10 @@ static ssize_t r5c_journal_mode_show(struct mddev *mddev, char *page)
default:
ret = 0;
}
+
+out:
+ rcu_read_unlock();
+out_spin_unlock:
spin_unlock(&mddev->lock);
return ret;
}
@@ -2572,13 +2605,15 @@ static ssize_t r5c_journal_mode_show(struct mddev *mddev, char *page)
int r5c_journal_mode_set(struct mddev *mddev, int mode)
{
struct r5conf *conf;
+ struct r5l_log *log;
+ int ret = 0;
if (mode < R5C_JOURNAL_MODE_WRITE_THROUGH ||
mode > R5C_JOURNAL_MODE_WRITE_BACK)
return -EINVAL;
conf = mddev->private;
- if (!conf || !conf->log)
+ if (!conf || !rcu_access_pointer(conf->log))
return -ENODEV;
if (raid5_calc_degraded(conf) > 0 &&
@@ -2586,12 +2621,19 @@ int r5c_journal_mode_set(struct mddev *mddev, int mode)
return -EINVAL;
mddev_suspend(mddev);
- conf->log->r5c_journal_mode = mode;
+ rcu_read_lock();
+ log = rcu_dereference(conf->log);
+ if (log) {
+ log->r5c_journal_mode = mode;
+ pr_debug("md/raid:%s: setting r5c cache mode to %d: %s\n",
+ mdname(mddev), mode, r5c_journal_mode_str[mode]);
+ } else {
+ ret = -ENODEV;
+ }
+ rcu_read_unlock();
mddev_resume(mddev);
- pr_debug("md/raid:%s: setting r5c cache mode to %d: %s\n",
- mdname(mddev), mode, r5c_journal_mode_str[mode]);
- return 0;
+ return ret;
}
EXPORT_SYMBOL(r5c_journal_mode_set);
@@ -2637,7 +2679,7 @@ int r5c_try_caching_write(struct r5conf *conf,
struct stripe_head_state *s,
int disks)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
int i;
struct r5dev *dev;
int to_cache = 0;
@@ -2804,7 +2846,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
struct stripe_head *sh,
struct stripe_head_state *s)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
int i;
int do_wakeup = 0;
sector_t tree_index;
@@ -2887,7 +2929,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
int r5c_cache_data(struct r5conf *conf, struct stripe_head *sh)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
int pages = 0;
int reserve;
int i;
@@ -2943,7 +2985,7 @@ int r5c_cache_data(struct r5conf *conf, struct stripe_head *sh)
/* check whether this big stripe is in write back cache. */
bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
sector_t tree_index;
void *slot;
@@ -2953,6 +2995,7 @@ bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect)
WARN_ON_ONCE(!rcu_read_lock_held());
tree_index = r5c_tree_index(conf, sect);
slot = radix_tree_lookup(&log->big_stripe_tree, tree_index);
+
return slot != NULL;
}
@@ -3033,30 +3076,38 @@ static int r5l_load_log(struct r5l_log *log)
int r5l_start(struct r5conf *conf)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log;
int ret;
+ log = rcu_dereference_protected(conf->log,
+ lockdep_is_held(&conf->mddev->reconfig_mutex));
if (!log)
return 0;
ret = r5l_load_log(log);
if (ret)
r5l_exit_log(conf);
+
return ret;
}
void r5c_update_on_rdev_error(struct mddev *mddev, struct md_rdev *rdev)
{
struct r5conf *conf = mddev->private;
- struct r5l_log *log = conf->log;
+ struct r5l_log *log;
+ rcu_read_lock();
+ log = rcu_dereference(conf->log);
if (!log)
- return;
+ goto out;
if ((raid5_calc_degraded(conf) > 0 ||
test_bit(Journal, &rdev->flags)) &&
- conf->log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
+ log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
schedule_work(&log->disable_writeback_work);
+
+out:
+ rcu_read_unlock();
}
int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
@@ -3164,15 +3215,17 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
void r5l_exit_log(struct r5conf *conf)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log;
- conf->log = NULL;
- synchronize_rcu();
+ lockdep_assert_held(&conf->mddev->reconfig_mutex);
+ log = rcu_replace_pointer(conf->log, NULL, 1);
/* Ensure disable_writeback_work wakes up and exits */
wake_up(&conf->mddev->sb_wait);
flush_work(&log->disable_writeback_work);
md_unregister_thread(&log->reclaim_thread);
+ synchronize_rcu();
+
mempool_exit(&log->meta_pool);
bioset_exit(&log->bs);
mempool_exit(&log->io_pool);
diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index f26e6f4c7f9a..24b4dbd5b25c 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -58,7 +58,7 @@ static inline int log_stripe(struct stripe_head *sh, struct stripe_head_state *s
{
struct r5conf *conf = sh->raid_conf;
- if (conf->log) {
+ if (rcu_access_pointer(conf->log)) {
if (!test_bit(STRIPE_R5C_CACHING, &sh->state)) {
/* writing out phase */
if (s->waiting_extra_page)
@@ -79,7 +79,7 @@ static inline void log_stripe_write_finished(struct stripe_head *sh)
{
struct r5conf *conf = sh->raid_conf;
- if (conf->log)
+ if (rcu_access_pointer(conf->log))
r5l_stripe_write_finished(sh);
else if (raid5_has_ppl(conf))
ppl_stripe_write_finished(sh);
@@ -87,7 +87,7 @@ static inline void log_stripe_write_finished(struct stripe_head *sh)
static inline void log_write_stripe_run(struct r5conf *conf)
{
- if (conf->log)
+ if (rcu_access_pointer(conf->log))
r5l_write_stripe_run(conf);
else if (raid5_has_ppl(conf))
ppl_write_stripe_run(conf);
@@ -95,7 +95,7 @@ static inline void log_write_stripe_run(struct r5conf *conf)
static inline void log_flush_stripe_to_raid(struct r5conf *conf)
{
- if (conf->log)
+ if (rcu_access_pointer(conf->log))
r5l_flush_stripe_to_raid(conf);
else if (raid5_has_ppl(conf))
ppl_write_stripe_run(conf);
@@ -105,7 +105,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
{
int ret = -ENODEV;
- if (conf->log)
+ if (rcu_access_pointer(conf->log))
ret = r5l_handle_flush_request(conf, bio);
else if (raid5_has_ppl(conf))
ret = ppl_handle_flush_request(bio);
@@ -115,7 +115,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
static inline void log_quiesce(struct r5conf *conf, int quiesce)
{
- if (conf->log)
+ if (rcu_access_pointer(conf->log))
r5l_quiesce(conf, quiesce);
else if (raid5_has_ppl(conf))
ppl_quiesce(conf, quiesce);
@@ -123,7 +123,7 @@ static inline void log_quiesce(struct r5conf *conf, int quiesce)
static inline void log_exit(struct r5conf *conf)
{
- if (conf->log)
+ if (rcu_access_pointer(conf->log))
r5l_exit_log(conf);
else if (raid5_has_ppl(conf))
ppl_exit_log(conf);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 37fe2af77c93..c06c9ef88417 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7937,7 +7937,7 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
struct md_rdev *tmp;
print_raid5_conf(conf);
- if (test_bit(Journal, &rdev->flags) && conf->log) {
+ if (test_bit(Journal, &rdev->flags) && rcu_access_pointer(conf->log)) {
mddev_suspend(mddev);
log_exit(conf);
mddev_resume(mddev);
@@ -8019,7 +8019,7 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
int last = conf->raid_disks - 1;
if (test_bit(Journal, &rdev->flags)) {
- if (conf->log)
+ if (rcu_access_pointer(conf->log))
return -EBUSY;
rdev->raid_disk = 0;
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 638d29863503..04fe5b6f679c 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -684,7 +684,7 @@ struct r5conf {
struct r5worker_group *worker_groups;
int group_cnt;
int worker_cnt_per_group;
- struct r5l_log *log;
+ struct r5l_log __rcu *log;
void *log_private;
spinlock_t pending_bios_lock;
--
2.30.2
The raid5-cache code relies on there being no IO in flight when
log_exit() is called. There are two places where this is not
guaranteed so add mddev_suspend() and mddev_resume() calls to these
sites.
The site in raid5_remove_disk() has a comment saying that it is
called in raid5d and thus cannot wait for pending writes; however that
does not appear to be correct anymore (if it ever was) as
raid5_remove_disk() is called from hot_remove_disk() which only
appears to be called in the md_ioctl(). Thus, the comment is removed,
as well as the racy check and replaced with calls to suspend/resume.
The site in raid5_change_consistency_policy() is in the error path,
and another similar call site already has suspend/resume calls just
below it; so it should be equally safe to make that change here.
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/md/raid5.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 09e768f2d32c..37fe2af77c93 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7938,18 +7938,9 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
print_raid5_conf(conf);
if (test_bit(Journal, &rdev->flags) && conf->log) {
- /*
- * we can't wait pending write here, as this is called in
- * raid5d, wait will deadlock.
- * neilb: there is no locking about new writes here,
- * so this cannot be safe.
- */
- if (atomic_read(&conf->active_stripes) ||
- atomic_read(&conf->r5c_cached_full_stripes) ||
- atomic_read(&conf->r5c_cached_partial_stripes)) {
- return -EBUSY;
- }
+ mddev_suspend(mddev);
log_exit(conf);
+ mddev_resume(mddev);
return 0;
}
if (rdev == rcu_access_pointer(p->rdev))
@@ -8697,8 +8688,11 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
err = log_init(conf, NULL, true);
if (!err) {
err = resize_stripes(conf, conf->pool_size);
- if (err)
+ if (err) {
+ mddev_suspend(mddev);
log_exit(conf);
+ mddev_resume(mddev);
+ }
}
} else
err = -EINVAL;
--
2.30.2
All calls to r5l_flush_stripe_to_raid() have to dereference conf to
get the log. The log pointer should be protected by RCU, but isn't. Push
the dereference of conf->log into r5l_flush_stripe_to_raid() to reduce
the number of sites that dereference conf->log so this can be improved.
No functional changes intended.
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/md/raid5-cache.c | 3 ++-
drivers/md/raid5-log.h | 4 ++--
drivers/md/raid5.c | 6 +++---
3 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 0049c4f45e9c..6db8040fb01b 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -1287,8 +1287,9 @@ static void r5l_log_flush_endio(struct bio *bio)
* only write stripes of an io_unit to raid disks till the io_unit is the first
* one whose data/parity is in log.
*/
-void r5l_flush_stripe_to_raid(struct r5l_log *log)
+void r5l_flush_stripe_to_raid(struct r5conf *conf)
{
+ struct r5l_log *log = conf->log;
bool do_flush;
if (!log || !log->need_cache_flush)
diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index c6f877df4f3e..ead541075528 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -6,7 +6,7 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev);
void r5l_exit_log(struct r5conf *conf);
int r5l_write_stripe(struct r5l_log *log, struct stripe_head *head_sh);
void r5l_write_stripe_run(struct r5l_log *log);
-void r5l_flush_stripe_to_raid(struct r5l_log *log);
+void r5l_flush_stripe_to_raid(struct r5conf *conf);
void r5l_stripe_write_finished(struct stripe_head *sh);
int r5l_handle_flush_request(struct r5l_log *log, struct bio *bio);
void r5l_quiesce(struct r5l_log *log, int quiesce);
@@ -96,7 +96,7 @@ static inline void log_write_stripe_run(struct r5conf *conf)
static inline void log_flush_stripe_to_raid(struct r5conf *conf)
{
if (conf->log)
- r5l_flush_stripe_to_raid(conf->log);
+ r5l_flush_stripe_to_raid(conf);
else if (raid5_has_ppl(conf))
ppl_write_stripe_run(conf);
}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 355760f34cb6..436be2a42cc1 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6427,7 +6427,7 @@ static int handle_active_stripes(struct r5conf *conf, int group,
release_inactive_stripe_list(conf, temp_inactive_list,
NR_STRIPE_HASH_LOCKS);
- r5l_flush_stripe_to_raid(conf->log);
+ r5l_flush_stripe_to_raid(conf);
if (release_inactive) {
spin_lock_irq(&conf->device_lock);
return 0;
@@ -6483,7 +6483,7 @@ static void raid5_do_work(struct work_struct *work)
flush_deferred_bios(conf);
- r5l_flush_stripe_to_raid(conf->log);
+ r5l_flush_stripe_to_raid(conf);
async_tx_issue_pending_all();
blk_finish_plug(&plug);
@@ -6570,7 +6570,7 @@ static void raid5d(struct md_thread *thread)
flush_deferred_bios(conf);
- r5l_flush_stripe_to_raid(conf->log);
+ r5l_flush_stripe_to_raid(conf);
async_tx_issue_pending_all();
blk_finish_plug(&plug);
--
2.30.2
radix_tree_lookup_slot() and radix_tree_replace_slot() API expect the
slot returned and looked up to be marked with __rcu. Otherwise
sparse warnings are generated:
drivers/md/raid5-cache.c:2939:23: warning: incorrect type in
assignment (different address spaces)
drivers/md/raid5-cache.c:2939:23: expected void **pslot
drivers/md/raid5-cache.c:2939:23: got void [noderef] __rcu **
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/md/raid5-cache.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 1dbc7c4b9a15..a541d17f84f6 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -2683,7 +2683,7 @@ int r5c_try_caching_write(struct r5conf *conf,
int i;
struct r5dev *dev;
int to_cache = 0;
- void **pslot;
+ void __rcu **pslot;
sector_t tree_index;
int ret;
uintptr_t refcount;
@@ -2850,7 +2850,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
int i;
int do_wakeup = 0;
sector_t tree_index;
- void **pslot;
+ void __rcu **pslot;
uintptr_t refcount;
if (!log || !test_bit(R5_InJournal, &sh->dev[sh->pd_idx].flags))
--
2.30.2
ppl_handle_flush_request() takes an struct r5log argument but doesn't
use it. It has no buisiness taking this argument as it is only used
by raid5-cache and has no way to derference it anyway. Remove
the argument.
No functional changes intended.
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/md/raid5-log.h | 4 ++--
drivers/md/raid5-ppl.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index ccfbf8814753..f26e6f4c7f9a 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -41,7 +41,7 @@ void ppl_write_stripe_run(struct r5conf *conf);
void ppl_stripe_write_finished(struct stripe_head *sh);
int ppl_modify_log(struct r5conf *conf, struct md_rdev *rdev, bool add);
void ppl_quiesce(struct r5conf *conf, int quiesce);
-int ppl_handle_flush_request(struct r5l_log *log, struct bio *bio);
+int ppl_handle_flush_request(struct bio *bio);
extern struct md_sysfs_entry ppl_write_hint;
static inline bool raid5_has_log(struct r5conf *conf)
@@ -108,7 +108,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
if (conf->log)
ret = r5l_handle_flush_request(conf, bio);
else if (raid5_has_ppl(conf))
- ret = ppl_handle_flush_request(conf->log, bio);
+ ret = ppl_handle_flush_request(bio);
return ret;
}
diff --git a/drivers/md/raid5-ppl.c b/drivers/md/raid5-ppl.c
index 973e2e06f19c..4f5bdb4cad2b 100644
--- a/drivers/md/raid5-ppl.c
+++ b/drivers/md/raid5-ppl.c
@@ -679,7 +679,7 @@ void ppl_quiesce(struct r5conf *conf, int quiesce)
}
}
-int ppl_handle_flush_request(struct r5l_log *log, struct bio *bio)
+int ppl_handle_flush_request(struct bio *bio)
{
if (bio->bi_iter.bi_size == 0) {
bio_endio(bio);
--
2.30.2
Most calls to r5c_is_writeback() have to dereference conf to get the
log. The log pointer should be protected by RCU, but isn't. Push
the dereference of conf->log into r5c_is_writeback() to reduce
the number of sites that dereference conf->log so this can be improved.
No functional changes intended.
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/md/raid5-cache.c | 25 ++++++++++++++-----------
drivers/md/raid5-log.h | 2 +-
drivers/md/raid5.c | 16 ++++++++--------
3 files changed, 23 insertions(+), 20 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 83c184eddbda..241c271e836d 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -247,10 +247,14 @@ enum r5l_io_unit_state {
IO_UNIT_STRIPE_END = 3, /* stripes data finished writing to raid */
};
-bool r5c_is_writeback(struct r5l_log *log)
+static bool __r5c_is_writeback(struct r5l_log *log)
{
- return (log != NULL &&
- log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK);
+ return log && log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK;
+}
+
+bool r5c_is_writeback(struct r5conf *conf)
+{
+ return __r5c_is_writeback(conf->log);
}
static sector_t r5l_ring_add(struct r5l_log *log, sector_t start, sector_t inc)
@@ -328,7 +332,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
{
int total_cached;
- if (!r5c_is_writeback(conf->log))
+ if (!r5c_is_writeback(conf))
return;
total_cached = atomic_read(&conf->r5c_cached_partial_stripes) +
@@ -353,7 +357,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
*/
void r5c_check_cached_full_stripe(struct r5conf *conf)
{
- if (!r5c_is_writeback(conf->log))
+ if (!r5c_is_writeback(conf))
return;
/*
@@ -398,7 +402,7 @@ static sector_t r5c_log_required_to_flush_cache(struct r5conf *conf)
{
struct r5l_log *log = conf->log;
- if (!r5c_is_writeback(log))
+ if (!r5c_is_writeback(conf))
return 0;
return BLOCK_SECTORS *
@@ -420,7 +424,7 @@ static inline void r5c_update_log_state(struct r5l_log *log)
sector_t reclaim_space;
bool wake_reclaim = false;
- if (!r5c_is_writeback(log))
+ if (!__r5c_is_writeback(log))
return;
free_space = r5l_ring_distance(log, log->log_start,
@@ -449,9 +453,8 @@ static inline void r5c_update_log_state(struct r5l_log *log)
void r5c_make_stripe_write_out(struct stripe_head *sh)
{
struct r5conf *conf = sh->raid_conf;
- struct r5l_log *log = conf->log;
- BUG_ON(!r5c_is_writeback(log));
+ BUG_ON(!r5c_is_writeback(conf));
WARN_ON(!test_bit(STRIPE_R5C_CACHING, &sh->state));
clear_bit(STRIPE_R5C_CACHING, &sh->state);
@@ -1429,7 +1432,7 @@ static void r5c_do_reclaim(struct r5conf *conf)
int stripes_to_flush;
int flushing_partial, flushing_full;
- if (!r5c_is_writeback(log))
+ if (!r5c_is_writeback(conf))
return;
flushing_partial = atomic_read(&conf->r5c_flushing_partial_stripes);
@@ -2644,7 +2647,7 @@ int r5c_try_caching_write(struct r5conf *conf,
int ret;
uintptr_t refcount;
- BUG_ON(!r5c_is_writeback(log));
+ BUG_ON(!__r5c_is_writeback(log));
if (!test_bit(STRIPE_R5C_CACHING, &sh->state)) {
/*
diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index 270ced4f770f..8a98b5e8c04b 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -11,7 +11,7 @@ void r5l_stripe_write_finished(struct stripe_head *sh);
int r5l_handle_flush_request(struct r5l_log *log, struct bio *bio);
void r5l_quiesce(struct r5l_log *log, int quiesce);
bool r5l_log_disk_error(struct r5conf *conf);
-bool r5c_is_writeback(struct r5l_log *log);
+bool r5c_is_writeback(struct r5conf *conf);
int r5c_try_caching_write(struct r5conf *conf, struct stripe_head *sh,
struct stripe_head_state *s, int disks);
void r5c_finish_stripe_write_out(struct r5conf *conf, struct stripe_head *sh,
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 5d09256d7f81..bbc0e1101d0f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -224,7 +224,7 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
BUG_ON(!list_empty(&sh->lru));
BUG_ON(atomic_read(&conf->active_stripes)==0);
- if (r5c_is_writeback(conf->log))
+ if (r5c_is_writeback(conf))
for (i = sh->disks; i--; )
if (test_bit(R5_InJournal, &sh->dev[i].flags))
injournal++;
@@ -236,7 +236,7 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
* 2. when resync is requested fot the stripe.
*/
if (test_bit(STRIPE_SYNC_REQUESTED, &sh->state) ||
- (conf->quiesce && r5c_is_writeback(conf->log) &&
+ (conf->quiesce && r5c_is_writeback(conf) &&
!test_bit(STRIPE_HANDLE, &sh->state) && injournal != 0)) {
if (test_bit(STRIPE_R5C_CACHING, &sh->state))
r5c_make_stripe_write_out(sh);
@@ -274,7 +274,7 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
md_wakeup_thread(conf->mddev->thread);
atomic_dec(&conf->active_stripes);
if (!test_bit(STRIPE_EXPANDING, &sh->state)) {
- if (!r5c_is_writeback(conf->log))
+ if (!r5c_is_writeback(conf))
list_add_tail(&sh->lru, temp_inactive_list);
else {
WARN_ON(test_bit(R5_InJournal, &sh->dev[sh->pd_idx].flags));
@@ -1786,7 +1786,7 @@ static void ops_complete_prexor(void *stripe_head_ref)
pr_debug("%s: stripe %llu\n", __func__,
(unsigned long long)sh->sector);
- if (r5c_is_writeback(sh->raid_conf->log))
+ if (r5c_is_writeback(sh->raid_conf))
/*
* raid5-cache write back uses orig_page during prexor.
* After prexor, it is time to free orig_page
@@ -1905,9 +1905,9 @@ ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
tx = async_copy_data(1, wbi, &dev->page,
dev->offset,
dev->sector, tx, sh,
- r5c_is_writeback(conf->log));
+ r5c_is_writeback(conf));
if (dev->page != dev->orig_page &&
- !r5c_is_writeback(conf->log)) {
+ !r5c_is_writeback(conf)) {
set_bit(R5_SkipCopy, &dev->flags);
clear_bit(R5_UPTODATE, &dev->flags);
clear_bit(R5_OVERWRITE, &dev->flags);
@@ -5085,7 +5085,7 @@ static void handle_stripe(struct stripe_head *sh)
*/
if (!sh->reconstruct_state && !sh->check_state && !sh->log_io) {
- if (!r5c_is_writeback(conf->log)) {
+ if (!r5c_is_writeback(conf)) {
if (s.to_write)
handle_stripe_dirtying(conf, sh, &s, disks);
} else { /* write back cache */
@@ -5533,7 +5533,7 @@ static struct stripe_head *__get_priority_stripe(struct r5conf *conf, int group)
struct stripe_head *sh, *tmp;
struct list_head *handle_list = NULL;
struct r5worker_group *wg;
- bool second_try = !r5c_is_writeback(conf->log) &&
+ bool second_try = !r5c_is_writeback(conf) &&
!r5l_log_disk_error(conf);
bool try_loprio = test_bit(R5C_LOG_TIGHT, &conf->cache_state) ||
r5l_log_disk_error(conf);
--
2.30.2
All calls to r5l_wake_reclaim() have to dereference conf to get the
log. The log pointer should be protected by RCU, but isn't. Push
the dereference of conf->log into r5l_wake_reclaim() to reduce
the number of sites that dereference conf->log so this can be improved.
No functional changes intended.
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/md/raid5-cache.c | 41 +++++++++++++++++++++-------------------
drivers/md/raid5-log.h | 2 +-
drivers/md/raid5.c | 2 +-
3 files changed, 24 insertions(+), 21 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 6db8040fb01b..735666f9d793 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -325,7 +325,20 @@ void r5c_handle_cached_data_endio(struct r5conf *conf,
}
}
-void r5l_wake_reclaim(struct r5l_log *log, sector_t space);
+static void __r5l_wake_reclaim(struct r5l_log *log, sector_t space)
+{
+ unsigned long target;
+ unsigned long new = (unsigned long)space; /* overflow in theory */
+
+ if (!log)
+ return;
+ do {
+ target = log->reclaim_target;
+ if (new < target)
+ return;
+ } while (cmpxchg(&log->reclaim_target, target, new) != target);
+ md_wakeup_thread(log->reclaim_thread);
+}
/* Check whether we should flush some stripes to free up stripe cache */
void r5c_check_stripe_cache_usage(struct r5conf *conf)
@@ -348,7 +361,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
*/
if (total_cached > conf->min_nr_stripes * 1 / 2 ||
atomic_read(&conf->empty_inactive_list_nr) > 0)
- r5l_wake_reclaim(conf->log, 0);
+ __r5l_wake_reclaim(conf->log, 0);
}
/*
@@ -367,7 +380,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
if (atomic_read(&conf->r5c_cached_full_stripes) >=
min(R5C_FULL_STRIPE_FLUSH_BATCH(conf),
conf->chunk_sectors >> RAID5_STRIPE_SHIFT(conf)))
- r5l_wake_reclaim(conf->log, 0);
+ __r5l_wake_reclaim(conf->log, 0);
}
/*
@@ -443,7 +456,7 @@ static inline void r5c_update_log_state(struct r5l_log *log)
clear_bit(R5C_LOG_TIGHT, &conf->cache_state);
if (wake_reclaim)
- r5l_wake_reclaim(log, 0);
+ __r5l_wake_reclaim(log, 0);
}
/*
@@ -1085,7 +1098,7 @@ int r5l_write_stripe(struct r5l_log *log, struct stripe_head *sh)
mutex_unlock(&log->io_mutex);
if (wake_reclaim)
- r5l_wake_reclaim(log, reserve);
+ __r5l_wake_reclaim(log, reserve);
return 0;
}
@@ -1237,7 +1250,7 @@ static void __r5l_stripe_write_finished(struct r5l_io_unit *io)
if (r5l_reclaimable_space(log) > log->max_free_space ||
test_bit(R5C_LOG_TIGHT, &conf->cache_state))
- r5l_wake_reclaim(log, 0);
+ __r5l_wake_reclaim(log, 0);
spin_unlock_irqrestore(&log->io_list_lock, flags);
wake_up(&log->iounit_wait);
@@ -1562,19 +1575,9 @@ static void r5l_reclaim_thread(struct md_thread *thread)
r5l_do_reclaim(log);
}
-void r5l_wake_reclaim(struct r5l_log *log, sector_t space)
+void r5l_wake_reclaim(struct r5conf *conf, sector_t space)
{
- unsigned long target;
- unsigned long new = (unsigned long)space; /* overflow in theory */
-
- if (!log)
- return;
- do {
- target = log->reclaim_target;
- if (new < target)
- return;
- } while (cmpxchg(&log->reclaim_target, target, new) != target);
- md_wakeup_thread(log->reclaim_thread);
+ __r5l_wake_reclaim(conf->log, space);
}
void r5l_quiesce(struct r5l_log *log, int quiesce)
@@ -1586,7 +1589,7 @@ void r5l_quiesce(struct r5l_log *log, int quiesce)
mddev = log->rdev->mddev;
wake_up(&mddev->sb_wait);
kthread_park(log->reclaim_thread->tsk);
- r5l_wake_reclaim(log, MaxSector);
+ __r5l_wake_reclaim(log, MaxSector);
r5l_do_reclaim(log);
} else
kthread_unpark(log->reclaim_thread->tsk);
diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index ead541075528..3dd59dd4257f 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -18,7 +18,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf, struct stripe_head *sh,
struct stripe_head_state *s);
void r5c_release_extra_page(struct stripe_head *sh);
void r5c_use_extra_page(struct stripe_head *sh);
-void r5l_wake_reclaim(struct r5l_log *log, sector_t space);
+void r5l_wake_reclaim(struct r5conf *conf, sector_t space);
void r5c_handle_cached_data_endio(struct r5conf *conf,
struct stripe_head *sh, int disks);
int r5c_cache_data(struct r5l_log *log, struct stripe_head *sh);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 436be2a42cc1..09e768f2d32c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -742,7 +742,7 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t sector,
if (!sh) {
set_bit(R5_INACTIVE_BLOCKED,
&conf->cache_state);
- r5l_wake_reclaim(conf->log, 0);
+ r5l_wake_reclaim(conf, 0);
wait_event_lock_irq(
conf->wait_for_stripe,
!list_empty(conf->inactive_list + hash) &&
--
2.30.2
Looks good:
Reviewed-by: Christoph Hellwig <[email protected]>
On Thu, May 19, 2022 at 12:13 PM Logan Gunthorpe <[email protected]> wrote:
>
> Hi,
>
> This series includes fixes to fix all the kernel panics in the mdadm
> tests and some, related, sparse issues. The first 10 patches
> clean refactor the raid5-cache code so that the RCU usage of conf->log
> can be cleaned up which is done in patches 11 and 12 -- fixing some
> actual kernel NULL pointer dereference crashes in the mdadm test.
>
> Patch 13 fixes some of the remaining sparse warnings that are just
> missing __rcu annotations.
>
> Patches 14 and 15 fix a couple additional hangs in an mdadm test.
>
> This series also originally included a patch[1] to fix the
> mddev->private=NULL issue in raid0. That bug caused an mdadm tests to
> crash, but it seems Xiao beat me to the fix by a few days. Hopefully,
> this work to improve mdadm tests will mean these types of bugs will
> be caught much sooner, before merging.
Thanks for the fix! The set looks good overall. Please address feedback
from Christoph and Donald. We should be able to ship it soon.
Thanks,
Song
>
> This series will be followed by another series for mdadm which fixes
> the segfaults and annotates some failing tests to make mdadm tests
> runnable fairly reliably, but I'll wait for a stable hash for this
> series to note the kernel version tested against. Following that,
> v3 of my lock contention series will be sent with more confidence
> of its correctness.
>
> This series is based on the current md/md-next branch as of today
> (6ad84d559b8c). A git branch is available here:
>
> https://github.com/sbates130272/linux-p2pmem md-bug
>
> Thanks,
>
> Logan
>
> [1] https://github.com/sbates130272/linux-p2pmem/commit/5a538f9f48d77cba111773759256bbc3ccaaa74a
>
> --
>
> Logan Gunthorpe (15):
> md/raid5-log: Drop extern decorators for function prototypes
> md/raid5-cache: Refactor r5c_is_writeback() to take a struct r5conf
> md/raid5-cache: Refactor r5l_start() to take a struct r5conf
> md/raid5-cache: Refactor r5l_flush_stripe_to_raid() to take a struct
> r5conf
> md/raid5-cache: Refactor r5l_wake_reclaim() to take a struct r5conf
> md/raid5-cache: Refactor remaining functions to take a r5conf
> md/raid5-ppl: Drop unused argument from ppl_handle_flush_request()
> md/raid5-cache: Pass the log through to r5c_finish_cache_stripe()
> md/raid5-cache: Don't pass conf to r5c_calculate_new_cp()
> md/raid5-cache: Take struct r5l_log in
> r5c_log_required_to_flush_cache()
> md/raid5: Ensure array is suspended for calls to log_exit()
> md/raid5-cache: Add RCU protection to conf->log accesses
> md/raid5-cache: Annotate pslot with __rcu notation
> md: Ensure resync is reported after it starts
> md: Notify sysfs sync_completed in md_reap_sync_thread()
>
> drivers/md/md.c | 13 ++-
> drivers/md/raid5-cache.c | 240 ++++++++++++++++++++++++---------------
> drivers/md/raid5-log.h | 103 ++++++++---------
> drivers/md/raid5-ppl.c | 2 +-
> drivers/md/raid5.c | 50 ++++----
> drivers/md/raid5.h | 2 +-
> 6 files changed, 231 insertions(+), 179 deletions(-)
>
>
> base-commit: 6ad84d559b8cbce9ab27a3a2658c438de867c98e
> --
> 2.30.2