Hi,
This is the updated series incorporating the feedback received on v1 [1].
This series fixes all of the kernel panics seen in the mdadm tests,
along with some related sparse issues. The first 12 patches refactor
the raid5-cache code so that the RCU usage of conf->log can be cleaned
up in patch 13, which fixes some actual kernel NULL pointer dereference
crashes seen in the mdadm tests.
Patch 14 fixes some of the remaining sparse warnings that are caused by
missing __rcu annotations.
Patch 15 provides a cleanup for patches 16 and 17, which fix a couple
of additional hangs seen in an mdadm test.
This series will be followed by another series for mdadm which fixes
the segfaults and annotates some failing tests so that the mdadm test
suite can be run fairly reliably, but I'll wait for a stable hash for
this series so I can note the kernel version it was tested against.
Following that, v3 of my lock contention series will be sent with more
confidence in its correctness.
This series is based on the current md/md-next branch as of today
(42b805af10). A git branch is available here:
https://github.com/sbates130272/linux-p2pmem md-bug_v2
Thanks,
Logan
[1] https://lore.kernel.org/all/[email protected]
--
Changes since v1:
* Add a patch to move the struct r5l_log definition to raid5-log.h in
order to fix a compiler error with rcu_access_pointer() seen with gcc
versions prior to gcc-10
* Rework the r5c_is_writeback() changes to reduce churn (per Christoph)
* Change some 1s to trues in rcu_dereference_protected calls (per
Christoph)
* Fix an odd hunk mistake in the RCU protection patch (per Christoph)
* Fix an inverted conditional (noticed by Donald)
* Add a patch to add an enum for the overloaded values used by
mddev->curr_resync to make the status_resync() fixes clearer
(per Christoph)
--
Logan Gunthorpe (17):
md/raid5-log: Drop extern decorators for function prototypes
md/raid5-cache: Add r5c_conf_is_writeback() helper
md/raid5-cache: Refactor r5l_start() to take a struct r5conf
md/raid5-cache: Refactor r5l_flush_stripe_to_raid() to take a struct
r5conf
md/raid5-cache: Refactor r5l_wake_reclaim() to take a struct r5conf
md/raid5-cache: Refactor remaining functions to take a r5conf
md/raid5-ppl: Drop unused argument from ppl_handle_flush_request()
md/raid5-cache: Pass the log through to r5c_finish_cache_stripe()
md/raid5-cache: Don't pass conf to r5c_calculate_new_cp()
md/raid5-cache: Take struct r5l_log in
r5c_log_required_to_flush_cache()
md/raid5: Ensure array is suspended for calls to log_exit()
md/raid5-cache: Move struct r5l_log definition to raid5-log.h
md/raid5-cache: Add RCU protection to conf->log accesses
md/raid5-cache: Annotate pslot with __rcu notation
md: Use enum for overloaded magic numbers used by mddev->curr_resync
md: Ensure resync is reported after it starts
md: Notify sysfs sync_completed in md_reap_sync_thread()
drivers/md/md.c | 55 +++----
drivers/md/md.h | 15 ++
drivers/md/raid5-cache.c | 304 ++++++++++++++++++---------------------
drivers/md/raid5-log.h | 178 ++++++++++++++++-------
drivers/md/raid5-ppl.c | 2 +-
drivers/md/raid5.c | 50 +++----
drivers/md/raid5.h | 2 +-
7 files changed, 336 insertions(+), 270 deletions(-)
base-commit: 42b805af102471f53e3c7867b8c2b502ea4eef7e
--
2.30.2
All callers of r5l_start() have to dereference conf to get the
log. The log pointer should be protected by RCU, but isn't. Push
the dereference of conf->log into r5l_start() to reduce
the number of sites that dereference conf->log, so this can be improved.
No functional changes intended.
Signed-off-by: Logan Gunthorpe <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
drivers/md/raid5-cache.c | 9 +++------
drivers/md/raid5-log.h | 2 +-
drivers/md/raid5.c | 4 ++--
3 files changed, 6 insertions(+), 9 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 96f51ce9b6c1..7c5dc45b1ea8 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -3034,20 +3034,17 @@ static int r5l_load_log(struct r5l_log *log)
return ret;
}
-int r5l_start(struct r5l_log *log)
+int r5l_start(struct r5conf *conf)
{
+ struct r5l_log *log = conf->log;
int ret;
if (!log)
return 0;
ret = r5l_load_log(log);
- if (ret) {
- struct mddev *mddev = log->rdev->mddev;
- struct r5conf *conf = mddev->private;
-
+ if (ret)
r5l_exit_log(conf);
- }
return ret;
}
diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index 720e164e4d4d..db4c1262b380 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -29,7 +29,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf);
extern struct md_sysfs_entry r5c_journal_mode;
void r5c_update_on_rdev_error(struct mddev *mddev, struct md_rdev *rdev);
bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect);
-int r5l_start(struct r5l_log *log);
+int r5l_start(struct r5conf *conf);
struct dma_async_tx_descriptor *
ops_run_partial_parity(struct stripe_head *sh, struct raid5_percpu *percpu,
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 458d88faf2e9..83911affcd88 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8040,7 +8040,7 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
if (ret)
return ret;
- ret = r5l_start(conf->log);
+ ret = r5l_start(conf);
if (ret)
return ret;
@@ -8743,7 +8743,7 @@ static int raid5_start(struct mddev *mddev)
{
struct r5conf *conf = mddev->private;
- return r5l_start(conf->log);
+ return r5l_start(conf);
}
static struct md_personality raid6_personality =
--
2.30.2
Many of the interface functions dereference conf->log in inline
helpers in raid5-log.h. Refactor all of these to dereference
conf->log inside the functions themselves instead. This will help to
fix the RCU locking when accessing conf->log.
No functional changes intended.
Signed-off-by: Logan Gunthorpe <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
drivers/md/raid5-cache.c | 18 +++++++++++-------
drivers/md/raid5-log.h | 20 ++++++++++----------
2 files changed, 21 insertions(+), 17 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 3427f95cb358..24110b687055 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -1006,9 +1006,9 @@ static inline void r5l_add_no_space_stripe(struct r5l_log *log,
* running in raid5d, where reclaim could wait for raid5d too (when it flushes
* data from log to raid disks), so we shouldn't wait for reclaim here
*/
-int r5l_write_stripe(struct r5l_log *log, struct stripe_head *sh)
+int r5l_write_stripe(struct r5conf *conf, struct stripe_head *sh)
{
- struct r5conf *conf = sh->raid_conf;
+ struct r5l_log *log = conf->log;
int write_disks = 0;
int data_pages, parity_pages;
int reserve;
@@ -1104,8 +1104,9 @@ int r5l_write_stripe(struct r5l_log *log, struct stripe_head *sh)
return 0;
}
-void r5l_write_stripe_run(struct r5l_log *log)
+void r5l_write_stripe_run(struct r5conf *conf)
{
+ struct r5l_log *log = conf->log;
if (!log)
return;
mutex_lock(&log->io_mutex);
@@ -1113,8 +1114,10 @@ void r5l_write_stripe_run(struct r5l_log *log)
mutex_unlock(&log->io_mutex);
}
-int r5l_handle_flush_request(struct r5l_log *log, struct bio *bio)
+int r5l_handle_flush_request(struct r5conf *conf, struct bio *bio)
{
+ struct r5l_log *log = conf->log;
+
if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH) {
/*
* in write through (journal only)
@@ -1582,8 +1585,9 @@ void r5l_wake_reclaim(struct r5conf *conf, sector_t space)
__r5l_wake_reclaim(conf->log, space);
}
-void r5l_quiesce(struct r5l_log *log, int quiesce)
+void r5l_quiesce(struct r5conf *conf, int quiesce)
{
+ struct r5l_log *log = conf->log;
struct mddev *mddev;
if (quiesce) {
@@ -2892,9 +2896,9 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
set_bit(STRIPE_HANDLE, &sh->state);
}
-int r5c_cache_data(struct r5l_log *log, struct stripe_head *sh)
+int r5c_cache_data(struct r5conf *conf, struct stripe_head *sh)
{
- struct r5conf *conf = sh->raid_conf;
+ struct r5l_log *log = conf->log;
int pages = 0;
int reserve;
int i;
diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index 7594d2d80520..e39db84c5de0 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -4,12 +4,12 @@
int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev);
void r5l_exit_log(struct r5conf *conf);
-int r5l_write_stripe(struct r5l_log *log, struct stripe_head *head_sh);
-void r5l_write_stripe_run(struct r5l_log *log);
+int r5l_write_stripe(struct r5conf *conf, struct stripe_head *head_sh);
+void r5l_write_stripe_run(struct r5conf *conf);
void r5l_flush_stripe_to_raid(struct r5conf *conf);
void r5l_stripe_write_finished(struct stripe_head *sh);
-int r5l_handle_flush_request(struct r5l_log *log, struct bio *bio);
-void r5l_quiesce(struct r5l_log *log, int quiesce);
+int r5l_handle_flush_request(struct r5conf *conf, struct bio *bio);
+void r5l_quiesce(struct r5conf *conf, int quiesce);
bool r5l_log_disk_error(struct r5conf *conf);
bool r5c_conf_is_writeback(struct r5conf *conf);
int r5c_try_caching_write(struct r5conf *conf, struct stripe_head *sh,
@@ -21,7 +21,7 @@ void r5c_use_extra_page(struct stripe_head *sh);
void r5l_wake_reclaim(struct r5conf *conf, sector_t space);
void r5c_handle_cached_data_endio(struct r5conf *conf,
struct stripe_head *sh, int disks);
-int r5c_cache_data(struct r5l_log *log, struct stripe_head *sh);
+int r5c_cache_data(struct r5conf *conf, struct stripe_head *sh);
void r5c_make_stripe_write_out(struct stripe_head *sh);
void r5c_flush_cache(struct r5conf *conf, int num);
void r5c_check_stripe_cache_usage(struct r5conf *conf);
@@ -63,10 +63,10 @@ static inline int log_stripe(struct stripe_head *sh, struct stripe_head_state *s
/* writing out phase */
if (s->waiting_extra_page)
return 0;
- return r5l_write_stripe(conf->log, sh);
+ return r5l_write_stripe(conf, sh);
} else if (test_bit(STRIPE_LOG_TRAPPED, &sh->state)) {
/* caching phase */
- return r5c_cache_data(conf->log, sh);
+ return r5c_cache_data(conf, sh);
}
} else if (raid5_has_ppl(conf)) {
return ppl_write_stripe(conf, sh);
@@ -88,7 +88,7 @@ static inline void log_stripe_write_finished(struct stripe_head *sh)
static inline void log_write_stripe_run(struct r5conf *conf)
{
if (conf->log)
- r5l_write_stripe_run(conf->log);
+ r5l_write_stripe_run(conf);
else if (raid5_has_ppl(conf))
ppl_write_stripe_run(conf);
}
@@ -106,7 +106,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
int ret = -ENODEV;
if (conf->log)
- ret = r5l_handle_flush_request(conf->log, bio);
+ ret = r5l_handle_flush_request(conf, bio);
else if (raid5_has_ppl(conf))
ret = ppl_handle_flush_request(conf->log, bio);
@@ -116,7 +116,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
static inline void log_quiesce(struct r5conf *conf, int quiesce)
{
if (conf->log)
- r5l_quiesce(conf->log, quiesce);
+ r5l_quiesce(conf, quiesce);
else if (raid5_has_ppl(conf))
ppl_quiesce(conf, quiesce);
}
--
2.30.2
Most callers of r5c_is_writeback() have to dereference conf to get the
log. The log pointer should be protected by RCU, but isn't. Push
the dereference of conf->log into a new r5c_conf_is_writeback() helper
to reduce the number of sites that dereference conf->log, so this can be
improved.
No functional changes intended.
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/md/raid5-cache.c | 7 ++++++-
drivers/md/raid5-log.h | 2 +-
drivers/md/raid5.c | 16 ++++++++--------
3 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 83c184eddbda..96f51ce9b6c1 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -247,12 +247,17 @@ enum r5l_io_unit_state {
IO_UNIT_STRIPE_END = 3, /* stripes data finished writing to raid */
};
-bool r5c_is_writeback(struct r5l_log *log)
+static bool r5c_is_writeback(struct r5l_log *log)
{
return (log != NULL &&
log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK);
}
+bool r5c_conf_is_writeback(struct r5conf *conf)
+{
+ return r5c_is_writeback(conf->log);
+}
+
static sector_t r5l_ring_add(struct r5l_log *log, sector_t start, sector_t inc)
{
start += inc;
diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index 270ced4f770f..720e164e4d4d 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -11,7 +11,7 @@ void r5l_stripe_write_finished(struct stripe_head *sh);
int r5l_handle_flush_request(struct r5l_log *log, struct bio *bio);
void r5l_quiesce(struct r5l_log *log, int quiesce);
bool r5l_log_disk_error(struct r5conf *conf);
-bool r5c_is_writeback(struct r5l_log *log);
+bool r5c_conf_is_writeback(struct r5conf *conf);
int r5c_try_caching_write(struct r5conf *conf, struct stripe_head *sh,
struct stripe_head_state *s, int disks);
void r5c_finish_stripe_write_out(struct r5conf *conf, struct stripe_head *sh,
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 5d09256d7f81..458d88faf2e9 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -224,7 +224,7 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
BUG_ON(!list_empty(&sh->lru));
BUG_ON(atomic_read(&conf->active_stripes)==0);
- if (r5c_is_writeback(conf->log))
+ if (r5c_conf_is_writeback(conf))
for (i = sh->disks; i--; )
if (test_bit(R5_InJournal, &sh->dev[i].flags))
injournal++;
@@ -236,7 +236,7 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
* 2. when resync is requested fot the stripe.
*/
if (test_bit(STRIPE_SYNC_REQUESTED, &sh->state) ||
- (conf->quiesce && r5c_is_writeback(conf->log) &&
+ (conf->quiesce && r5c_conf_is_writeback(conf) &&
!test_bit(STRIPE_HANDLE, &sh->state) && injournal != 0)) {
if (test_bit(STRIPE_R5C_CACHING, &sh->state))
r5c_make_stripe_write_out(sh);
@@ -274,7 +274,7 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
md_wakeup_thread(conf->mddev->thread);
atomic_dec(&conf->active_stripes);
if (!test_bit(STRIPE_EXPANDING, &sh->state)) {
- if (!r5c_is_writeback(conf->log))
+ if (!r5c_conf_is_writeback(conf))
list_add_tail(&sh->lru, temp_inactive_list);
else {
WARN_ON(test_bit(R5_InJournal, &sh->dev[sh->pd_idx].flags));
@@ -1786,7 +1786,7 @@ static void ops_complete_prexor(void *stripe_head_ref)
pr_debug("%s: stripe %llu\n", __func__,
(unsigned long long)sh->sector);
- if (r5c_is_writeback(sh->raid_conf->log))
+ if (r5c_conf_is_writeback(sh->raid_conf))
/*
* raid5-cache write back uses orig_page during prexor.
* After prexor, it is time to free orig_page
@@ -1905,9 +1905,9 @@ ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
tx = async_copy_data(1, wbi, &dev->page,
dev->offset,
dev->sector, tx, sh,
- r5c_is_writeback(conf->log));
+ r5c_conf_is_writeback(conf));
if (dev->page != dev->orig_page &&
- !r5c_is_writeback(conf->log)) {
+ !r5c_conf_is_writeback(conf)) {
set_bit(R5_SkipCopy, &dev->flags);
clear_bit(R5_UPTODATE, &dev->flags);
clear_bit(R5_OVERWRITE, &dev->flags);
@@ -5085,7 +5085,7 @@ static void handle_stripe(struct stripe_head *sh)
*/
if (!sh->reconstruct_state && !sh->check_state && !sh->log_io) {
- if (!r5c_is_writeback(conf->log)) {
+ if (!r5c_conf_is_writeback(conf)) {
if (s.to_write)
handle_stripe_dirtying(conf, sh, &s, disks);
} else { /* write back cache */
@@ -5533,7 +5533,7 @@ static struct stripe_head *__get_priority_stripe(struct r5conf *conf, int group)
struct stripe_head *sh, *tmp;
struct list_head *handle_list = NULL;
struct r5worker_group *wg;
- bool second_try = !r5c_is_writeback(conf->log) &&
+ bool second_try = !r5c_conf_is_writeback(conf) &&
!r5l_log_disk_error(conf);
bool try_loprio = test_bit(R5C_LOG_TIGHT, &conf->cache_state) ||
r5l_log_disk_error(conf);
--
2.30.2
All callers of r5l_wake_reclaim() have to dereference conf to get the
log. The log pointer should be protected by RCU, but isn't. Push
the dereference of conf->log into r5l_wake_reclaim() to reduce
the number of sites that dereference conf->log, so this can be improved.
No functional changes intended.
Signed-off-by: Logan Gunthorpe <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
drivers/md/raid5-cache.c | 41 +++++++++++++++++++++-------------------
drivers/md/raid5-log.h | 2 +-
drivers/md/raid5.c | 2 +-
3 files changed, 24 insertions(+), 21 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index edff4e8d07dc..3427f95cb358 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -326,7 +326,20 @@ void r5c_handle_cached_data_endio(struct r5conf *conf,
}
}
-void r5l_wake_reclaim(struct r5l_log *log, sector_t space);
+static void __r5l_wake_reclaim(struct r5l_log *log, sector_t space)
+{
+ unsigned long target;
+ unsigned long new = (unsigned long)space; /* overflow in theory */
+
+ if (!log)
+ return;
+ do {
+ target = log->reclaim_target;
+ if (new < target)
+ return;
+ } while (cmpxchg(&log->reclaim_target, target, new) != target);
+ md_wakeup_thread(log->reclaim_thread);
+}
/* Check whether we should flush some stripes to free up stripe cache */
void r5c_check_stripe_cache_usage(struct r5conf *conf)
@@ -349,7 +362,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
*/
if (total_cached > conf->min_nr_stripes * 1 / 2 ||
atomic_read(&conf->empty_inactive_list_nr) > 0)
- r5l_wake_reclaim(conf->log, 0);
+ __r5l_wake_reclaim(conf->log, 0);
}
/*
@@ -368,7 +381,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
if (atomic_read(&conf->r5c_cached_full_stripes) >=
min(R5C_FULL_STRIPE_FLUSH_BATCH(conf),
conf->chunk_sectors >> RAID5_STRIPE_SHIFT(conf)))
- r5l_wake_reclaim(conf->log, 0);
+ __r5l_wake_reclaim(conf->log, 0);
}
/*
@@ -444,7 +457,7 @@ static inline void r5c_update_log_state(struct r5l_log *log)
clear_bit(R5C_LOG_TIGHT, &conf->cache_state);
if (wake_reclaim)
- r5l_wake_reclaim(log, 0);
+ __r5l_wake_reclaim(log, 0);
}
/*
@@ -1087,7 +1100,7 @@ int r5l_write_stripe(struct r5l_log *log, struct stripe_head *sh)
mutex_unlock(&log->io_mutex);
if (wake_reclaim)
- r5l_wake_reclaim(log, reserve);
+ __r5l_wake_reclaim(log, reserve);
return 0;
}
@@ -1239,7 +1252,7 @@ static void __r5l_stripe_write_finished(struct r5l_io_unit *io)
if (r5l_reclaimable_space(log) > log->max_free_space ||
test_bit(R5C_LOG_TIGHT, &conf->cache_state))
- r5l_wake_reclaim(log, 0);
+ __r5l_wake_reclaim(log, 0);
spin_unlock_irqrestore(&log->io_list_lock, flags);
wake_up(&log->iounit_wait);
@@ -1564,19 +1577,9 @@ static void r5l_reclaim_thread(struct md_thread *thread)
r5l_do_reclaim(log);
}
-void r5l_wake_reclaim(struct r5l_log *log, sector_t space)
+void r5l_wake_reclaim(struct r5conf *conf, sector_t space)
{
- unsigned long target;
- unsigned long new = (unsigned long)space; /* overflow in theory */
-
- if (!log)
- return;
- do {
- target = log->reclaim_target;
- if (new < target)
- return;
- } while (cmpxchg(&log->reclaim_target, target, new) != target);
- md_wakeup_thread(log->reclaim_thread);
+ __r5l_wake_reclaim(conf->log, space);
}
void r5l_quiesce(struct r5l_log *log, int quiesce)
@@ -1588,7 +1591,7 @@ void r5l_quiesce(struct r5l_log *log, int quiesce)
mddev = log->rdev->mddev;
wake_up(&mddev->sb_wait);
kthread_park(log->reclaim_thread->tsk);
- r5l_wake_reclaim(log, MaxSector);
+ __r5l_wake_reclaim(log, MaxSector);
r5l_do_reclaim(log);
} else
kthread_unpark(log->reclaim_thread->tsk);
diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index 5ace25d11ea4..7594d2d80520 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -18,7 +18,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf, struct stripe_head *sh,
struct stripe_head_state *s);
void r5c_release_extra_page(struct stripe_head *sh);
void r5c_use_extra_page(struct stripe_head *sh);
-void r5l_wake_reclaim(struct r5l_log *log, sector_t space);
+void r5l_wake_reclaim(struct r5conf *conf, sector_t space);
void r5c_handle_cached_data_endio(struct r5conf *conf,
struct stripe_head *sh, int disks);
int r5c_cache_data(struct r5l_log *log, struct stripe_head *sh);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f1b55495de53..0d486f8aaf87 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -742,7 +742,7 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t sector,
if (!sh) {
set_bit(R5_INACTIVE_BLOCKED,
&conf->cache_state);
- r5l_wake_reclaim(conf->log, 0);
+ r5l_wake_reclaim(conf, 0);
wait_event_lock_irq(
conf->wait_for_stripe,
!list_empty(conf->inactive_list + hash) &&
--
2.30.2
The raid5-cache code relies on there being no IO in flight when
log_exit() is called. There are two places where this is not
guaranteed, so add mddev_suspend() and mddev_resume() calls to those
sites.
The site in raid5_remove_disk() has a comment saying that it is
called in raid5d and thus cannot wait for pending writes; however,
that does not appear to be correct anymore (if it ever was), as
raid5_remove_disk() is called from hot_remove_disk(), which only
appears to be called from md_ioctl(). Thus, the comment and the racy
check are removed and replaced with calls to suspend/resume.
The site in raid5_change_consistency_policy() is in the error path,
and another similar call site already has suspend/resume calls just
below it, so it should be equally safe to make the same change here.
Signed-off-by: Logan Gunthorpe <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
drivers/md/raid5.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0d486f8aaf87..4b25a0262cb5 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7938,18 +7938,9 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
print_raid5_conf(conf);
if (test_bit(Journal, &rdev->flags) && conf->log) {
- /*
- * we can't wait pending write here, as this is called in
- * raid5d, wait will deadlock.
- * neilb: there is no locking about new writes here,
- * so this cannot be safe.
- */
- if (atomic_read(&conf->active_stripes) ||
- atomic_read(&conf->r5c_cached_full_stripes) ||
- atomic_read(&conf->r5c_cached_partial_stripes)) {
- return -EBUSY;
- }
+ mddev_suspend(mddev);
log_exit(conf);
+ mddev_resume(mddev);
return 0;
}
if (rdev == rcu_access_pointer(p->rdev))
@@ -8697,8 +8688,11 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
err = log_init(conf, NULL, true);
if (!err) {
err = resize_stripes(conf, conf->pool_size);
- if (err)
+ if (err) {
+ mddev_suspend(mddev);
log_exit(conf);
+ mddev_resume(mddev);
+ }
}
} else
err = -EINVAL;
--
2.30.2
The mdadm test 21raid5cache randomly fails with NULL pointer accesses
of conf->log when run repeatedly. conf->log was sort of protected with
RCU, but most dereferences were not done with the correct functions.
Add rcu_read_lock(), rcu_dereference_protected() and rcu_access_pointer()
calls to the appropriate places and mark the pointer with __rcu.
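For reference, the accessor pattern applied below is roughly the
following. This is an illustrative sketch of standard kernel RCU API
usage, not an excerpt from the patch:
	/* Reader paths that can race with log_exit(): */
	rcu_read_lock();
	log = rcu_dereference(conf->log);
	ret = r5c_is_writeback(log);	/* tolerates log == NULL */
	rcu_read_unlock();
	/*
	 * Paths with an external guarantee (array quiesced or
	 * reconfig_mutex held), so no read-side critical section:
	 */
	log = rcu_dereference_protected(conf->log,
			lockdep_is_held(&conf->mddev->reconfig_mutex));
	/* Paths that only test for NULL and never dereference: */
	if (rcu_access_pointer(conf->log))
		r5l_exit_log(conf);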
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/md/raid5-cache.c | 132 +++++++++++++++++++++++++++------------
drivers/md/raid5-log.h | 14 ++---
drivers/md/raid5.c | 4 +-
drivers/md/raid5.h | 2 +-
4 files changed, 102 insertions(+), 50 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 6349cfaae2c8..7c56ee977c3a 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -180,7 +180,12 @@ static bool r5c_is_writeback(struct r5l_log *log)
bool r5c_conf_is_writeback(struct r5conf *conf)
{
- return r5c_is_writeback(conf->log);
+ bool ret;
+
+ rcu_read_lock();
+ ret = r5c_is_writeback(rcu_dereference(conf->log));
+ rcu_read_unlock();
+ return ret;
}
static sector_t r5l_ring_add(struct r5l_log *log, sector_t start, sector_t inc)
@@ -266,12 +271,23 @@ static void __r5l_wake_reclaim(struct r5l_log *log, sector_t space)
md_wakeup_thread(log->reclaim_thread);
}
+static struct r5l_log *get_log_for_io(struct r5conf *conf)
+{
+ /*
+ * rcu_dereference_protected is safe because the array will be
+ * quiesced before log_exit() so it can't be called while
+ * an IO is in progress.
+ */
+ return rcu_dereference_protected(conf->log, true);
+}
+
/* Check whether we should flush some stripes to free up stripe cache */
void r5c_check_stripe_cache_usage(struct r5conf *conf)
{
+ struct r5l_log *log = get_log_for_io(conf);
int total_cached;
- if (!r5c_is_writeback(conf->log))
+ if (!r5c_is_writeback(log))
return;
total_cached = atomic_read(&conf->r5c_cached_partial_stripes) +
@@ -287,7 +303,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
*/
if (total_cached > conf->min_nr_stripes * 1 / 2 ||
atomic_read(&conf->empty_inactive_list_nr) > 0)
- __r5l_wake_reclaim(conf->log, 0);
+ __r5l_wake_reclaim(log, 0);
}
/*
@@ -296,7 +312,9 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
*/
void r5c_check_cached_full_stripe(struct r5conf *conf)
{
- if (!r5c_is_writeback(conf->log))
+ struct r5l_log *log = get_log_for_io(conf);
+
+ if (!r5c_is_writeback(log))
return;
/*
@@ -306,7 +324,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
if (atomic_read(&conf->r5c_cached_full_stripes) >=
min(R5C_FULL_STRIPE_FLUSH_BATCH(conf),
conf->chunk_sectors >> RAID5_STRIPE_SHIFT(conf)))
- __r5l_wake_reclaim(conf->log, 0);
+ __r5l_wake_reclaim(log, 0);
}
/*
@@ -388,7 +406,7 @@ static inline void r5c_update_log_state(struct r5l_log *log)
void r5c_make_stripe_write_out(struct stripe_head *sh)
{
struct r5conf *conf = sh->raid_conf;
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
BUG_ON(!r5c_is_writeback(log));
@@ -630,7 +648,7 @@ static void r5c_disable_writeback_async(struct work_struct *work)
/* wait superblock change before suspend */
wait_event(mddev->sb_wait,
- conf->log == NULL ||
+ !rcu_access_pointer(conf->log) ||
(!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
(locked = mddev_trylock(mddev))));
if (locked) {
@@ -928,7 +946,7 @@ static inline void r5l_add_no_space_stripe(struct r5l_log *log,
*/
int r5l_write_stripe(struct r5conf *conf, struct stripe_head *sh)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
int write_disks = 0;
int data_pages, parity_pages;
int reserve;
@@ -1026,7 +1044,8 @@ int r5l_write_stripe(struct r5conf *conf, struct stripe_head *sh)
void r5l_write_stripe_run(struct r5conf *conf)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
+
if (!log)
return;
mutex_lock(&log->io_mutex);
@@ -1036,7 +1055,7 @@ void r5l_write_stripe_run(struct r5conf *conf)
int r5l_handle_flush_request(struct r5conf *conf, struct bio *bio)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH) {
/*
@@ -1224,7 +1243,7 @@ static void r5l_log_flush_endio(struct bio *bio)
*/
void r5l_flush_stripe_to_raid(struct r5conf *conf)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
bool do_flush;
if (!log || !log->need_cache_flush)
@@ -1339,7 +1358,7 @@ void r5c_flush_cache(struct r5conf *conf, int num)
struct stripe_head *sh, *next;
lockdep_assert_held(&conf->device_lock);
- if (!conf->log)
+ if (!rcu_access_pointer(conf->log))
return;
count = 0;
@@ -1358,9 +1377,8 @@ void r5c_flush_cache(struct r5conf *conf, int num)
}
}
-static void r5c_do_reclaim(struct r5conf *conf)
+static void r5c_do_reclaim(struct r5conf *conf, struct r5l_log *log)
{
- struct r5l_log *log = conf->log;
struct stripe_head *sh;
int count = 0;
unsigned long flags;
@@ -1488,22 +1506,28 @@ static void r5l_reclaim_thread(struct md_thread *thread)
{
struct mddev *mddev = thread->mddev;
struct r5conf *conf = mddev->private;
- struct r5l_log *log = conf->log;
+ struct r5l_log *log;
+ /*
+ * This is safe, because the reclaim thread will be stopped before
+ * the log is freed in log_exit()
+ */
+ log = rcu_dereference_protected(conf->log, true);
if (!log)
return;
- r5c_do_reclaim(conf);
+
+ r5c_do_reclaim(conf, log);
r5l_do_reclaim(log);
}
void r5l_wake_reclaim(struct r5conf *conf, sector_t space)
{
- __r5l_wake_reclaim(conf->log, space);
+ __r5l_wake_reclaim(get_log_for_io(conf), space);
}
void r5l_quiesce(struct r5conf *conf, int quiesce)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
struct mddev *mddev;
if (quiesce) {
@@ -2461,16 +2485,20 @@ static void r5l_write_super(struct r5l_log *log, sector_t cp)
static ssize_t r5c_journal_mode_show(struct mddev *mddev, char *page)
{
struct r5conf *conf;
- int ret;
+ struct r5l_log *log;
+ int ret = 0;
spin_lock(&mddev->lock);
conf = mddev->private;
- if (!conf || !conf->log) {
- spin_unlock(&mddev->lock);
- return 0;
- }
+ if (!conf)
+ goto out_spin_unlock;
- switch (conf->log->r5c_journal_mode) {
+ rcu_read_lock();
+ log = rcu_dereference(conf->log);
+ if (!log)
+ goto out;
+
+ switch (log->r5c_journal_mode) {
case R5C_JOURNAL_MODE_WRITE_THROUGH:
ret = snprintf(
page, PAGE_SIZE, "[%s] %s\n",
@@ -2486,6 +2514,10 @@ static ssize_t r5c_journal_mode_show(struct mddev *mddev, char *page)
default:
ret = 0;
}
+
+out:
+ rcu_read_unlock();
+out_spin_unlock:
spin_unlock(&mddev->lock);
return ret;
}
@@ -2499,13 +2531,15 @@ static ssize_t r5c_journal_mode_show(struct mddev *mddev, char *page)
int r5c_journal_mode_set(struct mddev *mddev, int mode)
{
struct r5conf *conf;
+ struct r5l_log *log;
+ int ret = 0;
if (mode < R5C_JOURNAL_MODE_WRITE_THROUGH ||
mode > R5C_JOURNAL_MODE_WRITE_BACK)
return -EINVAL;
conf = mddev->private;
- if (!conf || !conf->log)
+ if (!conf || !rcu_access_pointer(conf->log))
return -ENODEV;
if (raid5_calc_degraded(conf) > 0 &&
@@ -2513,12 +2547,19 @@ int r5c_journal_mode_set(struct mddev *mddev, int mode)
return -EINVAL;
mddev_suspend(mddev);
- conf->log->r5c_journal_mode = mode;
+ rcu_read_lock();
+ log = rcu_dereference(conf->log);
+ if (log) {
+ log->r5c_journal_mode = mode;
+ pr_debug("md/raid:%s: setting r5c cache mode to %d: %s\n",
+ mdname(mddev), mode, r5c_journal_mode_str[mode]);
+ } else {
+ ret = -ENODEV;
+ }
+ rcu_read_unlock();
mddev_resume(mddev);
- pr_debug("md/raid:%s: setting r5c cache mode to %d: %s\n",
- mdname(mddev), mode, r5c_journal_mode_str[mode]);
- return 0;
+ return ret;
}
EXPORT_SYMBOL(r5c_journal_mode_set);
@@ -2564,7 +2605,7 @@ int r5c_try_caching_write(struct r5conf *conf,
struct stripe_head_state *s,
int disks)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
int i;
struct r5dev *dev;
int to_cache = 0;
@@ -2731,7 +2772,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
struct stripe_head *sh,
struct stripe_head_state *s)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
int i;
int do_wakeup = 0;
sector_t tree_index;
@@ -2814,7 +2855,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
int r5c_cache_data(struct r5conf *conf, struct stripe_head *sh)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
int pages = 0;
int reserve;
int i;
@@ -2870,7 +2911,7 @@ int r5c_cache_data(struct r5conf *conf, struct stripe_head *sh)
/* check whether this big stripe is in write back cache. */
bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log = get_log_for_io(conf);
sector_t tree_index;
void *slot;
@@ -2880,6 +2921,7 @@ bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect)
WARN_ON_ONCE(!rcu_read_lock_held());
tree_index = r5c_tree_index(conf, sect);
slot = radix_tree_lookup(&log->big_stripe_tree, tree_index);
+
return slot != NULL;
}
@@ -2960,30 +3002,38 @@ static int r5l_load_log(struct r5l_log *log)
int r5l_start(struct r5conf *conf)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log;
int ret;
+ log = rcu_dereference_protected(conf->log,
+ lockdep_is_held(&conf->mddev->reconfig_mutex));
if (!log)
return 0;
ret = r5l_load_log(log);
if (ret)
r5l_exit_log(conf);
+
return ret;
}
void r5c_update_on_rdev_error(struct mddev *mddev, struct md_rdev *rdev)
{
struct r5conf *conf = mddev->private;
- struct r5l_log *log = conf->log;
+ struct r5l_log *log;
+ rcu_read_lock();
+ log = rcu_dereference(conf->log);
if (!log)
- return;
+ goto out;
if ((raid5_calc_degraded(conf) > 0 ||
test_bit(Journal, &rdev->flags)) &&
- conf->log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
+ log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
schedule_work(&log->disable_writeback_work);
+
+out:
+ rcu_read_unlock();
}
int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
@@ -3091,15 +3141,17 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
void r5l_exit_log(struct r5conf *conf)
{
- struct r5l_log *log = conf->log;
+ struct r5l_log *log;
- conf->log = NULL;
- synchronize_rcu();
+ lockdep_assert_held(&conf->mddev->reconfig_mutex);
+ log = rcu_replace_pointer(conf->log, NULL, 1);
/* Ensure disable_writeback_work wakes up and exits */
wake_up(&conf->mddev->sb_wait);
flush_work(&log->disable_writeback_work);
md_unregister_thread(&log->reclaim_thread);
+ synchronize_rcu();
+
mempool_exit(&log->meta_pool);
bioset_exit(&log->bs);
mempool_exit(&log->io_pool);
diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index 5948c41f1f2e..5b910e2d4279 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -133,7 +133,7 @@ static inline int log_stripe(struct stripe_head *sh, struct stripe_head_state *s
{
struct r5conf *conf = sh->raid_conf;
- if (conf->log) {
+ if (rcu_access_pointer(conf->log)) {
if (!test_bit(STRIPE_R5C_CACHING, &sh->state)) {
/* writing out phase */
if (s->waiting_extra_page)
@@ -154,7 +154,7 @@ static inline void log_stripe_write_finished(struct stripe_head *sh)
{
struct r5conf *conf = sh->raid_conf;
- if (conf->log)
+ if (rcu_access_pointer(conf->log))
r5l_stripe_write_finished(sh);
else if (raid5_has_ppl(conf))
ppl_stripe_write_finished(sh);
@@ -162,7 +162,7 @@ static inline void log_stripe_write_finished(struct stripe_head *sh)
static inline void log_write_stripe_run(struct r5conf *conf)
{
- if (conf->log)
+ if (rcu_access_pointer(conf->log))
r5l_write_stripe_run(conf);
else if (raid5_has_ppl(conf))
ppl_write_stripe_run(conf);
@@ -170,7 +170,7 @@ static inline void log_write_stripe_run(struct r5conf *conf)
static inline void log_flush_stripe_to_raid(struct r5conf *conf)
{
- if (conf->log)
+ if (rcu_access_pointer(conf->log))
r5l_flush_stripe_to_raid(conf);
else if (raid5_has_ppl(conf))
ppl_write_stripe_run(conf);
@@ -180,7 +180,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
{
int ret = -ENODEV;
- if (conf->log)
+ if (rcu_access_pointer(conf->log))
ret = r5l_handle_flush_request(conf, bio);
else if (raid5_has_ppl(conf))
ret = ppl_handle_flush_request(bio);
@@ -190,7 +190,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
static inline void log_quiesce(struct r5conf *conf, int quiesce)
{
- if (conf->log)
+ if (rcu_access_pointer(conf->log))
r5l_quiesce(conf, quiesce);
else if (raid5_has_ppl(conf))
ppl_quiesce(conf, quiesce);
@@ -198,7 +198,7 @@ static inline void log_quiesce(struct r5conf *conf, int quiesce)
static inline void log_exit(struct r5conf *conf)
{
- if (conf->log)
+ if (rcu_access_pointer(conf->log))
r5l_exit_log(conf);
else if (raid5_has_ppl(conf))
ppl_exit_log(conf);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4b25a0262cb5..93b4b69650bb 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7937,7 +7937,7 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
struct md_rdev *tmp;
print_raid5_conf(conf);
- if (test_bit(Journal, &rdev->flags) && conf->log) {
+ if (test_bit(Journal, &rdev->flags) && rcu_access_pointer(conf->log)) {
mddev_suspend(mddev);
log_exit(conf);
mddev_resume(mddev);
@@ -8019,7 +8019,7 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
int last = conf->raid_disks - 1;
if (test_bit(Journal, &rdev->flags)) {
- if (conf->log)
+ if (rcu_access_pointer(conf->log))
return -EBUSY;
rdev->raid_disk = 0;
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 638d29863503..04fe5b6f679c 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -684,7 +684,7 @@ struct r5conf {
struct r5worker_group *worker_groups;
int group_cnt;
int worker_cnt_per_group;
- struct r5l_log *log;
+ struct r5l_log __rcu *log;
void *log_private;
spinlock_t pending_bios_lock;
--
2.30.2
The 07layouts test in mdadm fails on some systems. The failure
presents itself as the backup file not being removed before the array
is grown into the next layout:
mdadm: /dev/md0: cannot create backup file /tmp/md-test-backup:
File exists
This is because the background mdadm process, which is responsible for
cleaning up this backup file, gets into an infinite loop waiting for
the reshape to start. mdadm checks the mdstat file to see whether a
reshape is running and, if it is not, waits for an event on the file or
times out after 5 seconds. On faster machines, the reshape may complete
before the 5 second timeout expires, and thus the background mdadm
process loops waiting for a reshape to start that has already occurred.
When mdadm first reads the mdstat file, mdstat does not report that the
reshape has begun, even though it has indeed begun. So the mdstat_wait()
call (in mdadm), which polls on the mdstat file, won't ever return before
timing out.
The reason mdstat does not report that the reshape has started is an
issue in status_resync(): recovery_active is subtracted from curr_resync,
which will result in a value of zero for the first chunk of reshaped
data, and the resulting read will report no reshape in progress.
To fix this, if "resync - recovery_active" yields one of the overloaded
special values, force the value to MD_RESYNC_ACTIVE so the code reports a
resync in progress.
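For context, the overloaded values are the ones given names by the
earlier patch in this series ("md: Use enum for overloaded magic numbers
used by mddev->curr_resync"); roughly the following (a sketch, see that
patch for the exact definition):
	enum {
		MD_RESYNC_NONE = 0,	/* no resync/reshape in progress */
		MD_RESYNC_YIELDED = 1,	/* yielded to a conflicting resync */
		MD_RESYNC_DELAYED = 2,	/* resync is delayed */
		MD_RESYNC_ACTIVE = 3,	/* resync is actually running */
	};
Any curr_resync value below MD_RESYNC_ACTIVE is one of these markers
rather than a sector count, which is why the fix below clamps the result
of the subtraction to MD_RESYNC_ACTIVE.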
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/md/md.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0893029865eb..2be429874d18 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8022,10 +8022,20 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev)
if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
/* Still cleaning up */
resync = max_sectors;
- } else if (resync > max_sectors)
+ } else if (resync > max_sectors) {
resync = max_sectors;
- else
+ } else {
resync -= atomic_read(&mddev->recovery_active);
+ if (resync < MD_RESYNC_ACTIVE) {
+ /*
+ * Resync has started, but the subtraction has
+ * yielded one of the special values. Force it
+ * to active to ensure the status reports an
+ * active resync.
+ */
+ resync = MD_RESYNC_ACTIVE;
+ }
+ }
if (resync == MD_RESYNC_NONE) {
if (test_bit(MD_RESYNCING_REMOTE, &mddev->recovery)) {
--
2.30.2
r5c_finish_cache_stripe() dereferences conf->log, which would need to be
done under rcu_read_lock(). That is not necessary here, as the log is
already available at the call sites through other means, so pass it
through to the function instead.
No functional changes intended.
Signed-off-by: Logan Gunthorpe <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
drivers/md/raid5-cache.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 24110b687055..8284ce3e5cf6 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -507,10 +507,9 @@ static void r5c_handle_parity_cached(struct stripe_head *sh)
* Setting proper flags after writing (or flushing) data and/or parity to the
* log device. This is called from r5l_log_endio() or r5l_log_flush_endio().
*/
-static void r5c_finish_cache_stripe(struct stripe_head *sh)
+static void r5c_finish_cache_stripe(struct r5l_log *log,
+ struct stripe_head *sh)
{
- struct r5l_log *log = sh->raid_conf->log;
-
if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH) {
BUG_ON(test_bit(STRIPE_R5C_CACHING, &sh->state));
/*
@@ -528,14 +527,14 @@ static void r5c_finish_cache_stripe(struct stripe_head *sh)
}
}
-static void r5l_io_run_stripes(struct r5l_io_unit *io)
+static void r5l_io_run_stripes(struct r5l_log *log, struct r5l_io_unit *io)
{
struct stripe_head *sh, *next;
list_for_each_entry_safe(sh, next, &io->stripe_list, log_list) {
list_del_init(&sh->log_list);
- r5c_finish_cache_stripe(sh);
+ r5c_finish_cache_stripe(log, sh);
set_bit(STRIPE_HANDLE, &sh->state);
raid5_release_stripe(sh);
@@ -554,7 +553,7 @@ static void r5l_log_run_stripes(struct r5l_log *log)
break;
list_move_tail(&io->log_sibling, &log->finished_ios);
- r5l_io_run_stripes(io);
+ r5l_io_run_stripes(log, io);
}
}
@@ -1284,7 +1283,7 @@ static void r5l_log_flush_endio(struct bio *bio)
spin_lock_irqsave(&log->io_list_lock, flags);
list_for_each_entry(io, &log->flushing_ios, log_sibling)
- r5l_io_run_stripes(io);
+ r5l_io_run_stripes(log, io);
list_splice_tail_init(&log->flushing_ios, &log->finished_ios);
spin_unlock_irqrestore(&log->io_list_lock, flags);
--
2.30.2
On Thu, May 26, 2022 at 10:36:00AM -0600, Logan Gunthorpe wrote:
> The mdadm test 21raid5cache randomly fails with NULL pointer accesses
> of conf->log when run repeatedly. conf->log was sort of protected with
> RCU, but most dereferences were not done with the correct functions.
>
> Add rcu_read_lock(), rcu_dereference_protected() and rcu_access_pointer()
> calls to the appropriate places and mark the pointer with __rcu.
Looking at the code a bit more, is this really enough? Calls to
r5c_is_writeback / r5c_conf_is_writeback are sprinkled all over the
code, and my gut feeling is the value is not expected to change over
way longer critical sections than this. So maybe the answer here is to
fix up the release to be properly locked as it only affects the non-I/O
slow path anyway.
Looks good:
Reviewed-by: Christoph Hellwig <[email protected]>
On 2022-05-30 00:01, Christoph Hellwig wrote:
> On Thu, May 26, 2022 at 10:36:00AM -0600, Logan Gunthorpe wrote:
>> The mdadm test 21raid5cache randomly fails with NULL pointer accesses
>> of conf->log when run repeatedly. conf->log was sort of protected with
>> RCU, but most dereferences were not done with the correct functions.
>>
>> Add rcu_read_lock(), rcu_dereference_protected() and rcu_access_pointer()
>> calls to the appropriate places and mark the pointer with __rcu.
>
> Looking at the code a bit more, is this really enough? Calls to
> r5c_is_writeback / r5c_conf_is_writeback are sprinkled all over the
> code, and my gut feeling is the value is not expected to change over
> way longer critical sections than this. So maybe the answer here is to
> fix up the release to be properly locked as it only affects the non-I/O
> slow path anyway.
Yeah, I think your gut feeling is correct. It looks like all the
is_writeback calls are in the IO path as well. I'll review this again
and see if we can just replace the RCU stuff and the paths that were
hitting NULL pointer dereference with the taking of a lock.
Logan
Looks good:
Reviewed-by: Christoph Hellwig <[email protected]>