From: Yu Kuai <[email protected]>
We're currently running a lot of tests for raid10 and have already found
many problems. This patchset is the first part, covering some simple and
obvious problems. Most of the patches were already sent separately, but
I think a patchset is better for review.
Yu Kuai (6):
md/raid10: don't call bio_start_io_acct twice for bio which
experienced read error
md: fix soft lockup in status_resync
md/raid10: don't BUG_ON() in raise_barrier()
md/radi10: fix leak of 'r10bio->remaining' for recovery
md/raid10: fix memleak for 'conf->bio_split'
md/raid10: fix memleak of md thread
drivers/md/md.c | 18 +++++------
drivers/md/raid10.c | 78 +++++++++++++++++++++++----------------------
2 files changed, 49 insertions(+), 47 deletions(-)
--
2.31.1
From: Yu Kuai <[email protected]>
In the error path of raid10_run(), 'conf' needs to be freed; however,
'conf->bio_split' is missed and its memory is leaked.
Since there are three places that free 'conf', factor out a helper to fix
the problem.
Fixes: fc9977dd069e ("md/raid10: simplify the splitting of requests.")
Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/raid10.c | 37 +++++++++++++++++--------------------
1 file changed, 17 insertions(+), 20 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index f7002a1aa9cf..bdfa02e8fe7e 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4009,6 +4009,20 @@ static int setup_geo(struct geom *geo, struct mddev *mddev, enum geo_type new)
return nc*fc;
}
+static void raid10_free_conf(struct r10conf *conf)
+{
+ if (!conf)
+ return;
+
+ mempool_exit(&conf->r10bio_pool);
+ kfree(conf->mirrors);
+ kfree(conf->mirrors_old);
+ kfree(conf->mirrors_new);
+ safe_put_page(conf->tmppage);
+ bioset_exit(&conf->bio_split);
+ kfree(conf);
+}
+
static struct r10conf *setup_conf(struct mddev *mddev)
{
struct r10conf *conf = NULL;
@@ -4091,13 +4105,7 @@ static struct r10conf *setup_conf(struct mddev *mddev)
return conf;
out:
- if (conf) {
- mempool_exit(&conf->r10bio_pool);
- kfree(conf->mirrors);
- safe_put_page(conf->tmppage);
- bioset_exit(&conf->bio_split);
- kfree(conf);
- }
+ raid10_free_conf(conf);
return ERR_PTR(err);
}
@@ -4288,10 +4296,7 @@ static int raid10_run(struct mddev *mddev)
out_free_conf:
md_unregister_thread(&mddev->thread);
- mempool_exit(&conf->r10bio_pool);
- safe_put_page(conf->tmppage);
- kfree(conf->mirrors);
- kfree(conf);
+ raid10_free_conf(conf);
mddev->private = NULL;
out:
return -EIO;
@@ -4299,15 +4304,7 @@ static int raid10_run(struct mddev *mddev)
static void raid10_free(struct mddev *mddev, void *priv)
{
- struct r10conf *conf = priv;
-
- mempool_exit(&conf->r10bio_pool);
- safe_put_page(conf->tmppage);
- kfree(conf->mirrors);
- kfree(conf->mirrors_old);
- kfree(conf->mirrors_new);
- bioset_exit(&conf->bio_split);
- kfree(conf);
+ raid10_free_conf(priv);
}
static void raid10_quiesce(struct mddev *mddev, int quiesce)
--
2.31.1
From: Yu Kuai <[email protected]>
raid10_sync_request() increases 'r10bio->remaining' for both the rdev and
the replacement rdev. However, if the read IO fails,
recovery_request_write() returns without issuing the write IO; in this
case end_sync_request() is only called once, 'remaining' is leaked, and
the result is an IO hang.
Fix the problem by decreasing 'remaining' according to whether 'bio' and
'repl_bio' are valid.
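To make the accounting concrete, here is a minimal user-space analogue
(hypothetical names, not the kernel structures): 'remaining' is raised
once per device that received a bio, and every completion path, including
the read-error path, has to drop exactly that many references:

/* Hypothetical user-space analogue of the 'remaining' accounting. */
#include <stdio.h>

struct r10bio_sketch {
	int remaining;
};

static void end_sync_request_sketch(struct r10bio_sketch *r10_bio)
{
	if (--r10_bio->remaining == 0)
		printf("last reference dropped, recovery for this r10_bio completes\n");
}

int main(void)
{
	/* raid10_sync_request() took one reference for the rdev bio and one
	 * for the replacement bio. */
	struct r10bio_sketch r10_bio = { .remaining = 2 };

	/* Read-error path: no writes are issued, but both references still
	 * have to be dropped, one for each valid bio, or the completion
	 * never happens and the IO hangs. */
	end_sync_request_sketch(&r10_bio);	/* for devs[1].bio */
	end_sync_request_sketch(&r10_bio);	/* for devs[1].repl_bio */
	return 0;
}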
Fixes: 24afd80d99f8 ("md/raid10: handle recovery of replacement devices.")
Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/raid10.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a8b5fecef136..f7002a1aa9cf 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2611,11 +2611,22 @@ static void recovery_request_write(struct mddev *mddev, struct r10bio *r10_bio)
{
struct r10conf *conf = mddev->private;
int d;
- struct bio *wbio, *wbio2;
+ struct bio *wbio = r10_bio->devs[1].bio;
+ struct bio *wbio2 = r10_bio->devs[1].repl_bio;
+
+ /* Need to test wbio2->bi_end_io before we call
+ * submit_bio_noacct as if the former is NULL,
+ * the latter is free to free wbio2.
+ */
+ if (wbio2 && !wbio2->bi_end_io)
+ wbio2 = NULL;
if (!test_bit(R10BIO_Uptodate, &r10_bio->state)) {
fix_recovery_read_error(r10_bio);
- end_sync_request(r10_bio);
+ if (wbio->bi_end_io)
+ end_sync_request(r10_bio);
+ if (wbio2)
+ end_sync_request(r10_bio);
return;
}
@@ -2624,14 +2635,6 @@ static void recovery_request_write(struct mddev *mddev, struct r10bio *r10_bio)
* and submit the write request
*/
d = r10_bio->devs[1].devnum;
- wbio = r10_bio->devs[1].bio;
- wbio2 = r10_bio->devs[1].repl_bio;
- /* Need to test wbio2->bi_end_io before we call
- * submit_bio_noacct as if the former is NULL,
- * the latter is free to free wbio2.
- */
- if (wbio2 && !wbio2->bi_end_io)
- wbio2 = NULL;
if (wbio->bi_end_io) {
atomic_inc(&conf->mirrors[d].rdev->nr_pending);
md_sync_acct(conf->mirrors[d].rdev->bdev, bio_sectors(wbio));
--
2.31.1
From: Yu Kuai <[email protected]>
status_resync() calculates 'curr_resync - recovery_active' to show the
user a progress bar like the following:
[============>........] resync = 61.4%
'curr_resync' and 'recovery_active' are updated in md_do_sync(), and
status_resync() can read them concurrently, hence it's possible that
'curr_resync - recovery_active' can overflow to a huge number. In this
case status_resync() gets stuck in the loop printing a huge number of
'=' characters, which ends up as a soft lockup.
Fix the problem by setting 'resync' to MD_RESYNC_ACTIVE in this case, so
that a resync in progress is still reported to the user.
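As a rough user-space illustration (made-up values, not the kernel types),
the unsigned subtraction wraps around when the racing read of
'recovery_active' momentarily exceeds 'curr_resync', and that wrapped
value is what keeps the '=' printing loop running:

/* Illustration of the unsigned wrap-around, user-space only. */
#include <stdio.h>

int main(void)
{
	unsigned long long curr_resync = 1000;		/* snapshot taken first */
	unsigned long long recovery_active = 1500;	/* read later, already larger */
	unsigned long long resync = curr_resync - recovery_active;

	/* Wraps to roughly 2^64, so a loop printing one '=' per chunk of
	 * progress would effectively never finish. */
	printf("resync = %llu\n", resync);
	return 0;
}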
Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/md.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 546b1b81eb28..98970bbe32bf 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8009,16 +8009,16 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev)
} else if (resync > max_sectors) {
resync = max_sectors;
} else {
- resync -= atomic_read(&mddev->recovery_active);
- if (resync < MD_RESYNC_ACTIVE) {
- /*
- * Resync has started, but the subtraction has
- * yielded one of the special values. Force it
- * to active to ensure the status reports an
- * active resync.
- */
+ res = atomic_read(&mddev->recovery_active);
+ /*
+ * Resync has started, but the subtraction has overflowed or
+ * yielded one of the special values. Force it to active to
+ * ensure the status reports an active resync.
+ */
+ if (resync < res || resync - res < MD_RESYNC_ACTIVE)
resync = MD_RESYNC_ACTIVE;
- }
+ else
+ resync -= res;
}
if (resync == MD_RESYNC_NONE) {
--
2.31.1
From: Yu Kuai <[email protected]>
In raid10_run(), if setup_conf() succeeds but raid10_run() fails before
setting 'mddev->thread', then 'conf->thread' is not freed in the error
path.
Fix the problem by setting 'mddev->thread' right after setup_conf().
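A minimal user-space sketch of the ordering (hypothetical names, not
raid10_run() itself): take ownership of the thread pointer immediately, so
any later error path that tears down mddev->thread also covers the thread
created by setup_conf():

/* Hypothetical sketch of handing over ownership before anything can fail. */
#include <stdlib.h>

struct thread_sketch { int dummy; };
struct conf_sketch  { struct thread_sketch *thread; };
struct mddev_sketch { struct thread_sketch *thread; };

static void unregister_thread_sketch(struct thread_sketch **tp)
{
	free(*tp);
	*tp = NULL;
}

int main(void)
{
	struct conf_sketch conf = { .thread = malloc(sizeof(struct thread_sketch)) };
	struct mddev_sketch mddev = { 0 };

	if (!conf.thread)
		return 1;

	/* Hand over ownership right after "setup_conf()" ... */
	mddev.thread = conf.thread;
	conf.thread = NULL;

	/* ... so a later failure (e.g. the clustered-layout check) only needs
	 * the common error path, which already stops mddev.thread; nothing is
	 * left dangling in conf. */
	unregister_thread_sketch(&mddev.thread);
	return 0;
}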
Fixes: 43a521238aca ("md-cluster: choose correct label when clustered layout is not supported")
Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/raid10.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index bdfa02e8fe7e..2f1522bba80d 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4142,6 +4142,9 @@ static int raid10_run(struct mddev *mddev)
if (!conf)
goto out;
+ mddev->thread = conf->thread;
+ conf->thread = NULL;
+
if (mddev_is_clustered(conf->mddev)) {
int fc, fo;
@@ -4154,9 +4157,6 @@ static int raid10_run(struct mddev *mddev)
}
}
- mddev->thread = conf->thread;
- conf->thread = NULL;
-
if (mddev->queue) {
blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9);
--
2.31.1
On Thu, Mar 9, 2023 at 11:39 PM Yu Kuai <[email protected]> wrote:
>
> From: Yu Kuai <[email protected]>
>
> status_resync() calculates 'curr_resync - recovery_active' to show the
> user a progress bar like the following:
>
> [============>........] resync = 61.4%
>
> 'curr_resync' and 'recovery_active' are updated in md_do_sync(), and
> status_resync() can read them concurrently, hence it's possible that
> 'curr_resync - recovery_active' can overflow to a huge number. In this
> case status_resync() gets stuck in the loop printing a huge number of
> '=' characters, which ends up as a soft lockup.
>
> Fix the problem by setting 'resync' to MD_RESYNC_ACTIVE in this case, so
> that a resync in progress is still reported to the user.
>
> Signed-off-by: Yu Kuai <[email protected]>
Looks good. Applied to md-next.
Thanks,
Song
> ---
> drivers/md/md.c | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 546b1b81eb28..98970bbe32bf 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8009,16 +8009,16 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev)
> } else if (resync > max_sectors) {
> resync = max_sectors;
> } else {
> - resync -= atomic_read(&mddev->recovery_active);
> - if (resync < MD_RESYNC_ACTIVE) {
> - /*
> - * Resync has started, but the subtraction has
> - * yielded one of the special values. Force it
> - * to active to ensure the status reports an
> - * active resync.
> - */
> + res = atomic_read(&mddev->recovery_active);
> + /*
> + * Resync has started, but the subtraction has overflowed or
> + * yielded one of the special values. Force it to active to
> + * ensure the status reports an active resync.
> + */
> + if (resync < res || resync - res < MD_RESYNC_ACTIVE)
> resync = MD_RESYNC_ACTIVE;
> - }
> + else
> + resync -= res;
> }
>
> if (resync == MD_RESYNC_NONE) {
> --
> 2.31.1
>
On Thu, Mar 9, 2023 at 11:39 PM Yu Kuai <[email protected]> wrote:
>
> From: Yu Kuai <[email protected]>
>
> We're currently running a lot of tests for raid10 and have already found
> many problems. This patchset is the first part, covering some simple and
> obvious problems. Most of the patches were already sent separately, but
> I think a patchset is better for review.
>
> Yu Kuai (6):
> md/raid10: don't call bio_start_io_acct twice for bio which
> experienced read error
> md: fix soft lockup in status_resync
> md/raid10: don't BUG_ON() in raise_barrier()
> md/radi10: fix leak of 'r10bio->remaining' for recovery
> md/raid10: fix memleak for 'conf->bio_split'
> md/raid10: fix memleak of md thread
Applied 2/6 to 6/6 to md-next. Thanks!
Song
>
> drivers/md/md.c | 18 +++++------
> drivers/md/raid10.c | 78 +++++++++++++++++++++++----------------------
> 2 files changed, 49 insertions(+), 47 deletions(-)
>
> --
> 2.31.1
>
Hi, Song!
On 2023/03/14 6:37, Song Liu wrote:
> On Thu, Mar 9, 2023 at 11:39 PM Yu Kuai <[email protected]> wrote:
>>
>> From: Yu Kuai <[email protected]>
>>
>> We're currently running a lot of tests for raid10 and have already found
>> many problems. This patchset is the first part, covering some simple and
>> obvious problems. Most of the patches were already sent separately, but
>> I think a patchset is better for review.
>>
We have many test cases for raid10 locally. Just to confirm: is mdadm the
only place to push new tests? We're maintaining our tests through
blktests; is it ok if we push those tests to blktests?
Thanks,
Kuai
>> Yu Kuai (6):
>> md/raid10: don't call bio_start_io_acct twice for bio which
>> experienced read error
>> md: fix soft lockup in status_resync
>> md/raid10: don't BUG_ON() in raise_barrier()
>> md/radi10: fix leak of 'r10bio->remaining' for recovery
>> md/raid10: fix memleak for 'conf->bio_split'
>> md/raid10: fix memleak of md thread
>
> Applied 2/6 to 6/6 to md-next. Thanks!
>
> Song
>
>>
>> drivers/md/md.c | 18 +++++------
>> drivers/md/raid10.c | 78 +++++++++++++++++++++++----------------------
>> 2 files changed, 49 insertions(+), 47 deletions(-)
>>
>> --
>> 2.31.1
>>
> .
>