LinuxLists.cc - [PATCH 0/3] raid10 bugfix

2023-05-22 12:11:08

Subject: [PATCH 0/3] raid10 bugfix

From: Li Nan <[email protected]>

Li Nan (3):
md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
md/raid10: fix incorrect done of recovery
md/raid10: fix io loss while replacement replace rdev

drivers/md/raid10.c | 51 +++++++++++++++++++++++++++++++++------------
1 file changed, 38 insertions(+), 13 deletions(-)

--
2.31.1

2023-05-22 12:13:06

by Li Nan

[permalink] [raw]

Subject: [PATCH 2/3] md/raid10: fix incorrect done of recovery

From: Li Nan <[email protected]>

Recovery will go to giveup and let chunks_skipped++ in
raid10_sync_request() if there are some bad_blocks, and it will return
max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
data is inconsistent but user think recovery is done, it is wrong.

Fix it by set mirror's recovery_disabled and spare device shouln't be
added to here.

Signed-off-by: Li Nan <[email protected]>
---
drivers/md/raid10.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index e21502c03b45..70cc87c7ee57 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
int chunks_skipped = 0;
sector_t chunk_mask = conf->geo.chunk_mask;
int page_idx = 0;
+ int error_disk = -1;

/*
* Allow skipping a full rebuild for incremental assembly
@@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
return reshape_request(mddev, sector_nr, skipped);

if (chunks_skipped >= conf->geo.raid_disks) {
- /* if there has been nothing to do on any drive,
+ pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
+ test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? "resync" : "recovery");
+ if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
+ /*
+ * recovery fail, set mirrors.recovory_disabled,
+ * device shouldn't be added to there.
+ */
+ conf->mirrors[error_disk].recovery_disabled = mddev->recovery_disabled;
+ return 0;
+ }
+ /*
+ * if there has been nothing to do on any drive,
* then there is nothing to do at all..
*/
*skipped = 1;
@@ -3640,6 +3652,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
mdname(mddev));
mirror->recovery_disabled
= mddev->recovery_disabled;
+ } else {
+ error_disk = i;
}
put_buf(r10_bio);
if (rb2)
--
2.31.1

2023-05-22 14:29:24

by Yu Kuai

[permalink] [raw]

Subject: Re: [PATCH 2/3] md/raid10: fix incorrect done of recovery

Hi,

?? 2023/05/22 19:54, [email protected] д??:
> From: Li Nan <[email protected]>
>
> Recovery will go to giveup and let chunks_skipped++ in
> raid10_sync_request() if there are some bad_blocks, and it will return
> max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
> data is inconsistent but user think recovery is done, it is wrong.
>
> Fix it by set mirror's recovery_disabled and spare device shouln't be
> added to here.
>
> Signed-off-by: Li Nan <[email protected]>
> ---
> drivers/md/raid10.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index e21502c03b45..70cc87c7ee57 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> int chunks_skipped = 0;
> sector_t chunk_mask = conf->geo.chunk_mask;
> int page_idx = 0;
> + int error_disk = -1;
>
> /*
> * Allow skipping a full rebuild for incremental assembly
> @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> return reshape_request(mddev, sector_nr, skipped);
>
> if (chunks_skipped >= conf->geo.raid_disks) {
> - /* if there has been nothing to do on any drive,
> + pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
> + test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? "resync" : "recovery");

Line exceed 80 columns, and following.
> + if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {

Resync has the same problem, right?

Thanks,
Kuai
> + /*
> + * recovery fail, set mirrors.recovory_disabled,
> + * device shouldn't be added to there.
> + */
> + conf->mirrors[error_disk].recovery_disabled = mddev->recovery_disabled;
> + return 0;
> + }
> + /*
> + * if there has been nothing to do on any drive,
> * then there is nothing to do at all..
> */
> *skipped = 1;
> @@ -3640,6 +3652,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> mdname(mddev));
> mirror->recovery_disabled
> = mddev->recovery_disabled;
> + } else {
> + error_disk = i;
> }
> put_buf(r10_bio);
> if (rb2)
>

2023-05-22 16:13:41

by Song Liu

[permalink] [raw]

Subject: Re: [PATCH 2/3] md/raid10: fix incorrect done of recovery

On Mon, May 22, 2023 at 6:54 AM Yu Kuai <[email protected]> wrote:
>
> Hi,
>
> 在 2023/05/22 19:54, [email protected] 写道:
> > From: Li Nan <[email protected]>
> >
> > Recovery will go to giveup and let chunks_skipped++ in
> > raid10_sync_request() if there are some bad_blocks, and it will return
> > max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
> > data is inconsistent but user think recovery is done, it is wrong.
> >
> > Fix it by set mirror's recovery_disabled and spare device shouln't be
> > added to here.
> >
> > Signed-off-by: Li Nan <[email protected]>
> > ---
> > drivers/md/raid10.c | 16 +++++++++++++++-
> > 1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index e21502c03b45..70cc87c7ee57 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> > int chunks_skipped = 0;
> > sector_t chunk_mask = conf->geo.chunk_mask;
> > int page_idx = 0;
> > + int error_disk = -1;
> >
> > /*
> > * Allow skipping a full rebuild for incremental assembly
> > @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> > return reshape_request(mddev, sector_nr, skipped);
> >
> > if (chunks_skipped >= conf->geo.raid_disks) {
> > - /* if there has been nothing to do on any drive,
> > + pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
> > + test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? "resync" : "recovery");
>
> Line exceed 80 columns, and following.

If it makes the code look better, such as in this case, it is OK to have
lines longer than 80 columns.

Thanks,
Song

> > + if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
>

2023-05-23 19:14:10

by Song Liu

[permalink] [raw]

Subject: Re: [PATCH 0/3] raid10 bugfix

On Mon, May 22, 2023 at 4:56 AM <[email protected]> wrote:
>
> From: Li Nan <[email protected]>
>
> Li Nan (3):
> md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
> md/raid10: fix incorrect done of recovery
> md/raid10: fix io loss while replacement replace rdev

Please address feedback and send v2.

Thanks,
Song

2023-05-25 14:08:38

by Li Nan

[permalink] [raw]

Subject: Re: [PATCH 0/3] raid10 bugfix

在 2023/5/24 2:44, Song Liu 写道:
> On Mon, May 22, 2023 at 4:56 AM <[email protected]> wrote:
>>
>> From: Li Nan <[email protected]>
>>
>> Li Nan (3):
>> md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
>> md/raid10: fix incorrect done of recovery
>> md/raid10: fix io loss while replacement replace rdev
>
> Please address feedback and send v2.
>
> Thanks,
> Song
> .

Thanks for review, I will send v2 later.
--
Thanks,
Nan

2023-05-25 14:37:25

by Li Nan

[permalink] [raw]

Subject: Re: [PATCH 2/3] md/raid10: fix incorrect done of recovery

在 2023/5/22 21:54, Yu Kuai 写道:
> Hi,
>
> 在 2023/05/22 19:54, [email protected] 写道:
>> From: Li Nan <[email protected]>
>>
>> Recovery will go to giveup and let chunks_skipped++ in
>> raid10_sync_request() if there are some bad_blocks, and it will return
>> max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
>> data is inconsistent but user think recovery is done, it is wrong.
>>
>> Fix it by set mirror's recovery_disabled and spare device shouln't be
>> added to here.
>>
>> Signed-off-by: Li Nan <[email protected]>
>> ---
>> drivers/md/raid10.c | 16 +++++++++++++++-
>> 1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index e21502c03b45..70cc87c7ee57 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev
>> *mddev, sector_t sector_nr,
>>       int chunks_skipped = 0;
>>       sector_t chunk_mask = conf->geo.chunk_mask;
>>       int page_idx = 0;
>> +    int error_disk = -1;
>>       /*
>>        * Allow skipping a full rebuild for incremental assembly
>> @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct
>> mddev *mddev, sector_t sector_nr,
>>           return reshape_request(mddev, sector_nr, skipped);
>>       if (chunks_skipped >= conf->geo.raid_disks) {
>> -        /* if there has been nothing to do on any drive,
>> +        pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
>> +            test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? "resync"
>> : "recovery");
>
> Line exceed 80 columns, and following.
>> +        if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC,
>> &mddev->recovery)) {
>
> Resync has the same problem, right?
>

Yes. But I have no idea to fix it. md_error disk nor set
recovery_disabled is a good solution. So, just print error message now.
Do you have any ideas?
--
Thanks,
Nan

2023-05-26 02:58:15

by Yu Kuai

[permalink] [raw]

Subject: Re: [PATCH 2/3] md/raid10: fix incorrect done of recovery

Hi,

在 2023/05/25 22:00, Li Nan 写道:
>
>
> 在 2023/5/22 21:54, Yu Kuai 写道:
>> Hi,
>>
>> 在 2023/05/22 19:54, [email protected] 写道:
>>> From: Li Nan <[email protected]>
>>>
>>> Recovery will go to giveup and let chunks_skipped++ in
>>> raid10_sync_request() if there are some bad_blocks, and it will return
>>> max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
>>> data is inconsistent but user think recovery is done, it is wrong.
>>>
>>> Fix it by set mirror's recovery_disabled and spare device shouln't be
>>> added to here.
>>>
>>> Signed-off-by: Li Nan <[email protected]>
>>> ---
>>> drivers/md/raid10.c | 16 +++++++++++++++-
>>> 1 file changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>>> index e21502c03b45..70cc87c7ee57 100644
>>> --- a/drivers/md/raid10.c
>>> +++ b/drivers/md/raid10.c
>>> @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct
>>> mddev *mddev, sector_t sector_nr,
>>>       int chunks_skipped = 0;
>>>       sector_t chunk_mask = conf->geo.chunk_mask;
>>>       int page_idx = 0;
>>> +    int error_disk = -1;
>>>       /*
>>>        * Allow skipping a full rebuild for incremental assembly
>>> @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct
>>> mddev *mddev, sector_t sector_nr,
>>>           return reshape_request(mddev, sector_nr, skipped);
>>>       if (chunks_skipped >= conf->geo.raid_disks) {
>>> -        /* if there has been nothing to do on any drive,
>>> +        pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
>>> +            test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? "resync"
>>> : "recovery");
>>
>> Line exceed 80 columns, and following.
>>> +        if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC,
>>> &mddev->recovery)) {
>>
>> Resync has the same problem, right?
>>
>
> Yes. But I have no idea to fix it. md_error disk nor set
> recovery_disabled is a good solution. So, just print error message now.
> Do you have any ideas?

I'll look into this, in the meadtime, I don't suggest to apply this
patch because this is just temporary solution that only fix half of
the problem.

Thanks,
Kuai