2017-03-20 09:52:07

by Gi-Oh Kim

[permalink] [raw]
Subject: [PATCHv2 0/2] mdadm: setting device role of raid1 disk with failfast

Hi,

I've found a case that failfast option of mdadm set a disk faulty wrongly.
Following is my test case.

mdadm --create /dev/md100 -l 1 --failfast -e 1.2 -n 2 /dev/vdb /dev/vdc
mdadm /dev/md100 -a --failfast /dev/vdd

If I use failfast option, the vdd disk was faulty wrongly.
If not, it was spare.

This patch fixes a corner case for setting device role and
prints device role if it's faulty.
This patch is based on "mdadm - v4.0-8-g72b616a - 2017-03-07".

v2: fix a typo of v1

Gioh Kim (1):
super1: ignore failfast flag for setting device role

Jack Wang (1):
super1: check and output faulty dev role

super1.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

--
2.5.0


2017-03-20 09:52:13

by Gi-Oh Kim

[permalink] [raw]
Subject: [PATCHv2 1/2] super1: ignore failfast flag for setting device role

There is corner case for setting device role,
if new device has failfast flag.
The failfast flag should be ignored.

Signed-off-by: Gioh Kim <[email protected]>
Signed-off-by: Jack Wang <[email protected]>
---
super1.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/super1.c b/super1.c
index 882cd61..f3520ac 100644
--- a/super1.c
+++ b/super1.c
@@ -1491,6 +1491,7 @@ static int add_to_super1(struct supertype *st, mdu_disk_info_t *dk,
struct devinfo *di, **dip;
bitmap_super_t *bms = (bitmap_super_t*)(((char*)sb) + MAX_SB_SIZE);
int rv, lockid;
+ int dk_state;

if (bms->version == BITMAP_MAJOR_CLUSTERED && dlm_funs_ready()) {
rv = cluster_get_dlmlock(&lockid);
@@ -1501,11 +1502,12 @@ static int add_to_super1(struct supertype *st, mdu_disk_info_t *dk,
}
}

- if ((dk->state & 6) == 6) /* active, sync */
+ dk_state = dk->state & ~(1<<MD_DISK_FAILFAST);
+ if ((dk_state & 6) == 6) /* active, sync */
*rp = __cpu_to_le16(dk->raid_disk);
- else if (dk->state & (1<<MD_DISK_JOURNAL))
+ else if (dk_state & (1<<MD_DISK_JOURNAL))
*rp = MD_DISK_ROLE_JOURNAL;
- else if ((dk->state & ~2) == 0) /* active or idle -> spare */
+ else if ((dk_state & ~2) == 0) /* active or idle -> spare */
*rp = MD_DISK_ROLE_SPARE;
else
*rp = MD_DISK_ROLE_FAULTY;
--
2.5.0

2017-03-20 09:52:18

by Gi-Oh Kim

[permalink] [raw]
Subject: [PATCHv2 2/2] super1: check and output faulty dev role

From: Jack Wang <[email protected]>

Output the real dev role in examine_super1, it will help to
find problem.

Signed-off-by: Jack Wang <[email protected]>
Reviewed-by: Gioh Kim <[email protected]>
---
super1.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/super1.c b/super1.c
index f3520ac..c903371 100644
--- a/super1.c
+++ b/super1.c
@@ -501,8 +501,10 @@ static void examine_super1(struct supertype *st, char *homehost)
#endif
printf(" Device Role : ");
role = role_from_sb(sb);
- if (role >= MD_DISK_ROLE_FAULTY)
- printf("spare\n");
+ if (role == MD_DISK_ROLE_SPARE)
+ printf("Spare\n");
+ else if (role == MD_DISK_ROLE_FAULTY)
+ printf("Faulty\n");
else if (role == MD_DISK_ROLE_JOURNAL)
printf("Journal\n");
else if (sb->feature_map & __cpu_to_le32(MD_FEATURE_REPLACEMENT))
--
2.5.0

2017-03-21 20:43:43

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCHv2 2/2] super1: check and output faulty dev role

On Mon, Mar 20 2017, Gioh Kim wrote:

> From: Jack Wang <[email protected]>
>
> Output the real dev role in examine_super1, it will help to
> find problem.
>
> Signed-off-by: Jack Wang <[email protected]>
> Reviewed-by: Gioh Kim <[email protected]>
> ---
> super1.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/super1.c b/super1.c
> index f3520ac..c903371 100644
> --- a/super1.c
> +++ b/super1.c
> @@ -501,8 +501,10 @@ static void examine_super1(struct supertype *st, char *homehost)
> #endif
> printf(" Device Role : ");
> role = role_from_sb(sb);
> - if (role >= MD_DISK_ROLE_FAULTY)
> - printf("spare\n");
> + if (role == MD_DISK_ROLE_SPARE)
> + printf("Spare\n");
> + else if (role == MD_DISK_ROLE_FAULTY)
> + printf("Faulty\n");
> else if (role == MD_DISK_ROLE_JOURNAL)
> printf("Journal\n");
> else if (sb->feature_map & __cpu_to_le32(MD_FEATURE_REPLACEMENT))
> --
> 2.5.0

I don't think the distinction between "faulty" and "spare" is really
useful here. I used to report the difference and it turned out to be
confusing, so we stopped.

This is information stored on some other disk, not the one that is
spare-or-faulty. All it needs to know if what other devices are
working. It doesn't need to know about which devices aren't working and
why.
The distinction between 'faulty' and 'spare' is only relevant to the
device itself, and to the array as a whole.

We should probably get rid of the distinction between
MD_DISK_ROLE_FAULTY and MD_DISK_ROLE_SPARE.
Most places that test for it just test >= MD_DISK_ROLE_FAULTY.

NeilBrown


Attachments:
signature.asc (832.00 B)

2017-03-22 10:24:18

by Jack Wang

[permalink] [raw]
Subject: Re: [PATCHv2 2/2] super1: check and output faulty dev role

On Tue, Mar 21, 2017 at 8:55 PM, NeilBrown <[email protected]> wrote:
> On Mon, Mar 20 2017, Gioh Kim wrote:
>
>> From: Jack Wang <[email protected]>
>>
>> Output the real dev role in examine_super1, it will help to
>> find problem.
>>
>> Signed-off-by: Jack Wang <[email protected]>
>> Reviewed-by: Gioh Kim <[email protected]>
>> ---
>> super1.c | 6 ++++--
>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/super1.c b/super1.c
>> index f3520ac..c903371 100644
>> --- a/super1.c
>> +++ b/super1.c
>> @@ -501,8 +501,10 @@ static void examine_super1(struct supertype *st, char *homehost)
>> #endif
>> printf(" Device Role : ");
>> role = role_from_sb(sb);
>> - if (role >= MD_DISK_ROLE_FAULTY)
>> - printf("spare\n");
>> + if (role == MD_DISK_ROLE_SPARE)
>> + printf("Spare\n");
>> + else if (role == MD_DISK_ROLE_FAULTY)
>> + printf("Faulty\n");
>> else if (role == MD_DISK_ROLE_JOURNAL)
>> printf("Journal\n");
>> else if (sb->feature_map & __cpu_to_le32(MD_FEATURE_REPLACEMENT))
>> --
>> 2.5.0
>
> I don't think the distinction between "faulty" and "spare" is really
> useful here. I used to report the difference and it turned out to be
> confusing, so we stopped.
>
> This is information stored on some other disk, not the one that is
> spare-or-faulty. All it needs to know if what other devices are
> working. It doesn't need to know about which devices aren't working and
> why.
> The distinction between 'faulty' and 'spare' is only relevant to the
> device itself, and to the array as a whole.
>
> We should probably get rid of the distinction between
> MD_DISK_ROLE_FAULTY and MD_DISK_ROLE_SPARE.
> Most places that test for it just test >= MD_DISK_ROLE_FAULTY.
>
> NeilBrown

The reason why I did this change, was during debugging the problem, we
notice the dev_role was wrong, but
when I print in mdadm it said 'spare', which lead me to check other
kernel code path, so spent more time until, I found
examine_super1, treat >=MD_DISK_ROLE_FAULTY as 'spare'.

I thought if the output was right, it could have saved me or maybe
also other developer some time.

But if this cause confusing in the past, we can drop it, the first is
the real bugfix.

Thanks!
--
Jack Wang
Linux Kernel Developer

2017-03-27 10:59:36

by Gi-Oh Kim

[permalink] [raw]
Subject: Re: [PATCHv2 0/2] mdadm: setting device role of raid1 disk with failfast

Hi,
Is nobody interested in those patches?

On Mon, Mar 20, 2017 at 10:51:55AM +0100, Gioh Kim wrote:
> Hi,
>
> I've found a case that failfast option of mdadm set a disk faulty wrongly.
> Following is my test case.
>
> mdadm --create /dev/md100 -l 1 --failfast -e 1.2 -n 2 /dev/vdb /dev/vdc
> mdadm /dev/md100 -a --failfast /dev/vdd
>
> If I use failfast option, the vdd disk was faulty wrongly.
> If not, it was spare.
>
> This patch fixes a corner case for setting device role and
> prints device role if it's faulty.
> This patch is based on "mdadm - v4.0-8-g72b616a - 2017-03-07".
>
> v2: fix a typo of v1
>
> Gioh Kim (1):
> super1: ignore failfast flag for setting device role
>
> Jack Wang (1):
> super1: check and output faulty dev role
>
> super1.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> --
> 2.5.0
>

--
Best regards,
Gi-Oh Kim
TEL: 0176 2697 8962

2017-03-28 18:02:24

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCHv2 1/2] super1: ignore failfast flag for setting device role

Gioh Kim <[email protected]> writes:
> There is corner case for setting device role,
> if new device has failfast flag.
> The failfast flag should be ignored.
>
> Signed-off-by: Gioh Kim <[email protected]>
> Signed-off-by: Jack Wang <[email protected]>
> ---
> super1.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)

Applied!

Ideally I would like to get rid of the hardcoded values and use the bit
defines instead, but the original code did the same, so I am going to
take it.

I am leaving out the second patch for now, per Neil's comments. If you
feel strongly about it, please conveince Neil first :)

Thanks,
Jes