2017-03-17 16:47:16

by Gi-Oh Kim

[permalink] [raw]
Subject: [PATCH 0/2] mdadm: setting device role of raid1 disk with failfast

Hi,

I've found a case that failfast option of mdadm set a disk faulty wrongly.
Following is my test case.

mdadm --create /dev/md100 -l 1 --failfast -e 1.2 -n 2 /dev/vdb /dev/vdc
mdadm /dev/md100 -a --failfast /dev/vdd

If I use failfast option, the vdd disk was faulty wrongly.
If not, it was spare.

This patch fixes a corner case for setting device role and
prints device role if it's faulty.
This patch is based on "mdadm - v4.0-8-g72b616a - 2017-03-07".

Gioh Kim (1):
super1: ignore failfast flag for setting device role

Jack Wang (1):
super1: check and output faulty dev role

super1.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

--
2.5.0


2017-03-17 16:47:18

by Gi-Oh Kim

[permalink] [raw]
Subject: [PATCH 2/2] super1: check and output faulty dev role

From: Jack Wang <[email protected]>

Output the real dev role in examine_super1, it will help to
find problem.

Signed-off-by: Jack Wang <[email protected]>
Reviewed-by: Gioh Kim <[email protected]>
---
super1.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/super1.c b/super1.c
index 1da33ef..0bf4715 100644
--- a/super1.c
+++ b/super1.c
@@ -501,8 +501,10 @@ static void examine_super1(struct supertype *st, char *homehost)
#endif
printf(" Device Role : ");
role = role_from_sb(sb);
- if (role >= MD_DISK_ROLE_FAULTY)
- printf("spare\n");
+ if (role == MD_DISK_ROLE_SPARE)
+ printf("Spare\n");
+ else if (role == MD_DISK_ROLE_FAULTY)
+ printf("Faulty\n");
else if (role == MD_DISK_ROLE_JOURNAL)
printf("Journal\n");
else if (sb->feature_map & __cpu_to_le32(MD_FEATURE_REPLACEMENT))
--
2.5.0

2017-03-17 16:47:23

by Gi-Oh Kim

[permalink] [raw]
Subject: [PATCH 1/2] super1: ignore failfast flag for setting device role

There is corner case for setting device role,
if new device has failfast flag.
The failfast flag should be ignored.

Signed-off-by: Gioh Kim <[email protected]>
Signed-off-by: Jack Wang <[email protected]>
---
super1.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/super1.c b/super1.c
index 882cd61..1da33ef 100644
--- a/super1.c
+++ b/super1.c
@@ -1491,6 +1491,7 @@ static int add_to_super1(struct supertype *st, mdu_disk_info_t *dk,
struct devinfo *di, **dip;
bitmap_super_t *bms = (bitmap_super_t*)(((char*)sb) + MAX_SB_SIZE);
int rv, lockid;
+ int dk_state;

if (bms->version == BITMAP_MAJOR_CLUSTERED && dlm_funs_ready()) {
rv = cluster_get_dlmlock(&lockid);
@@ -1501,11 +1502,12 @@ static int add_to_super1(struct supertype *st, mdu_disk_info_t *dk,
}
}

- if ((dk->state & 6) == 6) /* active, sync */
+ dk_state &= ~(1<<MD_DISK_FAILFAST);
+ if ((dk_state & 6) == 6) /* active, sync */
*rp = __cpu_to_le16(dk->raid_disk);
- else if (dk->state & (1<<MD_DISK_JOURNAL))
+ else if (dk_state & (1<<MD_DISK_JOURNAL))
*rp = MD_DISK_ROLE_JOURNAL;
- else if ((dk->state & ~2) == 0) /* active or idle -> spare */
+ else if ((dk_state & ~2) == 0) /* active or idle -> spare */
*rp = MD_DISK_ROLE_SPARE;
else
*rp = MD_DISK_ROLE_FAULTY;
--
2.5.0

2017-03-17 20:21:37

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 1/2] super1: ignore failfast flag for setting device role

Gioh Kim <[email protected]> writes:
> There is corner case for setting device role,
> if new device has failfast flag.
> The failfast flag should be ignored.
>
> Signed-off-by: Gioh Kim <[email protected]>
> Signed-off-by: Jack Wang <[email protected]>
> ---
> super1.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/super1.c b/super1.c
> index 882cd61..1da33ef 100644
> --- a/super1.c
> +++ b/super1.c
> @@ -1491,6 +1491,7 @@ static int add_to_super1(struct supertype *st, mdu_disk_info_t *dk,
> struct devinfo *di, **dip;
> bitmap_super_t *bms = (bitmap_super_t*)(((char*)sb) + MAX_SB_SIZE);
> int rv, lockid;
> + int dk_state;
>
> if (bms->version == BITMAP_MAJOR_CLUSTERED && dlm_funs_ready()) {
> rv = cluster_get_dlmlock(&lockid);
> @@ -1501,11 +1502,12 @@ static int add_to_super1(struct supertype *st, mdu_disk_info_t *dk,
> }
> }
>
> - if ((dk->state & 6) == 6) /* active, sync */
> + dk_state &= ~(1<<MD_DISK_FAILFAST);
> + if ((dk_state & 6) == 6) /* active, sync */
> *rp = __cpu_to_le16(dk->raid_disk);

This does not look right - you haven't assigned a value to dk_state, but
then start masking bits out of it.

Cheers,
Jes