2018-05-02 11:08:57

by Gi-Oh Kim

Subject: [PATCH] md/raid1: add error handling of read error from FailFast device

Currently, handle_read_error() calls fix_read_error() only if the md
device is read-write and the rdev does not have the FailFast flag set.
A read error from a read-write device with the FailFast flag set is
not handled at all.

I am not sure whether this is intentional, but I found that a write
IO error marks the rdev Faulty. The md module should treat read and
write IO errors consistently, so a read IO error on a FailFast device
should also mark the rdev Faulty.
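
For clarity, the read-error path in handle_read_error() then reads
roughly as follows. This is a condensed sketch based on the hunk below
plus its surrounding context; the freeze/unfreeze calls and the
comments are paraphrased from the existing code, not part of this
patch:

        if (mddev->ro == 0 && !test_bit(FailFast, &rdev->flags)) {
                /* normal path: freeze the array and try to repair the sectors */
                freeze_array(conf, 1);
                fix_read_error(conf, r1_bio->read_disk,
                               r1_bio->sector, r1_bio->sectors);
                unfreeze_array(conf);
        } else if (mddev->ro == 0 && test_bit(FailFast, &rdev->flags)) {
                /* new branch: do not retry a FailFast device, mark it Faulty */
                md_error(mddev, rdev);
        } else {
                /* read-only array: do not use this disk for the retry */
                r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
        }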

Signed-off-by: Gioh Kim <[email protected]>
---
drivers/md/raid1.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index e9e3308cb0a7..4445179aa4c8 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2474,6 +2474,8 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
fix_read_error(conf, r1_bio->read_disk,
r1_bio->sector, r1_bio->sectors);
unfreeze_array(conf);
+ } else if (mddev->ro == 0 && test_bit(FailFast, &rdev->flags)) {
+ md_error(mddev, rdev);
} else {
r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
}
--
2.14.1



2018-05-02 12:13:23

by Gi-Oh Kim

Subject: Re: [PATCH] md/raid1: add error handling of read error from FailFast device

On Wed, May 2, 2018 at 1:08 PM, Gioh Kim <[email protected]> wrote:
> Currently, handle_read_error() calls fix_read_error() only if the md
> device is read-write and the rdev does not have the FailFast flag set.
> A read error from a read-write device with the FailFast flag set is
> not handled at all.
>
> I am not sure whether this is intentional, but I found that a write
> IO error marks the rdev Faulty. The md module should treat read and
> write IO errors consistently, so a read IO error on a FailFast device
> should also mark the rdev Faulty.
>
> Signed-off-by: Gioh Kim <[email protected]>
> ---
> drivers/md/raid1.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index e9e3308cb0a7..4445179aa4c8 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2474,6 +2474,8 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
> fix_read_error(conf, r1_bio->read_disk,
> r1_bio->sector, r1_bio->sectors);
> unfreeze_array(conf);
> + } else if (mddev->ro == 0 && test_bit(FailFast, &rdev->flags)) {
> + md_error(mddev, rdev);
> } else {
> r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
> }
> --
> 2.14.1
>

I think it would be helpful to show how I tested it.

As shown below, I used Ubuntu 17.10 and mdadm v4.0.
# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=17.10
DISTRIB_CODENAME=artful
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# uname -a
Linux ws00837 4.13.0-16-generic #19-Ubuntu SMP Wed Oct 11 18:35:14 UTC
2017 x86_64 x86_64 x86_64 GNU/Linux
# mdadm --version
mdadm - v4.0 - 2017-01-09

The following shows how I generated a read IO error and checked the md device.
After the read IO error, no device was marked faulty:

# modprobe scsi_debug num_parts=2
# man mdadm
# mdadm -C /dev/md111 --failfast -l 1 -n 2 /dev/sdc1 /dev/sdc2
mdadm: Note: this array has metadata at the start and
may not be suitable as a boot device. If you plan to
store '/boot' on this device please ensure that
your boot-loader understands md/v1.x metadata, or use
--metadata=0.90
mdadm: largest drive (/dev/sdc2) exceeds size (3904K) by more than 1%
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md111 started.
# mdadm -D /dev/md111
/dev/md111:
Version : 1.2
Creation Time : Wed May 2 10:55:35 2018
Raid Level : raid1
Array Size : 3904
Used Dev Size : 3904
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Update Time : Wed May 2 10:55:36 2018
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Name : ws00837:111 (local to host ws00837)
UUID : 9f214193:03cf7c97:3208da22:d6ab8a13
Events : 17

Number Major Minor RaidDevice State
0 8 33 0 active sync failfast /dev/sdc1
1 8 34 1 active sync failfast /dev/sdc2
# cat /proc/mdstat
Personalities : [raid1]
md111 : active raid1 sdc2[1] sdc1[0]
3904 blocks super 1.2 [2/2] [UU]

unused devices: <none>
# echo -1 > /sys/module/scsi_debug/parameters/every_nth && echo 4 > /sys/module/scsi_debug/parameters/opts
# dd if=/dev/md111 of=/dev/null bs=4K count=1 iflag=direct &
[1] 6322
# dd: error reading '/dev/md111': Input/output error
0+0 records in
0+0 records out
0 bytes copied, 124,376 s, 0,0 kB/s

[1]+ Exit 1 dd if=/dev/md111 of=/dev/null bs=4K
count=1 iflag=direct
# mdadm -D /dev/md111
/dev/md111:
Version : 1.2
Creation Time : Wed May 2 10:55:35 2018
Raid Level : raid1
Array Size : 3904
Used Dev Size : 3904
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Update Time : Wed May 2 10:55:36 2018
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Number Major Minor RaidDevice State
0 8 33 0 active sync failfast /dev/sdc1
1 8 34 1 active sync failfast /dev/sdc2


The following shows how I generated a write IO error and checked the md device.
After the write IO error, one device was marked faulty:

gohkim@ws00837:~$ sudo modprobe scsi_debug num_parts=2
gohkim@ws00837:~$ sudo mdadm -C /dev/md111 --failfast -l 1 -n 2 /dev/sdc1 /dev/sdc2
mdadm: Note: this array has metadata at the start and
may not be suitable as a boot device. If you plan to
store '/boot' on this device please ensure that
your boot-loader understands md/v1.x metadata, or use
--metadata=0.90
mdadm: largest drive (/dev/sdc2) exceeds size (3904K) by more than 1%
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md111 started.
gohkim@ws00837:~$ sudo mdadm -D /dev/md111
/dev/md111:
Version : 1.2
Creation Time : Wed May 2 14:03:30 2018
Raid Level : raid1
Array Size : 3904
Used Dev Size : 3904
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Update Time : Wed May 2 14:03:31 2018
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Name : ws00837:111 (local to host ws00837)
UUID : ba51fe65:c517a25a:a381ccc5:3617322b
Events : 17

Number Major Minor RaidDevice State
0 8 33 0 active sync failfast /dev/sdc1
1 8 34 1 active sync failfast /dev/sdc2
gohkim@ws00837:~$ echo -1 | sudo tee /sys/module/scsi_debug/parameters/every_nth
-1
gohkim@ws00837:~$ echo 4 | sudo tee /sys/module/scsi_debug/parameters/opts
4
gohkim@ws00837:~$ sudo dd if=/dev/zero of=/dev/md111 bs=4K count=1 oflag=direct &
[1] 13081
gohkim@ws00837:~$ dd: error writing '/dev/md111': Input/output error
1+0 records in
0+0 records out
0 bytes copied, 184,523 s, 0,0 kB/s

[1]+ Exit 1 sudo dd if=/dev/zero of=/dev/md111 bs=4K
count=1 oflag=direct
gohkim@ws00837:~$ sudo mdadm -D /dev/md111
/dev/md111:
Version : 1.2
Creation Time : Wed May 2 14:03:30 2018
Raid Level : raid1
Array Size : 3904
Used Dev Size : 3904
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Update Time : Wed May 2 14:07:47 2018
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0

Number Major Minor RaidDevice State
0 8 33 0 active sync failfast /dev/sdc1
- 0 0 1 removed

1 8 34 - faulty failfast /dev/sdc2



--
GIOH KIM
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel: +49 176 2697 8962
Fax: +49 30 577 008 299
Email: [email protected]
URL: https://www.profitbricks.de

Registered office: Berlin
Register court: Amtsgericht Charlottenburg, HRB 125506 B
Managing directors: Achim Weiss, Matthias Steinberg, Christoph Steffens

2018-05-04 08:39:45

by Gi-Oh Kim

Subject: Re: [PATCH] md/raid1: add error handling of read error from FailFast device

On Wed, May 2, 2018 at 2:11 PM, Gi-Oh Kim <[email protected]> wrote:
> On Wed, May 2, 2018 at 1:08 PM, Gioh Kim <[email protected]> wrote:
>> Currently, handle_read_error() calls fix_read_error() only if the md
>> device is read-write and the rdev does not have the FailFast flag set.
>> A read error from a read-write device with the FailFast flag set is
>> not handled at all.
>>
>> I am not sure whether this is intentional, but I found that a write
>> IO error marks the rdev Faulty. The md module should treat read and
>> write IO errors consistently, so a read IO error on a FailFast device
>> should also mark the rdev Faulty.

Hi Neil Brown,

Could you please tell me whether it is a bug or a feature that the md
module does not mark a device Faulty after a read IO error, even
though a write IO error does (see the sketch below)?
My company's product uses the failfast flag when creating md devices
for virtual machines. Even when the storage fails and the virtual
machine cannot read its data, I cannot tell from the mdadm tool which
md device is faulty. If this behavior is intended, I will need to
disable the failfast flag.
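
For comparison, the write path already behaves this way: on a failed
write, raid1_end_write_request() contains roughly the following check
(condensed and paraphrased here as a sketch, not the exact code):

        /* failed non-discard write to a device opened with fail-fast */
        if (test_bit(FailFast, &rdev->flags) &&
            (bio->bi_opf & MD_FAILFAST) &&
            !test_bit(WriteMostly, &rdev->flags))
                md_error(r1_bio->mddev, rdev);  /* rdev becomes Faulty */

My patch makes the read-error path treat FailFast devices the same way.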

Thank you in advance.

>> [...]
>
> [quoted patch and test log trimmed; see the messages above]



--
GIOH KIM
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel: +49 176 2697 8962
Fax: +49 30 577 008 299
Email: [email protected]
URL: https://www.profitbricks.de

Registered office: Berlin
Register court: Amtsgericht Charlottenburg, HRB 125506 B
Managing directors: Achim Weiss, Matthias Steinberg, Christoph Steffens

2018-05-14 08:25:08

by Jack Wang

Subject: Re: [PATCH] md/raid1: add error handling of read error from FailFast device

On Wed, May 9, 2018 at 10:58 AM, Jack Wang <[email protected]> wrote:
> ---------- Forwarded message ----------
> From: Gioh Kim <[email protected]>
> Date: 2018-05-02 13:08 GMT+02:00
> Subject: [PATCH] md/raid1: add error handling of read error from FailFast device
> To: [email protected]
> Cc: [email protected], [email protected], Gioh Kim
> <[email protected]>
>
>
> Currently, handle_read_error() calls fix_read_error() only if the md
> device is read-write and the rdev does not have the FailFast flag set.
> A read error from a read-write device with the FailFast flag set is
> not handled at all.
>
> I am not sure whether this is intentional, but I found that a write
> IO error marks the rdev Faulty. The md module should treat read and
> write IO errors consistently, so a read IO error on a FailFast device
> should also mark the rdev Faulty.
>
> Signed-off-by: Gioh Kim <[email protected]>
> ---
> drivers/md/raid1.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index e9e3308cb0a7..4445179aa4c8 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2474,6 +2474,8 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
> fix_read_error(conf, r1_bio->read_disk,
> r1_bio->sector, r1_bio->sectors);
> unfreeze_array(conf);
> + } else if (mddev->ro == 0 && test_bit(FailFast, &rdev->flags)) {
> + md_error(mddev, rdev);
> } else {
> r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
> }
> --
> 2.14.1

Patch looks good to me!
Reviewed-by: Jack Wang <[email protected]>
--
Jack Wang
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel: +49 30 577 008 042
Fax: +49 30 577 008 299
Email: [email protected]
URL: https://www.profitbricks.de

Registered office: Berlin
Register court: Amtsgericht Charlottenburg, HRB 125506 B
Managing directors: Achim Weiss, Matthias Steinberg, Christoph Steffens

2018-05-14 19:30:40

by Shaohua Li

Subject: Re: [PATCH] md/raid1: add error handling of read error from FailFast device

On Wed, May 02, 2018 at 01:08:11PM +0200, Gioh Kim wrote:
> Currently, handle_read_error() calls fix_read_error() only if the md
> device is read-write and the rdev does not have the FailFast flag set.
> A read error from a read-write device with the FailFast flag set is
> not handled at all.
>
> I am not sure whether this is intentional, but I found that a write
> IO error marks the rdev Faulty. The md module should treat read and
> write IO errors consistently, so a read IO error on a FailFast device
> should also mark the rdev Faulty.
>
> Signed-off-by: Gioh Kim <[email protected]>
> ---
> drivers/md/raid1.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index e9e3308cb0a7..4445179aa4c8 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2474,6 +2474,8 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
> fix_read_error(conf, r1_bio->read_disk,
> r1_bio->sector, r1_bio->sectors);
> unfreeze_array(conf);
> + } else if (mddev->ro == 0 && test_bit(FailFast, &rdev->flags)) {
> + md_error(mddev, rdev);
> } else {
> r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
> }
> --
> 2.14.1

Looks reasonable, applied!