2023-07-20 07:09:32

by Paul Menzel

Subject: Re: [PATCH] md/bitmap: Fix bitmap page writing problem when using block integrity

Dear Jinyoung,


Thank you very much for your patch. Below are some minor comments,
which you are free to ignore.

For the commit message summary/title you might be more specific. Maybe:

> Avoid protection error writing bitmap page with block integrity

Am 20.07.23 um 08:12 schrieb Jinyoung CHOI:
> Be careful when changing the page to perform DMA.
> Changing the bitmap page is also possible on the page where the DMA is
> being performed or scheduled in the MD.

Please add a blank line between paragraphs or do not wrap a line just
because a sentence ends.

> When configuring raid1(mirror) with devices that support block integrity,

Add a space before the (?

> the same bitmap page is sent to the device twice during the resync process,
> causing the following problems.
> (When requeue is executed, integrity is not updated)
>
>      [Func 1]                           [Func 2]
>
> 1    A(page) + a(integrity)
> 2    (sq doorbell)
> 3                                       A(page) -> A-1(page)
> 4    A-1(page-updated) + a(integiry)    A-1(page) + a-1(integrity)

integ*rit*y

> 5                                       (sq doorbell)
> 6    (DMA)                              (DMA)
>
>      I/O Fail and retry N               I/O Success
>      To be Faulty Device
>
> The following is the log when a problem occurs. The problematic device
> is in the faulty device state.
>
> Log:
> [ 135.037253] md/raid1:md0: active with 2 out of 2 mirrors
> [ 135.038228] md0: detected capacity change from 0 to 7501212288
> [ 135.038270] md: resync of RAID array md0
> [ 151.252172] nvme2n1: I/O Cmd(0x1) @ LBA 16, 8 blocks, I/O Error (sct 0x2 / sc 0x82) MORE
> [ 151.252180] protection error, dev nvme2n1, sector 16 op 0x1:(WRITE) flags 0x10800 phys_seg 1 prio class 2
> [ 151.252185] md: super_written gets error=-84
> [ 151.252187] md/raid1:md0: Disk failure on nvme2n1, disabling device.
> md/raid1:md0: Operation continuing on 1 devices.
> [ 151.267450] nvme3n1: I/O Cmd(0x1) @ LBA 16, 8 blocks, I/O Error (sct 0x2 / sc 0x82) MORE
> [ 151.267457] protection error, dev nvme3n1, sector 16 op 0x1:(WRITE) flags 0x10800 phys_seg 1 prio class 2
> [ 151.267460] md: super_written gets error=-84
> [ 151.268458] md: md0: resync interrupted.
> [ 151.320765] md: resync of RAID array md0
> [ 151.321205] md: md0: resync done.

Although you explained the problem well, it’d be great nevertheless if
you could add the details of your system to the commit message.

> Fixes: 85c9ccd4f026 ("md/bitmap: Don't write bitmap while earlier writes might be in-flight")
> Signed-off-by: Jinyoung Choi <[email protected]>

Your From line spells it CHOI. Maybe you can update your git
configuration to also use Choi?

> ---
> drivers/md/md-bitmap.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
> index 1ff712889a3b..dfb7418ba48a 100644
> --- a/drivers/md/md-bitmap.c
> +++ b/drivers/md/md-bitmap.c
> @@ -467,6 +467,13 @@ void md_bitmap_update_sb(struct bitmap *bitmap)
>  		return;
>  	if (!bitmap->storage.sb_page) /* no superblock */
>  		return;
> +
> +	/*
> +	 * Before modifying the bitmap page and re-issue it, wait for
> +	 * the requests previously sent to the device to be completed.
> +	 */
> +	md_bitmap_wait_writes(bitmap);
> +
>  	sb = kmap_atomic(bitmap->storage.sb_page);
>  	sb->events = cpu_to_le64(bitmap->mddev->events);
>  	if (bitmap->mddev->events < bitmap->events_cleared)


Kind regards,

Paul


2023-07-20 08:02:52

by Jinyoung Choi

Subject: RE:(2) [PATCH] md/bitmap: Fix bitmap page writing problem when using block integrity

Dear Paul,

> Dear Jinyoung,
>
>
> Thank you very much for your patch. Some minor comments, you can also
> ignore.

I will incorporate your advice on the commit message and resend the
patch.

> Your From line spells it CHOI. Maybe you can update your git
> configuration to also use Choi?

The company mail system sets it that way (CHOI).
I will change it so that it appears as "Choi".
Thank you for your comment. :)

Best Regards,
Jinyoung.