2023-05-29 13:17:35

by Yu Kuai

Subject: [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread

From: Yu Kuai <[email protected]>

current->bio_list will be set under submit_bio() context, in this case
bitmap io will be added to the list and wait for current io submission to
finish, while current io submission must wait for bitmap io to be done.
commit 874807a83139 ("md/raid1{,0}: fix deadlock in bitmap_unplug.") fix
the deadlock by handling plugged bio by daemon thread.

On the one hand, the deadlock won't exist after commit a214b949d8e3
("blk-mq: only flush requests from the plug in blk_mq_submit_bio"). On
the other hand, current solution makes it impossible to flush plugged bio
in raid1/10_make_request(), because this will cause that all the writes
will goto daemon thread.

In order to limit the number of plugged bio, commit 874807a83139
("md/raid1{,0}: fix deadlock in bitmap_unplug.") is reverted, and the
deadlock is fixed by handling bitmap io asynchronously.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/raid1-10.c | 14 ++++++++++++++
drivers/md/raid1.c | 4 ++--
drivers/md/raid10.c | 8 +++-----
3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index 73cc3cb9154d..17e55c1fd5a1 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -151,3 +151,17 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,

return true;
}
+
+/*
+ * current->bio_list will be set under submit_bio() context, in this case bitmap
+ * io will be added to the list and wait for current io submission to finish,
+ * while current io submission must wait for bitmap io to be done. In order to
+ * avoid such deadlock, submit bitmap io asynchronously.
+ */
+static inline void raid1_prepare_flush_writes(struct bitmap *bitmap)
+{
+ if (current->bio_list)
+ md_bitmap_unplug_async(bitmap);
+ else
+ md_bitmap_unplug(bitmap);
+}
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 0778e398584c..006620fed595 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -794,7 +794,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
static void flush_bio_list(struct r1conf *conf, struct bio *bio)
{
/* flush any pending bitmap writes to disk before proceeding w/ I/O */
- md_bitmap_unplug(conf->mddev->bitmap);
+ raid1_prepare_flush_writes(conf->mddev->bitmap);
wake_up(&conf->wait_barrier);

while (bio) { /* submit pending writes */
@@ -1166,7 +1166,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
struct r1conf *conf = mddev->private;
struct bio *bio;

- if (from_schedule || current->bio_list) {
+ if (from_schedule) {
spin_lock_irq(&conf->device_lock);
bio_list_merge(&conf->pending_bio_list, &plug->pending);
spin_unlock_irq(&conf->device_lock);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6640507ecb0d..fb22cfe94d32 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -902,9 +902,7 @@ static void flush_pending_writes(struct r10conf *conf)
__set_current_state(TASK_RUNNING);

blk_start_plug(&plug);
- /* flush any pending bitmap writes to disk
- * before proceeding w/ I/O */
- md_bitmap_unplug(conf->mddev->bitmap);
+ raid1_prepare_flush_writes(conf->mddev->bitmap);
wake_up(&conf->wait_barrier);

while (bio) { /* submit pending writes */
@@ -1108,7 +1106,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
struct r10conf *conf = mddev->private;
struct bio *bio;

- if (from_schedule || current->bio_list) {
+ if (from_schedule) {
spin_lock_irq(&conf->device_lock);
bio_list_merge(&conf->pending_bio_list, &plug->pending);
spin_unlock_irq(&conf->device_lock);
@@ -1120,7 +1118,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)

/* we aren't scheduling, so we can do the write-out directly. */
bio = bio_list_get(&plug->pending);
- md_bitmap_unplug(mddev->bitmap);
+ raid1_prepare_flush_writes(mddev->bitmap);
wake_up(&conf->wait_barrier);

while (bio) { /* submit pending writes */
--
2.39.2



2023-05-31 07:57:12

by Xiao Ni

Subject: Re: [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread

On Mon, May 29, 2023 at 9:14 PM Yu Kuai <[email protected]> wrote:
>
> From: Yu Kuai <[email protected]>
>
> current->bio_list will be set under submit_bio() context, in this case
> bitmap io will be added to the list and wait for current io submission to
> finish, while current io submission must wait for bitmap io to be done.
> commit 874807a83139 ("md/raid1{,0}: fix deadlock in bitmap_unplug.") fix
> the deadlock by handling plugged bio by daemon thread.

Thanks for the historical introduction. I did a test and printed logs in
raid10_unplug; the tools I used are dd and mkfs. from_schedule is always
true during I/O and it's 0 when the io finishes. So I have a question
here: how can I trigger the condition that from_schedule is 0 and
current->bio_list is not NULL? In other words, is there really a deadlock
here? Before your patch it looks like all bios are merged into
conf->pending_bio_list and are handled by raid10d. It can't submit bios
directly in the originating process, as mentioned in commit 57c67df48866.

>
> On the one hand, the deadlock won't exist after commit a214b949d8e3
> ("blk-mq: only flush requests from the plug in blk_mq_submit_bio"). On
> the other hand, current solution makes it impossible to flush plugged bio
> in raid1/10_make_request(), because this will cause that all the writes
> will goto daemon thread.
>
> In order to limit the number of plugged bio, commit 874807a83139
> ("md/raid1{,0}: fix deadlock in bitmap_unplug.") is reverted, and the
> deadlock is fixed by handling bitmap io asynchronously.
>
> Signed-off-by: Yu Kuai <[email protected]>
> ---
> drivers/md/raid1-10.c | 14 ++++++++++++++
> drivers/md/raid1.c | 4 ++--
> drivers/md/raid10.c | 8 +++-----
> 3 files changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
> index 73cc3cb9154d..17e55c1fd5a1 100644
> --- a/drivers/md/raid1-10.c
> +++ b/drivers/md/raid1-10.c
> @@ -151,3 +151,17 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
>
> return true;
> }
> +
> +/*
> + * current->bio_list will be set under submit_bio() context, in this case bitmap
> + * io will be added to the list and wait for current io submission to finish,
> + * while current io submission must wait for bitmap io to be done. In order to
> + * avoid such deadlock, submit bitmap io asynchronously.
> + */
> +static inline void raid1_prepare_flush_writes(struct bitmap *bitmap)
> +{
> + if (current->bio_list)
> + md_bitmap_unplug_async(bitmap);
> + else
> + md_bitmap_unplug(bitmap);
> +}
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 0778e398584c..006620fed595 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -794,7 +794,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
> static void flush_bio_list(struct r1conf *conf, struct bio *bio)
> {
> /* flush any pending bitmap writes to disk before proceeding w/ I/O */
> - md_bitmap_unplug(conf->mddev->bitmap);
> + raid1_prepare_flush_writes(conf->mddev->bitmap);

If we unplug the bitmap asynchronously, can we make sure the bitmap is
flushed before the corresponding data?

Regards
Xiao

> wake_up(&conf->wait_barrier);
>
> while (bio) { /* submit pending writes */
> @@ -1166,7 +1166,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
> struct r1conf *conf = mddev->private;
> struct bio *bio;
>
> - if (from_schedule || current->bio_list) {
> + if (from_schedule) {
> spin_lock_irq(&conf->device_lock);
> bio_list_merge(&conf->pending_bio_list, &plug->pending);
> spin_unlock_irq(&conf->device_lock);
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 6640507ecb0d..fb22cfe94d32 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -902,9 +902,7 @@ static void flush_pending_writes(struct r10conf *conf)
> __set_current_state(TASK_RUNNING);
>
> blk_start_plug(&plug);
> - /* flush any pending bitmap writes to disk
> - * before proceeding w/ I/O */
> - md_bitmap_unplug(conf->mddev->bitmap);
> + raid1_prepare_flush_writes(conf->mddev->bitmap);
> wake_up(&conf->wait_barrier);
>
> while (bio) { /* submit pending writes */
> @@ -1108,7 +1106,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
> struct r10conf *conf = mddev->private;
> struct bio *bio;
>
> - if (from_schedule || current->bio_list) {
> + if (from_schedule) {
> spin_lock_irq(&conf->device_lock);
> bio_list_merge(&conf->pending_bio_list, &plug->pending);
> spin_unlock_irq(&conf->device_lock);
> @@ -1120,7 +1118,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
>
> /* we aren't scheduling, so we can do the write-out directly. */
> bio = bio_list_get(&plug->pending);
> - md_bitmap_unplug(mddev->bitmap);
> + raid1_prepare_flush_writes(mddev->bitmap);
> wake_up(&conf->wait_barrier);
>
> while (bio) { /* submit pending writes */
> --
> 2.39.2
>


2023-05-31 08:06:25

by Xiao Ni

Subject: Re: [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread

On Wed, May 31, 2023 at 3:55 PM Yu Kuai <[email protected]> wrote:
>
> Hi,
>
> On 2023/05/31 15:50, Xiao Ni wrote:
> > On Mon, May 29, 2023 at 9:14 PM Yu Kuai <[email protected]> wrote:
> >>
> >> From: Yu Kuai <[email protected]>
> >>
> >> current->bio_list will be set under submit_bio() context, in this case
> >> bitmap io will be added to the list and wait for current io submission to
> >> finish, while current io submission must wait for bitmap io to be done.
> >> commit 874807a83139 ("md/raid1{,0}: fix deadlock in bitmap_unplug.") fix
> >> the deadlock by handling plugged bio by daemon thread.
> >
> > Thanks for the historic introduction. I did a test and printed the
> > logs in raid10_unplug. The tools I used are dd and mkfs. from_schedule
> > is always true during I/O and it's 0 when io finishes. So I have a
> > question here, how can I trigger the condition that from_schedule is 0
> > and current->list is not NULL? In other words, is there really a
> > deadlock here? Before your patch it looks like all bios are merged
> > into conf->pending_bio_list and are handled by raid10d. It can't
> > submit bio directly in the originating process which mentioned in
> > 57c67df48866
> >
> As I mentioned below, after commit a214b949d8e3, this deadlock doesn't
> exist anymore, and without this patch, patch 7 will introduce this
> scenario again.
>
> Thanks,
> Kuai
> >>
> >> On the one hand, the deadlock won't exist after commit a214b949d8e3
> >> ("blk-mq: only flush requests from the plug in blk_mq_submit_bio"). On
> >> the other hand, current solution makes it impossible to flush plugged bio
> >> in raid1/10_make_request(), because this will cause that all the writes
> >> will goto daemon thread.
> >>
> >> In order to limit the number of plugged bio, commit 874807a83139
> >> ("md/raid1{,0}: fix deadlock in bitmap_unplug.") is reverted, and the
> >> deadlock is fixed by handling bitmap io asynchronously.
> >>
> >> Signed-off-by: Yu Kuai <[email protected]>
> >> ---
> >> drivers/md/raid1-10.c | 14 ++++++++++++++
> >> drivers/md/raid1.c | 4 ++--
> >> drivers/md/raid10.c | 8 +++-----
> >> 3 files changed, 19 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
> >> index 73cc3cb9154d..17e55c1fd5a1 100644
> >> --- a/drivers/md/raid1-10.c
> >> +++ b/drivers/md/raid1-10.c
> >> @@ -151,3 +151,17 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
> >>
> >> return true;
> >> }
> >> +
> >> +/*
> >> + * current->bio_list will be set under submit_bio() context, in this case bitmap
> >> + * io will be added to the list and wait for current io submission to finish,
> >> + * while current io submission must wait for bitmap io to be done. In order to
> >> + * avoid such deadlock, submit bitmap io asynchronously.
> >> + */
> >> +static inline void raid1_prepare_flush_writes(struct bitmap *bitmap)
> >> +{
> >> + if (current->bio_list)
> >> + md_bitmap_unplug_async(bitmap);
> >> + else
> >> + md_bitmap_unplug(bitmap);
> >> +}
> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> >> index 0778e398584c..006620fed595 100644
> >> --- a/drivers/md/raid1.c
> >> +++ b/drivers/md/raid1.c
> >> @@ -794,7 +794,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
> >> static void flush_bio_list(struct r1conf *conf, struct bio *bio)
> >> {
> >> /* flush any pending bitmap writes to disk before proceeding w/ I/O */
> >> - md_bitmap_unplug(conf->mddev->bitmap);
> >> + raid1_prepare_flush_writes(conf->mddev->bitmap);
> >
> > If we unplug bitmap asynchronously, can we make sure the bitmap are
> > flushed before the corresponding data?

Could you explain this question?

Regards
Xiao


> >
> > Regards
> > Xiao
> >
> >> wake_up(&conf->wait_barrier);
> >>
> >> while (bio) { /* submit pending writes */
> >> @@ -1166,7 +1166,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
> >> struct r1conf *conf = mddev->private;
> >> struct bio *bio;
> >>
> >> - if (from_schedule || current->bio_list) {
> >> + if (from_schedule) {
> >> spin_lock_irq(&conf->device_lock);
> >> bio_list_merge(&conf->pending_bio_list, &plug->pending);
> >> spin_unlock_irq(&conf->device_lock);
> >> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> >> index 6640507ecb0d..fb22cfe94d32 100644
> >> --- a/drivers/md/raid10.c
> >> +++ b/drivers/md/raid10.c
> >> @@ -902,9 +902,7 @@ static void flush_pending_writes(struct r10conf *conf)
> >> __set_current_state(TASK_RUNNING);
> >>
> >> blk_start_plug(&plug);
> >> - /* flush any pending bitmap writes to disk
> >> - * before proceeding w/ I/O */
> >> - md_bitmap_unplug(conf->mddev->bitmap);
> >> + raid1_prepare_flush_writes(conf->mddev->bitmap);
> >> wake_up(&conf->wait_barrier);
> >>
> >> while (bio) { /* submit pending writes */
> >> @@ -1108,7 +1106,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
> >> struct r10conf *conf = mddev->private;
> >> struct bio *bio;
> >>
> >> - if (from_schedule || current->bio_list) {
> >> + if (from_schedule) {
> >> spin_lock_irq(&conf->device_lock);
> >> bio_list_merge(&conf->pending_bio_list, &plug->pending);
> >> spin_unlock_irq(&conf->device_lock);
> >> @@ -1120,7 +1118,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
> >>
> >> /* we aren't scheduling, so we can do the write-out directly. */
> >> bio = bio_list_get(&plug->pending);
> >> - md_bitmap_unplug(mddev->bitmap);
> >> + raid1_prepare_flush_writes(mddev->bitmap);
> >> wake_up(&conf->wait_barrier);
> >>
> >> while (bio) { /* submit pending writes */
> >> --
> >> 2.39.2
> >>
> >
> > .
> >
>


2023-05-31 08:24:12

by Paul Menzel

Subject: Re: [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread

Dear Yu,


Thank you for your patch. Some minor nits in case you should resend this.

In the summary/title it should be plug*g*ed.

On 29.05.23 at 15:11, Yu Kuai wrote:
> From: Yu Kuai <[email protected]>
>
> current->bio_list will be set under submit_bio() context, in this case
> bitmap io will be added to the list and wait for current io submission to

1. I’d use present tense: s/will be set/is set/; s/will be added/is added/
2. wait*s*

> finish, while current io submission must wait for bitmap io to be done.
> commit 874807a83139 ("md/raid1{,0}: fix deadlock in bitmap_unplug.") fix
> the deadlock by handling plugged bio by daemon thread.
>
> On the one hand, the deadlock won't exist after commit a214b949d8e3
> ("blk-mq: only flush requests from the plug in blk_mq_submit_bio"). On
> the other hand, current solution makes it impossible to flush plugged bio
> in raid1/10_make_request(), because this will cause that all the writes
> will goto daemon thread.

s/goto/go to/


Kind regards,

Paul


> In order to limit the number of plugged bio, commit 874807a83139
> ("md/raid1{,0}: fix deadlock in bitmap_unplug.") is reverted, and the
> deadlock is fixed by handling bitmap io asynchronously.
>
> Signed-off-by: Yu Kuai <[email protected]>
> ---
> drivers/md/raid1-10.c | 14 ++++++++++++++
> drivers/md/raid1.c | 4 ++--
> drivers/md/raid10.c | 8 +++-----
> 3 files changed, 19 insertions(+), 7 deletions(-)

[…]

2023-05-31 08:26:02

by Yu Kuai

Subject: Re: [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread

Hi,

On 2023/05/31 16:00, Xiao Ni wrote:
> On Wed, May 31, 2023 at 3:55 PM Yu Kuai <[email protected]> wrote:
>>
>> Hi,
>>
>> On 2023/05/31 15:50, Xiao Ni wrote:
>>> On Mon, May 29, 2023 at 9:14 PM Yu Kuai <[email protected]> wrote:
>>>>
>>>> From: Yu Kuai <[email protected]>
>>>>
>>>> current->bio_list will be set under submit_bio() context, in this case
>>>> bitmap io will be added to the list and wait for current io submission to
>>>> finish, while current io submission must wait for bitmap io to be done.
>>>> commit 874807a83139 ("md/raid1{,0}: fix deadlock in bitmap_unplug.") fix
>>>> the deadlock by handling plugged bio by daemon thread.
>>>
>>> Thanks for the historic introduction. I did a test and printed the
>>> logs in raid10_unplug. The tools I used are dd and mkfs. from_schedule
>>> is always true during I/O and it's 0 when io finishes. So I have a
>>> question here, how can I trigger the condition that from_schedule is 0
>>> and current->list is not NULL? In other words, is there really a
>>> deadlock here? Before your patch it looks like all bios are merged
>>> into conf->pending_bio_list and are handled by raid10d. It can't
>>> submit bio directly in the originating process which mentioned in
>>> 57c67df48866
>>>
>> As I mentioned below, after commit a214b949d8e3, this deadlock doesn't
>> exist anymore, and without this patch, patch 7 will introduce this
>> scenario again.
>>
>> Thanks,
>> Kuai
>>>>
>>>> On the one hand, the deadlock won't exist after commit a214b949d8e3
>>>> ("blk-mq: only flush requests from the plug in blk_mq_submit_bio"). On
>>>> the other hand, current solution makes it impossible to flush plugged bio
>>>> in raid1/10_make_request(), because this will cause that all the writes
>>>> will goto daemon thread.
>>>>
>>>> In order to limit the number of plugged bio, commit 874807a83139
>>>> ("md/raid1{,0}: fix deadlock in bitmap_unplug.") is reverted, and the
>>>> deadlock is fixed by handling bitmap io asynchronously.
>>>>
>>>> Signed-off-by: Yu Kuai <[email protected]>
>>>> ---
>>>> drivers/md/raid1-10.c | 14 ++++++++++++++
>>>> drivers/md/raid1.c | 4 ++--
>>>> drivers/md/raid10.c | 8 +++-----
>>>> 3 files changed, 19 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
>>>> index 73cc3cb9154d..17e55c1fd5a1 100644
>>>> --- a/drivers/md/raid1-10.c
>>>> +++ b/drivers/md/raid1-10.c
>>>> @@ -151,3 +151,17 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
>>>>
>>>> return true;
>>>> }
>>>> +
>>>> +/*
>>>> + * current->bio_list will be set under submit_bio() context, in this case bitmap
>>>> + * io will be added to the list and wait for current io submission to finish,
>>>> + * while current io submission must wait for bitmap io to be done. In order to
>>>> + * avoid such deadlock, submit bitmap io asynchronously.
>>>> + */
>>>> +static inline void raid1_prepare_flush_writes(struct bitmap *bitmap)
>>>> +{
>>>> + if (current->bio_list)
>>>> + md_bitmap_unplug_async(bitmap);
>>>> + else
>>>> + md_bitmap_unplug(bitmap);
>>>> +}
>>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>>> index 0778e398584c..006620fed595 100644
>>>> --- a/drivers/md/raid1.c
>>>> +++ b/drivers/md/raid1.c
>>>> @@ -794,7 +794,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>>> static void flush_bio_list(struct r1conf *conf, struct bio *bio)
>>>> {
>>>> /* flush any pending bitmap writes to disk before proceeding w/ I/O */
>>>> - md_bitmap_unplug(conf->mddev->bitmap);
>>>> + raid1_prepare_flush_writes(conf->mddev->bitmap);
>>>
>>> If we unplug bitmap asynchronously, can we make sure the bitmap are
>>> flushed before the corresponding data?
>
> Could you explain this question?

Sorry that I missed this... See the new helper in patch 5:
md_bitmap_unplug_async() will still wait for the bitmap io to finish.

md_bitmap_unplug_async
DECLARE_COMPLETION_ONSTACK(done)
...
wait_for_completion(&done)
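
To expand that a little (this is only a sketch of the idea, not the actual
code from patch 5, which is not included in this mail -- the work item, the
helper names and the choice of workqueue below are assumptions, using the
usual linux/workqueue.h and linux/completion.h machinery): the submitting
thread hands the unplug off to another context together with an on-stack
completion, so the bitmap bios are never queued on current->bio_list, and
it still sleeps until the bitmap io is done:

/* drivers/md context: struct bitmap and md_bitmap_unplug() are from md-bitmap.h */
struct bitmap_unplug_work {
	struct work_struct work;
	struct bitmap *bitmap;
	struct completion *done;
};

static void bitmap_unplug_fn(struct work_struct *work)
{
	struct bitmap_unplug_work *unplug_work =
		container_of(work, struct bitmap_unplug_work, work);

	/* Runs in workqueue context, so current->bio_list is NULL here. */
	md_bitmap_unplug(unplug_work->bitmap);
	complete(unplug_work->done);
}

/* Hypothetical name; shows the shape of the helper, not the real one. */
static void bitmap_unplug_async_sketch(struct bitmap *bitmap)
{
	DECLARE_COMPLETION_ONSTACK(done);
	struct bitmap_unplug_work unplug_work = {
		.bitmap = bitmap,
		.done = &done,
	};

	INIT_WORK_ONSTACK(&unplug_work.work, bitmap_unplug_fn);
	queue_work(system_unbound_wq, &unplug_work.work);

	/* The caller still blocks until the bitmap io has finished. */
	wait_for_completion(&done);
	destroy_work_on_stack(&unplug_work.work);
}

So the data bios submitted after raid1_prepare_flush_writes() returns are
still ordered after the bitmap updates.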

Thanks,
Kuai
>
> Regards
> Xiao
>
>
>>>
>>> Regards
>>> Xiao
>>>
>>>> wake_up(&conf->wait_barrier);
>>>>
>>>> while (bio) { /* submit pending writes */
>>>> @@ -1166,7 +1166,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
>>>> struct r1conf *conf = mddev->private;
>>>> struct bio *bio;
>>>>
>>>> - if (from_schedule || current->bio_list) {
>>>> + if (from_schedule) {
>>>> spin_lock_irq(&conf->device_lock);
>>>> bio_list_merge(&conf->pending_bio_list, &plug->pending);
>>>> spin_unlock_irq(&conf->device_lock);
>>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>>>> index 6640507ecb0d..fb22cfe94d32 100644
>>>> --- a/drivers/md/raid10.c
>>>> +++ b/drivers/md/raid10.c
>>>> @@ -902,9 +902,7 @@ static void flush_pending_writes(struct r10conf *conf)
>>>> __set_current_state(TASK_RUNNING);
>>>>
>>>> blk_start_plug(&plug);
>>>> - /* flush any pending bitmap writes to disk
>>>> - * before proceeding w/ I/O */
>>>> - md_bitmap_unplug(conf->mddev->bitmap);
>>>> + raid1_prepare_flush_writes(conf->mddev->bitmap);
>>>> wake_up(&conf->wait_barrier);
>>>>
>>>> while (bio) { /* submit pending writes */
>>>> @@ -1108,7 +1106,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
>>>> struct r10conf *conf = mddev->private;
>>>> struct bio *bio;
>>>>
>>>> - if (from_schedule || current->bio_list) {
>>>> + if (from_schedule) {
>>>> spin_lock_irq(&conf->device_lock);
>>>> bio_list_merge(&conf->pending_bio_list, &plug->pending);
>>>> spin_unlock_irq(&conf->device_lock);
>>>> @@ -1120,7 +1118,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
>>>>
>>>> /* we aren't scheduling, so we can do the write-out directly. */
>>>> bio = bio_list_get(&plug->pending);
>>>> - md_bitmap_unplug(mddev->bitmap);
>>>> + raid1_prepare_flush_writes(mddev->bitmap);
>>>> wake_up(&conf->wait_barrier);
>>>>
>>>> while (bio) { /* submit pending writes */
>>>> --
>>>> 2.39.2
>>>>
>>>
>>> .
>>>
>>
>
> .
>


2023-05-31 08:26:52

by Yu Kuai

Subject: Re: [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread

Hi,

On 2023/05/31 15:50, Xiao Ni wrote:
> On Mon, May 29, 2023 at 9:14 PM Yu Kuai <[email protected]> wrote:
>>
>> From: Yu Kuai <[email protected]>
>>
>> current->bio_list will be set under submit_bio() context, in this case
>> bitmap io will be added to the list and wait for current io submission to
>> finish, while current io submission must wait for bitmap io to be done.
>> commit 874807a83139 ("md/raid1{,0}: fix deadlock in bitmap_unplug.") fix
>> the deadlock by handling plugged bio by daemon thread.
>
> Thanks for the historic introduction. I did a test and printed the
> logs in raid10_unplug. The tools I used are dd and mkfs. from_schedule
> is always true during I/O and it's 0 when io finishes. So I have a
> question here, how can I trigger the condition that from_schedule is 0
> and current->list is not NULL? In other words, is there really a
> deadlock here? Before your patch it looks like all bios are merged
> into conf->pending_bio_list and are handled by raid10d. It can't
> submit bio directly in the originating process which mentioned in
> 57c67df48866
>
As I mentioned below, after commit a214b949d8e3, this deadlock doesn't
exist anymore, and without this patch, patch 7 will introduce this
scenario again.

Thanks,
Kuai
>>
>> On the one hand, the deadlock won't exist after commit a214b949d8e3
>> ("blk-mq: only flush requests from the plug in blk_mq_submit_bio"). On
>> the other hand, current solution makes it impossible to flush plugged bio
>> in raid1/10_make_request(), because this will cause that all the writes
>> will goto daemon thread.
>>
>> In order to limit the number of plugged bio, commit 874807a83139
>> ("md/raid1{,0}: fix deadlock in bitmap_unplug.") is reverted, and the
>> deadlock is fixed by handling bitmap io asynchronously.
>>
>> Signed-off-by: Yu Kuai <[email protected]>
>> ---
>> drivers/md/raid1-10.c | 14 ++++++++++++++
>> drivers/md/raid1.c | 4 ++--
>> drivers/md/raid10.c | 8 +++-----
>> 3 files changed, 19 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
>> index 73cc3cb9154d..17e55c1fd5a1 100644
>> --- a/drivers/md/raid1-10.c
>> +++ b/drivers/md/raid1-10.c
>> @@ -151,3 +151,17 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
>>
>> return true;
>> }
>> +
>> +/*
>> + * current->bio_list will be set under submit_bio() context, in this case bitmap
>> + * io will be added to the list and wait for current io submission to finish,
>> + * while current io submission must wait for bitmap io to be done. In order to
>> + * avoid such deadlock, submit bitmap io asynchronously.
>> + */
>> +static inline void raid1_prepare_flush_writes(struct bitmap *bitmap)
>> +{
>> + if (current->bio_list)
>> + md_bitmap_unplug_async(bitmap);
>> + else
>> + md_bitmap_unplug(bitmap);
>> +}
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 0778e398584c..006620fed595 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -794,7 +794,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>> static void flush_bio_list(struct r1conf *conf, struct bio *bio)
>> {
>> /* flush any pending bitmap writes to disk before proceeding w/ I/O */
>> - md_bitmap_unplug(conf->mddev->bitmap);
>> + raid1_prepare_flush_writes(conf->mddev->bitmap);
>
> If we unplug bitmap asynchronously, can we make sure the bitmap are
> flushed before the corresponding data?
>
> Regards
> Xiao
>
>> wake_up(&conf->wait_barrier);
>>
>> while (bio) { /* submit pending writes */
>> @@ -1166,7 +1166,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
>> struct r1conf *conf = mddev->private;
>> struct bio *bio;
>>
>> - if (from_schedule || current->bio_list) {
>> + if (from_schedule) {
>> spin_lock_irq(&conf->device_lock);
>> bio_list_merge(&conf->pending_bio_list, &plug->pending);
>> spin_unlock_irq(&conf->device_lock);
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index 6640507ecb0d..fb22cfe94d32 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -902,9 +902,7 @@ static void flush_pending_writes(struct r10conf *conf)
>> __set_current_state(TASK_RUNNING);
>>
>> blk_start_plug(&plug);
>> - /* flush any pending bitmap writes to disk
>> - * before proceeding w/ I/O */
>> - md_bitmap_unplug(conf->mddev->bitmap);
>> + raid1_prepare_flush_writes(conf->mddev->bitmap);
>> wake_up(&conf->wait_barrier);
>>
>> while (bio) { /* submit pending writes */
>> @@ -1108,7 +1106,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
>> struct r10conf *conf = mddev->private;
>> struct bio *bio;
>>
>> - if (from_schedule || current->bio_list) {
>> + if (from_schedule) {
>> spin_lock_irq(&conf->device_lock);
>> bio_list_merge(&conf->pending_bio_list, &plug->pending);
>> spin_unlock_irq(&conf->device_lock);
>> @@ -1120,7 +1118,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
>>
>> /* we aren't scheduling, so we can do the write-out directly. */
>> bio = bio_list_get(&plug->pending);
>> - md_bitmap_unplug(mddev->bitmap);
>> + raid1_prepare_flush_writes(mddev->bitmap);
>> wake_up(&conf->wait_barrier);
>>
>> while (bio) { /* submit pending writes */
>> --
>> 2.39.2
>>
>
> .
>


2023-05-31 15:27:32

by Xiao Ni

Subject: Re: [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread

On Wed, May 31, 2023 at 4:06 PM Yu Kuai <[email protected]> wrote:
>
> Hi,
>
> On 2023/05/31 16:00, Xiao Ni wrote:
> > On Wed, May 31, 2023 at 3:55 PM Yu Kuai <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/05/31 15:50, Xiao Ni wrote:
> >>> On Mon, May 29, 2023 at 9:14 PM Yu Kuai <[email protected]> wrote:
> >>>>
> >>>> From: Yu Kuai <[email protected]>
> >>>>
> >>>> current->bio_list will be set under submit_bio() context, in this case
> >>>> bitmap io will be added to the list and wait for current io submission to
> >>>> finish, while current io submission must wait for bitmap io to be done.
> >>>> commit 874807a83139 ("md/raid1{,0}: fix deadlock in bitmap_unplug.") fix
> >>>> the deadlock by handling plugged bio by daemon thread.
> >>>
> >>> Thanks for the historic introduction. I did a test and printed the
> >>> logs in raid10_unplug. The tools I used are dd and mkfs. from_schedule
> >>> is always true during I/O and it's 0 when io finishes. So I have a
> >>> question here, how can I trigger the condition that from_schedule is 0
> >>> and current->list is not NULL? In other words, is there really a
> >>> deadlock here? Before your patch it looks like all bios are merged
> >>> into conf->pending_bio_list and are handled by raid10d. It can't
> >>> submit bio directly in the originating process which mentioned in
> >>> 57c67df48866
> >>>
> >> As I mentioned below, after commit a214b949d8e3, this deadlock doesn't
> >> exist anymore, and without this patch, patch 7 will introduce this
> >> scenario again.
> >>
> >> Thanks,
> >> Kuai
> >>>>
> >>>> On the one hand, the deadlock won't exist after commit a214b949d8e3
> >>>> ("blk-mq: only flush requests from the plug in blk_mq_submit_bio"). On
> >>>> the other hand, current solution makes it impossible to flush plugged bio
> >>>> in raid1/10_make_request(), because this will cause that all the writes
> >>>> will goto daemon thread.
> >>>>
> >>>> In order to limit the number of plugged bio, commit 874807a83139
> >>>> ("md/raid1{,0}: fix deadlock in bitmap_unplug.") is reverted, and the
> >>>> deadlock is fixed by handling bitmap io asynchronously.
> >>>>
> >>>> Signed-off-by: Yu Kuai <[email protected]>
> >>>> ---
> >>>> drivers/md/raid1-10.c | 14 ++++++++++++++
> >>>> drivers/md/raid1.c | 4 ++--
> >>>> drivers/md/raid10.c | 8 +++-----
> >>>> 3 files changed, 19 insertions(+), 7 deletions(-)
> >>>>
> >>>> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
> >>>> index 73cc3cb9154d..17e55c1fd5a1 100644
> >>>> --- a/drivers/md/raid1-10.c
> >>>> +++ b/drivers/md/raid1-10.c
> >>>> @@ -151,3 +151,17 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
> >>>>
> >>>> return true;
> >>>> }
> >>>> +
> >>>> +/*
> >>>> + * current->bio_list will be set under submit_bio() context, in this case bitmap
> >>>> + * io will be added to the list and wait for current io submission to finish,
> >>>> + * while current io submission must wait for bitmap io to be done. In order to
> >>>> + * avoid such deadlock, submit bitmap io asynchronously.
> >>>> + */
> >>>> +static inline void raid1_prepare_flush_writes(struct bitmap *bitmap)
> >>>> +{
> >>>> + if (current->bio_list)
> >>>> + md_bitmap_unplug_async(bitmap);
> >>>> + else
> >>>> + md_bitmap_unplug(bitmap);
> >>>> +}
> >>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> >>>> index 0778e398584c..006620fed595 100644
> >>>> --- a/drivers/md/raid1.c
> >>>> +++ b/drivers/md/raid1.c
> >>>> @@ -794,7 +794,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
> >>>> static void flush_bio_list(struct r1conf *conf, struct bio *bio)
> >>>> {
> >>>> /* flush any pending bitmap writes to disk before proceeding w/ I/O */
> >>>> - md_bitmap_unplug(conf->mddev->bitmap);
> >>>> + raid1_prepare_flush_writes(conf->mddev->bitmap);
> >>>
> >>> If we unplug bitmap asynchronously, can we make sure the bitmap are
> >>> flushed before the corresponding data?
> >
> > Could you explain this question?
>
> Sorry that I missed this... See the new helper in patch 5,
> md_bitmap_unplug_async() will still wait for bitmap io to finish.
>
> md_bitmap_unplug_async
> DECLARE_COMPLETION_ONSTACK(done)
> ...
> wait_for_completion(&done)

Ah, I see. You use this way to avoid putting the bitmap io on
current->bio_list. Thanks for the explanation :)

Regards
Xiao
>
> Thanks,
> Kuai
> >
> > Regards
> > Xiao
> >
> >
> >>>
> >>> Regards
> >>> Xiao
> >>>
> >>>> wake_up(&conf->wait_barrier);
> >>>>
> >>>> while (bio) { /* submit pending writes */
> >>>> @@ -1166,7 +1166,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
> >>>> struct r1conf *conf = mddev->private;
> >>>> struct bio *bio;
> >>>>
> >>>> - if (from_schedule || current->bio_list) {
> >>>> + if (from_schedule) {
> >>>> spin_lock_irq(&conf->device_lock);
> >>>> bio_list_merge(&conf->pending_bio_list, &plug->pending);
> >>>> spin_unlock_irq(&conf->device_lock);
> >>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> >>>> index 6640507ecb0d..fb22cfe94d32 100644
> >>>> --- a/drivers/md/raid10.c
> >>>> +++ b/drivers/md/raid10.c
> >>>> @@ -902,9 +902,7 @@ static void flush_pending_writes(struct r10conf *conf)
> >>>> __set_current_state(TASK_RUNNING);
> >>>>
> >>>> blk_start_plug(&plug);
> >>>> - /* flush any pending bitmap writes to disk
> >>>> - * before proceeding w/ I/O */
> >>>> - md_bitmap_unplug(conf->mddev->bitmap);
> >>>> + raid1_prepare_flush_writes(conf->mddev->bitmap);
> >>>> wake_up(&conf->wait_barrier);
> >>>>
> >>>> while (bio) { /* submit pending writes */
> >>>> @@ -1108,7 +1106,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
> >>>> struct r10conf *conf = mddev->private;
> >>>> struct bio *bio;
> >>>>
> >>>> - if (from_schedule || current->bio_list) {
> >>>> + if (from_schedule) {
> >>>> spin_lock_irq(&conf->device_lock);
> >>>> bio_list_merge(&conf->pending_bio_list, &plug->pending);
> >>>> spin_unlock_irq(&conf->device_lock);
> >>>> @@ -1120,7 +1118,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
> >>>>
> >>>> /* we aren't scheduling, so we can do the write-out directly. */
> >>>> bio = bio_list_get(&plug->pending);
> >>>> - md_bitmap_unplug(mddev->bitmap);
> >>>> + raid1_prepare_flush_writes(mddev->bitmap);
> >>>> wake_up(&conf->wait_barrier);
> >>>>
> >>>> while (bio) { /* submit pending writes */
> >>>> --
> >>>> 2.39.2
> >>>>
> >>>
> >>> .
> >>>
> >>
> >
> > .
> >
>