2019-02-11 05:44:13

by jianchao.wang

Subject: [PATCH] blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue

When a request with RQF_DONTPREP set is requeued, it already contains
driver-specific data, so insert it into the hctx dispatch list to avoid
any merging. Taking SCSI as an example, here is the trace event log (no
I/O scheduler, because RQF_STARTED would prevent merging):

kworker/0:1H-339 [000] ...1 2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
scsi_inert_test-1987 [000] .... 2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
scsi_inert_test-1987 [000] ...2 2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
kworker/0:1H-339 [000] .... 2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
scsi_inert_test-1996 [000] ..s1 2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
scsi_inert_test-1996 [000] .Ns1 2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
kworker/0:1H-339 [000] ...1 2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
kworker/0:1H-339 [000] ...1 2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
scsi_inert_test-1986 [000] ..s1 2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]

(32768 + 8) was requeued by scsi_queue_insert() with RQF_DONTPREP set.
It was then merged with (32776 + 8) and issued. Because RQF_DONTPREP
was set, the request was not prepared again, so the sdb still covered
only (32768 + 8) and only that part was completed. Fortunately,
scsi_io_completion() detected this and requeued the remaining part, so
no data was corrupted. However, the requeue of (32776 + 8) should not
have happened.
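The sector arithmetic in the trace can be checked with a small userspace model. This is a sketch, not kernel code: `struct extent` and the helper names are hypothetical, and it assumes the 512-byte sector unit in which the block tracepoints print the `+ N` lengths. A back merge is possible when the bio starts where the request ends (32768 + 8 == 32776); the merged request is (32768 + 16), i.e. 8192 bytes, matching the `block_rq_issue` line, and completing only the first 8 sectors leaves (32776 + 8) to be requeued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Block tracepoints report "+ N" lengths in 512-byte sectors. */
#define BYTES_PER_SECTOR 512u

/* Hypothetical stand-in for a request or bio: start sector plus length. */
struct extent {
	uint64_t sector;      /* first sector */
	uint32_t nr_sectors;  /* length in sectors */
};

/* Back merge: the bio must start exactly where the request ends. */
static bool can_back_merge(struct extent rq, struct extent bio)
{
	return rq.sector + rq.nr_sectors == bio.sector;
}

/* Append bio to rq; the caller must have checked can_back_merge(). */
static struct extent back_merge(struct extent rq, struct extent bio)
{
	assert(can_back_merge(rq, bio));
	rq.nr_sectors += bio.nr_sectors;
	return rq;
}

/* Transfer size in bytes, the number printed before "()" in the trace. */
static uint32_t extent_bytes(struct extent e)
{
	return e.nr_sectors * BYTES_PER_SECTOR;
}

/* What is left of rq after only its first 'done' sectors complete. */
static struct extent remainder(struct extent rq, uint32_t done)
{
	assert(done <= rq.nr_sectors);
	rq.sector += done;
	rq.nr_sectors -= done;
	return rq;
}
```

Only the position check is modeled; as far as position goes it corresponds to the back-merge test in the kernel's `blk_try_merge()`, while the real merge path also checks request flags, segment limits and more.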

Signed-off-by: Jianchao Wang <[email protected]>
---
block/blk-mq.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8f5b533..2d93eb5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -737,6 +737,18 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	spin_unlock_irq(&q->requeue_lock);

 	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
+		/*
+		 * If RQF_DONTPREP is set, rq already contains driver
+		 * specific data, so insert it into the hctx dispatch
+		 * list to avoid any merge.
+		 */
+		if (rq->rq_flags & RQF_DONTPREP) {
+			rq->rq_flags &= ~RQF_SOFTBARRIER;
+			list_del_init(&rq->queuelist);
+			blk_mq_request_bypass_insert(rq, false);
+			continue;
+		}
+
 		if (!(rq->rq_flags & RQF_SOFTBARRIER))
 			continue;

--
2.7.4



2019-02-11 16:01:29

by Jens Axboe

Subject: Re: [PATCH] blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue

On 2/10/19 10:41 PM, Jianchao Wang wrote:
> When a request with RQF_DONTPREP set is requeued, it already contains
> driver-specific data, so insert it into the hctx dispatch list to avoid
> any merging. Taking SCSI as an example, here is the trace event log (no
> I/O scheduler, because RQF_STARTED would prevent merging):
>
> kworker/0:1H-339 [000] ...1 2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
> scsi_inert_test-1987 [000] .... 2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
> scsi_inert_test-1987 [000] ...2 2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
> kworker/0:1H-339 [000] .... 2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
> scsi_inert_test-1996 [000] ..s1 2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
> scsi_inert_test-1996 [000] .Ns1 2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
> kworker/0:1H-339 [000] ...1 2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
> kworker/0:1H-339 [000] ...1 2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
> scsi_inert_test-1986 [000] ..s1 2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
>
> (32768 + 8) was requeued by scsi_queue_insert() with RQF_DONTPREP set.
> It was then merged with (32776 + 8) and issued. Because RQF_DONTPREP
> was set, the request was not prepared again, so the sdb still covered
> only (32768 + 8) and only that part was completed. Fortunately,
> scsi_io_completion() detected this and requeued the remaining part, so
> no data was corrupted. However, the requeue of (32776 + 8) should not
> have happened.

Good catch, but how about something like this? Makes it more integrated,
I think that's cleaner.


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 44d471ff8754..4c26bbb4330f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -737,12 +737,20 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	spin_unlock_irq(&q->requeue_lock);

 	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
-		if (!(rq->rq_flags & RQF_SOFTBARRIER))
+		/*
+		 * If RQF_DONTPREP is set, rq may contain some driver
+		 * specific data. Insert it into the hctx dispatch list to
+		 * avoid any merge.
+		 */
+		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
 			continue;

 		rq->rq_flags &= ~RQF_SOFTBARRIER;
 		list_del_init(&rq->queuelist);
-		blk_mq_sched_insert_request(rq, true, false, false);
+		if (rq->rq_flags & RQF_SOFTBARRIER)
+			blk_mq_sched_insert_request(rq, true, false, false);
+		else
+			blk_mq_request_bypass_insert(rq, false);
 	}

 	while (!list_empty(&rq_list)) {


--
Jens Axboe
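In this draft, `rq->rq_flags &= ~RQF_SOFTBARRIER` runs before the new `if (rq->rq_flags & RQF_SOFTBARRIER)` test, so that test can never be true: every request that passes the first check is sent to `blk_mq_request_bypass_insert()`, including plain soft-barrier requests that previously went through `blk_mq_sched_insert_request()`. The userspace model below compares the loop body before the patch with this draft; the flag values, enum and function names are hypothetical and exist only for the model.

```c
#include <assert.h>

/* Hypothetical bit values for this model; not the kernel's RQF_* values. */
#define M_SOFTBARRIER (1u << 0)
#define M_DONTPREP    (1u << 1)

enum route {
	ROUTE_LEFT_ON_LIST,  /* not moved by this loop */
	ROUTE_SCHED_HEAD,    /* blk_mq_sched_insert_request(rq, true, false, false) */
	ROUTE_BYPASS,        /* blk_mq_request_bypass_insert(rq, false) */
};

/* Loop body before the patch (the '-' lines of the diff). */
static enum route route_before_patch(unsigned int flags)
{
	assert((flags & ~(M_SOFTBARRIER | M_DONTPREP)) == 0);
	if (!(flags & M_SOFTBARRIER))
		return ROUTE_LEFT_ON_LIST;
	return ROUTE_SCHED_HEAD;
}

/* Loop body of this draft: the flag is cleared, then tested. */
static enum route route_first_draft(unsigned int flags)
{
	assert((flags & ~(M_SOFTBARRIER | M_DONTPREP)) == 0);
	if (!(flags & (M_SOFTBARRIER | M_DONTPREP)))
		return ROUTE_LEFT_ON_LIST;
	flags &= ~M_SOFTBARRIER;
	if (flags & M_SOFTBARRIER)   /* can never be true at this point */
		return ROUTE_SCHED_HEAD;
	return ROUTE_BYPASS;
}
```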


2019-02-11 23:18:12

by Jens Axboe

Subject: Re: [PATCH] blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue

On 2/11/19 8:59 AM, Jens Axboe wrote:
> On 2/10/19 10:41 PM, Jianchao Wang wrote:
>> [...]
>
> Good catch, but how about something like this? Makes it more integrated,
> I think that's cleaner.

This is probably better (and safer):


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8f5b533764ca..b3908eb3881c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -737,12 +737,21 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	spin_unlock_irq(&q->requeue_lock);

 	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
-		if (!(rq->rq_flags & RQF_SOFTBARRIER))
+		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
 			continue;

 		rq->rq_flags &= ~RQF_SOFTBARRIER;
 		list_del_init(&rq->queuelist);
-		blk_mq_sched_insert_request(rq, true, false, false);
+
+		/*
+		 * If RQF_DONTPREP is set, rq may contain some driver
+		 * specific data. Insert it into the hctx dispatch list to
+		 * avoid any merge.
+		 */
+		if (rq->rq_flags & RQF_DONTPREP)
+			blk_mq_sched_insert_request(rq, true, false, false);
+		else
+			blk_mq_request_bypass_insert(rq, false);
 	}

 	while (!list_empty(&rq_list)) {

--
Jens Axboe


2019-02-11 23:22:37

by Jens Axboe

Subject: Re: [PATCH] blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue

On 2/11/19 4:15 PM, Jens Axboe wrote:
> On 2/11/19 8:59 AM, Jens Axboe wrote:
>> On 2/10/19 10:41 PM, Jianchao Wang wrote:
>>> [...]
>>
>> Good catch, but how about something like this? Makes it more integrated,
>> I think that's cleaner.
>
> This is probably better (and safer):

Here's the one I wanted to send, not a half done one. Maybe I'll be
luckier this time around?


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8f5b533764ca..35e6aba52808 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -737,12 +737,21 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	spin_unlock_irq(&q->requeue_lock);

 	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
-		if (!(rq->rq_flags & RQF_SOFTBARRIER))
+		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
 			continue;

 		rq->rq_flags &= ~RQF_SOFTBARRIER;
 		list_del_init(&rq->queuelist);
-		blk_mq_sched_insert_request(rq, true, false, false);
+
+		/*
+		 * If RQF_DONTPREP is set, rq may contain some driver
+		 * specific data. Insert it into the hctx dispatch list to
+		 * avoid any merge.
+		 */
+		if (rq->rq_flags & RQF_DONTPREP)
+			blk_mq_request_bypass_insert(rq, false);
+		else
+			blk_mq_sched_insert_request(rq, true, false, false);
 	}

 	while (!list_empty(&rq_list)) {

--
Jens Axboe


2019-02-12 01:56:32

by jianchao.wang

Subject: Re: [PATCH] blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue

Hi Jens

Thanks for your kind response.

On 2/12/19 7:20 AM, Jens Axboe wrote:
> On 2/11/19 4:15 PM, Jens Axboe wrote:
>> On 2/11/19 8:59 AM, Jens Axboe wrote:
>>> On 2/10/19 10:41 PM, Jianchao Wang wrote:
>>>> [...]
>>>
>>> Good catch, but how about something like this? Makes it more integrated,
>>> I think that's cleaner.
>>
>> This is probably better (and safer):
>
> Here's the one I wanted to send, not a half done one. Maybe I'll be
> luckier this time around?
>
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 8f5b533764ca..35e6aba52808 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -737,12 +737,21 @@ static void blk_mq_requeue_work(struct work_struct *work)
>  	spin_unlock_irq(&q->requeue_lock);
>
>  	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
> -		if (!(rq->rq_flags & RQF_SOFTBARRIER))
> +		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
>  			continue;
>
>  		rq->rq_flags &= ~RQF_SOFTBARRIER;
>  		list_del_init(&rq->queuelist);
> -		blk_mq_sched_insert_request(rq, true, false, false);
> +
> +		/*
> +		 * If RQF_DONTPREP is set, rq may contain some driver
> +		 * specific data. Insert it into the hctx dispatch list to
> +		 * avoid any merge.
> +		 */
> +		if (rq->rq_flags & RQF_DONTPREP)
> +			blk_mq_request_bypass_insert(rq, false);
> +		else
> +			blk_mq_sched_insert_request(rq, true, false, false);
>  	}
>
>  	while (!list_empty(&rq_list)) {
>

The test passes.
I will send out V2 based on this.

Thanks
Jianchao