2017-12-05 07:53:13

by Ming Lei

[permalink] [raw]
Subject: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

Before commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget
for blk-mq"), we run queue after 3ms if queue is idle and SCSI device
queue isn't ready, which is done in handling BLK_STS_RESOURCE. After
commit 0df21c86bdbf is introduced, queue won't be run any more under
this situation.

IO hang is observed when timeout happened, and this patch fixes the IO
hang issue by running queue after delay in scsi_dev_queue_ready, just like
non-mq. This issue can be triggered by the following script[1].

There is another issue which can be covered by running idle queue:
when .get_budget() is called on request coming from hctx->dispatch_list,
if one request just completes during .get_budget(), we can't depend on
SCSI's restart to make progress any more. This patch fixes the race too.

With this patch, we basically recover to previous behaviour(before commit
0df21c86bdbf) of handling idle queue when running out of resource.

[1] script for test/verify SCSI timeout
rmmod scsi_debug
modprobe scsi_debug max_queue=1

DEVICE=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
DISK_DIR=`ls -d /sys/block/$DEVICE/device/scsi_disk/*`

echo "using scsi device $DEVICE"
echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth
echo "temporary write through" >$DISK_DIR/cache_type
echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts
echo none > /sys/block/$DEVICE/queue/scheduler
dd if=/dev/$DEVICE of=/dev/null bs=1M iflag=direct count=1 &
sleep 5
echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts
wait
echo "SUCCESS"

Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq")
Signed-off-by: Ming Lei <[email protected]>
---
drivers/scsi/scsi_lib.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index db9556662e27..1816dd8259b3 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1967,6 +1967,8 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
out_put_device:
put_device(&sdev->sdev_gendev);
out:
+ if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev))
+ blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
return false;
}

--
2.9.5


2017-12-05 14:30:01

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

[ +Cc Omar ]

Ming Lei <[email protected]> writes:
> Before commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget
> for blk-mq"), we run queue after 3ms if queue is idle and SCSI device
> queue isn't ready, which is done in handling BLK_STS_RESOURCE. After
> commit 0df21c86bdbf is introduced, queue won't be run any more under
> this situation.
>
> IO hang is observed when timeout happened, and this patch fixes the IO
> hang issue by running queue after delay in scsi_dev_queue_ready, just like
> non-mq. This issue can be triggered by the following script[1].
>
> There is another issue which can be covered by running idle queue:
> when .get_budget() is called on request coming from hctx->dispatch_list,
> if one request just completes during .get_budget(), we can't depend on
> SCSI's restart to make progress any more. This patch fixes the race too.
>
> With this patch, we basically recover to previous behaviour(before commit
> 0df21c86bdbf) of handling idle queue when running out of resource.
>
> [1] script for test/verify SCSI timeout
> rmmod scsi_debug
> modprobe scsi_debug max_queue=1
>
> DEVICE=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
> DISK_DIR=`ls -d /sys/block/$DEVICE/device/scsi_disk/*`
>
> echo "using scsi device $DEVICE"
> echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth
> echo "temporary write through" >$DISK_DIR/cache_type
> echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts
> echo none > /sys/block/$DEVICE/queue/scheduler
> dd if=/dev/$DEVICE of=/dev/null bs=1M iflag=direct count=1 &
> sleep 5
> echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts
> wait
> echo "SUCCESS"

SO I turned the above into a blktest but have found some shortcommings
of my bash skills. Maybe you or Omar has a soution for it:

--- 8< ---
>From 80e5810011d52bc188cd858962ce202bfd4dbee5 Mon Sep 17 00:00:00 2001
From: Johannes Thumshirn <[email protected]>
Date: Tue, 5 Dec 2017 15:21:08 +0100
Subject: [PATCH blktests] block/013: add test for scsi_device queue starvation

Add a test for Ming Lei's patch titled "SCSI: run queue if SCSI device
queue isn't ready and queue is idle"

Signed-off-by: Johannes Thumshirn <[email protected]>

---
This test case has two shortcommings, which need to be addressed I'm
just lacking a bit of the shell magic to address them properly.

1) Testing without the patch applied hangs the test forever as it
doesn't get killed after a specific timeout (I think this should be
solved in a common function).
2) It has a nasty sleep at it's end to wait for scsi_debug's refcounts
to drop to 0 before removing the module or removing will fail and thus
the test case. This as well should be solved in a more generic way.
---
tests/block/013 | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++
tests/block/013.out | 2 ++
2 files changed, 65 insertions(+)
create mode 100755 tests/block/013
create mode 100644 tests/block/013.out

diff --git a/tests/block/013 b/tests/block/013
new file mode 100755
index 000000000000..f73724fc9ed2
--- /dev/null
+++ b/tests/block/013
@@ -0,0 +1,63 @@
+#!/bin/bash
+#
+# Regression test for patch "SCSI: delay run queue if device is
+# blocked in scsi_dev_queue_ready()"
+#
+# Copyright (C) 2017 Johannes Thumshirn
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+. common/scsi_debug
+
+DESCRIPTION="Test if a SCSI device's queue can be run if it isn't ready but the device is idle"
+TIMED=1
+
+requires() {
+ _have_scsi_debug && _have_module sd_mod && \
+ grep -q Y /sys/module/scsi_mod/parameters/use_blk_mq
+}
+
+test_one_device()
+{
+ local device=$1
+
+ echo "-1" > /sys/bus/pseudo/drivers/scsi_debug/every_nth
+ echo "temporary write through" > \
+ /sys/block/"${device}"/device/scsi_disk/"$(basename $(readlink /sys/block/${device}/device))"/cache_type
+ echo "128" > /sys/bus/pseudo/drivers/scsi_debug/opts
+ echo "none" > /sys/block/${device}/queue/scheduler
+ dd if=/dev/"${device}" of=/dev/null bs=1M iflag=direct \
+ count=1 2> /dev/null &
+ sleep 5
+ echo 0 > /sys/bus/pseudo/drivers/scsi_debug/opts
+ wait
+}
+
+test() {
+ echo "Running ${TEST_NAME}"
+
+ if ! _init_scsi_debug statistics=1 max_queue=1; then
+ return
+ fi
+
+ local device
+ for device in "${SCSI_DEBUG_DEVICES[@]}"; do
+ test_one_device ${device}
+ done
+
+ sleep 5 # to free up all scsi_debug refcnts
+ _exit_scsi_debug
+
+ echo "Test complete"
+}
diff --git a/tests/block/013.out b/tests/block/013.out
new file mode 100644
index 000000000000..947bd04e2104
--- /dev/null
+++ b/tests/block/013.out
@@ -0,0 +1,2 @@
+Running block/013
+Test complete
--
2.13.6



--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-12-05 16:08:25

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

On Tue, 2017-12-05 at 15:52 +0800, Ming Lei wrote:
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index db9556662e27..1816dd8259b3 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1967,6 +1967,8 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
> out_put_device:
> put_device(&sdev->sdev_gendev);
> out:
> + if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev))
> + blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
> return false;
> }

This cannot work since multiple threads can call scsi_mq_get_budget()
concurrently and hence it can happen that none of them sees
atomic_read(&sdev->device_busy) == 0. BTW, the above patch is something I
consider as a workaround. Have you considered to fix the root cause and to
add blk_mq_sched_mark_restart_hctx() where these are missing, e.g. in
blk_mq_sched_dispatch_requests()? The patch below is not a full solution
but resulted in a significant improvement in my tests:

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 69e3226e66ca..9d86876ec503 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -226,6 +226,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
* TODO: get more budgets, and dequeue more requests in
* one time.
*/
+ blk_mq_sched_mark_restart_hctx(hctx);
blk_mq_do_dispatch_ctx(hctx);
} else {
blk_mq_flush_busy_ctxs(hctx, &rq_list);

Bart.

2017-12-05 16:16:51

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

On Tue, 2017-12-05 at 15:29 +0100, Johannes Thumshirn wrote:
> 1) Testing without the patch applied hangs the test forever as it
> doesn't get killed after a specific timeout (I think this should be
> solved in a common function).

Hello Johannes,

If a request queue got stuck then the processes that submitted the requests
on that queue are unkillable. The only approach I know of to stop these
processes is to send a kill signal and next to trigger a queue run from user
space. One possible approach is to run the following command:

for d in /sys/kernel/debug/block/*; do echo kick >$d/state; done

Bart.

2017-12-05 16:28:51

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

On Tue, Dec 05, 2017 at 04:08:20PM +0000, Bart Van Assche wrote:
> On Tue, 2017-12-05 at 15:52 +0800, Ming Lei wrote:
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index db9556662e27..1816dd8259b3 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1967,6 +1967,8 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
> > out_put_device:
> > put_device(&sdev->sdev_gendev);
> > out:
> > + if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev))
> > + blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
> > return false;
> > }
>
> This cannot work since multiple threads can call scsi_mq_get_budget()

That is exactly the way we are handling these cases before 0df21c86bdbf(scsi:
implement .get_budget and .put_budget for blk-mq), so if it can't work,
that is not fault of commit 0df21c86bdbf.

> concurrently and hence it can happen that none of them sees
> atomic_read(&sdev->device_busy) == 0. BTW, the above patch is something I

If there is concurrent .get_budget(), one of them will see the counter
becoming zero finally because each sdev->device_busy is inc/dec
atomically. Or scsi_dev_queue_ready() return true.

Anyway, we need this patch to avoid possible regression. If you think
there are bugs in blk-mq RESTART, just root cause and and fix it.

> consider as a workaround. Have you considered to fix the root cause and to
> add blk_mq_sched_mark_restart_hctx() where these are missing, e.g. in
> blk_mq_sched_dispatch_requests()? The patch below is not a full solution
> but resulted in a significant improvement in my tests:
>
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 69e3226e66ca..9d86876ec503 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -226,6 +226,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> * TODO: get more budgets, and dequeue more requests in
> * one time.
> */
> + blk_mq_sched_mark_restart_hctx(hctx);
> blk_mq_do_dispatch_ctx(hctx);
> } else {
> blk_mq_flush_busy_ctxs(hctx, &rq_list);

This is still a workaround for RESTART, see my comment before:

https://marc.info/?l=linux-block&m=151217500929341&w=2

--
Ming

2017-12-05 16:41:53

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

On Wed, 2017-12-06 at 00:28 +0800, Ming Lei wrote:
> This is still a workaround for RESTART, see my comment before:
>
> https://marc.info/?l=linux-block&m=151217500929341&w=2

A quote from that e-mail: "The theory about using BLK_MQ_S_SCHED_RESTART in
current way is that we mark it after requests are added to hctx->dispatch".
Reading that makes me wonder whether you understand the purpose of the
BLK_MQ_S_SCHED_RESTART flag? That flag is not set after requests are added
to the dispatch list but after requests have been *removed*. The purpose of
that flag is to detect whether another thread has run the queue after
requests were removed from the dispatch list and before these were readded.
If so, the queue needs to be rerun.

Bart.

2017-12-05 16:45:38

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

On Tue, Dec 05, 2017 at 04:41:46PM +0000, Bart Van Assche wrote:
> On Wed, 2017-12-06 at 00:28 +0800, Ming Lei wrote:
> > This is still a workaround for RESTART, see my comment before:
> >
> > https://marc.info/?l=linux-block&m=151217500929341&w=2
>
> A quote from that e-mail: "The theory about using BLK_MQ_S_SCHED_RESTART in
> current way is that we mark it after requests are added to hctx->dispatch".
> Reading that makes me wonder whether you understand the purpose of the
> BLK_MQ_S_SCHED_RESTART flag? That flag is not set after requests are added
> to the dispatch list but after requests have been *removed*. The purpose of
> that flag is to detect whether another thread has run the queue after
> requests were removed from the dispatch list and before these were readded.
> If so, the queue needs to be rerun.

If you want to discuss that, please reply on that thread of '[PATCH 4/7] blk-mq:
Avoid that request processing sta', I will reply on you there too.

--
Ming

2017-12-06 01:52:32

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

On Wed, Dec 06, 2017 at 12:28:25AM +0800, Ming Lei wrote:
> On Tue, Dec 05, 2017 at 04:08:20PM +0000, Bart Van Assche wrote:
> > On Tue, 2017-12-05 at 15:52 +0800, Ming Lei wrote:
> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > index db9556662e27..1816dd8259b3 100644
> > > --- a/drivers/scsi/scsi_lib.c
> > > +++ b/drivers/scsi/scsi_lib.c
> > > @@ -1967,6 +1967,8 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
> > > out_put_device:
> > > put_device(&sdev->sdev_gendev);
> > > out:
> > > + if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev))
> > > + blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
> > > return false;
> > > }
> >
> > This cannot work since multiple threads can call scsi_mq_get_budget()
>
> That is exactly the way we are handling these cases before 0df21c86bdbf(scsi:
> implement .get_budget and .put_budget for blk-mq), so if it can't work,
> that is not fault of commit 0df21c86bdbf.
>
> > concurrently and hence it can happen that none of them sees
> > atomic_read(&sdev->device_busy) == 0. BTW, the above patch is something I
>
> If there is concurrent .get_budget(), one of them will see the counter
> becoming zero finally because each sdev->device_busy is inc/dec
> atomically. Or scsi_dev_queue_ready() return true.
>
> Anyway, we need this patch to avoid possible regression. If you think
> there are bugs in blk-mq RESTART, just root cause and and fix it.
>
> > consider as a workaround. Have you considered to fix the root cause and to
> > add blk_mq_sched_mark_restart_hctx() where these are missing, e.g. in
> > blk_mq_sched_dispatch_requests()? The patch below is not a full solution
> > but resulted in a significant improvement in my tests:
> >
> > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > index 69e3226e66ca..9d86876ec503 100644
> > --- a/block/blk-mq-sched.c
> > +++ b/block/blk-mq-sched.c
> > @@ -226,6 +226,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > * TODO: get more budgets, and dequeue more requests in
> > * one time.
> > */
> > + blk_mq_sched_mark_restart_hctx(hctx);
> > blk_mq_do_dispatch_ctx(hctx);
> > } else {
> > blk_mq_flush_busy_ctxs(hctx, &rq_list);
>

BTW, this kind of change can't cover scsi_set_blocked() which is
triggered by timeout, scsi dispatch failure. You will see that
easily if you run the SCSI test script I provided in the commit log.

--
Ming

2017-12-06 16:07:24

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

On Wed, 2017-12-06 at 09:52 +0800, Ming Lei wrote:
> On Wed, Dec 06, 2017 at 12:28:25AM +0800, Ming Lei wrote:
> > On Tue, Dec 05, 2017 at 04:08:20PM +0000, Bart Van Assche wrote:
> > > The patch below is not a full solution but resulted in a significant
> > > improvement in my tests:
> > >
> > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > index 69e3226e66ca..9d86876ec503 100644
> > > --- a/block/blk-mq-sched.c
> > > +++ b/block/blk-mq-sched.c
> > > @@ -226,6 +226,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > * TODO: get more budgets, and dequeue more requests in
> > > * one time.
> > > */
> > > + blk_mq_sched_mark_restart_hctx(hctx);
> > > blk_mq_do_dispatch_ctx(hctx);
> > > } else {
> > > blk_mq_flush_busy_ctxs(hctx, &rq_list);
>
> BTW, this kind of change can't cover scsi_set_blocked() which is
> triggered by timeout, scsi dispatch failure. You will see that
> easily if you run the SCSI test script I provided in the commit log.

Hello Ming,

I am aware that the above change does not cover all cases. That's why I wrote
in my previous e-mail that that patch is not a full solution. The reason I
posted that change anyway is because I prefer a solution that is not based on
delayed queue runs over a solution that is based on delayed queue runs
(blk_mq_delay_run_hw_queue()). My concern is that performance of a solution
based on delayed queue runs will be suboptimal.

Bart.

2017-12-06 23:11:02

by Holger Hoffstätte

[permalink] [raw]
Subject: Re: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

On 12/05/17 08:52, Ming Lei wrote:
> Before commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget
> for blk-mq"), we run queue after 3ms if queue is idle and SCSI device
> queue isn't ready, which is done in handling BLK_STS_RESOURCE. After
> commit 0df21c86bdbf is introduced, queue won't be run any more under
> this situation.
>
> IO hang is observed when timeout happened, and this patch fixes the IO
> hang issue by running queue after delay in scsi_dev_queue_ready, just like
> non-mq. This issue can be triggered by the following script[1].
>
> There is another issue which can be covered by running idle queue:
> when .get_budget() is called on request coming from hctx->dispatch_list,
> if one request just completes during .get_budget(), we can't depend on
> SCSI's restart to make progress any more. This patch fixes the race too.
>
> With this patch, we basically recover to previous behaviour(before commit
> 0df21c86bdbf) of handling idle queue when running out of resource.
>
> [1] script for test/verify SCSI timeout
> rmmod scsi_debug
> modprobe scsi_debug max_queue=1
>
> DEVICE=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
> DISK_DIR=`ls -d /sys/block/$DEVICE/device/scsi_disk/*`
>
> echo "using scsi device $DEVICE"
> echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth
> echo "temporary write through" >$DISK_DIR/cache_type
> echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts
> echo none > /sys/block/$DEVICE/queue/scheduler
> dd if=/dev/$DEVICE of=/dev/null bs=1M iflag=direct count=1 &
> sleep 5
> echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts
> wait
> echo "SUCCESS"
>
> Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq")
> Signed-off-by: Ming Lei <[email protected]>
> ---
> drivers/scsi/scsi_lib.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index db9556662e27..1816dd8259b3 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1967,6 +1967,8 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
> out_put_device:
> put_device(&sdev->sdev_gendev);
> out:
> + if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev))
> + blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
> return false;
> }

So just to follow up on this: with this patch I haven't encountered any
new hangs with blk-mq, regardless of medium (SSD/rotating disk) or scheduler.
I cannot speak for other hangs that may be reproducible by other means,
but for now here's my:

Tested-by: Holger Hoffstätte <[email protected]>

cheers,
Holger

2017-12-07 01:31:48

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

On Wed, Dec 06, 2017 at 04:07:17PM +0000, Bart Van Assche wrote:
> On Wed, 2017-12-06 at 09:52 +0800, Ming Lei wrote:
> > On Wed, Dec 06, 2017 at 12:28:25AM +0800, Ming Lei wrote:
> > > On Tue, Dec 05, 2017 at 04:08:20PM +0000, Bart Van Assche wrote:
> > > > The patch below is not a full solution but resulted in a significant
> > > > improvement in my tests:
> > > >
> > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > > index 69e3226e66ca..9d86876ec503 100644
> > > > --- a/block/blk-mq-sched.c
> > > > +++ b/block/blk-mq-sched.c
> > > > @@ -226,6 +226,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > * TODO: get more budgets, and dequeue more requests in
> > > > * one time.
> > > > */
> > > > + blk_mq_sched_mark_restart_hctx(hctx);
> > > > blk_mq_do_dispatch_ctx(hctx);
> > > > } else {
> > > > blk_mq_flush_busy_ctxs(hctx, &rq_list);
> >
> > BTW, this kind of change can't cover scsi_set_blocked() which is
> > triggered by timeout, scsi dispatch failure. You will see that
> > easily if you run the SCSI test script I provided in the commit log.
>
> Hello Ming,
>
> I am aware that the above change does not cover all cases. That's why I wrote
> in my previous e-mail that that patch is not a full solution. The reason I
> posted that change anyway is because I prefer a solution that is not based on
> delayed queue runs over a solution that is based on delayed queue runs
> (blk_mq_delay_run_hw_queue()). My concern is that performance of a solution
> based on delayed queue runs will be suboptimal.

Hi,

The patch I posted won't cause any performance regression because it is
only triggered when queue is becoming idle, also that is exact the way for us
to deal with these cases before.

But if you always call blk_mq_sched_mark_restart_hctx() before a new
dispatch, that may affect performance on NVMe which may never trigger
BLK_STS_RESOURCE.

--
Ming

2017-12-07 01:40:23

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

On Thu, Dec 07, 2017 at 12:10:51AM +0100, Holger Hoffst?tte wrote:
> On 12/05/17 08:52, Ming Lei wrote:
> > Before commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget
> > for blk-mq"), we run queue after 3ms if queue is idle and SCSI device
> > queue isn't ready, which is done in handling BLK_STS_RESOURCE. After
> > commit 0df21c86bdbf is introduced, queue won't be run any more under
> > this situation.
> >
> > IO hang is observed when timeout happened, and this patch fixes the IO
> > hang issue by running queue after delay in scsi_dev_queue_ready, just like
> > non-mq. This issue can be triggered by the following script[1].
> >
> > There is another issue which can be covered by running idle queue:
> > when .get_budget() is called on request coming from hctx->dispatch_list,
> > if one request just completes during .get_budget(), we can't depend on
> > SCSI's restart to make progress any more. This patch fixes the race too.
> >
> > With this patch, we basically recover to previous behaviour(before commit
> > 0df21c86bdbf) of handling idle queue when running out of resource.
> >
> > [1] script for test/verify SCSI timeout
> > rmmod scsi_debug
> > modprobe scsi_debug max_queue=1
> >
> > DEVICE=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
> > DISK_DIR=`ls -d /sys/block/$DEVICE/device/scsi_disk/*`
> >
> > echo "using scsi device $DEVICE"
> > echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth
> > echo "temporary write through" >$DISK_DIR/cache_type
> > echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts
> > echo none > /sys/block/$DEVICE/queue/scheduler
> > dd if=/dev/$DEVICE of=/dev/null bs=1M iflag=direct count=1 &
> > sleep 5
> > echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts
> > wait
> > echo "SUCCESS"
> >
> > Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq")
> > Signed-off-by: Ming Lei <[email protected]>
> > ---
> > drivers/scsi/scsi_lib.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index db9556662e27..1816dd8259b3 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1967,6 +1967,8 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
> > out_put_device:
> > put_device(&sdev->sdev_gendev);
> > out:
> > + if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev))
> > + blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
> > return false;
> > }
>
> So just to follow up on this: with this patch I haven't encountered any
> new hangs with blk-mq, regardless of medium (SSD/rotating disk) or scheduler.
> I cannot speak for other hangs that may be reproducible by other means,
> but for now here's my:
>
> Tested-by: Holger Hoffst?tte <[email protected]>

Hi Holger,

That is great to see this patch fixes your issue, and thanks for your
test!

Jens, Martin, would any of you mind making this patch in V4.15? Since
it fixes real use cases and this way is exact what we do before
0df21c86bdbf("scsi: implement .get_budget and .put_budget for blk-mq").


Thanks,
Ming

2017-12-07 21:07:06

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

On Wed, 2017-12-06 at 00:28 +0800, Ming Lei wrote:
> On Tue, Dec 05, 2017 at 04:08:20PM +0000, Bart Van Assche wrote:
> > On Tue, 2017-12-05 at 15:52 +0800, Ming Lei wrote:
> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > index db9556662e27..1816dd8259b3 100644
> > > --- a/drivers/scsi/scsi_lib.c
> > > +++ b/drivers/scsi/scsi_lib.c
> > > @@ -1967,6 +1967,8 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
> > > out_put_device:
> > > put_device(&sdev->sdev_gendev);
> > > out:
> > > + if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev))
> > > + blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
> > > return false;
> > > }
> >
> > This cannot work since multiple threads can call scsi_mq_get_budget()
>
> That is exactly the way we are handling these cases before 0df21c86bdbf(scsi:
> implement .get_budget and .put_budget for blk-mq), so if it can't work,
> that is not fault of commit 0df21c86bdbf.
>
> > concurrently and hence it can happen that none of them sees
> > atomic_read(&sdev->device_busy) == 0. BTW, the above patch is something I
>
> If there is concurrent .get_budget(), one of them will see the counter
> becoming zero finally because each sdev->device_busy is inc/dec
> atomically. Or scsi_dev_queue_ready() return true.
>
> Anyway, we need this patch to avoid possible regression. If you think
> there are bugs in blk-mq RESTART, just root cause and and fix it.

Hello Ming,

When I looked at the patch at the start of this thread for the first time I
got frustrated because I didn't see how this patch could fix the queue stall
I ran into myself. Today I started realizing that what Holger reported is
probably another issue than what I ran into myself. Since this patch by
itself looks now useful to me:

Reviewed-by: Bart Van Assche <[email protected]>

BTW, do you think the patch at the start of this thread also fixes the issue
that resulted in commit 826a70a08b12 ("SCSI: don't get target/host busy_count
in scsi_mq_get_budget()")? Do you think we still need that patch?

Bart.

2017-12-07 21:12:01

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

On Thu, 2017-12-07 at 09:31 +0800, Ming Lei wrote:
> But if you always call blk_mq_sched_mark_restart_hctx() before a new
> dispatch, that may affect performance on NVMe which may never trigger
> BLK_STS_RESOURCE.

Hmm ... only the SCSI core implements .get_budget() and .put_budget() and
I proposed to insert a blk_mq_sched_mark_restart_hctx() call under "if
(q->mq_ops->get_budget)". In other words, I proposed to insert a
blk_mq_sched_mark_restart_hctx() call in a code path that is never triggered
by the NVMe driver. So I don't see how the change I proposed could affect
the performance of the NVMe driver?

Bart.

2017-12-08 00:36:56

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

On Thu, Dec 07, 2017 at 09:11:54PM +0000, Bart Van Assche wrote:
> On Thu, 2017-12-07 at 09:31 +0800, Ming Lei wrote:
> > But if you always call blk_mq_sched_mark_restart_hctx() before a new
> > dispatch, that may affect performance on NVMe which may never trigger
> > BLK_STS_RESOURCE.
>
> Hmm ... only the SCSI core implements .get_budget() and .put_budget() and
> I proposed to insert a blk_mq_sched_mark_restart_hctx() call under "if
> (q->mq_ops->get_budget)". In other words, I proposed to insert a
> blk_mq_sched_mark_restart_hctx() call in a code path that is never triggered
> by the NVMe driver. So I don't see how the change I proposed could affect
> the performance of the NVMe driver?

You only add the check on none scheduler, right?

But this race isn't related with scheduler, that means it can't fix the
race with other schedulers.

I have test case to trigger this issue on both none and mq-deadline, and
my patch fixes them all.

Thanks,
Ming

2017-12-08 00:50:49

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

On Thu, Dec 07, 2017 at 09:06:58PM +0000, Bart Van Assche wrote:
> On Wed, 2017-12-06 at 00:28 +0800, Ming Lei wrote:
> > On Tue, Dec 05, 2017 at 04:08:20PM +0000, Bart Van Assche wrote:
> > > On Tue, 2017-12-05 at 15:52 +0800, Ming Lei wrote:
> > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > > index db9556662e27..1816dd8259b3 100644
> > > > --- a/drivers/scsi/scsi_lib.c
> > > > +++ b/drivers/scsi/scsi_lib.c
> > > > @@ -1967,6 +1967,8 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
> > > > out_put_device:
> > > > put_device(&sdev->sdev_gendev);
> > > > out:
> > > > + if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev))
> > > > + blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
> > > > return false;
> > > > }
> > >
> > > This cannot work since multiple threads can call scsi_mq_get_budget()
> >
> > That is exactly the way we are handling these cases before 0df21c86bdbf(scsi:
> > implement .get_budget and .put_budget for blk-mq), so if it can't work,
> > that is not fault of commit 0df21c86bdbf.
> >
> > > concurrently and hence it can happen that none of them sees
> > > atomic_read(&sdev->device_busy) == 0. BTW, the above patch is something I
> >
> > If there is concurrent .get_budget(), one of them will see the counter
> > becoming zero finally because each sdev->device_busy is inc/dec
> > atomically. Or scsi_dev_queue_ready() return true.
> >
> > Anyway, we need this patch to avoid possible regression. If you think
> > there are bugs in blk-mq RESTART, just root cause and and fix it.
>
> Hello Ming,
>
> When I looked at the patch at the start of this thread for the first time I
> got frustrated because I didn't see how this patch could fix the queue stall
> I ran into myself. Today I started realizing that what Holger reported is
> probably another issue than what I ran into myself. Since this patch by
> itself looks now useful to me:
>
> Reviewed-by: Bart Van Assche <[email protected]>

Hi Bart,

Thanks for your Review!

>
> BTW, do you think the patch at the start of this thread also fixes the issue
> that resulted in commit 826a70a08b12 ("SCSI: don't get target/host busy_count
> in scsi_mq_get_budget()")? Do you think we still need that patch?

Yeah, once the patch in this thread is in, IO hang shouldn't happen any more
even without 826a70a08b12.

But that commit is still the correct thing to do, since we let blk-mq's sbitmap
queue respect per-host queue depth, which way is much efficient than the
simple atomic operations used in scsi_host_queue_ready(). So I think that commit
is still useful.

When I was figuring out patch of 826a70a08b12, I just ignore the .get_budget() for
requests from hctx->dispatch_list, because we don't need to deal with queue idle
when .get_budget() is called from both blk_mq_do_dispatch_sched() and blk_mq_do_dispatch_ctx().

Thanks,
Ming

2017-12-08 00:56:34

by Martin K. Petersen

[permalink] [raw]
Subject: Re: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle


Ming,

> Jens, Martin, would any of you mind making this patch in V4.15? Since
> it fixes real use cases and this way is exact what we do before
> 0df21c86bdbf("scsi: implement .get_budget and .put_budget for blk-mq").

Applied to 4.15/scsi-fixes, thank you!

--
Martin K. Petersen Oracle Linux Engineering