2018-02-08 14:39:00

by Nitesh Shetty

[permalink] [raw]
Subject: [PATCH] blk: optimization for classic polling

This removes the dependency on interrupts to wake up task. Set task
state as TASK_RUNNING, if need_resched() returns true,
while polling for IO completion.
Earlier, polling task used to sleep, relying on interrupt to wake it up.
This made some IO take very long when interrupt-coalescing is enabled in
NVMe.

Reference:
http://lists.infradead.org/pipermail/linux-nvme/2018-February/015435.html
Signed-off-by: Nitesh Shetty <[email protected]>
---
fs/block_dev.c | 16 ++++++++++++----
fs/direct-io.c | 8 ++++++--
fs/iomap.c | 10 +++++++---
3 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 4a181fc..a87d8b7 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -236,9 +236,13 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
set_current_state(TASK_UNINTERRUPTIBLE);
if (!READ_ONCE(bio.bi_private))
break;
- if (!(iocb->ki_flags & IOCB_HIPRI) ||
- !blk_poll(bdev_get_queue(bdev), qc))
+ if (!(iocb->ki_flags & IOCB_HIPRI))
io_schedule();
+ else if (!blk_poll(bdev_get_queue(bdev), qc)) {
+ if (need_resched())
+ set_current_state(TASK_RUNNING);
+ io_schedule();
+ }
}
__set_current_state(TASK_RUNNING);

@@ -401,9 +405,13 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
if (!READ_ONCE(dio->waiter))
break;

- if (!(iocb->ki_flags & IOCB_HIPRI) ||
- !blk_poll(bdev_get_queue(bdev), qc))
+ if (!(iocb->ki_flags & IOCB_HIPRI))
io_schedule();
+ else if (!blk_poll(bdev_get_queue(bdev), qc)) {
+ if (need_resched())
+ set_current_state(TASK_RUNNING);
+ io_schedule();
+ }
}
__set_current_state(TASK_RUNNING);

diff --git a/fs/direct-io.c b/fs/direct-io.c
index a0ca9e4..c815ac9 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -518,9 +518,13 @@ static struct bio *dio_await_one(struct dio *dio)
__set_current_state(TASK_UNINTERRUPTIBLE);
dio->waiter = current;
spin_unlock_irqrestore(&dio->bio_lock, flags);
- if (!(dio->iocb->ki_flags & IOCB_HIPRI) ||
- !blk_poll(dio->bio_disk->queue, dio->bio_cookie))
+ if (!(dio->iocb->ki_flags & IOCB_HIPRI))
io_schedule();
+ else if (!blk_poll(dio->bio_disk->queue, dio->bio_cookie)) {
+ if (need_resched())
+ __set_current_state(TASK_RUNNING);
+ io_schedule();
+ }
/* wake up sets us TASK_RUNNING */
spin_lock_irqsave(&dio->bio_lock, flags);
dio->waiter = NULL;
diff --git a/fs/iomap.c b/fs/iomap.c
index afd1635..b51569d 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1072,10 +1072,14 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
break;

if (!(iocb->ki_flags & IOCB_HIPRI) ||
- !dio->submit.last_queue ||
- !blk_poll(dio->submit.last_queue,
- dio->submit.cookie))
+ !dio->submit.last_queue)
io_schedule();
+ else if (!blk_poll(dio->submit.last_queue,
+ dio->submit.cookie)) {
+ if (need_resched())
+ set_current_state(TASK_RUNNING);
+ io_schedule();
+ }
}
__set_current_state(TASK_RUNNING);
}
--
2.7.4



2018-02-08 15:26:35

by Keith Busch

[permalink] [raw]
Subject: Re: [PATCH] blk: optimization for classic polling

On Sun, May 30, 2083 at 09:51:06AM +0530, Nitesh Shetty wrote:
> This removes the dependency on interrupts to wake up task. Set task
> state as TASK_RUNNING, if need_resched() returns true,
> while polling for IO completion.
> Earlier, polling task used to sleep, relying on interrupt to wake it up.
> This made some IO take very long when interrupt-coalescing is enabled in
> NVMe.
>
> Reference:
> http://lists.infradead.org/pipermail/linux-nvme/2018-February/015435.html
> Signed-off-by: Nitesh Shetty <[email protected]>
> ---
> fs/block_dev.c | 16 ++++++++++++----
> fs/direct-io.c | 8 ++++++--
> fs/iomap.c | 10 +++++++---
> 3 files changed, 25 insertions(+), 9 deletions(-)

I think it'd be simpler to have blk_poll set it back to running if
need_resched is true rather than repeat this patter across all the
callers:

---
diff --git a/block/blk-mq.c b/block/blk-mq.c
index df93102e2149..40285fe1c8ad 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3164,6 +3164,7 @@ static bool __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
cpu_relax();
}

+ set_current_state(TASK_RUNNING);
return false;
}

--

2018-02-08 16:04:39

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH] blk: optimization for classic polling


> I think it'd be simpler to have blk_poll set it back to running if
> need_resched is true rather than repeat this patter across all the
> callers:
>
> ---
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index df93102e2149..40285fe1c8ad 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3164,6 +3164,7 @@ static bool __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
> cpu_relax();
> }
>
> + set_current_state(TASK_RUNNING);
> return false;
> }
>
> --

Nice!

2018-02-12 15:51:58

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] blk: optimization for classic polling

On 05/29/83 20:21, Nitesh Shetty wrote:
> [ ... ]

Hello Nitesh,

Can you check the clock of the system you used to send this e-mail? In
the header of your e-mail I found the following:

Date: Sun, 30 May 2083 09:51:06 +0530

Thanks,

Bart.

2018-02-13 08:28:33

by Nitesh Shetty

[permalink] [raw]
Subject: [PATCH v2 RESENT] blk: optimization for classic polling

This removes the dependency on interrupts to wake up task. Set task
state as TASK_RUNNING, if need_resched() returns true,
while polling for IO completion.
Earlier, polling task used to sleep, relying on interrupt to wake it up.
This made some IO take very long when interrupt-coalescing is enabled in
NVMe.

Reference:
http://lists.infradead.org/pipermail/linux-nvme/2018-February/015435.html

Changes since v1:
-setting task state once in blk_poll, instead of multiple
callers.
Signed-off-by: Nitesh Shetty <[email protected]>
---
block/blk-mq.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index df93102..40285fe 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3164,6 +3164,7 @@ static bool __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
cpu_relax();
}

+ set_current_state(TASK_RUNNING);
return false;
}

--
2.7.4


2018-02-13 14:44:44

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v2 RESENT] blk: optimization for classic polling

On 2/13/18 11:56 AM, Nitesh Shetty wrote:
> This removes the dependency on interrupts to wake up task. Set task
> state as TASK_RUNNING, if need_resched() returns true,
> while polling for IO completion.
> Earlier, polling task used to sleep, relying on interrupt to wake it up.
> This made some IO take very long when interrupt-coalescing is enabled in
> NVMe.

__set_current_state() should suffice here.

--
Jens Axboe


2018-02-13 15:49:28

by Nitesh Shetty

[permalink] [raw]
Subject: [PATCH v3] blk: optimization for classic polling

This removes the dependency on interrupts to wake up task. Set task
state as TASK_RUNNING, if need_resched() returns true,
while polling for IO completion.
Earlier, polling task used to sleep, relying on interrupt to wake it up.
This made some IO take very long when interrupt-coalescing is enabled in
NVMe.

Reference:
http://lists.infradead.org/pipermail/linux-nvme/2018-February/015435.html

Changes since v2->v3:
-using __set_current_state() instead of set_current_state()

Changes since v1->v2:
-setting task state once in blk_poll, instead of multiple
callers.
Signed-off-by: Nitesh Shetty <[email protected]>
---
block/blk-mq.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index df93102..3574927 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3164,6 +3164,7 @@ static bool __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
cpu_relax();
}

+ __set_current_state(TASK_RUNNING);
return false;
}

--
2.7.4


2018-02-13 16:13:59

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v3] blk: optimization for classic polling

On 2/13/18 8:48 AM, Nitesh Shetty wrote:
> This removes the dependency on interrupts to wake up task. Set task
> state as TASK_RUNNING, if need_resched() returns true,
> while polling for IO completion.
> Earlier, polling task used to sleep, relying on interrupt to wake it up.
> This made some IO take very long when interrupt-coalescing is enabled in
> NVMe.

Thanks, applied.

--
Jens Axboe


2018-02-20 13:22:49

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] blk: optimization for classic polling

On Sun, May 30, 2083 at 09:51:06AM +0530, Nitesh Shetty wrote:
> This removes the dependency on interrupts to wake up task. Set task
> state as TASK_RUNNING, if need_resched() returns true,
> while polling for IO completion.
> Earlier, polling task used to sleep, relying on interrupt to wake it up.
> This made some IO take very long when interrupt-coalescing is enabled in
> NVMe.

This is a horrible Changelog.. it does not in fact explain why the patch
works or is correct.

Also, set_current_state(TASK_RUNNING) is dodgy (similarly in
__blk_mq_poll), why do you need that memory barrier?


> Signed-off-by: Nitesh Shetty <[email protected]>
> ---
> fs/block_dev.c | 16 ++++++++++++----
> fs/direct-io.c | 8 ++++++--
> fs/iomap.c | 10 +++++++---
> 3 files changed, 25 insertions(+), 9 deletions(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 4a181fc..a87d8b7 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -236,9 +236,13 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
> set_current_state(TASK_UNINTERRUPTIBLE);
> if (!READ_ONCE(bio.bi_private))
> break;
> - if (!(iocb->ki_flags & IOCB_HIPRI) ||
> - !blk_poll(bdev_get_queue(bdev), qc))
> + if (!(iocb->ki_flags & IOCB_HIPRI))
> io_schedule();
> + else if (!blk_poll(bdev_get_queue(bdev), qc)) {
> + if (need_resched())
> + set_current_state(TASK_RUNNING);
> + io_schedule();
> + }
> }
> __set_current_state(TASK_RUNNING);
>

2018-02-20 16:28:53

by Keith Busch

[permalink] [raw]
Subject: Re: [PATCH] blk: optimization for classic polling

On Tue, Feb 20, 2018 at 02:21:37PM +0100, Peter Zijlstra wrote:
> Also, set_current_state(TASK_RUNNING) is dodgy (similarly in
> __blk_mq_poll), why do you need that memory barrier?

You're right. The subsequent revision that was committed removed the
barrier. The commit is here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=67b4110f8c8d16e588d7730db8e8b01b32c1bd8b

I hope the code at least looks more reasonable. The changelog isn't much
different, though.

2018-02-20 22:38:20

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] blk: optimization for classic polling

On 2/20/18 3:21 AM, Peter Zijlstra wrote:
> On Sun, May 30, 2083 at 09:51:06AM +0530, Nitesh Shetty wrote:
>> This removes the dependency on interrupts to wake up task. Set task
>> state as TASK_RUNNING, if need_resched() returns true,
>> while polling for IO completion.
>> Earlier, polling task used to sleep, relying on interrupt to wake it up.
>> This made some IO take very long when interrupt-coalescing is enabled in
>> NVMe.
>
> This is a horrible Changelog.. it does not in fact explain why the patch
> works or is correct.

Yeah, that should have been improved.

> Also, set_current_state(TASK_RUNNING) is dodgy (similarly in
> __blk_mq_poll), why do you need that memory barrier?

I pointed that out in the review, and v2 fixed it. v2 is the
one that got merged.

--
Jens Axboe


2018-02-21 08:33:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] blk: optimization for classic polling

On Tue, Feb 20, 2018 at 12:37:07PM -1000, Jens Axboe wrote:
> On 2/20/18 3:21 AM, Peter Zijlstra wrote:
> > On Sun, May 30, 2083 at 09:51:06AM +0530, Nitesh Shetty wrote:
> >> This removes the dependency on interrupts to wake up task. Set task
> >> state as TASK_RUNNING, if need_resched() returns true,
> >> while polling for IO completion.
> >> Earlier, polling task used to sleep, relying on interrupt to wake it up.
> >> This made some IO take very long when interrupt-coalescing is enabled in
> >> NVMe.
> >
> > This is a horrible Changelog.. it does not in fact explain why the patch
> > works or is correct.
>
> Yeah, that should have been improved.

Being ever more forgetful (I blame the kids) I find Changelogs more and
more important, ymmv ;-)

> > Also, set_current_state(TASK_RUNNING) is dodgy (similarly in
> > __blk_mq_poll), why do you need that memory barrier?
>
> I pointed that out in the review, and v2 fixed it. v2 is the
> one that got merged.

Right missed that. In fact, possibly the only reason I saw this is that
Nitesh had this computer configured wrong and the email is from the
future and thus the very first entry in my lkml folder.

2018-10-10 18:54:01

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PATCH] blk: optimization for classic polling

On 05/29/2083 08:21 PM, Nitesh Shetty wrote:
> This removes the dependency on interrupts to wake up task. Set task
> state as TASK_RUNNING, if need_resched() returns true,
> while polling for IO completion.
> Earlier, polling task used to sleep, relying on interrupt to wake it up.
> This made some IO take very long when interrupt-coalescing is enabled in
> NVMe.
>
> Reference:
> http://lists.infradead.org/pipermail/linux-nvme/2018-February/015435.html
> Signed-off-by: Nitesh Shetty <[email protected]>

There is something seriously wrong with the date on your computer, this
patch is dated from May 29th 2083,... I would not be surprised if it
messed up people's email client and you did not get any response because
of that...
--
Florian