On 9/17/22 10:40 AM, Liu Song wrote:
> From: Liu Song <[email protected]>
>
> NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
> so when "nr_ctx" is equal to 1, there is no possibility of remote
> request, so the corresponding process can be simplified.
If the worry is the call overhead of blk_mq_complete_request_remote(),
why don't we just make that available as an inline instead? That seems
vastly superior to providing a random shortcut in a driver to avoid
calling it.
--
Jens Axboe
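(Aside, not part of the original thread: one way to read Jens's
suggestion is to split the current export into an inline wrapper for the
common local-completion case plus an out-of-line slow path. A minimal
sketch, assuming the existing signature is kept; the
__blk_mq_complete_request_remote name is made up for illustration.)

/* include/linux/blk-mq.h -- illustrative sketch only */
bool __blk_mq_complete_request_remote(struct request *rq);

static inline bool blk_mq_complete_request_remote(struct request *rq)
{
	/* Mark the request complete before deciding where to run the
	 * completion handler, as the out-of-line version does today.
	 */
	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);

	/* Fast path: polled requests always complete locally. */
	if (rq->cmd_flags & REQ_POLLED)
		return false;

	/* Slow path: the IPI/softirq redirection logic stays out of line. */
	return __blk_mq_complete_request_remote(rq);
}

(With such a split the common case would only pay for the inlined checks
rather than a call into blk-mq.)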
On 2022/9/18 00:50, Jens Axboe wrote:
> On 9/17/22 10:40 AM, Liu Song wrote:
>> From: Liu Song <[email protected]>
>>
>> NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
>> so when "nr_ctx" is equal to 1, there is no possibility of remote
>> request, so the corresponding process can be simplified.
> If the worry is the call overhead of blk_mq_complete_request_remote(),
> why don't we just make that available as an inline instead? That seems
> vastly superior to providing a random shortcut in a driver to avoid
> calling it.
Hi
Here is my thinking. For an SSD with only one hw queue, remote requests
will appear occasionally. As a real multi-queue device, NVMe usually
has no remote requests, so we don't need to care about them. So even
when "blk_mq_complete_request_remote" is called, there is a high
probability that it will return false, and what checking
"req->mq_hctx->nr_ctx == 1" saves compared with calling
"blk_mq_complete_request_remote" is not only the function call itself,
but also the extra checks inside "blk_mq_complete_request_remote". If
"blk_mq_complete_request_remote" is marked inline, whether it is
actually inlined still depends on the compiler, so there is some
uncertainty.
Thanks
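(Aside, not part of the original thread: the "extra checks" mentioned
above are what blk_mq_complete_request_remote() evaluates after the
REQ_POLLED test. Roughly, for kernels of this era, the IPI decision
looks like the sketch below; exact details vary between releases.)

/* Simplified sketch of the redirection checks in block/blk-mq.c. */
static inline bool blk_mq_complete_need_ipi(struct request *rq)
{
	int cpu = raw_smp_processor_id();

	/* Redirection only matters on SMP with QUEUE_FLAG_SAME_COMP set. */
	if (!IS_ENABLED(CONFIG_SMP) ||
	    !test_bit(QUEUE_FLAG_SAME_COMP, &rq->q->queue_flags))
		return false;

	/* With forced threaded interrupts an IPI is a bad trade-off. */
	if (force_irqthreads())
		return false;

	/* Same CPU, or shared cache domain without SAME_FORCE: stay local. */
	if (cpu == rq->mq_ctx->cpu ||
	    (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags) &&
	     cpus_share_cache(cpu, rq->mq_ctx->cpu)))
		return false;

	/* Never try to IPI an offline CPU. */
	return cpu_online(rq->mq_ctx->cpu);
}

(On a 1:1 mapped hctx the completion normally runs on the submitting CPU
anyway, so this chain returns false almost every time; walking it is the
overhead the nr_ctx == 1 check is meant to skip.)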
On 9/18/22 10:10 AM, Liu Song wrote:
>
> On 2022/9/18 00:50, Jens Axboe wrote:
>> On 9/17/22 10:40 AM, Liu Song wrote:
>>> From: Liu Song <[email protected]>
>>>
>>> NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
>>> so when "nr_ctx" is equal to 1, there is no possibility of remote
>>> request, so the corresponding process can be simplified.
>> If the worry is the call overhead of blk_mq_complete_request_remote(),
>> why don't we just make that available as an inline instead? That seems
>> vastly superior to providing a random shortcut in a driver to avoid
>> calling it.
>
> Hi
>
> Here is my thinking. For an SSD with only one hw queue, remote requests
> will appear occasionally. As a real multi-queue device, NVMe usually
> has no remote requests, so we don't need to care about them. So even
> when "blk_mq_complete_request_remote" is called, there is a high
> probability that it will return false, and what checking
> "req->mq_hctx->nr_ctx == 1" saves compared with calling
> "blk_mq_complete_request_remote" is not only the function call itself,
> but also the extra checks inside "blk_mq_complete_request_remote". If
> "blk_mq_complete_request_remote" is marked inline, whether it is
> actually inlined still depends on the compiler, so there is some
> uncertainty.
I'm not disagreeing with any of that, my point is just that you're
hacking around this in the nvme driver. This is problematic whenever
core changes happen, because now we have to touch individual drivers.
While the expectation is that there are no remote IPI completions for
NVMe, queue starved devices do exist and those do see remote
completions.
This optimization belongs in the blk-mq core, not in nvme. I do think it
makes sense, you just need to solve it in blk-mq rather than in the nvme
driver.
--
Jens Axboe
On Mon, Sep 19, 2022 at 08:10:31AM -0600, Jens Axboe wrote:
> I'm not disagreeing with any of that, my point is just that you're
> hacking around this in the nvme driver. This is problematic whenever
> core changes happen, because now we have to touch individual drivers.
> While the expectation is that there are no remote IPI completions for
> NVMe, queue starved devices do exist and those do see remote
> completions.
>
> This optimization belongs in the blk-mq core, not in nvme. I do think it
> makes sense, you just need to solve it in blk-mq rather than in the nvme
> driver.
I'd also really like to see solid numbers to justify it.
And btw, having more than one core per queue is quite common in
nvme. Even many enterprise SSDs only have 64 queues, and some of
the low-end consumer ones have so few queues that they are not
enough for the core count of even a mid-range desktop CPU.
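(Aside, not part of the original thread: a tiny standalone illustration
of the point above, using made-up queue counts and a naive even spread
rather than the kernel's real blk_mq_map_queues() logic. An hctx keeps a
single ctx only when there are at least as many hardware queues as CPUs.)

#include <stdio.h>

int main(void)
{
	int nr_cpus = 32;			/* hypothetical host */
	int queue_counts[] = { 64, 32, 8 };	/* hypothetical devices */

	for (int i = 0; i < 3; i++) {
		int nr_hw_queues = queue_counts[i];
		/* Ceiling division: worst-case ctxs sharing one hctx. */
		int per_hctx = (nr_cpus + nr_hw_queues - 1) / nr_hw_queues;

		printf("%2d hw queues, %2d CPUs -> up to %d ctx per hctx\n",
		       nr_hw_queues, nr_cpus, per_hctx);
	}
	return 0;
}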
On 2022/9/19 22:35, Christoph Hellwig wrote:
> On Mon, Sep 19, 2022 at 08:10:31AM -0600, Jens Axboe wrote:
>> I'm not disagreeing with any of that, my point is just that you're
>> hacking around this in the nvme driver. This is problematic whenever
>> core changes happen, because now we have to touch individual drivers.
>> While the expectation is that there are no remote IPI completions for
>> NVMe, queue starved devices do exist and those do see remote
>> completions.
>>
>> This optimization belongs in the blk-mq core, not in nvme. I do think it
>> makes sense, you just need to solve it in blk-mq rather than in the nvme
>> driver.
> I'd also really like to see solid numbers to justify it.
>
> And btw, having more than one core per queue is quite common in
> nvme. Even many enterprise SSDs only have 64 queues, and some of
> the low-end consumer ones have so few queues that they are not
> enough for the core count of even a mid-range desktop CPU.
Hi
Thank you for your suggestion. Here is my thinking. NVMe devices that
can support one core per queue are usually high-performance ones, so I
prefer to make more optimizations for high-performance devices, while
for ordinary consumer-class NVMe devices this modification does not
bring any extra overhead.
Thanks
From: Liu Song <[email protected]>
High-performance NVMe devices usually support a large hw queue, which
ensures a 1:1 mapping of hctx and ctx. In this case there will be no
remote request, so we don't need to care about it.
Signed-off-by: Liu Song <[email protected]>
---
block/blk-mq.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c11949d..9626ea1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1093,10 +1093,12 @@ bool blk_mq_complete_request_remote(struct request *rq)
 	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
 
 	/*
-	 * For a polled request, always complete locally, it's pointless
-	 * to redirect the completion.
+	 * For request which hctx has only one ctx mapping,
+	 * or a polled request, always complete locally,
+	 * it's pointless to redirect the completion.
 	 */
-	if (rq->cmd_flags & REQ_POLLED)
+	if (rq->mq_hctx->nr_ctx == 1 ||
+		rq->cmd_flags & REQ_POLLED)
 		return false;
 
 	if (blk_mq_complete_need_ipi(rq)) {
--
1.8.3.1
On Wed, Sep 21, 2022 at 11:32:03AM +0800, Liu Song wrote:
> From: Liu Song <[email protected]>
>
> High-performance NVMe devices usually support a large hw queue, which
a larger number of?
>  	/*
> -	 * For a polled request, always complete locally, it's pointless
> -	 * to redirect the completion.
> +	 * For request which hctx has only one ctx mapping,
> +	 * or a polled request, always complete locally,
> +	 * it's pointless to redirect the completion.
>  	 */
> -	if (rq->cmd_flags & REQ_POLLED)
> +	if (rq->mq_hctx->nr_ctx == 1 ||
> +		rq->cmd_flags & REQ_POLLED)
Some very odd comment formatting and indentation here.
On 2022/9/22 14:20, Christoph Hellwig wrote:
> On Wed, Sep 21, 2022 at 11:32:03AM +0800, Liu Song wrote:
>> From: Liu Song <[email protected]>
>>
>> High-performance NVMe devices usually support a large hw queue, which
> a larger number of?
>
>>  	/*
>> -	 * For a polled request, always complete locally, it's pointless
>> -	 * to redirect the completion.
>> +	 * For request which hctx has only one ctx mapping,
>> +	 * or a polled request, always complete locally,
>> +	 * it's pointless to redirect the completion.
>>  	 */
>> -	if (rq->cmd_flags & REQ_POLLED)
>> +	if (rq->mq_hctx->nr_ctx == 1 ||
>> +		rq->cmd_flags & REQ_POLLED)
> Some very odd comment formatting and indentation here.
Thanks, I will issue a V2 patch as suggested.
From: Liu Song <[email protected]>
High-performance NVMe devices usually support a larger number of hw queues,
which ensures a 1:1 mapping of hctx and ctx. In this case there will be no
remote request, so we don't need to care about it.
Signed-off-by: Liu Song <[email protected]>
---
block/blk-mq.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c11949d..7f9be13 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1093,10 +1093,11 @@ bool blk_mq_complete_request_remote(struct request *rq)
 	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
 
 	/*
-	 * For a polled request, always complete locally, it's pointless
-	 * to redirect the completion.
+	 * If the request's hctx has only one ctx mapping,
+	 * or is a polled request, both always complete locally,
+	 * it's pointless to redirect the completion.
 	 */
-	if (rq->cmd_flags & REQ_POLLED)
+	if (rq->mq_hctx->nr_ctx == 1 || rq->cmd_flags & REQ_POLLED)
 		return false;
 
 	if (blk_mq_complete_need_ipi(rq)) {
--
1.8.3.1
On Wed, 21 Sep 2022 11:32:03 +0800, Liu Song wrote:
> From: Liu Song <[email protected]>
>
> High-performance NVMe devices usually support a large hw queue, which
> ensures a 1:1 mapping of hctx and ctx. In this case there will be no
> remote request, so we don't need to care about it.
>
>
> [...]
Applied, thanks!
[1/1] blk-mq: hctx has only one ctx mapping is no need to redirect the completion
commit: f168420c62e7a106961f2489a89f6ade84fe3f27
Best regards,
--
Jens Axboe